
INFORMATION SYSTEMS EXAMINATION BOARD

QA/Tester Study Document For Software Testing Foundation Course Exam

January 2001

Table of Contents
Principles of Testing (session 1)
    Testing Terminology
    Why Testing is Necessary
    Fundamental Test Process
    Psychology of Testing
    Re-testing and Regression Testing
    Expected Results
    Prioritisation of Tests
Testing in the Lifecycle (session 2)
    Models for Testing
    Economics of Testing
    High Level Test Planning
    Component Testing
    Integration Testing in the Small
    System Testing
    Non-Functional System Testing
    Integration Testing in the Large
    Acceptance Testing
    Maintenance testing
Static Testing (session 3)
    What are Reviews?
    Reviews and the test process
    Types of review
    Static analysis
Dynamic Testing Techniques (session 4)
    About Testing Techniques
    Black and White Box Testing
    Black Box Test Techniques
    White Box Test Techniques
    Error Guessing
Test Management (session 5)
    Organisation
    Configuration Management
    Test Estimation, Monitoring and Control
    Incident Management
    Standards for Testing
Tool Support for Testing [CAST] (session 6)
    Types of CAST Tool
    Tool Selection and Implementation
    The tool selection process
    The implementation process


Principles of Testing (session 1)


Testing Terminology
New standard

There is a lot of terminology surrounding testing, but until recently there has been no industry standard. A new standard (first published in August 1998) seeks to provide a standard set of terms: BS 7925-1 Glossary of Testing Terms. Although a British Standard, it is being adopted by the International Standards Organisation (ISO) and will hopefully become an ISO standard within two or three years.

Error, fault and failure

Three terms that have specific meanings are error, fault and failure.

Error: a human action that produces an incorrect result.
Fault: a manifestation of an error in software.
Failure: a deviation of the software from its expected delivery or service.

An error is something that a human does. We all make mistakes, and when we make one while developing software it is known as an error. The result of an error being made is a fault: something that is wrong in the software (source code or documentation such as specifications and manuals). Faults are also known as defects or bugs, but in this course we will use the term fault. When a system or piece of software produces an incorrect result or does not perform the correct action, this is known as a failure. Failures are caused by faults in the software. Note that a software system can contain faults but still never fail (this can occur if the faults are in those parts of the system that are never used).

Reliability

Another term that should be understood is reliability. A system is said to be reliable when it performs correctly for long periods of time. However, the same system used by two different people may appear reliable to one but not to the other, because the two people use the system in different ways.

Reliability: the probability that the software will not cause the failure of the system for a specified time under specified conditions.

The definition of reliability therefore includes the phrase 'under specified conditions'. When reporting on the reliability of a system it is important to explain under what conditions the system will achieve the specified level of reliability. For example, a system may achieve a reliability of no more than one failure per month provided no more than 10 people use the system simultaneously.
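Returning to the error, fault and failure definitions above, the following small sketch illustrates the chain. The discount scenario, the function and the figures are invented for this course, not taken from the standard: the programmer's mistaken assumption is the error, the wrong operator in the code is the fault, and the incorrect result seen by a user is the failure.

    # Hypothetical example: the specification says "apply a 10% discount to
    # orders of 100.00 or more". The programmer assumed the discount applied
    # only to orders over 100.00 (the error), so the code contains '>' where
    # it should contain '>=' (the fault).

    def discounted_price(order_total):
        if order_total > 100.00:        # fault: the specification says >= 100.00
            return order_total * 0.90
        return order_total

    # The fault only causes a failure when it is exercised, at the boundary value:
    print(discounted_price(150.00))     # 135.0 - correct, fault not exercised
    print(discounted_price(100.00))     # 100.0 - failure: the specification expects 90.0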


Why Testing is Necessary


Why do software faults occur?

All of the products of software development (specifications, source code, test documents) are written by human beings. All human beings are prone to make errors, regardless of how experienced or skilled they are, so a number of faults in software are inevitable. Furthermore, many of the computer systems being developed today are so large and complex that it is not generally possible for any one person to understand every aspect of a system. Many people specialise in particular areas or aspects; for example, we have database experts, software design experts, algorithm experts, and network and operating system experts. There are also experts in specific areas of the businesses that the systems support. Good documentation (specifications and the like) is necessary to communicate all the relevant information each person needs to complete their parts of a system. However, people are fallible and so make errors, causing faults in the specifications. When dealing with one part of a system, a systems designer or programmer may think they understand some other aspect outside of their particular area of expertise, but they may not. Assumptions are often made because it is easier and quicker to assume one thing than to find out what the real answer should be.

What do software faults cost?

The cost of software faults that are not detected before a system is put into live operation varies, depending on a number of factors. Some of the more spectacular software failures are well known. For example, the European Space Agency's Ariane 5 rocket exploded 37 seconds into its launch because of a software fault. This fault cost $7 billion. Although the software had been tested, a series of errors meant that out-of-date test data was used. Another example is the Mariner space probe. This was meant to be going to Venus but lost its way as a result of a software fault. The Fortran program had a full stop instead of a comma in a DO (looping) statement, which the compiler accepted as an assignment statement. One final example is a case where the expected results were not calculated beforehand, and it cost American Airlines $50 million. By way of background to this example, airlines never want to fly with empty seats: they would rather offer discount seats than fly empty. However, they also do not want to discount a seat when they could receive the full price. Airlines run complex 'yield management' programs to get this balance right, and these programs are changed frequently to increase profitability. One such change was catastrophic for American Airlines. The program gave reasonable, but wrong, results for the number of discount seats! A spokesman said 'We're convinced that if we would have done more thorough testing, we would have discovered the problem before the software was ever brought on-line'. Of course not every software fault causes such huge failure costs (fortunately!). Some failures are simple spelling mistakes or a misalignment in a report. There is a vast array of different costs, but there is a problem: software is not linear. Small faults can have large effects. For example, a spelling mistake in a screen title that read ABC SOFTWEAR was a simple mistake, but it cost the company a sale.


The consequences of faults in safety-critical systems can be much more serious. The following examples are all real cases.

Therac-25 (Canada): this machine was used in the treatment of cancer patients. It had the capability of delivering both x-rays and gamma rays. The normal dose of x-rays is relatively low while the gamma ray dosage is much higher. Switching from one type of ray to the other was a slow process, but the operators discovered that if they typed fast it could be made to switch more quickly. However, it only appeared to have switched. Six people died as a result of receiving too high an x-ray dose. This was a design fault.

A train driver in Washington DC was killed when an empty train failed to stop in a stabling siding. The train was fully automated; the driver just opened and closed the doors. Drivers had been requesting manual operation, as the trains had been overshooting the stops, but this was denied by the humans in central control. (RISKS Forum, 26 Jan 1998.)

Korean Airlines Flight 801, a Boeing 747, crashed leaving 29 survivors from 254 passengers. It came in 1000 ft too low. A combination of normally manageable factors was to blame: severe weather, a ground proximity system that was not switched on (or was automatically disabled when the landing gear was lowered), and a secondary radar system that should have sounded an alarm but either didn't or wasn't noticed. (CW, 14 Aug 1997.)

An Airbus crashed into trees at an air show at Habsheim, France. The software protected the engines by preventing them from being accelerated as fast as the pilot required to avoid hitting the trees after a low fly-past.

The final example is not of a safety-critical system but of a banking system. However, it did contribute to a death. An elderly man bought presents for his grandchildren and, as a result, became overdrawn for the first time in his life. The bank's system automatically sent threatening letters and charged him for being overdrawn. The man committed suicide. (Source: client from a major UK bank.)

Testing is necessary

Testing is necessary because software is likely to have faults in it and it is better (cheaper, quicker and more expedient) to find and remove these faults before the software is put into live operation. Failures that occur during live operation are much more expensive to deal with than failures that occur during testing prior to the release of the software. Of course, other consequences of a system failing during live operation include the possibility of the software supplier being sued by its customers!

Testing is also necessary so we can learn about the reliability of the software (that is, how likely it is to fail within a specified time under specified conditions). Testing is not done to prove that the software has no faults, because it is impossible to prove that software has no faults. Neither is testing done simply because testing is included in the project plan. Testing should be included in the project plan, but that is not the reason for testing; the reason the testing was included in the plan in the first place is the real reason for testing.


Testing is essential in the development of any software system. Testing is needed in order to assess what the system actually does, and how well it does it, in its final environment. A system without testing is merely a paper exercise: it may work or it may not, but there is no way of knowing without testing it.

Exhaustive testing

Exhaustive testing is defined as exercising every possible combination of inputs and preconditions. This usually amounts to an impractical number of test cases. It would be required if we were to ensure that no faults exist in a system; that is, we would have to try every possible combination of inputs and preconditions to ensure that the users will not find one that we haven't. For most systems it is relatively easy to identify millions of possible test cases (not necessarily all desirable test cases). For example, a system that accepts an input value corresponding to an employee's annual salary may be expected to handle figures in the range 0.00 to 999,999.00. Given that every different input value could be made a separate test case, this means that we can identify 100,000,000 test cases (the number of 1p's in the range) just from this one input value. Of course, no one would reasonably expect to have to try out every one of these input values, but that is what exhaustive testing would require. Exhaustive testing would also require each of these 100,000,000 input values to be tried in different combinations with other inputs and with different preconditions (such as data values in a database). For even simple systems the number of test cases required for exhaustive testing is likely to run into millions of millions of millions. Exhaustive testing is therefore impractical, even if it is not strictly impossible. A rough calculation of the scale involved is sketched below.
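The following short calculation, based on the salary example above, gives a feel for how quickly the numbers grow. The second input field and its range are invented purely for illustration:

    # Exhaustive testing: counting the test cases for the salary example.
    # Salary is entered in pounds and pence, from 0.00 to 999,999.00.
    salary_values = 999_999_00 - 0 + 1          # every distinct penny value
    print(salary_values)                         # 99,999,901 - roughly 100 million

    # Suppose the same screen also had, say, an 'age' field accepting 16 to 99
    # (a made-up second input). Exhaustive testing needs every combination:
    age_values = 99 - 16 + 1
    print(salary_values * age_values)            # about 8.4 thousand million

    # Each additional input or precondition multiplies the total again,
    # which is why exhaustive testing quickly becomes impractical.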


How much testing is enough?

It is possible to do enough testing, but determining how much is enough is difficult. Simply doing what is planned is not sufficient, since it leaves open the question as to how much should be planned. What is enough testing can only be confirmed by assessing the results of testing. If lots of faults are found with a set of planned tests, it is likely that more tests will be required to assure that the required level of software quality is achieved. On the other hand, if very few faults are found with the planned set of tests, then (provided the planned tests can be confirmed as being of good quality) no more tests will be required.

Saying that enough testing has been done when the customers or end-users are happy is a bit late, even though it is a good measure of the success of testing. It may also not be the best test stopping criterion to use if you have very demanding end-users who are never happy! Why not stop testing when you have proved that the system works? Because it is not possible to prove that a system works without exhaustive testing. Have you tested enough when you are confident that the system works correctly? This may be a reasonable test stopping criterion, but we need to understand how well justified that confidence is; it is easy to give yourself false confidence in the quality of a system if you do not do good testing.

Ultimately, the answer to "How much testing is enough?" is "It depends!" (this was first pointed out by Bill Hetzel in his book "The Complete Guide to Software Testing"). It depends on risk: the risk of missing faults, of incurring high failure costs, of losing credibility and market share. All of these suggest that more testing is better. However, it also depends on the risk of missing a market window and the risk of over-testing (doing ineffective testing), which suggest that less testing may be better. We should use risk to determine where to place the emphasis when testing by prioritising our test cases. Different criteria can be used to prioritise testing, including complexity, criticality, visibility and reliability. Examples of these are given in later sessions of this course.

Testing measures software quality

Testing and quality go hand in hand! Basically, we don't know how good the software is until we have run some tests. Once we have run some good tests we can state how many faults we have found (of each severity level) and also predict how many faults remain (of each severity level). This idea of predicting the remaining number of faults is discussed further in session 5.

Other factors that influence testing

Other factors that affect our decision on how much testing to perform include possible contractual obligations. For example, a contract between a customer and a software supplier for a bespoke system may require the supplier to achieve 100% statement coverage (coverage measures are discussed in session 4). Similarly, legal requirements may impose a particular degree of thoroughness in testing, although it is more likely that any legal requirements will require detailed records to be kept (this could add to the administration costs of the testing). In some industries (such as the pharmaceutical industry and safety-critical industries such as railroad switching and air traffic control) there are standards defined with the intent of ensuring rigorous testing.

Fundamental Test Process


What is the Fundamental Test Process?

This section describes the Fundamental Test Process. This is a test process documented in the standard BS 7925-2 Software Component Testing. It therefore relates most closely to component testing, but it is considered general enough to apply to all levels of testing (i.e. component, integration in the small, system, integration in the large, and acceptance testing). It is perhaps most applicable to a fairly formal testing environment (such as mission-critical development). Most commercial organisations have less rigorous testing processes, and this should be kept in mind.

The Fundamental Test Process comprises five activities: Planning, Specification, Execution, Recording, and Checking for Test Completion. The test process always begins with Test Planning and ends with Checking for Test Completion. Any and all of the activities may be repeated (or at least revisited), since a number of iterations may be required before the completion criteria defined during the Test Planning activity are met. One activity does not have to be finished before another is started; later activities for one test case may occur before earlier activities for another. The five activities are described in the following sections.


Planning

The basic philosophy is to plan well. All good testing is based upon good test planning. There should already be an overall test strategy and possibly a project test plan in place. This Test Planning activity produces a test plan specific to one level of testing (e.g. system testing). These level-specific test plans should state how the test strategy and project test plan apply to that level of testing, and state any exceptions to them. When producing a test plan, clearly define the scope of the testing and state all the assumptions being made. Identify any other software required before testing can commence (e.g. stubs and drivers, word processor, spreadsheet package or other third-party software) and state the completion criteria to be used to determine when this level of testing is complete. Example completion criteria are listed below (some are better than others, and using a combination of criteria is usually better than using just one):

- 100% statement coverage;
- 100% requirement coverage;
- all screens / dialogue boxes / error messages seen;
- 100% of test cases have been run;
- 100% of high severity faults fixed;
- 80% of low and medium severity faults fixed;
- maximum of 50 known faults remain;
- maximum of 10 high severity faults predicted;
- time has run out;
- testing budget is used up.

A good test case

A good test case is effective: it will find a fault. This does not mean that a test case that does not find a fault is not a good one, since a test that passes implies that the fault it could have found is not present (i.e. it gives us some confidence in the software, and this in itself has value). Perhaps we should say that a good test case has the potential to find a fault.

A good test case is exemplary, meaning that it does more than one thing for us (it is capable of finding more than one fault).

A good test case is evolvable. As the software changes, so too will some of the tests need changing to reflect different functionality, new features, etc. The effort to update test cases is usually very significant. However, much can be done when designing test cases to reduce or minimise the amount of maintenance effort needed to keep them compatible with later versions of the software.

A good test case is economic. A test case that requires 50 people to come into the office on a Saturday morning and all be poised at their keyboards at 9am is expensive to perform, and it can only be run once a week. A test case that can be run at the touch of a button and lasts only two seconds is much more economic.

We should design test cases to be economic, evolvable, exemplary and effective. However, we often need to strike a balance between effective/exemplary and evolvable/economic.
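Returning to the completion criteria listed under Planning above, the sketch below shows one way such criteria might be checked at the end of a test cycle. The thresholds, field names and figures are invented for illustration only:

    # Hypothetical summary figures gathered from the test records.
    results = {
        "statement_coverage": 0.97,   # proportion of statements executed
        "tests_run": 1.00,            # proportion of specified test cases executed
        "high_severity_fixed": 1.00,  # proportion of high severity faults fixed
        "low_med_fixed": 0.75,        # proportion of low/medium severity faults fixed
        "known_faults_remaining": 42,
    }

    # Example completion criteria, mirroring the list above.
    criteria = [
        ("100% statement coverage",            results["statement_coverage"] >= 1.00),
        ("100% of test cases have been run",   results["tests_run"] >= 1.00),
        ("100% of high severity faults fixed", results["high_severity_fixed"] >= 1.00),
        ("80% of low & medium faults fixed",   results["low_med_fixed"] >= 0.80),
        ("max 50 known faults remain",         results["known_faults_remaining"] <= 50),
    ]

    for description, met in criteria:
        print(f"{'MET    ' if met else 'NOT MET'} {description}")

    # Testing at this level is complete only when every criterion is met.
    print("Complete" if all(met for _, met in criteria) else "More testing needed")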


Specification

The fundamental test process describes this activity as designing the test cases using the techniques selected during planning. For each test case, specify its objective, the initial state of the software, the input sequence and the expected outcome. Since this is a little vague, we have broken the Test Specification activity down into three distinct tasks to provide a more helpful explanation:

- identify test conditions: determine what is to be tested;
- design test cases: determine how the 'whats' (test conditions) are going to be exercised;
- build test cases: implement the test cases (scripts, data, etc.).

Identify conditions

Formal testing techniques mostly concentrate on this task, i.e. identifying test conditions. Sometimes a brainstorming session is also good for this. When we initially brainstorm we will think of a few conditions, and these in turn will trigger more, but remember that the first 50% produced is unlikely to be the best 50%. Therefore, if you need 100 test conditions, identify 200 (or even 1000) and pick the best 100. Test cases that exercise the most important test conditions will be effective (recall the four attributes of a good test case).

Design test cases

Designing good test cases is a skill. To be exemplary, a test case should exercise several test conditions, but to be economic and evolvable it should not be too big or too complex. Predicting the expected outcome is its own syllabus topic, covered later on. The term outcome is used in preference to output because the outcome comprises everything that has been output as well as what has been changed, deleted, and not changed. It is frequently necessary to design not only individual test cases but also whole sets of test cases, each set with different objectives: for example, regression tests, performance tests, and detailed tests of a particular function.

Build test cases

This task involves making the test cases a reality: writing test procedures or test scripts, creating or acquiring the test data and implementing the expected results. These are all the prerequisites for test execution. A sketch of what a built test case might look like is given below.
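As a rough illustration only (the field names and the example values are invented, not prescribed by BS 7925-2), a built test case can be pictured as a small record capturing the objective, the initial state, the inputs and the expected outcome:

    # Hypothetical representation of one built test case.
    test_case = {
        "id": "TC-017",
        "objective": "Reject a salary outside the range 0.00 to 999,999.00",
        "conditions": ["salary upper boundary + 1p"],   # test conditions exercised
        "initial_state": {"employee_record": "exists, no salary recorded"},
        "inputs": {"salary": "999,999.01"},
        "expected_outcome": {
            "message": "Salary out of range",
            "database": "unchanged",        # the outcome includes what must NOT change
        },
    }

    # The same record can be used by a script or by a manual tester as the
    # specification of how to run the test and what to check afterwards.
    print(test_case["objective"])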


Execution

The purpose of this activity is to execute all of the test cases (though not necessarily all in one go). This can be done either manually or with the use of a test execution automation tool (provided the test cases have been designed and built as automated test cases in the previous stage).

The order in which the test cases are executed is significant. The most important test cases should be executed first. In general, the most important test cases are the ones that are most likely to find the most serious faults, but they may also be those that concentrate on the most important parts of the system.

There are a few situations in which we may not wish to execute all of the test cases. When testing just fault fixes we may select a subset of test cases that focus on the fix and any likely impacted areas (most likely all the test cases will have been run in a previous test effort). If too many faults are found by the first few tests we may decide that it is not worth executing the rest of them (at least until the faults found so far have been fixed). In practice, time pressures may mean that there is time to execute only a subset of the specified test cases. In this case it is particularly important to have prioritised the test cases, to ensure that at least the most important ones are executed. If any other ideas for test conditions or test cases occur, they should be documented where they can be considered for inclusion.

Recording

In practice the Test Recording activity is done in parallel with Test Execution. To start with, we need to record the versions of the software under test and the test specification being used. Then for each test case we should record the actual outcome and the test coverage levels achieved for those measures specified as test completion criteria in the test plan. In this way we will be marking off our progress. The actual outcome should be compared against the expected outcome and any discrepancy logged and analysed in order to establish where the fault lies. It may be that the test case was not executed correctly, in which case it should be repeated. The fault may lie in the environment set-up or be the result of using the wrong version of the software under test. The fault may also lie in the specification of the test case: for example, the expected outcome could be wrong. Of course, the fault may also be in the software under test! In these cases the fault should be fixed and the test case executed again. The records made should be detailed enough to provide an unambiguous account of the testing carried out. They may be used to establish that the testing was carried out according to the plan.

Check completion

This activity has the purpose of checking the records against the completion criteria specified in the test plan. If these criteria are not met, it will be necessary to go back to the specification stage to specify more test cases to meet the completion criteria. There are many different types of coverage measure, and different coverage measures apply to different levels of testing. (Coverage measures are described in session 4.)

Comparison of the five activities

Comparing these five activities of the Fundamental Test Process, it is easy to see that the first two (Test Planning and Test Specification) are intellectually challenging. Planning how much testing to do, determining appropriate completion criteria, etc. requires careful analysis and thought. Similarly, specifying test cases (identifying the most important test conditions and designing good test cases) requires a good understanding of all the issues involved and skill in balancing them. It is these intellectual tasks that govern the quality of test cases.


The next two activities (Test Execution and Test Recording) involve predominantly clerical tasks. Furthermore, executing and recording are activities that are repeated many times, whereas the first two activities, Test Planning and Test Specification, are performed only once (they may be revisited if the completion criteria are not met the first time around, but they are not repeated from scratch). The Test Execution and Test Recording activities can be largely automated, and there are significant benefits in doing so.
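As a simple illustration of that point, the loop below sketches automated execution and recording for test cases of the form shown earlier. The runner function and the logging fields are invented for this example rather than taken from the standard:

    # Minimal execution-and-recording loop (illustrative only).
    software_version = "2.3.1"                    # recorded before execution starts

    test_suite = [                                # built test cases (see earlier sketch)
        {"id": "TC-017", "inputs": {"salary": "999,999.01"},
         "expected_outcome": "Salary out of range"},
    ]

    def run_test(case):
        # Placeholder for driving the software under test; here we simply
        # pretend the software returned the right message.
        return "Salary out of range"

    test_log = []
    for case in test_suite:
        actual = run_test(case)
        passed = (actual == case["expected_outcome"])
        test_log.append({
            "test_id": case["id"],
            "software_version": software_version,
            "actual_outcome": actual,
            "result": "pass" if passed else "fail",
        })
        if not passed:
            # A discrepancy is logged and analysed: the fault may be in the test,
            # the environment, the software version used, or the software itself.
            print(f"Discrepancy in {case['id']} - raise an incident report")

    print(test_log)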

Psychology of Testing
Why test?

There are many reasons for testing, including building confidence in the software under test, demonstrating conformance to requirements or a functional specification, finding faults, reducing costs, demonstrating that a system meets user needs, and assessing the quality of the software. However, one reason that is not valid (but is often used erroneously) is to prove that the software is correct. This is wrong simply because it is not possible to prove that a software system is correct. It is not possible to prove a system has no faults. It is only possible to prove that a system has faults - by finding some of them!

How do we measure the quality of software?

The number of faults found is a common way to start. The more faults we find, the worse the quality is, so we can easily measure poor quality software. What about the converse: the fewer faults we find, the better the quality of the software? This is not true, since finding few or no faults can mean one of three things: good software, poor testing, or both poor testing and poor software. Without knowing independently about the quality of the testing, no justified conclusions can be drawn about the quality of the software. If you have a faulty measuring instrument, you cannot draw any justified conclusions about what you are measuring. If I discovered that I had lost weight I might be delighted, but if my scales are broken I am not actually any thinner; my delight is unjustified.

If we do lots of testing our confidence will rise, but this is because confidence is a psychological factor. Consider how differently you might regard a piece of software if you ran all of the easy tests first, so that most of the early tests worked correctly, compared to running all of the most difficult tests first, so that most of the tests failed. The same software with the same set of tests would give two very different initial impressions of confidence, even though the quality of the software is the same each time.

You cannot have justified confidence in the quality of the software unless you have confidence in the quality of the testing.

Looking for faults is an effective testing approach

The overall purpose of testing is to give confidence that the system is working well. But this definition of testing is not effective for test case construction.


Glenford Myers shows that the purpose of testing is to find faults, as is pointed out so effectively in his book "The Art of Software Testing" (Wiley, 1979). But finding faults destroys confidence: obviously you cannot have confidence that something is working when it has just been proved otherwise. Is the purpose of testing, therefore, to destroy confidence rather than to build it? This is a paradox, because both are true.

The best way to build confidence is to try and destroy it.

If no faults are found, the natural assumption is that the software has no faults; the assumption that the testing was of poor quality is rarely made but is much more likely. In fact, you cannot justifiably conclude anything about the quality of the software tested without some knowledge of the quality of the testing. Yet when testing reveals no faults, our natural reaction is to be pleased, because we always like to think the best of ourselves. To be honest, we don't really want to succeed at finding faults because we never meant to put them in to begin with. This leads to the "Catch-22" of testing.

Testing is looking for something you don't really want to find.

Misunderstanding the testing paradox leads to misconceptions about testing. If you observed the way people behave, you would think they believed that it is the testing process that creates faults in software. This is like believing that it is the sunshine streaming in through a window that creates the dust in the air. The sunshine, like testing, only makes visible what was already there. "Less testing means less debugging" is actually true; the fewer faults you find, the fewer you will put right before you release the software to the next stage. However, not finding them does not mean they are not there. It is sometimes said that we should "build quality in, not test bugs out". Although it is certainly a good idea to build quality in, there are misconceptions here as well. There is a tendency to omit the last two words. But it is not a choice between quality and testing; testing is a way to build quality in, as will become evident in session 2. It is not true that faults are "tested out" in any case: debugging is the process that removes faults; testing reveals them.

Valid and invalid, positive and negative, clean and dirty tests

Choosing test inputs that the software is expected to handle is an important thing to do. For example, if a program is expected to accept an integer in the range 1 to 1000, then test cases that input an integer in this range are regarded as valid tests in the sense that they use valid input values. These are also known as positive tests or clean tests. However, where we choose to use test inputs outside this range, such as 4500, -20 or even 'Hello world', these test cases are regarded as invalid tests in the sense that they use invalid input values. These are also known as negative or dirty tests. A short illustration follows.
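The sketch below illustrates the distinction using a made-up validation function for the 1 to 1000 example; the function and its behaviour are assumptions for illustration only:

    # Hypothetical function under test: accepts an integer from 1 to 1000.
    def accept_quantity(value):
        if not isinstance(value, int) or not (1 <= value <= 1000):
            raise ValueError("quantity must be an integer from 1 to 1000")
        return value

    # Valid (positive / clean) tests: inputs the software should handle normally.
    for value in [1, 500, 1000]:
        assert accept_quantity(value) == value

    # Invalid (negative / dirty) tests: inputs the software should reject gracefully.
    for value in [4500, -20, "Hello world"]:
        try:
            accept_quantity(value)
            print(f"FAIL: {value!r} was accepted")
        except ValueError:
            print(f"pass: {value!r} correctly rejected")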


Testing can be perceived as a destructive process

Testing can be seen as a destructive process. Looking for faults in something has a negative connotation; it implies that the something is faulty to start with. The point is, when it comes to software it usually is faulty in some way! (Of course developers are inclined to believe the best of their software; because they created it, it is their baby.)

Presenting faults to authors and managers

Care should be taken when communicating fault information to developers and managers. In ancient Greece messengers who brought bad news were executed, so some things have improved! However, we must still tread carefully. Dashing up to a developer and saying "You fool, you did this wrong" is not likely to encourage him or her to investigate the problem. More likely the developer will go on the defensive, perhaps arguing that it is not a fault but that the tester does not understand it. A more successful approach may be to approach the developer saying "I don't understand this, would you mind explaining it to me please?". In demonstrating it to the tester, the developer may then spot the fault and offer to fix it there and then. Cem Kaner (co-author of "Testing Computer Software") says that the best tester is not the one who finds the most faults but the one who manages to have the most faults fixed. This requires a good relationship with developers.

Independent testing is more effective

Testing done by someone who has not been involved in the development of the software under test is likely to be more effective (i.e. find more faults) than testing done by someone who has been involved in the software's development. This is because the person who has been involved in the development process will have a restricted view of the software: a developer's view. In this case any assumptions made during the development process are likely to be carried over into testing. The other person will be able to view the software independently of the development process and be able to challenge any assumptions the developers made.

Levels of independence

There are many levels of independence, that is, ways of achieving a greater or lesser amount of independence in software testing. It is important to appreciate that independence is most required at the specification stage. It is here that the test cases are designed, and it is the design of the test cases (more specifically, the identification and selection of the test conditions) that governs their quality. People who say that programmers should not test their own code often miss this point. What they should say is that programmers are not the best people to specify all of the tests for their own code. However, programmers should contribute to the specification of the test cases, since they can contribute a good technical understanding of the software. There are also good reasons why programmers should execute the tests on their own code. Perhaps the best of these is that it is cheaper. A programmer who finds a fault in his or her own code can quickly fix it and execute the test again to confirm the fix. The fact that a fault was found and fixed does not need to be documented.


If someone else were to run the test, they would probably need to document it as a safe and secure way of communicating the details to the programmer. The programmer would then need to reproduce the failure and fix it. The fault report would then be updated and make its way back to the tester, who would repeat the test to confirm that the fix was correct. This takes much more effort in total and yet achieves no more: one fault has been found and fixed.

We achieve no independence if only the person who wrote the software specifies the tests. If another developer from the same team specifies the tests, a little independence is achieved. More independence can be achieved if someone outside the development team specifies the tests (such as a test team). Further independence can be achieved by having an outside agency undertake the testing, though this in itself introduces some different problems. Perhaps the greatest level of independence can be achieved by having a tool generate test cases, but these are not likely to be particularly good quality tests. Issues of independence are discussed further in session 5.

Re-testing and Regression Testing


What is re-testing?

After a test fails and a software fault is reported, we can expect a new version of the software in which the fault has been fixed. In this case we will need to execute the test again to confirm that the fault has indeed been fixed. This is known as re-testing. When re-testing, it is important to ensure that the test is executed in exactly the same way as it was the first time, using the same inputs, data and environment. If the test passes we may be tempted to assume that the fault fix was correct, but in practice it is better to undertake some regression testing as well.

What is regression testing?

Like re-testing, regression testing involves executing test cases that have been executed before. The difference is that for regression testing the test cases probably passed the last time they were executed (compare this with the test cases executed when re-testing - they failed the last time). The term "regression testing" is something of a misnomer. It would be better if it were called "anti-regression" or "progression" testing, because we are executing tests with the intent of checking that the system has not regressed (that is, that it does not now have more faults in it as a result of some change).

It is common for organisations to have what is usually called a regression test suite or regression test pack. This is a set of test cases that is specifically used for regression testing. The test cases are designed to collectively exercise most functions (certainly the most important ones) in a system, but not to test any one of them in detail. It is appropriate to have a regression test suite at every level of testing (component testing, integration testing, system testing, etc.). All of the test cases in a regression test suite would be executed every time a new version of software is produced, and this makes them ideal candidates for automation. If the regression test suite is very large, it may be more appropriate to select a subset for execution.

Regression tests are executed whenever the software changes, either as a result of fault fixes or of new or changed functionality.


It is also a good idea to execute regression tests when some aspect of the environment changes, for example when a new version of a database management system is introduced or a new version of a source code compiler is used.

Maintenance of a regression test suite should be carried out so that it evolves over time in line with the software. As new functionality is added to a system, new regression tests should be added, and as old functionality is changed or removed, so too should regression tests be changed or removed. As new tests are added, a regression test suite may become very large. If all the tests have to be executed manually, it may not be possible to execute them all every time the regression suite is used. In this case a subset of the test cases has to be chosen. This selection should be made in the light of the latest changes that have been made to the software. Sometimes a regression test suite of automated tests can become so large that it is not always possible to execute them all. It may be possible and desirable to eliminate some test cases from a large regression test suite, for example if they are repetitive (tests which exercise the same conditions) or if they can be combined (if they are always run together). Another approach is to eliminate test cases that have not found a fault for a long time (though this approach should be used with some care!).
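One simple way to picture subset selection is to tag each regression test with the areas of the system it exercises and then pick the tests that touch the areas affected by the latest changes. The tags, test names and the idea of a fixed core set are invented for illustration:

    # Hypothetical regression suite: each test is tagged with the areas it exercises.
    regression_suite = {
        "reg_login_01":   {"login"},
        "reg_payment_03": {"payments", "reports"},
        "reg_reports_02": {"reports"},
        "reg_search_05":  {"search"},
    }

    # Areas affected by the latest set of changes (fault fixes or new features).
    changed_areas = {"reports"}

    # Select the tests that exercise any changed area; always keep a small core set.
    core_tests = {"reg_login_01"}
    selected = core_tests | {
        name for name, areas in regression_suite.items() if areas & changed_areas
    }
    print(sorted(selected))   # ['reg_login_01', 'reg_payment_03', 'reg_reports_02']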

Expected Results
What are expected results?

An expected result is what the software is supposed to produce when a test is executed. We usually refer to the expected outcome (or results) rather than the expected output, since outcome is a more encompassing word. Outcome includes everything that has been created, changed or deleted, and also includes things that should not have changed. In other words, it is the difference between the state of the system and its environment before and after executing the test.

Expected results must be defined in advance

It is important that expected results be specified in advance of tests being executed, though not doing so is a fairly common practice. If we do not specify the expected results for a test, the intention would be to verify the actual results by viewing them at the time we execute the test. This has the advantage of reducing the amount of work we have to do when specifying the tests (less documentation, less effort). However, this approach has the disadvantage of being less reliable. There may be a subconscious desire to see the tests pass (less work to do - no fault report to write up). It is also a less rigorous approach, since a result that looks plausible may be thought correct even when it isn't wholly correct. Calculating an expected result in advance of test execution and then comparing the actual result with it is a much more reliable approach. It also means that the person executing the test does not need the detailed knowledge that would otherwise be required to verify the test outcomes properly.

Where do expected results come from?

Expected results should be derived from a specification of what the system should do (for example a requirement or functional specification). In other words, the expected results are determined by considering the required behaviour of the system. Some approaches to testing lead testers to consider the structure of the system when designing test cases, in particular by examining the source code. While this is a good approach to identifying test inputs, it is vital that the expected result is not derived in this way.


The expected result should be derived from the required behaviour of the system, not its actual behaviour. Basing expected results on constructs and values within the source code will have the effect of generating test cases that test that the software does what the software does, whereas what we really need is test cases that test that the software does what the software should do. A short sketch of the difference is given at the end of this section.

The 'Oracle Assumption'

For most systems it is possible to predict the expected results for any test case, but there are a few systems where this is not the case, for example systems that predict situations or that perform long and complex calculations which cannot practically be performed manually. The Oracle Assumption is the name given to the (usually correct) assumption that it is possible to predict the expected results.
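As an illustration of deriving the expected result from the specification rather than from the code (the VAT rate, the function and the figures are invented for this example), suppose the specification says "add 17.5% VAT to the net price" but the programmer accidentally used 17% in the code:

    # Software under test (contains a fault: the rate should be 17.5%).
    def price_with_vat(net_price):
        return round(net_price * 1.17, 2)

    net_price = 200.00

    # Expected result derived from the SPECIFICATION, worked out in advance:
    expected = round(net_price * 1.175, 2)        # 235.0

    # Deriving the "expected" result from the code itself would simply echo the
    # faulty behaviour, and the fault would never be noticed.
    actual = price_with_vat(net_price)            # 234.0

    print("pass" if actual == expected else f"fail: expected {expected}, got {actual}")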

Prioritisation of Tests
Importance of prioritising

We have already said that we cannot test everything; exhaustive testing (testing all combinations of inputs and preconditions) is impractical. It is easy enough to identify far more test cases than we will ever have time to execute, so we need an approach to selecting a subset of them. Selecting test cases at random is not an effective strategy. We need to use a more intelligent approach that helps identify which tests are most important. In short, we must prioritise our tests.

Prioritise tests so that whenever you stop testing you have done the best testing in the time available.

How to prioritise

There are many different criteria that can be used to prioritise tests, and they can be used in combination. Possible ranking criteria include the following:

- tests that would find the most severe failures;
- tests that would find the most visible failures;
- tests that would find the most likely faults;
- ask the end-users to prioritise the requirements (and test those first);
- test first the areas of the software that have had the most faults in the past;
- test most those areas of the software that are most complex or critical.

A simple scoring sketch is given below. There is further discussion and examples of how to prioritise tests in sessions 4 and 5.
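As a minimal sketch of combining such criteria (the weightings, scales and test names are invented, not part of the syllabus), each candidate test can be given a score and the tests executed in descending score order:

    # Hypothetical candidate tests scored 1-5 against three of the criteria above.
    candidates = [
        # (test name,            severity, likelihood, criticality)
        ("payment calculation",  5,        4,          5),
        ("report layout",        2,        3,          1),
        ("password change",      4,        2,          4),
    ]

    # Weight the criteria to reflect what matters most on this project.
    weights = {"severity": 0.5, "likelihood": 0.3, "criticality": 0.2}

    def priority(test):
        name, severity, likelihood, criticality = test
        return (weights["severity"] * severity
                + weights["likelihood"] * likelihood
                + weights["criticality"] * criticality)

    # Execute (or, if time runs out, keep) the highest-scoring tests first.
    for test in sorted(candidates, key=priority, reverse=True):
        print(f"{priority(test):.1f}  {test[0]}")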


Testing in the Lifecycle (session 2)


This session concentrates on the relationship between testing and all the other development activities that occur throughout the software development lifecycle. In particular it looks at the different levels of testing.

Models for Testing


Test Design and Test Execution

Testing is often considered something which is done after software has been written; after all, the argument runs, you can't test something that doesn't exist, can you? This idea makes the assumption that testing is merely test execution, the running of tests. Of course, tests cannot be executed without working software (although static analysis can be carried out). But testing activities include more than running tests, because before they can be run, the tests themselves need to be designed and written. As Boris Beizer points out in his book "Software Testing Techniques" (Van Nostrand Reinhold, 1993), the act of designing a test is one of the most effective ways of finding faults.

If tests need to be designed, when in the life cycle should the test design activities happen? The discussion below, on the timing of the test design activities, applies no matter what software life cycle model is used during development or maintenance. Many descriptions of life cycles do not make the proper placing of test design activities clear. For example, many development methods advocate early test planning (which is good), but it is the actual construction of concrete individual test inputs and expected outcomes which is most effective at revealing faults in a specification, and this aspect is often not explicit.

The Development of Quality Ideas

Before the industrial revolution, individuals made goods; each craftsman was responsible for the quality of whatever they produced. When mass production started, it was important that all goods produced were of the same basic standard so that they would work together when assembled. The quality controller stood at the end of the production line and assessed the quality of the components produced. They were either accepted, if they were of the correct quality, or thrown away. This approach is a product-based error detection method, which takes place after the manufacturing process and discards faulty goods.

In the 1940s, the quality assurance approach looked at the process of making the products. If fewer defective components were manufactured, there wouldn't be so much waste at the end of the line. By improving the production process, fewer errors causing faulty goods would be made, so better quality goods would be produced. This approach is a process-based fault prevention method. Note that the quality controller is still needed, because some faulty goods will still be made.

The more recent quality ideas are the management of total quality, and levels of process maturity such as the Capability Maturity Model (CMM). In these approaches, all levels of staff are involved in a culture change, using root cause analysis to examine the underlying reasons for process faults being made in the first place, and implementing continuous improvement of the processes.


The Waterfall Model, Pre-Waterfall, and Damage to Testing

Most software engineers are familiar with a software life cycle model; the waterfall was the first such model to be generally accepted. Before this, there were informal mental models of the software development process, but they were fairly simple. The process of producing software was referred to as "programming", and it was integrated very closely with testing. The programmers would write some code, try it out, and write some more. After a lot of iterations, the program would emerge. The point is that testing was very much an integral part of the software production process.

The main difference in the waterfall model was that the programming steps were spelled out. Now, instead of programming, there are a number of distinct stages such as requirements analysis, structural or architectural design, detailed design, coding, and then finally testing. Although the stratification of software production activities is very helpful, notice what the effect has been on testing. Now it comes last (after the "interesting" part?) and is no longer an integral part of the whole process. This is a significant change, and it has actually damaged the practice of testing and hence affected the quality of the software produced in ways that are often not appreciated. The problems with testing in the classic waterfall model are that testing is very much product-based and applied late in the development schedule. The levels of detail of test activities are not acknowledged, and testing is now vulnerable to schedule pressure, since it occurs last.

The V-Model

A better approach is illustrated by the V-Model. Here the test activities are spelled out to the same level of detail as the design activities. Software is designed on the left-hand (downhill) part of the model, and built and tested on the right-hand (uphill) part of the model. The correspondences between the left-hand and right-hand activities can also be shown by the lines across the middle of the V, showing the test levels from component testing at the bottom, through integration and system testing, to acceptance testing at the top level. However, even the V-Model is often not exploited to its full potential from a testing point of view.

When are tests designed: as late as possible?

A common misconception is that tests should be designed as late as possible in the life cycle, i.e. only just before they are needed. The reason for this is supposedly to save time and effort, and to make progress as quickly as possible. But this is progress only from a deadline point of view, not from a quality point of view, and the quality problems whose seeds are sown here will come back to haunt the product later on. No test can be designed without a specification, since the specification is the source of what the correct results of the test should be. Even if that specification is not formally written down or fully completed, the test design activity will reveal faults in whatever specification the tests are based on. This applies to code, a part of the system, the system as a whole, or the user's view of the system. If test design is left until the last possible moment, then the faults are found much later, when they are much more expensive to fix.


In addition, the faults in the highest levels (the requirements specification) are found last; these are also the most critical and most important faults. The actual effect of this approach is the most costly and time-consuming approach to testing and software development.

When are tests designed: as early as possible

If tests are going to be designed anyway, there is no additional effort required to move a scheduled task to a different place in the schedule. If tests are designed as early as possible, the inevitable effect of finding faults in the specification comes early, when those faults are cheapest to fix. In addition, the most significant faults are found first. This means that those faults are not built into the next stage, e.g. major requirement faults are not designed in, so faults are prevented.

An argument against this approach is that if the tests are already designed, they will need to be maintained. There will be inevitable changes due to subsequent life cycle development stages that affect the earlier stages. This is correct, but the cost of maintaining tests must be compared with the costs of the late testing approach, not simply accepted as negating the good points. In fact, the extent of the test design detail should be determined in part by the maintenance costs, so that less detail (but always some detail) should be designed if extensive changes are anticipated.

One of the frequent headaches in software development is a rash of requirement change requests that come from users very late in the life cycle; a major contributing cause of this is the user acceptance test design process. When the users only begin to think about their tests just before the acceptance tests are about to start, they realise the faults and shortcomings in the requirement specification and request changes to it. If they had designed their tests at the same time as they were specifying those requirements, the very mental activity of test design would have identified those faults before they were built into the system.

The way in which the system will be tested also provides another dimension to the development; the tests form part of the specification. If you know how something will be tested, you are much more likely to build something that will pass those tests. The end result of designing tests as early as possible is that quality is built in, costs are reduced, and time is saved in test running because fewer faults are found, giving an overall reduction in cost and effort. This is how testing activities help to build quality into the software development process. This can be taken one stage further, as recommended by a number of experts (Beizer, 1990; Hetzel, 1991; Quentin, 1992), by designing tests before specifying what is to be tested. The tests then act as a requirement for what will be built.

Verification and Validation

BS 7925-1 defines verification as "the process of evaluating a system or component to determine whether the products of the given development phase satisfy the conditions imposed at the start of that phase". This definition is a little confusing, since the reference to 'a system or component' implies that verification relates only to the products of software development created on the right-hand side of the V-Model. However, this implication is short-lived, since it is seemingly contradicted by the words 'products of the given development phase', which could imply any development phase. Verification is indeed applicable at every stage.


The 'conditions imposed at the start of that phase' are the key to understanding verification. These conditions should be generic, in that they should apply to any product of that phase and be used to ensure that the development phase has worked well. They are checks on the quality of the product, such as 'documentation must be unambiguous', 'document conforms to standard template' and, in the case of an actual system, 'has the system been assembled correctly?'.

The full definition of the term 'validation' as given by BS 7925-1 is "the determination of the correctness of the products of software development with respect to the user needs and requirements". Here the 'products of software development' include requirement, functional and design specifications, and source code (in fact, everything that is created during the development stages on the left-hand side of the V-Model). Test plans, test specifications, test cases, etc. are also products of software development. The phrase 'with respect to the user needs and requirements' means that the checks may be unique to a particular system, since different systems are developed to meet different user needs. (While this last statement may be rather obvious, it is worth stating when comparing validation with verification.)

Validation of each of the products of software development typically involves comparing one product with its parent. For example (using the terminology given in the V-Model of this course), to validate a project specification we would compare it with the business requirement specification. This involves checking completeness and consistency, for example by checking that the project specification addresses all of the business requirements. Validating the requirement specification may seem a little tricky, given that there is probably no higher level specification. However, the validation activity is not limited to comparing one document against another. User requirements can be validated by several other means, such as discussing them with end users and comparing them against your own or someone else's knowledge of the user's business and working practices. Forms of documentation other than a formal statement of requirements may be used, such as contracts, memos or letters describing individual or partial requirements. Reports of surveys, market research and user group meetings may also provide a rich source of information against which a formal requirements document can be validated. In fact, many of these different approaches may from time to time be applicable to the validation of any product of software development (designs, source code, etc.).

A purpose of executing tests on the system is to ensure that the delivered system has the functionality defined by the system specification. This best fits as a validation activity (since it is checking that the system has the functions that are required, i.e. that it is the right system). Verification at the system test phase is more to do with ensuring that a complete system has been built. In terms of software this is rarely a large independent task; rather, it is subsumed by the validation activities. However, if it were treated as an independent task it would seek to ensure that the delivered system conforms to the standards defined for all delivered systems. For example, all systems must include on-line help, display a copyright notice at startup, conform to user interface standards, conform to product configuration standards, etc. Many people have trouble remembering which is which, and what they both mean.
Barry Boehm's definitions represent a good way to remember them: Verification is building the product right, and Validation is building the right product. Thus verification checks the correctness of the results of one development stage with respect to some pre-defined rules about what it should produce, while validation checks back against what the users really want (or what they have specified).
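To make the distinction concrete, the following minimal Python sketch contrasts the two kinds of check. It is illustrative only: the document structure, field names and check wording are assumptions made for this example, not part of any standard.

```python
# Illustrative sketch only: the document structure and check names are assumed,
# purely to contrast verification (generic phase checks) with validation
# (checks back against the parent specification / user needs).

def verify(document):
    """Verification: generic checks that apply to any product of this phase."""
    issues = []
    if not document.get("conforms_to_template", False):
        issues.append("Document does not conform to the standard template")
    if document.get("ambiguous_statements", 0) > 0:
        issues.append("Document contains ambiguous statements")
    return issues

def validate(document, parent_specification):
    """Validation: check this product against its parent (the user needs)."""
    covered = set(document.get("addresses_requirements", []))
    required = set(parent_specification.get("requirement_ids", []))
    return [f"Requirement {r} is not addressed" for r in sorted(required - covered)]

if __name__ == "__main__":
    project_spec = {
        "conforms_to_template": True,
        "ambiguous_statements": 1,
        "addresses_requirements": ["R1", "R2"],
    }
    business_requirements = {"requirement_ids": ["R1", "R2", "R3"]}

    print("Verification issues:", verify(project_spec))                    # generic quality checks
    print("Validation issues:", validate(project_spec, business_requirements))  # checks against the parent
```

The same product can pass one kind of check and fail the other, which is exactly why both are needed.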


The impact of early test design on development scheduling Testers are often under the misconception that they are constrained by the order in which software is built. The worst extreme is to have the last piece of software written be the one that is needed for testing to start. However, with test design taking place early in the life cycle, this need not be the case. By designing the tests early, the order in which the system should ideally be put together for testing is defined during the architectural or logical design stages. This means that the order in which software is developed can be specified before it is built. This gives the greatest opportunity for parallel testing and development activities, enabling development time scales to be minimised. This can enable total test execution schedules to be shortened and gives a more even distribution of test effort across the software development life cycle.

Economics of Testing
Testing is expensive? We are constantly presented with the statement 'testing is expensive' - but when we make this statement what are we comparing the cost of testing with? If we compare the cost of testing with the cost of the basic development effort, testing may appear expensive. However, this would be a false picture because the quality of the software that development delivers has a dramatic impact on the effort required to test it. The more faults there are in the software, the longer testing will take since time will be spent reporting faults and re-testing them. Asking the cost of testing is actually the wrong thing to do. It is much more instructive to ask the cost of not testing, i.e. what we have saved the company by finding faults. A development manager once said to a test manager, 'If it wasn't for you we wouldn't have any bugs.' (Of course he meant 'faults' not 'bugs', but he hadn't been on this course!) Another manager said, 'Stop testing and you won't raise any more faults.' Both of these statements overlook the fact that the faults are already in the software by the time it is handed over to testing. Testing does not insert the faults; it merely reveals them. If they are not revealed then they cannot be fixed, and if they are not fixed they are likely to cause a much higher cost once the faulty software is released to the end-users. What do software faults cost? The cost of faults escalates as we progress from one stage of the development life cycle to the next. A requirement fault found during a review of the requirement specification will cost very little to correct since the only thing that needs changing is the requirement specification document. If a requirement fault is not found until system testing then the cost of fixing it is much higher. The requirement specification will need to be changed together with the functional and design specifications and the source code. After these changes some component and integration testing will need to be repeated and finally some of the system testing. If the requirement fault is not found until the system has been put into real use then the cost is even higher since, after being fixed and re-tested, the new version of the system will have to be shipped to all the end users affected by it.


Furthermore, faults that are found in the field (i.e. by end-users during real use of the system) will cost the end-users time and effort. It may be that the fault makes the users' work more difficult or perhaps impossible to do. The fault could cause a failure that corrupts the users' data and this in turn takes time and effort to repair. The longer a specification fault remains undetected, the more likely it is that it will cause other faults because it may encourage false assumptions. In this way faults can multiply, so the total cost attributable to one particular fault can be considerably more than the direct cost of fixing it. The cost of testing is generally lower than the cost associated with major faults (such as poor quality product and/or fixing faults), although few organisations have figures to confirm this.
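As a rough illustration of this escalation, the sketch below totals the relative cost of fixing requirement faults at successive stages. The multipliers are assumed figures chosen only to show the pattern; they are not measured data from this course.

```python
# Illustrative only: the relative cost figures below are assumptions chosen to
# show the escalation pattern described above, not measured data.

RELATIVE_FIX_COST = {
    "requirements review": 1,    # only the requirement specification changes
    "system testing": 50,        # specs, design and code change; earlier tests are repeated
    "live use": 200,             # as above, plus re-release and end-user disruption
}

def cost_of_faults(stage_found, faults=1):
    """Relative cost of fixing a given number of requirement faults at one stage."""
    return RELATIVE_FIX_COST[stage_found] * faults

for stage in RELATIVE_FIX_COST:
    print(f"Fixing 10 requirement faults found during {stage}: "
          f"{cost_of_faults(stage, faults=10)} cost units")
```

Even with cautious multipliers, the pattern is the same: the later the fault is found, the more of the earlier work has to be redone.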

High Level Test Planning


Before planning, the following should be set in place (taken from TMap, a test management method introduced in Structured Testing of Information Systems by Martin Pol and Erik van Veenendaal, published by Kluwer, 1999). Organisational strategy - who does what. Identify people involved (all departments and interfaces involved in the process). This will depend on your environment. In one organisation it may be fairly static whereas another may vary from project to project. Examine the requirements, identify the test basis documents (i.e. the documents that are to be used to derive test cases). Test organisation, responsibilities, reporting lines. Test deliverables, test plans, specifications, incident reports, summary report. Schedule and resources, people and machines. Purpose The purpose of high-level test planning is to produce a high-level test plan! A high-level test plan is synonymous with a project test plan and covers all levels of testing. It is a management document describing the scope of the testing effort, resources required, schedules, etc. There is a standard for test documentation. It is ANSI/IEEE 829 "Standard for Software Test Documentation". This outlines a whole range of test documents including a test plan. It describes the information that should be considered for inclusion in a test plan under 16 headings. These are described below. Content of a high level Test Plan Test Plan Identifier Some unique reference for this document. Introduction A guide to what the test plan covers and references to other relevant documents such as the Quality Assurance and Configuration Management plans.


Test Items The physical things that are to be tested, such as executable programs, data files or databases. The version numbers of these, details of how they will be handed over to testing (on disc, tape, across the network, etc.) and references to relevant documentation. Features to be Tested The logical things that are to be tested, i.e. the functionality and features. Features not to be Tested The logical things (functionality / features) that are not to be tested. Approach The activities necessary to carry out the testing in sufficient detail to allow the overall effort to be estimated. The techniques and tools that are to be used and the completion criteria (such as coverage measures) and constraints such as environment restrictions and staff availability. Item Pass / Fail Criteria For each test item, the criteria for passing (or failing) that item, such as the number of known (and predicted) outstanding faults. Suspension / Resumption Criteria The criteria that will be used to determine when (if) any testing activities should be suspended and resumed. For example, if too many faults are found with the first few test cases it may be more cost-effective to stop testing at the current level and wait for the faults to be fixed. Test Deliverables What the testing processes should provide in terms of documents, reports, etc. Testing Tasks Specific tasks, special skills required and the inter-dependencies. Environment Details of the hardware and software that will be needed in order to execute the tests. Any other facilities (including office space and desks) that may be required. Responsibilities Who is responsible for which activities and deliverables. Staffing and Training Needs Staff required and any training they will need, such as training on the system to be tested (so they can understand how to use it), training in the business, or training in testing techniques or tools. Schedule Milestones for delivery of software into testing, availability of the environment and test deliverables. Risks and Contingencies What could go wrong and what will be done about it to minimise adverse impacts if anything does go wrong. Approvals Names and when approved. This is rather a lot to remember (though in practice you will be able to use the test documentation standard IEEE 829 as a checklist). To help you remember what is and is not included in a test plan, consider the following table that maps most of the headings onto the acronym SPACE.

Scope - Test Items, Features to be Tested, Features not to be Tested.
People - Staffing and Training Needs, Schedule, Responsibilities.
Approach - Approach.
Criteria - Item Pass/Fail Criteria, Suspension and Resumption Criteria.
Environment - Environment.

There are three important headings missing: Deliverables; Tasks; Risks and Contingencies. You may be able to think of something memorable for the acronym DTR (or one of the other combinations) to help you recall these. The remaining headings are more to do with administration than test planning: Test Plan Identifier; Introduction; Approvals.
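In practice the 16 headings can simply be used as a checklist. The minimal sketch below shows one way of doing that in Python; the headings follow IEEE 829 as summarised above, while the data structure, project identifier and example entries are our own illustration.

```python
# Sketch of the IEEE 829 test plan headings used as a checklist. The heading
# names come from the standard as summarised above; the dictionary layout and
# the example entries are illustrative assumptions.

TEST_PLAN_HEADINGS = [
    "Test Plan Identifier", "Introduction", "Test Items",
    "Features to be Tested", "Features not to be Tested", "Approach",
    "Item Pass/Fail Criteria", "Suspension/Resumption Criteria",
    "Test Deliverables", "Testing Tasks", "Environment", "Responsibilities",
    "Staffing and Training Needs", "Schedule", "Risks and Contingencies",
    "Approvals",
]

def missing_sections(plan):
    """Report which of the 16 headings have not yet been filled in."""
    return [heading for heading in TEST_PLAN_HEADINGS if not plan.get(heading)]

draft_plan = {
    "Test Plan Identifier": "OFFICE-SUITE-TP-001",           # hypothetical identifier
    "Introduction": "Covers all levels of testing for the Office Suite project.",
    "Features to be Tested": ["File access", "Message handling"],
}

print("Still to be written:", missing_sections(draft_plan))
```

Running such a checklist against a draft plan is simply a mechanical way of spotting headings that have been forgotten before the plan is reviewed.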

Component Testing
What is Component Testing? BS7925-1 defines a component as "A minimal software item for which a separate specification is available". Components are relatively small pieces of software that are, in effect, the building blocks from which the system is formed. They may also be referred to as modules, units or programs and so this level of testing may also be known as module, unit and program testing. For some organisations a component can be just a few lines of source code while for others it can be a small program. Component testing, then, is the lowest level of testing (i.e. it is at the bottom of the V-Model software development life cycle). It is the first level of testing to start executing test cases (but should be the last to specify test cases). It is the opportunity to test the software in isolation and therefore in the greatest detail, looking at its functionality and structure, error handling and interfaces. Because it is just a component being tested, it is often necessary to have a test harness or driver to form a program that can actually be executed. This will usually have to be developed in parallel with the component or may be created by adapting a driver for another component. This should be kept as simple as possible to reduce the risk of faults in the driver obscuring faults in the component being tested. Typically, drivers need to provide a means of taking test input from the tester or a file, passing it on to the component, receiving the output from the component and presenting it to the tester for comparison with the expected outcome. The programmer who wrote the code most often performs component testing. This is sensible because it is the most economic approach. A programmer who executes test cases on his or her own code can usually
track down and fix any faults that may be revealed by the tests relatively quickly. If someone else were to execute the test cases, they may have to document each failure. Eventually the programmer would come to investigate each of the fault reports, perhaps having to reproduce them in order to determine their causes. The fixed software would then be re-tested by this other person to confirm each fault had indeed been fixed. This amounts to more effort and yet the same outcome: faults fixed. Of course it is important that some independence is brought into the test specification activity. The programmer should not be the only person to specify test cases (see Session 1 "Independence"). Both functional and structural test case design techniques are appropriate, though the extent to which they are used should be defined during the test planning activity. This will depend on the risks involved, for example, how important, critical or complex the components are. Component Test Strategy The Software Component Testing Standard BS7925-2 requires that a Component Test Strategy be documented before any of the component test process activities are carried out (including the component test planning). The component test strategy should include the following information: the test techniques that are to be used for component testing and the rationale for their choice; the completion criteria for component testing and the rationale for their choice (typically these will be test coverage measures); the degree of independence required during the specification of test cases; the approach required (either isolation, top-down, bottom-up, or a combination of these); the environment in which component tests are to be executed (including hardware and software such as stubs, drivers and other software components); the test process to be used, detailing the activities to be performed and the inputs and outputs of each activity (this must be consistent with the fundamental test process). The Component Test Strategy is not necessarily a whole document but could be a part of a larger document such as a corporate or divisional Testing or Quality Manual. In such cases it is likely to apply to a number of projects. However, it could be defined for one project and form part of a specific project Quality Plan or be incorporated into the Project Component Test Plan. An example is shown in Figure 2.1. Project: Office Suite, Quality Plan, Section 5: Component Test Strategy. Introduction This section defines the component test strategy for the Office Suite project. Exceptions Any exceptions to this strategy must be documented in the relevant Component Test Plan together with their justification. Exceptions do not need formal approval but must be justified. Design Techniques The following techniques must be used for all components: Equivalence Partitioning and Boundary Value Analysis. In addition, for high-criticality components, Decision Testing must also be used.
The rationale for the use of these techniques is that they have proven effective in the past and are covered by the ISEB Software Testing Foundation Certificate syllabus. All testers on this project are required to have attained this certificate. Decision Testing is more expensive and therefore reserved for only the most critical components. Completion Criteria 100% coverage of valid equivalence partitions. 50% coverage of valid boundary values providing no boundary faults are found, 100% coverage of valid boundary values if one or more boundary faults are found. 30% coverage of all invalid conditions (invalid equivalence partitions and invalid boundary values). For critical components, 100% Decision Coverage must also be achieved. The rationale for these completion criteria is that 100% coverage of valid equivalence partitions will ensure that the basic functionality of components is systematically exercised. The post-project review of component testing on the Warehouse project recommended 50% coverage of valid boundaries (providing no boundary faults are found) as an acceptable way to divert more testing effort onto the most critical components. Independence Component Test Plans must be reviewed by at least one person other than the developer/tester responsible for the components in question. Test Specifications must be reviewed by at least two other people. Approach All critical components shall be tested in isolation using stubs and drivers in place of interfacing components. Non-critical components may be integrated using a bottom-up integration strategy and drivers, but the hierarchical depth of untested components in any one baseline must not exceed three. All specified test cases of components that are not concerned with an application's user interface shall be automated. Environment All automated component test cases shall be run in the standard component test environment. Process The component test process to be used shall conform to the generic component test process defined in the Software Component Testing Standard BS7925-2:1998. Figure 2.1 Example Component Test Strategy. Note that in this example the Component Test Strategy forms a part of a project Quality Plan. Project Component Test Plan The Software Component Testing Standard BS7925-2 requires that a Project Component Test Plan be documented before any of the component test process activities are carried out (including the component test planning). The Project Component Test Plan specifies any changes for this project to the Component Test Strategy and any dependencies between components that affect the order of component testing. The order of component testing will be affected by the chosen approach to component testing specified in the Component Test Strategy (isolation, top-down, bottom-up, or a combination of these) and may also be influenced by overall project management and work scheduling considerations. Strictly speaking, there are no dependencies between component tests because all components are tested in isolation. However, a
desire to begin the integration of tested components before all component testing is complete forces the sequence of component testing to be driven by the requirements of integration testing in the small. The Project Component Test Plan is not necessarily a whole document but could be a part of a larger document such as an overall Project Test Plan. An example Project Component Test Plan is shown in Figure 2.2.

Project: Office Suite, Project Test Plan, Section 2: Project Component Test Plan

Introduction This section defines the project component test plan for the Office Suite project. Exceptions There are no exceptions to the Project Component Test Strategy. Dependencies The dependencies between components of different functional groups govern the order in which the component tests for a functional group of components should be performed. This order is shown below. Graphics (GFX) File Access (FAC) Message Handling (MES) Argument Handling (ARG) ... Figure 2.2 Example Project Component Test Plan. Note that this example is not complete; the list of functional groups has been cut short. Component test process The component test process follows the Fundamental Test Process described in Session 1. The five activities are: Component Test Planning; Component Test Specification; Component Test Execution; Component Test Recording; and Checking for Component Test Completion. The component test process always begins with Component Test Planning and ends with Checking for Component Test Completion. Any and all of the activities may be repeated (or at least revisited) since a number of iterations may be required before the completion criteria defined during the Component Test Planning activity are met. One activity does not have to be finished before another is started; later activities for one test case may occur before earlier activities for another.


Test design techniques The Software Component Testing Standard BS7925-2 defines a number of test design and test measurement techniques that can be used for component testing. These include both black box and white box techniques. The standard also allows other test design and test measurement techniques to be defined so you do not have to restrict yourself to the techniques defined by the standard in order to comply with it.
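To show how black box techniques such as Equivalence Partitioning and Boundary Value Analysis translate into concrete component test cases, here is a minimal, hypothetical example. The component accept_exam_mark, its valid range of 0 to 100 and its behaviour are invented purely for illustration and are not taken from the standard.

```python
# Hypothetical component and tests, purely to illustrate how equivalence
# partitions and boundary values become concrete component test cases.
import unittest

def accept_exam_mark(mark):
    """Assumed component under test: accepts integer marks from 0 to 100."""
    if not isinstance(mark, int) or mark < 0 or mark > 100:
        raise ValueError("mark must be an integer between 0 and 100")
    return mark

class ExamMarkComponentTest(unittest.TestCase):
    def test_valid_partition(self):
        self.assertEqual(accept_exam_mark(50), 50)      # middle of the valid partition

    def test_valid_boundaries(self):
        self.assertEqual(accept_exam_mark(0), 0)        # lower boundary value
        self.assertEqual(accept_exam_mark(100), 100)    # upper boundary value

    def test_invalid_partitions_and_boundaries(self):
        for invalid in (-1, 101):                       # just outside each boundary
            with self.assertRaises(ValueError):
                accept_exam_mark(invalid)

if __name__ == "__main__":
    unittest.main()
```

Each test method exercises one partition or boundary, which makes it straightforward to measure the kind of coverage-based completion criteria described in the example strategy earlier in this section.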

Integration Testing in the Small


What is Integration Testing in the Small? Integration testing in the small is bringing together individual components (modules/units) that have already been tested in isolation. The objective is to test that the set of components function together correctly by concentrating on the interfaces between the components. We are trying to find faults that couldn't be found at an individual component testing level. Although the interfaces should have been tested in component testing, integration testing in the small makes sure that the things that are communicated are correct from both sides, not just from one side of the interface. This is an important level of testing but one that is sadly often overlooked. As more and more components are combined, a subsystem may be formed which has more system-like functionality that can be tested. At this stage it may also be useful to test non-functional aspects such as performance. For integration testing in the small there are two choices that have to be made: how many components to combine in one go; in what order to combine components. The decision over which choices to make is what is called the integration strategy. There are two main integration strategies: Big Bang and incremental. These are described in separate sections below. Big Bang integration "Big Bang" integration means putting together all of the components in one go. The philosophy is that we have already tested all of the components so why not just throw them all in together and test the lot? The reason normally given for this approach is that it saves time - or does it? If we encounter a problem, it tends to be harder to locate and fix the faults. If the fault is found and fixed then re-testing usually takes a lot longer. In the end the Big Bang strategy does not work - it actually takes longer this way. This approach is based on the [mistaken] assumption that there will be no faults. Incremental integration Incremental integration is where a small number of components are combined at once. At a minimum, only one new component would be added to the baseline at each integration step. This has the advantage of much easier fault location and fixing, as well as faster and easier recovery if things do go badly wrong. (The finger of suspicion would point to the most recent addition to the baseline.)


However, having decided to use an incremental approach to integration testing, we have to make a second choice: in what order to combine the components. This decision leads to three different incremental integration strategies: top-down, bottom-up and functional incrementation. Top-down integration and Stubs As its name implies, top-down integration combines components starting with the highest levels of a hierarchy. Applying this strategy strictly, all components at a given level would be integrated before any at the next level down would be added. Because it starts from the top, there will be missing pieces of the hierarchy that have not yet been integrated into a baseline; in order to test the partial system that comprises the baseline, a stub is used to substitute for the missing components. A stub replaces a called component in integration testing in the small. It is a small self-contained program that may do no more than display its own name and then return. It is a good idea to keep stubs as simple as possible; otherwise they may end up being as complex as the components they are replacing. As with all integration in the small strategies, there are advantages and disadvantages to the approach. One advantage is that we are working to the same structure as the overall system and this will be tested most often as we build each baseline. Senior Managers tend to like this approach because the system can be demonstrated early (but beware that this can often paint a false impression of the system's readiness). The disadvantages are that it needs stubs (so too do the other incremental integration strategies, but this one perhaps needs more of them). Creating stubs means extra work, though it should save more effort in the long run. The details of the system are not tested until last and yet these may be the most important parts of the software. Bottom-up integration and Drivers Bottom-up integration is the opposite of top-down. Applying it strictly, all components at the lowest levels of the hierarchy would be integrated before any of the higher-level ones. Because the calling structure is missing, this strategy requires a way of activating the baseline, e.g. by calling the component at the top of a baseline. These small programs are called "drivers" because they drive the baseline. Drivers are also known as test harnesses or scaffolding. They are usually specifically written for each baseline, though there are a few tools on the market which provide some general-purpose support. Bottom-up integration may still need stubs as well though it is likely to use fewer of them. Functional integration The last integration strategy to be considered is what the syllabus refers to as "functional". We show two examples. Minimum capability is a functional integration strategy because it is aiming to achieve a basic functionality working with a minimum number of components integrated. Thread integration is minimum capability with respect to time; the history or thread of processing determines the minimum number of components to integrate together.
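The following Python sketch shows, for a hypothetical component, what a stub and a driver might look like in practice. All of the names and values are invented for illustration: the stub stands in for a called component that is not yet in the baseline, and the driver activates the baseline, feeds it test input and compares actual with expected results.

```python
# Illustrative stub and driver for a hypothetical component; all names and
# canned values are invented for the purpose of the example.

def lookup_exchange_rate_stub(currency):
    """Stub: replaces a called component that has not yet been integrated.
    It is kept as simple as possible and returns a canned value."""
    print(f"  stub called with {currency}")
    return 1.0

def convert_to_base_currency(amount, currency, rate_lookup):
    """Hypothetical component under test; it calls a lower-level component."""
    return amount * rate_lookup(currency)

def driver():
    """Driver: passes test input to the component, collects the output and
    compares it with the expected outcome."""
    test_cases = [
        ((100, "USD"), 100.0),   # with the stub returning 1.0, we expect 100.0
        ((0, "EUR"), 0.0),
    ]
    for (amount, currency), expected in test_cases:
        actual = convert_to_base_currency(amount, currency, lookup_exchange_rate_stub)
        verdict = "PASS" if actual == expected else "FAIL"
        print(f"{verdict}: convert({amount}, {currency}) = {actual}, expected {expected}")

if __name__ == "__main__":
    driver()
```

Note how both pieces of scaffolding stay deliberately trivial, so that a failure points at the component under test rather than at the stub or driver themselves.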


Integration guidelines You will need to balance the advantages gained from adding small increments to your baselines with the effort needed to make that approach work well. For example, if you are spending more time writing stubs and drivers than you would have spent locating faults in a larger baseline, then you should consider having larger increments. However, for critical components, adding only one component at a time would probably be best. Keep stubs and drivers as simple as possible. If they are not written correctly, they could invalidate the testing performed. If the planning for integration testing in the small is done at the right place in the life cycle, i.e. on the left-hand side of the V-Model before any code has been written, then the integration order determines the order in which the components should be written by developers. This can save significant time.

System Testing
System testing has two important aspects, which are distinguished in the syllabus: functional system testing and non-functional system testing. The non-functional aspects are often as important as the functional, but are generally less well specified and may therefore be more difficult to test (but not impossible). An independent test group usually performs system testing. Functional System Testing Functional system testing gives us the first opportunity to test the system as a whole and is in a sense the final baseline of integration testing in the small. Typically we are looking at end-to-end functionality from two perspectives. One of these perspectives is based on the functional requirements and is called requirement-based testing. The other perspective is based on the business process and is called business process-based testing. Requirements-based testing Requirement-based testing uses a specification of the functional requirements for the system as the basis for designing tests. A good way to start is to use the table of contents of the requirement specification as an initial test inventory or list of items to test (or not to test). We should also prioritise the requirements based on risk criteria (if this is not already done in the specification) and use this to prioritise the tests. This will ensure that the most important and most critical tests are included in the system testing effort. Business process-based testing Business process-based testing uses knowledge of the business profiles (or expected business profiles). Business profiles describe the birth-to-death situations involved in the day-to-day business use of the system. For example, a personnel and payroll system may have a business profile along the lines of: someone joins the company, he or she is paid on a regular basis, he or she leaves the company.


Another business process-based view is given by user profiles. User profiles describe how much time users spend in different parts of the system. For example, consider a simple bank system that has just three functions: account maintenance, account queries and report generation. Users of this system might spend 50% of their time performing account queries, 40% of their time performing account maintenance and 10% of their time generating reports. User profile testing would require that 50% of the testing effort is spent testing account queries, 40% is spent testing account maintenance and 10% is spent testing report generation. Use cases are popular in object-oriented development. These are not the same as test cases, since they tend to be a bit "woolly", but they form a useful basis for test cases from a business perspective. Note that we are still looking for faults in system testing, this time in end-to-end functionality and in things that the system as a whole can do that could not be done by only a partial baseline.
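A user profile can be turned directly into a test effort allocation. The sketch below uses the bank example above; the total number of available testing hours is an assumption added purely for illustration.

```python
# User profile from the bank example above; the 200 available testing hours
# are an assumed figure for illustration only.

USER_PROFILE = {
    "account queries": 0.50,
    "account maintenance": 0.40,
    "report generation": 0.10,
}

def allocate_effort(profile, total_hours):
    """Split the available test effort in proportion to the user profile."""
    return {area: round(total_hours * share, 1) for area, share in profile.items()}

print(allocate_effort(USER_PROFILE, total_hours=200))
# e.g. {'account queries': 100.0, 'account maintenance': 80.0, 'report generation': 20.0}
```

The point is not the arithmetic itself but the principle: the testing effort mirrors where the users actually spend their time.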

Non-Functional System Testing


Load, performance & stress testing Performance tests include timing tests such as measuring response times to a PC over a network, but may also include response times for performing a database back-up, for example. Load tests (also known as capacity or volume tests) are tests designed to ensure that the system can handle what has been specified, in terms of processing throughput, number of terminals connected, etc. Stress tests see what happens if we go beyond those limits. Usability testing Testing for usability is very important, but cannot be done well by technical people; it needs to have input from real users. Security testing Whatever level of security is specified for the system must be tested, such as passwords, level of authority, etc. Configuration and Installation testing There can be many different aspects to consider here. Different users may have different hardware configurations such as amount of memory; they may have different software as well, such as word processor versions or even games. If the system is supposed to work in different configurations, it must be tested in all or at least a representative set of configurations. Upgrade paths also need to be tested; sometimes an upgrade of one part of the system can be in conflict with other parts.


How will the new system or software be installed on user sites? The distribution mechanism should be tested. The final intended environment may even have physical characteristics that can influence the working of the system. Reliability testing and other qualities If a specification says "the system will be reliable", this is untestable. Qualities such as reliability, maintainability, portability, availability, etc. need to be expressed in measurable terms in order to be testable. Mean Time Between Failures (MTBF) is one way of quantifying reliability. A good way of specifying and testing for such qualities is found in Tom Gilb, Principles of Software Engineering Management, Addison-Wesley, 1988, and is described in an optional supplement to this course. Back-up and Recovery testing Testing recovery is more important than testing of back-ups; in fact, recovery is a test of the back-up procedures. Recovery tests should be carried out at regular intervals so that the procedures are rehearsed and somewhat familiar if they are ever needed for a real disaster. Documentation testing We produce documentation for two reasons: for users and for maintenance. Both types of documents can be reviewed or Inspected, but they should also be tested. A test of a user manual is to give it to a potential end user who knows nothing about the system and see if they can perform some standard tasks.
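To illustrate the kind of measurable reliability criterion mentioned above, the sketch below checks an observed MTBF against a target. The failure times and the target figure are assumed values, included only to show how a quality such as reliability becomes testable once it is quantified.

```python
# Illustrative MTBF check; the observed failure times (in operating hours)
# and the target MTBF are assumed values.

failure_times = [120.0, 260.0, 410.0, 700.0]   # elapsed operating hours at each failure

def mean_time_between_failures(times):
    """MTBF estimated as total operating time divided by the number of failures."""
    if not times:
        raise ValueError("no failures observed")
    intervals = [t2 - t1 for t1, t2 in zip([0.0] + times[:-1], times)]
    return sum(intervals) / len(intervals)

target_mtbf = 150.0   # hours; assumed requirement
observed = mean_time_between_failures(failure_times)
print(f"Observed MTBF: {observed:.1f} hours; "
      f"{'meets' if observed >= target_mtbf else 'does not meet'} the {target_mtbf} hour target")
```

A statement like "MTBF of at least 150 hours" can pass or fail; "the system will be reliable" cannot, which is exactly the point being made above.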

Integration Testing in the Large


What is Integration Testing in the Large? This stage of testing is concerned with the testing of the system with other systems and networks. There is an analogy of building a house - our finished house (system) now needs to talk with the outside world: our house needs electricity, gas, water, communications, TV, etc. to function properly. So too does our system - it needs to interface with different networks and operating systems and communications middleware; our house needs to co-exist with other houses and blend in with the community - so too does our system - it needs to sit alongside other systems such as billing, stock, personnel systems etc.; our new system may need information from outside the organisation such as interest rates or foreign exchange rates, and this is obtained via electronic data interchange (EDI). A good example of EDI is the way in which our wages are transferred to our bank accounts. Our house receives things from outside organisations such as the Post Office or delivery trucks; our new system may be required to work with different third-party packages that are not directly part of the system under test. Different faults will be found during this level of testing and we must be prepared to plan and execute such tests if they are considered vital for the success of our business. In reality this level of testing will probably be done in conjunction with system testing rather than as a separate testing stage. However, it is now a visible testing stage, and integration testing in the large is an explicit testing phase in the syllabus.


In terms of planning, it should be planned the same way as integration testing in the small (i.e. testing interfaces/connections one at a time). This will reduce the risk of not being able to locate the faults quickly. Like all testing stages, we must identify the risks during the planning phase - which areas would have the most severe impact if they did not work? Perhaps we are developing a piece of software that is to be used at a number of different locations throughout the world - then testing the system within a Local Area Network (LAN) and comparing with the response over a Wide Area Network (WAN) is essential. When we plan Integration Testing in the Large there are a number of resources we might need, such as different operating systems, different machine configurations and different network configurations. These must all be thought through before the testing actually commences. We must consider what machines we will need, and it might be worthwhile talking to some of the hardware manufacturers as they sometimes offer test sites with different machine configurations set up.

Acceptance Testing
User Acceptance Testing User Acceptance Testing is the final stage of validation. This is the time that customers get their hands on the system (or should do) and the end product of this is usually a sign-off from the users. One of the problems is that this is rather late in the project for users to be involved - any problems found now are found too late for anything to be done about them. This is one reason why Rapid Application Development (RAD) has become popular - users are involved earlier and testing is done earlier. However, the users should have been involved in the test specification of the Acceptance Tests at the start of the project. They should also have been involved in reviews throughout the project, and there is nothing to say that they cannot be involved in helping to design System and Integration tests. So there really should be no surprises! The approach in this stage is a mixture of scripted and unscripted testing, and the 'model office' concept is sometimes used. This is where a replica of the real environment is set up. Why users should be involved It is the end users' responsibility to perform acceptance testing. Sometimes the users are tempted to say to the technical staff: "You know more about computers than we do, so you do the acceptance testing for us". This is like asking the used car salesman to take a test drive for you! The users bring the business perspective to the testing. They understand how the business actually functions in all of its complexity. They will know of the special cases that always seem to cause problems. They can also help to identify sensible work-arounds, and they gain a detailed understanding of the system if they are involved in the acceptance testing. The differences between system testing and acceptance testing are that acceptance testing is: done by users, not technical staff; focused on building confidence rather than finding faults;
focused on business-related cases rather than obscure error handling. Contract acceptance testing If a system is the subject of a legally binding contract, there may be aspects directly related to the contract that need to be tested. It is important to ensure that the contractual documents are kept up to date; otherwise you may be in breach of a contract while delivering what the users want (instead of what they specified two years ago). However, it is not fair for users to expect that the contract can be ignored, so the testing must be against the contract and any agreed changes. Alpha and Beta testing Both alpha and beta testing are normally used by software houses that produce mass-market shrink-wrapped software packages. This stage of testing is after system testing; it may include elements of integration testing in the large. The alpha or beta testers are given a pre-release version of the software and are asked to give feedback on the product. Alpha and beta testing is done where there are no identifiable "end users" other than the general public. The difference between alpha and beta testing is where they are carried out. Alpha testing is done on the development site - potential customers would be invited in to the developer's offices. Beta testing is done on customer sites - the software is sent out to them.

Maintenance testing
What is Maintenance testing? Maintenance Testing is all about preserving the quality we have already achieved. We do not want the system to regress. It is worth noting that there is a different sequence with Maintenance Testing. In development we start from small components and work up to the full system; in maintenance testing, we can start from the top with the whole system. This means that we can make sure that there is no effect on the whole system before testing the individual fix. We also have different data - there is live data available in maintenance testing, whereas in development testing we had to build the test data. A breadth test is a shallow but broad test over the whole system, often used as a regression suite. Depth tests explore specific areas such as changes and fixes. Impact analysis investigates the likely effects of changes, so that the testing can be deeper in the riskier areas. Poor or missing specifications It is often argued that Maintenance Testing is the hardest type of testing to do because: there are no specifications; any documentation is out-of-date; there is a lack of regression test scripts; the knowledge base is limited due to the age of the system (and programmers!).


If you do not have good specifications, it can be argued that you cannot test. The specification is the oracle that tells the tester what the system should do. So what do we do? Although this is a difficult situation, it is very common, and there are ways to deal with it. Make contact with those who know the system, i.e. the users. Find out from them what the system does do, if not what it should do. Anything that you do learn, document it. Document your assumptions as well so that other people have a better place to start than you did. Track what it is costing the company not to have good, well-maintained specifications. To find out what the system should do, you will need some form of oracle. This could be the way the system works now - many Year 2000 tests used the current system as the oracle for date-changed code. Another suggestion is to look in user manuals or guides (if they exist). Finally, you may need to go back to the experts and "pick their brains". You can validate what is already there but not verify it (there is nothing to verify against).
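The breadth, depth and impact analysis ideas introduced earlier in this section can be made concrete as a simple test selection step. In the sketch below, the impact map, the test suite names and the change descriptions are all invented for illustration.

```python
# Illustrative test selection for maintenance testing; the impact map, change
# names and suite names are invented.

IMPACT_MAP = {                      # change -> areas it may affect
    "fix-1234 invoice rounding": ["billing", "reporting"],
    "upgrade OS libraries": ["printing"],
}

BREADTH_SUITE = ["smoke: login", "smoke: create record", "smoke: print report"]
DEPTH_SUITES = {
    "billing": ["billing calculations", "billing edge cases"],
    "reporting": ["monthly report totals"],
    "printing": ["print formatting"],
}

def select_tests(changes):
    """Always run the breadth (regression) suite; add depth tests for impacted areas."""
    selected = list(BREADTH_SUITE)
    for change in changes:
        for area in IMPACT_MAP.get(change, []):
            selected.extend(DEPTH_SUITES.get(area, []))
    return selected

print(select_tests(["fix-1234 invoice rounding"]))
```

The shallow breadth suite guards the system as a whole, while the impact map directs the deeper tests at the riskier, changed areas.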


Static Testing (session 3)


This session looks at Static Testing techniques. These techniques are referred to as "static" because the software is not executed; rather, the specifications, documentation and source code that comprise the software are examined in varying degrees of detail. There are two basic types of static testing. One of these is people-based and the other is tool-based. People-based techniques are generally known as reviews, but there are a variety of different ways in which reviews can be performed. The tool-based techniques examine source code and are known as "static analysis". Both of these basic types are described in separate sections below.
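As a small, illustrative taste of the tool-based kind, the sketch below examines Python source code without executing it, counting decision points in each function and flagging those above a threshold. The threshold of 3 and the sample source are assumptions chosen for the example; real static analysis tools apply many more sophisticated checks.

```python
# Minimal static analysis sketch: the source code is parsed, never executed.
# The complexity threshold of 3 is an assumed, illustrative limit.
import ast

SOURCE = """
def simple(x):
    return x + 1

def complicated(x):
    if x > 0:
        for i in range(x):
            if i % 2:
                x += i
    while x > 100:
        x -= 10
    return x
"""

DECISION_NODES = (ast.If, ast.For, ast.While, ast.Try)

def analyse(source, threshold=3):
    tree = ast.parse(source)
    for func in [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]:
        decisions = sum(isinstance(n, DECISION_NODES) for n in ast.walk(func))
        flag = "  <-- review: too complex" if decisions > threshold else ""
        print(f"{func.name}: {decisions} decision points{flag}")

analyse(SOURCE)
```

No test data, no execution: the information comes entirely from reading the code, which is what distinguishes static analysis from the dynamic techniques covered in the next session.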

What are Reviews?


'Reviews' is the generic name given to people-based static techniques. More or less any activity that involves one or more people examining something could be called a review. There are a variety of different ways in which reviews are carried out across different organisations and in many cases within a single organisation. Some are very formal, some are very informal, and many lie somewhere between the two. The chances are that you have been involved in reviews of one form or another. One person can perform a review of his or her own work or of someone else's work. However, it is generally recognised that reviews performed by only one person are not as effective as reviews conducted by a group of people all examining the same document (or whatever it is that is being reviewed). Review techniques for individuals Desk checking and proof reading are two techniques that can be used by individuals to review a document such as a specification or a piece of source code. They are basically the same processes: the reviewer double-checks the document or source code on their own. Data stepping is a slightly different process for reviewing source code: the reviewer follows a set of data values through the source code to ensure that the values are correct at each step of the processing. Review techniques for groups The static techniques that involve groups of people are generically referred to as reviews. Reviews can vary a lot from very informal to highly formal, as will be discussed in more detail shortly. Two examples of types of review are walkthroughs and Inspection. A walkthrough is a form of review that is typically used to educate a group of people about a technical document. Typically the author "walks" the group through the ideas to explain them and so that the attendees understand the content. Inspection is the most formal of all the formal review techniques. Its main focus during the process is to find faults, and it is the most effective review technique in finding them (although the other types of review also find some faults). Inspection is discussed in more detail below.


Reviews and the test process


Benefits of reviews There are many benefits from reviews in general. They can improve software development productivity and reduce development timescales. They can also reduce testing time and cost. They can lead to lifetime cost reductions throughout the maintenance of a system over its useful life. All this is achieved (where it is achieved) by finding and fixing faults in the products of development phases before they are used in subsequent phases. In other words, reviews find faults in specifications and other documents (including source code) which can then be fixed before those specifications are used in the next phase of development. Reviews generally reduce fault levels and lead to increased quality. This can also result in improved customer relations. Reviews are cost-effective There are a number of published figures to substantiate the cost-effectiveness of reviews. Freedman and Weinberg quote a ten times reduction in faults that come into testing with a 50% to 80% reduction in testing cost. Yourdon in his book on Structured Walkthroughs found that faults were reduced by a factor of ten. Gilb and Graham give a number of documented benefits for software Inspection, including 25% reduction in schedules, a 28 times reduction in maintenance cost, and finding 80% of defects in a single pass (with a mature Inspection process) and 95% in multiple passes. What can be Inspected? Anything written down can be Inspected. Many people have the impression that Inspection applies mainly to code (probably because Fagan's original article was on "Design and code inspection"). However, although Inspection can be performed on code, it gives more value if it is performed on more "upstream" documents in the software development process. It can be applied to contracts, budgets, and even marketing material, as well as to policies, strategies, business plans, user manuals, procedures and training material. Inspection also applies to all types of system development documentation, such as requirements, feasibility studies and designs. It is also very appropriate to apply to all types of test documentation such as test plans, test designs and test cases. In fact even with Fagan's original method, it was found to be very effective applied to testware. What can be reviewed? Anything that can be Inspected can also be reviewed, but reviews can apply to more things than just those ideas that are written down. Reviews can be done on visions, strategic plans and "big picture" ideas. Project progress can be reviewed to assess whether work is proceeding according to the plans. A review is also the place where major decisions may be made, for example about whether or not to develop a given feature. Reviews and Inspections are complementary. Inspection excludes discussion and solution optimising, but these activities are often very important. Any type of review that tries to combine more than one objective tends not to work as well as those with a single focus. It works better to use Inspection to find faults and to use reviews to discuss, come to a consensus and make decisions. What to review / Inspect? Looking at the V life cycle diagram that was discussed in Session 2, reviews and Inspections apply to everything on the left-hand side of the V-model. Note that the reviews apply not only to the products of development but also to the test documentation that is produced early in the life cycle. We have found that reviewing the business needs alongside the Acceptance Tests works really well. 
It clarifies issues that might otherwise have been overlooked. This is yet another way to find faults as early as possible in the life cycle so that they can be removed at the least cost.


Costs of reviews You cannot gain the benefits of reviews without investing in doing them, and this does have a cost. As a rough guide, something between 5% and 15% of project effort would typically be spent on reviews. If Inspections are being introduced into an organisation, then 15% is a recommended guideline. Once the Inspection process is mature, this may go down to around 5%. Note that 10% is half a day a week. Remember that the cost of reviews always needs to be balanced against the cost of not doing them, and finding the faults (which are already there) much later when it will be much more expensive to fix them. The costs of reviews are mainly in people's time, i.e. it is an effort cost, but the cost varies depending on the type of review. The leader or moderator of the review may need to spend time in planning the review (this would not be done for an informal review, but is required for Inspection). The studying of the documents to be reviewed by each participant on their own is normally the main cost (although in practice this may not be done as thoroughly as it should). If a meeting is held, the cost is the length of the meeting times the number of people present. The fixing of any faults found or the resolution of issues found may or may not be followed up by the leader. In the more formal review techniques, metrics or statistics are recorded and analysed to ensure the continued effectiveness and efficiency of the review process. Process improvement should also be a part of any review process, so that lessons learned in a review can be folded back into development and testing processes. (Inspection formally includes process improvement; most other forms of review do not.)
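The economics described above can be set out as a simple calculation. The figures used below (preparation time per person, meeting length, number of attendees, faults found and the assumed later cost of each escaped fault) are illustrative assumptions only, not data from this course.

```python
# Illustrative review cost/benefit calculation; all figures are assumptions.

def review_cost(preparation_hours_per_person, meeting_hours, attendees,
                leader_planning_hours=2, followup_hours=1):
    individual = preparation_hours_per_person * attendees
    meeting = meeting_hours * attendees        # meeting length times people present
    return leader_planning_hours + individual + meeting + followup_hours

def benefit(faults_found, hours_saved_per_fault):
    """Estimated later effort avoided by fixing the faults now rather than later."""
    return faults_found * hours_saved_per_fault

cost = review_cost(preparation_hours_per_person=2, meeting_hours=2, attendees=4)
saved = benefit(faults_found=15, hours_saved_per_fault=4)
print(f"Review effort: {cost} hours; estimated later effort avoided: {saved} hours")
```

Even a back-of-the-envelope sum like this makes the point: the cost of the review has to be weighed against the much larger cost of finding the same faults later.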

Types of review
We have now established that reviews are an important part of software testing. Testers should be involved in reviewing the development documents that tests are based on, and should also review their own test documentation. In this section, we will look at different types of reviews, and the activities that are done to a greater or lesser extent in all of them. We will also look at the Inspection process in a bit more detail, as it is the most effective of all review types. Characteristics of different review types Informal review As its name implies, this is very much an ad hoc process. Normally it simply consists of someone giving their document to someone else and asking them to look it over. A document may be distributed to a number of people, and the author of the document would hope to receive back some helpful comments. It is a very cheap form of review because there is no monitoring of metrics, no meeting and no follow-up. It is generally perceived to be useful, and compared to not doing any reviews at all, it is. However, it is probably the least effective form of review (although no one can prove that since no measurements are ever done!) Technical review or Peer review A technical review may have varying degrees of formality. This type of review does focus on technical issues and technical documents. A peer review would exclude managers from the review. The success of this type of review typically depends on the individuals involved - they can be very effective and useful, but sometimes they are very wasteful (especially if the meetings are not well disciplined), and can be rather subjective. Often this level of review will have some documentation, even if just a list of issues raised. Sometimes metrics will be kept. This type of review can find important faults, but can also be used to resolve difficult technical problems, for example deciding on the best way to implement a design. Decision-making review This type of review is closely related to the previous one (in fact the syllabus does not distinguish them). In this type of review, which may be technical or managerial, the focus is on discussing the issues, coming to
a consensus and making decisions, for example about whether a given feature should be included in the next release or not. Walkthrough A walkthrough is typically led by the author of a document, for the purpose of educating the participants about the content so that everyone understands the same thing. A walkthrough may include "dry runs" of business scenarios to show how the system would handle certain specific situations. For technical documents, it is often a peer group technique. Inspection An Inspection is the most formal of the formal review techniques. There are strict entry and exit criteria to the Inspection process, it is led by a trained Leader or moderator (not the author), there are defined roles for searching for faults based on defined rules and checklists. Metrics are a required part of the process. Characteristics of reviews in general Objectives and goals The objectives and goals of reviews in general normally include the verification and validation of documents against specifications and standards. Some types of review also have an objective of achieving a consensus among the attendees (but not Inspection). Some types of review have process improvement as a goal (this is formally included in Inspection). Activities There are a number of activities that may take place for any review. The planning stage is part of all except informal reviews. In Inspection (and possibly other reviews), an overview or kickoff meeting is held to put everyone "in the picture" about what is to be reviewed and how the review is to be conducted. This pre-meeting may be a walkthrough in its own right. The preparation or individual checking is usually where the greatest value is gained from a review process. Each person spends time on the review document (and related documents), becoming familiar with it and/or looking for faults. In some reviews, this part of the process is optional (at least in practice). In Inspection it is required. Most reviews include a meeting of the reviewers. Informal reviews probably do not, and Inspection does not hold a meeting if it would not add economic value to the process. Sometimes the meeting time is the only time people actually look at the document. Sometimes the meetings run on for hours and discuss trivial issues. The best reviews (of any level of formality) ensure that value is gained from the meeting. The more formal review techniques include follow-up of the faults or issues found to ensure that action has been taken on everything raised (Inspection does, as do some forms of technical or peer review). The more formal review techniques collect metrics on cost (time spent) and benefits achieved. Roles and responsibilities For any of the formal reviews (i.e. not informal reviews), there is someone responsible for the review of a document (the individual review cycle). This may be the author of the document (walkthrough) or an independent Leader or moderator (formal reviews and Inspection). The responsibility of the Leader is to ensure that the review process works. He or she may distribute documents, choose reviewers, mentor the reviewers, call and lead the meeting, perform follow-up and record relevant metrics.


The author of the document being reviewed or Inspected is generally included in the review, although there are some variants that exclude the author. The author actually has the most to gain from the review in terms of learning how to do their work better (if the review is conducted in the right spirit!). The reviewers or Inspectors are the people who bring the added value to the process by helping the author to improve his or her document. In some types of review, individual checkers are given specific types of fault to look for to make the process more effective. Managers have an important role to play in reviews. Even if they are excluded from some types of peer review, they can (and should) review management-level documents with their peers. They also need to understand the economics of reviews and the value that they bring. They need to ensure that the reviews are done properly, i.e. that adequate time is allowed for reviews in project schedules. There may be other roles in addition to these, for example an organisation-wide co-ordinator who would keep and monitor metrics, or someone to "own" the review process itself - this person would be responsible for updating forms, checklists, etc. Deliverables The main deliverable from a review is the changes to the document that was reviewed. The author of the document normally edits these. For Inspection, the changes would be limited to faults found as violations of accepted rules. In other types of review, the reviewers suggest improvements to the document itself. Generally the author can either accept or reject the changes suggested. If the author does not have the authority to change a related document (e.g. if the review found that a correct design conflicted with an incorrect requirement specification), then a change request may be raised to change the other document(s). For Inspection and possibly other types of review, process improvement suggestions are a deliverable. This includes improvements to the review or Inspection process itself and also improvements to the development process that produced the document just reviewed. (Note that these are improvements to processes, not to reviewed documents.) The final deliverable (for the more formal types of review, including Inspection) is the metrics about the costs, faults found, and benefits achieved by the review or Inspection process. Pitfalls Reviews are not always successful. They are sometimes not very effective, so faults that could have been found slip through the net. They are sometimes very inefficient, so that people feel that they are wasting their time. Often insufficient thought has gone into the definition of the review process itself - it just evolves over time. One of the most common causes for poor quality in the review process is lack of training, and this is more critical the more formal the review. Another problem with reviews is having to deal with documents that are of poor quality. Entry criteria to the review or Inspection process can ensure that reviewers' time is not wasted on documents that are not worthy of the review effort. A lack of management support is a frequent problem. If managers say that they want reviews to take place but don't allow any time in the schedules for them, this is only "lip service", not commitment to quality. Long-term, it can be disheartening to become expert at detecting faults if the same faults keep on being injected into all newly written documents. Process improvements are the key to long-term effectiveness and efficiency. 
Inspection

Typical reviews versus Inspection
There are a number of differences between the way most people practise reviews and the Inspection process as described in Software Inspection by Gilb and Graham, Addison-Wesley, 1993.


In a typical review, the document is given out in advance, there are typically dozens of pages to review, and the instructions are simply "Please review this." In Inspection, it is not just the document under review that is given out in advance, but also source or predecessor documents. The number of pages to focus the Inspection on is closely controlled, so that Inspectors (checkers) check a limited area in depth - a chunk or sample of the whole document. The instructions given to checkers are designed so that each individual checker will find the maximum number of unique faults. Special defect-hunting roles are defined, and Inspectors are trained in how to be most effective at finding faults. In typical reviews, some of the reviewers have time to look through the document before the meeting and some do not. The meeting is often difficult to arrange and may last for hours. In Inspection, it is an entry criterion to the meeting that each checker has done the individual checking. The meeting is highly focused and efficient, and is limited to two hours; if a meeting is not economic, it may not be held at all. In a typical review, there is often a lot of discussion, some about technical issues but much about trivia. Comments are often mainly subjective, along the lines of "I don't like the way you did this" or "Why didn't you do it this way?" In Inspection, the process is objective. The only thing that it is permissible to raise as an issue is a potential violation of an agreed Rule (the Rulesets are what the document should conform to). Discussion is severely curtailed in an Inspection meeting or postponed until the end. The Leader's role is very important to keep the meetings on track and focused and to keep pulling people away from trivia and pointless discussion. Many people keep on doing reviews even if they don't know whether it is worthwhile or not. Every activity in the Inspection process is done only if its economic value is continuously proven.

Inspection is more
Inspection contains many mechanisms that are additional to those found in other formal reviews. These include the following: entry criteria, to ensure that we don't waste time Inspecting an unworthy document; training for maximum effectiveness and efficiency; an optimum checking rate, to get the greatest value out of the time spent by looking deep; prioritising the words - Inspect the most important documents and their most important parts; the use of standards in the Inspection process (there are also a number of Inspection standards); process improvement, which is built in to the Inspection process; and exit criteria, which ensure that the document is worthy of exit and that the Inspection process was carried out correctly. One of the most powerful exit criteria is the quantified estimate of the remaining defects per page. This may be, say, 3 per page initially, but can be brought down by orders of magnitude over time.

Inspection is better
Typical reviews are probably only 10% to 20% effective at detecting existing faults. The return on investment is usually not known because no one keeps track even of their cost. When Inspection is still being learned, its effectiveness is around 30% to 40% (this is demonstrated in Inspection training courses). Once Inspection is well established and mature, the process can find up to 80% of faults in a single pass, and 95% in multiple passes. The return on investment ranges from 6 to 30 hours for every hour spent.


The Inspection process
The diagram shows a product document infected with faults. The document must pass through the entry gate before it is allowed to start the Inspection process. The Inspection Leader performs the planning activities. A Kickoff meeting is held to "set the scene" about the documents and the process. The Individual Checking is where most of the benefits are gained: 80% or more of the faults found will be found in this stage. A meeting is held (if economic). The editing of the document is done by the author or the person now responsible for the document. This involves redoing some of the activities that produced the document initially, and it may also require Change Requests to documents not under the control of the editor. Process improvement suggestions may be raised at any time, for improvements either to the Inspection process or to the development process. The document must pass through the Exit gate before it is allowed to leave the Inspection process. There are two aspects to investigate here: is the product document now ready (e.g. has some action been taken on all issues logged), and was the Inspection process carried out properly? For example, if the checking rate was too fast, then the checking has not been done properly. A gleaming new improved document is the result of the process, but there is still a "blob" on it. It is not economic to be 100% effective in Inspection. At least with Inspection you consciously predict the level of remaining faults rather than fallaciously assuming that they have all been found!

How the checking rate enables deep checking in Inspection
There is a dramatic difference between Inspection and normal reviews, and that is in the depth of checking. This is illustrated by the picture of a document. Initially there are no faults visible. Typically in reviews, the time available and the size of the document determine the checking rate. So for example if you have 2 hours available for a review and the document is 100 pages long, then the checking rate will be 50 pages per hour. (Any two of these three factors determine the third.) This is equivalent to "skimming the surface" of the document. We will find some faults - in this example we have found one major and two minor faults. Our typical reaction is now to think: "This review was worthwhile, wasn't it - it found a major fault. Now we can fix that and the two other minor faults, and the document will be OK." Think: are we missing anything here? Inspection is different. We do not take any more time, but it is the optimum rate for the type of document that is used to determine the size of the document that will be checked in detail. So if the optimum rate is one page per hour and we have two hours, then the size of the sample or chunk will be 2 pages. (Note that the optimum rate needs to be established over time for different types of document and will depend on a number of factors, and it is based on prioritised words - a logical page rather than a physical page. Of course it doesn't take an hour just to read a single page, but the checking done in Inspection includes comparing each paragraph or sentence on the target page with all source documents, checking each paragraph or phrase against relevant rule sets, both generic and specific, working through checklists for different role assignments, as well as the time to read around the target page to set the context. If checking is done to this level of thoroughness, it is not at all difficult to spend an hour on one page!)
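The arithmetic behind the two approaches is worth making explicit. The sketch below uses the figures from the example above; the optimum rate of one logical page per hour is an assumption for this document type, not a fixed value.

    # A small sketch of the two rate calculations described above.
    hours_available = 2
    document_pages = 100

    # Typical review: the time available and the document size dictate the rate.
    review_rate = document_pages / hours_available         # 50 pages per hour

    # Inspection: an optimum checking rate (assumed here to be 1 logical page
    # per hour for this type of document) dictates how much is checked in depth.
    optimum_rate = 1
    chunk_pages = optimum_rate * hours_available            # a 2-page sample or chunk

    print(review_rate, chunk_pages)                         # 50.0 2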
How does this depth-oriented approach affect the faults found? On the picture, we have gone deep in the Inspection on a limited number of pages. We have found the major one found in the other review plus two (other) minors, but we have also found a deep-seated major fault, which we would never have seen or even suspected if we had not spent the time to go deep. There is no guarantee that the most dangerous faults are lying near the surface! When the author comes to fix this deep-seated fault, he or she can look through the rest of the document for similar faults, and all of them can then be corrected. So in this example we will have corrected 5 major faults instead of one. This gives tremendous leverage to the Inspection process - you can fix faults you didn't find!


Inspection surprises
To summarise the Inspection process, there are a number of things about Inspection which surprise people. The fundamental importance of the Rules is what makes Inspection objective rather than a subjective review. The Rules are democratically agreed as applying (this helps to defuse author defensiveness), and by definition a fault is a Rule violation. The slow checking rates are surprising, but the value to be gained by depth gives far greater long-term gains than surface-skimming reviews that miss major deep-seated problems. The strict entry and exit criteria help to ensure that Inspection gives value for money. The logging rates are much faster than in typical reviews (one item every 30 to 60 seconds; typical reviews log one item every 3 to 10 minutes). This ensures that the meeting is very efficient. One reason this works is that the final responsibility for all changes is fully given to the author, who has total responsibility for the final classification of faults as well as the content of all fixes. More information on Inspection can be found in the book Software Inspection, Tom Gilb and Dorothy Graham, Addison-Wesley, 1993, ISBN 0-201-63181-4.

Static analysis
What can static analysis do? Static analysis is a form of automated testing. It can check for violations of standards and can find things that may or may not be faults. Static analysis is descended from compiler technology. In fact, many compilers may have static analysis facilities available for developers to use if they wish. There are also a number of stand-alone static analysis tools for various different computer programming languages. Like a compiler, the static analysis tool analyses the code without executing it, and can alert the developer to various things such as unreachable code, undeclared variables, etc. Static analysis tools can also compute various metrics about code such as cyclomatic complexity.

Data flow analysis
Data flow analysis is the study of program variables. A variable is basically a location in the computer's memory that has a name so that the programmer can refer to it more conveniently in the source code. When a value is put into this location, we say that the variable is "defined". When that value is accessed, we say that it is "used". For example, in the statement "x = y + z", the variables y and z are used because the values that they contain are being accessed and added together. The result of this addition is then put into the memory location called x, so x is defined. The significance of this is that static analysis tools can perform a number of simple checks. One of these checks is to ensure that every variable is defined before it is used. If a variable is not defined before it is used, the value that it contains may be different every time the program is executed and in any case is unlikely to be the correct value. This is an example of a data flow fault. Another check that a static analysis tool can make is to ensure that every time a variable is defined it is used somewhere later on in the program. If it isn't, then why was it defined in the first place? This is known as a data flow anomaly and although it can be perfectly harmless, it can also indicate that something more serious is at fault.

Control flow analysis
Control flow analysis can find infinite loops, inaccessible code, and many other suspicious aspects. However, not all of the things found are necessarily faults; defensive programming may result in code that is technically unreachable.
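To make the data flow checks concrete, here is a small illustrative fragment of the kind of code a static analysis tool would flag without ever executing it (the function and variable names are invented for the example):

    def calculate_total(price):
        discount = 0.1         # 'discount' is defined but never used afterwards - a data flow anomaly
        tax = price * 0.2      # 'tax' is defined here...
        total = price + vat    # ...but 'vat' is used without ever being defined - a data flow fault
        return total + tax

A static analysis tool reports both problems from the source code alone; dynamic testing would only reveal the undefined variable if this function were actually executed.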


Cyclomatic complexity
Cyclomatic complexity is related to the number of decisions in a program or control flow graph. The easiest way to compute it is to count the number of decisions (diamond-shaped boxes) on a control flow graph and add 1. Working from code, count the total number of IFs and any loop constructs (DO, FOR, WHILE, REPEAT) and add 1. The cyclomatic complexity does reflect to some extent how complex a code fragment is, but it is not the whole story.

Other static metrics
Lines of code (LOC, or KLOC for 1000s of LOC) is a measure of the size of a code module. Operands and operators is a very detailed measurement devised by Halstead, but it is not much used now. Fan-in is related to the number of modules that call (in to) a given module. Modules with high fan-in are found at the bottom of hierarchies, or in libraries where they are frequently called. Modules with high fan-out are typically at the top of hierarchies, because they call out to many modules (e.g. the main menu). Any module with both high fan-in and high fan-out probably needs re-designing. Nesting levels relate to how deeply statements are nested within other IF statements. This is a good metric to have in addition to cyclomatic complexity, since highly nested code is harder to understand than linear code, but cyclomatic complexity does not distinguish them. Other metrics include the number of function calls and a number of metrics specific to object-oriented code.

Limitations and advantages
Static analysis has its limitations. It cannot distinguish "fail-safe" code from real faults or anomalies, and may create a lot of spurious failure messages. Static analysis tools do not execute the code, so they are not a substitute for dynamic testing, and they are not related to real operating conditions. However, static analysis tools can find faults that are difficult to see and they give objective quality information about the code. We feel that all developers should use static analysis tools, since the information they give can find faults very early, when they are very cheap to fix.
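As a worked example of the decision-counting rule given under "Cyclomatic complexity" above (the function below is invented purely for illustration):

    def retry_send(messages, max_attempts):
        failed = []
        attempt = 0
        while attempt < max_attempts:      # decision 1 (loop)
            attempt += 1
            for message in messages:       # decision 2 (loop)
                if message == "":          # decision 3 (IF)
                    failed.append(message)
        if not failed:                     # decision 4 (IF)
            return "all sent"
        return "some failed"

    # Counting the decisions (2 IFs + 2 loops) and adding 1 gives a cyclomatic
    # complexity of 5. Note also that the inner IF is nested two levels deep,
    # which the nesting-level metric captures but cyclomatic complexity does not.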


Dynamic Testing Techniques (session 4)


About Testing Techniques
The need for testing techniques
In Session 1 (Section 1.2.4) we explained that testing everything is known as exhaustive testing (defined as exercising every combination of inputs and preconditions) and demonstrated that it is an impractical goal. Therefore, as we cannot test everything, we have to select a subset of all possible tests. In practice the subset we select is a very tiny subset and yet it has to have a high probability of finding most of the faults in a system. Experience and experiments have shown us that selecting a subset at random is neither very effective nor very efficient (even if it is tool supported). We have to select tests using some intelligent thought process. Test techniques are such thought processes.

What is a testing technique?
A testing technique is a thought process that helps us select a good set of tests from the total number of all possible tests for a given system. Different techniques offer different ways of looking at the software under test, possibly challenging assumptions made about it. Each technique provides a set of rules or guidelines for the tester to follow in identifying test conditions and test cases. They are based on either a behavioural or a structural model of the system. In other words, they are based on an understanding of the system's behaviour (functions and non-functional attributes such as performance or ease of use - what the system does) or its structure (how it does it). There are a lot of different testing techniques, and those that have been published have been found to be successful at identifying tests that find faults. The use of testing techniques is 'best practice', though they should not be used to the exclusion of any other approach. Put simply, a testing technique is a means of identifying good tests. Recall from Section 1.2.6 that a good test case is four things: effective - has potential to find faults; exemplary - represents other test cases; evolvable - easy to maintain; economic - doesn't cost much to use.

Advantages of Techniques
Different people using the same technique on the same system will almost certainly arrive at different test cases, but they will have a similar probability of finding faults. This is because the technique will guide them into having a similar or the same view of the system and to make similar or the same assumptions.


Using techniques makes testing more effective

Using techniques makes testing more effective. This means that more faults will be found with less effort. Because a technique focuses on a particular type of fault, it becomes more likely that the tests will find more of that type of fault. By selecting appropriate testing techniques it is possible to control more accurately what is being tested, and so reduce the chances of overlap between different test cases. Systematic techniques are measurable, meaning that it is possible to quantify the extent of their use. This makes it possible to gain an objective assessment of the thoroughness of testing with respect to the use of each testing technique, which is useful for comparing one test effort with another and for providing confidence in the adequacy of testing.

Black and White Box Testing


Types of Testing Technique
There are many different types of software testing technique, each with its own strengths and weaknesses. Each individual technique is good at finding particular types of fault and relatively poor at finding other types. For example, a technique that explores the upper and lower limits of a single input range is more likely to find boundary value faults than faults associated with combinations of inputs. Similarly, testing performed at different stages in the software development life cycle is going to find different types of faults; component testing is more likely to find coding faults than system design faults. Each testing technique falls into one of a number of different categories. Broadly speaking there are two main categories, static and dynamic. However, dynamic techniques are subdivided into two more categories, structural and behavioural. Behavioural techniques can be further subdivided into functional and non-functional techniques. Each of these is summarised below.

Static Testing Techniques

As the name implies, static testing techniques are used before the software is executed. They could be called non-execution techniques. Most static testing techniques can be used to test any form of document including source code, design, functional and requirement specifications. However, static analysis is a tool supported version that concentrates on testing formal languages and so is most often used to statically test source code.


Functional Testing Techniques (Black Box)

Functional testing techniques are also known as black-box and input/output-driven testing techniques because they view the software as a black box with inputs and outputs, but have no knowledge of how it is structured inside the box. In essence, the tester is concentrating on the function of the software, that is, what it does, not how it does it.

Structural Testing Techniques (White Box)

Structural testing techniques use the internal structure of the software to derive test cases. They are commonly called white-box or glass-box techniques (implying you can see into the system) since they require knowledge of how the software is implemented, that is, how it works. For example, a structural technique may be concerned with exercising loops in the software. Different test cases may be derived to exercise the loop once, twice, and many times. This may be done regardless of the functionality of the software.

Non-Functional Testing Techniques
Non-functional aspects of a system (also known as quality aspects) include performance, usability, portability, maintainability, etc. This category of technique is concerned with examining how well the system does something, not what it does or how it does it. Techniques to test these non-functional aspects are less procedural and less formalised than those of other categories, as the actual tests are more dependent on the type of system, what it does and the resources available for the tests. How to specify non-functional tests is outside the scope of the syllabus for this course, but an approach to doing so is outlined in the supplementary section at the back of these notes. The approach uses quality attribute templates, a technique from Tom Gilb's book Principles of Software Engineering Management, Addison-Wesley, 1988.

Black Box versus White Box
Black box techniques are appropriate at all stages of testing (Component Testing through to User Acceptance Testing). While individual components form part of the structure of a system, when performing Component Testing it is possible to view the component itself as a black box, that is, to design test cases based on its functionality without regard for its structure. Similarly, white box techniques can be used at all stages of testing but are typically used most predominantly at Component Testing and Integration Testing in the Small.


Black Box Test Techniques


Techniques Defined in BS 7925-2
The Software Component Testing Standard BS 7925-2 defines the following black-box testing techniques: Equivalence Partitioning; Boundary Value Analysis; State Transition Testing; Cause-Effect Graphing; Syntax Testing; Random Testing. The standard also defines how other techniques can be specified. This is important since it means that anyone wishing to conform to the Software Component Testing Standard is not restricted to using the techniques that the standard defines.

Equivalence Partitioning & Boundary Value Analysis
Equivalence partitioning
Equivalence Partitioning is a good all-round functional black-box technique. It can be applied at any level of testing and is often a good technique to use first. It is a common-sense approach to testing, so much so that most testers practise it informally even though they may not realise it. However, while it is better to use the technique informally than not at all, it is superior to use the technique in a formal way to attain the full benefits that it can deliver. The idea behind the technique is to divide or partition a range of test conditions into groups or sets that can be considered the same or equivalent, hence 'equivalence partitioning'. Equivalence partitions are also known as equivalence classes; the two terms mean exactly the same thing. The benefit of doing this is that we need test only one condition from each partition. This is because we are assuming that all the conditions in one partition will be treated in the same way by the software. If one condition in a partition works, we assume all of the conditions in that partition will work, and so there is no point in testing any of the others. Conversely, if one of the conditions in a partition does not work, then we assume that none of the conditions in that partition will work, so again there is no point in testing any more in that partition. Of course these are simplifying assumptions that may not always be right, but writing them down at least gives others the chance to challenge the assumptions and hopefully help identify more accurate equivalence partitions. For example, a savings account in a bank earns a different rate of interest depending on the balance in the account. In order to test the software that calculates the interest due, we can identify the ranges of balance values that each earn a different rate of interest. Suppose that a balance in the range 0 to 100 earns 3% interest, a balance of more than 100 but less than 1,000 earns 5% interest, and balances of 1,000 and over earn 7% interest. We would initially identify three equivalence partitions: 0 to 100, 100.01 to 999.99, and 1,000 and above. When designing the test cases for this software we would ensure that these three equivalence partitions were each covered once. So we might choose to calculate the interest on balances of 50, 260 and 1,348. Had we not identified these partitions it is possible that at least one
of them could have been missed at the expense of testing another one several times over (such as with the balances of 30, 140, 250, and 400).

Boundary value analysis
Boundary value analysis is based on testing on and around the boundaries between partitions. If you have ever done "range checking", you were probably using the boundary value analysis technique, even if you weren't aware of it. Note that we have both valid boundaries (in the valid partitions) and invalid boundaries (in the invalid partitions).

Design Test Cases
Having identified the conditions that you wish to test, the next step is to design the test cases. The more test conditions that can be covered in a single test case, the fewer the test cases that are needed. Generally, each test case for invalid conditions should cover only one condition. This is because programs typically stop processing input as soon as they encounter the first fault. However, if it is known that the software under test is required to process all input regardless of its validity, it is sensible to continue as before and design test cases that cover as many invalid conditions in one go as possible. In either case, there should be separate test cases covering valid and invalid conditions. The test cases to cover the boundary conditions are designed in a similar way.

Why do both EP and BVA?
Technically, because every boundary is in some partition, if you did only boundary value analysis (BVA) you would also have tested every equivalence partition (EP). However, this approach causes problems when a value fails: was it only the boundary value that failed, or did the whole partition fail? Also, by testing only boundaries we would probably not give the users much confidence, as we are using extreme values rather than normal values. We recommend that you test the partitions separately from boundaries - this means choosing partition values that are NOT boundary values. Which partitions and boundaries you exercise, and which first, depends on your objectives. If your goal is the most thorough approach, then follow the traditional approach and test valid partitions, then invalid partitions, then valid boundaries and finally invalid boundaries. However, if you are under time pressure and cannot test everything (and who isn't), then your objective will help you decide what to test. If you are after user confidence with minimum tests, you may do valid partitions only. If you want to find as many faults as possible as quickly as possible, you may start with invalid boundaries.

State Transition Testing
Because this technique is not specifically required by the ISEB syllabus, it is not covered in this edition of the student notes.
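As an illustration of equivalence partitioning and boundary value analysis together, the sketch below works through the savings account example given earlier. Only the partitions and interest rates come from the text; the implementation of the bands is invented for the purpose of the example.

    def interest_rate(balance):
        # Illustrative implementation of the three interest bands from the example.
        if balance <= 100:
            return 0.03
        elif balance < 1000:
            return 0.05
        else:
            return 0.07

    # Equivalence partitioning: one value from each partition, away from the boundaries.
    partition_tests = [(50, 0.03), (260, 0.05), (1348, 0.07)]

    # Boundary value analysis: values on and either side of each partition boundary.
    boundary_tests = [(0, 0.03), (100, 0.03), (100.01, 0.05), (999.99, 0.05), (1000, 0.07)]

    for balance, expected in partition_tests + boundary_tests:
        assert interest_rate(balance) == expected, balance

    # An invalid partition (e.g. a negative balance) and its boundary would be
    # covered by separate test cases, one invalid condition per test case.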


White Box Test Techniques


White box techniques are normally used after an initial set of tests has been derived using black box techniques. They are most often used to measure "coverage" - how much of the structure has been exercised or covered by a set of tests. Coverage measurement is best done using tools, and there are a number of such tools on the market. These tools can help to increase productivity and quality. They increase quality by ensuring that more structural aspects are tested, so faults on those structural paths can be found. They increase productivity and efficiency by highlighting tests that may be redundant, i.e. testing the same structure as other tests (although this is not necessarily a bad thing!).

What are Coverage Techniques?
Coverage techniques serve two purposes: test measurement and test case design. They are often used in the first instance to assess the amount of testing performed by tests derived from functional techniques. They are then used to design additional tests with the aim of increasing the test coverage. Coverage techniques are a good way of generating additional test cases that are different from existing tests, and in any case they help ensure breadth of testing in the sense that test cases that achieve 100% coverage in any measure will be exercising all parts of the software. There is also a danger in these techniques: 100% coverage does not mean 100% tested. Coverage techniques measure only one dimension of a multi-dimensional concept. Two different test cases may achieve exactly the same coverage but the input data of one may find an error that the input data of the other doesn't. Furthermore, coverage techniques measure coverage of the software code that has been written; they cannot say anything about the software that has not been written. If a function has not been implemented, only functional testing techniques will reveal that fact. In common with all structural testing techniques, coverage techniques are best used on areas of software code where more thorough testing is required. Safety-critical code, code that is vital to the correct operation of a system, and complex pieces of code are all examples of where structural techniques are particularly worth applying. They should always be used in addition to functional testing techniques rather than as an alternative to them. Test coverage can be measured based on a number of different structural elements in software. The simplest of these is statement coverage, which measures the number of executable statements executed by a set of tests and is usually expressed in terms of the percentage of all executable statements in the software under test. In fact, all coverage techniques yield a result which is the number of elements covered expressed as a percentage of the total number of elements. Statement coverage is the simplest and perhaps the weakest of all coverage techniques. The adjectives weak and strong applied to coverage techniques refer to their likelihood of finding errors. The stronger a technique, the more errors you can expect to find with test cases designed using that technique with the same measure of coverage.


Types of Coverage
There are a lot of structural elements that can be used for coverage. Each technique uses a different element; the most popular are described in later sections. Besides statement coverage, there are a number of different types of control flow coverage techniques, most of which are tool supported. These include branch or decision coverage, LCSAJ (linear code sequence and jump) coverage, condition coverage and condition combination coverage. Any representation of a system is in effect a model against which coverage may be assessed. Call tree coverage is another example for which tool support is commercially available. Another popular, but often misunderstood, coverage measure is path coverage. Path coverage is usually taken to mean branch or decision coverage, because both these techniques seek to cover 100% of the paths through the code. However, strictly speaking, for any code that contains a loop, path coverage is impossible since a path that travels round the loop say 3 times is different from the path that travels round the same loop 4 times. This is true even if the rest of the paths are identical. So if it is possible to travel round the loop an unlimited number of times then there are an unlimited number of paths through that piece of code. For this reason it is more correct to talk about independent path segment coverage, though the shorter term path coverage is frequently used. There is currently very little tool support available for data flow coverage techniques, though tool support is growing. Data flow coverage techniques include definitions, uses, and definition-use pairs. Other, more specific, coverage measures include things like database structural elements (records, fields, and sub-fields) and files. State transition coverage is also possible. It is worth checking for any new tools, as the test tool market can develop quite rapidly.

How to Measure Coverage
For most practical purposes coverage measurement is something that requires tool support. However, a knowledge of the steps needed to measure coverage is useful in understanding the relative merits of each technique.
1. Decide on the structural element to be used.
2. Count the structural elements.
3. Instrument the code.
4. Run the tests for which the coverage measure is required.
5. Using the output from the instrumentation, determine the percentage of elements exercised.

Instrumenting the code (step 3) involves inserting code alongside each structural element in order to record that the associated structural element has been exercised. Determining the actual coverage measure (step 5) is then a matter of analysing the recorded information. When a specific coverage measure is required or desired but not attained, additional test cases have to be designed with the aim of exercising some or all of the structural elements not yet reached. These are then run through the instrumented code and a new coverage measure determined. This is repeated until the required coverage measure is achieved.
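The following sketch shows, in a hand-rolled way, what a coverage tool does in steps 3 to 5: a probe is inserted alongside each executable statement of a copy of the code, the tests are run, and the probe output gives the coverage figure. The grading function and the single test are invented for the illustration.

    executed = set()

    def probe(statement_id):
        executed.add(statement_id)     # records that a statement was exercised

    def grade(score):                  # instrumented copy of the code under test
        probe("S1"); result = "fail"
        probe("S2")
        if score >= 50:
            probe("S3"); result = "pass"
        probe("S4"); return result

    grade(30)                          # run the test(s) for which coverage is required

    total_statements = 4
    coverage = len(executed) / total_statements * 100
    print(f"Statement coverage: {coverage:.0f}%")   # 75% - the statement inside the IF was not reached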


Finally, the instrumentation should be removed. In practice the instrumentation should be applied to a copy of the source code, so that the copy can simply be deleted once you have finished measuring coverage. This avoids any errors that could be made when removing instrumentation. In any case, all the tests ought to be re-run on the uninstrumented code.

Statement Coverage
Statement coverage is the number of executable statements exercised by a test or test suite. This is calculated by:
Statement Coverage = (number of executable statements exercised / total number of executable statements) x 100%
Typical ad hoc testing achieves 60% to 75% statement coverage.

Branch & Decision Testing / Coverage
Branch coverage is the number of branches (decisions) exercised by a test or test suite. This is calculated by:
Branch Coverage = (number of branches exercised / total number of branches) x 100%
Typical ad hoc testing achieves 40% to 60% branch coverage. Branch coverage is stronger than statement coverage since it may require more test cases to achieve the same measure. For example, consider the code segment shown below.
if a > b
    c = 0
endif
To achieve 100% statement coverage of this code segment just one test case is required, which ensures that variable a contains a value that is greater than the value of variable b. However, branch coverage requires each decision to have had both a true and a false outcome. Therefore, to achieve 100% branch coverage, a second test case is necessary. This will ensure that the decision statement "if a > b" has a false outcome. Note that 100% branch coverage guarantees 100% statement coverage. Branch and decision coverage are actually slightly different for less than 100% coverage, but at 100% coverage they give exactly the same results.
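The difference between the two measures can be seen by wrapping the code segment above in a small function and running test cases against it (a sketch only; the surrounding function is invented):

    def cap(a, b):
        c = b
        if a > b:
            c = 0
        return c

    # Test case 1: a > b. Every statement is executed, so statement coverage is
    # 100%, but the decision has only taken its True outcome: branch coverage is 50%.
    cap(5, 3)

    # Test case 2: a <= b. The decision now also takes its False outcome,
    # bringing branch coverage up to 100%.
    cap(2, 3)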

Error Guessing
Although it is true that testing should be rigorous, thorough and systematic, this is not all there is to testing. There is a definite role for non-systematic techniques. Many people confuse error guessing with ad hoc testing. Ad hoc testing is unplanned and usually done before (or instead of) rigorous testing. Error guessing is done last as a supplement to rigorous techniques.


Error guessing is a technique that should always be used after other more formal techniques. The success of error guessing is very much dependent on the skill of the tester, as good testers know where the faults are most likely to lurk. Some people seem to be naturally good at testing and others are good testers because they have a lot of experience, either as a tester or working with a particular system, and so are able to pinpoint its weaknesses. This is why error guessing is best done after more formal techniques. In using other techniques the tester is likely to gain a better understanding of the system, what it does and how it works. With a better understanding, anyone is likely to be more able to think of ways in which the system may not work properly. There are no rules for error guessing. The tester is encouraged to think of situations in which the software may not be able to cope. Typical conditions to try include divide by zero, blank (i.e. no) input, empty files and the wrong kind of data (e.g. alphabetic characters where numeric are required). If anyone ever says of a system or the environment in which it is to operate "That could never happen", it might be a good idea to test that condition, as such assumptions about what will and will not happen in the live environment are often the cause of failures. Error guessing is known by a number of names, including experience-driven testing, heuristic testing and lateral testing.
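Purely as an illustration, the "typical conditions to try" listed above could be captured as a simple data-driven checklist; process_order is a hypothetical function under test, named only for the example.

    error_guess_cases = [
        ("divide by zero", {"quantity": 0}),
        ("blank input", {"customer_name": ""}),
        ("empty file", {"order_file": "empty.csv"}),
        ("wrong kind of data", {"quantity": "abc"}),   # alphabetic where numeric is required
    ]

    for description, inputs in error_guess_cases:
        print(f"Error-guess test: {description} -> {inputs}")
        # result = process_order(**inputs)   # hypothetical call; check the system fails gracefully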


Test Management (session 5)


This section looks at the process of managing our tests and test processes. Organisation issues such as types of test teams, responsibilities, etc. will be important if we are to see effective testing within our organisation. Strong disciplines such as Configuration Management not only assist the developer but also should be seen as a complete lifecycle discipline that includes our testware. Test management activities include estimation, monitoring, control and the recording and tracking of incidents. At any time we should know how well we are doing and understand what controlling actions we can take to keep testing on target.

Organisation
The importance of independence
It is important to realise that companies will have different requirements when it comes to organisational structures. The different stages of testing will be performed within organisations with varying degrees of independence, using a variety of different approaches. We have already seen that independence is important for effective testing. This is highlighted when we look at the effectiveness of reviews. Greater independence gives a more objective view of the document being reviewed/inspected. The more independent the review process becomes, the less likely authors are to take things personally and the less vulnerable reviewers are to pressure. If we plot the number of faults found over a period of time after the product is released to the end users, we would perhaps expect the number of faults found to diminish, but instead the number found increases. The reason for this is that the users have a different world view. Therefore, if our aim is to find as many faults as possible, then we need to have as many of these different world views as possible. There are, however, advantages to both familiarity and independence, and one should not replace familiarity with independence - we need both. With independence we are providing that different, more objective assessment of the software, and perhaps finding faults others wouldn't. However, the programmer knows and understands the software and will know where problems are most likely to occur.

Organisational structures for testing
We must recognise that whilst independence is important, there are varying degrees of independence, each with associated advantages and disadvantages. Each of the broad options is discussed in turn below.


Developers Only
This is where the programmer tests his or her own code. They know the code best and are most familiar with it, so they may find problems that a less technical tester would miss. They can also find and fix faults very cheaply at this stage. However, they might not be the best people to try to break it. There is a tendency to see what you meant instead of what is actually there, so they may miss things that an independent mind would see. It may also be a rather subjective assessment of their own work; they want to show how good they are, not how easily their software can be made to fall over!

Development Team Responsibility (buddy testing)
In this regime the developers work together and tests will be designed (and also probably run) by a developer other than the one who wrote the code. This is sometimes referred to as buddy testing. As the name suggests, this is usually operated under friendly conditions. It should be considered more advantageous than the previous option, as a certain level of independence is introduced whilst maintaining a technical perspective. However, as already intimated, a technical perspective alone is certainly insufficient when it comes to testing a system. Also, for buddy testing to work effectively, time must be allocated for the buddy to design (and run) the tests. This could take longer due to learning curves, work priority and pressure of their own work.

Testers on the Development Team
One of the members of a development team is assigned the responsibility for testing. This person may already be an experienced tester (with or without development experience) and could be brought into the team specifically to take the testing responsibility. Although working alongside the developers, this person will not have a detailed knowledge of the system from a technical perspective. This gives greater independence in their testing and yet encourages a team spirit in which developers and tester are working toward the same goal. However, testers in this situation might find themselves undermined and unreasonable pressure placed on them to do all the testing because it is deemed to be their job. They may be corruptible by peer pressure, and where there is only one tester on the team, it provides only a single view.

Dedicated team of testers
This team is often referred to as the Independent Test Group (ITG) or Independent Test Unit (ITU) and will usually be totally independent of Development, with different reporting structures. These teams are usually looked upon as the testing experts and will have a high level of testing experience. There can, however, be a high degree of over-reliance on this team to perform all necessary tests, including those that should be undertaken by developers (particularly component testing). Alternatively, little or no component testing is performed, leaving the independent testers to find coding faults that could have been found and fixed more cheaply had component testing been performed. In these situations the independent testing team becomes a bottleneck, and test responsibility, rather than being shared, is left to the ITU.


Whilst being a completely separate department has its advantages, it does have drawbacks, namely confrontation and an over-the-wall mentality, both of which need to be resolved if we are to see an effective and efficient test regime.

Internal Test Consultants
Ideally, a team of internal test consultants would comprise highly specialised experts in the testing world. In practice they tend to be selected on the basis of having the most experience or perhaps the most enthusiasm for testing (not bad criteria, but not necessarily the same as the ideal!). They offer advice and guidance on various testing issues such as procedures, test design, test automation, reviews, etc. and may perform health checks on testing throughout the company. They do not undertake the testing themselves but can assure consistency of testing across different projects. Advice and guidance is often required, and they are usually in a better position to influence and challenge company procedures. However, they are not given authority, and someone still has to undertake the testing.

Outside Organisation
Some companies provide a testing service. This can be undertaken on their own site, or they may send a number of testers to manage and perform the testing on the developer's site. These companies usually specialise in a certain industry (such as insurance, finance, banking, etc.) and so can provide in-depth specialist business knowledge. As they are outside organisations, they will not be drawn into internal politics. However, they can be expensive, and any experience they gain will be lost from the project once their testing is complete.

Usual choices
The most common levels of independence at each of the stages of testing are as follows: component testing - performed by programmers (in some cases, buddy testing); integration testing in the small - poorly defined and seldom done well by anyone; system testing - performed by an independent test team; acceptance testing - performed by users or user representatives.

Resourcing issues
Independence is important in testing, as an independent mind will see things that may be missed by the person who developed the software. However, familiarity also has benefits. It is not a question of achieving one or the other, but of achieving a good balance. Different levels of testing can use different approaches to achieve independence. For example, the use of test design techniques gives independence of thought. A test strategy should state what levels of independence are required for each level of testing. A good mix of skills is important within our projects. We must, however, consider the skill set for the team. The following is a useful reminder of the sort of skill set we will need in order to facilitate good testing. Technique Specialists specialise in the use of test design techniques such as Equivalence Partitioning and Boundary Value Analysis. They can then become the source of knowledge, advice and guidance for the rest of the team.


Automation Specialists - these have a keen understanding of, and a desire to specialise in, the test automation arena. They can develop automation standards and promote good automation practices. They generally need programming skills, since test scripts for automated tools are written in programming languages.
Database Experts - everyone in the team should have an understanding of the underlying database, but not everyone need know the intricate details of the database environment.
Business Skills - having key people in the test team with business knowledge is essential for testing from a business perspective.
Usability Experts - we have already seen that testing for usability is often poorly done because it is often poorly specified. Test (and development) teams could benefit from having specialised expertise in this area.
Test Environment Experts - maintaining our test environments is crucial for successful testing, and this task should not be underestimated in terms of its complexity and sensitivity.
Test Managers - essential people who encourage, motivate and protect the rest of the test team.

Configuration Management
What is configuration management? Our systems are made up of a number of items (or things). Configuration Management is all about the effective and efficient management and control of these items. During the lifetime of the system many of the items will change. They will change for a number of reasons: new features, fault fixes, environment changes, etc. We might also have different items for different customers, such as version A containing modules 1, 2, 3, 4 & 5 and version B containing modules 1, 2, 3, 6 & 7. We may need different modules depending on the environments they run under (such as Windows NT and Windows 2000). One indication of a good Configuration Management system is to ask ourselves whether we can go back two releases of our software and perform some specific tests with relative ease.

Problems resulting from poor configuration management
Often organisations do not appreciate the need for good configuration management until they experience one or more of the problems that can occur without it. Some problems that commonly occur as a result of poor configuration management include: the inability to reproduce a fault reported by a customer; two programmers having the same module out for update, with one overwriting the other's change; being unable to match object code with source code; not knowing which fixes belong to which versions of the software; faults that have been fixed reappearing in a later release; a fault fix to an old version needing urgent testing when the tests have since been updated for a newer version.

Definition of configuration management
A good definition of configuration management is given in the ANSI/IEEE Standard 729-1983, Software Engineering Terminology. This says that configuration management is:
the process of identifying and defining Configuration Items in a system, controlling the release and change of these items throughout the system life cycle, recording and reporting the status of configuration items and change requests, and verifying the completeness and correctness of configuration items.
This definition neatly breaks down configuration management into four key areas: configuration identification; configuration control; configuration status accounting; and configuration audit. Configuration identification is the process of identifying and defining Configuration Items in a system. Configuration Items are those items that have their own version number, such that when an item is changed, a new version is created with a different version number. So configuration identification is about identifying what are to be the configuration items in a system, how these will be structured (where they will be stored in relation to each other), the version numbering system, selection criteria, naming conventions, and baselines. A baseline is a set of different configuration items (one version of each) that has a version number itself. Thus, if program X comprises modules A and B, we could define a baseline for version 1.1 of program X that comprises version 1.1 of module A and version 1.1 of module B. If module B changes, a new version (say 1.2) of module B is created. We may then have a new version of program X, say baseline 2.0, that comprises version 1.1 of module A and version 1.2 of module B. Configuration control is about the provision and management of a controlled library containing all the configuration items. This will govern how new and updated configuration items can be submitted into and copied out of the library. Configuration control also determines how fault reporting and change control are handled (since fault fixes usually involve new versions of configuration items being created). Status accounting enables traceability and impact analysis. A database holds all the information relating to the current and past states of all configuration items. For example, this would be able to tell us which configuration items are being updated, who has them and for what purpose. Configuration auditing is the process of ensuring that all configuration management procedures have been followed and of verifying that the current state of any and all configuration items is as it is supposed to be. We should be able to ensure that a delivered system is a complete system (i.e. all necessary configuration items have been included and extraneous items have not been included).

Configuration management in testing
Just about everything used in testing can reasonably be placed under the control of a configuration management system. That is not to say that everything should. For example, actual test results may not be, though in some industries (e.g. pharmaceutical) it can be a legal requirement to do so.
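The program X example above can be sketched as data to show what a baseline records (a minimal illustration, not the format of any particular configuration management tool):

    baselines = {
        "program X baseline 1.1": {"module A": "1.1", "module B": "1.1"},
        "program X baseline 2.0": {"module A": "1.1", "module B": "1.2"},   # module B changed
    }

    # Status accounting can then answer questions such as which baselines
    # contain version 1.2 of module B:
    for name, items in baselines.items():
        if items["module B"] == "1.2":
            print(name)            # program X baseline 2.0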


Test Estimation, Monitoring and Control


Estimating testing is no different
Test estimation in many ways is no different from estimating other activities such as software design or programming. We must break down the activities into well-defined tasks that, in turn, can be estimated. There are a number of methods that can be employed to estimate the testing effort required; these are described below. Guessing (the "finger in the air", or F.I.A., approach): this is not a good method to base our final estimate on, because it can be easily questioned and very often challenged. Whilst it is not advisable to rely solely on this method, it does have its uses and can be reasonably accurate depending on past project experience and the expertise of the estimator. Past project knowledge: to base our estimates on previous, similar projects is a reasonable thing to do. This is only effective if we have recorded such data from previous projects. Again, it can be easily challenged and estimates may be reduced as a result. Work Breakdown Structures (WBS): here we identify the tasks that make up the test activities and estimate each one in turn. Testers who could verify that the estimate is realistic can then review each task estimate; if an estimate is not realistic, it is re-worked. Should the estimates not be approved, then each task in turn can be questioned as to relevance, criticality, urgency and importance, and the estimates can then be adjusted accordingly. Whatever method is used, test estimation will always be required in advance and must be reviewed throughout the project as further information is obtained.

Estimating testing is different
Whilst test estimation is similar to estimating any other activity in some respects, in other ways it is very different. The main reasons are as follows: it is not an independent activity - testing is completely reliant on development delivering the software to an agreed date and to an agreed quality standard. Should the quality of the software not be as good as we expected, then we will spend more time reporting a greater number of faults and retesting them; it is reliant on attaining the agreed system - that there are no new surprise features added by the developers. If extra features are added, then these will need testing and this will affect the schedules; it is reliant on a stable test environment - if the test environment is volatile then this again will affect the schedules.

Estimating iterations
There is one major difference in estimating testing compared to estimating other tasks. With most activities, once they are done, that's it - they are finished and complete. However, testing, once it is done, does not stay "done" - it has to be done again and again. Successful tests will find faults, but once they are fixed, re-tests and regression tests are needed. This can result in a number of test iterations or test cycles. Three or four test iterations are typical. It may not be necessary to perform all of the tests with every iteration (often it is not possible because of time constraints) but it is certainly desirable to do so with the last iteration (though this too is more often an ideal than a reality).


Past history is often a good guide to the likely number of iterations. For example, if the previous release of the system underwent five test iterations then there is a good chance that the next release will need at least four and possibly six. Other measures that can be used to help estimate test effort more accurately include: an estimate of the number of faults that are likely to be found during testing; the percentage of nested faults (faults that can only be found after another one has been fixed); and the percentage of faults fixed incorrectly (together with the time to wait for each release). An added complication to estimating iterations is that not all iterations will use all of the tests. Some may contain only checks for correct fixes, for example, while others may be a complete regression test of all tests in a suite.

Time to report faults
One of the important tasks that testers must do is to report incidents or faults found. It is important that faults are reported in such a way that a developer can quickly reproduce them, otherwise they will not be able to fix the fault. But how much time is spent in writing up fault reports? The more time you spend reporting faults, the less testing you will actually do! However, if a developer cannot reproduce a fault he or she will have wasted time trying. The tester will then have to spend more time recalling the information that he or she should have reported in the first place. The more faults there are in the software, the more faults the testers will have to report. How many can they reasonably be expected to report before this fault reporting time (and future retesting time) has a significant detrimental impact on the planned testing effort? When estimating test effort it is important to consider how many faults are likely to be found and so how much time will be spent reporting and retesting them.

Monitoring progress
Once we have started testing we must monitor our test progress and take corrective action should things go wrong. Effective monitoring and control is vital in the test management process. Recording the number of tests run against the number of tests passed and the number of tests planned is one good way to show test progress. This is a powerful visual aid when plotted on a graph. This type of graph is called an S-curve because its shape resembles the letter S (somewhat elongated). S-curves give early warning of problems. For example, if the number of tests passing falls significantly below the number run, there may be a number of different reasons. Perhaps there are lots of faults being found, or there are only one or two faults that are affecting many tests. In either case more development effort needs to be resourced to fix the faults. Alternatively, a faulty test environment could be causing the problem, in which case more development resource is unlikely to resolve it. Other useful measures include the number of incidents or faults raised, together with either the number resolved and/or the number of faults expected.
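The sketch below shows the kind of data behind such an S-curve; the figures are invented purely for illustration.

    progress = [
        # (week, tests planned, tests run, tests passed)
        (1, 500,  40,  35),
        (2, 500, 120, 100),
        (3, 500, 260, 205),
        (4, 500, 380, 310),
    ]

    for week, planned, run, passed in progress:
        print(f"Week {week}: {run}/{planned} run, {passed} passed, {run - passed} failing or blocked")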


Test control
Test control is about management actions and decisions that affect the testing process, tasks and people, with a view to making the testing effort achieve its objectives. This may be the original or a modified plan. Modifying an original plan in the light of new knowledge (i.e. what testing has revealed so far) is a frequently necessary and prudent step. The use of entry and exit criteria is perhaps one of the simplest and yet most effective control mechanisms available to managers. Entry criteria are conditions that must be met before the associated activity can start. Similarly, exit criteria are conditions that must be met before an activity can be declared complete. The exit criteria of one activity are often the same as the entry criteria of the next activity. For example, the exit criteria for component testing might be that all components have been tested sufficiently to achieve 100% statement coverage and all known faults have been fixed. These could also be the entry criteria for the next testing activity (normally integration testing in the small, but they could also apply to system testing). However, the next activity might have additional entry criteria. For example, entry criteria for system testing might include something about the availability of the test environment. Tightening (or loosening) entry and exit criteria is just one of the actions managers can take. Reallocation of resources, such as acquiring more testers or developers, and moving people from one task to another to focus attention on more important areas, is often an effective controlling action. There are some factors that testing can affect indirectly, such as which faults are fixed first. However, one thing that cannot be affected by testing is the number of faults that are already in the software being tested. The testing only affects whether or not those faults are found. Once a controlling action has been taken, some form of feedback is essential in order to see the effect of the action. Neither the testers nor the Test Manager should make the decision about when to release a product. The testers and Test Manager are responsible for supplying accurate and objective information about the software quality so that whoever does make the release decision makes it on the basis of solid facts.

Incident Management
What is an incident?
An incident is any event that occurs during testing that requires subsequent investigation or correction. Usually, the event is a mismatch between the actual and expected results of a test (a failure occurs). The cause of this can be one of a number of things: a fault in the software; a fault in the test (e.g. expected result was wrong); the environment was wrong; the test was run incorrectly (e.g. entered the wrong input); or a documentation or specification fault (i.e. what the specification says is wrong).


Monitoring incidents
Incident reports can be analysed to monitor and improve the test process. For example, if a significant number of incidents reported during system testing turn out to be coding faults that could have been found by component testing, then this tells us that the component testing process should be improved.

Reporting incidents
Whilst we can log incidents at any stage throughout the lifecycle, it is advisable to only log incidents after hand-over from development, or at least when someone other than the author of the software performs the testing. This is largely because the benefit of a developer logging incidents on his or her code before they hand it to anyone else is vastly outweighed by the cost of doing so. It is much cheaper for a developer simply to fix the problem and retest it than it is for him or her to stop and log the problem before doing so. There is also a psychological cost to the developer: having to log all the faults in their own code is not a great motivator. There is a distinct danger that we do not spend enough time logging an incident. We must remember that it is our responsibility to raise incidents factually and with enough detail for the developer to do their job efficiently. Otherwise we could end up with the situation where the fault cannot be reproduced by the developer, so the incident report is returned with a request for more information, or worse still, the incident is ignored. Spending extra time logging sufficient information such that the fault can be reliably and quickly reproduced will be of great benefit. Information we should record typically includes: test id (the test that failed); details of the test environment (e.g. whether run on Windows NT or Windows 2000, etc.); id and version of the software under test; both actual and expected results (for comparison purposes and to help developers track the fault); severity (impact of the failure on the customer/user); priority (the urgency of fixing the fault); name of the tester or automated test information; and any other relevant information needed so that the developer can reproduce and fix the fault. Basically, anything that the developer needs to know in order to reproduce the fault with ease. We should not tell them how to code the changes though! There might be other information that can be recorded that will help you and your organisation with metrics and monitoring of progress (for example, the effort spent on handling the incident).

Tracking incidents
Incidents should be tracked from inception through the various stages to their final resolution. For any incident logged we must be in a position of knowing its exact status, whether it is waiting for further action, with a developer for fixing, with the tester for re-testing, or has been re-tested and cleared.
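The fields listed above could be captured in a simple structured record. The sketch below is a hypothetical illustration in Python; real incident management tools define their own schemas and workflows, and the sample values are invented.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class IncidentReport:
        test_id: str             # the test that failed
        environment: str         # e.g. "Windows NT" or "Windows 2000"
        software_version: str    # id and version of the software under test
        expected_result: str
        actual_result: str
        severity: str            # impact of the failure on the customer/user
        priority: str            # urgency of fixing the fault
        raised_by: str           # tester name or automated test identifier
        steps_to_reproduce: List[str] = field(default_factory=list)
        status: str = "open"     # open -> with developer -> fixed -> re-tested -> closed

    incident = IncidentReport(
        test_id="TC-042",
        environment="Windows 2000",
        software_version="orderproc 2.3.1",
        expected_result="Order total of 103.50 displayed",
        actual_result="Order total of 0.00 displayed",
        severity="high",
        priority="high",
        raised_by="J. Tester",
        steps_to_reproduce=["Log in as clerk", "Add item 77 to basket", "Press Total"],
    )
    print(incident.status, incident.test_id)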


Severity versus priority
It is useful to distinguish severity and priority, because they are different aspects. Severity is related to the impact of a failure caused by a fault. Priority is related to the urgency of fixing a fault. For example, if a fault is holding up a series of automated tests, it could have high priority even if the impact on the user is low.

Standards for Testing


There are a number of different standards for testing, ranging from the Quality Management standard (ISO 9000 series), which tells us that testing should be done, to industry-specific standards that detail the level of testing to be performed. The most recent testing standards are those adopted by the ISEB, namely BS 7925-1 and BS 7925-2; the latter tells us how we should perform our testing.


Tool Support for Testing [CAST] (session 6)


Types of CAST Tool
Tool support is available for testing in every stage of the software development life cycle. However, this does not mean to say that all testing activities can be automated or indeed made automatic. Tool support for many testing activities usually facilitates greater productivity and greater accuracy, but still requires manual participation throughout.

Requirements testing tools
Requirements testing tools are a relatively new type of tool that helps with the task of analysing requirements. These tools can work on requirement specifications written in a formal structured language or just plain English. Although they cannot help validate requirements (i.e. tell you if the requirements are what the end user actually wants), they can help with verifying the requirements (i.e. checking conformance to standards for requirements specifications). A modern word processing application can be seen as a very basic requirements-testing tool, since one of the functions of these tools is to check grammar. Ambiguity in a requirement specification often leads to serious faults in the delivered system, and these ambiguities are sometimes caused by poor grammar. Proper requirements testing tools offer a much richer set of functionality than mere grammar checking (though this too is one of their functions). For example, they can check for consistent use of key terms throughout a specification and derive a list of possible test conditions for acceptance testing (though this should sensibly be used as a starting point for further development or as an additional source of ideas for cross-checking purposes). Possible pitfalls of these tools include false confidence: the fact that the tool does not find anything wrong with a requirement specification does not imply it is perfect, but someone is likely to see it that way! They do require manual intervention, they are not automatic, and they certainly cannot correct all the (potential) faults that they find. Perhaps the most obvious pitfall is that the requirements have to be written down. Many organisations fail to produce a complete requirement specification, and for them this type of tool will have limited value (unless it proves to be the catalyst for more complete requirements specifications).

Static analysis tools
Static analysis tools analyse source code without executing it. They are a type of super compiler that will highlight a much wider range of real or potential problems than compilers do. Static analysis tools detect all instances of certain types of fault much more effectively and cheaply than can be achieved by any other means. For example, they can highlight unreachable code, some infinite loops, use of a variable prior to its definition, and redefinition of a variable without an intervening use. These and many more potential faults can be difficult to see when reading source code but can be picked up within seconds by a static analysis tool. Such tools also calculate various metrics for the code, such as McCabe's cyclomatic complexity, Halstead metrics and many more. These can be used, for example, to direct testing effort to where it is most needed.
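For illustration, the short fragment below contains the kinds of anomaly just mentioned; a static analysis tool or strict linter could report them without ever executing the code (the exact findings and wording depend on the tool used).

    # Illustrative fragment only: the comments show the sort of finding a
    # static analysis tool could produce.

    def classify(value):
        if value > 10:
            return "large"
        else:
            return "small"
        print("done")          # unreachable code: both branches have returned

    def total(items):
        result = 0             # defined...
        result = sum(items)    # ...and redefined without an intervening use
        return result

    def average(items):
        return subtotal / len(items)   # 'subtotal' used without any prior definition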


Although an extremely valuable type of testing tool, static analysis is not used by many organisations. The pitfalls are more psychological than real; for example, a static analysis tool may highlight something that is never actually going to cause a failure of the software, because it presents a static view of the software.

Test design tools
Test design tools help to derive test inputs. They are sometimes referred to as test case generators, though this is a rather exaggerated claim. A proper test case includes the expected outcome (i.e. what the result of running the test case should be). No tool will ever be able to generate the expected outcome (other than for the most simple and possibly least needed test cases). Thus we prefer to call them partial test case generators. Test design tools usually work from a formal specification, an actual user interface or from source code. In the first case the specification has to be in a formal language that the test design tool understands, or for some tools a CASE (Computer Aided Software Engineering) tool can hold it. A CASE tool captures much of the information required by the test design tool as the system is being designed and therefore saves the need to re-specify the design information in a different format just for the test tool. Where a test design tool uses the user interface of an application, the user interface has to be implemented before the test design tool can be used. It can also generate only a fairly restricted set of test inputs, since these concentrate on testing the user interface rather than the underlying functionality of the software. This is still useful though. When it is the source code that is used to generate test inputs, they are useful for checking that the code does what the code does. It is often more useful to check that the code does what the code should do (which is where tests based on a specification of what the system should do come in).

Data preparation tools
Data preparation tools manipulate existing data or generate new test data. Where new data is generated, the tool uses a set of instructions or rules supplied by you that describe the format and content of the data to be generated. For example, if you require a lot of names and addresses to populate a database, you would specify the valid set of characters and the maximum and minimum lengths of each field, and let the tool generate as many records as you require. The names and addresses it generates will not be sensible English names, but they will conform to the rules you laid down and so will be valid for the purposes of testing. Starting with actual data and manipulating it to ensure data privacy and/or reduce its size can generate more realistic test data. This type of tool makes it possible to generate large volumes of data (as required for volume, performance and stress testing, for example) when it is needed. This makes the data more manageable, since the large volumes do not necessarily have to be kept; they can be regenerated whenever required. On the downside, the technical set-up for complex test data may be rather difficult or at least very tedious.

Test running tools
Test running tools enable tests to be executed automatically and, in some cases, the test outputs to be compared to the expected outputs. They are most often used to automate regression testing and usually offer some form of capture/replay facility to record a test being performed manually so the tool can then replay the same keystrokes. The recording is captured in a test script that can be edited (or written from scratch) and is used by the tool to re-perform the test. These tools are applicable to test execution at any level: unit, integration, system or acceptance testing. The benefits include faster test execution and unattended test execution, reducing manual effort and permitting more tests to be run in less time. The pitfalls are enormous and have caused as many as half of all test automation projects to fail in the long term. The cost of automating a test case is usually far more (between 2 and 10 times) than the cost of running the same test case manually. The cost of maintaining the automated test cases (updating them to work on new versions of the software) can also become larger than the manual execution cost. It is possible to avoid these pitfalls but it is not necessarily easy to do so.

Comparison tools
Test running tools usually offer some form of dynamic comparison facility that enables the output to the screen during the execution of a test case to be compared with the expected output. However, they are not as good at comparing other types of test outcome, such as changes to a database and generated report files. For this a stand-alone comparison tool can be used. These tools offer vastly improved speed and accuracy over manual methods. They will highlight all the differences they find, even the ones you are not interested in, unless you can specify some form of mask or filter to hide expected differences such as dates and times. Specifying masks may not be an easy task.

Test harnesses and drivers
Not all software can be turned into an executable program. For example, a library function that a programmer may use as a building block within his or her program should be tested separately first. This requires a harness or driver: a separate piece of source code that is used to pass test data into the function and receive the output from it. At unit testing and the early stages of integration testing these are usually custom built, though there are a few commercial tools that can provide some support (they are likely to be language specific). At later stages of testing, such as system and acceptance testing, harnesses (also called simulators) may be required to simulate hardware systems that are not available or cannot be used until the software has been shown to be reliable. For example, software that controls some aspect of an aircraft needs to be known to work before it is installed in a real aircraft!

Performance testing tools
If performance measurement is something you need to do then a performance testing tool is a must. Such tools are able to provide a lot of very accurate measures of response times, service times and the like. Other tools in this category are able to simulate loads, including multiple users, heavy network traffic and database accesses. Although they are not always easy to set up, simulating particular loads is usually much more cost effective than using a lot of people and/or hardware.
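Returning to the data preparation tools described earlier, the sketch below shows the rule-driven generation idea in miniature; the field rules and record layout are invented for the example.

    import random
    import string

    # Hypothetical field rules: valid characters plus minimum/maximum lengths.
    FIELD_RULES = {
        "surname":  {"chars": string.ascii_uppercase, "min": 3, "max": 12},
        "postcode": {"chars": string.ascii_uppercase + string.digits, "min": 6, "max": 8},
    }

    def generate_record(rules):
        """Generate one record whose fields conform to the supplied rules."""
        record = {}
        for name, rule in rules.items():
            length = random.randint(rule["min"], rule["max"])
            record[name] = "".join(random.choice(rule["chars"]) for _ in range(length))
        return record

    # Generate as many records as required, e.g. to populate a database table.
    records = [generate_record(FIELD_RULES) for _ in range(1000)]
    print(records[0])   # not a sensible name or postcode, but valid for testing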


Dynamic analysis tools
Dynamic analysis tools assess the system while the software is running. For example, tools that can detect memory leaks are dynamic analysis tools. A memory leak occurs if a program does not release blocks of memory when it should, so the block has leaked out of the pool of memory blocks available to all programs. Eventually the faulty program will end up owning all of the memory; nothing can run, the system hangs up and must be re-booted.

Debugging tools
Debugging tools are traditionally used by programmers to help investigate problems in their source code. They allow the code to be executed one instruction at a time and the value of variables to be examined and set. This latter facility makes them particularly useful for testing, since specific conditions can be simulated within the source code by using the debugging tool to set variables to particular values.

Test management tools
Test management tools help throughout the software development lifecycle. This category covers tool support for test planning and monitoring, but also incident management (fault tracking) tools. For example, some test management tools help with decomposition of the system functionality into test cases and are able to track and cross-reference from requirements through to test cases and back again. In this way, if a requirement changes it is possible for the test management tool to highlight those test cases that will need to be updated and rerun. Similarly, if a test case fails, the tool will be able to highlight the requirement(s) that are affected. Many test management tools are integrated with (or provide an interface to) other testing tools. This can be exceedingly helpful, since it becomes possible to launch automated tests directly from the test management tool and to have information on the success or failure of those tests recorded against them.

Coverage tools
Coverage tools assess how much of the software under test has been exercised by a set of tests. They can do this by a number of methods, but the most common is for the tool to instrument the source code. This involves the tool inserting new instructions into the original code such that, when the code is executed, the new instructions write to a data file recording the fact that they have been executed. After a set of test cases has been executed, the tool then examines this data file to determine which parts of the original source code have been executed and (more importantly) which parts have not been executed. Coverage tools are most commonly used at unit test level. For example, branch coverage is often a requirement for testing safety-critical or safety-related systems. However, some coverage tools can also measure the coverage of design-level constructs such as call trees. It is important not to set some arbitrary coverage measure as a target without a good understanding of the consequences of doing so. Achieving 100% branch coverage may seem like a good idea, but it can be a very expensive goal, and may not be the best testing that could be done in the circumstances. Coverage has to be justified against other means of achieving the desired level of quality of software.
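The instrumentation idea behind coverage tools can be illustrated with a deliberately simplified sketch in which probes are inserted by hand; a real coverage tool inserts and reports on such probes automatically, and the component under test here is invented.

    # A much-simplified sketch of branch-coverage instrumentation.
    executed = set()

    def probe(branch_id):
        executed.add(branch_id)

    ALL_BRANCHES = {"grade:high", "grade:low"}

    def grade(score):
        if score >= 50:
            probe("grade:high")      # inserted instrumentation
            return "pass"
        else:
            probe("grade:low")       # inserted instrumentation
            return "fail"

    # Run a test suite consisting of a single test case...
    grade(72)

    # ...then report which branches the tests did not exercise.
    print("Branch coverage: {:.0%}".format(len(executed) / len(ALL_BRANCHES)))
    print("Not executed:", ALL_BRANCHES - executed)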


Tool Selection and Implementation


This section gives a brief overview of the tool selection and implementation process as described by the syllabus. Many of the points are described in further detail in the sections that follow, which look more closely at the tool selection (section 6.3) and implementation (section 6.4) processes. Those sections are in themselves an abridged version of text that can be found in Chapters 10 and 11 of the book Software Test Automation by Mark Fewster and Dorothy Graham, published by Addison-Wesley, 1999, ISBN 0-201-33140-3.

Which test activities to automate?
Given the wide range of different types of testing tool, there is a wide range of testing activities that can be tool-supported (but not necessarily fully automated). When considering tool support, identify all the activities where tool support could be beneficial and prioritise them to identify the most important. Although test running tools are the most popular, this may not be the best place to introduce tool support first.

CAST tool requirements
Testing tools often provide a rich set of functions and features. However, the most functionally rich tool may not be the best for any particular situation. Having a tool that matches the existing test process is usually more important. If a tool does not adequately support your test process it could be more of a hindrance than a help.

CAST readiness
CAST readiness is the term given to a test process that is ready for tool support. If you have a chaotic testing process it is unlikely that any testing tool will help; the benefits of tools usually depend on a systematic and disciplined test process. If the test process is poor it may be possible to improve it in parallel with the introduction of a testing tool, but care should be taken. In our experience about half of the organisations that have bought a test tool have abandoned it. Introducing a testing tool is not an easy thing to do well, and if it is not done well there is a strong chance that testing will cost more or be less effective as a result of the tool.

Environment constraints and CAST integration
The environment into which a testing tool will go will almost certainly influence the choice of tool. Some tools can only work on certain hardware platforms, operating systems or programming languages, and some require their own hardware. If one or more testing tools are already in use, it may be necessary to consider their integration with any new tool. This issue may also affect other types of tool, for example configuration management tools and CASE (Computer Aided Software Engineering) tools. Some vendors offer a variety of integrated test tools where compatibility between the tools is assured, although this does not necessarily mean that if one of their tools suits your needs all the others will as well.


Selection process
After it has been decided which testing activity will offer most benefit from tool support, the job of selecting a suitable test tool should be treated as a process comprising four stages: produce a candidate tool shortlist; arrange demonstrations; evaluate the selected tool(s); review and select the tool.

Implementation (pilot project and roll out)
After a tool has been selected, the task of implementing it should start with a pilot project. The aim of this is to ensure that the tool can be used to achieve the planned benefits. Objectives of the pilot project include gaining experience with the tool in a controlled way and on a small scale, identifying changes to the testing process to accommodate the tool to its best advantage, and assessing the actual costs and benefits of a full implementation. Roll out of the tool on a larger scale should only be attempted after a successful pilot project. This will require a strong commitment from new tool users and projects, as there is an initial overhead in using any new tool.

The tool selection process


This section and the next (6.4) give additional information on tool selection and implementation, and go into more depth than is required by the syllabus. The tool selection process evaluates the many tools available and selects one that is appropriate for your organisation. A different process, the tool implementation process, then ensures that the selected tool is used throughout the organisation in an effective way. If you are in charge of the selection of a tool to be used by dozens or hundreds of people within your organisation, then you will need to approach the tool selection process in a formal and detailed way. If you are looking for a tool to try on an experimental basis with only two or three people, then your tool selection project will be on a much smaller scale, and will be less detailed and less formal. However, in both cases the stages of the process will be the same. This session describes the process with a medium-sized organisation in mind. Choosing a test automation tool is a project in its own right, and must be funded, resourced and staffed adequately. It should never be a large project, though it will be a larger project in a larger organisation. The tool selection project for a medium-sized organisation will typically take from 4 to 6 man-weeks of effort, and may involve three to ten people.

The tool evaluation and selection team
There is a need for a team of people to make the tool selection decision. If only one person makes the decision, it is much more difficult to achieve a broad user base for the tool within the organisation in the implementation phase. The amount of time required from the team members needn't be too great, perhaps 3 to 5 days each, spread out over a month to six weeks. More time would be needed from the leader of this team.


One person should be put in charge of managing the tool selection and evaluation. This individual should be someone with management skills or potential, and the ability to build a team of people from different areas of the organisation. This person should ideally be someone who has a broad view of the organisation and who is well respected. The team leader may be the tool Champion, the person in the implementation phase who is most enthusiastic about selling test automation within the organisation, and the focal point for test automation practices. The other team members should include representatives from each area of the organisation that may want to automate their own testing. This may involve people from a number of projects, departments or locations. As this is an important decision that may affect the efficiency and productivity of the whole organisation, the team members should be knowledgeable about their own areas of the organisation and capable of making an objective evaluation that can be justified to their colleagues. The team should also include a variety of skills representing the different roles or jobs of the target users for the tool. This would include end users who want to automate acceptance testing, test specialists who want to automate system testing, and developers who want to automate unit or integration testing.

Identifying your constraints
Having established what your testing problems are, and having established that this would be a good time to introduce a test automation tool, only now are you ready to begin looking at the tool market. There will be a number of factors that will constrain your choice of tool. If you can identify them right at the beginning, you can save yourself a lot of wasted time and effort investigating tools that will be rejected anyway. Environmental constraints (hardware & software)

Testing tools are software packages and therefore will be specific to particular hardware, software or operating systems. You would not want to spend any time considering a tool that runs only on a Unix platform when you have only a Windows environment and no possibility of acquiring or using anything else. Should the tool be co-resident with the software under test?

Most people look for a tool which will run on the environment in which they are currently developing or maintaining software, but that is not the only possibility. Many tools can work in a host-target configuration, where the tool runs on one environment and the system under test runs on another. Consider the future direction of your organisation for hardware and software, and plan the test tools for the long term, not just for what you have now. Commercial supplier constraints

The company that you buy the tool from may be an important factor for you in the future. If you have problems with the tool, you will want them sorted out quickly and competently. If you want the best from the tool, you will want to take advantage of their expertise. You may want to influence the future development of the tool. A good relationship with your vendor can help you to progress your test automation in the direction you want it to go. Cost constraints

Cost is often the most stringent and most visible constraint on tool selection. But the purchase price of the tool may be only a fraction of the total cost of fully implementing the tool. Of course there must be guidelines, but it is also important not to be too rigidly bound by what may be a fairly arbitrary number. Political constraints

Political factors may well override all of the other constraints and requirements. For example, you may be required to buy the same tool that your parent company uses. There may be a restriction against buying anything other than a tool supported in your own country. Quality constraints

What are the required quality characteristics of the tool? This may include both functional aspects and non-functional aspects. For example: How many users can use the tool at once? What level of skill is required to use the tool effectively? What is the quality of the documentation? What overheads does the tool cause? Can the tool integrate with others that you are using?

Identifying features
The next step is to begin to familiarise yourself with the general capabilities of the test automation tools available in the commercial marketplace. Make a list of the features, and classify them into categories. Some are suggested below; use whatever categories are most useful to you. As a minimum, have categories for mandatory and not mandatory. Mandatory

The mandatory features together with your constraints are used to rule out any tool that does not fulfil your essential conditions. Make sure that the things you list as mandatory really are only the essential minimum required to solve your problem. The tool you choose will not have only these features, but without these features you could not use it at all. Desirable

The desirable features are used to discriminate among those tools that fulfil all the essential conditions. You may want to divide this category into highly desirable and desirable. Don't care

The don't care category is best used to ensure that a feature in one tool is not used to deselect another tool when the feature is not required. Some tool features are either present or absent, for example, whether the tool records mouse clicks. The evaluation of this type of feature is straightforward: if the feature is a mandatory requirement, then any tool that does not have this feature is eliminated from further consideration. Other features may be present to a degree, or the tool may offer support for one of your requirements but only partially, for example, non-functional attributes such as ease of use. Some tools may be easier for non-technical people to use, but tedious for the technical test automator to use.

Producing the long list
The place to start is with information about the information, i.e. lists of currently available testing tools. There are a number of sources of such information. Some of these sources may just be contact details for vendors with no information about what the tools do. Others may have summary information about what each tool does, so you can at least eliminate the ones you are not interested in. Still others may contain evaluations of the tools themselves, with comments on how well they do what they do. Here are some possible sources: internal informal tools or utilities; the World Wide Web; magazines and other publications covering software testing; the British Computer Society Specialist Interest Group in Software Testing; testing tool reports; ask the vendors of your other software or hardware products; ask the vendors of testing tools which are almost suitable (for example, if a tool doesn't yet run on your platform, ask if they have plans to port it soon); investigate whether you could use a tool if you ran it in a different environment than the one you had thought; check the testing features of other software development tools, such as CASE tools (Computer-Aided Software Engineering); attend a testing conference or event with a tools exhibition; look for tools sourced from a country other than your own (e.g. USA / Europe / Asia).

Your long list will contain all potentially suitable tools, i.e. all those which meet both your constraints and your mandatory features.

Constructing the shortlist
If there are more than 3 or 4 tools now on your long list, use your list of desirable features to eliminate some, so that you are left with a shortlist of 2 or 3 tools. If your long list contains only 1 tool, you may wish to go ahead and evaluate that one for suitability; after all, you only need one tool in the end. If your long list is empty, you will need to either relax your constraints and/or mandatory features list and try again, or you may want to consider building your own tool, or you may decide not to opt for tool support at this time.

Evaluating the shortlisted candidate tools
Contact the vendors of the shortlisted tools and arrange to have information sent (if you have not done this already). Study the information and compare features. Request further information from the vendors if the literature sent does not explain the tool's function clearly enough. Your function and feature list will be evolving at this point, depending on the information you are gathering and your greater understanding of what the tools can do. This should help you to decide between the tools. This is the time to consult one or more of the publications which have evaluated testing tools, if the ones you are interested in are covered in such a report. These reports are often perceived as being very expensive. However, the cost of the report should be compared to the cost of someone's time in performing similar evaluations, and the cost of choosing the wrong tool because you did not know about something that was covered in published material.
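The shortlisting step described above can be thought of as filtering on constraints and mandatory features and then ranking on weighted desirable features. The sketch below illustrates this idea; the tool names, features and weightings are entirely invented.

    # Illustrative shortlisting: filter on mandatory features, rank on desirables.
    candidates = {
        "ToolA": {"runs_on_windows": True,  "records_mouse_clicks": True,  "ease_of_use": 4},
        "ToolB": {"runs_on_windows": True,  "records_mouse_clicks": False, "ease_of_use": 5},
        "ToolC": {"runs_on_windows": False, "records_mouse_clicks": True,  "ease_of_use": 3},
    }
    mandatory = ["runs_on_windows"]                               # rule these out first
    desirable = {"records_mouse_clicks": 3, "ease_of_use": 1}     # weightings

    def shortlist(candidates, mandatory, desirable, size=2):
        viable = {name: feats for name, feats in candidates.items()
                  if all(feats.get(m) for m in mandatory)}
        scored = sorted(viable,
                        key=lambda name: sum(weight * (viable[name].get(f) or 0)
                                             for f, weight in desirable.items()),
                        reverse=True)
        return scored[:size]

    print(shortlist(candidates, mandatory, desirable))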


Ask the shortlisted vendors to give you the names of a couple of their existing customers as references, preferably using the same hardware and software that you have. If there are user groups for any of the tools, contact them and attend one of their meetings if possible. The reference site's situation will be different to yours, so the benefits or problems that they have had may not be the same as the ones that are important to you. However, the experience of someone else who bought a tool for similar reasons to yours is invaluable and well worth pursuing. At any point in the selection and tool evaluation process it may become clear which tool will be the best choice. When this happens, any further activities may not influence the choice of tool but may still be useful in assessing in more detail how well the chosen tool will work in practice. They will either detect a catastrophic mismatch between the selected tool and your own environment, or give you more confidence that you have selected a workable tool.

In-house demonstrations
The advice in this section assumes that your shortlisted tools are not of the shrink-wrapped, off-the-shelf, take-it-or-leave-it, cheap-and-cheerful variety. Before contacting the vendor to arrange for them to visit you to do a tool demonstration, some preparatory work will help to make your assessment of the competing tools more efficient and unbiased. Prepare two test cases for the tool demonstration: one a normal mainstream test case, and one a worst-case nightmare case (or something more complex than normal). Rehearse both tests manually, in order to discover any defects in the test cases themselves. It is important that the tools be set up and used on your premises, using your own configuration, and we recommend this, if at all possible, for the demonstration. Invite the vendors of all shortlisted tools to give demonstrations within a short time-frame, for example on Monday, Wednesday and Friday of the same week. This will make sure that your memory of a previous tool is still fresh when you see a different one. Prepare one additional test case that is not supplied in advance to the vendor. After they have shown you what the tool can do with your other two test cases, see how easy it is to put this other one into the tool from cold. This test case should be neither too easy nor too complex, but somewhere in the middle. Ask the vendors you saw first any questions that occurred to you when watching a later vendor's presentation or demonstration. This will give the fairest comparison between the tools. Assess tool performance against the measurable criteria defined earlier, taking any special circumstances into account. Compare features and functions offered by competing tools. Compare non-functional attributes, such as usability. Compare the commercial attributes of the vendor companies. Test the technical support by ringing their help line and asking a technical question or two (you may need to gain the permission of the tool vendor for doing this).

Competitive trial
An in-house competitive trial will give you a clearer idea of how the tool will work out in your own situation. This does involve additional effort, and is probably more appropriate for larger organisations where the tool chosen will eventually be used by a large number of people. Many tool vendors will allow short-term use of the tool under an evaluation licence, particularly for tools that are complex and represent a major investment.
Such licences will be for a limited period of time, and the evaluation team must plan and prepare for that evaluation accordingly.


Making the decision
Having spent a considerable amount of effort in assessing the candidate tool(s), the evaluation report would normally recommend the purchase of the tool that would best meet the requirements and constraints of the organisation. Before making this recommendation, assess the business case: will the potential savings from this tool give a good return on investment, including purchase/lease price, training costs and ongoing internal tool costs? The likely benefits need to be clearly communicated, so that expectations for the benefits are realistic. Deciding not to go ahead with purchasing any of the tools investigated could be the best economic decision at this time; do not be afraid to make it if a tool is not justified.
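A back-of-the-envelope business case might look like the sketch below; every figure is invented for the example, and a real assessment would use the organisation's own cost data and a longer time horizon.

    # Hypothetical first-year figures only.
    purchase_price     = 20000   # licence cost
    training_costs     = 5000
    internal_costs     = 8000    # ongoing scripting and maintenance per year
    manual_cost_saved  = 45000   # estimated manual test effort replaced per year

    first_year_cost   = purchase_price + training_costs + internal_costs
    first_year_saving = manual_cost_saved
    print("First-year net benefit:", first_year_saving - first_year_cost)
    print("Payback achieved in first year?", first_year_saving >= first_year_cost)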

The implementation process


Once a tool has been chosen, the real work starts. Although it is important to choose carefully, success in the tool's use is by no means guaranteed. The important thing to remember is that when you introduce a testing tool into an organisation, this will change the way people work. People generally don't like change, but there are ways to make the process of change easier for everyone involved.

Roles in the implementation/change process
Tool champion

In every successful implementation of test automation that we have seen, there is one person who is the focal point for the introduction of the tool. This person is a very enthusiastic believer in test automation, and has a clear vision of the benefits which test automation can bring to the company. This person could also be called an evangelist; they tend to try to convert everyone they meet to their cause. The champion will quite likely have been the driving force behind the tool evaluation and selection effort. He or she will not need to be highly technical, although should have a good basic understanding of the technical problems likely to be encountered. The champion must also work well with people, and be diplomatic and patient. Change agent

The change agent is the person who plans and manages the process of change within the organisation. This may be the same person as the champion or, in a larger organisation, a separate person. The change agent is in charge of the day-to-day progress of the tool uptake while it is being phased into the working practices of the organisation. The change agent's task is to plan what changes will happen to whom and when, and to lead people through these changes. The change agent's job may or may not be full-time. Management sponsor or angel

It is critical that the change initiative has the support of top management in order to succeed. It helps if there is a very senior person who visibly supports the champion and change agent, and makes it known that test automation is something that meets with their approval. The sponsor could also be the champion. Tool custodian


The tool custodian may be the same person as the change agent and/or champion, but is more likely to be a separate role. He or she is responsible for technical tool support, implementing upgrades from the vendor and providing internal help or consultancy in the use of the tool. The tool custodian would also be the owner of the standards for the way in which the tool is to be used. These standards would be developed as part of the implementation project. The implementation team

The team that selected the tool may also be the team that helps to implement it. Ideally it would include representatives from the different parts of the organisation that would be expected to use the tool. In particular it should include end-users, if the tool will be used for user acceptance testing. The team will meet regularly (perhaps one day a month) over a period of months or years.

Management commitment
Obviously there is already a level of management commitment to tool support for software testing, because the tool selection process will have been authorised. The commitment needed from top management is not just a one-off agreement to purchase the tool, but needs to be continual throughout the implementation process. The change agent must be adequately supported by management in at least two ways: firstly, visible backing from high-level managers; and secondly, adequate funding and resourcing (this may mean adversely impacting other projects in the short term). The first without the second is classic lip service and is the way to end up with shelfware. The second without the first is seldom successful and makes the change agent's task considerably more difficult, if not impossible. Managers also need to realise that the first thing which happens when a tool is used for the first time is that productivity will go down, even when the tool is intended to increase productivity. Adequate time must be allowed for learning and teething problems, otherwise the tool will be abandoned at its point of least benefit and greatest cost.

Publicity
Once you have the management commitment, both verbal and financial, the change agent needs to begin publicising the intended changes. People are not convinced by one presentation, and even if they are, they don't stay convinced over time. Your role as change agent is to provide a constant drip-feed of publicity about the tool, who is using it, success stories, and problems overcome. The most important publicity is from the earliest real use of the tool. The benefits gained on a small scale should be widely publicised to increase the desire and motivation to use the tool. Testimonials, particularly from converted sceptics, are often more effective than statistics. It is also important to give relevant bad news, to keep expectations at a realistic level. As the implementation project proceeds it is possible you will discover that some of the planned uses of the tool are not practical or will not work as expected. Be sure to let others know ahead of time so they will not be disappointed when the tool arrives in their area.


Internal market research
In parallel with the publicity drive, the change agent and the change management team need to do a significant amount of internal market research, talking to the people who are the targeted users of the tool. Find out how the different individuals currently organise their testing and how they would want to use the tool, and whether it can meet their needs, either as it is or with some adjustments. The lines of communication set up by interviewing potential tool users can also be used to address the worries and fears about using the tool that contribute to people's resistance to change.

Pilot project
It is best to try out the tool on a small pilot project first. This ensures that any problems encountered in its use are ironed out while only a small number of people are using it. It also enables you to see how the tool will affect the way you do your testing, and so to modify your existing procedures or standards to make best use of the tool. The pilot project should start by defining a business case for the use of the tool on this project, with measurable success factors. For example, you may want to reduce the time to run regression tests from 1 week to 1 day. Actually, applying the 'don't be over-optimistic' rule, it may be better to set a target for the time to run 20% of the regression tests to be reduced from 1 day to 2 hours. The pilot project should be neither too long nor too short, say between 2 and 4 months. Second and subsequent phases of the pilot project could extend this time beyond 4 months, but each phase should have measurable objectives. If the pilot drags on too long without producing tangible results it will cast doubt on the viability of test automation. Small benefits gained quickly are much better than larger benefits that are a long time coming, and they are also less risky. The use of the testing tool will change your testing procedures in ways that you will probably not expect. For example, using a test execution tool may make debugging more difficult: previously, when testing manually, you knew where you were when something went wrong, which would help you to find the bug. Using the tool, you only know afterwards that something went wrong, and you then have to spend time recreating the context of the bug before you can find it. So there is an extra job to do which you did not have to do before. The pilot project is the place to experiment and to discover how to build automated test suites that will be sufficiently easy to maintain in real situations. Once your script structure, data organisation, naming conventions etc. have been seen to work well in the pilot, they can be rolled out to a wider set of people.

Evaluation of results from the pilot
After the pilot project is completed, the results are compared to the business case for this project. If the objectives have been met, then the tool has been successful on a small scale and can safely be scaled up. The lessons learned on the pilot project will help to make sure that the next project can gain even greater benefits. If the objectives have not been met, then either the tool is not suitable or it is not yet being used in a suitable way (assuming that the objectives were not over-optimistic). Determine why the pilot was not successful, and decide the next steps to take. Do not attempt to use the tool on a wider scale if you cannot explain why it has not succeeded on a small scale!
The overheads for start-up may be much more significant on a small scale, for example, and may not have been adequately taken into account in the initial business case. It is best to proceed fairly cautiously in scaling up, and to increase tool use incrementally by one project group at a time.

Planned phased installation or roll-out
Assuming the pilot project was successful, the use of the tool in the rest of the organisation can now be planned. This is a major activity in any organisation, and without careful planning it will not be successful. The change agent and change management team can act as internal consultants to the new tool users, and can perform a very useful role in co-ordinating the growing body of knowledge about the use of the tool within the organisation. It is very important to follow through on the tool investment by ensuring that adequate training is given in its use. A tool that is not being used properly will not give the benefits that could be realised. Every tool user should be trained in the way that is appropriate for them. For those who will use the tool directly, this usually means the training given by the vendor of the tool. Once your own regime is in place, the training for users of test automation may consist exclusively of how to use the additional procedures, routines, spreadsheets, etc. which you have set up to interface to the tools themselves. This latter training you must design (and probably present) yourselves, since it is based on your own way of doing things, that is, your test automation regime.


Principles of Testing
Testing Terminology
The BCS SIGIST Standard Glossary of Testing Terms (British Standard BS 7925-1) will be used.

There is no generally accepted set of testing definitions used by the world-wide testing community. BS 7925-1 exists as a new source of testing definitions.

Why Testing is Necessary

define errors, faults, failures and reliability; errors and how they occur; cost of errors; exhaustive testing is impossible; testing and risk; testing and quality; testing and contractual requirements; testing and legal, regulatory or mandatory requirements; how much testing is enough

An error is a human action that produces an incorrect result. A fault is a manifestation of an error in software (also known as a defect or bug). A fault, if encountered, may cause a failure, which is a deviation of the software from its expected delivery or service. Reliability is the probability that software will not cause the failure of a system for a specified time under specified conditions. Errors occur because we are not perfect and, even if we were, we are working under constraints such as delivery deadlines. A single failure can cost nothing or a lot (e.g. the Venus probe). Software in safety-critical systems can cause death or injury if it fails, so the cost of a failure in such a system may be measured in human lives. Exhaustive testing would in most cases take an enormous amount of resource and is therefore usually impractical. The amount of testing performed depends on the risks involved. Risk must be used as the basis for allocating the test time that is available and for selecting what to test and where to place emphasis. Testing identifies faults, whose removal increases the software quality by increasing the software's potential reliability. Testing is the measurement of software quality. We measure how closely we have achieved quality by testing the relevant factors such as correctness, reliability, usability, maintainability, reusability, testability, etc. Other factors that may determine the testing performed may be contractual requirements, or legal requirements, normally defined in industry-specific standards, or based on agreed best practice (or, more realistically, non-negligent practice). It is difficult to determine how much testing is enough.
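The distinction between error, fault and failure can be illustrated with a tiny example (in Python, purely for illustration): the programmer's error leaves a fault in the code, but a failure is only observed for certain inputs, which is also why reliability is expressed in terms of specified conditions. The function and figures are invented.

    # The programmer's error (using '>' instead of '>=') leaves a fault in the
    # code; the fault only causes a failure for certain inputs.
    def qualifies_for_discount(order_total):
        """Specification: orders of 100 or more qualify for a discount."""
        return order_total > 100      # fault: should be 'order_total >= 100'

    print(qualifies_for_discount(150))   # True  - no failure observed
    print(qualifies_for_discount(100))   # False - failure: deviates from the spec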

Fundamental Test Process

the test process; successful tests detect faults; meaning of completion or exit criteria, coverage criteria

The fundamental test process comprises planning, specification, execution, recording and checking for completion. In more detail: Test planning. The test plan should specify how the test strategy and project test plan apply to the software under test. This should include identification of all exceptions to the test strategy and of all software with which the software under test will interact during test execution, such as drivers and stubs.


Test specification. Test cases should be designed using the test case design techniques selected in the test planning activity.

Test execution. Each test case should be executed.

Test recording. The test records for each test case should unambiguously record the identities and versions of the software under test and the test specification. The actual outcome should be recorded. It should be possible to establish that all of the specified testing activities have been carried out by reference to the test records. The actual outcome should be compared against the expected outcome. Any discrepancy found should be logged and analysed in order to establish where its cause lies and the earliest test activity that should be repeated, e.g. in order to remove the fault in the test specification or to verify the removal of the fault in the software. The test coverage levels achieved for those measures specified as test completion criteria should be recorded.

Checking for test completion. The test records should be checked against the previously specified test completion criteria. If these criteria are not met, the earliest test activity that must be repeated in order to meet the criteria should be identified and the test process should be restarted from that point. It may be necessary to repeat the Test Specification activity to design further test cases to meet a test coverage target.

As the objective of a test should be to detect faults, a 'successful' test is one that does detect a fault. This is counter-intuitive, because faults delay progress: a successful test is one that may cause delay. The successful test reveals a fault which, if found later, may be many times more costly to correct so, in the long run, is a good thing. Completion or exit criteria are used to determine when testing (at any test stage) is complete. These criteria may be defined in terms of cost, time, faults found or coverage criteria. Coverage criteria are defined in terms of items that are exercised by test suites, such as branches, user requirements, most frequently used transactions, etc.

The Psychology of Testing

Testing to find faults; tester-developer relationship; independence

Testing is performed with the primary intent of finding faults in the software, rather than of proving correctness. Testing can therefore be perceived as a destructive process. The mindset required to be a tester is different to that of a developer. There are right and wrong ways of presenting faults to authors or management (give examples). It is important to communicate between developer and tester: e.g., changes to the application or menu structures that might affect the tests; or where the developer thinks the code might be buggy; or where there might be difficulty in reproducing reported bugs. Generally it is believed that objective, independent testing is more effective. If author tests then assumptions made are carried into testing, people see what they want to see, there can be emotional attachment, and there may be a vested interest in not finding faults.


Levels of independence, such as:
a) test cases are designed by the person(s) who writes the software under test;
b) test cases are designed by another person(s);
c) test cases are designed by a person(s) from a different section;
d) test cases are designed by a person(s) from a different organisation;
e) test cases are not chosen by a person.

Re-Testing and Regression Testing

fault-fixing and re-testing; test repeatability; regression testing and automation; selecting regression test cases

Whenever a fault is detected and fixed then the software should be re-tested to ensure that the original fault has been successfully removed. You should also consider testing for similar and related faults. Tests should be repeatable, to allow re-testing / regression testing. Regression testing attempts to verify that modifications have not caused unintended adverse side effects in the unchanged software (regression faults) and that the modified system still meets its requirements. It is performed whenever the software, or its environment, is changed. Regression test suites are run many times and generally evolve slowly, so regression testing is ideal for automation. If automation is not possible or the regression test suite is very large then it may be necessary to prune the test suite. You may drop repetitive tests, reduce the number of tests on fixed faults, combine test cases, designate some tests for periodic testing, etc. A subset of the regression test suite may also be used to verify emergency fixes.
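As an illustration of a repeatable, automatable re-test and regression check, the sketch below uses Python's unittest module; the function under test and its fault history are invented for the example.

    import unittest

    def apply_discount(total):
        """Orders of 100 or more receive a 10% discount."""
        return total * 0.9 if total >= 100 else total

    class RegressionTests(unittest.TestCase):
        def test_discount_applied_at_boundary(self):
            # re-test for the original (hypothetical) fault: the boundary value
            # must now receive the discount
            self.assertAlmostEqual(apply_discount(100), 90.0)

        def test_no_discount_below_boundary(self):
            # regression check: unchanged behaviour must not have been broken
            self.assertEqual(apply_discount(99), 99)

    if __name__ == "__main__":
        unittest.main()

Because the expected outcomes are coded into the tests, the suite is repeatable and can be re-run unattended whenever the software or its environment changes.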

Expected Results

identifying required behaviour

Expected results are synonymous with expected outcomes, but not the same as outputs. If expected results have not been defined then a plausible, but erroneous, result may be interpreted as the correct one. Expected results must therefore be defined prior to test execution. The oracle assumption is that a tester can routinely identify the correct outcome of a test. An oracle may be (e.g.) the existing system (for a benchmark), or a specification, or an individual's specialised knowledge, but not the code.

Prioritisation of Tests

test scope and limited resources; most important tests first; criteria for prioritisation

There is never enough time to do all the testing you want, so you must prioritise. Prioritise tests so that whenever you stop testing you have done the best testing in the time available. Identify the ranking criteria used to prioritise, such as severity, probability, visibility of failure, the priorities of the requirements to be tested, what the customer wants, change-proneness, error-proneness, business criticality, technical criticality and complexity.
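One simple way to apply such ranking criteria is to give each a weight and a score, as in the hypothetical sketch below; the criteria chosen, the weights and the test data are all assumptions for illustration.

    # Illustrative prioritisation: weighted sum of ranking criteria per test.
    tests = [
        {"name": "login",          "severity": 5, "probability": 4, "business_criticality": 5},
        {"name": "monthly report", "severity": 3, "probability": 2, "business_criticality": 2},
        {"name": "audit trail",    "severity": 4, "probability": 1, "business_criticality": 4},
    ]
    weights = {"severity": 3, "probability": 2, "business_criticality": 3}

    def priority(test):
        return sum(weights[criterion] * test[criterion] for criterion in weights)

    # Run the highest-scoring tests first.
    for test in sorted(tests, key=priority, reverse=True):
        print(f"{priority(test):3d}  {test['name']}")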

Testing throughout the lifecycle


Models for Testing

V, V and T; V-model
Definitions of verification, validation and testing as per BS7925-1.


The V-model of testing, showing that it identifies baselines (both testing and development deliverables) which should be tested at each stage of development (i.e., testing throughout the life cycle).

Economics of Testing

early test design; how preparing tests finds defects in specifications; cost of faults versus the cost of testing

The cost of faults escalates as we move the product towards field use. If a fault is detected late in development, though still before field use, the cost of rework to correct it rises dramatically because more than one previous stage of design, coding and testing may have to be repeated. If the fault occurs during field use, the potential cost of the fault might be catastrophic. If faults present in documentation go undetected, then development based on that documentation might generate many related faults which multiply the effect of the original one. Early test design can prevent fault multiplication. Analysis of specifications during test preparation often brings faults in specifications to light. The cost of testing is generally lower than the cost associated with major faults (such as a poor quality product and/or fixing faults), although few organisations have figures to confirm this.

High Level Test Planning

Scoping the test; risk analysis; test stages; entry and exit criteria; test environment requirements; sources of test data; documentation requirements

What to consider, based on the IEEE 829-1998 Test Plan Outline:
1. Test plan identifier;
2. Introduction;
3. Test items;
4. Features to be tested;
5. Features not to be tested;
6. Approach;
7. Item pass/fail criteria;
8. Suspension criteria and resumption requirements;
9. Test deliverables;
10. Testing tasks;
11. Environmental needs;
12. Responsibilities;
13. Staffing and training needs;
14. Schedule;
15. Risks and contingencies;
16. Approvals.

Acceptance Testing

User acceptance testing, contract acceptance testing, alpha & beta testing

Acceptance testing may be the only form of testing conducted by and visible to a customer when applied to a software package.
User acceptance testing - the final stage of validation. The customer should perform, or be closely involved in, this testing. Customers may choose to run any tests they wish, normally based on their usual business processes. A common approach is to set up a model office where systems are tested in an environment as close to field use as is achievable.
Contract acceptance testing - a demonstration that the acceptance criteria, as defined in the contract, have been met.


Alpha & beta testing - in alpha and beta tests, when the software seems stable, people who represent your market use the product in the same way(s) they would if they had bought the finished version, and give you their comments. Alpha tests are performed at the developer's site, while beta tests are performed at the testers' own sites (i.e. at customer sites).

Integration Testing in the Large

testing the integration of systems and packages; testing interfaces to external organisations (e.g. Electronic Data Interchange, Internet)

Integration with other (complete) systems. Identification of, and risk associated with, interfaces to these other systems. Incremental/non-incremental approaches to integration.

Non-Functional System Testing

non-functional requirements; non-functional test types: load, performance and stress; security; usability; storage; volume; installability; documentation; recovery

Explain that non-functional requirements are as important as functional requirements. Very briefly cover each of the listed techniques. (Load, performance and stress as defined on the Systeme Evolutif web pages; security, usability, storage, volume, installability, documentation and recovery as defined in Myers' book.)
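As a very rough illustration only (real load, performance and stress testing normally relies on dedicated tools), a crude response-time check against a hypothetical requirement might look like this.

# Illustrative sketch: a crude performance check against a response-time requirement.
import time

def operation_under_test():
    # Hypothetical operation standing in for a system transaction.
    return sum(i * i for i in range(100_000))

def test_response_time(threshold_seconds=0.5, repetitions=10):
    start = time.perf_counter()
    for _ in range(repetitions):
        operation_under_test()
    elapsed = (time.perf_counter() - start) / repetitions
    # Pass if the average response time meets the (hypothetical) requirement.
    return elapsed <= threshold_seconds

if __name__ == "__main__":
    print("within threshold:", test_response_time())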

Functional System Testing

functional requirements; requirements-based testing; business processbased testing

Functional requirement as per the IEEE definition: "A requirement that specifies a function that a system or system component must perform." Requirements-based testing - the user requirements specification and the system requirements specification (as used for contracts) may be used to derive test cases. Business process-based testing - based on expected user profiles (e.g. scenarios, use cases, etc.).

Integration Testing in the Small


assembling components into sub-systems; sub-systems to systems; stubs and drivers; big-bang, top-down, bottom-up, other strategies

Integration testing tests the interfaces and interactions between modules/subsystems. Stubs and drivers take the place of components that are not yet available: a stub stands in for a called component, while a driver calls and exercises the component under test.

Incremental strategies include top-down, bottom-up and functional incrementation. The non-incremental approach is big bang. A sketch of a simple stub and driver follows.
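A minimal sketch, assuming hypothetical component names: the driver exercises the component under test, while the stub stands in for a lower-level component that is not yet available.

# Illustrative sketch of integration testing with a stub and a driver.

def tax_service_stub(amount):
    # Stub: replaces the real (not yet available) tax component with a canned answer.
    return 0.10 * amount

def invoice_total(amount, tax_service):
    # Component under test: depends on a lower-level tax component.
    return amount + tax_service(amount)

def driver():
    # Driver: calls the component under test and checks the result.
    result = invoice_total(100.0, tax_service_stub)
    assert result == 110.0
    print("integration check with stub passed:", result)

if __name__ == "__main__":
    driver()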

Component Testing

(also known as Unit, Module, Program Testing); overview of BS 7925-2 Software component testing; component test process
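As an illustration only (the component under test and its test cases are hypothetical), a minimal component test might specify expected outcomes in advance, execute them and record the results, loosely following a component test process of planning, specification, execution, recording and checking for completion.

# Illustrative component (unit) test sketch for a hypothetical function.

def leap_year(year):
    # Component under test.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

# Test specification: inputs and expected outcomes defined before execution.
TEST_CASES = [(1996, True), (1900, False), (2000, True), (2003, False)]

def execute_and_record():
    # Execution and recording: each record holds input, expected, actual and verdict.
    results = []
    for year, expected in TEST_CASES:
        actual = leap_year(year)
        results.append((year, expected, actual, "pass" if actual == expected else "fail"))
    return results

if __name__ == "__main__":
    for record in execute_and_record():
        print(record)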


Maintenance Testing

problems of maintenance; testing changes; risks of changes and regression testing

Testing old code with poor or missing specifications. Scope of testing with respect to changed code. Impact analysis is difficult, so changes carry a higher risk and it is difficult to decide how much regression testing to do.

Dynamic Testing Techniques


Black and White Box Testing

functional or black-box testing; structural, white or glass box testing; trend from white box to black box through the lifecycle; techniques and tools

Explain terminology: black box/functional and white box/structural/glass box. Describe difference between black and white box techniques. Explain that black box is relevant throughout the life cycle whereas, in general, additional white box is appropriate for sub-system testing (unit, link) but becomes progressively less useful towards system and acceptance testing. System and acceptance testers will tend to focus more on specifications and requirements than on code. Emphasise the use of systematic techniques (and corresponding measures) to provide confidence. Explain that tools increase productivity and quality and are particularly useful for white box testing.

Black box test techniques

Black box techniques as defined in the BCS standard.

List all black box techniques in BS 7925-2. Provide description and example of equivalence partitioning and boundary value analysis, plus one other.
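For example (illustrative only, with a hypothetical requirement that a valid age is an integer from 18 to 65 inclusive): equivalence partitioning gives three partitions (below 18, 18 to 65, above 65), each represented by one test value, while boundary value analysis tests the values on and either side of each boundary.

# Illustrative sketch: equivalence partitioning and boundary value analysis
# for the hypothetical rule "age must be between 18 and 65 inclusive".

def accept_age(age):
    # Code under test.
    return 18 <= age <= 65

# Equivalence partitioning: one representative value per partition.
EP_CASES = [(10, False), (40, True), (70, False)]

# Boundary value analysis: values on and either side of each boundary.
BVA_CASES = [(17, False), (18, True), (65, True), (66, False)]

def run(cases):
    return all(accept_age(age) == expected for age, expected in cases)

if __name__ == "__main__":
    print("equivalence partition cases pass:", run(EP_CASES))
    print("boundary value cases pass:", run(BVA_CASES))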

White box test techniques

White box techniques as defined in the BCS standard.

List all white box techniques in BS 7925-2. Provide description and example of statement and of branch/decision testing, as per BS 7925-2.
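A small illustration with hypothetical code: a single test that makes the decision true executes every statement (100% statement coverage), but a second test that makes it false is needed before both outcomes of the decision, and hence 100% branch/decision coverage, are achieved.

# Illustrative sketch: statement coverage versus branch/decision coverage.

def apply_discount(total):
    if total > 100:            # decision
        total = total * 0.9    # statement on the true branch
    return total

# Test 1: total > 100 executes every statement -> 100% statement coverage.
assert apply_discount(200) == 180.0

# Test 2 is still needed for 100% branch/decision coverage,
# because the false outcome of the decision has not yet been exercised.
assert apply_discount(50) == 50

print("statement and branch tests passed")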

Error-Guessing

using experience to postulate errors; using error-guessing to complement test design techniques

Error-guessing can detect some faults that systematic techniques miss. Test cases are derived from experience of where errors have occurred in the past, or from the tester's insight into where errors are likely to occur in the future. Error-guessing should be used as a 'mopping-up' approach or as a supplement to systematic techniques, not as the first-choice technique.
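An illustrative sketch only: 'dirty' inputs guessed from experience (empty, blank, zero, negative, oversized, non-numeric) are thrown at a hypothetical input-parsing routine.

# Illustrative sketch: error-guessing test cases based on experience of likely faults.

def parse_quantity(text):
    # Hypothetical code under test: convert user input to a non-negative integer.
    value = int(text.strip())
    if value < 0:
        raise ValueError("quantity cannot be negative")
    return value

# Guessed inputs: empty string, blanks, zero, negative, huge value, non-numeric, decimal.
GUESSES = ["", "   ", "0", "-1", "999999999999", "ten", "3.5"]

if __name__ == "__main__":
    for guess in GUESSES:
        try:
            print(repr(guess), "->", parse_quantity(guess))
        except ValueError as error:
            print(repr(guess), "-> rejected:", error)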


Static Testing
Reviews and the Test Process

Why, when and what to review?; costs and benefits of reviews

Why reviews are known to be cost effective. Any document can be reviewed: for instance, requirement specifications, design specifications, code, test plans, user guides, etc. Ideally, review as soon as possible. Costs - on-going review costs are approximately 15% of the development budget; the cost of reviews includes activities such as the review process itself, metrics analysis and process improvement. Benefits - include development productivity improvements, reduced development timescales, testing cost and time reductions, lifetime cost reductions, reduced fault levels, etc.

Types of Review

types of review; goals, activities performed, roles and responsibilities, deliverables, pitfalls

Explain similarities/differences between walkthroughs, inspections, informal reviews and technical reviews, where each can be identified by the following attributes:
Walkthroughs - scenarios, dry runs, peer group, led by the author.
Inspections - led by a trained moderator (not the author), defined roles, includes metrics, formal process based on rules and checklists with entry and exit criteria.
Informal reviews - undocumented, but useful, cheap and widely used.
Technical reviews (also known as peer reviews) - documented, defined fault-detection process, includes peers and technical experts, no management participation.
Goals - validation and verification against specifications and standards (and process improvement); achieve consensus.
Activities - planning, overview meeting, preparation, review meeting and follow-up (or similar).
Roles and responsibilities - moderators, authors, reviewers/inspectors and managers (planning activities).
Deliverables - product changes, source document changes and improvements (to both review and development).
Pitfalls - lack of training, lack of documentation, lack of management support (and failure to improve the process).

Static Analysis

simple static analysis; compiler-generated information; data-flow analysis; control-flow graphing; complexity analysis

Explain that static analysis involves no dynamic execution and can detect possible faults such as unreachable code, undeclared variables, parameter type mismatches, uncalled functions and procedures, possible array bound violations, etc. Explain that any faults found by compilers are found by static analysis. Compilers find faults in the syntax. Many compilers also provide information on variable use, which is useful during maintenance.


Explain that data flow analysis considers the use of data on paths through the code, looking for possible anomalies such as a variable being defined and then re-defined with no intervening use, or a variable being used after it has been killed. Explain the use of, and provide an example of the production of, a control flow graph for a program. Introduce complexity metrics, including cyclomatic complexity.
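As a small illustration (hypothetical code): the function below contains two decisions, so its cyclomatic complexity is 2 + 1 = 3, i.e. there are three independent paths through its control flow graph; the unused assignment is the kind of data flow anomaly static analysis can flag.

# Illustrative sketch: control flow and cyclomatic complexity of a small function.
# Decisions: the "if" and the "elif" -> cyclomatic complexity = 2 + 1 = 3
# (equivalently, three independent paths through the control flow graph).

def grade(score):
    label = "fail"          # definition of "label"
    unused = score * 2      # data flow anomaly: defined but never used
    if score >= 70:         # decision 1
        label = "pass with distinction"
    elif score >= 40:       # decision 2
        label = "pass"
    return label            # use of "label"

if __name__ == "__main__":
    # Three tests, one per path through the control flow graph.
    print(grade(80), grade(50), grade(20))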

Test Management
Organisation

organisational structures for testing; team composition

Explain that organisations may have different testing structures: testing may be the developer's responsibility, or the team's responsibility (buddy testing), or one person on the team is the tester, or there is a dedicated test team (who do no development), or there are internal test consultants providing advice to projects, or a separate organisation does the testing. A multi-disciplinary team with specialist skills is usually needed. Most of the following roles are required: test analysts to prepare strategies and plans, test automation experts, a database administrator or designer, user interface experts, test environment management, etc.

Configuration Management

typical symptoms of poor CM; configuration identification; configuration control; status accounting; configuration auditing.

Describe typical symptoms of poor CM, such as: being unable to match source and object code, unable to identify which version of a compiler generated the object code, unable to identify the source code changes made in a particular version of the software, simultaneous changes made to the same source code by multiple developers (with changes lost), etc. Configuration identification requires that all configuration items (CIs) and their versions in the test system are known. Configuration control is the maintenance of CIs in a library and of records on how CIs change over time. Status accounting is the function of recording and tracking problem reports, change requests, etc. Explain that configuration auditing is the function of checking the contents of libraries, for instance for standards compliance. CM can be very complicated in environments where mixed hardware and software platforms are being used, but sophisticated cross-platform CM tools are increasingly available.

Test Estimation, Monitoring and Control

test estimation; test monitoring; test control.

Test estimation - explain that the effort required to perform the activities specified in the high-level test plan must be calculated in advance, and that rework must be planned for. Test monitoring - describe useful measures for tracking progress (e.g. number of tests run, tests passed/failed, incidents raised and fixed, re-tests, etc.). Explain that the test manager may have to report on deviations from the project/test plans, such as running out of time before the completion criteria are achieved. Test control - explain that the re-allocation of resources may be necessary, such as changes to the test schedule, test environments, number of testers, etc.
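A purely illustrative sketch of the kind of progress figures a test manager might track and report; all numbers are hypothetical.

# Illustrative sketch: simple test monitoring figures (hypothetical data).

progress = {
    "tests_planned": 200,
    "tests_run": 140,
    "tests_passed": 120,
    "incidents_raised": 35,
    "incidents_fixed": 28,
}

def report(p):
    run_pct = 100.0 * p["tests_run"] / p["tests_planned"]
    pass_pct = 100.0 * p["tests_passed"] / p["tests_run"]
    open_incidents = p["incidents_raised"] - p["incidents_fixed"]
    return (f"{run_pct:.0f}% of planned tests run, "
            f"{pass_pct:.0f}% of executed tests passed, "
            f"{open_incidents} incidents still open")

if __name__ == "__main__":
    print(report(progress))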


Incident Management

what is an 'incident'; incidents and the test process; incident logging; tracking and analysis.

An incident is any significant, unplanned event that occurs during testing and requires subsequent investigation and/or correction. Incidents are raised when expected and actual test results differ. Incidents may be raised against documentation as well as against code or a system under test. Incidents may be analysed to monitor the test process and to aid test process improvement. Incidents should be logged when someone other than the author of the product under test performs the testing. Typically the information logged for an incident will include the expected and actual results, the test environment, the identity of the software under test, the name of the tester(s), severity, scope, priority and any other information deemed relevant to reproducing and fixing the potential fault. Incidents should be tracked from inception through various stages to eventual resolution and close-out.
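The logged information could be captured in a simple record structure such as the following sketch; the field names and life-cycle stages are illustrative, not a prescribed format.

# Illustrative sketch: a minimal incident record with typical fields.
from dataclasses import dataclass, field

@dataclass
class Incident:
    identifier: str
    software_under_test: str      # id/version of the item under test
    tester: str
    expected_result: str
    actual_result: str
    environment: str
    severity: str                 # e.g. critical / major / minor
    priority: str                 # urgency of investigation and fixing
    status: str = "open"          # tracked from inception to close-out
    history: list = field(default_factory=list)

    def move_to(self, new_status):
        # Track the incident through its life-cycle stages.
        self.history.append((self.status, new_status))
        self.status = new_status

if __name__ == "__main__":
    inc = Incident("INC-001", "billing v2.3", "A. Tester",
                   "total = 110.00", "total = 100.00",
                   "UAT environment", "major", "high")
    inc.move_to("assigned")
    inc.move_to("fixed")
    inc.move_to("closed")
    print(inc.identifier, inc.status, inc.history)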

Standards for Testing

QA standards; industry-specific standards; testing standards


Explain that QA standards simply specify that testing should be performed, while industry-specific standards specify what level of testing to perform, and testing standards specify how to perform testing. Ideally, testing standards should be referenced from the other two. Examples are ISO 9000, the Railway Signalling standard, BS 7925-1 and BS 7925-2.

Tool Support for Testing (CAST)


Types of CAST Tool

requirements testing; static analysis; test design; data preparation; character-based test running; GUI test running; test harnesses, drivers and simulators; performance testing; dynamic analysis; debugging; comparison; test management; coverage measurement

Requirements testing tools provide automated support for the verification and validation of requirements models, such as consistency checking and animation.

Static analysis tools provide information about the quality of the software by examining the code, rather than by running test cases through the code. Static analysis tools usually give objective measurements of various characteristics of the software, such as the cyclomatic complexity measure and other quality metrics.

Test design tools generate test cases from a specification that must normally be held in a CASE tool repository, or from formally specified requirements held in the tool itself. Some tools generate test cases from an analysis of the code.

Test data preparation tools enable data to be selected from existing databases or created, generated, manipulated and edited for use in tests. The most sophisticated tools can deal with a range of file and database formats.

Character-based test running tools provide test capture and replay facilities for dumb-terminal based applications. The tools simulate user-entered terminal keystrokes and capture screen responses for later comparison. Test procedures are normally captured in a programmable script language; data, test cases and expected results may be held in separate test repositories. These tools are most often used to automate regression testing.

GUI test running tools provide test capture and replay facilities for WIMP interface based applications. The tools simulate mouse movement, button clicks and keyboard inputs and can recognise GUI objects such as windows, fields, buttons and other controls. Object states and bitmap images can be captured for later comparison. Test procedures are normally captured in a programmable script language; data, test cases and expected results may be held in separate test repositories. These tools are most often used to automate regression testing.

Test harnesses and drivers are used to execute software under test which may not have a user interface, or to run groups of existing automated test scripts which can be controlled by the tester. Some commercially available tools exist, but custom-written programs also fall into this category. Simulators are used to support tests where code or other systems are either unavailable or impracticable to use (e.g. testing software to cope with nuclear meltdowns).

Performance test tools have two main facilities: load generation and test transaction measurement. Load generation is done either by driving the application using its user interface or by test drivers, which simulate the load generated by the application on the architecture. Records of the numbers of transactions executed are logged. When driving the application through its user interface, response time measurements are taken for selected transactions and these are logged. Performance testing tools normally provide reports based on test logs, and graphs of load against response times.

Dynamic analysis tools provide run-time information on the state of executing software. These tools are most commonly used to monitor the allocation, use and de-allocation of memory, and to flag memory leaks, unassigned pointers, pointer arithmetic and other errors that are difficult to find 'statically'.

Debugging tools are used mainly by programmers to reproduce bugs and investigate the state of programs. Debuggers enable programmers to execute programs line by line, to halt the program at any program statement and to set and examine program variables.

Comparison tools are used to detect differences between actual results and expected results. Standalone comparison tools normally deal with a range of file or database formats. Test running tools usually have built-in comparators that deal with character screens, GUI objects or bitmap images. These tools often have filtering or masking capabilities, whereby they can 'ignore' rows or columns of data or areas on screens.

Test management tools may have several capabilities. Testware management is concerned with the creation, management and control of test documentation, e.g. test plans, specifications and results. Some tools support the project management aspects of testing, for example the scheduling of tests, the logging of results and the management of incidents raised during testing. Incident management tools may also have workflow-oriented facilities to track and control the allocation, correction and re-testing of incidents. Most test management tools provide extensive reporting and analysis facilities.

Coverage measurement (or analysis) tools provide objective measures of structural test coverage when tests are executed. Programs to be tested are instrumented before compilation. Instrumentation code dynamically captures the coverage data in a log file without affecting the functionality of the program under test. After execution, the log file is analysed and coverage statistics generated. Most tools provide statistics on the most common coverage measures, such as statement or branch coverage.

Tool Selection and Implementation

which test activities can be automated?; CAST tool requirements; which tool types to use?; test process maturity and 'CAST readiness'; selection process; tools, platforms and CAST integration; pilot projects and rollout

There are many test activities that can be automated, and test execution tools are not necessarily the first or only choice. Identify the test activities where tool support could be of benefit and prioritise the areas of most importance. The fit with your test process may be more important than choosing the tool with the most features in deciding whether you need a tool, and which one you choose. The benefits of tools usually depend on a systematic and disciplined test process; if testing is chaotic, the tools may not be useful and may hinder testing. You must have a good process now, or recognise that your process must improve in parallel with tool implementation. The ease with which CAST tools can be implemented might be called 'CAST readiness'.


Tools may have interesting features but may not be available on your platforms, e.g. a tool works on 15 flavours of Unix, but not yours. Some tools, e.g. performance testing tools, require their own hardware, so the cost of procuring this hardware should be a consideration in your cost-benefit analysis. If you already have tools, you may need to consider the level and usefulness of integration with other tools, e.g. you may want a test execution tool to integrate with your existing test management tool (or vice versa). Some vendors offer integrated toolkits, e.g. test execution, test management and performance-testing bundles. The integration between some tools brings major benefits; in other cases, the level of integration is cosmetic only. Once automation requirements are agreed, the selection process has four stages:
1. Create a candidate tool shortlist;
2. Arrange demonstrations;
3. Evaluate the selected tool(s);
4. Review and select the tool.

Before making a commitment to implementing the tool across all projects, a pilot project is usually undertaken to ensure the benefits of using the tool can actually be achieved. The objectives of the pilot are to gain experience in using the tool, to identify the changes to the test process that are required, and to assess the actual costs and benefits of implementation. Roll-out of the tool should be based on a successful evaluation of the pilot. Roll-out normally requires strong commitment from tool users and from new projects, as there is an initial overhead in using any tool on a new project. [End of document]
