The document discusses the reliability of non-parametric statistical tests for detecting linear drift in experimental data. It presents the results of a simulation study comparing the Wald-Wolfowitz run test and Mann-Whitney reverse arrangement test. The simulation program generated Gaussian data with a linear drift pattern and variable slope and variance, to test the sensitivity of the methods to drift under different noise conditions. Graphs of the tests' success in detecting drift were produced as a function of drift relevance and noise amplitude. These allow experimentalists to preliminarily determine the effectiveness of each method for their conditions and elucidate causes of contradictory results.
Reliability Analysis of Non-Parametric Statistical Tests for the Evaluation of Linear Drift in Experimental Data

P. Cappa, S. A. Sciuto and S. Silvestri

Department of Mechanics and Aeronautics - University of Rome "La Sapienza" - Via Eudossiana 18, 00184 - Rome, Italy.
Clinical Engineering Service - Children's Hospital "Bambino Gesù" of Rome - Piazza S. Onofrio 4, 00165 - Rome, Italy.
Department of Mechanical and Industrial Engineering - University of Roma Tre - Via della Vasca Navale 79, 00146 - Rome, Italy.

Abstract: During usual data gathering, the efficiency of statistical analysis strongly depends on the noise level superimposed on the signal. It has been found that some well-known statistical tests, commonly used in data acquisition to detect the presence of drift, can fail under some conditions. Thus, a statistical procedure for the predictive estimation of the reliability of the statistical method used could be useful in the design of experimental analysis. This paper reports the results of a simulation study carried out to evaluate the drift-detection performance of non-parametric tests such as the Wald-Wolfowitz run test, in comparison with the Mann-Whitney reverse arrangement test. To assess the sensitivity of the tests to a monotonic drift, a simulation program was developed. In the program, a Gaussian raw data sequence with a linear pattern of variable slope and with variable variance was simulated and given as input to the tests. The capability to detect the presence of drift as a function of the angular coefficient and of the variance of the noise superimposed on the signal was verified. The obtained data were synthesised in graphs so that the experimentalist could determine preliminarily the effectiveness of each of the considered statistical methods, in terms of percentage of success in detecting the presence of drift phenomena, as a function of drift relevance and noise amplitude.
Finally, the graphs permitted the elucidation of the causes of the contradictory results observed in long-term experimental analysis.

Key Words: Drift, zero-shift, non-parametric tests, statistical reliability, pulmonary ventilators

Notation:
$A$: value of reverse arrangements in the Mann-Whitney reverse arrangement test obtained examining the whole data set;
$A_i$: generic reverse arrangement value in the Mann-Whitney reverse arrangement test;
$h_{ij}$: reverse arrangement obtained examining $x_i$ and $x_j$;
$N_1$: number of positive runs in the Wald-Wolfowitz run test;
$N_2$: number of negative runs in the Wald-Wolfowitz run test;
$N$: total number of runs;
$r$: random variable of run distribution;
RAT: reverse arrangement test;
RT: run test;
$x$: random variable;
$x_m$: mean value of a random variable;
$\mu_A$: mean of arrangements in the Mann-Whitney reverse arrangement test;
$\mu_r$: mean of runs in the Wald-Wolfowitz run test;
$\sigma_A^2$: variance of arrangements in the Mann-Whitney reverse arrangement test;
$\sigma_r^2$: variance of runs in the Wald-Wolfowitz run test;
$\sigma_y^2$: variance associated with $y$, where $y = a + bx$.

Strain 2001 Vol. 37 No. 2

Introduction

Pulmonary ventilators, as is well known, are commonly used even for long periods of time [1, 2] and, unfortunately, ventilatory parameter drift is a common problem with which clinical technicians often have to deal. Drift of these parameters could obviously be very harmful to the patient's health [3-5]. Despite their importance, drift tests are not currently prescribed by ventilator manufacturers. Furthermore, the "Standard specification for ventilators intended for use in critical care" [6] (the only available reference) describes experimental methods for conducting endurance tests that seem too rigorous to be practised in common maintenance procedures, and it offers no standard procedure for statistical data analysis.
For this reason, in a previous phase of research, a PC-based automatic procedure with a user-friendly interface for ventilator drift tests [7] was designed. It is in use at the Clinical Engineering Service+ (CES) of the Children's Hospital "Bambino Gesù"++ and has helped technicians with maintenance procedures for the 68 pulmonary ventilators currently installed (about 1.5 M$ total value). The proposed methodology for verifying possible ventilator drift could also be extended to any field where a zero-shift analysis is required.

During the verification phase of the proposed method, some ventilation parameters, such as airway pressure, tidal volume and respiratory flow, were continuously acquired over a 20-day period, and the collected data were post-processed by means of statistical tools [8-17] in order to evaluate the possible drift of the examined ventilators. In particular, non-parametric tests, i.e. the Wald-Wolfowitz run test (RT) and the Mann-Whitney reverse arrangement test (RAT), were used to process the data in order to highlight the presence of a systematic trend in the observed results as a function of time. The results were more than satisfactory, even though in some cases different tests provided different responses for the same raw data and the same confidence coefficient (α level). As it is useful to know the reliability that can be expected from such methods before applying them, an investigation of the causes of non-parametric test failure was of interest.
It was observed that discordance mostly appeared when the noise in the acquired data was particularly high. Since it is well known that the efficiency of statistical analysis strongly depends not only on the noise level superimposed on the signal but also on the accuracy of the measurement set-up, we decided to implement a procedure for the predictive estimation of the reliability of the non-parametric tests used, as a function of noise level and linear drift slope. To achieve this aim and to identify the reasons for the different responses, a simulation study was carried out by applying the statistical methods to a linear drift of variable slope with Gaussian white noise of variable variance superimposed on the data set.

Non-Parametric Test Description

Statistical procedures that do not assume a specific distribution function for the original random variable of interest are called distribution-free or non-parametric procedures. One of the best known distribution-free procedures used for data evaluation is the chi-square goodness-of-fit test, but RT and RAT are also widely used data processing techniques for drift detection. Every statistical test gives its response of acceptance or rejection of the starting hypothesis at a certain level of confidence or significance, also called the α level. A level of 95%, which means that there is a 5% probability of failure, is commonly accepted for experimental data processing. To better understand the considerations that follow, the Wald-Wolfowitz and Mann-Whitney tests are briefly described.

Wald-Wolfowitz run test

Let us consider a sequence of $N$ observed values of a random variable $x$ where each observation can be classified into one of two mutually exclusive categories, which may be identified simply by a plus (+) or a minus (-). For example, in the case of a sequence of measured values $x_i$, $i = 1, 2, 3, \ldots, N$ with a mean value $x_m$, we will count a (+) or a (-) for each $x_i \ge x_m$ and $x_i < x_m$, respectively. A run is defined as a sequence of identical observations, positive if referred to (+) observations or negative if vice versa, that is followed and preceded by a different observation, (-) or (+). The number of runs occurring in the whole sequence of observations gives an indication as to whether or not the data are independent observations of the same random variable. More specifically, if a sequence of $N$ observations of the same random variable are independent, the probability of a (+) or a (-) result does not change from one observation to the next and, as a consequence, the sampling distribution of the number of runs occurring in the sequence is a random variable $r$ with a mean value and a variance evaluated as follows:

$$\mu_r = \frac{2 N_1 N_2}{N} + 1 \qquad (1)$$

$$\sigma_r^2 = \frac{2 N_1 N_2 \,(2 N_1 N_2 - N)}{N^2 (N - 1)} \qquad (2)$$

where $N_1$ is the number of positive runs and $N_2$ the number of negative runs. Then, a normalised Gaussian curve is obtained by means of equations (1) and (2). If $N_1$ lies in the confidence interval defined by the α level, then the response is positive and the two categories have the same distribution, indicating an absence of drift. Limited tabulations of percentage points for the distribution function of runs are also available in the literature [16].

Mann-Whitney reverse arrangement test

Given a sequence of $N$ observed values of a random variable $x$ where the observations are denoted by $x_i$, $i = 1, 2, 3, \ldots, N$, each time that $x_i > x_j$ for $i < j$ it must be counted as a reverse arrangement, and the total number of reverse arrangements is denoted by $A$. A general definition for $A$ is as follows. For a set of observations $x_1, x_2, \ldots, x_N$ we can define

$$A = \sum_{i=1}^{N-1} A_i \qquad (3)$$

as the total number of reverse arrangements, where any element $A_i$ of the sum is defined by

$$A_i = \sum_{j=i+1}^{N} h_{ij} \qquad (4)$$

and

$$h_{ij} = \begin{cases} 1 & \text{if } x_i > x_j \\ 0 & \text{otherwise} \end{cases} \qquad (5)$$

Each case in which $x_i > x_j$, for any $i < j$, is termed a reverse arrangement. If the sequence of $N$ observations are independent observations of the same population, i.e. no drift is present, then the number of reverse arrangements is a random variable $A$ with a mean value and a variance as follows:

$$\mu_A = \frac{N (N - 1)}{4} \qquad (6)$$

$$\sigma_A^2 = \frac{N (2N + 5)(N - 1)}{72} \qquad (7)$$

Then a normalised Gaussian curve is developed with a mean value and variance calculated according to equations (6) and (7), respectively. If $A$ lies in the confidence interval defined by the α level, then the response is positive and drift is not present. Also in this case, limited tabulations of percentage points for the distribution function are available in the literature [16].

As already observed, both tests allow the estimation of the linear drift tendency of a data set with a certain level of confidence, but they do not take into account in any way the effect of the data variance, which could significantly affect the reliability of their response. However, in order to evaluate the "level of dispersion" of the data, a simple calculation of the variance of the whole data set was not found useful because, as is well known, the variance value can be strongly dependent on the slope of the underlying linear tendency. Therefore, it is necessary to identify a statistical index able to estimate the data dispersion due to the reduced accuracy of the measurement system, independently of the slope of the data trend. In order to obtain objective diagrams for the reliability evaluation of the two considered tests, the determination of $\sigma_y^2$ [8], i.e. the variance associated with $y$ where $y$ is linearly related to the input $x$, seemed to be an appropriate index to attain our aim, i.e. to separate the noise floor contribution from the drift tendency.

In order to check the independence of $\sigma_y^2$ from the slope of the data set, a simulation was carried out by randomly applying Gaussian noise with a variance $\sigma^2$ equal to a set value of 25 to data sets of different linear slope and calculating the global variance $\sigma^2$ and $\sigma_y^2$. Results are shown in Figure 1, where $\sigma^2$ and $\sigma_y^2$ are represented as a function of the angular coefficient expressed in percentage. A first-sight examination of this figure confirms that $\sigma_y^2$ is an effective indicator of data dispersion, sufficiently independent of the angular coefficient of the data trend, i.e. of monotonic instrumentation shift.

Figure 1: Comparison between $\sigma^2$ and $\sigma_y^2$ as a function of the linear slope [%] with Gaussian noise superimposed

Figure 2: Flow chart of software for the determination of test reliability graphs

+ The Clinical Engineering Service was established in 1980 and manages about 5000 electro-medical devices for a global value of about 40 million US$.
++ The Children's Hospital "Bambino Gesù" (a medical facility with about 730 beds) is a private, non-profit-making hospital located in the Vatican City, i.e. the independent Papal state within the city of Rome (Italy), and is officially recognised by the Italian Government as a "Research and Care Institute of a Scientific Nature".

Simulation and Graph Description

To determine the reliability of the two tests for drift detection as a function of the level of dispersion of the collected data, simulation software was designed in LabVIEW™. With reference to Figure 2, the program is composed of two main modules. The first part was developed for the generation of a linear function with variable angular coefficient, with the possibility of superimposing on it Gaussian white noise of variable variance.
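As an illustration of the first module and of the dispersion check summarised in Figure 1, the following is a minimal Python sketch, not the authors' LabVIEW program. It rests on several assumptions: the function names `linear_drift_data` and `residual_variance` are hypothetical, the slope is expressed per sample rather than in percent, and $\sigma_y^2$ is taken to be the variance of the residuals about the least-squares line with an $N - 2$ denominator, the usual definition in error-analysis textbooks.

```python
import random
import statistics


def linear_drift_data(n, slope, noise_var, rng):
    """Return n samples of slope * i plus Gaussian noise of variance noise_var."""
    sigma = noise_var ** 0.5
    return [slope * i + rng.gauss(0.0, sigma) for i in range(n)]


def residual_variance(y):
    """Variance of y about its least-squares line y = a + b*x (x = sample index)."""
    n = len(y)
    x_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    sxy = sum((i - x_mean) * (yi - y_mean) for i, yi in enumerate(y))
    b = sxy / sxx              # fitted slope
    a = y_mean - b * x_mean    # fitted intercept
    ss_res = sum((yi - (a + b * i)) ** 2 for i, yi in enumerate(y))
    return ss_res / (n - 2)    # two fitted parameters, hence n - 2


rng = random.Random(0)  # fixed seed, only so that repeated runs agree
for slope in (0.0, 0.02, 0.05):  # illustrative slopes per sample (assumed values)
    y = linear_drift_data(1000, slope, 25.0, rng)
    # Global variance grows with the slope; the residual variance should not.
    print(slope, round(statistics.pvariance(y), 1), round(residual_variance(y), 1))
```

With the noise variance fixed at 25, the second printed column (global variance) is expected to grow with the slope while the third stays close to 25, which is the behaviour the text attributes to Figure 1.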
The second module performs RT and RAT on the data provided by the first part of the software and, in addition, calculates $\sigma_y^2$ (Figure 2). Once the level of reliability (α level) is set to 95%, the software provides a graph in which the success or failure of the specific test is represented as a function of $\sigma_y^2$, which, as previously mentioned, is related to the system accuracy, and of the angular coefficient of the linear data trend, which is related to the drift tendency of the set-up used.

From an overall analysis of the diagrams provided by the simulation software for the above-mentioned tests in the case of monotonic drift, it is possible to observe that, as expected, an increase in noise amplitude leads to an increase in test unreliability. Furthermore, three main zones have been outlined (Figures 3 and 4): (a) a success zone, where the test gives a reliable response, (b) a failure zone, inside which the test is completely unreliable, and (c) an uncertainty zone, i.e. data sets with unstable results. In particular, the zone of unreliability grows in direct dependence on the data dispersion level. The identification of limit lines for test reliability allows one to determine the angular coefficient of the minimal noticeable drift once the value of $\sigma_y^2$ for the collected data set is calculated. Furthermore, the diagrams also show that, in the case of monotonic drift, RAT proves more efficient than RT. Consequently, once the experimentalist determines $\sigma_y^2$, the minimal angular coefficient, i.e. the instrumentation drift, detectable by each of the mentioned tests can be estimated.

Application to Experimental Data and Discussion

In order to validate the results provided by the examined drift test analysis, it was decided to apply it to the data sets previously acquired during the pulmonary ventilator parameter drift analysis [7].
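Before turning to the measured data, the two tests can be sketched in Python from equations (1)-(7). This is an illustrative reading, not the authors' software, and it makes three assumptions that should be checked against the original: $N_1$ and $N_2$ are taken as the counts of (+) and (-) observations (the usual textbook reading of equations (1) and (2)), the observed number of runs is the quantity compared with the confidence interval, and the 95% level is applied as a two-sided normal interval. The names `run_test_z`, `reverse_arrangement_z` and `drift_detected` are hypothetical.

```python
import math

Z_95 = 1.959964  # two-sided 95% point of the standard normal distribution


def run_test_z(x):
    """Normalised number of runs about the mean, using equations (1) and (2)."""
    mean = sum(x) / len(x)
    signs = [xi >= mean for xi in x]      # True is (+), False is (-)
    n1 = sum(signs)                       # assumed: count of (+) observations
    n2 = len(signs) - n1                  # assumed: count of (-) observations
    n = n1 + n2
    runs = 1 + sum(s != t for s, t in zip(signs, signs[1:]))
    mu_r = 2 * n1 * n2 / n + 1
    var_r = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    if var_r <= 0:
        raise ValueError("sequence too short or confined to one category")
    return (runs - mu_r) / math.sqrt(var_r)


def reverse_arrangement_z(x):
    """Normalised number of reverse arrangements, using equations (3)-(7)."""
    n = len(x)
    # A counts every pair i < j with x_i > x_j, i.e. the sum of all h_ij.
    a = sum(1 for i in range(n - 1) for j in range(i + 1, n) if x[i] > x[j])
    mu_a = n * (n - 1) / 4
    var_a = n * (2 * n + 5) * (n - 1) / 72
    return (a - mu_a) / math.sqrt(var_a)


def drift_detected(z, z_crit=Z_95):
    """True when the statistic falls outside the confidence interval."""
    return abs(z) > z_crit


ramp = list(range(20))                        # noise-free rising sequence
print(round(run_test_z(ramp), 2))             # -4.14: only 2 runs against 11 expected
print(round(reverse_arrangement_z(ramp), 2))  # -6.16: no reverse arrangements at all
print(drift_detected(run_test_z(ramp)), drift_detected(reverse_arrangement_z(ramp)))
```

For a noise-free ramp both statistics fall far outside the interval, so both tests flag the drift; the discordant cases discussed in this section arise only once noise is superimposed.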
The fundamental parameters measured (tidal volume, airway pressure, percentage of oxygen, etc.) were acquired over a twenty-day period by connecting a ventilator to a patient simulator, the physical characteristics of which were assumed constant over time. The mechanical characteristics of the patient simulator are guaranteed by the manufacturer to be stable over a period much longer than 20 days, which is the duration we chose for data acquisition.

Figure 3: Zones of reliability, unreliability and uncertainty for the run test as a function of $\sigma_y^2$, and minimal appreciable drift slope [%] with α level equal to 95%

Figure 4: Zones of reliability, unreliability and uncertainty for the reverse arrangement test as a function of $\sigma_y^2$, and minimal appreciable drift slope [%] with α level equal to 95%

The analysis of the obtained results revealed a generally noticeable drift in the airway pressure values. This phenomenon was evident for the peak pressure values (Figure 5), for which an increase of 18 mmH2O appeared over the duration of the test, i.e. the airway peak pressure increased with a slope of 0.037 mmH2O/hour, and both statistical tests detected that tendency. However, for other pressure values, such as end-inspiration (Figure 6) or mean pressures, despite an expected drift tendency, the applied statistical tests provided discordant responses. In this particular case, the presence of drift, for the same α level, emerged when using RAT, while RT excluded it. The calculation of $\sigma_y^2$ on the data set relative to the end-inspiration pressure gave a value of $\sigma_y^2$ = 8.51 mm²H2O. From the examination of Figures 3 and 4 it emerges that the minimal noticeable drift for $\sigma_y^2$ = 8.51 mm²H2O must have an angular coefficient higher than 0.13% for RAT and higher than 0.42% for RT.
As the parameter was acquired at a frequency of 4 samples per hour, the minimal noticeable drift turns out to be equal to 0.04 mmH2O/hour, a value that lies outside the zone of reliability for RT but inside the zone of reliability for RAT. Thus, only the RAT result has to be taken into consideration, because it is capable of identifying the system instability.

Figure 5: Example of raw data acquired from a pulmonary mechanical ventilator where drift is present and was identified by both the examined statistical tests

Figure 6: Example of raw data acquired from a pulmonary mechanical ventilator where drift was identified by only one of the two examined statistical tests

The magnitude of the ventilator parameter variation revealed by the examined case study does not seem relevant, at first sight, when compared with actual pathophysiological changes in human beings. Besides, pulmonary ventilators were taken as a case study for their supposed stability. In fact, as is well known, they are usually very expensive medical devices specifically devoted to high-risk use and, as expected, their functioning is remarkably stable. As a consequence, even though drift sometimes appears, it is generally not relevant in adult applications. Nevertheless, there are two main aspects that must be taken into account. First of all, the clinical relevance of a ventilator drift cannot be stated a priori. Furthermore, the present study was conducted with a view to the real application of the devices examined at our Children's Hospital. Small variations in the parameters can be hazardous because of the mechanical characteristics of neonatal lungs, which strongly depend on the age, sex and physiology of the patient. More specifically, in newborn infants an error of a few cmH2O in Peak Inspiratory Pressure (PIP) can cause barotrauma, with definitely dangerous consequences for the patient's health.

With reference to the constancy of ventilator settings, it must be considered that, whereas in an Intensive Care Unit (ICU) environment, as well as in an operating theatre, the patient's condition can change very suddenly, in long-term treatments, i.e. in the case of home life support or chronic diseases, settings can be left unchanged for months. Furthermore, this study was conducted with the main aim of isolating the drift component that can be attributed to the ventilator alone, as a single device, in order to evaluate its reliability of use. The results reported here may prove useful to CES technicians for the objective comparison of the performance of different ventilators or for checking the ageing process of the same ventilator during maintenance procedures.

Conclusions

To provide acceptance conditions and outline the reliability of some widely used statistical tests that can be applied to any kind of experimental data where a constant output is expected, tests were carried out by means of a patient simulator specifically designed by the manufacturer for ventilator calibration, so as to guarantee stable "patient" conditions during device parameter testing. Therefore, the observed amount of drift can be attributed to the examined ventilator. The reported analysis allows the determination of the confidence level associated with the run test and the reverse arrangement test when they are used to evaluate a monotonic drift in an experimental data set. Thus, the experimentalist can, in an a priori approach, evaluate the minimum noticeable drift when the overall accuracy associated with the measurement set-up is known. The method proposed here was validated with the experimental data and identified, in an objective manner, the discordance of the results obtained by means of the two tests when applied to the same data set.
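The a priori evaluation described in the conclusions can be approximated by a small Monte Carlo experiment. The sketch below is only an illustration under stated assumptions: it covers RAT alone, expresses slope per sample rather than in percent, uses a two-sided 95% normal interval, and takes the "minimal noticeable slope" as the smallest candidate slope detected in at least 95% of trials. That threshold, the sequence length, the trial count, the candidate slopes and the function names are all hypothetical choices, not values taken from the paper.

```python
import math
import random


def rat_detects(y, z_crit=1.959964):
    """Reverse arrangement test, equations (3)-(7); True when drift is flagged."""
    n = len(y)
    a = sum(1 for i in range(n - 1) for j in range(i + 1, n) if y[i] > y[j])
    mu_a = n * (n - 1) / 4
    sd_a = math.sqrt(n * (2 * n + 5) * (n - 1) / 72)
    return abs(a - mu_a) / sd_a > z_crit


def success_rate(slope, noise_var, n=100, trials=200, seed=0):
    """Fraction of simulated noisy ramps in which RAT flags the drift."""
    rng = random.Random(seed)
    sigma = math.sqrt(noise_var)
    hits = 0
    for _ in range(trials):
        y = [slope * i + rng.gauss(0.0, sigma) for i in range(n)]
        hits += rat_detects(y)
    return hits / trials


def minimal_noticeable_slope(noise_var, slopes, target=0.95, **kwargs):
    """Smallest candidate slope whose success rate reaches target, else None."""
    for s in sorted(slopes):
        if success_rate(s, noise_var, **kwargs) >= target:
            return s
    return None


# Noise variance 8.51 is the end-inspiration value quoted in the text;
# the candidate slopes (per sample) are arbitrary illustrative values.
print(minimal_noticeable_slope(8.51, [0.0, 0.01, 0.02, 0.05, 0.1], trials=100))
```

Repeating the scan over a grid of noise variances would give a rough analogue of the limit lines in Figures 3 and 4, although the numerical values depend on the assumed sequence length and slope units.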
References

1. Tobin, M.J., Jubran, A. and Hines, E. Jr. (1994) Pathophysiology of failure to wean from mechanical ventilation. Schweiz Med Wochenschr 124, 2139-2145.
2. Nava, S. et al. (1994) Survival and prediction of successful ventilator weaning in COPD patients requiring mechanical ventilation for more than 21 days. Eur Respir J 7, 1645-1652.
3. Johnson, B. et al. (1985) Pathophysiological considerations on special modes of ventilation in severe respiratory distress syndrome. Excerpta Medica Int. Congr., Rome (Italy), Gasparetto.
4. Calon, B., Clever, B. and Urli, D. (1989) An unusual failure of the 900C Siemens Servo Ventilator. An. Fr. Anesthetics, Strasbourg (France).
5. Hartopp, I.K. (1994) Incorrect settings on Manley ventilators [letter]. Anaesthesia 49, 916-917.
6. ASTM F 1100-90 (1990) Standard Specification for Ventilators Intended for Use in Critical Care. West Conshohocken, PA, USA.
7. Branca, F.P., Cappa, P., Sciuto, S.A. and Silvestri, S. (1997) A novel methodology for the experimental evaluation of pulmonary ventilator performance drift. Journal of Clinical Engineering 22, 163-170.
8. Taylor, J.R. (1982) An Introduction to Error Analysis. University Science Books, Mill Valley.
9. Draper, N.R. and Smith, H. (1981) Applied Regression Analysis. John Wiley & Sons, New York.
10. Brownlee, K.A. (1965) Statistical Theory and Methodology in Science and Engineering. John Wiley & Sons, New York.
11. Wald, A. and Wolfowitz, J. (1940) On a test whether two samples are from the same population. Ann of Math Statist 11, 147-162.
12. Conover, W.J. (1980) Practical Nonparametric Statistics, 2nd ed. John Wiley & Sons, New York.
13. Blalock, H.M. (1979) Social Statistics, 2nd ed. McGraw-Hill, New York.
14. Stewart, J.Q. and Warntz, W. (1958) Physics of Population Distribution. Journal of Regional Science 1, 90-123.
15. Lehmann, E.L. (1975) Nonparametrics: Statistical Methods Based on Ranks. Holden Day, San Francisco.
16. Wayne, D.W. (1978) Applied Nonparametric Statistics. Houghton Mifflin, Boston, MA, USA.
17. Hollander, M. and Wolfe, D.A. (1973) Nonparametric Statistical Methods. John Wiley & Sons, New York.