
Multivariate Statistical Process Control with Industrial Applications

ASA-SIAM Series on
Statistics and Applied Probability
The ASA-SIAM Series on Statistics and Applied Probability is published jointly by the American Statistical Association and the Society for Industrial and Applied Mathematics. The series consists of a broad spectrum of books on topics in statistics and applied probability. The purpose of the series is to provide inexpensive, quality publications of interest to the intersecting membership of the two societies.

Editorial Board
Robert N. Rodriguez, SAS Institute, Inc., Editor-in-Chief
Janet P. Buckingham, Southwest Research Institute
Richard K. Burdick, Arizona State University
James A. Calvin, Texas A&M University
Katherine Bennett Ensor, Rice University
Robert L. Mason, Southwest Research Institute
Gary C. McDonald, General Motors R&D Center
Jane F. Pendergast, University of Iowa
Alan M. Polansky, Northern Illinois University
Paula Roberson, University of Arkansas for Medical Sciences
Dale L. Zimmerman, University of Iowa

Mason, R. L. and Young, J. C., Multivariate Statistical Process Control with Industrial Applications
Smith, P. L., A Primer for Sampling Solids, Liquids, and Gases: Based on the Seven Sampling Errors of Pierre Gy
Meyer, M. A. and Booker, J. M., Eliciting and Analyzing Expert Judgment: A Practical Guide
Latouche, G. and Ramaswami, V., Introduction to Matrix Analytic Methods in Stochastic Modeling
Peck, R., Haugh, L., and Goodman, A., Statistical Case Studies: A Collaboration Between Academe and Industry, Student Edition
Peck, R., Haugh, L., and Goodman, A., Statistical Case Studies: A Collaboration Between Academe and Industry
Barlow, R., Engineering Reliability
Czitrom, V. and Spagon, P. D., Statistical Case Studies for Industrial Process Improvement

Multivariate Statistical

Process Control with

Industrial Applications
Robert L. Mason
Southwest Research Institute San Antonio, Texas

John C. Young
McNeese State University Lake Charles, Louisiana
InControl Technologies, Inc. Houston, Texas

SIAM
Society for Industrial and Applied Mathematics Philadelphia, Pennsylvania

ASA
American Statistical Association Alexandria, Virginia

Copyright 2002 by the American Statistical Association and the Society for Industrial and Applied Mathematics.

10 9 8 7 6 5 4 3 2 1

All rights reserved. Printed in the United States of America. No part of this book may be reproduced, stored, or transmitted in any manner without the written permission of the publisher. For information, write to the Society for Industrial and Applied Mathematics, 3600 University City Science Center, Philadelphia, PA 19104-2688.

Library of Congress Cataloging-in-Publication Data

Mason, Robert L., 1946-
Multivariate statistical process control with industrial applications / Robert L. Mason, John C. Young.
p. cm. - (ASA-SIAM series on statistics and applied probability)
Includes bibliographical references and index.
ISBN 0-89871-496-6
1. Process control-Statistical methods. I. Young, John C., 1942- II. Title. III. Series.
TS156.8 .M348 2001
658.5'62-dc21
2001034145

Windows and Windows NT are registered trademarks of Microsoft Corporation. QualStat is a trademark of InControl Technologies, Inc. The materials on the CD-ROM are for demonstration only and expire after 90 days of use. These materials are subject to the same copyright restrictions as hardcopy publications. No warranties, expressed or implied, are made by the publisher, authors, and their employers that the materials contained on the CD-ROM are free of error. You are responsible for reading, understanding, and adhering to the licensing terms and conditions for each software program contained on the CD-ROM. By using this CD-ROM, you agree not to hold any vendor or SIAM responsible, or liable, for any problems that arise from use of a vendor's software.

To Carmen and Pam


Contents
Preface

1 Introduction to the T2 Statistic
  1.1 Introduction
  1.2 Univariate Control Procedures
  1.3 Multivariate Control Procedures
  1.4 Characteristics of a Multivariate Control Procedure
  1.5 Summary

2 Basic Concepts about the T2 Statistic
  2.1 Introduction
  2.2 Statistical Distance
  2.3 T2 and Multivariate Normality
  2.4 Student t versus Hotelling's T2
  2.5 Distributional Properties of the T2
  2.6 Alternative Covariance Estimators
  2.7 Summary
  2.8 Appendix: Matrix Algebra Review
    2.8.1 Vector and Matrix Notation
    2.8.2 Data Matrix
    2.8.3 The Inverse Matrix
    2.8.4 Symmetric Matrix
    2.8.5 Quadratic Form
    2.8.6 Wishart Distribution

3 Checking Assumptions for Using a T2 Statistic
  3.1 Introduction
  3.2 Assessing the Distribution of the T2
  3.3 The T2 and Nonnormal Distributions
  3.4 The Sampling Distribution of the T2 Statistic
  3.5 Validation of the T2 Distribution
  3.6 Transforming Observations to Normality
  3.7 Distribution-Free Procedures
  3.8 Choice of Sample Size
  3.9 Discrete Variables
  3.10 Summary
  3.11 Appendix: Confidence Intervals for UCL

4 Construction of Historical Data Set
  4.1 Introduction
  4.2 Planning
  4.3 Preliminary Data
  4.4 Data Collection Procedures
  4.5 Missing Data
  4.6 Functional Form of Variables
  4.7 Detecting Collinearities
  4.8 Detecting Autocorrelation
  4.9 Example of Autocorrelation Detection Techniques
  4.10 Summary
  4.11 Appendix
    4.11.1 Eigenvalues and Eigenvectors
    4.11.2 Principal Component Analysis

5 Charting the T2 Statistic in Phase I
  5.1 Introduction
  5.2 The Outlier Problem
  5.3 Univariate Outlier Detection
  5.4 Multivariate Outlier Detection
  5.5 Purging Outliers: Unknown Parameter Case
    5.5.1 Temperature Example
    5.5.2 Transformer Example
  5.6 Purging Outliers: Known Parameter Case
  5.7 Unknown T2 Distribution
  5.8 Summary

6 Charting the T2 Statistic in Phase II
  6.1 Introduction
  6.2 Choice of False Alarm Rate
  6.3 T2 Charts with Unknown Parameters
  6.4 T2 Charts with Known Parameters
  6.5 T2 Charts with Subgroup Means
  6.6 Interpretive Features of T2 Charting
  6.7 Average Run Length (Optional)
  6.8 Plotting in Principal Component Space (Optional)
  6.9 Summary

7 Interpretation of T2 Signals for Two Variables
  7.1 Introduction
  7.2 Orthogonal Decompositions
  7.3 The MYT Decomposition
  7.4 Interpretation of a Signal on a T2 Component
  7.5 Regression Perspective
  7.6 Distribution of the T2 Components
  7.7 Data Example
  7.8 Conditional Probability Functions (Optional)
  7.9 Summary
  7.10 Appendix: Principal Component Form of T2

8 Interpretation of T2 Signals for the General Case
  8.1 Introduction
  8.2 The MYT Decomposition
  8.3 Computing the Decomposition Terms
  8.4 Properties of the MYT Decomposition
  8.5 Locating Signaling Variables
  8.6 Interpretation of a Signal on a T2 Component
  8.7 Regression Perspective
  8.8 Computational Scheme (Optional)
  8.9 Case Study
  8.10 Summary

9 Improving the Sensitivity of the T2 Statistic
  9.1 Introduction
  9.2 Alternative Forms of Conditional Terms
  9.3 Improving Sensitivity to Abrupt Process Changes
  9.4 Case Study: Steam Turbine
    9.4.1 The Control Procedure
    9.4.2 Historical Data Set
  9.5 Model Creation Using Expert Knowledge
  9.6 Model Creation Using Data Exploration
  9.7 Improving Sensitivity to Gradual Process Shifts
  9.8 Summary

10 Autocorrelation in T2 Control Charts
  10.1 Introduction
  10.2 Autocorrelation Patterns in T2 Charts
  10.3 Control Procedure for Uniform Decay
  10.4 Example of a Uniform Decay Process
    10.4.1 Detection of Autocorrelation
    10.4.2 Autoregressive Functions
    10.4.3 Estimates
    10.4.4 Examination of New Observations
  10.5 Control Procedure for Stage Decay Processes
  10.6 Summary

11 The T2 Statistic and Batch Processes
  11.1 Introduction
  11.2 Types of Batch Processes
  11.3 Estimation in Batch Processes
  11.4 Outlier Removal for Category 1 Batch Processes
  11.5 Example: Category 1 Batch Process
  11.6 Outlier Removal for Category 2 Batch Processes
  11.7 Example: Category 2 Batch Process
  11.8 Phase II Operation with Batch Processes
  11.9 Example of Phase II Operation
  11.10 Summary

Appendix. Distribution Tables
Bibliography
Index

Preface
Industry continually faces many challenges. Chief among these is the requirement to improve product quality while lowering production costs. In response to this need, much effort has been given to finding new technological tools. One particularly important development has been the advances made in multivariate statistical process control (SPC). Although univariate control procedures are widely used in industry and are likely to be part of a basic industrial training program, they are inadequate when used to control processes that are inherently multivariate. What is needed is a methodology that allows one to monitor the relationships existing among and between the process variables. The T2 statistic provides such a procedure. Unfortunately, the area of multivariate SPC can be confusing and complicated for the practitioner who is unfamiliar with multivariate statistical techniques. Limited help comes from journal articles on the subject, as they usually include only theoretical developments and a limited number of data examples. Thus, the practitioner is not well prepared to face the problems encountered when applying a multivariate procedure to a real process situation. These problems are further compounded by the lack of adequate computer software to do the required complex computations. The motivation for this book came from facing these problems in our data consulting and finding only a limited array of solutions. We soon decided that there was a strong need for an applied text on the practical development and application of multivariate control techniques. We also felt that limiting discussions to strategies based on Hotelling's T2 statistic would be of most benefit to practitioners. In accomplishing this goal, we decided to minimize the theoretical results associated with the T2 statistic, as well as the distributional properties that describe its behavior. 
These results can be found in the many excellent texts that exist on the theory of multivariate analysis and in the numerous published papers pertaining to multivariate SPC. Instead, our major intent is to present to the practitioner a modern and comprehensive overview on how to establish and operate an applied multivariate control procedure based on our conceptual view of Hotelling's T2 statistic. The intended audience for this book is professionals and students involved with multivariate quality control. We have assumed the reader is knowledgeable about univariate statistical estimation and control procedures (such as Shewhart charts) and is familiar with certain probability functions, such as the normal, chi-square, t, and F distributions. Some exposure to regression analysis also would be helpful.

Although an understanding of matrix algebra is a prerequisite in studying any area of multivariate analysis, we have purposely downplayed this requirement. Instead, appendices are included in various chapters in order to provide the minimal material on matrix algebra needed for our presentation of the T2 statistic.

As might be expected, the T2 control procedure requires the use of advanced statistical software to perform the numerous computations. All the T2 charts presented in this text were generated using the QualStat software package, which is a product of InControl Technologies, Inc. On the inside back cover of the book we have included a free demonstration version of this software. You will find that the fully licensed version of the package is easy to apply and provides an extended array of graphical and statistical summaries. It also contains modules for use with most of the procedures discussed in this book.

This text contains 11 chapters. These have been designed to progress in the same chronological order as one might expect to follow when actually constructing a multivariate control procedure. Each chapter has numerous data examples and applications to assist the reader in understanding how to apply the methodology. A brief description of each chapter is given below.

Chapter 1 provides the motivation for, and an intuitive grasp of, statistical distance and presents an overview of the T2 as the ideal control statistic for multivariate processes. Chapter 2 supplements this development by providing the distributional properties of the T2 statistic as they apply to multivariate SPC. Distributional results are stated, and data examples are given that illustrate their use when applied to control strategy. Chapter 3 provides methods for checking the distributional assumptions pertaining to the use of the T2 as a control statistic.
When distributional assumptions cannot be satisfied, alternative procedures are introduced for determining the empirical distribution of the T2 statistic. Chapters 4 and 5 discuss the construction of the historical data set and T2 charting procedures for a Phase I operation. This includes obtaining the preliminary data, analyzing data problems such as collinearity and autocorrelation, and purging outliers. Chapter 6 addresses T2 charting procedures and signal detection for a Phase II operation. Various forms of the T2 statistic also are considered. Signal interpretation, based on the MYT (Mason-Young-Tracy) decomposition, is presented in Chapter 7 for the bivariate case. We show how a signal can be isolated to a particular process variable or to a group of variables. In Chapter 8 these procedures are extended to cases involving two or more variables. Procedures for increasing the sensitivity of the T2 statistic to small consistent process changes are covered in Chapter 9. A T2 control procedure for autocorrelated observations is developed in Chapter 10, and the concluding chapter, Chapter 11, addresses methods for monitoring batch processes using the T2 statistic. We would like to express our sincere thanks to PPG Industries, Inc., especially the Chemicals and Glass Divisions, for providing numerous applications of the T2 control procedure for use in this book. From PPG in Lake Charles, LA, we thank Joe Hutchins of the Telphram Development Project; Chuck Stewart and Tommy Hampton from Power Generation; John Carpenter and Walter Oglesby from Vinyl; Brian O'Rourke from Engineering; and Tom Hatfield, Plant Quality Coordinator. The many conversations with Bob Jacobi and Tom Jeffery (retired) were most
helpful in the initial stages of development of a T2 control procedure. A special thanks also is due to Cathy Moyer and Dr. Chuck Edge of PPG's Glass Technical Center in Harmarville, PA; Frank Larmon, ABB Industrial Systems, Inc.; Bob Smith, LA Pigment, Inc.; and Stan Martin (retired), Centers for Disease Control. Professor Youn-Min Chou of the University of Texas at San Antonio, and Professor Nola Tracy McDaniel of McNeese State University, our academic colleagues, have been most helpful in contributing to the development of the T2 control procedure. We also wish to acknowledge Mike Marcon and Dr. James Callahan of InControl Technologies, Inc., whose contributions to the application of the T2 statistic have been immeasurable. We ask ourselves, where did it all begin? In our case, the inspiration can be traced to the same individual, Professor Anant M. Kshirsagar. It is not possible for us to think about a T2 statistic without recalling the fond memories of his multivariate analysis classes while we were in graduate school together at Southern Methodist University many years ago. Finally, we wish to thank our spouses, Carmen and Pam. A project of this magnitude could not have been completed without their continued love and support.

Robert L. Mason
John C. Young


Chapter 1

Introduction to the T2 Statistic

The Saga of Old Blue


Imagine that you have recently been promoted to the position of performance engineer. You welcome the change, since you have spent the past few years as one of the process engineers in charge of a reliable processing unit labeled "Old Blue." You know every "nook and cranny" of the processing unit and especially what to do to unclog a feed line. Each valve is like a pet to you, and the furnace is your "baby." You know all the shift operators, having taught many and learned from others. This operational experience formed the basis for your recent promotion, since in order to be a performance engineer, one needs a thorough understanding of the processing unit. You are confident that your experience will serve you well. With this promotion, you soon discover that your job responsibilities have changed. No longer are you in charge of meeting daily production quotas, assigning shifts to the operators, solving personnel problems, and fighting for your share of the maintenance budget. Your new position demands that you adopt the skills of a detective and search for methods to improve unit performance. This is great, since over time you have developed several ideas that should lead to process improvement. One of these is the recently installed electronic data collector that instantly provides observations on all variables associated with the processing unit. Some other areas of responsibility for you include identifying the causes of upset conditions and advising operations. When upsets occur, quick solutions must be found to return the unit to normal operational conditions. With your understanding of the unit, you can quickly and efficiently address all such problems. The newly created position of performance engineer fulfills your every dream. But suddenly your expectations of success are shattered. The boss anxiously confronts you and explains that during the weekend, upset conditions occurred with Old Blue. He gives you a diskette containing process data,
retrieved from the data net for both a good-run time period and the upset time period. He states that the operations staff is demanding that the source of the problem be identified. You immediately empathize with them. Having lived through your share of unit upsets, you know no one associated with the unit will be happy until production is restored and the problem is resolved. There is an entire megabyte of data stored on the diskette, and you must decide how to analyze it to solve this problem. What are your options? You import the data file to your favorite spreadsheet and observe that there are 10,000 observations on 35 variables. These variables include characteristics of the feedstock, as well as observations on the process, production, and quality variables. The electronic data collector has definitely done its job. You remember a previous upset condition on the unit that was caused by a significant change in the feedstock. Could this be the problem? You scan the 10,000 observations, but there are too many numbers and variables to see any patterns. You cannot decipher anything. The thought strikes you that a picture might be worth 1,000 observations. Thus, you begin constructing graphs of the observations on each variable plotted against time. Is this the answer? Changes in the observations on a variable should be evident in its time-sequence graph. With 35 variables and 10,000 observations, this may involve a considerable time investment, but it should be worthwhile. You readily recall that your college statistics professor used to emphasize that graphical procedures were an excellent technique for gaining data insight. You initially construct graphs of the feedstock characteristics. Success eludes you, however, and nothing is noted in the examination of these plots. All the input components are consistent over the entire data set, including over both the prior good-run period and the upset period.
From this analysis, you conclude that the problem must be associated with the 35 process variables. However, the new advanced process control (APC) system was working well when you left the unit. The multivariable system keeps all operational variables within their prescribed operational range. If a variable exceeded the range, an alarm would have signaled this and the operator would have taken corrective action. How could the problem be associated with the process when all the variables are within their operational ranges? Having no other options, you decide to go ahead and examine the process variables. You recall from working with the control engineers in the installation of the APC system that they had been concerned with how the process variables vary together. They had emphasized studying and understanding the correlation structure of these variables, and they had noted that the variables did not move independently of one another, but as a group. You decide to examine scatter plots of the variables as well as time-sequence plots. Again, you recall the emphasis placed on graphical techniques by that old statistics professor. What was his name? You begin the laborious task, soon realizing the enormity of the job. From experience, it is easy to identify the most important control variables and the

fundamental relationships existing between and among the variables. Perhaps scatter plots of the most influential variables will suffice in locating the source of the problem. However, you realize that if you do not discover the right combination of variables, you will never find the source of the problem. You are interrupted from your work by the reappearance of your boss, who inquires about your progress. The boss states he needs an immediate success story to justify your newly created position of performance engineer. There is a staff meeting in a few days, and he would like to present the results of this work as the success story. More pressure to succeed. You decide that you cannot disappoint your friends in the processing unit, nor your boss. You feel panic creeping close to the edge of your consciousness. A cup of coffee restores calm. You reevaluate your position. How does one locate the source of the problem? There must be a quicker, easier way than the present approach. We have available a set of data consisting of 10,000 observations on 35 variables. The solution must lie with the use of statistics. You slowly begin to recall the courses you took in college, which included basic courses in statistical procedures and statistical process control (SPC). Would those work here? Yes, we can compare the data from the good-run period to the data from the upset period. How is this done for a group of 35 variables? This was the same comparison made in SPC. The good-run period data served as a baseline and the other operational data were compared to it. Signals occurred when present operational data did not agree with the baseline data. Your excitement increases as you remember more. What was that professor's name, old Dr. or Dr. Old? Your coursework in SPC covered only situations involving one variable.
You need a procedure that considers all 35 related variables at one time and indicates which variable or group of variables contributes to the signal. A procedure such as this would offer a solution to your problem. You rush to the research center to look for a book that will instruct you on how to solve problems in multivariate SPC.

1.1 Introduction

The problem confronting the young engineer in the above situation is common in industry. Many dollars have been invested in electronic data collectors because of the realization that the answer to most industrial problems is contained in the observations. More money has been spent on multivariable control or APC systems. These units are developed and installed to ensure the containment of process variables within prescribed operational ranges. They do an excellent job in reducing overall system variation, as they restrict the operational range of the variables. However, an APC system does not guarantee that a process will satisfy a set of baseline conditions, and it cannot be used to determine causes of system upsets. As our young engineer will soon realize, a multivariate SPC procedure is needed to work in unison with the electronic data collector and the APC system. Such a


Figure 1.1: Shewhart chart of a process variable (horizontal axis: group number).

procedure will signal process upsets and, in many cases, can be used to pinpoint precursors of the upset condition before control is lost. When signals are identified, the procedure allows for the decomposition of the signal in terms of the variables that contributed to it. Such a system is the main subject of this book.

1.2 Univariate Control Procedures

Walter A. Shewhart, in a Bell Telephone Laboratories memorandum dated May 16, 1924, presented the first sketch of a univariate control chart (e.g., see Duncan (1986)). Although his initial chart was for monitoring the percentage defective in a production process, he later extended his idea to control charts for the average and standard deviation of a process. Figure 1.1 shows an example of a Shewhart chart designed to monitor the mean, X̄, of a group of process observations taken on a process variable at the same time point. Drawn on the chart are the upper control limit (UCL) and the lower control limit (LCL). Shewhart charts are often used in detecting unusual changes in variables that are independent and thus not influenced by the behavior of other variables. These changes occur frequently in industrial settings. For example, consider the main laboratory of a major chemical industry. Many duties are assigned to this facility. These may range from research and development to maintaining the quality of production. Many of the necessary chemical determinations are made using various types of equipment. How do we monitor the accuracy (i.e., closeness to a target value) of the determination made by the equipment? Often a Shewhart chart, constructed from a set of baseline data, is utilized. Suppose a measurement on a sample of known concentration is taken. If the result of the sample falls within the control limits of the Shewhart chart, it is assumed that the equipment is performing normally.


Figure 1.2: Model of a production unit.

Otherwise, the equipment is fixed and recalibrated and a new Shewhart chart is established. It may be argued that a specific chemical determination is dependent on other factors such as room temperature and humidity. Although these factors can influence certain types of equipment, compensation is achieved by using temperature and humidity controls. Thus, this influence becomes negligible and determinations are treated as observations on an independent variable.
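The baseline logic just described can be sketched in a few lines of code. This is an illustrative sketch only, not the book's procedure: the function names and the baseline numbers are hypothetical, and for simplicity the spread of the subgroup means is estimated directly with their sample standard deviation rather than with the control-chart constants normally used in practice.

```python
from statistics import mean, stdev

def xbar_chart_limits(subgroup_means, k=3.0):
    """Center line and k-sigma control limits computed from baseline
    subgroup means collected while the process was behaving normally."""
    center = mean(subgroup_means)
    spread = stdev(subgroup_means)
    return center - k * spread, center, center + k * spread

def signals(x, lcl, ucl):
    """A new subgroup mean signals when it falls outside (LCL, UCL)."""
    return x < lcl or x > ucl

# Hypothetical baseline: repeated measurements of a standard of known
# concentration, with target value 50.0.
baseline = [50.1, 49.8, 50.0, 50.3, 49.9, 50.2, 50.0, 49.7, 50.1, 49.9]
lcl, cl, ucl = xbar_chart_limits(baseline)
```

A new measurement of the standard that falls inside (LCL, UCL) is treated as normal equipment behavior; one outside the limits triggers the recalibration step described above.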

1.3 Multivariate Control Procedures

There are many industrial settings where process performance is based on the behavior of a set of interrelated variables. Production units, such as the one illustrated in Figure 1.2, are excellent examples. They are designed to change an input to some specific form of output. For example, we may wish to change natural gas, a form of energy, to an alternate state such as steam or electricity. Or, we may wish to convert brine (salt water) to caustic soda and chlorine gas; sand to silica or glass; or hydrochloric acid and ethylene to ethylene dichloride, which in turn is changed to vinyl chloride. Our interest lies in the development of a control procedure that will detect unusual occurrences in such variables. Why not use univariate control procedures for these situations? To answer this question, we first need to describe the differences between univariate and multivariate processes. Although the biggest distinction that is evident to the practitioner is the number of variables, there are more important differences. For example, the characteristics or variables of a multivariate process often are interrelated and form a correlated set. Since the variables do not behave independently of one another, they must be examined together as a group and not separately. Multivariate processes are inherent to many industries, such as the chemical industry, where input is being chemically altered to produce a particular output. A good example is the production of chlorine gas and caustic soda. The input variable is saturated brine (water saturated with salt). Under proper conditions, some of the brine is decomposed by electrolysis to chlorine gas; caustic soda is formed within the brine and is later separated. The variables of interest are the components produced by the electrolysis process. All are related to the performance of the process. In
addition, many of the variables follow certain mathematical relationships and form a highly correlated set. The correlation among the variables of a multivariate system may be due to either association or causation. Correlation due to association in a production unit often occurs because of the effects of some unobservable variable. For example, the blades of a gas or steam turbine will become contaminated (dirty) from use over time. Although the accumulation of dirt is not measurable, megawatt production will show a negative correlation with the length of time from the last cleaning of the turbine. The correlation between megawatt production and length of time since last cleaning is one of association. An example of a correlation due to causation is the relationship between temperature and pressure since an increase in the temperature will produce a pressure change. Such correlation inhibits examining each variable by univariate procedures unless we take into account the influence of the other variable. Multivariate process control is a methodology, based on control charts, that is used to monitor the stability of a multivariate process. Stability is achieved when the means, variances, and covariances of the process variables remain stable over rational subgroups of the observations. The analysis involved in the development of multivariate control procedures requires one to examine the variables relative to the relationships that exist among them. To understand how this is done, consider the following example. Suppose we are analyzing data consisting of four sets of temperature and pressure readings. The coordinates of the points are given as pairs,

where the first coordinate value is the temperature and the second value is the pressure. These four data points, as well as the mean point of (175, 75), are plotted in the scatter plot given in Figure 1.3. There also is a line fitted through the points and two circles of varying sizes about the mean point. If the mean point is considered to be typical of the sample data, one form of analysis consists of calculating the distance each point is from the mean point. The distance, say D, between any two points, (a1, a2) and (b1, b2), is given by the formula

D = √[(a1 − b1)² + (a2 − b2)²].

This type of distance measure is known as Euclidean, or straight-line, distance. The distance that each of our four example points is from the mean point (in order of occurrence) is computed as

D1 = 3.16, D2 = 7.07, D3 = 7.07, and D4 = 3.16.

From these calculations, it is seen that points 1 and 4 are located an equal distance from the mean point on a circle centered at the mean point and having a radius of 3.16. Similarly, points 2 and 3 are located at an equal distance from the mean but on a larger circle with a radius of 7.07.
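As a quick check of the arithmetic, straight-line distance is easy to compute directly. The two sample points below are hypothetical stand-ins (the original four readings are not reproduced here); they are chosen only to land on the two radii quoted above, 3.16 and 7.07, measured from the mean point (175, 75).

```python
from math import sqrt

def euclidean_distance(p, q):
    """Straight-line distance D between points p = (p1, p2) and q = (q1, q2)."""
    return sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

mean_point = (175, 75)
print(round(euclidean_distance((178, 74), mean_point), 2))  # 3.16, the inner circle
print(round(euclidean_distance((180, 80), mean_point), 2))  # 7.07, the outer circle
```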


Figure 1.3: Scatter plot illustrating straight-line distance.

There are two major criticisms of this analysis. First, the variation in the two variables has been completely ignored. From Figure 1.3, it appears that the temperature readings contain more variation than the pressure readings, but this could be due to the difference in scale between the two variables. However, in this particular case the temperature readings do contain more variation. The second criticism of this analysis is that the covariation between temperature and pressure has been ignored. It is generally expected that as temperature increases, the pressure will increase. The straight line given in Figure 1.3 depicts this relationship. Observe that as the temperature increases along the horizontal axis, the corresponding value of the pressure increases along the vertical axis. This poses an interesting question. Can a measure of the distance between two points be devised that accounts for the presence of a linear relationship between the corresponding variables and the difference in the variation of the variables? The answer is yes; however, the distance is statistical rather than Euclidean and is not as easy to compute. To calculate statistical distance (SD), a measure of the correlation between the variables of interest must be obtained. This is generally expressed in terms of the covariance between the variables, as covariance provides a measure of how variables vary together. For our example data, the sample covariance between temperature and pressure, denoted as S12, is computed using the formula

S12 = Σ (x1i − x̄1)(x2i − x̄2) / (n − 1),

where x₁ represents the temperature component of the observation vector and x₂ represents the pressure component. The number of sample points is given by n. The value of the sample covariance as computed from the temperature-pressure data set is 18.75. Also needed in the computation of the statistical distance is the sample variance of the individual variables. The sample variance of a variable, xⱼ,


is given by

sⱼ² = [1/(n − 1)] Σᵢ (xⱼᵢ − x̄ⱼ)².

Figure 1.4: Scatter plot illustrating statistical distance.

The sample variances for temperature and pressure as determined from the example data are 22.67 and 17.33, respectively. Using the value of the covariance, the values of the sample variances, and the sample means of the variables, the squared statistical distance, (SD)², is computed using the formula

(SD)² = [1/(1 − r²)] [(x₁ − x̄₁)²/s₁² − 2r(x₁ − x̄₁)(x₂ − x̄₂)/(s₁s₂) + (x₂ − x̄₂)²/s₂²],    (1.2)

where r = s₁₂/(s₁s₂) is the sample correlation coefficient. The actual SD value is obtained by taking the principal square root of both sides of (1.2). Since (1.2) is the formula for an ellipse, the SD is sometimes referred to as elliptical distance (in contrast to straight-line distance). It also has been labeled Mahalanobis's distance, or Hotelling's T2, or simply T2. The concept of statistical distance is explored in more detail in Chapter 2. Calculating the (SD)² for each of the four points in our temperature-pressure sample produces the following results:

From this analysis it is concluded that our four data points are the same statistical distance from the mean point. This result is illustrated graphically in Figure 1.4. All four points satisfy the equation of the ellipse superimposed on the plot. From a visual perspective, this result appears to be unreasonable. It is obvious that points 1 and 4 are closer to the mean point in Euclidean distance than points 2 and 3. However, when the differences in the variation of the variables and the


relationships between the variables are considered, the statistical distances are the same. The multivariate control procedures presented in this book are developed using methods based on the above concept of statistical distance.
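The (SD)² calculation of (1.2) can be sketched in a few lines of code. The summary statistics are the ones quoted above for the temperature-pressure example (mean point (175, 75), variances 22.67 and 17.33, covariance 18.75); the function itself is a hypothetical helper, not code from the text.

```python
import math

def squared_sd(x1, x2, xbar1, xbar2, s11, s22, s12):
    """Squared statistical distance (SD)^2 of (x1, x2) from the mean point."""
    s1, s2 = math.sqrt(s11), math.sqrt(s22)
    r = s12 / (s1 * s2)          # sample correlation coefficient
    z1 = (x1 - xbar1) / s1       # standardized deviations
    z2 = (x2 - xbar2) / s2
    return (z1 ** 2 - 2 * r * z1 * z2 + z2 ** 2) / (1 - r ** 2)

# The mean point itself is at squared statistical distance zero.
print(squared_sd(175, 75, 175, 75, 22.67, 17.33, 18.75))  # 0.0
```

Evaluating the function at any other point returns the corresponding (SD)² value, so points with equal (SD)² lie on the same ellipse about the mean.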

1.4 Characteristics of a Multivariate Control Procedure

There are at least five desirable characteristics of a multivariate control procedure:

1. The monitoring statistic should be easy to chart and helpful in identifying process trends.
2. When out-of-control points occur, it must be easy to determine the cause in terms of the contributing variables.
3. The procedure must be flexible in application.
4. The procedure needs to be sensitive to small but consistent process changes.
5. The procedure should be capable of monitoring the process both on-line and off-line.

A good charting method not only allows for quick signal detection but also helps in identifying process trends. By examining the plotted values of the charting statistic over time, process behavior is observed and upset conditions are identified in advance of a signal, i.e., an out-of-control point.

For a clear understanding of the control procedure, interpretation needs to be global and not isolated to a particular data set. The same must be true for the detection of variable conditions that are precursors to upset or chaotic situations. A control procedure having this ability is a valuable asset.

A control procedure should work with both independent and time-dependent process observations and be applicable to both continuous and batch processes. It should be flexible enough for use with various forms of the control statistic, such as a sample mean or an individual observation, and work with different estimators of the internal structure of the variables.

Most industries are volume-oriented. Small changes in efficiency can be the difference between creating a profit and generating a loss. Sensitivity to small process changes is thus a necessary component of any multivariate control procedure.

Multivariate procedures are computationally intense, but this is not a drawback, since industrial processes are steadily moving toward total computerization.
Recent technological advances in industrial control procedures have greatly improved the quantity and quality of available data. The use of computing hardware, such as electronic data collectors, facilitates the collection of data on a multitude of variables from all phases of production. In many situations, one may be working with a very large number of variables and thousands of observations. These are the data available to a statistical control procedure. Any control procedure must be programmable and able to interface and react to such data collected on-line.


Chapter 1. Introduction to the T2 Statistic

Charting with the T2 Statistic


Although many different multivariate control procedures exist, it is our belief that a control procedure built on the T2 statistic possesses all the above characteristics. Like many multivariate charting statistics, the T2 is a univariate statistic. This is true regardless of the number of process variables used in computing it. However, because of its similarity to a univariate Shewhart chart, the T2 control chart is sometimes referred to as a multivariate Shewhart chart. This relationship to common univariate charting procedures facilitates the understanding of this charting method.

Signal interpretation requires a procedure for isolating the contribution of each variable and/or a particular group of variables. As with univariate control, out-of-control situations can be attributed to individual variables being outside their allowable operational range; e.g., the temperature is too high. A second cause of a multivariate signal may be attributed to a fouled relationship between two or more variables; e.g., the pressure is not where it should be for a given temperature reading. The signal interpretation procedure covered in this text is capable of separating a T2 value into independent components. One type of component determines the contribution of the individual variables to a signaling observation, while the other components check the relationships among groups of variables. This procedure is global in nature and not isolated to a particular data set or type of industry.

The T2 statistic is one of the more flexible multivariate statistics. It gives excellent performance when used to monitor independent observations from a steady-state continuous process. It also can be based on either a single observation or the mean of a subgroup of n observations. Minor adjustments in the statistic and its distribution allow movement from one form to the other. Many industrial processes produce observations containing a time dependency.
For example, process units with a decaying cycle often produce observations that can be modeled by some type of time-series function. The T2 statistic can be readily adapted to these situations and can be used to produce a time-adjusted statistic. The T2 statistic also is applicable to situations where the time correlation behaves as a step function. We have experienced no problems in applying the T2 statistic to batch or semibatch processes with targets specified or unspecified. In the case of target specification, the T2 statistic measures the statistical distance the observed value is from the specified target. In cases where rework is possible, such as blending, components of the T2 decomposition can be used in determining the blending process. Sensitivity to small process change is achieved with univariate control procedures, such as Shewhart charts, through applications of zonal charts with run rules. Small, consistent process changes in a T2 chart can be detected by using certain components of the decomposition of a T2 statistic. This is achieved by monitoring the residual error inherent to these terms. The detection of small process shifts is so important that a whole chapter of the text is devoted to this procedure. An added benefit of the T2 charting procedure is the potential to do on-line experimentation that can lead to local optimization. Because of the demand of



production quotas, the creation of dangerous and hazardous conditions, extended upset-recovery periods, and numerous other reasons, the use of experimental design is limited in most production units. However, one can tweak the process. Monitoring the appropriate residual terms allows one to observe the effect of this type of experimentation almost instantaneously. In addition, the monetary value of process changes, due to new equipment or operational procedures, can be quickly determined. This aspect of a T2 control procedure has proved invaluable in many applications.

Numerous software programs are available for performing a variety of univariate SPC procedures. However, computer packages for doing multivariate SPC are few in number. Some, such as SAS, can be useful but require individual programming. Others, such as JMP, a product of SAS Institute Inc., provide only limited multivariate SPC capability. The program QualStat, a product of InControl Technologies, Inc., contains a set of procedures based entirely on the T2 statistic. This program is used extensively in this book to generate the T2 graphs and perform the T2 analyses.

1.5 Summary

Industrial process control generally involves monitoring a set of correlated variables. Such correlation confounds the interpretation of univariate procedures run on individual variables. One method of overcoming this problem is to use Hotelling's T2 statistic. As demonstrated in our discussion, this statistic is based on the concept of statistical distance. It consolidates the information contained in a multivariate observation into a single value, namely, the statistical distance the observation is from the mean point. Desirable characteristics for a multivariate control chart include ease of application, adequate signal interpretation, flexibility, sensitivity to small process changes, and the availability of software to implement it. One multivariate charting procedure that possesses all these characteristics is the method based on the T2 statistic. In the following chapters of this book, we explore the various properties of the T2 charting procedure and demonstrate its value.


Chapter 2

Basic Concepts about the T2 Statistic

2.1 Introduction

Some fundamental concepts about the T2 statistic must be presented before we can discuss its use in constructing a multivariate control chart. We begin with a discussion of statistical distance and how it is related to the T2 statistic. How statistical distance differs from straight-line, or Euclidean, distance is an important part of the coverage. Also included is a discussion of the relationship between the univariate Student t statistic and its multivariate analogue, the T2 statistic. The results lead naturally to an understanding of the probability functions used to describe the T2 statistic under a variety of different circumstances. Knowledge of these distributions aids in determining the upper control limit (UCL) value for a T2 chart, as well as the corresponding false alarm rate.

2.2 Statistical Distance

Hotelling (1947), in a paper on using multivariate procedures to analyze bombsight data, was among the first to examine the problem of analyzing correlated variables from a statistical control perspective. His control procedure was based on a charting statistic that he had introduced in an earlier paper (i.e., Hotelling (1931)) on the generalization of the Student t statistic. The statistic later was named in his honor as Hotelling's T2. Slightly prior to 1931, Mahalanobis (1930) proposed the use of a similar statistic, which would later become known as Mahalanobis's distance measure, for use in measuring the squared distance between two populations. Although the two statistics differ only by a constant value, the T2 form is the more popular in multivariate process control and is the main subject of this text. The following discussion provides insight into how the concept of statistical distance, as defined with the T2 statistic, is used in the development of multivariate



control procedures. The reader unfamiliar with vectors and matrices may find the definitions and details given in this chapter's appendix (section 2.8) to be helpful in understanding these results.

Suppose we denote a multivariate observation on p variables in vector form as X′ = (x₁, x₂, ..., x_p). Our main concern is in processing the information available on these p variables. One approach is to use graphical techniques, which are usually excellent for this task, but plotting points in a p-dimensional space (p > 3) is severely limited. This restriction inhibits overall viewing of the multivariate situation. Another method for examining the information provided in a p-dimensional observation is to reduce the multivariate data vector to a single univariate statistic. If the resulting statistic contains information on all p variables, it can be interpreted and used in making decisions as to the status of a process. There are numerous procedures for achieving this result, and we demonstrate two of them below.

Suppose a process generates uncorrelated bivariate observations, (x₁, x₂), and it is desired to represent them graphically. It is common to construct a two-dimensional scatter plot of the points. Also, suppose there is interest in determining the distance a particular point is from the mean point. The distance between two points is always measured as a single number or value. This is true regardless of how many dimensions (variables) are involved in the problem. The usual straight-line (Euclidean) distance measures the distance between two points by the number of units that separate them. The squared straight-line distance, say D², between a point (x₁, x₂) and the population mean point (μ₁, μ₂) is defined as

D² = (x₁ − μ₁)² + (x₂ − μ₂)².

Note that we have taken the bivariate observation, (x₁, x₂), and converted it to a single number D, the distance the observation is from the mean point.
If this distance, D, is fixed, all points that are the same distance from the mean point can be represented as a circle with center at the mean point and a radius of D (i.e., see Figure 2.1). Also, any point located inside the circle has a distance to the mean point less than D. Unfortunately, the Euclidean distance measure is unsatisfactory for most statistical work (e.g., see Johnson and Wichern (1998)). Although each coordinate of an observation contributes equally to determining the straight-line distance, no consideration is given to differences in the variation of the two variables as measured by their variances, σ₁² and σ₂², respectively. To correct this deficiency, consider the standardized values

z₁ = (x₁ − μ₁)/σ₁ and z₂ = (x₂ − μ₂)/σ₂,

and all points satisfying the relationship

(SD)² = z₁² + z₂² = (x₁ − μ₁)²/σ₁² + (x₂ − μ₂)²/σ₂².    (2.1)

The value SD, the square root of (SD)2 in (2.1), is known as statistical distance. For a fixed value of SD, all points satisfying (2.1) are the same statistical distance



Figure 2.1: Region of same straight-line distance.

from the mean point. The graph of such a group of points forms an ellipse, as is illustrated in the example given in Figure 2.2. Any point inside the ellipse will have a statistical distance less than SD, while any point located outside the ellipse will have a statistical distance greater than SD. In comparing statistical distance to straight-line distance, there are some major differences to be noted. First, since standardized variables are utilized, the statistical distance is dimensionless. This is a useful property in a multivariate process since many of the variables may be measured in different units. Second, any two points on the ellipse in Figure 2.2 have the same SD but could have possibly different Euclidean distances from the mean point. If the two variables have equal variances and are uncorrelated, the statistical and Euclidean distance, apart from a constant multiplier, will be the same; otherwise, they will differ. The major difference between statistical and Euclidean distance in Figure 2.2 is that the two variables used in statistical distance are weighted inversely by their standard deviations, while both variables are equally weighted in the straight-line distance. Thus, a change in a variable with a small standard deviation will contribute more to statistical distance than a change in a variable with a large standard deviation. In other words, statistical distance is a weighted straight-line distance where more importance is placed on the variable with the smaller standard deviation to compensate for its size relative to its mean. It was assumed that the two variables in the above discussion are uncorrelated. Suppose this is not the case and that the two variables are correlated. A scatter plot of two positively correlated variables is presented in Figure 2.3. To construct a statistical distance measure to the mean of these data requires a generalization of (2.1).
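The inverse weighting by standard deviations described above can be made concrete with a short sketch comparing the two distance measures; the standard deviations used are illustrative values, not figures from the text.

```python
import math

def euclidean(x, mu):
    """Straight-line distance D between a point and the mean point."""
    return math.sqrt((x[0] - mu[0]) ** 2 + (x[1] - mu[1]) ** 2)

def statistical(x, mu, sigma1, sigma2):
    """Statistical distance SD of (2.1) for uncorrelated variables."""
    z1 = (x[0] - mu[0]) / sigma1
    z2 = (x[1] - mu[1]) / sigma2
    return math.sqrt(z1 ** 2 + z2 ** 2)

# A one-unit move in the low-variation variable counts for more:
mu = (0.0, 0.0)
print(statistical((1, 0), mu, sigma1=0.5, sigma2=2.0))  # 2.0
print(statistical((0, 1), mu, sigma1=0.5, sigma2=2.0))  # 0.5
```

Both points are a Euclidean distance of 1 from the mean, yet the statistical distances differ by a factor of 4, reflecting the difference in the two standard deviations.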



Figure 2.2: Region of same statistical distance.

Figure 2.3: Scatter plot of correlated variables.

From analytical geometry, the general equation of an ellipse is given by

a₁₁(x₁ − μ₁)² + a₁₂(x₁ − μ₁)(x₂ − μ₂) + a₂₂(x₂ − μ₂)² = c²,    (2.2)

where the aᵢⱼ are specified constants satisfying the relationship (a₁₂² − 4a₁₁a₂₂) < 0, and c is a fixed value. By properly choosing the aᵢⱼ in (2.2), we can rotate the



Figure 2.4: Elliptical region encompassing data points.

ellipse while keeping the scatter of the two variables fixed, until a proper alignment is obtained. For example, the ellipse given in Figure 2.4 is centered at the mean of the two variables yet rotated to reflect the correlation between them.
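The rotation obtained by choosing the aᵢⱼ can be sketched numerically: for a bivariate covariance structure, the tilt of the ellipse's major axis has a simple closed form. The parameter values below are illustrative assumptions, not figures from the text.

```python
import math

def ellipse_tilt_degrees(var1, var2, cov):
    """Angle (degrees) between the major axis of the constant-distance
    ellipse and the x1 axis, from the covariance structure."""
    return math.degrees(0.5 * math.atan2(2.0 * cov, var1 - var2))

# Positive covariation with equal variances tilts the ellipse 45 degrees
# upward to the right; zero covariance leaves it axis-aligned.
print(round(ellipse_tilt_degrees(1.0, 1.0, 0.8), 6))   # 45.0
print(round(ellipse_tilt_degrees(4.0, 1.0, 0.0), 6))   # 0.0
```

A negative covariance gives a negative angle, i.e., the downward tilt to the right described in the next section.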

2.3 T2 and Multivariate Normality

Suppose (x₁, x₂) can be described jointly by a bivariate normal distribution. Under this assumption, the statistical distance between this point and the mean vector (μ₁, μ₂) is the value of the variable part of the exponent of the bivariate normal probability function

f(x₁, x₂) = [1/(2πσ₁σ₂√(1 − ρ²))] exp{−(SD)²/2},    (2.3)

where −∞ < xᵢ < ∞ for i = 1, 2, and σᵢ > 0 represents the standard deviation of xᵢ. The value of (SD)² is given by

(SD)² = [1/(1 − ρ²)] [(x₁ − μ₁)²/σ₁² − 2ρ(x₁ − μ₁)(x₂ − μ₂)/(σ₁σ₂) + (x₂ − μ₂)²/σ₂²],    (2.4)

where ρ represents the correlation between the two variables, with −1 < ρ < 1. The cross-product term between x₁ and x₂ in (2.4) accounts for the fact that the two variables vary together and are dependent. When x₁ and x₂ are correlated, the major and minor axes of the resulting ellipse differ from those of the variable space (x₁, x₂). If the correlation is positive, the ellipse will tilt upward to the right, and if the correlation is negative, the ellipse will tilt downward to the right. This



Figure 2.5: Correlation and the ellipse.

is illustrated in Figure 2.5. If ρ = 0, so that there is no correlation between x₁ and x₂, the ellipse will be oriented similar to the one given in Figure 2.2. Equation (2.4) can be expressed in matrix notation (see section 2.8.5) as

(SD)² = (X − μ)′ Σ⁻¹ (X − μ),    (2.5)

where X′ = (x₁, x₂), μ′ = (μ₁, μ₂), and Σ⁻¹ is the inverse of the matrix

Σ = [ σ₁²  σ₁₂ ]
    [ σ₂₁  σ₂² ],

where σ₁₂ = σ₂₁ = ρσ₁σ₂ is the covariance between x₁ and x₂. The matrix Σ is referred to as the covariance matrix between x₁ and x₂. The expression in (2.5) is a form of Hotelling's T2 statistic.

Equations for the contours of a bivariate normal density are obtained by fixing the value of SD in (2.4). This can be seen geometrically by examining the bivariate normal probability function presented in Figure 2.6. The locus, or path, of the point X′ = (x₁, x₂) traveling around the probability function at a constant height is an ellipse. Ellipses of constant density are referred to as contours and can be determined mathematically to contain a fixed amount of probability. For example, the 75% and 95% contours for the bivariate normal function illustrated in Figure 2.6 are presented in Figure 2.7. The elliptical contours represent all points having the same statistical distance or T2 statistic value.



Figure 2.6: A bivariate normal probability function (σ₁ = 1, σ₂ = 1, ρ = 0.8).

Figure 2.7: Bivariate normal contours containing 75% and 95% of the probability.
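The matrix form (2.5) can be sketched for the bivariate case by inverting the 2×2 covariance matrix directly and checking the result against the expanded form (2.4). The parameters match those of Figure 2.6 (σ₁ = σ₂ = 1, ρ = 0.8); the evaluation point is hypothetical.

```python
def sd2_matrix(x, mu, cov):
    """(SD)^2 = (X - mu)' Sigma^{-1} (X - mu), with Sigma a 2x2 matrix."""
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21          # nonzero since Sigma is nonsingular
    d1, d2 = x[0] - mu[0], x[1] - mu[1]
    # Quadratic form with the explicit 2x2 inverse of Sigma
    return (s22 * d1 * d1 - (s12 + s21) * d1 * d2 + s11 * d2 * d2) / det

def sd2_expanded(x, mu, sigma1, sigma2, rho):
    """The same quantity via the expanded form (2.4)."""
    z1 = (x[0] - mu[0]) / sigma1
    z2 = (x[1] - mu[1]) / sigma2
    return (z1 ** 2 - 2 * rho * z1 * z2 + z2 ** 2) / (1 - rho ** 2)

x, mu = (1.0, 0.5), (0.0, 0.0)
cov = [[1.0, 0.8], [0.8, 1.0]]
print(sd2_matrix(x, mu, cov), sd2_expanded(x, mu, 1.0, 1.0, 0.8))  # both approximately 1.25
```

Agreement of the two values illustrates that (2.5) is simply (2.4) rewritten in matrix notation.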



This result can be generalized to the situation where X′ = (x₁, x₂, ..., x_p) is described by the p-variate normal (multivariate normal (MVN)) probability function given by

f(X) = (2π)^(−p/2) |Σ|^(−1/2) exp{−(X − μ)′ Σ⁻¹ (X − μ)/2},    (2.6)

where −∞ < xᵢ < ∞ for i = 1, 2, ..., p. The mean vector of X′ is given by μ′ = (μ₁, μ₂, ..., μ_p), and the covariance matrix is given by

Σ = [ σ₁₁  σ₁₂  ...  σ₁p ]
    [ σ₂₁  σ₂₂  ...  σ₂p ]
    [  ⋮    ⋮          ⋮  ]
    [ σp₁  σp₂  ...  σpp ].

A diagonal element, σᵢᵢ, of the matrix Σ represents the variance of the ith variable, and an off-diagonal element, σᵢⱼ, represents the covariance between the ith and jth variables. Note that Σ is a nonsingular, symmetric, and positive definite matrix. In this setting, the equation for an ellipsoidal contour of the MVN distribution in (2.6) is given by

(X − μ)′ Σ⁻¹ (X − μ) = T²,    (2.7)

where T2 is a form of Hotelling's T2 statistic. As in the bivariate case, the ellipsoidal regions contain a fixed percentage of the MVN distribution and can be determined exactly.

2.4 Student t versus Hotelling's T2

The univariate Student t statistic is very familiar to most data analysts. If it is to be compared to a Student t distribution, the statistic is computed from a random sample of n observations taken from a population having a normal distribution with mean μ and variance σ². Its formula is given by

t = (x̄ − μ)/(s/√n),    (2.8)

where x̄ = (1/n) Σᵢ xᵢ is the sample mean and s = √[Σᵢ (xᵢ − x̄)²/(n − 1)] is the corresponding sample standard deviation. The square of the t statistic is given by

t² = (x̄ − μ)²/(s²/n) = n(x̄ − μ)²/s²,    (2.9)

and its value is defined as the squared statistical distance between the sample mean and the population mean.
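The computation of t and its square can be sketched as follows; the sample values are made-up illustrations, not data from the text.

```python
import math

def t_statistic(sample, mu):
    """Student t statistic of (2.8) for a univariate sample."""
    n = len(sample)
    xbar = sum(sample) / n
    s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)   # sample variance
    return (xbar - mu) / math.sqrt(s2 / n)

sample = [9.8, 10.1, 10.3, 9.9, 10.4]     # hypothetical measurements
t = t_statistic(sample, mu=10.0)
# t^2 is the squared statistical distance between xbar and mu, as in (2.9):
print(t ** 2)  # about 0.769
```

Here t² is small because the sample mean (10.1) sits close to the hypothesized mean relative to the estimated standard error of x̄.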



The numerator of (2.9) is the squared Euclidean distance between x̄ and μ. Thus, it is a measure of the closeness of the sample mean to the population mean. As x̄ gets closer to μ, the value of t² approaches zero. Division of the squared Euclidean distance by the estimated variance of x̄ (i.e., by s²/n) produces the squared statistical distance.

Hotelling (1931) extended the univariate t statistic to the multivariate case using a form of the T2 statistic based on sample estimates (rather than known values) of the covariance matrix. His derivation is described as follows. Consider a sample of n observations X₁, X₂, ..., X_n, where Xᵢ′ = (xᵢ₁, xᵢ₂, ..., xᵢp), i = 1, 2, ..., n, is taken from a p-variate normal distribution having a mean vector μ and a covariance matrix Σ. A multivariate generalization of the t² statistic is given by

T² = n(X̄ − μ)′ S⁻¹ (X̄ − μ),    (2.10)

where X̄ and S are sample estimators of μ and Σ and are defined by

X̄ = (1/n) Σᵢ Xᵢ and S = [1/(n − 1)] Σᵢ (Xᵢ − X̄)(Xᵢ − X̄)′.    (2.11)

The sample covariance matrix S also can be expressed as

S = [ s₁₁  s₁₂  ...  s₁p ]
    [ s₂₁  s₂₂  ...  s₂p ]
    [  ⋮    ⋮          ⋮  ]
    [ sp₁  sp₂  ...  spp ],

where sᵢᵢ is the sample variance of the ith variable and sᵢⱼ is the sample covariance between the ith and jth variables. The matrix S has many special properties. Those properties that pertain to our use of the T2 as a control statistic for multivariate processes are discussed in later sections.

In terms of probability distributions, the square of the t statistic in (2.9) has the form

t² = (normal random variable) × (chi-square random variable/df)⁻¹ × (normal random variable),

where df represents the n − 1 degrees of freedom of the chi-square variate, (n − 1)s²/σ², and the normal random variable is given by √n(x̄ − μ)/σ. In this representation, the random variable x̄ and the random variable s² are statistically independent. Similarly, the T2 statistic in (2.10) may be expressed as

T² = (multivariate normal vector)′ × (Wishart matrix/df)⁻¹ × (multivariate normal vector),    (2.12)

where df indicates the n − 1 degrees of freedom of the Wishart variate, (n − 1)S, and



the multivariate normal vector is given by √n(X̄ − μ). The random vector X̄ and the random matrix S are statistically independent. The Wishart distribution (see section 2.8.6 for details) in (2.12) is the multivariate generalization of the univariate chi-square distribution.

Using the two forms presented in (2.10) and (2.12), it is possible to extend Hotelling's T2 statistic to represent the squared statistical distance between many different combinations of p-dimensional points. For example, one can use the T2 statistic to find the statistical distance between an individual observation vector X and either its known population mean μ or its population mean estimate X̄. Hotelling's T2 also can be computed between a sample mean, X̄ᵢ, of a subgroup and the overall mean, X̄, of all the subgroups.
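One such combination, the statistical distance (X − X̄)′ S⁻¹ (X − X̄) between an individual bivariate observation and the estimated mean, can be sketched as follows using the estimators of (2.11). The data set is made up for illustration; it is not from the text.

```python
def mean_and_cov(data):
    """Sample mean vector and 2x2 sample covariance matrix, as in (2.11)."""
    n = len(data)
    xbar = [sum(col) / n for col in zip(*data)]
    def s(j, k):
        return sum((r[j] - xbar[j]) * (r[k] - xbar[k]) for r in data) / (n - 1)
    return xbar, [[s(0, 0), s(0, 1)], [s(1, 0), s(1, 1)]]

def t2(x, xbar, cov):
    """Quadratic form (X - xbar)' S^{-1} (X - xbar), inverting S directly."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    d1, d2 = x[0] - xbar[0], x[1] - xbar[1]
    return (d * d1 * d1 - (b + c) * d1 * d2 + a * d2 * d2) / det

xbar, cov = mean_and_cov([(0, 0), (1, 1), (2, 1), (3, 2)])
print(xbar)                       # the estimated mean vector
print(t2((2.0, 2.0), xbar, cov))  # approximately 7.5
```

The same quadratic form, with the appropriate mean vector and covariance estimator inserted, underlies each of the T2 variants discussed in the sections that follow.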

2.5 Distributional Properties of the T2

One basic assumption preceding any discussion of the distributional properties of a Hotelling's T2 statistic is that the multivariate observations involved are the result of a random sampling of a p-variate normal population having a mean vector μ and a covariance matrix Σ. Thus, the behavior of the independent observations can be described by a probability function with parameters either known or unknown. If the parameters are unknown, it will be assumed that there exists a historical data set (HDS) which was collected under steady-state conditions when the process was in control. This data set is used to produce estimates of the unknown parameters.

Our work requires that we transform these p-variate sample observations to a single Hotelling's T2 statistic. Since the original variables are random, these new T2 values also are random and can be described by an appropriate probability function. For example, when the parameters of the underlying multivariate normal distribution are unknown and must be estimated, some form of a univariate F distribution is used in describing the random behavior of the T2 statistic. This is also applicable in the univariate case. If the t statistic in (2.8) can be described by a t distribution with (n − 1) degrees of freedom, the square of the statistic, t² in (2.9), can be described by an F distribution with 1 and (n − 1) degrees of freedom.

The T2 statistic may be computed using a single observation made on p components at a fixed sampling point, or it may be computed using the mean of a sample of size m taken during a fixed time period. Unless otherwise stated in this book, a subgroup of size 1 (i.e., a single observation) will be assumed for the T2 computations. When necessary, stated results will be modified to include the case where the subgroup size exceeds 1.
Particular assumptions that govern the distribution of the T2 statistic are separated into two major cases: the parameters μ and Σ of the underlying distribution are either known or unknown. The second case, when the parameters must be estimated, also has two different situations. The first occurs when an observation vector X is independent of the estimates of the parameters. Independence will occur when X is not included in the computation of X̄ and S, the sample estimates of μ and Σ. The second situation occurs when X is included in the computation of the estimates and hence is not independent of them.



Figure 2.8: Chi-square distribution.

Several different probability functions can be used in describing the T2 statistic (e.g., see Fuchs and Kenett (1998) or Wierda (1994)). Three key forms are discussed below along with the conditions in which each is applicable.

(1) Assume the parameters, μ and Σ, of the underlying MVN distribution are known. The T2 statistic for an individual observation vector X has the form and distribution given by

T² = (X − μ)′ Σ⁻¹ (X − μ) ~ χ²₍p₎,    (2.13)

where χ²₍p₎ represents a chi-square distribution with p degrees of freedom. The T2 distribution depends only on p, the number of variables in the observation vector X. Graphs of the chi-square distribution function, f(χ²), for various values of p are presented in Figure 2.8, and percentage points of the chi-square distribution are given in Table A.3 in the appendix. For smaller values of p, we observe a skewed distribution with a long tail to the right; a more symmetric form is observed for larger values of p. The chi-square probability function provides the probability distribution of the T2 values along this axis.

(2) Assume the parameters of the underlying MVN distribution are unknown and are estimated using the estimators X̄ and S given in (2.11). These values are obtained using an HDS consisting of n observations. The form and distribution of the T2 statistic for an individual observation vector X, independent of X̄ and S, is

24

Chapter 2. Basic Concepts about the T2 Statistic

Figure 2.9: F distribution.

given as

T² = (X − X̄)′ S⁻¹ (X − X̄) ~ [p(n + 1)(n − 1)/(n(n − p))] F₍p, n−p₎,    (2.14)

where F₍p, n−p₎ is an F distribution with p and (n − p) degrees of freedom. The distribution in (2.14) depends on the sample size of the HDS as well as on the number of variables being examined. Graphs of the F distribution, f(F), for various values of the numerator and denominator degrees of freedom are presented in Figure 2.9, and the percentage points of the F distribution are given in Table A.4 in the appendix. Again, we observe a skewed distribution with a long tail to the right.

(3) Assume the observation vector X is not independent of the estimators X̄ and S but is included in their computation. In this situation, the form and distribution of the T2 statistic (e.g., see Tracy, Young, and Mason (1992)) is given as

T² = (X − X̄)′ S⁻¹ (X − X̄) ~ [(n − 1)²/n] B₍p/2, (n−p−1)/2₎,    (2.15)

where B₍p/2, (n−p−1)/2₎ represents a beta distribution with parameters p/2 and (n − p − 1)/2. The distribution in (2.15) depends on the number of variables, p, and on the sample size, n, of the HDS.

Of the above three probability functions used in describing the random behavior of a T2 statistic, the beta distribution is probably the one that is most unfamiliar to analysts. Unlike the chi-square and F distributions, which allow evaluations for any variable values greater than zero, the beta distribution f(B) restricts beta values



Figure 2.10: Beta distribution.

to the unit interval (0,1). However, within this interval, the distribution can take on many familiar shapes, such as those associated with the normal, chi-square, and F distributions. Examples of the beta distribution for various parameter values are depicted in Figure 2.10, and percentage points of the beta distribution are given in Table A.5 in the appendix.

It was stated earlier that the distribution used in describing a T2 statistic when the parameters of the underlying normal distribution are unknown is some form of an F distribution. However, in (2.15) we have used the beta distribution to describe the T2 statistic. The T2, for this case, can be expressed as an F statistic by using a relationship that exists between the F and beta probability functions. The result is given by

F₍p, n−p−1₎ = [(n − p − 1)/p] · B/(1 − B),    (2.16)

where B = nT²/(n − 1)² has the beta distribution given in (2.15). In practice, we generally choose to use the beta distribution rather than the F distribution in (2.16). Although this is done to emphasize that the observation vector X is not independent of the estimates obtained from the HDS, either distribution is acceptable.

Since each T2 value obtained from the HDS depends on the same values of X̄ and S, a weak interdependence among the T2 values is produced. The correlation between any two T2 values computed from an HDS is given as 1/(n − 1) (see Kshirsagar and Young (1971)). It is easily seen that even for modest values of n, this correlation rapidly approaches zero. Although this is not justification for



assuming independence, it has been shown (see David (1970) and Hawkins (1981)) that, as n becomes large, the set of T2 values behaves like a set of independent observations. This fact becomes important when subjecting the T2 values of an HDS to other statistical procedures.

Process control in certain situations is based on the monitoring of the mean of a sample (i.e., subgroup) of m observations taken at each of k sampling intervals. The distribution that describes the statistical distance between the sample mean of the ith subgroup, X̄ᵢ, and the HDS mean X̄ is given by

T² = m(X̄ᵢ − X̄)′ S⁻¹ (X̄ᵢ − X̄) ~ [p(n + m)(n − 1)/(n(n − p))] F₍p, n−p₎,    (2.17)

where F(p, n − p) represents an F distribution with p and (n − p) degrees of freedom. This distribution, which assumes independence between Xi and the parameter estimates, depends on the sample size m, the size of the HDS, and the number of variables in the observation vector Xi. Although the estimator of S given in (2.11) is used in the statistic of (2.17), it is more common to insert a pooled estimate of S given by

where Si represents the sample covariance estimate for the data taken during the ith sampling period, i = 1, 2, ..., k. With this estimate, the form and distribution of the T2 statistic become

2.6

Alternative Covariance Estimators

The T2 is a versatile statistic as it can be constructed with covariance estimators other than the common estimator S given in (2.11). An example is given in the formula in (2.19), where the pooled estimator Sw is used in the construction of the T2 statistic for monitoring subgroup means. Several other estimators of the covariance matrix and the associated T2 statistic also are available. For example, Holmes and Mergen (1993) as well as Sullivan and Woodall (1996), labeled S&W below, presented an estimator based on the successive difference of consecutive observation vectors (order of occurrence) in computing a covariance estimator when only individual observations are available. The estimator, SD, is given as

The distribution of a T2 statistic using SD is unknown. However, S&W provided the following approximation:

where


is based on a result given in Scholz and Tosch (1994). Note that the formula in (2.21) contains a correction to the expression given in the S&W article. In selected situations, the statistic in (2.21) serves as an alternative to the common T2 in detecting step and ramp shifts in the mean vector. Other covariance estimators have been constructed in a similar fashion by partitioning the sample in different ways. For example, Wierda (1994) suggested forming a covariance estimator by partitioning the data into independent, nonoverlapping groups of size 2. Consider a sample of size n, where n is even. Suppose group 1 = {X1, X2}, group 2 = {X3, X4}, ..., group (n/2) = {X(n−1), Xn}. The estimated covariance matrix for each group Ci, i = 1, 2, ..., (n/2), is

and the partition-based covariance estimator is given by

S&W presented a simulation study of five alternative estimators of the covariance matrix and compared the power of the corresponding T2 chart in detecting outliers as well as in detecting step and ramp shifts in the mean vector. Included in the comparisons were the common estimator S, given in (2.11), and the above SP. These authors showed that, with S, the T2 chart for individual data is not effective in detecting step shifts near the middle of the data. In such situations, the chart is biased, meaning that the signal probability with out-of-control data is less than with in-control data. Accordingly, S&W recommended the covariance estimator SD, given in (2.20), for retrospective analysis of individual observations, and they showed that T2 charts based on SD are more powerful than ones based on SP. They also showed that the T2 chart based on S was the most effective among those studied in detecting outliers, but was susceptible to masking with a moderate number of outliers. In a follow-up study, Sullivan and Woodall (2000) gave a comprehensive method for detecting step shifts or outliers as well as shifts in the covariance matrix. Chou, Mason, and Young (1999) conducted a separate power analysis of the T2 statistic in detecting outliers using the same five covariance estimators as used by S&W. They too showed that the common estimator, S, is preferred for detecting outliers. Although we have presented a brief description of some alternative covariance estimators, our applications of the T2 statistic in future chapters will rely primarily on usage of the common estimator S. The major exception is in the use of the within-group estimator Sw in the application of the T2 to batch processes (see Chapter 11). We do this for several reasons. In many industrial settings, ramp changes, step changes, and autocorrelation may be inherent qualities of the process. The variation associated with observations taken from such a process, even when the


process is in-control, does not represent random fluctuation about a constant mean vector. As we will see later, a multivariate control procedure can accommodate such systematic variations in the mean vector. The common estimator S captures the total variation, including the systematic variation of the mean, whereas many of the above-mentioned alternative estimators estimate only random variation, i.e., stationary variation. Thus, these alternative estimators require additional modeling of the systematic variation to be effective. With autocorrelation, the T2 charts based on S are less likely to signal, either when the data are out-of-control or in-control, unless similar additional modeling is performed (i.e., see Chapter 10). When data are collected from an MVN distribution, the common covariance estimator has many interesting properties. For example, S is an unbiased estimator and is the maximum likelihood estimator. The probability function that describes S is known, and this is important in deriving the distribution of the T2 statistic. Another important observation on S that is useful for later discussion is that its value is invariant to a permutation of the data. Thus, the value of the estimator is the same regardless of which one of the many possible arrangements of the data, X1, X2, ..., Xn, is used in its computation.
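The estimators discussed in this section can be illustrated numerically. The following is a minimal NumPy sketch, not code from the text: the function names and the simulated data are our own. The final lines check the permutation-invariance property of S noted above, a property that SD, being built from the time ordering of the data, does not share.

```python
import numpy as np

def common_s(X):
    """Common estimator S of (2.11): divisor n - 1, deviations about the sample mean."""
    return np.cov(X, rowvar=False)

def successive_diff_s(X):
    """Sullivan-Woodall estimator S_D of (2.20), built from the successive
    differences v_i = x_{i+1} - x_i:  S_D = sum v_i v_i' / (2(n - 1))."""
    V = np.diff(X, axis=0)
    return V.T @ V / (2 * (X.shape[0] - 1))

def paired_s(X):
    """Wierda's partition estimator S_P: pool the covariance matrices of the
    nonoverlapping pairs {X1, X2}, {X3, X4}, ...; n is assumed even.
    Each pair contributes C_i = d_i d_i' / 2, with d_i the within-pair difference."""
    D = X[1::2] - X[0::2]
    return D.T @ D / X.shape[0]   # average of the n/2 matrices d_i d_i' / 2

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))

S = common_s(X)
S_perm = common_s(rng.permutation(X))   # shuffle the rows (observations)
# S is unchanged by reordering the data; S_D, by construction, generally is not.
```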

2.7

Summary

In this chapter, we have demonstrated the relationship between the univariate t statistic and Hotelling's T2 statistic. Both are shown to be a measure of the statistical distance between an observed sample mean and its corresponding population mean. This concept of statistical distance, using the T2 statistic, was expanded to include the distance a single observation is from the population mean (or its sample estimate) and the distance a subgroup mean is from an overall mean. With the assumption of multivariate normality, we presented several probability functions used to describe the T2 statistic. This was done for control procedures based on the monitoring of a single observation or the mean of a subgroup of observations. Since there are many occasions in which the T2 statistic is slightly modified to accommodate a specific purpose, we will continue to introduce appropriate forms of the T2 statistic and its accompanying distribution. The ability to construct a Hotelling's T2 for these different situations adds to its versatility as a useful tool in the development of a multivariate control procedure.

2.8

Appendix: Matrix Algebra Review

In Chapter 1, we introduced the T2 statistic for two variables in an algebraic form given by (1.2). As the number of variables increases, the algebraic form of the T2 becomes intractable and we must resort to a simpler form of the T2 using matrix notation. Many of the properties of the T2 statistic are related to the properties of the matrices that compose this form. As an aid to understanding Chapter 2, we offer a brief matrix review. Our presentation includes only those matrix properties that we deem necessary for a clear understanding of the matrix form of the T2


statistic. It is assumed that the background of the reader includes knowledge of basic matrix operations. For additional information on this subject, the interested reader is directed to the many textbooks written on matrix algebra (e.g., see Agnew and Knapp (1995)).

2.8.1

Vector and Matrix Notation

An (r x c) matrix is a rectangular array of elements having r rows and c columns. For example, the (2 x 3) matrix A has 2 rows and 3 columns and is given by

where the aij are constant values. Vectors are matrices with either one row (a row vector) or one column (a column vector). Consider a multivariate process involving p process variables. We denote the first process variable as x1, the second as x2, ..., and the pth process variable as xp. A simple way of denoting an observation (at a given point in time) on all process variables is by using a (p x 1) column vector X, where

The p process variables in X define a p-dimensional variable space, one dimension for each process variable. An observation on the p process variables translates to a point in this p-dimensional space. The transpose of a matrix is obtained by taking each of its columns and making a corresponding row from them. Thus, the first column of a matrix becomes the first row in the transpose, the second column becomes the second row in the transpose, etc. If A denotes a matrix, then A' will denote its transpose. For example, the transpose of X is a (1 x p) row vector that is given by

2.8.2

Data Matrix

A sample of n observation vectors on p process variables is designated as X1, X2, ..., Xn. The sample mean vector is obtained by
X̄ = (1/n)(X1 + X2 + ··· + Xn).

The information contained in the sample can be arranged in an (n x p) data matrix given by

Another important form of the data matrix is achieved by subtracting the mean vector from each observation vector. This form is given as

The sample covariance matrix S that is necessary in computing a T2 statistic is computed from the data matrix using
S = (1/(n − 1)) Σ_{i=1}^{n} (Xi − X̄)(Xi − X̄)′.
2.8.3 The Inverse Matrix The inverse of a square (p x p) matrix A is defined to be the matrix A⁻¹ that satisfies AA⁻¹ = A⁻¹A = I, where I is the (p x p) identity matrix given by

The inverse matrix exists only if the determinant of A is nonzero. This implies that the matrix A must be nonsingular. If the inverse does not exist, the matrix A is singular. Sophisticated computer algorithms exist to compute accurately the inverses of large matrices and are contained in most computer packages used in the analysis of multivariate data. 2.8.4 Symmetric Matrix

Numerous references are made to symmetric matrices throughout this text. A symmetric matrix A is one that is equal to its transpose, i.e.,

2.8. Appendix: Matrix Algebra Review

31

The definition dictates that the matrix A must be a square matrix, so that the number of rows equals the number of columns. It also implies that the off-diagonal elements of the matrix A, denoted by aij, i ≠ j, are equal, i.e., that

aij = aji.
As an example, the sample covariance matrix S is a symmetric matrix, since the sample covariance between the ith and jth variables does not depend on the order of the pair, i.e., sij = sji.

2.8.5

Quadratic Form

Let X' = (x1, x2) and

and consider the quadratic form expression given by

Performing the matrix and vector multiplication produces the following univariate expression: With the assumption that A is a symmetric matrix, so that a12 = a21, the above expression can be written as

In this form, it is easy to see the relationship between the algebraic expression and the matrix notation for a quadratic form. The matrix A of the quadratic form is defined to be positive definite if the quadratic expression is larger than zero for all nonzero values of the vector X. To demonstrate this procedure, consider the quadratic form in three variables given by This expression is written in matrix notation as X'AX, where X' = (x1, x2, x3) and

Consider the algebraic expression for a T2 statistic given in (2.4) as

32

Chapter 2. Basic Concepts about the T2 Statistic

With a little manipulation, the above T2 can be written as the quadratic form

where X' = (x1, x2) and μ′ = (μ1, μ2). The inverse of the matrix S is given by

where a12 = a21 = ρσ1σ2 is the covariance between x1 and x2, ρ is the corresponding correlation coefficient, and σi is the square root of aii, i = 1, 2.
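These expressions can be verified numerically. The sketch below is our own illustration, not code from the text; the values of sigma1, sigma2, rho, mu, and x are arbitrary assumptions. It builds S from the elements just described, inverts it, evaluates the T2 quadratic form, and checks positive definiteness through the eigenvalues of S.

```python
import numpy as np

# Illustrative values (assumptions, not from the text):
sigma1, sigma2, rho = 2.0, 1.5, 0.8
S = np.array([[sigma1**2,              rho * sigma1 * sigma2],
              [rho * sigma1 * sigma2,  sigma2**2]])

S_inv = np.linalg.inv(S)           # exists because rho is not +1 or -1
mu = np.array([10.0, 5.0])
x = np.array([12.0, 5.5])

# T2 as the quadratic form (x - mu)' S^{-1} (x - mu):
d = x - mu
t2 = float(d @ S_inv @ d)

# S is positive definite: X'SX > 0 for all nonzero X, equivalently all
# eigenvalues of S are positive.
pos_def = bool(np.all(np.linalg.eigvalsh(S) > 0))
```

As a cross-check, the (1,1) element of the computed inverse agrees with the closed-form value 1/(sigma1²(1 − rho²)) for a 2 x 2 covariance matrix.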

2.8.6

Wishart Distribution

When data are collected from an MVN distribution, the (p x p) random matrix (n − 1)S, where S is defined by

is labeled a Wishart matrix. This name comes from Wishart (1928), who generalized the joint distribution of the p(p + 1)/2 unique elements of this matrix. This result is predicated on the assumption that the original random sample of n observation vectors is obtained from an Np(μ, Σ), so that each Xi ~ Np(μ, Σ) for i = 1, 2, ..., n. It is also assumed that the matrix S is a symmetric matrix. The Wishart probability function is given as

The matrix S is positive definite and Γ(·) is the gamma function (e.g., see Anderson (1984)). Unlike the MVN distribution, the Wishart density function has very little use other than in theoretical derivations (e.g., see Johnson and Wichern (1998)). For the case of p = 1, it can be shown that the Wishart distribution reduces to a constant multiple of a univariate chi-square distribution. It is for this reason that the distribution is thought of as being the multivariate analogue of the chi-square distribution.
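The p = 1 reduction can be illustrated with SciPy's `wishart` distribution; the degrees of freedom and scale below are arbitrary choices of ours, not values from the text.

```python
import numpy as np
from scipy.stats import wishart, chi2

# For p = 1 and unit scale, the Wishart density coincides with a chi-square(df):
x = np.linspace(0.1, 8.0, 50)
w_pdf = wishart.pdf(x, df=5, scale=1)
c_pdf = chi2.pdf(x, df=5)

# Random 2 x 2 Wishart matrices: the distribution of (n - 1)S when the data
# are N_2(0, I) with n - 1 = 10 degrees of freedom; E[W] = df * scale.
W = wishart.rvs(df=10, scale=np.eye(2), size=2000, random_state=0)
```

Each simulated W is a symmetric positive definite matrix, and the average of many draws is close to df times the scale matrix.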

Chapter 3

Checking Assumptions for Using a T2 Statistic

3.1

Introduction

As was indicated in Chapter 2, the distributions of various forms of the T2 statistic are well known when the set of p-dimensional variables being sampled follows an MVN distribution. The MVN assumption for the observation vectors guarantees the satisfaction of certain conditions that lead to the known T2 distributions. However, validating this assumption for the observation vectors is not an easy task. An alternative approach for use with nonnormal distributions is to approximate the sampling distribution of the T2. In this chapter, we take this latter approach by seeking to validate only the univariate distribution of the T2 statistic, rather than the MVN distribution of the observation vectors. There are a number of other basic assumptions that must be made and requirements that must be met in order to use the T2 as a control statistic. These conditions include: (1) selecting a sample of independent (random) observations, (2) determining the UCL to use in signal detection, (3) collecting a sufficient sample size, and (4) obtaining a consistent estimator of the covariance matrix for the variables. In this and later chapters, we discuss these assumptions and requirements and show how they relate to the T2 statistic. We also demonstrate techniques for checking their validity and offer alternative procedures when these assumptions cannot be satisfied.

3.2

Assessing the Distribution of the T2

In univariate process control, if the probability function of the control chart statistic is assumed to be a normal distribution, the assumption can be verified by performing a goodness-of-fit test. There are numerous formal tests available for this task, including the Shapiro-Wilk test, the Kolmogorov-Smirnov test, and the

Anderson-Darling test. These procedures have been documented in numerous statistical textbooks (e.g., see Anderson, Sweeney, and Williams (1994)), and the corresponding test statistics and associated probability functions are readily defined and tabulated. Many different tests have been proposed for assessing multivariate normality for a group of p-variate observations (e.g., for a comprehensive list and discussion, see Seber (1984) and Looney (1995)). These include such familiar ones as those based on the multivariate measures of skewness and kurtosis. Unfortunately, the exact distribution of the test statistic is unknown in these tests. Without this knowledge the critical values used for the various test procedures can only be approximated. Hence, these procedures serve solely as indicators of possible multivariate normality. In view of these results, we take a different approach. Although, for these discussions, we restrict our attention to the case where a baseline data set is being utilized, the findings can easily be extended to the monitoring of future observations. The assumption of multivariate normality guarantees known distribution results for the T2 statistic (e.g., see Tracy, Young, and Mason (1992)). In a baseline data set, when the p-dimensional observation vectors X are distributed as an MVN, the T2 values follow a beta distribution, so that

[n/(n − 1)²] T2 ~ B[p/2, (n − p − 1)/2],    (3.1)

where the T2 statistic is based on the formula given in (2.15). Since the T2 in (3.1) is a univariate statistic with a univariate distribution, we propose performing a goodness-of-fit test on its values to determine if the beta is the appropriate distribution, rather than on the values of X to determine if the MVN is the correct distribution. Although observations taken from an MVN can be transformed to a T2 statistic having a beta distribution, it is unknown whether other multivariate distributions possess this same property. The mathematics for the nonnormal situations quickly becomes intractable.
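The proposed check, in which the baseline T2 values are transformed by n/(n − 1)² and tested against the appropriate beta distribution, might be sketched as follows. The helper names and the choice of a Kolmogorov-Smirnov test are ours; any univariate goodness-of-fit test could be substituted.

```python
import numpy as np
from scipy import stats

def baseline_t2(X):
    """T2_i = (x_i - xbar)' S^{-1} (x_i - xbar) for each row of the baseline data."""
    d = X - X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))
    return np.einsum('ij,jk,ik->i', d, S_inv, d)

def beta_fit_test(X):
    """KS test of the transformed values [n/(n-1)^2] T2 against B(p/2, (n-p-1)/2)."""
    n, p = X.shape
    b = n / (n - 1) ** 2 * baseline_t2(X)
    return stats.kstest(b, stats.beta(p / 2, (n - p - 1) / 2).cdf)

rng = np.random.default_rng(1)
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=200)
result = beta_fit_test(X)   # a large p-value is consistent with the beta fit
```

A useful sanity check on the T2 computation is the identity that the baseline T2 values sum to (n − 1)p exactly.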
However, we do know that the beta distribution obtained under MVN theory provides a good approximation for some nonnormal situations. We illustrate this phenomenon below using a bivariate example. We begin by generating 1,000 standardized bivariate normal observations having a correlation of 0.80. A scatter plot of these observations is presented in Figure 3.1. For discussion purposes, three bivariate normal contours (i.e., equal altitudes on the surface of the distribution) at fixed T2 values are superimposed on the data. These also are illustrated on the graph. Note the symmetrical dispersion of the points between the concentric T2 ellipses. The concentration of points diminishes from the center outward. Using 31 contours to describe the density of the 1,000 observations, and summing the number of observations between these contours, we obtain the histogram of the T2 values presented in Figure 3.2. The shape of this histogram corresponds to that of a beta distribution. In contrast to the above example, a typical scatter plot of a nonnormal bivariate distribution is presented in Figure 3.3. Note the shape of the plot. The observations in Figure 3.3 are generated from the observations in Figure 3.1 by truncating variable x1 at the value of 1. Distributions such as this occur regularly in industries where run limits are imposed on operational variables. Truncation, which produces


Figure 3.1: Bivariate normal scatter plot and contours.

Figure 3.2: Histogram of T2 values for bivariate normal data.

long-tailed distributions, also occurs with the use of certain lab data. This can be due to the detection limit imposed by the inability of certain types of equipment to make determinations below (or above) a certain value. In Figure 3.4, three bivariate normal contours at fixed T2 values, computed using the mean vector and covariance matrix of the truncated distribution, are superimposed on the nonnormal data. A major distinction between the scatter plots in Figures 3.1 and 3.4 is the dispersion of points between the elliptical contours. For the bivariate normal data in Figure 3.1, the points are symmetrically dispersed between the contours. This is not the case for the nonnormal data in Figure 3.4.


Figure 3.3: Scatter plot of data from a truncated distribution.

Figure 3.4: Bivariate normal contours superimposed on truncated distribution.

For example, note the absence of points in the lower left area between the two outer contours. Nevertheless, possibly due to this particular pattern of points, or due to the size of the sample, the corresponding T2 histogram given in Figure 3.5 for this distribution has a strong resemblance to the histogram given in Figure 3.2. Agreement of this empirical distribution to a beta distribution can be determined by performing a univariate goodness-of-fit test.


Figure 3.5: Histogram of T2 values for truncated data.

3.3

The T2 and Nonnormal Distributions

We do not want to imply from the above example that the beta distribution can be used indiscriminately to describe the T2 statistic. To better understand the problem, consider the bivariate skewed distribution, f(x1, x2), illustrated in Figure 3.6. The contours of this density are presented in Figure 3.7. Observe the irregular shape and lack of symmetry of the contours; they definitely lack the elliptical shape of the bivariate normal contours given in Figure 3.1. The T2 statistic, when multiplied by n/(n − 1)² in a baseline data set, maps the observation vector X into a value on the unit interval (0,1). This is true regardless of the distribution of X. However, multivariate normality guarantees that these transformed values are distributed as a beta variable. This result is illustrated in the histogram of T2 values given in Figure 3.2. For a nonnormal multivariate distribution to produce the same result, the number of observations contained within the bins (i.e., between contours) of the T2 histogram must be close to that produced by the MVN. Some multivariate distributions appear to satisfy this condition, as was the case with the truncated normal example. What characteristics will these nonnormal distributions have in common with the MVN? The answer lies in investigating the sampling distribution of the T2 statistic.
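The claim that the transformed baseline values always land in the unit interval, whatever the distribution of X, can be checked numerically. The following minimal sketch uses deliberately skewed simulated data of our own choosing.

```python
import numpy as np

def unit_interval_t2(X):
    """Map each baseline observation to b_i = [n/(n-1)^2] T2_i."""
    n = X.shape[0]
    d = X - X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))
    t2 = np.einsum('ij,jk,ik->i', d, S_inv, d)
    return n / (n - 1) ** 2 * t2

rng = np.random.default_rng(2)
# Heavily skewed, decidedly nonnormal data:
X = rng.exponential(scale=1.0, size=(150, 2)) ** 2
b = unit_interval_t2(X)   # every value falls in [0, 1] regardless of the distribution
```

Whether these bounded values actually follow the beta distribution is exactly what the goodness-of-fit check of the previous section must decide.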

3.4

The Sampling Distribution of the T2 Statistic

Consider a random sample of size n taken from a multivariate population with a finite mean vector μ and covariance matrix Σ. When the assumption of multivariate normality is imposed, the sampling distribution of the T2 statistic can be shown to be a beta distribution when a baseline data set is used or, if appropriate, a


Figure 3.6: Bivariate skewed distribution.

Figure 3.7: Contours of skewed distribution.

chi-square or F distribution. However, when this MVN assumption is invalid, the sampling distribution of the T2 statistic must be approximated. The kurtosis of the underlying multivariate distribution plays an important role in approximating the sampling distribution of the T2 statistic. To understand this


role, consider first the kurtosis, denoted by α4, for a univariate distribution with a known mean μ and a known standard deviation σ. The kurtosis is usually defined as being the expected value of the fourth standardized moment, i.e.,

α4 = E[(x − μ)⁴/σ⁴].
This is a measure of the "peakedness" or "heaviness" of the tail of a distribution. For example, the univariate bell-shaped normal distribution has a kurtosis value of 3. Using this value as a point of reference, the kurtosis values for other distributions can be compared to it. If a distribution has a kurtosis value that exceeds 3, it is labeled "peaked" relative to the normal, and if its kurtosis value is less than 3, it is labeled "flat" relative to the normal. For example, the univariate exponential distribution has a kurtosis value of 9, whereas the univariate uniform and beta distributions have kurtosis values less than 3. We conclude that the uniform and beta distributions are "flatter" than the normal distribution and that the exponential distribution is more "peaked." The gamma, lognormal, and Weibull distributions have kurtosis values around 3 and thus are "peaked" similarly to the normal distribution. In a multivariate distribution, the kurtosis measure is closely related to the sampling distribution of the T2 statistic. Consider a p-dimensional vector X, with a known mean vector μ and a known covariance matrix Σ, and assume a T2 statistic is computed using this vector. The first moment of the T2 sampling distribution is given as E[T2] = p. The second moment, E[(T2)²], can be expressed in terms of Mardia's kurtosis statistic, denoted as β2,p (see Mardia, Kent, and Bibby (1979)), and associated with the multivariate distribution of the original observation vector X. The result is given as
E[(T2)²] = β2,p = E{[(X − μ)′Σ⁻¹(X − μ)]²}.    (3.3)
When X follows an MVN distribution, Np(μ, Σ), the kurtosis value in (3.3) reduces to

β2,p = p(p + 2),    (3.4)

where T2 is based on the formula given in (2.13). For a sample of size n, taken from a p-dimensional multivariate distribution with an unknown mean vector and covariance matrix, the sample kurtosis, b2,p, is used to estimate β2,p. The estimate is given by

b2,p = (1/n) Σ_{i=1}^{n} (T_i²)²,    (3.5)
where T2 is based on the formula given in (2.15). The relationship in (3.5) indicates that large T2 values directly influence the magnitude of the kurtosis measure. We can use the above results to relate the kurtosis value of a multivariate nonnormal distribution to that of an MVN distribution. As an example, consider two uncorrelated variables, (x1, x2), having a joint uniform distribution represented by a unit cube. Both marginal distributions are uniform and have a kurtosis value of 1.8. Thus, these are "very flat" distributions relative to a univariate normal.
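The sample kurtosis in (3.5) can be sketched as the average squared T2 value. The function name and simulated data below are ours; for MVN data the statistic should be near p(p + 2).

```python
import numpy as np

def mardia_b2p(X):
    """Sample kurtosis b_{2,p} of (3.5): the average of the squared T2 values,
    with T2 computed from the sample mean and covariance as in (2.15)."""
    d = X - X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))
    t2 = np.einsum('ij,jk,ik->i', d, S_inv, d)
    return float(np.mean(t2 ** 2))

rng = np.random.default_rng(3)
X = rng.multivariate_normal(np.zeros(3), np.eye(3), size=5000)
b2p = mardia_b2p(X)   # near beta_{2,p} = p(p + 2) = 15 for trivariate normal data
```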

Table 3.1: Comparison of kurtosis values.

No. of Uniform   No. of Exponential   Kurtosis Value for       Kurtosis Value for
Variates         Variates             Multivariate Nonnormal   Multivariate Normal
                                      Distribution             Distribution
1                1                    12.8                      8
2                1                    18.6                     15
3                1                    26.4                     24
4                1                    36.2                     35
5                1                    48.0                     48
6                1                    61.8                     63

This "flatness" carries over to the joint distribution of the two variables. Using (3.4), the kurtosis of the bivariate normal, with known parameters, is given by p(p + 2) = 2(4) = 8. The kurtosis value for the joint uniform is found by evaluating (3.3), where μ′ = (0.5, 0.5) and Σ is a diagonal matrix with entries of (1/12) on the diagonal. The value is calculated as 5.6. This implies that a bivariate uniform distribution is considerably "flatter" than a bivariate normal distribution. In contrast to the above, there are many combinations of distributions of p independent nonnormal variables that can produce a multivariate distribution that has the same kurtosis value as a p-variate normal. For example, consider a multivariate distribution composed of two independent variables: x1 distributed as a uniform (0,1) and x2 distributed as an exponential (i.e., f(x) = e^(−x) for x > 0 and zero elsewhere). Using (3.3), the kurtosis of this bivariate nonnormal distribution is 12.8. In comparison, the kurtosis of a bivariate normal distribution is 8. Thus, this distribution is heavier in the tails than a bivariate normal. However, suppose we keep adding another independent uniform variate to the above nonnormal distribution and observe the change in the kurtosis value. The results are provided in Table 3.1. As the number of uniform variables increases in the multivariate nonnormal distribution, the corresponding kurtosis value of the MVN distribution approaches and then exceeds the kurtosis value of the nonnormal distribution. Equivalence of the two kurtosis values occurs at the combination of five uniform variables and one exponential variable. For this combination, the tails of the joint nonnormal distribution are similar in shape to the tails of the corresponding normal. The result also implies that the T2 statistic based on this particular joint nonnormal distribution will have the same variance as the T2 statistic based on the MVN distribution.
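For independent components, the expectation in (3.3) separates into the sum of the marginal kurtosis values plus p(p − 1) cross terms, which reproduces the entries of Table 3.1. A small sketch of this arithmetic follows; the function name is ours.

```python
def beta_2p_independent(kurtoses):
    """Kurtosis (3.3) for a vector of independent variables: writing the
    standardized components as z_i, E[(sum z_i^2)^2] equals the sum of the
    marginal kurtosis values plus p(p - 1) cross terms E[z_i^2]E[z_j^2]."""
    p = len(kurtoses)
    return sum(kurtoses) + p * (p - 1)

UNIFORM_KURT, EXPONENTIAL_KURT = 1.8, 9.0

# u uniform variates plus one exponential, as in Table 3.1:
nonnormal = {u: beta_2p_independent([UNIFORM_KURT] * u + [EXPONENTIAL_KURT])
             for u in range(1, 7)}
normal = {u: (u + 1) * (u + 3) for u in range(1, 7)}   # p(p + 2) with p = u + 1
```

For u = 5 the two columns coincide at 48, matching the equivalence point noted in the text.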
This example indicates that there do exist combinations of many (independent) univariate nonnormal distributions with the same kurtosis value as that achieved under an MVN assumption. For these cases, the mean and variance of the T2 statistic based on the nonnormal data are the same as for the T2 statistic based on the corresponding normal data. This result does not guarantee a perfect fit of the T2 sampling distribution to a beta (or chi-square or F) distribution, as this would require that all (higher) moments of the sampling distribution of the T2 statistic be identical to those of the corresponding distribution. However, such agreement of


the lower moments suggests that, in data analysis using a multivariate nonnormal distribution, it may be beneficial to determine if the sampling distribution of the T2 statistic fits a beta (or chi-square or F) distribution. If such a fit is obtained, the data can then be analyzed as if the MVN assumption were true.

3.5

Validation of the T2 Distribution

A popular graphical procedure that is helpful in assessing if a set of data represents a reference distribution is a quantile-quantile (Q-Q) plot (e.g., see Gnanadesikan (1977) or Sharma (1995)). Thus, this technique can be used in assessing the sampling distribution of the T2 statistic. We emphasize that the Q-Q plot is not a formal test procedure but simply a visual aid for determining if a set of data can be approximated by a known distribution. Alternatively, goodness-of-fit tests can also be used for such determinations. For example, with baseline data, we could construct a Q-Q plot of the ordered sample values, denoted by x(i) = [n/(n − 1)²] T(i)², against the corresponding quantiles, q(i), of the reference distribution. If [n/(n − 1)²] T2 can be described by the appropriate beta distribution, B[p/2, (n − p − 1)/2], the beta quantiles are computed from the following integral equation:

∫₀^q(i) f(b) db = (i − 0.5)/n,    i = 1, 2, ..., n,

where f(b) denotes the density of the B[p/2, (n − p − 1)/2] distribution.
If a Q-Q plot of the data results in an approximately straight line, it can be concluded that the distribution of the data is not different from the reference distribution. A straight line, with a slope of 1 and an intercept of 0, indicates an excellent fit of the empirical distribution to the hypothesized distribution. A straight-line plot with an intercept different from zero indicates a difference in location for the two distributions, while a distinct curve in the plot suggests a difference in variation. As illustrated in later chapters, the Q-Q plot can provide an additional method of locating atypical observations. To demonstrate the Q-Q plot for use in determining if the T2 statistic follows a beta distribution, we generate a sample of size n = 54 from a bivariate normal distribution. The corresponding T2 values are computed and converted to beta values using (3.1). A Q-Q plot corresponding to a beta distribution, with parameters equal to 1 and 26, is constructed and presented in Figure 3.8. The plot displays an approximate linear trend, along a 45-degree line, though the last five to six points are slightly below the projected trend line. This pattern suggests that the T2 data for this sample follow a beta distribution, a conclusion that is expected given that the underlying data were generated from a bivariate normal distribution. From Table 3.1, we noted that the joint distribution of five independent uniform variables with an additional independent exponential variable has the same kurtosis value as an MVN with six variables. The mean and variance of the corresponding T2


Figure 3.8: Q-Q plot of generated bivariate normal data.

Figure 3.9: T2 chart based on simulated data from nonnormal distribution.

sampling distributions are also equal. Under the MVN assumption, the T2 statistic follows a chi-square distribution with p degrees of freedom. We now illustrate the appropriateness of the same chi-square distribution for the T2 statistic generated from the nonnormal distribution. Two hundred observations for five independent uniform variables and one independent exponential are generated. The T2 chart for the 200 observations is presented in Figure 3.9, where UCL = 16.81 is based on α = 0.01. The corresponding Q-Q plot using chi-square quantiles is presented in Figure 3.10. Our interest lies in the tail of the distribution. For α = 0.01, we would expect two values greater than the chi-square value of 16.81. Although two T2 values in Figure 3.9 are near the UCL, the T2 chart indicates no signaling T2 values. These two large T2 values are located in the extreme right-hand corner of the Q-Q plot in Figure 3.10. The Q-Q plot follows a linear trend and displays little deviation from


Figure 3.10: Q-Q plot based on simulated data from nonnormal distribution.

Figure 3.11: T2 chart based on simulated data from nonnormal distribution.

it. Thus, the T2 values appear to follow a chi-square distribution despite the fact that the underlying multivariate distribution is nonnormal. In contrast, consider a bivariate nonnormal distribution based on one independent uniform and one independent exponential variable. From Table 3.1, it is noted that the kurtosis of this distribution is larger than that of a bivariate normal. This implies a heavier tail (i.e., more large T2 values) than would be expected under normality. We generate 200 observations from this nonnormal distribution and construct the T2 chart presented in Figure 3.11 using UCL = 9.21 and α = 0.01. With this α, we would expect to observe two signals in the chart. However, there are five signaling T2 values, indicating the heavy-tailed distribution expected from the high kurtosis value. The corresponding Q-Q plot for these data is presented in Figure 3.12. Note the severe deviation from the trend line in the upper tail of the plot of the T2 values.
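The coordinates of a beta Q-Q plot of the kind used in these figures might be computed as follows. This is a sketch under our own conventions: the plotting positions (i − 0.5)/n are one common choice, and the helper name is an assumption.

```python
import numpy as np
from scipy import stats

def beta_qq_points(X):
    """Coordinates for a beta Q-Q plot of baseline T2 values: the ordered
    x_(i) = [n/(n-1)^2] T2 against the beta quantiles q_(i) taken at the
    plotting positions (i - 0.5)/n."""
    n, p = X.shape
    d = X - X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))
    t2 = np.einsum('ij,jk,ik->i', d, S_inv, d)
    x_ord = np.sort(n / (n - 1) ** 2 * t2)
    probs = (np.arange(1, n + 1) - 0.5) / n
    q = stats.beta.ppf(probs, p / 2, (n - p - 1) / 2)
    return q, x_ord

rng = np.random.default_rng(4)
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.45], [0.45, 1.0]], size=103)
q, x_ord = beta_qq_points(X)   # points near the 45-degree line indicate a good fit
```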


Chapter 3. Checking Assumptions for Using a T2 Statistic

Figure 3.12: Q-Q plot based on simulated data from nonnormal distribution.

Figure 3.13: Scatter plot of bivariate data.

The conclusion is that a chi-square distribution does not provide a good fit to these data. To illustrate the appropriateness of the use of the beta distribution to describe the sampling distribution of the T2 statistic, we consider 104 bivariate observations taken from an actual industrial process. A scatter plot of the observations on the variables x1 and x2 is presented in Figure 3.13. Observe the elongated elliptical shape of the data swarm. This is a characteristic of the correlation, r = 0.45, between the two variables and not of the form of their joint distribution. Observe also the presence of one obvious outlier that does not follow the pattern established by the bulk of the data. The presence of outliers poses a problem in assessing the distribution of the T2 statistic. Thus, we must remove the outliers and recompute the T2 values of the remaining data. These remaining values are plotted in the T2 chart given in Figure 3.14. There remain some large T2 values associated with some of the observations, but none are signals of out-of-control points. Observations of this



Figure 3.14: T2 chart of bivariate data.

Figure 3.15: Q-Q plot of T2 values of bivariate data.

type (potential outliers) are not removed in this example, although they could possibly affect the fit of the corresponding beta distribution. The corresponding Q-Q plot for these data is presented in Figure 3.15. Since p = 2 and n = 103, the beta distribution fit to the data, using (3.1), is B(1, 50). From inspection, the Q-Q plot exhibits a very strong linear trend that closely follows the 45° line imposed on the plot. This indicates an excellent fit between the T2 sampling distribution and the appropriate beta distribution. A question of interest is whether the above beta distribution describes the process data because the actual observations follow a bivariate normal distribution or because the fit provides a good approximation to the sampling distribution. To address this question, we examine estimates of the marginal distributions of the individual variables x1 and x2. If the joint distribution is bivariate normal, then each marginal
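The beta UCL for this example can be computed directly. A sketch, assuming the standard Phase I relation from (3.1) that T2 is distributed as ((n − 1)²/n) times a B(p/2, (n − p − 1)/2) variate when the observation is included in the estimates:

```python
from scipy import stats

n, p, alpha = 103, 2, 0.01
a, b = p / 2, (n - p - 1) / 2        # beta parameters: B(1, 50) here
scale = (n - 1) ** 2 / n             # T2 is scale times a Beta(a, b) variate
ucl = scale * stats.beta.ppf(1 - alpha, a, b)
print(a, b, round(ucl, 2))           # → 1.0 50.0 8.89
```

The ordered T2 values plotted against `scale * stats.beta.ppf(plotting_positions, a, b)` gives the Q-Q plot of Figure 3.15.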



Figure 3.16: Histogram for x1.

Figure 3.17: Histogram for x2.

distribution must follow a univariate normal distribution. Since histograms of the individual variables provide a good indicator of the marginal distributions, these are presented in Figures 3.16 and 3.17. Neither variable appears to have a distribution similar to the normal curves superimposed on the histograms, which are based on the means and standard deviations of the individual variables. If the marginal distributions were normal, we would expect agreement between these normal curves and the corresponding histograms. However, the distribution of variable x1 appears to be flat and skewed to the left, while the distribution of x2 appears to be somewhat flat and skewed to the right. This is supported by a kurtosis value of 2.68 for the distribution of x1, and a value of 2.73 for the distribution of x2 (relative to a value of 3 for a normal distribution). From these observations, we conclude that the joint distribution of the two variables is nonnormal. The good fit of the beta distribution to the data



appears to be due to its excellent approximation to the sampling distribution of the T2 statistic.

3.6

Transforming Observations to Normality

The closer that observations are to fitting a MVN distribution, the better the beta distribution will be in describing the behavior of the T2 statistic. For this reason, we examine, in terms of the MVN, some major reasons why goodness-of-fit test procedures might lead to rejection of the appropriate T2 distribution. The MVN distribution is symmetric about its mean and has long thin tails away from its mean. Thin tails imply that the concentration of probability drops off very rapidly as the statistical distance of the data from the mean increases. Thus, the number of points within the contours will diminish as the distance from the mean point increases. Also, symmetry of the MVN distribution implies a symmetric dispersion of points within the contours. When a beta distribution for the T2 values is rejected, it may be a result of the fact that the data are from a nonnormal distribution having (true) contours that are nonelliptical in shape (e.g., see Figure 3.7). This can occur when the tails of the distribution become thick in one or more directions. It also may be due to the fact that the distribution of the data is skewed in one direction so that the points are not symmetrically dispersed within the contours. Outliers can be a problem in that they can distort the estimate of the mean vector as well as of the covariance matrix. This in turn may lead to contours that are odd-shaped and off-center. Under the assumption of multivariate normality, the probability functions governing the individual variables for an MVN are univariate normal. Also, any subset of variables following an MVN is jointly normally distributed, and any linear combination of the variables is univariate normal. Unfortunately, the converses of these statements are not true. For example, it is possible to construct data sets that have marginal distributions that are normally distributed, yet the overall joint distribution can be non-normal. 
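SciPy has no turnkey fitter for the Johnson system mentioned above, so the sketch below uses a Box-Cox transformation instead, a different but widely used per-variable normalizing transform, just to show the general idea of transforming a skewed variable before recomputing the T2 values (the data and seed are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.exponential(size=200)      # a positive, right-skewed variable

xt, lam = stats.boxcox(x)          # maximum-likelihood Box-Cox transform
print(round(stats.skew(x), 2), round(stats.skew(xt), 2))
```

The skewness of the transformed variable is far closer to zero, the value expected under normality; the same check would be applied to each variable before retesting the T2 fit.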
Nevertheless, although single-variable normality does not guarantee overall multivariate normality, it does improve the overall symmetry of the joint distribution, and this is an important issue when applying the T2 statistic. If the T2 values cannot be shown to follow a beta distribution, it may be possible to transform the individual variables so that the T2 values based on the transformed values follow a beta distribution. One simple means to achieve this might be to transform the individual variables so that each is approximately normally distributed. For example, several methods have been proposed for using the Johnson system to transform a nonnormal variate to univariate normality (e.g., see Chou, Polansky, and Mason (1998)). Tests may then be conducted to determine if the T2 based on the set of transformed variables follows a beta distribution. A similar but more complex technique for making observations more normal is to use a multivariate transformation so that the entire observation vector has an approximate MVN distribution (e.g., see Johnson and Wichern (1998), Velilla (1993)). Also, the transformation based on the above Johnson system has been extended to the multivariate case (e.g., see Johnson (1987)). The difficulty in using

48


these types of transformations is in making inferences from the transformed data back to the original data.

3.7

Distribution-Free Procedures

When the beta distribution for the T2 values is rejected and cannot be used in determining the UCL for a T2 control chart, alternative procedures are needed. One simple but conservative method of determining a UCL is based on an application of Chebyshev's theorem (e.g., see Dudewicz and Mishra (1988)). The theorem states that, regardless of the distribution of x,

P(|x − μ| ≤ kσ) ≥ 1 − 1/k²,

where k is a chosen constant such that k > 1 and where μ and σ² are the mean and variance, respectively, of x. For example, the probability that a random variable x will take on a value within k = 3.5 standard deviations of its mean is at least 1 − 1/k² = 1 − 1/(3.5)² = 0.918. Conversely, the probability that x would take on a value outside this interval is no greater than 1/k² = 1 − 0.918 = 0.082. To use the Chebyshev procedure in a T2 control chart, calculate the mean, m, and the standard deviation, s, of the T2 values obtained from the HDS. Using these as estimates of the mean and standard deviation of the T2 distribution, an approximate UCL is given as

UCL = m + ks.

The value of k is determined by selecting α, the probability of observing values of x outside the interval bounded by μ ± kσ, and solving the equation

α = 1/k², i.e., k = 1/√α.
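The Chebyshev UCL is simple arithmetic. A minimal sketch (the function name is mine; the two-sided bound is used, which makes the limit conservative):

```python
import math

def chebyshev_ucl(t2_values, alpha=0.01):
    """UCL = mean + k * sd of the HDS T2 values, with k = 1/sqrt(alpha),
    so that Chebyshev's theorem bounds the false alarm rate by alpha."""
    n = len(t2_values)
    mean = sum(t2_values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in t2_values) / (n - 1))
    return mean + sd / math.sqrt(alpha)

# k = 3.5 bounds the outside-interval probability by 1/3.5^2
print(round(1 / 3.5 ** 2, 3))   # → 0.082
```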

Another simple method for estimating the UCL of a T2 statistic is based on finding a distribution-free confidence interval (CI) for the UCL. This approach uses the fact that the UCL represents the (1 − α)th quantile of the T2 distribution, where α is the false alarm rate. A brief summary of the approach is given in this chapter's appendix (section 3.11); Conover (2000) gives a detailed discussion of both one-sided and two-sided CIs for the pth quantile of a distribution. A third method of obtaining an approximate UCL is to fit a distribution to the T2 statistic using the kernel smoothing technique (e.g., see Chou, Mason, and Young (2001)). We then can estimate the UCL using the (1 − α)th quantile of the fitted kernel distribution function of the T2. An example of a fitted set of data and the corresponding smoothed distribution function is illustrated in Figure 3.18. The kernel smoothing approach can provide a good approximation to the UCL of the T2 distribution provided the sample size is reasonably large (i.e., n > 250). However, the estimate will be biased. A detailed example using this method to remove outliers is provided in Chapter 5. For smaller samples, other approaches must be utilized.
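The kernel approach can be sketched with SciPy's Gaussian KDE; the grid-based CDF inversion below is a crude numerical stand-in for the fitted distribution function (the simulated T2 values and grid settings are mine):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
t2 = rng.chisquare(df=4, size=300)        # stand-in for observed T2 values

kde = stats.gaussian_kde(t2)              # kernel-smoothed density
grid = np.linspace(0.0, t2.max() * 1.5, 4000)
cdf = np.cumsum(kde(grid))
cdf /= cdf[-1]                            # crude numerical CDF of the fit
ucl = grid[np.searchsorted(cdf, 0.99)]    # (1 - alpha)th quantile, alpha = 0.01
print(round(float(ucl), 2))
```

With real data, `t2` would be the HDS T2 values; the approach presumes a reasonably large sample, as noted above.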



Figure 3.18: Kernel smoothing fit to T2 histogram.

3.8

Choice of Sample Size

Many different variables need to be examined in a multivariate process, and many different parameters need to be estimated. These requirements result in the need for large sample sizes. For example, in a steam turbine study, the variables of interest might include temperature, pressure, feed flow, and product flow. Due to differences in the standard deviations of these variables, each individual variable might require a separate estimated sample size to achieve a given precision. The problem, then, is to combine these different estimates into a single value of n. One useful solution to this problem is to use the largest estimated sample size for the individual variables as the overall sample size (see Williams (1978)). However, we need to be certain that this choice meets certain conditions. Using the T2 statistic requires a sample in which the number of observations n exceeds the number of variables p. If p ≥ n, neither the inverse covariance matrix Σ⁻¹ nor its estimate S⁻¹ can be computed. However, this is a minimal requirement. In addition, a large number of parameters must be estimated when the mean vector and covariance matrix are unknown. To provide stable and accurate estimates of these parameters, n must be sufficiently large. This occurs because there are p means, p variances, and [p(p − 1)/2] covariances, or a total of 2p + [p(p − 1)/2] parameters, to be estimated. For large p, this number can be significant. For example, for p = 10 we must estimate 65 parameters, while for p = 20 the number increases to 230. From the above discussion, one can see that the sample size for a multivariate process can be large, particularly when p is large, as there are many parameters to estimate. Other considerations also govern the choice of n. For example, in choosing a preliminary sample for a steam turbine system one might want to include observations under normal operational conditions.
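The parameter count grows quadratically in p, as a one-line check confirms (the function name is mine):

```python
def parameter_count(p):
    """p means + p variances + p(p - 1)/2 covariances."""
    return 2 * p + p * (p - 1) // 2

print(parameter_count(10), parameter_count(20))   # → 65 230
```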
To obtain an adequate profile of ambient conditions would require observations from an entire year of operation. For



Figure 3.19: Scatter plot with a discrete variable.

replication, one would also need multiple observations at various loads (megawatt production) for the different temperatures. An alternative solution to choosing a large sample size is to seek to reduce the dimensionality of the multivariate problem. This can be achieved by reducing the number of parameters that need to be estimated. One useful solution to this problem involves applying principal component analysis (Jackson (1991)).

3.9

Discrete Variables

The MVN distribution is generally used to describe observations on a set of continuous random variables. These include variables that can take on all possible values in a given interval. However, this does not imply that all components of the observation vector must be continuous variables and that discrete variables are prohibited. For example, consider the plot of a hypothetical bivariate sample of data taken on shoe size and body weight that is given in Figure 3.19. The data are concentrated at the different shoe sizes and these occur only at discrete values. With the assumption of a bivariate normal distribution, we would obtain the elliptical region that is superimposed on the plot. In this example, the T2 statistic is usable because there are several different categories for the discrete variable.

3.10

Summary

Fundamental to the use of any statistic as a decision-making tool is the probability function describing its behavior. For the T2 statistic, this is either the chi-square, the beta, or the F distribution. Multivariate normal observation vectors are the basis for these distributions. Since multivariate normality is not easily validated,



an alternative approach is to validate the distribution of the T2 statistic. We offer a number of procedures for accomplishing this task. A Q-Q plot of the ordered T2 values plotted against the quantiles of the beta (or chi-square, or F) distribution is a useful graphical procedure for validating the T2 distribution. Strong linearity of the plotted points suggests that the observed T2 values can be described by the reference distribution. If graphical procedures are too imprecise, we recommend performing a goodness-of-fit test on the observed T2 values. Many of the common goodness-of-fit tests can be used for this purpose. If the above tests indicate that the T2 values do not provide a good fit to the required beta, chi-square, or F distribution, several alternatives are available for finding the required UCL to use in the T2 control chart. One approach is to transform the individual nonnormal variables to normality. If such transformations can be found, the observed T2 values could then be retested for the appropriate fit. Other procedures involve estimating the UCL using a nonparametric approach based on kernel smoothing, the quantile technique, or Chebyshev's inequality. Sample size selection is always a concern when performing a multivariate analysis. For the T2 control chart, this means having enough observations to estimate the unknown parameters of the mean vector and covariance matrix. As the number of parameters increases, the required sample size can become quite large. This is a restriction that must be considered in any application of the T2 methodology.

3.11

Appendix: Confidence Intervals for UCL

Suppose T2(1) < T2(2) < ··· < T2(n) represent n ordered sample T2 values and let α = 0.01. Also let Qp represent the pth quantile of the T2 distribution. A 100γ% CI for Q0.99 (the UCL) is given by [T2(r), T2(s)] and satisfies

P[T2(r) ≤ Q0.99 ≤ T2(s)] ≥ γ,

where r and s are determined by the inequality

Σ_{i=r}^{s−1} C(n, i)(0.99)^i (0.01)^{n−i} ≥ γ    (A3.1)

and s − r is a minimum. For large n, one may approximate r and s by the two values

r ≈ 0.99n − z(γ/2)√(n(0.99)(0.01))  and  s ≈ 0.99n + z(γ/2)√(n(0.99)(0.01)),    (A3.2)

where z(γ/2) is the upper γ/2 quantile of the standard normal distribution. The CIs obtained from (A3.1) and (A3.2) are generally very similar when n is large and γ ≈ 0.95. We choose to use the inequality in (A3.1) to obtain r and s. From (A3.1), one can be at least 100γ% sure that the UCL is somewhere between T2(r) and T2(s). Since there is an infinite number of values between T2(r) and T2(s), there are infinitely many choices for the UCL. For convenience, we choose the midpoint of the interval as an approximate value for the UCL. It is given by

UCL ≈ [T2(r) + T2(s)]/2.
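The pair (r, s) can be found by direct search over the binomial sums. A sketch (the function name is mine, following the order-statistic CI described in Conover (2000)); it returns the 1-indexed pair of smallest width:

```python
import numpy as np
from scipy import stats

def quantile_ci_indices(n, q=0.99, gamma=0.95):
    """Smallest-width pair (r, s), 1-indexed, with
    sum_{i=r}^{s-1} C(n, i) q^i (1-q)^(n-i) >= gamma, so that
    [T2_(r), T2_(s)] covers the qth quantile with confidence gamma."""
    cdf = stats.binom.cdf(np.arange(-1, n + 1), n, q)   # cdf[k + 1] = P(X <= k)
    best = None
    for r in range(1, n + 1):
        for s in range(r + 1, n + 1):
            if cdf[s] - cdf[r] >= gamma:                # coverage of (r, s)
                if best is None or s - r < best[1] - best[0]:
                    best = (r, s)
                break                                   # larger s only adds width
    return best

r, s = quantile_ci_indices(500)
# the midpoint (T2_(r) + T2_(s)) / 2 of the sorted T2 values then
# approximates the UCL, as in the text
```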


Chapter 4

Construction of Historical Data Set

Old Blue: The Research Center


You enter the research center and immediately start your search for information on SPC. You are amazed at the number of books on univariate process control. Here is one by Montgomery and another by Ryan. However, none of these books appears to address multivariate process control in the detail that you require. You decide to change your approach. Using the key word "multivariate, " you again search for textbooks. Here is one by Anderson, on multivariate analysis, that might contain some helpful information, but it is at the graduate level. Ah, here is one titled, Multivariate Statistical Process Control with Industrial Applications. Could this contain the solution to the problem? A research center associate quickly secures you a copy and you sit down to read it. You are encouraged that the preface states the book is written for the practitioner and from an engineering perspective. You are quickly absorbed in reading and the next few hours slip away. Soon the research associate is asking if you need anything before the lunch break. Your response is, "Yes, I need a personal copy of this book." Back in the office, while you are pondering what you have learned, you hear the boss asking, "Any progress?" With confidence restored, you respond with a very firm "Yes." To solve the problem with Old Blue, you must compare the new data containing upset conditions to past data taken when the process was in control. The in-control data is used to create a baseline or HDS. This was explained in the book as a Phase I operation. This is no different from what you were taught in univariate process control by . . . what was his name? However, the book contains two chapters on this topic. This calls for closer examination. The monitoring of new operational data to ascertain if control is being maintained is referred to as a Phase II operation. You continue thinking about what you have read as you open your lunch bag.



The statistic used to make the needed comparison is a Hotelling's T2. You twice read the section explaining how this is done. The theory is complex, but from an intuitive point of view, you now understand how multivariate control procedures work. A T2 statistic, the multivariate analogue of a common t-statistic, can assess all 35 variables at the same time. It is written as a quadratic form in matrix notation. You never appreciated that course in matrix algebra until now. It is all very amazing. Suddenly, you realize you still have a most serious problem. How is all of this computing to be done? You can't do it with your favorite spreadsheet without spending days writing macros. How was it done in the text? What software did they use in all of their data examples? A quick search provides the answer, QualStat, a product of InControl Technologies, Inc. You note that a CD-ROM containing a demonstration version of this program is included with the book. (This chicken salad sandwich is good. You must remember to tell the cafeteria staff that their new recipe is excellent.) Following the instructions on your computer screen, you quickly upload the software. Now, you are ready to work on a Phase I operation and create an HDS.

4.1

Introduction

An in-control set of process data is a necessity in multivariate control procedures. Such a data set, often labeled historical, baseline, or reference, provides the basis for establishing the initial control limits and estimating any unknown parameters. However, the construction of a multivariate HDS is complicated and involves problem areas that do not occur in a univariate situation. It is the purpose of this chapter to explore in detail some of these problem areas and offer possible solutions. The development of the HDS is referred to as a Phase I operation. Using it as a baseline to determine if new observations conform to its structure is termed a Phase II operation. Since there is only one variable to consider, univariate Phase I procedures are easy to apply. Upon deciding which variable to chart, one collects a sample of independent observations (preliminary data) on this variable from the in-control process. The resulting data provide initial estimates of the parameters that characterize the distribution of the variable of interest. The parameter estimates are used to construct a preliminary control procedure whose major purpose is to purge the original data set of any observations that do not conform to the structure of the HDS. These nonconforming or atypical observations are labeled outliers. After the outliers are removed from the preliminary data set, new estimates of the parameters are obtained and the purging process is repeated. This is done as many times as necessary to obtain a homogeneous data set as defined by the control procedure. After all outliers are removed, the remaining data is referred to as the HDS. The role of a multivariate HDS is the same as in the univariate situation. It provides a baseline for the control procedure by characterizing the in-control process. However, construction of a historical data set becomes more complicated when using multivariate systems. 
For example, we must decide which variables to include and their proper functional forms. This determination may require in-depth process knowledge.
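The univariate purge cycle described above — estimate, flag, remove, re-estimate — can be sketched in a few lines (the 3-sigma limits and names are my choices, not the text's):

```python
import numpy as np

def purge_to_hds(x, k=3.0, max_iter=25):
    """Iteratively drop observations outside mean +/- k*sd, re-estimating
    the limits after each pass, until the remaining data are homogeneous."""
    x = np.asarray(x, dtype=float)
    for _ in range(max_iter):
        m, s = x.mean(), x.std(ddof=1)
        keep = np.abs(x - m) <= k * s
        if keep.all():
            break
        x = x[keep]
    return x

# one gross outlier (90.0) planted in otherwise in-control data
data = np.append(np.random.default_rng(1).normal(50.0, 2.0, 100), 90.0)
hds = purge_to_hds(data)
print(len(data), len(hds))
```

The multivariate analogue replaces the k-sigma limits with a T2 control limit, but the iterate-and-purge logic is the same.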



Planning: Establish Goals; Study and Map Process; Define Good Process Operations; Collect Preliminary Data Set; Verify Data Quality; Filter Data

Detecting Data Problems:
    Collection Procedures: Human Errors; Electronic Errors
    Variable Form: Theoretical Relationships; Empirical Relationships; Transformations
    Missing Data: Estimation; Deletion
    Collinearity: Effects; Detection & Removal
    Autocorrelation: Effects; Detection
    Outliers: Detection; Purging Process; Alternative Covariance Estimators

Construct Final Historical Data Set

Chart 4.1: Guidelines for constructing an HDS.



The production of caustic soda and chlorine gas is a major industry in the United States. One method of production is through the electrolysis of brine (i.e., salt water). This work is done in an electrolyzer that is composed of one or more cells. A cell is the basic unit where the conversion takes place. The major purpose of a control procedure is to locate cells whose conversion efficiency has dropped so that they can be treated to restore their efficiency. The brine (feed stock) for an electrolyzer must be treated to remove impurities. This operation takes place in a brine treatment facility. The primary purpose of a control procedure on this unit is to maintain the quality of the feed stock for the electrolyzer. "Bad" brine has the potential of destroying the cells and contaminating the caustic soda being produced. From the electrolyzer, caustic soda is produced in a water solution. The water is removed through evaporation in an evaporating unit. A control procedure on this unit maintains maximum production for a given set of run conditions, maintains the desired caustic strength, and helps locate sources of problems. One method of transporting the finished caustic product is by railroad tank cars. Overweight tank cars present a number of major problems. Control procedures on the loading of the tank cars can ensure that no car will be loaded above its weight limit. Control procedures on steam and gas turbines, used in the production of electricity for the electrolysis of the brine, detect deviations in the efficiency of the turbines. Also, they are used to locate sources of problems that occur in operations. Boilers, used in steam production for the evaporation of water, are controlled in a similar fashion. Control procedures on large pumps and compressors are used for maintenance control to detect any deviation from a set of "ideal" run conditions. It is less expensive to replace worn parts than to replace a blown compressor.
Control procedures on various reactors are used to maintain maximum efficiency for a given set of run conditions and to locate the source of the problem when upsets occur.

4.3

Preliminary Data

The first step in the development of an HDS is to obtain a preliminary data set. This is a sample of observations taken from the process while it is "in control." However, "in control" must be defined. For example, the purpose of the desired control procedure may be to keep the process on target or to minimize some unwanted variation in production. How is a sample containing the desired information obtained? Generally, you are presented with a block of process data collected during a given time period. The data may be unfiltered and may contain information taken when the process was in control as well as out of control. In addition, the resultant sample may include data from different operational levels, different product formulations, different feed stocks, etc. The data may provide a genuine history of the process over the given time period but may not be in the desired form necessary to construct an HDS.


Table 4.1: Chemical process data.

Obs.   X1    X2   X3    X4     X5    X6    X7   X8
 1    2020  165  661   5.90   1.60  0.28  0.56  86
 2    2020  255  675   7.00   5.00  0.26  0.61  84
 3    2014  266  675   7.00   3.90  0.27  0.60  85
 4    1960  270  900   3.00   8.00  0.27  0.65  89
 5    1870  185  850   4.00   4.80  0.30  0.62  90
 6    1800  195  590   3.00   4.00  0.30  0.52  87
 7    1711  201  663   4.00   3.60  0.29  0.61  88
 8    1800  250  875   0.00   6.00  0.33  0.59  88
 9    2011  182  710   2.40   7.70  0.29  0.59  86
10    1875  175  600   4.30   6.00  0.29  0.57  97
11    2099  252  566   5.70   7.20  0.35  0.68  86
12    2175  270  535   4.50  10.00  0.33  0.60  86
13    1226  216  495   0.00   9.60  0.29  0.56  86
14    1010  180  520   0.00   8.00  0.31  0.58  92
15    2041  192  692   6.80   9.40  0.28  0.60  92
16    2040  225  700   7.50  10.00  0.32  0.64  90
17    2330  131  483   7.40   7.90  0.29  0.62  88
18    2330  160  600   5.50   8.00  0.26  0.53  92
19    2250  241  523   7.90   6.20  0.31  0.58  90
20    2250  195  480   6.50   8.50  0.31  0.60  90
21    2351  177  679   1.80   7.90  0.32  0.61  94
22    2350  135  640   3.30   5.50  0.28  0.58  95
23    1977  181  705   6.70   3.30  0.29  0.60  87
24    2125  200  830   6.50   4.50  0.31  0.59  90
25    2033  189  830   6.50   5.90  0.32  0.60  90
26    1850  150  800   5.80   5.00  0.32  0.64  92
27    1904  103  970   0.00   4.10  0.24  0.51  93
28    1950  125  670   6.50   4.00  0.29  0.52  88
29    1795  290  629   5.10   5.70  0.35  0.63  90
30    2060  240  590   6.00   4.00  0.32  0.65  88

To understand these concepts, consider the data given in Table 4.1. It consists of a sample of 30 observations taken on eight variables, (X1, X2, ..., X8), measured on a chemical process. It is assumed at this stage of data investigation that a decision has been made as to the purposes and types of control procedures required for this process. Suppose it is desired to construct a control procedure using only the observations on the first seven variables presented in Table 4.1. Further, suppose it is most important to maintain variable X4 above a critical value of 5. Any drifting or changes in relationships of the other process variables from values that help maintain X4 above its critical value need to be detected so that corrective action can be taken. Initially, the data must be filtered to obtain a preliminary data set from which the HDS can be constructed. There are 17 observations with X4 above its critical value of 5. The obvious action is to sort the data on X4 and remove all observations in which X4 has a value below its critical value. This action should produce a set of data with the desired characteristics.
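The filtering step is easy to verify numerically. Using the X4 column transcribed from Table 4.1:

```python
import numpy as np

# X4 column of Table 4.1 (the variable with critical value 5)
x4 = np.array([5.90, 7.00, 7.00, 3.00, 4.00, 3.00, 4.00, 0.00, 2.40, 4.30,
               5.70, 4.50, 0.00, 0.00, 6.80, 7.50, 7.40, 5.50, 7.90, 6.50,
               1.80, 3.30, 6.70, 6.50, 6.50, 5.80, 0.00, 6.50, 5.10, 6.00])

in_spec = x4 > 5                       # Group 2: runs with X4 above its critical value
print(int(in_spec.sum()))              # → 17
print(round(x4[~in_spec].mean(), 2))   # Group 1 mean of X4, cf. Table 4.2 → 2.33
```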

Table 4.2: Group means of first seven variables.

Group     X1     X2    X3     X4    X5    X6    X7
1        1849   195   694    2.33  6.55  .295  .583
2        2070   201   657    6.48  6.00  .300  .600
% Diff  -10.7  -3.1   5.6  -64.0   9.1  -1.7  -2.8

Figure 4.1: Time-sequence plot of megawatts and fuel usage.

The filtering of data can provide valuable process information. For example, suppose the out-of-specification runs (i.e., X^ < 5) are labeled Group 1 and the inspecification runs (i.e., X^ > 5) are labeled Group 2. The means of the variables of the two groups are presented in Table 4.2. Valuable process information is obtained by closely examining the data. A mean difference on variable 4 is to be expected since it was used to form the groups. However, large percentage differences in the means are observed on variables 1 and 5, and a moderate difference is observed on variable 3. Further investigation is needed in determining how these variables are influencing variable 4. A preliminary data set should be thoroughly examined using both statistical procedures and graphical tools. For example, consider the graph presented in Figure 4.1 of fuel consumption, labeled Fuel, and megawatt-hours production, labeled Megawatts (or MW), of a steam turbine over time (in days of operation). Close examination of the graph produces interesting results. Note the valleys and peaks in the MW trace. These indicate load changes on the unit, whereas the plateaus reflect production at a constant load. When the load is reduced, the MW usage curve follows the fuel graph downward. Similarly, the MW graph follows the fuel graph upwards when the load is increased. This trend indicates there is a lag,

60

Table 4.3: Data for lag comparison.

Obs.    x       y     ylag1
 1     7.84  116.93      —
 2     9.14  117.45  116.93
 3     9.20  118.04  117.45
 4     9.10  111.40  118.04
 5     9.21   85.96  111.40
 6     8.62  117.24   85.96
 7     6.78  117.42  117.24
 8     9.14  117.66  117.42
 9     9.15  118.42  117.66
10     9.13  116.84  118.42
11     9.29  116.50  116.84
12     8.95  116.76  116.50
13     9.09  108.05  116.76
14     9.18   90.34  108.05
15     8.33   79.01   90.34
16     8.63  113.24   79.01
17     4.32  115.50  113.24
18     8.74  111.92  115.50
19     8.91  118.54  111.92
20     8.70  118.51  118.54
21     9.20  119.25  118.51
22     9.18   82.01  119.25
23     8.76   68.89   82.01
24     6.25   77.76   68.89
25     5.18   81.05   77.76
26     5.94  101.77   81.05
27     6.13  111.15  101.77

during a load change, in the response time of the turbine to the amount of fuel being supplied. This is very similar to the operation of a car, since accelerating or decelerating it does not produce an instantaneous response. Lags in only part of the data, as seen in the example in Figure 4.1, often can be easily recognized by graphical inspection. Other methods, however, must be used to detect a lag time extending across an entire processing unit. Observations across a processing unit are made at a single point in time. Before one can use the observations in this form, there must be some guarantee that the output observations match the input observations. Otherwise, the lag time must be determined and the appropriate parts of the observation vector shifted to match the lag. Some processes, such as strippers used to remove unwanted chemical compounds, work instantaneously from input to output. Other processes, such as silica production, have a long retention time from input to output. Consultation with the operators and process engineers can be most helpful in determining the correct lag time of the process. A helpful method for determining if lag relationships exist between two variables is to compute and compare their pairwise correlation with the correlation between one variable and the lag of the other variable. For example, consider hourly

Table 4.4: Lag correlation with process variable.

Variable   Correlation with x
y                0.148
ylag1            0.447
ylag2            0.937
ylag3            0.598

observations taken (at the same sampling period) on two process variables. A sample of size 27 is presented in Table 4.3, where the variable x is a feedstock characteristic and the variable y is an output quality characteristic. We begin by calculating the correlation between x and y. The value is given as 0.148. Next we lag the y values one time period and reconstruct the data set, as presented in Table 4.3. The resulting variable, labeled ylag1, has a correlation of 0.447 when compared to the x variable. We note an increase in the correlation. The observations on the quality variable y could be continuously shuffled downward until the maximum correlation with x is obtained. The correlations for three consecutive lags are presented in Table 4.4. Maximum correlation is obtained by lagging the quality characteristic two time periods. Note the decrease in the correlation for three lags of the quality characteristic. Thus, we estimate the time through the system as being two hours.
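The lag search in Tables 4.3 and 4.4 amounts to correlating x with successively shifted copies of y. A generic sketch on synthetic data (names and the two-period offset are illustrative, mirroring the pairing that maximizes the correlation in Table 4.4):

```python
import numpy as np

def lag_correlation(x, y, lag):
    """Correlation of x at time t with y lagged `lag` sampling periods
    (lag = 0 gives the ordinary correlation; lag > 0 pairs x[t] with y[t - lag])."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    if lag == 0:
        return np.corrcoef(x, y)[0, 1]
    return np.corrcoef(x[lag:], y[:-lag])[0, 1]

# synthetic demo: x tracks y two periods later, plus noise
rng = np.random.default_rng(5)
y = rng.normal(size=200)
x = np.concatenate([rng.normal(size=2), y[:-2]]) + 0.1 * rng.normal(size=200)
best = max(range(4), key=lambda k: lag_correlation(x, y, k))
print(best)   # → 2
```

Scanning lags 0 through 3 and keeping the maximum reproduces the logic that led to the two-hour estimate above.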

4.4 Data Collection Procedures

Careful examination of the recorded data can alleviate problems occurring in data acquisition. For example, data are recorded in many ways within processing units. At one extreme, information may be manually tallied on log sheets, whereas, in more modern units, data may be recorded electronically. Problems have occurred using both methods.

Consider data recorded by operators using log sheets. Missing data frequently occur, as the operator is often busy doing something else at the appropriate sampling time and fails to accurately record an observation. Transcribing errors also may be prevalent. If problems such as these occur, they must be resolved before the control procedure is implemented. If observations are costly to obtain, missing data can become a serious problem.

Electronic data collectors may eliminate problems like the ones described above, but they can produce others. For example, consider the 20 observations on seven variables listed in Table 4.5. The observation vectors are collected electronically at 1-minute intervals. Some of the components are chemical analysis determinations obtained using a separate piece of equipment. Since the recycle times of the different pieces of equipment vary, the same variables are being recorded at different sampling times. For example, the piece of equipment used in recording the observation on variable X7 has a two-minute recycle time. Note that observations 12, 13, and 14 all have the same value of 0.27. Other observations also contain repeats on this variable. To construct a preliminary data set from observations such as these, one

Chapter 4. Construction of Historical Data Set

Table 4.5: Electronically gathered data.

Time    X1      X2     X3        X4      X5       X6    X7
1     138.21   1.27   5762.99   158.81  2327.05  0.35  0.16
2     135.72   1.27   5762.99   154.20  2327.05  0.35  0.16
3     131.72   1.27   5763.87   151.13  2327.93  0.34  0.16
4     128.24   1.27   5763.87   149.81  2327.05  0.34  0.16
5     122.96   1.27   5762.11   152.67  2327.05  0.33  0.16
6     121.91   1.28   5762.99   175.08  2327.05  0.33  0.16
7     118.21   1.28   5763.87   192.57  2312.11  0.30  0.16
8     121.04   1.27   5763.87   204.29  2312.11  0.29  0.16
9     124.53   1.27   5763.87   203.34  2312.07  0.28  0.16
10    131.47   1.27   7009.57   179.13  2312.70  0.27  0.16
11    131.60   1.23   7009.57   160.42  2312.70  0.26  0.16
12    116.98   1.27   6232.03   162.14  2076.86  0.24  0.27
13    122.94   12.7   6233.50   148.67  2077.73  0.24  0.27
14    128.78   12.7   6234.38   141.55  2077.73  0.24  0.27
15    132.21   1.27   6233.50   141.13  2075.98  0.28  0.05
16    134.87   1.27   6234.38   141.55  2253.22  0.26  0.05
17    134.64   1.28   6233.50   166.27  2254.10  0.25  0.05
18    134.38   1.27   6234.28   182.23  2253.22  0.23  0.05
19    131.36   12.6   6234.38   194.50  2253.22  0.22  0.05
20    120.79   1.27   5822.46   200.91  2253.22  0.20  0.05

needs to examine closely how each observation is determined and the recycle time of each piece of equipment.

Another problem area in data collection involves incorrect observations on components. These may be the result of faulty equipment, such as transmitters and temperature probes. Problems of this type can be identified using various forms of data plots or data verification techniques (usually included in the purging process).

4.5 Missing Data

Missing or incorrect observations on variables can occur for numerous reasons, including operator failure, equipment failure, or incorrect lab determinations. The simplest approach would be to delete the observations or variables with the missing data. However, this method is valid only if the remaining data are still a representative sample from the process and if the cause of the missing data is unrelated to the values themselves. Otherwise, the problem of incomplete data is complex and could have several different solutions.

One helpful estimate of a missing value, when using the T2 as the control statistic, is the mean of that variable, conditioned on the observed values of the other vector components. This is simply the predicted value of the variable based on the regression of the variable on the other remaining variables. This value should have little or no influence on the other components of the data vector and minimum effect on the parameters to be estimated. When substituted into the data vector, it should make the observation vector homogeneous with the other observations.
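As a sketch of this fill-in approach (our own minimal implementation, not code from the book), the conditional-mean estimate is just the fitted value from an ordinary least-squares regression of the incomplete variable on the remaining components:

```python
import numpy as np

def impute_by_regression(X, row, col):
    """Estimate the missing entry X[row, col] by regressing column `col`
    on the other columns, using every row except `row` to fit the model."""
    mask = np.ones(X.shape[0], dtype=bool)
    mask[row] = False                        # exclude the incomplete observation
    others = [j for j in range(X.shape[1]) if j != col]
    A = np.column_stack([np.ones(mask.sum()), X[mask][:, others]])
    coef, *_ = np.linalg.lstsq(A, X[mask, col], rcond=None)
    # Predicted (conditional-mean) value for the missing component
    return coef[0] + X[row, others] @ coef[1:]
```

Substituting the predicted value leaves the remaining components untouched, which is why the T2 value and the correlations compared in Table 4.7 barely move.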

Table 4.6: Chemical process data for six variables.

Time  X1(NaOH)  X2(NaCl)  X3(I1)  X4(I2)  X5(Cl2)  X6(O2)
1      134.89    203.00    0.05    4.00    98.37    1.17
2      129.30    203.10    0.06    1.90    98.37    1.17
3      145.50    208.60    0.17    6.10    98.23    1.42
4      143.80    188.10    0.11    0.40    98.44    1.12
5      146.30    189.10    0.22    0.50    98.44    1.11
6      141.50    196.19    0.16    3.50    98.26    1.35
7      157.30    185.30    0.09    2.90    98.23    1.40
8      141.10    209.10    0.16    0.50    98.69    0.86
9      131.30    200.80    0.17    3.80    97.95    1.64
10     156.60    189.00    0.19    0.50    97.97    1.62
11     135.60    192.80    0.26    0.50    97.65    1.94
12     128.39    213.10    0.07    3.60    98.43    1.23
13     138.10    198.30    0.15    2.70    98.12    1.36
14     140.50    186.10    0.30    0.30    98.15    1.37
15     139.30    204.00    0.25    3.80    98.02    1.54
16     152.39    176.30    0.19    0.90    98.22    1.30
17     139.69    186.10    0.15    1.60    98.30    1.25
18     130.30    190.50    0.23    2.60    98.08    1.37
19     132.19    198.60    0.09    5.70    98.30    1.16
20     134.80    196.10    0.17    4.90    97.98    1.50
21     142.30    198.80    0.09    0.30    98.41    1.00


Table 4.7: Correlation comparison.

          Without     With
NaOH       0.042     0.050
NaCl       0.246     0.247
I1        -0.551    -0.563
I2        -0.123    -0.158
O2        -0.962    -0.970
To demonstrate this regression procedure for estimating a missing value, consider the 21 observations presented in Table 4.6. Assume this is an HDS for a chemical process and consider the highlighted observation on variable X5 (in the last row) as a missing observation. The regression equation of X5 on X1, X2, X3, X4, and X6 is derived from the first 20 observations,

and the estimated value of Cl2 is computed as 98.58. This is in close agreement with the actual value of 98.41 given in Table 4.6. The T2 value, with this estimate of the missing component, is 5.94 as compared to the actual value of 5.92. Thus, substituting the missing value has little influence on the T2 statistic. Similarly, there is negligible change in the mean of X5. The mean of the observations without the missing value is 98.21 versus a mean of 98.23 when the estimated value is included. A comparison of the correlations between X5 and the other five variables, with and without the predicted value, is given in Table 4.7. There appears to be little difference between these correlations. The fill-in-the-value approach presented above is a simple and quick method for estimating missing values in an HDS. However, among its limitations are the fact


that the estimated value is only as good as the prediction equation that produced it, and the fact that estimation may affect the variance estimates. Many other solutions exist (e.g., see Little and Rubin (1987)), and these can be used when better estimation techniques are preferred.

4.6 Functional Form of Variables

Using the originally measured variables in an HDS may not provide information about relationships between process variables in the most usable form. As stated in Chapter 2, the T2 statistic is constructed using the linear relationships existing between and among the process variables. Many situations occur where a simple transformation of one of the variables can strengthen the linearity, or correlation, among the variables. Examining scatter plots of pairs of variables can be helpful in making these decisions.

For example, consider the plot of millivolts versus temperature given in Figure 4.2. The trend in the data appears to follow some form of a logarithmic or exponential relationship. However, the correlation between the variables is very strong, with a value of 0.94. This is mainly due to the strong linearity between the two variables when millivolts exceed 5 in value. A downward curvature of the plot occurs toward the lower end of the millivolts axis. This curvature can be (somewhat) removed with a simple log transformation of millivolts. A plot of log(millivolts) versus temperature is presented in Figure 4.3. Although the plot still exhibits a slight curvature, the correlation between temperature and log(millivolts) has increased slightly to a value of 0.975.
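The effect of such a transformation is easy to check numerically. The fragment below uses synthetic data built to mimic the millivolt-temperature pattern; the numbers are illustrative, not the data behind Figures 4.2 and 4.3.

```python
import numpy as np

# Synthetic data: temperature roughly linear in log(millivolts), so the
# raw millivolt scale shows curvature at the low end.
rng = np.random.default_rng(2)
mv = np.linspace(0.5, 12.0, 60)
temp = 100.0 + 80.0 * np.log(mv) + rng.normal(scale=3.0, size=mv.size)

r_raw = np.corrcoef(mv, temp)[0, 1]          # strong, but curvature remains
r_log = np.corrcoef(np.log(mv), temp)[0, 1]  # log transform strengthens linearity
```

On this synthetic sample the raw correlation is already above 0.9, yet the log transform pushes it closer to 1, mirroring the 0.94-versus-0.975 comparison in the text.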

Figure 4.2: Plot of relationship between millivolts and temperature.


Other methods, such as those based on theoretical knowledge about relationships between the variables, may be more helpful in substituting functional forms for some of the variables. Consulting with the process engineer can be helpful in making these decisions.

4.7 Detecting Collinearities

The formula for a T2 statistic is based on a covariance matrix that is nonsingular and can be inverted. A singular covariance matrix occurs when two or more observed variables are perfectly correlated (i.e., exactly collinear). Computer software packages used in computing the inverse of the matrix produce a warning when this occurs. However, most computer packages do not provide a warning when the collinearity in a matrix is severe but not exact. Our discussion in this section centers on the latter situation and how to detect and adjust for collinearities. Several books include discussions on collinearities and their effects on matrices (e.g., see Belsley, Kuh, and Welsch (1980) or Chatterjee and Price (1999)).

Collinearities can occur in a covariance or correlation matrix because of sampling restrictions, because of theoretical relationships existing in the process, and because of outliers in the data. One method of identifying a collinearity is to examine the eigenvalues and eigenvectors of the sample covariance matrix (see subsection 4.11.1 of this chapter's appendix). However, to ease computations, one usually examines the corresponding correlation matrix, since the results from this examination are equally applicable to the covariance matrix. A statistical tool that is useful in this process is a principal component analysis (PCA). This procedure can help detect the existence of a near singularity, can be used to determine subgroups of variables that are highly correlated, and can be used to estimate the dimensionality of the system. In some cases, the principal components themselves can give insight into the true nature of these dimensions.
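As a small illustration of this eigenvalue check (a synthetic three-variable sketch of our own, not the chlorine data examined below):

```python
import numpy as np

# Three simulated process variables; x3 is (almost) an exact linear
# function of x2, creating a severe but not exact collinearity.
rng = np.random.default_rng(3)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
x3 = -x2 + 0.001 * rng.normal(size=500)

R = np.corrcoef(np.vstack([x1, x2, x3]))
eigvals, eigvecs = np.linalg.eigh(R)   # eigenvalues in ascending order

near_zero = eigvals[0]                 # ~0 flags a near-singular matrix
u = eigvecs[:, 0]                      # its eigenvector names the culprits
```

The near-zero eigenvalue exposes the collinearity, and the corresponding eigenvector puts essentially all of its weight on x2 and x3, the two variables involved.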

Figure 4.3: Plot of relationship between temperature and log (millivolts).


Figure 4.4: Schematic of diaphragm cell.

A brief summary of PCA and its relationship to the eigenvalue problem is contained in subsection 4.11.2 of this chapter's appendix. The reader unfamiliar with these concepts is encouraged to review this appendix before continuing with this section. A more detailed discussion of PCA is provided by Jackson (1991).

The effects of a near-singular covariance matrix on the performance of a T2 statistic will be demonstrated in the following example. We begin by expressing the inverse of the sample covariance matrix as

S^-1 = (1/λ1)u1u1' + (1/λ2)u2u2' + ... + (1/λp)upup',

where λ1 ≥ λ2 ≥ ... ≥ λp are the eigenvalues of S and uj, j = 1, 2, ..., p, are the corresponding eigenvectors. If λp is close to zero, the ratio (1/λp) becomes very large and can have a disproportionate effect on the calculation of the inverse matrix. This distorts any statistic, such as the T2, that uses the inverse matrix in its calculation.

To demonstrate how to examine the eigenstructure of a matrix, we examine the correlation matrix of a chlorine (Cl2)/caustic (NaOH) production unit. A unit schematic is presented in Figure 4.4. This particular process is based on the electrolysis of brine (salt). A current is passed through a concentrated brine solution in which the anode and cathode are separated by a porous diaphragm. The chlorine is displaced as a gas, and the remaining water/brine solution contains the caustic. The unit performing this work is referred to as a cell, and several of these are housed together (as a unit) to form an electrolyzer.

Overall performance of the cell is measured by the percentage of the available power being used in the conversion process. This percentage is a computed variable and is referred to as conversion efficiency (CE). High values of this variable are very desirable. Many variables other than CE are used as indicators of cell performance. Measured variables are the days of life of the cell (DOL), cell gases including chlorine

where AI > A2 > > Ap are the eigenvalues of S and t/j, j 1, 2 , . . . ,p, are the corresponding eigenvectors. If Xp is close to zero, the ratio (1/AP) becomes very large and can have a disproportionate effect on the calculation of the inverse matrix. This distorts any statistic, such as the T2, that uses the inverse matrix in its calculation. To demonstrate how to examine the eigenstructure of a matrix, we examine the correlation matrix of a chlorine (C^)/caustic (NaOH) production unit. A unit schematic is presented in Figure 4.4. This particular process is based on the electrolysis of brine (salt). A current is passed through a concentration of brine solution where the anode and cathode are separated by a porous diaphragm. The chlorine is displaced as a gas and the remaining water/brine solution contains the caustic. The unit performing this work is referred to as a cell, and several of these are housed together (as a unit) to form an electrolyzer. Overall performance of the cell is measured by the percentage of the available power being used in the conversion process. This percentage is a computed variable and is referred to as conversion efficiency (CE). High values of this variable are very desirable. Many variables other than CE are used as indicators of cell performance. Measured variables are the days of life of the cell (DOL), cell gases including chlorine

Table 4.8: Correlation matrix for chlorine data.

         NaOH     NaCl     I1       I2       Cl2      O2       CE
NaOH     1.000   -0.013    0.218    0.037   -0.297    0.284   -0.284
NaCl    -0.013    1.000    0.015    0.023    0.006    0.001   -0.001
I1       0.218    0.015    1.000    0.567   -0.395    0.323   -0.324
I2       0.037    0.023    0.567    1.000   -0.402    0.368   -0.369
Cl2     -0.297    0.006   -0.395   -0.402    1.000   -0.956    0.956
O2       0.284    0.001    0.323    0.368   -0.956    1.000   -0.999
CE      -0.284   -0.001   -0.324   -0.369    0.956   -0.999    1.000

Table 4.9: Eigenvalues of correlation matrix.

No.   Eigenvalue   % of Total   Cum %
1       3.4914       49.88       49.88
2       1.1513       16.45       66.32
3       0.9959       14.23       80.55
4       0.9116       13.02       93.57
5       0.3965        5.66       99.24
6       0.0534        0.76       99.99
7       0.0003        0.00      100.00

The table also reports the seven corresponding eigenvectors. The eigenvector associated with the smallest eigenvalue (0.0003) has loadings of about 0.707 on both O2 and CE, with near-zero loadings on the remaining five variables.

and oxygen (Cl2 and O2), caustic (NaOH), salt (NaCl), and impurities production (I1 and I2). The levels of impurities are important since their production indicates a waste of electrical power, and they contaminate the caustic. Table 4.8 is the correlation matrix for an HDS (n = 416) based on seven of these variables. Its eigenstructure will be examined in order to determine if a severe collinearity exists among the computed variables.

Inspection of the correlation matrix reveals some very large pairwise correlations. For example, the correlation between the two measured gases, Cl2 and O2, has a value of -0.956. Also, the computed CE variable, which contains both Cl2 and O2, has a correlation of 0.956 with Cl2 and -0.999 with O2. Using a PCA, the seven eigenvalues and eigenvectors for the correlation matrix are presented in Table 4.9. Also included is the proportion of the correlation variation explained by the corresponding eigenvectors as well as the cumulative percentage of variation explained.

A recommended guideline for identifying a near-singular matrix is based on the size of the square root of the ratio of the maximum eigenvalue to each of the other eigenvalues. These ratios are labeled condition indices. A condition index greater than 30 implies that a severe collinearity is present. The value of the largest index, labeled the condition number, for the data in Table 4.9 is

sqrt(λ1/λ7) = sqrt(3.4914/0.0003) ≈ 108,

which clearly indicates the presence of a severe collinearity among the variables. A severe collinearity in the correlation matrix translates into the presence of a severe collinearity in the associated covariance matrix. Since it is not possible or advisable to use a T2 control statistic when the covariance matrix is singular or near singular, several alternatives are suggested. The first, and simplest to implement, is to remove one of the variables involved in the collinearity. This is especially useful


when one of the collinear variables is computed from several others, since deletion of one of these variables will not remove any process information. To determine which variables are involved in a severe collinearity, one need only examine the linear combination of variables provided by the eigenvector corresponding to the smallest eigenvalue. From Table 4.9, this linear combination corresponding to the smallest eigenvalue of 0.0003 is given by

0.7074(O2) + 0.7068(CE) + (near-zero loadings on NaOH, NaCl, I1, I2, and Cl2).

Ignoring the variables with small coefficients (i.e., small loadings) gives the linear relationship between the two variables that is producing the collinearity problem. This relationship is given as

0.7074(O2) + 0.7068(CE) ≈ 0,

or equivalently, CE ≈ -O2 in standardized units.

This relationship confirms the large negative correlation, -0.999, found between CE and O2 in Table 4.8. The information contained in the computed variable CE is redundant with that contained in the measured variable O2. This relationship is producing a near singularity in the correlation matrix. Since CE is a computed variable that can be removed with no loss of information, one means of correcting this data deficiency is to compute the T2 statistic using only the remaining six variables. The revised correlation matrix for these six variables is obtained from the correlation matrix presented in Table 4.8 by deleting the row and column corresponding to CE.

Another method for removing a collinearity from a covariance matrix is to reconstruct the matrix by excluding the eigenvectors corresponding to the near-zero eigenvalues. The contribution of the smallest eigenvalues would be removed and S^-1 would be computed using only the larger ones; i.e.,

S^-1 ≈ (1/λ1)u1u1' + (1/λ2)u2u2' + ... + (1/λk)ukuk',   k < p.

This approach should be used with caution since, in reducing the number of principal components, one may lose the ability to identify shifts in some directions in terms of the full set of the original variables.
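A sketch of this second alternative, in the same notation (our own minimal implementation):

```python
import numpy as np

def truncated_inverse(S, k):
    """Spectral approximation to the inverse of S that keeps only the k
    largest eigenvalues: sum over j <= k of (1/lambda_j) u_j u_j'."""
    vals, vecs = np.linalg.eigh(S)             # ascending order
    vals, vecs = vals[::-1], vecs[:, ::-1]     # reorder to descending
    return (vecs[:, :k] / vals[:k]) @ vecs[:, :k].T
```

With k = p this reproduces the ordinary inverse; dropping the near-zero eigenvalues keeps the elements of the computed inverse from exploding, at the cost of losing sensitivity in the discarded directions.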

4.8 Detecting Autocorrelation

Most multivariate control procedures require the observation vectors to be uncorrelated over time. Unfortunately, violations of this assumption can weaken the effectiveness of the overall control procedure. Some industrial operations, such as chemical processes, are particularly prone to generating time-correlated, or autocorrelated, data. This situation can occur because of the continuous wear on


equipment, the environmental and chemical contamination of the equipment, and the depletion of certain critical components, such as the availability of catalyst in the process.

Figure 4.5: Continuous decay of heat transfer coefficient over time.

Autocorrelation, like ordinary correlation, may be the result of either a cause-and-effect relationship or only of an association. If it is due to a cause-and-effect relationship, the observation on the time-dependent variable is proportional to the value of the variable at some prior time. If it is a relationship based on association, the present value of the variable is only associated with the past value and not determined by it.

Why do certain types of processes have a tendency to generate observations with a time dependency? One possible answer to this very important question is that it is due to an association (correlation) with an unobservable "lurking" variable. Consider two variables that are negatively correlated, so that as one variable increases in value, the other decreases. Suppose one of the variables, the lurking one, cannot be observed and is increasing with time. Since this variable is not measurable, the time association will appear as a time dependency in the second variable. Without knowledge of the first variable, one could conclude that the second variable has a time dependency in its observations.

Autocorrelated observations may occur in at least two different forms. The first we label as continuous or uniform decay, as it occurs when the observed value of the variable is dependent on some immediate past value. Certain in-line process filters, used to remove impurities, behave in this fashion. Another example is given by the measure of available heat, in the form of a heat transfer coefficient, which is used to do work in many types of processes. During a life cycle of the unit, the transfer of heat is inhibited due to equipment contamination or other causes that cannot be observed or measured.
A cycle is created when the unit is shut down and cleaned. During the cycle, the process is constantly monitored to ensure maximum efficiency. Figure 4.5 is the graph of a heat transfer coefficient over a number of life cycles of a production unit.


Figure 4.6: Stage decay of process variable over time.

The second form of autocorrelated data is labeled stage decay (e.g., see Mason, Tracy, and Young (1996)). This occurs when the time change in the variable is inconsistent over shorter time periods, but occurs in a stepwise fashion over extended periods of time. This can happen in certain types of processes where change with time occurs very slowly. The time relationship comes from the performance in one stage being dependent on the process performance in the previous stage(s). The graph of a process variable having two stages of decay is presented in Figure 4.6.

If autocorrelation is undetected or ignored, it can create serious problems with control procedures that do not adjust for it. The major problem is similar to the one that occurs when using univariate control procedures on variables of a multivariate process. Univariate procedures ignore relationships between variables. Thus, the effect of one variable is confounded with the effects of other correlated variables. A similar situation occurs with autocorrelated data when the time dependencies are not removed. Adjustment is necessary in order to obtain an undistorted observation on process performance at a given point in time.

Control procedures for autocorrelated data in a univariate setting often make these adjustments by modeling the time dependency and plotting the resultant residuals. Under proper assumptions, these residual errors, or adjusted values (i.e., with the effect of the time dependency removed), can be shown to be independent and normally distributed. Hence, they can be used as the charting statistic for the time-adjusted process. It is also useful to look at forecasts of charting statistics, since processes with in-control residuals can drift far from the target values (e.g., see Montgomery (1997)).

We offer a somewhat similar solution for autocorrelated data from multivariate processes. However, the problem becomes more complicated.
We must be concerned not only with autocorrelated data on some of the variables, but also with how the time-dependent variables relate to the other process variables. Autocorrelation


Figure 4.7: Process variable with cycle.

does not eliminate these relationships, but instead confounds them and thus must be removed for clear signal interpretation. How this is done is a major focus of Chapter 10.

One simple method of detecting autocorrelation in univariate processes is accomplished by plotting the variable against time. Depending on the nature of the autocorrelation, the points in a graph of the process variable versus time will either move up or down or oscillate back and forth. Subsequent data analysis is used to verify the presence of autocorrelation, determine lag times, and fit appropriate autoregressive models.

Observations from a multivariate process are p-dimensional and the components are usually correlated. The simple method of plotting graphs of individual components against time can be inefficient when there are a large number of variables. Also, these time-sequence plots may be influenced by other correlated variables, resulting in incorrect interpretations. For example, considering the cyclic nature over time of the variable depicted in Figure 4.7, one might suspect that some form of autocorrelation is present. However, this effect is due to the temperature of the coolant, which has a seasonal trend. Nevertheless, even with this drawback, we have found that graphing each variable over time is useful.

To augment the above graphical method and to reduce the number of individual graphs for study, one could introduce a time-sequence variable in the data set and examine how the individual variables relate to it. If a process variable correlates with the time-sequence variable, it is highly probable that the process variable correlates with itself in time. Using this method, one can locate potential variables that are autocorrelated. Detailed analysis, including the graphing of the variable over time, will either confirm or deny the assertion for individual variables.
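The time-sequence screen just described can be written in a few lines (an illustrative sketch of our own; the threshold for follow-up is left to the analyst):

```python
import numpy as np

def time_correlations(X):
    """Correlation of each column of X (rows in time order) with a
    time-sequence variable 1, 2, ..., n, as described above."""
    t = np.arange(1, X.shape[0] + 1)
    return np.array([np.corrcoef(t, X[:, j])[0, 1] for j in range(X.shape[1])])
```

A variable with a strong correlation to the time sequence is flagged for detailed analysis, including a time-sequence plot.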


When a variable is correlated with time, there will be dependencies between successive observations. This dependency may occur between consecutive observations, every other observation, every third observation, etc. These are denoted as different lag values. One useful technique for displaying this time dependency is to calculate the sample autocorrelation, r_k, between observations a lag of k time units apart, where

r_k = Σ_{t=1}^{n-k} (x_t - x̄)(x_{t+k} - x̄) / Σ_{t=1}^{n} (x_t - x̄)².

The r_k values (with r_0 = 1) provide useful information on the structure of the autocorrelation. For example, if the kth lag autocorrelation is large in absolute value, one can expect the time dependency at this lag to be substantial. This information can be summarized in a plot of the r_k values against the values of k. This graph of the autocorrelation function is sometimes referred to as a correlogram. More details on the use of these plots can be found in time-series books (e.g., Box and Jenkins (1976)).
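The formula for r_k above translates directly into code (a minimal sketch):

```python
import numpy as np

def autocorr(x, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag of a series x, with
    r_k = sum of (x_t - xbar)(x_{t+k} - xbar) over the sum of (x_t - xbar)^2."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.sum(d ** 2)
    return np.array([1.0] + [np.sum(d[:-k] * d[k:]) / denom
                             for k in range(1, max_lag + 1)])
```

Plotting the returned values against k gives the correlogram.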

4.9 Example of Autocorrelation Detection Techniques

To demonstrate the techniques of detecting autocorrelated data in a multivariate process, we will use a partial data set on a heat transfer coefficient taken from Mason and Young (1997). Many industries, especially the chemical industry, use heat as the energy source in the removal or separation of chemical components. For example, removal of water is necessary to strengthen a caustic solution. Heat available to do such work is measured by a heat transfer coefficient. Most evaporators are efficient only if the coefficient remains above a certain value. When the value drops below a critical level, the unit is cleaned, critical components are replaced, and a new cycle begins.

Data consisting of 23 observations on such a unit are presented in Table 4.10. Heat transfer is one of the many variables used in the control of the process. It interests us because, as is indicated in Figure 4.8, its values appear to be linearly decreasing over time. The autocorrelation structure is illustrated with the correlogram in Figure 4.9. Of the various lag autocorrelations presented for this variable, the highest value, r_1 = 0.9164, corresponds to a lag of one time unit. Such a linear trend in a time plot is labeled a first-order autoregressive relationship, or simply an AR(1) relationship.

Inferential statistical techniques also can be applied to confirm this result. For example, Table 4.11 contains the regression analysis-of-variance table for the fitting of an AR(1) model of the form

ŷ_t = b_0 + b_1 y_{t-1},

Table 4.10: Raw data for heat transfer example.

Obs. No.   Heat Transfer   Lag Heat Transfer
1              103               106
2              103               103
3              106               103
4              106               106
5              107               106
6              105               107
7              102               105
8              103               102
9               99               103
10              99                99
11              99                99
12              98                99
13              98                98
14              97                98
15              94                97
16              99                94
17              99                99
18              96                99
19              93                96
20              92                93
21              90                92
22              91                90
23              90                91

Table 4.11: ANOVA table for AR(1) model.

Source        df      SS        MS        F        p-value
Regression     1    506.304   506.304   109.711    0.000
Residual      21     96.912     4.614
Total         22    603.217

where b_0 and b_1 are the estimated coefficients of the model relating the heat transfer variable y_t to its lag value y_{t-1}. The small p-value for the F statistic in the table implies that there is strong evidence that the immediate past heat transfer coefficient is an important predictor of the current heat transfer coefficient.

As another example, consider the techniques necessary for detecting autocorrelation in data collected from a reactor used to convert ethylene (C2H4) to ethylene dichloride (EDC). EDC is the basic building block for much of the vinyl product industry. Feedstock for the reactor includes hydrochloric acid gas (HCl), ethylene, and oxygen (O2). Conversion of the feedstock to EDC occurs under high temperature in the reactor. The conversion process is labeled oxyhydrochlorination (OHC).

There are many different types of OHC reactors available to perform the conversion of ethylene and HCl to EDC. One type, a fixed life or fixed bed reactor, must have critical components replaced at the end of each run cycle. The components


Figure 4.8: Time-sequence plot of heat transfer coefficient.

are slowly depleted during operation, and performance of the reactor follows the depletion of the critical components. The best performance of the reactor is at the beginning of the cycle, as the reactor gradually becomes less efficient during the remainder of the cycle. While other variables have influence on the performance of the reactor, this inherent decay of the reactor produces a time dependency in many of the process and quality variables.

Figure 4.9: Correlogram for heat transfer coefficient.

We have chosen seven variables to demonstrate how to detect and adjust for autocorrelated data in this type of process. These are presented in Table 4.12. The first variable, RP1, is a measure of feed rate. The next four, Temperature, L1, L2, and L3, are process variables, and the last two are output variables. Variable P1 is an indication of the amount of production for the reactor and variable Cl is an

Table 4.12: Process variables from an OHC reactor.

Input Variables    Process Variables      Output Variables
RP1                Temp, L1, L2, L3       Cl, P1

Figure 4.10: Time-sequence plot of temperature.

undesirable by-product of the production system. All variables, with the exception of feed rate, show some type of time dependency.

Temperature measurements are available from many different locations on a reactor. All of them are important elements in the performance and control of the reactor, and they increase over the life cycle. To demonstrate the time decay of the measured temperatures, we present in Figure 4.10 a graph of the average temperature over a good production run. The graph indicates the average temperature of the reactor is initially stable, but then it gradually increases over the life cycle of the unit.

The time-sequence graphs in Figures 4.11 and 4.12 of the two process variables, L3 and L1, present two contrasting patterns. In Figure 4.11, L3 increases linearly with time and has the appearance of a first-order lag relationship. This is confirmed by the fact that r_1 = 0.7533. However, the graph of L1 in Figure 4.12 is depicted as a quadratic or exponential across time, but can still be approximated by a first-order lag relationship. In this case, r_1 = 0.9331.

The graph of Cl versus time is presented in Figure 4.13. The time trend in this graph differs somewhat from the previous graphs. There appear to be separate stages in the plot: one at the beginning, another in the middle, and a third stage at the end.

Of the remaining three variables, none show strong time dependencies. As an example, consider the time-sequence plot for RP1 given in Figure 4.14. Across time, the plot of the data is nearly horizontal and shows no trends or patterns.
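The first-order fits quoted for these series (and the ANOVA in Table 4.11 for the heat transfer coefficient) come from regressing each variable on its one-period lag. A minimal sketch of our own, using an illustrative simulated AR(1) series rather than the reactor data:

```python
import numpy as np

def fit_ar1(y):
    """Least-squares fit of y_t = b0 + b1 * y_(t-1); returns (b0, b1, F),
    where F is the regression-to-residual mean-square ratio of the ANOVA table."""
    y = np.asarray(y, dtype=float)
    x, z = y[:-1], y[1:]                        # lagged predictor, response
    A = np.column_stack([np.ones(x.size), x])
    (b0, b1), *_ = np.linalg.lstsq(A, z, rcond=None)
    resid = z - (b0 + b1 * x)
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((z - z.mean()) ** 2)
    F = (ss_tot - ss_res) / (ss_res / (z.size - 2))
    return b0, b1, F
```

A large F (equivalently, a small p-value) indicates that the lagged value is an important predictor, as it was for the heat transfer coefficient.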


Figure 4.11: Time-sequence plot of L3.

Figure 4.12: Time-sequence plot of L1.

Another useful indication of autocorrelation in a process is the presence of a strong pairwise correlation between a variable and the time sequence of its collection. For example, the correlations between the seven variables listed in Table 4.12 and a time-sequence variable (Time) based on the observation number for a run cycle are presented in Table 4.13. Inspection of the pairwise correlations with time indicates there is a moderate time dependency for variables L2 and Temp, and a lesser one for L1 and the unwanted by-product Cl. A weak-to-moderate relationship is indicated for RP1, for the process variable L3, and for the production variable P1. Note that the production variable P1 decreases over time, as indicated by the negative correlation coefficient.

4.9. Example of Autocorrelation Detection Techniques


Figure 4.13: Time-sequence plot of C1.

Figure 4.14: Time-sequence plot of RP1.

Table 4.13: Pairwise correlations with time-sequence variable.

Process Variable   RP1     L1      L2      L3      Temp    C1      P1
Time               0.507   0.693   0.811   0.318   0.808   0.691   -0.456
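The screen just described, correlating each variable with its observation number, takes only a few lines to implement. The sketch below uses synthetic data rather than the process data of Table 4.13, and the function name `time_correlations` is ours:

```python
import numpy as np

def time_correlations(data):
    """Correlate each column of `data` (rows = time-ordered observations)
    with the observation number, as a quick screen for time trends."""
    n, p = data.shape
    t = np.arange(1, n + 1, dtype=float)
    return np.array([np.corrcoef(t, data[:, j])[0, 1] for j in range(p)])

# Toy illustration: one trending series and one stable series.
rng = np.random.default_rng(0)
n = 100
trending = 0.05 * np.arange(n) + rng.normal(0, 1, n)   # drifts upward
stable = rng.normal(0, 1, n)                           # no time dependency
r = time_correlations(np.column_stack([trending, stable]))
print(r)  # first correlation large, second small
```

Variables whose correlation with time is large in absolute value are the candidates to examine further with time-sequence plots.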


4.10 Summary

Control procedures are designed to detect and help in determining the cause of unusual process events. The point of reference for "unusual events" is the historical data set. This is the baseline of any control procedure and must be constructed with great care. The first step in its construction is to acquire an understanding of the process. This knowledge can be obtained from the operators and process engineers. A study of the overall system will reveal problem areas where the application of a control procedure would be most helpful. This is necessary to determine the type and purpose of the control procedure.

With the selection of an appropriate area for application of a control procedure, we can obtain a preliminary data set. However, the data must be carefully filtered of impurities so that the resulting data set is clean. Graphical tools can be a great aid in this process, as they can be used to identify obvious outliers and, in some cases, determine useful functional relationships among the variables. In addition, data collection and data verification procedures must be examined, and any missing data replaced or estimated, or else one must remove the associated observation vector or process variable.

After filtering the preliminary data, we strongly recommend checking for singularity of the covariance matrix. The problems of a singular covariance matrix, or of collinearity among the variables, can be quite critical. Collinearity often occurs when there are many variables to consider or when some of the variables are computed from measured ones. These situations can be detected using the eigenvalues of the covariance or correlation matrices. Principal component analysis can be a useful tool in this determination, as can consultation with the process engineers. Since a severe collinearity can inflate the T2 statistic, appropriate action must be taken to remove this problem.

Steady-state control procedures do not work well on autocorrelated processes. Thus, one must investigate for the presence of autocorrelation in the preliminary data set. We offer two procedures for detecting the presence of autocorrelated data in a multivariate system. The first is based on plotting a variable over time and looking for trends or patterns in the plot. The second is based on the sample autocorrelations between observations separated by a specified time lag and on examining the observed trends. Large autocorrelations will pinpoint probable cases for further study. In Chapter 10, we discuss procedures for removing the effects of these time dependencies on the T2 statistic.

4.11 Appendix

4.11.1 Eigenvalues and Eigenvectors

Consider a square (p x p) matrix A. We seek to find scalar (constant) values λi, i = 1, 2, ..., p, and the corresponding (p x 1) vectors Ui, i = 1, 2, ..., p, such that the matrix equation

A Ui = λi Ui    (A4.1)


is satisfied. The λi are referred to as the eigenvalues, or characteristic roots, of A, and the corresponding vectors, Ui, are referred to as the eigenvectors, or characteristic vectors, of A. The λi are obtained by solving the pth-degree polynomial in λ given by

|A - λI| = 0,

where |A - λI| represents the determinant of the matrix (A - λI). The corresponding eigenvectors Ui are then obtained by solving the homogeneous system of equations given in (A4.1).

The eigenvalues (λ1, λ2, ..., λp) are unique to the matrix A; however, the corresponding eigenvectors (U1, U2, ..., Up) are not unique. In statistical analysis the eigenvectors are often scaled to unity, or normalized, so that U'iUi = 1. Note that the eigenvalues of A⁻¹ are the reciprocals of the eigenvalues of A. The corresponding eigenvectors of A⁻¹ are the same as those of A.

Covariance matrices, such as S, that are associated with a T2 statistic are symmetric, positive definite matrices. For symmetric matrices, the corresponding eigenvalues must be real numbers. For positive definite symmetric matrices, the eigenvalues must be greater than zero. Also, with symmetric matrices the eigenvectors associated with distinct eigenvalues are orthogonal, so that U'iUj = 0. Near-singular conditions (i.e., collinearities) exist when one or more eigenvalues are close to zero. Closeness is judged by the size of an eigenvalue relative to the largest eigenvalue. The square root of the ratio of the largest eigenvalue (λ1) to any other eigenvalue (λi) of a matrix A is known as a condition index and is given by

ci = (λ1 / λi)^(1/2).

A recommended guideline for identifying a near-singular matrix is based on the size of this ratio. A ratio greater than 30 implies that a severe collinearity is present. Our main reason for examining eigenvalues and eigenvectors is their use in diagnosing (near) collinearities associated with the T2 statistic. This topic is extensive and cannot be covered in this limited space. For more details, see Belsley (1991) or Myers and Milton (1991).

4.11.2 Principal Component Analysis

The ith principal component, zi, of a matrix A is obtained by transforming an observation vector X' = (x1, x2, ..., xp) by

zi = U'iX,    (A4.3)

where Ui, i = 1, 2, ..., p, are the eigenvectors of A. If A is symmetric, its eigenvectors (associated with distinct eigenvalues) are orthogonal to each other. Thus, the resulting principal components (z1, z2, ..., zp) also would be orthogonal. Also, each principal component in (A4.3) has a variance given by the corresponding eigenvalue, i.e., var(zi) = λi.


A major use of PCA is to transform the information contained in p correlated process variables into the p independent principal components. The transformation in (A4.3) is made in such a way that the first k of the principal components contain almost all the information related to the variation contained in the original p variables. Used in this manner, PCA is known as a dimension-reduction technique since k < p. The percentage of the total variation explained by the first k principal components is computed by the ratio of the sum of the first k eigenvalues to the total sum of the eigenvalues. Since the eigenvalues of a correlation matrix sum to p, when a PCA is performed on a correlation matrix this percentage reduces to

100 (λ1 + λ2 + ... + λk) / p.

The principal components of a covariance matrix can be used to identify the variables that are related to a collinearity problem. Suppose the condition index for the smallest eigenvalue, λp, is greater than 100. This implies that λp is very small relative to the largest eigenvalue of the covariance matrix. It also implies that the variance of the pth principal component is very small, or approximately zero, i.e., that var(zp) ≈ 0. For a covariance matrix, this implies that a near-perfect linear relationship is given by the pth principal component, i.e.,

zp = U'pX = u1p x1 + u2p x2 + ... + upp xp ≈ constant.

This equation produces the collinear relationship that exists between xj and the other variables of the system. The theoretical development of PCA is covered in the many texts on multivariate analysis, e.g., Morrison (1990), Seber (1984), and Johnson and Wichern (1998). An especially helpful reference on the applications of PCA is given in Jackson (1991).
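The appendix calculations can be illustrated numerically. The sketch below, on synthetic data with a built-in near-collinearity (not drawn from the book's examples), computes the eigenvalues of a covariance matrix, the condition indices, the cumulative percentage of variation explained, and the coefficients of the last principal component that expose the near-linear relationship:

```python
import numpy as np

# Synthetic trivariate data in which x3 is (almost) 2*x1 - x2,
# so the relation 2*x1 - x2 - x3 ~ 0 should appear in the last PC.
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(0, 1, n)
x2 = rng.normal(0, 1, n)
x3 = 2 * x1 - x2 + rng.normal(0, 0.01, n)
X = np.column_stack([x1, x2, x3])

S = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(S)          # ascending order
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # descending order

# Condition indices sqrt(lambda_1 / lambda_i); a large value flags
# a near-singular covariance matrix.
cond_index = np.sqrt(eigvals[0] / eigvals)
print("condition indices:", cond_index)

# Cumulative percentage of total variation explained by the first k PCs.
explained = 100 * np.cumsum(eigvals) / eigvals.sum()
print("cumulative % explained:", explained)

# Coefficients of the last principal component reveal the collinear
# relation; here they should be proportional to (2, -1, -1).
u_last = eigvecs[:, -1]
print("last PC coefficients:", u_last / np.abs(u_last).max())
```

In practice the eigen-analysis would be applied to the covariance (or correlation) matrix of the preliminary data, and any component with a very large condition index would be solved for one variable in terms of the others.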

Chapter 5

Charting the T2 Statistic in Phase I

5.1 Introduction

In this chapter we discuss methods, based on the T2 statistic, for identifying atypical observations in an HDS. We also include some examples of detection schemes based on distribution-free methods. When attempting to detect such observations, it is assumed that good preliminary data are available and that all other potential data problems have been investigated and resolved. The statistical purging of unusual observations in a Phase I operation is essentially the same as an outlier detection problem. An outlier is an atypical observation located at an extreme distance from the main part of the sample data. Several useful statistical tests have been presented for identifying these observations, and these techniques have been described in numerous articles and books (e.g., see Barnett and Lewis (1994), Hawkins (1980), and Gnanadesikan (1977)). Although the T2 statistic is not necessarily the optimal method for identifying outliers, particularly when used repeatedly as in a control chart, it is a simple procedure to apply and can be very helpful in locating individual outlying observations. Further, as shown in Chapter 7, the T2 statistic has the additional advantage of being capable of determining the process variables causing an observation to signal. For these reasons, we will concentrate only on the T2 statistic.

5.2 The Outlier Problem

We seek to remove outlying observations from the HDS because their inclusion can result in biased sample estimates of the population mean vector and covariance matrix and lead to inaccurate control procedures. To demonstrate this, consider the scatter plot in Figure 5.1 of a preliminary data set on two variables. Three separate groupings of outliers, denoted as Groups A, B, and C, are presented in the graph. The inclusion of these observations in the HDS, denoted by

Figure 5.1: Scatter plot with three groups of data.

black circles on the graph, will bias the estimates of the variance of the two variables and/or the estimates of the correlation between these two variables. For example, the inclusion of Group A data will increase the variation in both variables but will have little effect on their pairwise correlation. In contrast, including the Group C data will distort the correlation between the two variables, though it will increase the variation of mainly the x1 variable.

Why do atypical observations, similar to those presented above, occur in an initial sample from an in-control multivariate process? There are many reasons, such as a faulty transistor sending wrong signals, human error in transcribing a log entry, or units operating under abnormal conditions. Most atypical information can be identified using graphs and scatter plots of the variables or by consulting with the process engineer. For example, several of the observations in Groups B and C of Figure 5.1, such as points B1 and C1, are obvious outliers; however, others may not be as evident. It is for this reason that a good purging procedure is needed.

Detecting atypical observations is not as straightforward in multivariate systems as in univariate ones. A nonconforming observation vector in the multivariate sense is one that does not conform to the group. The purging procedure must be able to identify both the components of the observation vectors that are out of tolerance as well as those that have atypical relationships with other components.
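The distorting effect of an outlying group on the sample estimates is easy to reproduce. The sketch below uses made-up numbers (not the data of Figure 5.1): it appends a small Group-C-style cluster that is extreme in x1 alone and watches the sample correlation degrade:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(50, 1, n)
x2 = 0.9 * (x1 - 50) + 50 + rng.normal(0, 0.5, n)   # strongly correlated pair

r_clean = np.corrcoef(x1, x2)[0, 1]

# Append a "Group C"-style cluster: extreme in x1 only, breaking the pattern.
x1_out = np.append(x1, [58, 59, 60])
x2_out = np.append(x2, [50, 50, 50])
r_biased = np.corrcoef(x1_out, x2_out)[0, 1]

print(round(r_clean, 3), round(r_biased, 3))  # correlation drops noticeably
```

Three atypical points among a hundred are enough to change the estimated correlation substantially, which is why such points must be purged before the HDS is fixed.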

5.3 Univariate Outlier Detection

Consider a univariate process where control is to be monitored with a Shewhart chart based on an individual observation x. Assume the mean, μ, and the standard deviation, σ, of the underlying normal distribution, N(μ, σ²), are known for the process. Our goal is to clean the preliminary data set (collected in Phase I) by


Figure 5.2: Normal distribution with control limits.

purging it of outliers. Any observation in the data set that is beyond the control limits of the chart is removed from further consideration. Suppose that the Shewhart upper and lower control limits, denoted UCL and LCL, for this data set are those depicted in Figure 5.2. We assume that any outlier is an observation that does not come from this distribution, but from another normal distribution, N(μ + d, σ²), having the same standard deviation, but with the mean shifted d units to the right. Both distributions are depicted in Figure 5.3. Detecting an outlier in this setting is equivalent to the testing of a statistical null hypothesis. To decide if the given observation is taken from the shifted normal distribution, and thus declared an outlier, we test the null hypothesis

that all observations arise from the normal distribution N(μ, σ²) against the alternative hypothesis that all observations arise from the shifted normal distribution N(μ + d, σ²). If the null hypothesis is rejected, we declare the observation to be an outlier and remove it from the preliminary data set.

In the above hypothesis test, the distribution under the null hypothesis is referred to as the null distribution, and the distribution under the alternative hypothesis is labeled the nonnull distribution. The power of the hypothesis test is denoted in Figure 5.3 by the area of the shaded region under the nonnull distribution, which is the distribution shifted to the right. This is the probability of detecting an observation as an outlier when it indeed comes from the shifted distribution. Comparisons are made among different outlier detection schemes by comparing the power function of the procedures across all values of the mean shift, denoted by d.

Many analysts use univariate control chart limits of individual variables to remove outlying observations. In this procedure, all observation vectors that contain


Figure 5.3: Original and shifted normal distributions.

Figure 5.4: Univariate Shewhart region and T2 control region.

an observation on a variable outside the 3σ range are excluded. This is equivalent to using the univariate Shewhart limits for individual variables to detect outlying observations. A comparison of this procedure with the T2 procedure is illustrated in Figure 5.4 for the case of two variables.


The shaded box in the graph is defined by the univariate Shewhart chart for each variable. For moderate-to-strong correlations between the two variables, the T2 control ellipse usually extends beyond the box. This indicates that the operational range of the variables of a multivariate correlated system can be larger than the control chart limits of independent variables. Use of the univariate control chart limits of a set of variables ignores the contribution of their correlations and in most cases restricts the operational range of the individual variables. This restriction produces a conservative control region for the control procedure, which in turn generates an increased number of false signals. This is one of the main reasons for not using univariate control procedures to detect outliers in a multivariate system.
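The power function discussed above can be sketched numerically. Assuming the usual k = 3 Shewhart limits, the power of flagging a single observation whose mean has shifted by d standard deviations is 1 - Φ(k - d) + Φ(-k - d), where Φ is the standard normal distribution function; this is an illustrative calculation, not the book's code:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def shewhart_power(d, k=3.0):
    """Probability that one observation from N(mu + d*sigma, sigma^2)
    falls outside the mu +/- k*sigma Shewhart limits."""
    return (1.0 - phi(k - d)) + phi(-k - d)

# Power rises with the size of the shift d (in sigma units).
for d in (0.0, 1.0, 2.0, 3.0):
    print(d, round(shewhart_power(d), 4))
```

At d = 0 the value is simply the false-alarm rate of the chart (about 0.0027), and at d = 3 the power reaches one half, since the shifted mean then sits exactly on the UCL.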

5.4 Multivariate Outlier Detection

As in the univariate case, a preliminary control procedure for a multivariate process must be constructed to purge the data of atypical observations. With the T2 statistic, the corresponding control chart has only a UCL. If the T2 value that is computed for an observation exceeds this limit, the observation is deleted. For (univariate) Shewhart charting, there is very little procedural difference between the preliminary charting procedure and the actual control procedure. However, the multivariate Shewhart chart based on the T2 statistic offers some distinct differences in the computation of the UCL, mainly due to the form of the probability distribution of the T2 statistic. As noted in Chapter 2, depending on the circumstances, the T2 statistic can be described by three different probability functions: the beta, the F, and the chi-square distributions. The beta distribution is used in the purging process of a Phase I operation, whereas the F distribution is used in the development of the control procedure in a Phase II operation. The chi-square distribution has applications in both Phase I and II operations.

One begins the purging process by selecting a value for α, the probability of a Type I error. Its choice determines the size, 1 - α, of the control region. A Type I error is made if an observation is declared an outlier when in fact it is not. In making such errors, we exclude good observations from the HDS. These observations have large statistical distances (from the mean vector) and will lie in the tail region of the assumed MVN distribution. For small preliminary data sets, excluding these observations can have a significant effect on estimates of the covariance matrix and mean vector. For larger samples, the effect should be minimal. The situation is reversed when one considers the error of including an outlier in the HDS. For small samples, the effect of one outlier on the estimates of the mean vector and covariance matrix can be substantial.
In the face of this dilemma, especially when using small sample sizes, we recommend carefully examining any observation considered for deletion and discussing it with the process engineer.


5.5 Purging Outliers: Unknown Parameter Case

Consider a Phase I purging procedure where a single observation vector X' = (x1, x2, ..., xp) is to be monitored for control of the process using a T2 chart. We assume that the data follow an MVN distribution with an unknown mean vector μ and an unknown covariance matrix Σ. From the preliminary data, we obtain estimates x̄ and S of μ and Σ using the procedures from Chapter 2. We begin the purging process by making an initial pass through the preliminary data. For a given α level, all observation vectors whose T2 values are less than or equal to the UCL will remain in the data set, i.e., retain X if

T2 = (X - x̄)' S⁻¹ (X - x̄) ≤ UCL,    (5.1)

where the control limit is determined by

UCL = [(n - 1)² / n] B[α; p/2, (n-p-1)/2],    (5.2)

and where B[α; p/2, (n-p-1)/2] is the upper αth quantile of the beta distribution B[p/2, (n-p-1)/2]. If an observation vector has a T2 value greater than the UCL, it is to be purged from the preliminary data. With the remaining observations, we calculate new estimates of the mean vector and covariance matrix. A second pass through the data is now made. Again, we remove all detected outliers and repeat the process until a homogeneous set of observations is obtained. The final set of data is the HDS.

When process control is to be based on monitoring the subgroup means of k samples of observations, the actual purging process of the preliminary data set is the same as for individual observations. The data are recorded in samples of size mi, i = 1, 2, ..., k, yielding a total sample size of n = m1 + m2 + ... + mk. Since each individual observation vector comes from the same MVN distribution, we can disregard the k subgroups and treat the observations as one group. With the overall group, we obtain the estimates x̄ and S, and proceed as before. When the process is in control, this approach produces the most efficient estimator of the covariance matrix (e.g., see Wierda (1994) or Chou, Mason, and Young (1999)).

5.5.1 Temperature Example

To illustrate the above procedure when using individual observations, consider the 25 observations presented in Table 5.1. These data are a set of temperature readings from the eight configured burners on a boiler. The burner temperatures are denoted as t1, t2, ..., t8, and their correlation matrix is presented in Table 5.2. The control procedure is designed to detect any significant deviation in temperature readings and any change in the correlations among these variables. If this occurs, a "cold spot" develops and inadequate burning results. The T2 values of the 25 observations are computed using (5.1) and are presented in Table 5.3. Using an α = 0.001, the control limit is computed using the formula in (5.2). This yields a value of 17.416. Observation 9 is detected as an outlier since its T2 value of 17.58 exceeds the UCL. Hence, it is removed from the preliminary data set.
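The iterative purging scheme of (5.1) and (5.2) is straightforward to code. The sketch below is a minimal illustration on synthetic trivariate data with one planted outlier, not the book's software; it assumes scipy is available for the beta quantile:

```python
import numpy as np
from scipy.stats import beta

def purge_pass(X, alpha=0.001):
    """One purging pass: compute T2 for each row of X against the
    beta-based UCL of (5.2); return retained rows and an outlier mask."""
    n, p = X.shape
    xbar = X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))
    d = X - xbar
    t2 = np.einsum('ij,jk,ik->i', d, S_inv, d)          # (X-xbar)' S^-1 (X-xbar)
    ucl = ((n - 1) ** 2 / n) * beta.ppf(1 - alpha, p / 2, (n - p - 1) / 2)
    keep = t2 <= ucl
    return X[keep], ~keep

def purge(X, alpha=0.001):
    """Repeat purging passes until no further outliers are detected."""
    while True:
        X_new, removed = purge_pass(X, alpha)
        if not removed.any():
            return X
        X = X_new

# Toy illustration: clean trivariate data with one gross outlier appended.
rng = np.random.default_rng(3)
X = rng.normal(0, 1, (40, 3))
X = np.vstack([X, [8.0, -8.0, 8.0]])
hds = purge(X)
print(X.shape[0] - hds.shape[0], "observation(s) purged")
```

After each pass the mean vector and covariance matrix are re-estimated from the retained rows, exactly as in the boiler example, and the loop stops when a homogeneous set remains.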

Table 5.1: Boiler temperature data.

Obs. No.   t1    t2    t3    t4    t5    t6    t7    t8
 1         507   516   527   516   499   512   472   477
 2         512   513   533   518   502   510   476   475
 3         520   512   537   518   503   512   480   477
 4         520   514   538   516   504   517   480   479
 5         530   515   542   525   504   512   481   477
 6         528   516   541   524   505   514   482   480
 7         522   513   537   518   503   512   479   477
 8         527   509   537   521   504   508   478   472
 9         533   514   528   529   508   512   482   477
10         530   512   538   524   507   512   482   477
11         530   512   541   525   507   511   482   476
12         527   513   541   523   506   512   481   476
13         529   514   542   525   506   512   481   477
14         522   509   539   518   501   510   476   475
15         532   515   545   528   507   511   481   478
16         531   514   543   525   507   511   482   477
17         535   514   542   530   509   511   483   477
18         516   515   537   515   501   516   476   481
19         514   510   532   512   497   512   471   476
20         536   512   540   526   509   512   482   477
21         522   514   540   518   497   514   475   478
22         520   514   540   518   501   514   475   478
23         526   517   546   522   502   516   477   480
24         527   514   543   523   502   512   475   476
25         529   518   544   525   504   516   479   481

Table 5.2: Correlation matrix for boiler temperature data.

        t1      t2      t3      t4      t5      t6      t7      t8
t1      1       0.059   0.584   0.901   0.819  -0.147   0.813   0.014
t2      0.059   1       0.281   0.258   0.044   0.659   0.094   0.797
t3      0.584   0.281   1       0.444   0.308   0.200   0.396   0.294
t4      0.901   0.258   0.444   1       0.846  -0.226   0.788   0.018
t5      0.819   0.044   0.308   0.846   1      -0.231   0.924  -0.043
t6     -0.147   0.659   0.200  -0.226  -0.231   1      -0.103   0.893
t7      0.813   0.094   0.396   0.788   0.924  -0.103   1       0.079
t8      0.014   0.797   0.294   0.018  -0.043   0.893   0.079   1

New estimates of the mean vector and covariance matrix are computed from the remaining 24 observations, and the purging process is repeated. The new correlation matrix is presented in Table 5.4. Comparing the correlation matrices of the purged data and the unpurged data, we find a definite change. For example, with the removal of observation 9 the correlation between t1 and t3 increases from 0.584 to 0.807. This illustrates the effect a single outlier can have on a correlation coefficient when there is a small sample size (n = 25). Note that such a significant change in the correlation matrix implies a similar change in the covariance matrix. The new T2 values are presented in Table 5.5. Since the new UCL for the reduced set of 24 observations is 17.00, the second pass through the data produces

Table 5.3: T2 values for first pass.

Obs. No.   T2 Value      Obs. No.   T2 Value
 1         13.96         14          9.55
 2          9.78         15          7.07
 3          6.52         16          5.47
 4         14.74         17          4.77
 5          8.74         18          6.58
 6          9.84         19          5.31
 7          8.64         20          7.89
 8          9.78         21         12.58
 9         17.58*        22          2.79
10          6.09         23          2.79
11          3.29         24          7.98
12          5.32         25          3.63
13          1.32

*Indicates T2 value is significant at the 0.001 level.

Table 5.4: Correlation matrix with one outlier removed.

        t1      t2      t3      t4      t5      t6      t7      t8
t1      1       0.051   0.807   0.899   0.808  -0.141   0.805   0.021
t2      0.051   1       0.342   0.259   0.034   0.662   0.087   0.799
t3      0.807   0.342   1       0.717   0.506   0.204   0.569   0.320
t4      0.899   0.259   0.717   1       0.838  -0.225   0.780   0.027
t5      0.808   0.034   0.506   0.838   1      -0.228   0.922  -0.037
t6     -0.141   0.662   0.204  -0.225  -0.228   1      -0.096   0.893
t7      0.805   0.087   0.569   0.780   0.922  -0.096   1       0.086
t8      0.021   0.799   0.320   0.027  -0.037   0.893   0.086   1
Table 5.5: T2 values for second pass.

Obs. No.   T2 Value      Obs. No.   T2 Value
 1         16.07         14          9.84
 2          9.62         15          7.55
 3          5.41         16          7.49
 4         14.09         17          6.62
 5          6.56         18          8.75
 6          5.43         19          9.66
 7          7.89         20         10.62
 8          9.34         21         12.62
10          5.26         22          3.26
11          3.21         23          6.79
12          3.53         24          7.76
13          1.22         25          5.44

no additional outliers. The listed observations (excluding 9) form a homogeneous statistical group and can be used as the HDS. An alternative graphical approach to outlier detection based on the T2 statistic is to examine a Q-Q plot (see Chapter 3) of the appropriate T2 values. For the


Figure 5.5: Q-Q plot of temperature data.

above temperature data, with p = 8 and n = 25, the beta distribution for the T2 statistic using the formula given in (2.15) is

T2 ~ [(n - 1)² / n] B[p/2, (n-p-1)/2] = 23.04 B[4, 8].

A Q-Q plot of the T2 values, converted to beta values by dividing them by the scale factor (n - 1)²/n = 23.04, is presented in Figure 5.5. Several of the plotted points do not fall on the given line through the data. This is especially true for the few points located in the upper right corner of the graph. This is supported by the T2 values given in Table 5.3, where four points, 1, 4, 9, and 21, have T2 values larger than 10. Observation 9, located at the upper end of the line of plotted values, is somewhat removed from the others. Given this result, the point should be investigated as a potential outlier.

5.5.2 Transformer Example

As a second data example, consider 134 observations taken on 23 variables used in monitoring the performance of a large transformer. The T2 values of the preliminary data set are presented in the T2 control chart given in Figure 5.6. An α = 0.001 was used in determining the UCL of 44.798. The chart indicates that an upset condition begins at observation 20 and ends at observation 27. Other than the upset condition, the T2 values for the preliminary data appear to be in steady state and indicate the occurrence of good process operations. A Q-Q plot of the 134 T2 values is presented in Figure 5.7. Although the data generally have a linear trend, the large values associated with the observations contained in the upset condition have a tendency to pull the other observations upward in the upper right corner of the graph.

The 7 observations contained in the upset condition of Figure 5.6 were removed from the transformer data set, and the T2 control chart was reconstructed using the remaining 127 observations. The corresponding T2 plot is presented in Figure 5.8.


Figure 5.6: T2 chart for transformer data with upset condition.

Figure 5.7: Q-Q plot of transformer data.

Figure 5.8: T2 chart for transformer data after outlier removal.


Figure 5.9: Q-Q plot for transformer data after outlier removal.

The UCL is recalculated as 44.528. Observe that the system appears to be very consistent and all observations have T2 values below the UCL. The corresponding Q-Q plot of the T2 values is presented in Figure 5.9. Observe the strong linear trend exhibited in the plot and the absence of observations far off the trend line.

5.6 Purging Outliers: Known Parameter Case

Assume the sample data follow an MVN distribution having a known mean vector μ and known covariance matrix Σ. In this setting, the T2 test statistic for an observation vector X' = (x1, x2, ..., xp) becomes

T2 = (X - μ)' Σ⁻¹ (X - μ).    (5.3)

For a given α level, the UCL for the purging process is determined using

UCL = χ²(α, p),    (5.4)

where χ²(α, p) is the upper αth quantile of a chi-square distribution having p degrees of freedom. To illustrate this procedure and contrast it to the case where the parameters are unknown, assume the sample mean vector and covariance matrix of the data in Table 5.1 are the true population values. Using an α = 0.001, the UCL is χ²(0.001, 8) = 26.125. Comparing the observed T2 values in Table 5.3 to this value, we find that no observation is declared an outlier. Thus, observation 9 would not be deleted.

A major difference between the T2 statistics in (5.1) and (5.3) is due to how we determine the corresponding UCL. When the mean vector and covariance matrix are estimated, as in (5.1), the beta distribution is applicable, but when these parameters are known, as in (5.3), the chi-square distribution should be used. It can be shown that for large n, the UCL as calculated under the beta distribution (denoted as

Table 5.6: Values of UCL_B when α = 0.01.

 p   n = 50   100     150     200     250     300     350     400     450     500     ∞
 2    8.55    8.88    8.99    9.04    9.08    9.10    9.12    9.13    9.14    9.14    9.21
 6   15.00   15.90   16.21   16.36   16.45   16.51   16.55   16.58   16.61   16.63   16.81
30   39.27   45.39   47.27   48.19   48.74   49.10   49.36   49.55   49.70   49.82   50.89

UCL_B) approaches the UCL as specified by the chi-square distribution (denoted as UCL_C). This is illustrated in Table 5.6 for various values of n and p at α = 0.01. Notice that, in this example, using the chi-square (instead of the beta) UCL for small n and p, such as p = 2 and n = 50, increases the likelihood of accepting potential outliers in the HDS. This occurs because UCL_C (at n = ∞) always exceeds UCL_B. For large p, such as p = 30, n should exceed 500 in order to justify usage of the chi-square UCL. A comparison of the UCL for various values of n, p, and α for the two situations can be found in Tracy, Young, and Mason (1992).
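The convergence of the beta-based UCL to the chi-square UCL is easy to verify numerically. The sketch below assumes scipy and uses p = 2 with α = 0.01:

```python
from scipy.stats import beta, chi2

def ucl_beta(n, p, alpha):
    """Phase I UCL based on the beta distribution, as in (5.2)."""
    return ((n - 1) ** 2 / n) * beta.ppf(1 - alpha, p / 2, (n - p - 1) / 2)

def ucl_chi2(p, alpha):
    """Known-parameter UCL based on the chi-square distribution, as in (5.4)."""
    return chi2.ppf(1 - alpha, p)

# The beta UCL rises toward the chi-square limit as n grows.
for n in (50, 100, 500):
    print(n, round(ucl_beta(n, 2, 0.01), 2))
print("chi-square limit:", round(ucl_chi2(2, 0.01), 2))
```

Because the chi-square value always exceeds the beta value, using the chi-square UCL with small samples sets the limit too high and lets potential outliers into the HDS.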

5.7 Unknown T2 Distribution

In Chapter 3, we introduced three procedures for determining the UCL in a T2 control chart in cases where the distribution of the T2 statistic was unknown. The first was a method based on the use of Chebyshev's theorem, the second was based on the quantile technique, and the third was based on the fitting of the empirical T2 distribution using a kernel smoothing technique. In this section, we use a data example to show how these methods can be used to identify outliers.

To demonstrate the procedure based on Chebyshev's theorem, we consider n = 491 observations on p = 6 variables taken as a preliminary data set under good operational conditions from an industrial process. The corresponding T2 values of the observations are computed and plotted on the control chart given in Figure 5.10. Examination of the chart clearly indicates the presence of a number of large T2 values. These large values are also indicated in the tail of the T2 histogram presented in Figure 5.11. The estimated mean and the standard deviation of the 491 T2 values are computed, and the UCL is approximated using the formula in (3.7) from Chapter 3, i.e.,

UCL = (mean of the T2 values) + k (standard deviation of the T2 values),

with k = 3.162, so that α ≤ 1/k² = 0.10. The estimated UCL for this first pass is 19.617, and it is used to remove 13 outliers. The estimation of the UCL and the resultant purging process is repeated until a homogeneous data set of T2 values is obtained. Five passes are required and 28 observations are deleted. The results of each pass of this procedure are presented in Table 5.7.


Figure 5.10: T2 values for industrial data set.

Figure 5.11: Histogram of T2 values for industrial data.

Table 5.7: Results of purging process using Chebyshev's procedure.

Pass                    1        2        3        4        5        6
Mean of T2 values       5.988    5.987    5.987    5.987    5.987    5.987
Std. dev. of T2 values  4.310    3.630    3.488    3.407    3.348    3.310
UCL                     19.617   17.468   17.019   16.761   16.575   16.455
# of Outliers Removed   13       6        4        3        2        0
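The Chebyshev-based purge can be sketched as a simple loop. This illustration uses synthetic T2 values rather than the industrial data set; the multiplier k = √10 ≈ 3.162 bounds the false-alarm rate by 1/k² = 0.10:

```python
import numpy as np

def chebyshev_purge(t2, k=3.162):
    """Iteratively purge T2 values using the Chebyshev-style UCL,
    UCL = mean + k * std; returns the clean values and per-pass results."""
    t2 = np.asarray(t2, dtype=float)
    passes = []
    while True:
        ucl = t2.mean() + k * t2.std(ddof=1)
        removed = t2 > ucl
        passes.append((round(ucl, 3), int(removed.sum())))
        if not removed.any():
            return t2, passes
        t2 = t2[~removed]

# Toy T2 sample: a skewed bulk plus a few inflated values.
rng = np.random.default_rng(4)
t2 = np.concatenate([rng.chisquare(6, 480), [25, 28, 32, 40]])
clean, passes = chebyshev_purge(t2)
for ucl, n_removed in passes:
    print(ucl, n_removed)
```

As in Table 5.7, each pass removes the values above the current UCL, recomputes the mean and standard deviation from what remains, and stops when no value exceeds the recalculated limit.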

Table 5.8: Results of purging process assuming multivariate normality.

Pass                    1        2        3        4        5
UCL                     16.627   16.620   16.617   16.616   16.616
# of Outliers Removed   17       7        3        1        0

Table 5.9: Results of purging process using nonparametric procedure.

Pass                    1        2        3        4        5        6
UCL                     22.952   21.142   17.590   17.223   16.588   16.402
# of Outliers Removed   5        8        6        3        5        (1)

If one assumes that the T2 values are described by a beta distribution and calculates the UCL using (5.2) with an α = 0.01, the same 28 observations are removed. However, the order of removal is not the same and only four passes are required. These results are presented in Table 5.8. In this case, the major difference between the two procedures is that the probability of a Type I error is fixed at α = 0.01 for the beta distribution, whereas the error rate for the Chebyshev approach is only bounded by α ≤ 0.10.

To demonstrate the procedure based on the quantile technique, the 491 T2 values are arranged in descending order and an approximate UCL is calculated using α = 0.01 and the formula in (A3.3) from Chapter 3.

The estimated UCL for this first pass is 22.952, and it is used to remove five outliers. The estimation of the UCL and the purging process are repeated until a homogeneous data set of T2 values is obtained. In this procedure, there will always be at least one T2 value exceeding the estimated UCL in each pass. Thus, one must stop at the step where only a single outlier is encountered. Since this occurs at step 6 for our data example, only five passes are required and 27 observations are deleted. The results of each pass of this procedure are presented in Table 5.9.

The third method for obtaining an appropriate UCL is to fit a distribution to the T2 statistic using the kernel smoothing technique. The UCL can be approximated using the (1 - α)th quantile of the fitted kernel distribution function of the T2. We begin by using the preliminary data of n observations to obtain the estimates x̄ and S of the parameters μ and Σ. Using these estimates, we compute the T2 values. These n values provide the empirical distribution of the T2 statistic for a Phase I operation. As previously noted, we are assuming that the intercorrelation common to the T2 values has little effect on the application of these statistical procedures. We apply the kernel smoothing procedure described by Polansky and Baker (2000) to obtain F_K(t), the kernel estimate of the distribution of T2, or simply the


Table 5.10: Results of the purging process using the kernel smoothing technique.

    Pass    UCL       # of Outliers Removed
    1       23.568    5
    2       22.531    4
    3       19.418    4
    4       18.388    4
    5       17.276    5
    6       16.626    4
    7       16.045    3

kernel distribution of the T2. It is given as

    F_K(t) = (1/n) Σ_{j=1}^{n} Φ((t − T2_(j)) / h),

where Φ is the standard normal distribution function, T2_(j), j = 1, ..., n, denote the n T2 values, and h is the two-stage estimate of the bandwidth. An algorithm for computing h is outlined in Polansky and Baker (2000). The UCL is determined as the (1 − α)th quantile of F_K(t) and satisfies the equation

    F_K(UCL) = 1 − α.    (5.7)

Since F_K is generally a skewed distribution, the UCL can be large for small α values, such as 0.01 and 0.001. The (1 − α)th sample quantile of the T2_(j), j = 1, ..., n, can be used as the initial value for the UCL in (5.7). Since the kernel distribution tends to fit the data well, for a moderate α value between 0 and 1, approximately nα (rounded to the nearest integer) of the T2 values are beyond the UCL, the upper 100αth percentile of the kernel distribution. For n = 491 and α = 0.01, nα = 4.91 ≈ 5, and one may expect that four to five values of the T2 are above the UCL. After these outliers are removed in each pass, there are always some points above the newly calculated UCL. This seems to be inevitable unless n or α is very small, so that nα is around 0.

Because the kernel method is based solely on data, one way of determining the UCL for the final stage is to compare the UCLs and the kernel distribution curves for successive passes. If the UCLs for two consecutive passes are very different, this implies that the kernel distribution curves also differ significantly after outliers are removed. However, if the UCLs and the curves for two consecutive passes are nearly the same, the earlier of the two is the desired UCL for the final stage. For the data in the example, the UCLs for Passes 7 and 8 are 16.045 and 15.829, respectively. The difference between the bandwidths of these kernel estimates is only 0.004. Therefore, the three points in Pass 7 cannot be viewed as outliers and should be kept in the HDS. After six passes, 26 observations are removed and the remaining 465 observations form the HDS. The UCL for the T2 chart should be set at 16.045, as Pass 7 is the final pass. Table 5.10 presents the results of the entire purging process.
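The kernel approach can be sketched numerically. The following is an illustration rather than the authors' implementation: the Gaussian-kernel distribution function is averaged over the Phase I T2 values, and (5.7) is solved for the UCL with a root finder. A simple normal-reference bandwidth stands in for the two-stage Polansky-Baker estimate, and the function names are ours.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def kernel_cdf(t, t2_values, h):
    """Gaussian-kernel estimate of the T2 distribution function:
    F_K(t) = (1/n) * sum_j Phi((t - T2_(j)) / h)."""
    return float(norm.cdf((t - np.asarray(t2_values)) / h).mean())

def kernel_ucl(t2_values, alpha=0.01, h=None):
    """Solve F_K(UCL) = 1 - alpha, as in (5.7).
    NOTE: a normal-reference bandwidth is used here as a stand-in for
    the two-stage Polansky-Baker bandwidth described in the text."""
    t2 = np.asarray(t2_values, dtype=float)
    if h is None:
        h = 1.06 * t2.std(ddof=1) * t2.size ** (-0.2)  # rough plug-in rule
    # Bracket the root well outside the observed range, then solve (5.7).
    lo, hi = t2.min() - 10 * h, t2.max() + 10 * h
    return brentq(lambda t: kernel_cdf(t, t2, h) - (1 - alpha), lo, hi)
```

Each pass of the purging process would recompute the bandwidth and UCL from the retained T2 values, exactly as in Table 5.10.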


5.8 Summary

Historical data sets are very important in process control as they provide a baseline for comparison. Any process deviation from this reference data set is considered out of control, even in those situations where the system improves. Because of this criticality, the process needs to be in control and on target when the HDS observations are selected. Inclusion of atypical observations will increase the variation and distort the correlations among the variables; even a single outlying observation can do this. It is for these reasons that a good outlier-purging procedure, such as one based on the T2 statistic, is needed.

A common misconception is that the operational range of a variable in a multivariate process is the same as the operational range of a variable in a univariate process. This is only true for independent variables. For correlated variables, the operational range of the variables is increased. Correct outlier-purging procedures will determine the appropriate operational ranges of the variables.

Several different forms of the T2 can be used in detecting outliers in a Phase I operation. These include situations where the population parameters are both known and unknown and where observations are either individually charted or charted as means. In addition, three alternative procedures are available for use when the assumption of multivariate normality is invalid: the Chebyshev approach, the quantile method, and the kernel smoothing approach.

Chapter 6

Charting the T2 Statistic in Phase II

Old Blue: An Invaluable Tool


Using the software package, you quickly create a set of scatter plots, histograms, and time-sequence charts. That old statistics professor was right on target when he said graphical procedures provide true insight into the structure and behavior of your data. You visually spot a number of questionable observations. From your process experience, you know these are not typical.

You find the procedure for creating the baseline data to be more involved than the methods you used with single-variable data. It takes some time for you to figure out that the heat transfer coefficient is causing the singularity in the covariance matrix. You should have caught that right away, since the textbook pointed out that a computed variable sometimes contains redundant information when the variables used in constructing the computed variable are included in the analysis. The extremely high correlation between the heat transfer coefficient and the pressure drop was another clue that you missed.

You examine a Q-Q plot of the T2 data from the Phase I operation and decide the beta distribution can be used to describe the T2 statistic for your process. As noted in the initial plots, this plot also indicates the presence of a few outliers. However, it only takes two passes through the data to remove them. These unusual observations are not like the rest of the data set. You wonder why they occurred and decide to set these aside for later investigation.

All in all, the Phase I analysis shows "Old Blue" to be very consistent. The T2 values of the baseline data are all small and very close together and show very little variation. Since each T2 is a composite of observations on the available input, process, and output variables, you can easily determine the overall performance of the unit.

You reflect on the time you spent in process operations. A tool based on the T2 chart, and used on-line and in real time, would have been invaluable. You recall the constant stress created in determining if all was going well.

There was no single monitored variable useful in determining unit performance. You conclude that something must be done about incorporating this powerful measure into the unit operations. You now are prepared to move forward with your project. The next step is the location and isolation of the T2 signals from the incoming process data. You are ready for Phase II.

6.1 Introduction
A number of items need to be considered when choosing the appropriate T2 charting procedure for a Phase II operation. These include computing the appropriate charting statistic, selecting a Type I error probability, and determining the UCL. For example, if we monitor a steady-state process that produces independent observations, a T2 charting procedure will suffice. However, if the observations exhibit a time dependency, such as that which is inherent to decay processes, some adjustment for the time dependency must be made to the T2 statistic (i.e., see Chapter 10).

The charting of the T2 statistic in a Phase II operation is very similar to the approach used in charting the statistic for a Phase I operation. The major difference is in the probability functions used in determining the control region. Two cases exist. When the mean vector and covariance structure are known, a chi-square distribution is used to describe the behavior of the statistic and determine the upper control limit. When the mean and covariance parameters are unknown and must be estimated from the historical data, an F distribution is used to describe the statistic and locate the upper control limit.

In this chapter, we address several different charting procedures for the T2 statistic and examine the advantages and disadvantages of each. We initially discuss monitoring a process using a T2 statistic when only a single observation vector is collected at each time point. This is later extended to the situation where the process is monitored using the mean of a subgroup of observations taken at each time point. Other topics discussed include the choice of the probability of a Type I error, procedures for calculating the average run length to detect a given mean shift, and charts for the probability of detecting a shift in the mean vector. Any nonrandom pattern displayed in a T2 chart can imply process change. For this reason, we include a section on detecting systematic patterns in T2 charts.

6.2 Choice of False Alarm Rate

When constructing a T2 control chart, consideration must be given to the choice of α, the probability of a Type I error. Recall that this topic was briefly discussed in Section 5.4 for a Phase I operation. In a Phase II operation, a Type I error occurs when we conclude that an observation presents a signal when in fact no signal is present. Signaling observations are detected when their T2 values exceed the UCL. In turn, the UCL is primarily determined by the choice of α. This is an important


decision, since a value of α can be chosen such that the T2 value of an observation exceeds the UCL, even though the observation really contains no signal. Note, also, that the size of the control region is 1 − α. This is the probability of concluding that the process is in control when in fact control is being maintained on all process variables.

The size of α cannot be considered without discussion of β, the probability of a Type II error. This is the error of concluding there is no signal when in fact a signal is present. Type I and Type II errors are interrelated in that an increase in the probability of one will produce a decrease in the probability of the other. Careful consideration must be given to the consequences produced by both types of error. For example, suppose a chemical process is producing a product that becomes hazardous when a particular component increases above a given level. Assume that this component, along with several other correlated components, is observed on a regular basis. A T2 control procedure is used to check the relationships among the components as well as to determine if each is in its desired operational range. If a Type I error is made, needless rework of the product is required since the process is in control. If a Type II error is made, dangerous conditions immediately exist. Since dangerous conditions override the loss of revenue, a very small β is desirable. Given this preference for a low risk of a Type II error, a large α would be acceptable.

The value of α chosen for a T2 chart in a Phase II operation does not have to agree with the value used in constructing the Phase I chart. Instances do exist where making a Type I error in a Phase II operation is not so crucial. For example, suppose change is not initiated in the actual control of a process until more than one signal is observed. This reduces the risk of overcontrolling the process. Situations such as these require a larger α in the Phase II operation. In contrast, a large α for Phase I can produce a conservative estimate of both the mean vector and the covariance matrix, so some balance is necessary.

The choice of α for a univariate charting procedure pertains only to the false alarm rate for the specified variable being monitored. For example, the control limits of a Shewhart chart are frequently located at plus or minus three standard deviations from the center line of the charted statistic. This choice fixes the false alarm rate α at a value of 0.0027 and the size of the control region at (1 − α), or 0.9973. This translates to a false alarm rate of about 3 observations per 1,000. The choice of α in monitoring a multivariate process is more complex, as it reflects the simultaneous risk associated with an entire set of variables (e.g., Timm (1996)). Establishing a control procedure for each component of the observation vector would lead to an inappropriate control region for the variables as a group, as individual control does not consider relationships existing among the variables. This is illustrated with the following example.

Suppose X′ = (x1, x2) is a bivariate normal observation on a process that is to be monitored by a joint control region defined by using a 3-sigma Shewhart procedure for each individual variable. The shaded box given in Figure 6.1 illustrates the control region. A major problem with this approach is that it ignores the relationship that exists between the process variables and treats them independently. The true joint control procedure, if the two variables were correlated, would be similar to the ellipse, which is superimposed on the box in Figure 6.1.


Figure 6.1: Region of joint control.

If the p variables being monitored in a multivariate process are independent, the simultaneous false alarm rate, αs, is computed as

    αs = 1 − (1 − α)^p,    (6.1)

where α is the false alarm rate for each individual variable. Thus the value of αs increases as p, the number of variables, increases. For example, if α = 0.0027, the simultaneous false alarm rate for p = 2 is 1 − (1 − 0.0027)² = 0.0054, but for p = 4, the rate increases to 0.0108. This example produces exact probabilities for a process with independent variables. In reality, a process usually consists of a group of correlated variables. Such situations tend to increase the true false alarm rate even beyond (6.1).
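The computation in (6.1) can be sketched directly; the function name is ours:

```python
def simultaneous_alpha(alpha, p):
    """Simultaneous false alarm rate for p independently monitored
    variables, as in (6.1): alpha_s = 1 - (1 - alpha)^p."""
    return 1 - (1 - alpha) ** p

# With 3-sigma Shewhart limits on each variable (alpha = 0.0027),
# p = 2 gives about 0.0054 and p = 4 gives about 0.0108.
```

For correlated variables this expression is only a reference point; as the text notes, the true simultaneous rate can exceed it.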

6.3 T2 Charts with Unknown Parameters

Consider a continuous steady-state process where the observation vectors are independent and the parameters of the underlying normal distribution are unknown and must be estimated. Assume the process is being monitored by observing a single observation vector, X′ = (x1, x2, ..., xp), on p variables at each time point. The T2 value associated with X is given by

    T2 = (X − X̄)′ S⁻¹ (X − X̄),    (6.2)

where the common estimates X̄ and S are obtained from the HDS following the procedures described in Chapter 4. In this Phase II setting, the T2 statistic in (6.2) follows the F distribution given in (2.14). For a given α, the UCL is computed as

    UCL = [p(n + 1)(n − 1) / (n(n − p))] F(α; p, n−p),    (6.3)

where n is the size of the HDS and F(α; p, n−p) is the upper αth quantile of the F distribution with p and n − p degrees of freedom.
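The limit in (6.3) is a scaled F quantile and is easy to compute; a sketch using scipy (the function name is ours):

```python
from scipy.stats import f

def phase2_ucl(alpha, p, n):
    """Phase II UCL for individual observations, as in (6.3):
    p(n + 1)(n - 1) / (n(n - p)) times the upper alpha quantile
    of F with p and n - p degrees of freedom."""
    factor = p * (n + 1) * (n - 1) / (n * (n - p))
    return factor * f.ppf(1 - alpha, p, n - p)
```

For the steam turbine example that follows (p = 6 variables, n = 28 HDS observations, α = 0.001), this reproduces the UCL of approximately 43.91 used with Table 6.2.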


Figure 6.2: Example of a T2 chart.

Using these results, we can plot new incoming observations in sequence on a control chart. An example of such a chart is given in Figure 6.2. The chart is the same as that of a Phase I chart used in the purging operation, with the exception that the UCL is computed using a different form of the F distribution. Any new observation whose T2 value plots above the UCL is declared a signal, and it is concluded that the observation does not conform to the baseline data set.

To illustrate the T2 charting procedure, we consider observations taken from the steam turbine system illustrated in Figure 6.3. The input to this system is fuel in the form of natural gas that is used to produce steam in the boiler. The high-pressure steam is used to turn the turbine, which is connected to the generator that produces electricity (in megawatt-hours). The warm low-pressure vapor from the turbine is moved to the condenser, where it is converted into a liquid state for pumping to the boiler. The boiler uses heat to transform the liquid water into high-pressure steam. The system is considered to be at steady state and works in a continuous cycle.

Baseline data for 28 observations on a steam turbine system are presented in Table 6.1. Measurements are made on the following variables: fuel usage (Fuel), the amount of steam produced (Steam Flow), the steam temperature (Steam Temp), the megawatt-hour production (MW) of the turbine, the coolant temperature (Cool Temp), and the absolute pressure (Pressure) observed from the condenser. Estimates of the mean vector and covariance matrix, obtained from these data, are given as follows:

and

    S = [  5.2507E07    2.7607E07   -11749.5      3313.42     -408.673    -169.391  ]
        [  2.7607E07    1.91E07     -8112.06      2302.14     -203.795    -115.969  ]
        [ -11749.5      -8112.06     8.6918       -0.93735     0.152381    0.032143 ]
        [  3313.42      2302.14     -0.93735      0.285332    -0.02312    -0.0134   ]
        [ -408.673      -203.795     0.152381    -0.02312      0.043598    0.003757 ]
        [ -169.391      -115.969     0.032143    -0.0134       0.003757    0.002474 ]


Figure 6.3: Steam turbine system.

Table 6.1: Phase I steam turbine data.

    Obs. No.   Fuel     Steam Flow   Steam Temp   MW      Cool Temp   Pressure
     1         232666   178753       850          20.53   54.1        29.2
     2         237813   177645       847          20.55   54.2        29.2
     3         240825   177817       848          20.55   54.0        29.2
     4         240244   178839       850          20.57   53.9        29.1
     5         239042   177817       849          20.57   53.9        29.2
     6         239436   177903       850          20.59   54.0        29.1
     7         234428   177903       848          20.57   53.9        29.2
     8         232319   177990       848          20.55   53.7        29.1
     9         233370   177903       848          20.48   53.6        29.1
    10         237221   178076       850          20.49   53.9        29.1
    11         238416   177817       848          20.55   53.9        29.1
    12         235607   177817       848          20.55   53.8        29.1
    13         241423   177903       847          20.55   53.7        29.1
    14         233353   177731       849          20.53   53.6        29.1
    15         231324   178753       846          20.64   53.9        29.1
    16         243930   187378       844          21.67   53.9        29.1
    17         252550   187287       843          21.65   54.2        29.1
    18         251166   187745       842          21.67   53.7        29.1
    19         252597   188770       841          21.78   53.4        29.1
    20         243360   179868       842          20.66   53.7        29.1
    21         238771   181389       843          20.81   53.9        29.1
    22         239777   181411       841          20.88   54.0        29.1
    23         219664   167330       850          19.08   54.1        29.2
    24         228634   176137       846          20.64   54.0        29.2
    25         231514   176029       843          20.24   53.8        29.2
    26         235024   176115       846          20.22   53.6        29.2
    27         239413   176115       845          20.31   53.7        29.2
    28         228795   176201       847          20.24   54.3        29.2

Table 6.2: Phase II steam turbine data.

    Obs. No.   Fuel     Steam Flow   Steam Temp   MW      Cool Temp   Pressure   T2
     1         234953   181678       843          20.84   54.5        29.0        35.00
     2         247080   189354       844          20.86   54.4        28.9       167.98*
     3         238323   184419       845          21.10   54.5        28.9        56.82*
     4         248801   189169       843          22.18   54.5        28.9        69.48*
     5         246525   185511       842          21.21   54.6        28.9        65.91*
     6         233215   180409       845          20.75   54.5        29.0        32.56
     7         233955   181323       842          20.82   54.6        29.0        43.91
     8         238693   181346       844          20.92   54.8        29.0        49.33*
     9         248048   185307       844          21.15   54.6        29.0        39.96
    10         233074   181411       844          20.93   54.5        29.0        34.46
    11         242833   186216       844          21.59   54.4        29.0        25.51
    12         243950   182147       844          21.37   54.2        29.0        41.03
    13         238739   183349       844          21.01   54.3        29.0        23.28
    14         251963   188012       850          21.68   54.4        29.0        29.33
    15         240058   183372       846          21.15   54.2        29.0        16.40
    16         235376   182436       844          20.99   54.3        29.0        24.10

    *Exceeds UCL = 43.91.

A T2 control procedure is developed, using these estimates, to monitor efficiency by detecting significant changes in any of the six monitored variables. The UCL for the chart is computed from (6.3), with α = 0.001, as

    UCL = [6(28 + 1)(28 − 1) / (28(28 − 6))] F(0.001; 6, 22) = 43.91.

Considering the data in Table 6.1 as the HDS, T2 values for 16 new incoming observations are computed. Table 6.2 contains these observations along with their corresponding T2 values. The T2 values are computed using (6.2) and the parameter estimates obtained from the historical data in Table 6.1. For example, the T2 value for observation 1, whose observation vector is X′ = (234953, 181678, 843, 20.84, 54.5, 29.0), is 35.00, where the inverse matrix is obtained from S.

A T2 chart for the new observations is presented in Figure 6.4. Out-of-control conditions occur at observations 2-5 and again at observation 8. Our conclusion is that these observations do not conform to the HDS presented in Table 6.1.
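The mechanics of computing (6.2) for a stream of incoming observations are simple; the following sketch uses hypothetical function names of ours and a linear solve rather than an explicit inverse:

```python
import numpy as np

def t2_statistic(x, xbar, S):
    """T2 of a single new observation, as in (6.2):
    (x - xbar)' S^{-1} (x - xbar)."""
    d = np.asarray(x, dtype=float) - np.asarray(xbar, dtype=float)
    # solve(S, d) computes S^{-1} d without forming the inverse.
    return float(d @ np.linalg.solve(S, d))

def flag_signals(new_obs, xbar, S, ucl):
    """Return (T2, signal?) for each incoming observation vector."""
    results = []
    for x in new_obs:
        t2 = t2_statistic(x, xbar, S)
        results.append((t2, t2 > ucl))
    return results
```

Each flagged observation is a signal in the sense of the chart: its T2 value exceeds the UCL established from the HDS.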

Reaction to a Signal
Any observation that produces a T2 value falling outside its control region is a signal. This implies that conditions have changed from the historical situation.


Figure 6.4: T2 chart for new observations.

Although an isolated signal can be due to a chance occurrence of an upset condition, multiple signals often imply a definite shift in the process. Should we react to single signals, or do we wait for multiple signals before declaring that there are upset conditions? The answer depends on the process setting, such as whether the problem can be immediately corrected or can wait until the next scheduled maintenance period. An engineer may choose not to react to an isolated out-of-control point in order to minimize the likelihood of overcontrolling the process. In such settings, searches are made for trends and patterns in the T2 control chart, and out-of-control conditions are declared only when a number of T2 values plot outside the control limit. Also, consultation with the process engineer and operators helps in determining if upset conditions exist in the process.

Consider the performance of a steam turbine that is operating at steady-state conditions. Generally, a fixed amount of fuel is supplied to the turbine and a fixed load (MW) is the output. In this state, one would expect little fluctuation from the baseline data. However, steady-state conditions are known to deteriorate under load changes. For example, MW production varies with fuel usage, increasing as the load increases and decreasing when the load decreases. Even though load changes occur over a very short time period, they can produce erratic behavior in the control system. Situations such as these, where there are upsets with known causes, should be identified. When the cause is unknown, upset conditions are declared when several points fall outside the control region.

Upsets do not produce a critical state in the operation of a steam turbine. However, upset conditions in certain types of processing units can and do produce critical or dangerous situations. For example, reconsider the chemical process example of section 6.2, where a product is produced that becomes hazardous when a particular component increases beyond a specified level. One would react immediately to any indication of process movement in this system due to the severe consequences of ignoring it. The determination of the occurrence of a signal depends on many factors. We recommend thorough understanding of the process, determination of


the critical components, and a risk analysis to determine the consequences of the actions to be taken.

6.4 T2 Charts with Known Parameters

Sometimes the parameters of the underlying MVN distribution for the process data are known. This may occur because extensive past information is available about the operations of the process. For example, known mean vectors can arise in an industrial setting where "old" steady-state processes are studied for extended periods of time. However, our experience indicates that this seldom occurs in practice. Nevertheless, it is useful to consider the case where these parameters are known, as the results provide information on the use and application of the limiting distribution of the T2 statistic.

If the parameters of the underlying MVN distribution are known, the T2 value for a single observation vector in a Phase II operation is computed using

    T2 = (X − μ)′ Σ⁻¹ (X − μ).    (6.4)

The probability function used to describe this T2 statistic is the chi-square distribution with p degrees of freedom, as given in (2.13). This is the same distribution used to purge outliers when constructing the HDS. For a given value of α, the UCL is determined as

    UCL = χ²(α, p),    (6.5)

where χ²(α, p) is the upper αth quantile of the chi-square distribution with p degrees of freedom. In this case, the control limit is independent of the size of the HDS.

To illustrate the procedure for signal detection in the known-parameter case, consider a bivariate industrial process. A sample of 11 new observations and their mean-corrected values are presented in Table 6.3. The purpose of the control procedure is to maintain the relationship between the two variables (x1, x2) and to guarantee that the two variables stay within their operational ranges. The mean vector and covariance matrix are given as

    μ′ = (145.86, 199.04)

and the known covariance matrix Σ.

The T2 chart for the data in Table 6.3 is presented in Figure 6.5. Letting α = 0.05, observations 1 and 10 produce signals since their T2 values are above the UCL = χ²(0.05, 2) = 5.99. The occurrence of situations where the distribution parameters are known is rare in industry. Processing units, especially in the chemical industry, are in a constant state of flux. A change in the operational range of a single variable of a process can produce a ripple effect throughout the system. Many times these changes
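The known-parameter limit needs only a chi-square quantile; a short sketch (function name ours):

```python
import numpy as np
from scipy.stats import chi2

def t2_known(x, mu, sigma):
    """T2 with known parameters, as in (6.4): (x - mu)' Sigma^{-1} (x - mu)."""
    d = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    return float(d @ np.linalg.solve(sigma, d))

# UCL from (6.5): the upper alpha quantile of chi-square with p df.
# For the bivariate example (alpha = 0.05, p = 2) this is 5.99.
ucl = chi2.ppf(1 - 0.05, df=2)
```

Unlike the limit in (6.3), this UCL does not depend on the size of the HDS, since no parameters are estimated.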

Table 6.3: New observations from an industrial process.
    Obs. No.   x1      x2      x1 − μ1   x2 − μ2
     1         150.5   204.0    4.64      4.96
     2         147.6   201.3    1.74      2.26
     3         143.9   199.3   −1.97      0.26
     4         147.9   198.3    2.03     −0.74
     5         144.3   198.5   −1.56     −0.54
     6         147.6   196.3    1.74     −2.74
     7         149.7   197.3    3.83     −1.74
     8         148.2   195.8    2.33     −3.24
     9         140.5   202.1   −5.47      3.06
    10         137.3   202.5   −8.56      3.46
    11         147.1   194.0    1.24     −5.04

    μ1 = 145.86; μ2 = 199.04.
Figure 6.5: T2 chart for industrial process data.

are initiated through pilot-plant studies, research center studies under controlled conditions, or from data obtained through other types of experiments. This requires constant updating of the baseline conditions, which in turn demands the use of new estimates of the parameters. Managers, performance engineers, process engineers, and operators are constantly striving to improve the performance of the unit. There is no status quo.

6.5 T2 Charts with Subgroup Means

When the process observations at a given time point are made on a subgroup consisting of m vectors instead of only on a single vector, the mean of the subgroup is used in computing the charted value. If the individual vectors are described by a

p-variate normal distribution Np(μ, Σ), we are assured that the mean vector X̄ of a sample of m observations is distributed as a p-variate normal Np(μ, Σ/m), with the same mean vector μ as an individual observation but with covariance matrix Σ/m. If the individual observation vectors are not multivariate normally distributed, we are assured by the central limit theorem that as m increases in size, the distribution of X̄ becomes more like that of an Np(μ, Σ/m). This produces the following changes in the charting procedure.

When the parameters of the underlying MVN distribution are known, the T2 statistic for the ith subgroup mean X̄i is computed by

    T2 = m(X̄i − μ)′ Σ⁻¹ (X̄i − μ),    (6.6)

and the UCL for a given α is determined by using (6.5). The control limit is independent of the sample size of either the subgroup or the HDS.

When the parameters of the underlying MVN distribution are unknown, the T2 statistic for the ith sample mean X̄i is computed as

    T2 = (X̄i − X̄)′ S⁻¹ (X̄i − X̄),    (6.7)

where X̄ and S are the common estimates of μ and Σ obtained from the HDS. The distribution of a T2 statistic based on the mean of a subgroup of m observation vectors is given in (2.17). For a given α, the UCL for use with the statistic given in (6.7) is computed as

    UCL = [p(n + m)(n − 1) / (mn(n − p))] F(α; p, n−p),    (6.8)

where n is the size of the HDS.

To illustrate this procedure, we use the set of 21 observations on the electrolyzer data given in Table 4.6 in section 4.5 as an HDS. We calculate X̄ and S from these data. Although overall performance of an electrolyzer is judged by the average efficiency of the 11 cells that compose it, performance is monitored by sampling only 4 of the individual cells. Replacement of the electrolyzer occurs when its monitored efficiency drops below the specified baseline values. Table 6.4 contains data from a sample of m = 4 cells taken from six different electrolyzers. Four variables are observed on each cell along with two composite gas samples from all 11 cells. Observations on O2 and Cl2 are composite (average) gas samples from all 11 cells, including the four sampled cells. A UCL using α = 0.01 is calculated from (6.8) as

    UCL = [6(21 + 4)(21 − 1) / (4(21)(21 − 6))] F(0.01; 6, 15) = 10.28.

The T2 values for the average of the four cells for each electrolyzer are listed in Table 6.5. When compared to UCL = 10.28, we conclude that electrolyzers 573 and 963 are to be removed from service and refurbished.
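The subgroup-mean limit in (6.8) can be checked numerically; a sketch with scipy (function name ours), using the electrolyzer values p = 6 variables, n = 21 HDS observations, and subgroups of m = 4 cells:

```python
from scipy.stats import f

def subgroup_ucl(alpha, p, n, m):
    """UCL for the T2 of a subgroup mean of m observations, as in (6.8),
    with X-bar and S estimated from an HDS of n observations:
    p(n + m)(n - 1) / (m n (n - p)) * F(alpha; p, n - p)."""
    factor = p * (n + m) * (n - 1) / (m * n * (n - p))
    return factor * f.ppf(1 - alpha, p, n - p)
```

With alpha = 0.01, p = 6, n = 21, and m = 4, this returns approximately 10.28, the value used to flag electrolyzers 573 and 963 in Table 6.5. Setting m = 1 recovers the individual-observation limit of (6.3).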

Table 6.4: Electrolyzer data.

    Elect. No.   Cell No.   NaOH     NaCl     I1     I2      O2     Cl2
    573           1         126.33   190.12   0.20    6.32   1.47   98.10
                  5         137.83   188.68   0.22   14.28   1.47   98.10
                  6         124.30   199.23   0.34   15.56   1.47   98.10
                  7         131.72   191.00   0.27    7.55   1.47   98.10
                 AVG.       130.04   192.26   0.26   10.93   1.47   98.10
    372           3         122.17   191.72   0.14    1.37   1.21   98.36
                  5         129.13   193.85   0.13    0.96   1.21   98.36
                  7         128.83   188.87   0.15    0.81   1.21   98.36
                 10         146.93   184.57   0.23    0.60   1.21   98.36
                 AVG.       131.77   189.75   0.16    0.94   1.21   98.36
    834           1         128.17   192.43   0.12    1.13   1.12   98.43
                  2         129.17   191.15   0.06    1.19   1.12   98.43
                  8         138.42   180.13   0.18    1.08   1.12   98.43
                 10         141.73   176.25   0.20    1.01   1.12   98.43
                 AVG.       134.37   184.99   0.14    1.10   1.12   98.43
    1021          2         141.40   197.89   0.11    2.92   1.64   97.91
                  5         137.29   201.25   0.18    9.01   1.64   97.91
                  9         144.00   194.03   0.10    0.30   1.64   97.91
                 11         139.20   190.21   0.17    5.82   1.64   97.91
                 AVG.       140.47   195.84   0.14    4.51   1.64   97.91
    963           3         118.40   215.24   0.27   21.60   1.71   97.82
                  5         144.19   197.75   0.13    0.78   1.71   97.82
                 10         128.10   205.33   0.34   10.94   1.71   97.82
                 11         128.01   173.79   0.19    0.77   1.71   97.82
                 AVG.       129.68   198.03   0.23    8.52   1.71   97.82
    622           3         136.81   201.74   0.19    3.03   1.38   98.18
                  4         133.93   198.98   0.15    2.38   1.38   98.16
                  6         140.71   195.19   0.19    0.92   1.38   98.16
                  9         140.40   192.03   0.26    0.62   1.38   98.16
                 AVG.       137.96   196.98   0.20    1.74   1.38   98.18

Table 6.5: T2 values for sample means for electrolyzer data.


    Electrolyzer No.   T2 Value for Mean Vector
     573               46.006*
     372                7.391
     834                8.693
    1021                3.955
     963               19.749*
     622                0.803

    *Exceeds UCL = 10.28.

6.6 Interpretive Features of T2 Charting

What do we expect to see in a T2 chart? The first part of the answer to this question is easy. Under ideal conditions, the T2 chart will exhibit the same randomness as one might expect to see in any type of charting procedure. For example, in a univariate Shewhart chart, one expects to see a predominance of the plotted


Figure 6.6: T2 chart for mercury cell data.

points distributed randomly about the center line. This occurs because, for a normal distribution, approximately 68% of the observations are contained within one standard deviation of the mean (center line). Does something similar occur in a T2 chart? The answer is no, but it is not emphatic. In some types of industries, T2 charts are often unique and can be used to characterize the behavior of the process. Close study of the plotted statistic can produce valuable insight on process performance.

Upset conditions, after they occur, become obvious to those involved. However, process conditions leading to upsets are not as obvious; if they were, there would be few upsets. If the precursor conditions can be identified by examining the T2 plot, sometimes it is possible to avoid the upset. Figure 6.6 presents the plotted T2 statistic for a control procedure on a mercury cell (Hg cell), which is another type of processing unit used to produce chlorine gas and caustic soda. Seven process variables are observed simultaneously in monitoring the performance of the cell. The plotted T2 statistics over the time period illustrated in Figure 6.6 indicate a very steady-state, in-control process relative to the baseline data set. There is very little change in the pattern of the T2 statistic. The UCL, as determined from the HDS, has a value of 18.393; however, the values of the plotted T2 statistic are consistently located a substantial distance below this value.

Any erratic or consistent movement of the observed T2 values from the established pattern of Figure 6.6 would indicate a process change. Figure 6.7 illustrates such a condition, where the T2 values are increasing towards the control limit. Had process intervention been initiated around observation 1000, it may have been possible to prevent the upset conditions that actually occurred at the end of the chart. Of course, one needs tools to determine what variable or group of variables is the precursor of the upset conditions. These will be discussed in Chapter 7.

Figures 6.6 and 6.7 present another interesting T2 pattern. Notice the running U pattern contained in both charts. This is due to the fluctuation in the ambient


Figure 6.7: T2 chart with precursor to upset.

conditions from night to day over a 24-hour period, and it represents a source of extraneous variation in the T2 charts. Such variation can distort the true relationships between the variables and can increase the overall variation of the T2 statistic. For example, the cluster of points between observations 800 and 900 in Figure 6.7 is U-shaped. Does it represent a process change or a change in ambient conditions? Removing the effect of ambient conditions would produce a clearer process picture.

Another example of a T2 chart is presented in Figure 6.8. Given are the T2 values for data collected on 45 process variables measured in the monitoring of a furnace used in glass production.

Figure 6.8: T2 chart for glass furnace data.

Observe the steady-state operating conditions of


the process from the beginning of the charting to around observation 500. This pattern reflects a constant trend with minimal variation. After observation 500, the T2 values exhibit a slow trend toward the UCL, with upset conditions occurring around observations 500, 550, 600, and 650. Corrections were made to the process, and control was regained at about observation 650. However, note the increase in variation of the T2 values and some signals between observations 650 and 1350. The T2 plot flattens out beyond this point and the steady-state pattern returns.

These examples illustrate another important use of the T2 control chart. After the trends in the T2 chart have been established and studied for a process, any deviation from the established pattern indicates some type of process change. Sometimes the change is for the better, and valuable process knowledge is gained. Other times, the change is for the worse, and upset conditions occur. In either case, we recommend the investigation of any change in the plotted T2 values. Using this approach, expensive upsets that lead to chaotic conditions can be avoided.

6.7

Average Run Length (Optional)

The average run length (ARL) for a control procedure is defined as

    ARL = 1/p,

where p represents the probability of being outside the control region. For a process that is in control, this probability is equal to α, the probability of a Type I error (see section 6.2).

The ARL has a number of uses in both univariate and multivariate control procedures. For example, it can be used to calculate the number of observations that one would expect to observe, on average, before a false alarm occurs. This is given by

    ARL = 1/α.

Another use of the ARL is to compute the number of observations one would expect to observe before detecting a given shift in the process. Consider the two univariate normal distributions presented in Figure 6.9. One is located at the center line (CL) and the other is shifted to the right and located at the UCL. The probability of detecting the shift (i.e., the probability of being in the shaded region in Figure 6.9) equals (1 − β), where β is the probability of a Type II error (see section 6.2). Given the shift, this probability can be determined using standard statistical formulas (e.g., see Montgomery (2001)). The ARL for detecting the shift is given by

    ARL = 1/(1 − β).

From Chapter 5, we recognize that the probability (1 − β) represents the power of the test of the statistical hypothesis that the mean has shifted. This result produces another major use of the ARL, which consists of comparing one control procedure to another. This is done by comparing the ARLs of the two procedures for a given process shift. Shifts and the probability of detection, (1 − β), are easy to compute in the univariate case. However, it is more difficult to do these calculations in the multivariate
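These run-length relationships are simple enough to sketch numerically. The following is a minimal illustration (Python, standard library only); the 3-sigma limits and the 1-sigma shift are hypothetical choices for a univariate chart, not values from the text:

```python
import math

def norm_sf(z):
    """Standard normal tail probability P(Z > z), via the error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def arl(p):
    """Average run length: expected number of points observed until one
    falls outside the control region, when each does so with probability p."""
    return 1.0 / p

# In-control ARL for a univariate chart with two-sided 3-sigma limits:
alpha = 2.0 * norm_sf(3.0)        # false-alarm probability, ~0.0027
in_control_arl = arl(alpha)       # ~370 observations between false alarms

# ARL to detect a 1-sigma upward mean shift (upper limit only):
power = norm_sf(3.0 - 1.0)        # 1 - beta, ignoring the far lower tail
detect_arl = arl(power)           # ~44 observations, on average
```

The same 1/p relationship applies to a multivariate T2 chart; only the computation of p changes, as described below.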


Figure 6.9: Univariate process shift.

Figure 6.10: Multivariate process shift.

case. Consider a bivariate control region and a mean shift of the process as represented in Figure 6.10. We assume that the covariance matrix has not changed and remains constant for the multivariate distributions. Hence, the orientation of the control region for the shifted distribution is the same as that of the control region for the in-control process. The area of the shifted distribution that corresponds to the shaded region in Figure 6.10 equals the probability (1 − β) of detecting the shift. This probability can be computed analytically for p = 2, but becomes very


Figure 6.11: Control region and chi-square distribution.

difficult for higher dimensions. However, using additional statistical theory and the nonnull distribution of the T2 statistic, the problem can be simplified. Suppose the parameters, μX and Σ, of the MVN distribution are known. The T2 control region for an in-control observation vector X is described by a chi-square distribution (see section 6.4) and can be compared to the UCL based on that distribution; i.e.,

    T² = (X − μX)′Σ⁻¹(X − μX) ≤ χ²(α, p).

In terms of the chi-square distribution, the control region is represented by the shaded region in Figure 6.11. This particular distribution is known as a central chi-square. This is due to the fact that the vector (X − μX) is described by an MVN distribution with a mean vector of zero. If one considers an observation vector Y from another MVN distribution with the same covariance matrix but with a different mean vector μY, its T2 value is given by

    T² = (Y − μX)′Σ⁻¹(Y − μX),

but it cannot be described by the central chi-square distribution. This is because the MVN distribution that describes the vector (Y − μX) has a mean different from zero. However, we can determine the mean of the normal vector (Y − μX) in terms of μX and μY. Consider

    E(Y − μX) = (μY − μX) = δ,

where δ = (μY − μX) represents the mean shift. With this result, the distribution of T2 is given by

    T² = (Y − μX)′Σ⁻¹(Y − μX) ~ χ′²(p, λ),


Figure 6.12: Shifted chi-squared distribution.

where χ′²(p, λ) is a noncentral chi-square distribution with p degrees of freedom. A major difference between this distribution and the central chi-square is the additional parameter λ, labeled the noncentrality parameter. It can be expressed as

    λ = δ′Σ⁻¹δ = (μY − μX)′Σ⁻¹(μY − μX).

Changing the noncentrality parameter produces changes in the distribution. For example, the mean of a central chi-square is given by the degrees of freedom, p, whereas for the noncentral distribution the mean equals (p + λ). Similarly, the variance changes from 2p for the central chi-square to (2p + 4λ) for the noncentral chi-square. As λ becomes large, the differences between the two distributions are significant, but as λ approaches zero, the noncentral distribution approaches the central chi-square.

Our interest lies in computing the probability of detecting a given shift in the process. This is accomplished by computing probabilities from the noncentral chi-square. Another representation of the shift depicted in Figure 6.10 is given in terms of the central and noncentral chi-square rather than in terms of normal distributions. This is presented in Figure 6.12. The central chi-square distribution represents the in-control situation, and the noncentral chi-square distribution represents the distribution shifted to the right. It can be shown that the shaded area (1 − β) under the noncentral chi-square distribution in Figure 6.12 is the same as the shaded region of Figure 6.10.

For univariate control procedures, shifts of the process distribution are usually expressed in terms of the standard deviation. This is not feasible with multivariate processes, since shifts involve more than one variable. Shifts of multivariate processes are expressed in terms of values of the noncentrality parameter λ, which is a function of both the distance between the means of the two distributions and the covariance matrix Σ.


To determine the probability of detecting a given shift, the noncentral chi-square distribution must be evaluated above the UCL. This integral is given as

    1 − β = Σ_{j=0}^{∞} [e^{−λ/2}(λ/2)^j / j!] P(w > UCL),                (6.9)

where w is a central chi-square random variable with (p + 2j) degrees of freedom, and UCL = χ²(α, p), the upper α quantile of a central chi-square distribution with p degrees of freedom.

A similar discussion can be presented for a Phase II operation where the parameters are unknown. The distributions involved are the central F distribution for the in-control situation and the noncentral F for the shifted distribution. Again, shifts are measured in terms of the same noncentrality parameter λ. The probability of detecting various shifts (λ) for a given sample size n, significance level α, and number of variables p can be determined by integrating the noncentral F distribution in a manner similar to (6.9).

The shift studied in this section and depicted in Figure 6.10 is a simple mean shift. Unfortunately, not all shifts are of this type. For example, in a multivariate setting, there can be simultaneous shifts of the mean vector and the covariance matrix. Computing the probability of detecting such shifts is much more complicated than in the above examples.
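For the known-parameter case, the detection probability can be evaluated directly from the Poisson-mixture representation of the noncentral chi-square used in (6.9). A rough sketch (Python, standard library only; the chi-square survival function is computed from an incomplete-gamma series, and the values of p, λ, and the UCL below are illustrative, not taken from the text):

```python
import math

def chi2_sf(x, k):
    """P(X > x) for a central chi-square with k degrees of freedom,
    via the series for the regularized lower incomplete gamma function."""
    a, h = k / 2.0, x / 2.0
    if h <= 0.0:
        return 1.0
    term = math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
    total, n = term, 0
    while term > 1e-16 * total:
        n += 1
        term *= h / (a + n)
        total += term
    return max(0.0, 1.0 - total)

def ncx2_sf(x, k, lam, terms=200):
    """P(X > x) for a noncentral chi-square with k degrees of freedom and
    noncentrality lam: a Poisson(lam/2) mixture of central chi-squares."""
    w = math.exp(-lam / 2.0)          # Poisson weight at j = 0
    total = 0.0
    for j in range(terms):
        total += w * chi2_sf(x, k + 2 * j)
        w *= (lam / 2.0) / (j + 1)    # advance to the next Poisson weight
    return total

# Hypothetical illustration: p = 2 variables and alpha = 0.05 give a
# known-parameter UCL of chi-square(0.05, 2) = 5.991; a shift of size
# lambda = 10 is then detected with probability (1 - beta):
power = ncx2_sf(5.991, 2, lam=10.0)
```

The detection probability rises monotonically with λ, which is the sense in which multivariate shifts are measured on the λ scale rather than in standard deviations.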

6.8

Plotting in Principal Component Space (Optional)

In Chapter 4, we introduced PCA as a tool used in the location of singularities in the covariance matrix. In this section, we expand the use of principal components to include a charting procedure for the T2 statistic. This use requires more mathematical rigor than previously given. The plotting of T2 values in a principal component space can provide additional insight into process performance for certain types of problems.

Consider a process generated from a bivariate normal distribution with known parameters μ and Σ. For an observation vector X, control of the process is maintained when the T2 statistic is less than the UCL; i.e.,

    T² = (X − μ)′Σ⁻¹(X − μ) ≤ UCL.                (6.10)

Recall from Chapter 2 that the curve T2 = UCL establishes an elliptical control region. An example of such a control region, where x₁ and x₂ are positively correlated, is illustrated in Figure 6.13.

A number of observations can be made about the control region represented in Figure 6.13. It is referenced by three different coordinate systems. The first is the variable space, represented by (x₁, x₂). This is obtained by expanding (6.10) as a function of x₁ and x₂ and constructing the graph. If we standardize x₁ and x₂ using the values

    y₁ = (x₁ − μ₁)/σ₁  and  y₂ = (x₂ − μ₂)/σ₂,                (6.11)


Figure 6.13: Control ellipse for a bivariate system.

we obtain the translated axes (y₁, y₂) located at the center of the ellipse in Figure 6.13. The T2 statistic in terms of y₁ and y₂ takes the form

    T² = (X − μ)′Σ⁻¹(X − μ) = Y′P⁻¹Y,

where Y′ = (y₁, y₂) and P is the correlation matrix for X. In the (y₁, y₂) space the ellipse remains tilted as it was in the (x₁, x₂) space, since the correlation between y₁ and y₂ is the same as the correlation between x₁ and x₂. The third set of axes in Figure 6.13 is obtained using the transformation

    z₁ = (y₁ + y₂)/√2  and  z₂ = (y₁ − y₂)/√2.                (6.12)

Under this transformation, the T2 statistic has the form

    T² = Z′Λ⁻¹Z,

where Z′ = (z₁, z₂) and Λ is a diagonal matrix with the eigenvalues of P along the diagonal. The above rotation of the (x₁, x₂) space to the (z₁, z₂) space removes the dependency (correlation) between x₁ and x₂. In the (z₁, z₂) space, the elliptical control region is not tilted, since z₁ and z₂ are independent. Further, the z₁ and z₂ values are expressed as linear combinations of y₁ and y₂ and, hence, ultimately as linear combinations of x₁ and x₂. As such, these variables are the principal components of the correlation matrix for x₁ and x₂. If x₁ and x₂ are not standardized, the z₁ and z₂ variables are the principal components of the corresponding covariance matrix Σ and will have a different representation. For purposes of plotting, it is usually best to use the correlation matrix.


Figure 6.14: Principal component control region.

The control region for the T2 statistic can be written in terms of z₁ and z₂ by expanding the matrix multiplication of (6.10) to obtain

    T² = z₁²/(1 + ρ) + z₂²/(1 − ρ),                (6.13)

where ρ is the population pairwise correlation between x₁ and x₂. Note that the eigenvalues of the correlation matrix for x₁ and x₂ are (1 + ρ) and (1 − ρ), so the T2 statistic is now expressed in terms of these values. If the equation in (6.13) is set equal to the UCL, it forms a bivariate elliptical control region that can be used as a charting procedure in the principal component space, i.e., in terms of z₁ and z₂. For example, given a bivariate observation (x₁, x₂), we can standardize the observations using (6.11). The principal components z₁ and z₂ are computed using (6.12) and plotted in the principal component space. Observations plotting outside the elliptical region are out of control, as they do not conform to the HDS. The point A in the principal component control region presented in Figure 6.14 illustrates this.

The method of principal component translation can be generalized to the p-dimensional case. The control region for the T2 statistic can be expressed in terms of the p principal components of the correlation matrix as

    T² = z₁²/λ₁ + z₂²/λ₂ + ⋯ + zₚ²/λₚ,                (6.14)

where λ₁ ≥ λ₂ ≥ ⋯ ≥ λₚ are the eigenvalues of the correlation matrix. Each zᵢ is computed as

    zᵢ = uᵢ′Y,

where uᵢ, i = 1, 2, …, p, are the corresponding normalized eigenvectors of the correlation matrix and Y is the standardized observation vector. Again, note that each zᵢ is a linear combination of the standardized process variables.
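For the bivariate case, the equivalence between the T2 of (6.10) and its principal component form (6.13) can be checked numerically. A minimal sketch (Python, standard library only; the standardized values and the correlation below are hypothetical):

```python
import math

def t2_standardized(y1, y2, rho):
    """T^2 of a standardized bivariate observation, computed directly
    from the inverse of the 2x2 correlation matrix."""
    return (y1 * y1 - 2.0 * rho * y1 * y2 + y2 * y2) / (1.0 - rho * rho)

def t2_principal(y1, y2, rho):
    """The same T^2 via (6.12) and (6.13): rotate to the principal
    components z1, z2, whose eigenvalue weights are (1 + rho), (1 - rho)."""
    z1 = (y1 + y2) / math.sqrt(2.0)
    z2 = (y1 - y2) / math.sqrt(2.0)
    return z1 * z1 / (1.0 + rho) + z2 * z2 / (1.0 - rho)
```

A point signals when either form exceeds the UCL; in the z-space the control region is an untilted ellipse, which is what makes it convenient for plotting.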


The graph of (6.14), when set equal to the control limit, is a hyperellipsoid in a p-dimensional space. However, plotting this region for p > 3 is not currently possible. As an alternative, one can plot any combination of the principal components in a subspace of three or fewer dimensions. This procedure also has a major drawback. Any point (zᵢ, zⱼ, zₖ) that plots outside the region defined by (6.14) will produce a signal, but there is no guarantee that a point plotting inside the region does not contain a signal on another principal component.

6.9

Summary

Signal detection is an important part of any control procedure. In this chapter, we have discussed charting the T2 statistic when monitoring a process in a Phase II operation. This includes the charting of the T2 statistic based on a single observation and the T2 statistic based on the mean of a subgroup of observations. Included are procedures to follow when the parameters of the underlying MVN distribution are known and when they are unknown. In both cases it is assumed that the covariance structure is nonsingular.

Also contained in this chapter is a discussion on determining the ARL for a T2 chart. To calculate an ARL for a given mean shift in a multivariate distribution involves the introduction of noncentral distributions and the evaluation of some complicated integrals. For the reader who has a deeper interest in this area, there are many excellent texts in multivariate analysis on this subject, e.g., Johnson and Wichern (1998), Fuchs and Kenett (1998), and Wierda (1994).

An optional section on the plotting of the T2 statistic in a principal component space was presented. As pointed out in the discussion, the procedure has both advantages and disadvantages. A major advantage is that one can plot and observe signals on particular principal components in a subspace of the principal component space. However, a major disadvantage is that each principal component is a linear combination of all the process variables. This often inhibits a straightforward interpretation procedure in terms of the process variables.

Chapter 7

Interpretation of T2 Signals for Two Variables

7.1 Introduction
Univariate process control usually involves monitoring control charts for location and variation. For example, one might choose to monitor mean shifts with an X chart and variation shifts with an R chart, as both procedures are capable of detecting deviations from the historical baseline. In this setting, signal interpretation is simplified, as only one variable needs to be examined. A signal indicates that the process mean has shifted and/or the process variation has changed.

In multivariate SPC, the situation becomes more complicated. Nonconformity to a given baseline data set can be monitored using the T2 statistic. If the observed T2 value falls outside the control region, a signal is detected. The simplicity of the monitoring scheme, however, stops with signal detection, as a variety of variable relationships can produce a signal. For example, an observation may be identified as being out of control because its value for an individual variable is outside the bounds of process variation established by the HDS. Another cause of a signal is when values on two or more variables do not adhere to the linear correlation structure established by the historical data. The worst case is a combination of the above, with some variables being out of control and others being countercorrelated.

Several solutions have been proposed for the problem of interpreting a multivariate signal. For example, Doganaksoy, Faltin, and Tucker (1991) proposed ranking the components of an observation vector according to their relative contribution to a signal using a univariate t statistic as the criterion. Hawkins (1991, 1993) and Wade and Woodall (1993) separately used regression adjustments for individual variables to improve the diagnostic power of the T2 after signal detection. Runger, Alt, and Montgomery (1996) proposed using a different distance metric, and Timm (1996) used a stepdown procedure for signal location and interpretation. An overview of several of these multivariate process control procedures, including additional

ones by Kourti and MacGregor (1996) and Wierda (1994), can be found in Mason, Champ, Tracy, Wierda, and Young (1997). Also, several comparisons are given in Fuchs and Kenett (1998).

In this chapter, we present a method of signal interpretation that is based on the orthogonal decomposition of the T2 statistic. The independent decomposition components, each similar to an individual T2 variate, are used to isolate the source of a signal and simplify its interpretation. The discussion is limited to the two-variable problem, as it is the easiest to visualize geometrically. The more general p-variable case is presented in Chapter 8.

7.2

Orthogonal Decompositions

The typical T2 statistic for an observation vector X′ = (x₁, x₂, …, xₚ) is given as

    T² = (X − X̄)′S⁻¹(X − X̄).

The major purpose of our discussion in this section is to provide a methodology for interpreting a signal with this statistic. A procedure for achieving this goal is to decompose or separate the T2 value into additive orthogonal components that can be related to the p process variables. A close examination of how this is done will provide insight into the workings and understanding of the proposed T2 decomposition.

Orthogonal decompositions are standard tools in well-known statistical procedures such as the analysis of variance and regression analysis. For example, in regression analysis, the total sum of squares (i.e., total variation) of the response variable is separated into two independent additive components. One component, the regression sum of squares, measures the contribution of the predictor variables to the total variation, and the other component, the residual sum of squares, measures the contribution of the model error.

A similar decomposition approach can be used with the T2 statistic. Consider a two-variable case where an observation vector is denoted by X′ = (x₁, x₂) and has a bivariate normal distribution with unknown mean vector and covariance matrix. Assume that the two variables are independent so that, in a sample of n observations, their pairwise correlation is zero. Also assume that the two individual sample variances, s₁² and s₂², are unequal, and denote the two corresponding sample means as x̄₁ and x̄₂. A T² elliptical control region is created using the following formula:

    T² = (x₁ − x̄₁)²/s₁² + (x₂ − x̄₂)²/s₂² ≤ c,                (7.1)

where c is an appropriately chosen constant that specifies the size of the control region (see section 6.2). A typical control region is illustrated by the interior of the ellipse given in Figure 7.1. Any sample point located on the ellipse would be located the same statistical distance from the sample mean as any other point on the ellipse.


Figure 7.1: Bivariate independent control region with unequal variances.

There are two additive components to the T2 statistic given in (7.1), and these provide a natural decomposition of the corresponding statistic. The components, in fact, are independent due to the independence of the two original x variables. This property is what causes the ellipse not to be tilted. Since the components are unequally weighted due to their unequal variances, we will transform the variables to a form that provides equal weights. Doing so will produce a circular region and make the statistical distance, represented by the square root of the T2 in (7.1), equivalent to the corresponding Euclidean, or straight-line, distance. This is the view we need in order to interpret the T2 value. Let

    y₁ = (x₁ − x̄₁)/s₁  and  y₂ = (x₂ − x̄₂)/s₂                (7.3)

represent the standardized values of x₁ and x₂, respectively. Using this transformation, we can re-express the T2 value in (7.1) as

    T² = y₁² + y₂².

The T2 value is again separated into two independent components as in (7.1), but now the components have equal weight. The first component, y₁², measures the contribution of x₁ to the overall T2 value, and the second component, y₂², measures the contribution of x₂ to the overall T2 value. Careful examination of the magnitude of these components will isolate the cause of a signal. The control region using the transformation given in (7.3) is illustrated by the interior of the circle depicted in Figure 7.2. A T2 value in this orthogonal transformed space is the same as the squared Euclidean distance that a point (y₁, y₂) lies from the origin (0, 0), and it can be represented by the squared hypotenuse of the


Figure 7.2: Bivariate orthogonal control region with equal variances. SD refers to statistical distance.

enclosed right triangle depicted in Figure 7.2. This is equivalent to the statistical distance the point (x₁, x₂) lies from the mean vector (x̄₁, x̄₂). Thus, all points with the same statistical distance are located on the circle in Figure 7.2, as well as on the ellipse in Figure 7.1.

Consider a situation where the variables of X′ = (x₁, x₂) are not independent. Since the pairwise correlation r between x₁ and x₂ is nonzero, the T2 value would be given as

    T² = [(x₁ − x̄₁)²/s₁² − 2r(x₁ − x̄₁)(x₂ − x̄₂)/(s₁s₂) + (x₂ − x̄₂)²/s₂²] / (1 − r²),                (7.5)

and the corresponding elliptical control region would be tilted. This is illustrated in Figure 7.3. Again, letting y₁ and y₂ represent the standardized values of x₁ and x₂, the T2 value in (7.5) can be written as

    T² = (y₁² − 2r y₁y₂ + y₂²) / (1 − r²).                (7.6)

Unfortunately, as can be seen from examining (7.6), we cannot separate the contributions of y₁ and y₂ to the overall T2 value, since transforming to the standardized space does not remove the cross-product term; i.e., the transformation is not orthogonal. Thus, the resultant control region is elliptical and tilted, as illustrated in Figure 7.4. To express the statistical distance in (7.6) as a Euclidean distance so that it can be visualized, we must first transform the axes of the original (x₁, x₂) space


Figure 7.3: Bivariate nonindependent control region with unequal variances.

Figure 7.4: Bivariate translated control region.

to the axes of the ellipse. This is not done in either Figure 7.3 or Figure 7.4. For example, the axes of the ellipse in Figure 7.4 do not correspond to the axes of the (y₁, y₂) space. The axes can only be aligned through an orthogonal transformation. In this nonindependent case, the transformation in (7.6) is incomplete, as it does not provide the axis rotation that is needed. The T2 statistic in (7.6) can be separated into two additive components using the orthogonal transformation

    z₁ = (y₁ + y₂)/√2                (7.7)


Figure 7.5: Bivariate principal component control region with unequal variances.

and

    z₂ = (y₁ − y₂)/√2.

As was shown in Chapter 6, the values z₁ and z₂ are the first and second principal components of the correlation matrix for x₁ and x₂ (see also the appendix to this chapter, section 7.10). Using this transformation, we can decompose the T2 value in (7.6) as follows:

    T² = z₁²/(1 + r) + z₂²/(1 − r).                (7.8)

Unfortunately, a graph of the control region of (7.8) in the principal component space still does not allow easy computation of the statistical distance, since the principal components are unequally weighted. This is illustrated in Figure 7.5 by the different lengths of the axes of the ellipse. The first principal component has a weight equal to the reciprocal of λ₁ = (1 + r), and the second principal component is inversely weighted by λ₂ = (1 − r). The weights are equal only when r = 0, i.e., when the original variables are independent. The above problem can be circumvented by transforming the z values in (7.7) to a new space where the variables have equal weights. One such transformation is

    w₁ = z₁/√(1 + r)                (7.9)

and

    w₂ = z₂/√(1 − r).


Figure 7.6: Bivariate principal component control region with equal variances.

Using this orthogonal transformation, we can express the T2 value in (7.6) as

    T² = w₁² + w₂².                (7.10)

The resultant control region, presented in Figure 7.6, is now circular, and the squared statistical distance is represented by the squared hypotenuse of a right triangle. The transformation given in (7.10) provides an orthogonal decomposition of the T2 value. Thus, it will successfully separate a bivariate T2 value into two additive and orthogonal components. However, each w₁ and w₂ component in (7.10) is a linear combination of both x₁ and x₂. Since each component consists of both variables, this hampers clear interpretation as to the source of the signal in terms of the individual process variables. This problem becomes more severe as the number of variables increases. What is needed instead is a methodology that provides both an orthogonal decomposition and a means of interpreting the individual components. One such procedure is given by the MYT (Mason-Young-Tracy) decomposition, which was first introduced by Mason, Tracy, and Young (1995).
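The chain of transformations in this section (standardize, rotate to principal components, then rescale to equal weights) can be sketched as follows; a minimal illustration in Python, with hypothetical input values rather than values from the text:

```python
import math

def equal_weight_components(y1, y2, r):
    """Carry standardized values (y1, y2) through the rotation (7.7) and
    the equal-weight rescaling: the squares of the returned (w1, w2)
    sum to the T^2 of (7.6)."""
    z1 = (y1 + y2) / math.sqrt(2.0)   # first principal component, (7.7)
    z2 = (y1 - y2) / math.sqrt(2.0)   # second principal component
    w1 = z1 / math.sqrt(1.0 + r)      # equal-weight rescaling
    w2 = z2 / math.sqrt(1.0 - r)
    return w1, w2

# Each w mixes both x variables, which is why this decomposition,
# although orthogonal, does not by itself isolate the offending variable.
```

This numerical check makes the section's closing point concrete: the w components recover the T2 exactly, yet neither component can be attributed to a single process variable.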

7.3

The MYT Decomposition

Since the T2 statistic is a sum of squares, there are an infinite number of ways to separate it into p independent (orthogonal) components. For example, the decomposition of the T2 statistic into its principal components is one such representation. As discussed in section 7.2, a major problem with decompositions of this type is the interpretation of the components. For example, principal components can be


difficult to interpret, as they are linear combinations of the p variables of the observation vector. The components of the MYT decomposition of the T2 statistic, in contrast, have global meaning. This is one of the most desirable characteristics of the method. We will demonstrate the MYT procedure for a bivariate observation vector X′ = (x₁, x₂), where x₁ and x₂ are correlated. Details on the more general p-variable case can be found in Chapter 8. The MYT decomposition uses an orthogonal transformation to express the T2 value as two orthogonal and equally weighted terms. One such decomposition is given by

    T² = (x₁ − x̄₁)²/s₁² + (x₂ − x̄₂.₁)²/s²₂.₁,                (7.11)

where

    x̄₂.₁ = x̄₂ + (s₁₂/s₁²)(x₁ − x̄₁)

and

    s²₂.₁ = s₂² − s₁₂²/s₁²,

with s₁₂ denoting the sample covariance of x₁ and x₂. In this formulation, x̄₂.₁ is the estimator of the conditional mean of x₂ for a given value of x₁, and s²₂.₁ is the corresponding estimator of the conditional variance of x₂ for a given value of x₁. Details on these estimators are given in the last section of this chapter. The first term of the MYT decomposition in (7.11), T₁² = (x₁ − x̄₁)²/s₁², is referred to as an unconditional term, as it depends only on x₁. The second term of the orthogonal decomposition, written as T²₂.₁ = (x₂ − x̄₂.₁)²/s²₂.₁, is referred to as a conditional term, as it is conditioned on the value of x₁. Using the above notation, we can write (7.11) as

    T² = T₁² + T²₂.₁.

The square root of this T2 value can be plotted and viewed in the (T₁, T₂.₁) space. This is illustrated in Figure 7.7. The orthogonal decomposition given in (7.11) is one of two possible MYT decompositions of a T2 value for p = 2. The other decomposition is given as

    T² = T₂² + T²₁.₂,                (7.15)

where

    T₂² = (x₂ − x̄₂)²/s₂²

and

    T²₁.₂ = (x₁ − x̄₁.₂)²/s²₁.₂,

with x̄₁.₂ and s²₁.₂ denoting the estimators of the conditional mean and variance of x₁ for a given value of x₂.

The representation of T2 given in (7.15) is different from the representation given in (7.11). The first term of the decomposition in (7.15) is an unconditional term for the variable x₂, whereas the first term of (7.11) is an unconditional term for the variable x₁. Similarly, the conditional term of (7.15) depends on the conditional density of x₁ given x₂, while the conditional term of (7.11) depends on the conditional density of x₂ given x₁. These two conditional terms are not the same except in the case where x₁ and x₂ are uncorrelated.
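Both decompositions are straightforward to compute from the HDS summary statistics. A minimal sketch (Python; the means, standard deviations, and correlation below are hypothetical, and the conditional mean and variance use the standard bivariate-normal regression forms):

```python
def myt_terms(x_obs, x_cond, m_obs, m_cond, s_obs, s_cond, r):
    """One MYT decomposition of a bivariate T^2: the unconditional term
    for x_cond and the conditional term for x_obs given x_cond."""
    t_uncond = ((x_cond - m_cond) / s_cond) ** 2
    cond_mean = m_obs + r * (s_obs / s_cond) * (x_cond - m_cond)
    cond_var = s_obs ** 2 * (1.0 - r * r)
    t_cond = (x_obs - cond_mean) ** 2 / cond_var
    return t_uncond, t_cond

# Decomposition (7.11): T1^2 + T2.1^2, with x2 conditioned on x1
t1, t21 = myt_terms(x_obs=2.0, x_cond=1.0, m_obs=0.0, m_cond=0.0,
                    s_obs=1.0, s_cond=1.0, r=0.5)
# Decomposition (7.15): T2^2 + T1.2^2, with the roles reversed
t2, t12 = myt_terms(x_obs=1.0, x_cond=2.0, m_obs=0.0, m_cond=0.0,
                    s_obs=1.0, s_cond=1.0, r=0.5)
# Both sums equal the full T^2 of the observation (here 4.0).
```

Note that the individual terms differ between the two decompositions (here T₁² = 1 but T²₁.₂ = 0), even though their sums agree; this is exactly the point made in the text about the conditional terms not being the same.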


Figure 7.7: T2 plot for MYT bivariate decomposition.

7.4

Interpretation of a Signal on a T2 Component

Equations (7.11) and (7.15) represent the two possible MYT decompositions for an observation vector X′ = (x₁, x₂). Considering both decompositions, we have four unique terms: two unconditional terms given by T₁² and T₂², and two conditional terms represented by T²₂.₁ and T²₁.₂. A signaling T2 value can produce large values on any combination of these terms. Consider a large value for either the unconditional term

    T₁² = (x₁ − x̄₁)²/s₁²

or the unconditional term

    T₂² = (x₂ − x̄₂)²/s₂².

An unconditional term is the square of a univariate one-sample t statistic, and it measures the statistical distance of the observed value xᵢ from its mean x̄ᵢ. If the observed value of the variable is out of tolerance (i.e., outside its operational range as based on the HDS), a signal is obtained on the unconditional term.

Interpretation of a large value on a conditional term is more involved. Consider the conditional term

    T²₂.₁ = (x₂ − x̄₂.₁)²/s²₂.₁,

which is a measure of the squared statistical distance of the observed value x₂ from the conditional mean x̄₂.₁. This distance can be examined graphically. For example, consider the representation of the control region in the variable space


Figure 7.8: Interpretation of the T²₂.₁ component.

Figure 7.9: Interpretation of the T²₁.₂ component.

(x₁, x₂) presented in Figure 7.8. As the value of x₁ changes, so does the value of x̄₂.₁. Consider a fixed value of x₁, say x₁ = a. This is represented by the vertical line drawn upwards from the x₁ axis. For process control to be maintained at this value of x₁, the corresponding value of x₂ must come from the shaded interval along


the x₂ axis. This means the value of x₂ must be contained in this portion of the conditional density; otherwise, a signal will be obtained on the T²₂.₁ component. A similar discussion can be used to illustrate a signal on the T²₁.₂ term. Suppose the value of x₂ is fixed at a point b and we examine the restricted (conditional) interval that must contain the observation on x₁. This is depicted in Figure 7.9. If the value of x₁ is not contained in the shaded interval, a signal will be obtained on the T²₁.₂ component.

A large value on a conditional term implies that the observed value of one variable is not where it should be relative to the observed value of the other variable. Observations on the variables (x₁, x₂) that produce a signal of this type are said to be countercorrelated, as something is astray with the relationship between x₁ and x₂. Countercorrelations are a frequent cause of a multivariate signal.

7.5

Regression Perspective

As stated earlier, a signal on a conditional term implies something is wrong with the linear relationships among the involved variables. Additional insight for interpreting a signaling conditional term is obtained by examining these terms from a regression perspective. Note that the line labeled x̄₂.₁ in Figure 7.8 is the regression line of x₂ on x₁, and likewise, the line labeled x̄₁.₂ in Figure 7.9 is the regression line of x₁ on x₂. In general, T²ᵢ.ⱼ is a standardized observation on the ith variable adjusted by the estimates of the mean and variance from the conditional distribution associated with xᵢ.ⱼ. For the bivariate case, the general form of a conditional term is given as

    T²ᵢ.ⱼ = (xᵢ − x̄ᵢ.ⱼ)²/s²ᵢ.ⱼ.                (7.18)

Consider the estimated mean of xᵢ adjusted for xⱼ, i.e., x̄ᵢ.ⱼ. This is given as

    x̄ᵢ.ⱼ = x̄ᵢ + b(xⱼ − x̄ⱼ),                (7.19)

where x̄ᵢ and x̄ⱼ are the sample means of xᵢ and xⱼ obtained from the historical data, and b is the estimated regression coefficient relating xᵢ to xⱼ in this data set. The left-hand side of (7.19) contains x̄ᵢ.ⱼ, which is the predicted value of xᵢ based on the corresponding value of xⱼ (i.e., xᵢ is the dependent variable and xⱼ is the predictor variable). Thus, the numerator of (7.18) is a regression residual; i.e.,

    rᵢ.ⱼ = (xᵢ − x̄ᵢ.ⱼ).

Rewriting the conditional variance as

    s²ᵢ.ⱼ = sᵢ²(1 − R²ᵢ.ⱼ),

where R²ᵢ.ⱼ is the squared multiple correlation between xᵢ and xⱼ, and substituting rᵢ.ⱼ for (xᵢ − x̄ᵢ.ⱼ), we can re-express T²ᵢ.ⱼ as

    T²ᵢ.ⱼ = rᵢ.ⱼ²/s²ᵢ.ⱼ,

or

    T²ᵢ.ⱼ = rᵢ.ⱼ² / [sᵢ²(1 − R²ᵢ.ⱼ)].                (7.20)

We use this notation for consistency with the formula used in the p-dimensional case discussed in Chapter 8.
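In the bivariate case, where the squared multiple correlation reduces to the squared pairwise correlation r², the residual form of the conditional term can be sketched directly (Python; the summary statistics below are hypothetical, not from the text):

```python
def conditional_t2(xi, xj, mi, mj, si, sj, r):
    """Conditional term T^2_{i.j} in the residual form of (7.20):
    the squared regression residual of x_i on x_j, scaled by the
    conditional variance s_i^2 (1 - r^2), all fitted from the HDS."""
    b = r * si / sj                        # estimated slope in (7.19)
    resid = xi - (mi + b * (xj - mj))      # regression residual r_{i.j}
    return resid ** 2 / (si ** 2 * (1.0 - r * r))

# A point can sit inside both univariate Shewhart limits and still give
# a large conditional term when the pair breaks the historical correlation:
t_cond = conditional_t2(xi=-1.0, xj=1.0, mi=0.0, mj=0.0,
                        si=1.0, sj=1.0, r=0.9)
# resid = -1.9, giving t_cond = 3.61 / 0.19, about 19.0
```

With r = 0.9, both observed values are well within one standard deviation of their means, yet the conditional term is large because the pair is countercorrelated relative to the HDS.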


Chapter 7. Interpretation of T2 Signals for Two Variables

Figure 7.10: Residual from regression of x2 on x1.

The conditional T2 term in (7.20) measures how well a future observation on a particular variable agrees with the value predicted by the other variable. When the denominator in (7.20) is very small, as occurs when the pairwise correlation is near 1, we would expect very close agreement between the observed and predicted x_i values. Otherwise, "largeness" of the conditional T2 term will be due to the numerator, which is a function of the agreement between the observed and predicted values of x_i. A significant deviation between these two values will produce a large T2 term. As an example, consider an observation (x1, x2) that is inside the Shewhart control "box" formed using the two variables. Suppose there is significant disagreement in magnitude between an observed x1 value and the corresponding predicted value obtained using the value of x2 and (7.19). This implies that the observation on this particular component is below or above what was predicted by the HDS. To better understand the result in (7.20), consider the two conditional T2 terms whose square roots are given by

T_{2.1} = r_{2.1} / [s_2 √(1 − R²_{2.1})]    (7.21)

and

T_{1.2} = r_{1.2} / [s_1 √(1 − R²_{1.2})],    (7.22)

where r_{2.1} = (x_2 − x̄_{2.1}) and r_{1.2} = (x_1 − x̄_{1.2}) are residuals from the respective regression fits of x2 on x1 and x1 on x2. These residuals are illustrated in Figures 7.10 and 7.11.


Figure 7.11: Residual from regression of x1 on x2.

Notice that the two conditional values in (7.21) and (7.22), apart from the R²_{i.j} term, are actually standardized residuals having the form r_{i.j}/s_i. When the residuals (after standardizing) in Figures 7.10 and 7.11 are large, the conditional T2 terms signal. This would occur only when the observed value of x1 differs from the value predicted by x2, or the observed value of x2 differs from the value predicted by x1, where prediction is derived from the HDS.

7.6

Distribution of the T2 Components

Issues pertaining to the largeness of the T2 components of the MYT decomposition can be resolved by determining the probability of observing specific (or larger) values for each individual term. In order to do this, we must know the probability distribution of the individual terms. All terms, both conditional and unconditional, under the assumption of no signal, are described by an F distribution. For example, the unconditional terms that are used to determine whether the individual variables are within tolerance are distributed as

T2_j ~ [(n + 1)/n] F_{(1, n−1)}    (7.23)

for j = 1, 2. Similarly, the conditional terms, T2_{i.j}, used in checking the linear relationships between the variables are distributed as

T2_{i.j} ~ [(n + 1)(n − 1) / (n(n − k − 1))] F_{(1, n−k−1)},    (7.24)

where k equals the number of conditioned variables. When k = 0, the distribution in (7.24) reduces to the distribution in (7.23). Derivational details are supplied in Mason, Tracy, and Young (1995). Thus, one can use the F distribution to determine when an individual unconditional or conditional term of the decomposition is significantly large and makes a contribution to the signal. The procedure for making this determination is as follows. For a specified α level and HDS sample of size n, obtain F_{(α, 1, n−k−1)} from the appropriate F table. Compute the UCL for individual terms using

UCL = [(n + 1)(n − 1) / (n(n − k − 1))] F_{(α, 1, n−k−1)}.    (7.25)

Compare each individual unconditional term of the decomposition to the appropriate UCL. All terms satisfying

T2_j > UCL    (7.26)

imply that the corresponding x_j is contributing to the signal. Likewise, any conditional term greater than its UCL, such as

T2_{i.j} > UCL,

implies that x_i and x_j are both contributing to the signal. The above procedure for locating contributing terms of a decomposition for a signaling observation vector is not exact. To see this, consider the acceptance region for maintaining control for both conditional and unconditional terms when p = 2. A MYT decomposition for a T2 value can be represented by

T2 = T2_1 + T2_{2.1}.

Using this representation, the T2_1 acceptance region is given by

T2_1 ≤ [(n + 1)/n] F_{(α, 1, n−1)}.

Likewise, the T2_{2.1} acceptance region is defined by

T2_{2.1} ≤ [(n + 1)(n − 1) / (n(n − 2))] F_{(α, 1, n−2)}.

The relationship between these regions, as defined by the distribution of the individual terms and the elliptical control region, is presented in Figure 7.12. This demonstrates that, when we use the distribution of the individual terms of a T2 decomposition to detect signal contributors, we are approximating the elliptical control region with a parallelogram. This parallelogram acceptance region is equivalent to the acceptance region specified by a cause-selecting (CS) chart (e.g., see Wade and Woodall (1993)). The CS method is based on the regression adjustment of one variable for the value of the other. The same approach is used in constructing the conditional terms of the MYT decomposition. In general, the CS procedure
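Once the F critical value is in hand, the UCL computation in (7.25) is simple arithmetic. A minimal sketch (our own, not from the book; the critical value can come from an F table or a routine such as scipy.stats.f.ppf):

```python
def term_ucl(f_crit, n, k):
    """UCL for an individual MYT term, per (7.25):
    ((n + 1)(n - 1) / (n(n - k - 1))) * F(alpha; 1, n - k - 1).
    f_crit is the F critical value from the appropriate table; k is the
    number of conditioned variables (k = 0 for an unconditional term)."""
    return (n + 1) * (n - 1) / (n * (n - k - 1)) * f_crit
```

For an unconditional term (k = 0) the multiplying constant reduces to (n + 1)/n, matching (7.23); for fixed f_crit the UCL grows as more variables are conditioned on.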


Figure 7.12: Acceptance regions for T2_1 and T2_{2.1}.

does not require the assumption of multivariate normality. However, without this assumption, or some distributional assumption, it is difficult to detect a signal. We would like the parallelogram and ellipse in Figure 7.12 to be similar in size. The size of the ellipse is determined by the choice of the overall probability, labeled α1, of making a Type I error, or of saying the process is out of control when in fact control is being maintained. The size of the parallelogram is controlled by the choice of the specific probability, labeled α2, used for testing the "largeness" of individual terms. Thus, α2 represents the probability of saying a component is part of the signal when in fact it is not. The two α's are not formally related in this situation. However, ambiguities can be reduced by making the two regions agree in size. We use the F distributions in (7.23) and (7.24) to locate large values among the T2 decomposition terms. This is done because, given that an overall signal exists, the most likely candidates among the unique terms of a total MYT decomposition are the components with large values that occur with small a priori probabilities. Our interest in locating the signaling terms of the MYT decomposition is due to the ease of interpretation for these terms. Consider an example to illustrate the methodology of this section. For p = 2, the control region is illustrated in Figure 7.13. Signals are indicated by points A, B, C, and D. Note that the box encompassing the control region represents the tolerance on variables x1 and x2 for α = 0.05, as specified by (7.26). The tolerance regions are defined by the Shewhart control limits of the individual variables for the appropriate α level.


Figure 7.13: T2 control region with four signaling points.

Table 7.1: Decomposition summary for bivariate example.

Point   T2 Value   T2_1    T2_2     T2_{1.2}   T2_{2.1}
A       10.05*     2.78    10.03*    0.02       7.27*
B        6.33*     0.11     3.49     2.83       6.22*
C        6.63*     3.01     0.34     6.29*      3.62
D        9.76*     2.54     1.73     8.03*      7.22*

* Denotes significance at the 0.05 level.

A summary of the T2 decomposition values for the four signaling points is presented in Table 7.1. Observation A produces a large value on the unconditional term T2_2, since the observation on x2 is out of tolerance. The T2_{2.1} term for this point also is large, since x2 is not contained in the conditional range of x2 given x1. This is evident in the distance point A is from the regression line denoted as x̄_{2.1}. Likewise, observation B produces a large value for its T2_{2.1} term, since the observation is a great distance from the same regression line. However, its two unconditional terms are not significant, since individually the variables are within the box region illustrated in Figure 7.13. Observation C has acceptable values on both unconditional terms, since both variables are within tolerance. However, a large value is produced on T2_{1.2}, as the observed x1 is not contained in the conditional distribution of x1 given x2. Again, note the extreme distance that point C is from the regression line, labeled x̄_{1.2}, of x1 on x2. Observations on both variables are within tolerance for point D, but the observation on either variable is not where it should be relative to the position of the other variable. This produces large values for the two conditional terms.


Figure 7.14: T2 chart for the boiler data.

7.7

Data Example

In this section, we consider a more detailed example to demonstrate the techniques developed for signal interpretation and also to reemphasize previously covered points in the development of a control procedure. As the production unit, we examine a boiler used to produce steam for industrial use. Two of the variables used in the control procedure that monitor boiler efficiency are fuel usage (i.e., fuel) and steam flow (i.e., stmflow). The T2 values for an HDS consisting of 500 data points are presented in Figure 7.14. We now examine the HDS from a statistical perspective. The T2 chart is indicative of the performance of the boiler. All T2 values are below the UCL value of 9; however, there appears to be a certain amount of variation and several step changes in the values. Note the consistency of the T2 values up to observation 250, the higher values between observations 250 and 350, and the lower values thereafter. Later investigation will show that the higher values, from 250 to 350, are caused by a period of consistent low steam production. The unusual spikes are produced when a rapid change in the production rate occurs. This becomes apparent when the two individual variables are plotted in time sequence as shown in Figure 7.15. The vertical axis is labeled "Units." Fuel usage and steam flow are measured in different units. However, the values are close enough together that one axis can be used for both measurement systems. Close examination of Figure 7.15 reveals the strong relationship between the two variables. Fuel usage is always higher than the corresponding steam flow; however, the distance between the two points increases with increases in fuel usage. This is more readily apparent when only the first few points are examined, as shown in Figure 7.16. A comparison of the time-sequence chart in Figure 7.15 with the T2 chart in Figure 7.14 reflects close similarity. 
Low values on the time-sequence chart for the two variables correspond to high values on the T2 chart, and high values on the time-sequence chart correspond to low T2 values. This is because the HDS is


Figure 7.15: Time-sequence chart for fuel usage and steam.

Figure 7.16: Time-sequence chart for first few points of boiler data.

dominated by moderate-to-high values of the two variables, making the low values farther from the mean vector. Since the T2 statistic is a squared quantity, values far from the mean are large and positive. A Q-Q plot of the 500 ordered T2 values versus the corresponding beta quantiles for the boiler historical data is presented in Figure 7.17. The upper four points in the graph correspond to the four T2 values of Figure 7.14 that are greater than or equal to a value of 8. Although the linear trend for these four points is not consistent with the linear trend of the other 496 points, the deviation is not severe enough to disqualify the use of the T2 statistic for detecting signals in the Phase II operation.
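The beta quantiles on the horizontal axis of such a Q-Q plot can be generated as follows. This is our own sketch (numpy and scipy assumed, and the function name is ours); it relies on the standard result that Phase I T2 values, computed with the mean and covariance estimated from the same n observations, follow the scaled beta distribution ((n − 1)²/n)·Beta(p/2, (n − p − 1)/2):

```python
import numpy as np
from scipy.stats import beta

def beta_qq_points(t2_values, p):
    """Pair the ordered Phase I T2 values with quantiles of the scaled
    beta distribution ((n - 1)^2 / n) * Beta(p/2, (n - p - 1)/2)."""
    n = len(t2_values)
    probs = (np.arange(1, n + 1) - 0.5) / n                    # plotting positions
    q = ((n - 1) ** 2 / n) * beta.ppf(probs, p / 2, (n - p - 1) / 2)
    return q, np.sort(t2_values)
```

Plotting the second return value against the first should give the roughly linear pattern seen in Figure 7.17 when the HDS is in control.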


Figure 7.17: Q-Q plot of T2 values for the boiler data.

Table 7.2: Summary statistics for boiler HDS.


         Mean     Minimum   Maximum   Std Dev
Fuel     374.02   186.46    524.70    98.37
Steam    293.32   138.87    412.53    77.96

Summary statistics for the boiler HDS are presented in Table 7.2. The minimum and maximum values of the variables give a good indication of the operational ranges of the two variables. For example, fuel usage in the HDS ranges from a low of 186.46 units to a high of 524.70 units. With statistical understanding of the boiler system through examination of the HDS, we are ready to move to signal interpretation. The control region is presented in Figure 7.18 in the variable space of fuel usage and steam flow. Also included are three signaling points, designated as points 1, 2, and 3. Examination of the signals in this graphical setting provides insight as to how the terms of the MYT decomposition identify the source of the signal and how the signals are to be interpreted. For example, note the (Euclidean) distance point 1 is from the body of the data (HDS). Also, note the "closeness" of points 2 and 3 to the control region, especially point 2. This leads one to think that the signal for point 1 is more severe than the signals for the other two points. However, this is not the case. Observe the T2 values presented in Table 7.3 for the three signaling points. Point 3 clearly has the largest T2 value, while point 2 has the smallest of the three signaling T2 values. To understand why this occurs, we need to examine the values of the MYT decomposition terms that are presented in Table 7.4. Since there are only two variables, the three signaling points can be plotted in either the (T_1, T_{2.1}) space or the (T_2, T_{1.2}) space. A representation in the (T_1, T_{2.1}) space is presented in Figure 7.19. Geometrically, the circle in Figure 7.19


Figure 7.18: Control region in variable space for boiler data.

Table 7.3: T2 values for three signaling points.

Obs. No.   T2 Value
1           13.25*
2           10.97*
3          405.76*

* Significant at α = 0.01, UCL = 9.33.

Table 7.4: MYT decomposition terms of three signaling points.

Obs. No.   T2_1     T2_2     T2_{1.2}   T2_{2.1}
1          10.70*   11.24*     2.01       2.54
2           0.07     0.01     10.96*     10.90*
3           4.18     0.92    404.84*    401.58*

* Denotes significance at the 0.05 level.

represents a rotation of the elliptical control region given in Figure 7.18. In the transformed space, the T2 statistic is represented by the square of the length of the arrows designated in the plot. The UCL of 9.33 defines the square of the radius of the circular control region. The coordinates of point 1 in Figure 7.19 are (3.27, 1.57). The sum of squares of these values equals the T2 value of point 1; i.e.,

T2 = T_1² + T_{2.1}² = 13.25.

The coordinates of point 2 are (1.3, 5.05), and those of point 3 are (2.04, 20.01). Scalewise, point 3 would be located off the graph of Figure 7.19. However, this


Figure 7.19: Control region in (T_1, T_{2.1}) space for boiler data.

Table 7.5: Three out-of-control points.

Obs. No.   Fuel     Steam
1           52.21    31.95
2          400.00   300.00
3          172.83   218.41

representation is sufficient to demonstrate that point 1 is much closer to the control region than the other two points. Signals on both unconditional terms for point 1 indicate the observed values of fuel usage and steam to be beyond the operational range of the data as specified by the HDS. This is indeed the case, as the observation was taken when the boiler was in an idling mode, so that there was no demand for steam production. In this mode, a minimum amount of fuel is still supplied to the boiler, as it is less expensive to let the unit idle than to start a cold boiler. The observed fuel and steam values for all three points are presented in Table 7.5. Comparison of Table 7.5 with the summary statistics of Table 7.2 shows point 1 to be well beyond the operational range of the variables as specified by the HDS. The T2 signal on point 2 is due to an incorrect relationship between the two variables. Both the T2_{2.1} and T2_{1.2} terms produced signals. This is because the fuel value (f) is not where it should be relative to the value of steam (s). To see this, consider the regression lines of fuel on steam (i.e., f = 4.43 + 1.26s) and of steam on fuel (i.e., s = −2.65 + 0.79f) as derived from the HDS. The predicted value of fuel for the given value of steam for point 2 is

f = 4.43 + 1.26(300.00) = 382.43.

The corresponding observed fuel value of 400.00 is too large for this value of steam. Likewise, the difference between the actual steam value and the predicted steam


Figure 7.20: Time-sequence graph of three signaling points.

value for this point is too large. The predicted steam value is s = -2.65 + 0.79(400.00) = 313.74. When compared to the actual value of 300.00, the residual of 13.74 steam units is too large to be attributed to random fluctuation. Point 3 also has T2 signals on both the conditional terms. This indicates that the linear relationship between the two variables is astray. The reason for this can best be seen by examining the time-sequence graph presented in Figure 7.20 for the three signaling points. These were derived from the observation plot of the HDS in Figure 7.16, where it was established that fuel must be above the corresponding value of steam. For point 3 in Figure 7.20, the relationship is reversed as the value of steam is above the corresponding fuel value. This counterrelationship produces large signaling values on the two conditional T2 terms.
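The residual check for point 2 can be reproduced directly from the quoted regression lines. A small sketch of our own follows; note that the printed coefficients are rounded, so the computed residual (13.35) differs slightly from the 13.74 reported in the text, which used the unrounded fit.

```python
def predicted_fuel(steam):
    """Fuel-on-steam regression line quoted in the text: f = 4.43 + 1.26 s."""
    return 4.43 + 1.26 * steam

def predicted_steam(fuel):
    """Steam-on-fuel regression line quoted in the text: s = -2.65 + 0.79 f."""
    return -2.65 + 0.79 * fuel

# Point 2: (fuel, steam) = (400.00, 300.00)
steam_residual = 300.00 - predicted_steam(400.00)  # observed minus predicted
fuel_residual = 400.00 - predicted_fuel(300.00)    # observed fuel exceeds prediction
```

Both residuals are large relative to the scatter of the HDS about these lines, which is exactly what the signaling conditional terms for point 2 report.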

7.8

Conditional Probability Functions (Optional)

A bivariate normal distribution is illustrated in Figure 7.21. This is the joint density function of the observation (x, y), and it is denoted by f(x, y). It describes the behavior of the two variables as they jointly vary. For example, it can be used to determine the probability that x < a and, at the same time, that y < b; i.e., P(x < a, y < b). The conditional density of x given y, denoted by f(x|y), is used for a different purpose. Its use is to describe the behavior of x when y is fixed at a particular value. For example, it is used to determine the probability that x < a, given that y = b; i.e., P(x < a | y = b). It can be observed geometrically by passing a plane


Figure 7.21: Example of a bivariate normal distribution.

Figure 7.22: Example of conditional densities of x given y.

through the joint density at the fixed value of y, i.e., at y = b. This is illustrated in Figure 7.22 for various values of the constant b. For the MVN distribution with p = 2, the conditional density of x given y is

f(x|y) = [1/(σ_{x|y}√(2π))] exp{−(x − μ_{x|y})² / (2σ²_{x|y})}.    (7.27)

Close examination of (7.27) reveals the conditional density of x given y to be normal, with a conditional mean μ_{x|y} and conditional variance σ²_{x|y} given by

μ_{x|y} = μ_x + ρ(σ_x/σ_y)(y − μ_y)    (7.28)


Figure 7.23: Regression of x on y.


and

σ²_{x|y} = σ_x²(1 − ρ²).    (7.29)

Examination of (7.28) reveals that μ_{x|y} depends on the specified value of y. For example, the conditional mean of the distribution of x for y = b is given as

μ_{x|y=b} = μ_x + ρ(σ_x/σ_y)(b − μ_y).

For various values of the constant b (i.e., for values of y), it can be shown that the line connecting the conditional means (as illustrated in Figure 7.23) is the regression line of x on y. This can also be seen in (7.28) by noting that the regression coefficient β (of x on y) is given by

β = ρ(σ_x/σ_y).

Thus, another form of the conditional mean (of the line connecting the means of the conditional densities) is given by

μ_{x|y} = μ_x + β(y − μ_y).

In contrast to the conditional mean, the conditional variance in (7.29) does not depend on the particular value of y. However, it does depend on the strength of the correlation, ρ, existing between the two variables.
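The conditional moments in (7.28) and (7.29) translate directly into a couple of lines of code. A minimal sketch (the function name is our own):

```python
def conditional_moments(mu_x, mu_y, sd_x, sd_y, rho, y):
    """Mean and variance of x given y for a bivariate normal:
    the mean follows (7.28) and shifts linearly with y; the
    variance follows (7.29) and is free of y."""
    mean = mu_x + rho * (sd_x / sd_y) * (y - mu_y)
    var = sd_x ** 2 * (1.0 - rho ** 2)
    return mean, var
```

At y = μ_y the conditional mean reduces to μ_x, and when ρ = 0 the conditional density is simply the marginal density of x.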

7.9

Summary

In this chapter, we have discussed the essentials for using the MYT decomposition in the interpretation of signals for a bivariate process. We have shown that a signaling T2 value for a bivariate observation vector has two possible MYT decompositions,

T2 = T2_1 + T2_{2.1}

or

T2 = T2_2 + T2_{1.2}.

Each of the decompositions consists of two independent components. The unconditional terms T2_j can be used to check the tolerance of the individual variables, and the conditional terms T2_{i.j} can be used to check the linear relationship between the two variables. Each component can be described by an appropriate F distribution. Using this methodology, the signal of a T2 statistic can be separated into two orthogonal components. A large value for an unconditional term implies that the designated variable is out of tolerance. A large value on a conditional term implies that a wrong linear relationship exists between the observations on the variables. Thus, this procedure provides a powerful tool for signal determination in terms of the two process variables. In the next chapter, we extend this methodology to the general case involving a p-variable process. In addition, a quick and efficient computing scheme is developed for locating the signaling terms.

7.10

Appendix: Principal Component Form of T2

The T2 statistic can be expressed as a function of the principal components of the estimated covariance matrix (e.g., see Jackson (1991)). The formulas are similar to those presented in section 6.8 for the principal components of the population correlation matrix. For example, an alternate form of the T2 statistic is given by

T2 = z_1²/λ_1 + z_2²/λ_2 + ··· + z_p²/λ_p,    (A7.1)

where λ_1 ≥ λ_2 ≥ ··· ≥ λ_p are the eigenvalues of the estimated covariance matrix S and the z_i, i = 1, ..., p, are the corresponding principal components. A principal component is obtained by multiplying the vector quantity (X − X̄) by the transpose of the normalized eigenvector u_i of S corresponding to λ_i; i.e.,

z_i = u_i′(X − X̄).

Each z_i is a scalar quantity, and the T2 statistic is expressed in terms of these values. The representation in (A7.1) is derived from the fact that the estimated covariance matrix S is a positive definite symmetric matrix. Thus, its singular value decomposition is given as

S = UΛU′,

where U is a p × p orthogonal matrix whose columns are the normalized eigenvectors u_i of S, and Λ is a diagonal matrix whose elements are the corresponding eigenvalues λ_i of S. In matrix notation, these matrices are given by

U = (u_1, u_2, ..., u_p)   and   Λ = diag(λ_1, λ_2, ..., λ_p).

Note that

S⁻¹ = UΛ⁻¹U′.

Substituting this quantity into the T2 statistic, we have

T2 = (X − X̄)′UΛ⁻¹U′(X − X̄) = Z′Λ⁻¹Z,    (A7.2)

where Z = U′(X − X̄) and Z′ = (z_1, z_2, ..., z_p). A Hotelling's T2 statistic for a single observation also can be written as

T2 = Y′R⁻¹Y,    (A7.3)

where R is the estimated correlation matrix and Y is the studentized observation vector of X; i.e.,

Y′ = (y_1, y_2, ..., y_p)   with   y_i = (x_i − x̄_i)/s_i,

where r_ij = corr(x_i, x_j) denotes the (i, j) element of R. The matrix R (obtained from S) is a positive definite symmetric matrix and can be represented in terms of its eigenvalues and eigenvectors. Using a transformation similar to (A7.2), the above T2 can be written as

T2 = w_1²/γ_1 + w_2²/γ_2 + ··· + w_p²/γ_p,    (A7.4)

where w_1, w_2, ..., w_p are the principal components of the correlation matrix R and the γ_i are the eigenvalues of R. The principal component values are given by w_i = v_i′Y, where the v_i are the normalized eigenvectors of R. Equation (A7.4) is not to be confused with (A7.1). The first equation is written in terms of the eigenvalues and eigenvectors of the covariance matrix, and the second is in terms of the eigenvalues and eigenvectors of the estimated correlation matrix.
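The equivalence between the quadratic-form T2 and the principal component sum in (A7.1) is easy to verify numerically. A sketch of our own (numpy assumed; the data are synthetic):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((40, 3))
X[:, 2] += 0.5 * X[:, 0]                     # induce some correlation
S = np.cov(X, rowvar=False, ddof=1)          # estimated covariance matrix
lam, U = np.linalg.eigh(S)                   # eigenvalues/eigenvectors of S

x = np.array([1.0, -0.5, 2.0])               # a new observation
d = x - X.mean(axis=0)
t2_direct = float(d @ np.linalg.inv(S) @ d)  # (X - Xbar)' S^-1 (X - Xbar)
z = U.T @ d                                  # principal component scores z_i
t2_pc = float(np.sum(z ** 2 / lam))          # sum of z_i^2 / lambda_i
```

The two quantities agree to machine precision, since the second is just the first rewritten in the eigenbasis of S.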


Figure A7.1: Bivariate control region.

These are two very different forms of the same Hotelling's T2, as the mathematical transformations are not equivalent. Similarly, (A7.4) should not be confused with (6.13). The equation in (6.13) refers to a situation where the correlation matrix is known, while (A7.4) is for the case where the correlation matrix is estimated. The principal component representation of the T2 plays a number of roles in multivariate SPC. For example, it can be used to show that the control region is elliptical in shape. Consider a control region defined by a UCL. The observations contained in the HDS have T2 values less than the UCL; i.e., for each X_i in the HDS,

T2_i ≤ UCL.

Thus, by (A7.4),

w_1²/γ_1 + w_2²/γ_2 + ··· + w_p²/γ_p ≤ UCL.

The control region is defined by the equality

w_1²/γ_1 + w_2²/γ_2 + ··· + w_p²/γ_p = UCL,

which is the equation of a hyperellipsoid in a p-dimensional space, provided the γ_i are all positive. The fact that the estimated correlation matrix R is a positive definite matrix guarantees that all the γ_i's are positive. A geometrical representation of a T2 bivariate control region, when μ and Σ are unknown, is given in Figure A7.1. The elliptical region is formed using the


algebraic expression of the T2 statistic and is given by

T2 = (y_1² − 2r y_1 y_2 + y_2²) / (1 − r²).

Substituting y_1 and y_2 for the standardized values of x_1 and x_2, we have

y_1 = (x_1 − x̄_1)/s_1

and

y_2 = (x_2 − x̄_2)/s_2.

In the principal component space of the estimated correlation matrix, this reduces to

w_1²/γ_1 + w_2²/γ_2 = UCL,    (A7.5)

which gives the equation of the control ellipse. The length of the major axis of the ellipse in (A7.5) is determined by γ_1, and the length of the minor axis by γ_2. The axes of this space are the principal components, w_1 and w_2. The absence of a product term in this representation indicates the independence between w_1 and w_2. This is a characteristic of principal components, since they are transformed to be independent. Assuming that the estimated correlation r is positive, it can be shown that γ_1 = (1 + r) and γ_2 = (1 − r). For negative correlations, the γ_i values are reversed. One can also show that the principal components can be expressed as

w_1 = (y_1 + y_2)/√2   and   w_2 = (y_1 − y_2)/√2.

From these equations, one can obtain the principal components as functions of the original variables.
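The bivariate claims above, namely that the eigenvalues of a 2 × 2 correlation matrix are 1 + r and 1 − r and that the principal components are (y_1 ± y_2)/√2, can be checked directly. A sketch of our own (numpy assumed):

```python
import numpy as np

r = 0.6
R = np.array([[1.0, r], [r, 1.0]])  # estimated 2x2 correlation matrix
gamma, V = np.linalg.eigh(R)        # eigenvalues returned in ascending order

y = np.array([0.8, -0.3])           # a studentized observation (y1, y2)
w = V.T @ y                         # principal components of y
t2 = float(np.sum(w ** 2 / gamma))  # w1^2/gamma_1 + w2^2/gamma_2
```

The eigenvalues come back as (1 − r, 1 + r), the eigenvectors are proportional to (1, −1) and (1, 1), and the principal-component sum reproduces Y′R⁻¹Y exactly.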

Chapter 8

Interpretation of T2 Signals for the General Case

8.1

Introduction

In this chapter, we extend the interpretation of signals from a T2 chart to the setting where there are more than two process variables. The MYT decomposition is the primary tool used in this effort, and we examine many interesting properties associated with it. For example, we show that the decomposition terms contain information on the residuals generated by all possible linear regressions of one variable on any subset of the other variables. In addition to being an excellent aid in locating the source of a signal in terms of individual variables or subsets of variables, this property has two other major functions. First, it can be used to increase the sensitivity of the T2 statistic in the area of small process shifts (see Chapter 9). Second, the property is very useful in the development of a control procedure for autocorrelated observations (see Chapter 10).

8.2

The MYT Decomposition

The general MYT decomposition procedure is outlined below. The T2 statistic for a p-dimensional observation vector X′ = (x_1, x_2, ..., x_p) can be represented as

T2 = (X − X̄)′S⁻¹(X − X̄).    (8.1)

Suppose we partition the vector (X − X̄) as

(X − X̄)′ = [(X^{(p−1)} − X̄^{(p−1)})′, (x_p − x̄_p)],

where X^{(p−1)} = (x_1, x_2, ..., x_{p−1})′ represents the (p − 1)-dimensional vector excluding the pth variable, and X̄^{(p−1)} represents the corresponding


elements of the mean vector. Suppose we similarly partition the matrix S so that

S = [ S_XX    s_xx
      s_xx′   s_p² ],    (8.2)

where S_XX is the (p − 1) × (p − 1) covariance matrix for the first (p − 1) variables, s_p² is the variance of x_p, and s_xx is a (p − 1)-dimensional vector containing the covariances between x_p and the remaining (p − 1) variables. The T2 statistic in (8.1) can be partitioned into two independent parts (see Rencher (1993)). These components are given by

T2 = T2_{(x_1, x_2, ..., x_{p−1})} + T2_{p.1,2,...,p−1}.    (8.3)

The first term in (8.3),

T2_{(x_1, x_2, ..., x_{p−1})} = (X^{(p−1)} − X̄^{(p−1)})′ S_XX⁻¹ (X^{(p−1)} − X̄^{(p−1)}),

uses the first (p − 1) variables and is itself a T2 statistic. The last term in (8.3) can be shown (see Mason, Tracy, and Young (1995)) to be the square of the pth component of the vector X adjusted by the estimates of the mean and standard deviation of the conditional distribution of x_p given (x_1, x_2, ..., x_{p−1}). It is given as

T2_{p.1,2,...,p−1} = (x_p − x̄_{p.1,...,p−1})² / s²_{p.1,...,p−1},    (8.4)

where

x̄_{p.1,...,p−1} = x̄_p + b′(X^{(p−1)} − X̄^{(p−1)})
and

b = S_XX⁻¹ s_xx

is the (p − 1)-dimensional vector estimate of the coefficients from the regression of x_p on the (p − 1) variables x_1, x_2, ..., x_{p−1}. It can be shown that the estimate of the conditional variance is given as

s²_{p.1,...,p−1} = s_p² − s_xx′ S_XX⁻¹ s_xx.    (8.5)

Since the first term of (8.3) is a T2 statistic, it too can be separated into two orthogonal parts. The first part, T2_{(x_1, ..., x_{p−2})}, is a T2 statistic on the first (p − 2) components of the X vector, and the second part, T2_{p−1.1,2,...,p−2}, is the square of x_{p−1} adjusted by the estimates of the mean and standard deviation of the conditional distribution of x_{p−1} given (x_1, x_2, ..., x_{p−2}). Continuing to iterate and partition in this fashion yields one of the many possible MYT decompositions of a T2 statistic. It is given by

T2 = T2_1 + T2_{2.1} + T2_{3.1,2} + ··· + T2_{p.1,2,...,p−1}.    (8.6)


The T2_1 term in (8.6) is the square of the univariate t statistic for the first variable of the vector X and is given as

T2_1 = (x_1 − x̄_1)² / s_1².    (8.7)

Note this term is not a conditional term, as its value does not depend on a conditional distribution. In contrast, all other terms of the expansion in (8.6) are conditional terms, since they represent the value of a variable adjusted by the mean and standard deviation from the appropriate conditional distribution. We will represent these terms with the standard dot notation used in multivariate analysis (e.g., see Johnson and Wichern (1999)) to denote conditional distributions. Thus, T2_{i.j,k} corresponds to the conditional T2 associated with the distribution of x_i adjusted for, or conditioned on, the variables x_j and x_k.

8.3

Computing the Decomposition Terms

There are many different ways of computing the terms of the MYT decomposition. A shortcut approach is discussed in this section. From (8.3), we know that the first (p − 1) terms of (8.6) correspond to the T2 value of the subvector X′_{(p−1)} = (x_1, x_2, ..., x_{p−1}); i.e.,

T2_{(x_1, x_2, ..., x_{p−1})} = T2_1 + T2_{2.1} + ··· + T2_{p−1.1,2,...,p−2}.

Similarly, the first (p − 2) terms of this expansion correspond to the subvector X′_{(p−2)} = (x_1, x_2, ..., x_{p−2}); i.e.,

T2_{(x_1, x_2, ..., x_{p−2})} = T2_1 + T2_{2.1} + ··· + T2_{p−2.1,2,...,p−3}.

Continuing in this fashion, we can compute the T2 values for all subvectors of the original vector X. The last subvector, consisting of the first component X^{(1)} = (x_1), is used to compute the unconditional T2 term given in (8.7); i.e.,

T2_{(x_1)} = T2_1.

All the T2 values, T2_{(x_1, x_2, ..., x_p)}, T2_{(x_1, x_2, ..., x_{p−1})}, ..., T2_{(x_1)}, are computed using the general formula

T2 = (X^{(k)} − X̄^{(k)})′ S_{kk}⁻¹ (X^{(k)} − X̄^{(k)}),

where X^{(k)} represents the appropriate subvector, X̄^{(k)} is the corresponding subvector mean, and S_{kk} denotes the corresponding covariance submatrix obtained from the overall S matrix given in (8.2) by deleting all unused rows and columns. Thus,

Figure 8.1: Ellipsoidal control region for three process variables.

the terms of the MYT decomposition can be computed as follows:

T2_{p.1,2,...,p−1} = T2_{(x_1, x_2, ..., x_p)} − T2_{(x_1, x_2, ..., x_{p−1})},
T2_{p−1.1,2,...,p−2} = T2_{(x_1, ..., x_{p−1})} − T2_{(x_1, ..., x_{p−2})},
⋮
T2_{2.1} = T2_{(x_1, x_2)} − T2_{(x_1)}.

To illustrate this method for computing the conditional and unconditional terms of a MYT decomposition, consider an industrial situation characterized by three process variables. The in-control HDS is represented by 23 observations, and the estimates of the covariance matrix and mean vector are given by

A three-dimensional plot of the data with a 95% (α = 0.05) control ellipsoid is presented in Figure 8.1. For graphing purposes, the data have been centered at the mean value. The T2 value for a new observation vector X′ = (533, 514, 528) is computed using (8.1) and produces a T2 value of 79.994. For α = 0.05 with 23 observations, the T2 critical value is 11.923. Since the observed T2 is larger than this value,


Figure 8.2: Ellipsoidal control region with signaling point.

the new observation produces a signal. A graphical illustration of the signaling point is presented in Figure 8.2. Recall from Chapter 7 that for p = 2, there were two separate MYT decompositions of the T2 statistic. Likewise, for p = 3, a number of decompositions exist. One possible MYT decomposition for the observation vector is given as

T2 = T2_1 + T2_{2.1} + T2_{3.1,2}.

To compute this value, we begin by determining the value of the conditional term T2_{3.1,2}. From the above discussion, we have

T2_{3.1,2} = T2_{(x_1, x_2, x_3)} − T2_{(x_1, x_2)}.

To obtain T2_{(x_1, x_2)}, we partition the original estimates of the mean vector and covariance structure to obtain the mean vector and covariance matrix of the subvector X^{(2)} = (x_1, x_2). The corresponding partitions are given as

and T2_{(x_1, x_2)} is calculated from these partitioned estimates using the general formula above. Similarly, the decomposition for T2_{(x_1, x_2)} is given by

T2_{(x_1, x_2)} = T2_1 + T2_{2.1}.

We obtain T2_{(x_1)} by computing the T2 value of the subvector X^{(1)} = (x_1). This is the unconditional term T2_1 and is computed by

T2_1 = (x_1 − x̄_1)² / s_1².

Thus, T2_{2.1} is computed as

T2_{2.1} = T2_{(x_1, x_2)} − T2_{(x_1)}.

From this, we have the values of all three terms of the decomposition, and the smallness of the first two terms, T2_1 and T2_{2.1}, implies that the signal is contained in the third term, T2_{3.1,2}. Only one possible MYT decomposition was chosen above to illustrate a computing technique for the decomposition terms. Had we chosen another MYT decomposition, other terms of the decomposition would have had large values. With a signaling overall T2 value, we are guaranteed that at least one term of any particular decomposition will be large. We illustrate this important point in later sections.
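The subtraction scheme just illustrated generalizes directly: compute the T2 of each nested subvector with the general formula, then difference successive values. A sketch for p = 3 (our own, numpy assumed; the example covariance matrix is synthetic, not the boiler-style HDS values from the text):

```python
import numpy as np

def t2_sub(x, mean, S, idx):
    """T2 of the subvector picked out by idx, using the matching pieces
    of the mean vector and covariance matrix."""
    d = x[idx] - mean[idx]
    return float(d @ np.linalg.inv(S[np.ix_(idx, idx)]) @ d)

def myt_terms(x, mean, S):
    """Terms T2_1, T2_{2.1}, T2_{3.1,2} of one MYT decomposition (8.6),
    obtained as differences of nested subvector T2 values."""
    t1 = t2_sub(x, mean, S, [0])
    t12 = t2_sub(x, mean, S, [0, 1])
    t123 = t2_sub(x, mean, S, [0, 1, 2])
    return t1, t12 - t1, t123 - t12
```

By construction the three terms sum to the overall T2, and each difference is nonnegative, since it is a squared conditional residual.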

8.4

Properties of the MYT Decomposition

Many properties are associated with the MYT decomposition. Consider a p-dimensional vector defined as X' = (x1, x2, ..., xp). Interchange the first two components to form another vector (x2, x1, x3, ..., xp), so that the only difference between the two vectors is that the first two components have been permuted. The T2 value of the two vectors is the same; i.e.,

T²(x1, x2, ..., xp) = T²(x2, x1, x3, ..., xp).

This occurs because the T2 statistic is invariant under permutations of the components of the observation vector.


This invariance property of permuting the T2 components guarantees that each ordering of an observation vector will produce the same overall T2 value. Since there are p! = (p)(p−1)(p−2)···(2)(1) permutations of the components of the vector (x1, x2, ..., xp), this implies we can partition a T2 value in p! different ways. To illustrate this result, suppose p = 3. There are 3! = (3)(2)(1) = 6 decompositions of the T2 value for an individual observation vector. These are listed below:

T² = T²_1 + T²_{2.1} + T²_{3.1,2}
T² = T²_1 + T²_{3.1} + T²_{2.1,3}
T² = T²_2 + T²_{1.2} + T²_{3.1,2}
T² = T²_2 + T²_{3.2} + T²_{1.2,3}
T² = T²_3 + T²_{1.3} + T²_{2.1,3}
T² = T²_3 + T²_{2.3} + T²_{1.2,3}    (8.10)

Each row of (8.10) corresponds to a different permutation of the components of the observation vector. For example, the first row corresponds to the vector written in its original form as (x1, x2, x3), whereas the last row represents (x3, x2, x1). Note that all six possible permutations of the original vector components are included. The importance of this result is that it allows one to examine the T2 statistic from many different perspectives. The p terms in any particular decomposition are independent of one another, although the terms across the decompositions are not necessarily independent. With p terms in each partition and p! partitions, there are p × p! possible terms to evaluate in a total MYT decomposition of a signaling T2 statistic. Fortunately, all these terms are not unique to a particular partition, as certain terms occur more than once. For example, the T²_1 term occurs in the first and second decompositions listed in (8.10), and the T²_{2.1,3} term occurs in the second and fifth decompositions. In general, there are p × 2^(p−1) distinct terms among the possible decompositions. These unique terms are the ones that need to be examined for possible contribution to a T2 signal. When p is large, computing all these terms can be cumbersome. For example, when p = 10, there are over 5,000 unique terms in the MYT decomposition. To alleviate this problem, several computational shortcuts have been established and are discussed in detail in later sections of this chapter.

Consider the MYT decomposition given in (8.6) and suppose T²_1 dominates the overall value of the T2 statistic. This indicates that the observation on the variable x1 is contributing to the signal. However, to determine if the remaining variables in this observation contribute to the signal, we must examine the T2 value associated with the subvector (x2, x3, ..., xp), which excludes the x1 component. Small values of the T2 statistic for this subvector imply that no signal is present. They also indicate that one need not examine any term of the total decomposition involving only these (p − 1) variables.

The fact that a T2 statistic can be computed for any subvector of the overall observation vector has numerous applications. For example, consider a situation (not uncommon in an industrial setting) where observations on process variables are
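The counting results above can be checked with a short script; only the combinatorics of the decomposition is involved, no data. Each unique term is a variable j conditioned on some subset of the remaining variables:

```python
from itertools import combinations

def unique_terms(p):
    """All distinct MYT terms (j, conditioning subset) for p variables."""
    terms = []
    for j in range(1, p + 1):
        others = [v for v in range(1, p + 1) if v != j]
        for k in range(len(others) + 1):
            for cond in combinations(others, k):
                terms.append((j, cond))
    return terms

print(len(unique_terms(3)))    # 12, i.e., p * 2^(p-1)
print(len(unique_terms(10)))   # 5120, the "over 5,000" figure in the text
```

Each variable j can be paired with any of the 2^(p−1) subsets of the other p − 1 variables, which is where the p × 2^(p−1) count comes from.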


Table 8.1: List of possible regressions and conditional T2 terms when p = 3.

Regression of         Conditional T2
x1 on x2              T²_{1.2}
x1 on x3              T²_{1.3}
x1 on x2, x3          T²_{1.2,3}
x2 on x1              T²_{2.1}
x2 on x3              T²_{2.3}
x2 on x1, x3          T²_{2.1,3}
x3 on x1              T²_{3.1}
x3 on x2              T²_{3.2}
x3 on x1, x2          T²_{3.1,2}

more frequently available than observations on lab variables. A T2 statistic can be computed for the subvector of the process variables whenever observations become available, and an overall T2 statistic can be determined when both the process and lab variables are available. Another application of this important result is in the area of missing data. For example, we need not shut down a control procedure because of a faulty sensor. Instead, we can drop the variable associated with the sensor from the control procedure and run the T2 on the remaining variables until the sensor is replaced. To demonstrate the use of the T2 statistic in signal location, reconsider the three-variable example discussed in the last section. We computed the T2 value of the signaling observation vector, X' = (533, 514, 528), to be

T² = 79.9441.

Furthermore, we computed the T2 value of the subvector X'_{(1,2)} = (533, 514) to be

T²_{1,2} = 1.3935.

The smallness of this latter T2 value suggests that there are no problems with the observations on variables x1 and x2. Thus, all T2 terms, both conditional and unconditional, involving only x1 and x2 will have small values. Our calculations confirm this result, as

T²_1 = 1.3934,  T²_2 = 0.0641,  T²_{1.2} = 1.3294,  T²_{2.1} = 0.0001.

From this type of analysis one can conclude that the signal is caused by the observed value on x3. Another important property of the T2 statistic is the fact that the p(2^(p−1) − 1) unique conditional terms of a MYT decomposition contain the residuals from all possible linear regressions of each variable on all subsets of the other variables. For example, for p = 3, a list of the nine (i.e., 3(2² − 1)) linear regressions of each variable on all possible subgroups of the other variables is presented in Table 8.1 along with the corresponding conditional T2 terms. It will be shown in Chapter


9 that this property of the T2 statistic provides a procedure for increasing the sensitivity of the T2 statistic to process shifts.

8.5

Locating Signaling Variables

In this section we seek to relate a T2 signal and its interpretation to the components of the MYT decomposition. Consider a signaling observation vector X' = (x1, x2, ..., xp), so that its T2 value exceeds the UCL. One method for locating the variables contributing to the signal is to develop a forward-iterative scheme. This is accomplished by finding the subset of variables that do not contribute to the signal. Recall from (8.3) and (8.5) that a T2 statistic can be constructed on any subset of the variables x1, x2, ..., xp. Construct the T2 statistic for each individual variable x_j, j = 1, 2, ..., p, so that

T²_j = (x_j − x̄_j)² / s²_j,

where x̄_j and s²_j are the corresponding mean and variance estimates as determined from the HDS. Compare these individual T2 values to their UCL, computed for an appropriate α level and for a value of p = 1. Exclude from the original set of variables all x_j for which

T²_j > UCL,

since observations on this subset of variables are definitely contributing to the signal. From the set of variables not contributing to the signal, compute the T2 statistic for all possible pairs of variables. For example, for all pairs (x_i, x_j) with i ≠ j, compute the T2 value of the subvector (x_i, x_j), denoted T²_{i,j}, and compare these values to the corresponding upper control limit for p = 2. Exclude from this group all pairs of variables for which

T²_{i,j} > UCL.
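The screening stages just described extend naturally to a general forward-iterative routine. The sketch below assumes a caller-supplied T2 function and a caller-supplied UCL for each subvector size; the toy example uses a diagonal covariance matrix (independent variables), so the T2 of any subset is simply a sum of squared standardized deviations. All numbers are hypothetical.

```python
from itertools import combinations

def forward_screen(t2_of, p, ucl):
    """Return the set of variables implicated by the forward-iterative scheme.

    t2_of(group) gives the T2 value of that tuple of variable indices;
    ucl(size) gives the upper control limit for a subvector of that size."""
    remaining = set(range(1, p + 1))
    signaling = set()
    size = 1
    while size <= len(remaining):
        flagged = set()
        for group in combinations(sorted(remaining), size):
            if t2_of(group) > ucl(size):
                flagged.update(group)
        remaining -= flagged
        signaling |= flagged
        size += 1
    return signaling

# toy example: standardized observation with variable 3 far out of range
z = {1: 0.8, 2: -0.3, 3: 3.5}
t2_of = lambda group: sum(z[j] ** 2 for j in group)   # diagonal covariance
ucl = lambda size: 4.0 + 1.5 * (size - 1)             # illustrative limits
print(forward_screen(t2_of, 3, ucl))                  # {3}
```

A backward-elimination variant, mentioned later in this section, would start from the full vector and delete variables instead of accumulating them.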


The excluded pairs of variables, in addition to the excluded single variables, comprise the group of variables contributing to the overall signal. Continue to iterate in this fashion so as to exclude from the remaining group all variables of signaling groups of three variables, four variables, etc. The procedure produces a set of variables that contribute to the signal. To illustrate the above methodology, recall the T2 value of 79.9441 for the three-dimensional observation vector X' = (533, 514, 528) from the previous example. The unconditional T2 values for the individual observations are given by

T²_1 = 1.3934,  T²_2 = 0.0641,  T²_3 = 11.6578.

The UCL is computed for p = 1 with α = 0.05. From these computations we observe that T²_1 and T²_2 are in control. However, the size of T²_3 = 11.6578 exceeds this UCL and indicates that x3 is part of the signal contained in the observation vector. Next, we separate the original observation into two groups, (x1, x2) and (x3). We compute the T2 value of the subvector (x1, x2),

T²_{1,2} = 1.3935,

and compare it to the corresponding UCL for p = 2.

We conclude that no signal is present in the (x1, x2) component of the observation vector. Hence, the reason for the signal lies with the observation on the third variable, namely, x3 = 528. One problem with this method is that it provides little information on how the isolated signaling vector component(s) contributes to the signal. For example, in the above set of data, it does not indicate how the observed value of 528 on x3 contributes to the overall signal. Nevertheless, it can be used, as indicated, to locate the components of the observation vector that do contribute to the signal. It should be noted also that a backward-elimination scheme, similar to the forward scheme, could be developed to locate the same signaling vector components.

Another method of locating the vector components contributing to a signal is to examine the individual terms of the MYT decomposition of a signaling observation vector and to determine which are large in value. This is accomplished by comparing each term to its corresponding critical value. Recall from Chapter 7 that the distribution governing the components of the MYT decomposition for the situation where there are no signals is an F distribution. For the case of p variables, these distributions are given by

T²_j ~ [(n+1)/n] F_{(1,n−1)}    (8.12)

Table 8.2: Unique T2 terms of total decomposition.

T²_1 = 1.3934        T²_{2.1} = 0.0001       T²_{1.2,3} = 58.7278*
T²_2 = 0.0641        T²_{2.3} = 9.5584*      T²_{2.1,3} = 40.0558*
T²_3 = 11.6578*      T²_{3.1} = 38.4949*     T²_{3.1,2} = 78.5506*
T²_{1.2} = 1.3294    T²_{3.2} = 21.1522*
T²_{1.3} = 28.2305*

* Denotes significance at the 0.01 level.

for unconditional terms, and by

T²_{j.1,2,...,k} ~ [(n+1)(n−1) / (n(n−k−1))] F_{(1,n−k−1)}    (8.13)

for conditional terms, where k equals the number of conditioned variables. For k = 0, the distribution in (8.13) reduces to the distribution in (8.12). Using these distributions, critical values (CVs) for a specified α level and an HDS sample of size n are obtained as

CV = [(n+1)/n] F_{(α,1,n−1)}    (8.14)

for unconditional terms and

CV = [(n+1)(n−1) / (n(n−k−1))] F_{(α,1,n−k−1)}    (8.15)

for conditional terms.

We can compare each individual term of the decomposition to its critical value and make the appropriate decision. To illustrate the above discussion, recall the T2 value of 79.9441 for the observation vector X' = (533, 514, 528) taken from the example described in section 8.3. Table 8.2 contains the 12 unique terms and their values for a total decomposition of this T2 value. A large component is determined by comparing the value of each term to the appropriate critical value. The T2 values with asterisks designate those terms that contribute to the overall T2 signal, e.g., T²_3, T²_{1.3}, T²_{2.3}, T²_{3.1}, T²_{3.2}, T²_{1.2,3}, T²_{2.1,3}, and T²_{3.1,2}. All such terms contain the observation on x3. This was the same variable designated by the exact method for detecting signaling variables. Thus, one could conclude that a problem must exist in this variable. However, a strong argument also could be made that the problem is due to the other two variables, since four of the signaling terms contain x1 and four terms contain x2. To address this issue, more understanding of what produces a signal in terms of the decomposition is needed.

8.6

Interpretation of a Signal on a T2 Component

Consider one of the p possible unconditional terms resulting from the decomposition of the T2 statistic associated with a signaling observation. As stated earlier, the term

T²_j = (x_j − x̄_j)² / s²_j,

j = 1, 2, ..., p, is the square of a univariate t statistic for the observed value of the jth variable of an observation vector X. For control to be maintained, this component must be less than its critical value, i.e., T²_j < CV. Since

CV = [(n+1)/n] t²_{(α/2,n−1)},

we can re-express this condition as T_j = (x_j − x̄_j)/s_j lying in the following interval:

−t_{(α/2,n−1)} √((n+1)/n) < (x_j − x̄_j)/s_j < t_{(α/2,n−1)} √((n+1)/n),

or as

x̄_j − t_{(α/2,n−1)} s_j √((n+1)/n) < x_j < x̄_j + t_{(α/2,n−1)} s_j √((n+1)/n),    (8.16)

where t_{(α/2,n−1)} is the appropriate value from a t distribution with n − 1 degrees of freedom. This is equivalent to using a univariate Shewhart control chart for the jth variable. If the control limits in (8.16) are constructed for each of the p variables and plotted in a p-dimensional space, we would obtain a hyperrectangular "box." This box is the equivalent of a Shewhart control procedure on all individual variables. However, the true control region, based on the T2 statistic, is a hyperellipsoid located within the box. In most situations, the ellipsoid will not fit inside the box. We illustrate this situation for p = 3 in Figure 8.3.

Figure 8.3: Ellipsoid within a box.

The rectangular box represents the control limits on the individual variables as computed by (8.16), while the


ellipsoid represents the control region for the overall T2 statistic. Various signaling points are also included for discussion below. If an observation vector plots outside the box, the signaling univariate T²_j values identify the out-of-control variables, since the observation on the particular variable is varying beyond what is allowed (determined) by the HDS. This is illustrated in Figure 8.3 by point A. Thus, when an unconditional term produces a signal, the implication is that the observation on the particular term is outside its allowable range of variation. The point labeled C also lies outside the box region, but the overall T2 value for this point would not have signaled since the point is inside the elliptical control region. This part of the T2 signal analysis is equivalent to ranking the individual t values of the components of the observation vector (see Doganaksoy, Faltin, and Tucker (1991)). While these components are a part of the T2 decomposition, they represent only the p unconditional terms. Additional insight into the location and cause of a signal comes from examination of the conditional terms of the decomposition. Consider the form of a general conditional term given as

T²_{j.1,2,...,j−1} = (x_j − x̄_{j.1,2,...,j−1})² / s²_{j.1,2,...,j−1},    (8.17)

where x̄_{j.1,2,...,j−1} and s²_{j.1,2,...,j−1} are the mean and variance of the conditional distribution of x_j given x1, x2, ..., x_{j−1}, as estimated from the HDS. If the value in (8.17) is to be less than its control limit, its numerator must be small, as the denominator of these terms is fixed by the historical data. This implies that the component x_j from the observation vector X' = (x1, x2, ..., x_j, ..., xp) is contained in the conditional distribution of x_j given x1, x2, ..., x_{j−1} and falls in the elliptical control region. A signal occurs on the term in (8.17) when x_j is not contained in the conditional distribution of x_j given x1, x2, ..., x_{j−1}, i.e., when the deviation (x_j − x̄_{j.1,2,...,j−1})² is large. This implies that something is wrong with the relationship existing between and among the variables x1, x2, ..., x_j. For example, a signal on T²_{j.1,2,...,j−1} implies that the observation on x_j is not where it should be relative to the values of x1, x2, ..., x_{j−1}. The relationship between x_j and the other variables is counter to the relationship observed in the historical data. To illustrate a countercorrelation, consider the trace of the control region of Figure 8.3 in the x1 and x3 spaces as presented in Figure 8.4. The signaling point B of Figure 8.3 is located in the upper right-hand corner of Figure 8.4, inside the operational ranges of x1 and x3 but outside the T2 control region. Thus, neither the T²_1 nor the T²_3 term would signal, but both the T²_{1.3} and T²_{3.1} terms would. Conditional distributions are established by the correlation structure among the variables, and conditional terms of an MYT decomposition depend on this


structure.

Figure 8.4: Trace of the control region for x1 and x3.

With the assumption of multivariate normality, no correlation implies independence among the variables. The decomposition of a T2 statistic for this situation will contain no conditional terms. To see this, consider a p-variate normal distribution having a known mean of μ and a covariance matrix given by

Σ = diag(σ²_1, σ²_2, ..., σ²_p).

The T2 value for an observation taken from this distribution would be given by

T² = (x1 − μ1)²/σ²_1 + (x2 − μ2)²/σ²_2 + ··· + (xp − μp)²/σ²_p.

In this form, all terms in the decomposition are unconditional, where

T²_j = (x_j − μ_j)²/σ²_j,  j = 1, 2, ..., p.

To summarize the procedure for interpreting the components of an MYT decomposition: signals on unconditional terms imply that the involved variable is outside the operational range specified by the HDS. An observation vector containing this type of signal will plot outside the hyperrectangular box defined by the control limits for all unconditional terms. Overall T2 signals on observations within the box will have large conditional terms, and these imply that something is wrong with

Table 8.3a: Summary statistics.

           x1        x2        x3
Mean    525.435   513.435   539.913
Min     536.000   509.000   532.000
Max     536.000   518.000   546.000

Table 8.3b: Correlation matrix.

         x1      x2      x3
x1     1.000   0.205   0.725
x2     0.205   1.000   0.629
x3     0.725   0.629   1.000

the relationship among the variables contained in the term. All of these variables would need to be examined to identify a possible cause. More information pertaining to the HDS given in section 8.3 is needed in order to expand our previous example to include interpretation of the signaling components for the observation vector X' = (533, 514, 528). This information is summarized in Tables 8.3a and 8.3b. Consider from Table 8.2 the value of T²_{1.3} = 28.2305, which is declared large since it exceeds the CV = 8.2956. The size of T²_{1.3} implies that something is wrong with the relationship between the observed values on variables x1 and x3. Note from Table 8.3b that, as established by the HDS, the correlation between these two variables is 0.725. This implies that the two variables vary together in a positive direction. However, for our observation vector X' = (533, 514, 528), the value of x1 = 533 is somewhat above its mean value of 525.435, whereas the value of x3 = 528 is well below its mean value of 539.913. This contradictory result is an example of the observations on x1 and x3 being countercorrelated. To reestablish control of the process, either x1 must be lowered, if possible, or the value of x3 must be increased. To determine which variable to move requires one to be familiar with the process and the process variables. This includes knowing which variable is easiest to control. If x1 is controllable and x3 is not, then x1 should be lowered. If x3 is controllable and x1 is not, then x3 should be increased. If both are controllable, then one might consider the large size of the unconditional term T²_3 = 11.6578 in Table 8.2.

A large value on an unconditional term implies that the observation on that variable is outside the Shewhart box. This is the case for the observed value of x3 = 528, as it is considerably less than the minimum value of 532 listed in the HDS. Hence, to restore control in this situation, one would adjust variable x3 upward.
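The countercorrelation just described is easy to reproduce numerically for p = 2, where the T2 statistic has a closed form. The correlation 0.725 below matches the HDS of the example, but the standardized observation and the cutoff of 4.0 are hypothetical illustrations of a point like B in Figure 8.4:

```python
def bivariate_terms(z1, z2, r):
    """Return (T2_1, T2_{2.1}, T2) for standardized values z1, z2, correlation r."""
    t2_full = (z1 ** 2 + z2 ** 2 - 2.0 * r * z1 * z2) / (1.0 - r ** 2)
    t2_1 = z1 ** 2                  # unconditional term for the first variable
    return t2_1, t2_full - t2_1, t2_full

# first variable moderately above its mean, second moderately below:
# each stays inside its own Shewhart limits, but the pair contradicts
# a strong positive correlation
t2_1, t2_21, t2_full = bivariate_terms(1.2, -1.5, 0.725)
print(t2_1 < 4.0 < t2_21)   # True: no univariate signal, conditional term signals
```

With r = 0 the conditional term collapses to the unconditional one, which mirrors the diagonal-covariance case discussed above where the decomposition contains no conditional terms.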


8.7

Regression Perspective

As stated earlier, a signal on a conditional term implies that something is wrong with the relationships among the involved variables. Additional insight for interpreting a signaling conditional term is obtained by examining these terms from a regression perspective. In general, T²_{j.1,2,...,j−1} is a standardized observation on the jth variable, adjusted by the estimates of the mean and variance from the conditional distribution of x_j given x1, x2, ..., x_{j−1}. The general form of this term is given in (8.17). Consider the estimated mean of x_j adjusted for x1, x2, ..., x_{j−1}. We estimate this mean using the prediction equation

x̄_{j.1,2,...,j−1} = x̄_j + b'_j (X^(j−1) − X̄^(j−1)),    (8.18)

where x̄_j is the sample mean of x_j obtained from the historical data. The subvector X^(j−1) is composed of the observations on (x1, x2, ..., x_{j−1}), and X̄^(j−1) is the corresponding estimated mean vector obtained from the historical data. The vector of estimated regression coefficients b_j is obtained from partitioning the submatrix S_jj, the covariance matrix of the first j components of the vector X. To obtain S_jj, partition S so that S_jj is its leading j × j submatrix. Further, partition the matrix S_jj as

S_jj = ( S_(j−1)  s_j ; s'_j  s²_j ),

where S_(j−1) is the covariance matrix of (x1, ..., x_{j−1}) and s_j is the vector of covariances between x_j and these variables. Then

b_j = S⁻¹_(j−1) s_j.

Since the left-hand side of (8.18) contains x̄_{j.1,2,...,j−1}, the predicted value of x_j from the given values of x1, x2, ..., x_{j−1}, the numerator of (8.17) is a regression residual represented by

r_{j.1,2,...,j−1} = x_j − x̄_{j.1,2,...,j−1}.

Rewriting the conditional variance as

s²_{j.1,2,...,j−1} = s²_j (1 − R²_{j.1,2,...,j−1})

(see, e.g., Rencher (1993)), where R²_{j.1,2,...,j−1} is the squared multiple correlation coefficient from the regression of x_j on x1, x2, ..., x_{j−1}, and substituting r_{j.1,2,...,j−1} for (x_j − x̄_{j.1,2,...,j−1}), we can re-express T²_{j.1,2,...,j−1} as a squared standardized residual having the form

T²_{j.1,2,...,j−1} = r²_{j.1,2,...,j−1} / (s²_j (1 − R²_{j.1,2,...,j−1})).    (8.19)


The conditional term in this form explains how well a future observation on a particular variable agrees with the value predicted by a set of the other variate values of the vector, using the covariance matrix constructed from the HDS. Unless the denominator in (8.19) is very small, as occurs when R² is near 1, the "largeness" of the conditional T2 term will be due to the numerator, which is a function of the agreement between the observed and predicted values of x_j. Even when the denominator is small, as occurs with large values of R², we would expect very close agreement between the observed and predicted x_j values. A significant deviation between these values will produce a large T2 term. When the conditional T2 term in (8.19) involves many variables, its size is directly related to the magnitude of the standardized residual resulting from the prediction of x_j using x1, x2, ..., x_{j−1} and the HDS. When the standardized residual is large, the conditional T2 signals. The above results indicate that a T2 signal may occur if something goes astray with the relationships between subsets of the various variables. This situation can be determined by examination of the conditional T2 terms. A signaling value indicates that a contradiction with the historical relationship between the variables has occurred either (1) due to a standardized component value that is significantly larger or smaller than that predicted by a subset of the remaining variables, or (2) due to a standardized component value that is marginally smaller or larger than that predicted by a subset of the remaining variables when there is a very severe collinearity (i.e., a large R² value) among the variables. Thus, a signal results when an observation on a particular variable, or set of variables, is out of control and/or when observations on a set of variables are counter to the relationship established by the historical data.
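The equivalence between the regression form (8.19) and the subvector-differencing form of a conditional term can be verified numerically in the simplest case, p = 2, where T²_{2.1} = T²_{1,2} − T²_1. The summary statistics below are hypothetical:

```python
# hypothetical HDS summary statistics for two variables
x1bar, x2bar = 50.0, 20.0        # means
s11, s22, s12 = 9.0, 4.0, 4.2    # variances and covariance
x1, x2 = 54.0, 18.0              # new observation

# regression route, as in (8.19): predict x2 from x1
b = s12 / s11                           # slope estimate
resid = x2 - (x2bar + b * (x1 - x1bar))
r2 = s12 ** 2 / (s11 * s22)             # squared correlation
t2_cond_reg = resid ** 2 / (s22 * (1.0 - r2))

# decomposition route: difference of subvector T2 values
z1, z2 = x1 - x1bar, x2 - x2bar
det = s11 * s22 - s12 ** 2
t2_12 = (s22 * z1 ** 2 - 2.0 * s12 * z1 * z2 + s11 * z2 ** 2) / det
t2_1 = z1 ** 2 / s11
print(abs(t2_cond_reg - (t2_12 - t2_1)) < 1e-9)   # True: the two routes agree
```

The same algebra carries over to any j and any conditioning set, which is why the conditional terms can be read either as decomposition residues or as standardized regression residuals.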

8.8

Computational Scheme (Optional)

A primary consideration in the decision to use a multivariate control procedure is ease of computation. Interpretation efforts may require numerous computations, and this fact might initially discourage practitioners. The MYT decomposition of the T2 statistic has been shown to be a great aid in the interpretation of signaling T2 values, but the number of unique terms can be large, particularly when p exceeds 10. Although this problem has been noted by other authors (e.g., Kourti and MacGregor (1996)), it fortunately has led to the development of computer programs that can rapidly produce the significant components of the decomposition for moderately large sets of variables (e.g., see Langley, Young, Tracy, and Mason (1995)). Nevertheless, the question on how these computational methods will work when there are hundreds of thousands of variables has yet to be answered. There are many factors influencing the capability of the procedure, and these include computer capacity, computer speed, the size of the data set, and the programming of the algorithm. The following is an outline of a sequential computational scheme that can be used to reduce the computations to a reasonable number when the overall T2 signals.


Step 1. Compute the individual T2 statistics for every component of the X vector. Remove variables whose observations produce a significant T2. The observations on these variables are out of individual control, and it is not necessary to check how they relate to the other observed variables. With these significant variables removed, we have a reduced set of variables. Check the subvector of the remaining k variables for a signal. If no signal remains, we have located the source of the problem.

Step 2. If a signal remains in the subvector of k variables not deleted, compute all T²_{i.j} terms. Remove from consideration all pairs of variables, (x_i, x_j), that have a significant T²_{i.j} term. This indicates that something is wrong with the bivariate relationship. When this occurs it will further reduce the set of variables under consideration. Examine all removed variables for the cause of the signal. Compute the T2 value for the remaining subvector. If no signal is present, the source of the problem is with the bivariate relationships and those variables that were out of individual control.

Step 3. If the subvector of the remaining variables still contains a signal, compute all the T²_{i.j,k} terms. Remove any three-way combination of variables, (x_i, x_j, x_k), that shows significant results and check the remaining subvector for a signal.

Step 4. Continue computing the higher-order terms in this fashion until there are no variables left in the reduced set. The worst-case situation is that all unique terms will have to be computed.

Let us apply this computational scheme to the simulated data from Hawkins (1991) that is described in Mason, Tracy, and Young (1995). Measurements were taken on five dimensions of switch drums: the inside diameter of the drum, x1, and the distances from the head to the edges of four sectors of the drum, x2, x3, x4, and x5. A signaling observation from this situation yielded a T2 value of 22.88. With p = 5, there are 5 × 2⁴ = 80 distinct components that could be evaluated in decomposing the T2 value to determine the cause of the signal. Table 8.4 contains the 31 terms out of these 80 that are significant. The larger conditional terms indicate that the cause of the problem exists with x1 individually, with the relationship between x1 and the other variables, and with x5 and its relationship to x4.

Suppose the Hawkins data is re-examined using our computational scheme. At Step 1, the five individual T2 statistics would be computed. These are given in Table 8.5. Only T²_1 is significant, so variable x1 is removed from the analysis. After removing x1 from the data vector, the remaining subvector is still significant; i.e., T² − T²_1 = 22.88 − 7.61 = 15.27. Thus, the 12 T²_{i.j} terms that do not involve x1 must be computed. As can be seen from scanning Table 8.4, the only one of these that is significant is T²_{5.4}. Thus, x4 and x5 are removed from the analysis. At this stage of the computational scheme, the subvector involving x2 and x3 is not significant, so the calculations stop. The interpretation is that x1 is out of individual control and that the observed value for x5 is not what would be predicted by the observed value of x4 according to the regression equation obtained using the historical data. The computational scheme was very efficient and reduced the number of required calculations from 80 terms to only 17 terms.
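The bookkeeping behind the reduction from 80 terms to 17 can be confirmed with a few lines; the counts follow from the scheme itself, not from the data values:

```python
p = 5
total_unique = p * 2 ** (p - 1)   # all distinct MYT terms for p = 5
step1 = p                         # unconditional terms computed in Step 1
# after x1 is removed, Step 2 computes T2_{i.j} for ordered pairs of the
# remaining four variables
step2 = 4 * 3
print(total_unique)        # 80
print(step1 + step2)       # 17
```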

Table 8.4: Significant decomposition components for Hawkins's (1991) signaling observation.

Component      Value    Component         Value    Component         Value
T²_1            7.61    T²_{1.2,3,4}      15.86    T²_{5.4}           4.43
T²_{1.2}        8.90    T²_{1.2,3,5}      16.79    T²_{5.1,4}        11.51
T²_{1.3}       14.96    T²_{1.2,4,5}      16.45    T²_{5.2,4}         4.73
T²_{1.4}        9.41    T²_{1.3,4,5}      17.41    T²_{5.3,4}         4.66
T²_{1.5}       15.62    T²_{1.2,3,4,5}    16.37    T²_{5.1,2,3}       4.93
T²_{1.2,3}     15.78    T²_{3.1}           8.10    T²_{5.1,2,4}      11.85
T²_{1.2,4}      9.33    T²_{3.1,4}         6.00    T²_{5.1,3,4}       6.89
T²_{1.2,5}     15.98    T²_{3.1,2}         7.06    T²_{5.2,3,4}       5.66
T²_{1.3,4}     15.18    T²_{3.1,2,4}       6.70    T²_{5.1,2,3,4}     6.17
T²_{1.3,5}     15.10    T²_{5.1}          11.87
T²_{1.4,5}     16.49    T²_{5.1,2}        10.48
Table 8.5: Unconditional T2 values for Hawkins's (1991) data.

Variable    x1      x2      x3      x4      x5
T2         7.61*   0.67    0.76    0.58    3.87

* Denotes significance at the 0.05 level, based on a one-sided UCL = 4.28.

8.9

Case Study

To illustrate the interpretation of signals for the T2 statistic and the computing scheme for the signal location in terms of components of the MYT decomposition, we present the following example. Consider a process that produces a specialty product on demand. The quality characteristics of the product must conform to rigorous standards specified by the customer. Any deviation leads to serious upset conditions in the customer's process. The quality of the product is determined by measurements on seven quality variables. An acceptable product is established by the observations on the quality variables conforming to the structure of an HDS consisting of 85 previously accepted lots. Establishing the reason for lot rejection is most important, since, in most cases, it is possible to rework the rejected lots to acceptable standards. In this example, the control procedure on product quality serves two purposes. First, if a lot conforms to the structure of the HDS, it is considered fit for certification and shipment to the customer. Thus, the control procedure produces information to help in making this decision. Second, by analyzing rejected lots, we can determine the reason(s) for rejection through use of the MYT decomposition of the signaling T2 statistic. Being able to determine what variables cause rejection and how they contribute to the overall signal can be quite helpful in the rework process of the rejected lots. For example, we predict new target values for those

Table 8.6a: Summary statistics.

           x1      x2      x3      x4      x5      x6       x7
Mean      87.15    7.29    3.21    0.335   0.05    0.864    1.0875
Min       86.40    6.8     2.8     0.27    0.00    0.8      0.85
Max       87.50    7.9     3.5     0.39    0.12    1.00     1.4
Std Dev    0.26    0.21    0.13    0.02    0.03    0.05     0.11

Table 8.6b: Correlation matrix.

        x1       x2       x3       x4       x5       x6       x7
x1     1.000
x2    -0.500    1.000
x3    -0.623   -0.160    1.000
x4    -0.508   -0.063    0.700    1.000
x5    -0.073   -0.471    0.432    0.216    1.000
x6    -0.464    0.037    0.118    0.110   -0.030    1.000
x7    -0.329   -0.473    0.400    0.309    0.377    0.368    1.000

Figure 8.5: T2 chart for HDS.

variables that contribute to rejection. These new values conform not only to the HDS, but also to the observations on the variables of the data vector that did not contribute to the signal. The HDS on the seven (coded) quality variables is characterized by the summary statistics and the correlation matrix presented in Tables 8.6a and 8.6b. As demonstrated later, these statistics play an important role in signal interpretation for the individual components of the decomposition. Individual observation vectors are not presented due to proprietary reasons; however, an understanding of the process variation for this product can be gained by observing a graph of the T2 values for the HDS. This graph is presented in Figure 8.5. The time ordering of the T2 values corresponds to the order of lot production

Table 8.7: Unconditional terms for observation 1.

Component    Value
T²_1        47.0996*
T²_2         0.0014
T²_3         0.0092
T²_4         0.0751
T²_5         0.0056
T²_6         0.0071
T²_7         0.0047

* Denotes significance at the 0.05 level.

and acceptance by the customer. Between any two T2 values, other products could have been produced as well as other rejected lots. The seven process variables represent certain chemical compositions contained in the product. Not only do the observations on these variables have to be maintained in strict operating ranges, but they also must conform to relationships specified by the correlation matrix of the HDS. This situation is representative of a typical multivariate system. Consider an observation vector X' = (89.0, 7.3, 3.2, 0.33, 0.05, 0.86, 1.08) for a rejected lot. The lot was rejected because the T2 value of the observation vector was greater than its UCL. A relatively large value of the Type I error rate (i.e., α = 0.05) is used to protect the customer from receiving an out-of-control lot. The risk analysis used in assessing the value of the Type I error rate deemed it more acceptable to reject an in-control lot than to ship an out-of-control lot to the customer. The T2 value of this signaling observation vector is decomposed using the computing scheme described in section 8.8. Table 8.7 contains the individual T2 values for the seven unconditional terms. Significance is determined by comparing the individual unconditional T2 values to a critical value computed from (8.14). Using n = 85 and α = 0.05, we obtain a critical value of 4.0011. Only the unconditional term T²_1 produces a signal, indicating that x1 is contributing to the overall signal. When compared to the minimum and maximum x1 values contained in Table 8.6a, we find the observed value of x1 to be larger than the maximum value contained in the HDS, i.e., 89.0 > 87.5. To make the lot acceptable to the customer, the value of this component must be reduced. The second part of the computing scheme given in section 8.8 recommends the removal of this value (i.e., x1 = 89.0) from the observation vector and the testing of the remaining subvector for possible signal contributions. A T2 value for the subvector (x2, x3, ..., x7)' = (7.3, 3.2, 0.33, 0.05, 0.86, 1.08) is computed.


Table 8.8: Unconditional terms for observation 2.

    T² component    Value
    T₁²             0.0321
    T₂²             0.1749
    T₃²             0.6977
    T₄²             1.3954
    T₅²             0.0056
    T₆²             0.0071
    T₇²             0.0047
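The 4.0011 critical value used for these unconditional comparisons (n = 85, α = 0.05) can be reproduced in a few lines. The exact expression in (8.14) is not repeated here, so the form used below, ((n + 1)/n) · F(α; 1, n − 1), is an assumption on our part; it does reproduce the 4.0011 figure:

```python
from scipy.stats import f

def uncond_t2(xj, mean_j, sd_j):
    """Unconditional T^2 term: the squared standardized distance of one variable."""
    return ((xj - mean_j) / sd_j) ** 2

def uncond_critical(n, alpha=0.05):
    # Assumed form of (8.14): ((n + 1) / n) * F(alpha; 1, n - 1)
    return (n + 1) / n * f.ppf(1 - alpha, 1, n - 1)

print(round(uncond_critical(85, 0.05), 4))  # 4.0011
```

Each tabled value above is then simply compared against this common cutoff.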

This T² value is compared to a critical value of 14.3019. The smallness of the T² indicates that no signal is present. From this analysis we conclude that the overall observation signal can be attributed to the large value observed on x₁. If possible, through the rework process, this value needs to be reduced. A more acceptable value can be found by predicting the value of x₁ based on the fixed values of the other variables, (x₂, x₃, …, x₇)' = (7.3, 3.2, 0.33, 0.05, 0.86, 1.08). A regression equation for x₁, developed from the historical data using a stepwise procedure, is given as

Note that x₄ is not in the equation, as it was not significant. The predicted value of x₁ using this equation is 87.16. The T² value using this predicted value of x₁ and the observed values of the remaining variables is

This small value indicates that, if the value x₁ = 87.16 is attainable in the rework process, the lot will be acceptable to the customer. This judgment is made because the T² value is insignificant when compared to a critical value of 16.2411. A second observation vector of a rejected lot is given as X' = (87.2, 7.2, 3.1, 0.36, 0.05, 0.86, 1.08). Again, its T² statistic is greater than the critical value; i.e.,

We begin the signal determination procedure by computing the individual T² values of the unconditional terms. The values of these terms are presented in Table 8.8. When compared to a critical value of 4.0011, none of the individual terms is found to contribute to the signal. This indicates that the observations on all variables are within the control box determined by the HDS. Our search for variables contributing to the overall signal continues by computing the two-way conditional T² terms, as specified in Step 2 of the computing scheme. These terms are presented in Table 8.9. The computing scheme detects only one significant term, T²₃.₄. At this stage, we remove the observations on the third and fourth variables and check for a signal on the subvector (x₁, x₂, x₅, x₆, x₇)' = (87.2, 7.2, 0.05, 0.86, 1.08). The computed T² value is

Table 8.9: Two-way conditional terms for observation 2 (only T²₃.₄ = 5.4091 is significant at the 0.05 level).

When compared to a critical value of 12.3696, as computed with n = 85, p = 5, and α = 0.05, no signal is found to be present. A signal on the T²₃.₄ term indicates that something is wrong with the relationship between the observed values on the third and fourth variables (i.e., x₃ = 3.1 and x₄ = 0.36). More insight into the cause of the signal can be obtained by further analysis. Referring to Table 8.6, we find the correlation between these variables to be 0.70. Although moderate in size, this correlation implies that large or small values of the two variables tend to occur together. However, examination of the summary statistics in Table 8.6 indicates that the observation on x₃ is below the mean (i.e., small), while the observation on x₄ is above the mean (i.e., large). Thus, we have a small value for x₃ associated with a large value for x₄. This contradicts the historical relationship and produces the signal on the T²₃.₄ term.
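The way a countercorrelated pair inflates a conditional term can be sketched numerically in standardized units, using the 0.70 correlation from Table 8.6 (the sketch treats the HDS estimates as known constants, which is a simplification):

```python
def cond_t2_pair(z_j, z_i, r):
    """Conditional T^2 of one standardized variable given another:
    the prediction of z_j from z_i is r*z_i, with residual variance 1 - r^2."""
    return (z_j - r * z_i) ** 2 / (1 - r ** 2)

r = 0.70
# x4 one standard deviation above its mean while x3 is one below:
# this contradicts the positive correlation and inflates the term
print(round(cond_t2_pair(1.0, -1.0, r), 3))  # 5.667
# both variables one standard deviation above their means: conforming
print(round(cond_t2_pair(1.0, 1.0, r), 3))   # 0.176
```

The same deviation sizes thus produce a term more than thirty times larger when they contradict the historical correlation than when they conform to it.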

8.10 Summary

The interpretation of a signaling observation vector in terms of the variables of a multivariate process is a challenging problem. Whether it involves many variables or few, the problem remains the same: how to locate the signaling variable(s). In this chapter, we have shown that the MYT decomposition of the T2 statistic is a solution to this problem. This orthogonal decomposition is a powerful tool, as it allows examination of a signaling T2 from numerous perspectives. For example, the signaling of unconditional components readily locates the cause in terms of an individual variable or group of variables, while signaling of conditional terms locates countercorrelated relationships between variables as the cause. As we will see in Chapter 9, the monitoring of the regression residuals contained in the individual conditional T2 terms allows the detection of both large and small process shifts. By enhancing the models that are inherent to all conditional terms of the decomposition, the sensitivity of the overall T2 can be increased. Unfortunately, the T2 statistic with the MYT decomposition is not the solution to all process problems. For example, although this technique will identify the variable or set of variables causing a signal, it does not distinguish between mean shifts and shifts in the variability of these variables. Nevertheless, Hotelling's T2 with the MYT decomposition has been found to be very flexible and versatile in industrial applications requiring multivariate SPC.
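The additivity that makes the MYT decomposition work can be illustrated with a small numerical sketch. All HDS summary values below are invented for illustration; the point is only that the terms of one ordering sum exactly to the overall T²:

```python
import numpy as np

def t2_term(x, mean, S, j, cond):
    """Conditional term T^2_{j.cond}: variable j adjusted for the variables
    in `cond` via the partitioned HDS mean vector and covariance matrix."""
    if cond:
        c = list(cond)
        Scc = S[np.ix_(c, c)]
        mu = mean[j] + S[j, c] @ np.linalg.solve(Scc, x[c] - mean[c])
        var = S[j, j] - S[j, c] @ np.linalg.solve(Scc, S[c, j])
    else:
        mu, var = mean[j], S[j, j]
    return (x[j] - mu) ** 2 / var

# Hypothetical three-variable HDS summary (invented numbers)
mean = np.array([85.0, 7.0, 3.0])
S = np.array([[4.0, 1.2, 0.4],
              [1.2, 1.0, 0.3],
              [0.4, 0.3, 0.25]])
x = np.array([89.0, 7.3, 3.2])

d = x - mean
t2_full = d @ np.linalg.solve(S, d)
# One of the p! orderings: T^2 = T^2_1 + T^2_{2.1} + T^2_{3.12}
parts = t2_term(x, mean, S, 0, []) + t2_term(x, mean, S, 1, [0]) \
        + t2_term(x, mean, S, 2, [0, 1])
print(np.isclose(t2_full, parts))  # True
```

Any of the p! orderings reproduces the same total; the orderings differ only in how that total is attributed to individual conditional and unconditional terms.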


Chapter 9

Improving the Sensitivity of the T² Statistic

Old Blue: Success!


It's the moment of truth. You are ready to analyze the new data to determine the source of the problem. In the text, this was done by decomposing the T2 value into its component parts. You don't fully understand all the mathematics involved in this decomposition, but the concept is clear enough. A T2 value contains information provided by all the measured variables. When signals occur, the decomposition offers a procedure for separation of this information into independent component parts, with meaning attached to individual variables and groups of variables. Large component values indicate that something is wrong with these individual variables or with the linear relationships among the variables. Seems like a powerful tool, and you promise yourself to study this topic in more detail at a later date. You initialize the Phase II module of your computer program. After loading your historical data file, you enter the new data and observe the T2 plot containing upset conditions. Again, you are amazed. You can actually see that the upset conditions started when the T2 points began to drift towards the UCL. You begin decomposing the signaling T2 values, but something is wrong. The first signal indicates that fuel to the furnace is a problem. You quickly check the observation number and return to the time-sequence plot. Sure enough, there is a definite increase in fuel usage. This seems odd, since usage rates were not increased. Returning to the T2 decomposition, you note another variable that contributes to the signal, the oxygen level in the vent. When the effects of fuel usage and oxygen level are removed, you observe that there is no signal in the remaining observations. You quickly return to your computer graphic module and plot oxygen against time. There it is! The amount of oxygen in the vent shows an increase during the upset period. What could cause an increase in the fuel usage and the oxygen level in the vent at the same time? 
Then it hits you: the draft is wrong on the furnace. You pick up the telephone, call the lead operator, and pass on your suspicion.


As you wait for the return call, your anxiety level begins to rise. Unlike this morning, there is no edge of panic. Only the unanswered question of whether this is the solution to the problem. As the minutes slowly tick away, the boss appears with his customary question, "Have we made any progress?" At that moment the telephone rings. Without answering him, you reach for the telephone. As the lead operator reports the findings, you slowly turn to the boss and remark, "Old Blue is back on line and running fine." You can see the surprise and elation as the boss leaves for his urgent meeting. He yells over his shoulder, "You need to tell me later how you found the problem so quickly."

9.1 Introduction
In Chapter 8, a number of properties of the MYT decomposition were explored for use in the interpretation of a signaling T² statistic. For example, through the decomposition, we were able to locate the variable or group of variables that contributed to the signal. The major goal of this chapter is to investigate further ways of using the decomposition to improve the sensitivity of the T² in signal detection. Previously, we showed the T² statistic to be a function of all possible regressions existing among a set of process variables. Furthermore, we showed that the residuals of the estimated regression models are contained in the conditional terms of the MYT decomposition. Large residuals produce large T² components for the conditional terms and are interpreted as indicators of counterrelationships among the variables. However, a large residual also could imply an incorrectly specified model. This result suggests that it may be possible to improve the performance of the T² statistic by more carefully describing the functional relationships existing among the process variables. Minimizing the effects of model misspecification on the signaling ability of the T² should improve its performance in detecting abrupt process shifts (see Mason and Young (1999)). When compared to other multivariate control procedures, the T² lacks sensitivity in detecting small process shifts. In this chapter, we show that this problem can be overcome by monitoring the error residuals of the regressions contained in the conditional terms of the MYT decomposition of the T² statistic. Furthermore, we show that such monitoring can be helpful in certain types of on-line experimentation within a processing unit.

9.2 Alternative Forms of Conditional Terms

In Chapter 8, we defined a conditional term of the MYT decomposition as

$$T_{j\cdot 1,2,\ldots,j-1}^{2}=\frac{\left(x_{j}-\bar{x}_{j\cdot 1,2,\ldots,j-1}\right)^{2}}{s_{j\cdot 1,2,\ldots,j-1}^{2}}.\qquad(9.1)$$

This is the square of the jth variable of the observation vector, adjusted by the estimates of the mean and variance of the conditional distribution of x_j given x₁, x₂, …, x_{j−1}. We later showed that (9.1) could be written as

$$T_{j\cdot 1,2,\ldots,j-1}^{2}=\frac{\left(x_{j}-b_{0}-b_{1}x_{1}-\cdots-b_{j-1}x_{j-1}\right)^{2}}{s_{j\cdot 1,2,\ldots,j-1}^{2}}.\qquad(9.2)$$


This was achieved by noting that x̄_{j·1,2,…,j−1} can be obtained from the regression of x_j on x₁, x₂, …, x_{j−1}; i.e.,

$$\bar{x}_{j\cdot 1,2,\ldots,j-1}=b_{0}+b_{1}x_{1}+\cdots+b_{j-1}x_{j-1},\qquad(9.3)$$

where the bᵢ are the estimated regression coefficients. Since x̄_{j·1,2,…,j−1} is the predicted value of x_j, the numerator of (9.1) is the raw regression residual, x_j − b₀ − b₁x₁ − ⋯ − b_{j−1}x_{j−1}, given in (9.2). Another form of the conditional term in (9.1) is obtained by substituting the following quantity for the conditional variance contained in (9.2):

$$s_{j\cdot 1,2,\ldots,j-1}^{2}=s_{j}^{2}\left(1-R_{j\cdot 1,2,\ldots,j-1}^{2}\right),$$

where R²_{j·1,2,…,j−1} is the squared multiple correlation between x_j and x₁, x₂, …, x_{j−1}. This yields

$$T_{j\cdot 1,2,\ldots,j-1}^{2}=\frac{\left(x_{j}-\bar{x}_{j\cdot 1,2,\ldots,j-1}\right)^{2}}{s_{j}^{2}\left(1-R_{j\cdot 1,2,\ldots,j-1}^{2}\right)}.\qquad(9.5)$$

Much information is contained in the conditional terms of the MYT decomposition. Since these terms are, in fact, squared residuals from regression equations, they can be helpful in improving the sensitivity of the T2 statistic in detecting both abrupt process changes and gradual shifts in the process. Large residuals in a regression analysis can be caused by using incorrect functional forms for the variables. For example, this may occur when one uses a linear term instead of an inverse term. Knowledge of this property may be useful in improving the sensitivity of the T2 in signal detection. By carefully describing the functional relationships existing among the process variables before constructing a multivariate control procedure, the resulting residuals and corresponding conditional T2 should be smaller in size. Since the effects of model misspecification on the signaling ability of the T2 would be minimized, the performance of the T2 statistic in signal detection should improve. Most variable specification problems can be eliminated by choosing the correct form of the process variable to monitor when the in-control historical database is being constructed. Process engineers often make such choices using theoretical knowledge of the process. The appropriate functional forms of the variables are selected at this time and may involve transforming the original variables to such new forms as a logarithmic or an inverse function. An analyst also may improve model specification through the use of data exploration techniques. We focus on


using both approaches to improve model specification, and thereby increase the sensitivity of the T2 statistic.
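The equivalence of the two forms of a conditional term, the partitioned-covariance (Schur complement) conditional variance in (9.1) and the s_j²(1 − R²) form in (9.5), can be verified numerically. The simulated data below merely stand in for an HDS; the seed and coefficients are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))
X[:, 2] += 0.8 * X[:, 0] - 0.5 * X[:, 1]   # make x3 depend on x1 and x2

# Regression of x3 on x1, x2 (with intercept) over the simulated HDS
A = np.column_stack([np.ones(n), X[:, :2]])
b, *_ = np.linalg.lstsq(A, X[:, 2], rcond=None)
resid = X[:, 2] - A @ b
sse = resid @ resid
s33 = X[:, 2].var(ddof=1)
r2 = 1 - sse / ((n - 1) * s33)

# Conditional variance two ways: Schur complement vs. s33 * (1 - R^2)
S = np.cov(X, rowvar=False)
schur = S[2, 2] - S[2, :2] @ np.linalg.solve(S[:2, :2], S[:2, 2])
print(np.isclose(schur, s33 * (1 - r2)))  # True
```

Both expressions reduce algebraically to SSE/(n − 1), which is why the residual-based and R²-based forms of the conditional term always agree.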

9.3 Improving Sensitivity to Abrupt Process Changes

When decomposing a specific T² value that signals in a multivariate control chart, we seek to find which of its conditional and unconditional T² terms are large. These terms are important because they indicate which variable or variable relationship is causing the signal. A conditional T² term will be large if either the residual in the numerator of (9.5) is large or the R² value in the denominator of (9.5) is near unity while the residual is only marginal in size. Since an R² value that approaches 1 in magnitude is generally an indicator of an extreme collinearity among a subset of the process variables, it is often recommended in such cases to eliminate the associated redundant process variables when constructing the in-control HDS. Unfortunately, we cannot always eliminate such collinearities, as they may be inherent to the process. In the discussions below, we will focus on the problem of large residuals in the regressions associated with a signaling T² value, but illustrate in an example how inherent process collinearities can actually be helpful in interpreting the T² statistic. Large residuals in the numerator of the conditional T² term in (9.5) occur when the observation on the jth variable is beyond the error range of its predicted value. If the regression equation for x_j provides a poor fit (i.e., a large prediction error associated with a small R² value), large residuals may not be significant. In this situation it is possible, due to the large error of the regression fit, to accept an out-of-control signal as being in control. This would occur because of the large error in the fit of the regression equation rather than because of a process shift. Identifying such situations should lead to improvements in the regression model and, ultimately, in the sensitivity of the T² statistic. How to do this in an efficient manner is an issue requiring further exploration. As noted in Chapter 8, the T² statistic can be decomposed into p terms in p! ways.
Removing redundant terms, there remain p(2ᵖ⁻¹ − 1) unique conditional T² components. These terms contain the residuals of the regressions of each individual variable on all possible subsets of the remaining p − 1 variables. Since many different regression models are needed to obtain the conditional T² components, it would be inefficient to seek the optimal model specification for each case, as this would require a separate set of variable specifications for each model. A better solution is to transform the data prior to any analyses, using common functional forms for the variables requiring transformation. This is demonstrated in section 9.5. This approach is simplified if one has theoretical or expert knowledge about the process variables and their relationships to one another. If such information is not attainable, it may be necessary to analyze the in-control historical database or the current process data in order to determine these relationships. This is illustrated in section 9.6. In either case, model specification


should lead to smaller residuals in (9.5). This is demonstrated in the following data examples.
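The combinatorial counts just quoted are easy to verify directly; a minimal sketch for the seven-variable example used in Chapter 8:

```python
from math import factorial

def n_orderings(p):
    """Number of distinct MYT decompositions: one per ordering of the p variables."""
    return factorial(p)

def n_unique_conditional(p):
    """Each variable conditioned on any nonempty subset of the other p - 1 variables."""
    return p * (2 ** (p - 1) - 1)

# For the seven-variable example of Chapter 8:
print(n_orderings(7), n_unique_conditional(7))  # 5040 441
```

The rapid growth of these counts is precisely why a common, global transformation of the variables is preferable to tuning a separate model for each conditional term.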

9.4 Case Study: Steam Turbine

In signal detection, process knowledge can be a great aid in enhancing the performance of the T2 statistic. As an example of this approach, we consider the following case study. Large industries often use electricity and steam as energy sources in their production processes. It is not unusual for these companies to use their own generating facilities to produce both forms of energy, as well as to purchase energy from local utility companies or cogeneration facilities. A common method of producing electricity is by the use of a steam turbine; a generic description of a closed-loop steam turbine process was presented in section 6.3 and illustrated in Figure 6.3. The process for generating electricity begins with the input of fuel to the boiler. In this example, the boiler uses natural gas as the fuel source, and converts water into steam. The high-pressure steam is passed through the turbine, and the turbine turns the generator to produce electricity. Electricity is measured in megawatts, where a megawatt is 1 million watts. The work done by the turbine in generating electricity removes most of the energy from the steam. The remaining, or residual, steam is sent to a condenser, where it is converted back to water. The water in turn is pumped back to the boiler and the process repeats the loop. The condenser is a large consumer of energy, as it must convert the latent heat of vaporization of the steam to water. To operate efficiently, the condenser operates under a vacuum. This vacuum provides a minimum back-pressure and allows for the maximum amount of energy to be extracted from the steam before it is condensed. The condenser must have available some type of cooling agent. In this example, river water is used to condense the steam. However, the temperature of the river water is inconsistent, as it varies with the seasons of the year. 
When the river water temperature is high in the summer months, the absolute pressure of the condenser is also higher than when the river water temperature is low in the winter months. In this process, fuel is the input variable and megawatt-hour production is the output variable. Load change, in megawatt-hour production, is accomplished by increasing or decreasing the amount of steam from the boiler to the turbine. This is regulated by the amount of fuel made available to the boiler.

9.4.1 The Control Procedure

The steam-turbine process control procedure serves two purposes. The first is to monitor the efficiency of the unit. Turbines are run on a continuous basis and large turbines require large amounts of fuel. A decrease as small as 1% in the efficiency of a unit, when annualized over a year of run time, can greatly increase operating costs. Thus, it is desired to operate at maximum efficiency at all times. The second purpose of the control procedure is to provide information as to what part of the overall system is responsible when upset conditions develop. For example, a


Figure 9.1: Time-sequence plot of river water temperature.

problem may occur in the boiler, the turbine-generator, or the condenser. Knowing where to look for the source of a problem is very important in large systems such as these. The overall process is monitored by a T² statistic on observations taken on the following key variables:

(1) F = fuel to the boiler,
(2) S = steam produced to the turbine,
(3) ST = steam temperature,
(4) W or MW = megawatts of electricity produced,
(5) P = absolute pressure or vacuum associated with the condenser,
(6) RT = temperature of the river water.

9.4.2 Historical Data Set

Historical data sets for a turbine system, such as the one described above, contain observations from all time periods in a given year. One of the main reasons for this is the changing temperature of the river water. To show the cyclic behavior of river water temperature, we present a graph of the temperatures recorded over a two-year period in Figure 9.1. Note that river temperature ranges over a 45-degree span during a given year. Consultation with the power engineers indicates that the unit is more efficient in the winter than in the summer. The main reason given for this is the lower absolute pressure in the unit condenser in the winter months. If the absolute pressure is lower, the energy available for producing megawatts is greater, and less


Figure 9.2: Time-sequence plot of megawatt production.

fuel is required to produce a megawatt. The reverse occurs in the summer months, when the river water temperature is high, as more fuel is required to produce a megawatt of electricity. Typical HDSs for a steam turbine are too large to present in this book. However, we can present graphs of the individual variables over an extended period of time. For example, Figure 9.2 presents a graph of megawatt production for the same time period as is given in Figure 9.1. The irregular movement of megawatt production in this plot indicates the numerous load changes made on the unit in the time period of operation. This is not uncommon in a large industrial facility. For example, sometimes it is less expensive to buy electricity from another supplier than it is to generate it. If this is the situation, one or more of the units, usually the most expensive to operate, will take the load reduction. Fuel usage, for the same time period as in Figures 9.1 and 9.2, is presented in Figure 9.3. Perhaps a more realistic representation is contained in Figure 9.4, where a fuel usage plot is superimposed on an enlarged section of the graph of megawatt production. For a constant load, the fuel supplied to the boiler remains constant. However, to increase the load on the generator (i.e., increase megawatt production), additional fuel must be supplied to the boiler. Thus, the megawatt production curve moves upward as fuel usage increases. We must use more fuel to increase the load than would be required to sustain a given load. The opposite occurs when the load is reduced. To decrease the load, the fuel supply is reduced. The generator continues to produce megawatts, and the load curve follows the fuel graph downwards until a sustained load is reached. In other words, we recoup the additional cost to increase a load when we reduce the load.


Figure 9.3: Time-sequence plot of fuel consumption.

Figure 9.4: Time-sequence plot of megawatts and fuel.

Steam production over the given time period is presented in Figure 9.5. Examination of this graph and the megawatt graph of Figure 9.2 shows the expected relationship between the two variables: megawatt production follows steam production. Again, the large shifts in the graphs are due to load changes. Steam temperature over the given time period is presented in Figure 9.6. Note the consistency of the steam temperature values. This is to be expected, since steam temperature does not vary with megawatt production or the amount of steam produced.


Figure 9.5: Time-sequence plot of steam production.

Figure 9.6: Time-sequence plot of steam temperature.

The absolute pressure or vacuum on the condenser is presented in Figure 9.7. Note the similarity of this plot to the one presented in Figure 9.1 for river water temperature. Previous discussions have described the relationship between these two variables.


Figure 9.7: Time-sequence plot of absolute pressure.

9.5 Model Creation Using Expert Knowledge

In this section, we will demonstrate the use of theoretical or expert knowledge in the construction of a control procedure for a steam turbine. For ease of presentation, we will use only a few of the variables monitored in the operation of these units. Two primary variables common to this system are the amount of fuel (F), or natural gas, consumed and the production of electricity as measured in megawatts (W). A scatter plot of these two variables for an HDS collected on a typical unit is presented in Figure 9.8. Without additional information, one might assume that a linear relationship exists between the two variables, fuel and megawatts. Such an input/output (I/O) curve would be based on a model having the form

$$F=\alpha_{0}+\alpha_{1}W+e,\qquad(9.6)$$

where the αᵢ are the unknown regression coefficients. For example, the correlation coefficient for the data plotted in Figure 9.8 is 0.989, indicating a very strong linear relationship between the two variables. The theory on steam turbines, however, indicates that a second-order polynomial relationship exists between F and W. This is described by an I/O curve defined by

$$F=\beta_{0}+\beta_{1}W+\beta_{2}W^{2}+e,\qquad(9.7)$$

where the βᵢ are the unknown coefficients. Without knowledge of this theory, the power engineer might have used a control procedure based only on the simple linear model given in (9.6). To demonstrate how correct model specification can be used to increase the sensitivity of the T² statistic, suppose we compare these two models. Treating


Figure 9.8: Scatter plot of fuel consumption versus megawatts.

the observations plotted in Figure 9.8 as an in-control historical database, the coefficients of the linear equation given in (9.6) were estimated and the following equation was obtained: Similarly, the following equation was obtained using the I/O curve given in (9.7):

A comparison of the graphs of these two functions is presented in Figure 9.9. The use of the linear model given in (9.6) implies that fuel usage changes at a constant rate as the load increases; i.e., the incremental fuel required per additional megawatt would remain the same regardless of the power level. We wish this were the case. Unfortunately, in operating steam turbines, proportionally more fuel is required as the load increases. Only the quadratic I/O curve describes this type of relationship. In Figure 9.9, both functions provide a near-perfect fit to the data. The linear equation has an R² value of 0.9775, while the quadratic equation has an R² value of 0.9782. Although the difference in these R² values is extremely small, the two curves do slightly deviate from one another near the middle and at the endpoints of the range of W. In particular, the linear equation predicts less fuel usage at the ends and more in the middle than the quadratic equation. In these regions, there could exist a set of run conditions that is acceptable using the linear model in the T² statistic but unacceptable using the quadratic I/O model. Since the quadratic model is theoretically correct, its use should improve the sensitivity of the T² statistic to signal detection in the described region.
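The effect of including the quadratic term can be imitated with simulated data. All coefficients and the noise level below are invented; the sketch only shows how a quadratic I/O relationship yields nearly identical R² values for the linear and quadratic fits, as in Figure 9.9:

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.uniform(20.0, 50.0, size=200)                       # hypothetical loads (MW)
F = 500 + 180 * W + 0.9 * W ** 2 + rng.normal(0, 60, 200)   # quadratic I/O plus noise

def r_squared(y, X):
    """R^2 of an OLS fit of y on the columns of X (X includes the intercept)."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

ones = np.ones_like(W)
r2_lin = r_squared(F, np.column_stack([ones, W]))
r2_quad = r_squared(F, np.column_stack([ones, W, W ** 2]))
print(round(r2_lin, 4), round(r2_quad, 4))
```

Even with the true curvature present, both R² values are very high and nearly equal, which is why goodness of fit alone cannot reveal the misspecification of the linear model.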


Figure 9.9: Comparison of models for fuel consumption.

To demonstrate this result, suppose that a new operating observation on F and W is available and is given by

The corresponding T² statistic based on only the two variables (F, W) has a value of 18.89. This is insignificant (for α = 0.001) when compared to a critical value of 19.56 and indicates that there is no problem with the observation. This is in disagreement with the result using the three variables x₁ = F, x₂ = W, and x₃ = W². The resulting T² value of 25.74 is significantly larger than the critical value of 22.64 (for α = 0.001). Investigation of this signal using the T² decomposition indicates that T²₁.₃ = 18.54 and T²₁.₂,₃ = 14.34 are large. The large conditional T²₁.₃ term indicates that there is a problem in the relationship between F and W². It would appear that the value F = 9675 is smaller than the predicted value based on a model using W² and the HDS. The large conditional T²₁.₂,₃ term indicates something is wrong with the fit to the I/O model in (9.8). It appears again that the fuel value is too low relative to the predicted value. The ability of the quadratic model to detect a signal, when the linear model failed and when both models had excellent fits, is perplexing. When comparing the two models in Figure 9.9, the curves were almost identical except in the tails. This result occurred because the correlation between W and W² was extremely high (i.e., R² = 0.997), indicating that these two variables were basically redundant in the HDS. If initial screening tools had been used in analyzing this data set, the severe collinearity would have been detected and the redundant squared megawatt variable probably would have been deleted. However, because of theoretical knowledge about the process, we found that the I/O model needed to be quadratic in the megawatt variable. Thus, the collinearity was an inherent part of the process and


could not be excluded. This information helped improve the model specification, which reduced the regression residuals and ultimately enhanced the sensitivity of the T² statistic to a small process shift. As an additional note, the I/O models created for a steam turbine control procedure can play another important role in certain situations. Consider a number of units operating in parallel, each doing its individual part to achieve a common goal. Examples would be a powerhouse consisting of a number of steam turbines used in the generation of a fixed load, a number of pumps in service to meet the demand of a specific flow, and a number of processing units that must process a fixed amount of feedstock. For a system with more than one unit, the proper division of the load is an efficiency problem. Improper load division may appreciably decrease the efficiency of the overall system. One solution to the problem of improper load division is to use an "equal-incremental-rate" formulation. The required output of such a system can be achieved in many ways. For example, suppose we need a total of 100 megawatts from two steam turbines. Ideally, we might expect each turbine to produce 50 megawatts, but realistically this might not be the most efficient way to obtain the power. For a fixed system output, the equal-incremental-rate formulation divides the load among the individual units in the most economic way; i.e., it minimizes the amount of input. This requires the construction of I/O models that accurately describe all the units involved. It can be shown that the solution to the load division problem is given at the point where the slopes of the I/O curves are equal.
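For two units with fitted quadratic I/O curves, the equal-incremental-rate split has a closed form obtained by equating the slopes subject to the load constraint. The coefficients below are hypothetical (bᵢ and cᵢ are the linear and quadratic fuel coefficients of unit i; the intercepts aᵢ do not affect the split):

```python
def split_load(b1, c1, b2, c2, total):
    """Divide `total` MW between two units with I/O curves F_i = a_i + b_i*W + c_i*W^2
    so that the incremental rates dF_i/dW_i = b_i + 2*c_i*W_i are equal.
    Setting b1 + 2*c1*W1 = b2 + 2*c2*(total - W1) and solving for W1 gives:"""
    w1 = (b2 - b1 + 2.0 * c2 * total) / (2.0 * (c1 + c2))
    return w1, total - w1

# Hypothetical pair of turbines sharing a 100 MW load
w1, w2 = split_load(b1=8.0, c1=0.002, b2=8.2, c2=0.003, total=100.0)
print(round(w1, 6), round(w2, 6))  # 80.0 20.0
```

With these invented coefficients, the 80/20 split uses less total fuel than a naive 50/50 split, which can be checked directly from the I/O curves.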

9.6 Model Creation Using Data Exploration

Theoretical knowledge about a process is not always available. In such situations, it may become necessary to use data exploration techniques to ensure that the correct functional forms are used for the variables examined in the in-control historical data. Alternative models are posed based on global variable specifications that hold across all the possible regressions associated with the T² statistic. As an example of the use of data exploration techniques, consider the condensing unit (condenser) used to convert the low-temperature steam vapor leaving a steam turbine into a liquid state. This is achieved by exposing the vapor to a coolant of lesser temperature in the condensing unit. The condensed liquid is sent to a boiler, transformed to steam, and returned to run the turbine. A control procedure on the condensing unit provides valuable information, as it provides a check on the performance of both the turbine and the condensing unit. A generic description of how such a unit works is given below. High-temperature steam is used as the energy source for a steam turbine. As the steam moves through the turbine, its energy is removed and only warm steam vapor is left. This warm vapor has a strong tendency to move to the cool condensing unit, a tendency enhanced by the creation of a slight vacuum. The temperature of the vapor affects this total vacuum. Thus, the vacuum, as measured by the absolute pressure, becomes an indicator of turbine performance. For example, if the turbine removes all the energy from the steam at a given load, a strong vacuum is needed to move

Table 9.1: Partial HDS for condensing unit data.

    Obs. No.  Temp.  Megawatts  Vacuum
     1        56     45         0.92
     2        60     45         1.13
     3        63     28         0.97
     4        70     38         1.26
     5        68     40         1.14
     6        73     45         1.54
     7        78     41         1.64
     8        78     45         1.86
     9        82     25         1.46
    10        85     29         1.73
    11        86     35         2.04
    12        86     45         2.46
    13        88     45         2.68
    14        86     45         2.41
    15        92     45         2.80
    16        89     45         2.54
    17        94     45         2.89
    18        92     45         3.11
    19        92     45         2.87
    20        85     45         2.28
    21        76     20         1.05
    22        77     34         1.62
    23        74     45         1.77
    24        68     39         1.17
    25        66     46         1.28
    26        74     33         1.25
    27        67     20         0.75
    28        51     30         0.55
    29        51     24         0.44
    30        56     45         0.97

the vapor to the condensing unit. However, if the turbine allows hot steam to pass (i.e., it is running too hot), less vacuum is needed to move the vapor. The vacuum is a function of the temperature of the coolant and the amount of coolant available in the condensing unit. A lower temperature for the coolant, which is river water, increases the tendency of the warm vapor to move to the condenser. The amount of coolant available depends on the cleanliness of the unit. For example, if the tubes that the coolant passes through become clogged or dirty, inhibiting the flow, less coolant is available to help draw the warm steam vapor through the unit, and hence the vacuum changes. Without data exploration, one might control the system using the three variables: vacuum (V), coolant temperature (T), and megawatt load (W). The conditional term, T^2_{V·T,W}, would contain the regression of vacuum on temperature and megawatt load. In equation form, this is given as

V = b0 + b1 T + b2 W,    (9.9)

where the bi are estimated constants. The sensitivity of the T2 statistic for the condensing unit will be improved if the regression of vacuum on temperature and megawatt load is improved, as T^2_{V·T,W} is an important term in the decomposition of this statistic. The theoretical functional form of this relationship is unknown, but it can be approximated using data exploration techniques. The HDS for this unit contains hundreds of observations on the three process variables taken in time sequence over a period of one year. For discussion purposes, a partial HDS consisting of 30 points is presented in Table 9.1; however, our analyses are based on the overall data set. Results from fitting the regression model given in (9.9) to the overall data set are presented in Table 9.2. The R2 value of 0.9517 in Table 9.2 indicates a good fit to these data. However, the standard error of prediction has a value of 0.1658,

Table 9.2: Regression statistics for vacuum model.

Regression Statistics
R2               0.9517
Adjusted R2      0.9514
Standard Error   0.1658
Observations     305

Figure 9.10: Standardized residuals versus time for vacuum model.

which is more than 5% of the average vacuum. This is considered somewhat large for this type of data and, if possible, needs to be reduced. A graph of the standardized residuals for the model in (9.9) is presented in Figure 9.10. There is a definite cyclical pattern in the plot. This needs to be dampened or removed. Figure 9.11 contains a graph of the megawatt load on the generator over this same time period. From the graph, it appears that the generator is oscillating over its operation range throughout the year, with somewhat lower loads occurring at the end of the year. A quick comparison of this graph to that of the standardized residuals in Figure 9.10 gives no indication of a connection to the cyclic nature of the errors. An inspection of the graph of the vacuum over time, given in Figure 9.12, indicates a strong relationship between vacuum and time. This can be explained by noting the seasonal variation of the coolant temperature displayed in Figure 9.5. The similar shapes of the graphs in Figures 9.12 and 9.13 also indicate that a strong relationship exists between the vacuum and temperature of the coolant. This is confirmed by the correlation of 0.90 between these two variables.
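As a rough illustration, the model in (9.9) can be estimated by ordinary least squares from the 30-observation partial HDS in Table 9.1. This is only a sketch: Table 9.2 is based on the full 305-observation data set, so the statistics computed here will differ from those in the table.

```python
import numpy as np

# The 30-observation partial HDS from Table 9.1.
temp = np.array([56, 60, 63, 70, 68, 73, 78, 78, 82, 85, 86, 86, 88, 86, 92,
                 89, 94, 92, 92, 85, 76, 77, 74, 68, 66, 74, 67, 51, 51, 56],
                dtype=float)
mw = np.array([45, 45, 28, 38, 40, 45, 41, 45, 25, 29, 35, 45, 45, 45, 45,
               45, 45, 45, 45, 45, 20, 34, 45, 39, 46, 33, 20, 30, 24, 45],
              dtype=float)
vac = np.array([0.92, 1.13, 0.97, 1.26, 1.14, 1.54, 1.64, 1.86, 1.46, 1.73,
                2.04, 2.46, 2.68, 2.41, 2.80, 2.54, 2.89, 3.11, 2.87, 2.28,
                1.05, 1.62, 1.77, 1.17, 1.28, 1.25, 0.75, 0.55, 0.44, 0.97])

# Ordinary least squares fit of vacuum on temperature and megawatt load,
# the functional form of model (9.9).
X = np.column_stack([np.ones(len(vac)), temp, mw])
b, *_ = np.linalg.lstsq(X, vac, rcond=None)
resid = vac - X @ b
r2 = 1.0 - np.sum(resid**2) / np.sum((vac - vac.mean()) ** 2)
std_err = np.sqrt(np.sum(resid**2) / (len(vac) - X.shape[1]))
```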


Figure 9.11: Time-sequence plot of megawatt.

Figure 9.12: Time-sequence of vacuum.

Figure 9.13: Time-sequence plot of coolant temperature.

Table 9.3: Regression statistics for revised vacuum model.

Regression Statistics
R2               0.9847
Adjusted R2      0.9844
Standard Error   0.0939
Observations     305

Figure 9.14: Standardized residual plot for revised vacuum model.

The curvature in the above series of plots suggests the need for squared terms in temperature and load in the vacuum model. We also will add a cross-product term between temperature and megawatt load (W), since this will help compensate for the two variables varying together. In functional form, the model in (9.9) is respecified to obtain the prediction equation

V = b0 + b1 T + b2 W + b3 T^2 + b4 W^2 + b5 TW,    (9.10)

where the bi are estimated constants. A summary of the regression statistics is presented in Table 9.3. The value of R2 for the regression fit is 0.9847, which is an improvement over the previous model. Also, the standard error of prediction, 0.0939, is smaller than the previous error and is within the 5% range of the average value of the vacuum. A graph of the standardized residuals is presented in Figure 9.14. Included on the plot are lower and upper control limits, placed at −3 and +3, to help identify any outlying residuals. Some, but not all, of the oscillation of the residuals over time is removed by including the cross-product term in the revised vacuum model.
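The respecified model of (9.10) can be sketched the same way, again using only the 30-observation partial HDS from Table 9.1 (so the R2 values will not match Table 9.3). Because the revised design matrix contains all the columns of the linear model, its R2 cannot be smaller.

```python
import numpy as np

# 30-observation partial HDS from Table 9.1 (the book's Table 9.3 statistics
# come from the full 305-observation set, so values here will differ).
temp = np.array([56, 60, 63, 70, 68, 73, 78, 78, 82, 85, 86, 86, 88, 86, 92,
                 89, 94, 92, 92, 85, 76, 77, 74, 68, 66, 74, 67, 51, 51, 56],
                dtype=float)
mw = np.array([45, 45, 28, 38, 40, 45, 41, 45, 25, 29, 35, 45, 45, 45, 45,
               45, 45, 45, 45, 45, 20, 34, 45, 39, 46, 33, 20, 30, 24, 45],
              dtype=float)
vac = np.array([0.92, 1.13, 0.97, 1.26, 1.14, 1.54, 1.64, 1.86, 1.46, 1.73,
                2.04, 2.46, 2.68, 2.41, 2.80, 2.54, 2.89, 3.11, 2.87, 2.28,
                1.05, 1.62, 1.77, 1.17, 1.28, 1.25, 0.75, 0.55, 0.44, 0.97])

def fit_r2(X, y):
    """Least-squares fit; returns coefficients and R2."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return b, 1.0 - np.sum(resid**2) / np.sum((y - y.mean()) ** 2)

ones = np.ones(len(vac))
X_linear = np.column_stack([ones, temp, mw])              # model (9.9)
X_revised = np.column_stack([ones, temp, mw,              # model (9.10):
                             temp**2, mw**2, temp * mw])  # adds T^2, W^2, TW

_, r2_linear = fit_r2(X_linear, vac)
_, r2_revised = fit_r2(X_revised, vac)
```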


Overall, the model specified in (9.10) is considered superior. It was decided to use this model, even though it too can be improved. It also stays within the global specification of the variables that are used on other units, such as the boiler, that are associated with the steam turbine.

9.7

Improving Sensitivity to Gradual Process Shifts

The T2 statistic is known to be less sensitive to small process shifts. Because it is a squared value, it cannot take advantage of the run rules (Montgomery (2001)) that are prevalent in univariate Shewhart charts. One suggestion for overcoming this problem is to examine trend patterns in the plots of the square root of the associated conditional term given by

The upper and lower control limits for these plots are given by

where k = j − 1 is the number of terms in the regression equation and t(α/2; n − k − 1) refers to the 100(1 − α/2)th percentile of the Student t distribution with n − k − 1 degrees of freedom. The percentage points of the Student t distribution are given in Table A.2 in the appendix. An alternative form of (9.11) is

where MSE is the mean squared error from the regression fit of x_j on x_1, x_2, ..., x_{j−1}. With this form it is easy to see that, apart from the constant term in the denominator, the square root of the conditional T2 term is simply a standardized residual from the above regression fit. Hence, rather than plot the conditional term, we can simply plot the standardized residuals

Any large deviation in these standardized residuals will be detected by a signaling conditional T2 statistic. However, small but consistent changes in the residuals, where the square root of the conditional T2 statistic is within its control chart error limits, may not be detected. In such situations, a plot of the standardized residuals against time can be used to detect nonrandom trends, and these plots can


Figure 9.15: Standardized residual plot for in-control process.

be subjected to various run rules to determine out-of-control situations. We will demonstrate the advantages of using such plots to detect small process shifts with the following example.

Fuel consumption is an important variable for a large generator of electricity. Consumption is monitored using an overall T2 control procedure along with two other variables: megawatt production and squared megawatt production. However, small but consistent shifts in performance, indicated by increases or decreases in fuel consumption, can go undetected. Since the generator is large, with an average fuel consumption of nearly a million fuel units per hour of operation, an increase in usage of 1% would mean a need for an additional 10,000 units of fuel per hour. This translates into a considerable increase in annual fuel costs. Thus, it is most important to detect small changes in performance that lead to greater fuel consumption.

The regression model relating fuel consumption and megawatt production is given in (9.7) and is incorporated in the conditional term, T^2_{F·W,W^2}. To demonstrate the actual fit of the estimated model for this term, a plot of the standardized residuals corresponding to this conditional T2 over a typical run period is given in Figure 9.15. Lower and upper control limits have been placed at −3 and +3 to help identify outlying residuals. A review of this graph indicates how well the model is predicting fuel consumption for a given load (megawatt production). The peaks in this graph, beyond ±3, indicate out-of-control points where a severe load change occurred. For example, to make a substantial load increase on the turbine, additional fuel quantities must be supplied. Since the load rises slowly, there is a gap between the time when the fuel is consumed and the time when the megawatts are produced. Likewise, when the load is decreased, the fuel is reduced first, but the generator drops slowly and provides more megawatts than expected for the incoming fuel.
This produces the peaks in the run chart.
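A minimal sketch of this kind of residual monitoring follows. The data are synthetic, and the specific eight-in-a-row rule is one common univariate run rule, not one prescribed by the text.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic standardized residuals: 100 in-control points, then 50 points
# with a small +1-sigma shift in mean fuel usage.  A +/-3 Shewhart limit
# rarely flags a shift this small, but a run rule can.
resid = np.concatenate([rng.normal(0.0, 1.0, 100),
                        rng.normal(1.0, 1.0, 50)])

def longest_same_sign_run(r):
    """Length of the longest run of consecutive residuals on one side of zero."""
    best = run = 0
    prev = 0.0
    for v in r:
        run = run + 1 if v * prev > 0 else 1
        prev = v
        best = max(best, run)
    return best

# Shewhart-style check against the +/-3 limits used in Figures 9.15 and 9.16.
shewhart_signals = int(np.sum(np.abs(resid) > 3.0))

# Run-rule check: signal if eight consecutive residuals fall on one side.
run_signal = longest_same_sign_run(resid) >= 8
```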


Figure 9.16: Standardized residual plot for out-of-control process.

As a note of interest, HDSs for a steam turbine contain a year of data taken over the normal operational range (megawatt production) of the unit and have these outliers (peaks) removed. The data in Figure 9.15 are very indicative of the performance of a unit operating at maximum efficiency.

Monitoring the performance of a steam turbine can be accomplished by examining incoming observations (F, W, W^2) and computing and charting the overall T2 statistic. This statistic indicates when an abrupt change occurs in the system and can be used as a diagnostic tool for determining the source of the signal. In addition, the standardized residuals from the fit of F to W and W^2 can be plotted in a Shewhart chart and monitored to determine if small process changes have occurred in the fuel consumption.

The residual plot in Figure 9.16, with its accompanying control limits at ±3, represents a time period when a small process change occurred in the operation of the steam turbine. Residual values plotted above the zero line indicate that fuel usage exceeded the amount established in the HDS, while those plotted below the zero line indicate the opposite. Thus, a run of positive residuals indicates that the unit is less efficient in operation than was established in the baseline period. In contrast, a run of negative residuals indicates that the unit is using less fuel than it did in the baseline period. The trends in the graph in Figure 9.16 indicate that the unit became less efficient around the time period labeled "upset." At that point, the residuals moved above the zero line, implying fuel usage was greater than that predicted by the model given in (9.7). Notice that, while this pattern of positive residuals is consistent, the residuals themselves are well within the control limits of the chart. The only exceptions are the spikes in the plot, which occur with radical load changes.
Another example of using trends from plots of the conditional terms of the T2 to detect small process shifts is given in Figure 9.17. This is a residual plot using the regression model given in (9.10) for the vacuum on a condensing unit. The upset condition, indicated with a label on the graph, occurred when a technician inadvertently adjusted the barometric gauge used in the calculation of the absolute pressure. After that point, the residual values shift upward, although they remain within the control limits of the standardized residuals.

Figure 9.17: Standardized residual plot of vacuum model with an upset condition.

Plots of standardized residuals, such as those in Figures 9.15-9.17, provide a useful tool for detecting small process shifts. However, they should be used with an overall T2 chart in order to avoid the risk of extrapolating values outside the operational range of the variables. While we seek to identify systematic patterns in these plots, a set of rigorous run rules is not yet available. Thus, we recommend that a residual pattern be observed for an extensive period of time before taking action. The proposed technique using the standardized residuals of the fitted models associated with a specific conditional T2 term is not a control procedure per se. Rather, it is a tool to monitor process performance in situations where standard control limits would be too wide. Any consistent change in the process is of interest, not just signaling values. While there is some subjectivity involved in the determination of a trend in these plots, process data often are received continuously, so that visual inspection of runs of residuals above or below zero is readily available.

9.8

Summary

The T2 statistic can be enhanced by improving its ability to detect (1) abrupt process changes as well as (2) gradual process shifts. Abrupt changes can be better identified by correctly modeling in Phase I operations the functional relationships existing among the variables. One means of doing this is by examining the square root of the conditional terms in the T2 decomposition or the corresponding related regression residual plots. These values represent the corresponding standardized residuals obtained from fitting a regression model. Gradual process shifts can be


identified better in Phase II operations by monitoring the trends in the conditional T2 terms or standardized residuals of these fits for all incoming observations.

A necessary first step in the application of any multivariate control procedure is to thoroughly explore the process data. We also strongly recommend consulting with the process engineer so that a sound control procedure is established. Improving the model inherent in a particular T2 statistic requires information about the functional relationships among the process variables. This approach is summarized in the following quote taken from Myers (1990): "We cannot ignore input from experts in the scientific discipline involved. Statistical procedures are vehicles that lead us to conclusions; but scientific logic paves the road along the way.... [F]or these reasons, a proper marriage must exist between the experienced statistician and the learned expert in the discipline involved."

Graphical output is very valuable when monitoring a multivariate process. Control charts based on the overall T2 statistic are helpful in isolating signaling observations. In addition, plots corresponding to the important conditional terms in a T2 decomposition are useful in detecting small process shifts. This is best achieved using process knowledge to select the important conditional terms to be plotted.

Chapter 10

Autocorrelation in T2 Control Charts

10.1

Introduction

Development and use of the T2 as a control statistic for a multivariate process have required the assumption of independent observations. Certain types of processing units may not meet this assumption. For example, many units produce time-dependent, or autocorrelated, observations. This may be due to factors such as equipment degradation, depletion of critical process components, environmental and industrial contamination, or the effect of an unmeasured "lurking" variable. The use of the T2 as a control statistic, without proper adjustment for a time dependency, can lead to incorrect signals (e.g., see Alt, Deutsch, and Walker (1977) or Montgomery and Mastrangelo (1991)).

In Chapter 4 we discussed detection procedures for autocorrelated data. These included examination of trends in time-sequence plots of individual variables and the determination of the pairwise correlation between process variables and a categorical time-sequence variable. In this chapter, we add a third procedure. We show that special patterns occurring in the graph of a T2 chart can be used to indicate the presence of autocorrelation in the process data. We also demonstrate that if autocorrelation is detected and ignored, one runs the risk of weakening the overall T2 control procedure. This happens because the main effect of the autocorrelated variable is confounded with the time dependency. Furthermore, relationships with other variables may be masked by the time dependency.

When autocorrelation is present, an adjustment procedure is needed in order to obtain a true picture of process performance. In a univariate setting, one such adjustment involves modeling the time dependency with an appropriate autoregressive model and examining the resulting regression residuals. The residuals are free of the time dependency and, under proper assumptions, can be shown to be independent and normally distributed.
The resulting control procedure is based on these autoregressive residuals (e.g., see Montgomery (2001)).
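The univariate adjustment described above can be sketched as follows. The AR(1) parameters and the series are simulated, not taken from any process in the text; the residuals of the fitted autoregression are then monitored with Shewhart-style limits.

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulated autocorrelated process: x_t = 5 + 0.8*x_{t-1} + e_t
# (parameters invented for illustration).
n = 300
x = np.empty(n)
x[0] = 25.0  # the stationary mean, 5 / (1 - 0.8)
for t in range(1, n):
    x[t] = 5.0 + 0.8 * x[t - 1] + rng.normal(0.0, 1.0)

# Fit the AR(1) model by ordinary least squares of x_t on x_{t-1}.
A = np.column_stack([np.ones(n - 1), x[:-1]])
coef, *_ = np.linalg.lstsq(A, x[1:], rcond=None)
resid = x[1:] - A @ coef  # residuals are (approximately) free of the dependency

# Shewhart-style limits on the residuals at +/- 3 standard deviations.
limit = 3.0 * resid.std(ddof=2)
out_of_control = np.abs(resid) > limit
```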


In this chapter, a somewhat similar solution is applied to the more complex multivariate problem where we have to contend with a time dependency in multiple variables and determine how these variables relate to the other process variables. We first demonstrate how a time dependency can hide the true linear relationships existing between the process variables. We then present a procedure for adjusting the variables of an observation vector for a time dependency. As in the univariate case, the control procedure based on the adjusted observations is shown to be free of the time dependency.

10.2

Autocorrelation Patterns in T2 Charts


To illustrate the effect of autocorrelated observations on the behavior of a T2 statistic, we first consider the behavior of the T2 statistic in a bivariate setting. Suppose two variables, (x1, x2), are measured on an industrial process that exhibits no autocorrelation. Time-sequence plots of the observations on each variable, including the corresponding mean lines, are presented in Figures 10.1 and 10.2. Note that there is no upward or downward trend in the points over the time span. The graph of the corresponding T2 statistic, computed for the observation vector (x1, x2), is presented in Figure 10.3. As expected for a statistic based on random error, no systematic patterns occur in the graph. The T2 values are very close to zero, and this indicates that the location of the process is being maintained at the value of the mean vector of the HDS. There is only random fluctuation and very little variation in the T2 statistic, except for one signaling point.

Consider a bivariate observation vector (x1, x2) from a process that decays over time. In this type of process, both variables exhibit a linear trend over the sampling period. Time-sequence plots of the observations on each variable, including their

Figure 10.1: Time-sequence plot of variable x1 with no time dependency.


Figure 10.2: Time-sequence plot of variable x2 with no time dependency.

Figure 10.3: T2 chart for data without time dependency.

corresponding mean line, are presented in Figures 10.4 and 10.5. Note the upward trend in the plot of the points in both graphs. To investigate the effects of these two autocorrelated variables on the behavior of the T2 statistic, we examine in Figure 10.6 a graph of the corresponding T2 statistic. Observe the very slight, U-shaped curvature in the graph of the statistic over the operational range of the two variables. Note also the large variation in the T2 values and the absence of numerous values close to zero. This is in direct contrast to the trends seen in Figure 10.3 for the variables with no time dependency. Since the T2 statistic should exhibit only random fluctuation in its graph, further examination is required in order to determine the reason for this systematic pattern. The plots in Figures 10.4 and 10.5 of the correlated data indicate the presence of large deviations from the respective mean values of both variables at the beginning and end of their sampling periods. Since the T2 is a squared statistic, such a


Figure 10.4: Time-sequence plot of variable x1 with time dependency.

Figure 10.5: Time-sequence plot of variable x2 with time dependency.

trend produces large T2 values. For example, while deviations below the mean are negative, squaring them produces large positive values. As the variables approach their mean values (in time), the value of the T2 declines to smaller values. The curved U-shaped pattern in Figure 10.6 is thus due to the linear time dependency inherent in the observations. This provides a third method for detecting process data with a time dependency.

Autocorrelation of the form described in the previous example is a cause-and-effect relationship between the process variables and time. The observation on the process variable is proportional to the variable at some prior time. In other cases, this time relationship may be only empirical and not due to a cause-and-effect relationship. In this situation, the current observed value is not determined by a prior value, but only associated with it. The association is usually due to a "lurking variable."


Figure 10.6: T2 chart with time dependency.

Figure 10.7: Time-sequence plot of process variable with cyclical time effect.

Consider the cyclic nature of the process variable depicted in Figure 10.7. The cyclical or seasonal variation is due to the rise and fall of the ambient temperature for the corresponding time period. This is illustrated in Figure 10.8. Cyclical or seasonal variation over time is assumed to be based on systematic causes; i.e., the variation does not occur at random, but reflects the influence of "lurking" variables. Variables with a seasonal effect will have a very regular cycle, whereas variables with a cyclical trend may have a somewhat irregular cycle. Such trends will be reflected in the T2 chart, and the curved U-shaped pattern seen previously in other T2 charts may have short cycles. A T2 chart including the cyclical process variable in Figure 10.7 with no adjustment for the seasonal trend is presented in Figure 10.9. Close examination of the run chart reveals a cyclic pattern due to the seasonal variation of the ambient temperature. As the temperature approaches its maximum and minimum values, the


Figure 10.8: Seasonal trend of temperature in a time-sequence plot.

Figure 10.9: T2 chart containing seasonal trend; UCL = 5.99.

T2 statistic moves upward. When the temperature approaches its average value, the T2 moves toward zero. Also notice the excess variation due to the temperature swings in the T2 values. The above examples illustrate a number of the problems occurring with autocorrelated data and the T2 statistic. Autocorrelation produces some type of systematic pattern over time in the observations on the variables. If not corrected, the patterns are transformed to nonrandom patterns in the T2 charts. As illustrated in the following sections, these patterns can greatly affect signals. The presence of autocorrelation also increases variation in the T2 statistic. This increased variation can smother the detection of process movement and hamper the sensitivity of the T2 statistic to small but consistent process shifts. As in other statistical procedures, nonrandom variation of this form is explainable, but it can and should be removed.
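A small simulation illustrates the U-shaped pattern described above: one variable drifts linearly while the other is stable, and the resulting T2 values are large at both ends of the run and small near the middle. The data and parameters are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
t = np.arange(n, dtype=float)

# x1 drifts linearly over the run; x2 is stable.  Both carry random error.
x1 = 10.0 + 0.05 * t + rng.normal(0.0, 0.3, n)
x2 = 20.0 + rng.normal(0.0, 0.3, n)
X = np.column_stack([x1, x2])

# T2 for each observation, using the data's own mean vector and covariance.
xbar = X.mean(axis=0)
S_inv = np.linalg.inv(np.cov(X, rowvar=False))
d = X - xbar
t2 = np.einsum('ij,jk,ik->i', d, S_inv, d)

# The linear drift shows up as a U-shaped pattern: large T2 at both ends of
# the run, small T2 near the middle where x1 crosses its mean.
print(t2[:10].mean(), t2[45:55].mean(), t2[-10:].mean())
```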


10.3

Control Procedure for Uniform Decay

The time dependency between process observations can take different forms. A common form is that of uniform or continuous decay. This type of autocorrelation occurs when the observed value of the process variable depends on some immediate past value. For example, the variables displayed in Figures 10.4 and 10.5 both display uniform or continuous decay. Due to the linear trend in the graphs over time, the present value of the observation can be predicted from a past value using an appropriate first-order autoregressive, AR(1), model (i.e., a regression model including a linear time variable). Note that continuous or uniform decay does not have to occur in a linear fashion. The relationship could be of another functional form, such as quadratic; this would produce a second-order autoregressive, AR(2), model (i.e., a regression model including both a quadratic and a linear time variable). We now construct a multivariate control procedure for processes that exhibit uniform decay.

The variation about the mean line of a variable with no time dependency and no relationship to "lurking" variables is due only to random fluctuation. Examples are exhibited in Figures 10.1 and 10.2. In contrast, the variation about the mean line of a variable with a time dependency, as shown in Figures 10.4 and 10.5, is due to both random error and the time dependency; i.e.,

To accurately assess the T2 statistic, this time effect must be separated and removed from the random error. As an example, reconsider the data for x1 given in Figure 10.4. The time dependency can be explained by a first-order autoregressive model,

x_{1,t} = β0 + β1 x_{1,t−1} + ε_t,    (10.2)

where β0 and β1 are the unknown regression coefficients, x_{1,t} is the current observation, and x_{1,t−1} is the immediate prior observation. Since the mean of x1, conditioned on time, is given by

x̄_{1|t} = β0 + β1 x_{1,t−1},

the time effect can be removed using the deviation (x_{1,t} − x̄_{1|t}).

The above relationship suggests a method for computing the T2 statistic for an observation vector with some observations exhibiting a time dependency. This is achieved using the formula

T_t^2 = (X − X̄_t)' S_t^{−1} (X − X̄_t),    (10.3)

where X̄'_t = (x̄_{1|t}, x̄_{2|t}, ..., x̄_{p|t}) represents the sample mean of X conditioned on time. For those variables with no time dependencies, x̄_{j|t} would simply reduce


to the unadjusted mean, x̄_j. However, for those variables with a first-order time dependency, x̄_{j|t} would be obtained using a regression equation based on the model in (10.2) or some similar autoregressive function. Thus, for an AR(1) process,

x̄_{j|t} = x̄_j

when no time dependency is present and

x̄_{j|t} = b_0 + b_1 x_{j,t−1}    (10.4)

when a time dependency is present, where b_0 and b_1 are the estimated regression coefficients. The common estimator of S for a sample of size n is usually given as

S = (1/(n − 1)) Σ_{i=1}^{n} (X_i − X̄)(X_i − X̄)',

where X̄ is the overall sample mean. However, if some of the components of the observation vector X have a time dependency, S also must be corrected. This is achieved by taking deviations from X̄_t; i.e.,

S_t = (1/(n − 1)) Σ_{i=1}^{n} (X_i − X̄_t)(X_i − X̄_t)'.

The variance terms of S_t will be denoted as s_{j|t}^2 to indicate a time adjustment has been made, while the general conditional variance terms will be designated as s_{j·1,2,...,p−1|t}^2.

Decomposition of T_t^2

Suppose the components of an observation vector with a time dependency have been determined using the methods of section 10.1, and the appropriate autoregressive functions have been fitted. We assume that X̄_t and S_t have been computed from the HDS. To calculate the T2 value for a new incoming observation, we compute (10.3). The general form of the MYT decomposition of the T2 value associated with a signaling p-dimensional data vector X' = (x_1, ..., x_p) is given in Chapter 8. The decomposition of T_t^2 follows a similar procedure but uses time adjustments similar to (10.4). If a signal is observed, we decompose the T2 statistic, adjusted for time effects, as follows. The unconditional terms are given as

T_j^2 = (x_j − x̄_{j|t})^2 / s_{j|t}^2.    (10.7)

If x_j has no time dependency, this is the same unconditional term as given in Chapter 8. However, if x_j has a time dependency, computing the term in (10.7) removes the time effect. Similarly, a general conditional T2 term is computed as

T_{j·1,2,...,j−1}^2 = (x_j − x̄_{j·1,2,...,j−1|t})^2 / s_{j·1,2,...,j−1|t}^2.


Close examination reveals how the time effect is removed. Consider the term

T_{2·1}^2 = (x_2 − x̄_{2·1|t})^2 / s_{2·1|t}^2.

The conditional mean can be written as

x̄_{2·1|t} = x̄_{2|t} + b_{2·1}(x_1 − x̄_{1|t}),

where b_{2·1} = s_{12|t}/s_{1|t}^2 is the estimated regression coefficient of x_2 on x_1, so that the square root of the numerator of the T2 term becomes

(x_2 − x̄_{2|t}) − b_{2·1}(x_1 − x̄_{1|t}).

Observations on both x1 and x2 are corrected for the time dependency by subtracting the appropriate x̄_t term. The standard deviation is time corrected in a similar manner.
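The time-adjusted procedure can be sketched with simulated data: AR(1) fitted values play the role of the time-conditioned means of (10.4), the covariance matrix is recomputed from deviations about them, and the time-adjusted T2 of (10.3) follows. The series, the coefficients, and the n − 2 divisor are illustrative choices, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 200

# Two process variables with AR(1)-type uniform decay (coefficients invented).
x = np.empty((n, 2))
x[0] = [50.0, 30.0]
for t in range(1, n):
    x[t, 0] = 10.0 + 0.80 * x[t - 1, 0] + rng.normal(0.0, 0.5)
    x[t, 1] = 6.0 + 0.80 * x[t - 1, 1] + rng.normal(0.0, 0.5)

# Fit an AR(1) regression for each variable; the fitted values stand in for
# the time-conditioned means described in the text.
xbar_t = np.empty((n - 1, 2))
for j in range(2):
    A = np.column_stack([np.ones(n - 1), x[:-1, j]])
    coef, *_ = np.linalg.lstsq(A, x[1:, j], rcond=None)
    xbar_t[:, j] = A @ coef

# Time-adjusted covariance from deviations about the conditional means, then
# the time-adjusted T2 value for each observation.
d = x[1:] - xbar_t
S_t = d.T @ d / (n - 2)  # divisor is an illustrative choice
t2 = np.einsum('ij,jk,ik->i', d, np.linalg.inv(S_t), d)
```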

10.4

Example of a Uniform Decay Process

We demonstrate the construction of a T2 control procedure for a uniform decay process by considering a chemical process where observations are taken on a reactor used to convert ethylene (C2H4) to EDC. EDC is a basic building block for much of the vinyl product industry. Input to the reactor in the form of feedstock is hydrochloric acid gas (HCl) along with ethylene and oxygen (O2). The split among the three feed components is a constant ratio and yields a perfect correlation among these components. To avoid a singularity problem, we record only one variable as feed flow (Feed). The process of converting the feedstock to EDC takes place in a reactor under high temperature, and the process is known as OHC. Although reactor temperature readings are available from many different locations on a reactor, we will only use the average temperature (Temp) of the reactor column in our example. Many types of OHC reactors are available to perform the conversion of ethylene and HCl to EDC. All involve the monitoring of many process variables. We consider only two process variables, labeled x1 and x2.

The reactor is a fixed-life or fixed-bed reactor and must have critical components replaced at the end of each run cycle, as the components are slowly depleted during operation. Performance of the reactor is directly related to the depletion of the critical components. Best performance, as measured by percentage conversion, is at the beginning of the run cycle, and the unit gradually becomes less efficient during the remainder of the cycle. This inherent uniform decay in the performance of the reactor produces a time dependency in many of the resulting process and quality variables.

Decay reactors of this type differ from steady-state processes. For the steady-state case, reactor efficiency stays relatively constant, the efficiency variable will contain very little variation (due to the steady-state conditions), and its operational range will be small.
Any significant deviation from this range should be detected by the process control procedure. However, over the life cycle of a uniform decaying reactor, the unit efficiency might have a very large operational range. For instance, it might range from 98% at the beginning of a cycle to 85% at the end of the

Table 10.1: Correlation matrix with time-sequence variable.

          Time     Feed      x1       x2      Temp
Time     1        0.037    0.880    0.843    0.691
Feed     0.037    1       -0.230   -0.019    0.118
x1       0.880   -0.230    1        0.795    0.737
x2       0.843   -0.019    0.795    1        0.392
Temp     0.691    0.118    0.737    0.392    1
cycle and thus would contain more variation than a steady-state variable, which would remain relatively constant. If we fail to consider the decay in the process, any efficiency value between 85% and 98% would be acceptable, even 85% at the beginning of a cycle. As discussed in Chapter 8, a deviation beyond its operational range (established using in-control historical data) for a process variable can be detected using the corresponding unconditional T2 term of the MYT decomposition. In addition, incorrect movement of the variable within its range because of improper linear relationships with other process variables can be detected using the conditional T2 terms. However, this approach does not account for the effects of movement due to time dependencies.

10.4.1

Detection of Autocorrelation

A correlation matrix based on 79 observations for the four variables Feed, x1, x2, and Temp, taken from a reactor process, is presented in Table 10.1. Also included is a time-sequence variable, labeled Time. Note the moderate-to-large correlations between the three process variables and the time-sequence variable. Also note the virtual absence of correlation between Feed and Time.

To demonstrate the time decay in the measured temperatures, we present in Figure 10.10 a time-sequence plot of the average temperature of the reactor during a good production run. The graph indicates that the average temperature gradually increases over the life cycle of the unit. The temperature increase is due to the decay of the reactor. For example, if the reactor is "coking" up, it takes more heat at the end of a cycle to do the same or less "work" than at the beginning of a cycle.

Figures 10.11 and 10.12 contain time-sequence plots of the other two process variables, x2 and x1. The decay effect for x2 in Figure 10.11 has the appearance of an AR(1) relationship, while the decay effect for x1 in Figure 10.12 has the appearance of some type of quadratic (perhaps second-order) or exponential autoregressive relationship. However, for simplicity, we will fit an AR(1) model to x1.
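The detection idea used here, correlating each measured variable with a constructed time-sequence variable, can be sketched as follows. The helper name and the synthetic series are ours for illustration; they are not the book's reactor data:

```python
import numpy as np

def time_correlations(data):
    """Correlate each variable with a time-sequence variable 1, 2, ..., n.
    A moderate-to-large value flags a possible time dependency."""
    out = {}
    for name, x in data.items():
        t = np.arange(1, len(x) + 1)
        out[name] = float(np.corrcoef(t, x)[0, 1])
    return out

rng = np.random.default_rng(1)
n = 79
trend = 0.5 * np.arange(n)                # uniform decay adds a steady drift
data = {
    "Feed": rng.normal(0, 1, n),          # no time dependency, as in Table 10.1
    "x1": trend + rng.normal(0, 2, n),    # strong drift, large time correlation
}
r = time_correlations(data)
print(r)
```

As in Table 10.1, the drifting variable shows a correlation near 1 with Time, while the trend-free variable shows a correlation near 0.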

Figure 10.13 contains a time-sequence plot of the Feed variable. Notice that during a run cycle, the feed to the reactor is fairly consistent and does not systematically vary with time.

10.4.2 Autoregressive Functions

Figure 10.11 gives some indication of a time dependency for the process variable x2. This is supported by the moderate-to-strong correlation, 0.843, between x2 and

10.4. Example of a Uniform Decay Process

Figure 10.10: Time-sequence plot of reactor temperature.

Figure 10.11: Time-sequence plot of x2.

the time-sequence variable. Confirmation also is given in the analysis-of-variance table presented in Table 10.2 for an AR(1) model fit. The regression fit is highly significant (p < 0.0001), indicating a linear relationship between the current value of x2 and its immediate past value. The corresponding regression statistics for this model are presented in Table 10.3. These results indicate that the lagged x2 values explain over 56% of the variation in the present values of x2.

A plot of the raw regression residuals from the AR(1) model for x2 is presented in Figure 10.14. The plot indicates the presence of possibly another factor contributing to the variation in x2. Note the predominance of values below the zero line at the beginning of the cycle and the reverse of this trend at the end of the cycle.
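The AR(1) fits summarized in Tables 10.2 and 10.3 amount to an ordinary least-squares regression of each value on its immediate predecessor. A minimal sketch (our own helper, fit to a simulated AR(1) series rather than the reactor data):

```python
import numpy as np

def fit_ar1(x):
    """Regress x_t on x_{t-1}; return (intercept, slope, R-squared)."""
    y, lag = x[1:], x[:-1]
    X = np.column_stack([np.ones_like(lag), lag])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return float(beta[0]), float(beta[1]), 1.0 - ss_res / ss_tot

rng = np.random.default_rng(7)
x = np.empty(79)                       # same series length as the reactor data
x[0] = 0.0
for t in range(1, 79):                 # simulate x_t = 2 + 0.8 x_{t-1} + noise
    x[t] = 2.0 + 0.8 * x[t - 1] + rng.normal(0, 1)

b0, b1, r2 = fit_ar1(x)
print(b0, b1, r2)                      # slope estimate should be near 0.8
```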


Figure 10.12: Time-sequence plot of x1.

Figure 10.13: Time-sequence plot of feed.

Table 10.2: ANOVA table for AR(1) model for x2.

              df      SS        MS        F      Significance of F
Regression     1    2077.81   2077.81   99.81    < 0.0001
Residual      77    1602.93     20.82
Total         78    3680.74

This nonrandom trend, coupled with the moderate value of 0.565 for the R2 statistic, supports our belief.

Process variable x1 shows a strong upward linear time trend in its time-sequence plot, given in Figure 10.12. This is confirmed by its high correlation (0.880) with

Table 10.3: Regression statistics for variable x2.

R2                0.565
Adjusted R2       0.559
Standard Error    4.562

Table 10.4: ANOVA table for AR(1) model for x1.

              df      SS       MS        F       Significance of F
Regression     1    11.69    11.69    518.64     < 0.0001
Residual      77     1.73     0.02
Total         78    13.42

Figure 10.14: Plot of x2 residuals versus time.

time given in Table 10.1. The analysis-of-variance table from the regression analysis for an AR(1) model for this variable is presented in Table 10.4. The fit is highly significant (p < 0.0001) and indicates that there is a linear relationship between x1 and its immediate past value. Summary statistics for this fit are presented in Table 10.5. The large R2 value, 0.871, in addition to the small residuals given in the residual plot in Figure 10.15, indicates a good fit over most of the data. The increase in variation at the end of the plot is due to decreasing unit efficiency as the unit life increases.

The third variable to show a time dependency is the average reactor temperature (Temp). As noted in Figure 10.10, reactor temperature has a nonlinear (i.e., curved) relationship with time. Thus, an AR(2) model of the form

    x_t = β0 + β1 x_{t-1} + β2 x_{t-2} + ε_t,

Table 10.5: Regression statistics for x1.

R2                0.871
Adjusted R2       0.869
Standard Error    0.150

Figure 10.15: Plot of x1 residuals versus time.

where the βj are the unknown regression coefficients, might result in decreasing the error seen at the end of the cycle in Figure 10.10. However, for simplicity, we will use the AR(1) model. Although the pairwise correlation between this variable and the time-sequence variable is only 0.691 in Table 10.1, this is mainly a result of the flatness of the plot at the earlier time points. The analysis-of-variance table for the AR(1) model fit to the average temperature is presented in Table 10.6. The fit is significant (p < 0.0001) and indicates that there is a linear relationship between average temperature and its immediate past value. Summary statistics for the AR(1) fit are presented in Table 10.7. The R2 value of 0.585 is moderate, and the larger residuals in the residual plot in Figure 10.16 confirm this result.

For the three variables exhibiting some form of autocorrelation, the simplest autoregressive function was fit. This is done to simplify the discussion of the next section. The fitted AR(1) models depend only on the first-order lag of the data. A substantial amount of lack of fit was noted in the discussion of the residual plots. These models could possibly be improved by the addition of different lag terms. The use of a correlogram, which displays the lag correlations as a function of the lag value (see section 4.8), can be a useful tool in making this decision. The correlogram for the three variables x1, x2, and Temp is presented in tabular form for the first three lags in Table 10.8. In the case of variable x1, the correlogram suggests using all three lags, as the lag correlations remain near 1 for all three time points.
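A correlogram in the tabular form of Table 10.8 is obtained by correlating the series with lagged copies of itself. A sketch, using our own helper and a synthetic trending series rather than the reactor variables:

```python
import numpy as np

def lag_correlations(x, max_lag=3):
    """Return the correlation of x_t with x_{t-r} for r = 1, ..., max_lag."""
    return [float(np.corrcoef(x[r:], x[:-r])[0, 1])
            for r in range(1, max_lag + 1)]

rng = np.random.default_rng(3)
t = np.arange(79, dtype=float)
x1 = t + rng.normal(0, 1, 79)     # a strong trend keeps lag correlations near 1
cors = lag_correlations(x1)
print(cors)
```

A slowly decaying set of lag correlations, as for x1 in Table 10.8, suggests that several lags carry information about the current value.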


Figure 10.16: Plot of temperature residuals versus time.

Table 10.6: ANOVA table for AR(1) model for Temp.

              df      SS        MS        F       Significance of F
Regression     1    4861.44   4861.44   108.63    < 0.0001
Residual      77    3445.93     44.75
Total         78    8307.37

Table 10.7: Regression statistics for temperature.

R2                0.585
Adjusted R2       0.580
Standard Error    6.690

Table 10.8: Correlations for different lags.

         Lag 1   Lag 2   Lag 3
Temp     0.76    0.74    0.61
x2       0.75    0.67    0.64
x1       0.93    0.92    0.90

10.4.3 Estimates

Using the results of section 10.3, we can construct time-adjusted estimates of the mean vector and covariance matrix for our reactor data. For notational purposes, the four variables are denoted by x1 (process variable x1), x2 (process variable x2), x3 (Temp), and x4 (Feed). The estimate of the time-adjusted mean

Table 10.9: Correlation matrix with time adjustment.

           Time     Feed     x1|t     x2|t    Temp*
 Time     1        0.037    0.184    0.322    0.238
 Feed     0.037    1       -0.069    0.001    0.203
 x1|t     0.184   -0.069    1        0.586   -0.026
 x2|t     0.322    0.001    0.586    1       -0.342
 Temp*    0.238    0.203   -0.026   -0.342    1

vector is obtained using the formulas of section 10.3.

Since Feed has no time dependency, no time adjustment is needed, and the average of the Feed data is used.

Removing the time dependency from the original data produces some interesting results. For example, consider the correlation matrix of the 79 observations with the time dependency removed. This is calculated by computing the covariance matrix of the time-adjusted observations

and converting the covariance estimate to a correlation matrix. The resulting estimated correlation matrix is presented in Table 10.9. In contrast to the unadjusted correlation matrix given in Table 10.1, there now is only a weak correlation between the time-sequence variable and each of the four process variables. Other correlations not directly involving the time-sequence variable were also affected. For example, the original correlation between temperature and x1 was 0.737. Corrected for time, the value is now -0.026. Thus, these two variables were correlated only because of the time effect. Also, observe the correlation between x1 and x2. Originally, a correlation of 0.795 was observed in Table 10.1, but correcting for time decreases this value to 0.586.

The T2 values of the preliminary data are plotted in Figure 10.17. These values are computed without any time adjustment. Close inspection of this graph reveals the U-shaped trend in the data that is common to autocorrelated processes. The upward trend prevails more at the end of the cycle than at its beginning. This is mainly due to the instability of the reactor as it nears the end of its life cycle.

Consider the T2 graph for the time-adjusted data, presented in Figure 10.18. In comparison to Figure 10.17, there is no curvature in the plotted points. However, the expanded variation at the end of the life cycle is still present, as it

Figure 10.17: T2 chart with no time adjustment.

Figure 10.18: T2 chart with values adjusted for time.

is not due to the time variable. Note also that this plot identifies eight outliers in the data set, as compared to only five outliers detected in the plot of the uncorrected data.
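The effect shown in Table 10.9 can be sketched numerically: replace each autocorrelated variable by its AR(1) residuals and recompute the correlation matrix. The helper below is our own simplified illustration, assuming a plain AR(1)-residual adjustment (the exact estimators are those of section 10.3) and using synthetic series that share a time trend, not the reactor data:

```python
import numpy as np

def ar1_residuals(x):
    """Residuals from regressing x_t on x_{t-1}: a simple time adjustment."""
    y, lag = x[1:], x[:-1]
    X = np.column_stack([np.ones_like(lag), lag])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

rng = np.random.default_rng(11)
n = 80
t = np.arange(n, dtype=float)
x1 = t + rng.normal(0, 1, n)      # shared trend makes x1 and x2 look correlated
x2 = t + rng.normal(0, 1, n)

raw_r = np.corrcoef(x1, x2)[0, 1]                          # inflated by the trend
adj = np.column_stack([ar1_residuals(x1), ar1_residuals(x2)])
adj_r = np.corrcoef(adj.T)[0, 1]                           # near the true value, 0
print(round(raw_r, 3), round(adj_r, 3))
```

The raw correlation is close to 1 purely because of the shared time effect; after adjustment it falls toward zero, mirroring the drop from 0.737 to -0.026 for Temp and x1.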

10.4.4 Examination of New Observations

Fourteen new observations, taken in sequence near the end of a life cycle of the reactor, are examined for signals. The observations and their corresponding (first-order) lag values are presented in Table 10.10. The three lag variables are denoted as Lx1, Lx2, and Ltemp. The adjusted observations of Figure 10.18, with the eight outliers removed, were used as the HDS, and the 14 new values were checked for signals. The resulting T2 values are given in Figure 10.19. Using α = 0.05, a signal is observed for

Table 10.10: New observations on reactor data.

Obs. No.    Feed      x1      x2    Temp    Lx1     Lx2   Ltemp
    1     188,300    0.98   44.13    510    0.98   44.13    510
    2     189,600    0.81   33.92    521    0.98   44.13    510
    3     198,500    0.46   28.96    524    0.81   33.92    521
    4     194,700    0.42   29.61    521    0.46   28.96    524
    5     206,800    0.58   29.31    530    0.42   29.61    521
    6     198,600    0.63   28.28    529    0.58   29.31    530
    7     205,800    0.79   29.08    534    0.63   28.28    529
    8     194,600    0.84   30.12    526    0.79   29.08    534
    9     148,000    0.99   39.77    506    0.84   30.12    526
   10     186,000    1.19   34.13    528    0.99   39.77    506
   11     200,200    1.33   32.61    532    1.19   34.13    528
   12     189,500    1.43   35.52    526    1.33   32.61    532
   13     186,500    1.10   34.42    524    1.43   35.52    526
   14     180,100    0.88   37.88    509    1.10   34.42    524

Figure 10.19: Values for new observations on the reactor.

observation 10. The corresponding T2 value of 27.854 is decomposed for signal interpretation. We begin by examining the T2 values for the four unconditional terms. These are given as

    T1^2 = 1.280   (process variable x1),
    T2^2 = 4.195   (process variable x2),
    T3^2 = 17.145** (Temp),
    T4^2 = 1.312   (Feed),

where the symbol (**) denotes that the unconditional T2 term for temperature


produces a signal, as it exceeds the critical value of 7.559. The usual interpretation for a signal on an unconditional term is that the observation on the variable is outside its operational range. However, for time-adjusted variables, the implication is different. In this example, the observed temperature value, 526, is not where it should be relative to its lag value of 506. For observation 10, the increase in temperature from the value observed for observation 9 was much more than that predicted using the historical data.

Removing the Temp variable from observation 10 and examining its subvector (Feed, x1|t, x2|t) produced a T2 value of 15.910. When compared to a critical value of 13.38, a signal was still present. Further decomposition of the T2 value on this subvector produced the following two-way conditional T2 terms:

where the symbol (**) denotes a term that exceeds the critical value of 7.675. There are two signaling conditional terms, and these imply that the relationship between the operational variables x2 and x4 (Feed), after adjustment for time, does not agree with the historical situation. These results indicate the need to remove the process variables x2 and x4. With their removal, the only variable left to be examined is x1. However, the small value of its unconditional T2 term, 1.3124, indicates that no signal is present in observation 10 on this variable.
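The unconditional and two-way conditional terms used in this interpretation follow directly from the MYT decomposition: an unconditional term is the squared standardized deviation of one variable, and a conditional term T(j|i)^2 is the T2 of the pair (i, j) minus the unconditional term for i. The sketch below uses hypothetical numbers of our own choosing, not the reactor HDS estimates:

```python
import numpy as np

def t2(x, mean, S, idx):
    """T2 statistic for the subvector of variables listed in idx."""
    d = (x - mean)[idx]
    Ssub = S[np.ix_(idx, idx)]
    return float(d @ np.linalg.solve(Ssub, d))

def myt_terms(x, mean, S):
    """Unconditional terms T_j^2 and two-way conditional terms T_{j|i}^2."""
    p = len(mean)
    uncond = {j: t2(x, mean, S, [j]) for j in range(p)}
    cond = {(j, i): t2(x, mean, S, [i, j]) - uncond[i]
            for j in range(p) for i in range(p) if i != j}
    return uncond, cond

# Hypothetical two-variable illustration: each variable is within range,
# but the observation breaks their strong positive correlation.
mean = np.array([0.0, 0.0])
S = np.array([[1.0, 0.9],
              [0.9, 1.0]])
x = np.array([1.0, -1.0])
uncond, cond = myt_terms(x, mean, S)
print(uncond, cond)    # small unconditional terms, large conditional terms
```

Here both unconditional terms equal 1.0, yet each conditional term equals 19.0: the signal lies in the violated relationship, the same pattern diagnosed for x2 and Feed above.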

10.5 Control Procedure for Stage Decay Processes

Not all autocorrelation is of the form of uniform or continuous decay, where present performance values depend on some immediate past value. For example, in stage decay, the time dependency is between different stages of the process. Process performance in the second stage may depend on first-stage performance, and performance in the third stage may depend on the performance of the process in stages 1 and 2. The control procedure at any stage (other than the first) must be adjusted for performance in the previous stages. Process monitoring detects when significant deviation occurs from the expected adjusted performance. An overview of how this is done is contained in Mason and Young (2000), and more extensive details and examples are found in Mason, Tracy, and Young (1996).


10.6 Summary

The charting of autocorrelated multivariate data in a control procedure presents a number of serious challenges. A user must not only examine the linear relationships existing between the process variables to determine if any are unusual, but also adjust the control procedure for the effects of the time dependencies existing among these variables. This chapter presents one possible solution to the problems associated with constructing multivariate control procedures for processes experiencing either uniform decay or stage decay.

Autocorrelated observations are common in many industrial processes. This is due to the inherent nature of the processes, especially any type of decay process. Because of the potentially serious effects of autocorrelation on control charts, it is important to be able to detect its presence. We have offered two methods of detection. The first involves examining the correlations between each variable and a constructed time-sequence variable. A large correlation implies some type of time dependency. Graphical techniques are a second aid in detecting time dependencies. Trends in the plot of an individual variable versus time give insight into the type of autocorrelation that is present. Correlogram plots for individual variables also can be helpful in locating the lag associated with the autocorrelation.

For uniform decay data that can be fit with an autoregressive model, the current value of an autocorrelated variable is corrected for its time dependency. The proposed control procedure is based on using the T2 value of the time-adjusted observation and decomposing it into components that lead to an interpretation of the time-adjusted signal. The resulting decomposition terms can be used to monitor relationships with the other variables and to determine if they are in agreement with those found in the HDS.
This property is also helpful in examining stage-decay processes, as the decay occurs sequentially and thus lends itself to analysis by repeated decompositions of the T2 statistic obtained at each stage.

Chapter 11

The T2 Statistic and Batch Processes

11.1 Introduction

Our development of a multivariate control procedure has been limited to applications to continuous processes. These are processes with continuous input, continuous processing, and continuous output. We conclude the text with a description of a T2 control procedure for batch processes. These are processes that use batches as input (e.g., see Fuchs and Kenett (1998)). There are several similarities between the T2 procedures for batch processes and for continuous processes. Phase I still consists of constructing an HDS, and Phase II continues to be reserved for monitoring new (future) observations. Also, control procedures for batch processes can be constructed for the overall process, or for individual components of the processing unit. In some settings, multiple observations on the controlled component may be treated as a subgroup with control based on the sample mean. In other situations, a single observation may be used, such as monitoring the quality of the batch or grade being produced. Despite these similarities, differences do exist when monitoring batch processes. For example, the estimators of the covariance matrix and the overall mean vector may vary. Changes also may occur in the form of the T2 statistic and the probability function used to describe its behavior. A detailed discussion of the application of the T2 statistic to batch processes can be found in Mason, Chou, and Young (2001).

11.2 Types of Batch Processes


There are two basic types of batch processes. The first will be referred to as a Category 1 batch process. In this category, observations on different batches are assumed to come from the same multivariate distribution, having a common mean

Figure 11.1: Category 1 process with batch input.

Figure 11.2: Control region for a continuous process.

vector μ and a common covariance matrix Σ. Very little variation is tolerated between batches. For example, in certain types of glass production, the processing and the finished glass (output) are continuous. However, feedstock construction for this operation is a batch process, since consecutive batches of feedstock are fed on a continuous basis to the production system. Thus, it is very important to maintain the quality characteristics of the individual batches that are used as input to the system. When this is achieved, it guarantees uninterrupted processing and a continuous flow of quality glass; otherwise, a fouled batch input can interrupt the entire system. An example of a process with batch input is illustrated in Figure 11.1.

Figure 11.2 represents a control region for a steady-state continuous process containing two variables, x1 and x2. Note that all observations on this process would be contained within the designated ellipse. The process mean is represented by the dark circle located in the center of the ellipse. In contrast, the control region


Figure 11.3: Control regions for Category 1 batch process.

Figure 11.4: Example of a process with batch output.

for a Category 1 batch process is presented in Figure 11.3. A different ellipse represents each batch. The lightly shaded circles represent the individual batch means and the dark circle in the middle of the plot represents the overall mean. Note the closeness of the different batch means. In some processes, input is continuous, but changing the processing component produces changes in the output. For example, certain plastic-producing units use the same feedstock as input. However, changing the amount of hydrogen, or other control variables, in the processing component will produce a different finished product (output). The work order on such a unit is usually stated in terms of the number of tons of a variety of products. Over a given period of time, different batches of each product, with the same quality characteristics, are produced. This type of process is illustrated in Figure 11.4. Batch processing can also occur when the input, processing, and output are continuous. A decaying unit provides such an example. When the efficiency of the unit declines to a certain level, the unit is shut down and refurbished before restarting


Figure 11.5: Example of a process with batch runs.

production. In this setting, the individual production runs may be considered as batches. An evaporator used in removing water from a caustic soda-water solution is an example of such a production unit. In addition to caustic soda and water, the solution contains large amounts of salt, which fouls the evaporator under the high temperatures. When the efficiency of the evaporator drops, the unit must be shut down, cleaned (removal of the salt), and restarted. This type of process is illustrated in Figure 11.5.

The processes in Figures 11.4 and 11.5 represent the second type of batch process, labeled Category 2. This category relaxes the stringent condition that all observations come from the same multivariate distribution. Mean variation is acceptable among the k different runs or batches, although it is assumed that the covariance matrix is constant across batches. Each batch is characterized by its own multivariate distribution with the same covariance matrix, Σ, but with a possibly different mean vector, where μi, i = 1, 2, ..., k, denotes the population mean of the ith batch. A control region for a Category 2 batch process is presented in Figure 11.6. The small ellipses represent the control regions for the different batches. The large ellipse represents the acceptable overall region that is determined by customer specifications.


Figure 11.6: Control region for a Category 2 batch process.

11.3 Estimation in Batch Processes

Consider observations taken from a Category 1 batch process. These observations are assumed to be from the same multivariate distribution with only random variation between batches. A known mean vector, μ = (μ1, μ2, ..., μp)', is referred to as the target. However, if the mean vector is unknown, it is estimated using the overall mean of the data from all batches. Suppose we have k batches of sizes n1, n2, ..., nk, and each p-dimensional observation vector is denoted by Xij, where j = 1, 2, ..., ni indicates the observation within the batch and i = 1, 2, ..., k denotes the batch. The estimate of the overall mean is given as

    X̄ = (n1 X̄1 + n2 X̄2 + ... + nk X̄k) / N,    (11.1)

where X̄i represents the mean of the ith batch. The total sample size N is obtained as the sum of the batch sizes, i.e., N = n1 + n2 + ... + nk. The estimate of the covariance matrix is computed as

    S = SS_T / (N - 1),  where  SS_T = Σi Σj (Xij - X̄)(Xij - X̄)'.    (11.2)


The quantity SS_T in (11.2) is referred to as the total sum of squares of variation. It can be separated into two components. One part, referred to as the within-variation, SS_W, represents the variation within the batches. The other part, labeled the between-variation, SS_B, is the variation between the batches. We write this as

    SS_T = SS_W + SS_B.    (11.3)

The component SS_B represents the between-batch variation and, when significant, can distort the common estimator S. However, for Category 1 processes, it is assumed that the between-batch, as well as the within-batch, variation is minimal and due to random variation. Therefore, for a Category 1 situation, we estimate the overall mean using (11.1) and the covariance matrix using (11.2). We emphasize that these are the appropriate estimates only if we adhere to the basic assumption that a single multivariate distribution can describe the process.

For a Category 2 process, we have multiple distributions describing the process. Multiplicity comes from the possibility that the mean vectors of the various batches may differ, i.e., that μi ≠ μj for some i and j. For this case, the overall mean is still estimated using (11.1). However, the covariance matrix estimator in (11.2) is no longer applicable due to the effects of the between-batch variation.

As an illustration of the effects of between-batch variation, consider the plot given in Figure 11.7 for two variables, x1 and x2, and two batches of data. The orientation of the two sets of data implies that x1 and x2 have the same correlation in each batch, but the batch separation implies that the batches have different means. If the batch classification is ignored, the overall sample covariance matrix, S, will be based on deviations taken from the overall mean, indicated by the center of the ellipse, and will contain any between-group variation. For a Category 2 process, the covariance matrix is estimated as

    S_W = SS_W / (N - k) = [(n1 - 1) S1 + (n2 - 1) S2 + ... + (nk - 1) Sk] / (N - k),    (11.4)

where Si is the covariance matrix estimate for the ith batch and SS_W represents the within-batch variation as defined in (11.3). Close inspection of (11.4) reveals the estimator S_W to be a weighted average (weighted on the degrees of freedom) of the within-batch covariance matrix estimators. With mean differences between the batches, the common estimator obtained by considering the observations from all batches as one group would be contaminated with the between-batch variation, represented by SS_B. Using only the within-batch variation to construct the estimator of the common covariance matrix will produce a true estimate of the relationships among the process variables. We demonstrate this in a later example (see section 11.7).
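Under our reading of (11.4), the within-batch estimator pools the individual batch covariance matrices, weighted by their degrees of freedom, and is therefore unaffected by shifts in batch means. A sketch with synthetic batches (the helper name and data are ours):

```python
import numpy as np

def pooled_within_cov(batches):
    """S_W = [sum_i (n_i - 1) S_i] / (N - k): the within-batch covariance,
    pooled across batches and unaffected by mean differences between them."""
    num = sum((len(b) - 1) * np.cov(b, rowvar=False) for b in batches)
    N = sum(len(b) for b in batches)
    return num / (N - len(batches))

rng = np.random.default_rng(5)
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
b1 = rng.multivariate_normal([0, 0], cov, size=100)
b2 = rng.multivariate_normal([10, 10], cov, size=100)  # same cov, shifted mean

s_w = pooled_within_cov([b1, b2])
s_all = np.cov(np.vstack([b1, b2]), rowvar=False)      # contaminated by SS_B
print(s_w[0, 0], s_all[0, 0])
```

The pooled variance stays near the true value of 1, while the ignore-the-batches estimate is inflated by the between-batch separation, exactly the distortion illustrated in Figure 11.7.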


Figure 11.7: Control region containing two batches of data.

11.4 Outlier Removal for Category 1 Batch Processes

Consider a Category 1 batch process. The T2 statistic and its distribution for a Phase I operation are given as

    T2 = (X - X̄)' S^(-1) (X - X̄) ~ [(N - 1)^2 / N] B[p/2, (N - p - 1)/2],    (11.5)

where X̄ and S are the common estimators obtained from (11.1) and (11.2), respectively, and B[p/2, (N - p - 1)/2] represents the beta distribution with parameters (p/2) and ((N - p - 1)/2), where N is the total sample size (all batches combined). The UCL, used for outlier detection in a Phase I operation, is given as

    UCL = [(N - 1)^2 / N] B[α; p/2, (N - p - 1)/2],    (11.6)

where B[α; p/2, (N - p - 1)/2] is the upper αth quantile of B[p/2, (N - p - 1)/2]. For this category, the distribution of the T2 statistic and the purging procedure for outlier removal are the same as those used for a continuous process. We emphasize that the statistic in (11.5) can be used only when there is no between-batch variation. All observations from individual batches must come from the same multivariate distribution. This assumption is so critical that it is strongly recommended that a test of hypothesis be performed to determine if the batch means are equal. This creates the dilemma of whether to remove outliers first or to test the equality of the batch means, since mean differences could be due to individual batches containing atypical observations.
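The Phase I beta UCL described above can be computed directly from a beta quantile. The function below is our illustration; the parameter values are chosen to match the later example (N = 299 observations, p = 2 variables):

```python
from scipy.stats import beta

def phase1_ucl(alpha, p, N):
    """UCL = ((N - 1)^2 / N) * B(alpha; p/2, (N - p - 1)/2),
    where B(alpha; ...) is the upper-alpha quantile of the beta distribution."""
    q = beta.ppf(1.0 - alpha, p / 2.0, (N - p - 1) / 2.0)
    return ((N - 1) ** 2 / N) * q

ucl = phase1_ucl(0.05, p=2, N=299)
print(round(ucl, 3))
```

Observations in the preliminary data whose T2 values exceed this limit are candidates for removal.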


As an example, reconsider the two-variable control region and data set illustrated in Figure 11.7. If one treats the two batches of data as one overall group, the asterisk in the middle of the ellipse in the graph represents the location of the overall mean vector. Since deviations used in computing the common covariance matrix S are taken from this overall mean, the variances of x1 and x2 would be considerably larger when using S instead of S_W. Also, the estimated correlation between the two variables would be distorted, as the orientation of the ellipse is not the same as the orientation of the two separate batches. Finally, note that the two potential outliers in the ellipse would not be detected using the common estimator S, as these points are closer to the overall mean than any of the other points of the two separate batches.

The solution to the problem of outliers in batches is provided in Mason, Chou, and Young (2001). These authors recommend the following procedure for this situation.

Step 1. Center all the individual batch data by subtracting the particular batch mean from the batch observation; i.e., compute

    Yij = Xij - X̄i,

where i = 1, 2, ..., k and j = 1, 2, ..., ni. With this translation, Yij has a zero mean vector and a covariance matrix of Σ, as any possible mean difference has been removed by translation to a zero mean.

Step 2. Using the translated data, construct a covariance estimator given by

    S_W = (1/(N - k)) Σi Σj Yij Yij'.    (11.7)

Step 3. Use the T2 statistic,

    T2 = Yij' S_W^(-1) Yij,    (11.8)

together with its beta reference distribution,

and remove outliers following the established procedures (see Chapter 5).

Step 4. After outlier removal, S_W and X̄ must be recalculated using only the retained observations.

To test the hypothesis of equal batch means, apply the outlier removal procedure to the batch mean vectors. The T2 statistic for this procedure is given as

    T2 = (X̄i - X̄)' S_W^(-1) (X̄i - X̄),    (11.9)

where S_W is the sample covariance matrix computed using (11.7) and the translated data with the individual outliers removed. The distribution of the statistic in (11.9), under the assumption of a true null hypothesis, is that of a scaled F variable (e.g., see Wierda (1994)),


where F(p, nk - k - p + 1) represents the F distribution with parameters p and (nk - k - p + 1). For a given α level, the UCL for the T2 statistic, denoted (11.11), is obtained by scaling the upper αth quantile, F(α; p, nk - k - p + 1), of this distribution. The T2 value in (11.9) is computed for each of the k batch means and compared to the UCL in (11.11). Batch means with T2 values that exceed the UCL are declared outliers, and the corresponding batches are removed. With this procedure, we have accomplished the goals of removing all observation outliers and all atypical batches.

We are now ready to obtain an estimate of the target mean vector. Using the retained batches, compute the target mean estimate using (11.1). The corresponding estimate of the covariance matrix is provided by (11.2). The common covariance matrix estimator S is used instead of S_W because whatever between-variation remains is due only to inherent process variation (see Alt (1982) and Wierda (1994)). Also, the estimator S is a more efficient estimator of Σ than S_W when there are no mean differences between the batches.
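Steps 1-3 of the outlier-screening procedure can be sketched as follows. The helper and the synthetic batches are ours; for simplicity the sketch reuses the beta percentile of (11.6) as the cutoff for the translated observations, which we take as an approximation to the exact reference distribution in (11.8):

```python
import numpy as np
from scipy.stats import beta

def screen_translated(batches, alpha=0.05):
    """Center each batch at its own mean (Step 1), pool the translated data
    into S_W (Step 2, eq. 11.7), and flag observations whose T2 exceeds an
    approximate Phase I beta UCL (Step 3)."""
    Y = np.vstack([b - b.mean(axis=0) for b in batches])      # Step 1
    N, p = Y.shape
    k = len(batches)
    Sw = (Y.T @ Y) / (N - k)                                  # Step 2
    t2 = np.einsum("ij,ij->i", Y @ np.linalg.inv(Sw), Y)      # Step 3
    ucl = ((N - 1) ** 2 / N) * beta.ppf(1 - alpha, p / 2, (N - p - 1) / 2)
    return t2, ucl, np.where(t2 > ucl)[0]

rng = np.random.default_rng(2)
cov = np.array([[1.0, 0.5], [0.5, 1.0]])
b1 = rng.multivariate_normal([0, 0], cov, size=100)
b2 = rng.multivariate_normal([5, 5], cov, size=100)    # shifted batch mean
b1[46] += np.array([8.0, -8.0])                        # plant one gross outlier

t2, ucl, flagged = screen_translated([b1, b2])
print(ucl, flagged)
```

Because each batch is centered at its own mean before pooling, the planted outlier is flagged even though the two batches sit far apart, which is exactly the failure mode of the unpooled estimator S illustrated in Figure 11.7.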

11.5 Example: Category 1 Batch Process

To demonstrate the above procedure, we consider a preliminary data set of approximately 100 observations on three separate batches where monitoring is to be imposed on two process variables. Scatter plots of the two variables for the

Figure 11.8: Scatter plot for Batch 1.


Figure 11.9: Scatter plot for Batch 2.

Figure 11.10: Scatter plot for Batch 3.

three separate batches are presented in Figures 11.8, 11.9, and 11.10, respectively. Observe the same general shape of the data swarm for each of the different batches. Note also the potential outliers in each batch: for example, the observation located at the extreme left end of the data swarm of Figure 11.8 and the cluster of points located in the extreme right-hand corner of Figure 11.9. Summary statistics for the three separate batches are presented in Table 11.1. Observe the similarities among the statistics for the three batches, especially for the

Table 11.1: Summary statistics for batches.

                       Batch 1   Batch 2   Batch 3   Translated
Sample Size               99       100       100        299
x1 Mean               1518.9    1570.6    1560.1        0.0
x2 Mean                 15.6      16.1      16.0        0.0
x1 Std Dev             366.8     321.4     330.7      338.9
x2 Std Dev              36.6      32.3      33.3       34.0
Correlation (x1, x2)   0.996     0.994     0.995      0.995

Figure 11.11: Scatter plot of combined translated data.

pairwise correlations between x1 and x2. This is also true for the separate standard deviations for each variable.

Centering of the data for the three separate batches is achieved by subtracting the respective batch mean. For example, the translated vector based on centering the observations of Batch 1 is obtained using

    Y1j = X1j - X̄1 = X1j - (1518.9, 15.6)'.

The summary statistics for the combined translated data are given in the last column of Table 11.1, and a graph of the translated data is presented in Figure 11.11. Observe the similarities between the standard deviations of the individual variables in the three separate batches and the standard deviations of the variables in the overall translated batch. Likewise, the same is true for the pairwise correlation between the variables in the separate and combined batches.

Only one observation appears as a potential outlier in the scatter plot presented in Figure 11.11. This is observation 46 of Batch 1, and it is located at the (extreme) left end of the data swarm. This observation also was noted as a potential outlier in a similar scatter plot of the Batch 1 data presented in Figure 11.8. A T2 chart based on (11.8) and the combined translated data is presented in Figure 11.12. The


Chapter 11. The T2 Statistic and Batch Processes

Figure 11.12: T2 chart for combined translated data.

Figure 11.13: T2 chart of HDS.

first 99 T2 values correspond to Batch 1; the second 100 values correspond to Batch 2; and the last 100 values refer to Batch 3. The results confirm that the T2 value of observation 46 exceeds the UCL. Thus, it is removed from the data set. A revised T2 chart, based on 298 observations, is given in Figure 11.13. The one large T2 value in the plot corresponds to observation 181 from Batch 2. However, the change in N from 299 to 298, produced by excluding observation 46 of Batch 1, does not reduce the UCL sufficiently to warrant further deletion. Thus, the removal of the one outlier is adequate to produce a homogeneous data set. The distribution of the T2 statistic is verified by examining a Q-Q plot of the HDS. This plot is presented in Figure 11.14. The plot has a strong linear trend, and no serious deviations from it are noted other than the few points located in the upper right-hand corner of the plot. The extreme value is observation 181 from


Figure 11.14: Q-Q plot of translated data.

Table 11.2: Summary statistics for HDS.

Variable   Sample Size    Mean    Std Dev
x1             298       1549.9    339.68
x2             298         15.9      3.44

Table 11.3: T2 values of batch means.

Batch No.   T2 Value
    1         0.009
    2         0.006
    3         0.002

Figure 11.13. It appears that the beta distribution can be used in Phase I analyses, and the corresponding F distribution should be appropriate for Phase II operations. Using the combined translated data with the single outlier removed, estimates of S_W and X̄ are obtained, and the mean test given in Step 4 of section 11.5 is performed. Summary statistics for the HDS are presented in Table 11.2. Very close agreement is observed when the overall mean and standard deviation are compared to the individual batch means and standard deviations of Table 11.1. The T2 values for the three individual batch means are computed using (11.9) and are presented in Table 11.3. All three values are extremely small due to the closeness of the group means to the overall mean (see Tables 11.1 and 11.2). When compared to the UCL value of 0.0622, computed using (11.11) with p = 2, k = 3, and n ≈ 100, none of the batch means signals. From these results, we conclude that all three batches are acceptable for use in the HDS.
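The Phase I screening just described (center each batch, compute a T2 value for every observation, compare with a beta-based UCL, and delete signaling points) can be sketched in Python. The function below is illustrative; it assumes the standard Phase I result that the T2 of an individual observation from an HDS of size N follows ((N − 1)²/N)·B(p/2, (N − p − 1)/2), which is the beta-distribution result this example relies on:

```python
import numpy as np
from scipy import stats

def phase1_t2_chart(X, alpha=0.05):
    """Phase I T2 values and a beta-based UCL for an N x p data set.

    Assumes T2 ~ ((N-1)^2 / N) * Beta(p/2, (N-p-1)/2) for individual
    observations when the mean and covariance are estimated from the data.
    """
    X = np.asarray(X, dtype=float)
    N, p = X.shape
    d = X - X.mean(axis=0)                      # center at the overall mean
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))
    t2 = np.einsum('ij,jk,ik->i', d, S_inv, d)  # quadratic form, one value per row
    ucl = (N - 1) ** 2 / N * stats.beta.ppf(1 - alpha, p / 2, (N - p - 1) / 2)
    return t2, ucl
```

Observations whose T2 value exceeds the UCL are candidates for removal, after which the statistics are recomputed, as was done for observation 46 above.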


11.6 Outlier Removal for Category 2 Batch Processes

For Category 2 batch processes, the batch means are known to differ. Thus, parameter estimation involves estimating the common covariance matrix and the target mean vector under the assumption that the batches are taken from different multivariate distributions. Detecting atypical observations is still a concern, but now there is the additional problem of detecting atypical batches. As in a Category 1 batch process, we can identify individual outliers in a Category 2 batch process using the mean-centering procedure described earlier. However, a different procedure is needed to identify batch outliers. First, we must establish an acceptable target region for all the batches. This is depicted as the large ellipse that encompasses the individual batches given in Figure 11.6. This region (ellipse) is the equivalent of a control region established on the individual batch means, denoted by X̄1, X̄2, ..., X̄k. Assuming the batch mean data also are described by a multivariate normal distribution, a control region based on a T2 statistic can be developed. The form and distribution of the statistic are given as

T2 = (X̄i − X̄)′ S_B^−1 (X̄i − X̄) ~ [(k − 1)²/k] B(p/2, (k − p − 1)/2),   (11.12)

where B(p/2, (k − p − 1)/2) represents the beta distribution with parameters (p/2) and ((k − p − 1)/2), S_B = SS_B/k is the covariance estimate defined in (11.3), and X̄ is the overall mean computed using (11.1). The corresponding UCL is given by

UCL = [(k − 1)²/k] B(α; p/2, (k − p − 1)/2),   (11.13)

where B(α; p/2, (k − p − 1)/2) is the upper αth quantile of B(p/2, (k − p − 1)/2). The T2 statistic in (11.12) is based on the between-batch variation S_B, which describes the relationships between the components of the batch mean vectors rather than those of the individual observations. For example, in the Category 2 batch process example presented in Figure 11.6, the correlation between the two process variables is positive, while the correlation between the batch mean components is negative. The downward shifting of the overall process could be due to an unknown "lurking" variable or to a known but uncontrollable variable. Any batch whose T2 value computed using (11.12) exceeds the UCL specified in (11.13) is excluded from the target region. The estimate of the target mean is computed from the batches retained in the target region. These batches also are used to obtain S_W, the estimate of the common covariance matrix.
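The screening of batch means against the target region of (11.12)–(11.13) can be sketched as follows. The function name and interface are illustrative; S_B is passed in, since its construction in (11.3) is defined earlier in the chapter:

```python
import numpy as np
from scipy import stats

def batch_mean_region(batch_means, S_B, alpha=0.05):
    """T2 values of the k batch means against the target-region UCL of (11.13).

    Uses T2 = (Xbar_i - Xbar)' S_B^{-1} (Xbar_i - Xbar) with
    UCL = ((k-1)^2 / k) * B(alpha; p/2, (k-p-1)/2), as quoted in the text.
    """
    M = np.asarray(batch_means, dtype=float)
    k, p = M.shape
    d = M - M.mean(axis=0)                      # deviations from the overall mean
    S_inv = np.linalg.inv(np.asarray(S_B, dtype=float))
    t2 = np.einsum('ij,jk,ik->i', d, S_inv, d)
    ucl = (k - 1) ** 2 / k * stats.beta.ppf(1 - alpha, p / 2, (k - p - 1) / 2)
    return t2, ucl, t2 > ucl                    # flags mark batches outside the region
```

Batches flagged as outside the region would be excluded before computing the target mean and S_W from the retained batches.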

11.7 Example: Category 2 Batch Process

Consider the scatter plot presented in Figure 11.15 of 11 observations on two process variables, (x1, x2), for three different batches from a Category 2 batch process.


Figure 11.15: Scatter plot of three batch processes.

Table 11.4: Batch summary statistics.

Batch No.    Mean x1   Std Dev x1   Mean x2   Std Dev x2   Correlation
1             133.36      3.60       200.78      4.76        -0.437
2             149.08      3.50       202.59      4.17        -0.706
3             147.61      3.91       190.60      6.76        -0.779
Overall       143.33      8.04       197.99      7.45        -0.390
Translated      0.00      3.56         0.00      5.17        -0.650

For this industrial situation, it is important to maintain the relationship between the process variables as well as to maintain the tolerance on the individual variables. Taken together, these data were collected under good operating conditions and contain no obvious outliers. Observe that the data pattern for each independent batch is similar. However, there is more variation in the second process variable, x2, in the third batch than in the other two batches. Close examination of Batch 3 indicates the presence of three to four possible outliers (i.e., points outside the ellipse). Some of these observations lie between Batches 1 and 2 and appear to be a substantial distance from the mean of Batch 3. Thus, they are somewhat removed from the remaining observations of the batch. Summary statistics for the three batches are presented in Table 11.4. Given the small sample size (i.e., n = 11), the differences among the pairwise correlation coefficients, as well as between the individual standard deviations, for the three groups are not significant. Large variation among correlation coefficients of the variables is not abnormal for small samples. However, for larger samples this could imply a change in the covariance structure. The overall batch statistics are given in the fourth row of Table 11.4. The overall-batch correlation of -0.390 is smaller in absolute value than any of the within-batch correlations


Figure 11.16: Translated batch process data.

and considerably lower than those for Batches 2 and 3. This is due to the mean separation of the individual batches. Even though all three batches represent an in-control process, combining them into one data set can mask the true correlation between the variables or create a false correlation. Note also the difference between the standard deviations of the variables within each batch and the standard deviations of the variables for the overall batch. The latter are larger for x1 and nearly twice as large for x2, and they do not reflect the true variation of an individual batch. We begin the outlier detection procedure by translating the three sets of batch data to a common group that is centered at the origin. This is achieved by subtracting the respective batch mean from each observation vector within the batch. For an observation vector from Batch 1, we use (x11 − 133.36, x12 − 200.78); for Batch 2, we compute (x21 − 149.08, x22 − 202.59); and for Batch 3, we use (x31 − 147.61, x32 − 190.60). A scatter plot of the combined data set after the translation is presented in Figure 11.16, and summary statistics are presented in the last row of Table 11.4. Comparing the summary statistics of the within-batches to those of the translated batch presents a more agreeable picture. The standard deviations of the variables of the overall translated batch compare favorably to the standard deviations of any individual batch. Likewise, the correlation of the overall translated group is more representative of the true linear relationship between the two process variables. Translation of the data in Figure 11.16 thus presents a different perspective. Overall, the scatter plot of the data does not indicate obvious outliers. Three observations from Batch 3 lie at the extreme of the data scatter, but do not appear as obvious outliers. Using α = 0.05, the T2 statistic based on the common overall batch


Figure 11.17: T2 chart for batch data.

Figure 11.18: T2 mean chart of acceptable batch region.

was used to detect observations located a significant distance from the mean of (0, 0). These results are presented in the T2 chart given in Figure 11.17, with the Batch 1 data first, then the Batch 2 data, followed by the Batch 3 data. No observation has a T2 value larger than the UCL of 5.818. Also, the eighth observation of Batch 1 has the largest T2 value, though this was not obvious when examining the scatter plot given in Figure 11.15. If there is an indication of a changing covariance matrix among the different data runs or batches, a test of the hypothesis of equality of the covariance matrices may be performed (e.g., see Anderson (1984)). Rejection of the null hypothesis of equal group covariance matrices would imply that different MVN distributions are needed to describe the different runs or batches, and that the data cannot be pooled or translated to a common group. From a practical point of view, this would imply a very unstable process with no repeatability, and each run or batch would need to


be described by a different distribution. No solution is offered for an overall control procedure for this case. To demonstrate the procedure for constructing a control region for batch production, we consider a batch process for producing a specialty plastic polymer (see Mason, Chou, and Young (2001)). A detailed chemical analysis is performed on each batch to assure that the composition of seven measured components adheres to a rigid chemical formulation. The rigid formulation is necessary for mold release when the plastic is transformed into a usable product. A preliminary data set consisting of 52 batches is used to construct an HDS. A T2 chart for the sample means is presented in Figure 11.18. Using the statistic in (11.13) with k = 52 and α = 0.001, the UCL is 20.418. Batch means 23 and 37 produce signals on the T2 mean chart. Since the customer did not remove either batch, both were included in the final calculation of the acceptable batch region.
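As a numerical check, the UCL quoted for this polymer example can be reproduced from (11.13); the snippet below assumes scipy's beta quantile function and p = 7 components:

```python
from scipy import stats

# Polymer example: k = 52 batches, p = 7 measured components, alpha = 0.001.
# UCL of (11.13): ((k - 1)^2 / k) * B(alpha; p/2, (k - p - 1)/2).
k, p, alpha = 52, 7, 0.001
ucl = (k - 1) ** 2 / k * stats.beta.ppf(1 - alpha, p / 2, (k - p - 1) / 2)
print(round(ucl, 3))  # close to the 20.418 quoted in the text
```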

11.8 Phase II Operation with Batch Processes

Throughout this book, we have presented a monitoring system based upon observing a single observation from a distribution with unknown parameters. We have noted in several instances, however, that the T2 control statistic could be based on the mean of a subgroup of (future) observations. These cases can occur with a known or an unknown target vector. We also have presented several different estimators of the covariance matrix, such as S and S_W. These many combinations have led to an assortment of possible T2 control statistics for a Phase II situation, all of which are summarized in Table 11.5 (for more details, see Mason, Chou, and Young (2001)). The T2 statistic for a future observation that is taken from a multivariate distribution with an unknown mean (target) vector and covariance matrix is given as

T2 = (X − X̄)′ S^−1 (X − X̄) ~ [p(N + 1)(N − 1)/(N(N − p))] F(p, N−p),   (11.14)

where N is the total sample size of the HDS. The common covariance matrix estimate S and the target mean vector estimate X̄ are obtained using the HDS, and F(p, N−p) denotes the F distribution with p and N − p degrees of freedom. For a given value of α and the appropriate values of N and p, we compute the UCL using

UCL = [p(N + 1)(N − 1)/(N(N − p))] F(α; p, N−p),   (11.15)

where F(α; p, N−p) is the upper αth quantile of F(p, N−p). If, for a given observation X, the T2 value does not exceed the UCL, it is concluded that control of the process is being maintained; otherwise, a signal is declared. The T2 distribution in (11.14) and the UCL in (11.15) would be appropriate for monitoring a Phase II operation for a Category 1 batch process.
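The UCL of (11.15) is a one-line computation; the function name is illustrative:

```python
from scipy import stats

def phase2_ucl(N, p, alpha):
    """UCL of (11.15): p(N + 1)(N - 1) / (N(N - p)) * F(alpha; p, N - p),
    for a single future observation with the mean and covariance
    estimated from an HDS of size N."""
    return p * (N + 1) * (N - 1) / (N * (N - p)) * stats.f.ppf(1 - alpha, p, N - p)
```

For instance, with the HDS of the Category 1 example (N = 298, p = 2), phase2_ucl(298, 2, 0.01) is approximately 9.418, consistent with the UCL of 9.4175 used in section 11.9 if α = 0.01 was chosen there (an inference; the text does not state the α used).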

Table 11.5: Phase II formulas for batch processes.

Subgroup size m = 1, target mean μT known, covariance estimator S:
  T2 = (X − μT)′ S^−1 (X − μT)
  Distribution: [p(N − 1)/(N − p)] F(p, N−p)
  UCL: [p(N − 1)/(N − p)] F(α; p, N−p)

Subgroup size m = 1, target mean μT known, covariance estimator S_W:
  T2 = (X − μT)′ S_W^−1 (X − μT)
  Distribution: [p(N − k)/(N − k − p + 1)] F(p, N−k−p+1)
  UCL: [p(N − k)/(N − k − p + 1)] F(α; p, N−k−p+1)

Subgroup size m = 1, target mean unknown (estimated by X̄), covariance estimator S:
  T2 = (X − X̄)′ S^−1 (X − X̄)
  Distribution: [p(N + 1)(N − 1)/(N(N − p))] F(p, N−p)
  UCL: [p(N + 1)(N − 1)/(N(N − p))] F(α; p, N−p)

Subgroup size m = 1, target mean unknown, covariance estimator S_W:
  T2 = (X − X̄)′ S_W^−1 (X − X̄)
  Distribution: [p(N + 1)(N − k)/(N(N − k − p + 1))] F(p, N−k−p+1)
  UCL: [p(N + 1)(N − k)/(N(N − k − p + 1))] F(α; p, N−k−p+1)

Subgroup size m > 1 (subgroup mean X̄S), target mean μT known, covariance estimator S:
  T2 = (X̄S − μT)′ S^−1 (X̄S − μT)
  Distribution: [p(N − 1)/(m(N − p))] F(p, N−p)
  UCL: [p(N − 1)/(m(N − p))] F(α; p, N−p)

Subgroup size m > 1, target mean μT known, covariance estimator S_W:
  T2 = (X̄S − μT)′ S_W^−1 (X̄S − μT)
  Distribution: [p(N − k)/(m(N − k − p + 1))] F(p, N−k−p+1)
  UCL: [p(N − k)/(m(N − k − p + 1))] F(α; p, N−k−p+1)

Subgroup size m > 1, target mean unknown, covariance estimator S:
  T2 = (X̄S − X̄)′ S^−1 (X̄S − X̄)
  Distribution: [p(m + N)(N − 1)/(mN(N − p))] F(p, N−p)
  UCL: [p(m + N)(N − 1)/(mN(N − p))] F(α; p, N−p)

Subgroup size m > 1, target mean unknown, covariance estimator S_W:
  T2 = (X̄S − X̄)′ S_W^−1 (X̄S − X̄)
  Distribution: [p(m + N)(N − k)/(mN(N − k − p + 1))] F(p, N−k−p+1)
  UCL: [p(m + N)(N − k)/(mN(N − k − p + 1))] F(α; p, N−k−p+1)

The changes in the distribution of the T2 statistic when using the estimator S_W are given by

T2 = (X − X̄)′ S_W^−1 (X − X̄) ~ [p(N + 1)(N − k)/(N(N − k − p + 1))] F(p, N−k−p+1)

and

UCL = [p(N + 1)(N − k)/(N(N − k − p + 1))] F(α; p, N−k−p+1).   (11.16)

The T2 distribution and the UCL given in (11.16) would be appropriate for monitoring a Phase II operation for a Category 2 batch process. This statistic also can be used to monitor a Category 1 batch process, but it produces a more conservative control region. The changes that occur in the two statistics when a target mean vector μT is specified are given as

T2 = (X − μT)′ S^−1 (X − μT) ~ [p(N − 1)/(N − p)] F(p, N−p),

where S is again obtained from the HDS. For a given value of α, the UCL is computed using

UCL = [p(N − 1)/(N − p)] F(α; p, N−p).

When the target mean vector is specified but the estimator S_W is used, the T2 statistic and its distribution are given as

T2 = (X − μT)′ S_W^−1 (X − μT) ~ [p(N − k)/(N − k − p + 1)] F(p, N−k−p+1),

and the UCL is given as

UCL = [p(N − k)/(N − k − p + 1)] F(α; p, N−k−p+1).

When inference is based on the monitoring of a subgroup mean X̄S of a future sample of m observations, noted changes occur in both the distribution and the UCL of the T2 statistic. These formulas are included in Table 11.5 along with the distributions for the case where the subgroup size is 1.
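The case analysis of Table 11.5 can be collected into one small helper. The sketch below returns the UCL directly; the function name and interface are illustrative, and it simply assembles the multipliers quoted above:

```python
from scipy import stats

def t2_phase2_ucl(N, p, alpha, k=1, m=1, target_known=False, use_sw=False):
    """UCL for the Phase II T2 statistic, following the pattern of Table 11.5.

    N: HDS sample size; p: number of variables; k: number of batches
    (needed only when the pooled estimator S_W is used); m: subgroup size.
    """
    if use_sw:
        df2 = N - k - p + 1
        top = p * (N - k) if target_known else p * (m + N) * (N - k) / N
    else:
        df2 = N - p
        top = p * (N - 1) if target_known else p * (m + N) * (N - 1) / N
    c = top / (m * df2)                     # multiplier on the F quantile
    return c * stats.f.ppf(1 - alpha, p, df2)
```

With m = 1, an unknown target, and estimator S, this reduces to the UCL of (11.15); with a known target, the control region is tighter because no target-estimation uncertainty is involved.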

11.9 Example of Phase II Operation

As an example of a Phase II operation for a Category 1 batch process, consider the data example introduced earlier in this chapter. The T2 values observed for approximately 50 new observations are presented in Figure 11.19. Since the T2 values


Figure 11.19: T2 chart for a Phase II operation.

of all observations are well below the UCL of 9.4175, the process appears to be in control. However, closer inspection of the chart reveals a definite linear trend in the T2 values. An inspection of the data leads to the cause of this pattern. Consider the scatter plot presented in Figure 11.20. When it is compared to the scatter plot of the HDS given in Figure 11.11, or to the scatter plots of the individual batch data given in Figures 11.8-11.10, the reason becomes obvious: the process is operating in only a portion (i.e., the lower left-hand corner of the plot in Figure 11.20) of the variable range spanned by the HDS.

Figure 11.20: Scatter plot of partial batch run.


Figure 11.21: Time-sequence plot for variable x1.

Figure 11.22: Time-sequence plot for variable x2.

Further investigation confirms this conclusion and also gives a strong indication, as does the T2 chart, that the entire process is moving beyond the operational region of the variables. This is exhibited in the time-sequence plots of the individual variables that are presented in Figures 11.21 and 11.22. From this analysis, it is concluded that the process must be immediately recentered; otherwise the noted drift in both variables will lead to upset conditions.


11.10 Summary

When monitoring batch processes, the problems of outlier detection, covariance estimation, and batch mean differences are interrelated. To identify outliers and estimate the covariance matrix, we recommend translating the data from the different batches to the origin prior to analysis. This is achieved by subtracting the individual batch mean from the batch observations. With this translation, outliers can be removed using the procedures identified in Chapter 5. To detect batches with atypical means, we recommend testing for mean batch differences following the procedures described in this chapter.
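The recommended translation step can be sketched as a minimal helper (the function name is illustrative):

```python
import numpy as np

def translate_to_origin(batches):
    """Center each batch at the origin by subtracting its own mean vector,
    then stack the centered batches into one data set for outlier
    screening and covariance estimation."""
    centered = [np.asarray(B, dtype=float) - np.asarray(B, dtype=float).mean(axis=0)
                for B in batches]
    return np.vstack(centered)
```

The pooled statistics computed from the translated data reflect the within-batch relationships, whereas statistics computed from the raw combined data can be distorted by batch-mean separation, as seen in Table 11.4.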

Old Blue: Epilogue


As you walk out of your office with your boss, you explain how you used multivariate statistical process control to locate the cause of the increased fuel usage on Old Blue. You add that this would be an excellent tool to use in real-time applications within the unit. You also ask permission to make a presentation at the upcoming staff meeting on what you've learned from reading this new book on multivariate statistical process control. The boss notices the book in your hand and asks who wrote it. You glance at the names of the authors, and comment: Mason and Young. Then it all connects. That old statistics professor wasn't named Dr. Old . . . his name was Dr. Young.

Appendix

Distribution Tables

Table A.1. Standard normal cumulative probabilities.
Table A.2. Percentage points of the Student t distribution.
Table A.3. Percentage points of the chi-square distribution.
Table A.4. Percentage points of the F distribution.
Table A.5. Percentage points of the beta distribution.


Table A.1: Standard normal cumulative probabilities.*

 z     0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
0.0   0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1   0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2   0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3   0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4   0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5   0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6   0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7   0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.8   0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.9   0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1.0   0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.1   0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1.2   0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.3   0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.4   0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
1.5   0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
1.6   0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545
1.7   0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633
1.8   0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706
1.9   0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767
2.0   0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
2.1   0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857
2.2   0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890
2.3   0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916
2.4   0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936
2.5   0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952
2.6   0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964
2.7   0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974
2.8   0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981
2.9   0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986
3.0   0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990
3.1   0.9990 0.9991 0.9991 0.9991 0.9992 0.9992 0.9992 0.9992 0.9993 0.9993
3.2   0.9993 0.9993 0.9994 0.9994 0.9994 0.9994 0.9994 0.9995 0.9995 0.9995
3.3   0.9995 0.9995 0.9995 0.9996 0.9996 0.9996 0.9996 0.9996 0.9996 0.9997
3.4   0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9998
3.5   0.9998 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998
3.6   0.9998 0.9998 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999
3.7   0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999
3.8   0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999
3.9   1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000

* Entries in the table are the probability that a standard normal variate is less than or equal to the given z value.
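Entries of Table A.1 can be reproduced with any statistics library; for example, with scipy:

```python
from scipy import stats

# Spot-check a few entries of Table A.1 (z = row value plus column increment).
checks = [(0.00, 0.5000), (1.00, 0.8413), (1.96, 0.9750), (3.09, 0.9990)]
for z, table_value in checks:
    assert abs(stats.norm.cdf(z) - table_value) < 1e-4
print("spot-checked table entries match to four decimals")
```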


Table A.2: Percentage points of the Student t distribution.*

                                      Alpha
 DF    0.2    0.15    0.1    0.05   0.025    0.01   0.005   0.001  0.0005
  1   1.376  1.963  3.078   6.314  12.706  31.821  63.656  318.29  636.58
  2   1.061  1.386  1.886   2.920   4.303   6.965   9.925  22.328  31.600
  3   0.978  1.250  1.638   2.353   3.182   4.541   5.841  10.214  12.924
  4   0.941  1.190  1.533   2.132   2.776   3.747   4.604   7.173   8.610
  5   0.920  1.156  1.476   2.015   2.571   3.365   4.032   5.894   6.869
  6   0.906  1.134  1.440   1.943   2.447   3.143   3.707   5.208   5.959
  7   0.896  1.119  1.415   1.895   2.365   2.998   3.499   4.785   5.408
  8   0.889  1.108  1.397   1.860   2.306   2.896   3.355   4.501   5.041
  9   0.883  1.100  1.383   1.833   2.262   2.821   3.250   4.297   4.781
 10   0.879  1.093  1.372   1.812   2.228   2.764   3.169   4.144   4.587
 11   0.876  1.088  1.363   1.796   2.201   2.718   3.106   4.025   4.437
 12   0.873  1.083  1.356   1.782   2.179   2.681   3.055   3.930   4.318
 13   0.870  1.079  1.350   1.771   2.160   2.650   3.012   3.852   4.221
 14   0.868  1.076  1.345   1.761   2.145   2.624   2.977   3.787   4.140
 15   0.866  1.074  1.341   1.753   2.131   2.602   2.947   3.733   4.073
 16   0.865  1.071  1.337   1.746   2.120   2.583   2.921   3.686   4.015
 17   0.863  1.069  1.333   1.740   2.110   2.567   2.898   3.646   3.965
 18   0.862  1.067  1.330   1.734   2.101   2.552   2.878   3.610   3.922
 19   0.861  1.066  1.328   1.729   2.093   2.539   2.861   3.579   3.883
 20   0.860  1.064  1.325   1.725   2.086   2.528   2.845   3.552   3.850
 21   0.859  1.063  1.323   1.721   2.080   2.518   2.831   3.527   3.819
 22   0.858  1.061  1.321   1.717   2.074   2.508   2.819   3.505   3.792
 23   0.858  1.060  1.319   1.714   2.069   2.500   2.807   3.485   3.768
 24   0.857  1.059  1.318   1.711   2.064   2.492   2.797   3.467   3.745
 25   0.856  1.058  1.316   1.708   2.060   2.485   2.787   3.450   3.725
 26   0.856  1.058  1.315   1.706   2.056   2.479   2.779   3.435   3.707
 27   0.855  1.057  1.314   1.703   2.052   2.473   2.771   3.421   3.689
 28   0.855  1.056  1.313   1.701   2.048   2.467   2.763   3.408   3.674
 29   0.854  1.055  1.311   1.699   2.045   2.462   2.756   3.396   3.660
 30   0.854  1.055  1.310   1.697   2.042   2.457   2.750   3.385   3.646
 40   0.851  1.050  1.303   1.684   2.021   2.423   2.704   3.307   3.551
 60   0.848  1.045  1.296   1.671   2.000   2.390   2.660   3.232   3.460
 90   0.846  1.042  1.291   1.662   1.987   2.368   2.632   3.183   3.402
120   0.845  1.041  1.289   1.658   1.980   2.358   2.617   3.160   3.373
 ∞    0.842  1.036  1.282   1.645   1.960   2.326   2.576   3.090   3.291

* Entries in the table are the t values for an area (Alpha probability) in the upper tail of the Student t distribution for the given degrees of freedom (DF).

Table A.3: Percentage points of the chi-square distribution.*

                                           Alpha
 DF   0.999  0.995   0.99  0.975   0.95    0.9     0.1    0.05   0.025    0.01   0.005   0.001
  1    0.00   0.00   0.00   0.00   0.00   0.02    2.71    3.84    5.02    6.63    7.88   10.83
  2    0.00   0.01   0.02   0.05   0.10   0.21    4.61    5.99    7.38    9.21   10.60   13.82
  3    0.02   0.07   0.11   0.22   0.35   0.58    6.25    7.81    9.35   11.34   12.84   16.27
  4    0.09   0.21   0.30   0.48   0.71   1.06    7.78    9.49   11.14   13.28   14.86   18.47
  5    0.21   0.41   0.55   0.83   1.15   1.61    9.24   11.07   12.83   15.09   16.75   20.51
  6    0.38   0.68   0.87   1.24   1.64   2.20   10.64   12.59   14.45   16.81   18.55   22.46
  7    0.60   0.99   1.24   1.69   2.17   2.83   12.02   14.07   16.01   18.48   20.28   24.32
  8    0.86   1.34   1.65   2.18   2.73   3.49   13.36   15.51   17.53   20.09   21.95   26.12
  9    1.15   1.73   2.09   2.70   3.33   4.17   14.68   16.92   19.02   21.67   23.59   27.88
 10    1.48   2.16   2.56   3.25   3.94   4.87   15.99   18.31   20.48   23.21   25.19   29.59
 11    1.83   2.60   3.05   3.82   4.57   5.58   17.28   19.68   21.92   24.73   26.76   31.26
 12    2.21   3.07   3.57   4.40   5.23   6.30   18.55   21.03   23.34   26.22   28.30   32.91
 13    2.62   3.57   4.11   5.01   5.89   7.04   19.81   22.36   24.74   27.69   29.82   34.53
 14    3.04   4.07   4.66   5.63   6.57   7.79   21.06   23.68   26.12   29.14   31.32   36.12
 15    3.48   4.60   5.23   6.26   7.26   8.55   22.31   25.00   27.49   30.58   32.80   37.70
 16    3.94   5.14   5.81   6.91   7.96   9.31   23.54   26.30   28.85   32.00   34.27   39.25
 17    4.42   5.70   6.41   7.56   8.67  10.09   24.77   27.59   30.19   33.41   35.72   40.79
 18    4.90   6.26   7.01   8.23   9.39  10.86   25.99   28.87   31.53   34.81   37.16   42.31
 19    5.41   6.84   7.63   8.91  10.12  11.65   27.20   30.14   32.85   36.19   38.58   43.82
 20    5.92   7.43   8.26   9.59  10.85  12.44   28.41   31.41   34.17   37.57   40.00   45.31
 21    6.45   8.03   8.90  10.28  11.59  13.24   29.62   32.67   35.48   38.93   41.40   46.80
 22    6.98   8.64   9.54  10.98  12.34  14.04   30.81   33.92   36.78   40.29   42.80   48.27
 23    7.53   9.26  10.20  11.69  13.09  14.85   32.01   35.17   38.08   41.64   44.18   49.73
 24    8.08   9.89  10.86  12.40  13.85  15.66   33.20   36.42   39.36   42.98   45.56   51.18
 25    8.65  10.52  11.52  13.12  14.61  16.47   34.38   37.65   40.65   44.31   46.93   52.62
 26    9.22  11.16  12.20  13.84  15.38  17.29   35.56   38.89   41.92   45.64   48.29   54.05
 27    9.80  11.81  12.88  14.57  16.15  18.11   36.74   40.11   43.19   46.96   49.65   55.48
 28   10.39  12.46  13.56  15.31  16.93  18.94   37.92   41.34   44.46   48.28   50.99   56.89
 29   10.99  13.12  14.26  16.05  17.71  19.77   39.09   42.56   45.72   49.59   52.34   58.30
 30   11.59  13.79  14.95  16.79  18.49  20.60   40.26   43.77   46.98   50.89   53.67   59.70
 40   17.92  20.71  22.16  24.43  26.51  29.05   51.81   55.76   59.34   63.69   66.77   73.40
 50   24.67  27.99  29.71  32.36  34.76  37.69   63.17   67.50   71.42   76.15   79.49   86.66
 60   31.74  35.53  37.48  40.48  43.19  46.46   74.40   79.08   83.30   88.38   91.95   99.61
 70   39.04  43.28  45.44  48.76  51.74  55.33   85.53   90.53   95.02  100.43  104.21  112.32
 80   46.52  51.17  53.54  57.15  60.39  64.28   96.58  101.88  106.63  112.33  116.32  124.84
 90   54.16  59.20  61.75  65.65  69.13  73.29  107.57  113.15  118.14  124.12  128.30  137.21
100   61.92  67.33  70.06  74.22  77.93  82.36  118.50  124.34  129.56  135.81  140.17  149.45
150  102.11 109.14 112.67 117.98 122.69 128.28  172.58  179.58  185.80  193.21  198.36  209.27
200  143.84 152.24 156.43 162.73 168.28 174.84  226.02  233.99  241.06  249.45  255.26  267.54
250  186.55 196.16 200.94 208.10 214.39 221.81  279.05  287.88  295.69  304.94  311.35  324.83
500  407.95 422.30 429.39 439.94 449.15 459.93  540.93  553.13  563.85  576.49  585.21  603.45

* Entries in the table are the chi-square values for an area (Alpha probability) in the upper tail of the chi-square distribution for the given degrees of freedom (DF).

Table A.4a: Percentage points of the F distribution at Alpha = 0.05.*

Each row below gives one numerator DF; the entries run across the denominator DF in the order 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 20, 24, 30, 60, 90, 120, 150, 200, 250, 500, ∞.

Num DF 1:    161.4 18.51 10.13 7.71 6.61 5.99 5.59 5.32 5.12 4.96 4.75 4.54 4.35 4.26 4.17 4.00 3.95 3.92 3.90 3.89 3.88 3.86 3.84
Num DF 2:    199.5 19.00 9.55 6.94 5.79 5.14 4.74 4.46 4.26 4.10 3.89 3.68 3.49 3.40 3.32 3.15 3.10 3.07 3.06 3.04 3.03 3.01 3.00
Num DF 3:    215.7 19.16 9.28 6.59 5.41 4.76 4.35 4.07 3.86 3.71 3.49 3.29 3.10 3.01 2.92 2.76 2.71 2.68 2.66 2.65 2.64 2.62 2.60
Num DF 4:    224.6 19.25 9.12 6.39 5.19 4.53 4.12 3.84 3.63 3.48 3.26 3.06 2.87 2.78 2.69 2.53 2.47 2.45 2.43 2.42 2.41 2.39 2.37
Num DF 5:    230.2 19.30 9.01 6.26 5.05 4.39 3.97 3.69 3.48 3.33 3.11 2.90 2.71 2.62 2.53 2.37 2.32 2.29 2.27 2.26 2.25 2.23 2.21
Num DF 6:    234.0 19.33 8.94 6.16 4.95 4.28 3.87 3.58 3.37 3.22 3.00 2.79 2.60 2.51 2.42 2.25 2.20 2.18 2.16 2.14 2.13 2.12 2.01
Num DF 7:    236.8 19.35 8.89 6.09 4.88 4.21 3.79 3.50 3.29 3.14 2.91 2.71 2.51 2.42 2.33 2.17 2.11 2.09 2.07 2.06 2.05 2.03 1.94
Num DF 8:    238.9 19.37 8.85 6.04 4.82 4.15 3.73 3.44 3.23 3.07 2.85 2.64 2.45 2.36 2.27 2.10 2.04 2.02 2.00 1.98 1.98 1.96 1.88
Num DF 9:    240.5 19.38 8.81 6.00 4.77 4.10 3.68 3.39 3.18 3.02 2.80 2.59 2.39 2.30 2.21 2.04 1.99 1.96 1.94 1.93 1.92 1.90 1.83
Num DF 10:   241.9 19.40 8.79 5.96 4.74 4.06 3.64 3.35 3.14 2.98 2.75 2.54 2.35 2.25 2.16 1.99 1.94 1.91 1.89 1.88 1.87 1.85 1.75
Num DF 12:   243.9 19.41 8.74 5.91 4.68 4.00 3.57 3.28 3.07 2.91 2.69 2.48 2.28 2.18 2.09 1.92 1.86 1.83 1.82 1.80 1.79 1.77 1.67
Num DF 15:   245.9 19.43 8.70 5.86 4.62 3.94 3.51 3.22 3.01 2.85 2.62 2.40 2.20 2.11 2.01 1.84 1.78 1.75 1.73 1.72 1.71 1.69 1.57
Num DF 20:   248.0 19.45 8.66 5.80 4.56 3.87 3.44 3.15 2.94 2.77 2.54 2.33 2.12 2.03 1.93 1.75 1.69 1.66 1.64 1.62 1.61 1.59 1.52
Num DF 24:   249.0 19.45 8.64 5.77 4.53 3.84 3.41 3.12 2.90 2.74 2.51 2.29 2.08 1.98 1.89 1.70 1.64 1.61 1.59 1.57 1.56 1.54 1.46
Num DF 30:   250.1 19.46 8.62 5.75 4.50 3.81 3.38 3.08 2.86 2.70 2.47 2.25 2.04 1.94 1.84 1.65 1.59 1.55 1.54 1.52 1.50 1.48 1.32
Num DF 60:   252.2 19.48 8.57 5.69 4.43 3.74 3.30 3.01 2.79 2.62 2.38 2.16 1.95 1.84 1.74 1.53 1.46 1.43 1.41 1.39 1.37 1.35 1.38
Num DF 90:   252.9 19.48 8.56 5.67 4.41 3.72 3.28 2.98 2.76 2.59 2.36 2.13 1.91 1.81 1.70 1.49 1.42 1.38 1.36 1.33 1.32 1.29 1.20
Num DF 120:  253.2 19.49 8.55 5.66 4.40 3.70 3.27 2.97 2.75 2.58 2.34 2.11 1.90 1.79 1.68 1.47 1.39 1.35 1.33 1.30 1.29 1.26 1.22
Num DF 150:  253.5 19.49 8.54 5.65 4.39 3.70 3.26 2.96 2.74 2.57 2.33 2.10 1.89 1.78 1.67 1.45 1.38 1.33 1.31 1.28 1.27 1.23 1.20
Num DF 200:  253.7 19.49 8.54 5.65 4.39 3.69 3.25 2.95 2.73 2.56 2.32 2.10 1.88 1.77 1.66 1.44 1.36 1.32 1.29 1.26 1.25 1.21 1.17
Num DF 250:  253.8 19.49 8.54 5.64 4.38 3.69 3.25 2.95 2.73 2.56 2.32 2.09 1.87 1.76 1.65 1.43 1.35 1.30 1.28 1.25 1.23 1.19 1.15
Num DF 500:  254.1 19.49 8.53 5.64 4.37 3.68 3.24 2.94 2.72 2.55 2.31 2.08 1.86 1.75 1.64 1.41 1.33 1.28 1.25 1.22 1.20 1.16 1.13
Num DF ∞:    254.3 19.50 8.53 5.63 4.36 3.67 3.23 2.93 2.71 2.54 2.30 2.07 1.84 1.73 1.62 1.39 1.30 1.25 1.22 1.19 1.17 1.11 1.00

* Entries in the table are the F values for an area (Alpha probability) in the upper tail of the F distribution for the given denominator and numerator degrees of freedom (DF).

Table A.4b: Percentage points of the F distribution at Alpha = 0.025.*

Each row below gives one numerator DF; the entries run across the denominator DF in the order 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 20, 24, 30, 60, 90, 120, 150, 200, 250, 500, ∞.

Num DF 1:    647.8 38.51 17.44 12.22 10.01 8.81 8.07 7.57 7.21 6.94 6.55 6.20 5.87 5.72 5.57 5.29 5.20 5.15 5.13 5.10 5.08 5.05 5.02
Num DF 2:    799.5 39.00 16.04 10.65 8.43 7.26 6.54 6.06 5.71 5.46 5.10 4.77 4.46 4.32 4.18 3.93 3.84 3.80 3.78 3.76 3.74 3.72 3.69
Num DF 3:    864.2 39.17 15.44 9.98 7.76 6.60 5.89 5.42 5.08 4.83 4.47 4.15 3.86 3.72 3.59 3.34 3.26 3.23 3.20 3.18 3.17 3.14 3.12
Num DF 4:    899.6 39.25 15.10 9.60 7.39 6.23 5.52 5.05 4.72 4.47 4.12 3.80 3.51 3.38 3.25 3.01 2.93 2.89 2.87 2.85 2.84 2.81 2.79
Num DF 5:    921.8 39.30 14.88 9.36 7.15 5.99 5.29 4.82 4.48 4.24 3.89 3.58 3.29 3.15 3.03 2.79 2.71 2.67 2.65 2.63 2.62 2.59 2.57
Num DF 6:    937.1 39.33 14.73 9.20 6.98 5.82 5.12 4.65 4.32 4.07 3.73 3.41 3.13 2.99 2.87 2.63 2.55 2.52 2.49 2.47 2.46 2.43 2.41
Num DF 7:    948.2 39.36 14.62 9.07 6.85 5.70 4.99 4.53 4.20 3.95 3.61 3.29 3.01 2.87 2.75 2.51 2.43 2.39 2.37 2.35 2.34 2.31 2.29
Num DF 8:    956.6 39.37 14.54 8.98 6.76 5.60 4.90 4.43 4.10 3.85 3.51 3.20 2.91 2.78 2.65 2.41 2.34 2.30 2.28 2.26 2.24 2.22 2.19
Num DF 9:    963.3 39.39 14.47 8.90 6.68 5.52 4.82 4.36 4.03 3.78 3.44 3.12 2.84 2.70 2.57 2.33 2.26 2.22 2.20 2.18 2.16 2.14 2.11
Num DF 10:   968.3 39.40 14.42 8.84 6.62 5.46 4.76 4.30 3.96 3.72 3.37 3.06 2.77 2.64 2.51 2.27 2.19 2.16 2.13 2.11 2.10 2.07 2.05
Num DF 12:   976.2 39.41 14.34 8.75 6.52 5.37 4.67 4.20 3.87 3.62 3.28 2.96 2.68 2.54 2.41 2.17 2.09 2.05 2.03 2.01 2.00 1.97 1.94
Num DF 15:   984.9 39.43 14.25 8.66 6.43 5.27 4.57 4.10 3.77 3.52 3.18 2.86 2.57 2.44 2.31 2.06 1.98 1.94 1.92 1.90 1.89 1.86 1.83
Num DF 20:   993.1 39.45 14.17 8.56 6.33 5.17 4.47 4.00 3.67 3.42 3.07 2.76 2.46 2.33 2.20 1.94 1.86 1.82 1.80 1.78 1.76 1.74 1.74
Num DF 24:   997.3 39.46 14.12 8.51 6.28 5.12 4.41 3.95 3.61 3.37 3.02 2.70 2.41 2.27 2.14 1.88 1.80 1.76 1.74 1.71 1.70 1.67 1.64
Num DF 30:   1001 39.46 14.08 8.46 6.23 5.07 4.36 3.89 3.56 3.31 2.96 2.64 2.35 2.21 2.07 1.82 1.73 1.69 1.67 1.64 1.63 1.60 1.57
Num DF 60:   1010 39.48 13.99 8.36 6.12 4.96 4.25 3.78 3.45 3.20 2.85 2.52 2.22 2.08 1.94 1.67 1.58 1.53 1.50 1.47 1.46 1.42 1.39
Num DF 90:   1013 39.49 13.96 8.33 6.09 4.92 4.22 3.75 3.41 3.16 2.81 2.48 2.18 2.03 1.89 1.61 1.52 1.47 1.44 1.41 1.39 1.35 1.31
Num DF 120:  1014 39.49 13.95 8.31 6.07 4.90 4.20 3.73 3.39 3.14 2.79 2.46 2.16 2.01 1.87 1.58 1.48 1.43 1.40 1.37 1.35 1.31 1.30
Num DF 150:  1015 39.49 13.94 8.30 6.06 4.89 4.19 3.72 3.38 3.13 2.78 2.45 2.14 2.00 1.85 1.56 1.46 1.41 1.38 1.35 1.33 1.28 1.24
Num DF 200:  1016 39.49 13.93 8.29 6.05 4.88 4.18 3.70 3.37 3.12 2.76 2.44 2.13 1.98 1.84 1.54 1.44 1.39 1.35 1.32 1.30 1.25 1.21
Num DF 250:  1016 39.49 13.92 8.28 6.04 4.88 4.17 3.70 3.36 3.11 2.76 2.43 2.12 1.97 1.83 1.53 1.43 1.37 1.34 1.30 1.28 1.24 1.18
Num DF 500:  1017 39.50 13.91 8.27 6.03 4.86 4.16 3.68 3.35 3.09 2.74 2.41 2.10 1.95 1.81 1.51 1.40 1.34 1.31 1.27 1.24 1.19 1.13
Num DF ∞:    1018 39.50 13.90 8.26 6.02 4.85 4.14 3.67 3.33 3.08 2.72 2.40 2.09 1.94 1.79 1.48 1.37 1.31 1.27 1.23 1.20 1.14 1.00

* Entries in the table are the F values for an area (Alpha probability) in the upper tail of the F distribution for the given denominator and numerator degrees of freedom (DF).

Table A.4c: Percentage points of the F distribution at Alpha = 0.01.*

Numerator DF (columns): 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 20, 24, 30, 60, 90, 120, 150, 200, 250, 500, ∞
Denominator DF (rows): 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 20, 24, 30, 60, 90, 120, 150, 200, 250, 500, ∞

[Tabled values are not recoverable from the source scan.]

*Entries in the table are the F values for an area (Alpha probability) in the upper tail of the F distribution for the given denominator and numerator degrees of freedom (DF).

Table A.4d: Percentage points of the F distribution at Alpha = 0.005.*

Numerator DF (columns): 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 20, 24, 30, 60, 90, 120, 150, 200, 250, 500, ∞
Denominator DF (rows): 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 20, 24, 30, 60, 90, 120, 150, 200, 250, 500, ∞

[Tabled values are not recoverable from the source scan.]

*Entries in the table are the F values for an area (Alpha probability) in the upper tail of the F distribution for the given denominator and numerator degrees of freedom (DF). ++ F values exceed 16,000.

Table A.4e: Percentage points of the F distribution at Alpha = 0.001.*

Numerator DF (columns): 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 20, 24, 30, 60, 90, 120, 150, 200, 250, 500, ∞
Denominator DF (rows): 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 20, 24, 30, 60, 90, 120, 150, 200, 250, 500, ∞

[Tabled values are not recoverable from the source scan.]

*Entries in the table are the F values for an area (Alpha probability) in the upper tail of the F distribution for the given denominator and numerator degrees of freedom (DF). ++ F values exceed 400,000.
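Where a statistical library is available, the upper-tail F percentage points tabulated in Tables A.4b through A.4e can be computed directly instead of read from the tables. A minimal sketch, assuming SciPy is installed (the `scipy.stats.f.isf` call is the upper-tail inverse, so that P(F > value) = Alpha):

```python
# Upper-tail percentage points of the F distribution, in the layout of
# Tables A.4b-A.4e: the F value exceeded with probability alpha for the
# given numerator and denominator degrees of freedom.
# Assumes SciPy is available; the book itself uses printed tables.
from scipy.stats import f


def f_upper_point(alpha, num_df, den_df):
    """F value with upper-tail area alpha (inverse survival function)."""
    return f.isf(alpha, num_df, den_df)


# Example entries from Table A.4b (Alpha = 0.025):
print(round(f_upper_point(0.025, 1, 1), 1))  # 647.8
print(round(f_upper_point(0.025, 1, 2), 2))  # 38.51
```

This also reproduces the footnoted thresholds: at Alpha = 0.005 and 0.001 the (1, 1) entries exceed 16,000 and 400,000, respectively.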

Table A.5: Percentage points of the beta distribution.*

p/2 (columns): 0.5, 1.0, 1.5, 2.0
(n-p-1)/2 (rows): 5, 6, 7, 10, 20, 30, 40, 50, 60, 70, 90, 120, 150, 200, 250, 500
Alpha levels within each column: 0.999, 0.75, 0.10, 0.05, 0.025, 0.01, 0.005, 0.001

[Tabled values are not recoverable from the source scan.]

*Entries in the table are the beta values B(Alpha; p/2, (n-p-1)/2) for an area (Alpha probability) in the upper tail of the beta distribution for the given parameter values of p/2 and (n-p-1)/2.

Table A.5 (continued): Percentage points of the beta distribution.*

p/2 (columns): 2.5, 3.0, 3.5, 4.0
(n-p-1)/2 (rows): 9, 10, 20, 30, 40, 50, 60, 70, 90, 120, 150, 200, 250, 500
Alpha levels within each column: 0.999, 0.75, 0.10, 0.05, 0.025, 0.01, 0.005, 0.001

[Tabled values are not recoverable from the source scan.]

*Entries in the table are the beta values B(Alpha; p/2, (n-p-1)/2) for an area (Alpha probability) in the upper tail of the beta distribution for the given parameter values of p/2 and (n-p-1)/2.

Table A.5 (continued): Percentage points of the beta distribution.*

p/2 (columns): 4.5, 5.0, 6.0, 7.0
(n-p-1)/2 (rows): 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 120, 150, 200, 250, 500
Alpha levels within each column: 0.999, 0.75, 0.10, 0.05, 0.025, 0.01, 0.005, 0.001

[Tabled values are not recoverable from the source scan.]

*Entries in the table are the beta values B(Alpha; p/2, (n-p-1)/2) for an area (Alpha probability) in the upper tail of the beta distribution for the given parameter values of p/2 and (n-p-1)/2.

Table A.5 (continued): Percentage points of the beta distribution.*

p/2 (columns): 8.0, 9.0, 10, 15
(n-p-1)/2 (rows): 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 120, 150, 200, 250, 500
Alpha levels within each column: 0.999, 0.75, 0.10, 0.05, 0.025, 0.01, 0.005, 0.001

[Tabled values are not recoverable from the source scan.]

*Entries in the table are the beta values B(Alpha; p/2, (n-p-1)/2) for an area (Alpha probability) in the upper tail of the beta distribution for the given parameter values of p/2 and (n-p-1)/2.

Table A.5 (continued): Percentage points of the beta distribution.*


p/2   Alpha                                           (n-p-1)/2
                 30      40      50      60      70      80      90      120     150     200     250     500
20    0.999    0.2057  0.1670  0.1406  0.1215  0.1069  0.0955  0.0863  0.0669  0.0547  0.0419  0.0339  0.0174
      0.75     0.3524  0.2913  0.2482  0.2163  0.1916  0.1720  0.1560  0.1221  0.1003  0.0772  0.0628  0.0325
      0.1      0.4893  0.4122  0.3560  0.3131  0.2795  0.2523  0.2300  0.1816  0.1501  0.1164  0.0950  0.0496
      0.05     0.5152  0.4358  0.3774  0.3326  0.2973  0.2688  0.2452  0.1941  0.1606  0.1247  0.1019  0.0533
      0.025    0.5376  0.4564  0.3962  0.3498  0.3131  0.2834  0.2587  0.2052  0.1700  0.1322  0.1081  0.0566
      0.01     0.5634  0.4804  0.4182  0.3701  0.3319  0.3007  0.2749  0.2185  0.1813  0.1411  0.1156  0.0606
      0.005    0.5809  0.4967  0.4334  0.3841  0.3448  0.3127  0.2861  0.2278  0.1891  0.1474  0.1208  0.0634
      0.001    0.6162  0.5303  0.4647  0.4132  0.3719  0.3380  0.3097  0.2474  0.2059  0.1609  0.1320  0.0695
25    0.999    0.2595  0.2139  0.1821  0.1586  0.1404  0.1260  0.1143  0.0894  0.0734  0.0566  0.0460  0.0238
      0.75     0.4089  0.3432  0.2958  0.2599  0.2318  0.2092  0.1906  0.1505  0.1243  0.0964  0.0787  0.0411
      0.1      0.5407  0.4625  0.4038  0.3583  0.3219  0.2922  0.2676  0.2134  0.1775  0.1386  0.1137  0.0598
      0.05     0.5651  0.4852  0.4248  0.3777  0.3399  0.3089  0.2831  0.2263  0.1885  0.1474  0.1210  0.0638
      0.025    0.5860  0.5049  0.4432  0.3947  0.3557  0.3236  0.2969  0.2378  0.1982  0.1552  0.1275  0.0674
      0.01     0.6100  0.5278  0.4646  0.4146  0.3743  0.3410  0.3131  0.2514  0.2099  0.1646  0.1354  0.0717
      0.005    0.6262  0.5433  0.4792  0.4283  0.3871  0.3530  0.3244  0.2608  0.2180  0.1712  0.1409  0.0747
      0.001    0.6587  0.5750  0.5093  0.4567  0.4137  0.3781  0.3480  0.2808  0.2352  0.1851  0.1526  0.0812

*Entries in the table are the beta values B(α; p/2, (n-p-1)/2) for an area (Alpha probability) in the upper tail of the beta distribution for the given parameter values of p/2 and (n-p-1)/2.
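The tabled percentage points can be reproduced numerically, which is useful when a needed combination of p/2 and (n-p-1)/2 falls outside the table. The sketch below is not from the book: it evaluates the regularized incomplete beta function with the standard continued-fraction expansion and inverts it by bisection, using only the Python standard library; the function names are illustrative.

```python
import math

def _betacf(a, b, x, eps=3e-12, maxit=200):
    # Continued-fraction evaluation used by the regularized incomplete beta.
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    if abs(d) < 1e-30:
        d = 1e-30
    d = 1.0 / d
    h = d
    for m in range(1, maxit + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        if abs(d) < 1e-30: d = 1e-30
        c = 1.0 + aa / c
        if abs(c) < 1e-30: c = 1e-30
        d = 1.0 / d
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        if abs(d) < 1e-30: d = 1e-30
        c = 1.0 + aa / c
        if abs(c) < 1e-30: c = 1e-30
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def beta_cdf(x, a, b):
    # Regularized incomplete beta function I_x(a, b), i.e. P(X <= x) for Beta(a, b).
    if x <= 0.0: return 0.0
    if x >= 1.0: return 1.0
    lbeta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    front = math.exp(a * math.log(x) + b * math.log(1.0 - x) - lbeta)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def beta_upper_point(alpha, a, b):
    # B(alpha; a, b): the point with upper-tail area alpha, found by bisection
    # on the monotone CDF, i.e. the solution of I_x(a, b) = 1 - alpha.
    lo, hi = 0.0, 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if beta_cdf(mid, a, b) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, `beta_upper_point(0.05, 20, 30)` should closely reproduce the tabled entry 0.5152 for p/2 = 20, (n-p-1)/2 = 30, and Alpha = 0.05.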


Bibliography
Agnew, J.L., and Knapp, R.C. (1995). Linear Algebra with Applications, Brooks/Cole, Pacific Grove, CA.
Alt, F.B. (1982). "Multivariate Quality Control: State of the Art," Quality Congress Transactions, American Society for Quality, Milwaukee, WI, pp. 886-893.
Alt, F.B., Deutch, S.J., and Walker, J.W. (1977). "Control Charts for Multivariate, Correlated Observations," Quality Congress Transactions, American Society for Quality, Milwaukee, WI, pp. 360-369.
Anderson, D.R., Sweeney, D.J., and Williams, T.A. (1994). Introduction to Statistics: Concepts and Applications, West Publishing Company, New York.
Anderson, T.W. (1984). An Introduction to Multivariate Statistical Analysis, 2nd ed., Wiley, New York.
Barnett, V., and Lewis, T. (1994). Outliers in Statistical Data, 3rd ed., Wiley, New York.
Belsley, D.A. (1991). Conditioning Diagnostics: Collinearity and Weak Data in Regression, Wiley, New York.
Belsley, D.A., Kuh, E., and Welsch, R.E. (1980). Regression Diagnostics: Identifying Influential Data and Sources of Collinearity, Wiley, New York.
Box, G.E.P., and Jenkins, G.M. (1976). Time Series Analysis: Forecasting and Control, Holden-Day, San Francisco, CA.
Chatterjee, S., and Price, B. (1999). Regression Analysis by Example, 3rd ed., Wiley, New York.
Chou, Y.M., Mason, R.L., and Young, J.C. (1999). "Power Comparisons for a Hotelling's T2 Statistic," Comm. Statist. Simulation Comput., 28, pp. 1031-1050.
Chou, Y.M., Mason, R.L., and Young, J.C. (2001). "The Control Chart for Individual Observations from a Multivariate Non-Normal Distribution," Comm. Statist., 30, pp. 1937-1949.

Chou, Y.M., Polansky, A.M., and Mason, R.L. (1998). "Transforming Non-Normal Data to Normality in Statistical Process Control," J. Quality Technology, 30, pp. 133-141.
Conover, W.J. (2000). Practical Nonparametric Statistics, 3rd ed., Wiley, New York.
David, H.A. (1970). Order Statistics, Wiley, New York.
Doganaksoy, N., Faltin, F.W., and Tucker, W.T. (1991). "Identification of Out-of-Control Quality Characteristics in a Multivariate Manufacturing Environment," Comm. Statist. Theory Methods, 20, pp. 2775-2790.
Dudewicz, E.J., and Mishra, S.N. (1988). Modern Mathematical Statistics, Wiley, New York.
Duncan, A.J. (1986). Quality Control and Industrial Statistics, 5th ed., Richard D. Irwin, Homewood, IL.
Fuchs, C., and Kenett, R.S. (1998). Multivariate Quality Control, Dekker, New York.
Gnanadesikan, R. (1977). Methods for Statistical Data Analysis of Multivariate Observations, Wiley, New York.
Hawkins, D.M. (1980). Identification of Outliers, Chapman and Hall, New York.
Hawkins, D.M. (1981). "A New Test for Multivariate Normality and Homoscedasticity," Technometrics, 23, pp. 105-110.
Hawkins, D.M. (1991). "Multivariate Quality Control Based on Regression-Adjusted Variables," Technometrics, 33, pp. 61-75.
Hawkins, D.M. (1993). "Regression Adjustment for Variables in Multivariate Quality Control," J. Quality Technology, 25, pp. 170-182.
Holmes, D.S., and Mergen, A.E. (1993). "Improving the Performance of the T2 Control Chart," Quality Engrg., 5, pp. 619-625.
Hotelling, H. (1931). "The Generalization of Student's Ratio," Ann. Math. Statist., 2, pp. 360-378.
Hotelling, H. (1947). "Multivariate Quality Control," in Techniques of Statistical Analysis, edited by C. Eisenhart, M.W. Hastay, and W.A. Wallis, McGraw-Hill, New York, pp. 111-184.
Jackson, J.E. (1991). A User's Guide to Principal Components, Wiley, New York.
Johnson, M.E. (1987). Multivariate Statistical Simulation, Wiley, New York.
Johnson, R.A., and Wichern, D.W. (1998). Applied Multivariate Statistical Analysis, 4th ed., Prentice-Hall, Englewood Cliffs, NJ.


Kourti, T., and MacGregor, J.F. (1996). "Multivariate SPC Methods for Process and Product Monitoring," J. Quality Technology, 28, pp. 409-428.
Kshirsagar, A.M., and Young, J.C. (1971). "Correlation Between Two Hotelling's T2," Technical Report, Department of Statistics, Southern Methodist University, Dallas, TX.
Langley, M.P., Young, J.C., Tracy, N.D., and Mason, R.L. (1995). "A Computer Program for Monitoring Multivariate Process Control," in Proceedings of the Section on Quality and Productivity, American Statistical Association, Alexandria, VA, pp. 122-123.
Little, R.J.A., and Rubin, D.B. (1987). Statistical Analysis with Missing Data, Wiley, New York.
Looney, S.W. (1995). "How to Use Tests for Univariate Normality to Assess Multivariate Normality," Amer. Statist., 49, pp. 64-70.
Mahalanobis, P.C. (1930). "On Tests and Measures of Group Divergence," J. Proc. Asiatic Soc. Bengal, 26, pp. 541-588.
Mardia, K.V., Kent, J.T., and Bibby, J.M. (1979). Multivariate Analysis, Academic Press, New York.
Mason, R.L., Champ, C.W., Tracy, N.D., Wierda, S.J., and Young, J.C. (1997). "Assessment of Multivariate Process Control Techniques," J. Quality Technology, 29, pp. 140-143.
Mason, R.L., Chou, Y.M., and Young, J.C. (2001). "Applying Hotelling's T2 Statistic to Batch Processes," J. Quality Technology, 33, pp. 466-479.
Mason, R.L., Tracy, N.D., and Young, J.C. (1995). "Decomposition of T2 for Multivariate Control Chart Interpretation," J. Quality Technology, 27, pp. 99-108.
Mason, R.L., Tracy, N.D., and Young, J.C. (1996). "Monitoring a Multivariate Step Process," J. Quality Technology, 28, pp. 39-50.
Mason, R.L., Tracy, N.D., and Young, J.C. (1997). "A Practical Approach for Interpreting Multivariate T2 Control Chart Signals," J. Quality Technology, 29, pp. 396-406.
Mason, R.L., and Young, J.C. (1997). "A Control Procedure for Autocorrelated Multivariate Processes," in Proceedings of the Section on Quality and Productivity, American Statistical Association, Alexandria, VA, pp. 143-145.
Mason, R.L., and Young, J.C. (1999). "Improving the Sensitivity of the T2 Statistic in Multivariate Process Control," J. Quality Technology, 31, pp. 155-165.
Mason, R.L., and Young, J.C. (2000). "Autocorrelation in Multivariate Processes," in Statistical Monitoring and Optimization for Process Control, edited by S. Park and G. Vining, Marcel Dekker, New York, pp. 223-240.


Montgomery, D.C., and Mastrangelo, C.M. (1991). "Some Statistical Process Control Methods for Autocorrelated Data (with Discussion)," J. Quality Technology, 23, pp. 179-204.
Montgomery, D.C. (2001). Introduction to Statistical Quality Control, 5th ed., Wiley, New York.
Morrison, D.F. (1990). Multivariate Statistical Methods, 3rd ed., McGraw-Hill, New York.
Myers, R.H. (1990). Classical and Modern Regression with Applications, 2nd ed., Duxbury Press, Boston, MA.
Myers, R.H., and Milton, J. (1991). A First Course in the Theory of Linear Statistical Models, PWS-Kent, Boston, MA.
Polansky, A.M., and Baker, E.R. (2000). "Multistage Plug-In Bandwidth Selection for Kernel Distribution Function Estimates," J. Statist. Comput. Simulation, 65, pp. 63-80.
Rencher, A.C. (1993). "The Contribution of Individual Variables to Hotelling's T2, Wilks' Λ, and R2," Biometrics, 49, pp. 479-489.
Runger, G.C., Alt, F.B., and Montgomery, D.C. (1996). "Contributors to a Multivariate Statistical Process Control Chart Signal," Comm. Statist. Theory Methods, 25, pp. 2203-2213.
Ryan, T.P. (2000). Statistical Methods for Quality Improvement, 2nd ed., Wiley, New York.
Scholz, F.W., and Tosch, T.J. (1994). "Small Sample Uni- and Multivariate Control Charts for Means," in Proceedings of the American Statistical Association, Quality and Productivity Section, American Statistical Association, Alexandria, VA, pp. 17-22.
Seber, G.A.F. (1984). Multivariate Observations, Wiley, New York.
Sharma, S. (1995). Applied Multivariate Techniques, Wiley, New York.
Sullivan, J.H., and Woodall, W.H. (1996). "A Comparison of Multivariate Control Charts for Individual Observations," J. Quality Technology, 28, pp. 398-408.
Sullivan, J.H., and Woodall, W.H. (2000). "Change-Point Detection of Mean Vector or Covariance Matrix Shifts Using Multivariate Individual Observations," IIE Trans., 32, pp. 537-549.
Timm, N.H. (1996). "Multivariate Quality Control Using Finite Intersection Tests," J. Quality Technology, 28, pp. 233-243.
Tracy, N.D., Young, J.C., and Mason, R.L. (1992). "Multivariate Control Charts for Individual Observations," J. Quality Technology, 24, pp. 88-95.


Velilla, S. (1993). "A Note on the Multivariate Box-Cox Transformation to Normality," Statist. Probab. Lett., 17, pp. 259-263.
Wade, M.R., and Woodall, W.H. (1993). "A Review and Analysis of Cause-Selecting Control Charts," J. Quality Technology, 25, pp. 161-169.
Wierda, S.J. (1994). Multivariate Statistical Process Control, Groningen Theses in Economics, Management and Organization, Wolters-Noordhoff, Groningen, the Netherlands.
Williams, B. (1978). A Sampler on Sampling, Wiley, New York.
Wishart, J. (1928). "The Generalized Product Moment Distribution in Samples from a Normal Multivariate Population," Biometrika, 20A, pp. 32-52.


Index
abrupt process change, 174
acceptance region, 132
advanced process control (APC), 2, 3
alternative covariance estimators, 26
analysis-of-variance table, 203
Anderson-Darling test, 34
APC systems, see advanced process control
AR(1) model, see first-order autoregressive model
AR(2) model, see second-order autoregressive model
ARL, see average run length
autocorrelated observations, 55, 193, 194
autocorrelated processes, 78
autocorrelation, 69, 193, 196, 198
  detecting, 68, 71-73
  function, 72
  occurrence, 69
  pattern, 194
  sample estimator, 72, 78
  useful indication of, 76
autoregressive function, 72, 202, 206
autoregressive model, 193, 212
autoregressive residuals, 193
average run length (ARL)
  for a control procedure, 111
  probability of detecting given shifts in, 114
backward-elimination scheme, 156
batch processes, 10, 27, 213
  category 1, 213, 221
    control regions for, 215
    outlier removal for, 219
  category 2, 216, 226
    control region for, 217
    outlier removal for, 226
  estimation in, 217
  monitoring, 213, 234
  Phase II formulas for, 231
  Phase II operation with, 230
  T2 control procedure for, 213
  types of, 213
batch production, 230
beta distribution, 24, 34, 50, 85, 247
beta quantiles, 41
between-batch variation, 218, 226
bivariate control region, 117, 120
  geometrical representation, 145
bivariate normal
  contours, 18, 19, 35
  distribution, 140, 141
  probability function, 17, 19
cause-and-effect relationship, 69, 196
cause-selecting (CS) chart, 132
center line (CL), 111
central limit theorem, 107
Chebyshev procedure, 96
  application of, 48
  in a T2 control chart, 48
  procedure based on, 92
  theorem, 51, 92
chi-square
  distribution, 21, 23, 85, 113, 241
  probability function, 23
CI, see confidence interval
collinearity, 55, 80, 174
  detecting, 65
  occurrence, 65
  outliers, 65
  sampling restrictions, 65
  severe, 67, 78, 182
  theoretical relationships, 65
condition index, 67, 79
condition number, 67
conditional distribution, 148, 159, 172
conditional mean, 126, 141, 201
conditional probability function, 140
conditional T2, 126, 129, 149
conditional terms, 157
  alternative forms of, 172
  signaling of, 169
conditional variance, 126, 129, 141, 148, 162, 173
continuous process, 213
contours, 18
correlogram, 72, 206, 212
countercorrelation, 129, 159, 161, 169
covariance estimator
  alternative, 27
  based on successive difference, 26
  by partitioning the data, 27
  common, 27
  for batches, 218
covariance matrix, 20
  known, 91
  nonsingular, 55
  sample, 21, 29, 65
  singular, 55, 65, 78
  test of equality, 229
cyclic trend, 197
data
  autocorrelated, 70, 211
  collection, 61
  example, 92, 135
  exploration techniques, 183
  in-control set, 54
  incomplete, 62
  missing, 61, 62
  preliminary set, 78
  purging, 85
  time-adjusted, 208
  transforming, 174
decay processes, 98
  stage, 70, 211, 212
  uniform, 69, 199, 201, 211, 212
determinant, 30
dimension-reduction technique, 80
discrete variables, 50
distance measure, see statistical distance
distribution-free methods, 48, 81
eigenvalues, 78, 79
eigenvectors, 78, 79
electronic data collectors, 9, 61
elliptical
  contour, 18-20
  control region, 115, 117
  distance, 8
equal-incremental-rate formulation, 183
error residuals, 172
Euclidean distance, 6, 13-15, 21, 121, 137
experimental design, 11
expert knowledge, 180
exploring process data, 192
exponential autoregressive relationship, 202
F distribution, 22, 24, 50, 85, 242
false alarm rate, 98
first-order autoregressive model AR(1), 72, 199, 203, 206
forward-iterative scheme, 155
functional form of variables, 64, 173
functional relationships, 173, 191
global variable specifications, 183
goodness-of-fit test, 36, 51
gradual process shifts
  improving sensitivity, 188
graphical output, 192
Historical data set (HDS), 22, 78, 176
  analyzing, 174
  atypical observations, 81
  construction, 54-56
  outlier in, 85
hyperellipsoid, 118, 158
hypothesis test, 83, 229
industrial process, 55
input/output (I/O) curve, 180
JMP, 11
Johnson system, 47
kernel smoothing technique, 48, 51, 92, 94, 96
  for approximation to UCL, 48
Kolmogorov-Smirnov test, 33
kurtosis, 38, 43
  Mardia's statistic, 39
  sample estimate, 39
lag relationships, 60
lag values, 72, 209
LCL, see lower control limit
life cycle, 69
log sheets, 61
lower control limit (LCL), 4
lurking variable, 196, 199, 226
Mahalanobis's distance, 8, 13
matrix, 28, 31
  collinearity, 65
  correlation, 202
  determinant of, 79
  eigenvalues, 65
  eigenvectors, 65
  inverse, 29, 66
  near-singular, 66, 67, 79
  nonsingular, 30, 65
  positive definite, 31, 32, 79, 144
  square, 30
  symmetric, 30, 79
  transpose, 29
mean of a sample
  monitoring of, 26, 86, 119, 232
mean vector, 20, 29
  estimate, 162
  known, 91
methods of detection, 212
missing data, 62, 154
model creation
  using data exploration, 183
  using expert knowledge, 180
model specification, 172
  correct, 180
multivariate control procedure, 5, 6, 14
  characteristics of, 9
multivariate normal (MVN)
  assessment of, 34
  assumption, 33
  distribution, 23, 31, 33, 85
  probability function, 20
multivariate observation, 14
multivariate procedures, 9
multivariate process, 5
multivariate transformation, 47
MVN, see multivariate normal
MYT decomposition, 125, 132, 142
  components, 125, 127, 153, 160
  computational scheme, 163
  computing the terms, 149
  distribution of the components, 131
  general procedure, 147
  largeness of the components, 131
  locating signaling variables, 155
  properties of, 152
  shortcut approach, 149
  total, 133
noncentral chi-square, 114
noncentrality parameter, 114
nonnormal distribution, 33, 34
  multivariate, 37, 40
normal variate, 21
  distribution, 239
  transformation, 47
on-line experimentation, 10, 172
orthogonal decomposition, 120, 125
orthogonal transformation, 123
out-of-control point, 9, 119
out-of-control variables, 159
outlier, 47, 54, 81
  detection, 82, 85
  known parameter case, 91
  potential, 45
  problem in batches, 220
  purging, 86, 91, 95
parallelogram, 132, 133
parameters, 22
partitions, 153
PCA, see principal component analysis
permuting components, 152
Phase I operation, 53, 54, 81, 83, 97, 219
Phase II operation, 53, 54, 98, 100, 105, 171, 232
planning stage, 55
pooled estimate of S, 26
potential outliers, 45
power of test, 83, 111
prediction equation, 162
preliminary data set, 57, 78
principal component analysis (PCA), 50, 65, 78, 79
principal components, 65, 115-118, 124, 143, 145
process map, 55
production units, 5
purging process, 54, 82, 86
Q-Q plot, 41, 43, 51, 88, 97, 136
QualStat, 11, 54
quantile technique, 51, 92, 94, 96
R chart, 119
regression analysis, 120, 173
  model, 189
    fitting, 191
  perspective, 129, 162
  residuals, 173, 193
sample size, 49
SAS, 11
scatter plot, 64
SD, see statistical distance
seasonal effect, 197
second-order autoregressive model AR(2), 199
sensitivity, 10, 174
Shapiro-Wilk test, 33
Shewhart chart, 4, 10, 82, 85, 99, 108, 158, 188
  control box, 130
  control limits, 133
  multivariate, 10
  signal, 159
signal detection, 8, 118, 119
signal interpretation, 10, 119, 134, 157
singularities, 115
size of control region, 85, 99
small process shifts, 188
  detecting, 190
SPC, see statistical process control
stage decay, 70
standardized residual, 130, 162, 185, 188, 191
statistical distance (SD), 7, 11, 13-15, 22, 120-122, 127
statistical process control (SPC), 3
steady-state control procedures, 78
steady-state operating conditions, 110
steady-state process, 98
subgroup mean, 26, 86, 106, 230, 232
systematic pattern over time, 198
t distribution, 22, 188, 240
t statistic, 13, 20, 21
T2 chart, 10, 101
  autocorrelation patterns in, 194
  determining ARL for, 118
  including the cyclical process, 197
  interpretive features of, 108
  known parameters with, 105
  plotting in principal component space, 118
  subgroup means with, 106
  trends in, 111
  U pattern in, 109
  unknown parameters with, 100
T2 component
  conditional, 174
  interpretation of, 128
  interpretation of a signal on, 157
  invariance property, 153
T2 control region, 85, 120, 145
T2 decomposition, 125, 130, 148, 163, 171, 200
  conditional terms, 126, 191
  distribution of the terms, 132
  unconditional terms, 126
T2 statistic, 8, 10, 21, 24, 91, 194, 226, 230
  abrupt process changes with, 191
  adjusted for time effects, 200
  alternate form of, 143
  assessing the sampling distribution of, 34, 41
  assumptions, 33
  computed for any subvector, 153
  detecting small process shifts, 172
  distribution for Phase I, 219
  distribution properties of, 22
  estimating the UCL of, 48
  for individual observation vector, 23
  for subgroup mean, 107, 230
  form and distribution of, 26, 226
  gradual process shifts with, 191
  kernel distribution of, 94
  matrix form of, 28
  monitoring of, 169
  nonnormal distributions and, 37
  orthogonal components, 143
  orthogonal decomposition of, 120, 125
  principal component form of, 143
  related assumptions, 33
  sampling distribution of, 33, 37
  sensitivity of, 147, 169
T2 value, 105
  correlation between any two, 25
  of the time-adjusted observation, 212
time adjustments, 200
time dependency, 98, 193
time effect, 199
time-adjusted estimates
  of covariance matrix, 207
  of mean vector, 207
time-adjusted statistic, 10
time-adjusted variables, 210
time-sequence plots, 71, 75, 136, 194
time-sequence variable, 71, 202, 208
tolerance regions, 133
transcribing errors, 61
transformations, 47, 51
Type I error, 85, 99
Type II error, 99
UCL, see upper control limit
U-shaped pattern, 109, 196, 208
unconditional components
  signaling of, 169
unconditional terms, 126, 127, 139, 150, 157
  critical values (CV) for, 157
  signaling on, 139
univariate control chart, 4
upper control limit (UCL), 4, 100
  approximate, 48
  estimation, 51
upset conditions, 9, 109
variables
  functional forms of, 64, 173
variation
  between-batch, 218
  shifts, 119
  within-batch, 218
  total, 218
vectors, 28
Wishart
  distribution, 22
  matrix, 31
  probability function, 32
  variate, 21
within-group estimator, 27
