You are on page 1of 6

10th IFAC International Symposium on Dynamics and Control of

Process Systems
The International Federation of Automatic Control
December 18-20, 2013. Mumbai, India

Real-time Performance Assessment of


Inferential Sensors ⋆
Shima Khatibisepehr ∗ Biao Huang ∗∗ Swanand Khare ∗∗∗
Ramesh Kadali ∗∗∗∗

Department of Chemical and Materials Engineering, University of
Alberta, Edmonton, Alberta T6G 2G6, Canada (e-mail:
shimak@ualberta.ca).
∗∗
Department of Chemical and Materials Engineering, University of
Alberta, Edmonton, Alberta T6G 2G6, Canada (e-mail:
biao.huang@ualberta.ca)
∗∗∗
Department of Chemical and Materials Engineering, University of
Alberta, Edmonton, Alberta T6G 2G6, Canada (e-mail:
khare@ualberta.ca)
∗∗∗∗
Suncor Energy Inc., Fort McMurray, Alberta, Canada (e-mail:
RKadali@Suncor.com)

Abstract: A data-driven Bayesian framework for real-time performance assessment of inferen-


tial sensors is proposed. The application of the proposed Bayesian solution does not depend on
the identification techniques employed for inferential model development. The effectiveness of
the proposed method is demonstrated through a simulation case study.

Keywords: Soft Sensor, Bayesian Inference, Reliability, Real-time Performance Assessment

1. INTRODUCTION operating space in which the model has been identified


would contribute to the model uncertainty.
Real-time analysis of process quality variables constitutes Therefore, the conditional dependence of the reliability
an essential prerequisite for advanced monitoring and con- of inferential sensor predictions on characteristics of the
trol of industrial processes. However, on-line measure- input space and reliability of the empirical process model
ment of such variables may involve difficulties due to should be thoroughly assessed in order to develop an on-
the inadequacy of measurement techniques or low relia- line performance measure. From the application point of
bility of measuring devices. Therefore, there has been a view, a desired performance measure has two essential
growing interest in the development of inferential sensors characteristics. First, it should effectively estimate any sig-
to provide real-time estimates of quality variables based nificant deterioration in the prediction performance when
on their correlation with other process measurements. In process operates outside the valid inferential region. Sec-
many industrial applications, complete and comprehensive ond, implementation and interpretation of a performance
knowledge of involved processes is often not available. In metric should be simple enough for practitioners to use.
such cases, inferential models are developed on the basis Therefore, designing a proper performance index is not
of first principles analysis as well as process data analysis. straightforward. Although inferential sensors have been
In order to maintain the reliability of an inferential sensor, widely used in process industries, there are only a few
it is important to assess the accuracy of its on-line pre- publications providing a methodology to assess their on-
dictions. Model uncertainty (plausible alternative model line performance. In Nomikos and MacGregor (1995) and
structures/parameters) is one of the major sources of pre- Vries and Braak (1995), approximate confidence intervals
diction uncertainty (McKay et al., 1999). In the context of have been developed to assess the accuracy of PLS pre-
process industries, deviations from design operating con- dictions based on the traditional statistical properties.
ditions are the main factors resulting in the model uncer- The principal limitation of these approaches is that the
tainty and thus deterioration in performance of inferential internal empty regions within the identification data (i.e.
sensors. In most of the classical identification methods, the internal regions that do not contain any identification
the objective is to minimize prediction errors pertaining data points) cannot be diagnosed (Soto et al., 2011).
to the identification data-set. Therefore, the generaliza- Kaneko et al. (2010) proposed a distance-based method
tion performance of the resulting inferential sensors are to quantify the relationship between applicability domains
not guaranteed. In such cases, significant changes in the and accuracy of inferential sensor predictions. The au-
⋆ Financial support from Syncrude Canada Ltd., Suncor Energy
thors discussed that a larger Euclidean distance of an
Inc., Alberta Innovates - Technology Futures (AITF), and the
observation to the center of identification data and to
Natural Sciences and Engineering Research Council of Canada in the its nearest neighbors would indicate a lower prediction
form of Industrial Research Chair in Control of Oil Sands Processes accuracy. This method suffers from two major drawbacks.
is gratefully acknowledged.

978-3-902823-59-5/2013 © IFAC 277 10.3182/20131218-3-IN-2045.00093


IFAC DYCOPS 2013
December 18-20, 2013. Mumbai, India

First, variability of the input variables is not taken into 2. PROBLEM STATEMENT
account when determining the Euclidean distance from
the center. Second, the different effects of input variables Consider a class of inferential models given by
on the prediction uncertainty are ignored by correlating yˆt = g(ut ; Θ) (1)
the prediction accuracy with a general distance measure.
Yang et al. (2009) applied an ensemble method to evalu- where yˆt denotes the predicted value of query variable
ate the uncertainty of inferential sensor predictions. The inferred from the real-time measurements of influential
basic idea is to repeatedly generate bootstrap samples of process variables, ut = {uk,t }K
k=1 .
the identification data-set to re-estimate inferential model Evaluating the performance of an inferential sensor often
parameters. With this multitude of models, the model amounts to analyzing the characteristics of prediction
variation and the average model bias can be estimated. errors. Prediction error, also known as residual, is defined
Depending on the identification procedure used, however, as the difference between the actual and predicted values
this method could be computationally intensive and would of query variable. That is, et = yt − yˆt , where yt denotes
not be suited for on-line applications. Kaneko and Funatsu the actual value of the query variable.
(2011) proposed to develop a multi-model inferential sen-
sor based on the time difference of input variables in order The absolute value of the prediction errors can be used
to combine the information included in a set of local sub- to identify the events that would affect the reliability
models into a global predictive model. Furthermore, the of the inferential model. Suppose that the performance
accuracy of global predictions has been estimated using of the inferential sensor at each time instant, rt , can
empirical models describing the relationship between stan- take Re reliability statuses, i.e. rt ∈ {r1 , ..., rRe }. For
dard deviation of local predictions and standard deviation instance, different degrees of reliability can be assigned
of prediction errors. The major problem of this method is to the inferential sensor predictions as follows:
{
that small variation in local predictions does not neces- Reliable 0 < |et | ≤ 2σe
sarily imply a small prediction error. The proposed metric r = Moderately reliable 2σe < |et | ≤ 3σe
j
(2)
only reflects the degree of similarities between the predic- Unreliable Otherwise
tion performance of different models and does not contain where the thresholds are considered as design parameters
any information about the reliability of each individual reflecting the tolerable amount of prediction error, and
model. need to be adjusted based on the requirements of each
To address the aforementioned issues, this paper provides a application.
data-driven Bayesian framework for real-time performance If yt is observed, calculation of the performance index
assessment of inferential sensors. Such Bayesian frame- is straightforward. During on-line implementation of an
works utilizing discrete probability distributions have inferential sensor, however, such real-time measurements
proven to be useful for a variety of fault diagnosis problems are often not available frequently and regularly. Therefore,
such as diesel engine fault diagnosis (Pernestål, 2007) and the main challenge is to assess the reliability of the
control loop performance diagnosis (Qi et al., 2010). The inferential sensor predictions in the absence of actual
major contribution of the present work is to formulate values. Mathematically, the objective is ( to evaluate ) the
and solve the problem of inferential sensor performance
assessment under a Bayesian framework utilizing discrete conditional probability mass function f et |ut , ŷt ; σe .
probability distributions. The main focus is to character-
ize the effect of the operating space on the prediction 3. REAL-TIME PERFORMANCE ASSESSMENT OF
accuracy in the absence of target measurements. The INFERENTIAL SENSORS
proposed method has the following attractive features:
(1) A priori knowledge of process operation and underly- Given the training data-set D = {(ut , yt )}N t=1 , an infer-
ing mechanisms can be easily incorporated in a Bayesian ential sensor provides a real-time prediction, yˆt , on the
scheme so as to identify the criteria that might affect on- basis of real-time measurements of k input variables, ut =
line performance of the designed inferential sensor. (2) {uk,t }K
k=1 . It is noteworthy that the identification data-
Since probability density functions would reflect the actual set contains both input and output measurements that
data distribution, empty regions within the identification are typically available from plant tests and/or historical
data-set can be identified. (3) Correlations between input plant operations at lower sampling frequency. Therefore,
variables are taken into account. (4) Contribution of each the model prediction errors within the identification data-
input variable in prediction uncertainty is studied. (5) Its set, {et }N
t=1 , can be directly calculated from {(ut , yt )}t=1 .
N

application does not depend on the identification tech- A set of indicator variables, {Qt }Nt=1 = {qt , . . . , qt }t=1 ∈
u1 uk N
niques employed for inferential model development. (6) Its R K×N
, is introduced to partition the operating space
real-time implementation is computationally efficient. into multiple modes. Suppose that each real-time input
The remainder of this paper is organized as follows. The measurement, uk,t , can take Ouk operating statuses. Prior
problem of real-time performance assessment of inferential knowledge of process operation (e.g. normal or unusual
sensors is explained in Section 2. In Section 3, the problem operating conditions) can be incorporated to properly
of reliability analysis of real-time predictions is rigorously partition the operating range of each process variable as
formulated under a Bayesian framework. In Section 4, well as the operating space of a set of process variables. In
the effectiveness of the proposed Bayesian approach is the absence of a priori knowledge, statistical analysis of
demonstrated through a simulation case srudy. Section 5 operational and laboratory data may guide the choice of
summarizes this paper with concluding remarks. partitions. For instance, if it can be assumed that the input
variables are Gaussian distributed random variables such

278
IFAC DYCOPS 2013
December 18-20, 2013. Mumbai, India

that uk ∼ N (µk , σk2 ), then different operating statuses The first term in the above integral is given by (7).
may be assigned to the input measurements as follows: Besides, Bayes’ rule can be applied to derive an explicit
{ expression for the second term. Therefore, the posterior
Normal 0 < |uk,t − µk | ≤ 2σk
uk
qt = High 2σk < |uk,t − µk | ≤ 3σk (3) probability distribution of the hyperparameters given the
Abnormal 3σk < |uk,t − µk | identification data D = {(Qt , et )}N
t=1 can be written as:

where the thresholds are considered as design parameters p(ϖjQ |rt = rj , D) = ξp(D|ϖjQ , rt = rj )p(ϖjQ |rt = rj ) (9)
chosen to provide adequate coverage of the operating where ξ is a normalizing constant.
space. Moreover, the generalization performance of the
inferential sensor may guide the assignment of operating The chain rule of probability theory is used to factorize
statuses. We would like to emphasize that our proposed the likelihood function in (9):
method does not require any assumption about the prob- ∏
Nj
ability density function (PDF) of input variables. p(D|ϖjQ , rt j
=r )= p(Qt |ϖjQ , rt = rj )
t=1
The on-line performance assessment of the inferential
sensor amounts to evaluating the posterior probability ∏S
( )νs|j
distribution of rt given the operating status of the current = ϖs|j (10)
measured inputs with reference to the historical data. The s=1
∑S
maximum a posteriori (MAP) estimate of reliability status where Nj = s=1 νs|j denotes the number of samples in
is thus obtained from the following expression: the identification data-set for which the reliability status of
rbt = argmax p(rt |Qt , D) (4) inferential sensor predictions was rj . Equation (10) holds
rt true only if it is reasonable to assume that the indicator
variables are time-wise statistically independent.
Applying Baye’s rule, the posterior probability of r given
reliability status of the current measured inputs and out- Furthermore, the following Dirichlet distribution is consid-
put can be written as ered as the hyperprior in (9) to assure generality:
( ∑S ) S
p(rt |Qt , D) = γp(Qt |rt , D)p(rt ) (5) Γ ∏( )α −1
Q s=1 αs|j
where γ is a normalizing constant. p(ϖj |rt = r ) = ∏S
j
ϖs|j s|j (11)
s=1 Γ(αs|j ) s=1
The random variable rt is a categorical variable and can where {αs|j }Ss=1 are the Dirichlet parameters specified
be modelled by ∑S
such that Aj = s=1 αs|j denotes the number of prior

Re
j samples for which the reliability status of inferential sensor
p(rt ) = p(rt = rj )[rt =r ]
predictions was rj . Also, Γ(x) = (x − 1)! for all positive
j=1
integers x. The fact that the Dirichlet distribution is the

Re
( e )[rt =rj ] conjugate prior to the multinomial distributions justifies
= ϖj (6) the choice of the Dirichlet hyperprior.
j=1
Combining (9), (10) and (11), the posterior probability of
where the operation [rt = rj ] evaluates to 1 if rt = rj and the hyperparameters then becomes (DeGroot, 1970),
evaluates to 0 otherwise.
Γ(Aj + Nj ) ∏
S
ν +αs|j −1
Each input indicator variable is a random categorical p(ϖ|rt = rj , D) = ∏S s|j
ϖs|j
variable. As a result, the vector of indicator variables s=1 Γ(αs|j + νs|j ) s=1
Qt is an assembly of K random categorical variables. (12)
Given the reliability status of the inferential sensor, Qt
can thus be modelled by a joint multinomial distribution Substituting (7) and (12) into (8), the posterior predictive
∏K distribution can be further expressed as
with S = k=1 Ouk points in its sample space (i.e.
Qt ∈ {Q1 , ..., QS }): ∫ ∏
S
( )[Qt =Qs ]+νs|j +αs|j −1 Q
p(Qt |rt = rj , D) = ϖs|j dϖj

S
s
p(Qt |ϖjQ , rt = rj , D) = p(Qt = Qs |rt = rj , D)[Qt =Q ] s=1
Γ(Aj + Nj )
s=1 × ∏S (13)
∏S
( )[Qt =Qs ] s=1 Γ(αs|j + νs|j )
= ϖs|j (7) Hence,
s=1
Γ(Aj + Nj )Γ(αd|j + νd|j + 1)
where ϖjQ = {ϖs|j }Ss=1 is a set of hyperparametrs charac- p(Qt = Qd |rt = rj , D) =
Γ(Aj + Nj + 1)
terizing the likelihood function in (5). ∏S
s̸=d Γ(αs|j + νs|j )
Since the hyperparameters are typically not known a × ∏S
priori, the likelihood function is evaluated by integrating s=1 Γ(αs|j + νs|j )
over the hyperparametrs’ space: αd|j + νd|j
∫ = (14)
Aj + Nj
p(Qt |rt = rj , D) = p(Qt |ϖjQ , rt = rj , D)
Finally, (6) and (14) can be combined to obtain an explicit
× p(ϖjQ |rt = rj , D)dϖjQ (8) expression for the posterior probability distribution of (5):

279
IFAC DYCOPS 2013
December 18-20, 2013. Mumbai, India

0.6
historical probability of the inferential model resulting in
Reliable Moderately Unreliable a prediction error greater than bj . The values of rj satisfy
0.5 Reliable
the following conditions:
r1 = 1 and rRe → 0 as bRe → ∞ (17)
Probability Density

0.4
where r1 and rRe corresponds to the highest and lowest
0.3 performance of the inferential sensor, respectively. To
illustrate, the following reliability statuses can be specified
0.2 with reference to the cumulative distribution function
shown in Fig. 2:
0.1 

 1 0 < e˜t ≤ b1
0
 1 − F (bj )
b1 b2 rj = b1 < e˜t ≤ bj (18)
Absolute Prediction Error 
 1 − F (b1 )

0 bRe−1 < e˜t
Fig. 1. PDF of absolute value of prediction errors
Finally, a reliability index (RI) can be assigned to each
1 real-time prediction such that,

RIt , E[rt ] = p(rt = rj |Qt = Qd , D)rj (19)
F(bj)
Cumulative Distribution

where RIt ∈ [0, 1].


F(b1)

3.1 Design Procedure

To summarize our discussion thus far, the procedure


followed to design a Bayesian performance assessment
framework is outlined below:
0
b1 bj 1. Include the prior knowledge of process operation to
Absolute Prediction Error
properly partition the operating range of each process
variable as well as the operating space of a set of
Fig. 2. CDF of absolute value of prediction errors
process variables. In the absence of relevant prior
αd|j + νd|j information, the operating range of the k th input
p(rt = rj |Qt = Qd , D) = γϖje (15) variable may be partitioned as follows:
Aj + Nj
1.1. Approximate the CDF of the k th input, Fk (.),
The above posterior probability distribution can be eval- based on the identification data.
uated to obtain the MAP estimates of the qualitative 1.2. Specify the operating range of each input variable
reliability status of the inferential sensor (see (4)). As as Fk−1 (b) − Fk−1 (a), where a, b ∈ [0, 1] and b > a.
illustrated in Fig. 1, rt = rj implies that bj−1 < |yt − Note that a and b are design parameters chosen
ŷt | ≤ bj , where bj−1 and bj are the lower and upper based on the quality of identification data. For
boundaries of |et |, respectively. Note that an expression instance, a = 0.05 and b = 0.95 can be selected
similar to (15) has also been derived by Qi et al. (2010) to reduce the effect of outlying observations.
assuming that each discrete random variable can only take 1.3. Decide on the number of operating statuses, Ouk ,
two values (e.g. faulty and normal). In this work, however, to be considered.
(15) applies to multiple values of the discrete random 1.4. Partition the operating range of uk into equal-
variables. width intervals, i.e. the width of each interval
would be equal to (Fk−1 (b) − Fk−1 (a))/(Ouk − 2).
In order to quantify the real-time performance of inferen- It should be noted that any other data-driven ap-
tial sensors, it is proposed to associate a numerical value proach can be used to partition the operating space
to each reliability status in the light of the historical (see (3)).
probability distribution of prediction errors. Suppose that 2. Specify a set of indicator variables to denote the
F (ẽt |ut , yt ) denotes the cumulative distribution function operating status of each input variable.
(CDF) of the absolute value of prediction error, e˜t = |yt − 3. Calculate the model prediction errors within the
ŷt |. As the final stage of the training process, a quantifiable identification data-set.
measure of reliability can be defined based on the CDF of 4. Specify possible reliability statuses of inferential sen-
the random variable e˜t as follows: sor predictions by analyzing the PDF of the absolute
p(ẽt > bj ) 1 − F (bj ) value of prediction error (see (2)).
rj , = for j = 1 · · · Re (16) 5. Assign a numeric value to each reliability status based
p(ẽt > b1 ) 1 − F (b1 )
on the CDF of the absolute value of prediction error
where p(ẽt > b1 ) is a normalizing constant. (see (16)).
Moreover, F (bj ) = p(ẽt < bj ) is the historical probability 6. Determine the prior distribution of reliability sta-
of the inferential model resulting in a prediction error tuses, {ϖje }Rj=1 , based on the expected prediction
e

smaller than bj . Alternatively, rj ∝ p(ẽt > bj ) is the performance of the inferential sensor as well as the

280
IFAC DYCOPS 2013
December 18-20, 2013. Mumbai, India

misclassification costs involved in inaccurately pre- Table 1. A summary of the simulated variables
dicting the reliability of predictions.
7. Determine the prior distribution of hyperparameters Description Distribution Unit
given the reliability status, p(ϖjQ |rt = rj ), based on Dilution rate N (0.165, 0.00045) hr−1
the explicit prior knowledge. Note that the prior infor- Substrate concentration N (25, 14.15) kg/m3
mation over hyperparameters can be well-represented
by Dirichlet distributions (see (11)). Table 2. Parameter settings for performance
8. Characterize the posterior probability distribution of assessment of the CFR inferential model
hyperparameters given the reliability status, p(ϖjQ |rt =
rj , D) (see (12)). Property Parameter Setting
9. Characterize the likelihood of indicator variables for No. of operating statuses of u1 10
each reliability status, p(Qt |rt = rj , D), by integrat- No. of operating statuses of u2 10
ing over the hyperparametrs’ space (see (13)). No. of reliability statuses 3
10. Characterize the posterior probability distribution of Reliability statuses Reliable iff
each reliability status, p(rt = rj |Qt , D) (see (15)). 0 < |et | ≤ 1.379
Moderately reliable iff
4. CONTINUOUS FERMENTATION REACTOR 1.379 < |et | ≤ 2.758
SIMULATION Unreliable iff
2.758 < |et |
The governing equations of a continuous fermentation Prior probability ϖe = {0.40, 0.48, 0.12}
reactor (CFR) are given by (Henson and Seborg, 1997): No. of prior samples A = 22
No. of training samples N = 2000
Ẋ = −DX + µX (20)
1
Ṡ = D(Sf − S) − µX (21) 0.5
Reliable Moderately Unreliable
YX/S Reliable

Ṗ = −DP + (αµ + β)X (22) 0.4


Probability Density

where specific growth rate (µ) is defined as 0.3


( )
µm 1 − PPm S
µ= S2
(23) 0.2
Km + S + K i
0.1
Biomass concentration (X), substrate concentration (S)
and product concentration (P ) are state variables of the 0
system. Dilution rate (D) and feed substrate concentration 0 1 2 3 4 5 6 7 8
Absolute Prediction Error
(Sf ) are considered as system inputs. Moreover, cell-
mass yield (YX/S ), yield parameters (α, β), maximum Fig. 3. Probability density function of the absolute predic-
specific growth rate (µm ), product saturation constant tion error obtained from the CFR inferential model
(Pm ), substrate saturation constant (Km ) and substrate
inhibition constant (Ki ) are model parameters. the reliability status of the identified inferential model,
the vector of quality variables Qt has S = 102 points
The identification data was simulated using the variable in its sample space. The proposed Bayesian approach is
settings presented in Table 1 as well as the non-linear used to assess the reliability status of the predictions. The
dynamic model given by (20)-(23). Data are collected at parameter settings required to design a Bayesian reliability
a relatively slow sampling rate so that data can be con- index are presented in Table 2. The boundaries of each
sidered at the steady-state. An empirical linear model has reliability status have been selected based on the PDF of
been identified to describe the steady-state relationship be- the absolute prediction error shown in Fig. 3. Moreover,
tween the input variables, dilution rate and feed substrate the data-driven approach recommended in Section 3.1 was
concentration, and the output quality variable, biomass applied to partition the operating range of each input
concentration. Linear models are often used for develop- variable. Table 3 shows the confusion matrix obtained
ment of inferential sensors in practical applications. In based on the reliability analysis results for N = 1000
this case study, however, the identified linear model may test samples. The diagonal and cross-diagonal elements
not sufficiently represent the non-linear behavior of the of the confusion matrix shown in Table 3 represent the
fermentation process over such a wide operating space. number of predictions with correctly and incorrectly iden-
Due to the inherent structural limitations of the identified tified reliability status, respectively. The low number of
model, the inferential sensor is thus expected to exhibit incorrectly identified instances indicates that the method
a degraded prediction performance in operating regions could effectively determine the reliability of inferential
with low densities of identification data. Therefore, it is model predictions.
desirable to estimate the real-time prediction performance
of the inferential sensor as well. The entries of the confusion matrix can be used to quan-
tify the performance of the proposed method in terms
To determine the real-time prediction performance of this of sensitivity, precision, and accuracy (Sokolova and La-
inferential sensor, a set of binary indicator variables is palme, 2009). A summary of the metrics quantifying the
introduced as {Qt }Nt=1 = {(qt , qt )}t=1 ∈ R
u1 u2 N 2×N
. Given performance of the Bayesian reliability analysis of the CFR

281
IFAC DYCOPS 2013
December 18-20, 2013. Mumbai, India

Table 3. Confusion matrix for the Bayesian re-


Reliable Moderately Unreliable
liability analysis of the CFR inferential model
Reliable
1
Predicted Status
0.8

Reliability Index
Reliable Mod. Reliable Unreliable
Reliable 432 68 0 0.6
Mod. Reliable 20 371 15
0.4
Unreliable 0 5 89
0.2
Table 4. Performance metrics for the Bayesian
reliability analysis of the CFR inferential 0
0 1 2 3 4 5 6 7 8
model Absolute Prediction Error

Reliability Class Sensitivity Precision Accuracy


(%) (%) (%) Fig. 5. Performance assessment of the CFR inferential
model
Reliable 86.4 95.6 91.2
Mod. Reliable 91.4 83.6 89.2 REFERENCES
Unreliable 94.7 85.6 98.0 DeGroot, M. (1970). Optimal Statistical Decisions.
Total 89.2 89.2 89.2 McGraw-Hill, New York.
Henson, M.A. and Seborg, D.E. (1997). Nonlinear Process
1 Control. Prentice-Hall Inc., USA.
Kaneko, H., Arakawa, M., and Funatsu, K. (2010). Ap-
0.8 plicability domains and accuracy of prediction of soft
Cumulative Distribution

sensor models. AIChE Journal, 57(6), 1506–1513.


0.6 Kaneko, H. and Funatsu, K. (2011). Improvement and
estimation of prediction accuracy of soft sensor models
0.4
based on time difference. In K.G. Mehrotra, C.K. Mo-
han, J.C. Oh, P.K. Varshney, and M. Ali (eds.), Mod-
0.2
ern Approaches in Applied Intelligence, volume 6703 of
Lecture Notes in Computer Science, 115–124. Springer-
0
0 1 2 3 4 5 6 7 8 Verlag, Berlin.
Absolute Prediction Error
McKay, M.D., Morrison, J.D., and Upton, S.C. (1999).
Evaluating prediction uncertainty in simulation models.
Fig. 4. Cumulative distribution function of the absolute Computer Physics Communications, 117(1-2), 44–51.
prediction error obtained from the CFR inferential Nomikos, P. and MacGregor, J.F. (1995). Multi-way par-
model tial least squares in monitoring batch processes. Chemo-
inferential model is reported in Table 4. The large values metrics and Intelligent Laboratory Systems, 30(1), 97–
of the sensitivity, precision and accuracy are indicative of 108.
the effectiveness of the proposed method. Pernestål, A. (2007). A Bayesian Approach To Fault
Isolation With Application To Diesel Engine Diagnose.
Regardless of the distribution of prediction error, a quan- Ph.D. thesis, KTH School of Electrical Engineering,
tifiable measure of reliability can be defined solely based on Stockholm, Sweden.
the CDF of the absolute prediction error. From the CDF Qi, F., Huang, B., and Tamayo, E.C. (2010). Data-
shown in Fig. 4, it is evident that the prediction error does driven Bayesian approach for control loop diagnosis with
not follow a Gaussian distribution in this example. Fig. missing data. AIChE Journal, 56(1), 179–195.
5 shows the reliability indices assigned to the inferential Sokolova, M. and Lapalme, G. (2009). A systematic
sensor predictions obtained for the test data. It can be analysis of performance measures for classification tasks.
observed that smaller reliability indices are assigned to Information Processing and Management, 45(4), 427–
larger prediction errors. 437.
Soto, A.J., Vazquez, G.E., Strickert, M., and Ponzoni, I.
5. CONCLUSION (2011). Target-driven subspace mapping methods and
their applicability domain estimation. Chemometrics
In this paper, a data-driven Bayesian framework for real- and Intelligent Laboratory Systems, 30(9), 779–789.
time performance assessment of inferential sensors was Vries, S.D. and Braak, C.J.F.T. (1995). Prediction error
proposed. The main focus was to characterize the effect in partial least squares regression: A critique on the
of the operating space on the prediction reliability in the deviation used in the unscrambler. Chemometrics and
absence of target measurements. The detail of the design Intelligent Laboratory Systems, 30(2), 239–245.
procedure was presented. It was shown that the applica- Yang, H.Y., Lee, S.H., and Na, M.G. (2009). Monitoring
tion of the proposed Bayesian solution does not depend and uncertainty analysis of feedwater flow rate using
on the identification techniques employed for inferential data-based modeling methods. IEEE Transactions on
model development. Moreover, its real-time implementa- Nuclear Science, 56(4), 2426–2433.
tion is computationally efficient and simple for practition-
ers to use. The effectiveness of the proposed method was
demonstrated through a simulation case study.

282

You might also like