
Artificial Neural-Network-Assisted Stochastic Process Optimization Strategies

Somnath Nandi, Soumitra Ghosh, Sanjeev S. Tambe, and Bhaskar D. Kulkarni
Chemical Engineering Div., National Chemical Laboratory, Pune-411 008, India

Correspondence concerning this article should be addressed to S. S. Tambe. Present address of S. Ghosh: Chemical Engineering Department, Indian Institute of Technology (IIT), Kharagpur, West Bengal 721 302, India.

This article presents two hybrid robust process optimization approaches integrating artificial neural networks (ANN) and stochastic optimization formalisms: genetic algorithms (GAs) and simultaneous perturbation stochastic approximation (SPSA). An ANN-based process model was developed solely from process input-output data and then its input space, comprising design and operating variables, was optimized by employing either the GA or the SPSA methodology. These methods possess certain advantages over widely used deterministic gradient-based techniques. The efficacy of the ANN-GA and ANN-SPSA formalisms in the presence of noise-free as well as noisy process data was demonstrated for a representative system involving a nonisothermal CSTR. The case study considered a nontrivial optimization objective, which, in addition to the conventional parameter design, also addresses the issue of optimal tolerance design. Comparison of the results with those from a robust deterministic modeling/optimization strategy suggests that the hybrid methodologies can be gainfully employed for process optimization.

Introduction
Conventionally, chemical plant design consists of choosing and sizing appropriate process equipment, as well as fixing the nominal operating points. In this endeavor, deterministic gradient-based optimization techniques that mostly use steady-state process models are utilized. Here, the objective function to be optimized is a suitably chosen cost function (to be minimized) or a profit function (to be maximized). Traditionally, issues such as the choice and design of the process control system are addressed once the nominal operating point is known, consequent to the process design activity.

Availability of a process model assumes considerable importance in the process design activity. For a given process, a "first principles (phenomenological)" model can be constructed from the knowledge of mass, momentum, and energy balances, as well as from other chemical engineering principles. Owing to the lack of a good understanding of the underlying physicochemical phenomena, development of phenomenological process models poses considerable difficulties. Moreover, nonlinear behavior being a common feature of chemical processes, it leads to complex nonlinear models, which in most cases are not amenable to analytical solutions; thus, computationally intensive numerical methods must be utilized for obtaining solutions. The difficulties associated with the construction and solution of phenomenological models necessitate exploration of alternative modeling formalisms. Process identification via empirical models is one such alternative. These are mostly discrete-time dynamic models comprising, for instance, Hammerstein and Wiener models, Volterra models, and polynomial autoregressive moving-average models with exogenous inputs (ARMAX) (Henson, 1998). These linear models and their nonlinear counterparts are constructed exclusively from the input-output process data. A fundamental deficiency of the empirical modeling approach is that the model structure (form) must be specified a priori. Satisfying this requirement, especially for nonlinearly behaving processes, is a cumbersome task, since it involves selecting heuristically an appropriate model structure from numerous alternatives.



In recent years, artificial neural networks (ANNs) have been found to be an attractive tool for steady-state/dynamic process modeling and model-based control in situations where the development of the phenomenological or the empirical models just given either becomes impractical or cumbersome (such as Bhat and McAvoy, 1990; Hernandez and Arkun, 1992; Nahas et al., 1992; Ramasamy et al., 1995; Tendulkar et al., 1998; and reviews by Narendra and Parthasarathy, 1990; Hunt et al., 1992; Agarwal, 1997). ANNs are based on the concept that a highly interconnected system of simple processing elements (called neurons or nodes) can approximate complex nonlinear relationships existing between independent (ANN input) and dependent (ANN output) variables to an arbitrary degree of accuracy (Hornik et al., 1989; Poggio and Girosi, 1990). The advantages of a neural-network-based process model are: (1) it can be developed solely from the process input-output data (that is, without invoking the process phenomenology); (2) even multi-input multi-output relationships can be approximated easily; and (3) it possesses generalization ability, owing to which the model can accurately predict the outputs corresponding to a new set of inputs that were not part of the data used for constructing the ANN model. For design purposes, it is not adequate that a generalization-capable ANN model is available. What is important is that the ANN model should be amenable to optimization. Specifically, it should be possible to optimize the input space of the ANN model, representing process variables, such that the model output (product concentration, reactor temperature, etc.) is maximized or minimized. This objective differs from that involving ANN model development, where, given an input-output example data set, a suitably chosen optimization algorithm finds a set of network parameters (weights) that minimizes a prespecified error function.

In commonly used deterministic optimization techniques, the solution to an optimization problem is represented in the form of a vector consisting of values of the decision variables at which the gradient of the objective function with respect to the decision variables becomes zero. Thus, gradient computation is an integral feature of such optimization paradigms. Additionally, most gradient-based techniques require the objective function to be smooth, continuous, and differentiable.

In the case of an ANN, it is possible to express the nonlinear mapping that it executes in terms of a generic closed-form function. It can be noted that the nonlinear mapping ability of ANNs is due to the nonlinear activation function used for computing the node-specific outputs. For computing an output, the nonlinear activation function makes use of arguments comprising a number of network parameters (weights) and node-specific inputs. Consequently, the mapping executed by an ANN attains a complex nonlinear character that cannot be guaranteed to simultaneously fulfill the smoothness, continuity, and differentiability criteria for the objective function. This feature of ANN models poses difficulties in using the conventional deterministic techniques for optimizing their input space. Hence, formalisms that do not impose stringent conditions on the form of the objective function need to be explored. The stochastic optimization formalisms, namely, genetic algorithms (GAs) and simultaneous perturbation stochastic approximation (SPSA), among others, are not heavily constrained by the properties of the objective function, and thus they are potential candidates for employment in the optimization of an ANN model. An important characteristic of the GA and SPSA methodologies is that they need measurements of the objective function only, and not measurements (or direct calculation) of the gradient (or higher-order derivatives) of the objective function (Spall, 1998a,b). This characteristic of the GA and SPSA methods can be fruitfully exploited for optimizing the ANN-based models whose functional forms do not assuredly satisfy the requirements of the gradient-based optimization methods. An additional benefit of the GA and SPSA methods is that they can be used in situations where input information into the optimization method (such as objective function evaluations) may be noisy. The objective of this article, therefore, is to present two hybrid "modeling-optimization" techniques, namely, ANN-GA and ANN-SPSA, for the purpose of robust chemical process design and optimization. In this approach, an ANN-based process model is first developed, and its input space is optimized next using either the GA or the SPSA formalism. The principal advantage of the hybrid methods is that process design and optimization can be conducted solely from the steady-state process input-output data.

For validating the ANN-GA and ANN-SPSA methods, we have considered a nontrivial process optimization objective, which not only aims at obtaining the optimal values of process variables, but also the optimal values of tolerances (operating windows) for the process variables. Fixing the values of the tolerances becomes important owing to the fact that chemical processes involve a number of variables and/or parameters that are always subjected to some degree of uncertainty (stochastic variability). For instance, irrespective of how good a control system is, process variables such as concentration, temperature, and pressure do vary randomly, albeit within a narrowly bounded window. Depending upon their origin, uncertainties can be classified into the following four categories (Pistikopoulos, 1995; Pistikopoulos and Ierapetritou, 1995; Diwekar and Kalagnanam, 1996, 1997a,b):
• Process-inherent uncertainty. Due to random variations in process parameters/variables, such as flow rate, temperature, and pressure.
• Model-inherent uncertainty. Accounts for variations in the phenomenological model parameters representing, for instance, kinetic constants, heat-/mass-transfer coefficients, and physical properties.
• External uncertainty. Considers variations in parameters that are external to the process, but influence the process cost (feed stream availability, product demand, pollution/economic indices, etc.).
• Discrete uncertainty. Accounts for equipment availability and other random discrete events.

The conventional deterministic process optimization approach ignores uncertainties, thereby resulting in suboptimal solutions. Uncertainties are capable of influencing, for instance, the product quality and control cost and, therefore, they need to be considered during the process design and optimization activity. Accounting for uncertainties leads to tolerance design, which aims at obtaining the optimal size of the window for each uncertainty-affected process variable/parameter. The best average process performance can be achieved consequent to optimal tolerance design so long as the process operates within the optimized operating zones.

In a recent article by Bernardo and Saraiva (1998), the authors have introduced a novel robust optimization (RO) framework that deals with the optimization objective alluded to earlier. An advantage of the optimal solution given by the RO framework is that it provides the best operating regions for designing a control system.



One of the optimization problems considered by Bernardo and Saraiva was minimization of the continuous stirred-tank reactor's (CSTR) annual plant cost, which comprises four components, namely, equipment, operating, control, and quality costs. The RO formalism, which accounts for the control costs right at the process design stage, was shown to yield qualitatively improved results as compared to those obtained using a fully deterministic optimization approach. Specifically, it was shown for the case of the CSTR that simultaneous optimization of process operating variables and the respective tolerances could reduce the total annual plant cost by nearly one order of magnitude, that is, from $101,872/yr to $14,716/yr. For the sake of affording a direct comparison, we adopt the RO framework with necessary modifications to account for the ANN-based process model. The principal differences between the RO methodology and the hybrid formalisms presented here are as follows.
• While the RO framework assumes knowledge of a phenomenological process model, the ANN-GA and ANN-SPSA formalisms utilize a neural-network-based process model.
• In the RO approach, the phenomenological process model is optimized using a deterministic successive quadratic programming algorithm (NPSOL package; Gill et al., 1986), whereas the hybrid methodologies optimize the ANN-based model using inherently stochastic optimization techniques, namely, GA and SPSA. It may be noted, however, that evaluation of the objective function accounting for the stochastic behavior of the uncertainty-affected process variables remains the same in the RO- and ANN-based hybrid formalisms.
• In the present study it is shown that the hybrid optimization methodologies are capable of yielding comparable solutions when noise-free and noisy process data are utilized for constructing the ANN-based process model.

This article is structured as follows. First, the mathematical formulation of the ANN-model-based robust optimization is presented. A detailed discussion of the development of the ANN-based process model and the stepwise implementation of the ANN-GA and ANN-SPSA hybrid optimization methodologies is provided next. Finally, results pertaining to the CSTR optimization case study are presented and discussed.

ANN-Assisted Robust Optimization Framework

While developing the framework, it is not necessary to take the model-inherent uncertainty into account, since the ANN-GA and ANN-SPSA strategies do not utilize a phenomenological model. Among the remaining three uncertainty categories, only process-inherent uncertainty has been considered, although the optimization framework presented below is sufficiently general toward inclusion of the remaining two (external and discrete) uncertainties. The origin of the commonly encountered process-inherent uncertainty lies in the small but significant random (uncontrolled) fluctuations present in the process operating variables.

We define the optimization problem under consideration as: given the process input data comprising the steady-state values of equipment (design) and operating variables, and the corresponding values of process output variables, obtain, in a unified manner, the optimal values of (1) the design variables, (2) the operating variables, and (3) the tolerances defining bounds on the operating variables. The optimal solutions so obtained should ensure minimization of the annual plant cost while maintaining the desired product quality.

The ANN-assisted RO framework assumes that a steady-state process model defined as

y = f(Ψ, Φ, W)    (1)

is available, where y represents the process output variable that also determines product quality; Ψ and Φ, respectively, refer to the M- and N-dimensional vectors of design and operating process variables (Ψ = [ψ_1, ψ_2, ..., ψ_m, ..., ψ_M]^T; Φ = [φ_1, φ_2, ..., φ_n, ..., φ_N]^T); W denotes the weight matrix of the ANN model; and f represents the ANN-approximated nonlinear function.

The total annual plant cost (C_yr) to be minimized is assumed to consist of four components, namely, the equipment cost (C_eqp), operating cost (C_op), control cost (C_c), and quality cost (C_q):

C_yr = C_eqp + C_op + C_c + C_q.    (2)

Among the four cost components, the operating, control, and quality costs have uncertainties associated with them that emanate from random fluctuations in the process operating variables. However, the equipment cost, which is usually a deterministic quantity, has no uncertainty attached to it. The extent of uncertainty in an operating variable can be characterized in terms of a probability density function (PDF) (Diwekar and Rubin, 1991, 1994), where the mean value of the PDF represents the nominal value of that operating variable. Accordingly, defining J(Φ) to be the set of PDFs associated with the operating variables, Φ, the corresponding set (Φ̂) describing the operating space regions can be represented as

Φ̂ = {Φ : Φ ∈ J(Φ)}.    (3)

The physical operating regions, Φ̂, comprise a set of operating windows {Φ̂_φn}, where Φ̂_φn, denoting the window for the nth operating variable, is defined as

Φ̂_φn = [φ_n^l, φ_n^u];  n = 1, 2, ..., N.    (4)

Here, φ_n^l and φ_n^u, respectively representing the lower and upper bounds on the nth operating variable, are expressed as

φ_n^l = μ_n(1 − ε_n);  φ_n^u = μ_n(1 + ε_n),    (5)

where μ_n refers to the mean value of the nth operating variable and ε_n is the associated tolerance. Commonly, variations in φ_n obey the Gaussian (normal) probability distribution, and therefore the respective tolerance (ε_n) can be approximated as

ε_n = 3.09 (σ_n/μ_n),    (6)

where σ_n refers to the standard deviation of the Gaussian PDF.
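To make Eqs. 5 and 6 concrete, the short sketch below (an illustration prepared for this text, not code from the original study; the values of μ_n and ε_n are assumed) draws Gaussian fluctuations of an operating variable and verifies that the window of Eq. 5 captures nearly all of them; the factor 3.09 corresponds to a two-sided Gaussian coverage of roughly 99.8%.

```python
import numpy as np

# Sketch of the tolerance/PDF relationship of Eqs. 5-6 (illustrative values).
rng = np.random.default_rng(0)

mu_n, eps_n = 0.55, 0.09           # nominal value and tolerance (assumed)
sigma_n = eps_n * mu_n / 3.09      # Eq. 6 rearranged: sigma_n = eps_n * mu_n / 3.09

lo, hi = mu_n * (1 - eps_n), mu_n * (1 + eps_n)   # Eq. 5: operating window
samples = rng.normal(mu_n, sigma_n, 100_000)      # Gaussian fluctuations about the setpoint

inside = np.mean((samples >= lo) & (samples <= hi))
print(f"window = [{lo:.4f}, {hi:.4f}], fraction inside = {inside:.4f}")  # ~0.998
```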



Owing to the random fluctuations in the process operating variables, the steady-state value of the process output (quality) variable, y, also deviates from its desired (nominal) setpoint. Thus, it becomes essential to define a PDF, L(y), pertaining to the quality variable y as well. Accordingly, an expression similar to Eq. 3, but involving L(y), can be written for defining the corresponding space region (ŷ):

ŷ = {y : y ∈ L(y)}.    (7)

Using Eqs. 2-7, it is possible to write the complete mathematical formulation of the robust optimization problem under consideration as

min_{Ψ, Φ̂}  C_yr(Ψ, Φ̂, ŷ) = C_eqp(Ψ) + C_op(Φ̂) + C_c(Φ̂) + C_q(ŷ)    (8)

subject to

(I)  f(Ψ, Φ, W) = y for all Φ ∈ Φ̂;  Φ = [φ_1, φ_2, ..., φ_N]^T;  Φ̂ = {Φ : Φ ∈ J(Φ)}    (9)

(II)  ŷ = {y : y ∈ L(y)}.    (10)

This formulation makes use of an ANN model (Eq. 1) possessing the following properties: (1) the input space of the ANN model comprises the design variables, Ψ, and the uncertainty-involving process operating variables, Φ; (2) the output space of the ANN model represents the quality variable y; (3) the weight matrix (W) of the ANN model is available; and (4) the model is valid over the operating variable regions, Φ̂, and the output region, ŷ; consequently, the equality constraint defined in Eq. 9 always holds.

In the objective function defined by Eq. 8, the elements of vectors Ψ and Φ̂ signify the decision variables; ŷ, representing the space region of the output variable, depends on Ψ and Φ̂, and therefore is not a decision variable. However, the optimization objective, which involves the simultaneous determination of the operating variables (nominal operating points) and associated tolerances, necessitates: (1) optimization of the mean values (μ_n; n = 1, 2, ..., N) of the Gaussian PDFs characterizing the uncertainty-involving operating variables Φ, and (2) optimization of the tolerances, ε_n (n = 1, 2, ..., N). Simultaneous determination of the mean {μ_n} and tolerance {ε_n} values also fixes the corresponding standard deviations, {σ_n} (see Eq. 6), which can be used to characterize the PDF set, J(Φ). It is thus clear that optimization of the mean and associated tolerance values in turn leads to the optimization of the PDF set, J(Φ).

Following the prescription of Bernardo and Saraiva (1998), the objective function defined in Eq. 8 can be evaluated as

C_yr = E(C_q) + E(C_op) + C_eqp(Ψ) + C_c(Φ̂),    (11)

where E(C_q) and E(C_op) refer to the expected values of the quality cost and the operating cost, respectively. For computing these expected costs, an efficient sampling technique known as Hammersley sequence sampling (HSS) (Diwekar and Kalagnanam, 1996, 1997a,b) could be utilized. In this technique, a statistically adequate number (N_obs) of observations is sampled from the Gaussian PDFs associated with the uncertainty-involving operating variables. Next, each of the N_obs sampled sets, Φ_j (j = 1, 2, ..., N_obs), along with the design variable vector, Ψ, is applied to the ANN model for computing the magnitude of the process output, y. The estimate of the quality cost can then be computed using the Taguchi loss function (Taguchi, 1986), as given below:

E(C_q) = k_l [(μ_y − y*)² + σ_y²],    (12)

where k_l refers to the quality loss coefficient; y* is the desired value of y; and σ_y denotes the standard deviation of the N_obs number of y values. The mean value of the quality variable, μ_y, is calculated as

μ_y = [ Σ_{j=1}^{N_obs} f(Ψ, Φ_j, W) ] / N_obs.    (13)

For computing the estimate of the operating cost, E(C_op), the following expression is used:

E(C_op) = [ Σ_{j=1}^{N_obs} C_op(Φ_j) ] / N_obs.    (14)

Since the design variables are not associated with any uncertainty, the Ψ-dependent equipment cost, C_eqp(Ψ), can be calculated deterministically. The last of the four cost components, the control cost, C_c, can be determined using the means (μ_n) and standard deviations (σ_n) of the PDFs describing the operating variables:

C_c = Σ_{n=1}^{N} (a + b μ_n/σ_n),    (15)

where a and b are constants and n refers to the operating variable index.
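A minimal sketch of this sampling-based cost evaluation is given below. Plain Monte Carlo sampling stands in for HSS, and ann_model, the C_op expression, and the coefficients k_l, a, and b are placeholder assumptions (the study's actual cost functions appear in its Appendix B).

```python
import numpy as np

# Sketch of the sampling-based evaluation of Eqs. 11-15 (plain Monte Carlo
# in place of HSS; model and cost coefficients are illustrative placeholders).
rng = np.random.default_rng(1)

def ann_model(psi, phi):
    # placeholder for y = f(psi, phi, W); any smooth map will do here
    return 600.0 + 50.0 * np.tanh(phi[:, 0] - 0.55) - 0.1 * (phi[:, 1] - 260.0)

def expected_costs(psi, mu, eps, y_star=600.0, k_l=5.0, a=50.0, b=1.0, n_obs=400):
    sigma = eps * mu / 3.09                               # Eq. 6
    phi = rng.normal(mu, sigma, size=(n_obs, mu.size))    # sampled operating vectors
    y = ann_model(psi, phi)                               # ANN-predicted quality variable
    mu_y, sigma_y = y.mean(), y.std()                     # Eq. 13 and spread of {y_j}
    e_cq = k_l * ((mu_y - y_star) ** 2 + sigma_y ** 2)    # Taguchi loss, Eq. 12
    e_cop = (10.0 * phi.sum(axis=1)).mean()               # Eq. 14 with a placeholder C_op
    c_c = np.sum(a + b * mu / sigma)                      # Eq. 15
    return e_cq + e_cop + c_c                             # add C_eqp(psi) for the full Eq. 11

print(expected_costs(np.array([0.35]), np.array([0.55, 260.0]), np.array([0.09, 0.06])))
```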
Construction of ANN-Based Process Model

A prerequisite to the implementation of the ANN-GA and ANN-SPSA methodologies is the development of a suitable ANN-based process model. For this purpose, a class of ANNs known as multilayered feedforward networks (MFFNs) can be used. An MFFN (see Figure 1) is a nonlinear mapping device between an input set (I) and an output set (O). It represents a function f that maps I into O, that is, f: I → O, or y = f(x), where y ∈ O and x ∈ I. The widely used MFFN paradigm is the multilayered perceptron (MLP), mostly comprising three sequentially arranged layers of processing units. The three successive layers, namely, the input, hidden, and output layers, house N_I, N_H, and N_O number of nodes, respectively.



Figure 1. Three-layered feed-forward neural network.

Usually, the input and hidden layers also contain a bias node possessing a constant output of +1. All the nodes in the input layer are connected using weighted links to the hidden-layer nodes; similar links exist between the hidden and output-layer nodes. Nodes in the input layer do not perform any numerical processing, and thus act as "fan-out" units; all numerical processing is done by the hidden and output-layer nodes, and thus they are termed "active" nodes.

The problem of neural network modeling is to obtain a set of weights such that the prediction error (the difference between the network-predicted outputs and their desired values), measured in terms of a suitable error function, for instance, the root-mean-squared error (RMSE), is minimized. The RMSE is defined as:

RMSE = √[ Σ_{l=1}^{N_pat} 2E_l / (N_pat × N_O) ],    (16)

where l refers to the input pattern index (l = 1, 2, ..., N_pat); N_O denotes the number of output-layer nodes; and E_l is a measure of the sum-of-squares error (SSE), defined as

E_l = (1/2) Σ_{i=1}^{N_O} (y_li − o_li)²,    (17)

where y_li denotes the desired output of the ith output node when the lth input pattern is presented to the network, and o_li refers to the corresponding network-predicted output. The task of RMSE minimization is accomplished by "training" the network, wherein a gradient descent technique, such as the generalized delta rule (GDR) (Rumelhart et al., 1986), is utilized for updating the connection weights.

Network training is an iterative procedure that begins with initializing the weight matrix randomly. A training iteration consists of two types of passes, namely, forward and reverse, through the network layers. In the forward pass, an input pattern from the example data set is applied to the input nodes and the outputs of the active nodes are evaluated. For computing the output, the weighted sum of the inputs to an active node is calculated first, which is then transformed using a nonlinear activation function, such as the sigmoid function. The outputs of the hidden nodes computed in this manner form the inputs to the output-layer nodes, whose outputs are evaluated in a similar fashion. In the reverse pass, the pattern-specific SSE defined in Eq. 17 is computed and used for updating the network weights in accordance with the GDR strategy. The weight-updation procedure, when repeated for all the patterns in the training set, completes one training iteration.
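The forward and reverse passes just described can be condensed into a few lines of NumPy. The sketch below is illustrative rather than the authors' code: it performs one delta-rule weight update for a single pattern and omits the bias nodes and the momentum term for brevity.

```python
import numpy as np

# One GDR training step (Eqs. 16-17) for a single-hidden-layer sigmoid MLP.
rng = np.random.default_rng(2)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

n_i, n_h, n_o, eta = 6, 2, 1, 0.7
W1, W2 = rng.normal(0, 0.5, (n_h, n_i)), rng.normal(0, 0.5, (n_o, n_h))

x = rng.random(n_i)            # one normalized input pattern
y = np.array([0.6])            # desired output y_l (assumed)

# forward pass: weighted sums transformed by the sigmoid activation
h = sigmoid(W1 @ x)
o = sigmoid(W2 @ h)

# reverse pass: pattern SSE (Eq. 17) and delta-rule weight updates
E_l = 0.5 * np.sum((y - o) ** 2)
delta_o = (o - y) * o * (1 - o)
delta_h = (W2.T @ delta_o) * h * (1 - h)
W2 -= eta * np.outer(delta_o, h)
W1 -= eta * np.outer(delta_h, x)
print(f"pattern SSE before update: {E_l:.4f}")
```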
The RMSE-minimization procedure described earlier does not ensure that the trained network possesses satisfactory generalization ability. To possess good generalization ability, it is essential that the network captures the underlying trends existing in the example input-output data. The phenomenon that affects the network's generalization ability is known as "overfitting." When it occurs, the network attempts to fit even the noise in the example data set at the cost of learning the trends therein. As a result, the network makes poor predictions in the case of new inputs. Overfitting occurs due to two factors: (1) when the network is trained excessively (overtrained), that is, over a large number of training iterations, and (2) when the network's hidden layer contains more units than necessary. To prevent the occurrence of overfitting, the network's generalization performance is monitored at the end of every training iteration on a set different from the one used for updating the weights. Specifically, the available example input-output data are partitioned into two sets, namely, the training and test sets. While the former set is used for adjusting the network's weights, the latter set is utilized for monitoring the network's generalization performance. In essence, the RMSE magnitude with respect to the training set (E_trn) indicates the data-fitting ability of the network undergoing training, and the test set RMSE (E_tst) measures how well the network is generalizing. Upon training the network over a large number of iterations, the weight matrix resulting in the smallest E_tst magnitude for the test set data is taken to be an optimal weight set. It may, however, be noted that this weight set pertains to the specific number of hidden units (N_H) considered in the network architecture.

For a given ANN-based modeling problem, the numbers of nodes in the network's input layer (N_I) and output layer (N_O) are dictated by the input-output dimensionality of the system being modeled. However, the number of hidden units (N_H) is an adjustable structural parameter. If the network architecture contains more hidden units than necessary, they lead to an oversized network and, consequently, an overparameterized network model. Such a model, like an overtrained one, gives a poor representation of the trends in the example data. For excluding the possibility of an oversized network, it becomes essential to study the effect of the number of hidden units on the network's function approximation and generalization capabilities. Accordingly, multiple network training simulations are conducted by systematically varying the number of hidden units (a compact sketch of such a search is given below).
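Assuming scikit-learn's MLPRegressor as a stand-in for the GDR/EBP code, and synthetic data in place of the CSTR patterns, the hidden-unit search can be sketched as follows; the grid of N_H values and seeds is an assumption for illustration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Sketch of the architecture search: vary the hidden-unit count and the
# weight-initialization seed, and keep the net with the smallest test RMSE.
rng = np.random.default_rng(3)
X = rng.random((50, 6))                       # 50 patterns, 6 normalized inputs
y = np.tanh(X @ rng.random(6))                # synthetic target in place of CSTR data
X_trn, y_trn, X_tst, y_tst = X[:40], y[:40], X[40:], y[40:]   # 4:1 partition

best = (np.inf, None)
for n_h in (1, 2, 3, 4):                      # step 3: grow the hidden layer
    for seed in range(5):                     # step 2: several random initializations
        net = MLPRegressor(hidden_layer_sizes=(n_h,), activation="logistic",
                           solver="sgd", learning_rate_init=0.1, momentum=0.01,
                           max_iter=2000, random_state=seed).fit(X_trn, y_trn)
        e_tst = np.sqrt(np.mean((net.predict(X_tst) - y_tst) ** 2))
        if e_tst < best[0]:
            best = (e_tst, (n_h, seed))
print(f"smallest E_tst = {best[0]:.4f} at (N_H, seed) = {best[1]}")
```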



These simulations essentially aim at obtaining an optimal network architecture (that is, one housing only an adequate number of hidden units), leading to the smallest possible RMSE magnitude for the test set data.

The entire procedure for selecting an optimal MLP architecture and associated weight matrix using the GDR strategy is summarized in the following steps (Bishop, 1994):
1. Fix a small value (such as one or two) for the number of hidden units, N_H, and initialize the network weight matrix randomly. Also select values of the GDR parameters, namely, the learning rate (η_l; 0 < η_l ≤ 1.0) and the momentum coefficient (α_m; 0 < α_m ≤ 1.0); addition of the momentum term in the GDR-based weight-updation expression helps accelerate weight convergence and avoid local minima on the error surface.
2. Minimize the test set RMSE (E_tst) using the GDR-based error-back-propagation (EBP) algorithm. Repeat the training procedure a number of times using different random number sequences for initializing the network weights. This procedure is performed for exploring the weight space rigorously and, consequently, locating the deepest local minimum (or the global one) on the error surface. Store the network weight matrix that has resulted in the smallest E_tst.
3. Repeat steps 1 and 2 by systematically increasing the number of hidden units until E_tst attains its smallest possible magnitude.

Implementation of these steps optimizes the network architecture and the associated weight matrix, thereby creating an optimal network model possessing the much desired data-fitting and generalization capabilities. For details of the GDR-based EBP training algorithm, the reader may refer to, for example, Hecht-Nielsen (1990), Freeman and Skapura (1991), and Tambe et al. (1996).

ANN-Model-Assisted Stochastic Process Optimization Methodologies

The principal difference between the widely used deterministic gradient-based optimization schemes and the stochastic ones, such as GA and SPSA, is that the latter class of methodologies involves a random component at some stage in their implementation. For instance, GAs manipulate a set of candidate solutions at random with the objective of sampling the search (solution) space as widely as possible, while at the same time trying to locate promising regions for further exploration (Venkatasubramanian and Sundaram, 1998). In the present work, the GA and SPSA methodologies have been used, in conjunction with an ANN-based process model, for optimizing (1) the process design variables, Ψ; (2) the process operating variables, Φ; and (3) the tolerances, E = [ε_1, ε_2, ..., ε_n, ..., ε_N]^T, associated with the process operating variables. In what follows, the salient features and implementation details of the ANN-GA and ANN-SPSA methodologies are provided.

ANN-GA optimization methodology

Genetic algorithms (Holland, 1975; Goldberg, 1989) combine the "survival of the fittest" principle of natural evolution with the genetic propagation of characteristics, to arrive at a robust search and optimization technique. The principal features possessed by GAs are: (1) they are zeroth-order optimization methods requiring only the scalar values of the objective function; (2) they are capable of handling nonlinear, complex, and noisy objective functions; (3) they perform a global search, and thus are more likely to arrive at or near the global optimum; and (4) their search procedure being stochastic, GAs do not impose preconditions, such as smoothness, differentiability, and continuity, on the objective function form. Owing to these attractive features, GAs are being used for solving diverse optimization problems in chemical engineering (such as Cartwright and Long, 1993; Hanagandi et al., 1996; Garcia and Scott, 1998; Garcia et al., 1998; Garrard and Fraga, 1998; Polifke et al., 1998). An application of the ANN-GA formalism for the purpose of parameter design of an industrial dryer using noise-free data has been shown recently by Hugget et al. (1999). To illustrate the working principles of GAs, we recast the cost-minimization problem (Eq. 8) as

Minimize C_yr(x);  x_k^L ≤ x_k ≤ x_k^U;  k = 1, 2, ..., K;  x = Ψ ∪ Φ ∪ E,    (18)

where the K-dimensional vector x = [x_1, x_2, ..., x_k, ..., x_K]^T represents the set of decision variables, and x_k^L and x_k^U are the lower and upper bounds on x_k.

In a typical GA procedure, the search for the optimal solution is conducted from a randomly initialized population of candidate solutions, wherein each solution is usually coded as a string (chromosome) of binary digits. A coded string is composed of as many segments as the number (K) of decision variables. The resultant population of candidate solutions is then iteratively refined in a manner imitating selection and adaptation in biological evolution, until convergence is achieved. Within an iteration, the GA evaluates the goodness of a candidate solution by employing a fitness function, whose magnitude is indicative of the objective function value. For a function maximization (minimization) problem, the fitness function value should scale up (scale down) with the increasing value of the objective function. In each GA iteration, a new population (generation) of candidate solutions is formed using the following GA operators:
• Selection: This operator chooses chromosome strings to form a mating pool of parent strings that are subsequently used for producing offspring. Selection of parent strings is conducted in a manner such that fitter strings enter the mating pool on a priority basis. For selection purposes, the roulette wheel (RW) or the less noisy stochastic remainder (SR) methodologies (Goldberg, 1989) may be used.
• Crossover: The action of this critical GA operator produces an offspring population wherein randomly selected parts of the parent strings are exchanged mutually to form two offspring strings per parent pair. Whether a pair (also selected randomly) undergoes crossover or not is governed by the prespecified value of the crossover probability (P_cross). Action of the crossover operator tends to improve the combinatorial diversity of the offspring population by utilizing the building blocks of the parent population (Venkatasubramanian and Sundaram, 1998). A high P_cross magnitude (such as 0.5 ≤ P_cross ≤ 1.0) ensures more crossover between parent pairs, and thereby greater diversity in the offspring population.
• Mutation: This operator introduces new characteristics into the offspring population by randomly flipping bits of the offspring strings from zero to one and vice versa. The bit-flipping operation, performed with a small (0.01-0.05) probability of mutation (P_mut), helps in conducting a local search around the point solutions represented by the unmutated offspring strings.


The stepwise procedure for implementing the ANN-model-assisted GA strategy is now in order (also see the flow chart in Figure 2).
Step 1. Initialize the generation counter, N_gen, to zero.
Step 2. Create the initial population of N_pop candidate solution strings randomly using binary digits; each string of length l_chr digits comprises K segments.
Step 3. Using the HSS technique on the ith (i = 1, 2, ..., N_pop) string, sample N_obs operating variable sets from the Gaussian PDFs associated with the operating variables, Φ.
Step 4. Apply the jth (j = 1, 2, ..., N_obs) sampled set (Φ_j), along with the corresponding design variable vector (Ψ_i), to the ANN model and obtain the model output, y_j. Next, compute the mean (μ_y) and standard deviation (σ_y) of the ANN output set, {y_j}.
Step 5. Evaluate the objective function (Eq. 11) and the fitness score of the ith population string.
Step 6. Repeat steps 3-5 for all population strings (that is, i = 1, 2, ..., N_pop) and rank the strings in decreasing order of their fitness scores.
Step 7. Create a mating pool of parent strings using the SR selection scheme.

Figure 2. Implementation of ANN-GA hybrid methodology.



Step 8. From the mating pool, choose pairs of parent strings randomly and perform the crossover operation on each one of them to obtain the offspring population.
Step 9. Perform mutation on the offspring population.
Step 10. Update the generation index by one: N_gen = N_gen + 1.
Step 11. Repeat steps 3-10 on the new generation strings until a convergence criterion, such as (1) N_gen exceeds the prespecified maximum generations limit (N_gen^max), or (2) the fitness score of the best string in a population no longer increases, is satisfied.
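The loop below condenses Steps 1-11 into a compact, runnable sketch. It is an illustration under stated assumptions rather than the authors' implementation: the objective function and variable bounds are placeholders for the ANN/HSS-based C_yr evaluation, and roulette-wheel-style sampling stands in for the SR selection scheme. The fitness mapping anticipates Eq. 26.

```python
import numpy as np

# Binary-coded GA sketch for a box-constrained minimization (Steps 1-11).
rng = np.random.default_rng(4)
K, BITS, N_POP, P_CROSS, P_MUT, N_GEN = 11, 5, 30, 0.95, 0.01, 250
LO, HI = np.zeros(K), np.ones(K)                # placeholder variable bounds

def decode(string):                             # 5-bit segments -> K real values
    segs = string.reshape(K, BITS)
    ints = segs @ (2 ** np.arange(BITS - 1, -1, -1))
    return LO + (HI - LO) * ints / (2 ** BITS - 1)

def annual_cost(x):                             # placeholder for the C_yr evaluation
    return np.sum((x - 0.3) ** 2)

fitness = lambda c: 15_000.0 / (15_000.0 + c)   # same form as Eq. 26

pop = rng.integers(0, 2, (N_POP, K * BITS))
for gen in range(N_GEN):
    fit = np.array([fitness(annual_cost(decode(s))) for s in pop])
    parents = pop[rng.choice(N_POP, N_POP, p=fit / fit.sum())]   # RW-style selection
    for i in range(0, N_POP - 1, 2):            # one-point crossover on parent pairs
        if rng.random() < P_CROSS:
            cut = rng.integers(1, K * BITS)
            parents[[i, i + 1], cut:] = parents[[i + 1, i], cut:]
    pop = np.where(rng.random(parents.shape) < P_MUT, 1 - parents, parents)  # mutation
best = min(annual_cost(decode(s)) for s in pop)
print(f"best annual cost found: {best:.4f}")
```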
ANN-SPSA optimization methodology

The SPSA optimization methodology (Spall, 1987, 1998a,b) differs from the commonly employed deterministic gradient-based techniques in the following respects. Instead of directly evaluating the gradient with respect to each decision variable by perturbing it separately (as done in the standard two-sided finite-difference approximation), the SPSA methodology approximates the gradient by perturbing all the decision variables simultaneously. Thus, irrespective of the number (K) of decision variables, only two objective function measurements are necessary for the gradient approximation; this is in contrast to the finite-difference approximation, where 2K function measurements are necessary for the gradient evaluation. The implementation procedure of the ANN-SPSA formalism is iterative and begins with a randomly initialized (guess) solution vector, x̂. The SPSA technique stipulates the cost function, C_yr(x), to be differentiable, since it searches for the minimum point, x*, at which the gradient of the objective function, g(x*), attains zero magnitude. That is,

g(x*) = ∂C_yr(x)/∂x |_{x = x*} = 0.    (19)

In each SPSA iteration, the gradient is approximated by utilizing the numerically efficient simultaneous perturbation technique alluded to earlier. With these preliminaries, the stepwise procedure for the ANN-SPSA implementation can be given as (also see the flow chart in Figure 3):
Step 1. Set the iteration index, t, to zero and choose randomly a K-dimensional guess solution vector, x̂_t|_{t=0}.
Step 2. Compute the t-dependent values, A_t and Z_t, termed "gain sequences," using

A_t = A / (r + t + 1)^η;  Z_t = Z / (t + 1)^β,    (20)

where the constants A, Z, r, η, and β assume nonnegative values. The optimal values of η and β are either 0.602 and 0.101 or 1.0 and 0.1667, respectively (Spall, 1998a).
Step 3. Generate a K-dimensional perturbation vector, Δ_t, using the Bernoulli ±1 distribution, where the probability of occurrence of either +1 or −1 is 0.5; next, perturb all K elements of the vector x̂_t simultaneously, as given by

x̂_t^+ = x̂_t + Z_t Δ_t;  x̂_t^− = x̂_t − Z_t Δ_t.    (21)

Step 4. Using x̂_t^+ and x̂_t^− as arguments, compute two measurements, that is, C_yr(x̂_t^+) and C_yr(x̂_t^−), of the objective function defined in Eq. 11. This step involves usage of (1) the HSS technique for sampling N_obs sets of operating variables, and (2) the ANN model for computing the expected costs, E(C_q) and E(C_op), as well as the control cost, C_c(Φ̂).
Step 5. Generate the simultaneous perturbation approximation of the unknown gradient, ĝ_t(x̂_t), using

ĝ_t(x̂_t) = [ (C_yr(x̂_t^+) − C_yr(x̂_t^−)) / (2 Z_t) ] × [Δ_t1^{−1}, Δ_t2^{−1}, ..., Δ_tK^{−1}]^T,    (22)

where ĝ_t(x̂_t) is K-dimensional and Δ_tk refers to the kth element (+1 or −1) of the perturbation vector, Δ_t.
Step 6. Update the estimate of the decision vector according to

x̂_{t+1} = x̂_t − A_t ĝ_t(x̂_t).    (23)

Step 7. Increment the iteration counter t to t + 1 (1 ≤ t ≤ t_max) and repeat steps 2-6 until convergence; the criterion for convergence could be that in successive iterations the decision variable values exhibit very little or no change.
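A minimal sketch of the SPSA recursion (Steps 1-7, Eqs. 20-23) follows; annual_cost is a placeholder for the ANN/HSS-based evaluation of Eq. 11, and the gain-sequence constants are those quoted later in the case study. Note that for a Bernoulli ±1 perturbation, 1/Δ_tk = Δ_tk, which the code exploits.

```python
import numpy as np

# SPSA recursion sketch (Eqs. 20-23) on a stand-in objective.
rng = np.random.default_rng(5)
A, r, Z, eta, beta, t_max = 0.1, 20.0, 0.02, 0.602, 0.101, 32_000

def annual_cost(x):                   # placeholder objective with a known minimum
    return np.sum((x - 0.3) ** 2)

x = rng.random(11)                    # Step 1: random 11-dimensional guess
for t in range(t_max):
    a_t = A / (r + t + 1) ** eta      # Eq. 20: gain sequences
    z_t = Z / (t + 1) ** beta
    delta = rng.choice([-1.0, 1.0], size=x.size)       # Step 3: Bernoulli +/-1
    y_plus = annual_cost(x + z_t * delta)              # Step 4: two measurements
    y_minus = annual_cost(x - z_t * delta)
    g_hat = (y_plus - y_minus) / (2.0 * z_t) / delta   # Eq. 22 (1/delta = delta)
    x = x - a_t * g_hat                                # Eq. 23: update
print(f"converged decision vector (first 3 elements): {np.round(x[:3], 4)}")
```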
Optimization of CSTR Using ANN-GA and ANN-SPSA Strategies

Consider a steady-state process involving a jacketed nonisothermal CSTR wherein two first-order reactions in series, A → B → C, take place. For implementing the ANN-GA and ANN-SPSA schemes, an MLP-based model approximating the functional relationship between the CSTR's design and operating variables, and the corresponding steady-state values of the output variable, was first developed. For convenience, the steady-state CSTR data required for developing the MLP model were generated using the CSTR's phenomenological model. In actual practice, the MLP model can be developed from a representative steady-state database that is already at hand or generated by conducting specially designed experiments. The phenomenological equations of the CSTR used for simulating its steady-state behavior are given in Appendix A, where volume (V, m³) represents the design variable, and flow rate (F, m³/min), heat removal rate (Q, kJ/min), inlet concentration of reactant A (C_A0, mol/m³), inlet concentration of B (C_B0, mol/m³), and inlet temperature (T_0, K) collectively denote the five operating variables. The CSTR output variable, namely, the rate of production (mol/min) of B, has been chosen as the quality variable, and its steady-state value (y) has been obtained as

y = (Ĉ_B − C_B0) × F,    (24)

where Ĉ_B represents the steady-state concentration of B. The desired value of y, defined as y*, is 600 mol/min.

For operating a process, ranges of the design and operating variables are usually specified.



Figure 3. Implementation of ANN-SPSA hybrid methodology.

Accordingly, the following ranges were considered for the steady-state CSTR simulation: V = [0.3-0.4] m³, F = [0.5-1.1] m³/min, Q = [100-1,100] kJ/min, C_A0 = [3,000-4,000] mol/m³, C_B0 = [30-600] mol/m³, and T_0 = [300-320] K. Using these ranges, 50 random combinations of the CSTR's design and operating variables were generated, and using each combination, the corresponding steady-state value of the quality variable, y, was computed. The data set comprising the design and operating variables forms the network's input space, and the corresponding y values represent the network's desired (target) output space. After normalizing and partitioning these data into the training set (40 patterns) and the test set (10 patterns), an optimal MLP network model was developed in accordance with the three-step network training procedure described earlier.
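The data-generation and 4:1 partitioning step can be sketched as follows; a hypothetical steady-state map stands in for the Appendix A model, while the variable ranges and the 40/10 split mirror those given above.

```python
import numpy as np

# Sketch of the 50-pattern data generation and train/test split.
rng = np.random.default_rng(6)
lo = np.array([0.3, 0.5, 100.0, 3000.0, 30.0, 300.0])    # V, F, Q, CA0, CB0, T0
hi = np.array([0.4, 1.1, 1100.0, 4000.0, 600.0, 320.0])

X = lo + (hi - lo) * rng.random((50, 6))                 # 50 random combinations
y = X[:, 1] * (0.2 * X[:, 3] - X[:, 4])                  # placeholder for Eq. 24

X_n = (X - lo) / (hi - lo)                               # normalize inputs to [0, 1]
y_n = (y - y.min()) / (y.max() - y.min())
X_trn, X_tst = X_n[:40], X_n[40:]                        # 40 training / 10 test
y_trn, y_tst = y_n[:40], y_n[40:]
print(X_trn.shape, X_tst.shape)
```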



In the MLP training simulations, the sigmoid transfer function was used for computing the outputs of the hidden and output-layer nodes. The optimal MLP architecture obtained thereby has six input nodes, two hidden nodes, and one output node (N_I = 6, N_H = 2, N_O = 1); the corresponding values of the learning rate (η_l) and momentum coefficient (α_m) were 0.7 and 0.01, respectively. An MLP network with good function-approximation and generalization abilities results in small but comparable RMSE values for both the training set (E_trn) and the test set (E_tst). In the case of the MLP-based CSTR model, the E_trn and E_tst magnitudes were 0.0061 and 0.0063, respectively. Additionally, values of the coefficient of correlation (CC) between the MLP-predicted and target y values were calculated. The CC values for the training and test sets were 0.999 and 0.998, respectively. It can be inferred from the very small magnitudes of the training and test set RMSEs and the corresponding high (≈1) CC magnitudes that the MLP network has excellently approximated and generalized the nonlinear relationship existing between its six inputs and the single output.

The generalized framework of the ANN-based robust optimization aims at obtaining not only the optimal values of the design and operating variables, but also the optimal values of the tolerances (E) associated with the process operating variables. Thus, for the CSTR case study, the overall decision space denoted by vector x (see Eq. 18) becomes eleven-dimensional (one design variable + five operating variables + five tolerances); for clarity, the correspondence between the x-vector elements and the CSTR variables is tabulated in Table 1. Upon appropriate substitution from the notation given in Table 1 and the definition of the Taguchi loss function (Eq. 12), the CSTR-specific form of the objective function representing the total annual cost (Eq. 8) can be written as

C_yr = C_eqp(V) + C_op(μ, E) + C_c(μ, E) + C_q(μ_y, y*, σ_y),    (25)

where μ_y and σ_y, respectively, represent the mean and standard deviation of the quality variable, y, and y* denotes the desired value of y (= 600 mol/min). The five-dimensional vectors μ and E, respectively representing the mean and tolerance values of the operating variables, are defined as

μ = [μ_F, μ_Q, μ_CA0, μ_CB0, μ_T0]^T and E = [ε_F, ε_Q, ε_CA0, ε_CB0, ε_T0]^T.

Table 1. Equivalence Between Decision Vector Elements and CSTR Variables

| Index (k) | Decision Vector Variable | Variable Type* | Corresponding CSTR Variable |
|---|---|---|---|
| 1 | x_1 | DES | Volume, V (m³) |
| 2 | x_2 (= μ_F) | OPR | Flow rate, F (m³/min) |
| 3 | x_3 (= μ_Q) | OPR | Heat removal rate, Q (kJ/min) |
| 4 | x_4 (= μ_CA0) | OPR | Inlet concentration of A, C_A0 (mol/m³) |
| 5 | x_5 (= μ_CB0) | OPR | Inlet concentration of B, C_B0 (mol/m³) |
| 6 | x_6 (= μ_T0) | OPR | Inlet temperature, T_0 (K) |
| 7 | x_7 (= ε_F) | TOL | Tolerance for F |
| 8 | x_8 (= ε_Q) | TOL | Tolerance for Q |
| 9 | x_9 (= ε_CA0) | TOL | Tolerance for C_A0 |
| 10 | x_10 (= ε_CB0) | TOL | Tolerance for C_B0 |
| 11 | x_11 (= ε_T0) | TOL | Tolerance for T_0 |

*Abbreviations: DES: design variable; OPR: operating variable; TOL: tolerance of an operating variable.
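For reference, the sketch below assembles the eleven-dimensional decision vector of Table 1 and recovers the PDF spreads via Eq. 6; the numerical values are illustrative assumptions, not optimized results.

```python
import numpy as np

# The 11-dimensional decision vector: one design variable, five operating
# means, and five tolerances (illustrative placeholder values).
x = np.array([0.35,                                  # x1: V (design)
              0.55, 260.0, 3300.0, 420.0, 310.0,     # x2-x6: mu_F .. mu_T0
              0.09, 0.06, 0.03, 0.05, 0.005])        # x7-x11: eps_F .. eps_T0

V, mu, eps = x[0], x[1:6], x[6:11]
sigma = eps * mu / 3.09        # Eq. 6: the (mu, eps) pair fixes each PDF spread
print(V, mu.round(2), sigma.round(4))
```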
GA-based optimization of CSTR

For performing the GA-based optimization of the CSTR model, the following values of the GA-specific parameters were used: K = 11, N_pop = 30, l_chr = 55, P_cross = 0.95, P_mut = 0.01, and N_gen^max = 250. The values of the parameters l_chr and K indicate that within a solution string each decision variable is represented with five-bit precision. This precision can be enhanced further by choosing the l_chr and K values such that the ratio l_chr/K attains a higher (> 5) integer magnitude. For sampling values of the five operating variables from their respective PDFs, a statistically adequate sample size of 400 measurements (Bernardo and Saraiva, 1998) was utilized (that is, N_obs = 400). The function used for computing the fitness score of a candidate solution (ξ_i) was

ξ_i = 15,000 / (15,000 + C_yr^i);  i = 1, 2, ..., N_pop,    (26)

where C_yr^i refers to the overall annual plant cost corresponding to the ith candidate solution string. The functions used for computing the various components of the annual plant cost are described in Appendix B.

In a typical plant, the controller action constrains an operating variable from deviating beyond a specific limit. Accordingly, the tolerances (ε_F, ε_Q, ε_CA0, ε_CB0, ε_T0) for the five operating variables were made to satisfy the following constraints: (1) 0.0001 ≤ ε_F ≤ 0.15, (2) 0.0001 ≤ ε_Q ≤ 0.15, (3) 0.0001 ≤ ε_CA0 ≤ 0.1, (4) 0.0001 ≤ ε_CB0 ≤ 0.1, and (5) 0.0001 ≤ ε_T0 ≤ 0.02; these constraints can also be expressed as inequality constraints. Any candidate solution violating the constraints was penalized during the fitness evaluation by resetting its fitness score to zero magnitude (Goldberg, 1989). A more rigorous, that is, penalty function (PF), approach can also be used for constraint handling. In the PF approach, the objective function f(x) to be minimized is replaced by the penalty function P(x) (Goldberg, 1989; Deb, 1995):

P(x) = f(x) + Σ_{j0} γ_j0 κ_h(h_j0(x)) + Σ_{k0} γ_k0 κ_g(g_k0(x)),    (27)

where j0 and k0 denote the indices of the inequality and equality constraints, respectively; γ_j0 and γ_k0 refer to the penalty coefficients (these are usually kept constant throughout the GA simulation); h_j0(x) and g_k0(x), respectively, represent the inequality and equality constraints; and κ_h and κ_g describe the penalty terms associated with h_j0(x) and g_k0(x). The penalty terms can assume different forms, which are discussed in greater detail in Deb (1995).
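The two constraint-handling options just described can be sketched as follows; the penalty coefficient gamma and the quadratic penalty form are assumptions in the spirit of Eq. 27, not the study's exact choices.

```python
import numpy as np

# Fitness resetting (used with the GA) versus an exterior penalty on the
# tolerance bounds; the quadratic penalty and gamma are assumed forms.
EPS_LO = np.array([1e-4] * 5)
EPS_HI = np.array([0.15, 0.15, 0.10, 0.10, 0.02])    # bounds on eps_F .. eps_T0

def fitness_with_reset(cost, eps):
    if np.any(eps < EPS_LO) or np.any(eps > EPS_HI):
        return 0.0                                   # infeasible string: zero fitness
    return 15_000.0 / (15_000.0 + cost)              # Eq. 26

def penalized_cost(cost, eps, gamma=1e6):
    viol = np.maximum(EPS_LO - eps, 0) + np.maximum(eps - EPS_HI, 0)
    return cost + gamma * np.sum(viol ** 2)          # Eq. 27-style penalty term

eps = np.array([0.09, 0.05, 0.03, 0.05, 0.05])       # eps_T0 violates its bound
print(fitness_with_reset(13_900.0, eps), penalized_cost(13_900.0, eps))
```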
The GA-optimized values of the eleven decision variables and the corresponding costs are listed in Table 2 (column 1). Additionally, in Figure 4 the Gaussian PDFs defining the optimal space regions of the five operating variables (F, Q, C_A0, C_B0, and T_0) are depicted in panels 4a to 4e, respectively.

SPSA-based optimization of CSTR

Implementation of the ANN-SPSA formalism was performed using the following SPSA-specific parameter values: A = 0.1, r = 20.0, Z = 0.02, η = 0.602, β = 0.101, and t_max = 32,000. In the SPSA procedure, it was ensured before each objective function evaluation that those elements of the perturbed vector x̂_t describing tolerances fall within the specified limits; if a tolerance value failed this test, it was reset to its nearest limiting value (Spall, 1998b).



Table 2. Comparison of Solutions Obtained Using the RO Framework and the Hybrid Methodologies

| Quantity | (1) GA-based solution (ANN model, noise-free data) | (2) SPSA-based solution (ANN model, noise-free data) | (3) GA-based solution (ANN model, noisy data) | (4) SPSA-based solution (ANN model, noisy data) | (5) RO framework solution* |
|---|---|---|---|---|---|
| V (m³) | 0.3428 | 0.3458 | 0.3726 | 0.3617 | 0.3463 |
| F (m³/min) | 0.5437(1±0.0936) | 0.5506(1±0.0929) | 0.5270(1±0.0649) | 0.5096(1±0.0573) | 0.5023(1±0.099) |
| Q (kJ/min) | 262.253(1±0.0585) | 208.878(1±0.0492) | 684.137(1±0.0617) | 611.213(1±0.0482) | 146.7(1±0.080) |
| C_A0 (mol/m³) | 3,312.74(1±0.0314) | 3,260.577(1±0.0200) | 3,801.78(1±0.0219) | 3,775.73(1±0.0122) | 3,140(1±0.050) |
| C_B0 (mol/m³) | 422.062(1±0.0505) | 399.18(1±0.0507) | 542.44(1±0.0543) | 592.957(1±0.0380) | 510.7(1±0.050) |
| T_0 (K) | 310.444(1±0.0051) | 310.82(1±0.0040) | 304.42(1±0.0051) | 305.414(1±0.0030) | 313.8(1±0.005) |
| μ_y (mol/min)** | 599.12 | 601.39 | 599.85 | 600.18 | 600 |
| σ_y (mol/min) | 11.48 | 8.54 | 11.26 | 7.0 | 16.8 |
| C_eqp ($/yr) | 1,057.03 | 1,062.79 | 1,113.31 | 1,092.99 | 1,064 |
| C_op ($/yr) | 9,729.4 | 9,750.16 | 9,623.98 | 9,617.18 | 9,712 |
| C_c ($/yr) | 866.02 | 2,597.86 | 2,288.2 | 3,288.43 | 2,105 |
| C_q ($/yr) | 2,201.02 | 489.77 | 828.32 | 320.61 | 1,835 |
| C_yr ($/yr) | 13,853.47 | 13,900.58 | 13,853.81 | 14,319.21 | 14,716 |
| CPU time (s)† | 80.3 | 47.0 | 85.5 | 54.0 | n/a |

*Obtained by Bernardo and Saraiva (1998). Solutions with respect to the operating variables are listed in the μ_n(1±ε_n) format.
**Desired μ_y value (= y*): 600 mol/min.
†Seconds taken by a 366-MHz Pentium-II CPU to arrive at the optimal solution.

Alternatively, the more rigorous penalty function approach described earlier (Eq. 27) can be used for handling the constraints (Wang and Spall, 1999). It was observed during implementation of the SPSA methodology that the proper choice of the SPSA parameters, namely A, r, and Z, is a prerequisite to successful convergence. For a judicious selection of the stated parameters, the reader may refer to the several guidelines provided in Spall (1998a). The results of the SPSA-based CSTR optimization are presented in column 2 of Table 2, where it is seen that the SPSA-minimized annual total cost ($13,900.58/yr) is nearly equal to that given by the GA-based optimization ($13,853.47/yr). However, the control and quality cost values corresponding to the GA- and SPSA-based solutions differ significantly. A high value of the quality cost results when (1) the mean of the quality variable (μ_y) deviates significantly from its desired magnitude, and/or (2) the corresponding standard deviation value (σ_y) is high (see Eq. 12). It is noticed in the GA-based optimization results that the σ_y value (11.48) is higher than the corresponding SPSA-based value (8.54). As a result, the product quality will exhibit greater variability, eventually leading to a higher quality cost. In the case of the SPSA-based solution, it is observed that the control cost has a higher magnitude ($2,597.86/yr) as compared to the GA-based solution ($866.02/yr). By definition, the control cost is inversely proportional to the tolerance values (refer to Appendix B, Eq. B4), since smaller tolerances necessitate stricter process control, thereby increasing the cost of control. This can be verified from the tolerance values corresponding to the SPSA-based solution. It is observed that the optimized tolerances, 0.049, 0.02, and 0.004, in respect of the process variables Q, C_A0, and T_0, are smaller as compared to those optimized by the GA (0.059, 0.031, and 0.005). Consequently, the control cost has assumed a higher value ($2,597.86/yr compared to $866.02/yr).

CSTR optimization in the presence of noisy process data

Sensors monitoring process variables and parameters often generate noisy measurements. Consequently, the mean recorded value of the noise-corrupted steady-state measurements may show a positive or negative deviation from its true mean (nominal set point). The deviation magnitude, which is variable/parameter-specific, is likely to vary from one run to another. This situation is different from the process uncertainties that are caused by random physical variations in the process variables/parameters.

For the present case study, we consider a scenario wherein the steady-state values of all the monitored process variables are corrupted with noise obeying the Gaussian PDF. Accordingly, the steady-state values of the CSTR's design, operating, and quality variables obtained earlier by solving the phenomenological model were corrupted using Gaussian noise. The extent of measurement noise in each variable was assumed to lie within a ±5% tolerance limit. Letting μ_l be the true steady-state value (nominal setpoint) of the lth process input/output variable, the corresponding standard deviation, σ_l, required for generating the noisy measurements, was computed as

σ_l = (0.05 × μ_l) / 3.09.    (28)

All seven elements of the 50 patterns representing the CSTR's noise-free steady-state input-output data set were randomly corrupted using the variable-specific Gaussian mean (μ_l) and standard deviation (σ_l) values. Specifically, a time-series sequence comprising one thousand noisy measurements was generated for each pattern element. The sequence obtained thereby was denoised using a nonlinear noise-reduction algorithm (Kantz and Schreiber, 1997), and the resulting sequence was averaged out. The database obtained thereby consists of 50 patterns representing noise-filtered steady-state values of the CSTR's seven input-output variables. It is worth pointing out here that even after noise filtration, the resultant steady-state values do contain a small amount of residual noise. For creating the training and test sets, the noise-filtered steady-state database was normalized and partitioned in a 4:1 ratio.
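The noise-corruption step of Eq. 28 can be sketched as follows; a plain average of the simulated time series stands in for the nonlinear noise-reduction algorithm of Kantz and Schreiber (1997), and the pattern values are illustrative.

```python
import numpy as np

# Measurement-noise generation per Eq. 28, followed by a simple average in
# place of the nonlinear noise-reduction step (so residual noise remains).
rng = np.random.default_rng(7)

def noisy_then_filtered(mu_l, n_meas=1000):
    sigma_l = 0.05 * mu_l / 3.09                  # Eq. 28: +/-5% tolerance limit
    series = rng.normal(mu_l, sigma_l, n_meas)    # time series of noisy readings
    return series.mean()

true_pattern = np.array([0.35, 0.55, 260.0, 3300.0, 420.0, 310.0, 600.0])
filtered = np.array([noisy_then_filtered(m) for m in true_pattern])
print(np.round(filtered, 2))
```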



Figure 4. GA-optimized probability density functions (PDFs) corresponding to the five operating variables.

Using these sets, an MLP-based optimal steady-state model was developed following the three-step training procedure elaborated earlier. The optimal network model, comprising 6, 2, and 1 nodes in its input, hidden, and output layers, respectively, was trained using η_l and α_m values of 0.7 and 0.01. For this network model, the RMSE magnitude for the training set was 0.0134, and for the test set the magnitude was 0.0123. The corresponding CC magnitudes were 0.998 (training set) and 0.996 (test set). The RMSE (CC) values are sufficiently low (high) to infer that the MLP network has captured well the inherent relationship between the CSTR's noise-filtered input and output variables. A comparison of the E_trn and E_tst values (0.0134 and 0.0123) pertaining to the noise-reduced steady-state data with the corresponding ones (0.0061 and 0.0063) obtained when noise-free data were used for constructing an ANN model reveals that the former set of RMSE values is marginally higher. This indicates that the network model has fitted the noise-filtered data with marginally lower accuracy. It can be noted that the MLP-training procedure utilized in this study ensures that the network is not an overfitted one. Owing to the avoidance of overfitting, the network has not fitted the small amount of residual noise contained in the noise-filtered data, but instead has approximated the underlying trends (physicochemical phenomena) therein. This in turn has resulted in the marginally higher RMSE values in respect of the ANN model trained on the noise-filtered steady-state data. A similar inference can also be drawn from the lower CC values in respect of the predictions made by the network trained on the noise-filtered data.

The MLP-network model just described was optimized using the GA and SPSA formalisms; the values of the various GA and SPSA parameters used in the respective optimization simulations were:
• GA: N_pop = 30, l_chr = 55, P_cross = 0.95, P_mut = 0.01, and N_gen^max = 250.
• SPSA: A = 0.08, r = 20, Z = 0.05, η = 0.602, β = 0.101, and t_max = 32,000.

The optimal solutions searched by the GA and SPSA methods are presented in columns 3 and 4 of Table 2, respectively.

In the case of nonlinear objective functions, the decision surface can comprise several local minima with varying shapes and sizes. Thus, for a problem involving function minimization, it becomes important to obtain a solution that corresponds to the deepest local or the global minimum on the objective function surface. Stochasticity in the implementation procedures of the GA and SPSA methodologies to some extent helps in achieving the stated goal. Nevertheless, it was ensured during the GA/SPSA implementation that the search space is thoroughly explored. This was done by using different pseudorandom number sequences (generated by changing the random number generator seed) for initializing the candidate solution population (in the GA-based optimization) and the guess solution vector (in the SPSA-based optimization). Usage of different random initializations in essence helps in exploring different subsets of the decision space, thereby locating the deepest local minimum on the decision surface. By mapping the decision surface, it is possible to verify whether the optimization algorithm has indeed captured a solution corresponding to the deepest local minimum. In the present case study, it is not possible to view the surface formed by the objective function, since the decision space is eleven-dimensional. We therefore resort to mapping the objective function in a single dimension only. For such a mapping, the GA-optimized solution listed in column 3 of Table 2 has been considered. Accordingly, values of the objective function defined in Eq. 25 were evaluated by systematically varying the magnitude of a design or an operating variable while maintaining the values of the remaining ten decision variables at their optimum (a sketch of this sweep follows).

AIChE Journal January 2001 Vol. 47, No. 1 137


Figure 5. Effect of variation in a process variable on (1) mean value of the quality variable ( ␮ y , molr
rmin), and (2)
annual plant cost ( C y r , $r
ryr).
Panels Ža. ᎐ Žf. depict results corresponding to variations in V , F, Q, C A0 , C B0 , and T 0, respectively.
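A sketch of this one-dimensional mapping is given below; the stand-in objective and variable bounds are assumptions that take the place of Eq. 25 and the actual decision-variable ranges.

```python
# Sketch: sweep one decision variable over its range while the other ten are
# pinned at the GA optimum, recording the objective (as used for Figure 5).
import numpy as np

def sweep_1d(objective, x_opt, idx, lo, hi, n=50):
    """objective: callable on the 11-dim decision vector; idx: index of the
    variable being varied; lo, hi: its lower and upper bounds."""
    grid = np.linspace(lo, hi, n)
    costs = []
    for v in grid:
        x = np.array(x_opt, dtype=float)
        x[idx] = v                   # vary a single variable only
        costs.append(objective(x))
    return grid, np.asarray(costs)

# Usage with a stand-in objective; a single-minimum profile, as in Figure 5,
# puts the optimum at the valley bottom.
x_opt = np.full(11, 0.5)
obj = lambda x: float(np.sum((x - 0.5) ** 2) + 1.0)
grid, costs = sweep_1d(obj, x_opt, idx=0, lo=0.0, hi=1.0)
print(grid[np.argmin(costs)])        # ~0.5
```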

Discussion

Upon examining the solutions given by the hybrid methodologies (listed in Table 2), it is observed that the mean values of the quality variable (599.12, 601.39, 599.85, and 600.18) are very close to their desired magnitude (600 mol/min). On comparison with the RO solution (Table 2, column 5) obtained by Bernardo and Saraiva (1998), the annual plant costs for the GA-based ($13,853.47/yr, $13,853.81/yr) and SPSA-based ($13,900.58/yr, $14,319.21/yr) solutions are a few percent lower than the corresponding RO solution value ($14,716/yr). This reduction in C_yr was brought about either by a reduction in the control cost, C_c (see column 1 of Table 2), or by a reduction in the quality cost, C_q (see columns 2, 3, and 4 of Table 2). It is noticed from the standard deviations (σ_y) of the quality variable that their magnitudes (11.48, 8.54, 11.26, and 7.0) pertaining to the solutions given by the GA and SPSA methodologies are smaller than the corresponding RO solution value of 16.8, although the respective μ_y values deviate marginally from their desired magnitude of 600 mol/min. These results suggest that, for the CSTR, there exists a trade-off between the mean and standard deviation values of the quality variable. The nature of this trade-off can be understood from Figure 6, wherein the PDFs pertaining to the quality variable y are plotted. In the figure, the PDFs formed by the dashed lines correspond to the solutions given by the GA and SPSA methods, whereas the PDF formed by the continuous line refers to the RO solution. It is noticed that implementation of the GA/SPSA-based solutions will result in μ_y values that are marginally different from their desired value of 600 mol/min. On the other hand, implementation of the RO solution will result in a μ_y value exactly equal to 600 mol/min; this, however, will be achieved at the cost of more widely spread steady-state values of the quality variable.

Figure 6. Comparison of PDFs pertaining to the quality variable, y: (a) GA-optimized solution using noise-free data (μ_y = 599.12, σ_y = 11.48); (b) SPSA-optimized solution using noise-free data (μ_y = 601.39, σ_y = 8.54); (c) GA-optimized solution corresponding to the noisy process data (μ_y = 599.85, σ_y = 11.26); (d) SPSA-optimized solution corresponding to the noisy process data (μ_y = 600.18, σ_y = 7.0); and (e) RO framework solution (Bernardo and Saraiva, 1998) (μ_y = 600.0, σ_y = 16.8).

A peculiar feature of the GA and SPSA techniques, shared by most stochastic methods, is that the solution obtained is influenced by the random number sequence used during implementation. As a result, multiple optimization runs, each with a different random number sequence (obtained by changing the random-number-generator seed), were performed to arrive at an overall optimal solution (a sketch of this multistart driver is given below). The CPU times consumed by the GA/SPSA methodologies (last row of Table 2) show that the SPSA procedure consumes less time (47 and 54 s) than the GAs (80.3 and 85.5 s). These values also suggest that implementation of the hybrid formalisms is not computationally burdensome, even when multiple runs must be performed. In the case of GA-based optimization, it took 10-15 runs to arrive at the overall optimal solutions reported in Table 2, although the converged solution in each case was not very different. This is in contrast to the SPSA implementation, where 15-20 simulations, each resulting in a different solution, were needed to arrive at the overall optimal solution.
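Such a multistart procedure can be sketched as follows; the toy one_run function is a stand-in for a complete GA or SPSA optimization.

```python
# Sketch: repeat the stochastic search with different random-number-generator
# seeds and retain the deepest minimum found across all runs.
import numpy as np

def multi_start(optimize_once, n_runs=15):
    """optimize_once(seed) -> (solution, objective value); returns the best.
    10-15 runs (GA) and 15-20 runs (SPSA) were used in the case study."""
    best_x, best_f = None, np.inf
    for seed in range(n_runs):
        x, fval = optimize_once(seed)
        if fval < best_f:
            best_x, best_f = x, fval
    return best_x, best_f

# Usage with a toy single run whose outcome depends on its seed
def one_run(seed):
    rng = np.random.default_rng(seed)
    x = rng.uniform(-2.0, 2.0, size=3)   # e.g., a random initial guess
    return x, float(np.sum(x ** 2))      # stand-in objective value

print(multi_start(one_run, n_runs=20))
```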

Conclusions

To summarize, this article presents two process optimization strategies combining an ANN-based process model with stochastic optimization formalisms, namely, the GA and SPSA. The principal advantage of using neural networks for process modeling is that the model can be developed exclusively from process input-output data, without invoking the process phenomenology. Having built an ANN model, its input space, comprising the process input variables, is optimized using the GA and SPSA techniques. These optimization paradigms possess attractive characteristics: (1) only measurements of the objective function (and not of its derivatives) are needed in their optimization procedures, and (2) the paradigms can tolerate noisy objective functions. It is necessary to point out that the magnitudes of the various algorithmic parameters utilized in the development of the ANN models and in the implementation of the GA/SPSA methodologies are problem-specific and, except for a few (for instance, the η and β values of the SPSA algorithm), must be selected heuristically. Notwithstanding this fact, development of ANN-based process models remains an easier and more cost-effective task than the development of phenomenological models. The efficacy of the ANN-GA and ANN-SPSA formalisms has been demonstrated by considering a nontrivial optimization objective, which, in addition to the parameter design, also addresses the issue of tolerance design; the ANN-model-based mathematical framework required for fulfilling this objective has accordingly been formulated. A case study involving a CSTR has been conducted for validating the optimization performance of the ANN-GA and ANN-SPSA strategies; the optimization objective considered was minimization of the CSTR's total annual cost. In the case study, two ANN models were developed, using noise-free and noisy steady-state process data, respectively, and both were observed to possess closely comparable data-fitting and generalization abilities. The input space of the ANN models, consisting of the CSTR's design and operating variables, was then optimized using the GA and SPSA methods; the tolerances associated with the operating variables were optimized simultaneously. The solutions obtained thereby compare excellently with that given by a robust deterministic optimization formalism. The ANN-GA and ANN-SPSA approaches presented here are sufficiently general, and can therefore be employed for a wide variety of process design and optimization problems. These strategies become considerably simpler to implement when the optimization objective involves only parameter design: in that case, the tolerances defining the operating windows need not be determined, thereby avoiding the usage of a sampling technique and the associated numerical computations.

Acknowledgment

One of the authors (S.N.) thanks the Council of Scientific and Industrial Research (CSIR), Government of India, New Delhi, for a Junior Research Fellowship.



Literature Cited

Agarwal, M., "A Systematic Classification of Neural-Network-Based Control," IEEE Control Syst., 26(2), 75 (1997).
Bernardo, F. P., and P. M. Saraiva, "Robust Optimization Framework for Process Parameter and Tolerance Design," AIChE J., 44, 2007 (1998).
Bhat, N., and T. McAvoy, "Use of Neural Nets for Modeling and Control of Chemical Process Systems," Comput. Chem. Eng., 14, 573 (1990).
Bishop, C. M., "Neural Networks and Their Applications," Rev. Sci. Instr., 65, 1803 (1994).
Cartwright, H. M., and R. A. Long, "Simultaneous Optimization of Chemical Flowshop Sequencing and Topology Using Genetic Algorithms," Ind. Eng. Chem. Res., 32, 2706 (1993).
Deb, K., Optimization for Engineering Design: Algorithms and Examples, Prentice Hall, New Delhi (1995).
Diwekar, U. M., and J. R. Kalagnanam, "Robust Design Using an Efficient Sampling Technique," Comput. Chem. Eng., 20, S389 (1996).
Diwekar, U. M., and J. R. Kalagnanam, "An Efficient Sampling Technique for Optimization Under Uncertainty," AIChE J., 43, 440 (1997a).
Diwekar, U. M., and J. R. Kalagnanam, "An Efficient Sampling Technique for Off-Line Quality Control," Technometrics, 39, 308 (1997b).
Diwekar, U. M., and E. S. Rubin, "Stochastic Modeling of Chemical Processes," Comput. Chem. Eng., 15, 105 (1991).
Diwekar, U. M., and E. S. Rubin, "Parameter Design Methodology for Chemical Processes Using a Simulator," Ind. Eng. Chem. Res., 33, 292 (1994).
Freeman, J. A., and D. M. Skapura, Neural Networks: Algorithms, Applications, and Programming Techniques, Addison-Wesley, Reading, MA (1991).
Garcia, S., and E. P. Scott, "Use of Genetic Algorithms in Thermal Property Estimation: I. Experimental Design Optimization," Numer. Heat Transfer (Part A), 33, 135 (1998).
Garcia, S., J. Guynn, and E. P. Scott, "Use of Genetic Algorithms in Thermal Property Estimation: II. Simultaneous Estimation of Thermal Properties," Numer. Heat Transfer (Part A), 33, 149 (1998).
Garrard, A., and E. S. Fraga, "Mass Exchange Network Synthesis Using Genetic Algorithms," Comput. Chem. Eng., 22, 1837 (1998).
Gill, P. E., W. Murray, M. A. Saunders, and M. H. Wright, "User's Guide for NPSOL: A Fortran Package for Nonlinear Programming," Tech. Rep. SOL 86-2, Systems Optimization Laboratory, Stanford University, Stanford, CA (1986).
Goldberg, D. E., Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, Reading, MA (1989).
Hanagandi, V., H. Ploehn, and M. Nikolaou, "Solution of the Self-Consistent Field Model for Polymer Adsorption by Genetic Algorithms," Chem. Eng. Sci., 51, 1071 (1996).
Hecht-Nielsen, R., Neurocomputing, Addison-Wesley, Reading, MA (1990).
Henson, M. A., "Nonlinear Model Predictive Control: Current Status and Future Directions," Comput. Chem. Eng., 23, 187 (1998).
Hernandez, E., and Y. Arkun, "Study of the Control-Relevant Properties of Back-Propagation Neural Network Models of Nonlinear Dynamical Systems," Comput. Chem. Eng., 4, 227 (1992).
Holland, J. H., Adaptation in Natural and Artificial Systems, Univ. of Michigan Press, Ann Arbor (1975).
Hornik, K., M. Stinchcombe, and H. White, "Multilayer Feedforward Networks are Universal Approximators," Neural Networks, 2, 359 (1989).
Hugget, A., P. Sebastin, and J. P. Nadeau, "Global Optimization of a Dryer by Using Neural Networks and Genetic Algorithms," AIChE J., 45(6), 1227 (1999).
Hunt, K., D. Sbarbaro, R. Zbikowski, and P. Gawthrop, "Neural Networks for Control Systems: A Survey," Automatica, 28, 1083 (1992).
Kantz, H., and T. Schreiber, Nonlinear Time Series Analysis, Cambridge Univ. Press, Cambridge, U.K. (1997).
Nahas, E., M. Henson, and D. Seborg, "Nonlinear Internal Model Strategy for Neural Network Models," Comput. Chem. Eng., 16, 1039 (1992).
Narendra, K., and K. Parthasarathy, "Identification and Control of Dynamical Systems Using Neural Networks," IEEE Trans. Neural Networks, 1, 4 (1990).
Pistikopoulos, E. N., "Uncertainty in Process Design and Operations," Comput. Chem. Eng., 19, S553 (1995).
Pistikopoulos, E. N., and M. G. Ierapetritou, "Novel Approach for Optimal Process Design Under Uncertainty," Comput. Chem. Eng., 19, 1089 (1995).
Poggio, T., and F. Girosi, "Regularization Algorithms for Learning that are Equivalent to Multilayer Networks," Science, 247, 978 (1990).
Polifke, W., W. Geng, and K. Dobbeling, "Optimization of Rate Coefficients for Simplified Reaction Mechanisms with Genetic Algorithms," Combust. Flame, 113, 119 (1998).
Ramasamy, S., S. S. Tambe, B. D. Kulkarni, and P. B. Deshpande, "Robust Nonlinear Control with Neural Networks," Proc. R. Soc. Lond. A, 449, 655 (1995).
Rumelhart, D., G. Hinton, and R. Williams, "Learning Representations by Backpropagating Errors," Nature, 323, 533 (1986).
Spall, J. C., "A Stochastic Approximation Technique for Generating Maximum Likelihood Parameter Estimates," Proc. of the Amer. Cont. Conf., AACC, Evanston, IL, p. 1161 (1987).
Spall, J. C., "Implementation of the Simultaneous Perturbation Algorithm for Stochastic Optimization," IEEE Trans. Aerosp. Electron. Syst., AES-34, 817 (1998a).
Spall, J. C., "An Overview of the Simultaneous Perturbation Method for Efficient Optimization," Johns Hopkins APL Tech. Dig., 19, 482 (1998b).
Taguchi, G., Introduction to Quality Engineering, Amer. Supplier Inst., Dearborn, MI (1986).
Tambe, S. S., B. D. Kulkarni, and P. B. Deshpande, Elements of Artificial Neural Networks with Selected Applications in Chemical Engineering, and Chemical & Biological Sciences, Simulation & Advanced Controls, Louisville, KY (1996).
Tendulkar, S. B., S. S. Tambe, I. Chandra, P. V. Rao, R. V. Naik, and B. D. Kulkarni, "Hydroxylation of Phenol to Dihydroxybenzenes: Development of Artificial Neural-Network-Based Process Identification and Model Predictive Control Strategies for a Pilot Plant Scale Reactor," Ind. Eng. Chem. Res., 37, 2081 (1998).
Venkatasubramanian, V., and A. Sundaram, "Genetic Algorithms: Introduction and Applications," Encyclopedia of Computational Chemistry, Wiley, Chichester, U.K. (1998).
Wang, I. J., and J. C. Spall, "A Constrained Simultaneous Perturbation Stochastic Approximation Algorithm Based on Penalty Functions," Proc. of the Amer. Cont. Conf., AACC, Evanston, IL, p. 393 (1999).

Appendix A: CSTR Model Equations and Parameter Values

The steady-state model for the nonisothermal CSTR, wherein two first-order reactions, A → B → C, take place, is given below. While Eq. A1 refers to the energy balance, the remaining two equations account for the material balances of components A (Eq. A2) and B (Eq. A3), respectively:

\[ \rho c_p F (T_0 - T) + k_{A0} \exp(-E_A/RT)\, \hat{C}_A (-H_1) V + k_{B0} \exp(-E_B/RT)\, \hat{C}_B (-H_2) V - Q = 0 \qquad (A1) \]

\[ F (C_{A0} - \hat{C}_A) - k_{A0} \exp(-E_A/RT)\, \hat{C}_A V = 0 \qquad (A2) \]

\[ F (C_{B0} - \hat{C}_B) + k_{A0} \exp(-E_A/RT)\, \hat{C}_A V - k_{B0} \exp(-E_B/RT)\, \hat{C}_B V = 0 \qquad (A3) \]

For generating representative steady-state data comprising 50 input-output patterns, the following parameter values were considered: E_A = 3.64 × 10^4 J/mol, E_B = 3.46 × 10^4 J/mol, k_A0 = 8.4 × 10^5 min^-1, k_B0 = 7.6 × 10^4 min^-1, H_1 = -2.12 × 10^4 J/mol, H_2 = -6.36 × 10^4 J/mol, ρ = 1,180 kg/m^3, c_p = 3.2 × 10^3 J/(kg·K), and R = 8.314 J/(mol·K). The values of the other model parameters, namely V, F, Q, C_A0, C_B0, and T_0, describing the design and operating variables, were chosen randomly within the ranges specified in the main text.
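For a given set of design and operating variables, a steady state satisfying Eqs. A1-A3 can be computed with a generic nonlinear-equation solver, as sketched below; the operating point and the initial guess are illustrative assumptions, not values from the article.

```python
# Sketch: solve the steady-state CSTR model (Eqs. A1-A3) for T, C_A, and C_B.
import numpy as np
from scipy.optimize import fsolve

# Parameter values listed above
EA, EB = 3.64e4, 3.46e4          # J/mol
kA0, kB0 = 8.4e5, 7.6e4          # 1/min
H1, H2 = -2.12e4, -6.36e4        # J/mol
rho, cp, R = 1180.0, 3.2e3, 8.314

def residuals(z, V, F, Q, CA0, CB0, T0):
    """Residuals of Eqs. A1-A3; z = (T, CA, CB)."""
    T, CA, CB = z
    rA = kA0 * np.exp(-EA / (R * T)) * CA      # rate of A -> B
    rB = kB0 * np.exp(-EB / (R * T)) * CB      # rate of B -> C
    eq1 = rho * cp * F * (T0 - T) + rA * (-H1) * V + rB * (-H2) * V - Q
    eq2 = F * (CA0 - CA) - rA * V
    eq3 = F * (CB0 - CB) + rA * V - rB * V
    return [eq1, eq2, eq3]

# Illustrative inputs (assumed, for demonstration only)
inputs = (1.0, 0.1, 1.0e5, 3000.0, 100.0, 300.0)   # V, F, Q, CA0, CB0, T0
T, CA, CB = fsolve(residuals, x0=[360.0, 90.0, 480.0], args=inputs)
print(f"T = {T:.1f} K, CA = {CA:.1f}, CB = {CB:.1f} mol/m^3")
```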

Appendix B: Functions Used for Evaluating Components of the Annual Plant Cost (C_yr)

Here, only the final expressions used for computing the various CSTR costs are given; for more details, the reader may refer to Bernardo and Saraiva (1998). A sketch assembling all the cost components follows this appendix.

1. The equipment cost, C_eqp, is computed using the reactor volume (V, m^3):

\[ C_{eqp}\ (\$/\mathrm{yr}) = 4{,}199.55\, V^{0.6227} \qquad (B1) \]

2. The operating cost (C_op) includes the utility cost (C_util) and the pumping cost (C_pump). The C_util value is calculated using the heat-recovery rate Q (J/min) and Q_N (= 2.54 × 10^7 J/min), according to

\[ C_{util}\ (\$/\mathrm{yr}) = 1.145 \left[ 7{,}896 - 6{,}327\,(Q/Q_N) + 4.764 \times 10^4\,(Q/Q_N)^2 - 1.022 \times 10^4\,(Q/Q_N)^4 \right] \qquad (B2) \]

and C_pump is evaluated using the flow rate (F, m^3/min), as given by

\[ C_{pump}\ (\$/\mathrm{yr}) = 13.8831\,(264.2\, F)^{0.8050} \qquad (B3) \]

3. The overall control cost (C_c) is obtained by summing the control-cost contributions of the five operating variables (see Eq. 15). Using a = 143.2 and b = 1.736, C_c has been evaluated as

\[ C_c\ (\$/\mathrm{yr}) = 5a + b \left( \frac{\mu_F}{\sigma_F} + \frac{\mu_Q}{\sigma_Q} + \frac{\mu_{C_{A0}}}{\sigma_{C_{A0}}} + \frac{\mu_{C_{B0}}}{\sigma_{C_{B0}}} + \frac{\mu_{T_0}}{\sigma_{T_0}} \right) = 716 + 1.736 \times 3.09 \left( \frac{x_2}{x_7} + \frac{x_3}{x_8} + \frac{x_4}{x_9} + \frac{x_5}{x_{10}} + \frac{x_6}{x_{11}} \right) \qquad (B4) \]

where μ_n and σ_n refer to the mean and standard deviation of the PDF pertaining to the nth operating variable.

4. The quality cost has been computed using the Taguchi loss function (Taguchi, 1986), given as

\[ C_q\ (\$/\mathrm{yr}) = k_l \left[ (\mu_y - y^*)^2 + \sigma_y^2 \right] \qquad (B5) \]

where μ_y and σ_y denote the mean and standard deviation, respectively, of the N_obs quality-variable values, {y}, obtained using the ANN-based CSTR model; y^* refers to the desired value (600 mol/min) of the quality variable y; and k_l (= 6.536) is the loss coefficient.
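The cost components above can be assembled as in the following sketch. The interpretation of x_2-x_6 as the means and x_7-x_11 as the spreads of the five operating variables follows Eq. B4 as printed, the 3.09 factor is retained verbatim, and the helper names are illustrative.

```python
# Sketch of the Appendix B cost functions (Eqs. B1-B5) and the total C_yr.
import numpy as np

Q_N = 2.54e7           # J/min, normalizing heat-recovery rate
a, b = 143.2, 1.736    # control-cost constants (Eq. B4)
k_l = 6.536            # Taguchi loss coefficient (Eq. B5)
y_star = 600.0         # desired quality-variable value, mol/min

def C_eqp(V):          # Eq. B1: equipment cost, $/yr
    return 4199.55 * V**0.6227

def C_util(Q):         # Eq. B2: utility cost, $/yr
    q = Q / Q_N
    return 1.145 * (7896.0 - 6327.0 * q + 4.764e4 * q**2 - 1.022e4 * q**4)

def C_pump(F):         # Eq. B3: pumping cost, $/yr
    return 13.8831 * (264.2 * F)**0.8050

def C_c(mu, spread):   # Eq. B4 (second form): control cost, $/yr
    return 5.0 * a + b * 3.09 * float(np.sum(np.asarray(mu) / np.asarray(spread)))

def C_q(y):            # Eq. B5: Taguchi quality cost, $/yr
    y = np.asarray(y)  # quality-variable samples from the ANN-based model
    return k_l * ((y.mean() - y_star)**2 + y.var())

def C_yr(V, F, Q, mu, spread, y):   # total annual plant cost, $/yr
    return C_eqp(V) + C_util(Q) + C_pump(F) + C_c(mu, spread) + C_q(y)
```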

Manuscript received Jan. 5, 2000, and revision received June 7, 2000.
