
Minerals Engineering 16 (2003) 441-453
This article is also available online at: www.elsevier.com/locate/mineng

Local models for soft-sensors in a rougher flotation bank


G.D. Gonzalez a,*, M. Orchard a, J.L. Cerda a, A. Casali b, G. Vallebuona


Received 8 November 2002; accepted 14 January 2003

a Department of Electrical Engineering, University of Chile, Av. Tupper 2007, P.O. Box 412-3, Santiago, Chile
b Department of Mining Engineering, University of Chile, Av. Tupper 2069, Santiago, Chile

Abstract

Starting from a general approach for dynamic modelling, several classes of local dynamic models for soft-sensors are used to model the concentrate grade in a rougher flotation bank. Among the non-linear (but linear in the parameter) models are non-linear ARMAX, Takagi and Sugeno, fuzzy combinational, projection on latent states (PLS) and wavelet based models. The fully non-linear dynamic model studied is a multilayer perceptron. The models are identified using actual rougher plant data. This data, which is very noisy, is first analysed in order to detect apparent sporadic short term failures of the sensor system for measuring the concentrate grade and then to repair the failed measurements. The models are determined using an identification (training) data set. The root mean square error and the correlation coefficient are used to compare model performances using validation and cross validation data sets. Results show that the best dynamic model is PLS, followed by perceptron and wavelet based models. Non-linear ARMAX, fuzzy combination and Takagi and Sugeno dynamic models give somewhat larger errors.
© 2003 Elsevier Science Ltd. All rights reserved.
Keywords: Dynamic modelling; Froth flotation; Neural networks; Process instrumentation

1. Introduction

Local dynamic models which are accurate around operating points are useful for the design of soft-sensors and for model-based control strategies. Soft sensors (Gonzalez, 1999) are an important part of process instrumentation schemes, because they provide a back-up for real sensors whenever these become unavailable due to failure, maintenance or repairs. The main advantage of local models is their relative simplicity, since they have few terms and parameters but, because of this simplicity, they are in general only valid in a region in the vicinity of the operating point for which they have been identified. As a consequence they require updating whenever the operating point experiences a deviation of a certain magnitude. But if these models satisfy certain requirements, such as linearity in the parameters, they may be rapidly updated as operating points change. Furthermore, these linear in the parameter models may incorporate non-linearities in their inputs. In this way their region of validity may increase and become better

Corresponding author. E-mail address: gugonzal@cec.uchile.cl (G.D. Gonzalez).

models for representing non-linear systems than if only the plant measurements are used as inputs to the model (Casali et al., 1998).

In order to build a model for a variable of a system or plant all the available information should be considered. The two main sources that provide relevant information are: (i) input-output data from experiments performed on the plant and (ii) physical or structural insight on plant characteristics. In addition, the class of model to be identified with plant data must be decided upon, mainly on application oriented grounds. Among the kinds of dynamic models are input/output models (e.g., ARMAX and NARMAX models), state/state-output models, neural network models, fuzzy models, etc. All these model classes may be grouped into broader classes, depending on the knowledge of the plant characteristics which is incorporated in them (Sjöberg et al., 1995). On the one hand are phenomenological (white box) models, in which the knowledge about the system is such that the model may be entirely constructed using physical insight and prior knowledge. On the other hand are black box (empirical) models where no physical insight is available (or used) and where several model subclasses may be employed, characterised by their structure, ability to represent non-linearities, ability to permit

0892-6875/03/$ - see front matter © 2003 Elsevier Science Ltd. All rights reserved. doi:10.1016/S0892-6875(03)00021-9


relatively easy estimation of parameters, etc. In between are grey models combining in varying proportions both physical insight and black box modelling. An example of this class of model may be found in Casali et al. (2002), where phenomenological knowledge is combined with black box sub models in order to develop a model for a rougher flotation bank. Further examples are given by Sjöberg et al. (1995), Ljung (1987) and Gonzalez (1999).

Some grey models include inputs, in general non-linear, incorporating phenomenological or structural knowledge about the plant. These inputs are combinations of measurements performed on the plant (Casali et al., 1998; Gonzalez, 1999). Sjöberg et al. (1995) call this class of grey models semiphysical models. The advantage of semiphysical models is that, in addition to incorporating phenomenological knowledge, they may have a structure that is linear in the parameters and thus may be treated using linear regression methods, for which even explicit solutions are possible in most cases. The determination of the model structure and the estimation of its parameters are then greatly simplified as compared with the case of other classes of non-linear models, such as neural networks. In this latter case, for example, genetic algorithms could be used to find the best structure, but the time required to obtain a solution tends to be extremely large, as compared with the case of models that are linear in the parameters, for which linear regression methods may be successfully employed. On the other hand, the non-linear approximation characteristics of neural networks may cause the model to be valid for larger regions around operating points, and there may be no need of on-line parameter (weight) updating. This is desirable, since such updating may be impractical because of the time required.
Flotation plant models incorporating a large proportion of physical knowledge have been developed (Casali et al., 2002; Jämsä-Jounela and Lättilä, 1995) and, since they also incorporate black box sub models, they belong to the grey model class. Due to the complexity of the flotation process, fitting these models to actual plant data becomes very difficult. However, in Casali et al. (2002) the problem is reduced because the model was explicitly built with the requirement that this fitting should be simplified. Nevertheless, the models are too complex to be used in the design of soft-sensors or automatic control systems. Although they may exhibit good qualitative characteristics in a large region of operation, their quantitative performance tends to be insufficient for such on-line applications. On the other hand, since local models valid around operating points are considerably simpler, their structure may be comparatively easily determined off-line and their parameters estimated either off-line or on-line. Development of such local dynamic models, having a non-linear ARMAX structure, for a rougher flotation bank is found in Casali et al. (2002). These models were then used to help the fitting of a (mostly) phenomenological dynamic model.

In this paper soft sensor models for the concentrate grade in a rougher flotation bank are identified using the modelling approaches mentioned above, for a given set of plant measurements, and the results are compared.

2. Experimental campaign

In this paper, operational on-line data from the rougher circuit of the Andina copper flotation plant, before expansion, has been used. Andina is one of the Divisions of CODELCO-Chile, whose concentrator has been expanded from 32,000 to 64,000 t/d of a copper ore, mainly chalcopyrite, with an average feed grade of around 1% copper. Fig. 1 shows a diagram, including instrumentation, of one of the five rougher flotation banks. Each bank consists of 9 OK-38 cells, in arrays of 2-3-4 cells. Experimental data was collected for the circuit shown in Fig. 1, over a period of about 20 h, in order to cover fluctuations in all the measured variables related to the first array concentrate grade gcc. The measured variables are: rich concentrate copper grade gcc [%], feed copper grade gcf [%], feed iron grade gff [%], fresh ore feed rate Fsf [t/min], slurry level in the first 2 cells array of the flotation bank Lp [m] and feed solids concentration by weight Csf [0/1]. In order to include combinations of measured variables having physical meaning, three combined variables were considered: the first array residence time, τ [min], the copper feed rate, Fcf [t/min], and the iron feed rate, Fff [t/min], which are expressed in terms of the measured variables as follows:

\[ \tau = k_1 \frac{L_p\, C_{sf}}{F_{sf}\,(1 - 0.63\, C_{sf})}, \tag{1} \]

Fig. 1. Rougher flotation bank: flowsheet and instrumentation. Labelled streams and instruments: feed from the grinding plant (Fsf, gcf, gff, Csf), pulp level Lp, tailings, and concentrate (gcc).


\[ F_{cf} = k_2\, F_{sf}\, g_{cf}, \tag{2} \]

\[ F_{ff} = k_3\, F_{sf}\, g_{ff}, \tag{3} \]

where k1 is a constant depending on the cell dimensions and the number of cells, while k2 = k3 = 0.01.
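The combined variables can be computed directly from one set of measurements. The following is a minimal sketch, assuming the residence time has the form τ = k1 Lp Csf / (Fsf (1 - 0.63 Csf)); the function name, the value k1 = 1.0 and the sample measurements are hypothetical and serve only as an illustration.

```python
def combined_variables(L_p, C_sf, F_sf, g_cf, g_ff, k1=1.0, k2=0.01, k3=0.01):
    """Return (tau, F_cf, F_ff) for one set of plant measurements.

    k1 depends on the cell geometry; 1.0 is a placeholder, not a plant value.
    """
    tau = k1 * L_p * C_sf / (F_sf * (1.0 - 0.63 * C_sf))  # residence time [min]
    F_cf = k2 * F_sf * g_cf  # copper feed rate [t/min], grade given in %
    F_ff = k3 * F_sf * g_ff  # iron feed rate [t/min], grade given in %
    return tau, F_cf, F_ff


# Hypothetical measurements, for illustration only.
tau, F_cf, F_ff = combined_variables(L_p=2.0, C_sf=0.35, F_sf=20.0, g_cf=1.0, g_ff=4.0)
```

With k2 = k3 = 0.01 the grades in percent are converted to mass fractions, so the feed rates come out in the same units as Fsf.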

3. Data analysis

3.1. Detection of operating regions

Fig. 2 shows the concentrate grade gcc measurements (fine trace) in the experimental data. It is apparent that a change in the operating region of the plant takes place approximately during the time interval [600, 800] min. In order to enhance this fact a Haar wavelet decomposition of gcc (Burrus et al., 1998; Misiti et al., 1996) was made. It was found that the reconstructed signal r_o, based on the Haar scaling function at resolution depth 7, shows the operating point change with greater clarity, as may be seen in Fig. 2 in coarse trace. Also shown is the average value of gcc, 19.7%. It may also be observed that apparently another change has taken place at about t = 130 min, but there is not enough data from 1 to 130 min to be sure of this fact. In any event, it may be seen that gcc > 19.7 could define one operating region, and gcc < 19.7 another.

In order to cover most of the range of the concentrate grade gcc for model building, and considering the operation change described above, the data set (running from t = 1 to 1420 min) was partitioned into an identification data set for data in the time interval ID = [71, 890] and a validation data set for data in VAL = [1, 70] ∪ [891, 1420]. The model structure and parameters are first determined with the ID data set and tested in the VAL data set. After that a cross validation is performed in which the model structure already determined is kept. Then the model parameters are determined in the VAL data set and the model is validated using the ID data set.

4. Detection of data outliers

Outlier detection is used in order to avoid an erroneous determination of the models. It basically consists of a two stage statistical procedure that is able to identify and characterize the main disturbances that affect the analysed process. The first stage uses principal component analysis (PCA) in order to identify the main disturbances. Then each identified disturbance is analysed in a second stage where Scheffé's test (Gómez et al., 1996) is applied to every process variable. The purpose of this second stage is to detect data outliers by means of statistical criteria.

4.1. Principal component analysis

PCA considers all predominant sources of variability in a data set. In addition, the influence of measurement noise is reduced.
For these reasons PCA can be used to obtain a statistical identification of the main disturbances that affect the behavior of the predicted variable in the data set (Hodouin et al., 1993), in this case the concentrate grade gcc. Analysis of the flotation plant data shows that five principal components (PC) are

Fig. 2. Concentrate grade gcc (fine trace), Haar wavelet reconstruction with level 7 scaling function (coarse trace) and concentrate grade mean value 19.7 (dashed line). Axes: copper concentrate grade gcc [%] versus time [min].


enough to account for about 95% of the gcc variance (Fig. 3). As an example, a score map (projection of measurement data onto the orthogonal PC basis) for the two main PCs, which account for about 60% of the gcc variance, shows that some data is projected outside the 99% confidence ellipse (coarse line in Fig. 3). In such a case the process shall be considered disturbed during those time instants by an unknown, unmeasured disturbance. An index Ti measures the distance of the score from the pattern mean value (geometric center of the ellipse). When Ti is greater than 1, the corresponding score is

outside the 99% confidence ellipse. Fig. 4 summarizes the information by showing both the index Ti (shown as Ti²) and the concentrate copper grade gcc versus time. As seen in Fig. 4, there are moments when process disturbances do not affect the behavior of gcc, i.e., when the Ti index is not exceeded due to gcc. Disturbances may affect the entire process or just some of its variables. However, if only one variable is being affected in a complex process then the detected disturbance is probably an outlier. Thus, it is important to define which measurements are being affected during a disturbance. This task can be carried out through the application of Scheffé's test to all the process variables.
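The normalised distance index for a two-component score map can be illustrated with a short sketch. It assumes that the scores are zero-mean, that their variances are known, and that the confidence limit is taken from a chi-square distribution with two degrees of freedom (a large-sample approximation); the limit actually used for Fig. 3 is not stated in the text, and the function name is hypothetical.

```python
import math


def t2_index(scores, variances, confidence=0.99):
    """Squared, normalised distance of a two-component score from the ellipse centre.

    Values above 1 fall outside the confidence ellipse.
    """
    if len(scores) != 2 or len(variances) != 2:
        raise ValueError("this sketch handles exactly two principal components")
    # Sum of squared scores, each scaled by the variance of its component.
    t2 = sum(t * t / lam for t, lam in zip(scores, variances))
    # For 2 degrees of freedom the chi-square quantile has a closed form.
    limit = -2.0 * math.log(1.0 - confidence)
    return t2 / limit


inside = t2_index([1.0, 1.0], [1.0, 1.0])   # close to the centre
outside = t2_index([3.5, 0.0], [1.0, 1.0])  # far along the first component
```

Because the index is a ratio to the limit, the same threshold of 1 applies whatever confidence level is chosen.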

Fig. 3. PC analysis. (a) Percentage of explained variance according to the number of PCs used. (b) 95% and 99% confidence ellipses for data scores.

Fig. 4. Detected process disturbances. (a) Hotelling's test for flotation data (index Ti² versus time [min]). (b) Concentrate copper grade data, gcc [%] versus time [min] (time instants associated with detected disturbances marked with M).


4.2. Scheffé's test

Scheffé's test (Gómez et al., 1996) is used to identify which variables are indicating a possible source of disturbance in a system of m measured signals. It is based on the analysis of each process measurement in a sliding window of N samples, where the deviations of the measured output process variables from a reference value are analyzed. These deviations are usually called residuals:

\[ r_i(k) = y_i(k) - \bar{y}_i(k), \tag{4} \]

where $\bar{y}_i(k)$ denotes the reference value. The reference signal could be a control loop set point, a predictive model output or the mean value of the signal in another sliding window of Nm samples, with Nm ≥ N. Each residual is processed separately to obtain m indexes $f_{oi}$:

\[ f_{oi}(k) = \left( \frac{\bar{r}_i(k)}{\sigma_{r_i}(k)} \right)^{2} \frac{N\,(N - m)}{(N - 1)\, m}, \tag{5} \]

where $\bar{r}_i$ is the mean value of residual i, $\sigma_{r_i}$ is the standard deviation of residual i, N is the number of samples of the sliding window and m is the number of process measurements. The absence of disturbances in $f_{oi}$ with a confidence degree α, for m variables and N samples, is given by:

\[ f_{oi} \le F^{\alpha}_{m,\,N-m}, \tag{6} \]

where $F^{\alpha}_{m,\,N-m}$ is a threshold obtained from a Fisher probability distribution function.

4.3. Detection of data outliers based on Scheffé's test results

Scheffé's test is used to detect disturbances in each of the m = 7 measurements of the flotation bank data set. The reference signals used to obtain the residuals are the variables' mean values in a 30 min sliding window (1 min sample time, N = 30; Nm = 80). Therefore, a disturbance in a variable will be detected if its Scheffé index is greater than the limit $F^{0.95}_{7,\,23} = 2.44$ (confidence interval of 95%). The comparison of the Scheffé indexes of all variables can generate information about outliers in the measured concentrate grade gcc. This is based on the assumption that there is a very low probability for the gcc signal to suffer a disturbance when the rest of the process variables remain undisturbed. Therefore, if Scheffé's test for gcc detects a disturbance while it indicates that the rest of the process measurements remain undisturbed, then the measured data for gcc can be considered as an outlier that must be repaired. From Fig. 5, where experimental data appear in fine trace and the Scheffé index in coarse trace, it can be noted that at approximately 800 min there is an important disturbance in gcc while the rest of the measurements are undisturbed. Therefore at that instant the data set has an outlier in gcc and it must be removed. After that the repaired data may be used for model identification.
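The sliding-window index can be sketched as follows. The sketch assumes the index is the squared ratio of the residual mean to the residual standard deviation, scaled by N(N - m)/((N - 1)m), and that the reference is the signal mean over the longer window of Nm samples; the function names and the handling of a zero standard deviation are choices made for this illustration, not taken from the paper.

```python
from statistics import mean, stdev

F_LIMIT = 2.44  # threshold quoted in the text for m = 7, N = 30 at 95%


def scheffe_index(y, k, N=30, Nm=80, m=7):
    """Index for one measured signal y (a list) at sample k (0-based)."""
    if k + 1 < Nm:
        raise ValueError("need at least Nm samples up to and including k")
    reference = mean(y[k + 1 - Nm:k + 1])  # mean over the long window
    residuals = [v - reference for v in y[k + 1 - N:k + 1]]
    s = stdev(residuals)
    if s == 0.0:
        # Constant window: no disturbance if it sits on the reference.
        return 0.0 if mean(residuals) == 0.0 else float("inf")
    return (mean(residuals) / s) ** 2 * N * (N - m) / ((N - 1) * m)


def is_disturbed(y, k):
    """True when the index exceeds the quoted F threshold."""
    return scheffe_index(y, k) > F_LIMIT


# Synthetic signal: small alternating noise, then a step of +2 in the last 30 samples.
quiet = [19.5 + 0.1 * (-1) ** i for i in range(120)]
stepped = quiet[:90] + [v + 2.0 for v in quiet[90:]]
```

An outlier in gcc would then be flagged when `is_disturbed` is true for gcc and false for the other measured signals over the same window.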

Fig. 5. Experimental data (fine trace), Scheffé index (coarse trace) and outlier detection in copper grade gcc. Panels, each versus time [min]: gcc [%], gcf [%], gff [%], Csf [0/1], Fsf [t/min] and τ [min].


5. Flotation bank models

The problem at hand is to develop a soft-sensor to replace the concentrate grade measurement gcc by its prediction ĝcc, given by the output of a dynamic model having as inputs some or all of the measured variables and possibly combinations among them having physical or structural meaning.

5.1. Model variables

Fig. 6 shows the different variables that are involved in the modelling process of a plant, in particular, of the rougher flotation bank to be modelled. The purpose is to obtain a model for the output variable y(t) at time t in terms of an input measurements vector x(t) such that a certain criterion of good fit between the plant output y(t) and the model output ŷ(t) is satisfied. The measurements vector includes: (i) measurements of plant input variables u(t) (i.e., controls or manipulated variables and measured disturbances), and their delays:

\[ U(t) = [\,u_1(t)\; u_1(t-1)\; \cdots\; u_2(t)\; u_2(t-1)\; \cdots\; u_r(t)\; u_r(t-1)\; \cdots\,]^T; \tag{7} \]

(ii) other plant outputs η(t) and their delays:

\[ P(t) = [\,\eta_1(t)\; \eta_1(t-1)\; \cdots\; \eta_2(t)\; \eta_2(t-1)\; \cdots\; \eta_q(t)\; \eta_q(t-1)\; \cdots\,]^T; \tag{8} \]

(iii) delayed plant outputs containing previous measurements of the plant output y(t):

\[ Y_d(t) = [\,y(t-1)\; y(t-2)\; \cdots\; y(t-d)\,]^T. \tag{9} \]

Then, the vector x(t) of inputs to the dynamic model may be expressed as

\[ x(t) = \begin{bmatrix} U(t) \\ P(t) \\ Y_d(t) \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_q \end{bmatrix}. \tag{10} \]

In the particular case of the flotation bank dynamic soft-sensor model, the variable to be modelled is y(t) = gcc(t), the concentrate grade, hence:

\[ Y_d = [\,g_{cc}(t-1)\; g_{cc}(t-2)\; \cdots\; g_{cc}(t-d_{gc})\,]^T. \tag{11} \]

The command (or manipulated) variable is the pulp level Lp in the bank, so that Lp(t) and delayed versions are elements of U(t). Other manipulated variables are collector flow and frother flow, but they are usually determined by ratio controllers where the ratio of these flows to the solids feed is kept constant. The measured disturbances (i.e., those variables that affect the concentrate grade and are measured, but over which no action may be taken, from the point of view of the flotation plant) are the copper and iron feed grades gcf(t) and gff(t), the solids feed flow rate Fsf(t), and the feed solids concentration Csf(t), therefore

\[ U(t) = [\,L_p(t)\; L_p(t-2)\; \cdots\; L_p(t-d_L);\; g_{cf}(t)\; g_{cf}(t-1)\; \cdots\; g_{cf}(t-d_{gc});\; g_{ff}(t)\; g_{ff}(t-1)\; \cdots\; g_{ff}(t-d_{gf});\; F_{sf}(t)\; F_{sf}(t-1)\; \cdots\; F_{sf}(t-d_F);\; C_{sf}(t)\; C_{sf}(t-1)\; \cdots\; C_{sf}(t-d_C)\,]^T. \tag{12} \]

Several unmeasured disturbances act upon the concentrate grade, such as liberation, particle size distribution, mineralogy, concentration of contaminants in the water, etc., but they cannot be used as inputs to the model since they are not measured. Therefore, a list of 30 variables was set up as candidates to be model inputs. It contains the measured variables and delayed values of them, including gcc with delays from 1 to 4 min. In addition, it contains non-linear combinations of the measured variables having physical meaning: residence time τ, copper feed rate Fcf and iron feed rate Fff, and their delays from 1 to 4 min.
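Assembling a vector of current and delayed measurements, as described above, can be sketched with plain lists. The signal names, the chosen delays and the sample values are hypothetical; the sketch assumes all signals are sampled at the same rate (one sample per minute in the plant data).

```python
def build_regressors(signals, lags, t):
    """Return the model input vector at sample t as a flat list.

    signals: dict mapping a name to its list of samples
    lags:    dict mapping a name to a tuple of delays in samples (0 = current value)
    """
    max_lag = max(max(delays) for delays in lags.values())
    if t < max_lag:
        raise IndexError("not enough past samples for the requested delays")
    # Entries follow the insertion order of `lags`, then the order of each tuple.
    return [signals[name][t - d] for name in lags for d in lags[name]]


# Hypothetical short records, for illustration only.
signals = {"Lp": [10, 11, 12, 13], "gcc": [20, 21, 22, 23]}
lags = {"Lp": (0, 1), "gcc": (1, 2)}  # gcc enters only through delayed values
x_t = build_regressors(signals, lags, t=3)
```

Keeping the output variable out of the zero-delay entries mirrors the soft-sensor use, where the current grade measurement is the quantity being replaced.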

Fig. 6. Left: Identification of a dynamic model for y(t) using controls and measured disturbances u(t), other plant outputs η(t), and delayed plant outputs y(t - d). Right: Use of the dynamic model as soft-sensor in the absence of measurement y(t) due to unavailable sensor signal.


The performance index to be used is the root mean square (RMS) error between the actual concentrate grade gcc and the model prediction ĝcc:

\[ J = \sqrt{E\{(g_{cc} - \hat{g}_{cc})^2\}} \approx \sqrt{\frac{1}{N} \sum_{i=1}^{N} \big(g_{cc}(i) - \hat{g}_{cc}(i)\big)^2}, \tag{13} \]

where the expected value has been approximated by the time average from time i = 1 to N.

5.2. A general class of models

Several black box and grey box dynamic models that are derived here for a rougher flotation bank may be viewed from the general model class proposed by Sjöberg et al. (1995). The models in this class for several model input variables and one output variable may be expressed as:

\[ \hat{y} = \sum_{k=1}^{n} \alpha_k\, G_k\big(\varphi_k(x, \upsilon), \beta_k, \gamma_k\big), \tag{14} \]

\[ x = [\,x_1\; x_2\; \cdots\; x_q\,]. \tag{15} \]

In the general model structure (Eq. (14)), $\varphi_k(x, \upsilon)$ is a transformation of the input variables in x, which may take different forms. In this transformation $\upsilon$ is a vector of parameters, predetermined or to be determined. The transformation $\varphi_k(x, \upsilon)$ may involve simply deleting some of the measurements, e.g., because they have small correlation with y(t), or because they have a large correlation with one or more of the other inputs. It may also represent transformations of the input variables into their PCs. Also, it may contain combinations of the $x_i$'s which have some physical significance (e.g., residence time = volume divided by flow). Furthermore $\varphi_k(x, \upsilon)$ may include all of the above transformations and others in addition. Functions $G_k$ relate the transformed inputs to the model output ŷ(t) = ĝcc (Fig. 7). Transformations $G_k$ are functions of parameters $\beta_k$ and $\gamma_k$ (predetermined or to be determined) as well as of the transformed input variables $\varphi_k$ (Fig. 7). For each model developed here the particular form of the mappings $\varphi$ and $G_k$ shall be defined. By appropriate selection of functions $G_k$ and of the input transformation vector $\varphi$, several different models may be represented, e.g., perceptrons and linear in the parameter models, such as ordinary least squares models, PC regression models, projection on latent states models and fuzzy models.

5.3. Linear-in-the-parameter dynamic models

Models in which neither $\varphi_k(x, \upsilon)$ nor $G_k(\varphi_k, \beta_k, \gamma_k)$ include parameters which are used in the minimisation of the cost function J (i.e., because they are given) are of particular interest. In such a case Eq. (14) takes the form of a model which is linear in the parameters (although it may be non-linear in the variables $x_i$) and has the following form:

\[ \hat{y} = \sum_{k=1}^{n} \alpha_k\, G_k(\varphi_k). \tag{16} \]

Since the $\varphi_k$ are then known values determined from measurements (and hence, in this case, the $G_k$ are also known), usual linear regression methods may be employed in the model determination. In particular, the determination of parameters $\alpha_k$ has considerable advantages in terms of time and use of computational resources. The ARMAX (Ljung, 1987), PCR (Mardia et al., 1994), Takagi and Sugeno (1985), Combinational, PLS (Hodouin et al., 1993), and wavelet flotation bank models developed here are of this kind. On the other hand, the neural network model is non-linear in the parameters because in such a case the $G_k(\varphi_k, \beta_k, \gamma_k)$ are sigmoid functions (hence, non-linear) and parameters (weights) $\beta_k$ and $\gamma_k$ are used to minimise the cost function J.

5.3.1. Non-linear ARMAX model

A selection of the 30 candidate variables was made using the stepwise regression method (Haber and

Fig. 7. Schematic representation of the general model form Eq. (14).


Unbehauen, 1990). The resulting model identified with the identification data set is

\[ \hat{g}_{cc}(k) = 0.85\,\hat{g}_{cc}(k-1) \pm 0.11\,\hat{g}_{cc}(k-4) \pm 1.22\,g_{cf}(k) \pm 0.078\,g_{cf}(k-5) \pm 0.46\,g_{ff}(k) \pm 0.22\,\tau(k-1) \pm 0.10\,\tau(k-3) \pm 0.21\,L_p(k) \pm 6.06, \tag{17} \]

where τ is the non-linear component and ± marks a term whose sign is not legible in the source. Evaluation of the model in the validation data set gave a root mean square (RMS) error of 0.385 and a cross validation RMS error of 0.821.

5.3.2. Simplified non-linear ARMAX model

The best ARMAX model determined in Casali et al. (2002) was identified using the identification data set defined here. The resulting model is

\[ \hat{g}_{cc}(k) = 0.85\,\hat{g}_{cc}(k-1) \pm 0.12\,\hat{g}_{cc}(k-4) \pm 5.75\,g_{cf}(k) \pm 1.24\,g_{ff}(k) \pm 1.09\,g_{ff}(k-5) \pm 0.14\,\tau(k) \pm 13.8, \tag{18} \]

with the same convention for ±. When evaluated with the validation data this model gives an RMS error of 0.390 and a cross validation RMS error of 0.824.

5.3.3. Fuzzy combination models

The graph of gcc in Fig. 2 leads to the assumption that there are two operating points: one corresponding to data approximately in the time interval [1, 600] and the other one in [800, 1420], with a transition between them. In order to substantiate this qualitative observation, the fuzzy C-means algorithm of the MATLAB Fuzzy Logic Toolbox (Mathworks, 1998) was applied to grade gcc (Fig. 2) in order to divide the data into two fuzzy sets or clusters (C1 and C2) for gcc, defined by membership functions μ1(gcc) and μ2(gcc). Fig. 8 shows the two membership functions, which intersect at 0.5 for grade
Fig. 9. Relative membership functions w1(gcc) (fine dashed trace) and w2(gcc) (coarse trace).

gcc = 19.5. Therefore, the membership of grades gcc ≥ 19.5 is greater to cluster 2 than to cluster 1, and conversely for grades gcc < 19.5. Fig. 9 shows the relative membership functions w1(gcc), corresponding to cluster C1, and w2(gcc), for cluster C2, defined by

\[ w_1(g_{cc}) = \frac{\mu_1(g_{cc})}{\mu_1(g_{cc}) + \mu_2(g_{cc})} \quad \text{and} \quad w_2(g_{cc}) = \frac{\mu_2(g_{cc})}{\mu_1(g_{cc}) + \mu_2(g_{cc})}, \tag{19} \]

so that

\[ w_1(g_{cc}) + w_2(g_{cc}) = 1. \tag{20} \]

It may be observed in Fig. 9 that for the time interval [600, 1420] gcc belongs mostly to cluster 2, while for [1, 600) there is no such clear cut belonging of gcc to C1 or C2. This result is a consequence of the fact that gcc in [1, 600) has values both greater and lesser than 19.5, while in [600, 1420] gcc is mostly greater than 19.5. This fact is also apparent in Fig. 2, where the separation into two sets according to grade gcc has been made through wavelet analysis. This analysis suggests that there should be two different models, one for cluster C1 with output yC1 and one for cluster C2 with output yC2, and that their outputs should be combined according to their membership functions to give the model output, i.e.,

\[ \hat{g}_{cc}(k) = w_1(g_{cc})\, y_{C1}(k) + w_2(g_{cc})\, y_{C2}(k). \tag{21} \]
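The combination of two cluster models through relative memberships can be written in a few lines. This is a minimal sketch: the membership values and the two cluster outputs are passed in as plain numbers, and the function names and sample values are hypothetical.

```python
def relative_memberships(mu1, mu2):
    """Normalise two membership values so that they sum to one."""
    total = mu1 + mu2
    if total == 0.0:
        raise ValueError("both memberships are zero")
    return mu1 / total, mu2 / total


def combine(mu1, mu2, y_c1, y_c2):
    """Weight the two cluster model outputs by their relative memberships."""
    w1, w2 = relative_memberships(mu1, mu2)
    return w1 * y_c1 + w2 * y_c2


# Hypothetical memberships and cluster outputs, for illustration only.
g_hat = combine(mu1=0.2, mu2=0.6, y_c1=19.0, y_c2=20.0)
```

Since the weights sum to one, the combined output always lies between the two cluster outputs.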

Fig. 8. Membership functions μ1(gcc) to cluster 1 (squares) and μ2(gcc) to cluster 2 (circles) obtained by applying the fuzzy c-means algorithm to gcc.

Unfortunately the number of data points in C1, for the available data base, has proven to be too small to build a good model for such cluster. In order to solve this problem, models were identified in two intervals of the identification set [71, 890], instead of in two clusters: model M1 for interval I1 = [71, 700], and model M2 for interval I2 = [271, 890]. The interval I2 was obtained by trial and error, and is justified by the fact that there are several data points having gcc > 19.5 for t < 600, as may be seen in Fig. 2. The models for the two intervals, obtained using stepwise regression, are


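Model M1:

\[ y_1(k) = \pm 11.9\,g_{cf}(k) \pm 6.11\,g_{ff}(k) \pm 0.32\,\tau(k-1) \pm 0.29\,\tau(k-2) \pm 0.25\,\tau(k-3) \pm 0.29\,F_{cf}(k-1) \pm 8.02\,C_{sf}(k-5) \pm 48.3. \tag{22} \]

Model M2:

\[ y_2(k) = \pm 18.2\,g_{cf}(k) \pm 10.3\,g_{cf}(k-5) \pm 4.38\,g_{ff}(k) \pm 2.76\,g_{ff}(k-5) \pm 0.40\,\tau(k-1) \pm 0.38\,\tau(k-2) \pm 0.34\,F_{sf}(k-2) \pm 4.75\,C_{sf}(k-5) \pm 34.3. \tag{23} \]

Here ± marks a term whose sign is not legible in the source.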
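Models of this kind are linear in their coefficients, so they can be fitted by ordinary least squares and scored with the RMS error. The sketch below solves the normal equations by Gauss-Jordan elimination using only the standard library; it is a plain illustration, not the stepwise regression procedure used in the paper, and the function names and the small data set are hypothetical.

```python
import math


def fit_linear(X, y):
    """Least-squares coefficients for y ~ X a, via the normal equations."""
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(n)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(n)]
    for col in range(n):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("regressors are linearly dependent")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def predict(X, coeffs):
    """Model output for each row of regressors."""
    return [sum(c * v for c, v in zip(coeffs, row)) for row in X]


def rms_error(y, y_hat):
    """Root mean square error between measured and predicted values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


# Hypothetical data: a column of ones (constant term) and one regressor.
X_id = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y_id = [1.0, 3.0, 5.0, 7.0]
coeffs = fit_linear(X_id, y_id)
error = rms_error(y_id, predict(X_id, coeffs))
```

For a validation and cross validation scheme like the one described in Section 3.1, the same two calls would be made twice: fit on the identification rows and score on the validation rows, then fit on the validation rows and score on the identification rows, keeping the set of regressors fixed.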

In both cases the relative error of parameter determination was below 0.5. When tested in the validation interval VAL = [1, 70] ∪ [891, 1420], the RMS error for this fuzzy combination model is 0.380. The cross validation could not be performed due to the small size of cluster 1.

5.3.4. Takagi and Sugeno model

A standard Takagi and Sugeno model (Takagi and Sugeno, 1985) was obtained by defining two clusters using 26 candidate variables (not including the four autoregressive variables). In this way membership functions were obtained for each variable with respect to each of the two clusters. Then a new set of 52 candidate variables was formed by multiplying each candidate variable by its relative membership to cluster C1 and to cluster C2. Selection of the best variables to be inputs to the T&S model was then performed as in previous cases, using a forward inclusion procedure. The resulting model is:

\[ \begin{aligned} y(k) = {} & \pm 0.39\,w_{\tau,1}\,\tau(k-1) \pm 0.12\,w_{\tau,1}\,\tau(k-3) \pm 1.96\,w_{F_{cf},1}\,F_{cf}(k-3) \pm 0.792\,w_{F_{cf},1}\,F_{cf}(k-5) \\ & \pm w_{F_{ff},1}\,F_{ff}(k-2) \pm 5.63\,w_{C_{sf},1}\,C_{sf}(k-3) \pm 62.5\,w_{g_{cf},2}\,g_{cf}(k) \pm 22.6\,w_{g_{ff},2}\,g_{ff}(k) \\ & \pm 9.28\,w_{g_{ff},2}\,g_{ff}(k-5) \pm 1.21\,w_{\tau,2}\,\tau(k-2) \pm 1.20\,w_{F_{cf},2}\,F_{cf}(k-3) \pm 8.90\,w_{C_{sf},2}\,C_{sf}(k) \\ & \pm 39.4\,w_{C_{sf},2}\,C_{sf}(k-5) \pm 25.9, \end{aligned} \tag{24} \]

where ± marks a term whose sign is not legible in the source.

In Eq. (24) the relative membership functions have been denoted by $w_{n,j}$, where n refers to the variable involved while j = 1, 2 refers to cluster 1 or cluster 2. Once again the relative error is less than 0.5 for all but one model component, $w_{C_{sf},2}\,C_{sf}(k)$, where it is 0.58. Evaluation of this model in the validation set gives an RMS error of 0.394. Cross validation could not be performed due to the small size of cluster 1.

5.3.5. Projection to latent structures

Projection to latent structures (PLS) analysis provides a valuable tool for model identification in multivariable processes, especially when there is a large number of highly correlated input variables and a considerable amount of available data. Conceptually, the PLS algorithm finds a latent basis in the input variables space such that the resulting latent vectors are more suitable for predicting the output variable. Therefore the process model can be identified over a reduced dimension space (the latent structure) that condenses all the information available in the data set.

In order to apply the algorithm, it is necessary to form an input variable matrix Xdata, which contains input and delayed output process variables, and an output variable matrix Ydata. It is advisable to statistically normalize matrices Xdata and Ydata (i.e., to scale each column by subtracting its mean value and dividing it by its standard deviation) in order to deal with numbers having homogeneous magnitudes. The scaled matrices shall be denoted by X and Y respectively. PLS generates the latent vectors (t and u) for the X and Y spaces through an iterative algorithm (see appendix), such that the covariance between the latent structures of X and Y is maximized. Therefore, the matrices X and Y are represented (for A latent vectors) as:

\[ X = \sum_{i=1}^{A} t_i\, p_i^T + E_{x,A} \quad \text{and} \quad Y = \sum_{i=1}^{A} t_i\, c_i^T + E_{y,A}. \tag{25} \]

The final model relates the scaled matrices X and Y through the following expression:

\[ Y = XB, \qquad B = [\,w_1 \cdots w_A\,] \left( [\,p_1 \cdots p_A\,]^T [\,w_1 \cdots w_A\,] \right)^{-1} [\,c_1 \cdots c_A\,]^T. \tag{26} \]

The PLS procedure explained above was used to obtain a dynamic model for the scaled value of the copper grade gcc as a function of the scaled values of the rest of the process variables (for determined time delays). The selection of the appropriate input variables for the model is made according to the following procedure: (i) Several PLS models were adjusted in order to explain variable gcc (single-column Y matrix) as a function of just one process variable (single-column X matrix). Different delays for each variable can be considered, but autoregressive variables must not be used. (ii) The PLS model with the minimum residual variance will define the first input variable of the dynamic model (with the corresponding delay). (iii) The procedure mentioned above is repeated subsequently, by adding a new column to the X matrix each time, which represents a new possible model input variable. Thus, a forward inclusion criterion is used to select all input variables of the predictive model. The procedure stops when a new input variable


decreases the residual variance by less than 3% of the variance of the original Y matrix. Finally, the last input variable of the model is selected as the autoregressive variable that minimizes the n-step RMS error between the real and the predicted value of copper grade gcc. Table 1 shows the selected model input variables for the dynamic model (according to their entry order) and the coefficients associated to each variable of parameter vector B. The PLS model performance was among the ones giving the best results, with a validation RMS error of 0.369 and a cross validation RMS error of 0.537. The comparison between the actual and predicted grade gcc in the identification set ([71, 890]) and in the validation set [1, 70] ∪ [891, 1420] is shown in Fig. 10.

5.3.6. Wavelet based model

The discrete wavelet transform using the Haar scaling and wavelet functions was applied to all variables which are candidates to become inputs to the model (see above), including delayed versions, as well as to the concentrate grade gcc(t) to be modelled. The discrete wavelet transform of a variable y(t) corresponds to an expansion similar to a Fourier series expansion. But
Table 1 PLS model input variables and coecients Input variable sk 2 gff k gcf k gff k 7 Lp k 1 gcc k 2 B coecient )0.217 )0.115 0.217 )0.108 )0.046 0.498

instead of sine functions scaling functions and wavelet functions are used (Burrus et al., 1998). For example, the basic Haar scaling function (of time) has the form of a pulse of unit height and unit width, while the basic Haar wavelet (mother wavelet) is a double pulse, having the value 1 in the interval [0, 1=2) and )1 in [1=2, 1). Dilation (widened versions) and time shifts of these pulses constitute a basis for the wavelet expansion of y t. The fact that these wavelets have a limited support in time allows a suitable representation of transient aspects y t, including localisation of the times at which these transients occur. The widths of these functions (akin to frequency of the sinusoidal waves in a Fourier series expansion) determine the resolution. The projection of y t at a given resolution may be likened in the case of a Fourier series expansion, to the contribution to y t of the series terms of a given frequency. Using the wavelet transform the projection of each variable in the scaling function space mn and the wavelet spaces Wj j 1; . . . ; n, for several resolutions depths j, were found. If y t is any of the variables involved in the models, let Yn t projection of y t in the scaling function space mn of resolution depth n (lowest resolution) yj t projection of y t in the wavelet space Wj of resolution depth j j 1; . . . ; n; j 1 being the depth of highest resolution. For example, Figs. 11 and 12 shows the concentrate grade gcc t, its projection Gcc;6 t on the scaling function space m6 and its projection gcc;4 on the wavelet space W4 . The averaging eects of the projection on the scaling function space is apparent, as is the derivative-like property of the projection on the wavelet space. A model was determined for the projections of the variables on the scaling function space mn (resolution depth n) as well as for the projections of the variables on each wavelet space Wj (at resolution depths j 1; . . . ; n). 
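The Haar projections described above can be illustrated with a minimal sketch. It assumes that the sampled signal itself plays the role of the finest-resolution representation and that its length is a multiple of 2 to the power of the resolution depth; the function name `haar_projections` and its arguments are hypothetical and do not come from the paper or from any toolbox.

```python
import numpy as np


def haar_projections(y, depth):
    """Return (Y_n, [y_1, ..., y_n]) for a sampled signal y.

    Y_n is the projection on the Haar scaling function space of resolution
    depth n = depth; y_j is the projection on the wavelet space of depth j.
    Assumption: the samples are taken as the depth-0 representation.
    """
    y = np.asarray(y, dtype=float)
    if y.size % (2 ** depth) != 0:
        raise ValueError("signal length must be a multiple of 2**depth")

    previous = y  # depth 0: the samples themselves
    wavelet_parts = []
    for j in range(1, depth + 1):
        block = 2 ** j
        # Scaling-space projection at depth j: mean over blocks of 2**j samples.
        current = np.repeat(y.reshape(-1, block).mean(axis=1), block)
        # Wavelet-space projection: detail lost between depth j-1 and depth j.
        wavelet_parts.append(previous - current)
        previous = current
    return previous, wavelet_parts
```

With this construction the signal is recovered exactly as the sum of the scaling-space projection and all wavelet-space projections, which is what makes it possible to model each projection separately and then add the sub-models.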
Then, as in the case of the non-linear ARMAX model, stepwise regression (Haber and Unbehauen, 1990) was used to select those variables giving the best model in each of these spaces. Thus models $\hat{G}_{cc,n}$, $\hat{g}_{cc,n}$, $\hat{g}_{cc,n-1}$, ..., $\hat{g}_{cc,j}$, ..., $\hat{g}_{cc,1}$ were obtained for the concentrate grade $g_{cc}$ in the scaling function space $\nu_n$ and the wavelet spaces $W_j$. Next, the models were added using different combinations, and the one giving the smallest validation RMS error was selected. It turned out that the best model $\hat{g}_{cc}$ is given by

Fig. 10. Comparison between the real (fine trace) and the predicted (coarse trace) signals. PLS dynamic model. Axes: concentrate copper grade $g_{cc}$ [%] versus time [min].

$$\hat{g}_{cc} = \hat{G}_{cc,6} + \hat{g}_{cc,5} + \hat{g}_{cc,4} + \hat{g}_{cc,3}. \qquad (27)$$

For this model the validation RMS error is 0.368 and the cross validation RMS error is 0.640.
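The combination step that leads to Eq. (27) can be sketched as an exhaustive search over sums of sub-model predictions. This is only an illustration under stated assumptions: the paper does not say how the combinations were enumerated, and the function name `best_submodel_combination` and the dictionary layout are hypothetical.

```python
from itertools import combinations

import numpy as np


def best_submodel_combination(submodel_predictions, actual):
    """Pick the subset of sub-models whose summed prediction has the lowest RMS error.

    submodel_predictions: dict mapping a label (e.g. "G6", "g5") to the
    validation-set prediction of that sub-model (hypothetical layout).
    actual: measured values on the same validation set.
    """
    actual = np.asarray(actual, dtype=float)
    names = list(submodel_predictions)
    best_subset, best_rms = None, np.inf
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            # Sum the predictions of the sub-models in this candidate subset.
            total = sum(np.asarray(submodel_predictions[name], dtype=float)
                        for name in subset)
            rms = float(np.sqrt(np.mean((actual - total) ** 2)))
            if rms < best_rms:
                best_subset, best_rms = subset, rms
    return best_subset, best_rms
```

For n wavelet sub-models plus one scaling-space sub-model the search covers 2^(n+1) - 1 subsets, which stays small for a resolution depth such as n = 6.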

Fig. 11. Concentrate grade $g_{cc}$ (fine trace) and its projection $G_{cc,6}$ on the scaling function space $\nu_6$ (coarse trace) are shown in the upper graph. The lower graph shows the projection $g_{cc,4}$ of $g_{cc}$ on the wavelet space $W_4$. Horizontal axis in both graphs: time [min].

5.4. Non-linear-in-the-parameters model: neural network

The multilayer perceptron is a particular case of the general model structure (Eq. (28)). It provides a framework for representing non-linear mappings between a set of input variables and a set of output variables. For this case $G = G_1 = G_2$ is a sigmoid function, e.g., a tansig function (Haykin, 1994).

$$\hat{y} = \sum_{k=1}^{n} a_k\, G_k(u_k(t); x; b_k; a_k). \qquad (28)$$
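As an illustration of this model structure, the forward pass of a perceptron with one hidden layer of three tansig neurons and a single output (the n-3-1 structure used in Section 5.4.1) can be sketched as follows. The linear output neuron, the bias terms and all names (`mlp_n31_predict`, `hidden_weights`, and so on) are assumptions made for this sketch; the paper does not give these details.

```python
import numpy as np


def tansig(z):
    """Hyperbolic tangent sigmoid, 2 / (1 + exp(-2z)) - 1, which equals tanh(z)."""
    return np.tanh(z)


def mlp_n31_predict(inputs, hidden_weights, hidden_biases,
                    output_weights, output_bias):
    """Forward pass of an n-3-1 perceptron.

    inputs:         array of shape (N, n), one row per sample
    hidden_weights: array of shape (3, n)
    hidden_biases:  array of shape (3,)
    output_weights: array of shape (3,)
    output_bias:    scalar (linear output neuron, an assumption)
    """
    inputs = np.asarray(inputs, dtype=float)
    # Each hidden neuron applies tansig to a weighted sum of the inputs.
    hidden = tansig(inputs @ np.asarray(hidden_weights).T + hidden_biases)
    # The output is a weighted sum of the hidden activations.
    return hidden @ np.asarray(output_weights) + output_bias
```

The output is a weighted sum of sigmoid basis functions of the inputs, which is why the perceptron can be read as a special case of the general structure; the weights inside the sigmoid are what make it non-linear in the parameters.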

It has been proven that PLS algorithms are able to improve neural network models (Qin, 1997). In particular, these algorithms can help in the selection of the best latent structures to be used in order to obtain a better validation RMS error. For that reason, the variables selected for the PLS model are used again for the neural network model, in order to explain any non-linearity present in the flotation bank data that cannot be explained by the conventional PLS model. Therefore, the input variables in the case of the flotation bank are: $s(k-2)$, $g_{ff}(k)$, $g_{cf}(k)$, $g_{ff}(k-7)$, $L_p(k-1)$ and $g_{cc}(k-2)$.

5.4.1. Neural network structure

During the whole identification procedure the neural network structure n-3-1 was kept, because there were no significant differences in the results when other similar structures were used. The n-3-1 architecture consists of a network with n inputs, 3 neurons with a tansig sigmoid activation function and 1 output. A test data set, in addition to the validation set, was not required (because validation data was not used to determine the network structure). Although it is possible to further optimise the network structure (architecture), this step was not carried out, since it needs large quantities of data and requires a high computational effort (e.g., as in the case of genetic algorithms). An RMS error of 0.367 and a correlation coefficient of 0.66 were obtained in the validation set, while cross-validation gave an RMS error of 0.629 (Fig. 12).

5.5. Comparison of results

Table 2 shows the results obtained in the validation for the different models developed.
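The two performance indexes used in the comparison, the RMS error and the correlation coefficient, can be computed as in the following sketch. The function names are hypothetical, and the helper `validation_mask` simply encodes the identification interval [71, 890] and the validation intervals [1, 70] and [891, 1420] quoted for the PLS model, under the assumption that these are 1-based sample indexes.

```python
import numpy as np


def rms_error(actual, predicted):
    """Root mean square error between measured and predicted values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


def correlation_coefficient(actual, predicted):
    """Pearson correlation coefficient between measured and predicted values."""
    return float(np.corrcoef(actual, predicted)[0, 1])


def validation_mask(n_samples=1420, ident_start=71, ident_end=890):
    """Boolean mask that is True for samples outside the identification set.

    Assumption: samples are indexed from 1, as in the intervals of the text.
    """
    index = np.arange(1, n_samples + 1)
    return (index < ident_start) | (index > ident_end)
```

A model would then be scored with, for example, `rms_error(g[mask], g_hat[mask])`, where `g` and `g_hat` are hypothetical arrays of measured and predicted grade and `mask = validation_mask()`.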

Fig. 12. Comparison between the actual (fine trace) and the predicted (coarse trace) signals in the validation and identification (training) sets. Neural network model. Axes: copper concentrate grade $g_{cc}$ [%] versus time [min].

Table 2
Performance indexes

Model                  Validation RMS error   Validation correlation coefficient   Cross-validation RMS error
Linear-in-the-parameter dynamic models
PLS                    0.368                  0.66                                 0.537
Wavelet based          0.368                  0.66                                 0.640
Fuzzy combination      0.380                  0.64                                 a
ARMAX                  0.385                  0.62                                 0.821
ARMAX-2                0.390                  0.60                                 0.823
Takagi and Sugeno      0.394                  0.61                                 a
Non-linear-in-the-parameter dynamic model (neural network)
Perceptron             0.367                  0.66                                 0.629

a Not applicable in this case.

6. Conclusions

According to Table 2, considering validation, the best linear-in-the-parameter dynamic models are the wavelet based model and the PLS model, with an RMS validation error of about 0.37, followed by the new model type which has been designated here as the fuzzy combination model (RMS = 0.38), while the other linear-in-the-parameter models give practically equal results of about 0.39. The RMS validation error of the perceptron non-linear-in-the-parameter model is close to 0.37. However, if cross validation is also taken into account, the PLS model is the best. This ranking might conceivably change if another set of data were used. In particular, it may be expected that the Takagi and Sugeno and the fuzzy combination models could give better results if all clusters involved were to contain a large enough number of data points.

It should be especially noted that for all models, input variables having physical significance have been selected by the different model structure determination methods that have been used. Thus all linear-in-the-parameter models are non-linear with respect to plant measurements, in this way reflecting the non-linear character of the flotation process. This may be the reason why the perceptron model has not shown a better performance than the linear-in-the-parameter models determined here (at least for similar computational effort and quantity of data used in their development).

Acknowledgements

Research leading to this paper has received funding from projects FONDECYT 1000977 and 1020741, and from the Electrical and the Mining Engineering Departments of the University of Chile. Research Assistant David Miranda contributed to the development of the wavelet model.

Appendix A

A.1. Projection to latent structures algorithm

PLS is used for predicting a block of centred and scaled measurements Y in terms of another centred and scaled block X that contains a group of explanatory (and highly correlated) variables. The model is obtained through an iterative algorithm that generates latent bases t and u for the X and Y spaces respectively, such that the X and Y matrices can be represented (for A latent vectors) as:

$$X = \sum_{i=1}^{A} t_i p_i^T + E_{x,A} \quad \text{and} \quad Y = \sum_{i=1}^{A} t_i c_i^T + E_{y,A}$$

and the correlation between $t_i$ and $u_i = Y c_i / (c_i^T c_i)$ is maximized. The iterative PLS algorithm is (Hodouin et al., 1993):

Step 1: Set $i = 1$
Step 2: Set $u_i$ as the first column of Y
Step 3: $w_i^T = u_i^T X / (u_i^T u_i)$
Step 4: $w_i^T = w_i^T / \mathrm{norm}(w_i)$
Step 5: $t_i = X w_i / (w_i^T w_i)$
Step 6: $c_i^T = t_i^T Y / (t_i^T t_i)$
Step 7: $u_i = Y c_i / (c_i^T c_i)$
Step 8: Go to Step 3
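The steps above, together with the loading, deflation and stopping rule of this appendix, can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation: the function name `pls_nipals`, the convergence tolerance and the choice to discard the latent vector that fails the 3% test are assumptions. The regression matrix is formed with the standard PLS identity B = W (P^T W)^{-1} C^T, which is not stated in the paper.

```python
import numpy as np


def pls_nipals(X, Y, var_tol=0.03, max_iter=500, conv_tol=1e-10):
    """Iterative PLS on centred, scaled X (N x m) and Y (N x q).

    Returns (B, n_latent) such that Y is approximated by X @ B.
    """
    Ex = np.asarray(X, dtype=float).copy()
    Ey = np.asarray(Y, dtype=float).reshape(len(Ex), -1).copy()
    total_var = np.sum(Ey ** 2)          # variance of the original (centred) Y
    W, P, C = [], [], []

    for _ in range(Ex.shape[1]):         # at most one latent vector per X column
        u = Ey[:, 0].copy()              # Step 2
        for _ in range(max_iter):
            w = Ex.T @ u / (u @ u)       # Step 3
            w = w / np.linalg.norm(w)    # Step 4
            t = Ex @ w / (w @ w)         # Step 5
            c = Ey.T @ t / (t @ t)       # Step 6
            u_new = Ey @ c / (c @ c)     # Step 7
            converged = (np.linalg.norm(u_new - u)
                         <= conv_tol * np.linalg.norm(u_new))
            u = u_new
            if converged:                # otherwise Step 8: back to Step 3
                break
        p = Ex.T @ t / (t @ t)           # loading vector for the X space
        Ex_next = Ex - np.outer(t, p)    # residuals after deflation
        Ey_next = Ey - np.outer(t, c)
        gain = (np.sum(Ey ** 2) - np.sum(Ey_next ** 2)) / total_var
        if W and gain < var_tol:         # stopping rule (assumed: vector discarded)
            break
        W.append(w)
        P.append(p)
        C.append(c)
        Ex, Ey = Ex_next, Ey_next

    W, P, C = np.array(W).T, np.array(P).T, np.array(C).T
    B = W @ np.linalg.solve(P.T @ W, C.T)
    return B, W.shape[1]
```

For a single-column Y, as in the copper grade model, the inner loop settles after two passes because $u_i$ is always proportional to that column.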

At convergence of vector $u_i$, a loading vector for the X space is defined as $p_i = X^T t_i / (t_i^T t_i)$, and the residuals for matrices X and Y are calculated as $E_x = X - t_i p_i^T$ and $E_y = Y - t_i c_i^T$. The algorithm is now repeated (from Step 2) by replacing the X and Y matrices by $E_x$ and $E_y$ respectively and increasing the index $i$ by one. The procedure stops when the addition of a new latent vector reduces the variance of $E_y$ by less than 3% of the variance of the original Y matrix.

References
Burrus, C.S., Gopinath, R.A., Guo, H., 1998. Introduction to Wavelets and Wavelet Transforms. Prentice Hall, New Jersey.
Casali, A., González, G.D., Torres, F., Vallebuona, G., Castelli, L., Gimenez, P., 1998. Particle size distribution soft-sensor for a grinding circuit. Powder Technology 99, 15–21.
Casali, A., González, G.D., Agusto, H., Vallebuona, G., 2002. Dynamic simulator of a rougher flotation circuit for a copper sulphide ore. Minerals Engineering 15 (4), 253–262.
González, G.D., 1999. Soft-sensors for processing plants. In: Proceedings of Second Intelligent Processing and Manufacturing of Materials International Conference (IPMM), Honolulu, Hawaii, 1, pp. 59–70.
Gómez, E., Unbehauen, H., Kortmann, P., Peters, S., 1996. Fault detection and diagnosis with the help of fuzzy-logic and with application to a laboratory turbogenerator. In: Proceedings of 13th IFAC World Congress, San Francisco, pp. 235–240.
Haber, R., Unbehauen, H., 1990. Structure identification of nonlinear dynamic systems: a survey on input/output approaches. Automatica 26, 651–677.
Haykin, S., 1994. Neural Networks. A Comprehensive Foundation. Macmillan College Publishing Co., New York.
Hodouin, D., MacGregor, J.F., Hou, M., Franklin, M., 1993. Multivariate statistical analysis of mineral processing plant data. CIM Bulletin November/December, pp. 23–34.
Jämsä-Jounela, S.L., Lättilä, T., 1995. A methodology for computer-aided process study at mineral processing plants. In: Proceedings of Copper 95, Santiago, Chile, pp. 89–98.
Ljung, L., 1987. System Identification, Theory for the User. Prentice Hall, Information and System Sciences Series, New Jersey, USA.
Mardia, K.V., Kent, J.T., Bibby, J.M., 1994. Multivariate Analysis. Academic Press, pp. 213–254.
Mathworks, 1998. Fuzzy Logic Toolbox for Use with MATLAB. The Mathworks Inc., Natick, MA, USA.
Misiti, M., Misiti, Y., Oppenheim, G., Poggi, J.-M., 1996. Wavelet Toolbox for Use with MATLAB. The Mathworks Inc., Natick, MA, USA.
Qin, S.J., 1997. Neural networks for intelligent sensors and control: practical issues and some solutions. In: Omidvar, O., Elliott, D.L. (Eds.), Neural Systems for Control, Chapter 8. Academic Press, pp. 213–234.
Sjöberg, J., Zhang, Q., Ljung, L., Benveniste, A., Delyon, B., Glorennec, P.-Y., Hjalmarsson, H., Juditsky, A., 1995. Nonlinear black-box modeling in system identification: a unified overview. Automatica 31 (12), 1691–1724.
Takagi, T., Sugeno, M., 1985. Fuzzy identification of systems and its application to modelling and control. IEEE Transactions on Systems, Man and Cybernetics 15, 116–132.
