ELSEVIER    Int. J. Miner. Process. 42 (1994) 53-73

Statistical discrimination of flotation models based on batch flotation data

M. Mazumdar
Department of Industrial Engineering, University of Pittsburgh, Pittsburgh, PA 15261, USA
Received 5 January 1994; accepted 18 November 1993

Abstract

An essential requirement for the optimization of design and operation of a flotation facility is the
availability of a mathematical model that predicts how the amount of recovery of desirable particles
during a certain time period depends on the design parameters. The selection of a best model from a
collection of proposed models is a choice among competing theories and is based on empirical results
obtained from experimental data. This paper focuses on how a choice for the appropriate model may
be based on the statistical analysis of recovery data. Two general statistical criteria for model discrim-
ination are considered based on the model's predictive capability. They are referred to as model fit
and model stability. Statistical measures relating to these criteria are calculated for five different
time-recovery profiles to evaluate five different flotation models. These calculated measures are used to
assess the suitability of the proposed models for representing experimental data.

1. Introduction

In a froth flotation process, hydrophobic particles are separated by floating them to the
top of a liquid suspension with the help of air bubbles, while particles that are naturally
hydrophilic, or are rendered so by means of suitable treatment, remain with the liquid and
are drained out. Depending on the situation to which the process is applied, the hydrophobic
particles may be the ones that are valuable or undesirable. Flotation is a major process in
the production of many nonferrous metals and in the purification of substances such as
quartz, mica, phosphates, etc. Earlier application of flotation to coals was made in connection
with the recovery of fines, but the emphasis of recent applications has been on coal bene-
ficiation with a view to removing mineral matter and pyrite from the feed coal. Environ-
mental control of SO2 pollution and the provisions of the Clean Air Act of 1990 have lent
much impetus to the optimization of design and operation of a coal flotation facility that
would make fine coal desulfurization economically feasible on a large scale. An essential

0301-7516/94/$07.00 © 1994 Elsevier Science B.V. All rights reserved
SSDI 0301-7516(94)00018-U

requirement for such optimization is the availability of a mathematical model that predicts
how the amount of recovery of desirable particles during a certain time period depends on
the design parameters.
The operating characteristics of a flotation circuit depend on many interdependent factors
in a complex manner. Huber-Panu et al. (1976) state that a complete model should account
for the following factors on which the performance of a flotation process depends: (a) those
that are determined by the feed ore, flotation reagents and steps of ore preparation prior to
application of flotation, (b) those that are determined by the design of the flotation process,
i.e., the mixing and aeration characteristics of the machine, and (c) the pulp transport
characteristics. Lynch et al. (1981) have given a taxonomy of the different variables that
should enter into a flotation model. They classify the independent variables into two cate-
gories: manipulated variables (e.g., air addition, pulp levels, reagent additions, etc.) and
disturbance variables (e.g., degree of oxidation, head grade, etc.). They also classify the
dependent variable into two categories: performance variables of the final product (e.g.,
grade, recovery, flow rate, pulp density, etc.) and performance variables of the intermediate
product of a similar nature contained in the rougher and scavenger concentrates.
In a recent paper, Dowling et al. (1985) have reviewed thirteen different models of
flotation and attempted to discriminate among them based on several sets of data on time-
recovery profiles for the flotation of a porphyry copper ore. In particular, they have proposed
a statistical procedure for discriminating among these models based on such data. The
selection of a best model from a collection of constructed models is often a choice among
competing theories and is based on empirical results obtained from experimental data. The
work of Dowling et al. has been the first important attempt in considering how a choice for
the appropriate model may be made based on statistical analysis of recovery data. The
purpose of this paper is to reconsider the methods proposed by these authors for discrimi-
nating among different competing models based on experimental data and propose several
additional statistical criteria for selection of appropriate models. This paper also points out
a flaw in these authors' derivation of a statistical procedure used by them for model
discrimination. We consider five different models: two containing two unknown constants
to be estimated from the data and the other three containing three unknown parameters.
We also consider five different data sets, three of them taken from published
literature. We apply the different criteria proposed in this paper to the different data sets to
ascertain how well these models compare amongst themselves. The purpose of this paper
is not necessarily to endorse a particular model but to show how statistical procedures may
be used to make a judicious selection of a "best" model based on empirical evidence.
We first discuss the different statistical criteria to be used for model discrimination. This
is followed by a brief description of the models and the data sets used in this paper for
illustrating the applications of these criteria. The final sections provide numerical results
and the conclusions.

2. Statistical criteria for model discrimination

Once a set of competing models has been identified as relevant for a particular process
on the basis of theoretical considerations and modeling objectives, we try to find the one

from the collection that has the best predictive capability on the basis of experimental
information. Two general considerations play an important role in this regard. The first
consideration is how well the estimated model function fits the observed responses. Sec-
ondly, since the parameters appearing in the model will be estimated based on the observed
data, the question is to what extent these parameter estimates depend on the particular values
observed in the experiment. If, because of random influences, parameter estimates vary
significantly when fitted to different repetitions of the same experiment, it is likely that a
fitted model of the form being considered will be a poor predictor of future performance.
Using statistical techniques, it is often possible to analyze, from the data of a single experiment,
how much the estimated parameters will vary if the experiment is repeated or several
additional observations are taken under similar experimental conditions. If this analysis
shows that the estimated parameters are likely to fluctuate considerably under these situa-
tions, then the model is obviously not very suitable.
We will refer to the first consideration as model fit and the second consideration as model
stability. A good model should fit the observed data adequately and should be stable in the
sense that its predictions based on estimated parameters should not fluctuate too much if
additional data are taken. A stable model with a moderate fit to the data at hand may be
preferable to an unstable model with an excellent fit.

2.1. Measures of model fit

Several measures of the goodness of fit are available from the statistical literature. They
are listed below.

1. Residuals
Consider the situation where the observed data set consists of $n$ pairs of observations
$(t_1, y_1), (t_2, y_2), \ldots, (t_n, y_n)$, where $t_i$ refers to the time at which the $i$th observation is taken
and $y_i$ refers to the corresponding cumulative recovery at this time point. Suppose that the
proposed model is of the form:

$$y = f(t; a, b)$$

where $a$ and $b$ are the unknown parameters to be estimated from the data. Based on the
experimental data, the model's unknown parameters are estimated. A commonly used
method for estimation of the parameters is the method of least squares. It estimates the
model parameters by determining the values $\hat{a}$ and $\hat{b}$ that minimize the sum of squared
residuals across all time points. Denote the calculated recovery values at the time points
$t_1, t_2, \ldots, t_n$ by $\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_n$, where

$$\hat{y}_i = f(t_i; \hat{a}, \hat{b})$$

The quantities $y_i - \hat{y}_i$ ($i = 1, 2, \ldots, n$) are known as residuals. The smaller the values of
these residuals are, the better is the fit. Apart from the absolute magnitude of the residuals,
the visual pattern of residuals will often provide a clue to the goodness of model fit. For
example, in a good fit, the sign and magnitude of the residuals should display a random
pattern when plotted against the time points.

2. Mean Square Error


The Mean Square Error is defined by the following expression:

$$\mathrm{MSE} = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - m}$$

where n is the number of observations in the data set and m is the number of parameters in
the model that are estimated from the data. This statistic, also known as the residual mean
square, plays an important role in hypothesis testing in regression analysis, confidence
intervals, etc. This measure accounts for both the magnitudes of the residual, that is, the
amount of unexplained variation after the model has been fitted, as well as the degrees of
freedom (i.e., $n - m$) available to estimate the error variance. Increasing the number of unknown
parameters shrinks the denominator $n - m$, which by itself tends to increase the MSE; an extra
parameter therefore lowers the MSE only if it reduces the residual sum of squares enough to
compensate. A reasonable, and certainly simple, rule is to choose
the candidate model that has the smallest value of MSE.
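
As an illustration of the two fit measures, the following minimal Python sketch computes the residuals and the mean square error for the classical first-order model (Model 1, Eq. 1) on data set 1 of Table 1. It assumes the point estimates reported for that case in Table 3 rather than refitting the model, and it assumes time is expressed in minutes; the function and variable names are illustrative only. Under these assumptions the root mean square error comes out close to the value 1.4584 listed in Table 2, and the residuals change sign in runs rather than at random.

```python
import math

# Data set 1 from Table 1: time in seconds, cumulative recovery in percent.
TIMES_S = [30, 60, 90, 120, 180, 240, 300, 360, 420, 480, 540, 600, 660, 720]
RECOVERY = [40.4, 61.5, 70.2, 74.4, 78.0, 79.7, 81.0, 81.9,
            82.2, 82.3, 82.9, 83.1, 83.8, 84.0]

# Point estimates reported for Model 1 on data set 1 (Table 3a).
R_INF = 82.0305   # ultimate recovery, percent
K1 = 1.3215       # first-order rate constant, per minute


def model1(t_min, r_inf, k1):
    """Classical first-order model: y = R_inf * (1 - exp(-K1 * t))."""
    return r_inf * (1.0 - math.exp(-k1 * t_min))


def residuals(times_min, observed, r_inf, k1):
    """Observed minus calculated recovery at each time point."""
    return [y - model1(t, r_inf, k1) for t, y in zip(times_min, observed)]


def mean_square_error(resid, n_params):
    """Residual sum of squares divided by the degrees of freedom n - m."""
    return sum(e * e for e in resid) / (len(resid) - n_params)


times_min = [t / 60.0 for t in TIMES_S]
resid = residuals(times_min, RECOVERY, R_INF, K1)
mse = mean_square_error(resid, n_params=2)
rmse = math.sqrt(mse)

for t, e in zip(times_min, resid):
    print(f"t = {t:5.2f} min  residual = {e:+.3f}")
print(f"MSE = {mse:.4f}, root MSE = {rmse:.4f}")
```

The printed residuals are positive at the first two time points, negative from 1.5 to 6 minutes, and positive again afterwards, which is the kind of systematic pattern the text describes as a sign of imperfect fit.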

2.2. Measures of model stability

These measures should reflect to what extent the predicted values will vary if the predic-
tions were based on different random samples of the time-recovery profile obtained by
repeating the experiment and observing recovery at the same points in time. The following
measures of model stability have been proposed in the literature:

1. Confidence interval of the model parameters


Based on the observed experimental data, if it is possible to provide an estimated interval
which has the property that it contains the unknown model parameter with a certain specified
degree of confidence, then the width of this interval should provide a measure of model
stability. A wide interval implies that a large range of parameter values is consistent with
the experimental data. Thus the model under question is not very stable in the sense that if
additional observations are taken in a repeated sample, the predicted parameter values may
be quite different from those obtained from the first sample.
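
A minimal sketch of how such an interval can be computed for a nonlinear model is given here. It assumes the standard linearization (Wald-type) interval for nonlinear least squares, $\hat{\theta}_j \pm t_{0.975,\,n-m}\, s \sqrt{[(J^T J)^{-1}]_{jj}}$, where $J$ is the matrix of partial derivatives of the model with respect to the parameters evaluated at the estimates and $s^2$ is the mean square error. The choice of formula is an assumption, not a statement of what any particular package does. The example uses Model 1 (Eq. 1) with data set 1 of Table 1, the point estimates of Table 3, and a tabulated t quantile; all names are illustrative.

```python
import math

# Data set 1 from Table 1, with time converted to minutes.
TIMES_MIN = [0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0]
RECOVERY = [40.4, 61.5, 70.2, 74.4, 78.0, 79.7, 81.0, 81.9,
            82.2, 82.3, 82.9, 83.1, 83.8, 84.0]

# Point estimates for Model 1 on data set 1 (Table 3a).
R_INF = 82.0305
K1 = 1.3215

# Tabulated 97.5th percentile of the t distribution with n - m = 12 degrees of freedom.
T_975_DF12 = 2.179


def jacobian_rows(times, r_inf, k1):
    """Partial derivatives of y = R_inf * (1 - exp(-K1 t)) with respect to (R_inf, K1)."""
    rows = []
    for t in times:
        decay = math.exp(-k1 * t)
        rows.append((1.0 - decay, r_inf * t * decay))
    return rows


def wald_half_widths(times, observed, r_inf, k1, t_quantile):
    """Half-widths of the approximate 95% intervals for R_inf and K1."""
    n, m = len(times), 2
    fitted = [r_inf * (1.0 - math.exp(-k1 * t)) for t in times]
    s2 = sum((y - f) ** 2 for y, f in zip(observed, fitted)) / (n - m)

    rows = jacobian_rows(times, r_inf, k1)
    a = sum(g * g for g, _ in rows)   # (J'J)[0][0]
    b = sum(g * h for g, h in rows)   # (J'J)[0][1]
    d = sum(h * h for _, h in rows)   # (J'J)[1][1]
    det = a * d - b * b

    # Diagonal of the 2x2 inverse: [d, a] / det.
    se_r = math.sqrt(s2 * d / det)
    se_k = math.sqrt(s2 * a / det)
    return t_quantile * se_r, t_quantile * se_k


half_r, half_k = wald_half_widths(TIMES_MIN, RECOVERY, R_INF, K1, T_975_DF12)
print(f"R_inf: {R_INF - half_r:.3f} to {R_INF + half_r:.3f}")
print(f"K1:    {K1 - half_k:.3f} to {K1 + half_k:.3f}")
```

Under these assumptions the intervals agree closely with the entries for Model 1 and data set 1 in Table 3b (81.016-83.045 for the ultimate recovery and 1.225-1.417 for the rate constant).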
Dowling et al. have previously stressed the need for computing confidence intervals for
the model parameters in addition to computing the mean square error denoting the model
fit. They have expressed the need for the model parameters to have narrow confidence
intervals by means of the following sentence, "...each model parameter must have a range
of statistical significance discrete or narrow enough so that changes in the flotation system
can be confidently assessed." The authors state that because of the nonlinear nature of the
flotation models, the well-known formulas for obtaining parameter confidence intervals for
regression coefficients in a linear model cannot be used. They have, therefore, prescribed
their own procedure for obtaining these confidence intervals based on earlier work by
Klimpel and Austin (1984). Based on the arguments given in the Appendix, we do not
believe that their formulas are correct. Recent textbooks, e.g., Seber and Wild (1989)
specify the formulas for the approximate confidence intervals for parameters in non-linear
regression models that should be used in preference to the formulas given by Klimpel and

state flotation. PRESS analysis is the most appropriate way to identify and quantify the
existence of a non-steady state flotation.
With a set of n pairs of data points, each candidate model will have n PRESS residuals
associated with it corresponding to each data value. Frequently, these residuals are combined
to yield one summary measure for the model stability or its potential for predictive error.
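This statistic, PRESS (Prediction Error Sum of Squares), is defined as

$$\mathrm{PRESS} = \sum_{i=1}^{n} e_{i,-i}^2 = \sum_{i=1}^{n} (y_i - \hat{y}_{i,-i})^2$$

where $\hat{y}_{i,-i}$ denotes the value calculated for the $i$th observation from the model fitted with
that observation deleted from the data set.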

The model having the smallest PRESS may be considered to be the one that has the
smallest predictive error. A few other measures of model stability have been discussed by
Borowiak (1989).
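
The leave-one-out computation behind the PRESS statistic can be sketched as follows for Model 1 (Eq. 1) and data set 1 of Table 1. This is only an illustrative sketch in Python, not the procedure used to produce the results reported in this paper: the fitting routine (a closed-form solution for the ultimate recovery at fixed rate constant, combined with a grid scan and golden-section search over the rate constant) and all names are assumptions made for the example.

```python
import math

# Data set 1 from Table 1, with time converted to minutes.
TIMES_MIN = [0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0]
RECOVERY = [40.4, 61.5, 70.2, 74.4, 78.0, 79.7, 81.0, 81.9,
            82.2, 82.3, 82.9, 83.1, 83.8, 84.0]


def best_r_inf(times, observed, k1):
    """For fixed K1 the model is linear in R_inf, so its least squares value is closed-form."""
    g = [1.0 - math.exp(-k1 * t) for t in times]
    return sum(y * gi for y, gi in zip(observed, g)) / sum(gi * gi for gi in g)


def sse(times, observed, r_inf, k1):
    """Residual sum of squares for Model 1."""
    return sum((y - r_inf * (1.0 - math.exp(-k1 * t))) ** 2
               for t, y in zip(times, observed))


def fit_model1(times, observed):
    """Least squares fit: coarse grid over K1, then golden-section refinement."""
    def profile(k1):
        return sse(times, observed, best_r_inf(times, observed, k1), k1)

    grid = [0.05 * j for j in range(1, 201)]          # K1 from 0.05 to 10 per minute
    k0 = min(grid, key=profile)
    a, b = max(k0 - 0.05, 1e-6), k0 + 0.05
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    c = b - ratio * (b - a)
    d = a + ratio * (b - a)
    while b - a > 1e-9:
        if profile(c) < profile(d):
            b, d = d, c
            c = b - ratio * (b - a)
        else:
            a, c = c, d
            d = a + ratio * (b - a)
    k1 = 0.5 * (a + b)
    return best_r_inf(times, observed, k1), k1


def press_residuals(times, observed):
    """Delete each observation in turn, refit, and predict the deleted point."""
    out = []
    for i in range(len(times)):
        t_rest = times[:i] + times[i + 1:]
        y_rest = observed[:i] + observed[i + 1:]
        r_inf, k1 = fit_model1(t_rest, y_rest)
        out.append(observed[i] - r_inf * (1.0 - math.exp(-k1 * times[i])))
    return out


r_hat, k_hat = fit_model1(TIMES_MIN, RECOVERY)
press_res = press_residuals(TIMES_MIN, RECOVERY)
press = sum(e * e for e in press_res)
print(f"full fit: R_inf = {r_hat:.4f}, K1 = {k_hat:.4f}")
print(f"SSE = {sse(TIMES_MIN, RECOVERY, r_hat, k_hat):.4f}, PRESS = {press:.4f}")
```

Each PRESS residual is at least as large in magnitude as the corresponding ordinary residual, so PRESS can never fall below the ordinary residual sum of squares; a large gap between the two points to influential observations. For comparison, Table 4 reports 34.4327 for this model and data set.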

2.3. Description of flotation models

We consider five different kinetic models. The first two models involve two unknown
parameters to be estimated from the data, and are both frequently used in the flotation
literature. The last three models each contain three parameters. The purpose of this paper is
to illustrate the application of different statistical criteria enumerated above for discrimi-
nating among this restricted set of five models.
We do not make any effort to carry out an exhaustive evaluation of the score or so flotation
models that have been proposed in the literature. Nevertheless, it is our belief that the list
of models considered here contains those that are highly appropriate in many applications.
In the list of models given below, three have been evaluated previously by Dowling et al.,
in whose work Models 2 and 3 were shown to have a very high degree of performance.
Model 1 is the standard classical model. Models 4 and 5 have been observed by us to yield
very good results in many different cases.

Model 1. Classical first-order kinetic model


This two-parameter model based on a first-order rate equation has the following mathematical form:

$$y = R_\infty \{1 - \exp(-K_1 t)\} \qquad (1)$$

where $y$ = recovery of component at time $t$, $R_\infty$ = ultimate recovery of component, and
$K_1$ = first-order rate constant for component (min$^{-1}$).

Model 2. First-order flotation model with rectangular distribution of floatabilities


This model was derived by Meyer and Klimpel (1984) and Huber-Panu et al. (1976)
by considering the first-order rate equation for a component with a single particle size having
a rectangular distribution of floatabilities. The mathematical form for this model is

$$y = R_\infty \left\{ 1 - \frac{1}{K_2 t} \left[ 1 - \exp(-K_2 t) \right] \right\} \qquad (2)$$

where $y$ = recovery at time $t$, $R_\infty$ = ultimate recovery of component, and $K_2$ = rate constant
related to the right hand limit on the rectangular distribution of particle floatabilities
(min$^{-1}$).
For a detailed derivation of this equation, the reader is referred to Huber-Panu et al.

Table 1
Time-cumulative recovery profiles for data sets 1-5

    Data set 1          Data set 2          Data set 3          Data set 4          Data set 5
  Time  Cum. Rec.     Time  Cum. Rec.     Time  Cum. Rec.     Time  Cum. Rec.     Time  Cum. Rec.
 (sec)  (%)          (sec)  (%)          (sec)  (%)          (sec)  (%)          (sec)  (%)
    30  40.4            30  76.0            20  53.99           30  50.00           30  63.30
    60  61.5            60  85.9            40  70.55           60  55.55           60  78.39
    90  70.2            90  92.1            60  77.69          180  58.33          120  87.51
   120  74.4           120  93.6           120  84.36          300  58.47          240  92.92
   180  78.0           180  95.5           180  86.14          420  58.60          480  95.57
   240  79.7           240  96.8           240  87.77          600  58.73          960  96.74
   300  81.0           300  97.2           480  89.53          900  58.82         1920  97.21
   360  81.9           360  98.0                              1200  58.82         3840  97.65
   420  82.2           420  98.1                                                  5760  97.75
   480  82.3           480  98.1
   540  82.9           540  98.1
   600  83.1           600  98.1
   660  83.8           660  98.1
   720  84.0           720  98.1

Model 3. Three-parameter kinetic model with particle floatabilities proportional to particle sizes

Huber-Panu et al. derived this model based on the following assumptions: (a) particle
sizes are distributed uniformly within the range $(X_0, X_m)$, and (b) the floatability of a
particle is proportional to its size. The mathematical form of this model is given by

$$y = R_\infty \left\{ 1 - \frac{\exp(-K_{3s} t) - \exp(-K_{3u} t)}{(K_{3u} - K_{3s})\, t} \right\} \qquad (3)$$

where $y$ = cumulative recovery at time $t$, $R_\infty$ = ultimate recovery of component, $K_{3s}$ = rate
constant corresponding to the smallest particle size (min$^{-1}$), and $K_{3u}$ = rate constant
corresponding to the largest particle size (min$^{-1}$).

Model 4. Three-parameter kinetic model based on the proportionality law


This model has been derived by Lai (1990) based on the hypothesis of a proportionality
law which states that the rate of recovery of floatable material is proportional to the amount
of floatable material remaining to be recovered and is inversely proportional to the elapsed
time $t$. Thus denoting $y$ = recovery at time $t$, $R_\infty$ = asymptote of cumulative recovery, and
$k_4$ = dimensionless rate constant, Lai's hypothesis results in the differential equation

$$\frac{dy}{dt} = \frac{k_4 (R_\infty - y)}{t}$$

which upon integrating yields

$$y = R_\infty - b_4 t^{-k_4} \qquad (4)$$

where $b_4 = R_\infty - R(1)$, and $R(1)$ = recovery of the component after one elapsed time
unit. It is apparent from the above equation that the parameter $b_4$ has the dimension of $t^{k_4}$.

Fig. 1. Cumulative recovery and corresponding calculated values. (a) Data set 1; (b) data set 2; (c) data set 3;
(d) data set 4; (e) data set 5. [Each panel plots cumulative recovery against time in seconds for the actual data
and for Models 1-5.]

Table 2
Comparison of root mean square errors

                          Model
Data set      1         2         3         4         5
   1       1.4584    1.1533    0.8955    0.3973    0.3042
   2       2.3548    0.5464    0.5585    0.5874    0.5477
   3       2.5770    0.2539    0.2750    0.3985    0.2925
   4       0.7003    0.4205    0.3668    0.0781    0.0741
   5       3.4627    0.4377    0.4728    0.3891    0.1957

Table 3
Model parameters

a. Point estimates

                                          Data set
Model  Parameter        1           2           3           4           5
1      R∞             82.0305     96.7934     86.0036     58.4369     95.2658
       K1 (/min)       1.3215      2.7877      2.7065      3.7631      1.9574
2      R∞             86.4463     99.5449     91.2369     59.3332     97.7487
       K2 (/min)       3.1829      8.1642      6.5452     13.2690      5.1565
3      R∞             85.7118     98.7174     90.4766     58.6874     97.7487
       K3u (/min)      0.3081      0.9441      0.0626      0.6278     -0.0000017
       K3s (/min)      2.7492      8.0349      6.5227     10.4501      5.1565
4      R∞             85.9904     99.7331     92.0705     58.8474     98.3913
       K4              0.9556      0.9584      0.8720      1.4456      0.8577
       b             -23.6081    -12.3854    -14.67        3.2515     19.4810
5      R∞             84.7936     99.4787     91.0368     58.8405     98.0861
       K5              1.2262      1.0582      1.0653      1.5141      0.9845
       b              48.0322     10.0003     12.7339     28.0816     12.5053

b. Approximate 95% confidence intervals

                                                          Data set
Model  Parameter   1                     2                     3                     4                     5
1      R∞          81.016 to 83.045      95.296 to 98.291      83.135 to 88.872      57.763 to 59.110      92.106 to 98.426
       K1          1.225 to 1.417        2.385 to 3.191        2.242 to 3.171        3.362 to 4.164        1.562 to 2.353
2      R∞          85.449 to 87.444      99.129 to 99.961      90.881 to 91.593      58.876 to 59.790      97.312 to 98.185
       K2          2.938 to 3.427        7.735 to 8.594        6.394 to 6.691        11.631 to 14.907      4.978 to 5.335
3      R∞          82.143 to 84.080      97.368 to 100.067     87.874 to 93.079      59.237 to 60.980      a
       K3u         0.127 to 0.489        -0.111 to 0.300       -0.180 to 3.060       0.424 to 0.830        a
       K3s         2.374 to 3.125        7.390 to 8.679        6.286 to 6.759        9.499 to 11.401       a
4      R∞          85.317 to 86.664      98.742 to 100.725     90.689 to 93.452      58.739 to 58.956      97.782 to 99.001
       K4          0.904 to 1.007        0.812 to 1.104        0.786 to 0.958        1.343 to 1.548        0.796 to 0.919
       b           -24.672 to -22.544    -13.955 to -10.816    -16.433 to -12.898    3.499 to 3.004        20.504 to 18.458
5      R∞          84.389 to 85.198      98.639 to 100.318     90.190 to 91.884      58.740 to 58.941      97.803 to 98.369
       K5          1.182 to 1.270        0.918 to 1.199        0.999 to 1.132        1.415 to 1.613        0.951 to 1.018
       b           40.430 to 55.634      5.173 to 14.828       10.195 to 15.273      18.528 to 37.635      11.018 to 13.993

a Jacobian singular.
Fig. 2. Ordinary and PRESS residuals for the different models. (a) Data set 1; (b) data set 2; (c) data set 3; (d)
data set 4; (e) data set 5. [Each panel shows two plots, ordinary residuals and PRESS residuals, against time in
minutes for Models 1-5.]

Model 5. Three-parameter kinetic model based on the logarithmic proportionality law


This model is a second form of the proportionality law which accounts for the recovery
of floatable material on a logarithmic scale (Lai, 1990). In this model the rate of recovery is
proportional to the amount of floatable material (on a logarithmic scale) remaining to be
recovered and is inversely proportional to the elapsed time $t$. That is,

$$\frac{d \log y}{dt} = \frac{k_5 (\log R_\infty - \log y)}{t}$$

where

$$\frac{d \log y}{dt} = \frac{1}{y} \frac{dy}{dt}$$

gives the proportional rate of percentage change of $y$ with respect to time. Upon integration,
we obtain

$$y = R_\infty \exp(-b_5 t^{-k_5}) \qquad (5)$$
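
For readers who wish to experiment with the five models, a minimal Python sketch of the time-recovery functions follows. The functional forms are the ones read from Eqs. (1)-(5); the form used for Model 3 assumes that the rate constant is uniformly distributed between its smallest and largest values, and the function and argument names are illustrative assumptions. All functions require $t > 0$.

```python
import math


def model1(t, r_inf, k1):
    """Eq. (1): classical first-order model."""
    return r_inf * (1.0 - math.exp(-k1 * t))


def model2(t, r_inf, k2):
    """Eq. (2): first-order model with rectangular distribution of floatabilities."""
    return r_inf * (1.0 - (1.0 - math.exp(-k2 * t)) / (k2 * t))


def model3(t, r_inf, k_small, k_large):
    """Eq. (3), assumed form: rate constants uniform between k_small and k_large."""
    numerator = math.exp(-k_small * t) - math.exp(-k_large * t)
    return r_inf * (1.0 - numerator / ((k_large - k_small) * t))


def model4(t, r_inf, b4, k4):
    """Eq. (4): proportionality law, y = R_inf - b4 * t**(-k4)."""
    return r_inf - b4 * t ** (-k4)


def model5(t, r_inf, b5, k5):
    """Eq. (5): logarithmic proportionality law, y = R_inf * exp(-b5 * t**(-k5))."""
    return r_inf * math.exp(-b5 * t ** (-k5))


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to exercise the functions.
    for t in (0.5, 1.0, 2.0, 5.0, 10.0):
        print(t,
              round(model1(t, 90.0, 2.0), 2),
              round(model2(t, 90.0, 5.0), 2),
              round(model3(t, 90.0, 0.5, 5.0), 2),
              round(model4(t, 90.0, 20.0, 1.0), 2),
              round(model5(t, 90.0, 0.5, 1.0), 2))
```

Two simple consistency checks follow from the formulas: Model 3 reduces to Model 2 as the smaller rate constant tends to zero, and Model 4 evaluated at one time unit gives $R_\infty - b_4$, in line with the definition of $b_4$.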

2.4. Description of data sets

We consider five different data sets to which we apply the different criteria mentioned
earlier to evaluate the five flotation models. The first two data sets are taken from Huber-
Panu et al. Data set 1 refers to flotation of pyrite ore and corresponds to Test A of Table 1
given there. Data set 2 refers to experimental results obtained with flotation of galena ore
and corresponds to Test I of the same Table 1. Data set 3 corresponds to the results reported
in Test No. 1 of Dowling et al. for flotation of porphyry copper ore. Data set 4 refers to the
weight recovery of flotable pyrite in an experiment with a reagent level of 5 mM sodium
ethyl xanthate at pH 3 carried out at the Department of Energy's Pittsburgh Energy Tech-
nology Center. Data set 5 refers to flotation yield of minus 200-mesh Upper Freeport, PA.
coal (Fuerstenau et al., 1992). Table 1 displays the raw data on the flotation yields for the
five sets.

3. Results

The comparative analysis on the performance of the five different models with respect to
the five different sets is given in this section. Fig. 1 shows the plot of the cumulative recovery
values against the calculated values for each of the five data sets. Table 2 gives the root
mean square error when each model is fitted to each of the five data sets using the least
squares criterion. Table 3 gives the estimated values of the model coefficients together with
their (asymptotic) 95% confidence intervals obtained from the SAS output. Fig. 2 compares
the ordinary residuals resulting from the least square fit with the PRESS residuals for each
model and each data set. This comparison thus points out the influential observations for
each set and gives a sense of the stability of each model. Table 4 compares the PRESS
statistics for each of the data sets for the five different models. The PRESS residuals were
computed by deleting each observation one at a time from the data set and repeating the
NLIN procedure. This procedure required that initial values for the model parameters be
provided, and except for Model 3, the final values of the estimated model parameters were
found to be insensitive to the starting values. The root mean square values, the residuals,
and the interval estimates of the model parameters were directly given by the SAS output.
The ready availability of sophisticated software packages (e.g., SAS) has rendered it
unnecessary to improvise computational procedures for calculating the various
measures of model performance that have been used in this paper. Table 5 gives a comparison
of the confidence intervals given by Dowling et al. with those obtained using SAS for several
selected models and data sets.

Table 4
The PRESS statistics

                              Data set
Model        1            2           3          4            5
  1       34.4327     262.5209     0.9172     0.2685     216.9427
  2       38.0388       7.2141     0.0069     0.0621       4.5859
  3       32.3781      22.5490     0.0134     0.0778       4.5859
  4       32.5841     122.2081     0.1716     0.0192      13.4476
  5        1.5850      76.2311     2.2644     4.8663       1.8964
Table 2 shows that Model 5 had the lowest root mean square error for three data sets
(sets 1, 4 and 5), whereas Model 2 had the lowest root mean square error for the other two.
Models 2, 4 and 5 fit the data sets well. Model 1 ranks lowest in terms of this
measure of goodness of fit for all five data sets. When one looks at the interval estimates of
the parameters, one observes that the confidence interval for the parameter $R_\infty$ has comparable
widths for both Models 2 and 4 in the case of the first three data sets. The confidence interval
for the Model 5 parameter $R_\infty$ is slightly narrower.

Table 5
Comparison of 95% confidence intervals according to Dowling et al. and SAS

                        K (/min)                        R∞
Data set (a)      DKA          SAS            DKA            SAS

Model 1
Test 1         2.00-3.76    2.24-3.17     0.824-0.898    0.831-0.889
Test 5         2.68-6.86    2.56-5.22     0.847-0.900    0.837-0.917
Test 7         3.35-7.98    3.40-6.15     0.848-0.894    0.845-0.896
Test 9         2.55-7.12    2.58-5.35     0.858-0.920    0.847-0.928

Model 2
Test 1         6.17-6.93    6.39-6.70     0.907-0.918    0.909-0.916
Test 2         6.32-6.80    6.48-6.61     0.908-0.918    0.911-0.914
Test 3         6.34-6.78    6.50-6.61     0.908-0.917    0.911-0.914

(a) The data sets refer to those given in Dowling et al. (1985).



Table 4 provides yet another measure of model stability. Table 4 shows that in cases of
Data Sets 2 and 3, Model 2 has the lowest value for the PRESS statistic, whereas Model 4
has the lowest value for this measure for Data Set 4. Model 5 is the best for the remaining
two sets. The PRESS statistic, it may be recalled, measures the predictive error associated
with these models for future observations. Looking closely at the PRESS residuals given in
Fig. 2, one observes that they assume large values for early flotation times for all models.
This observation is especially true for Model 4. Thus it may be concluded that the instability
of the model arises from the very early flotation times (the first observation in most cases).
If this observation is deleted from the data set, perhaps a more stable model would result in
all cases (especially for Model 4). The earlier observations are much less influential for
Model 5.
Table 5 shows that taking SAS output to be the benchmark values, the confidence intervals
given by Dowling et al. are often much wider.
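
The rankings quoted above can be checked mechanically from the tabulated values. The short Python sketch below does so, assuming that the rows of Table 2 are data sets and its columns are models, and that the rows of Table 4 are models and its columns are data sets; this reading of the table layout is an assumption, chosen because it agrees with the statements in the text. The variable and function names are illustrative.

```python
# Table 2: root mean square errors, keyed by data set; list entries are Models 1-5.
RMSE_BY_DATA_SET = {
    1: [1.4584, 1.1533, 0.8955, 0.3973, 0.3042],
    2: [2.3548, 0.5464, 0.5585, 0.5874, 0.5477],
    3: [2.5770, 0.2539, 0.2750, 0.3985, 0.2925],
    4: [0.7003, 0.4205, 0.3668, 0.0781, 0.0741],
    5: [3.4627, 0.4377, 0.4728, 0.3891, 0.1957],
}

# Table 4: PRESS statistics, keyed by model; list entries are data sets 1-5.
PRESS_BY_MODEL = {
    1: [34.4327, 262.5209, 0.9172, 0.2685, 216.9427],
    2: [38.0388, 7.2141, 0.0069, 0.0621, 4.5859],
    3: [32.3781, 22.5490, 0.0134, 0.0778, 4.5859],
    4: [32.5841, 122.2081, 0.1716, 0.0192, 13.4476],
    5: [1.5850, 76.2311, 2.2644, 4.8663, 1.8964],
}


def best_model_by_rmse(table):
    """For each data set, the model number with the smallest root mean square error."""
    return {ds: values.index(min(values)) + 1 for ds, values in table.items()}


def worst_model_by_rmse(table):
    """For each data set, the model number with the largest root mean square error."""
    return {ds: values.index(max(values)) + 1 for ds, values in table.items()}


def best_model_by_press(table):
    """For each data set (column), the model number with the smallest PRESS."""
    best = {}
    for ds in range(1, 6):
        best[ds] = min(table, key=lambda model: table[model][ds - 1])
    return best


print(best_model_by_rmse(RMSE_BY_DATA_SET))    # {1: 5, 2: 2, 3: 2, 4: 5, 5: 5}
print(worst_model_by_rmse(RMSE_BY_DATA_SET))   # {1: 1, 2: 1, 3: 1, 4: 1, 5: 1}
print(best_model_by_press(PRESS_BY_MODEL))     # {1: 5, 2: 2, 3: 2, 4: 4, 5: 5}
```

Under this reading of the tables, the two criteria select the same model for every data set except data set 4, where the smallest root mean square error belongs to Model 5 and the smallest PRESS to Model 4.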

4. Summary and conclusions

In this paper we have applied several statistical procedures to discriminate among five
contending flotation models. The five different models that were examined were each fitted
to five different data sets. The selected models include three that were previously examined
by Dowling et al. The two other models considered have been proposed by Lai based on
the hypothesis of a proportionality law in flotation kinetics. The five data sets were selected
from those available in the literature and from recently conducted experiments. The predic-
tive value of a model is not only given in terms of its fit to different data sets examined but
also in terms of its stability, for which several measures were given. The measures of model
stability that were examined are: (a) confidence intervals of the model coefficients, and
(b) the PRESS statistics and the PRESS residuals. No single model is found to be uniformly
the best for all the data sets. The numerical results given here suggest that of the five models
considered, two particular models have the best predictive power. These two models are:
(a) the first-order flotation model with a rectangular distribution of floatabilities as proposed
by Huber-Panu et al. and Klimpel and Austin, and (b) the model based on the logarithmic
proportionality law as proposed by Lai. The statistical analysis also showed that the obser-
vations for the earliest flotation times are the influential ones in that they affect the estimated
parameter values the most.
We have also pointed out an error in the derivation of the formulas by Dowling et al. for
the confidence intervals of the model parameters. Numerical values have been given to
indicate the extent of the error resulting from the use of these formulas. Although the
numerical examples given here show that the error is on the conservative side, there exist
no theoretical reasons to suggest that this will indeed be the case in all situations.

Acknowledgements

The research of M. Mazumdar was supported in part by an appointment to the U.S.
Department of Energy Fossil Fuel Part-time Faculty Participation program administered by
Oak Ridge Associated Universities at the Pittsburgh Energy Technology Center. The author
is grateful to R.E. Hucko, R.W. Lai and J. Yingling for their helpful comments. Ms. Kim
Petri helped the author in preparing the tables and graphs. The author is also indebted to the
reviewers for their constructive comments.

Appendix

The procedure proposed by Dowling et al. (1985) for finding the 95% confidence
intervals is roughly as follows: "For the non-linear model $y = f(t; a, b, c, \ldots)$, find the least
squares estimates $\hat{a}, \hat{b}, \hat{c}, \ldots$ and the corresponding mean square error $S^2(0)$. Then
increase the value of $\hat{a}$ by an arbitrary amount (say 10%) and assign this value to parameter
$a$. With this assigned value of $a$, recalculate the least squares estimates of $b, c, \ldots$ and
find the new value of the mean square error. Call this value $S^2(1)$. Then, consider the ratio
$S^2(1)/S^2(0)$, and determine whether it is significant by comparing it to the tabulated values
of the upper 5% percentile of the F distribution with degrees of freedom corresponding to $S^2(1)$
and $S^2(0)$. If not, increase the value of $a$ by another amount, and calculate the least squares
estimates and the revised MSE, $S^2(2)$. Test the ratio $S^2(2)/S^2(0)$ as above, and from
these successive iterations, estimate the values $\hat{a}(m)$ for which $S^2(m)/S^2(0)$ is just
equal to the 95% percentile point of the F distribution with appropriate degrees of freedom.
(Here, $m$ refers to the iteration at which the equality occurs.) These values then define the
95% confidence interval for $a$." (See Figs. 3 and 4 in Dowling et al., 1985.)
Although intuitively plausible, the above procedure fails to give correct results because
the successive mean square errors $S^2(1), S^2(2), \ldots$ are not independent of $S^2(0)$. One
crucial requirement for the F statistic to remain valid is that the numerator and denominator
mean squares should be independent. A specific counter-example using linear regression
models, where the formulas are well known, is given below.
Consider the situation where we have $n$ pairs of observations $(x_i, y_i)$ and the relationship
between $x_i$ and $y_i$ is expressed by the following model:

$$y_i = \alpha + \beta (x_i - \bar{x}) + \epsilon_i \qquad \text{(A-1)}$$

where $\bar{x} = \sum_{i=1}^{n} x_i / n$, and the $\epsilon_i$ are independent identically distributed random variables having
the normal distribution with mean zero and variance $\sigma^2$. It is well known (Meyers, 1990)
that the 95% confidence interval for the parameter $\alpha$ is given by

$$\bar{y} \pm t_{\alpha/2} \frac{s}{\sqrt{n}} \qquad \text{(A-2)}$$

where

$$s^2 = \frac{\sum_{i=1}^{n} \left( y_i - \bar{y} - \hat{\beta} (x_i - \bar{x}) \right)^2}{n - 2} \qquad \text{(A-3)}$$

$$\hat{\beta} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \qquad \text{(A-4)}$$

and $t_{\alpha/2}$ is the $100(1 - \alpha/2)$% percentile of a t distribution with $n - 2$ degrees of freedom.
The quantities $\hat{\alpha} = \bar{y}$ and $\hat{\beta}$ are the least squares estimators of the parameters $\alpha$ and $\beta$.
Following the scheme proposed by Dowling et al., we assign the value $\bar{y} + k$ to the estimate
$\hat{\alpha}$, and minimize the residual sum of squares

$$S^2(1) = \sum_{i=1}^{n} \left( y_i - \bar{y} - k - \beta (x_i - \bar{x}) \right)^2$$

It can be seen easily that the value of $\beta$ for which $S^2(1)$ reaches a minimum is also the same
as $\hat{\beta}$ shown in (A-4). Now we consider the equation

$$\frac{\sum_{i=1}^{n} \left( y_i - \bar{y} - k - \hat{\beta} (x_i - \bar{x}) \right)^2 / (n - 1)}{\sum_{i=1}^{n} \left( y_i - \bar{y} - \hat{\beta} (x_i - \bar{x}) \right)^2 / (n - 2)} = F_{0.95}(n - 1, n - 2) \qquad \text{(A-5)}$$

where $F_{0.95}(n - 1, n - 2)$ is the 95th percentile of the F distribution with $(n - 1, n - 2)$
degrees of freedom. Solving for $k$ in Eq. (A-5), we obtain

$$\frac{n - 2}{n - 1} + \frac{n k^2}{(n - 1) s^2} = F_{0.95}(n - 1, n - 2)$$

from which

$$k = \pm \left[ \frac{n - 1}{n} \left( F_{0.95}(n - 1, n - 2) - \frac{n - 2}{n - 1} \right) \right]^{1/2} s$$

Thus, according to the scheme proposed by Dowling et al., the 95% confidence interval
for $\alpha$ is

$$\bar{y} \pm \left[ \frac{n - 1}{n} \left( F_{0.95}(n - 1, n - 2) - \frac{n - 2}{n - 1} \right) \right]^{1/2} s$$

which differs from the correct formula given in Eq. (A-2). Since this scheme is seen
to be incorrect for linear models, we should not expect it to provide correct results for
non-linear regression models.
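
To give a feel for the size of the discrepancy, the following Python sketch compares the multiplier of $s$ in the correct interval (A-2) with the multiplier implied by the scheme above, for one illustrative sample size. The sample size $n = 11$ is an arbitrary choice, and the two quantiles are standard tabulated values entered by hand; both are assumptions of the example rather than values taken from this paper.

```python
import math

N = 11                 # illustrative sample size (assumption)
T_975_DF9 = 2.262      # tabulated 97.5th percentile of t with n - 2 = 9 degrees of freedom
F_95_DF10_9 = 3.14     # tabulated 95th percentile of F with (n - 1, n - 2) = (10, 9) df


def correct_multiplier(n, t_quantile):
    """Half-width of the interval (A-2) in units of s: t / sqrt(n)."""
    return t_quantile / math.sqrt(n)


def scheme_multiplier(n, f_quantile):
    """Half-width implied by Eq. (A-5) in units of s."""
    return math.sqrt(((n - 1) / n) * (f_quantile - (n - 2) / (n - 1)))


correct = correct_multiplier(N, T_975_DF9)
scheme = scheme_multiplier(N, F_95_DF10_9)
print(f"correct half-width / s: {correct:.3f}")
print(f"scheme half-width / s:  {scheme:.3f}")
print(f"ratio:                  {scheme / correct:.2f}")
```

For this choice of $n$ the scheme gives an interval roughly twice as wide as the correct one, which is in the same direction as the comparison in Table 5; other sample sizes give other ratios.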

References

Borowiak, D.S., 1989. Model Discrimination for Nonlinear Regression Models. Marcel Dekker, New York, NY.
Dowling, E.C., Klimpel, R.R. and Aplan, F.F., 1985. Model discrimination in the flotation of a porphyry copper
ore. Miner. Metallurg. Process., 2: 87-101.
Fuerstenau, D.W., et al., 1992. Coal Surface Control for Advanced Coal Flotation. Final Report to DOE Pittsburgh
Energy Technology Center, Project No. DE-AC22-88PC88878, University of California at Berkeley.
Huber-Panu, I., Ene-Danalache, E. and Cojocariu, D.G., 1976. Mathematical models of batch and continuous
flotation. In: M.C. Fuerstenau (Editor), Flotation: A.M. Gaudin Memorial Volume. AIME, New York, NY,
Vol. 2, Ch. 25, pp. 675-724.
Klimpel, R.R. and Austin, L.G., 1984. The back-calculation of specific rates of breakage from continuous mill
data. Powder Technol., 38: 77-91.
Lai, R.W., 1990. The Overlooked Law of Nature: A New Concept in Kinetic Analysis. Toshi Co., Pittsburgh,
PA, p. 9.
Lynch, A.J., Johnson, N.W., Manlapig, E.V. and Thorne, C.G., 1981. Mineral and coal flotation circuits. In: D.W.
Fuerstenau (Editor), Developments in Mineral Processing, Vol. 3. Elsevier, Amsterdam.
Meyer, W.C. and Klimpel, R.R., 1984. Rate limitations in froth flotation. Trans. AIME, 274: 1852-1858.
Meyers, R.H., 1990. Classical and Modern Regression with Applications. PWS-Kent, Boston, MA.
SAS, 1985. SAS User's Guide: Statistics, Version 5 Edition. SAS Institute, Inc., Cary, NC.
Seber, G.A.F. and Wild, C.J., 1989. Nonlinear Regression. Wiley, New York, NY.
