DOI: 10.1002/minf.201400030
© 2014 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim. Mol. Inf. 2014, 33, 311–314
Letter to the Editors www.molinf.com
by CV, but are unable to predict truly unseen new chemicals well (see, as examples, models 4–6 in Table 1, which are over-optimistically verified as predictive according to CV while, by various statistical parameters used for external validation,[15,16] they demonstrate their inability to predict new independent chemicals when applied to two different prediction sets). There is evidence that CV is necessary, but not sufficient, to guarantee predictivity for new external chemicals.[5,10,20] Therefore, it is not a psychological argument (as stated by Gütlein et al.), but probably the different philosophical approach of a QSAR modeller who wishes to propose only cross-validated single models that are additionally verified for their possible predictivity on truly never-seen chemicals, to guarantee a larger generalizability. Certainly we will never wish to propose models that could be present in the GA population of CV-validated OLS models, such as some models in Tab. 1 of the "Principles of QSAR models validation" paper[10] and the externally unpredictive models no. 4–6 in Table 1.

In my cited paper, "Principles of QSAR models validation: internal and external",[10] I clarified the different aspects of what are, in my opinion and in the OECD Guidance Document for QSAR model validation,[20] internal and external validations, but additional clarification is needed and is provided here. The question that requires an answer, which is obtainable only by an additional external validation of one specific QSAR model, is: "Is the developed model, whose robustness has been validated by CV, able to also predict completely new chemicals?" These external chemicals must never be included in training sets during the complete process of model development, not even in one single iteration of the k-fold CV procedure; therefore, their structural information must NEVER be taken into account. Psychologically, the best external set of new chemicals for this evaluation (the so-called "blind set") should be one that becomes available to a QSAR modeller after his model development; this set could also be called a "temporal set". However, it is very rare to have a blind data set, due to limited data availability and time constraints (we would always need new experimental data for QSAR model evaluation and would have to wait for a temporal set). Therefore, if the QSAR modeller wishes to verify real model predictivity before proposing his best single model, his only option is to exploit the data actually available, sacrificing part of these data in a preliminary step before model development and putting aside these supposedly "unknown" compounds for later use in the evaluation of the models, which are developed only on the remaining training set used in the learning process. In terms of validation procedure there is no difference between a temporally delayed external data set and an external set obtained by splitting an available data set. The chemicals put aside in this preliminary splitting step constitute the set that should preferably be called the "external prediction set(s)"[10,15–18] or the "external evaluation set",[14] to be clearly distinguished from the iterative test sets of CV.

Therefore, it should be clear that, in this approach, these two validations of single models have completely different aims, and cannot be used as parallel or alternative processes but only as sequential ones. The aim of CV is a preliminary validation of each single model in the GA population, and to help in the selection of the most robust and internally predictive models; the use of external prediction sets instead has the subsequent goal of evaluating each single model (these models being based only on the structural information found in the training set compounds and having passed previous cross-validation) with regard to its predictivity on actually unseen compounds whose chemical structures, as already pointed out, have NEVER influenced the descriptor selection. The single model in the GA population of CV-validated robust models which, simulating a real application of the model, also shows prediction performances on the prediction set compounds (measured by Q²ext or CCC)[15,16] similar to the internal ones (measured by Q²LOO and Q²LMO) is preferred as a verified externally predictive model and is our proposal (for instance models no. 1–3 in Table 1). To avoid misunderstandings on this point it is probably useful, and better, to define this additional check on really external compounds as "external evaluation" or "external verification" of a specific QSAR model, before its proposal.

In my recent works (see Gramatica,[17] as an example), after having checked, at the end, that the molecular descriptors selected in a robust specific model, taking information only from the structures of the training chemicals, are also able to successfully predict completely new chemicals (prediction sets), the same descriptors are used to re-develop a full model on the complete data set to exploit all

Table 1. Comparison of internal and external validation parameters for some algae toxicity models.

                                Splitting by structure (Kohonen maps)         Splitting by ordered response
No. Variables                   R²    Q²LOO  Q²LMO  Q²ext-Fn         CCCext   R²    Q²LOO  Q²LMO  Q²ext-Fn         CCCext
1   T(N..S) AEigZ Seigv*        0.83  0.76   0.72   0.72–0.84        0.87     0.85  0.79   0.77   0.73–0.79        0.86
2   AEigm F08[O-O] Seigv        0.84  0.80   0.76   0.72–0.84        0.84     0.87  0.83   0.81   0.69–0.76        0.82
3   nDB X2sol JGI4              0.80  0.70   0.66   0.80–0.88        0.87     0.83  0.76   0.74   0.70–0.77        0.86
4   Xindex F07[C-Cl] F08[O-O]   0.84  0.76   0.74   (-0.02)–0.40     0.62     0.83  0.75   0.73   0.10–0.32        0.62
5   nDB Xt F08[N-O]             0.84  0.77   0.75   (-0.13)–0.34     0.60     0.80  0.72   0.71   0.02–0.25        0.69
6   Xt nCONN nCXr               0.83  0.79   0.77   (-0.43)–0.16     0.58     0.84  0.81   0.78   (-0.36)–(-0.04)  0.56

*Model published by Gramatica et al.[10] For CCC (Concordance Correlation Coefficient) see the literature.[15,16]
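
As a minimal sketch of the external parameters reported in Table 1, the following Python functions compute Q²ext-F1, Q²ext-F2 and CCC as they are commonly defined in the cited literature.[15,16] The letter itself does not spell out the formulas, so the function names, the choice of the F1 and F2 variants, and the toy numbers are assumptions made only for illustration.

```python
from statistics import fmean


def q2_f1(y_ext, y_pred, y_train_mean):
    """Q2ext-F1: external PRESS scaled by the deviations of the external
    responses from the TRAINING-set mean response."""
    press = sum((y - p) ** 2 for y, p in zip(y_ext, y_pred))
    ss = sum((y - y_train_mean) ** 2 for y in y_ext)
    return 1.0 - press / ss


def q2_f2(y_ext, y_pred):
    """Q2ext-F2: same PRESS, scaled by the deviations of the external
    responses from their own mean."""
    press = sum((y - p) ** 2 for y, p in zip(y_ext, y_pred))
    y_ext_mean = fmean(y_ext)
    ss = sum((y - y_ext_mean) ** 2 for y in y_ext)
    return 1.0 - press / ss


def ccc(y_obs, y_pred):
    """Concordance correlation coefficient between observed and predicted
    values; it penalizes both scatter and systematic shift."""
    n = len(y_obs)
    mo, mp = fmean(y_obs), fmean(y_pred)
    sxy = sum((o - mo) * (p - mp) for o, p in zip(y_obs, y_pred))
    sxx = sum((o - mo) ** 2 for o in y_obs)
    syy = sum((p - mp) ** 2 for p in y_pred)
    return 2.0 * sxy / (sxx + syy + n * (mo - mp) ** 2)


# Hypothetical prediction-set responses and predictions shifted by one unit.
y_obs = [1.0, 2.0, 3.0, 4.0]
y_shift = [2.0, 3.0, 4.0, 5.0]
y_train_mean = 2.5  # assumed mean response of the training set

print("Q2ext-F1:", q2_f1(y_obs, y_shift, y_train_mean))
print("Q2ext-F2:", q2_f2(y_obs, y_shift))
print("CCCext:  ", ccc(y_obs, y_shift))
```

With these toy numbers the predictions are perfectly correlated with the observations but offset by one unit, so CCC (about 0.71) and Q²ext-F1 (about 0.2) both penalize the systematic shift that a plain correlation coefficient would miss.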
tions. For this reason, it is important to apply some rules for splitting the original data into a training set (for the learning step and the subsequent random splits into training sub-sets and test sets for CV) and a prediction set (for external evaluation after model development and CV).[17,20] It is also useful to apply different kinds of splitting methods, as implemented in QSARINS.[18] To avoid the limitation of using only a single external set, in our recent papers[17] we always verify our models on two or three different prediction sets: one obtained from the sorted responses (to verify the model on chemicals in the response domain), another from structural similarity by Kohonen maps (to check the model in the structural domain) and/or even one obtained randomly (the splitting that is, in a sense, most similar to the real-life situation of unknown new chemicals and that, being unbiased for response and structure, cannot be accused of purposeful manipulation).

In conclusion, the two modelling approaches compared here are philosophically different, and neither should be considered as right or wrong. The approach based on double CV is focused only on obtaining the best statistical performance (the lowest prediction error). The information from all the available data is exploited, therefore the best results are expected. This approach produces an ensemble of different models, each verified on test chemicals that could be considered as external only for each corresponding model, but at the end the complete algorithm is not verified on really new chemicals. Therefore, I agree with this statement in the Gütlein et al. paper:[1] "If external validation implies (i) that no instance from any test set is ever used for building the final model (see e.g. the literature[3,6,20]), then no form of cross-validation (in which the complete data set is repeatedly divided into disjoint training and test sets) can be regarded as external validation".

The approach based on additional verification of statistically robust models on real external chemicals aims to propose good QSAR models (even if, probably, not the best possible models), additionally evaluating them for external predictivity before their presentation, in order to guarantee a larger generalizability. We hope to avoid the proposal of models which are predictive only in appearance, such as models no. 4–6 in Table 1. In my opinion, it is important to remember that the ultimate goal of a validation strategy should be to simulate, with sufficient accuracy, the difficulties that one would encounter when applying a methodology in future circumstances (new experimental data), trying to represent the future working situation of the particular model: only an additional external evaluation on totally new chemicals can do this for QSAR models, after their internal validation by CV, but this should be done already at the proposal step.

Acknowledgements

I thank Knut Baumann, and also my collaborators Nicola Chirico and Stefano Cassani, for the interesting and helpful discussions during the preparation and revision of this letter.

References

[1] M. Gütlein, C. Helma, A. Karwath, S. Kramer, Mol. Inf. 2013, 32, 516–528.
[2] H. Kubinyi, A. H. Fred, T. Mietzner, J. Med. Chem. 1998, 41, 2553–2564.
[3] H. Kubinyi, Quant. Struct.-Act. Relat. 2002, 21, 348–356.
[4] A. Golbraikh, A. Tropsha, J. Mol. Graph. Model. 2002, 20, 269–276.
[5] A. Tropsha, P. Gramatica, V. K. Gombar, QSAR Comb. Sci. 2003, 22, 69–77.
[6] K. Baumann, TrAC 2003, 22, 395–406.
[7] D. M. Hawkins, S. C. Basak, D. Mills, J. Chem. Inf. Comput. Sci. 2003, 43, 579–586.
[8] D. M. Hawkins, J. Chem. Inf. Comput. Sci. 2004, 44, 1–12.
[9] K. Baumann, N. Stiefl, J. Comput.-Aid. Mol. Design 2004, 18, 549–562.
[10] P. Gramatica, QSAR Comb. Sci. 2007, 26, 694–701.
[11] A. Tropsha, A. Golbraikh, Curr. Pharm. Des. 2007, 13, 3494–3504.
[12] A. Tropsha, Mol. Inf. 2010, 29, 476–488.
[13] K. H. Esbensen, P. Geladi, J. Chemom. 2010, 24, 168–187.
[14] T. M. Martin, P. Harten, D. M. Young, E. N. Muratov, A. Golbraikh, H. Zhu, A. Tropsha, J. Chem. Inf. Model. 2012, 52, 2570–2578.
[15] N. Chirico, P. Gramatica, J. Chem. Inf. Model. 2011, 51, 2320–2335.
[16] N. Chirico, P. Gramatica, J. Chem. Inf. Model. 2012, 52, 2044–2058.
[17] P. Gramatica, S. Cassani, P. P. Roy, S. Kovarich, C. W. Yap, E. Papa, Mol. Inf. 2012, 31, 817–835.
[18] P. Gramatica, N. Chirico, E. Papa, S. Cassani, S. Kovarich, J. Comput. Chem. 2013, 34, 2121–2132.
[19] OECD Principles 2004; http://www.oecd.org/dataoecd/33/37/37849783.pdf (accessed 02/02/2014).
[20] Guidance Document on the Validation of (Quantitative) Structure-Activity Relationships Models ENV/JM/MONO(2007)2; http://search.oecd.org/officialdocuments/displaydocumentpdf/?doclanguage=en&cote=env/jm/mono%282007%292 (accessed 02/02/2014).
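
The sequence argued for in this letter (put a prediction set aside first, cross-validate on the training set only, then evaluate once on the held-out chemicals) could be sketched roughly as follows. This is an illustration under stated assumptions, not the QSARINS procedure: the data are invented, a single descriptor fitted by ordinary least squares stands in for GA-selected multi-descriptor models, and the helper names and the every-fourth-compound rule for the ordered-response split are hypothetical.

```python
from statistics import fmean


def split_by_ordered_response(y, step=4):
    """Sort compounds by response and send every `step`-th one to the
    prediction set. Returns (train_idx, pred_idx) as index lists."""
    order = sorted(range(len(y)), key=lambda i: y[i])
    pred_idx = order[step // 2::step]
    held_out = set(pred_idx)
    train_idx = [i for i in order if i not in held_out]
    return train_idx, pred_idx


def fit_line(x, y):
    """Ordinary least squares with one descriptor: (slope, intercept)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


def q2_loo(x, y):
    """Leave-one-out Q2, computed on the training set only."""
    my = fmean(y)
    press = 0.0
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    return 1.0 - press / sum((v - my) ** 2 for v in y)


def q2_ext_f1(y_ext, y_pred, y_train_mean):
    """External Q2 (F1 variant), referenced to the training-set mean."""
    press = sum((v - p) ** 2 for v, p in zip(y_ext, y_pred))
    return 1.0 - press / sum((v - y_train_mean) ** 2 for v in y_ext)


# Toy data (hypothetical): one descriptor x and a noisy linear response y.
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0]
noise = [0.1, -0.2, 0.15, -0.1, 0.05, 0.2, -0.15, 0.1, -0.05, 0.0, 0.12, -0.08]
y = [2.0 * a + 1.0 + e for a, e in zip(x, noise)]

# 1) Put the prediction set aside BEFORE any model development.
train_idx, pred_idx = split_by_ordered_response(y, step=4)
x_tr, y_tr = [x[i] for i in train_idx], [y[i] for i in train_idx]
x_ex, y_ex = [x[i] for i in pred_idx], [y[i] for i in pred_idx]

# 2) Internal validation (CV) sees the training set only.
q2_int = q2_loo(x_tr, y_tr)

# 3) External evaluation: fit once on the training set, then predict the
#    held-out compounds, which never influenced the model.
slope, intercept = fit_line(x_tr, y_tr)
y_hat = [slope * a + intercept for a in x_ex]
q2_ext = q2_ext_f1(y_ex, y_hat, fmean(y_tr))

print("Q2LOO (training set):", round(q2_int, 3))
print("Q2ext-F1 (prediction set):", round(q2_ext, 3))
```

In this toy case the internal and external values agree because the data are truly linear; the point of the letter is that such agreement has to be checked, since models 4–6 in Table 1 pass CV and still fail on the prediction sets.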