Please cite this article as: G.M. Randazzo, D. Tonoli, S. Hambye, D. Guillarme, F. Jeanneret, A.
Nurisso, L. Goracci, J. Boccard, S. Rudaz, Prediction of retention time in reversed-phase liquid
chromatography as a tool for steroid identification, Analytica Chimica Acta (2016), doi: 10.1016/
j.aca.2016.02.014.
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to
our customers we are providing this early version of the manuscript. The manuscript will undergo
copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please
note that during the production process errors may be discovered which could affect the content, and all
legal disclaimers that apply to the journal pertain.
ACCEPTED MANUSCRIPT
AUTHORS: Giuseppe Marco RANDAZZO(1), David TONOLI(1,2,3), Stephanie HAMBYE(1),
GORACCI(4), Julien BOCCARD(1), Serge RUDAZ(1,2)
(1) School of Pharmaceutical Sciences, University of Geneva and University of Lausanne, Geneva, Switzerland
(2) Swiss Centre for Applied Human Toxicology (SCAHT), Universities of Basel and Geneva, Basel, Switzerland
(4) Department of Chemistry, Biology and Biotechnology, University of Perugia, Perugia, Italy
CORRESPONDENCE:
E-mail: serge.rudaz@unige.ch
ABSTRACT
The untargeted profiling of steroids constitutes a growing research field because of their
such as ultra high-pressure liquid chromatography coupled with mass spectrometry (MS),
offer the possibility of a fast and sensitive analysis. Nevertheless, difficulties regarding
steroid identification are encountered when considering isotopomeric steroids. Thus, the use of retention times is of great help for the unambiguous identification of steroids. In this context, starting from the linear solvent strength (LSS) theory, quantitative structure retention relationship (QSRR) models based on VolSurf+ descriptors combined with a new dedicated molecular fingerprint were developed
to predict retention times of steroid structures under any gradient conditions. Satisfactory performance was obtained during nested cross-validation, with a predictive ability (Q2) of 0.92. The generalisation ability of the model was further confirmed by an average error of 4.4% in external prediction. This allowed the list of candidates associated with identical
1. INTRODUCTION
Steroids are a family of molecules which regulate several vital functions, such as body
growth, response to stress, sexual development and behaviour, and rate of metabolism [1].
The endocrine system is responsible for numerous regulatory functions, thus, a steroid
diseases. Several genetic disorders, such as adrenoleukodystrophy, Wolman disease and
have been identified as a cause of endocrine perturbation [2]. Other diseases (e.g.,
hypertension, cancer, etc.) are associated with an altered regulation of hormone production
[3]. Environmental toxicants also constitute a potent source of endocrine disruption: a large number of drugs and synthetic chemicals have been identified as endocrine-disrupting chemicals (EDCs) [4] by the Food and Drug Administration (FDA). Large-scale monitoring of steroidogenesis is therefore essential for the screening of EDCs, as recommended in the Organization for Economic Co-operation and Development (OECD) guidelines.
Cholesterol can be considered the initiating molecule of steroidogenesis [5]. From this compound, the different endocrine glands are able to synthesise a vast range of relatively
a wide structural diversity on the gonane skeleton (Figure 1), dictating the need for selective
chromatography (LC-MS) today represents the standard detection method for steroids [6].
The main issue of the GC-MS analytical workflow is related to the derivatisation step required
to enhance the volatility of the analyte, which limits throughput [7]. Today, untargeted
(HRMS) provide an attractive analytical alternative for steroid analysis [8, 9]. The resolution offered by HRMS and the peak capacity offered by modern LC allow the simultaneous analysis of thousands of peaks in complex biological matrices, such as urine, blood, plasma, and cell cultures [8, 10-12]. The first step in the identification of features is usually
accomplished through mass-based database searches [13]. However, this procedure has
severe limitations in the presence of isotopomers [14] having the same exact mass and
therefore the same molecular formula [15, 16]. A recent work showed that diastereoisomers
chromatographic retention time constitutes an additional molecular property that can be
Modelling retention in LC is an active field of research for both theoretical and practical needs. Several theoretical models have already been described and implemented using
reported by Héberger [19]. For reversed-phase (RP) retention, the seminal work of Dolan and Snyder [20] demonstrated that changes in retention behaviour with mobile phase composition can be modelled by a linear relationship between the logarithm of the retention factor (k) and the percentage of organic modifier in the mobile phase. Consequently, the linear solvent strength (LSS) model was proposed both to characterise the principles of gradient separation and to establish a mathematical relationship that can be used in both isocratic and gradient conditions. However, the need for formal peak tracking of each analyte constitutes the most important constraint to speeding up and automating the
Quantitative structure retention relationship (QSRR) models were developed for the
in the review by Héberger and the works of Kaliszan [21-23]. For that purpose, in silico
descriptors allow an extensive characterisation of the molecular structures. They include (i)
descriptors. The main limitation of most existing QSRR models reported in the literature is
Nord et al. published a study based on three-dimensional descriptors, underlining the role of
lipophilicity) [24]. However, their dataset included only a few diastereoisomers with identical
calculated log P values, whereas their experimental retention times were different.
The first part of this work demonstrates the importance of retention time prediction in various gradient conditions for the identification of steroid candidates. The LSS model, allowing the
identification of steroids, could be considered an efficient strategy. In a second step, QSRR models based on VolSurf+ 3D molecular descriptors combined with a new gonane topological weighted fingerprint (GTWF) were developed. A supervised model based on partial least squares (PLS) regression was built to predict the LSS parameters (Log kw and S) and to calculate the retention time of each steroid. Thanks to the description of the chiral centres of the steroid homologues, model performance was increased, providing satisfactory
Acetonitrile (ACN) and water were purchased from Romil Ltd. (Waterbeach, UK). Methanol (MeOH) was provided by Biosolve (Valkenswaard, Netherlands). Formic acid (FA) was obtained from Sigma-Aldrich (Buchs, Switzerland). The reference steroids (n = 91) were purchased from different suppliers (Steraloids, Sigma, LGC Standards, Sterling). Stock solutions of each steroid standard (1 mg/mL) were prepared in methanol and stored at -80 °C before use. Working solutions (10 µg/mL) were prepared by diluting the stock solutions in H2O/ACN (95:5, v/v) + 0.1% FA.
UHPLC separations were performed with a UPLC Acquity H-Class system (Waters, Milford, MA, USA) including a quaternary solvent manager (QSM), a sample manager (SM-FTN), and a column manager (CM-A). The separation was performed on a Kinetex C18 column (2.1 × 150 mm, 1.7 µm) (Phenomenex, Torrance, CA, USA) with a SecurityGuard ULTRA C18 (2.1 × 2 mm) (Phenomenex, Torrance, CA, USA). Samples were kept at 6 °C in the autosampler.
The post-column flow was directed through a 6-port valve to a waste vessel between 0.06 and 0.43 min. During this period, a calibration solution used for automatic post-acquisition recalibration was infused into the ESI source at 100 µL/min using a Sun Flow 100 HPLC pump (SunChrom). The calibration solution consisted of a 1 M sodium hydroxide solution diluted 100-fold with H2O + 0.1% FA/isopropanol + 0.1% FA (50:50, v/v). Automatic calibration was performed with nine formate adducts spanning an m/z range of 158 to 702 using the high-precision calibration (HPC) algorithm and calibration, version 1.0
The maXis 3G QTOF MS (Bruker) used an electrospray ionisation (ESI) source operating in positive mode. The end plate offset was set at 500 V, the capillary voltage at 4.7 kV and the nebuliser pressure at 1.8 bar. The dry gas flow rate and temperature were 5.5 L/min and 225 °C, respectively. The transfer time and pre-pulse storage were set at 40 and 7.0 µs, respectively. The accumulation time was set at 0.5 s, and the monitored mass range was m/z 50 to 1,000. Acquisition was performed in profile mode, and data were acquired using the Compass v1.5 SR3 software suite from Bruker and HyStar v3.2 SR2. The UHPLC was controlled using the plug-in for a Waters Acquity UPLC v.1.5.
Mobile phase A was H2O + 0.1% FA, and mobile phase B was ACN + 0.1% FA. The flow rate was set at 300 µL/min. Linear gradients from 5% to 95% B were performed in 14, 45, and 60 minutes, and 10 µL of each working solution was injected. The column was kept at 30 °C during the analyses. Data processing was performed using Data Analysis 4.2 (64-bit,
The LSS model parameters were calculated with in-house software developed in Python using equations from the LSS theory described by Snyder and Dolan [25] and
were executed with Python (version 2.7.9). The software takes as input the two retention
times measured from two different linear gradients and uses the Nelder-Mead optimisation
method [26] to minimise the retention time recalculation errors as an objective function. The
isocratic separation optimisation was performed with the same package taking into account
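The two-gradient fitting step described above can be sketched as follows. This is a minimal, standard-library Python 3 sketch, not the PyLSS implementation: it assumes the textbook Snyder and Dolan gradient equation tR = (t0/b)·log10(2.3·k0·b + 1) + t0 + tD, with b = S·Δφ·t0/tG and k0 = kw·10^(-S·φ0), hypothetical values for the column dead time (t0 = 1.0 min) and dwell time (tD = 1.0 min), and a hand-written Nelder-Mead loop. The function names lss_gradient_tr, nelder_mead and fit_lss are illustrative only.

```python
import math


def lss_gradient_tr(log_kw, s, t_g, t0=1.0, t_d=1.0, phi0=0.05, phi_f=0.95):
    """Retention time (min) in a linear gradient, Snyder-Dolan LSS form.

    phi0/phi_f: initial/final organic fraction (5-95% B as in the text).
    t0 (dead time) and t_d (dwell time) defaults are hypothetical.
    Assumes the solute elutes during the gradient.
    """
    b = s * (phi_f - phi0) * t0 / t_g       # gradient steepness parameter
    k0 = 10 ** (log_kw - s * phi0)          # retention factor at gradient start
    return (t0 / b) * math.log10(2.3 * k0 * b + 1) + t0 + t_d


def nelder_mead(f, x0, step=0.5, tol=1e-14, max_iter=5000):
    """Minimal Nelder-Mead simplex minimiser (reflect/expand/contract/shrink)."""
    n = len(x0)
    simplex = [list(x0)] + [[x + (step if j == i else 0.0) for j, x in enumerate(x0)]
                            for i in range(n)]
    values = [f(x) for x in simplex]
    for _ in range(max_iter):
        order = sorted(range(n + 1), key=values.__getitem__)
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        if values[-1] - values[0] < tol:
            break
        centroid = [sum(x[j] for x in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]

        def along(coef):
            # point on the line from the worst vertex through the centroid
            return [c + coef * (c - w) for c, w in zip(centroid, worst)]

        xr = along(1.0)
        fr = f(xr)
        if fr < values[0]:                       # reflection is best: try expanding
            xe = along(2.0)
            fe = f(xe)
            simplex[-1], values[-1] = (xe, fe) if fe < fr else (xr, fr)
        elif fr < values[-2]:                    # accept the reflection
            simplex[-1], values[-1] = xr, fr
        else:                                    # inside contraction
            xc = along(-0.5)
            fc = f(xc)
            if fc < values[-1]:
                simplex[-1], values[-1] = xc, fc
            else:                                # shrink towards the best vertex
                best = simplex[0]
                simplex = [best] + [[bj + 0.5 * (xj - bj) for bj, xj in zip(best, x)]
                                    for x in simplex[1:]]
                values = [values[0]] + [f(x) for x in simplex[1:]]
    i_best = min(range(n + 1), key=values.__getitem__)
    return simplex[i_best], values[i_best]


def fit_lss(tr_obs, t_gs, start=(2.5, 8.0)):
    """Estimate (log kw, S) from retention times measured in two gradients."""
    def objective(params):
        log_kw, s = params
        if not (0.0 < s < 100.0 and -5.0 < log_kw < 20.0):
            return 1e6                           # keep the search in a plausible region
        # sum of squared relative retention time recalculation errors
        return sum(((lss_gradient_tr(log_kw, s, tg) - tr) / tr) ** 2
                   for tr, tg in zip(tr_obs, t_gs))
    params, _ = nelder_mead(objective, start)
    return params


# Usage: simulate a steroid with hypothetical log kw = 3.0 and S = 10.0,
# "measure" it in the 14 and 60 min gradients, refit, then predict 45 min.
tr_14 = lss_gradient_tr(3.0, 10.0, 14.0)
tr_60 = lss_gradient_tr(3.0, 10.0, 60.0)
log_kw_fit, s_fit = fit_lss([tr_14, tr_60], [14.0, 60.0])
print(f"log kw = {log_kw_fit:.3f}, S = {s_fit:.2f}, "
      f"tR(45 min) = {lss_gradient_tr(log_kw_fit, s_fit, 45.0):.2f} min")
```

With two gradients and two unknowns the objective can reach zero, so the fit is essentially an exact inversion; the relative-error objective keeps early- and late-eluting steroids on a comparable scale.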
2.4 QSRR models: dataset, molecular descriptors and regression analysis
2.4.1 Dataset
The dataset totalled 91 steroid structures: 48 androgens, 27 corticosteroids (mineralo- and glucocorticoids), 14 progestogens and 2 oestrogens. To validate the proposed approach and evaluate retention time estimation through LSS parameter prediction, the dataset was rationally divided into a training set (80% of the overall set) and an external test set (20% of the overall set). The division was achieved with the Most Descriptive Compounds (MDC) algorithm described by Hudson et al. [27] applied to PCA scores accounting for 95% of the variance associated with
Molecular descriptors were calculated using canonical simplified molecular-input line entry
system strings (SMILES) with VolSurf+ version 1.7.0.l [28]. This software uses GRID force
hydrogen bonding donor/acceptor surface volumes around the molecules, using a water
probe (O) and a hydrogen-bonding donor amide nitrogen probe (N1). Energy maps, called
Molecular Interaction Fields (MIFs), were then converted into molecular descriptors. VolSurf+
generates 128 molecular descriptors related to molecular shape, volume, polarisability, polar
surface area, hydrophobic surface area, lipophilicity, molecular diffusion, and solubility. It
also includes specific descriptors, which refer to the volume and relative position of MIFs at
All the analysed steroids were in their neutral form, since the pKa values of their ionisable groups lie far from the estimated mobile phase pH (i.e., 2.5) [30]; all pH-dependent descriptors were therefore removed. The percentage of un-ionised species and all ADME model predictions, except intrinsic solubility, were also removed; the impact of intrinsic solubility in RP chromatography was underlined by Nord et al. [24]. The final models for Log kw and S
were built accounting for a total of 97 VolSurf+ molecular descriptors (see Table S4 for the
2.4.3 Gonane topological weighted fingerprint
A GTWF was calculated in two steps. First, the molecule was parsed from its SMILES string to match the gonane skeleton and to detect the fragment bonded to each position. This procedure was implemented in in-house C++ software based on the subgraph isomorphism algorithm described by Ullmann [31]. In the second step, a fingerprint composed of 68 columns (17 positions on the gonane skeleton × 4 cases) was generated for each molecular structure. For a given position, the first column was used to describe
substituent and stereocentre weights were then initialised with random values and further optimised using a simplex method. The simplex procedure was stopped when the error of an optimised PLS model predicting the LSS parameters was minimised. The optimised fingerprint weights were validated using nested cross-validation. All PLS prediction models were built using VolSurf+, which implements the NIPALS algorithm. The descriptors were standardised to unit variance. Model prediction accuracy was assessed through 5-fold cross-validation repeated 20 times to reduce the variance during the validation process and
During the first phase of the study, an LSS empirical model was obtained for each steroid
standard. The steroid dataset was composed of 91 endogenous and exogenous steroids
covering the main steroid structures and classes (i.e., progestogens, oestrogens, androgens
separations and predict retention times under any gradient conditions through two variables: Log kw and S. While Log kw represents the extrapolated value of the retention factor k in pure water, S is a constant molecular parameter for a given compound under fixed experimental conditions (i.e., the stationary phase chemistry). S corresponds to minus the slope of the linear fit of the logarithm of the retention factor (k) vs. the percentage of organic solvent. Values of Log kw and S for the 91 steroids (analysed in groups of 6) were obtained from two different gradients with a runtime ratio ≥ 3 to provide accurate values, according to the work of Snyder and Dolan [20]. Hence, two gradients of 14 and 60 min were selected. After data acquisition, the models were fitted using an iterative procedure, which takes the two retention times and extrapolates Log kw and S from them (for more details see the package at https://github.com/gmrandazzo/PyLSS). To validate the parameter
estimation, the retention times of the 91 steroids were calculated over a linear gradient of 45
accuracies (R2 of 0.9997) were obtained with a maximum bias of approximately 0.5% for the
calculated values. With this performance, even minor changes in chromatographic selectivity
can be anticipated.
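For reference, the LSS relationships underlying Log kw and S can be written in their standard Snyder and Dolan form, where φ is the organic volume fraction, t0 the column dead time, tD the dwell time, tG the gradient time, Δφ the gradient span and φ0 the initial organic fraction. The gradient expression assumes that the solute elutes during the gradient; the exact dwell-time treatment used in PyLSS is not described here, so this is a sketch of the theory rather than of the package.

```latex
\begin{aligned}
\log k &= \log k_w - S\,\varphi, &\qquad t_R^{\mathrm{iso}} &= t_0\,(1 + k),\\
b &= \frac{S\,\Delta\varphi\,t_0}{t_G}, &\qquad k_0 &= k_w\,10^{-S\varphi_0},\\
t_R^{\mathrm{grad}} &= \frac{t_0}{b}\,\log_{10}\!\left(2.3\,k_0\,b + 1\right) + t_0 + t_D.
\end{aligned}
```

Because b scales with 1/tG, two gradients of sufficiently different duration give two independent equations in the two unknowns Log kw and S.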
To clearly illustrate the problem of steroid identification (ID) and the importance of retention
time evaluation in this context, a case study encountered during the analysis of cellular
refer to Tonoli et al. [33]. According to the data analysis, the steroid of interest eluted at tR = 7.85 min and presented an accurate mass of m/z 305.2111 (Figure 3), which corresponds to the molecular formula C19H28O3 (tolerance set at ±5 ppm). Unfortunately, a query of Lipid Maps revealed that up to 20 steroid isotopomer structures could be associated with this single exact mass (Figure 4).
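The mass-based step of this case study reduces to a short calculation. The sketch below assumes the feature is the protonated ion [M+H]+ (positive ESI, as in the method section) and uses standard monoisotopic masses; it only reproduces the formula match, not the Lipid Maps query. The names mh_plus and ppm_error are illustrative.

```python
# Monoisotopic masses (u) of 12C, 1H and 16O, and the proton mass.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "O": 15.99491461956}
PROTON = 1.007276466812


def mh_plus(formula):
    """m/z of the [M+H]+ ion for a formula given as {element: count}."""
    return sum(MONOISOTOPIC[el] * n for el, n in formula.items()) + PROTON


def ppm_error(observed, theoretical):
    """Mass accuracy in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


theoretical = mh_plus({"C": 19, "H": 28, "O": 3})
error = ppm_error(305.2111, theoretical)
print(f"theoretical [M+H]+ = {theoretical:.4f}, error = {error:+.2f} ppm, "
      f"within 5 ppm: {abs(error) <= 5}")
# prints: theoretical [M+H]+ = 305.2111, error = -0.07 ppm, within 5 ppm: True
```

Every C19H28O3 isomer passes this filter equally, which is exactly why the retention time becomes the discriminating property.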
Taking into account the analytical conditions used, with a tolerance of ±0.3 min (see [32]), and thanks to the LSS-based retention time prediction for this panel of candidates, the number of putative compounds was decreased from 20 to only 4 candidates,
16-Hydroxytestosterone. While the first advantage of integrating chromatographic information for ID is to drastically decrease the number of suggestions, the second advantage is the possibility of rapidly obtaining optimal chromatographic conditions in terms of peak spacing and resolution for confirmatory purposes. Hence, isocratic conditions (an organic solvent concentration of 25%) were selected to achieve the best separation of these 4 steroids. As predicted, the injection of a standard mixture of the 4 candidates on the selected stationary phase demonstrated the difficulty of completely separating two co-eluting steroids, namely 16-Hydroxy-DHEA and 11-
The information gathered during the simultaneous MS/MS experiments was therefore used to differentiate these steroids. In fact, the in-source fragmentation profiles of these compounds were different; 16-Hydroxy-DHEA led to the observation of the two most prominent in-source
mainly the molecular ion [M+H]+. After re-injection of the biological sample under isocratic
compound of interest in the cellular culture. This example illustrates the systematic approach needed to obtain a definitive steroid identification, even in the presence of steroid standards.
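The two uses of the LSS parameters in this case study, narrowing candidates with a ±0.3 min window and then choosing an isocratic composition for the survivors, can be sketched as below. The selection criterion (maximising the smallest ratio of adjacent retention factors while keeping every k between 1 and 20) is an assumption, a crude stand-in for a resolution-based optimisation, and all candidate names and parameter values are hypothetical, not the paper's data.

```python
def isocratic_k(log_kw, s, phi):
    """Retention factor at a constant organic fraction phi: log k = log kw - S*phi."""
    return 10 ** (log_kw - s * phi)


def filter_by_rt(predicted_rt, observed_rt, tol=0.3):
    """Keep candidates whose predicted retention time is within +/- tol min."""
    return sorted(name for name, rt in predicted_rt.items()
                  if abs(rt - observed_rt) <= tol)


def best_isocratic_phi(params, phis, k_min=1.0, k_max=20.0):
    """Among phi values keeping every k in [k_min, k_max], return the one that
    maximises the smallest selectivity (k ratio) between adjacent peaks."""
    def sorted_k(phi):
        return sorted(isocratic_k(lk, s, phi) for lk, s in params.values())

    feasible = [phi for phi in phis
                if k_min <= sorted_k(phi)[0] and sorted_k(phi)[-1] <= k_max]
    if not feasible:
        raise ValueError("no isocratic composition keeps k within the window")

    def min_alpha(phi):
        ks = sorted_k(phi)
        return min(k2 / k1 for k1, k2 in zip(ks, ks[1:]))

    return max(feasible, key=min_alpha)


# Hypothetical values for illustration only (not the paper's data).
predicted_rt = {"cand_1": 7.70, "cand_2": 7.95, "cand_3": 8.40, "cand_4": 7.60}
survivors = filter_by_rt(predicted_rt, observed_rt=7.85)
lss_params = {"cand_1": (3.0, 9.0), "cand_2": (3.4, 10.5), "cand_4": (2.6, 7.5)}
phi_best = best_isocratic_phi({n: lss_params[n] for n in survivors},
                              [0.15, 0.20, 0.25, 0.30, 0.35])
print(survivors, phi_best)
```

Since log k is linear in φ for each compound, candidates with different S values change elution spacing as φ varies; this is what makes isocratic scouting from predicted LSS parameters possible without extra injections.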
When standards are not available, retention time prediction using a QSRR model could be a potent way to reduce the number of candidates. Since QSRR models that directly predict retention time depend on the instrument and chromatographic conditions, the LSS parameters, namely Log kw and S, were chosen as key parameters to be
The molecular descriptors are of the utmost importance for generating robust QSRR models. The role of QSRR models is to extract relevant information about molecular shape,
descriptors (such as intrinsic solubility, Log Pn-oct, and molecular diffusion); VolSurf+ 3D molecular descriptors are therefore a natural choice in RP-LC to describe lipophilic interactions [34]. PLS was selected for its ability to efficiently handle large numbers of correlated descriptors. Additionally, PLS provides several outputs that give better insight into the intrinsic phenomena governing the retention process. The optimal number of latent variables for each PLS model was selected using 5-fold cross-validation iterated 100 times. The prediction ability was evaluated using a randomly selected 20% out-of-bag data subset. Nested cross-validation was carried out to avoid over-fitting. The results obtained for Log kw and S are reported in Table 1 in terms of prediction accuracy (Q2)
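The validation scheme described above can be illustrated with a self-contained sketch. The code below is a pure-Python stand-in, not the VolSurf+ implementation: it fits single-response PLS with NIPALS on unit-variance-scaled descriptors and reports Q2 = 1 - PRESS/SS averaged over repeated random 5-fold partitions. It does not reproduce the nested loop used to select the number of latent variables, and the synthetic data and function names (pls1_fit, pls1_predict, repeated_kfold_q2) are hypothetical.

```python
import random


def _autoscale_stats(X):
    """Column means and standard deviations (unit-variance scaling)."""
    stats = []
    for col in zip(*X):
        m = sum(col) / len(col)
        sd = (sum((v - m) ** 2 for v in col) / (len(col) - 1)) ** 0.5
        stats.append((m, sd or 1.0))          # guard against constant columns
    return stats


def pls1_fit(X, y, n_lv):
    """Single-response PLS fitted with the NIPALS algorithm."""
    stats = _autoscale_stats(X)
    Xr = [[(v - m) / sd for v, (m, sd) in zip(row, stats)] for row in X]
    y_mean = sum(y) / len(y)
    yr = [v - y_mean for v in y]
    n_var = len(stats)
    comps = []
    for _ in range(n_lv):
        w = [sum(row[j] * yi for row, yi in zip(Xr, yr)) for j in range(n_var)]
        norm = sum(v * v for v in w) ** 0.5
        w = [v / norm for v in w]                                 # weights
        t = [sum(a * b for a, b in zip(row, w)) for row in Xr]    # scores
        tt = sum(v * v for v in t)
        p = [sum(row[j] * ti for row, ti in zip(Xr, t)) / tt for j in range(n_var)]
        q = sum(yi * ti for yi, ti in zip(yr, t)) / tt
        Xr = [[a - ti * pj for a, pj in zip(row, p)] for row, ti in zip(Xr, t)]
        yr = [yi - q * ti for yi, ti in zip(yr, t)]               # deflation
        comps.append((w, p, q))
    return stats, y_mean, comps


def pls1_predict(model, x):
    """Predict one sample by replaying the scaling and deflation steps."""
    stats, y_mean, comps = model
    xr = [(v - m) / sd for v, (m, sd) in zip(x, stats)]
    y_hat = y_mean
    for w, p, q in comps:
        t = sum(a * b for a, b in zip(xr, w))
        xr = [a - t * pj for a, pj in zip(xr, p)]
        y_hat += q * t
    return y_hat


def repeated_kfold_q2(X, y, n_lv, k=5, repeats=20, seed=0):
    """Mean Q2 = 1 - PRESS/SS over `repeats` random k-fold partitions."""
    rng = random.Random(seed)
    y_bar = sum(y) / len(y)
    ss = sum((v - y_bar) ** 2 for v in y)
    q2_values = []
    for _ in range(repeats):
        idx = list(range(len(y)))
        rng.shuffle(idx)
        press = 0.0
        for fold in range(k):
            test = set(idx[fold::k])
            train = [i for i in idx if i not in test]
            model = pls1_fit([X[i] for i in train], [y[i] for i in train], n_lv)
            press += sum((y[i] - pls1_predict(model, X[i])) ** 2 for i in test)
        q2_values.append(1.0 - press / ss)
    return sum(q2_values) / len(q2_values)


# Usage on synthetic data (hypothetical, not the steroid descriptors).
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(40)]
y = [2 * r[0] - r[1] + 0.5 * r[2] + rng.gauss(0, 0.1) for r in X]
print(f"Q2 = {repeated_kfold_q2(X, y, n_lv=3):.3f}")
```

Unlike the fitted R2, each PRESS term is computed only on compounds left out of the corresponding model, so Q2 estimates predictive ability rather than goodness of fit.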
As presented in Table 1, the conventional VolSurf+ set of descriptors was quite efficient at providing effective predictions of steroid retention times through the LSS parameters. This was particularly the case for the S parameter, considered the most important input for obtaining accurate retention times. It is worth noting that VolSurf+ descriptors are versatile and
consequence, the resulting models are often self-explanatory and easily interpretable. When
molecular features for reliable prediction. However, only small differences are observed in
MIFs when comparing diastereoisomers. For example, a global difference below 15% is
retention shift of 0.8 minutes is measured on a gradient of 15 minutes. This limitation may arise because the GRID force field assigns the same atomic charges to both compounds. To improve the prediction ability of the model, additional topological information was tentatively
3.4 The gonane topological weighted fingerprint
structure used to evaluate similarity between molecules [35]. The most common fingerprints are based on the presence/absence of substructure patterns in a molecule. The choice of these patterns plays a critical role in the definition of the fingerprint and in setting the similarity/dissimilarity criteria between molecules. When compounds are highly similar, as is the case for steroids, the only way to discriminate these structures is to define a fingerprint capable of detecting differences between chiral centres and substituents. Thus, a new type of fingerprint dedicated to steroid structures, named the Gonane Topological
was chosen for its interpretability [36]. The main role of GTWF is to detect
weights: a geometrical weight w, which depicts the chirality, and a substituent weight c. The
=
(eq.1)
The parameter w represents the stereo-information weight and depends on the R, S, and
However, as presented in Table 1, GTWF descriptors alone were not sufficient to provide satisfactory retention time prediction. Therefore, a collection of 165 descriptors (97 from VolSurf+ combined with 68 from GTWF) was retained to build QSRR models for Log kw and S. The results obtained for Log kw and S when including the GTWF fingerprint are reported in Table 1. The GTWF fragments with their corresponding w and c parameters are summarised in Tables S1 and S2, respectively. Other algorithms, namely Random Forest, Neural Networks, Support Vector Regression and Forward Stepwise Regression, were also evaluated but did not improve retention time prediction (see Table S3, Supporting Information).
3.5 Log kw model discussion
The PLS model related to Log kw was characterised by a prediction ability Q2 of 0.72 with 3 latent variables. In this case, only a slight improvement was obtained by adding the GTWF. Inspection of the beta coefficients (see Figure 6a) highlighted the most important descriptors, namely the VolSurf+ molecular volume V, HSA, Log Pn-oct and the 3D triplet pharmacophoric area descriptors ACACDO/ACDODO, which describe hydrogen bonding triplets. All these parameters were positively correlated with Log kw. These results are consistent with the physical meaning of the retention factor k, which is related to the distribution coefficient between the stationary and the mobile phase. The retention factor of a compound in pure water, kw, can be considered a descriptor of its maximum interaction with the stationary phase. In RP-LC, lipophilicity is indeed known to be highly correlated with Log kw, as reported in the work of Henchoz et al. [37].
The variable S is a molecular parameter related to both the characteristics of the solute and the selectivity of the column, both of which are driven by intermolecular interactions, such as hydrophobicity and hydrogen bonding. The PLS regression model associated with the S parameter was characterised by a high predictive ability, with a Q2 value of 0.91 with 3 latent variables. Interestingly, a marked improvement in prediction accuracy was obtained with the addition of GTWF. Figure 6b reports the PLS beta coefficients for three latent variables.
Apart from the negative contribution of Log Pc-hex, the most important descriptors were the hydrophilic volumes WO5 and WO6 and the 3D pharmacophoric descriptor based on triplets of one hydrogen bonding acceptor and two hydrogen bonding donors (ACDODO) from the VolSurf+ set. Topological descriptors confirmed the
chromatographic selectivity. Prominent hydrophobic interactions were associated with
configurations R and S, whereas marked hydrogen bonding acceptor/donor interactions were associated with positions 15, 16 and 17 substituted with a variety of hydrophilic groups. It is also interesting to note that a few molecules revealed other important positions (e.g., positions 4, 5 or 7). This may be due to the relative abundance of specific structures in the dataset.
3.7 Model performance in external prediction
To validate the model, a test set was rationally extracted from the entire dataset according to the procedure described in the Materials and Methods section. A subset of 72 molecules (the training set) was kept to train the GTWF descriptors and to build the PLS models. The modelled LSS parameters were used to predict the retention times of the remaining 19 steroids (the test set) for three gradient times (i.e., 14, 45 and 60 minutes) to demonstrate the
ability Q2internal of 0.90 and R2external of 0.93 (using the 19 molecules of the test set) were obtained for the predicted versus experimental retention times (Figure 7). Moreover, a comparison between the retention times estimated from the experimental LSS parameters and from the QSRR-derived parameters is reported in Figure S1 in the Supplementary Information.
The structures and predicted retention times at the various gradients for the external test set are reported in Table S5. From these results, the global relative error in prediction was lower than 5% (i.e., 4.4%). Interestingly, nine molecules were predicted with a relative retention error under 3%. Poorly predicted values (i.e., errors between 5% and 10%) were related to diastereoisomers, such as compounds 11 and 12 (see Table S4). Other compounds were also poorly predicted, such as compound 19, which is the only molecule possessing a 2-hydroxyethyl fragment at position 17. Even though the global prediction error of the QSRR models is approximately 4%, the relative error in the prediction of retention time remains too high to unambiguously assign a specific structure to each peak. Nevertheless, these results appear very promising for reducing the number of steroid candidates.
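The error figures quoted above (global relative error, molecules under 3%, poorly predicted molecules between 5% and 10%) are simple aggregates of per-compound relative errors. The sketch below assumes the relative error is |tR,pred - tR,exp| / tR,exp and that it is averaged over compounds; the text does not state whether errors were also pooled over the three gradients. The retention times and function names are hypothetical.

```python
def relative_errors(observed, predicted):
    """Absolute relative errors (%) of predicted vs. observed retention times."""
    return [100.0 * abs(p - o) / o for o, p in zip(observed, predicted)]


def summarise(observed, predicted):
    """Mean relative error and counts in the error bands used in the text."""
    errs = relative_errors(observed, predicted)
    return {
        "mean_%": sum(errs) / len(errs),
        "under_3%": sum(e < 3 for e in errs),
        "5_to_10%": sum(5 <= e <= 10 for e in errs),
    }


# Hypothetical retention times (min), for illustration only.
observed = [6.10, 8.42, 9.75, 12.30]
predicted = [6.20, 8.40, 10.40, 12.00]
result = summarise(observed, predicted)
print(result)
```

A relative rather than absolute error is used so that early- and late-eluting steroids contribute comparably across gradients of different length.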
4. CONCLUSION
Steroidomics, i.e., the untargeted profiling of steroids, constitutes an important research field
due to the numerous applications related to steroidogenesis and its alterations. Untargeted
detect a wide range of putative steroids, selected based on their exact mass matched to databases. However, as is often the case in omics strategies, the definitive assignment of identity remains the main bottleneck. As presented here, empirical models allowing the prediction of retention times, combined with experimental MS information (i.e., exact masses and MS/MS spectra), are essential for the identification of steroids. Chromatographic information should act as a filter to reduce the initial data dimensionality (post-targeted approach) and is mandatory for confirmatory biomarker identification. In the absence of empirical models, the prediction of retention time from the molecular structure has great potential to facilitate and accelerate the identification process. While improvements are still necessary, QSRR modelling was shown to be a potent tool to predict the retention time of steroids under
LSS parameters (S and Log kw), with the great advantage of being applicable under any gradient condition. A further advantage of predicting the LSS parameters S and Log kw is that they provide valuable information for optimising the isocratic separation of co-eluting analytes, including isotopomers. Even if some issues are still observed for specific
dataset will efficiently improve the accuracy of prediction. The next step will consist of increasing the number of steroids in the database to refine the QSRR models. Thanks to the information obtained from the GTWF and PLS beta coefficients, the selection of new steroids
ACKNOWLEDGMENTS
The authors wish to thank Dr. Szabolcs Fekete for valuable scientific exchanges regarding
the LSS model. The authors also acknowledge Sterling (Perugia) for providing several
exogenous substrates. Dr. Alessandra Nurisso also gratefully acknowledges the Excellence
fellowship programme of the University of Geneva, Switzerland for its financial support.
FIGURE CAPTIONS
Figure 1. Classification of steroids based on their precursors. From each scaffold, different
Figure 2. Plot of experimental versus predicted retention times through the empirical LSS
Figure 4. Isotopomer structures retrieved from LipidMaps with the molecular formula C19H28O3.
Figure 5. Chromatogram obtained under optimal isocratic conditions at 25% ACN applied to
TABLE CAPTIONS
Table 1. QSRR model results for Log kw and S. LV represents the optimised number of latent
variables. R2 represents the goodness of fit of the model. Q2 is the prediction ability estimated by cross-validation.
REFERENCES
[1] K. Monostory, Z. Dvorak, Steroid regulation of drug-metabolizing cytochromes P450, Curr
Drug Metab, 12 (2011) 154-172.
[2] D. Lin, T. Sugawara, J.F. Strauss 3rd, B.J. Clark, D.M. Stocco, P. Saenger, A. Rogol, W.L. Miller, Role of steroidogenic acute regulatory protein in adrenal and gonadal steroidogenesis, Science, 267 (1995) 1828-1831.
[3] M.G. Mohaupt, The role of adrenal steroidogenesis in arterial hypertension, Endocrine
development, 13 (2008) 133-144.
[4] FDA, Endocrine disruption potential of drugs: nonclinical evaluation, Draft Guidance, Federal Register, 78 (183) 57859 (September 2013).
[5] J.I. Mason, W.E. Rainey, Steroidogenesis in the human fetal adrenal: a role for
cholesterol synthesized de novo, The Journal of clinical endocrinology and metabolism, 64
(1987) 140-147.
[6] H.J. Makin, J. Honour, C.L. Shackleton, W. Griffiths, General Methods for the Extraction,
[11] S. Fekete, J. Schappler, J.L. Veuthey, D. Guillarme, Current and future trends in
UHPLC, Trac-Trends in Analytical Chemistry, 63 (2014) 2-13.
[16] B. L. D., D. J Sanaullah Borts., HPLC/MS/MS measurement of steroid conjugates: An
analytical problem of olympic proportions, Book of Abstracts, 213th ACS National Meeting,
(1997) ANYL-148.
[17] S. Romand, S. Rudaz, D. Guillarme, Separation of substrates and closely related
glucuronide metabolites using various chromatographic modes, J Chromatogr A, (2016).
[18] Z.J. Zhu, A.W. Schultz, J. Wang, C.H. Johnson, S.M. Yannone, G.J. Patti, G. Siuzdak,
Liquid chromatography quadrupole time-of-flight mass spectrometry characterization of
metabolites guided by the METLIN database, Nature protocols, 8 (2013) 451-460.
[19] K. Héberger, Quantitative structure-(chromatographic) retention relationships, Journal of Chromatography A, 1158 (2007) 273-305.
[20] L.R. Snyder, J.J. Kirkland, J.W. Dolan, Introduction to Modern Liquid Chromatography, 3rd ed., 2010.
[21] K. Héberger, Quantitative structure-(chromatographic) retention relationships, J Chromatogr A, 1158 (2007) 273-305.
[22] R. Kaliszan, Structure and Retention in Chromatography: A Chemometric Approach: 1
(Chromatography: Principles & Practice).
[23] R. Kaliszan, Quantitative structure-retention relationships, Analytical Chemistry, 64
(1992) 619A-631A.
[24] L.I. Nord, D. Fransson, S.P. Jacobsson, Prediction of liquid chromatographic retention
times of steroids by three-dimensional structure descriptors and partial least squares
modeling, Chemometrics and Intelligent Laboratory Systems, 44 (1998) 257-269.
[25] L.R. Snyder, J.W. Dolan, High-Performance Gradient Elution: The Practical Application of the Linear-Solvent-Strength Model, 2007.
[26] J.A. Nelder, R. Mead, A Simplex-Method for Function Minimization, Computer Journal, 7
(1965) 308-313.
[27] B.D. Hudson, R.M. Hyde, E. Rahr, J. Wood, Parameter based methods for compound
selection from chemical databases, Quantitative Structure-Activity Relationships, 15 (1996)
285-289.
[28] VolSurf+ http://www.moldiscovery.com.
[29] P.J. Goodford, A computational procedure for determining energetically favorable
out and bootstrap, Computational Statistics & Data Analysis, 53 (2009) 3735-3745.
[33] D. Tonoli, C. Furstenberger, J. Boccard, D. Hochstrasser, F. Jeanneret, A. Odermatt, S.
Rudaz, Steroidomic Footprinting Based on Ultra-High Performance Liquid Chromatography
Coupled with Qualitative and Quantitative High-Resolution Mass Spectrometry for the
[34] T.M. Almeida, A. Leitao, M.L. Montanari, C.A. Montanari, The molecular retention
mechanism in reversed-phase liquid chromatography of meso-ionic compounds by
quantitative structure-retention relationships (QSRR), Chemistry & Biodiversity, 2 (2005)
1691-1700.
[35] D. Rogers, M. Hahn, Extended-connectivity fingerprints, Journal of Chemical Information
and Modeling, 50 (2010) 742-754.
[36] H. Chen, L. Carlsson, M. Eriksson, P. Varkonyi, U. Norinder, I. Nilsson, Beyond the
scope of Free-Wilson analysis: building interpretable QSAR models with machine learning
algorithms, Journal of Chemical Information and Modeling, 53 (2013) 1324-1336.
[37] Y. Henchoz, D. Guillarme, S. Martel, S. Rudaz, J.L. Veuthey, P.A. Carrupt, Fast log P
determination by ultra-high-pressure liquid chromatography coupled with UV and mass
spectrometry detections, Analytical and Bioanalytical Chemistry, 394 (2009) 1919-1930.
FIGURES
Figure 1. Classification of steroids based on their precursors. From each scaffold, different
Figure 2. Plot of experimental versus predicted retention times through the empirical LSS
Figure 4. Structures of the isotopomers with the molecular formula C19H28O3 extracted from LipidMaps.
Figure 5. Chromatogram obtained under optimal isocratic conditions at 25% ACN applied to
[Labelled peaks: 11-Hydroxytestosterone, 16-Hydroxy-DHEA, 16-Hydroxytestosterone, 5-Androstan-3-ol-7,17-dione]
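The chromatogram in Figure 5 was recorded under isocratic conditions. In the linear solvent strength (LSS) framework on which the log kw and S parameters rest, the isocratic retention factor follows log k = log kw - S·φ, where φ is the volume fraction of organic modifier, and the retention time is tR = t0(1 + k). The minimal Python sketch below encodes these two textbook relations; the log kw, S and t0 values in the example are hypothetical placeholders, not results from this study.

```python
"""Isocratic LSS sketch: log10 k = log10 kw - S * phi and tR = t0 * (1 + k).

All numeric inputs in the example are hypothetical, not values from the study.
"""


def retention_factor(log_kw: float, s: float, phi: float) -> float:
    """Retention factor k at organic-modifier volume fraction phi (0 to 1)."""
    return 10.0 ** (log_kw - s * phi)


def isocratic_retention_time(log_kw: float, s: float, phi: float, t0: float) -> float:
    """Isocratic retention time, in the same time unit as the dead time t0."""
    return t0 * (1.0 + retention_factor(log_kw, s, phi))


if __name__ == "__main__":
    # Hypothetical solute: log kw = 3.0, S = 8.0; 25% ACN -> phi = 0.25; t0 = 0.5 min
    k = retention_factor(3.0, 8.0, 0.25)
    t_r = isocratic_retention_time(3.0, 8.0, 0.25, 0.5)
    print(f"k = {k:.2f}, tR = {t_r:.2f} min")  # k = 10.00, tR = 5.50 min
```

Because each compound has its own S, raising φ shortens retention for every solute but by different amounts, which is why predicting both log kw and S is needed to anticipate elution order.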
Figure 6. Beta coefficient plot at 3 LVs for a) log kw and b) S.
[Labelled descriptors: HSA, log Pn-oct, V, shape descriptors, ACDODO, ACACDO]
Figure 7. Plots of experimental vs. predicted retention times: a) 14 minute gradient; b) 45
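Figure 7 compares predicted and experimental retention times under gradient elution. For a linear gradient, the standard LSS expression (see Snyder and Dolan [25]) is tR = (t0/b)·log10(2.3·k0·b + 1) + t0 + tD, where k0 is the retention factor at the initial composition φ0, b = t0·Δφ·S/tG is the gradient steepness, tG the gradient time and tD the dwell time. The sketch below codes that expression under the assumption that the solute elutes before the gradient ends; every numeric input is hypothetical rather than taken from the method used in the paper.

```python
"""Linear-gradient LSS sketch (standard Snyder-Dolan form).

tR = (t0 / b) * log10(2.3 * k0 * b + 1) + t0 + tD,  with b = t0 * delta_phi * S / tG.
Valid only if the solute elutes before the end of the gradient.
All inputs in the example are hypothetical.
"""
import math


def gradient_steepness(t0: float, delta_phi: float, s: float, t_g: float) -> float:
    """Dimensionless gradient steepness b."""
    return t0 * delta_phi * s / t_g


def gradient_retention_time(log_kw: float, s: float, phi0: float, phi_end: float,
                            t_g: float, t0: float, t_d: float = 0.0) -> float:
    """Retention time for a linear gradient from phi0 to phi_end over t_g."""
    k0 = 10.0 ** (log_kw - s * phi0)  # retention factor at the starting composition
    b = gradient_steepness(t0, phi_end - phi0, s, t_g)
    return (t0 / b) * math.log10(2.3 * k0 * b + 1.0) + t0 + t_d


if __name__ == "__main__":
    # Hypothetical solute and method: 5-95% ACN, t0 = 0.5 min, dwell time 0.3 min
    for t_g in (14.0, 45.0):
        t_r = gradient_retention_time(3.0, 8.0, 0.05, 0.95, t_g, 0.5, 0.3)
        print(f"tG = {t_g:4.1f} min -> tR = {t_r:.2f} min")
```

Once log kw and S are known for a compound, the same two parameters yield retention times for any gradient time, which is what allows one pair of QSRR models to serve several gradient programs.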
TABLE
Table 1. QSRR model results for log kw and S. LV is the optimised number of latent variables, R2 describes the goodness of fit of the model, Q2 is the predictive ability estimated by cross-validation, and RMSEP is the root mean square error of prediction.

Descriptors          Response   LV   R2     Q2     RMSEP
VolSurf+             log kw     3    0.80   0.70   0.10
VolSurf+             S          3    0.86   0.80   0.50
GTWF                 log kw     1    0.35   0.16   0.17
GTWF                 S          2    0.58   0.24   0.95
VolSurf+ and GTWF    log kw     3    0.88   0.72   0.08
VolSurf+ and GTWF    S          3    0.95   0.91   0.33
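Table 1 summarises each model by R2, Q2 and RMSEP. As a reading aid, the sketch below computes these statistics with their usual definitions: R2 from fitted values, Q2 = 1 - PRESS/SS_tot from cross-validated predictions, and RMSEP as the root mean square of prediction residuals. To stay self-contained it swaps in a univariate least-squares model with leave-one-out cross-validation instead of the latent-variable models behind Table 1; the data are hypothetical, and the cross-validation scheme actually used in the paper is not specified in this section.

```python
"""R2, Q2 and RMSEP sketch with a univariate least-squares model and
leave-one-out cross-validation (a stand-in for the latent-variable models
of Table 1). Data in the example are hypothetical."""
import math
from statistics import fmean


def r2_score(y, y_hat):
    """1 - SS_res / SS_tot for observed y and model values y_hat."""
    mean_y = fmean(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def q2_score(y, y_cv):
    """Q2 = 1 - PRESS / SS_tot, using cross-validated predictions y_cv."""
    return r2_score(y, y_cv)


def rmsep(y, y_pred):
    """Root mean square error of prediction."""
    return math.sqrt(fmean((a - b) ** 2 for a, b in zip(y, y_pred)))


def ols_fit(x, y):
    """Slope and intercept of an ordinary least-squares line."""
    mx, my = fmean(x), fmean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx


def loo_predictions(x, y):
    """Predict each sample from a model refitted without that sample."""
    preds = []
    for i in range(len(x)):
        slope, intercept = ols_fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(slope * x[i] + intercept)
    return preds


if __name__ == "__main__":
    # Hypothetical descriptor values (x) and log kw responses (y)
    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8]
    slope, intercept = ols_fit(x, y)
    fitted = [slope * v + intercept for v in x]
    y_cv = loo_predictions(x, y)
    print(f"R2 = {r2_score(y, fitted):.3f}, Q2 = {q2_score(y, y_cv):.3f}, "
          f"RMSEP(CV) = {rmsep(y, y_cv):.3f}")
```

Q2 is computed only from predictions for samples left out of the fit, so it is typically lower than R2, as it is in every row of Table 1.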
Steroid identification is difficult when isotopomeric steroids have to be distinguished.

The list of candidates sharing an identical monoisotopic mass was strongly reduced, facilitating definitive steroid identification.
Table S1. Stereocentre weights w. R and S denote the rectus and sinister configurations at each position (1-17) of the gonane scaffold.

Position   R       S       P
1           4.87    2.24    3.20
2           4.85   -2.88    1.63
3           4.86   -4.47    0.88
4          -3.97    2.17    5.93
5          -4.77   -3.50    5.40
6          -0.61    3.67   -2.20
7           4.90   -1.97   -0.87
8           1.53   -5.72   -0.10
9          -5.02    0.88   -5.58
10         -4.87    1.76    2.32
11         -2.60    2.28   -4.04
12         -4.59   -0.81    2.47
13          4.31   -3.36   -3.15
14         -5.84   -2.88   -2.39
15         -4.36   -0.84    1.20
16         -0.63    2.17    1.66
17          0.67   -3.50   -4.02
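To make Table S1 concrete, the sketch below stores its weights as a lookup keyed by configuration label and gonane position. The additive_score function, which simply sums the looked-up weights over a set of stereocentres, is a hypothetical aggregation for illustration only: how the paper combines w (and the fragment weights c of Table S2) into the GTWF descriptor is not stated in this section, and the meaning of the P row is not given in the extracted caption.

```python
"""Table S1 as a lookup table (weights transcribed from the table).

additive_score() is a HYPOTHETICAL aggregation for illustration; it is not
claimed to be the GTWF definition used in the paper.
"""

# List index 0 corresponds to gonane position 1, index 16 to position 17.
STEREO_WEIGHTS = {
    "R": [4.87, 4.85, 4.86, -3.97, -4.77, -0.61, 4.90, 1.53, -5.02,
          -4.87, -2.60, -4.59, 4.31, -5.84, -4.36, -0.63, 0.67],
    "S": [2.24, -2.88, -4.47, 2.17, -3.50, 3.67, -1.97, -5.72, 0.88,
          1.76, 2.28, -0.81, -3.36, -2.88, -0.84, 2.17, -3.50],
    "P": [3.20, 1.63, 0.88, 5.93, 5.40, -2.20, -0.87, -0.10, -5.58,
          2.32, -4.04, 2.47, -3.15, -2.39, 1.20, 1.66, -4.02],
}


def stereocentre_weight(position: int, label: str) -> float:
    """Weight w for a gonane position (1-17) and a row label ('R', 'S' or 'P')."""
    if not 1 <= position <= 17:
        raise ValueError("gonane positions run from 1 to 17")
    return STEREO_WEIGHTS[label][position - 1]


def additive_score(assignments: dict) -> float:
    """HYPOTHETICAL: sum of weights over a {position: label} mapping."""
    return sum(stereocentre_weight(pos, lab) for pos, lab in assignments.items())


if __name__ == "__main__":
    # Hypothetical set of configurations, chosen only to exercise the lookup
    example = {8: "S", 9: "S", 10: "R", 13: "S", 14: "S"}
    print(f"{additive_score(example):.2f}")
```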
Table S2. Weights c of the substituents/fragments.
[Fragment structures not reproduced; weights in the order extracted: 9.96, 2.28, -0.37, -6.10, 18.61, -0.55, 0.83, 11.68, -0.16]
Table S3. Results of alternative regression models for log kw and S (R2: fitting; Q2: internal validation; RMSEP: root mean square error of prediction).

Method (options)                        Descriptors         R2 log kw  R2 S  Q2 log kw  Q2 S  RMSEP log kw  RMSEP S
ANN (LM algorithm, 10 hidden layer)     VolSurf+ and GTWF   0.99       0.99  0.63       0.81  0.11          0.47
SVR (RBF kernel)                        VolSurf+ and GTWF   0.99       0.98  0.74       0.87  0.09          0.39
RF (100 trees)                          VolSurf+ and GTWF   0.95       0.97  0.65       0.80  0.11          0.49
Forward stepwise regression (F-ratio)   VolSurf+ and GTWF   0.86       0.92  0.81       0.90  0.08          0.34
ANN (LM algorithm, 10 hidden layer)     VolSurf+            0.99       0.99  0.45       0.73  0.13          0.56
SVR (RBF kernel)                        VolSurf+            0.95       0.88  0.76       0.81  0.09          0.47
RF (100 trees)                          VolSurf+            0.95       0.97  0.64       0.78  0.11          0.50
Table S4. List of VolSurf+ descriptors used to build the models.
D1    IW4   A    DODODO
D2    CW1   CP   SOLY
[Structure drawings not reproduced. The three pairs of retention-time values per compound are given in the order extracted; their original column headings were not recovered.]

No.  Compound                            Formula    Monoisotopic mass  tR values (three pairs)                       Error
1    11-Dehydrocorticosterone            C21H28O4   344.1988           8.53   8.64    18.31  18.56   22.11  22.60    1.3%
2    11-Ketoetiocholanolone              C19H28O3   304.2038           9.55   9.80    20.86  20.86   25.42  26.35    2.6%
3    11-Oxoandrosterone                  C19H28O3   304.2038           9.59   9.99    20.82  22.06   25.49  26.98    4.2%
4    11-Hydroxytestosterone              C19H28O3   304.2038           8.22   8.06    17.34  16.81   21.05  20.30    1.9%
5    16-HydroxyDHEA                      C19H28O3   304.2038           8.32   7.88    17.67  16.41   21.36  19.82    5.3%
6    16-Hydroxyandrosterone              C19H30O3   306.2195           9.73   9.50    21.53  20.91   26.53  25.60    2.4%
7    16-Hydroxyetiocholanolone           C19H30O3   306.2195           9.63   9.09    21.4   19.84   26.41  24.24    5.6%
8    16-HydroxyDHEA                      C19H28O3   304.2038           8.01   7.71    16.76  15.97   20.26  19.28    3.7%
9    16-Hydroxytestosterone              C19H28O3   304.2038           8.33   7.67    17.56  15.86   21.28  19.13    7.9%
10   2-Methoxyestradiol                  C19H26O3   302.1882           10.48  9.70    23.49  21.48   28.83  26.32    7.4%
11   2-Hydroxytestosterone               C19H28O3   304.2038           8.52   8.17    18.12  17.22   22.07  20.986   4.1%
12   2-Hydroxytestosterone               C19H28O3   304.2038           8.54   8.10    18.16  17.00   22.11  20.57    5.2%
13   3-Hydroxyandrost-5-en-17-one        C19H28O2   288.2089           11.18  11.53   24.98  26.15   30.82  32.15    3.1%
14   n/a                                 C21H34O4   n/a                n/a                                           n/a
     3,5-Tetrahydrodeoxycorticosterone   C21H34O3   n/a                n/a                                           n/a
16   3α,5-Tetrahydrocortisol             n/a        366.2406           8.07   8.37    17.46  18.15   21.13  22.21    3.7%
17   5-Dihydroprogesterone               C21H32O2   316.2402           13.96  15.00   32.81  35.48   40.92  44.00    7.5%
18   Cortisol (hydrocortisone)           C21H30O5   362.2093           7.62   7.57    15.98  15.75   19.17  19.04    0.6%
     Pregnanediol                        C21H36O2   n/a                n/a                                           n/a
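The Error column in the table above can be checked arithmetically. Assuming it is the relative deviation |t2 - t1| / t1 × 100 between the first two retention times of each row, with the first value taken as the experimental reference (an inference, since the column headings were lost), the sketch below reproduces most tabulated percentages; rows 17 and 18 differ by 0.1 percentage point, possibly because unrounded times were used in the original calculation.

```python
"""Check of the Error column, ASSUMING error = |t2 - t1| / t1 * 100 computed
on the first pair of retention times (t1 taken as the experimental value)."""

# Row number -> (first tR, second tR, tabulated error %), copied from the table
ROWS = {
    1: (8.53, 8.64, 1.3), 2: (9.55, 9.80, 2.6), 3: (9.59, 9.99, 4.2),
    4: (8.22, 8.06, 1.9), 5: (8.32, 7.88, 5.3), 6: (9.73, 9.50, 2.4),
    7: (9.63, 9.09, 5.6), 8: (8.01, 7.71, 3.7), 9: (8.33, 7.67, 7.9),
    10: (10.48, 9.70, 7.4), 11: (8.52, 8.17, 4.1), 12: (8.54, 8.10, 5.2),
    13: (11.18, 11.53, 3.1), 16: (8.07, 8.37, 3.7), 17: (13.96, 15.00, 7.5),
    18: (7.62, 7.57, 0.6),
}


def relative_deviation_percent(t_ref: float, t_other: float) -> float:
    """Absolute deviation of t_other from t_ref, as a percentage of t_ref."""
    return abs(t_other - t_ref) / t_ref * 100.0


def mismatches(rows: dict) -> list:
    """Row numbers whose recomputed value, rounded to 0.1, differs from the table."""
    return [n for n, (t1, t2, tab) in rows.items()
            if f"{relative_deviation_percent(t1, t2):.1f}" != f"{tab:.1f}"]


if __name__ == "__main__":
    print(mismatches(ROWS))
```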
Figure S1. Experimental vs. predicted tR: comparison between the LSS model and the QSRR models.