
Accepted Manuscript

Prediction of retention time in reversed-phase liquid chromatography as a tool for


steroid identification

Giuseppe Marco Randazzo, David Tonoli, Stephanie Hambye, Davy Guillarme,


Fabienne Jeanneret, Alessandra Nurisso, Laura Goracci, Julien Boccard, Prof. Serge
Rudaz
PII: S0003-2670(16)30215-X
DOI: 10.1016/j.aca.2016.02.014
Reference: ACA 234417

To appear in: Analytica Chimica Acta

Received Date: 15 December 2015


Revised Date: 14 February 2016
Accepted Date: 16 February 2016

Please cite this article as: G.M. Randazzo, D. Tonoli, S. Hambye, D. Guillarme, F. Jeanneret, A.
Nurisso, L. Goracci, J. Boccard, S. Rudaz, Prediction of retention time in reversed-phase liquid
chromatography as a tool for steroid identification, Analytica Chimica Acta (2016), doi: 10.1016/j.aca.2016.02.014.

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to
our customers we are providing this early version of the manuscript. The manuscript will undergo
copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please
note that during the production process errors may be discovered which could affect the content, and all
legal disclaimers that apply to the journal pertain.
ACCEPTED MANUSCRIPT


Prediction of retention time in reversed-phase liquid

chromatography as a tool for steroid identification.

AUTHORS: Giuseppe Marco RANDAZZO(1), David TONOLI(1,2,3), Stephanie HAMBYE(1),

Davy GUILLARME(1), Fabienne JEANNERET(1,2,3), Alessandra NURISSO(1), Laura

GORACCI(4), Julien BOCCARD(1), Serge RUDAZ(1,2)

(1) School of Pharmaceutical Sciences, University of Geneva and University of Lausanne,

Geneva, Switzerland

(2) Swiss Centre for Applied Human Toxicology (SCAHT), Universities of Basel and Geneva,
Basel, Switzerland

(3) Human Protein Sciences Department, University of Geneva, Geneva, Switzerland



(4) Department of Chemistry, Biology and Biotechnology, University of Perugia, Perugia, Italy

CORRESPONDENCE:

Prof. Serge RUDAZ, School of Pharmaceutical Sciences, University of Geneva, University of

Lausanne, Boulevard d'Yvoy 20, 1211 Geneva 4, Switzerland

Phone: +41 22 379 34 72

Fax: +41 22 379 68 08

E-mail: serge.rudaz@unige.ch

ABSTRACT

The untargeted profiling of steroids constitutes a growing research field because of their

importance as biomarkers of endocrine disruption. New technologies in analytical chemistry,

such as ultra-high-pressure liquid chromatography coupled with mass spectrometry (MS),

offer the possibility of a fast and sensitive analysis. Nevertheless, difficulties regarding

steroid identification are encountered when considering isotopomeric steroids. Thus, the use

of retention times is of great help for the unambiguous identification of steroids. In this

context, starting from the linear solvent strength (LSS) theory, quantitative structure retention

relationship (QSRR) models, based on a dataset composed of 91 endogenous steroids and

VolSurf+ descriptors combined with a new dedicated molecular fingerprint, were developed

to predict retention times of steroid structures in any gradient mode conditions. Satisfactory

performance was obtained during nested cross-validation with a predictive ability (Q2) of
0.92. The generalisation ability of the model was further confirmed by an average error of

4.4% in external prediction. This allowed the list of candidates associated with identical

monoisotopic masses to be strongly reduced, facilitating definitive steroid identification.



KEYWORDS: Retention time prediction, steroids, isotopomer identification, LSS theory,

reversed-phase liquid chromatography, quantitative structure-retention relationships


1. INTRODUCTION

Steroids are a family of molecules which regulate several vital functions, such as body

growth, response to stress, sexual development and behaviour, and rate of metabolism [1].

The endocrine system is responsible for numerous regulatory functions; thus, a steroid

imbalance involving an altered hormone production could often be related to various

diseases. Several genetic disorders, such as adrenoleukodystrophy, Wolman disease and

Smith-Lemli-Opitz syndrome, which are related to cholesterol synthesis and metabolism,

have been identified as a cause of endocrine perturbation [2]. Other diseases (e.g.,

hypertension, cancer, etc.) are associated with an altered regulation of hormone production

[3]. Currently, environmental toxicants also constitute a potent source of endocrine

disruption. In this context, a large number of drugs and synthetic chemicals have been

identified as endocrine-disrupting chemicals (EDCs) [4] by the Food and Drug Administration (FDA). A
large-scale monitoring of steroidogenesis is therefore essential for the screening of EDCs,

recommended within the Organization for Economic Co-operation and Development (OECD)

guidelines.

Cholesterol could be considered as the initiating molecule of steroidogenesis [5]. From this

compound, the different endocrine glands are able to synthesise a vast range of relatively

homogeneous chemical structures. Hence, a complex pathway of steroidogenesis generates



a wide structural diversity on the gonane skeleton (Figure 1), dictating the need for selective

analytical techniques to separate and identify these structures.



Mass spectrometry coupled with either gas chromatography (GC-MS) or liquid



chromatography (LC-MS) today represents the standard detection method for steroids [6].

The main issue of the GC-MS analytical workflow is related to the derivatisation step required

to enhance the volatility of the analyte, which limits throughput [7]. Today, untargeted

profiling strategies based on liquid chromatography (LC) coupled to high-resolution MS

(HRMS) provide an attractive analytical alternative for steroid analysis [8, 9]. The resolution

offered by HRMS and the peak capacity offered by modern LC allow the simultaneous

analysis of thousands of peaks in complex biological matrices, such as urine, blood, plasma,

and cellular cultures [8, 10-12]. The first step in the identification of features is usually

accomplished through mass-based database searches [13]. However, this procedure has

severe limitations in the presence of isotopomers [14] having the same exact mass and

therefore the same molecular formula [15, 16]. A recent work showed that diastereoisomers

may be separated in RPLC using achiral stationary phases [17]. Therefore, the

chromatographic retention time constitutes an additional molecular property that can be

crucial to differentiate between isotopomers [18].

Modelling retention in LC is an active field of research both for theoretical and practical

needs. Several theoretical models have already been described and implemented using

various optimisation software to help the selection of optimal chromatographic conditions as

reported by Héberger [19]. For reversed-phase (RP) retention, the seminal works of Dolan
and Snyder [20] have demonstrated that changes in retention behaviour with mobile phase

composition can be modelled thanks to a linearisation of the logarithm of the retention factor

(k) towards the percentage of the organic modifier in the mobile phase. Consequently, the

linear solvent-strength model (LSS) was proposed both to characterise the principles of

gradient separation and to establish the mathematical relationship that can be used in both

isocratic and gradient conditions. However, the need for a formal peak tracking of each

analyte constitutes the most important constraint to speed up and automate the

method-development process. When standards are not available, in silico prediction of

retention time constitutes a potent alternative solution.
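
The LSS relationship described above can be written as log k = log kw - S·φ, where φ is the volume fraction of organic modifier (the percentage divided by 100). A retention time in a linear gradient can then be estimated with the simplified textbook LSS expression. The Python sketch below is a minimal illustration under stated assumptions: the function names, the column dead time t0, the dwell time t_d and the numerical values of log kw and S are hypothetical and are not taken from this study, and the simplified expression ignores solute migration during the dwell time.

```python
import math

LN10 = math.log(10.0)  # approximately 2.3, as in the textbook LSS expression


def isocratic_k(log_kw, s, phi):
    """Retention factor at organic fraction phi (0-1): log10 k = log kw - S * phi."""
    return 10.0 ** (log_kw - s * phi)


def isocratic_tr(log_kw, s, phi, t0):
    """Isocratic retention time t_R = t0 * (1 + k)."""
    return t0 * (1.0 + isocratic_k(log_kw, s, phi))


def gradient_tr(log_kw, s, t_g, phi0=0.05, phi1=0.95, t0=1.0, t_d=0.0):
    """Retention time in a linear gradient from phi0 to phi1 over t_g minutes.

    b   = S * (phi1 - phi0) * t0 / t_g   (gradient steepness)
    k0  = retention factor at the initial composition phi0
    t_R = (t0 / b) * log10(ln(10) * k0 * b + 1) + t0 + t_d
    t0 and t_d are assumed values (minutes), not those of the study.
    """
    b = s * (phi1 - phi0) * t0 / t_g
    k0 = isocratic_k(log_kw, s, phi0)
    return (t0 / b) * math.log10(LN10 * k0 * b + 1.0) + t0 + t_d


if __name__ == "__main__":
    # Hypothetical solute (log kw = 3.0, S = 4.0) in 14, 45 and 60 min gradients.
    for t_g in (14.0, 45.0, 60.0):
        print(t_g, round(gradient_tr(3.0, 4.0, t_g), 2))
```

Once log kw and S are known for a compound, the same two parameters give its retention under isocratic conditions or under a linear gradient of any duration, which is why they are convenient targets for prediction.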



Quantitative structure retention relationship (QSRR) models were developed for the

classification of drugs with various chromatographic stationary phase chemistries as reported

in the review by Héberger and the works of Kaliszan [21-23]. For that purpose, in silico

descriptors allow an extensive characterisation of the molecular structures. They include (i)

topological descriptors, (ii) three-dimensional descriptors and (iii) quantum chemical

descriptors. The main limitation of most existing QSRR models reported in the literature is

the restricted number of isotopomeric structures included in the dataset. As an example,

Nord et al. published a study based on three-dimensional descriptors, underlining the role of

fundamental physicochemical properties in RP chromatography (e.g., solubility and

lipophilicity) [24]. However, their dataset included only a few diastereoisomers with identical

calculated log P values, whereas their experimental retention times were different.

The first part of this work demonstrates the importance of retention time prediction in various

gradient conditions for the identification of steroid candidates. The LSS model, allowing the

optimisation of isotopomer separation in combination with MS/MS information for a formal

identification of steroids, could be considered as an efficient strategy. In a second step,

QSRR models based on VolSurf+ 3D molecular descriptors combined with a new gonane

topological weighted fingerprint (GTWF) were developed. A supervised model based on

partial least squares (PLS) was elaborated to predict the LSS parameters (Log kw and S) and

to calculate the retention time of the steroid. Thanks to the description of the chiral centres of
the steroid homologues, the model's performance was increased, providing satisfactory

prediction under any chromatographic condition.



2. MATERIALS AND METHODS



2.1 Chemicals and reagents

Acetonitrile (ACN) and water were purchased from Romil Ltd. (Waterbeach, UK). MeOH was

provided by Biosolve (Valkenswaard, Netherlands). Formic acid (FA) was obtained from

Sigma-Aldrich (Buchs, Switzerland). The reference steroids (n=91) were purchased from

different suppliers (Steraloids, Sigma, LGC Standards, Sterling). Stock solutions of 1 mg/mL

of each steroid standard were made in methanol and stored at -80 °C before use. Working

solutions (10 µg/mL) were prepared by dilution of the stock solution in H2O / ACN (95:5, v/v)

+ 0.1% FA.

2.2 UHPLC-MS conditions

UHPLC separations were performed with an Acquity UPLC H-Class (Waters, Milford, MA,

USA) including a quaternary solvent manager (QSM), a sample manager (SM-FTN), and a

column manager (CM-A). The separation was performed on a Kinetex C18 column (2.1 x

150 mm, 1.7 µm) (Phenomenex, Torrance, CA, USA) with a SecurityGuard ULTRA C18 (2.1

x 2 mm) (Phenomenex, Torrance, CA, USA). Samples were kept at 6 °C in the autosampler.

The post-column flow was directed into a 6-port valve that in turn was directed into a waste

vessel between 0.06 and 0.43 min. During this period, a calibration solution used for

automatic post-acquisition recalibration was infused into the ESI source at 100 µL/min using

a Sun Flow 100 HPLC pump (SunChrom). The calibration solution consisted of a 1 M sodium

hydroxide solution diluted 100-fold with H2O + 0.1% FA / isopropanol + 0.1% FA (50:50, v/v).

Automatic calibration was performed with nine formate adducts spanning an m/z range of

158 to 702 using the high-precision calibration (HPC) algorithm and calibration, version 1.0

(Compass v1.5 SR3, Bruker, Bremen, Germany).

The maXis 3G QTOF MS (Bruker) used an electrospray ionisation (ESI) source operating in
the positive mode. The end plate offset value was set at 500 V, the capillary voltage at 4.7

kV and the nebuliser pressure at 1.8 bar. The dry gas flow rate and temperature were 5.5

L/min and 225 °C, respectively. The transfer time and pre-pulse storage were set at 40 and

7.0 µs, respectively. The accumulation time was set at 0.5 s, and the monitored mass range

was from m/z 50 to 1,000. The acquisition was performed in the profile mode, and the data

were acquired using the Compass v1.5 SR3 software suite from Bruker and HyStar v3.2

SR2. The UHPLC was controlled using the plug-in for a Waters Acquity UPLC v.1.5.

Mobile phase A was H2O + 0.1% FA, and mobile phase B was ACN + 0.1% FA. The flow rate

was set at 300 µL/min. Linear gradients from 5% to 95% B were performed in 14, 45,

and 60 minutes, and 10 µL of each working solution was injected. The column was kept at

30 °C during the analyses. Data processing was performed using Data Analysis 4.2 (64-bit,

Bruker, Bremen, Germany).

2.3 LSS model generator and isocratic optimisation

The LSS model parameters were calculated with in-house software written in Python

using equations from the LSS theory described by Snyder and Dolan [25]; the software

was executed with Python (version 2.7.9). It takes as input the two retention

times measured from two different linear gradients and uses the Nelder-Mead optimisation

method [26] to minimise the retention time recalculation errors as an objective function. The

isocratic separation optimisation was performed with the same package taking into account

the peak width change during elution.
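
The fitting step described above could be sketched as follows. This is a minimal illustration and not the code of the in-house package: it assumes the simplified LSS gradient expression, uses a small pure-Python Nelder-Mead routine written for this example, and recovers log kw and S from synthetic retention times generated with hypothetical parameters. The dead time t0, the dwell time t_d, the starting point and all function names are assumptions.

```python
import math

LN10 = math.log(10.0)


def gradient_tr(log_kw, s, t_g, phi0=0.05, phi1=0.95, t0=1.0, t_d=0.0):
    """Simplified LSS retention time in a linear gradient (t0, t_d are assumed)."""
    b = s * (phi1 - phi0) * t0 / t_g
    k0 = 10.0 ** (log_kw - s * phi0)
    return (t0 / b) * math.log10(LN10 * k0 * b + 1.0) + t0 + t_d


def nelder_mead(f, x0, step=0.5, tol=1e-12, max_iter=2000):
    """Minimal Nelder-Mead simplex: reflection, expansion, contraction, shrink."""
    n = len(x0)
    simplex = [list(x0)] + [
        [x0[j] + (step if j == i else 0.0) for j in range(n)] for i in range(n)
    ]
    for _ in range(max_iter):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        if abs(f(worst) - f(best)) < tol:
            break
        # Centroid of all vertices except the worst one.
        centroid = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]
        reflect = [c + (c - w) for c, w in zip(centroid, worst)]
        if f(reflect) < f(best):
            expand = [c + 2.0 * (c - w) for c, w in zip(centroid, worst)]
            simplex[-1] = expand if f(expand) < f(reflect) else reflect
        elif f(reflect) < f(simplex[-2]):
            simplex[-1] = reflect
        else:
            contract = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
            if f(contract) < f(worst):
                simplex[-1] = contract
            else:
                # Shrink every vertex halfway towards the best one.
                simplex = [best] + [
                    [bv + 0.5 * (pv - bv) for bv, pv in zip(best, pt)]
                    for pt in simplex[1:]
                ]
    return min(simplex, key=f)


def fit_lss(tr_by_gradient, start=(2.0, 3.0)):
    """Estimate (log kw, S) from retention times keyed by gradient duration."""

    def objective(params):
        log_kw, s = params
        if s <= 0.0:  # S must be positive for the expression to be defined
            return float("inf")
        try:
            return sum(
                (gradient_tr(log_kw, s, t_g) - tr) ** 2
                for t_g, tr in tr_by_gradient.items()
            )
        except (OverflowError, ValueError, ZeroDivisionError):
            return float("inf")

    return nelder_mead(objective, list(start))


if __name__ == "__main__":
    # Synthetic "measurements" from hypothetical parameters (log kw = 3, S = 4).
    measured = {t_g: gradient_tr(3.0, 4.0, t_g) for t_g in (14.0, 60.0)}
    print(fit_lss(measured))
```

The objective is the sum of squared differences between the measured and recalculated retention times, which is one plausible reading of the "retention time recalculation errors" mentioned above.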

2.4 QSRR models: dataset, molecular descriptors and regression analysis

2.4.1 Dataset

The dataset totalled 91 steroid structures: 48 androgens, 27 mineraloglucocorticoids, 14

progestogens and 2 oestrogens. To validate the proposed approach and evaluate retention

time estimation through LSS parameter prediction, the dataset was rationally divided into a

training set (80% of the overall set) and an external test set (20% of the overall set). The

division was achieved with the Most Descriptive Compounds (MDC) algorithm described by
Hudson et al. [27] applied to PCA scores accounting for 95% of the variance associated with

VolSurf+ descriptors (see below).



2.4.2 VolSurf+ descriptors



Molecular descriptors were calculated using canonical simplified molecular-input line entry

system strings (SMILES) with VolSurf+ version 1.7.0.l [28]. This software uses GRID force

fields [29] to characterise molecular interactions, such as hydrophilic/hydrophobic and



hydrogen bonding donor/acceptor surface volumes around the molecules, using a water

probe (OH2), a hydrophobic probe (DRY), a hydrogen-bonding acceptor carbonyl oxygen



probe (O) and a hydrogen-bonding donor amide nitrogen probe (N1). Energy maps, called

Molecular Interaction Fields (MIFs), were then converted into molecular descriptors. VolSurf+

generates 128 molecular descriptors related to molecular shape, volume, polarisability, polar

surface area, hydrophobic surface area, lipophilicity, molecular diffusion, and solubility. It

also includes specific descriptors, which refer to the volume and relative position of MIFs at

given energy values.

All the analysed steroids were in their neutral forms because their pKa values lie far from

the estimated mobile phase pH (i.e., 2.5) [30]; all pH-dependent descriptors were therefore

removed. The percentage of un-ionised species and all ADME

model predictions, except intrinsic solubility, were also removed. The impact of the latter in

RP chromatography was underlined by Nord et al. [24]. The final models for Log kw and S

were built accounting for a total of 97 VolSurf+ molecular descriptors (see Table S4 for the

complete descriptor list).

2.4.3 Gonane topological weighted fingerprint

A GTWF was calculated in two steps. First, the molecule was analysed in the SMILES format

to match the gonane skeleton and to detect the fragment bonded to each position. This

procedure was computed by in-house software in C++ language, which implements the

subgraph isomorphism algorithm described by Ullmann [31]. In the second step, a fingerprint

composed of 68 columns (17 positions on the gonane skeleton × 4 cases) was generated for
each molecular structure. For a given position, the first column was used to describe

stereochemical information (i.e., R, S, planar or aromatic), while the three remaining


contained information about the substituent's weight in R, S or planar/aromatic position. All


substituent and stereocentre weights were then initialised with random values and further

optimised using a simplex method. The simplex procedure was stopped when the error of an

optimised PLS model predicting LSS parameters was minimised. The optimised fingerprint

weights were validated using a nested cross-validation. All the PLS prediction models were

built using VolSurf+, which implements the NIPALS algorithm. The descriptors were

standardised to unit variance. The model prediction accuracy was assessed through a 5-fold

cross-validation repeated 20 times to reduce the variance during the validation process and

to prevent over-fitting [32].
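
The 68-column layout described above (17 gonane positions × 4 columns) can be sketched as follows. This is one possible reading of the description and not the in-house C++ implementation: positions and fragments are supplied by hand instead of being detected by subgraph matching on SMILES, and every weight, fragment name and function name is hypothetical.

```python
N_POSITIONS = 17
COLUMNS_PER_POSITION = 4  # [stereo weight, substituent weight if R, if S, if planar/aromatic]
SUBSTITUENT_SLOT = {"R": 1, "S": 2, "planar": 3}


def build_gtwf(substituents, stereo_weights, substituent_weights):
    """Return a 68-element fingerprint for one steroid.

    substituents:        dict position (1-17) -> (configuration, fragment name)
    stereo_weights:      dict configuration -> stereo weight w (hypothetical values)
    substituent_weights: dict fragment name -> substituent weight c (hypothetical values)
    """
    fingerprint = [0.0] * (N_POSITIONS * COLUMNS_PER_POSITION)
    for position, (configuration, fragment) in substituents.items():
        base = (position - 1) * COLUMNS_PER_POSITION
        # First column of the position: stereochemical information.
        fingerprint[base] = stereo_weights[configuration]
        # One of the three remaining columns: substituent weight for that configuration.
        fingerprint[base + SUBSTITUENT_SLOT[configuration]] = substituent_weights[fragment]
    return fingerprint


if __name__ == "__main__":
    # Illustrative input only; configurations and weights are made up.
    example = build_gtwf(
        {10: ("R", "CH3"), 17: ("S", "OH")},
        {"R": 0.3, "S": -0.2, "planar": 0.1},
        {"CH3": 0.8, "OH": -1.1},
    )
    print(len(example))
```

In the procedure described above, the weights would not be fixed by hand but optimised by the simplex method against the error of the PLS model.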

3. RESULTS AND DISCUSSION

3.1 Development of the LSS empirical model

During the first phase of the study, an LSS empirical model was obtained for each steroid

standard. The steroid dataset was composed of 91 endogenous and exogenous steroids

covering the main steroid structures and classes (i.e., progestogens, oestrogens, androgens

and mineraloglucocorticoids). These models were used as a starting point to optimise

separations and predict retention times under any gradient conditions through two variables:

Log kw and S. While Log kW represents the extrapolated value of the retention factor k in pure

water, S represents a constant molecular parameter for a given compound and fixed

experimental conditions (i.e., the stationary phase chemistry). S corresponds to the slope of

the linear fit between the logarithm of the retention factor (k) vs. the percentage of organic

solvent. Values for Log kw and S of the 91 steroids (analysed by groups of 6) were obtained

from two different gradients presenting a runtime ratio of at least 3 to provide accurate values,

according to the work of Snyder & Dolan [20]. Hence, two gradients of 14 and 60 min were

selected. After data acquisition, models were assessed using an iterative procedure, which

takes the two retention times and extrapolates Log kW and S from them (for more details see
the package at https://github.com/gmrandazzo/PyLSS). To validate the parameter

estimation, the retention times of the 91 steroids were calculated over a linear gradient of 45

minutes and experimentally confirmed. As indicated in Figure 2, excellent prediction



accuracies (R2 of 0.9997) were obtained with a maximum bias of approximately 0.5% for the

calculated values. With this performance, even minor changes in chromatographic selectivity

can be anticipated.

To clearly illustrate the problem of steroid identification (ID) and the importance of retention

time evaluation in this context, a case study encountered during the analysis of cellular

cultures in toxicological experiments is presented. For supplementary information, please



refer to Tonoli et al. [33]. According to the data analysis, a steroid of interest to be identified

was measured at tR = 7.85 min. The steroid of interest presented an accurate mass of m/z

305.2111 (Figure 3), which corresponds to the molecular formula C19H28O3 (tolerance set at

+/- 5 ppm). Unfortunately, a query of Lipid Maps revealed that up to 20 steroid isotopomer

structures could be associated with this single exact mass (Figure 4).
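
The mass-matching step can be illustrated with a short calculation. The sketch below assumes that the measured ion is the protonated molecule [M+H]+ and uses standard monoisotopic masses; the function names are illustrative and do not come from the software used in this study.

```python
# Standard monoisotopic masses (u) of the most abundant isotopes.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "O": 15.99491461956}
PROTON = 1.007276467  # mass of a proton (u)


def mz_protonated(formula_counts):
    """Theoretical m/z of the singly charged [M+H]+ ion for an elemental composition."""
    neutral = sum(MONOISOTOPIC[element] * n for element, n in formula_counts.items())
    return neutral + PROTON


def ppm_error(measured, theoretical):
    """Signed mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


if __name__ == "__main__":
    theoretical = mz_protonated({"C": 19, "H": 28, "O": 3})
    error = ppm_error(305.2111, theoretical)
    # The candidate formula is accepted when the error lies within +/- 5 ppm.
    print(round(theoretical, 4), abs(error) <= 5.0)
```

Because every isomer of C19H28O3 gives the same theoretical m/z, this test cannot separate the candidate structures, which is the limitation addressed by retention time.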


Taking into account the analytical conditions used with a tolerance of +/- 0.3 min (see

Reference 32), and thanks to the retention time prediction of this panel of candidates using

LSS theory, the number of putative compounds was decreased from 20 to only 4 candidates,

namely 5-Androstan-3-ol-7,17-dione, 11-Hydroxytestosterone, 16-Hydroxy-DHEA, and

16-Hydroxytestosterone. While the first advantage of integrating chromatographic information

for ID is to drastically decrease the number of suggestions, the second advantage concerns

the possibility of rapidly obtaining optimal chromatographic conditions in terms of peak

spacing and resolution for confirmatory purposes. Hence, isocratic conditions (an organic

solvent concentration of 25%) were advocated to achieve the best separation for these 4

steroids. As predicted, the injection on the selected stationary phase of a standard mixture of

the 4 candidates demonstrated the difficulty of obtaining a complete chromatographic
separation for two co-eluting steroids, namely 16-Hydroxy-DHEA and 11-

Hydroxytestosterone (see Figure 5).
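
The retention time filter used above amounts to keeping the candidates whose predicted retention time falls within the tolerance window around the measured value. A minimal sketch, in which the candidate names and predicted retention times are invented for illustration and only the measured value (7.85 min) and the tolerance (0.3 min) come from the text:

```python
def filter_candidates(measured_tr, predicted_tr, tolerance=0.3):
    """Return the names of candidates whose predicted tR is within +/- tolerance (min)."""
    return sorted(
        name
        for name, tr in predicted_tr.items()
        if abs(tr - measured_tr) <= tolerance
    )


if __name__ == "__main__":
    # Hypothetical predicted retention times (min) for isobaric candidates.
    predicted = {
        "candidate_A": 7.70,
        "candidate_B": 8.10,
        "candidate_C": 8.40,
        "candidate_D": 6.90,
        "candidate_E": 7.60,
    }
    print(filter_candidates(7.85, predicted))
```

The candidates that survive this filter are the ones for which a dedicated separation, here the isocratic conditions, is worth optimising.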



The information gathered during the simultaneous MS/MS experiments was therefore used

to differentiate these steroids. In fact, the in-source fragmentation profile of these compounds

was different; 16-Hydroxy-DHEA led to the observation of the two most prominent in-source

fragments at [M-H2O+H]+ and [M-2H2O+H]+, while 11-Hydroxytestosterone presented

mainly the molecular ion [M+H]+. After re-injection of the biological sample under isocratic

conditions at 25% ACN, 11-Hydroxytestosterone was successfully identified as the



compound of interest in the cellular culture. This example illustrates the systematic approach

needed to obtain a definitive steroid identification, even in the presence of steroid standards.

3.2 Development of reliable QSRR models

When standards are not available, retention time prediction using a QSRR model could

constitute a potent way to reduce the number of candidates. Given that the QSRR models

that directly predict retention time are dependent on the instrument and chromatographic

conditions, the LSS parameters, namely Log kW and S, were chosen as key parameters to be

predicted for recalculating the retention times.

3.3 Molecular descriptors

The molecular descriptors are of the utmost importance for generating robust QSRR models.

The role of QSRR models is to extract relevant information about molecular shape,

hydrophobic/hydrophilic volume of interactions, dipole moments and physicochemical

descriptors (such as intrinsic solubility, Log Pn-oct, and molecular diffusion); thus, VolSurf+ 3D

molecular descriptors are naturally chosen in RP-LC to describe their lipophilic interactions

[34]. PLS was selected due to its ability to efficiently handle large numbers of correlated

descriptors. Additionally, PLS provides several outputs, which help to obtain better insights

into the intrinsic phenomena governing the retention process. The optimal number of latent
variables associated with each PLS model was selected using a 5-fold cross-validation

iterated 100 times. The prediction ability was evaluated using the 20% out-of-bag randomly

selected data subset. Nested cross-validation was carried out to avoid over-fitting. The

results obtained for Log kw and S are reported in Table 1 in terms of prediction accuracy (Q2)

and root mean square error of prediction (RMSEP) evaluated by cross-validation.
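
The two figures of merit can be illustrated with a repeated k-fold cross-validation loop. The sketch below is a simplified stand-in, not the VolSurf+ procedure: a one-descriptor least-squares line replaces the PLS model, only a single (non-nested) cross-validation loop is shown, Q2 is taken as 1 - PRESS/SS_total, and all names and data are hypothetical.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x (stand-in for the PLS model)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - slope * mean_x, slope


def repeated_kfold(xs, ys, k=5, repeats=20, seed=0):
    """Return (mean Q2, mean RMSEP) over `repeats` random k-fold partitions."""
    rng = random.Random(seed)
    n = len(xs)
    mean_y = sum(ys) / n
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    q2_values, rmsep_values = [], []
    for _ in range(repeats):
        order = list(range(n))
        rng.shuffle(order)
        press = 0.0  # predicted residual sum of squares over all held-out samples
        for fold in range(k):
            test_idx = order[fold::k]
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            a, b = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
            press += sum((ys[i] - (a + b * xs[i])) ** 2 for i in test_idx)
        q2_values.append(1.0 - press / ss_total)
        rmsep_values.append(math.sqrt(press / n))
    return sum(q2_values) / repeats, sum(rmsep_values) / repeats


if __name__ == "__main__":
    # Synthetic descriptor/response pairs with a small deterministic perturbation.
    xs = [float(i) for i in range(30)]
    ys = [2.0 * x + 1.0 + (0.5 if i % 2 else -0.5) for i, x in enumerate(xs)]
    print(repeated_kfold(xs, ys))
```

Repeating the partitioning reduces the dependence of Q2 and RMSEP on one particular random split; in a nested scheme, this loop would itself be run inside an outer loop that holds out the data used for the final assessment.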



As presented in Table 1, the conventional VolSurf+ set of descriptors was quite efficient in

obtaining effective predictions of steroid retention times through the use of LSS parameters.

This was particularly the case for the S parameter, considered as the most important input to

obtain accurate retention times. It is worth noting that VolSurf+ descriptors are versatile and

dataset-independent parameters based on generic physicochemical properties. As a

consequence, the resulting models are often self-explanatory and easily interpretable. When

modelling chromatographic retention, these characteristics are necessary to grasp important

molecular features for reliable prediction. However, only small differences are observed in

MIFs when comparing diastereoisomers. For example, a global difference below 15% is

reported between 16α-Hydroxytestosterone and 16β-Hydroxytestosterone while a marked

retention shift of 0.8 minutes is measured on a gradient of 15 minutes. This limitation may be

because the GRID Force-Field assigns the same atom charge to both compounds. To

improve the prediction ability of the model, additional topological information was tentatively

included using an original molecular fingerprint.

3.4 The gonane topological weighted fingerprint

Molecular topological fingerprints consist of an abstract representation of the molecular

structure used to evaluate similarity between molecules [35]. The most common fingerprints

are based on the presence/absence of substructure patterns in a molecule. The choice of the

latter plays a critical role in the definition of the fingerprint and on the setting of the

similarity/dissimilarity criteria between molecules. When compounds are similar, as could be

considered in the case of steroids, the only way to discriminate these structures is to define a
fingerprint capable of detecting differences between chiral centres and substituents. Thus, a

new type of fingerprint dedicated to steroid structures, named the Gonane Topological

Weighted Fingerprint (GTWF), was implemented. The Free-Wilson approach describing



molecular activity/property as the sum of the contributions of substructures and fragments


was chosen for its interpretability [36]. The main role of GTWF is to detect

and describe dissimilarities in gonane skeletons by characterising each position by two



weights: a geometrical weight w, which depicts the chirality, and a substituent weight c. The

definition is summarised by a function defined in equation 1 as follows:



 =
(eq.1)

The parameter w represents the stereo-information weight and depends on the R, S, and

planar/aromatic configuration. The parameter c describes the substituent weight, which is

independent of the atom configuration.

However, as presented in Table 1, the use of GTWF descriptors alone was not sufficient to

provide satisfactory retention time prediction. Therefore, a collection of 165 descriptors (97

from VolSurf+ combined with 68 from GTWF) was retained to build QSRR models for Log kw

and S. The results obtained for Log kw and S when including the GTWF fingerprint are

reported in Table 1. The GTWF fragments with their corresponding w and c parameters are

summarised in Tables S1 and S2, respectively. Other algorithms, such as Random Forest,

Neural Networks, Support Vector Regression and Forward Stepwise Regression, were

simultaneously evaluated but did not exhibit any improvement in retention time prediction

(see Table S3, supporting information).

3.5 Log kW model discussion

The PLS model related to Log kW was characterised by a prediction ability Q2 of 0.72 with 3

latent variables. In this case, only a slight improvement was obtained by the addition of the

GTWF. The investigation of beta coefficients (see Figure 6a) allowed the most important

descriptors to be highlighted. The latter included the VolSurf+ molecular volume V, HSA, Log
Pn-oct and the 3D triplet pharmacophoric area descriptors ACACDO/ACDODO, which

describe the hydrogen bonding triplets. All these parameters were positively correlated with

Log kW. Interestingly, these values are consistent with the physical meaning of the retention

factor k, which is related to the distribution coefficient between the stationary and the mobile

phase. The retention factor for a compound in pure water kw could be considered as a

descriptor of the maximum interaction with the stationary phase. In RP-LC, lipophilicity is

actually known to be highly correlated with Log kw, as reported in the work of Henchoz et al.

[37]. Other descriptors indicated the presence of intra/intermolecular hydrogen bonding.



3.6 S model discussion

The variable S is a molecular parameter related to both the characteristics and column

selectivity for a solute, both of which are driven by intermolecular interactions, such as

hydrophobicity and hydrogen bonding. The PLS regression model associated with the S

parameter was characterised by a high predictive ability with a Q2 value of 0.91 with 3 latent

variables. Interestingly, a marked improvement of prediction accuracy was obtained with the

addition of GTWF. Figure 6b reports the PLS beta coefficients for three latent variables.

Apart from the negative contribution of Log Pc-hex, the most important descriptors were the

hydrophilic volume WO5 and WO6 and the 3D pharmacophoric descriptors based on the

three points of the hydrogen bonding acceptors, hydrogen bonding donors and hydrogen

bonding donors (ACDODO) from the VolSurf+ set. Topological descriptors confirmed the

most important positions for substitution on the gonane skeleton in relation to

chromatographic selectivity. Prominent hydrophobic interactions were associated with

positions 10 and 13, substituted with a methyl fragment alternating in stereocentre

configurations R and S, whereas marked hydrogen bonding acceptor/donor interactions were

associated with positions 15, 16 and 17 substituted with a variety of hydrophilic groups. It is

also interesting to note that a few molecules revealed other important positions (e.g., position

4, 5 or 7). This may be due to the relative abundance of specific structures in the dataset.

3.7 Model performance in external prediction

To validate the model, a test set was rationally extracted from the entire dataset according to

the procedure described in the materials and methods. A subset of 72 molecules (the training

set) was kept to train the GTWF descriptors and to build the PLS models. The modelled LSS

parameters were used to predict the retention time of the remaining 19 steroids (the test set)

with three values of gradient steepness (i.e., 14, 45 and 60 minutes) to demonstrate the

versatility of the proposed modelling approach in different gradient conditions. A predictive

ability Q2internal of 0.90 and R2external of 0.93 (using 19 molecules of the test set) were reported

according to the predicted versus experimental retention times (Figure 7). Moreover, a

comparison between the retention time estimation based on the experimental LSS

parameters and the QSRR derived parameters is reported in Figure S1 in the supplementary

information.

The structures and prediction of retention times at various gradients of the external test set

are reported in Table S5. From these results, the global relative error in prediction was lower

than 5% (i.e., 4.4%). Interestingly, nine molecules were predicted with a relative retention

error under 3%. Poorly predicted values (i.e., errors between 5% and 10%) were

related to diastereoisomers, such as compounds 11 and 12 (see Table S4). Additionally,

other compounds showed a poor prediction, such as compound 19, which corresponds to the

only molecule possessing a 2-Hydroxyethyl fragment in position 17. Even if the global

prediction error of the QSRR models is approximately 4%, the relative error in the prediction

of retention time remains too high to unambiguously assign a specific structure to each peak.

Nevertheless, these results appear very promising for reducing the number of steroid

candidates.

4. CONCLUSION

Steroidomics, i.e., the untargeted profiling of steroids, constitutes an important research field due to the numerous applications related to steroidogenesis and its alterations. Untargeted high-resolution acquisition using UHPLC-Q-TOFMS approaches offers the possibility to detect a wide range of putative steroids, selected by matching their exact masses to databases. However, as is often the case in omics-like strategies, the definitive assignment of identity remains the main bottleneck. As presented here, empirical models allowing the prediction of retention times in association with experimental MS information, i.e., exact masses and MS/MS spectra, are essential for the identification of steroids. Chromatographic information should act as a filter to reduce the initial data dimensionality (post-targeted approach) and is mandatory for confirmatory biomarker identification. In the absence of empirical models, the prediction of retention time from the molecular structure has great potential to facilitate and accelerate the identification process. While improvements are still necessary, QSRR modelling was demonstrated to be a potent tool for predicting the retention time of steroids under any gradient conditions. VolSurf+ 3D molecular descriptors combined with a novel stereochemical description of the gonane skeleton showed satisfactory predictability of the LSS parameters (S and Log kw), with the great advantage of being applicable to any gradient condition. An additional advantage of predicting LSS parameters such as S and Log kw is that they provide valuable information for optimising the isocratic separation of co-eluting analytes, including isotopomers. Even if some issues are still observed for specific structures, including new compounds with molecular features that are underrepresented in the dataset should improve the accuracy of prediction. The next step will consist of increasing the number of steroids in the database to refine the QSRR models. Thanks to the information obtained from the GTWF and PLS beta coefficients, the selection of new steroids will focus on underrepresented structures. This extension is mandatory to obtain accurate retention time prediction through LSS parameters.
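To illustrate why S and Log kw are useful for isocratic optimisation, the sketch below applies the LSS relationship log k = Log kw - S φ to two solutes and scans the organic modifier fraction φ for the resulting selectivity. The two parameter pairs are hypothetical and were chosen so that the solutes co-elute at one composition; they are not values from this study.

```python
def retention_factor(log_kw, s, phi):
    """Isocratic retention factor from the LSS relationship log k = log kw - S * phi."""
    return 10 ** (log_kw - s * phi)

def selectivity(solute_a, solute_b, phi):
    """Ratio of the larger to the smaller retention factor at composition phi."""
    k_a = retention_factor(*solute_a, phi)
    k_b = retention_factor(*solute_b, phi)
    return max(k_a, k_b) / min(k_a, k_b)

# Hypothetical (Log kw, S) pairs for two co-eluting candidates
solute_a = (2.40, 4.2)
solute_b = (2.28, 3.8)

for phi in (0.20, 0.25, 0.30, 0.35, 0.40):
    alpha = selectivity(solute_a, solute_b, phi)
    print(f"phi = {phi:.2f} -> selectivity = {alpha:.3f}")
```

With these example values the selectivity collapses to about 1 near φ = 0.30 and increases on either side, which shows how a difference in S allows a more favourable isocratic composition to be selected.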

ACKNOWLEDGMENTS

The authors wish to thank Dr. Szabolcs Fekete for valuable scientific exchanges regarding

the LSS model. The authors also acknowledge Sterling (Perugia) for providing several

exogenous substrates. Dr. Alessandra Nurisso also gratefully acknowledges the Excellence

fellowship programme of the University of Geneva, Switzerland for its financial support.

FIGURE CAPTIONS

Figure 1. Classification of steroids based on their precursors. From each scaffold, different steroids are produced by specific reactions.

Figure 2. Plot of experimental versus predicted retention times through the empirical LSS model in a 45 minute gradient.

Figure 3. Extracted ion chromatogram of a representative cellular extract at m/z 305.2111 (14 min gradient).

Figure 4. Isotopomer structures extracted from LipidMaps with the molecular formula C19H28O3.

Figure 5. Chromatogram obtained under optimal isocratic conditions at 25% ACN applied to separate the four putative candidates at m/z 305.2111.

Figure 6. Beta coefficient plot at 3 LVs for a) Log kW and b) S.

Figure 7. Plots of experimental vs. predicted retention times: a) 14 minute gradient; b) 45 minute gradient; c) 60 minute gradient.

TABLE CAPTIONS

Table 1. QSRR model results for Log kw and S. LV represents the optimised number of latent

variables. R2 represents the fitting of the model. Q2 is the prediction ability estimated by

cross-validation. RMSEP represents the root-mean-squared error of the prediction estimated

by cross-validation.

PT
REFERENCES

RI
[1] K. Monostory, Z. Dvorak, Steroid regulation of drug-metabolizing cytochromes P450, Curr
Drug Metab, 12 (2011) 154-172.
[2] D. Lin, T. Sugawara, J.F. Strauss, 3rd, B.J. Clark, D.M. Stocco, P. Saenger, A. Rogol,

SC
W.L. Miller, Role of steroidogenic acute regulatory protein in adrenal and gonadal
steroidogenesis, Science (New York, N.Y.), 267 (1995) 1828-1831.
[3] M.G. Mohaupt, The role of adrenal steroidogenesis in arterial hypertension, Endocrine
development, 13 (2008) 133-144.

U
[4] FDA, FDA, Endocrine disruption potential of drugs: nonclinical evaluation, Draft Guidance,
, Federal Register 78 183: 57859., (September 2013).
AN
[5] J.I. Mason, W.E. Rainey, Steroidogenesis in the human fetal adrenal: a role for
cholesterol synthesized de novo, The Journal of clinical endocrinology and metabolism, 64
(1987) 140-147.
[6] H.J. Makin, J. Honour, C.L. Shackleton, W. Griffiths, General Methods for the Extraction,
M

Purification, and Measurement of Steroids by Chromatography and Mass Spectrometry, in:


H.L.J. Makin, D.B. Gower (Eds.) Steroid Analysis, Springer Netherlands2010, pp. 163-282.
[7] S.J. Soldin, O.P. Soldin, Steroid hormone analysis by tandem mass spectrometry, Clinical
D

chemistry, 55 (2009) 1061-1066.


[8] F. Badoud, J. Boccard, C. Schweizer, F. Pralong, M. Saugy, N. Baume, Profiling of
TE

steroid metabolites after transdermal and oral administration of testosterone by ultra-high


pressure liquid chromatography coupled to quadrupole time-of-flight mass spectrometry, J
Steroid Biochem Mol Biol, 138 (2013) 222-235.
[9] F. Jeanneret, D. Tonoli, M.F. Rossier, M. Saugy, J. Boccard, S. Rudaz, Evaluation of
EP

steroidomics by liquid chromatography hyphenated to mass spectrometry as a powerful


analytical strategy for measuring human steroid perturbations, J Chromatogr A, (2015).
[10] S.M. Chesnut, J.J. Salisbury, The role of UHPLC in pharmaceutical development, J Sep
Sci, 30 (2007) 1183-1190.
C

[11] S. Fekete, J. Schappler, J.L. Veuthey, D. Guillarme, Current and future trends in
UHPLC, Trac-Trends in Analytical Chemistry, 63 (2014) 2-13.
AC

[12] F. Badoud, E. Grata, J. Boccard, D. Guillarme, J.L. Veuthey, S. Rudaz, M. Saugy,


Quantification of glucuronidated and sulfated steroids in human urine by ultra-high pressure
liquid chromatography quadrupole time-of-flight mass spectrometry, Anal Bioanal Chem, 400
(2011) 503-516.
[13] D.S. Wishart, Computational strategies for metabolite identification in metabolomics,
Bioanalysis, 1 (2009) 1579-1596.
[14] K. Murray Kermit, K. Boyd Robert, N. Eberlin Marcos, G.J. Langley, L. Li, Y. Naito,
Definitions of terms relating to mass spectrometry (IUPAC Recommendations 2013), Pure
and Applied Chemistry, 2013, pp. 1515.
[15] C. Gourmel, A. Grand-Guillaume Perrenoud, L. Waller, E. Reginato, J. Verne, B. Dulery,
J.L. Veuthey, S. Rudaz, J. Schappler, D. Guillarme, Evaluation and comparison of various
separation techniques for the analysis of closely-related compounds of pharmaceutical
interest, J Chromatogr A, 1282 (2013) 172-177.

18
ACCEPTED MANUSCRIPT
[16] B. L. D., D. J Sanaullah Borts., HPLC/MS/MS measurement of steroid conjugates: An
analytical problem of olympic proportions, Book of Abstracts, 213th ACS National Meeting,
(1997) ANYL-148.
[17] S. Romand, S. Rudaz, D. Guillarme, Separation of substrates and closely related
glucuronide metabolites using various chromatographic modes, J Chromatogr A, (2016).
[18] Z.J. Zhu, A.W. Schultz, J. Wang, C.H. Johnson, S.M. Yannone, G.J. Patti, G. Siuzdak,
Liquid chromatography quadrupole time-of-flight mass spectrometry characterization of
metabolites guided by the METLIN database, Nature protocols, 8 (2013) 451-460.
[19] K. Hberger, Quantitative structure(chromatographic) retention relationships, Journal of
Chromatography A, 1158 (2007) 273-305.

PT
[20] J.J.K. Lloyd R. Snyder, John W. Dolan, Introduction to Modern Liquid Chromatography,
3rd Edition, 3rd ed.2010.
[21] K. Heberger, Quantitative structure-(chromatographic) retention relationships, J
Chromatogr A, 1158 (2007) 273-305.

RI
[22] R. Kaliszan, Structure and Retention in Chromatography: A Chemometric Approach: 1
(Chromatography: Principles & Practice).
[23] R. Kaliszan, Quantitative structure-retention relationships, Analytical Chemistry, 64

SC
(1992) 619A-631A.
[24] L.I. Nord, D. Fransson, S.P. Jacobsson, Prediction of liquid chromatographic retention
times of steroids by three-dimensional structure descriptors and partial least squares
modeling, Chemometrics and Intelligent Laboratory Systems, 44 (1998) 257-269.

U
[25] L.R.S.J.W. Dolan, High-Performance Gradient Elution: The Practical Application of the
Linear-Solvent-Strength Model, (2007).
AN
[26] J.A. Nelder, R. Mead, A Simplex-Method for Function Minimization, Computer Journal, 7
(1965) 308-313.
[27] B.D. Hudson, R.M. Hyde, E. Rahr, J. Wood, Parameter based methods for compound
selection from chemical databases, Quantitative Structure-Activity Relationships, 15 (1996)
M

285-289.
[28] VolSurf+ http://www.moldiscovery.com.
[29] P.J. Goodford, A computational procedure for determining energetically favorable
D

binding sites on biologically important macromolecules, J Med Chem, 28 (1985) 849-857.


[30] X. Subirats, E. Bosch, M. Roses, Buffer Considerations for LC and LC-MS, Lc Gc North
TE

America, 27 (2009) 1000-1004.


[31] J.R. Ullmann, An Algorithm for Subgraph Isomorphism, Journal of the ACM, 23 (1976)
31-42.
[32] J.H. Kim, Estimating classification error rate: Repeated cross-validation, repeated hold-
EP

out and bootstrap, Computational Statistics & Data Analysis, 53 (2009) 3735-3745.
[33] D. Tonoli, C. Furstenberger, J. Boccard, D. Hochstrasser, F. Jeanneret, A. Odermatt, S.
Rudaz, Steroidomic Footprinting Based on Ultra-High Performance Liquid Chromatography
Coupled with Qualitative and Quantitative High-Resolution Mass Spectrometry for the
C

Evaluation of Endocrine Disrupting Chemicals in H295R Cells, Chemical research in


toxicology, 28 (2015) 955-966.
AC

[34] T.M. Almeida, A. Leitao, M.L. Montanari, C.A. Montanari, The molecular retention
mechanism in reversed-phase liquid chromatography of meso-ionic compounds by
quantitative structure-retention relationships (QSRR), Chemistry & biodiversity, 2 (2005)
1691-1700.
[35] D. Rogers, M. Hahn, Extended-connectivity fingerprints, Journal of chemical information
and modeling, 50 (2010) 742-754.
[36] H. Chen, L. Carlsson, M. Eriksson, P. Varkonyi, U. Norinder, I. Nilsson, Beyond the
scope of Free-Wilson analysis: building interpretable QSAR models with machine learning
algorithms, Journal of chemical information and modeling, 53 (2013) 1324-1336.
[37] Y. Henchoz, D. Guillarme, S. Martel, S. Rudaz, J.L. Veuthey, P.A. Carrupt, Fast log P
determination by ultra-high-pressure liquid chromatography coupled with UV and mass
spectrometry detections, Anal Bioanal Chem, 394 (2009) 1919-1930.

19
ACCEPTED MANUSCRIPT
FIGURES

Figure 1. Classification of steroids based on their precursors. From each scaffold, different

steroids are produced by specific reactions.

Figure 2. Plot of experimental versus predicted retention times through the empirical LSS

model in a 45 minute gradient.

Figure 3. Extracted ion chromatogram of a representative cellular extract at m/z 305.2111

(14 min gradient)

Figure 4. Isotopomer structures extracted from LipidMaps with the molecular formula C19H28O3.

Figure 5. Chromatogram obtained under optimal isocratic conditions at 25% ACN applied to separate the four putative candidates at m/z 305.2111.

[Figure: chromatogram with peak labels 11-Hydroxytestosterone, 16-Hydroxy-DHEA, 16-Hydroxytestosterone and 5-Androstan-3-ol-7,17-dione]

Figure 6. Beta coefficient plot at 3 LVs for a) Log kW and b) S.

[Figure: beta coefficient plot; labelled descriptors: HSA, Log P n-oct, V, shape descriptors, ACDODO, ACACDO]

Figure 7. Plots of experimental vs. predicted retention times: a) 14 minute gradient; b) 45

minute gradient; c) 60 minute gradient.

TABLE

Table 1. QSRR model results for Log kw and S. LV represents the optimised number of latent

variables. R2 represents the fitting of the model. Q2 is the prediction ability estimated by

cross-validation. RMSEP represents the root-mean-squared error of the prediction estimated

by cross-validation.

Descriptors          Response   LV   R2     Q2     RMSEP
VolSurf+             Log kW     3    0.80   0.70   0.10
VolSurf+             S          3    0.86   0.80   0.50
GTWF                 Log kW     1    0.35   0.16   0.17
GTWF                 S          2    0.58   0.24   0.95
VolSurf+ and GTWF    Log kW     3    0.88   0.72   0.08
VolSurf+ and GTWF    S          3    0.95   0.91   0.33
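The Q2 and RMSEP statistics defined in the caption can be computed from cross-validated predictions as sketched below. This is a minimal illustration using the common definitions Q2 = 1 - PRESS/TSS and RMSEP = sqrt(PRESS/n); the exact variants used in this work (for example, which mean enters the total sum of squares) are an assumption, and the observed and predicted values are hypothetical.

```python
import math

def q2_and_rmsep(y_obs, y_cv_pred):
    """Return (Q2, RMSEP) from observed values and cross-validated predictions."""
    n = len(y_obs)
    mean_y = sum(y_obs) / n
    # PRESS: predictive residual sum of squares over the left-out predictions
    press = sum((obs - pred) ** 2 for obs, pred in zip(y_obs, y_cv_pred))
    # TSS: total sum of squares around the mean of the observed values
    tss = sum((obs - mean_y) ** 2 for obs in y_obs)
    return 1.0 - press / tss, math.sqrt(press / n)

# Hypothetical observed S values and their cross-validated predictions
s_observed = [3.2, 4.1, 4.8, 5.5, 6.3]
s_predicted = [3.5, 4.0, 4.6, 5.9, 6.0]

q2, rmsep = q2_and_rmsep(s_observed, s_predicted)
print(f"Q2 = {q2:.2f}, RMSEP = {rmsep:.2f}")
```

R2 follows from the same expression when the fitted values of the full model replace the cross-validated predictions.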

Difficulties regarding steroid identification are encountered when considering isotopomeric steroids.

Quantitative structure retention relationship (QSRR) models were developed from the linear solvent strength theory.

A dataset composed of 91 endogenous steroids and VolSurf+ descriptors combined with a new dedicated molecular fingerprint were used.

The list of candidates associated with identical monoisotopic masses was strongly reduced, facilitating definitive steroid identification.

Table S1. Table of w stereocentre weights. R and S correspond to the rectus and sinister stereocentre configurations, and P relates to the planar/aromatic configuration of the atom in the gonane scaffold.

        1      2      3      4      5      6      7      8      9     10     11     12     13     14     15     16     17
R    4.87   4.85   4.86  -3.97  -4.77  -0.61   4.90   1.53  -5.02  -4.87  -2.60  -4.59   4.31  -5.84  -4.36  -0.63   0.67
S    2.24  -2.88  -4.47   2.17  -3.50   3.67  -1.97  -5.72   0.88   1.76   2.28  -0.81  -3.36  -2.88  -0.84   2.17  -3.50
P    3.20   1.63   0.88   5.93   5.40  -2.20  -0.87  -0.10  -5.58   2.32  -4.04   2.47  -3.15  -2.39   1.20   1.66  -4.02

Table S2. Table of c substituents/fragments with the corresponding weights.

[Fragment structure drawings not recovered; weights listed in extraction order]

9.96, 2.28, -0.37, -12.03, 5.95, 1.08, -6.10, 18.61, -0.55, 0.83, 11.68, -0.16, 4.07, -4.79, -5.19, -7.02, 1.50, -0.23


Table S3. Test of other algorithms.

Algorithm            Descriptors        Options                        R2fitting               Q2internal validation   RMSEP
                                                                       LogKw       S           LogKw       S           LogKw       S
ANN                  VolSurf+ and GTWF  LM algorithm, 10 Hidden Layer  0.99        0.99        0.63        0.81        0.11        0.47
SVR                  VolSurf+ and GTWF  RBF kernel                     0.99        0.98        0.74        0.87        0.09        0.39
RF                   VolSurf+ and GTWF  100 trees                      0.95        0.97        0.65        0.80        0.11        0.49
Stepwise regression  VolSurf+ and GTWF  Forward, F-ratio               0.86        0.92        0.81        0.90        0.08        0.34
ANN                  VolSurf+           LM algorithm, 10 Hidden Layer  0.99        0.99        0.45        0.73        0.13        0.56
SVR                  VolSurf+           RBF kernel                     0.95        0.88        0.76        0.81        0.09        0.47
RF                   VolSurf+           100 trees                      0.95        0.97        0.64        0.78        0.11        0.50

Table S4. List of VolSurf+ descriptors used to build models.

V           WO4         ID3           HSA
S           WO5         ID4           PSAR
R           WO6         CD1           PHSAR
G           WN1         CD2           DRDRDR
W1          WN2         CD3           DRDRAC
W2          WN3         CD4           DRDRDO
W3          WN4         CD5           DRACAC
W4          WN5         CD6           DRACDO
W5          WN6         CD7           DRDODO
W6          IW1         CD8           ACACAC
W7          IW2         HL1           ACACDO
W8          IW3         HL2           ACDODO
D1          IW4         A             DODODO
D2          CW1         CP            SOLY
D3          CW2         POL           DD1
D4          CW3         MW            DD2
D5          CW4         FLEX          DD3
D6          CW5         FLEX_RB       DD4
D7          CW6         NCC           DD5
D8          CW7         DIFF          DD6
WO1         CW8         LOGP n-Oct    DD7
WO2         ID1         LOGP c-Hex    DD8
WO3         ID2         PSA

Table S5. Prediction of the external test set.

[Structure drawings not recovered; compounds are identified by name and molecular formula]

ID  Compound                            Formula   Monoisotopic  tR Exp   tR Pred  tR Exp   tR Pred  tR Exp   tR Pred  %Relative Error
                                                  Mass          14 min   14 min   45 min   45 min   60 min   60 min   45 min gradients
1   11-Dehydrocorticosterone            C21H28O4  344.1988      8.53     8.64     18.31    18.56    22.11    22.60    1.3%
2   11-Ketoetiocholanolone              C19H28O3  304.2038      9.55     9.80     20.86    20.86    25.42    26.35    2.6%
3   11-Oxoandrosterone                  C19H28O3  304.2038      9.59     9.99     20.82    22.06    25.49    26.98    4.2%
4   11-Hydroxytestosterone              C19H28O3  304.2038      8.22     8.06     17.34    16.81    21.05    20.30    1.9%
5   16-HydroxyDHEA                      C19H28O3  304.2038      8.32     7.88     17.67    16.41    21.36    19.82    5.3%
6   16-Hydroxyandrosterone              C19H30O3  306.2195      9.73     9.50     21.53    20.91    26.53    25.60    2.4%
7   16-Hydroxyetiocholanolone           C19H30O3  306.2195      9.63     9.09     21.4     19.84    26.41    24.24    5.6%
8   16-HydroxyDHEA                      C19H28O3  304.2038      8.01     7.71     16.76    15.97    20.26    19.28    3.7%
9   16-Hydroxytestosterone              C19H28O3  304.2038      8.33     7.67     17.56    15.86    21.28    19.13    7.9%
10  2-Methoxyestradiol                  C19H26O3  302.1882      10.48    9.70     23.49    21.48    28.83    26.32    7.4%
11  2-Hydroxytestosterone               C19H28O3  304.2038      8.52     8.17     18.12    17.22    22.07    20.986   4.1%
12  2-Hydroxytestosterone               C19H28O3  304.2038      8.54     8.10     18.16    17.00    22.11    20.57    5.2%
13  3-Hydroxyandrost-5-en-17-one        C19H28O2  288.2089      11.18    11.53    24.98    26.15    30.82    32.15    3.1%
14  3,5-Tetrahydrocorticosterone        C21H34O4  350.2457      9.11     9.52     20.01    21.15    24.31    25.96    4.5%
15  3,5-Tetrahydrodeoxycorticosterone   C21H34O3  334.2508      11.30    11.55    25.86    26.61    31.99    32.90    2.2%
16  3a.5-Tetrahydrocortisol             C21H32O2  366.2406      8.07     8.37     17.46    18.15    21.13    22.21    3.7%
17  5-Dihydroprogesterone                         316.2402      13.96    15.00    32.81    35.48    40.92    44.00    7.5%
18  Cortisol (hydrocortisone)           C21H30O5  362.2093      7.62     7.57     15.98    15.75    19.17    19.04    0.6%
19  Pregnanediol                        C21H36O2  320.2715      12.77    14.02    29.75    33.39    36.98    41.55    9.8%

Figure S1. Plot of experimental vs. predicted tR, comparing the LSS model and the QSRR models.