3-Dimensional
Quantitative Structure-Activity Relationship
Unit-IV
Contents
1. Conceptual Introduction
3. More Considerations (CoMFA)
4. Beyond CoMFA (CoMFA/CoMFA)
Multiple Simultaneous Modeling: MFA
More Explicit Receptor: RSA
Lock & Key: 5D QSAR (Induced Fit)
More Graphical Understanding, Variable Selection: GOLPE
Pharmacophore Modeling: HypoGen
Active Conformer/Superposition: Topomer CoMFA
5. 3D-QSAR
3D-QSAR Introduction
Field
CoMFA
PLS
Cross-validation
Experiments
What to do?
(3D QSAR)
Find 3D Information
Molecular Representation
1D: benzene
2D
3D
4D: ensemble of conformations, orientations, protonation states
3D-QSAR Introduction
What is CoMFA?
Steric Properties
Electrostatic Properties
(3D) QSAR
Descriptors: SA, Volume, Steric Field, Electrostatic Field, ...
Activities: Ki, IC50, MIC
Electrostatic potential: $V_E(r) = \dfrac{q}{r}$
CoMFA Comparative Molecular Field Analysis
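As an illustration of how steric and electrostatic fields can be sampled, the sketch below places a probe at each point of a regular grid and sums a Coulomb term (q/r) and a 12-6 Lennard-Jones term over the atoms. The function name, the probe charge, the Lennard-Jones parameters, the 30 kcal/mol truncation and the toy two-atom molecule are all assumptions chosen for the example; they are not taken from the slides.

```python
import numpy as np

def comfa_fields(coords, charges, grid, probe_charge=1.0,
                 epsilon=0.1, r_min=3.5, cutoff=30.0):
    """Steric and electrostatic probe energies (kcal/mol) at each grid point.

    epsilon, r_min and cutoff are illustrative values, not from the slides.
    """
    coords = np.asarray(coords, dtype=float)
    charges = np.asarray(charges, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Distance from every grid point to every atom: shape (n_grid, n_atoms).
    r = np.linalg.norm(grid[:, None, :] - coords[None, :, :], axis=2)
    r = np.maximum(r, 1e-6)  # guard against a probe sitting exactly on an atom
    # Coulomb term q/r; 332.06 converts e^2/Angstrom to kcal/mol.
    electrostatic = 332.06 * probe_charge * (charges[None, :] / r).sum(axis=1)
    # 12-6 Lennard-Jones term with its minimum (-epsilon) at r = r_min.
    x = (r_min / r) ** 6
    steric = (epsilon * (x ** 2 - 2.0 * x)).sum(axis=1)
    # Truncate the large energies found close to the atoms.
    return np.minimum(steric, cutoff), np.clip(electrostatic, -cutoff, cutoff)

# Hypothetical two-atom "molecule" on a coarse 5 x 5 x 5 grid (2 Angstrom spacing).
atoms = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]]
partial_charges = [-0.4, 0.4]
axis = np.arange(-4.0, 6.0, 2.0)
grid = np.array([[x, y, z] for x in axis for y in axis for z in axis])
steric, electrostatic = comfa_fields(atoms, partial_charges, grid)
print(steric.shape, electrostatic.shape)
```

Each molecule then contributes one row of steric values and one row of electrostatic values, one column per grid point.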
CoMFA Assumptions
Classical Methods
One parameter : Linear Regression
Many Parameters : Multilinear Regression
Multilinear Regression
MLR was developed for situations in which the number of objects (N) is at least five
times larger than the number of variables (X).
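A minimal sketch of multilinear regression with the object-to-variable check, assuming ordinary least squares via NumPy. The function name `fit_mlr`, the `min_ratio` argument and the synthetic data are hypothetical and only illustrate the rule.

```python
import numpy as np

def fit_mlr(X, y, min_ratio=5):
    """Least-squares fit of y = b0 + b1*x1 + ... ; returns [b0, b1, ...]."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_objects, n_variables = X.shape
    # Refuse to fit when there are too few objects per variable.
    if n_objects < min_ratio * n_variables:
        raise ValueError(
            f"{n_objects} objects for {n_variables} variables: "
            f"need at least {min_ratio} objects per variable"
        )
    design = np.column_stack([np.ones(n_objects), X])  # intercept column first
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Synthetic example: 10 compounds, 2 descriptors, exact linear relationship.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(10, 2))
y_demo = 1.0 + 2.0 * X_demo[:, 0] - 1.0 * X_demo[:, 1]
print(fit_mlr(X_demo, y_demo))
```

A CoMFA table has thousands of field columns for a few dozen compounds, so this check fails immediately, which is why a latent-variable method is used instead.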
PLS analysis:
Belongs to the same family of techniques as PCR (principal component regression).
Use of principal component analysis in regression:
First, the X and/or Y matrices are reduced to principal components, also called latent variables (LVs).
Second, a regression is performed between these latent variables.
Different types:
PCR: reduction of the X matrix only, followed by regression with the Y variable.
PLS: reduction of the X matrix using the variability of the Y matrix.
Partial Least Squares Analysis (PLS)
Y vector (biological activity, BA) | X matrix (steric S and electrostatic E values at m grid points, n compounds)
BA1 | S11 S12 S13 S14 ... S1m | E11 E12 ... E1m
BA2 | S21 S22 S23 S24 ... S2m | E21 E22 ... E2m
BA3 | S31 S32 S33 S34 ... S3m | E31 E32 ... E3m
...
BAn | Sn1 Sn2 Sn3 Sn4 ... Snm | En1 En2 ... Enm
[Figure labels: PLS vectors, PCA vectors]
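To make the table concrete, here is a small sketch of PLS for a single activity column (often called PLS1), written with the NIPALS-style deflation loop. It is one common formulation, not necessarily the one used by any particular CoMFA program. The function name and the random steric/electrostatic blocks and activities are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Return (beta, intercept) so that y_hat = X @ beta + intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean        # centred residual matrices
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr                      # weight vector: X directions covarying with y
        norm = np.linalg.norm(w)
        if norm < 1e-12:                   # nothing left to explain
            break
        w = w / norm
        t = Xr @ w                         # latent variable (score) for each compound
        tt = t @ t
        p = Xr.T @ t / tt                  # X loading
        c = yr @ t / tt                    # regression of y on the score
        Xr = Xr - np.outer(t, p)           # deflate X
        yr = yr - c * t                    # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)
    return beta, y_mean - x_mean @ beta

# Hypothetical data: 6 compounds, 10 steric + 10 electrostatic grid values each.
rng = np.random.default_rng(0)
S = rng.normal(size=(6, 10))
E = rng.normal(size=(6, 10))
X_fields = np.hstack([S, E])
activity = np.array([5.1, 6.3, 7.0, 4.8, 6.9, 5.5])
beta, intercept = pls1_fit(X_fields, activity, n_components=2)
print(X_fields @ beta + intercept)
```

The entries of `beta` map back to grid points, which is what allows the coefficients to be drawn as contour maps around the molecules.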
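Cross-validation procedure (figure):
1. Original table: divide the compounds into cross-validation groups.
2. Exclude one group of compounds and derive a model (PLS analysis or SAMPLS analysis).
3. Predict the activity of the excluded compounds.
4. Sum the squared differences between predicted and measured activity to obtain PRESS.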
$SSD = \sum (y_{actual} - \bar{y})^2$
$PRESS = \sum (y_{predicted} - y_{actual})^2$
Significant statistical results
$q^2 = 1.0 - \dfrac{PRESS}{SSD}$
$q^2 = 0.00$: No Model!
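The leave-one-out version of this calculation can be sketched as follows. Ordinary least squares is used here as a stand-in for the PLS model to keep the example short; the function name and the synthetic data are assumptions.

```python
import numpy as np

def loo_q2(X, y):
    """Leave-one-out q^2 = 1 - PRESS/SSD for a least-squares model."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i                      # exclude compound i
        coef, *_ = np.linalg.lstsq(design[keep], y[keep], rcond=None)
        predicted = design[i] @ coef                  # predict the excluded compound
        press += (predicted - y[i]) ** 2
    ssd = ((y - y.mean()) ** 2).sum()
    return 1.0 - press / ssd

# Synthetic data: one descriptor, with and without noise.
rng = np.random.default_rng(0)
x = rng.normal(size=(15, 1))
y_clean = 3.0 + 2.0 * x[:, 0]
y_noisy = y_clean + rng.normal(scale=1.0, size=15)
print(loo_q2(x, y_clean), loo_q2(x, y_noisy))
```

Because every prediction is made for a compound the model has not seen, q^2 can be negative, which means the model predicts worse than the mean activity.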
[Figure: straight line, $Y = aX + b$ (example: weight/height); (B) Curved Line: Perfect Fitting, $Y = aX^3 + bX^2 + cX + d$ (example: phosphoric acid charge/pH)]
Cross-validation
[Figure: two panels, Q^2 and final r^2, each plotted against the number of components (0 to 15)]
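Choosing the number of components from a cross-validated curve can be sketched as below. To keep the block short it uses principal component regression (reduction of the X matrix only) as a simpler stand-in for PLS; the selection loop is the same for either method. Function names and data are hypothetical.

```python
import numpy as np

def pcr_predict(X_train, y_train, X_test, k):
    """Principal component regression with k components."""
    mu, y_mean = X_train.mean(axis=0), y_train.mean()
    U, s, Vt = np.linalg.svd(X_train - mu, full_matrices=False)
    scores = U[:, :k] * s[:k]                       # latent variables of the training set
    b = scores.T @ (y_train - y_mean) / s[:k] ** 2  # regression on orthogonal scores
    return (X_test - mu) @ Vt[:k].T @ b + y_mean

def q2_by_components(X, y, max_components):
    """Leave-one-out q^2 for 1..max_components components."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    ssd = ((y - y.mean()) ** 2).sum()
    result = {}
    for k in range(1, max_components + 1):
        press = 0.0
        for i in range(n):
            keep = np.arange(n) != i
            pred = pcr_predict(X[keep], y[keep], X[i:i + 1], k)[0]
            press += (pred - y[i]) ** 2
        result[k] = 1.0 - press / ssd
    return result

# Hypothetical table: 12 compounds, 30 field columns.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(12, 30))
y_demo = X_demo[:, :3].sum(axis=1) + 0.1 * rng.normal(size=12)
q2 = q2_by_components(X_demo, y_demo, max_components=5)
best_k = max(q2, key=q2.get)
print(q2, best_k)
```

The fitted r^2 keeps rising as components are added, whereas the cross-validated curve typically peaks and then declines, so the component count is usually taken near the maximum of q^2.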
[Figure: structures of the seven congeneric compound sets]
Set 1: 16 compounds
Set 2: 7 compounds
Set 3: 18 compounds
Set 4: 3 compounds
Set 5: 12 compounds
Set 6: 22 compounds
Set 7: 21 compounds
Training Set Selection
Sets 4 and 7: not enough active (7) or inactive (5) compounds.
Sets 1, 2, 3 and 5: poor distribution of biological activities.
Set 6: broad range and relatively well distributed biological activities.
[Figure: pIC50 (MAO A), scale 3 to 10, plotted for each congeneric set]
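One simple way to inspect the range and distribution of activities in a candidate training set is to compute the span and a coarse histogram. The helper name, the choice of four bins and the example pIC50 values are all hypothetical.

```python
def activity_spread(pic50, n_bins=4):
    """Return (span, counts per equal-width bin) for a list of pIC50 values."""
    lo, hi = min(pic50), max(pic50)
    span = hi - lo
    counts = [0] * n_bins
    for value in pic50:
        if span > 0:
            # Map the value to a bin; the maximum falls into the last bin.
            index = min(int((value - lo) / span * n_bins), n_bins - 1)
        else:
            index = 0
        counts[index] += 1
    return span, counts

# Hypothetical sets: one evenly spread, one clustered at low activity.
well_spread = [4, 5, 6, 7, 8]
clustered = [4.0, 4.1, 4.2, 4.3, 8.0]
print(activity_spread(well_spread))
print(activity_spread(clustered))
```

A set with empty bins between its extremes has a wide range but a poor distribution, which is the situation described for sets 1, 2, 3 and 5.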
Training Set Selection
N: number of components
CONCLUSION
[Figure: ligand with surrounding regions labelled Steric, Electrostatic (+ and -), Lipophilic and Polar]
Model Validation
Within the Training Set
Model Fitting
R2 : the square of the correlation coefficient
RMSE : root mean square error
ANOVA F value, p value
Internal Predictability
Cross-validated R2, Bootstrap R2 , y-Scrambling
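A sketch of two of these checks: the fitted R2 and RMSE, and a y-scrambling test in which the activities are randomly permuted and the model refitted. Least squares stands in for the real model, and the function names, number of trials and synthetic data are assumptions.

```python
import numpy as np

def r2_rmse(X, y):
    """Fitted R^2 and root mean square error of a least-squares model."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residual = y - design @ coef
    r2 = 1.0 - (residual ** 2).sum() / ((y - y.mean()) ** 2).sum()
    return r2, np.sqrt((residual ** 2).mean())

def y_scrambling(X, y, n_trials=100, seed=0):
    """R^2 values obtained after randomly permuting the activities."""
    rng = np.random.default_rng(seed)
    return np.array([r2_rmse(X, rng.permutation(y))[0] for _ in range(n_trials)])

# Synthetic data: 20 compounds, 2 descriptors, small noise.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 2))
y_demo = 1.0 + 2.0 * X_demo[:, 0] - X_demo[:, 1] + 0.05 * rng.normal(size=20)
r2, rmse = r2_rmse(X_demo, y_demo)
scrambled = y_scrambling(X_demo, y_demo)
print(r2, rmse, scrambled.mean())
```

If the scrambled models score nearly as well as the real one, the original correlation is likely to be a chance result.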
Three aspects
Leverage = How far from Reds
Discrepancy = Out of line with others
Influence = combined leverage and discrepancy
Lo Leverage, Hi Discrepancy: Mod Influence
Hi Leverage, Hi Discrepancy: Hi Influence
Hi Leverage, Lo Discrepancy: Mod Influence
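These three aspects have standard regression diagnostics. The sketch below uses the hat-matrix diagonal for leverage, the internally studentized residual for discrepancy, and Cook's distance for influence. Mapping the slide's terms onto these particular statistics is an assumption, as are the function name and the toy data.

```python
import numpy as np

def influence_stats(X, y):
    """Return (leverage, studentized residual, Cook's distance) per point."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    p = design.shape[1]                           # number of fitted parameters
    hat = design @ np.linalg.pinv(design)         # hat matrix: y_hat = hat @ y
    leverage = np.diag(hat)
    residual = y - hat @ y
    mse = (residual ** 2).sum() / (n - p)
    student = residual / np.sqrt(mse * (1.0 - leverage))
    cooks = student ** 2 * leverage / (p * (1.0 - leverage))
    return leverage, student, cooks

# Toy data: five points on y = 2x plus one distant point that is off the line.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 20.0])
y = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 55.0])
leverage, student, cooks = influence_stats(x.reshape(-1, 1), y)
print(leverage.round(3), cooks.round(3))
```

The distant point pulls the fitted line towards itself, so its raw residual is small even though its influence is by far the largest. This is why leverage and discrepancy are examined together.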
Residual Plot
Residual = observed Y - predicted Y
Pattern
(a) ideal
Detection of Outliers
Typical CoMFA Parameters
R2: > 0.8
R2CV or Q2 (cross-validated R2): > 0.5
Descriptors or components: > 5 compounds per descriptor or component
Data points: at least 10, optimally above 20
Podlogar, B.; Muegge, I. and Brice, J. Curr. Opin. Drug Discovery Dev., 2001, 4(1), 102-109.
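The thresholds above can be collected into a small checklist function. This is only a convenience sketch; the function name and the example statistics are hypothetical.

```python
def check_comfa_model(r2, q2, n_compounds, n_components):
    """Compare model statistics with the typical thresholds listed above."""
    return {
        "R2 > 0.8": r2 > 0.8,
        "Q2 > 0.5": q2 > 0.5,
        "> 5 compounds per component": n_compounds > 5 * n_components,
        "at least 10 data points": n_compounds >= 10,
        "optimally above 20 data points": n_compounds > 20,
    }

# Hypothetical model statistics.
report = check_comfa_model(r2=0.92, q2=0.61, n_compounds=22, n_components=3)
for rule, passed in report.items():
    print(f"{rule}: {'ok' if passed else 'FAIL'}")
```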
Summary: Factors Affecting CoMFA
Statistics
Others
Bioactive Conformation?
MCS detection
Field Fit
(dissimilar compounds)
(non atom center)
Superposition Methods
Molecular Skin
Regional Overlap
weighting contributions of one molecular region with this overlap
Polyhedra
Coalescing
Conclusion of CoMFA
Many Choices
Training/Test set
Conformer Generation
Superposition
Field Calculation
Variable Selection
Statistical Validation
Good 3D Information
Can test Dissimilar Compounds
$LOF = \dfrac{LSE}{\left(1 - \dfrac{c + d\,p}{M}\right)^{2}}$
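A short numerical sketch of the lack-of-fit score, reading the expression as LOF = LSE / (1 - (c + d*p)/M)^2. The meanings given to the symbols in the comments (c basis functions, d smoothing parameter, p features, M samples) follow the usual description of this score and are an assumption here, since the slide does not define them.

```python
def lack_of_fit(lse, c, d, p, m):
    """LOF = LSE / (1 - (c + d*p)/M)^2.

    Assumed meanings: lse = least-squares error, c = number of basis
    functions, d = smoothing parameter, p = number of features,
    m = number of samples.
    """
    shrink = 1.0 - (c + d * p) / m
    if shrink <= 0:
        raise ValueError("model is too complex for the number of samples")
    return lse / shrink ** 2

# Same fitting error, increasing model size: the penalty grows.
for terms in (2, 4, 8):
    print(terms, lack_of_fit(lse=2.0, c=terms, d=1.0, p=terms, m=24))
```

Because the denominator shrinks as terms are added, a larger model must reduce LSE substantially to achieve a better LOF, which discourages overfitting.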
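$V(r) = r - R$
$V(r) = -\dfrac{4r^{6}}{9R^{6}} + \dfrac{17r^{4}}{9R^{4}} - \dfrac{22r^{2}}{9R^{2}} + 1$ for $r < R$
$V(r) = 0$ for $r \ge R$
[Figure: $V(r)$ plotted against $r$, from 0 to 2R]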
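The two field functions can be evaluated directly, as sketched below. The sketch assumes the polynomial ends in + 1, so that it equals 1 at r = 0 and falls smoothly to 0 at r = R, where the V(r) = 0 branch takes over. The function names are hypothetical.

```python
def linear_field(r, R):
    """V(r) = r - R: negative inside radius R, zero at R, positive outside."""
    return r - R

def polynomial_field(r, R):
    """V(r) = -4r^6/(9R^6) + 17r^4/(9R^4) - 22r^2/(9R^2) + 1 for r < R, else 0."""
    if r >= R:
        return 0.0
    x = (r / R) ** 2
    return -4.0 / 9.0 * x ** 3 + 17.0 / 9.0 * x ** 2 - 22.0 / 9.0 * x + 1.0

# Sample both functions from the atom centre out to 2R (R chosen arbitrarily).
R = 1.7
for fraction in (0.0, 0.5, 1.0, 1.5, 2.0):
    r = fraction * R
    print(f"r = {fraction:.1f}R  linear = {linear_field(r, R):+.3f}  "
          f"polynomial = {polynomial_field(r, R):.3f}")
```

The polynomial takes the value 0.5 at r = R/2 and has zero slope at both ends of its range, so contributions from neighbouring atoms blend without a kink.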
Receptor Surface Model ( Features & Usage )
Surface Properties
Partial charges, electrostatic potential, hydrogen bonding propensity, hydrophobicity
Model Usage
Open / Closed model
Structures can be energy minimized within the receptor surface model (Alignment, Docking)
3D-QSAR
Virtual Screening (Catalyst/CatShape)
de novo design
Same Molecules, Different models, Different Predictions
0.1 kcal/(mol Å^2)
4D QSAR Examples
steric nature
Estimation of Free Energy
Primordial Envelope of Ligand Binding
Ligand repositioning
Evolution of the Points
GA
Averaged Receptor Envelope
Initial Family of Points
4D QSAR plus
HypoGen
Topomer CoMFA
HypoGen Conformer Generation
Poling Algorithm
(using potential for conformational diversity)
If a new conformer is similar to an existing conformer,
the new conformer is penalized by a potential.
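The idea of penalizing similarity can be sketched with a greatly simplified selection scheme: each candidate's energy is increased by a penalty that grows as it approaches conformers already accepted, and the lowest-scoring candidate is picked next. This is not the published poling function, which is applied during conformer generation itself. The inverse-power penalty, the weights, the function names and the three toy conformers are all assumptions.

```python
import numpy as np

def poling_penalty(candidate, accepted, weight=1.0, power=2):
    """Penalty that grows as the candidate approaches accepted conformers."""
    total = 0.0
    for conformer in accepted:
        # Root-mean-square coordinate difference as a crude similarity measure.
        distance = np.sqrt(((candidate - conformer) ** 2).mean())
        total += weight / max(distance, 1e-6) ** power
    return total

def select_conformers(conformers, energies, n_keep):
    """Greedily pick conformers by energy plus similarity penalty."""
    accepted, remaining = [], list(range(len(conformers)))
    while remaining and len(accepted) < n_keep:
        chosen = [conformers[j] for j in accepted]
        scores = [energies[i] + poling_penalty(conformers[i], chosen)
                  for i in remaining]
        best = remaining[int(np.argmin(scores))]
        accepted.append(best)
        remaining.remove(best)
    return accepted

# Toy conformers given as feature-point coordinates (2 points x 3 dimensions).
conf_a = np.zeros((2, 3))
conf_b = conf_a + 0.01          # almost identical to conf_a, slightly higher energy
conf_c = conf_a + 3.0           # clearly different, higher energy still
picked = select_conformers([conf_a, conf_b, conf_c], [0.0, 0.1, 0.5], n_keep=2)
print(picked)
```

Without the penalty the two lowest-energy conformers would be the near-duplicates; with it, the second pick is the distinct conformer.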
Ligands are represented by feature points (labelled H, D and A in the figure)
Reduction of Number of Points
HypoGen Theory
Feasible Models
Subtractive Phase
Optimization Phase
Scoring
Top Models
HypoGen Constructive Phase
1. Training Set
Adjust Torsions:
(1,3),(1,2),(5,8),(10,14)
Combi-Chem library
[Figure: fragment with numbered atoms used to define the torsions]
Calculate steric field like CoMFA
Topomer CoMFA Features
Single Conformation
A Fixed Core
[Figure: fragment with numbered atoms; the values 15, 0.852 and 0.85 appear in the figure]
Topomer CoMFA pros & cons
Fast
Virtual Screening Conscious
Lead Optimization
Lead Hopping (Lead Generation)
3D QSAR methods:
Very robust
Lead optimization
3D QSAR 3D SAR