
Predicting Protein Ligand Binding Affinities

Statistics Assessment
We have been given a set of data containing experimental and
calculated binding affinities of different drugs with five proteins.
Each protein interacts with a drug that has a common backbone;
then different ligands are attached to the backbone to see how the
binding affinity for that protein is affected. The data given contains
an experimental measurement of the binding energy for each
protein-drug interaction. This is used to evaluate the various
computational methodologies for calculating the binding energy,
which should be correlated to the experimental measurements.

Statistical Methods
There are two published sets of data that have been recalculated
using newer versions of the software. These are compared to see
whether the results from the new software correlate with the
originals. There is then a group of results that take the
configurations from the programs CDOCKER and Glide and rescore the
binding energies using various solvation models. These are compared
to the experimental values to see whether the rescoring is
worthwhile and leads to improved results.
To compare the data, correlation tests are used. These give a value
between -1 and 1, with 1 being perfectly correlated and -1 perfectly
anti-correlated. Values close to 0 suggest there is no correlation.
Pearson r and Kendall tau are used to compare the data. Pearson r
measures the strength of the linear relationship between two sets
of data. Kendall tau compares the ranks of the data, and gives a
high correlation if the ordering of one set matches the ordering of
the other. Kendall tau was chosen over Spearman rank as the
rank-ordering test because it is less sensitive to errors and
outliers, and gives more reliable p-values with small sample sizes
such as ours.
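
The difference between the two statistics can be illustrated with a minimal sketch. The energies below are hypothetical values invented for the example, not taken from the data set, and the Kendall function is the simple tau-a form that assumes no tied values. It is meant only to show how one outlier affects each statistic, not to reproduce the analysis software used for the tables.

```python
import math

def pearson_r(x, y):
    """Pearson r: covariance scaled by both standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) pairs over all pairs.
    Assumes no tied values; tied data would need the tau-b correction."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            score += (product > 0) - (product < 0)
    return score / (n * (n - 1) / 2)

# Hypothetical energies, invented for illustration (not from the data set).
experiment = [-12.3, -12.0, -9.9, -9.4, -8.9, -8.7, -8.4]
# Same ordering as experiment, except the last ligand is a large outlier.
calculated = [-60.0, -58.0, -55.0, -54.0, -52.0, -51.0, -90.0]

r = pearson_r(experiment, calculated)
tau = kendall_tau(experiment, calculated)
print(f"Pearson r = {r:.3f}, Kendall tau = {tau:.3f}")
```

In this example 15 of the 21 ligand pairs are still in the correct order, so tau stays positive (9/21, about 0.43), while the single outlier is enough to make Pearson r negative.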
We can also use the Kruskal-Wallis test to determine whether an
apparent difference in results is due to random fluctuations within
one underlying distribution, or whether there is a statistically
significant difference between the data (p-value below a chosen
threshold, usually 0.05). Kruskal-Wallis is a non-parametric version
of ANOVA: unlike ANOVA, it does not assume that the underlying data
are normally distributed.
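
As a rough sketch of what the Kruskal-Wallis test computes, the code below pools all observations, ranks them (ties share the average rank), forms the H statistic from the rank sums of each group, and converts H to a p-value using the chi-square approximation with (number of groups - 1) degrees of freedom. The function names and the example groups are assumptions made for illustration; a statistics package would normally be used instead.

```python
import math
from collections import Counter

def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    term, total = 1.0, 0.0
    for i in range((df - 1) // 2):
        if i > 0:
            term *= x / (2 * i + 1)
        total += term
    tail = math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total
    return math.erfc(math.sqrt(x / 2)) + tail

def kruskal_wallis(groups):
    """Return (H, p). Assumes the pooled values are not all identical."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * h - 3 * (n + 1)
    # Standard correction for tied values.
    ties = sum(c ** 3 - c for c in Counter(pooled).values())
    h /= 1 - ties / (n ** 3 - n)
    return h, chi2_sf(h, len(groups) - 1)

# Hypothetical groups, invented for illustration.
h, p = kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f"H = {h:.2f}, p = {p:.4f}")
```

For these three fully separated groups H is 7.2 and the p-value is about 0.027, which is below the 0.05 threshold, so the groups would be judged to differ.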

1a) Both correlation tests are used to show how the software
versions compare, with the results summarised in the following table.

                       Pearson r                Kendall tau
Protein                Docking     E-Novo       Docking     E-Novo
β-Secretase            0.3981015   0.7729043    0.7142857   0.5238095
Factor Xa              0.6341803   0.2333622    0.4285714   0.1774892
HIV1 Protease          0.9832536   0.06793663   0.8974359   0.1538462
Src Tyrosine Kinase    0.8992893   0.750692     0.6111111   0.6666667
Thrombin               0.9054913   0.6542668    0.7272727   0.5757576

The two protocols used are docking and e-novo. Docking gives an
estimate of the binding energy using a simplified scoring function,
while e-novo uses the MM-GBSA method to calculate the binding
energy. Looking at the scatter plots can help to assess which
method of comparison is preferable.

[Scatter plots: β-Secretase, Factor Xa, HIV1 Protease, Src Tyrosine Kinase, Thrombin]

These plots show that the docking versions compare very well
for HIV1 Protease. The Pearson value is higher because there is a
clear linear relationship between the data, whereas the Kendall value
is slightly lower because some of the data are out of order, so the
ranks do not match as well. Similarly, the docking values for Src
Tyrosine Kinase and Thrombin have fairly good Pearson values, while
the Kendall values are lower. In the β-Secretase plots the docking
values match well for the most part, whereas the e-novo values do
not. Because of an outlier in the docking data, the Pearson values
show the opposite of this, whereas the Kendall values agree with
the plot. For this purpose, where we are trying to predict which
drug molecule will perform best, the order rather than the absolute
value is most important. Kendall tau gives a better representation
of the order and is less affected by outliers, so it is the better
statistic to use. From the table, only the docking values for HIV1
Protease agree well (Kendall tau 0.90), with the β-Secretase and
Thrombin docking values agreeing reasonably well (0.71 and 0.73).
All other correlations between the software versions are poor. The
good value for HIV1 Protease is probably because it is a protein
commonly used to parameterise software, so it may have been used in
the docking parameterisation and is therefore likely to give a good
result.

1b) All the calculations are then compared to the experimental data
to test their reliability at predicting the binding energies. The
boxplots of all the data for β-Secretase are given below.

[Boxplots for β-Secretase: CDOCKER, Glide]

These show that, because of the differences in the absolute values
of the data, a direct comparison between the values would be futile.
Therefore a correlation test is run comparing each method to the
experimental results.

Pearson R for CDOCKER

BS

FX

HIV

Published.Docki
ng.
Published.E.No
vo

0.150193
086
0.458988
242

0.702377 0.726222
53
2
0.542172 0.297188
09
46

Calculated.Doc
king.
Calculated.E.No
vo

0.533193
975
0.308323
829

0.342519
07
0.116512
34
0.160053
58
0.287518
46
0.073103
25
0.172195
49
0.110272
75
0.155325
93
0.055896
34

GBOBC1.bondi.
GBOBC1.mbon
di2.

0.559628
897
0.269487
855
0.255397
166
0.226624
285

GBOBC2.bondi.
GBOBC2.mbon
di2.

0.243324
324
0.239919
032

Gbn.

0.008088
969

Prime.
GBHCT.

Pearson R for Glide


BS

FX

0.675751
62
0.059595
77
0.366010
4
0.617587
25
0.340486
86
0.533463
74

SRC
0.324962
6
0.872547
6
0.120297
4
0.747189
0.741116
3

T
0.196286
6
0.821611
5
0.042641
7
0.664411
7
0.444049
8
0.459003
1
0.392025
8
0.395236
6

0.537831
98
0.538168
35

0.490098
5
0.387818
4

0.391382
82

0.485523
4

HIV

SRC

Docking.

0.119282
3

0.419266
4

0.85004
14

0.770229
59

Prime.

0.600303
2

0.253477
4

0.61752
42

GBHCT.

0.420875
9

0.702107
9

0.64370
29

0.744451
47
0.033898
74

T
0.84727
36
0.30081
23
0.53378
17

GBOBC1.bondi.
GBOBC1.mbon
di2.
GBOBC2.bondi.
GBOBC2.mbon
di2.
Gbn.

0.366162
4

0.623586
3

0.66463
45

0.284300
4
0.237496
8
0.297454
8
0.128451
8

0.689911
3
0.636080
6

0.74715
62
0.72090
65
0.65339
25
0.57394
56

Kendall tau for CDOCKER


BS

0.582852
0.628740
7

FX

HIV

0.33333
33
0.39393
94
0.21212
12
0.24242
42

0.46153
85
0.58974
36

0.30303
03
0.15151
52

0.41025
64

0.30303
03

0.64102
56
0.43589
74

Calculated.Doc
king.
Calculated.E.No
vo

0.333333
33
0.142857
14

0.3260900
4
0.1782625
5
0.1695668
2
0.2304369
6
0.0304350
7
0.1521753
5
0.0739137
4
0.1000009
5
0.0478265
4

0.69230
77
0.10256
41

GBOBC1.bondi.
GBOBC1.mbon
di2.
GBOBC2.bondi.
GBOBC2.mbon
di2.
Gbn.

0.047619
05
0.047619
05
0.047619
05

0.43589 0.514495
74
8
0.48717
95
0.33333
33
0.41025
64

0.5782663
4
0.4217431
2

GBHCT.

0.55361
83
0.66308
54
0.46873
53
0.65367
65

0.21212
12
0.69696
97

0.047619
05
0.238095
24

Prime.

SRC

0.65775
41

0.171498
6
0.743160
5
0.114332
4
0.457329
6

Published.Docki
ng.
Published.E.No
vo

0.428571
43
0.047619
05
0.047619
05
0.047619
05

0.107626
53
0.158076
26
0.058763
12
0.101623
57
0.196666
97

0
0.45454
55

Kendall tau for Glide

                   BS           FX          HIV         SRC         T
Docking.           0.04761905   0.1608711   0.7692308   0.6288281   0.60606061
Prime.             0.42857143   0.1347839   0.5641026   0.6288281   0.09090909
GBHCT.             0.23809524   0.6130493   0.3846154   0.0571662   0.51515152
GBOBC1.bondi.      0.23809524   0.4478303   0.3076923   0.1143324   0.51515152
GBOBC1.mbondi2.    0.23809524   0.5521791   0.4102564   0.0571662   0.45454545
GBOBC2.bondi.      0.23809524   0.5000047   0.3333333   0.0571662   0.66666667
GBOBC2.mbondi2.    0.23809524   0.5087005   0.3846154   0.1143324   0.54545455
Gbn.               0.14285714   0.5521791   0.2051282   0.285831    0.51515152

None of these suggest any consistent correlation across any of the
methods, with the only values above 0.8 coming from the Pearson
tests. As discussed earlier, these are less likely to represent the
correct ordering of the binding energies, which is the most
important outcome of the calculations. Looking at the boxplots for
all these tables provides a clearer representation.

[Boxplots: Pearson R for CDOCKER, Pearson R for Glide, Kendall tau for CDOCKER, Kendall tau for Glide]

Both the Pearson and Kendall results show correlation averages from
0.2-0.6 for CDOCKER, while for Glide Kendall gives 0.2-0.6 and
Pearson 0.4-0.8. This shows that none of the methods is
particularly well correlated with the experimental values, suggesting
that none of them is very reliable for predicting binding energies.

1c) Looking at the boxplots, it appears that the published e-novo
results are best for CDOCKER while the docking results are best for
Glide, as these have the largest average correlation to the
experimental energies. However, this does not give any indication as
to whether these differences are statistically significant. To
further the analysis, a new approach is taken using the ideas of the
Friedman test. This is another non-parametric ANOVA alternative,
which ranks the values within each block and compares those ranks,
rather than ranking all the observations pooled together as
Kruskal-Wallis does. This has been implemented by taking the values
of the binding energies for each protocol and converting them to
their ranks, for example:
Ligand No.   Binding Energy   Rank
1            -9.94            3
2            -12.00           2
3            -9.43            4
4            -12.30           1
5            -8.92            5
6            -8.73            6
7            -8.37            7
The experimental rank is then subtracted from each of the calculated
ranks, and the absolute value of the difference taken. A value of
zero therefore means the ligand has the correct rank, and any other
value represents how far the ligand is from its correct rank. A
method whose values are all far from zero is worse than a method
whose values are all close or equal to zero. We can then run a
Kruskal-Wallis test to examine whether the absolute differences in
the ranks differ significantly between protocols, resulting in the
following p-values.
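
The rank conversion and the absolute rank difference described above can be sketched as follows. The calculated energies are the seven values from the example table, so the ranks can be checked against it. The experimental energies are hypothetical, invented purely to show the subtraction step, and the function names are illustrative. Rank 1 is given to the most negative energy and ties are assumed not to occur.

```python
def rank_energies(energies):
    """Rank 1 = most negative (strongest) binding energy. Assumes no ties."""
    order = sorted(range(len(energies)), key=lambda i: energies[i])
    ranks = [0] * len(energies)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks

def abs_rank_differences(calculated, experimental):
    """Absolute difference between calculated and experimental rank per ligand."""
    calc_ranks = rank_energies(calculated)
    exp_ranks = rank_energies(experimental)
    return [abs(c - e) for c, e in zip(calc_ranks, exp_ranks)]

# Binding energies from the example table (ligands 1 to 7).
calculated = [-9.94, -12.00, -9.43, -12.30, -8.92, -8.73, -8.37]
# Hypothetical experimental energies, invented for illustration only.
experimental = [-10.1, -11.5, -11.0, -12.2, -9.0, -8.1, -8.6]

print(rank_energies(calculated))                       # ranks as in the table
print(abs_rank_differences(calculated, experimental))  # 0 = correct rank
```

Each protocol then contributes one such list of absolute differences per protein, and these lists are the groups passed to the Kruskal-Wallis test.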
                       CDOCKER     Glide
β-Secretase            0.9696      0.9998
Factor Xa              0.002931    0.01811
HIV 1 Protease         0.339       0.7422
Src Tyrosine Kinase    0.3436      0.1267
Thrombin               0.1999      0.5764
All separate           0.3219      0.7534
All combined           0.7675

This shows that the only protein within which the methods differ
significantly is Factor Xa (using 0.05 as the threshold p-value). If
all proteins are considered together, whether the CDOCKER and Glide
data are combined or kept separate, there is no statistically
significant difference between the methods. Comparing the boxplots
of Factor Xa with those of the protein with the next lowest
p-values, Src Tyrosine Kinase, shows how much more the method means
vary for Factor Xa.

[Boxplots: Factor Xa CDOCKER, Factor Xa Glide, Src Tyrosine Kinase CDOCKER, Src Tyrosine Kinase Glide]

The mean values for Factor Xa span a wider range, suggesting the
methods give significantly different results from each other. The
Src Tyrosine Kinase means span a range of only three, showing a much
closer distribution and suggesting all the methods are equivalent.
The protocols with a mean closer to zero are the better methods, as
zero represents the experimental ordering. For Factor Xa these are
published docking in CDOCKER and GBHCT in Glide.
This is reinforced by an ANOVA test. The following tables summarise
the difference between each method's mean and the overall mean of
all the data for Factor Xa. The more negative the value, the
closer the mean for that method is to zero and the better the
method. The most negative values are for published docking and
GBHCT.

Factor Xa CDOCKER: overall mean = 6.475207

Calculated.Docking.    -1.157
Calculated.E.Novo      -0.157
GBHCT.                 -0.8388
Gbn.                    1.0248
GBOBC1.bondi.           1.2521
GBOBC1.mbondi2.        -0.0207
GBOBC2.bondi.           1.7975
GBOBC2.mbondi2.         0.343
Prime.                  1.9793
Published.Docking.     -2.9298
Published.E.Novo       -1.2934

Factor Xa Glide: overall mean = 4.272727

Docking.                2.4091
GBHCT.                 -1.5
Gbn.                   -0.9545
GBOBC1.bondi.          -0.2727
GBOBC1.mbondi2.        -0.9091
GBOBC2.bondi.          -0.4545
GBOBC2.mbondi2.        -0.6364
Prime.                  2.3182
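
The quantities in these tables, each method's mean minus the overall mean, can be reproduced with a short sketch. The method names and the absolute rank differences below are hypothetical, chosen only so the arithmetic is easy to follow; they are not taken from the Factor Xa data.

```python
def mean_deviations(groups):
    """Return the overall mean and each group's mean minus the overall mean.

    groups maps a method name to its list of absolute rank differences.
    """
    pooled = [v for values in groups.values() for v in values]
    overall = sum(pooled) / len(pooled)
    deviations = {
        name: sum(values) / len(values) - overall
        for name, values in groups.items()
    }
    return overall, deviations

# Hypothetical absolute rank differences for three illustrative methods.
groups = {
    "method_A": [0, 1, 1, 2],
    "method_B": [3, 4, 2, 5],
    "method_C": [1, 0, 2, 3],
}

overall, deviations = mean_deviations(groups)
print(f"overall mean = {overall}")
for name, value in deviations.items():
    print(f"{name}: {value:+.2f}")
```

When every method has the same number of ligands the deviations sum to zero, which is a quick consistency check on tables of this kind. Here method_A has the most negative deviation and so sits closest to the experimental ordering.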

1d) We can now run a similar test, except that instead of grouping
the data by method we group it by the ligands within each protein.
This will help us to understand whether there are specific ligands
that do badly across all the different methods. We cannot combine
the data between proteins, however, as ligand 1 on Factor Xa does
not correspond to ligand 1 on Thrombin, for example. We can, however,
combine all the Glide and CDOCKER data within each protein. The
following table gives the p-values, showing which proteins have
significantly differing ligands.
                       CDOCKER     Glide       Combined
β-Secretase            1.18E-01    2.84E-07    2.20E-16
Factor Xa              2.42E-07    5.10E-09    3.60E-13
HIV 1 Protease         1.27E-07    4.52E-09    1.26E-14
Src Tyrosine Kinase    0.03109     0.01639     0.0006762
Thrombin               1.55E-05    0.01828     1.04E-05
This shows that all the ligands differ by a statistically significant
amount (again using 0.05 as the threshold p-value). We can now
examine the boxplots to see which ligands correlate well with the
experimental ordering (mean close to zero), and which do not (mean
far from zero).

[Boxplots by ligand: β-Secretase, Factor Xa, HIV 1 Protease, Src Tyrosine Kinase, Thrombin]

The worst ligands for each protein can clearly be seen on each
graph: ligand six for β-Secretase; ligands three, ten and sixteen for
Factor Xa; ligand ten for HIV 1 Protease; ligand seven for Src
Tyrosine Kinase; and ligand seven for Thrombin. The ANOVA test can be
run again, giving the following deviations from the mean, with the
most negative values marking the best performing ligands and the
most positive the worst performing ligands.

β-Secretase: overall mean = 1.834586
ligand 1   ligand 2   ligand 3   ligand 4   ligand 5   ligand 6   ligand 7
1.218      0.5865     -1.203     -1.0977    -0.5188    2.5338     -1.5188

Factor Xa: overall mean = 5.547847
ligand 1   ligand 2   ligand 3   ligand 4   ligand 5   ligand 6   ligand 7
0.084      -1.1       3.978      2.057      0.136      -3.074     -2.548
ligand 8   ligand 9   ligand 10  ligand 11  ligand 12  ligand 13  ligand 14
-0.653     -3.706     4.768      -1.127     0.347      -0.285     -1.443
ligand 15  ligand 16  ligand 17  ligand 18  ligand 19  ligand 20  ligand 21
-0.969     4.136      -2.916     1.768      -0.048     0.557      -0.758
ligand 22
0.794

HIV 1 Protease: overall mean = 2.550607
ligand 1   ligand 2   ligand 3   ligand 4   ligand 5   ligand 6   ligand 7
-1.551     0.028      1.291      -0.919     -1.709     -1.498     -0.551
ligand 8   ligand 9   ligand 10  ligand 11  ligand 12  ligand 13
-0.287     1.081      3.713      -0.498     -0.287     1.186

Src Tyrosine Kinase: overall mean = 2.299145
ligand 1   ligand 2   ligand 3   ligand 4   ligand 5   ligand 6   ligand 7
-1.0299    -1.2991    -0.5684    -1.0299    -0.1453    0.8547     3.0855
ligand 8   ligand 9
0.5085     -0.3761

Thrombin: overall mean = 2.763158
ligand 1   ligand 2   ligand 3   ligand 4   ligand 5   ligand 6   ligand 7
-0.1316    -0.8684    -0.0263    0.2368     -1.0263    1.0263     1.7105
ligand 8   ligand 9   ligand 10  ligand 11  ligand 12
-1.6053    1.4474     -0.9737    0.7105     -0.5
This again just gives a numerical indication of the results that can
be seen from examining the boxplots. The ANOVA test is not strictly
applicable to these results, as they are not normally distributed, so
its absolute values should not be relied upon, but it helps to
strengthen the argument and makes the boxplots clearer.

Conclusions
1e) These results show that there is no statistically significant
difference between any of the rescored methods and the original
calculations, on either the old or the new software, when compared to
the experimental values. This all assumes the experimental values
are correct and free of error, which will of course not be
true. However, as we are principally considering whether the order is
correct rather than the absolute values, since this is what matters
when choosing which drug will work best, the error will have
less of an effect.
It is also clear, however, that the calculations perform differently
depending on the ligand attached to the drug molecule. This is an
obvious shortcoming in the protocols that is present across all the
tested proteins. Future work could investigate how these ligands
affect the results, as this could provide valuable insight into
improving the performance of the calculations.
