Statistics Assessment
We have been given a data set containing experimental and
calculated binding affinities of different drugs with five proteins.
For each protein, the drugs share a common backbone, and
different ligands are attached to that backbone to see how the
binding affinity for the protein is affected. The data contain
an experimental measurement of the binding energy for each
protein-drug interaction. This is used to evaluate the various
computational methodologies for calculating the binding energy,
whose results should correlate with the experimental measurements.
Statistical Methods
There are two published sets of data that have been recalculated
using newer versions of the software. These are compared to see
whether the results from the new software correlate with the
originals. There is then a group of results that take the
configurations from the programs CDOCKER and Glide and rescore the
binding energies using various solvation models. These are compared
to the experimental values to see whether the rescoring is
worthwhile and leads to improved results.
To compare the data, correlation tests are used. These give a value
between -1 and 1, with 1 being perfectly correlated and -1 perfectly
anti-correlated; values close to 0 suggest there is no correlation.
Pearson r and Kendall tau are used to compare the data. Pearson r
measures whether there is a linear relationship between the data.
Kendall tau compares the ranks of the data, and gives a high
correlation if the ordering of one set matches the ordering of
the other. Kendall tau was chosen over Spearman rank as the
rank-ordering test because it is less sensitive to error and
outliers, and gives more reliable p-values with small sample sizes,
which we have.
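The difference between the two statistics can be illustrated with a minimal sketch. The functions below are plain-Python versions written for illustration only (Kendall tau-a, which assumes no ties), and the three series are made-up numbers, not the assessment data.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation: strength of the linear relationship."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) pairs / all pairs (no ties assumed)."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(list(zip(x, y)), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical values for seven ligands (illustration only).
published = [1, 2, 3, 4, 5, 6, 7]
with_outlier = [1, 2, 3, 4, 5, 6, 30]   # same order, one extreme value
with_swap = [2, 1, 3, 4, 5, 6, 7]       # linear, but two ligands swap rank

r_outlier = pearson_r(published, with_outlier)
tau_outlier = kendall_tau_a(published, with_outlier)
r_swap = pearson_r(published, with_swap)
tau_swap = kendall_tau_a(published, with_swap)

print(f"outlier: r = {r_outlier:.3f}, tau = {tau_outlier:.3f}")
print(f"swap:    r = {r_swap:.3f}, tau = {tau_swap:.3f}")
```

With the outlier the ordering is untouched, so tau stays at 1 while Pearson r falls to roughly 0.74. With the swapped pair, tau falls to 19/21 and Pearson r to 27/28.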
We can also use the Kruskal-Wallis test to determine whether an
apparent difference between groups of results is due to random
fluctuations within one underlying distribution, or whether the
difference is statistically significant (p-value below a chosen
threshold, usually 0.05). Kruskal-Wallis is a non-parametric version
of one-way ANOVA: it works on ranks and does not assume that the
underlying data are normally distributed, which ANOVA does.
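As a rough sketch of what the test computes, the code below ranks the pooled data, forms the H statistic with the usual tie correction, and converts it to a p-value using the chi-square approximation with (number of groups - 1) degrees of freedom. It is a standard-library illustration, not the software used for the assessment, and the three groups are invented numbers.

```python
import math
from collections import Counter


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df.

    Uses Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / gamma(a + 1) with h = x / 2.
    """
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))
    while a < df / 2.0:
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q


def kruskal_wallis(groups):
    """Return (H, p) for a list of groups of observations."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    h /= 1.0 - ties / (n ** 3 - n)  # tie correction (equals 1 with no ties)
    return h, chi2_sf(h, len(groups) - 1)


# Hypothetical results for three methods (illustration only).
h_stat, p_value = kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f"H = {h_stat:.3f}, p = {p_value:.4f}")
```

For small groups the chi-square approximation is only approximate, so p-values close to the threshold should be read with care.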
1a) Both tests are used to show how the software versions
compare, with the results summarised in the following table.
Protein               Pearson r               Kendall tau
                      Docking     E-Novo      Docking     E-Novo
β-Secretase           0.3981015   0.7729043   0.7142857   0.5238095
Factor Xa             0.6341803   0.2333622   0.4285714   0.1774892
HIV1 Protease         0.9832536   0.06793663  0.8974359   0.1538462
Src Tyrosine Kinase   0.8992893   0.750692    0.6111111   0.6666667
Thrombin              0.9054913   0.6542668   0.7272727   0.5757576
The two protocols used are docking and e-novo. Docking gives an
estimate of the binding energy using a simplified scoring function,
while e-novo uses the MM-GBSA method to calculate the binding
energy. Looking at the scatter plots can help to assess which
method of comparison is preferable.
[Scatter plots: β-Secretase, Factor Xa, HIV1 Protease, Thrombin]
These plots clearly show that the docking versions compare very well
for HIV1 Protease. The Pearson value is higher because there is a
clear linear relationship between the data, whereas the Kendall value
is slightly lower because some of the data are out of order, meaning
the ranks won't match as well. Similarly, the docking values for Src
Tyrosine Kinase and Thrombin have fairly good Pearson values, while
the Kendall values are lower. In the β-Secretase plots the docking
values match well for the most part, whereas the e-novo values do
not. Because of the outlier in the docking data, the Pearson values
show the opposite of this, whereas the Kendall values reinforce the
plot. For this purpose, where we are trying to predict which drug
molecule will perform best, the order rather than the absolute value
is most important. Kendall tau gives a better representation of the
order and is less affected by outliers, so it is the better statistic
to use. From the table, only the docking values for HIV1 Protease
perform well on this measure, with β-Secretase and Thrombin docking
doing reasonably well. All other correlation between the software
versions is poor, and the good value for HIV1 Protease is probably
because it is a protein commonly used to parameterise software, so it
may have been used in the docking parameterisation and would
therefore be likely to give a good result.
1b) All the calculations are then compared to the experimental data
to test their reliability at predicting the binding energies. The
boxplots of all the data for β-Secretase are given below.
[Boxplots for β-Secretase: CDOCKER, Glide]
BS
FX
HIV
Published.Docking.
Published.E.Novo
0.150193086
0.458988242
0.70237753 0.7262222
0.54217209 0.29718846
Calculated.Docking.
Calculated.E.Novo
0.533193975
0.308323829
0.34251907
0.11651234
0.16005358
0.28751846
0.07310325
0.17219549
0.11027275
0.15532593
0.05589634
GBOBC1.bondi.
GBOBC1.mbondi2.
0.559628897
0.269487855
0.255397166
0.226624285
GBOBC2.bondi.
GBOBC2.mbondi2.
0.243324324
0.239919032
Gbn.
0.008088969
Prime.
GBHCT.
FX
0.67575162
0.05959577
0.3660104
0.61758725
0.34048686
0.53346374
SRC
0.3249626
0.8725476
0.1202974
0.747189
0.7411163
T
0.1962866
0.8216115
0.0426417
0.6644117
0.4440498
0.4590031
0.3920258
0.3952366
0.53783198
0.53816835
0.4900985
0.3878184
0.39138282
0.4855234
HIV
SRC
Docking.
0.1192823
0.4192664
0.8500414
0.77022959
Prime.
0.6003032
0.2534774
0.6175242
GBHCT.
0.4208759
0.7021079
0.6437029
0.74445147
0.03389874
T
0.8472736
0.3008123
0.5337817
GBOBC1.bondi.
GBOBC1.mbondi2.
GBOBC2.bondi.
GBOBC2.mbondi2.
Gbn.
0.3661624
0.6235863
0.6646345
0.2843004
0.2374968
0.2974548
0.1284518
0.6899113
0.6360806
0.7471562
0.7209065
0.6533925
0.5739456
0.582852
0.6287407
FX
HIV
0.3333333
0.3939394
0.2121212
0.2424242
0.4615385
0.5897436
0.3030303
0.1515152
0.4102564
0.3030303
0.6410256
0.4358974
Calculated.Docking.
Calculated.E.Novo
0.33333333
0.14285714
0.32609004
0.17826255
0.16956682
0.23043696
0.03043507
0.15217535
0.07391374
0.10000095
0.04782654
0.6923077
0.1025641
GBOBC1.bondi.
GBOBC1.mbondi2.
GBOBC2.bondi.
GBOBC2.mbondi2.
Gbn.
0.04761905
0.04761905
0.04761905
0.4358974 0.5144958
0.4871795
0.3333333
0.4102564
0.57826634
0.42174312
GBHCT.
0.5536183
0.6630854
0.4687353
0.6536765
0.2121212
0.6969697
0.04761905
0.23809524
Prime.
SRC
0.6577541
0.1714986
0.7431605
0.1143324
0.4573296
Published.Docking.
Published.E.Novo
0.42857143
0.04761905
0.04761905
0.04761905
0.10762653
0.15807626
0.05876312
0.10162357
0.19666697
0
0.4545455
Prime.
GBHCT.
GBOBC1.bondi.
GBOBC1.mbondi2.
GBOBC2.bondi.
GBOBC2.mbondi2.
Gbn.
0.42857143
0.23809524
0.1347839
0.6130493
0.23809524
0.23809524
0.23809524
0.4478303
0.5521791
0.5000047
0.23809524
0.14285714
0.5087005
0.5521791
0.5641026 0.6288281
0.3846154 0.0571662
0.3076923 0.1143324
0.4102564 0.0571662
0.3333333 0.0571662
0.3846154 0.1143324
0.2051282 0.285831
0.09090909
0.51515152
0.51515152
0.45454545
0.66666667
0.54545455
0.51515152
Both the Pearson and Kendall results show average correlations of
0.2-0.6 for CDOCKER, while for Glide Kendall gives 0.2-0.6 and
Pearson 0.4-0.8. None of the methods is particularly well correlated
with the experimental values, suggesting that none of them is very
reliable for predicting binding energies.
Kinase
Thrombin
All separate
All combined
0.1999
0.5764
0.3219
0.7534
0.7675
This shows that the only protein within which the methods differ
significantly is Factor Xa (using 0.05 as the threshold p-value). If
you consider all proteins combined, whether you combine the CDOCKER
and Glide data or leave them separate, there is no statistically
significant difference between the methods. Comparing the boxplots
for Factor Xa with those for the protein with the next lowest score,
Src Tyrosine Kinase, shows how much more substantially the method
means vary for Factor Xa.
[Boxplots: Factor Xa CDOCKER, Factor Xa Glide]
Mean = 4.272727
Method            Deviation
Docking.           2.4091
GBHCT.            -1.5
Gbn.              -0.9545
GBOBC1.bondi.     -0.2727
GBOBC1.mbondi2.   -0.9091
GBOBC2.bondi.     -0.4545
GBOBC2.mbondi2.   -0.6364
Prime.             2.3182
1d) We can now run a similar test, except that instead of grouping
the data by method we group it by the ligands within each protein.
This will help us to understand whether there are specific ligands
that do badly across all the different methods. We cannot combine the
data between proteins, as ligand 1 on Factor Xa does not correspond
to ligand 1 on Thrombin, for example. However, we can combine all the
Glide and CDOCKER data within each protein. The following table gives
the p-values, summarising which proteins have significantly differing
ligands.
Protein               CDOCKER    Glide      Combined
β-Secretase           1.18E-01   2.84E-07   2.20E-16
Factor Xa             2.42E-07   5.10E-09   3.60E-13
HIV 1 Protease        1.27E-07   4.52E-09   1.26E-14
Src Tyrosine Kinase   0.03109    0.01639    0.0006762
Thrombin              1.55E-05   0.01828    1.04E-05
This shows that the ligands differ by a statistically significant
amount in almost every case (again using 0.05 as the threshold
p-value); the only exception is one β-Secretase entry (1.18E-01).
We can now
[Boxplots by ligand: β-Secretase, Factor Xa, HIV 1 Protease, Thrombin]
The worst ligands for each protein can clearly be seen on each
graph: ligand six for β-Secretase; ligand three, ten or sixteen for
Factor Xa; ligand ten for HIV 1 Protease; ligand seven for Src
Tyrosine Kinase; and ligand seven for Thrombin. The ANOVA test can be
run again, giving the following deviations from the mean, with the
most negative values marking the best-performing ligands and the most
positive the worst-performing ligands.
β-Secretase = 1.834586
ligand 1   ligand 2   ligand 3   ligand 4   ligand 5   ligand 6   ligand 7
 1.218      0.5865    -1.203     -1.0977    -0.5188     2.5338    -1.5188

Factor Xa = 5.547847
ligand 1   ligand 2   ligand 3   ligand 4   ligand 5   ligand 6   ligand 7
 0.084     -1.1        3.978      2.057      0.136     -3.074     -2.548
ligand 8   ligand 9   ligand 10  ligand 11  ligand 12  ligand 13  ligand 14
-0.653     -3.706      4.768     -1.127      0.347     -0.285     -1.443
ligand 15  ligand 16  ligand 17  ligand 18  ligand 19  ligand 20  ligand 21
-0.969      4.136     -2.916      1.768     -0.048      0.557     -0.758
ligand 22
 0.794

Thrombin = 2.763158
ligand 1   ligand 2   ligand 3   ligand 4   ligand 5   ligand 6   ligand 7
-0.1316    -0.8684    -0.0263     0.2368    -1.0263     1.0263     1.7105
ligand 8   ligand 9   ligand 10  ligand 11  ligand 12
-1.6053     1.4474    -0.9737     0.7105    -0.5
This again just gives a numerical indication of the results we can
see by examining the boxplots. The ANOVA test is not really
applicable to these results, as they are not normally distributed, so
the absolute values cannot be relied upon; they do, however, help to
strengthen the argument and make the boxplots clearer.
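The deviations quoted above are simply each group's mean minus the mean of all observations. A minimal sketch of that calculation is given below; the function name and the scores are hypothetical and are not the assessment data.

```python
from statistics import mean


def deviations_from_grand_mean(groups):
    """Return (grand_mean, {label: group_mean - grand_mean}).

    `groups` maps a label (for example a ligand) to its list of observations.
    """
    pooled = [v for values in groups.values() for v in values]
    grand = mean(pooled)
    effects = {label: mean(values) - grand for label, values in groups.items()}
    return grand, effects


# Hypothetical scores per ligand (illustration only); larger means worse.
scores = {
    "ligand 1": [2.0, 3.0, 4.0],
    "ligand 2": [1.0, 1.0, 1.0],
    "ligand 3": [5.0, 6.0, 5.0],
}

grand, effects = deviations_from_grand_mean(scores)
worst = max(effects, key=effects.get)   # most positive deviation
best = min(effects, key=effects.get)    # most negative deviation

print(f"mean = {grand:.4f}")
for label, effect in effects.items():
    print(f"{label}: {effect:+.4f}")
print(f"worst: {worst}, best: {best}")
```

When every group has the same number of observations, as in this sketch, the deviations sum to zero, which is a quick consistency check on a table of this kind.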
Conclusions
1e) These results show that, when compared to the experimental
values, there is no statistically significant difference between any
of the rescored methods and the original calculations, on either the
old or the new software. This all assumes that the experimental
values are correct and free of error, which will of course not be
true. However, as we are principally considering whether the order is
correct rather than the absolute values, since this is what matters
when choosing which drug will work best, the error will have less of
an effect.
It is also clear, however, that the calculations perform differently
depending on the ligand attached to the drug molecule. This is an
obvious shortcoming in the protocols that is present across all the
tested proteins. Future work could investigate how these ligands
affect the results, as this could provide valuable insight into
improving the performance of the calculations.