
Duquesne University

URP 2014
The Project

Pharmacophore Discovery Aimed at Inhibiting the Spi-1/C/EBP Interaction

Author:
Emily Cribas
esc5161@psu.edu
Penn State University

Supervisors:
Dr. Jeffry D. Madura
Dr. Philip E. Auron

July 24, 2014

I. Introduction

A. Significance

The interleukin gene encodes the IL-1 protein, which is important for cell signaling. Here, the
protein helps modulate immune responses, including fever and inflammation. The gene is turned on with
the help of an enhancer and a promoter.[1] Notably, the complex that forms between one of the enhancers
and one of the proteins overstimulates the gene, which in turn can lead to inflammatory diseases
such as gout, rheumatoid arthritis, and inflammatory bowel disease.[2]

B. Project Summary

Spi-1 is a member of the ETS family of transcription factors, which are distinguished by a DNA
binding domain (DBD) that binds to a piece of DNA containing a specific recognition sequence. In the case
of Spi-1, the DBD is found in helix 3 of its winged helix-turn-helix (wHTH) structure.[3] For clarification,
only the DBD of Spi-1 (specifically, Arg232 and Arg235) binds to the recognition sequence (in this case, a
string of 4 purine bases) in the interleukin gene. This project focuses solely on the DBD portion
of the Spi-1 transcription factor.
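
To make the idea of a recognition sequence built from four consecutive purines concrete, the following minimal sketch scans one DNA strand for such runs. The function name and the example strand are hypothetical and are not taken from the 1PUE structure or from the interleukin gene.

```python
PURINES = {"A", "G"}

def purine_runs(sequence, length=4):
    """Return the 0-based start index of every window of `length` consecutive purines."""
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - length + 1):
        window = seq[start:start + length]
        if all(base in PURINES for base in window):
            hits.append(start)
    return hits

# Hypothetical strand, used only to exercise the function.
example = "TTCGGAAGTC"
print(purine_runs(example))
```

Overlapping windows are reported separately, so a run of five purines yields two start positions.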
Transcription of the interleukin gene is activated through direct interactions between Spi-1 and C/EBP
at the promoter region. These two factors interact via hydrogen bonding between a glutamic acid and an
arginine.[4] Once this interaction occurs, transcription is activated and RNA polymerase proceeds to
transcribe DNA into RNA.
C/EBP, on the other hand, is a class of enhancer transcription factors that contain a bZIP, or basic
leucine zipper, which includes a basic domain that binds to the major groove in DNA. Only five pairs of
leucines are required to stabilize its structure. The remaining leucines and helices contribute to its binding
to Spi-1, based on electrostatic interactions.[5]
The pertinent literature section describes in further detail how Arg232 on Spi-1 shows the clearest and
strongest form of interaction between the complex and the enhancer transcription factor.[4] Therefore, the
area surrounding this residue is the most reasonable place to explore for inhibitory binding.

C. Pertinent Literature

Kodandapani's paper[3] reported the x-ray crystal structure of the Spi-1/DNA complex and thereby
established the winged helix-turn-helix motif of Spi-1, which characterizes its mode of binding to DNA. Of
particular interest to us, the authors identified the DNA recognition sequence to which Spi-1 binds, as well
as the binding of R232 to the DNA.
Listman's paper[4] showed how C/EBP binds to the Spi-1 protein to cooperatively support transcription
of IL1. Its experiments showed that a single substitution at the arginine 232 residue results in an
80% reduction in C/EBP bZIP binding, underscoring the importance of the R232 residue in their binding. It
also reported that residues 330-345 at the COOH-terminal end (leucine bZIP domain) of C/EBP
mediate binding to the Spi-1 ETS domain to support Spi-1 function on the promoter.
As for the computational literature, Cheatham's paper[6] on molecular modeling of nucleic
acids provides clear guidelines on which force fields are best suited to different types of DNA (in this
case, B-DNA), as well as on how to set up and analyze nucleic acids in MD. It also touches on important
factors that can influence DNA movement, such as the effects that a salty or polar environment can have on
conformational changes.

II. Hypothesis and Specific Aims

A. Hypothesis

The region near Arg232 (R232) of the Spi-1/DNA complex can accommodate a small molecule through
hydrogen bonding. The purpose of this project is to inhibit binding between Spi-1 and C/EBP
because, together, they act as a cooperative unit to initiate transcription of the interleukin gene, which
encodes a protein that can lead to inflammatory disease. Importantly, the binding between these two
transcription factors is most prominent around the R232 pocket. Therefore, finding a molecule that binds
well to this pocket should inhibit binding of the enhancer transcription factor, C/EBP.[4]

B. Specific Aims

i. Characterize the region near R232 of the Spi-1/DNA complex. This will be done through molecular
modeling techniques and running MD calculations using NAMD for three separate simulations: the
nucleic acid, the protein, and the complex.
ii. Develop a pharmacophore model. After finding the optimal structure of the complex, we can use
MOE to develop a model with desirable molecular electronic and steric features that will be critical in
characterizing the inhibitory molecule.
iii. Virtually screen a library of small molecules. By testing each molecule, we can pick the one that
exhibits the strongest binding, so that future continuations of this project can compare its
binding strength to that of C/EBP and see whether the predicted molecules can serve as effective
pharmacophores by inhibiting Spi-1/C/EBP interactions.

III. Methods

A. Setup

To prepare the complex, the structure (PDB ID: 1PUE) was downloaded from the Protein Data Bank and then
prepared and optimized using MMTSB tools[7], which provide commands to add missing atoms (in
this case, hydrogens), solvate the complex, add ions to neutralize the system, and finally generate a PSF
file, which is used for equilibration.

B. Equilibration

The equilibration phase consists of minimization and molecular dynamics (MD) simulations. NAMD
2.9, a parallel molecular dynamics code designed for high-performance simulation of large biomolecular
systems[8], was used for both steps because we are dealing with a 40,000+ atom system with numerous forces
that must be minimized efficiently. Ideally, no restraints would be applied, to best mimic the physiological
environment of the complex, but because of the size of the system, restraints are required at the beginning
to control the forces and minimize the large energies.

C. Analysis and Pharmacophore Development

R[9] and VMD 1.9.1[10] were used to monitor the progress of equilibration and the stability of the complex.
By plotting values such as potential energy and volume in R, proximity to convergence can be assessed.
Using the movie maker option in VMD, the overall movement and conformational changes of the complex
can be viewed on a wider scale to check for highly detrimental instabilities and possible denaturation.
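
One simple way to put a number on how close a logged quantity (for example potential energy or volume) is to convergence is to compare block averages over the trajectory. The sketch below is written in Python rather than R, and the block count, tolerance, and function names are illustrative assumptions, not the criterion used in this project.

```python
from statistics import mean

def block_means(values, n_blocks=5):
    """Split a time series into equal consecutive blocks and average each block.

    Trailing values that do not fill a whole block are ignored.
    """
    size = len(values) // n_blocks
    if size == 0:
        raise ValueError("need at least one value per block")
    return [mean(values[i * size:(i + 1) * size]) for i in range(n_blocks)]

def is_near_converged(values, n_blocks=5, rel_tol=0.01):
    """True if the last two block means differ by less than rel_tol (relative)."""
    means = block_means(values, n_blocks)
    last, previous = means[-1], means[-2]
    scale = max(abs(last), abs(previous), 1e-12)  # guard against division by zero
    return abs(last - previous) / scale < rel_tol
```

A series that is still drifting fails the check, while a flat series passes; a stricter analysis would also look at the fluctuations within each block.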
In this case, harmonic constraints with force constants of 50 and 100 kJ/mol were placed
on the system in separate runs to expedite convergence.
MOE[11] was used to create a validated pharmacophore query with five features based on optimal binding
locations within the R232 binding pocket. A library was chosen arbitrarily and screened for molecules
that have any of the desired features of the inhibitory molecule. The hits were ranked by rmsd,
measured relative to the locations of the features' annotation points, and by rscore, a sum over all
pharmacophore feature points.
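
As a rough illustration of the rmsd used for ranking, the sketch below computes a root-mean-square distance between pharmacophore feature centers and the ligand atoms matched to them. How MOE computes its reported rmsd internally is not specified here, so this formula, the function name, and the coordinate layout are assumptions for illustration only.

```python
import math

def feature_rmsd(feature_centers, matched_atoms):
    """Root-mean-square distance between feature centers and their matched atoms.

    Both arguments are equal-length lists of (x, y, z) tuples, paired by index.
    """
    if len(feature_centers) != len(matched_atoms) or not feature_centers:
        raise ValueError("need equal, non-empty lists of paired points")
    squared = [
        sum((c - a) ** 2 for c, a in zip(center, atom))
        for center, atom in zip(feature_centers, matched_atoms)
    ]
    return math.sqrt(sum(squared) / len(squared))
```

A value of zero means every matched atom sits exactly on its annotation point; larger values indicate a looser geometric fit.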

IV. Limitations and Accuracy

Since this is an explicit solvent system with over 42,000 atoms, its complexity and computational cost
are much greater than those of a typical MD simulation.
Additionally, these MD simulations can get trapped in metastable conformational states that may not
be representative of reality,[6] because it is not currently possible to fully sample all
thermal conformations of a complex; the potential energy is never going to be completely constant.[6]
Even more limiting, ion and protein parameters tend to underestimate interactions with DNA. Ion
parameters such as the AMBER-adapted Aqvist parameters, for example, may underestimate the free energy of
solvation.[12]
Finally, since the interleukin gene is represented by only a short strand of the entire DNA sequence,
characteristics of the DNA may be misrepresented. For example, the flexibility and structure of DNA depend
on a multitude of factors, including base pairing. Adding base pairs to a DNA sequence, especially
GC base pairs (which have three hydrogen bonds and better stacking interactions), can increase the
stability of the structure and make it less prone to adopting other conformations.
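
The base-pairing point can be made concrete with a small count of Watson-Crick hydrogen bonds (two per AT pair, three per GC pair). This is only a sketch: the function names and example strands are hypothetical, and hydrogen-bond count alone ignores stacking and sequence context.

```python
HBONDS_PER_PAIR = {"A": 2, "T": 2, "G": 3, "C": 3}

def watson_crick_hbonds(strand):
    """Count hydrogen bonds in a fully paired duplex, given one of its strands."""
    return sum(HBONDS_PER_PAIR[base] for base in strand.upper())

def gc_fraction(strand):
    """Fraction of positions in the strand that are G or C."""
    s = strand.upper()
    return (s.count("G") + s.count("C")) / len(s)

# Hypothetical strands of equal length: the GC-rich one has more hydrogen bonds.
print(watson_crick_hbonds("ATATAT"), watson_crick_hbonds("GCGCGC"))
```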

V. Data and Interpretations

A. Simulation Analysis

For each of the three simulations, RMSD and RMSF values were compared to ensure there were no
discrepancies in the behavior of the two macromolecules.
To clarify, an RMSD value measures the displacement of a particular set of atoms at a given time frame
with respect to the positions of those same atoms in a reference frame (usually at time = 0), averaged
over the chosen atoms, giving one average displacement per time frame. For the protein, the atoms were
Cα, C, N, and O, chosen for the stability and regularity of the peptide bond; for the nucleic acid, they
were N1/N9, C4, and C1', chosen for the regularity of the glycosidic bond. These values are important in
determining whether a protein or nucleic acid has undergone a conformational change or has reached
equilibration.
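
The RMSD described above can be sketched in a few lines. This minimal version assumes every frame has already been superimposed on the reference (no fitting is performed) and that each frame is a list of (x, y, z) tuples for the selected atoms; the function names are hypothetical.

```python
import math

def rmsd(frame, reference):
    """RMSD between two conformations given as equal-length lists of (x, y, z)."""
    if len(frame) != len(reference) or not frame:
        raise ValueError("frames must be non-empty and have the same number of atoms")
    total = 0.0
    for (x, y, z), (rx, ry, rz) in zip(frame, reference):
        total += (x - rx) ** 2 + (y - ry) ** 2 + (z - rz) ** 2
    return math.sqrt(total / len(frame))

def rmsd_series(trajectory, reference_index=0):
    """One RMSD value per frame, measured against the chosen reference frame."""
    reference = trajectory[reference_index]
    return [rmsd(frame, reference) for frame in trajectory]
```

Plotting the output of `rmsd_series` against time gives the kind of curve summarized in the figures, with a plateau suggesting equilibration and a sudden jump suggesting a conformational change.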

(a) Bound Spi-1, Mean RMSD: 0.774 Å

(b) Unbound Spi-1, Mean RMSD: 1.377 Å

Figure 1: RMSD of Backbone Atoms in Spi-1


Panel (a) illustrates the near-convergence of complexed Spi-1 20 ns into the simulation, as its RMSD
value is relatively stable. Apart from a peak at about 28 ns that could signify a conformational change,
there are no unaccounted-for discrepancies in the graph. In contrast, (b) free Spi-1 does not
display a converged RMSD value, or at least does not converge as quickly as bound Spi-1. The protein in
our complex has reached a stable and lower RMSD value, so its final structure can be examined for binding
potential near R232.

(a) Bound DNA, Mean RMSD: 0.948 Å

(b) Unbound DNA, Mean RMSD: 1.446 Å

Figure 2: RMSD of Glycosidic Atoms in DNA


Panel (a) displays a large peak at around 82 ns into the simulation, possibly meaning that the DNA in the
complex has undergone a conformational change, so it is desirable to run the simulation for longer, until
the RMSD value has stabilized further. Interestingly, (b) the free DNA appears to have reached a stable
but higher RMSD value during the same time frame. Programs such as Curves+[13] and
Canal[14] should be used to further analyze the behavior of the nucleic acid in its bound and unbound
states throughout the simulation.

An RMSF value, in contrast, measures the displacement of a particle at each time frame with
respect to the position of that particle in a reference frame, averaged over time, giving one average
displacement for each atom, reported per residue or base. These values are important for understanding how
flexible a certain residue may be compared to the rest of the protein, or how unstable a base may be
compared to the entire strand of DNA.
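
A per-atom RMSF can be sketched as follows. By default this version measures fluctuations about each atom's time-averaged position, which is a common convention and an assumption here; passing an explicit `reference` frame instead reproduces the reference-frame wording used above. Frames are assumed to be pre-aligned lists of (x, y, z) tuples, and the function names are hypothetical.

```python
import math

def mean_positions(trajectory):
    """Time-averaged (x, y, z) position of each atom over all frames."""
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    return [
        tuple(sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3))
        for i in range(n_atoms)
    ]

def rmsf_per_atom(trajectory, reference=None):
    """One RMSF value per atom, averaged over all frames of the trajectory."""
    if not trajectory:
        raise ValueError("trajectory is empty")
    if reference is None:
        reference = mean_positions(trajectory)
    n_frames = len(trajectory)
    result = []
    for i, ref in enumerate(reference):
        total = sum(
            sum((frame[i][k] - ref[k]) ** 2 for k in range(3))
            for frame in trajectory
        )
        result.append(math.sqrt(total / n_frames))
    return result
```

Averaging the per-atom values within each residue or base would then give a per-residue or per-base profile of the kind plotted in the RMSF figures.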

Figure 3: RMSF per Residue of Spi-1


RMSF values for the unbound and bound states of Spi-1 are very similar for the majority of residues in
the DBD, but differ dramatically at the beginning and ending residues. This difference can be accounted for
by the fact that unbound Spi-1 has no DNA complexed to it and therefore nothing for its ends to bind to
that would limit their movement. Notably, the region near residue 232 is similar in both
states, meaning that its movement has remained undisturbed and there are no irregularities throughout the
entire simulation.

Figure 4: RMSF per Base of DNA


Again, both sets of RMSF values are comparable to each other with no major discrepancies, and the lower
RMSF values for the end bases can be accounted for by the harmonic restraints placed on the ends in both
simulations. It would be interesting to measure RMSF values for these states without restraints, but the
restraints were originally placed to avoid unraveling or disruption of the gene.

B. Pharmacophore Modeling

(a) Binding Pocket with Features

(b) Pharmacophore Features

Figure 5: Pharmacophore Features in the R232 Binding Pocket


Panel (a) highlights the R232 binding pocket in pink and green (hydrophilic and hydrophobic areas), with
the pharmacophore features from (b) depicted as spheres. Many of the features in our model involve hydrogen
bonding, mostly to water molecules found in the pocket, highlighting the importance of waters in this region.
The features were chosen based on areas with high binding accessibility to the DNA, waters, or the protein,
excluding any type of binding to Arg232 to avoid any perturbation of the bonding found in the complex.

C. Library Screening

(a) Molecule 1

(b) Molecule 2

(c) Molecule 3

Figure 6: Stereo View of Screened Molecules

(a) Molecule 1

(b) Molecule 2
(c) Molecule 3

Figure 7: Ligand Interaction Maps of Screened Molecules


Figures 6 and 7 display the nature of the top 3 hits obtained from the arbitrarily chosen database after
cross-referencing with our pharmacophore features. Each molecule makes solvent contacts, and some
form important hydrogen bonds to residues in Spi-1 and/or nucleic acid bases. Table 1 further describes
the binding features of each molecule.

molecule    rmsd     rscore
1           0.328     3.690
2           0.448     3.693
3           1.091    11.936

Table 1: Library of Small Molecules


This table displays only 3 of the thousands of hits obtained from the screening. The rmsd is the
calculated distance from the center of a feature (its annotation point) to the atom of the molecule
that carries that particular feature. The rscore is the sum of the individual feature rscores, that is,
the acceptor or donor strength of the matching atoms for each pharmacophore feature. A low rmsd and a
high rscore are both desirable characteristics of our molecule, so the correct balance between the two
must be found.
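
The tension between the two criteria can be seen by ranking the three listed hits each way. The sketch below uses the rmsd and rscore values from Table 1 in the order they appear; the row labels and function names are placeholders introduced for this illustration.

```python
# (label, rmsd, rscore), in the order the rows appear in Table 1.
# The labels are placeholders introduced for this sketch.
HITS = [
    ("row 1", 0.328, 3.690),
    ("row 2", 0.448, 3.693),
    ("row 3", 1.091, 11.936),
]

def rank_by_rmsd(hits):
    """Lowest rmsd first: tightest geometric fit to the annotation points."""
    return sorted(hits, key=lambda hit: hit[1])

def rank_by_rscore(hits):
    """Highest rscore first: strongest summed feature match."""
    return sorted(hits, key=lambda hit: hit[2], reverse=True)

print([label for label, _, _ in rank_by_rmsd(HITS)])
print([label for label, _, _ in rank_by_rscore(HITS)])
```

For these three rows the two orderings are exact reverses of each other, so any single ranking has to weigh fit against feature strength.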

VI. Conclusions

The analysis of the three simulations supports our final Spi-1/DNA structure: for the most part,
the complex has converged and is behaving as it should in terms of movement.
Further analysis of the nucleic acid should be conducted to investigate the unconverged nature of the bound
DNA.
After defining our R232 pocket, we determined that it provides a viable pharmacophore
model, because it allowed us to screen a library of small molecules with a positive outcome.
Each of our five pharmacophore features found in the binding pocket is either a hydrogen-bond acceptor or
donor feature, and our screened molecules contain at least one of these features, supporting our
hypothesis that our inhibitory molecule will act through hydrogen bonding.

VII. Future Work

In the future, we hope to refine our screening results by using different libraries and possibly by
refining our pharmacophore query to yield a more accurate and limited set of small molecules.
After we have narrowed down our results, we can use molecular docking techniques to place each
molecule in its corresponding binding region (separately) and conduct MD simulations for each molecule in
the complex.
Finally, we can compare the binding strengths of each of these molecules to those of C/EBP to
determine whether the molecule could, in fact, inhibit the binding of this protein.
Additionally, obtaining a crystal structure of the two proteins on DNA would be extremely helpful in
carrying out more accurate computational simulations, modeling, and screening procedures.

VIII. Acknowledgements

National Science Foundation, Major Research Instrumentation (MRI) Grant Number: CHE-1126465
National Institutes of Health R25, National Institute on Drug Abuse (NIDA) Grant Number: 1 R25
DA032519-01
Duquesne University Undergraduate Research Program (URP)
Madura Research Group
Auron Research Group
Scott Boesch
Emilio Esposito

References

1. Adamik, J.; Wang, K. Z. Q.; Unlu, S.; Su, A.-J. A.; Tannahill, G. M.; Galson, D. L.; O'Neill, L. A.;
Auron, P. E. PloS one Jan. 2013, 8, e70622.

2. Hazuda, J.; Simon, L. 1990.

3. Kodandapani, R.; Pio, F.; Ni, C.-Z.; Piccialli, G.; et al. Nature Apr. 1996, 380, 456.

4. Listman, J. A.; Wara-aswapati, N.; Race, J. E.; Blystone, L. W.; Walker-Kopp, N.; Yang, Z.; Auron,
P. E. The Journal of biological chemistry Dec. 2005, 280, 41421-41428.

5. Tahirov, T. H.; Sato, K.; Ichikawa-Iwata, E.; Sasaki, M.; Inoue-Bungo, T.; Shiina, M.; Kimura, K.;
Takata, S.; Fujikawa, A.; Morii, H.; Kumasaka, T.; Yamamoto, M.; Ishii, S.; Ogata, K. Cell Jan. 2002,
108, 57-70.

6. Cheatham, T. E.; Young, M. A. Biopolymers 2000, 56, 232-56.

7. Feig, M.; Karanicolas, J.; Brooks, C. L. Journal of molecular graphics and modeling May 2004, 22,
377-95.

8. Phillips, J. C.; Braun, R.; Wang, W.; Gumbart, J.; Tajkhorshid, E.; Villa, E.; Chipot, C.; Skeel, R. D.;
Kale, L.; Schulten, K. Journal of computational chemistry Dec. 2005, 26, 1781-802.

9. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical
Computing, Vienna, Austria, 2014.

10. Humphrey, W.; Dalke, A.; Schulten, K. VMD: Visual Molecular Dynamics, 1996.

11. Chemical Computing Group Inc. Molecular Operating Environment (MOE), 2011.10.

12. Cheatham, T. E.; Young, M. A. Biopolymers Jan. 2000, 56, 232-256.

13. Blanchet, C.; Pasi, M.; Zakrzewska, K.; Lavery, R. Nucleic acids research July 2011, 39, W68-73.

14. Lavery, R.; Moakher, M.; Maddocks, J. H.; Petkeviciute, D.; Zakrzewska, K. Nucleic acids research
Sept. 2009, 37, 5917-29.
