Discovery Studio 2.1 Tutorials Guide

Note. Some tutorials require data files that can be downloaded here:

http://doc.accelrys.com/doc/life/dstudio/21/tutorials/tutorialdata.zip

Save the zip file to your computer and then extract the contents to a folder. We
recommend that you add this folder to the Files Explorer as a top-level or 'root'
folder. To do this, navigate to the folder, right-click the folder, and choose the Add
Root Node... command from the context menu. Next, on the dialog, enter a display
name for the root folder (e.g., Tutorial Data) and click the OK button. The tutorial
data files will then be readily accessible at the top level of the Files Explorer.
Contents
These tutorials help you to increase your knowledge of Discovery Studio and
Discovery Studio Developer client. Whether you are just getting started with
Discovery Studio, or are focused on specific functionality within the software, this is
a good place to begin.
Introduction
UI and Mouse - Learn how to view and manipulate 3D structures using tools and controls
available in Discovery Studio. Explore the interaction between windows and create useful
plots for data analysis.
Opening and viewing data - Learn how to open files, to alter display images and views,
and to save structures in specific file formats.
Library Design
Enumerate Library - Enumerate a reaction-based combinatorial library and explore
diverse and similar subsets of the library.
Pareto optimization of a combinatorial subset library - Optimize a selection of a
combinatorial subset library using the Pareto method.
Pharmacophore
Creating pharmacophores (structure-based) - Generate a pharmacophore model
based on the HIV protease molecule.
Ligand Profiler - Explore how a set of ligands can be profiled for potential activity using
pharmacophore models.
Common feature pharmacophore generation - Use the Common Feature
Pharmacophore Generation protocol to create pharmacophore models.
Creating and using fragment-based pharmacophores - Explore the use of
pharmacophores in fragment-based compound design.
Creating custom features (pharmacophore) - Create customized pharmacophore
feature definitions using the Customize Pharmacophore Features tool.
Protein Modeling
Homology modeling of an extracellular amylase protein - Learn about the tools that
are available for structural biologists in Discovery Studio and how these tools can be used
in a typical homology modeling project.
Looper with antibodies - Create high-quality loop conformations using Looper, a
CHARMm-based algorithm for loop refinement.
ZDOCK - Perform protein docking with ZDOCK and refinement with RDOCK.
QSAR
Building a QSAR equation - Construct a quantitative structure-activity relationship
(QSAR) equation.
Receptor-Ligand Interactions
Docking small molecules with LibDock - Explore small molecule docking and analysis
using LibDock.
Simulation
Simulation of a small peptide with restraints (protein simulation) - Build a small
peptide and run the Standard Dynamics Cascade protocol with distance and dihedral
restraints.
Using the Calculate Energy (QM-MM) protocol to determine energies, rank order
poses, and compare energies before and after minimization - Calculate the QM-MM
interaction energy and pose ranking of human cyclin-dependent kinase protein-ligand
complexes.
Developer Client
Appending a Final Minimization Stage to the Standard Dynamics Cascade protocol -
Learn how to customize a protocol by modifying the Standard Dynamics Cascade
protocol.
Calculating molecular properties on the lowest-energy conformation extracted
from a Catalyst database - Learn how to use the Discovery Studio Developer client to
create a new protocol.
UI and mouse tutorial
Purpose: Learn how to view and manipulate 3D structures using tools and controls
available in Discovery Studio. Explore the interaction between windows and create useful
plots for data analysis.

Required functionality and modules: Discovery Studio Visualizer client.

Required data files: 1TPO.pdb and 1BVN.pdb.

Time: 20 minutes.
Background
Visualization and general structural analysis tools are critical in the understanding of
biological and biochemical systems. Discovery Studio provides an interactive environment
for molecular visualization, and supplies you with a wide range of tools for structure
characterization and analysis.
Introduction
This tutorial covers:
Viewing a protein molecule
Exploring the toolbar buttons and the mouse
Working with the Sequence Window and the 3D Window
2D and 3D Plots
Viewing a protein molecule
From the Files Explorer, expand the Samples folder and double-click the 1TPO file.
This opens the 1TPO.pdb file in the Graphics View of a 3D Window.
Note. Each 3D Window consists of a Graphics View, a Hierarchy View, and a Data Table
View. The display of these views can be toggled with CTRL+G, CTRL+H, and CTRL+T
respectively.
You can drag-and-drop files into windows in Discovery Studio. Dragging a file into a specific
window will insert the file into that window, adding, for example, an additional molecule to
a 3D Window. This can be useful when you want to work with several molecules in one
window.

Note. You can also open files from the Samples folder. From the menu bar, choose File |
Open..., click the Samples icon on the Open dialog, choose a file from the list box, and click
the Open button.

Alternatively, if you have internet access, choose File | Open URL... from the menu bar to
display the Open URL dialog. Enter 1TPO (the protein PDB identifier) in the PDB ID text
box, and click the Open button. By default, the 1TPO molecule is retrieved directly from the
RCSB web site.
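If you prefer to fetch a structure outside Discovery Studio, the sketch below builds a download URL from a PDB identifier and saves the file using only Python's standard library. It assumes the RCSB file-download pattern `https://files.rcsb.org/download/<ID>.pdb`; this is not necessarily the address Discovery Studio uses internally, and the function names are hypothetical.

```python
from urllib.request import urlopen

# Assumed RCSB download pattern; not taken from Discovery Studio.
RCSB_DOWNLOAD = "https://files.rcsb.org/download/{pdb_id}.pdb"


def pdb_download_url(pdb_id: str) -> str:
    """Return the download URL for a four-character PDB identifier such as 1TPO."""
    pdb_id = pdb_id.strip().upper()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return RCSB_DOWNLOAD.format(pdb_id=pdb_id)


def fetch_pdb(pdb_id: str, path: str) -> None:
    """Download the PDB file and write it to path (requires internet access)."""
    with urlopen(pdb_download_url(pdb_id)) as response, open(path, "wb") as out:
        out.write(response.read())
```

For example, `fetch_pdb("1tpo", "1TPO.pdb")` would save a local copy that you can then open from the Files Explorer.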
Click the 1TPO - 3D Window tab to make it active. The active window is indicated by a bright blue outline.
Tip. To maximize the viewing area for an active window (3D Window, Sequence Window,
Ramachandran Plot, etc.), press the shortcut key F11. Press F11 again to restore your
original view.
Exploring the toolbar buttons and the mouse
Hover the cursor over each button on the View toolbar to display the tooltip that describes
that button's function. Then click each of the following buttons on the View toolbar in turn
and drag the cursor in the Graphics View of the 3D Window.
Select: Allows you to select items in the 3D Window. Select a region of the protein in
the Graphics View by clicking and dragging the lasso around a portion of the structure. You
can add to the current selection by pressing SHIFT and encircling another region with the
lasso.
Rotate: Allows you to rotate the view of the structure in the Graphics View. Press
SHIFT when rotating to rotate about the Z-axis. Rotation can also be performed while in
translation or zoom mode by clicking and holding the right-mouse button and dragging the
cursor.
Translate: Allows you to translate the structure in the Graphics View of the 3D
Window. Clicking and dragging translates the molecule in the XY plane, while pressing
SHIFT and dragging the cursor translates the protein along the Z-axis.
Zoom: Allows you to enlarge or reduce the view of the structure. Zoom in by dragging
the cursor upward. Dragging the cursor downward zooms out.
Note. Pressing CTRL while in rotation, translation, or selection mode allows you to adjust
the position of just the selected atoms.

You can use Undo (CTRL+Z) to revert any actions, such as rotation or translation, in the
3D Window.
Tip. Other mouse tools are available when you display other toolbars (e.g., the Torsion tool
on the Sketching toolbar).
Click the Select tool button on the View toolbar. Click an atom in the Graphics View to
select it. Double-click the selected atom while it is still highlighted (displayed in yellow) to
select its parent residue. Double-click the selected residue again while it is still selected
to select its parent chain.
This allows you to progressively select different levels in the molecular hierarchy. The
current selection is reflected in the Hierarchy View.
If the Hierarchy View is not visible, press CTRL+H to display it or, alternatively, choose
View | Hierarchy from the menu bar.
Tip. You can cancel the selection by clicking another object to make a new selection, or
by clicking in a blank area in the Graphics View to select nothing.
To change the display style
Click the 1TPO - 3D Window tab to make it active. Click the Display Style button on the
View toolbar.
Tip. Alternatively, to display the Display Style dialog when the Graphics View is active,
press CTRL+D or right-click in the 3D Window and select Display Style... from the context
menu.
On the Display Style dialog, click the Atom tab, and from the Display style control group,
select the Ball and stick option. Click the Protein tab, and from the Display style control
group, select Solid ribbon. Click the OK button to apply your changes.
The atoms in the residues are shown as solid spheres and the bonds as cylinders. The
protein backbone is represented as a solid 3D ribbon.
Tip. You can use the Graphics View Display Style dialog to change the coloring of atoms
and residues in the protein based on a wide range of different properties.
Note. Available options on the Display Style dialog differ depending upon which window or
view is current. Changes made to the display rendering only apply to whatever is selected
in the Graphics View. If nothing is selected, the rendering will be changed for the whole
structure.
To copy, paste, and delete objects
Elements of the system can be copied, pasted, or deleted by selecting them and choosing
commands from the Edit menu. Standard Windows shortcuts, such as CTRL+C and
CTRL+V, provide quick access to this functionality.
Select the Water chain(s) in the Hierarchy View and press DELETE to remove them from the
structure. As an alternative to the DELETE key, you can use the Delete command
on the Edit menu.
Water molecules in the protein crystal structure are grouped into distinct chains. By
deleting these chains, we can focus on the protein structure itself in this tutorial.
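The same cleanup can be applied to a PDB file before it is opened. The sketch below assumes fixed-column PDB records, in which the residue name occupies columns 18-20, and treats HOH as water together with, as an assumption, the names WAT and DOD that some files use. The function name is hypothetical, and TER records belonging to water chains are left in place.

```python
# Residue names treated as water; HOH is the PDB standard, the others are assumptions.
WATER_NAMES = {"HOH", "WAT", "DOD"}
RECORDS_WITH_RESIDUES = ("ATOM", "HETATM", "ANISOU")


def strip_waters(pdb_text: str) -> str:
    """Return PDB text without the coordinate records of water molecules."""
    kept = []
    for line in pdb_text.splitlines():
        record = line[0:6].strip()
        residue_name = line[17:20].strip()  # columns 18-20 in 1-based PDB numbering
        if record in RECORDS_WITH_RESIDUES and residue_name in WATER_NAMES:
            continue  # drop water atoms
        kept.append(line)
    return "\n".join(kept) + "\n"
```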
To explore views
You can click the + and - symbols in the Hierarchy View to expand and contract sections of
the hierarchy.
Tip. You can use the Hierarchy View to drill down to the atomic level and highlight specific
atoms or groups of interest (e.g., hydrophobic residues).
If the Data Table is not displayed, select View | Data Table from the menu bar or press
CTRL+T. Explore the tabs in the Data Table View and notice how their organization mirrors
the levels of the Hierarchy View (e.g., the cell and molecule
levels). Click the Molecule tab in the Data Table View. Click the header of the Molecular
Weight column (located in the middle of the tab), and then drag and drop the header onto
the Number of Atoms column (located at the beginning of the tab).
The order of the columns on the Molecule tab changes, with the Molecular Weight column
now positioned before the Number of Atoms column.
Click the AminoAcidChain tab. Click in the cell in the Color column and click the
button.
This displays the Select Color dialog.
Select a new color and click the OK button.
This changes the color of the amino acid chain.
Note. The white cells in the Data Table View contain properties that can be modified,
whereas the values in the gray cells cannot be edited.
Working with the Sequence Window and the 3D Window
In this section, you will learn how the Sequence and 3D Windows can be used together.
To open the Sequence Window
If you closed the 3D Window, re-open the 1TPO.pdb file from the Samples folder.

From the menu bar, choose Sequence | Show Sequence to view the sequence for the
molecule.
This opens a new Sequence Window that allows you to visualize and manipulate the amino
acid sequence and the corresponding 3D structure simultaneously.
Drag the Sequence Window tab to the upper edge of the 3D Window to modify the
interface layout, so that you have both Sequence and 3D Windows available to work with.
Tip. You can adjust window layouts to present information for specific applications;
windows may be tabbed, set side-by-side, or hidden from view. You configure different
window layouts by dragging window tabs to desired positions. You can also adjust the
interface using the actions of the View menu and its associated shortcut keys. For example,
you can toggle the display of the Files, Tools, and Protocols Explorers using CTRL+2.
Click the 1TPO - Sequence Window tab to make it active. Press CTRL+D to display the
Display Style dialog for the Sequence Window. Click the Residue tab, click the Color by
sequence option, and then choose Secondary structure from the dropdown list. Click
the OK button.
This changes the residue colors in the Sequence Window based on secondary structure
type.
Right-click in the Sequence Window and choose Secondary Structure Cartoon from the
context menu.
This displays the Kabsch-Sander secondary structure cartoon. The residue coloring
corresponds to the secondary structure cartoon display used in the Graphics View. The
blue arrows represent beta strands, and the red solid cylinders represent alpha helices.
Note. Additional secondary structure cartoons are available from the General tab of the
Sequence Window Display Style dialog.
Tip. You can set your preferences to open all sequences in one Sequence Window. From
the menu bar, choose Edit | Preferences... to display the Preferences dialog, and then, on
the Sequence Window page, check the Add to existing Window checkbox. Other display
preferences, such as automatically showing secondary structure, can be set on the
Preferences dialog.
To select the Catalytic Triad
Hover the cursor over any residue in the Sequence Window.
The residue ID is reported in a tooltip.
Find and select the three residues of the Catalytic Triad - HIS57, ASP102, and SER195
(clicking and dragging over the residue adds it to the selection).
HIS57, ASP102, and SER195 are at positions 40, 84, and 177 in the Sequence
Window. If you make a mistake in selecting a residue, you can undo that selection by
choosing Edit | Undo (or by pressing CTRL+Z).
Tip. You can also use the Hierarchy View or Data Table View to select the residues.
Note. The numbers on the ruler in the Sequence Window run sequentially starting from 1.
They do not reflect the residue numbering in the protein; however, hovering over a residue
in the Sequence Window displays a tooltip showing the numbering inherited from the
experimental structure in the Protein Data Bank (PDB). For 1TPO, the first residue is ILE16.
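The offset between ruler position and PDB residue number is not constant, because author numbering can skip numbers or use insertion codes: in 1TPO it is 15 for ILE16 (position 1), 17 for HIS57 (position 40), and 18 for ASP102 and SER195 (positions 84 and 177). The sketch below assigns sequential positions to the residues of one chain in the order they appear in a PDB file, which is one plausible way to reproduce the ruler numbering. It reads only ATOM records, the function name is hypothetical, and it is not Discovery Studio's own code.

```python
def sequence_positions(pdb_text: str, chain_id: str) -> list[tuple[int, str]]:
    """Return (sequential position, residue label) pairs for one chain.

    Each residue is identified by its residue number (columns 23-26) and
    insertion code (column 27); labels look like "HIS57" or "GLY18A".
    """
    positions = []
    seen = set()
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or line[21:22] != chain_id:
            continue
        res_seq = int(line[22:26])
        insertion_code = line[26:27].strip()
        key = (res_seq, insertion_code)
        if key in seen:
            continue  # further atoms of a residue that is already counted
        seen.add(key)
        label = f"{line[17:20].strip()}{res_seq}{insertion_code}"
        positions.append((len(positions) + 1, label))
    return positions
```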
To create a group
Click the 1TPO - 3D Window tab to make it active. With the Catalytic Triad residues
selected, from the menu bar, choose Edit | Group... to open the Edit Group dialog. Enter
Catalytic Triad as the Group name and click the Define button. Select View |
Transform | Fit To Screen or click the Fit To Screen button on the View toolbar to
center and zoom in on the Triad.
Note. The Catalytic Triad group is added to the Hierarchy View (at the bottom) and Data
Table View (Group tab).
To simplify the display style of the protein and the Catalytic Triad
Click any empty area of the screen to cancel any selection. Select View | Display Style...
from the menu bar and on the Atom tab of the Graphics View Display Style dialog, set the
Display style to None. On the Protein tab, ensure the Display style is set to Solid
ribbon and click the OK button.
This will remove all the atom-based rendering from the screen, and only display the solid
ribbon rendering.
Select the Catalytic Triad group from the Hierarchy View, and on the Display Style dialog
select Stick on the Atom tab. Click the OK button.
In the Graphics View, only residues HIS57, ASP102, and SER195 are displayed as stick
models, and the remainder of the protein is rendered as a ribbon. Displaying only a subset
of residues in the Graphics View gives a more focused view of the residues of interest, in
this case, a simplified view of the Catalytic Triad.
Tip. You can save your active view and rendering by saving the system as an MSV file.
Saving files in an MSV format will retain the active view, the rendering, the labels, or any
other annotations that you have made. MSV files provide an efficient medium for sharing
information with colleagues.
From the menu bar, choose Window | Close All to prepare for the next section of the
tutorial. You do not need to save any files for the next section, so you can click No on the
dialog that appears.
To explore the 3D Window and Sequence Window interactivity
The previous section illustrated interactivity between windows, through residue selection in
the Sequence Window and display style changes in the 3D Window. In this section, we
continue to explore window interactivity and provide additional examples in which it is
of value.
In the Files Explorer, expand the Samples folder and double-click the 1BVN.pdb file.
Tip. If the Hierarchy View and the Data Table are not visible, from the menu bar, choose
View and check the appropriate selection to display them.
From the menu bar, choose Sequence | Show Sequence to open the Sequence Window.
You will notice that there are three chains called P, T, and P in the Hierarchy View. Click the
first P chain to select it.
A selection will be made in the Graphics View and also in the Sequence Window. If you
highlight residues in the Sequence Window, these will be mapped to the Graphics View, the
Hierarchy View, and the Data Table.
Deselect by clicking an empty region of the Graphics View. Then slowly drag the mouse
from left to right on the Sequence Window starting from residue 1 in the ruler.
In the Sequence Window, the selection will have a black background and will be highlighted
in small yellow squares in both the Hierarchy View and the Graphics View. A selection will
also be made in the Data Table.
Note. Most windows in the application interact with one another. Selections made in one
window can be visualized and used in another.
Using the Select tool, select any region of the structure from the Graphics View.
You will notice that residues in the Sequence Window get highlighted as well.
To explore the Ramachandran Plot and 3D Window interactivity
Click the 1BVN - 3D Window tab to make it active. From the menu bar, choose Chart |
Ramachandran Plot.
A new Ramachandran Plot opens. On the View toolbar, make sure the Select tool is
enabled.
In the Ramachandran Plot, select residues in the region of the plot at approximately (-60,
-60) degrees for Phi and Psi, respectively.
Highlighting points on the Ramachandran Plot allows you to identify residues in the
structure which adopt defined Phi and Psi angles. Residues outside typical Phi, Psi regions
can be quickly located, and, through selection on the Ramachandran Plot, examined in the
Graphics View.
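Each point on the plot is a pair of backbone dihedral angles: Phi is the C(i-1)-N-CA-C dihedral and Psi is the N-CA-C-N(i+1) dihedral. The sketch below computes a signed dihedral angle from four atom positions and tests whether a (Phi, Psi) pair lies in a square window around (-60, -60), wrapping angles at ±180 degrees. The 30-degree half-width and the function names are illustrative assumptions, not the selection logic of the Ramachandran Plot.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / length for c in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond b1.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def angle_difference(a, b):
    """Signed difference a - b between two angles, wrapped into [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def in_window(phi, psi, center=(-60.0, -60.0), half_width=30.0):
    """True if (phi, psi) lies within half_width degrees of center on both axes."""
    return (abs(angle_difference(phi, center[0])) <= half_width
            and abs(angle_difference(psi, center[1])) <= half_width)
```

For a residue i, `phi = dihedral(C_prev, N, CA, C)` and `psi = dihedral(N, CA, C, N_next)`, where the arguments are the atom coordinates.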
2D and 3D Plots
Discovery Studio provides a rich set of interactive plotting tools for streamlined data
analysis, including 2D and 3D interactive plots.
To explore 2D Plots
From the menu bar, choose Window | Close All, and re-open the 1TPO.pdb file from the
Samples folder.

Display the Data Table View by pressing CTRL+T.
In the Data Table, select the AminoAcid tab. Scroll to the right and click the Avg.
Isotropic Displacement column header to select it.
From the Chart menu, choose Line Plot.
A plot will be displayed on the screen. The plot is also fully interactive. Selections in the
plot will be mapped to the 3D Window and selections in the Graphics View will be
highlighted on the Plot.
Note. A plot of the average isotropic displacement on a per residue basis is a useful way of
examining the thermal mobility of residues belonging to molecular structures that have
been solved using X-ray crystallography. Higher values represent higher atomic mobility
and also higher coordinate positional error.
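In a PDB file, each atom's isotropic temperature factor B is stored in columns 61-66 and is related to the mean-square atomic displacement by B = 8π²⟨u²⟩. The sketch below averages B over the atoms of each residue and converts a value to U_iso = B/(8π²), in Å². Whether the Avg. Isotropic Displacement column reports the averaged B or U, and exactly which atoms it includes, is an assumption here; the function names are hypothetical.

```python
import math
from collections import defaultdict


def average_b_per_residue(pdb_text: str) -> dict:
    """Map (chain, residue number, insertion code) to the mean B-factor of its atoms."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        key = (line[21:22], int(line[22:26]), line[26:27].strip())
        totals[key] += float(line[60:66])  # temperature factor, columns 61-66
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


def b_to_u_iso(b_factor: float) -> float:
    """Convert an isotropic B-factor to U_iso = B / (8 * pi**2), in square angstroms."""
    return b_factor / (8.0 * math.pi ** 2)
```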
To explore 3D Plots
Click the 3D Window tab to make it active. In the Data Table View, click any cell to
cancel the selection of the previously selected column.

From the Chart menu, choose 3D Point Plot.
A dialog is displayed with the plot type selections listed. The dialog contains a list of
possible selections for each axis.
For the X-axis, leave the selection as Name; for the Y-axis, select Hydrophobicity;
for the Z-axis, select Avg. Isotropic Displacement; and for the Color axis, select
Secondary. Click the OK button.
A 3D Point Plot is displayed that you can use to analyze the data (for example, to
identify hydrophilic residues with high isotropic displacement values).
This is the end of the tutorial.

Tutorial version: 1.14.2.6, updated: 2008/06/02 19:30:00.
Opening and viewing data
Purpose: Learn how to open files, to alter display images and views, and to save
structures in specific file formats.

Modules: DS Visualizer.

Required data files: 2RH1.pdb.

Time: 15 minutes.
Introduction
In this tutorial, you will learn how to analyze protein structures visually. As an example,
you will use the reported structure of the human β2-adrenergic G protein-coupled receptor.
This receptor belongs to a family of membrane proteins that mediate signal transmission
across the cell membrane, control critical aspects of cell behavior, and are therefore the
targets of many drug molecules. This tutorial illustrates the use of different display styles
in viewing and analyzing the information provided by the crystal structure reported by
Cherezov and coworkers (Protein Data Bank identifier 2RH1).
Opening the data file
Adjusting display styles
Saving a file
Opening the data file
In the Files Explorer, navigate to and double-click the 2RH1.pdb data file. Alternatively,
from the menu bar, choose File | Open URL and enter the PDB ID: 2rh1.
This opens the protein data file in a 3D Window.
Adjusting display styles
You can adjust the graphical display style and view of the structure using convenient tools
or by choosing settings on dialogs. In this tutorial, you will change the display using the
Display Style dialog and alter the view using the View toolbar.
1. To change the display of the structure
In the Graphics View, right-click and choose Display Style... from the context menu.
This opens the Display Style dialog.
On the dialog, click the Protein tab. In the Display style control group, click the Solid
ribbon radio button.

Click the Atom tab. From the Display style control group, click the None radio button.

Click the OK button.
This displays the protein as a solid ribbon. The ribbon is colored by secondary
structure. Blue represents beta sheet, red represents helical secondary structure, gray
represents loop structure, and green represents turn structures.
The solid ribbon display style allows you to see the overall architecture of the protein,
for example, the red helical regions that span the cell membrane, while removing
atomic detail for the bulk of the protein. The crystal structure also provides detailed
information on the binding of an inverse agonist, carazolol, to the receptor. Next, you
will display the non-protein atoms in the structure and highlight the position of the
carazolol molecule.
If the Hierarchy View is not already displayed, choose View | Hierarchy from the
menu bar.
This opens the Hierarchy View in the 3D Window.
In the Hierarchy View, click the second chain A to select it.
Chain A turns yellow in the Hierarchy View, which indicates it is selected.
Right-click in the Graphics View and choose Display Style....
On the dialog, in the Display style control group, click the CPK radio button, and then
click the OK button.

This closes the Display Style dialog and displays the ligands in CPK style.
You can now view all the non-protein atoms in the crystal structure. These include lipid
and counterion molecules. To view the carazolol molecule, you will
temporarily hide all non-protein atoms and then display the carazolol molecule.
In the Hierarchy View, expand the second chain A.

Uncheck the second chain A.
This hides chain A in the Graphics View.
In the Hierarchy View, in the second chain A, check the CAU408 residue.
A check appears next to the CAU408 residue to indicate that this residue is displayed in
the Graphics View.
You will see the carazolol molecule bound in the transmembrane receptor. The current
display style (CPK) for the ligand molecule shows the spatial extent of the molecule in
the context of the receptor. To view the chemistry of the ligand, you can use a ball
and stick model.
In the Hierarchy View, click the CAU408 residue of the second chain A to select it.
The carazolol molecule turns yellow in the Hierarchy View and the Graphics View, which
indicates that it is selected.
In the Graphics View, right-click and choose Display Style... from the context menu.
On the dialog, in the Display style control group, click the Ball and stick radio button,
and then click the OK button.

The individual atoms and bonds of the carazolol molecule are now displayed in the
context of the solid ribbon of the receptor. The protein remains displayed in
solid ribbon style, which allows you to view the ligand-receptor interaction without
viewing all of the atoms of the protein.
2. To change the view of the structure
Structures can be manipulated in a variety of different ways by clicking buttons on the
View toolbar and then left-clicking and dragging in the Graphics View to transform the
view.
Tip. To display toolbars, choose View | Toolbars from the menu bar and choose the
toolbar you want to display. A checkmark next to the name of a toolbar on the
View | Toolbars menu indicates that the toolbar is displayed.
With the CAU408 residue selected, click the Fit to Screen button on the toolbar.
This focuses the view on the carazolol molecule.
On the toolbar, click the Rotate button. In the Graphics View, left-click and drag
in the direction you want the structure to rotate.

Click the Translate button. Left-click and drag to translate the structure in the
XY plane.

Click the Zoom button. Left-click and drag up or down to zoom in or out
respectively.
Tip. If you have a wheel on your mouse, you can zoom in and out of the structure
when any button is engaged on the View toolbar. Roll the wheel up or down to zoom in
or out respectively.
Saving a file
As you become familiar with the mouse and zoom controls, you will find that you can readily
select views that highlight critical interactions in protein systems. For example, in the
case of the β2-adrenergic G protein-coupled receptor, you can view the interaction
between the Phe290 and Trp286 residues and the carazolol molecule, which may be involved
in toggling the activity of the receptor. You can try this by locating these residues in the
first chain A and setting their display styles appropriately.
You can save your work at any time. The native file format for molecular systems is MSV,
which records all display and view orientation information. This format is helpful for saving
the current state of your work on a particular system and for sharing information with
colleagues. If you need to save files for other programs, you can use a variety of
alternative formats. Next, you will learn how to save a Protein Data Bank (PDB) format file.
From the menu bar, choose File | Save As....
This opens the Save As dialog.
On the Look in textbox, click the dropdown arrow to choose a location to save the file to,
or click the Up one level button next to the textbox to navigate to a location.

In the File name textbox, enter a name for the file.

For the Files of type textbox, click the dropdown arrow and choose Protein Data Bank
Files from the list.

Click the Save button.
The file is saved in the chosen location. By default, you can double-click the file to
automatically open it in the application, whether or not the application is currently open.
Tip. If you name your file and add the extension, you do not have to choose from the Files
of type dropdown list to save your file in the specified format. The extension entered in the
File name textbox overrides any selection from the Files of type list.
Enumerate library tutorial
Purpose: Enumerate a reaction-based combinatorial library and explore diverse and
similar subsets of the library.

Required functionality and modules: Discovery Studio Visualizer client, Enumerate
Library by Reaction, Find Diverse Molecules, and Find Similar Molecules by Fingerprints.

Required data files: amines_35.sd, acids_45.sd, amine12-acid39.sd, and
AmideFormation.rxn.

Time: 45 minutes.
Introduction
In this tutorial, you will generate a combinatorial library, enumerating all of the input
reactants according to a specified amide formation reaction.
Enumerating the combinatorial library
Visualizing the combinatorial library
Selecting a diverse subset of ligands
Selecting a subset of similar ligands
Enumerating the combinatorial library
1. To import the amine and acid reactant files into a 3D Window
In the Files Explorer, navigate to the amines_35.sd data file. Right-click and choose
Open With | 3D Window.
This opens the amines in a new 3D Window.
Next, navigate to the acids_45.sd data file and drag and drop the file into the same
3D Window.
2. To open the Enumerate Library by Reaction protocol and modify the
parameters
In the Protocols Explorer, expand the Library Design folder and double-click the
Enumerate Library by Reaction protocol.
This opens the parameters in the Parameters Explorer.
In the Parameters Explorer, click the Input Reactants parameter, then click the
button.
This opens the Specify Ligands dialog.
On the dialog, click the All ligands from a 3D Window radio button and choose
amines_35 from the dropdown list.


Click the Input Reaction File parameter, then click the button.
This opens the Input Reaction File dialog.
On the dialog, navigate to and select the generic reaction file, AmideFormation.rxn.

Click the Open button.
3. To run the calculation and view the results
On the Protocols toolbar, click the Run button and wait for the job to complete.
The job takes about one minute on a 2.8 GHz Pentium 4 machine with 2 GB of RAM.
The Job Completed dialog displays when the job is complete.
In the Jobs Explorer, double-click the completed job.
This opens the Report.htm file in an Html Window.
In the Output Files section, click the amines_35-AmideFormation.sd link.
This opens a Table Browser with 1575 ligands that have been enumerated.
The 1575 ligands are the result of the combination of the 35 amine reactants and the
45 acid reactants. If any of the amine reactants also contained carboxylic acid groups,
or the carboxylic acids contained amine groups, then these could have also combined
in the generic reaction to generate more ligands.
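The library size is the Cartesian product of the two reactant lists: 35 amines × 45 acids = 1575 amides, assuming each reactant reacts at exactly one site. The sketch below enumerates placeholder product labels with `itertools.product`; it performs no chemistry, and the amineN-acidM labels are a hypothetical convention chosen to echo the amine12-acid39.sd file used later in this tutorial.

```python
from itertools import product


def enumerate_library(amines, acids):
    """Return one product label per (amine, acid) pair, i.e. the Cartesian product."""
    return [f"{amine}-{acid}" for amine, acid in product(amines, acids)]


amines = [f"amine{i}" for i in range(1, 36)]  # 35 hypothetical amine reactants
acids = [f"acid{j}" for j in range(1, 46)]    # 45 hypothetical acid reactants
library = enumerate_library(amines, acids)
print(len(library))  # 1575
```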
Close the Enumerate Library by Reaction - Html Window.
Visualizing the combinatorial library
1. To open the Calculate Principal Components protocol and modify the
parameters
In the Protocols Explorer, expand the Library Analysis folder and double-click the
Calculate Principal Components protocol.
In the Parameters Explorer, click the Input Ligands parameter, then click the
button.
On the dialog, click the All ligands from a Table Browser radio button and choose
amines_35-AmideFormation from the dropdown list.

In the Output Files section, click the ModelDescription.html link.
This opens a description of the principal components analysis in an Html Window.
The first three principal components explain 82% of the variance in the property space
of the library, as defined by the eight independent variables.
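In principal component analysis, the fraction of variance explained by the first k components is the sum of the k largest eigenvalues of the covariance matrix (the correlation matrix, if the descriptors are standardized) divided by the sum of all eigenvalues. The sketch below applies that ratio to a list of eigenvalues; the eight eigenvalues in the example are hypothetical and are not the values from this job's model.

```python
def cumulative_explained_variance(eigenvalues):
    """Return the cumulative fraction of total variance for 1, 2, ... components."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    fractions = []
    running = 0.0
    for value in ordered:
        running += value
        fractions.append(running / total)
    return fractions


# Hypothetical eigenvalues for eight standardized descriptors (they sum to 8).
eigenvalues = [4.0, 2.0, 1.0, 0.5, 0.25, 0.125, 0.075, 0.05]
fractions = cumulative_explained_variance(eigenvalues)
print(f"first three components: {fractions[2]:.1%}")  # first three components: 87.5%
```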
Re-open the Report.htm file. On Windows, right-click in the Model PCATempModel -
Html Window and choose Back. On Linux, close the Model PCATempModel - Html
Window, locate the job's Output folder (highlighted in the Files Explorer), and
double-click Report.htm.

In the Output Files section, click the ViewResults.pl link.
This opens another Table Browser [amines_35-AmideFormation - Table Browser(1)]
containing the library with the three additional columns of the principal component
analysis and a 3D Point Plot (amines_35-AmideFormation - 3D Point Plot) of the library
in PCA space. Compounds that are close to each other in PCA space will have similar
properties and may have similar activities.
Drag the 3D Point Plot tab down so that the plot is side-by-side with the Table
Browser of the library.

Lasso points close to each other in the 3D Point Plot.
This also selects them in the Table Browser.
Close Calculate Principal Components - Html Window, amines_35-
AmideFormation - Table Browser(1), and amines_35-AmideFormation - 3D
Point Plot.
Selecting a diverse subset of ligands
1. To open the Find Diverse Molecules protocol and modify the parameters
In the Protocols Explorer, in the Library Analysis folder, double-click the Find
Diverse Molecules protocol.
button.

In the Output Files section, click the diverse-ligands.sd link.
This opens a subset library of 50 diverse molecules in a Table Browser.
Close diverse-ligands - Table Browser and Find Diverse Molecules - Html
Window.
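This tutorial does not describe the algorithm that Find Diverse Molecules uses internally. As a stand-in illustration only, greedy MaxMin selection is one common way to pick a diverse subset: each new pick is the compound farthest from every compound already picked. The Python sketch below assumes that each compound is a descriptor vector (for example, its PCA coordinates). The points are hypothetical.

```python
import math

def maxmin_select(points, k, start=0):
    """Greedy MaxMin: repeatedly pick the point farthest from the picked set."""
    picked = [start]
    # Distance from every point to its nearest already-picked point.
    nearest = [math.dist(p, points[start]) for p in points]
    while len(picked) < min(k, len(points)):
        candidates = [i for i in range(len(points)) if i not in picked]
        nxt = max(candidates, key=lambda i: nearest[i])
        picked.append(nxt)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[nxt]))
    return picked

points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.0, 5.1), (10.0, 0.0)]
print(maxmin_select(points, 3))  # [0, 4, 3]
```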
Selecting a subset of similar ligands
1. To open a reference molecule
In the Files Explorer, navigate to and double-click the amine12-acid39.sd data file.
This opens one of the enumerated ligands in the library in a Table Browser.
2. To open the Find Similar Molecules by Fingerprints protocol and modify the
parameters
In the Protocols Explorer, in the Library Analysis folder, double-click the Find
Similar Molecules by Fingerprints protocol.
button.


Click the Input Reference Ligands parameter, then click the button.
amine12-acid39 from the dropdown list.

In the Output Files section, click the amines_35-AmideFormation-amine12-
acid39.sd link.
This opens a subset library of five molecules that are most similar to the reference
ligand in a Table Browser.
As the reference ligand (amine12-acid39) is also in the library, you can see that it has
been selected in the subset library and has a similarity value of 1.
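A similarity value of 1 means that the two fingerprints are identical. The protocol's similarity metric is not stated here, so the following Python sketch assumes the Tanimoto coefficient on sets of fingerprint features. It ranks a library against a reference. The feature IDs are made up.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two fingerprints stored as feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def most_similar(reference, library, n=5):
    """Return the n (name, similarity) pairs most similar to the reference."""
    scored = [(name, tanimoto(reference, fp)) for name, fp in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:n]

library = {
    "amine12-acid39": {1, 2, 3, 4},
    "amine12-acid40": {1, 2, 3, 5},
    "amine3-acid7": {8, 9},
}
print(most_similar(library["amine12-acid39"], library, n=2))
# [('amine12-acid39', 1.0), ('amine12-acid40', 0.6)]
```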

Pareto optimization of a combinatorial subset library
(library design and analysis) tutorial
Purpose: Optimize a selection of a combinatorial subset library using the Pareto method.

Required functionality and modules: Discovery Studio Visualizer client and Optimize
Combinatorial Library with Pareto Method.

Required data files: amines_35.sd, acids_45.sd, and AmideFormation.rxn.

Time: 30 minutes.
Background
A common task when designing libraries of ligands is the selection of small subsets of
ligands from a large library. These subset libraries must adequately represent the chemical
diversity of the original library, and satisfy additional property criteria. The challenge of this
task is to find subset library choices that satisfy the competing goals of small size, high
diversity and defined chemical properties.
Pareto optimization is useful when you have more than one property you wish to optimize.
Unlike other optimization methods that generate a single solution, a Pareto optimization
generates multiple solutions to give the best possible tradeoffs among the properties being
optimized. The optimal solutions are those for which it is not possible to improve on one
property without degrading at least one other property.
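The definition of optimality above is Pareto dominance. The following is a minimal Python sketch. It assumes that every objective is expressed so that larger is better; a penalty to be minimized can be negated. The sketch keeps only the solutions that no other solution dominates.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly
    better on at least one (all objectives maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(solutions):
    """Keep the solutions that are not dominated by any other solution."""
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]

# Hypothetical (diversity, -penalty) pairs for four candidate subsets.
candidates = [(10, -3), (8, -1), (7, -2), (10, -4)]
print(pareto_front(candidates))  # [(10, -3), (8, -1)]
```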
Introduction
In this tutorial, you will select a diverse subset of ligands from a combinatorial library, and
optimize that selection for diversity and some Lipinski properties. The original
combinatorial library is defined by a generic reaction - an amide formation involving 35
different amines and 45 different carboxylic acids. The subset library will be a
combinatorial selection of the amides from seven different amines and eight different
carboxylic acids, with maximal diversity and an optimal range of some Lipinski properties.

Defining the combinatorial library
Defining the Pareto optimization
Defining the combinatorial library
1. To open the amine and acid reactant files in the same Graphics View
In the Files Explorer, navigate to the amines_35.sd data file. Right-click and choose
This opens the amines in a new 3D Window.
Next, navigate to the acids_45.sd data file, and drag and drop it into the same
Graphics View.
2. To open the Optimize Combinatorial Library with Pareto Method protocol and
define the combinatorial library
In the Protocols Explorer, expand the Library Design folder and double-click the
Optimize Combinatorial Library with Pareto Method protocol.
In the Parameters Explorer, click the Input Library Type parameter and choose
Reaction from the dropdown list.
The previously unavailable parameters (gray) Input Reactants and Input Reaction File
are now required (red). The Input RG File parameter is no longer available.
Click the Input Reactants parameter, then click the button.
On the dialog, click the All ligands from a 3D Window radio button and choose
amines_35 from the dropdown list.

Click the Input Reaction File parameter, then click the button.
This opens the Input Reaction File dialog.
On the dialog, navigate to AmideFormation.rxn, then click the Open button.
This opens the generic reaction file.
3. To define the subset library
Click the Library Dimensions parameter and enter 7,8 for the value.
The default value is changed in this step because it assumes 3 reactants or R-groups
and there are only two reactants in this example. This parameter setting defines the
number of amines and acids, respectively, which will be selected for the subset library.
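The Library Dimensions value sets both the size of each subset and the size of the search space. Each subset is the full combination of the chosen reactants, which gives 7 × 8 = 56 products. There are C(35,7) × C(45,8) distinct ways to choose 7 of the 35 amines and 8 of the 45 acids. That is roughly 1.4 × 10^15 candidate subsets, far too many to score one by one. The arithmetic can be checked quickly in Python:

```python
from math import comb

n_amines, n_acids = 35, 45      # reactants in the full library
pick_amines, pick_acids = 7, 8  # Library Dimensions = 7,8

products_per_subset = pick_amines * pick_acids
possible_subsets = comb(n_amines, pick_amines) * comb(n_acids, pick_acids)

print(products_per_subset)  # 56
print(possible_subsets)
```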
Defining the Pareto optimization
1. To define the diversity parameters to Pareto optimize
In the Parameters Explorer, click the Optimization Property 1 parameter, then click
the button.
This opens the Calculable Property dialog.
On the dialog, in the search textbox, enter FCFP for the value, then choose FCFP_12
from the list.

This selects functional class extended-connectivity fingerprints (FCFP) with a
maximum diameter of 12 as the property used to measure diversity.
Expand the Optimization Property 1 parameter. Click the What to Optimize
parameter and choose Unique Value Count from the dropdown list.
These parameters maximize the number of distinctly different fingerprints in the
subset.
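One way to read Unique Value Count is as the number of distinct feature IDs present anywhere in the subset, assuming that each ligand's FCFP_12 fingerprint is treated as a set of integer feature IDs. Under that reading, a subset whose members share few features scores higher. The following is a minimal Python sketch; both the interpretation and the IDs are assumptions.

```python
def unique_value_count(fingerprints):
    """Number of distinct fingerprint features present across a subset."""
    features = set()
    for fp in fingerprints:
        features.update(fp)
    return len(features)

diverse_subset = [{1, 2, 3}, {1, 2, 4}]
redundant_subset = [{1, 2}, {1, 2}]
print(unique_value_count(diverse_subset))    # 4
print(unique_value_count(redundant_subset))  # 2
```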
2. To define the molecular weight Lipinski parameter to Pareto optimize
Click the Optimization Property 2 parameter, then click the button.
On the dialog, in the search textbox, enter Molecular_Weight for the value, then
choose Molecular_Weight from the list.

This sets the molecular weight of the ligands as an optimization property.
Expand the Optimization Property 2 parameter.

Click the What to Optimize parameter, and choose Range Penalty from the
dropdown list.

Click the Goal parameter, and choose Minimize from the dropdown list.

Expand the What to Optimize parameter.

Click the Range Minimum parameter and enter 100 for the value.

Click the Range Maximum parameter and enter 500 for the value.
These parameters minimize the selection of ligands that have a molecular weight less
than 100 or greater than 500.
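As a simplified sketch, a range penalty can be read as the number of ligands whose property value falls outside [Range Minimum, Range Maximum]. Minimizing it favors subsets whose members stay within the range. The protocol's actual penalty may also weight how far each value lies outside the range. The molecular weights below are hypothetical.

```python
def range_penalty(values, low, high):
    """Count the values that fall outside [low, high]. Lower is better."""
    return sum(1 for v in values if v < low or v > high)

weights = [180.2, 342.4, 512.7, 95.1, 455.0]  # hypothetical molecular weights
print(range_penalty(weights, 100, 500))  # 2
```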
3. To define the number of hydrogen bond donors Lipinski parameter to Pareto
optimize
Click the Optimization Property 3 parameter, then click the button.
On the dialog, in the search textbox, enter num for the value, then choose
Num_H_Donors_Lipinski from the list.

This sets the number of hydrogen bond donors as an optimization property.
Expand the Optimization Property 3 parameter. Click the What to Optimize
parameter and choose Range Penalty from the dropdown list.

Click the Goal parameter, and choose Minimize from the dropdown list.



Click the Range Minimum parameter and enter 0 for the value.

Click the Range Maximum parameter and enter 5 for the value.
These parameters minimize the selection of ligands that have more than five hydrogen
bond donors.
This takes about a minute on a single-processor Pentium 4, 2 GB RAM, 2.8 GHz machine.
In the Output Files section, 20 possible subsets of the original library are listed. Each
subset contains 56 ligands, composed of 7 different amine and 8 different carboxylic
acid combinations.
In the Output Files section, click the Pareto.html link.
This lists the results of the optimization.
The CrowdingDistance value indicates the relative distance (in terms of the
optimization properties) to other subsets on the same Pareto front. Subsets at the
extremes are assigned an infinite crowding distance (1.0e99).
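The exact formula behind CrowdingDistance is not given here. The following Python sketch assumes the widely used NSGA-II definition. Under that definition, a subset's value is the sum, over all objectives, of the normalized gap between its two neighbors on the front, and boundary subsets get a very large value. As in Pareto.html, the sketch uses 1.0e99 for the boundaries.

```python
def crowding_distance(front, inf=1.0e99):
    """NSGA-II style crowding distance for objective tuples on one front."""
    n = len(front)
    dist = [0.0] * n
    if n == 0:
        return dist
    for m in range(len(front[0])):
        order = sorted(range(n), key=lambda i: front[i][m])
        lo, hi = front[order[0]][m], front[order[-1]][m]
        # Extreme points on any objective are treated as infinitely far away.
        dist[order[0]] = dist[order[-1]] = inf
        if hi == lo:
            continue
        for k in range(1, n - 1):
            i = order[k]
            if dist[i] >= inf:
                continue
            dist[i] += (front[order[k + 1]][m] - front[order[k - 1]][m]) / (hi - lo)
    return dist

front = [(1, 5), (2, 3), (4, 2), (5, 1)]  # hypothetical objective values
print(crowding_distance(front))  # [1e+99, 1.5, 1.25, 1e+99]
```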
The Objective value is the value of a cost function that summarizes, in a single number,
the weighted optimization of the different properties.
The Pareto Plots show the optimization properties of the 20 subset libraries relative to
the first property (i.e., the functional class extended-connectivity fingerprints). Each
subset solution is equally valid: none is better than another on every property.
However, some are better optimized for one property (or goal) than others. Compare the
subsets:
o A subset in the top-right corner of the plot has a higher overall fingerprint
diversity, but includes more ligands that have been penalized because they were
outside of the molecular weight range (100 - 500).
o A subset in the bottom-left corner has a lower overall fingerprint diversity but
most compounds are within the specified molecular weight range.

Creating pharmacophores (structure-based) tutorial
Purpose: Generate a pharmacophore model based on the HIV protease molecule.

Required functionality and modules: Discovery Studio Visualizer client, Receptor-
Ligand Interactions, and Pharmacophore.

Required data files: 4phv_Protein.pdb and 4phv_Ligand.sd.

Time: 15 minutes.
Background
The Structure-Based Pharmacophore (SBP) method uses the known or suspected active
site of a protein to select compounds that are likely to bind within that site. The defined
active site is first analyzed to create an interaction map of features that the ligand is
expected to satisfy for a reasonable interaction with the protein. The feature interaction
map is then used as the basis to create an input pharmacophore model.
Introduction
In this tutorial, you will generate a pharmacophore model based on the HIV protease
molecule (4PHV).
Identifying interactions in the binding site of the protein
Creating a pharmacophore model based on the interactions
Identifying interactions in the binding site of the protein
1. To open the protein and ligand files
In the Files Explorer, navigate to and double-click the 4phv_Protein.pdb data file.
This opens the protein in a new 3D Window.
Next, navigate to the 4phv_Ligand.sd data file and drag and drop it into the same 3D
Window.
2. To delete water chains from the Hierarchy View
In the Hierarchy View, click Water to select it, then press DELETE.
3. To open the Receptor-Ligand Interactions tool panels
In the Tools Explorer, choose Receptor-Ligand Interactions from the dropdown list.
This opens the Receptor-Ligand Interactions tool panels.
4. To define the protein as the receptor
In the Hierarchy View, click 4phv_Protein to select it.

In the Binding Site tool panel, click Define Selected Molecule as Receptor.
This displays an SBD_Receptor in the Hierarchy View. Ignore any messages about
valence violations.
5. To create a sphere centered on the ligand
In the Hierarchy View, click 4phv_Ligand to select it.
In the Tools Explorer, in the Binding Site tool panel, click Define Sphere from
Selection.
This displays the SBD_Site_Sphere as a red sphere in the Graphics View.
6. To expand the sphere to encompass the ligand
In the Hierarchy View, click SBD_Site_Sphere to select it, then right-click and choose
Attributes of SBD_Site_Sphere....
This opens the Sphere Object Attributes dialog.
On the dialog, enter 9 in the Radius textbox.

The radius is increased to ensure that most of the binding site points are contained
within the radius of the sphere. You can confirm this visually in the Graphics View.
7. To open the Interaction Generation protocol and modify the parameters
In the Protocols Explorer, expand the Pharmacophore folder and double-click the
Interaction Generation protocol.
This displays the parameters in the Parameters Explorer.
In the Parameters Explorer, click the Input Receptor parameter and choose
4phv_Protein:4phv_Protein from the dropdown list.

Click the Input Site Sphere parameter and choose the coordinates of the sphere
from the dropdown list.

Click the Density of Lipophilic Sites parameter and enter 10 for the value.

Click the Density of Polar Sites parameter and enter 10 for the value.
Note. New values are used in place of the defaults to accelerate the computation time
by reducing the number of total features calculated.
This takes about one minute on a single-processor Pentium 4, 2 GB RAM, 2.8 GHz
machine.
This opens a 3D Window containing the pharmacophore features derived from the
interactions.
Creating a pharmacophore model based on the interactions
1. To open the Pharmacophore tool panels
In the Tools Explorer, choose Pharmacophore from the dropdown list.
This displays the Pharmacophore tool panels.
2. To view all of the potential pharmacophore points
In the Edit and Cluster Pharmacophores tool panel, in the Show/Hide tools group, click
All Non-Features and then click Location Spheres to hide everything except the
pharmacophore features.

In the View toolbar, click the Fit to Screen button.
3. To cluster the Acceptor feature of the pharmacophore
Click the 4phv_Protein - 3D Window(1) tab to activate it.
In the Edit and Cluster Pharmacophores tool panel, in the Select Feature tools group,
click Current Feature and choose Acceptor from the dropdown list.

In the Cluster tools group, click Cluster Current Feature.
This opens the Dendrogram View.
Tip. It is often easier to view the Dendrogram and the pharmacophore features side by
side. To do this, click the 4phv_Protein:Acceptor - Dendrogram Window tab and
left-click and drag down and to the right until the frame indicating the borders of the
window displays as a vertical rectangle. Release the mouse button.
In the Dendrogram View, move the gray vertical slider so that it intersects two
clusters.
This action is reflected in the status bar.
In the Edit and Cluster Pharmacophores tool panel, click Keep Only Cluster Centers.

Close the Dendrogram View.
4. To cluster the Donor and Hydrophobe features of the pharmacophore
Repeat step 3 for the Donor and Hydrophobe features. For the Dendrogram
intersection, use two cluster centers for each. To reduce confusion, remember to
close the Dendrogram View when finished with each feature.
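Keeping only the cluster centers replaces each group of nearby features with a single representative. This tutorial does not document the tool's linkage method or how it chooses a center. The Python sketch below therefore assumes centroid-linkage agglomerative clustering on the feature coordinates, and keeps the member nearest each cluster centroid. The coordinates are hypothetical.

```python
import math

def cluster_centers(points, k):
    """Merge feature positions into k clusters (centroid linkage) and return
    the index of the member closest to each cluster centroid."""
    clusters = [[i] for i in range(len(points))]
    dims = len(points[0]) if points else 0

    def centroid(members):
        return tuple(sum(points[i][d] for i in members) / len(members)
                     for d in range(dims))

    while len(clusters) > k:
        # Find the pair of clusters whose centroids are closest.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = math.dist(centroid(clusters[a]), centroid(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)  # b > a, so index a is unaffected

    centers = []
    for members in clusters:
        c = centroid(members)
        centers.append(min(members, key=lambda i: math.dist(points[i], c)))
    return sorted(centers)

features = [(0, 0, 0), (0.5, 0, 0), (0, 0.4, 0), (10, 10, 10), (10.3, 10, 10)]
print(cluster_centers(features, 2))
```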
5. To create an excluded volume model based on carbon atoms within 10 Å of
the sphere center
Click the 4phv_Protein - 3D Window(1) tab to make it active.

In the Hierarchy View, click SBD_Site_Sphere to select it.

From the menu bar, choose Edit | Select....
This opens the Select dialog.
On the Select dialog, check the Element checkbox. In the Radius textbox, enter 10
for the value.

Click the Select button, and then click the Close button.
This selects all of the carbon atoms within 10 Å of the center of the sphere.
In the Query toolbar, click the Exclusion Constraint button.
The exclusion constraints become part of the pharmacophore. They represent regions
of space that the atoms of a ligand cannot occupy (excluded volumes) without causing
steric clashes with the protein atoms.
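The Select step that defines these excluded volumes is a distance filter on the receptor atoms. The following is a minimal Python sketch. It assumes that atoms are available as hypothetical (name, element, coordinates) tuples with coordinates in Angstroms.

```python
import math

def atoms_within(atoms, center, radius, element="C"):
    """Return names of atoms of the given element within `radius` of `center`."""
    return [name for name, elem, xyz in atoms
            if elem == element and math.dist(xyz, center) <= radius]

atoms = [  # hypothetical (name, element, (x, y, z)) records, Angstroms
    ("CA", "C", (1.0, 2.0, 2.0)),
    ("O", "O", (0.5, 0.5, 0.5)),
    ("CB", "C", (9.0, 9.0, 9.0)),
]
print(atoms_within(atoms, (0.0, 0.0, 0.0), 10.0))  # ['CA']
```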
In order to reduce database search times when using this pharmacophore, it may be
desirable to reduce the number of excluded volumes. One way to accomplish this is to
cluster the excluded volumes by applying the same clustering method used earlier.
In the Edit and Cluster Pharmacophores tool panel, in the Show/Hide tools group, click
All Non-Features.

In the Select Feature tools group, click Current Feature and choose Exclusion
Sphere from the dropdown list.

In the Cluster tools group, click Cluster Current Feature.
This opens the Dendrogram View.
In the Dendrogram View, move the slider to intersect 25 clusters.

In the Edit and Cluster Pharmacophores tool panel, in the Cluster tools group, click
Keep Only Cluster Centers.
You have created a pharmacophore derived from an active site. The pharmacophore
contains pharmacophore features derived from potential protein interactions in the binding
site and excluded volumes corresponding to atoms on the protein.
The next step could be to screen a set of actives with subsets of these features to find the
best pharmacophore. The Screen Library protocol in the Pharmacophore folder can be used
to accomplish this task.

Ligand profiler tutorial
Purpose: Explore how a set of ligands can be profiled for potential activity using
pharmacophore models.

Required functionality and modules: Discovery Studio Visualizer client, Ligand Profiler
Protocol, and DS Catalyst Score.

Required data files: TK_xray_ligs.sd, 1E2K.chm, 1E2M.chm, 1E2N.chm, 1KI2.chm,
1KI3.chm, 1KI6.chm, 1KI7.chm, 1KIM.chm, 2KI5.chm.

Time: 30 minutes.
Background
The Ligand Pharmacophore Profiler protocol allows the parallel screening of multiple
compounds against multiple pharmacophore models. The derived profile for each
compound can be used to assess the potential biological action of the candidate compounds
or potential undesired interactions and side-effects.
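Conceptually, the profile is a table with one row per ligand and one column per pharmacophore. Each cell holds the result of trying to map that ligand onto that model. The Python sketch below assumes a hypothetical fit(ligand, pharmacophore) function that returns a fit score, or None when the ligand does not map. The scores are invented.

```python
def profile(ligands, pharmacophores, fit, threshold=0.0):
    """Build a ligand x pharmacophore profile as nested dicts.

    Non-mapping pairs and scores at or below `threshold` are stored as 0.0.
    """
    table = {}
    for lig in ligands:
        row = {}
        for ph in pharmacophores:
            value = fit(lig, ph)
            row[ph] = value if value is not None and value > threshold else 0.0
        table[lig] = row
    return table

# Hypothetical fit scores; missing keys mean "does not map".
scores = {("ligA", "1E2K"): 3.1, ("ligA", "1KIM"): None, ("ligB", "1KIM"): 2.4}
table = profile(["ligA", "ligB"], ["1E2K", "1KIM"],
                lambda lig, ph: scores.get((lig, ph)))
print(table)
```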
Introduction
In this tutorial, you will screen nine thymidine kinase (TK) inhibitors against a set of
pharmacophores derived from the structures of corresponding PDB entries.
Screening multiple compound structures against multiple pharmacophore targets
Analyzing the results of the Ligand Profiler protocol
Building and maintaining the Ligand Profiler database
Screening multiple compound structures against multiple
pharmacophore targets
1. To open the ligand molecules
In the Files Explorer, navigate to and double-click the TK_xray_ligs.sd data file.
This opens the ligand molecules for profiling in a new Table Browser.
2. To open the Ligand Profiler protocol and modify the parameters
Ligand Profiler protocol.
button.
This opens the Specify ligands dialog.
On the dialog, click the All ligands from a Table Browser radio button and check
that TK_xray_ligs is displayed in the textbox.

Click the Input File Pharmacophores parameter, then click the button.
This opens the Specify Queries dialog.
On the dialog, in the Queries from files control group, click the Add File Queries
button.
This opens the Choose Catalyst query file(s) dialog.
On the dialog, navigate to the Samples folder and select the pharmacophore files:
1E2K.chm, 1E2M.chm, 1E2N.chm, 1KI2.chm, 1KI3.chm, 1KI6.chm, 1KI7.chm,
1KIM.chm, 2KI5.chm.

On the Specify Queries dialog, click the OK button.
This populates the list of queries. These steps can be used to sequentially add more
queries to the list of file queries selected for screening.
Tip. Alternatively, you can add a list of queries by using the Add List of Queries button
under List of queries from a file on the Specify Queries dialog. You must provide a text
file containing the absolute paths to the query file(s), one path per line.
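The list file for Add List of Queries can be generated instead of typed by hand. The following is a minimal Python sketch. It assumes that the .chm files are all in one folder. The output file name in the usage line is hypothetical.

```python
from pathlib import Path

def write_query_list(folder, list_file):
    """Write the absolute path of every .chm file in `folder`, one per line."""
    paths = sorted(p.resolve() for p in Path(folder).glob("*.chm"))
    Path(list_file).write_text("".join(f"{p}\n" for p in paths))
    return paths

# Usage (hypothetical output name):
# write_query_list("Samples", "tk_queries.txt")
```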
Click the Conformation Generation parameter and choose FAST from the dropdown
list.
Note. FAST is not the default setting for the Conformation Generation parameter, but it
is used here because good conformational coverage of the ligands is necessary to
achieve accurate mappings.
Expand the Advanced parameter group, click the Save Aligned Ligands parameter, and
choose True from the dropdown list.
Note. True is not the default setting for the Save Aligned Ligands parameter, but it
provides a way to view the alignment of the ligands to the pharmacophores. Be aware
that it greatly increases the size and number of the generated files.
The job takes about four minutes on a single-processor Pentium 4, 2 GB RAM, 2.8 GHz
machine.
This opens Report.htm in the Ligand Profiler - Html Window.
In the Html Window, in the Output Files section, click the 1E2K_view.pl link.
This opens a 3D Window with the 1E2K pharmacophore and a table of the compounds
that map to it. Select each compound in the table to view its mapping.
In the Ligand Profiler - Html Window, in the Output Files section, click the
ViewResults.pl link.
This opens a Heat Map of the activity profile for each compound.
Analyzing the results of the Ligand Profiler protocol
To display individual compounds interactively using the Heat Map
Click and drag the TK_xray_ligs_profiled - Heat tab down and to the right until the
frame indicating the borders of the window displays as a vertical rectangle. Release the
mouse button.
This arranges the Heat Map and the Table Browser tabs side-by-side.
In the TK_xray_ligs_profiled - Table Browser Window, click the first row of the table to
highlight it. Right-click the row and choose Show Structure in 3D Window.
This displays the compound structure for the selected row in a Graphics View above the
table.
On the Heat Map, click any cell.
This displays the corresponding compound structure in the Graphics View of the
TK_xray_ligs_profiled - Table Browser Window. You can also select a row in the table to
automatically select the corresponding row in the Heat Map.
Building and maintaining the Ligand Profiler database
If you have a collection of pharmacophores that will be repeatedly used for Ligand Profiling,
it is convenient to store them in a database. This database provides the capability to
organize the pharmacophores by Target Class, Target Subclass, Target Family, and Target
Acronym. The database also allows for descriptive information to be stored with the
pharmacophores.
1. To add pharmacophore models to the Ligand Profiler database
From the menu bar, choose Window | Close All. This closes all open windows.
In the Files Explorer, navigate to and double-click the 1KIM.chm data file.
This opens the data file in a 3D Window.
In the Protocols Explorer, in the Pharmacophore folder, double-click the Add to
Ligand Profiler Database protocol.
In the Parameters Explorer, click the Input Pharmacophore parameter, then click
the button.
This opens the Specify Query dialog.
Click the Query from a 3D Window radio button and check that 1kim:QueryRoot is
displayed in the textbox.

Add classification information to the pharmacophore.
Click the Target Class parameter and choose Enzymes from the dropdown list.

Click the Target Subclass parameter and choose EC2.-(transferases) from the
dropdown list.

Click the Target Family parameter and choose kinases (thymidine) from the
dropdown list.

Click the Target Acronym parameter and choose TK (HSV1) from the dropdown list.
Note. Once a pharmacophore model is created and included in the Ligand Profiler
database, it cannot be overwritten by default. This prevents accidental updates to the
database. However, if you need to overwrite an existing entry, for example, by
rerunning this job, do the following: expand the Input Pharmacophore group, click the
Overwrite parameter, and choose True from the dropdown list.
Repeat these steps, selecting other CHM files and providing classification information
for them, to add more pharmacophores to the database.
2. To check the contents of the Ligand Profiler database
In the Parameters Explorer, open the Ligand Profiler parameters.

Click the Input LigandProfilerDB Pharmacophores parameter, then click the
button.
This opens the LigandProfilerDB Pharmacophores dialog.
Expand the tree completely.
Notice that the pharmacophore models you added to the database are available for
screening.
3. To remove pharmacophore models from the Ligand Profiler database
In the Protocols Explorer, in the Pharmacophore folder, double-click the Remove
from Ligand Profiler Database protocol.
In the Parameters Explorer, click the Pharmacophore to Remove parameter and
choose the pharmacophore model from the dropdown list.


Common feature pharmacophore generation tutorial
Purpose: Use the Common Feature Pharmacophore Generation protocol to create
pharmacophore models.

Required functionality and modules: Discovery Studio Visualizer client and DS Catalyst
Hypothesis.

Required data file: HIVprot_ligands_xray.sd.

Time: 20 minutes.
Background
The Common Feature Pharmacophore Generation protocol finds the common chemical
features shared by a set of active compounds, and provides an alignment of these
compounds with a pharmacophore that expresses these features.
Pharmacophores created in this way are also useful in suggesting putative structure-
activity models when you have only a few molecules that have similar activities, but
dissimilar and/or flexible structures. Additionally, common feature pharmacophores can be
used to search 3D databases to uncover potentially novel classes of active molecules.
Common feature pharmacophores do not consider relative biological activity data and are
therefore not intended for use in predicting relative activity among sets of potentially active
molecules.
Introduction
Preparing and viewing input data
Modifying protocol parameters and selecting features
Viewing the results
Preparing and viewing input data
The molecules for use in this tutorial are single X-ray conformations extracted from the
RCSB Protein Data Bank. These compounds are HIV protease inhibitors from 9hvp, 7hvp,
1sbg, 1aaq, 1hef, and 1hiv.
1. To prepare the input data
In the Files Explorer, navigate to the HIVprot_ligands_xray.sd data file. Right-click
and choose Open With | 3D Window.

In the Data Table, click the Molecule tab and view the Principal and MaxOmitFeat
columns.
The Principal column value defines the role that each molecule plays in pharmacophore generation.
o A value of 0 indicates that the molecule is inactive and will not be used in the
creation of the pharmacophore. Use this value when you want to keep additional
compounds in the input set without having them contribute to pharmacophore generation.
o A value of 1 indicates that conformations of this molecule will be considered when
generating pharmacophores. Each molecule with a Principal value of 1 or 2 will
have a conformation that maps to the generated pharmacophore (with exceptions
determined by the Misses and CompleteMisses protocol parameters).
o A value of 2 indicates that the compound is a reference molecule. All of the
chemical features in that molecule are considered in building the pharmacophore.
Each input set of lead compounds must have at least one reference molecule.
Because it is the smallest inhibitor, the SB203386-1sbg molecule is the reference
molecule in this tutorial and has a value of 2 in the Principal column. All other inhibitors
have a value of 1.
The MaxOmitFeat column value specifies how many pharmacophore features a given
molecule is allowed to miss. These settings can be used to increase or decrease the
amount of bias the model has towards specific compounds.
o A value of 0 indicates that all features must map to the molecule.
o A value of 1 indicates that one feature can miss. A pharmacophore is valid only if every
molecule with MaxOmitFeat=1 maps either all features or all features minus one.
o A value of 2 indicates that no features are required to map to the generated
pharmacophore. A molecule with MaxOmitFeat=2 can map all the features, all the
features minus one, or can completely miss the entire pharmacophore.
In this tutorial, each molecule has a MaxOmitFeat parameter value of 2 because there
is no need to favor a specific inhibitor.
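The MaxOmitFeat rules amount to a per-molecule check on how many hypothesis features a molecule fails to map. The Python sketch below encodes the three rules listed above. It only restates those rules and is not the Catalyst implementation.

```python
def molecule_ok(n_features, n_mapped, max_omit_feat):
    """Apply the MaxOmitFeat rule for one molecule.

    0: every feature must map; 1: at most one feature may be unmapped;
    2: any number of features (including all) may be unmapped.
    """
    missed = n_features - n_mapped
    if max_omit_feat == 0:
        return missed == 0
    if max_omit_feat == 1:
        return missed <= 1
    return True

def hypothesis_ok(n_features, molecules):
    """molecules: (features_mapped, MaxOmitFeat) pairs for the training set."""
    return all(molecule_ok(n_features, m, omit) for m, omit in molecules)

print(hypothesis_ok(4, [(4, 0), (3, 1), (0, 2)]))  # True
print(hypothesis_ok(4, [(3, 0)]))                  # False
```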
Modifying protocol parameters and selecting features
There are many parameters that you can modify to control the nature of common feature
pharmacophores. The following section demonstrates some of them.
1. To specify input ligands
Common Feature Pharmacophore Generation protocol.
button.
On the dialog, click the All ligands from a 3D Window radio button and check that
HIVprot_ligands_xray is displayed in the textbox.

2. To select the features
In the Parameters Explorer, click the Features parameter, then click the button.
This opens the Select Features dialog.
On the dialog, check that HB_ACCEPTOR, HB_DONOR, and HYDROPHOBIC are
selected.

Note. You can also try substituting HYDROPHOBIC_aromatic and
HYDROPHOBIC_aliphatic in place of the more general HYDROPHOBIC.
3. To specify the number of leads that may miss
Click the Number of Leads That May Miss parameter and enter 3 for the value.
Pharmacophores that fail to map completely to more than three compounds are
discarded. This allows up to three compounds to map the pharmacophore only partially.
4. To set advanced parameters
Expand the Advanced parameter group.

Click the Feature Misses parameter and enter 2 for the value.
The Feature Misses parameter sets the number of times a specific pharmacophore
feature can be missed during mapping to the training molecules. This number should
be equal to or less than the value for the Number of Leads That May Miss parameter.
Click the Complete Misses parameter and enter 2 for the value.
The Complete Misses parameter sets the number of training ligands for which a
generated pharmacophore may fail to map any features.
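Taken together, Number of Leads That May Miss, Feature Misses, and Complete Misses act as three limits on a candidate pharmacophore. The Python sketch below is one interpretation of the descriptions above, not the protocol's actual code. It assumes a hypothetical mapping from each training molecule to the set of features that molecule maps.

```python
def passes_miss_limits(mapping, features, leads_may_miss=3,
                       feature_misses=2, complete_misses=2):
    """mapping: dict of molecule name -> set of features it maps."""
    required = set(features)
    # Molecules that do not map every feature.
    partial = [m for m, hit in mapping.items() if not required <= hit]
    # Molecules that map none of the features.
    complete = [m for m, hit in mapping.items() if not (hit & required)]
    # How many molecules miss each individual feature.
    per_feature = {f: sum(1 for hit in mapping.values() if f not in hit)
                   for f in features}
    return (len(partial) <= leads_may_miss
            and len(complete) <= complete_misses
            and all(n <= feature_misses for n in per_feature.values()))

features = ["HBA", "HBD", "HY"]
mapping = {"m1": {"HBA", "HBD", "HY"}, "m2": {"HBA", "HY"}, "m3": set()}
print(passes_miss_limits(mapping, features))  # True
```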
The job takes about three minutes on a Pentium 4, 2 GB RAM, 2.8 GHz machine. The
Job Completed dialog displays when it is complete.
Viewing the results
In the Html Window, in the Output Files section, click one of the ViewResults_xx.pl links
(where xx is the sequential number of the computed pharmacophore).
This opens the pharmacophore file in the Graphics View of the Table Browser Window.
Click individual rows in the Table Browser.
This displays each ligand aligned to the pharmacophore in the Graphics View.

Creating and using fragment-based pharmacophores
tutorial
Purpose: Explore the use of pharmacophores in fragment-based compound design.

Required functionality and modules: Discovery Studio Visualizer client, DS Catalyst
Shape, DS Catalyst DB Search, DS Catalyst De Novo Ligand Builder, and DS Catalyst
Score.

Required data files: 1fvv_ligand.sd and 1fvv_protein.pdb.

Time: 1 hour.
Background
Fragment-based approaches to drug discovery have gained popularity in recent years. This
tutorial illustrates an application of pharmacophores in a fragment-based compound
design. Using this approach, several diverse ligand candidates are obtained from a
chemical structure data source that would return only a single complex hit in traditional
screening.
Introduction
In this tutorial, you will be guided through the workflow of applying pharmacophores for
fragment-based de novo compound design.
Creating a reference pharmacophore
Creating the fragment-based pharmacophores
Searching a 3D compound database with each query
Enumerating all combinations
Verifying the de novo compounds
Creating a reference pharmacophore
You are going to begin by creating a complex reference pharmacophore query from the
crystal structure of a CDK2 inhibitor and deriving two fragment-based pharmacophore
queries from the same ligand.
1. To open the ligand file in a 3D Window
In the Files Explorer, navigate to the 1fvv_ligand.sd data file. Right-click and choose
2. To change the display style and show atom labels
Right-click in the Graphics View and choose Display Style....
On the dialog, in the Display style control group, click the Stick radio button.

From the menu bar, choose Structure | Labels | Add....
This opens the Label dialog.
On the dialog, check that Atom and Name are displayed in the Object and Attribute
textboxes.

3. To add the pharmacophore features
In the Hierarchy View, expand the 1fvv_ligand group to view the list of atoms in the
molecule. Click O1 to select it.

From the menu bar, choose Structure | Query Features....
This opens the Create Feature dialog.
On the dialog, choose Acceptor from the Feature dropdown list.


In the Hierarchy View, click N3 to select it.

On the dialog, choose Donor from the Feature dropdown list.

In the Hierarchy View, click C4, then press and hold CTRL and click C5, C6, C7,
C11, and C12 to select these atoms.

On the dialog, choose Point from the Feature dropdown list. Check that
Hydrophobe_Aromatic is displayed in the Type textbox.


In the Hierarchy View, click C16, press and hold CTRL and then click the sulfanilamide
aromatic carbon atoms to select them. These are atoms: C17, C18, C19, C20, and
C21.

On the dialog, choose RingAromatic from the Feature dropdown list.


In the Hierarchy View, click sulfonyl O23 to select it.



While the acceptor feature is still selected, right-click in the Graphics View and choose
Attributes of Acceptor....
This opens the AcceptorFeature Attributes dialog.
On the dialog, set Orientation to First Lone Pair.

The orientation of the selected Acceptor feature changes.
In the Hierarchy View, click sulfonyl O24 to select it.



While the acceptor feature is still selected, right-click in the Graphics View, then choose
Attributes of Acceptor....
This opens the AcceptorFeature Attributes dialog.
On the dialog, set Orientation to Second Lone Pair.


Tip. To select the projection point (head) of an acceptor or donor feature and move it
freely in any direction, choose Free from the Orientation dropdown list of Acceptor
and Donor Feature Attributes. For example, you can use this to set a specific lone
pair orientation towards a given receptor donor atom.

In the Hierarchy View, click N29 to select it.



In the Hierarchy View, click the QueryRoot object to select it.

On the dialog, choose Location from the Feature dropdown list.

This adds location constraints to the pharmacophore feature. These constraints specify
that the selected features are located at their current X, Y, and Z coordinates, and that
this location may vary within a certain tolerance represented as a mesh sphere
surrounding the pharmacophore points.
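Geometrically, a location constraint accepts a mapping when the matched point falls inside its tolerance sphere. The following is a minimal Python sketch of that test, assuming a plain Euclidean distance check against each sphere's radius. It is not Discovery Studio's implementation, and the function names and example radii are hypothetical.

```python
import math

def within_location_constraint(point, center, radius):
    """Return True if a feature point lies inside a spherical tolerance region."""
    return math.dist(point, center) <= radius

def mapping_satisfies(points, constraints):
    """Check every mapped point against its (center, radius) constraint, pairwise in order."""
    if len(points) != len(constraints):
        return False
    return all(within_location_constraint(p, center, radius)
               for p, (center, radius) in zip(points, constraints))

# Hypothetical constraints: two features with 1.5 and 0.5 angstrom tolerances
constraints = [((0.0, 0.0, 0.0), 1.5), ((4.0, 0.0, 0.0), 0.5)]
print(mapping_satisfies([(0.3, 0.4, 0.0), (4.2, 0.0, 0.0)], constraints))  # True
print(mapping_satisfies([(0.3, 0.4, 0.0), (5.0, 0.0, 0.0)], constraints))  # False
```

The second mapping fails because its second point is 1.0 angstrom from the center of a sphere with a 0.5 angstrom radius.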
Tip. You can quickly perform the above actions using the Query toolbar to add the
pharmacophore features onto the selected atoms. Refer to the Discovery Studio Help
for more information.
4. To save the pharmacophore
On the dialog, choose Catalyst Query Files (*.chm) from the Files of type
dropdown list and enter 1fvv_reference.chm in the File name textbox. Click the
Save button to save the query.
Note. The ligand structure does not constitute a part of the pharmacophore and will
not be saved in the query file. If you closed the 3D Window, open the saved file
1fvv_reference.chm and insert 1fvv_ligand into the same Graphics View to
continue with this tutorial.
Creating the fragment-based pharmacophores
In this section, you will create two subset queries based on the reference pharmacophore
that will be used later to search for matching low molecular weight fragments.
Fragment-based pharmacophores are less complex than the parent pharmacophore and
may be less specific than desired. You will add molecular shape constraints to queries in
order to increase selectivity.
The method used to enumerate de novo compounds from fragment-pharmacophore
screening hits relies on the exact 3D location and chemical specifications of a selected link
atom from the query template molecule. You will begin by creating and editing the link
atom specifications.
1. To add the query atom feature to the pharmacophore
Check that the pharmacophore and 1fvv_ligand are still displayed in the 3D
Window.
In the Hierarchy View, copy and paste the amide nitrogen N15 into the
1fvv_reference - 3D Window.
This inserts an additional nitrogen atom in the 3D Window, in an additional Molecule, at
the exact 3D location of N15. In the Hierarchy View, this new <Molecule> appears and
is highlighted to indicate that it is selected. This atom will be used to define a link point
between the two fragments.
This opens the Pharmacophore tool panels.
In the Customize Pharmacophore Features tool panel, in the Feature Attributes tools
group, click Query Atom Specification....
This opens the Edit Query Atom Attributes dialog.
On the dialog, check the Aliphatic and Exocyclic checkboxes.

In the Elements list, enter N O (separated by a space), or click the button to open
the Select Elements dialog. Click O while keeping N selected.

In the Hydrogen counts control group, click 1 and 2 to select them.


Note. The query atom specifications ensure that only molecules with atoms suitable
for enumeration are returned by the 3D fragment pharmacophore screen. The above
specifications mean that the candidate fragments must possess an aliphatic exocyclic
nitrogen with at least one free valence or a hydroxyl group at the link atom location.
The Edit Query Atom Attributes dialog is a powerful tool that controls the diversity and
chemical accessibility of the resulting fragment-based de novo libraries.
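The query atom specification above acts as a boolean filter on candidate link atoms. The following Python sketch expresses the same rule: element N or O, aliphatic, exocyclic, and one or two attached hydrogens. The `Atom` class and its fields are hypothetical stand-ins chosen for illustration, not a Discovery Studio data structure.

```python
from dataclasses import dataclass

@dataclass
class Atom:
    # Hypothetical, simplified atom record
    element: str
    aliphatic: bool
    exocyclic: bool
    hydrogen_count: int

ALLOWED_ELEMENTS = {"N", "O"}   # elements entered in the Elements list
ALLOWED_H_COUNTS = {1, 2}       # hydrogen counts selected on the dialog

def matches_link_atom_spec(atom: Atom) -> bool:
    """Return True if the atom satisfies every part of the link-atom specification."""
    return (atom.element in ALLOWED_ELEMENTS
            and atom.aliphatic
            and atom.exocyclic
            and atom.hydrogen_count in ALLOWED_H_COUNTS)

print(matches_link_atom_spec(Atom("N", True, True, 2)))    # primary amine N: True
print(matches_link_atom_spec(Atom("N", False, False, 1)))  # aromatic ring NH: False
```

Every condition must hold at once, which is why the specification narrows the fragment hits to atoms that can actually serve as link points.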
On the dialog, choose Location from the Feature dropdown list. Enter 0.5 for the
Radius.

At this stage, two main objects are present in the Hierarchy View: 1fvv_ligand and
QueryRoot.
2. To create the fragment pharmacophores, add shape constraints, and save the
queries
Click the Select button in the View toolbar.
In the Hierarchy View or the Graphics View, select all atoms and the corresponding
pharmacophore features in the heteroaromatic ring of 1fvv_ligand, including the
link atom N15 and the corresponding query feature. Any objects that are hidden
will not be selected when using the lasso select.
Verify that all hydrogens and location constraints in the fragment are selected by
looking in the Hierarchy View. If they are not, press and hold SHIFT and select them individually.

Copy the selection.

From the menu bar, choose File | New | 3D Window, and then paste the copied
selection there.
Two main objects are present in the Hierarchy View: QueryRoot and <Molecule>.
In the Hierarchy View, click the <Molecule> object to select it.

On the dialog, choose Shape from the Feature dropdown list.

This adds a shape constraint to the query.
On the dialog, choose Catalyst Query Files (*.chm) from the Files of type
dropdown list and enter 1fvv_frag1.chm in the File name textbox.

Click the Save button to save the query.

Close the active 3D Window.
Note. The ligand fragment does not constitute a part of the pharmacophore and will
not be saved in the query file.
Click the Select button in the View toolbar.

In the Hierarchy View or the Graphics View, select all atoms and the corresponding
pharmacophore features in the sulfanilamide moiety of 1fvv_ligand, including the
link atom N15 and the corresponding query feature. Verify that all hydrogens and
location constraints are selected. If they are not, hold SHIFT and select them
individually in the Hierarchy View.

Copy the selection.

From the menu bar, choose File | New | 3D Window, and then paste the copied
selection there.
Two main objects are present in the Hierarchy View: QueryRoot and <Molecule>.
In the Hierarchy View, click the <Molecule> object to select it.

On the dialog, choose Shape from the Feature dropdown list.

This adds a shape constraint to the query.
On the dialog, choose Catalyst Query Files (*.chm) from the Files of type
dropdown list and enter 1fvv_frag2.chm in the File name textbox.

Click the Save button to save the query.

Close all 3D Windows.
Note. The ligand fragment does not constitute a part of the pharmacophore and will
not be saved in the query file.
Searching a 3D compound database with each query
In this section, you will use the fragment queries 1fvv_frag1.chm and 1fvv_frag2.chm
to perform a 3D pharmacophore search of the MiniMaybridge compound structure database
in order to identify matching fragment hits. You will use these hits to enumerate a de novo
combinatorial library later.
1. To open the Search 3D Database protocol, modify the parameters, and then
run the calculations
In the Files Explorer, navigate to and double-click 1fvv_frag1.chm.

Next, navigate to and double-click 1fvv_frag2.chm.
The pharmacophore queries open in separate 3D Windows.
In the Protocols Explorer, expand the Pharmacophore folder and then double-click
the Search 3D Database protocol.
In the Parameters Explorer, click the Input Database parameter. Click the dropdown
arrow and choose MiniMaybridge from the list.

Click the Input Pharmacophore parameter, then click the button.
On the dialog, click the Query from a 3D Window radio button and choose
1fvv_frag1:QueryRoot from the dropdown list.


In the Protocols toolbar, click the Run button.
In the Parameters Explorer, click the Input Pharmacophore parameter, then click
the button.
On the dialog, click the Query from a 3D Window radio button and choose
1fvv_frag2:QueryRoot from the dropdown list.

Note. As is the case for all Discovery Studio jobs, these protocols run simultaneously.
For each protocol, status is reported in the Jobs Explorer and the Job Completed dialog
displays on completion.
The job takes about two minutes on a Pentium 4, 2 GB RAM, 2.8 GHz machine.
2. To view the results
In the Jobs Explorer, double-click the first Search 3D Database job that you ran.
In the Output Files section, click the ViewResults.pl link. Browse the fragment hits
together with the pharmacophore query 1fvv_frag1.

In the Jobs Explorer, double-click the second Search 3D Database job that you ran.
In the Output Files section, click the ViewResults.pl link. Browse the fragment hits
together with the pharmacophore query 1fvv_frag2.
Note. The ViewResults.pl links use scripts to open the compound structures retrieved
by the pharmacophore searches and then insert the corresponding pharmacophores.

The fragment hits in both hit lists are aligned to their queries, and the link atom of
each fragment matches the type and 3D location specified in the query.
Enumerating all combinations
In this section, you will use the fragment hits to enumerate a de novo combinatorial library
with the Enumerate from Fragments protocol.
1. To open the Enumerate from Fragments protocol, modify the parameters, and
run the job
Check that the two Table Browser Windows MiniMaybridge_1fvv_frag1-QueryRoot and
MiniMaybridge_1fvv_frag2-QueryRoot containing the pharmacophore search results
from the previous section are still open. If necessary, open the job results for
1fvv_frag1.chm and 1fvv_frag2.chm, then click ViewResults.pl in the reports to
re-open the results for each search.
In the Protocols Explorer, expand the Pharmacophore folder and then double-click the
Enumerate from Fragments protocol.
In the Parameters Explorer, click the Fragment A parameter. Click the button.
On the dialog, click the All Ligands from a Table Browser radio button and check
that MiniMaybridge_1fvv_frag1-QueryRoot is listed in the textbox. Click the OK
button.

In the Parameters Explorer, click the Fragment B parameter. Click the button.
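On the dialog, click the All Ligands from a Table Browser radio button and check
that MiniMaybridge_1fvv_frag2-QueryRoot is listed in the textbox. Click the OK
button.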

Note. The Enumerate from Fragments protocol is designed to enumerate de novo
combinatorial libraries from up to three fragment sets. Each member of sets A, B, and
C must have one atom (the link atom) located very close in 3D space to an atom from
another of the three fragment sets. If two link atoms are of the same type, the atoms
are merged. Otherwise, multiple variants are enumerated.
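The link-atom rule in the note can be illustrated with a small Python sketch for two fragment sets. It assumes each fragment is reduced to a name, a link-atom element, and link-atom coordinates, and it uses a hypothetical 0.5 angstrom distance tolerance for "very close". It only pairs compatible fragments and labels each pair as merged or as needing variants; it does not build product structures the way the protocol does.

```python
import math
from itertools import product

def enumerate_pairs(set_a, set_b, tolerance=0.5):
    """Pair fragments from A and B whose link atoms coincide in 3D space.

    Each fragment is a hypothetical (name, link_element, link_xyz) tuple.
    """
    results = []
    for (name_a, elem_a, xyz_a), (name_b, elem_b, xyz_b) in product(set_a, set_b):
        if math.dist(xyz_a, xyz_b) > tolerance:
            continue  # link atoms are not co-located: no combination
        # Same link-atom type: merge into one atom; otherwise enumerate variants
        mode = "merged" if elem_a == elem_b else "variants"
        results.append((name_a, name_b, mode))
    return results

set_a = [("A1", "N", (0.0, 0.0, 0.0)), ("A2", "O", (0.0, 0.0, 0.1))]
set_b = [("B1", "N", (0.1, 0.0, 0.0)), ("B2", "N", (5.0, 0.0, 0.0))]
print(enumerate_pairs(set_a, set_b))
```

Because every member of A is tested against every member of B, the library size grows as the product of the two hit-list sizes before the distance filter is applied.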
In the Parameters Explorer, click the Library Name parameter and enter 1fvv_AB for
the value.


The Job Completed dialog displays when the job is complete.

The protocol run creates a library of compounds that satisfy the criteria defined by the
protocol.
In the Output Files section, click the 1fvv_AB.sd link.
This opens the enumerated de novo combinatorial library.
Compare the de novo compound structures to 1fvv_ligand.
Verifying the de novo compounds
Finally, you will verify how the de novo library enumerated in the previous section fits the
reference pharmacophore.
1. To open the Ligand Pharmacophore Mapping protocol, modify the parameters,
and run the job
Verify that 1fvv_AB.sd is still open in the Table Browser Window. If necessary, open
the results from the Enumerate from Fragments job, then click 1fvv_AB.sd to open
the file.
From the menu bar, choose File | Open..., navigate to the previously saved file
1fvv_reference.chm, and then click the Open button.

In the Protocols Explorer, expand the Pharmacophore folder and then double-click the
Ligand Pharmacophore Mapping protocol.
button.
that 1fvv_AB is listed in the textbox.


Click the Input Pharmacophore parameter, then click the button.
On the dialog, check the Query from a 3D Window checkbox and check that
1fvv_reference:QueryRoot is listed in the textbox.

In the Parameters Explorer, click the Fitting Method parameter. Click the dropdown
arrow and choose Rigid from the list.


Alternatively, you can modify the workflow the following way:
1. Use the Enumerate from Fragments protocol to build a 3D searchable database
automatically from the resulting de novo library.
2. Use the Search 3D Database protocol to search the database with the reference
pharmacophore.
This approach may give an answer that is different from the one described above.
Compare this result to searching the MiniMaybridge database with the reference query.
Note. Building a 3D Database from the results resets the 3D coordinates of the de
novo library and generates a conformational model for each compound. Searching the
database with the pharmacophore will return the results to the original coordinate
frame.
This opens the de novo compounds mapped onto the pharmacophore query
1fvv_reference.
Right-click the FitValue column in the Table Browser Window and choose Sort....

Sort the column by FitValue in Descending order. Higher fit values indicate better
pharmacophore matches.
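If you export the mapping results and want to reproduce the descending sort outside Discovery Studio, the ordering is a single keyed sort. The sketch below assumes each row is a Python dictionary with a `FitValue` key; the compound names and values are hypothetical.

```python
def rank_by_fit(rows):
    """Return rows ordered by FitValue, highest (best pharmacophore match) first."""
    return sorted(rows, key=lambda row: row["FitValue"], reverse=True)

# Hypothetical rows standing in for the Table Browser contents
hits = [
    {"Name": "denovo_1", "FitValue": 2.1},
    {"Name": "denovo_2", "FitValue": 3.7},
    {"Name": "denovo_3", "FitValue": 0.9},
]
for row in rank_by_fit(hits):
    print(row["Name"], row["FitValue"])
```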

Tip. Insert the 1fvv_ligand.sd and the 1fvv_protein.pdb results files from the
Ligand Pharmacophore Mapping protocol run into the same Graphics View to see how
the de novo molecules relate to the ligand structure and the target protein.
Note. Advanced users can expand this workflow by using the Ligand Minimization
protocol (Receptor-Ligand Interactions protocol folder) to minimize the de novo
structures in the protein binding site, and then perform scoring using the Score Ligand
Poses protocol.

Creating custom features (pharmacophore) tutorial
Purpose: Create customized pharmacophore feature definitions using the Customize
Pharmacophore Features tool.


Required data files: none.

Time: 15 minutes.
Background
Discovery Studio provides 11 default feature definitions that are used during
pharmacophore modeling. Sometimes it is necessary to modify a default feature definition
to address a specific need. For example, you may want to allow a particular functional
group in a molecule to be recognized as a pharmacophore feature. Discovery Studio
provides an interactive environment for modifying and creating pharmacophore features.
Introduction
In this tutorial, you will learn how to customize pharmacophore features using the
Customize Pharmacophore Features tool panel by modifying one of the 11 default
pharmacophore features.
Modifying the default positive ionizable feature
Mapping the feature to the fragment
Adding custom features
Modifying the default positive ionizable feature
There are 11 default feature definitions used in Discovery Studio:
hydrophobic
hydrophobic aliphatic
hydrophobic aromatic
hydrogen bond donor
hydrogen bond acceptor
hydrogen bond acceptor lipid
negative charge
positive charge
negative ionizable
positive ionizable
ring aromatic
A detailed description of these feature definitions is available in the Discovery Studio Help.
The Customize Pharmacophore Features tool panel allows you to modify existing feature
definitions as well as to create new feature types.
In this tutorial, you will customize the positive ionizable feature to map pyridyl and
imidazolyl groups (the default positive ionizable feature does not map pyridyl or imidazolyl
groups).
1. To add a positive ionizable feature definition
From the menu bar, choose File | New | 3D Window.
This opens a new 3D Window.
In the Customize Pharmacophore Features tool panel, in the Feature Dictionary tools
group, click Add Feature From Dictionary....
This opens the Add Feature From Dictionary dialog.
On the dialog, click POS_IONIZABLE to select it.

2. To add the new fragments to be included in the definition
In the Tools Explorer, select Visualization from the dropdown list.
This opens the Visualization tool panels.
In the Fragment Builder tool panel, in the listbox, expand Aromatics and click
2-Imidazolyl to select it.

Click Add Fragment.
This adds the fragment to the 3D Window.
Click 2-Pyridyl to select it.

Click Add Fragment.
This adds the fragment to the same 3D Window.
Press CTRL+A.
This selects all the objects in the 3D Window.
In the Customize Pharmacophore Features tool panel, click Map ANY Selected.
This adds the two new substructures to the definition.
Mapping the feature to the fragment
1. To create centroids for the fragments
Select the 2-Imidazolyl fragment by double-clicking any atom in the molecule.
On the dialog, choose Centroid from the dropdown list.

This creates the required association for the point at which the feature mapping will
take place.
Repeat the last two steps for the 2-Pyridyl fragment.
The two centroids are displayed in the Graphics View.
2. To create the mapping
In the Graphics View, hold SHIFT and select both centroids and the red positively
ionizable feature object.
In the Customize Pharmacophore Features tool panel, click Create Mapping.
The modified positive ionizable feature is complete and you can save it as a new .chm
file or add it to the dictionary using Add Feature To Dictionary... in the Customize
Pharmacophore Features tool panel. Customized pharmacophore features that are
added to the dictionary can be used in many of the pharmacophore protocols.
Adding custom features
1. To add the feature to the dictionary
In the Customize Pharmacophore Features tool panel, click the Add Feature To
Dictionary command.
This opens the Add Feature To Dictionary dialog.
On the dialog, in the Function name textbox, enter POS_IONIZABLE_MOD.

This adds the feature to the dictionary and makes it available for use with many of the
pharmacophore protocols.
2. To use the customized feature in other protocols
In the Protocols Explorer, expand the Pharmacophore folder and then double-click the
Common Feature Pharmacophore Generation protocol.
This displays the parameters in the Parameters Explorer.
In the Parameters Explorer, click the Features parameter, then click the button.
This opens the Select Features dialog. The dialog contains the added feature
POS_IONIZABLE_MOD, which is now available as one of the possible features for
building pharmacophore models.

Homology modeling of an extracellular amylase
protein tutorial
Purpose: Learn about the tools that are available for structural biologists in Discovery
Studio and how these tools can be used in a typical homology modeling project.

Required functionality and modules: Discovery Studio Visualizer client, BLAST Search
(DS Server) protocol (from DS Sequence Analysis), Build Homology Models protocol (from
DS MODELER), Align Multiple Sequences protocol (from DS Protein Families), and Verify
Protein (Profiles-3D) protocol (from DS Protein Health).

Required data files: P41131.fasta.

Time: 1 hour 30 minutes.
Introduction
In this tutorial, you will follow a typical workflow for building a protein structure using the
homology modeling method. It will introduce you to some of the protein modeling tools and
protocols in Discovery Studio.
Preparing to run protocols
Identifying a homologous structure using the BLAST Search (DS Server) protocol
Analyzing the BLAST results
Locating and opening the selected structure
Aligning the sequences using the Align Multiple Sequences protocol
Manually adjusting the alignment
Building a 3D model with MODELER
Evaluating the model
Assessing the validity of the 3D structure using the Verify Protein (Profiles-3D)
protocol
Preparing to run protocols
Navigate to and double-click the P41131.fasta data file.
This opens the file in a new Sequence Window.
Identifying a homologous structure using the BLAST Search
(DS Server) protocol
In the Protocols Explorer, expand the Sequence Analysis folder and double-click the
BLAST Search (DS Server) protocol.
In the Parameters Explorer, click the Input Sequence parameter and choose
P41131:P41131 from the dropdown list.
The Input Sequence name combines the name of the Sequence Window, P41131,
and the name of the sequence in that window, P41131.
Click the Input Database parameter and choose PDB_nr95 from the dropdown list.
This specifies that the PDB_nr95 database, non-redundant at 95% sequence identity, is to
be used for the BLAST search.
Note. The PDB_nr95 sequence database is installed with the DS Server installation. To use
other sequence databases in BLAST searches, you will need to install them before running
the protocol.

Note. The results you achieve from running the protocols in this tutorial may change if you
alter the default parameters or use a different or updated PDB_nr95 database.
In the Html Window, in the Output Files section, click the P41131.xml link.
This opens the sequences found by the BLAST search.
Note. By default, the BLAST Search files are saved in My Documents\BLASTSearch (DS
Server)_<timestamp><ID>.
Analyzing the BLAST results
In the P41131 - Blast Window, click the Table View tab at the bottom of the window.
The Table View displays the hits with one line per sequence. In Discovery Studio, gray cells
cannot be edited.
Note. The hits are initially ranked according to the E-value (the number of alignments
with an equal or better score expected to occur by chance alone), with the best, lowest value first.
In the P41131 - Blast Window, click the Map View tab.
The Map View displays the coverage of the hits in a map, with one line per sequence. The
bars are colored according to the bit score of the hits (with above 400, red, being the best
hits). The query sequence, P41131 in this case, is displayed at the top of the Map View
represented as a line 443 residues long.
Hover the cursor over a hit on the Map View to display the following:
description of the database sequence
sequence accession number
start and end position of the query sequence
start and end of the database sequence of the hit
sequence length of the hit
scores for the hit
When a database sequence matches the query sequence in multiple regions (multiple hits),
the matches are displayed as different bars in the same line (one line per sequence).
Selecting any hit in the Map View selects the corresponding row in the Table View.
Zoom in on the Map View.
This allows you to see the residue display on the ruler at the top of the window. You may
have to zoom in several times to see the residue display.
Note. The order of the hits in the Map View is the same as in Table View, with initially the
best hits colored orange and appearing at the top of the view.
Locating and opening the selected structure
In this tutorial, you will create a 3D model of the protein sequence P41131. To achieve
this, a suitable homolog or template must be found. An ideal homolog covers the entire
sequence length, has a fairly high sequence identity, and has a good E-value (< 1 x 10^-5).
This reduces the number of qualifying candidates to roughly half of the hits obtained from
the BLAST Search (DS Server) run.
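The template criteria above can be expressed as a simple filter over the BLAST hits. In this Python sketch, the E-value cutoff (< 1 x 10^-5) and the query length (443 residues, from the Map View) come from this tutorial; the coverage and identity thresholds are hypothetical stand-ins for "covers the entire sequence length" and "fairly high sequence identity", and the `BlastHit` record and hit names are illustrative only.

```python
from dataclasses import dataclass

QUERY_LENGTH = 443  # length of P41131 as displayed in the Map View

@dataclass
class BlastHit:
    name: str
    query_start: int  # first query residue covered by the hit (1-based)
    query_end: int    # last query residue covered by the hit (inclusive)
    identity: float   # fraction of identical residues in the aligned region
    e_value: float

def is_candidate_template(hit, max_e_value=1e-5, min_coverage=0.9, min_identity=0.3):
    """Apply E-value, coverage, and identity cutoffs (coverage/identity are hypothetical)."""
    coverage = (hit.query_end - hit.query_start + 1) / QUERY_LENGTH
    return (hit.e_value < max_e_value
            and coverage >= min_coverage
            and hit.identity >= min_identity)

hits = [
    BlastHit("hit_a", 1, 443, 0.40, 1e-80),   # full coverage, good E-value
    BlastHit("hit_b", 200, 300, 0.50, 1e-20), # covers only part of the query
    BlastHit("hit_c", 1, 440, 0.35, 0.01),    # E-value too high
]
print([h.name for h in hits if is_candidate_template(h)])
```

Resolution and R-free, which the tutorial also uses to choose 1HX0, are properties of the PDB entry rather than of the BLAST hit, so they would need a separate lookup.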
Note. When an actual crystal structure of this protein sequence has been solved, the
results described in this tutorial will change. Watch for sequence identity scores greater
than 95%.
Hover the cursor over the top few hits in the Map View. Note the general description of the
top hits.
It is clear from most of the entries that have low E-values that the protein is similar to
α-amylase, so this function can be inferred for P41131.
Next, use 1HX0 as a template because it represents the best combination of sequence
coverage, resolution, and R-free value among the top four hits, which all have good E-
values.
Also note that the PDB_nr95 database is a non-redundant database. This means that each
hit can represent a group of structures with similar sequences. Beyond the scope of this
tutorial, it is often useful to consider related structures that are not included in the
database as potential templates. You can do this by checking proteins belonging to the
same SCOP or CATH classification as a hit. Such proteins can have similar structure as the
original BLAST hits, but might be better templates for the specific application in terms of
other criteria (e.g., ligands bound to the protein, alignment coverage, etc.)
Make P41131 - Blast Window active.

In the Table View, click the hit 1HX0_A to select it, and then right-click and choose Load
Selected Structures.
This opens the complete structure of 1HX0 chain A and any ligands and waters in a new 3D
Window labeled 1HX0A - 3D Window.
Make P41131 - Sequence Window active.

In the Sequence Window, right-click and choose Insert Sequence | From Windows....
This opens the Insert Sequences from Windows dialog.
Click 1HX0A in the listbox to select it.

This adds the 1HX0A sequence to the Sequence Window.
Close the P41131 - Blast Window and the BLAST Search (DS Server) - Html
Window.
Aligning the sequences using the Align Multiple Sequences
protocol
This section demonstrates how the model sequence can be aligned with the template
sequence using the Align Multiple Sequences protocol. This step is necessary before
building a model from the P41131 sequence based on the 1HX0A template.
Click the P41131 - Sequence Window to make it active.
The Sequence Identity, ~5%, and the Sequence Similarity, ~19%, are shown in the Status
Bar (in the lower right corner of the Discovery Studio window). Both values will increase after the
alignment step.
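One common way to compute sequence identity over an alignment is to count identical residues among the columns where neither sequence has a gap. The Python sketch below uses that definition as an assumption: Discovery Studio may normalize differently (for example, by the length of the shorter sequence), so its Status Bar values are not necessarily reproduced. The example sequences are toy data.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identical residues over alignment columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)

# Toy alignment: 5 gap-free columns, 3 of them identical
print(percent_identity("ACDKEF", "ACE-EW"))  # 60.0
```

Inserting gaps changes which residues sit in the same column, which is why the identity score rises once the sequences are properly aligned.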
The residue background is colored according to the residue similarity. Note the distribution
of colors after the alignment step. You will see that more residues have a background color
of dark turquoise (indicating identical residues).
Note. If the coloring does not automatically appear, right-click in the Sequence Window
and choose Display Style.... Click the Background tab, click the Color by alignment radio
button, and choose Sequence similarity from the dropdown list. You can change the
coloring scheme by clicking the colors in the Color column, and then clicking the OK button.
Tip. Monitor the sequence identity/similarity scores and color pattern of the residues in the
Sequence Window. If you do not wish to run the Align Multiple Sequences protocol, you
could align the sequences manually by inserting gaps. To manually insert gaps, click in the
Sequence Window at the place you wish to insert a gap and then press the SPACE bar.

In the Protocols Explorer, in the Sequence Analysis folder, double-click the Align Multiple
Sequences protocol.
In the Parameters Explorer, click the Alignment Type parameter and choose Align
Sequences from the dropdown list.

Click the Input Sequence Set parameter and choose P41131 from the dropdown list.
This takes approximately two minutes on a Pentium 4, 2 GB RAM, 2.8 GHz machine.
In the Html Window, in the Output Files section, click the P41131.bsml link.
This opens a new Sequence Window labeled P41131 - Sequence Window(1), which
shows the aligned sequences.
The Sequence Identity (~39%) and Sequence Similarity (~51%) scores are now higher.
Additionally, there are now more residues colored dark turquoise, indicating strong
similarity in those regions.
Right-click in the Sequence Window and choose Secondary Structure Cartoon.
This allows you to view the Kabsch-Sander secondary structure of 1HX0A under the 1HX0A
sequence in the Sequence Window. Red bars indicate alpha helices and blue arrows
indicate beta strands.
Observe that most of the gaps are inserted in the loop region of 1HX0A.
Manually adjusting the alignment
The alignment of the query sequence to the template sequence is based on sequence
information alone and does not incorporate information about the spatial
arrangement of the residues in the template. For this reason, there can be regions where
the alignment is obviously incorrect in terms of the connectivity of the backbone. This often
happens in regions where a number of residues in the template sequence do not have any
corresponding residues in the query sequence (i.e., template residues are aligned to gaps).
Though the MODELER program will correct the backbone connectivity issues during the
model building process, it is sometimes beneficial to manually adjust the alignment to help
guide the model building process.
For the alignment of P41131 to 1HX0A, there is a large gap in the query sequence aligned
to the region from residues A:ASP212 to A:PRO228 of 1HX0A, which is unrealistic from a
geometric point of view. The first (A:LYS213) and last residue (A:ALA224) still aligned to
residues in the query sequence are separated by a distance much larger than the typical
distance between consecutive residues.
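A quick way to judge whether a gap is geometrically plausible is an upper-bound check: consecutive residues have Cα atoms roughly 3.8 Å apart, so a chain with n residues between two anchors cannot span more than about (n + 1) x 3.8 Å. The Python sketch below applies that crude bound, assuming a fully extended chain. Here, A:LYS213 and A:ALA224 of 1HX0A are aligned to consecutive query residues (n = 0), so their Cα atoms would need to be within about 3.8 Å. The coordinates in the example are hypothetical.

```python
import math

CA_CA_STEP = 3.8  # approximate Calpha-Calpha distance (angstroms) between consecutive residues

def can_bridge(ca_start, ca_end, n_intervening):
    """Return True if n_intervening residues could, at full extension, connect two Calpha atoms."""
    max_span = (n_intervening + 1) * CA_CA_STEP
    return math.dist(ca_start, ca_end) <= max_span

# Hypothetical template Calpha coordinates 15 angstroms apart
print(can_bridge((0.0, 0.0, 0.0), (15.0, 0.0, 0.0), 0))  # False: consecutive residues cannot span 15 angstroms
print(can_bridge((0.0, 0.0, 0.0), (15.0, 0.0, 0.0), 3))  # True: 4 x 3.8 = 15.2 angstroms
```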
Tip. You can see the residue names by hovering the cursor over the residue in the
Sequence Window.
1HX0A   KAVLDKLHNLNTNWFPAGSRPFIFQ
P41131  KAILDK----------AGS-PRAYL

To see the distance, hold SHIFT, left-click and drag over the residues of the template
1HX0A that are aligned to a gap in the P41131 - Sequence Window (1).
This will select the residues, which are highlighted in yellow in the 1HX0A - 3D Window.
Given this separation, even though the three residues ALA215-SER217 of the P41131
sequence are aligned with perfect sequence identity, the alignment in this region is unlikely
to be correct from a geometric point of view.
The gap position can be improved by manually moving the three residues ALA215-SER217
of the query sequence to the beginning of the gap.
In the P41131 - Sequence Window (1), left-click residue ALA215 of the P41131 sequence,
and drag the cursor to residue SER217 without releasing the mouse button.
This selects the three residues ALA215-SER217.
Move ALA215 of the P41131 sequence into alignment with A:LEU214 of 1HX0A. Use cut
and paste to move the selected residues.
The alignment becomes:
1HX0A   KAVLDKLHNLNTNWFPAGSRPFIFQ
P41131  KAILDKAGS-----------PRAYL

Note. You can also remove gaps by pressing DELETE when the gaps are selected or insert
gaps by pressing the SPACE bar when the cursor is in the appropriate position in the
sequence. Also, to reverse a command, choose Edit | Undo from the menu bar.
Close the Align Multiple Sequences - Html Window.
Building a 3D model
In this section, you will build a 3D model of your original protein sequence using the
alignment you just created and the 1HX0A structure as the template.
In the Protocols Explorer, expand the Protein Modeling folder and double-click the Build
Homology Models protocol.
In the Parameters Explorer, click the Input Sequence Alignment parameter and choose
P41131(1) from the dropdown list.

Expand the Input Sequence Alignment parameter. Click the Input Model Sequence
parameter and choose P41131 from the dropdown list.

Click the Input Template Structures parameter and check the checkbox for 1HX0A.

Click the Optimization Level parameter and choose Low from the dropdown list.
Changing the Optimization Level parameter from the default setting allows the protocol to
run faster.
The job takes about three minutes on a Pentium 4, 2 GB RAM, 2.8 GHz machine.
Evaluating the model
In this section, you will compare the model structure with the template and evaluate the
model score generated by the MODELER program.
Close the P41131 - Sequence Window and the P41131 - Sequence Window(1).

Make 1HX0A - 3D Window active.

From the Files Explorer, open the Output folder for the Build Homology Models run.

Drag and drop the P41131.B99990001.msv file into the 1HX0A - 3D Window.
This adds the model structure to the 1HX0A - 3D Window.
Make Build Homology Models - Html Window active.

In the Output Files section, click the P41131.bsml link.
This opens the alignment file of the model and the template sequences. Three sequences
are displayed as aligned in the Sequence Window:
P41131 (query sequence)
P41131.B99990001 (model/query sequence)
1HX0A (template)
Note. The sequences, 1HX0A and P41131.B99990001, are automatically linked to the
corresponding structures in the 3D Window.
Make 1HX0A - 3D Window active.

In the Hierarchy View, click 1HX0A to select it.

From the menu bar choose Structure | Superimpose | By Sequence Alignment....
This opens the Superimpose by Sequence Alignment dialog.
On the dialog, in the Sequence alignment listbox, select P41131 for the value.

In the Molecules to superimpose listbox, click P41131.B99990001 to select it.

The two structures are superimposed, and a report is generated and opened in an Html
Window. The RMSD between the alpha-carbon atoms of the two proteins is ~2.0 Å over
424 aligned residues. The transformation matrix applied to P41131.B99990001 is also
shown.
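The superposition step can be illustrated with the Kabsch algorithm, a standard least-squares method for finding the rotation and translation that minimize the RMSD between paired atoms. This is a minimal sketch, assuming you already have paired C-alpha coordinates (one row per aligned residue pair) as NumPy arrays; the function name kabsch_superimpose is hypothetical, and Discovery Studio's own implementation may differ.

```python
import numpy as np


def kabsch_superimpose(mobile, reference):
    """Least-squares fit of `mobile` onto `reference` (both (N, 3) arrays).

    Row i of each array must hold the C-alpha atom of the same aligned residue pair.
    Returns (rmsd, rotation, translation) such that mobile @ rotation.T + translation
    is the fitted copy of `mobile`.
    """
    mobile = np.asarray(mobile, dtype=float)
    reference = np.asarray(reference, dtype=float)

    # Center both coordinate sets on their centroids.
    mobile_centroid = mobile.mean(axis=0)
    reference_centroid = reference.mean(axis=0)
    p = mobile - mobile_centroid
    q = reference - reference_centroid

    # SVD of the 3x3 covariance matrix gives the optimal rotation.
    u, _, vt = np.linalg.svd(p.T @ q)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    translation = reference_centroid - rotation @ mobile_centroid

    fitted = mobile @ rotation.T + translation
    rmsd = float(np.sqrt(np.mean(np.sum((fitted - reference) ** 2, axis=1))))
    return rmsd, rotation, translation
```

In this sketch the rotation and translation together play the role of the transformation matrix shown in the report; the report may present it in a different layout.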
Close the Superimpose By Sequence Alignment - Html Window.

Make 1HX0A - 3D Window active.

With nothing selected in the 3D Window, right-click in the Graphics View and choose
Display Style....
On the dialog, click the Atom tab. In the Display style control group, click the None radio
button.

Click the Protein tab. In the Display style control group, click the Solid ribbon radio
button. In the Coloring control group, click the Color by radio button and choose Molecule.

This displays the two superimposed proteins.
From the menu bar, choose View | Tile Molecules in View.
The two structures are shown side-by-side in the Graphics View. You can rotate and move
them in sync with the normal rotate and move mouse actions or by using the commands
on the View toolbar.
From the menu bar, choose View | Data Table.
This opens the Data Table View and allows you to further evaluate the model.
In the Data Table View, click the AminoAcid tab. Scroll to the far right and locate the
column labeled PDF Total.

Click the PDF Total column header to select it and from the menu bar choose Chart |
Line Plot.
This opens two line plots in two individual windows.

The first plot, opened in the P41131.B99990001 - Line Plot Window, contains data for the
homology model of P41131. This plot displays the per-residue total PDF energy against the
residue index for the model structure. You should see a large peak in the PDF Total score
at around residue 200.

The second plot, opened in the 1HX0A - Line Plot Window, is a flat line plot, as there is no
PDF Total data for the template 1HX0A.
Close the 1HX0A - Line Plot Window.

Click the P41131.B99990001 - Line Plot Window tab to make it active. Next, click and
drag the cursor around the tall peak, at about residue 200, to select it.
The selection consists of the residues with the highest scores. The selected residues are
highlighted in the Graphics View (click the 1HX0A - 3D Window tab to view) and in the
Sequence Window (click the P41131 - Sequence Window tab to view). The high PDF energy
suggests that these residues are poorly aligned.
When you compare the template structure to the model in this region, notice that in the
template this region is a β-strand aligned to the C-terminal β-strand of the chain. Because
the C-terminal region is shorter in the query sequence, this β-strand is missing, and the
alignment is likely to be incorrect. This indicates that 1HX0A is not a good template for this
region. Another template could be used for this part of the structure if available, but this
action is not required for this tutorial.
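Selecting the tall peak in the Line Plot is, in effect, a visual thresholding step. The same idea can be sketched in code, assuming the PDF Total values have been exported as a mapping from residue index to value. The helper name and the mean-plus-two-standard-deviations cutoff are illustrative choices, not a Discovery Studio rule.

```python
from statistics import mean, pstdev


def high_energy_residues(pdf_total, n_sigma=2.0):
    """Return residue indices whose PDF Total exceeds mean + n_sigma * SD.

    pdf_total: dict mapping residue index -> per-residue PDF Total value
    (for example, values copied from the AminoAcid tab of the Data Table).
    """
    values = list(pdf_total.values())
    threshold = mean(values) + n_sigma * pstdev(values)
    # Residues well above the typical value are candidates for a poor alignment.
    return sorted(idx for idx, value in pdf_total.items() if value > threshold)
```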
Close the P41131.B99990001 - Line Plot Window, and the P41131 - Sequence
Window.
Assessing the validity of the 3D structure using the Verify
Protein (Profiles-3D) protocol
The model structure can be further evaluated using the Verify Protein (Profiles-3D)
protocol, which assesses the compatibility of the 3D structure of a protein model with the
sequence of residues it contains.
Make 1HX0A - 3D Window active.

In the Hierarchy View, select the 1HX0A molecule.

Right-click and choose Hide. 1HX0A is now hidden.
Observe the P41131.B99990001 molecule. Its ribbon trace is colored and varies in
thickness in accordance with PDF Total values. You can examine the structure and
immediately identify the less favorable areas (colored red) relative to the remainder of the
model.
In the Protocols Explorer, in the Protein Modeling folder, double-click the Verify Protein
(Profiles-3D) protocol.
In the Parameters Explorer, click the Input Protein Molecules parameter and check the
1HX0A:P41131.B99990001 checkbox. Clear the checkboxes for any other options.

In the Jobs Explorer, double-click the Verify Protein (Profiles-3D) job.
Close the 1HX0A - 3D Window.

In the Html Window, in the Output Files section, click the P41131.B99990001.msv link.
This opens the model structure in a window labeled P41131.B99990001 - 3D Window.
The structure is displayed as a solid ribbon with variable width and color based on the
Verify score: the higher the score, the better the structure. The color varies from blue to
white to red, with blue corresponding to high scores, white to average scores, and red to
low scores. The width of the ribbon is inversely correlated with the Verify score: the worse
the structure, the wider the ribbon. Be aware that the thickness and color of the ribbon are
relative to this particular model. Look for significant variations from the
remainder of the ribbon.
In the Data Table View, click the Molecule tab and scroll to the end of the table to the
Verify Expected High Score, Verify Expected Low Score, and Verify Score columns.
If the Verify Score result of the model protein is higher than the Verify Expected Low Score
value, the model is of acceptable quality. The closer the Verify Score result is to the Verify
Expected High Score value, the better the quality of the model.
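The acceptance rule above can be written as a small function. This is a hypothetical helper (assess_verify_score is not part of Discovery Studio); the fraction it returns is only an illustrative way to express how close the Verify Score is to the Verify Expected High Score and is not a value reported by the protocol.

```python
def assess_verify_score(score, expected_low, expected_high):
    """Apply the Verify acceptance rule to one model.

    Returns (acceptable, fraction): acceptable is True when the score exceeds the
    Verify Expected Low Score; fraction places the score on the low-to-high scale
    (0.0 at the expected low value, 1.0 at the expected high value).
    """
    if expected_high <= expected_low:
        raise ValueError("expected_high must be greater than expected_low")
    acceptable = score > expected_low
    fraction = (score - expected_low) / (expected_high - expected_low)
    return acceptable, fraction
```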
Click the AminoAcid tab and scroll to the end of the table to the Verify Score column.

Click the Verify Score column header to select it.

From the menu bar, choose Chart | Line Plot.
This opens a plot of the Verify score for each residue in the sequence. Note that the
low-scoring region is at the C-terminal end of the protein.
Make Build Homology Models - Html Window active.

In the Html Window, in the Output Files section, click the P41131.bsml link.
Make P41131.B99990001 - Line Plot Window active.

Next, in the Line Plot, click and drag around the low-scoring region to select the
residues with the lowest scores.
The selected residues are highlighted in the Graphics View (click the P41131.B99990001 -
3D Window tab to view) and the Sequence Window (click the P41131 - Sequence Window
tab to view), allowing you to see the low scoring region on the alignment.
The selected region with the low Verify Score values is in the vicinity of the residues which
had high PDF energies. This indicates that the alignment in this region may need further
adjustments and that the model may need to be rebuilt.

Looper with antibodies tutorial
Purpose: Create high-quality loop conformations using Looper, a CHARMm-based
algorithm for loop refinement.

Required functionality and modules: Discovery Studio Visualizer client, CHARMm Lite
(or full CHARMm), Protein Refine, and Analysis.

Required data files: 2aab_H3_missing.pdb and 2aab_cluster.msv.

Background
Loop Refinement is a technique used to provide sensible loop conformations in cases where
the loop coordinates are missing due to lack of experimental data (e.g., unresolved
electron-density), or for areas which have been built using template-based homology
modeling and require loop reconstruction to remove any bias introduced by the template,
or simply to mimic protein flexibility by providing a set of possible loop orientations.
Introduction
In this tutorial, you will build and refine the H3 loop of the antigen mimicking melanoma
antibody fragment MK2-23 (PDB ID: 2aab). For purposes of this tutorial, the H3 loop has
been deleted from the crystal structure.
Construction of the missing loop and refinement using the Loop Refinement protocol
Cluster analysis of the loop conformations
RMSD validation
Construction of the missing loop and refinement using the
Loop Refinement protocol
1. To open the protein in a 3D Window
In the Files Explorer, navigate to and double-click the 2aab_H3_missing.pdb data
file.
This opens the protein in a 3D Window.
In the original crystal structure, the H3 loop for this antibody fragment corresponds to
residues TYR97, VAL98, GLY99, TYR100, HIS100A, VAL100B, and ARG100C.
2. To locate the insertion point for the H3-loop
In the Hierarchy View, expand H chain and click the ASN96 residue to select it.
The selected residue is highlighted in the 3D Window.
In the View toolbar, click the Fit to Screen button.
This zooms the view to the site where you will insert the loop.
3. To insert the H3-loop
In the Tools Explorer, select Protein Modeling from the dropdown list.
This opens the Protein Modeling tool panels.
In the Build and Edit Protein tool panel, click Choose Build Action and choose Insert
After from the dropdown list.

In the Choose Amino Acid tools group, click Specify....
This opens the Enter an Amino Acid Sequence dialog.
Enter the sequence YVGYHVR in the textbox.

The new loop is inserted and is joined to TRP100D by a covalent peptide bond. In the
Hierarchy View, each newly inserted residue carries the number of the residue after which
it was inserted, followed by an insertion code. In this example, the following amino acids are inserted:
o TYR96A
o VAL96B
o GLY96C
o TYR96D
o HIS96E
o VAL96F
o ARG96G
Tip. You can adjust the numbering using the Renumber Sequence tool in the Protein
Reports and Utilities tool panel, but for this tutorial, leave the numbering as is.
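The labels in the list above follow a simple pattern: the number of the residue after which the loop was inserted (96) followed by consecutive single-letter insertion codes (A, B, C, ...). The sketch below reproduces that pattern with a hypothetical helper, insertion_labels; it is not how Discovery Studio assigns the labels internally. Because PDB insertion codes are a single character, the sketch refuses more than 26 inserted residues.

```python
import string


def insertion_labels(anchor_number, residue_names):
    """Label residues inserted after residue `anchor_number` with codes A, B, C, ..."""
    if len(residue_names) > len(string.ascii_uppercase):
        raise ValueError("more inserted residues than single-letter insertion codes")
    return [f"{name}{anchor_number}{code}"
            for name, code in zip(residue_names, string.ascii_uppercase)]
```

For example, insertion_labels(96, ["TYR", "VAL", "GLY", "TYR", "HIS", "VAL", "ARG"]) yields TYR96A through ARG96G, matching the residues inserted for the sequence YVGYHVR.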
4. To set up the refinement of the H3-loop
In the Tools Explorer, select Simulation from the dropdown list.
This opens the Simulation tool panels.
In the Forcefield tool panel, choose the CHARMm Polar H forcefield from the
dropdown list. Click Apply Forcefield.
This sets the proper atom types in preparation for running the Loop Refinement
protocol.
5. To open the Loop Refinement protocol and launch the job
In the Hierarchy View, select the newly built H3-loop. For this example, click
TYR96A, hold SHIFT, and click TRP100D.
This selects the two residues and all residues listed between them.
Note. One extra residue (TRP100D) is selected in order to ensure that the peptide
bond between ARG96G and TRP100D adopts a reasonable conformation after
refinement.
In the Protocols Explorer, expand the Protein Modeling folder and double-click the
Loop Refinement protocol.
In the Parameters Explorer, click the Loop parameter and choose the loop from the
dropdown list.
At this point, the molecular system has been typed and the loop is selected so all the
required parameter prerequisites have been met.
The job takes about 45 minutes on a Pentium 4, 2 GB RAM, 2.8 GHz machine.
In the Html Window, in the Output Files section, click the ViewResults.pl link.
This runs ViewResults.pl. This Perl script automatically opens all the loop
conformations in a Table Browser and displays the starting structure. This view is
designed to make it easy to browse through the different loop conformations.
In the Table Browser Window, in the Table View, click the top-left cell and press
CTRL+H to open the Hierarchy View.

Expand <Cell>.
In the Hierarchy View, you will see two systems displayed: 2aab_H3_missing and
2aab_H3_missing.input.
In the Hierarchy View, expand 2aab_H3_missing.

In the Table View, click the Name column header and press the DOWN ARROW KEY
to browse through all the different loop conformations. Browse through the entire list.
As you browse, new elements are added to the Hierarchy View. In the
2aab_H3_missing.input system, the following elements are added:
2aab_H3_missing.L_0001, 2aab_H3_missing.L_0002, 2aab_H3_missing.L_0003, and
so on. In the 2aab_H3_missing system, the following elements are added: Index-1,
Index-2, Index-3, and so on. These indices will be used later in the tutorial.
In the Hierarchy View, click the checkbox next to 2aab_H3_missing.L_0001.

In the View toolbar, click the Fit to Screen button to have a better view of the
loops.
7. To export the loops
Loops generated by the Loop Refinement protocol are displayed as .sd files for easier
viewing and browsing. However, if molecular coordinates are needed, the full system
should be saved as an .msv, .mol2, or .pdb file.
Click the first loop of the form 2aab_H3_missing.input, hold SHIFT, and click the
last loop of the form 2aab_H3_missing.L_xxxx to select all the loops in between.

Press DELETE.

Check the checkbox next to the 2aab_H3_missing system.
By using the checkbox to the left of the index, you can make a loop conformation
active (indicated by [Active] displayed after the index name). Index-1 represents the
starting conformation prior to loop refinement. Index-2 corresponds to loop
conformation 2aab_H3_missing.L_0001, Index-3 corresponds to loop conformation
2aab_H3_missing.L_0002, and so on.
Click the checkbox next to Index-2.

From the menu bar choose File | Save As....
This opens the Save As dialog, allowing you to save the active molecule containing the
newly generated loop conformation.
On the dialog, enter a name in the File name textbox and for Files of type, choose
Protein Data Bank Files from the dropdown list.
In the Data Table, there are two columns labeled Loop.BackboneRMSD and
Loop.HeavyAtomRMSD. You can use these columns as diagnostic tools for comparing
the initial loop conformation with the newly refined loops. In this tutorial, the H3-loop
was generated de novo using the Biopolymer tools. Therefore, the values in these
columns are not relevant. If this tutorial had started directly from the H3-loop provided
by the crystal structure, then these columns would provide a sense of how well the
Loop Refinement protocol performed.
Although you could use this system for the next section of the tutorial, a new data file
(2aab_cluster.msv) is provided to keep file names consistent.
From the menu bar, choose Window | Close All.
Cluster analysis of the loop conformations
1. To perform the clustering on the backbone atoms of the H3-loop
The protocol for analysis of molecular dynamics trajectories can also be used to create
a dendrogram of the loop conformations and to identify which loops may be of interest.
In the Files Explorer, navigate to and double-click the 2aab_cluster.msv data file.
This opens the file in a new 3D Window.
In the Hierarchy View, expand H chain.

Click TYR96A, hold SHIFT, and click TRP100D.
This selects residues TYR96A through TRP100D.
From the menu bar, choose Edit | Select....
This opens the Select dialog.
On the dialog, check the Group checkbox and choose Sidechain from the dropdown list.

Click the Deselect button, and then the Close button.
This deselects the side-chain atoms, but leaves the backbone atoms selected.
2. To set up and run the Analyze Trajectory protocol for cluster analysis
In the Protocols Explorer, expand the Simulation folder and double-click the Analyze
Trajectory protocol.
The Input Molecule parameter is automatically assigned to the system called
2aab_cluster:2aab.in. All required parameters are colored red.
In the Parameters Explorer, set the RMSD Reference parameter to All frames.

Expand the RMSD Reference parameter.

In the Atom Group parameter, select Create New Group from selection....
This command will automatically create a new group called Group for the selected
backbone atoms. This group will be used for the clustering analysis.
The job takes less than one minute on a Pentium 4, 2 GB RAM, 2.8 GHz machine.
3. To analyze clusters with an interactive dendrogram
In the Html Window, in the Output Files section, click the RMSD to conf.pl script link.
This opens a Dendrogram Window that displays all the different clusters based on the
pairwise RMSD between the loops. You may need to resize the view using the Zoom
tool on the View toolbar.
The loop conformations are labeled by Index number, so that Index-1
corresponds to the initial pre-refined loop conformation and Index-2 corresponds to
loop 1.
Look for the cluster containing Index-2. At an RMSD Cutoff of about 1.0, you will find a
six-member cluster containing the Index-2 loop conformation. The other elements in
this cluster are loops with Indices 13, 22, 3, 4, and 5. You will use this set of structures
to perform the last step of this tutorial.
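Reading a cluster off the dendrogram at a given RMSD cutoff amounts to cutting the tree at that height. The sketch below assumes single linkage (which may not be the linkage used by the Analyze Trajectory protocol) and a precomputed pairwise RMSD matrix; the function name and input layout are hypothetical. Under single linkage, two conformations share a cluster whenever a chain of pairs with RMSD at or below the cutoff connects them, which the union-find structure below tracks directly.

```python
def single_linkage_clusters(names, rmsd, cutoff):
    """Cut a single-linkage dendrogram at `cutoff`.

    names: conformation labels, e.g. ["Index-1", "Index-2", ...]
    rmsd: square matrix (list of lists) of pairwise RMSD values, same order as names
    """
    parent = list(range(len(names)))

    def find(i):
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge every pair whose RMSD is within the cutoff.
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if rmsd[i][j] <= cutoff:
                parent[find(i)] = find(j)

    clusters = {}
    for i, name in enumerate(names):
        clusters.setdefault(find(i), []).append(name)
    return list(clusters.values())
```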
4. To save the selected loop conformations for further analysis
In the Files Explorer, locate the Input folder of the Analyze Trajectory run and
double-click the 2aab.in.msv file.
This opens the file in a new 3D Window. This file contains all the loop conformations as
indices.
In the Hierarchy View, expand 2aab.in.

Click Index-1 to select it.

From the menu bar, choose File | Save As....
On the dialog, enter 2aab_h3-loop_00.pdb (the initial loop) in the File name
textbox and for Files of type, choose Protein Data Bank Files from the dropdown
list.

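Repeat the same steps for the other loop conformations belonging to the selected cluster,
as described in the following table:
Select      Save As...
Index-2     2aab_h3-loop_01.pdb
Index-3     2aab_h3-loop_02.pdb
Index-4     2aab_h3-loop_03.pdb
Index-5     2aab_h3-loop_04.pdb
Index-13    2aab_h3-loop_12.pdb
Index-22    2aab_h3-loop_21.pdb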
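Because Index-1 is the starting conformation and Index-N holds refined loop N-1, each file name follows directly from the index. A small illustrative helper (hypothetical, not part of Discovery Studio) that reproduces the names used in this tutorial:

```python
def loop_filename(index_label, prefix="2aab_h3-loop"):
    """Map a Hierarchy View label such as "Index-13" to this tutorial's file name.

    Index-1 is the pre-refinement conformation and Index-N is loop N-1,
    so the suffix is N-1, zero-padded to two digits.
    """
    number = int(index_label.split("-")[1])
    return f"{prefix}_{number - 1:02d}.pdb"
```

For the six-member cluster (Indices 2, 3, 4, 5, 13, and 22) this gives 2aab_h3-loop_01.pdb through 2aab_h3-loop_04.pdb, 2aab_h3-loop_12.pdb, and 2aab_h3-loop_21.pdb, the files opened in the RMSD validation section.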

RMSD validation
1. To open the system for RMSD validation
From the menu bar, choose File | Open URL....
This opens the Open URL dialog.
On the dialog, enter 2aab in the PDB ID textbox.
This opens the original structure of the MK2-23 antibody fragment directly from the
PDB.
From the menu bar, choose File | Insert From | File....
This opens the Open dialog.
On the dialog, navigate to and select the loop files generated in the previous section:
2aab_h3-loop_00.pdb, 2aab_h3-loop_01.pdb, 2aab_h3-loop_02.pdb,
2aab_h3-loop_03.pdb, 2aab_h3-loop_04.pdb, 2aab_h3-loop_12.pdb, and
2aab_h3-loop_21.pdb.

All the loop conformations identified in the previous section of the tutorial are opened
including the starting loop conformation prior to refinement: 2aab_h3-loop_00.pdb.
2. To perform the RMSD calculation
In the Hierarchy View, expand chain H.

Click residue TYR97, hold SHIFT, and click TRP100D.
This selects all the residues TYR97 through TRP100D. These residue selections will be
used to set the original crystal structure H3-loop as reference.
From the menu bar, choose Structure | RMSD | Biopolymer Structures....
This opens the Biopolymer RMSD Calculation dialog.
On the dialog, select 2aab as the Reference molecule. In the Calculate RMSD for
molecules control group, select all of the molecules except 2aab.

Click the Selected residues radio button to calculate the RMSD only for the
H3-loop.

In the Amino Acids group, check the C-Alpha and the Main-chain boxes.

This opens an RMSD Report containing columns for the Molecule, Reference, C-Alpha,
and Main-chain atoms RMSD.
The main-chain RMSD for the 2aab_h3-loop_00.pdb molecule (the starting loop) is 6.259 Å.
The best loop from the selected cluster has an RMSD of 0.989 Å, corresponding to the
2aab_h3-loop_12.pdb refined loop.
Although the top-scored solution, 2aab_h3-loop_01.pdb, has a higher RMSD of 1.552 Å,
this is still an impressively low RMSD given that the refined loop was eight residues
in length and was constructed exclusively using de novo methods.
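The comparison above boils down to computing an RMSD over matched atoms for each loop and picking the lowest value. A minimal sketch, assuming the loops and the crystal structure already share a coordinate frame (they sit on the same antibody framework) so no further superposition is applied; whether the Biopolymer RMSD Calculation fits the structures first is not stated here, and both helper names are hypothetical.

```python
import math


def rmsd_in_place(coords_a, coords_b):
    """RMSD between two equally ordered lists of (x, y, z) atom coordinates.

    No superposition is performed; atom i of one list is compared with atom i of the other.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and the same length")
    total = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


def best_loop(rmsd_by_name):
    """Return the (name, rmsd) pair with the lowest RMSD."""
    return min(rmsd_by_name.items(), key=lambda item: item[1])
```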

ZDOCK tutorial
Purpose: Learn how to determine and refine protein-protein complex structures using
ZDOCK and RDOCK.

Required functionality and modules: Discovery Studio Visualizer client, Dock Proteins
with ZDOCK, and Refinement with RDOCK.

Required data files: 2ptn.pdb and 2sta_I.pdb.

Background
ZDOCK is a rigid-body protein-protein docking algorithm based on the Fast Fourier
Transform correlation technique. It is used to explore the rotational and translational space
of a protein-protein system. At the end of the ZDOCK protocol, ZRANK scores the ZDOCK
output poses using an empirical energy function, and the top set of poses are clustered
according to the RMSD of the interface residues. RDOCK is a CHARMm-based energy
minimization procedure for refining and scoring docked poses using energy scoring
functions.
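To illustrate the Fast Fourier Transform correlation idea, the sketch below scores every translation of a ligand grid against a receptor grid in a single pass, using the correlation theorem. It is a deliberately simplified toy: the grids, the plain overlap score, and the function name are assumptions, and the real ZDOCK scoring functions (shape complementarity, optionally electrostatics and desolvation) are richer. In FFT docking the translational scan is repeated for each sampled rotation of the ligand.

```python
import numpy as np


def best_translation(receptor_grid, ligand_grid):
    """Score all cyclic translations of the ligand grid against the receptor grid.

    By the correlation theorem, the overlap score for every translation t,
        C[t] = sum_x ligand[x] * receptor[x + t],
    equals the inverse FFT of conj(FFT(ligand)) * FFT(receptor).
    Returns (t, C[t]) for the highest-scoring translation.
    """
    corr = np.fft.ifftn(np.conj(np.fft.fftn(ligand_grid)) * np.fft.fftn(receptor_grid)).real
    shift = np.unravel_index(np.argmax(corr), corr.shape)
    return tuple(int(s) for s in shift), float(corr[shift])
```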
Introduction
In biological systems, many proteins interact with other proteins to carry out important
biological functions (antibodies interacting with antigens for example). Understanding how
a protein interacts structurally provides important insights into its function and helps with
designing new proteins for therapeutic purposes and other applications. When the
structures of the individual proteins are known, computational methods can be used to
predict their binding conformations.
Bovine beta-trypsin interacts with a trypsin inhibitor, and the structures of the two proteins
are known experimentally (PDB ID 2ptn for trypsin and PDB ID 2sta, I chain for the
inhibitor). In this tutorial, you will predict the docking conformation of beta-trypsin and its
inhibitor by performing protein docking with ZDOCK, scoring the poses using ZRANK,
analyzing the docked poses, and refining a set of selected poses with RDOCK. The X-ray
crystal structure of the bovine beta-trypsin and inhibitor CMTI-I protein complex is known
(PDB ID 1ppe) and you will compare the predicted results with the X-ray results at the end
of the tutorial.
Setting up a ZDOCK run
Analyzing ZDOCK results
Refining docked poses with RDOCK
Analyzing the binding interface residues using RMSD
Setting up a ZDOCK run
1. To open the receptor and ligand protein files in the same 3D Window
In the Files Explorer, navigate to and double-click the 2ptn.pdb data file.
Next, navigate to the 2sta_I.pdb data file and drag and drop it into the same 3D
Window.
2. To open the Dock Proteins (ZDOCK) protocol and modify the parameters
In the Protocols Explorer, expand the Protein Modeling folder and double-click Dock
Proteins (ZDOCK).
In the Parameters Explorer, click the Input Receptor Protein parameter and choose
2ptn:2ptn from the dropdown list.

Click the Input Ligand Protein parameter and choose 2ptn:2sta_I from the
dropdown list.

Click the Angular Step Size parameter and choose 15 from the dropdown list.
In general, for performance purposes, the larger protein is set as the receptor protein
(which is stationary during the docking calculation), and the smaller protein is selected
as the ligand protein (which is moving during the calculation).
Note. ZDOCK can run with two Euler angle sampling sizes: 6 degrees or 15 degrees.
The 6 degree option samples more exhaustively with 54,000 poses, and in general
provides more accurate predictions. Although the 15 degree search is not as accurate
because it samples only 3600 poses, it finishes faster than the 6 degree search. The 15
degree option is used in this tutorial.
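As a back-of-the-envelope check on the pose counts in the Note, rotational space is three-dimensional, so the number of sampled orientations should grow roughly with the cube of the inverse angular step. This sketch assumes the pose count tracks the number of sampled rotations; the two counts are taken from the Note.

```python
pose_counts = {6: 54_000, 15: 3_600}  # poses per angular step size (degrees), from the Note

observed_ratio = pose_counts[6] / pose_counts[15]
# Three rotational degrees of freedom: shrinking the step from 15 to 6 degrees
# should multiply the number of orientations by roughly (15 / 6) ** 3.
expected_ratio = (15 / 6) ** 3

print(f"observed {observed_ratio:.1f}x, cubic estimate {expected_ratio:.1f}x")
```

The observed factor of 15 is close to the cubic estimate of about 15.6, which is why the 15 degree search finishes so much faster.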
Expand the ZRANK parameter group to verify that the top 2000 poses from the
ZDOCK calculation will be rescored by the ZRANK energy function.

Expand the Clustering parameter group.

Click the RMSD Cutoff parameter and enter 6.0 for the value.

Click the Interface Cutoff parameter and enter 9.0 for the value.

Click the Maximum Number of Clusters parameter and enter 60 for the value.
For this small protein inhibitor, the default cutoff parameter values of 10.0 are too
large. A smaller RMSD Cutoff of 6.0 for clustering, and a smaller Interface Cutoff of
9.0 produce better clustering results.
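The effect of the RMSD Cutoff and Maximum Number of Clusters parameters can be pictured with a simple greedy scheme: the pose with the most neighbors within the cutoff becomes a cluster center, its neighbors join that cluster, and the process repeats until the cluster limit is reached. This is an illustrative assumption only; the algorithm Discovery Studio uses is described in the Clustering and analysis of docked protein poses Help topic and may differ. The function name and matrix input are hypothetical, and the sketch does not model the Interface Cutoff, which determines which residues enter the RMSD.

```python
def greedy_cluster(rmsd, cutoff, max_clusters):
    """Density-first greedy clustering of docked poses (illustrative sketch).

    rmsd: square matrix (list of lists) of pairwise interface RMSD values.
    Returns (clusters, unassigned): clusters is a list of (center, members) tuples
    in non-increasing size order; unassigned lists poses left when the limit is hit.
    """
    unassigned = set(range(len(rmsd)))

    def neighbors(i):
        # Unassigned poses within `cutoff` of pose i (including i itself).
        return {j for j in unassigned if rmsd[i][j] <= cutoff}

    clusters = []
    while unassigned and len(clusters) < max_clusters:
        # The densest remaining pose becomes the next center (ties: lowest index).
        center = max(sorted(unassigned), key=lambda i: len(neighbors(i)))
        members = neighbors(center)
        clusters.append((center, sorted(members)))
        unassigned -= members
    return clusters, sorted(unassigned)
```

A smaller cutoff gives each pose fewer neighbors, so the poses of a small inhibitor split into more, tighter clusters.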
ZDOCK has two different scoring methods. One is based only on shape
complementarity while the other one includes additional electrostatic and desolvation
energy terms. The second scoring method requires more memory and runs slower.
When used alone, the second method gives more accurate predictions. However, when
combined with ZRANK, the two methods have similar accuracy. In this tutorial,
electrostatic and desolvation energy terms are not used. For more details about the
two scoring methods, refer to the Dock Protein (ZDOCK) theory Help topic in Discovery
Studio.
In this tutorial, a single-processor ZDOCK job is performed without specifying blocked
residues or filtering binding site residues.
Note. ZDOCK can be run with blocking and/or filtering options:

If data suggests that certain residues are not likely to appear in the protein-protein
interface, select those residues on the receptor protein, and then set the Receptor
Blocked Residues parameter to Create New Group from selection. Similarly, select
residues to be blocked on the ligand protein for input in the Ligand Blocked Residues
parameter.

If data suggests that certain residues would appear in the protein-protein interface,
select them. Then expand the Filter Poses parameter group and set the Receptor
Binding Site Residues and Ligand Binding Site Residues parameters to Create New
Group from selection.

The Filter Poses function can also be performed using the Process Poses (ZDOCK)
protocol after a ZDOCK run.
Expand the Advanced parameter group and set the PreserveDisplayStyle parameter
to False.
This will display the proteins as ribbons.
The job takes about 15 minutes on a Pentium 4, 2 GB RAM, 2.8 GHz machine.
Analyzing ZDOCK results
1. To visualize the clustering of poses
In the Output Files section, click the ZDockResults.msv link.
The ZDockResults.msv file contains the input proteins with information about the
density of poses and poses that are docked. Small spheres indicate the center of each
cluster and are colored by their ZRANK score from red (best scoring poses) to blue
(worst scoring poses). The poses in each cluster are also represented by spheres, but
are hidden from display by default.
Protein surfaces are colored to indicate the density of the poses. Note that the display
of the protein surface is turned off by default for faster loading, but can be turned on
once the protein is loaded.
In the Hierarchy View, expand Docked Poses and Clusters.
This displays the groups of clusters and poses. Cluster_1 is the largest cluster, followed
by progressively smaller clusters. The poses that fall out of the top 60 clusters are
grouped into the unassigned group.
Note. You can view the protein surface by checking the <ViewSolidSurface> checkbox
in the Hierarchy View. Red areas indicate the highest density regions and blue areas
indicate the lowest density regions.
In the Data Table, click the ProteinPose tab.
This displays the following data from the ZDOCK run:
o ZDock Score contains the ZDOCK score of each pose.
o ZRank Score contains the ZRANK score of each pose. Pose 1 has the lowest
(best) ZRANK score.
o Rank is the rank order of the poses according to ZRANK score. If ZRANK is not
executed, the rank will be based on ZDOCK score.
o Cluster contains the cluster group number for each pose. Poses that fall outside the
top 60 clusters are assigned cluster number 2001.
o ClusterSize reports the size (number of poses) of a cluster.
o Density represents the number of neighboring poses in the clustering.
For details about the clustering algorithm, refer to the Clustering and analysis of
docked protein poses Help topic in Discovery Studio.
In the Tools Explorer, choose Protein Modeling from the dropdown list.
In the Analyze Docked Proteins tool panel, in the Clustering tools group, click Show
Full Cluster and then Show Cluster Center.
This toggles the display between showing all poses and only the cluster centers.

2. To plot the scores and clustering of poses
The following steps illustrate several ways to use plots and the Analyze Docked
Proteins tools to help view the poses and select a smaller number of ZDOCK poses for
further RDOCK refinement.
In the Hierarchy View, click Cluster_1, press and hold SHIFT, and click
Cluster_60.
This selects clusters 1 through 60.
In the Data Table, click the ProteinPose tab to make it active. From the menu bar,
choose Chart | Point Plot.
This opens the Choose Plot Axes dialog.
On the dialog, choose Cluster for the X axis, and ZRank Score for the Y axis.

This opens a 2D Point Plot.
Note that the pose with the best score is in cluster 2, the second largest cluster, and
that cluster 2 has the most low-energy poses. The best-scoring pose is often not the
best prediction; protein docking prediction methods have a high rate of false positives.
The top few clusters with the largest populations of best-scoring poses are
more likely to give good predictions.
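One way to act on this advice is to count how many of the best-scoring poses fall in each cluster. A minimal sketch, assuming the Cluster and ZRank Score columns of the ProteinPose tab have been exported as (cluster, score) pairs; the function name and the choice of top_n are hypothetical. Lower ZRANK scores are better, as noted for the ZRank Score column.

```python
from collections import Counter


def clusters_by_top_poses(poses, top_n):
    """Count how many of the top_n best ZRANK-scored poses fall in each cluster.

    poses: list of (cluster_number, zrank_score) tuples; lower ZRANK is better.
    Returns (cluster_number, count) pairs, most populated cluster first.
    """
    best = sorted(poses, key=lambda pose: pose[1])[:top_n]
    return Counter(cluster for cluster, _ in best).most_common()
```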
Repeat these steps, but with ZDock Score selected for the Y axis.
ZDOCK score is not as discriminative as ZRANK score, and both cluster 1 and cluster 2
have a few top scoring poses.
Click the ZDockResults - Point Plot tab and drag the window so that you can also
view the ZDockResults - 3D Window.

Click the ZDockResults - 3D Window tab to make it active. In the Hierarchy View, in
the Clusters group, click Cluster_2 to select it.

In the Analyze Docked Proteins tool panel, in the Clustering tools group, click Show
Full Cluster to show all the poses of cluster 2.
You can examine the location of the poses in cluster 2 with respect to the receptor
protein, 2ptn, in the 3D Window.
3. To display the full complex of the docked protein poses
Click the ZDockResults - 3D Window tab to make it active.

In the Hierarchy View, in the Clusters group, click Cluster_2 to select it.

In the Tools Explorer, in the Analyze Docked Proteins tool panel, in the Display tools
group, click Show Full Docked Complex.
This displays Pose 1 in the 3D Window. This also displays the two names for the Pose 1
protein complex in the Hierarchy View: Pose1_2ptn and Pose1_2sta_I.
In the Analyze Docked Proteins tool panel, click Show Next Docked Complex.
This displays Pose 2 in the 3D Window. This also displays the two names for the Pose 2
protein complex in the Hierarchy View: Pose2_2ptn and Pose2_2sta_I.
Click Show Next Docked Complex to cycle forward through the poses and click
Show Previous Docked Complex to cycle back through all the poses that you
selected.

Return to display Pose 1.
4. To calculate the RMSD to a selected pose
In the Hierarchy View, click Pose1_2ptn.
This selects the entire protein.
In the Analyze Docked Proteins tool panel, click Identify Binding Interface.
This selects the receptor and ligand binding interface residues.
In the RMSD tools group, click Define RMSD Reference.
This creates a group called Pose1 POSE_RMSD_REFERENCE in the Hierarchy View. This
pose will be used for the RMSD calculation.
In the Analyze Docked Proteins tool panel, in the RMSD tools group, click Calculate
Binding Site RMSD (ZDOCK).
This adds a column named RMSD to Pose1 POSE_RMSD_REFERENCE to the
ProteinPose tab in the Data Table. The reported RMSD values are calculated from the
heavy atoms of the binding site residues defined by Pose1 and show how much
each pose differs from Pose1.
Tip. In the ProteinPose tab of the Data Table, double-click the Cluster column to sort
by cluster size.
5. To calculate the RMSD of the poses to experimentally determined complex
structure 1ppe.
From the menu bar, choose File | Insert From | URL....
This opens the Insert From URL dialog.
Check that the PDB ID radio button is selected, then enter 1ppe in the textbox.

This inserts the protein complex into the ZDockResults - 3D Window.
Before running the RMSD calculation, the poses and 1ppe should be superimposed
based on the receptor protein.
From the menu bar, choose Sequence | Show Sequence.
This opens a Sequence Window.
In the Hierarchy View of the ZDockResults - 3D Window, click 2ptn to select it.

From the menu bar, choose Structure | Superimpose | By Sequence Alignment....
This opens the Superimpose by Sequence Alignment dialog. 2ptn is the reference molecule.
On the dialog, select 1ppe in the Molecules to superimpose list.

This superimposes 1ppe chain E with 2ptn.
Close the Superimpose By Sequence Alignment - Html Window.

In the Hierarchy View of the ZDockResults - 3D Window, expand <Cell>, then
expand 1ppe.

Click Water to select it, and then right-click and choose Cut.
This removes the water chain from 1ppe.
Click chain E to select chain E of 1ppe.

In the Tools Explorer, in the Analyze Docked Proteins tool panel, click the Define
Receptor command.
This defines chain E of protein 1ppe as the receptor.
In the Tools Explorer, in the Analyze Docked Proteins tool panel, click the Identify
Binding Interface command.
This identifies the binding interface residues between chain E and chain I of protein
1ppe (the residues are highlighted yellow in the 3D Window).
In the RMSD tools group, click the Define RMSD Reference command.
This sets the experimental structure 1ppe as the RMSD reference.
In the Analyze Docked Proteins tool panel, in the RMSD tools group, click Calculate
Binding Site RMSD (ZDOCK).
This adds a column named RMSD to 1ppe POSE_RMSD_REFERENCE to the ProteinPose
tab in the Data Table. Many of the top poses in Cluster_2 (the second largest cluster)
come closest to the binding mode observed in PDB 1ppe, with low interface RMSD
values.
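This tutorial does not spell out which atoms the Calculate Binding Site RMSD (ZDOCK) command uses. The sketch below therefore assumes two lists of matched interface atom coordinates, in the same order, taken after the structures have been superimposed on the receptor. Because that superposition already defines the common frame, the RMSD is computed without any further fitting. The function name is hypothetical.

```python
import math

def interface_rmsd(pose_coords, ref_coords):
    """RMSD between matched interface atoms, with no additional fitting.

    Both arguments are lists of (x, y, z) tuples in the same atom order.
    The structures are assumed to be superimposed on the receptor already.
    """
    if len(pose_coords) != len(ref_coords) or not pose_coords:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for p, r in zip(pose_coords, ref_coords):
        total += sum((pi - ri) ** 2 for pi, ri in zip(p, r))  # squared deviation
    return math.sqrt(total / len(pose_coords))
```

Shifting every interface atom of a pose by 1 Å along x gives an RMSD of 1.0 Å.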
Refining docked poses with RDOCK
RDOCK refines the binding interface by CHARMm energy minimization and then re-scores the
minimized poses. This serves two purposes. First, minimizing the docked complexes removes
obvious clashes from the rigidly docked interfaces. Second, it checks whether the poses
selected by ZRANK also have good RDOCK scores.
1. To set up and run the Refine Docked Proteins (RDOCK) protocol
Close the ZDockResults - Point Plot Window.

Click the ZDockResults - 3D Window tab to make it active.
RDOCK requires that the receptor and ligand proteins are typed, without errors, by the
CHARMm Polar H forcefield.
In the Tools Explorer, choose Simulations from the dropdown list.
This opens the Simulations tool panels.
In the Forcefield tool panel, choose CHARMm Polar H from the dropdown list, then
click Apply Forcefield.
This types the receptor and ligand proteins with the CHARMm Polar H forcefield. The
status ZDockResults typed with CHARMm Polar H displays in the tool panel.
In the Protocols Explorer, in the Protein Modeling folder, double-click the Refine
Docked Proteins (RDOCK) protocol.
In the Parameters Explorer, click the Input Typed Receptor Protein parameter and
choose ZDockResults:2ptn from the dropdown list.

Click the Input Typed Ligand Protein parameter and choose ZDockResults:2sta_I from
the dropdown list.
You can select a set of top-ranked poses by ZRANK score or by ZDOCK score to run
RDOCK refinement. In this tutorial, you will optimize a set of poses with top ZDOCK scores.
Click the 2D point plot called ZdockResults-Point Plot (1) generated earlier, and select
the top-scoring poses (score > 12) in the first 10 clusters by drawing a lasso around them.

In the Parameters Explorer, click the Input Poses parameter and choose Create New
Group from selection... from the dropdown list.
The new group is named Group by default.
Note. Leave the Dielectric Constant parameter set to 4.0. This will be the dielectric
value used in the RDOCK CHARMm energy calculation.
The job takes about ten minutes on a Pentium 4 machine (2.8 GHz, 2 GB RAM).
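The lasso selection above keeps poses that meet two criteria: a ZDOCK score above 12, and membership in one of the first 10 clusters. The filter below expresses the same rule. It assumes clusters are numbered from 1 upward by size, which is consistent with Cluster_2 being described as the second largest cluster. The `Pose` record is a hypothetical stand-in for a row of the ProteinPose table.

```python
from dataclasses import dataclass

@dataclass
class Pose:
    name: str
    cluster: int        # cluster number; 1 = largest cluster (assumed numbering)
    zdock_score: float

def select_top_poses(poses, min_score=12.0, max_cluster=10):
    """Keep poses in clusters 1..max_cluster whose ZDOCK score exceeds min_score."""
    return [p for p in poses if p.cluster <= max_cluster and p.zdock_score > min_score]
```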

2. To view and analyze the scores
In the Jobs Explorer, double-click the job summary.
This opens the Report.htm file in an Html Window and highlights the Output folder of
the job in the Files Explorer.
In the Files Explorer, in the Output folder, click ViewResults.pl.
Refined poses appear in a Table Browser with the energies calculated by RDOCK listed
in the columns.
o E_elec1 and E_elec2 are the electrostatic energies of the protein complex after
the first and second CHARMm minimizations.
o E_vdw1 and E_vdw2 are the van der Waals nonbond energies of the protein
complex after the first and second CHARMm minimizations.
o E_sol is the desolvation energy of the protein complex calculated by the ACE
method.
o E_RDock is the RDOCK score, defined as E_RDock = E_sol + beta*E_elec2, where beta
weights the electrostatic term.
Complexes with high van der Waals energies (E_vdw1 > 10) after the first minimization
are considered poorly optimized complexes with clashes and are usually not good
poses. Here, a 3D plot of cluster number, E_vdw1, and E_RDock is used to select the
hits.
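The 3D plot applies two rules visually: discard complexes with E_vdw1 > 10, then prefer the lowest E_RDock. The sketch below applies the same two rules in code, using the E_RDock definition given above. The tutorial does not give a value for beta, so it is a required argument here. The value 0.5 used in the test is purely illustrative. Poses are plain dictionaries keyed by the column names, which is an assumption of this sketch.

```python
def rank_rdock(poses, beta, vdw_cutoff=10.0):
    """Drop clashing poses and rank the rest by E_RDock (lower is better).

    Each pose is a dict with keys "E_sol", "E_elec2" and "E_vdw1".
    E_RDock = E_sol + beta * E_elec2; beta must be supplied by the caller.
    """
    kept = []
    for pose in poses:
        if pose["E_vdw1"] > vdw_cutoff:
            continue  # high vdW energy after the first minimization: likely a clash
        e_rdock = pose["E_sol"] + beta * pose["E_elec2"]
        kept.append({**pose, "E_RDock": e_rdock})
    return sorted(kept, key=lambda p: p["E_RDock"])
```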
Check that the Table Browser Window is the active window. From the menu bar, choose
Chart | 3D Point Plot to open the 3D Point Plot dialog.

In the dialog, set x, y, and z to Cluster, E_vdw1, and E_RDock, respectively. Set
Color to ZDock Score.
Note that none of the poses in this example have high van der Waals energy. The
poses with good RDOCK scores are all in cluster 2. This result confirms the selection of
cluster 2 as the best hits by ZRANK score.
3. To view the refined structures
In the Poses - Table Browser, right-click in the first cell and choose Show Structure
in 3D Window.
This displays the refined structure in a Graphics View in the 3D Window.
From the menu bar, choose View | Hierarchy.
This displays the Hierarchy View in the 3D Window.
In the Table Browser, click a row number.
This adds the pose to the Hierarchy View and displays it in the Graphics View.
In the Table Browser, click additional row numbers.
This adds the poses to the Hierarchy View. When a new row is selected, the other
poses are hidden in the Graphics View.
Tip. Check the pose checkbox in the Hierarchy View to turn the molecule display on or
off in the Graphics View.
4. To select the interface residues
You can identify the binding interface residues with the same method described at the
end of the Analyzing ZDOCK results section.
In the Hierarchy View, expand the selected pose and select the receptor chain
<AminoAcidChain>.

In the Analyze Docked Proteins tool panel, in the Display tools group, click Define
Receptor and then click Identify Binding Interface.
This selects the interface residues in the Graphics View.
Tip. The Distance Cutoff value used to define the binding interface can be set in the
Preferences. From the menu bar, choose Edit | Preferences | Protein Docking. The
default is 5 Å.
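The Distance Cutoff suggests a simple rule: a receptor residue belongs to the interface if any of its atoms lies within the cutoff of any ligand-protein atom. The brute-force sketch below uses that any-atom criterion. Whether Identify Binding Interface applies exactly this rule is an assumption. The atom tuple layout and the residue IDs are hypothetical.

```python
import math

def interface_residues(receptor_atoms, ligand_atoms, cutoff=5.0):
    """Return sorted receptor residue IDs having any atom within `cutoff` Å
    of any ligand atom.

    Atoms are (residue_id, (x, y, z)) tuples. Brute-force O(N*M) search.
    """
    found = set()
    for res_id, r_xyz in receptor_atoms:
        if res_id in found:
            continue  # residue already known to be at the interface
        if any(math.dist(r_xyz, l_xyz) <= cutoff for _, l_xyz in ligand_atoms):
            found.add(res_id)
    return sorted(found)
```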
Analyzing the binding interface residues using RMSD
In this section, you will compare the refined poses with the crystal structure of the protein
complex PDB 1ppe. This section is optional and you can stop the tutorial here. However, if
you are interested in learning how to superimpose protein structures based on their
sequence alignment and to calculate RMSD of a set of selected residues, continue with this
section.
From the previous sections, you already know that the poses in cluster 2 have the lowest
interface RMSD to the experimental complex. The RMSD values calculated here should be
similar to the ones calculated based on the ZDOCK results.
1. To insert PDB 1ppe into the Graphics View
Ensure that the Graphics View with the refined poses is active.

From the menu bar, choose File | Insert From | URL....
This opens the Insert From URL dialog.
Check that the PDB ID radio button is selected, then enter 1ppe in the textbox.

This inserts the protein complex in the Graphics View.
In the Hierarchy View, expand <Cell>, then expand 1ppe.

Click Water to select it, and then right-click and choose Cut.
This removes the water chain from 1ppe.
From the menu bar, choose Sequence | Show Sequence.
This opens a Sequence Window.
Before running the RMSD calculation, the poses and 1ppe should be superimposed
based on the receptor protein.
2. To superimpose the poses and 1ppe
Click the Poses - Table Browser tab to make the window active.

In the Hierarchy View, click chain E to select it.
This selects the receptor chain of 1ppe.
From the menu bar, choose Edit | Preferences.
This opens the Preferences dialog.
Click the Superimposition page. Check the Selected reference residues only
checkbox.


From the menu bar, choose Structure | Superimpose | By Sequence Alignment....
This opens the Sequence Alignment dialog. 1ppe is the reference molecule.
On the dialog, select all of the poses in the Molecules to superimpose list.


Click the Poses - Table Browser tab to make it active. In the Table Browser, check
that the 1ppe receptor chain E is still selected.

In the Analyze Docked Proteins tool panel, click the Define Receptor command and then
click the Identify Binding Interface command.
This selects the interface residues.
3. To calculate the binding interface residues RMSD values
From the menu bar, choose Structure | RMSD | By Sequence Alignment....
This opens the RMSD by Sequence Alignment dialog.
On the dialog, click the Reference molecule dropdown list and choose 1ppe.

Click the Selected residues radio button, and clear the Report at residue level
checkbox.

This opens an Html Window with the RMSD Report for the poses, calculated over the
selected residues.
Compare the E_RDock scores of the docked poses with the RMSD values in the RMSD
report. The pose with the lowest RMSD also has the lowest (best) E_RDock score.

Building a QSAR equation tutorial
Purpose: Learn how to predict the IC50 value of an unknown compound using a diverse
compound training set and a quantitative structure activity relationship (QSAR).


Required data files: dbh-mols-w-act.sd and dbh02_1.msv.

Time: 30 minutes.
Background
A QSAR (quantitative structure-activity relationship) is a mathematical relationship
between the physicochemical properties of a set of molecules, known as descriptors, and
their biological activity. The descriptors on which a QSAR is based are derived from
topological and 3D information for the molecules.
A QSAR can be used to predict the activity of a molecule prior to testing or even synthesis.
Additionally, analysis of relationships between structure, properties, and biological activity
based on a QSAR can facilitate understanding of the nature of the underlying interactions
in a particular system.
Introduction
QSAR protocols can be used to make informed decisions about which candidate compounds
should be considered (based on estimates of biological activity), as well as to gain insight
into various underlying biological processes. QSAR can also be used to provide basic insight
into structure-property relationships.
In this tutorial, you will calculate several molecular properties for a series of dopamine
beta-hydroxylase inhibitors. These properties are then used to construct a QSAR model
using multiple linear regression.
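Multiple linear regression fits activity = b0 + b1*x1 + ... + bk*xk by ordinary least squares. The sketch below solves the least-squares normal equations with Gauss-Jordan elimination in plain Python. It is a textbook illustration, not the numerical method used by the Create Multiple Linear Regression Model protocol, which may differ. The function name is hypothetical.

```python
def fit_mlr(X, y):
    """Ordinary least squares fit of y = b0 + b1*x1 + ... + bk*xk.

    X: list of rows of descriptor values; y: list of activities.
    Solves the normal equations (A^T A) b = A^T y by Gauss-Jordan elimination
    with partial pivoting. Returns [b0, b1, ..., bk].
    """
    A = [[1.0] + list(row) for row in X]  # prepend the intercept column
    n = len(A[0])
    # Augmented normal-equation matrix [A^T A | A^T y]
    M = [[sum(r[i] * r[j] for r in A) for j in range(n)]
         + [sum(r[i] * yi for r, yi in zip(A, y))] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("descriptors are collinear; the fit is not unique")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:  # eliminate this column from every other row
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]
```

With five descriptor columns per molecule (as in this tutorial), the function returns six coefficients: the intercept followed by one weight per descriptor.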
Entering molecules into a training set
Calculating molecular descriptors
Generating a QSAR equation
Predicting the activity of new molecules
Examining the prediction within the original training set
Entering molecules into a training set
Choose the molecular structures to use as a training set. These structures can be built
using the building and sketching tools, or you can open structural data in a variety of
common file formats generated by other molecular modeling tools. In this tutorial,
however, you open a prepared set of dopamine beta-hydroxylase inhibitors from a file.
To open the training set molecules
In the Files Explorer, navigate to and double-click the dbh-mols-w-act.sd data file.
This opens 47 molecules in a new Table Browser Window.
The column neg_log_IC50 displays the activity data -log(IC50). This column will be used as
the dependent variable in this tutorial.
Calculating molecular descriptors
A descriptor is a molecular property that can be calculated and used in determining a new
QSAR equation. A wide range of spatial, electronic, topological, and other descriptors can
be calculated using the Calculate Molecular Properties protocol. In this tutorial, five of
these properties are used to build the QSAR model.
The Calculate Molecular Properties protocol will be used to determine the number of
hydrogen bond acceptors and four topological descriptors (CHI_V_0, CHI_V_1, CHI_V_3_C,
and SC_3_P).
1. To calculate molecular descriptors
In the Protocols Explorer, expand the QSAR folder and double-click the Calculate
Molecular Properties protocol.
Click the Input Ligands parameter, then click the button.
On the dialog, select the All Ligands from a Table Browser Window radio button,
and check that dbh-mols-w-act displays in the textbox.


Click the Molecular Properties parameter, then click the button.
This opens the Molecular Properties dialog.
On the dialog, uncheck the All checkbox.
This clears the default selections.
Enter num in the text box at the top of the dialog and check the Num_H_Acceptors
checkbox.

Enter chi in the textbox at the top of the dialog, check the checkboxes for CHI_V_0,
CHI_V_1, CHI_V_3_C, and SC_3_P.

Kier has shown topological descriptors such as those selected in this step to be very
useful for building QSAR equations that model enzyme inhibition (Kier, L.B., 1986,
Shape indexes of orders one and three from molecular graphs, Quant. Struct.-Act.
Relat. 5, 1-7). Ultimately, the choice of descriptors in a QSAR model depends on the
underlying physical or chemical mechanism of the property being modeled.
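The CHI_V_* descriptors are molecular connectivity indices built from atom degrees in the hydrogen-suppressed molecular graph. The sketch below computes the simple zero- and first-order indices, chi0 and chi1, to show how such indices are constructed. The valence variants that Discovery Studio reports replace the plain degree with a valence degree that accounts for heteroatoms and multiple bonds, so these numbers are not the CHI_V_0 or CHI_V_1 values. The function name is hypothetical.

```python
import math

def chi_indices(bonds):
    """Simple (non-valence) connectivity indices for a hydrogen-suppressed graph.

    bonds: list of (atom_i, atom_j) pairs between heavy atoms.
    chi0 = sum over atoms of delta**-0.5
    chi1 = sum over bonds of (delta_i * delta_j)**-0.5
    where delta is the number of heavy-atom neighbours of an atom.
    """
    degree = {}
    for i, j in bonds:
        degree[i] = degree.get(i, 0) + 1
        degree[j] = degree.get(j, 0) + 1
    chi0 = sum(1 / math.sqrt(d) for d in degree.values())
    chi1 = sum(1 / math.sqrt(degree[i] * degree[j]) for i, j in bonds)
    return chi0, chi1
```

For n-butane (a four-carbon chain), chi0 is about 3.414 and chi1 is about 1.914.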

In the Html Window, in the Output Files section, click the dbh-mols-w-act.sd link.
This opens the results of the Calculate Molecular Properties protocol in a Table
Browser.
2. To use the five calculated properties to build a model
In the Protocols Explorer, in the QSAR folder, double-click the Create Multiple Linear
Regression Model protocol.
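Click the Input Ligands parameter, then click the button.
On the dialog, select the All Ligands from a Table Browser Window radio button and check
that dbh-mols-w-act displays in the text box.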

Click the Model Name parameter and replace MLRTempModel with MLRdbhModel.

Click the Dependent Property parameter and choose neg_log_IC50 from the
dropdown list.

Click the Independent Properties parameter and check the checkbox for the five
descriptors: Num_H_Acceptors, CHI_V_0, CHI_V_1, CHI_V_3_C, and SC_3_P.

Generating a QSAR equation
The Create Multiple Linear Regression Model protocol also returns a linear equation with
the five descriptors. This equation can be analyzed using the files generated in the Output
Files folder.
To review the results
In the Html Window, in the Output Files section, click the View Results link.
This opens a Point Plot that compares the experimental with the predicted activity.
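An experimental-versus-predicted plot is often summarized by two numbers: the coefficient of determination (r²) and the root-mean-square error. The helper below computes both from two lists of activities. It illustrates standard statistics only. The protocol report may list different or additional statistics.

```python
import math

def fit_statistics(observed, predicted):
    """Return (r_squared, rmse) comparing experimental and predicted activities."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))  # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)              # total sum of squares
    return 1 - ss_res / ss_tot, math.sqrt(ss_res / n)
```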
Predicting the activity of new molecules
Once you have calculated a QSAR equation, you can use it to predict the activity of a
molecule with unknown activity. Models generated by the QSAR Create Multiple Linear
Regression Model protocol are saved directly onto the server and are listed as choices for
the Calculate Molecular Properties protocol.
1. To open a new molecule and predict its activity
In the Files Explorer, navigate to and double-click the dbh02_1.msv data file.
This opens the molecule in a new 3D Window.
In the Parameters Explorer, click the Calculate Molecular Properties protocol tab.
Click the Input Ligands parameter, then click the button.
On the dialog, click the All Ligands from a 3D Window radio button and check that
dbh02_1 displays in the text box.

Click the Molecular Properties parameter, then click the button.
This opens the Molecular Properties dialog.
In the dialog, expand Other. Check the checkbox for MLRdbhModel.

2. To run the protocol and view the results
In the Html Window, in the Output Files section, click the dbh02_1.sd link.
This opens the results in a Table Browser.

The predicted activity (-log(IC50)) for this test molecule is about 4.5. The test molecule
was derived from the training set molecule dbh02 by replacing a sulfur atom with an
oxygen atom. The modification makes the molecule a better inhibitor: IC50 decreases
from 1 mM to 31.6 µM.
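The neg_log_IC50 column uses IC50 in molar units, so converting between the two scales takes a base-10 logarithm. A predicted value of 4.5 corresponds to 10^-4.5 M, which is about 31.6 µM. An IC50 of 1 mM (10^-3 M) corresponds to 3. The helper names below are illustrative.

```python
import math

def ic50_molar_to_neg_log(ic50_molar):
    """Convert an IC50 in mol/L to -log10(IC50)."""
    return -math.log10(ic50_molar)

def neg_log_to_ic50_molar(neg_log_ic50):
    """Convert -log10(IC50) back to an IC50 in mol/L."""
    return 10 ** (-neg_log_ic50)
```

For example, `neg_log_to_ic50_molar(4.5) * 1e6` gives the IC50 in micromolar.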
Examining the prediction within the original training set
We can further examine the predicted IC50 of dbh02_1 within the context of the original
training set.
1. To open the training set molecules and the test molecule

In the Jobs Explorer window, double-click the completed Create Multiple Linear
Regression Model job.
This opens the job folder in the Files Explorer window and the report file in a new
window. You can close the report window.
In the Output folder, locate the file dbh-mols-w-act.sd. Right-click this file and choose
This opens all 47 molecules in the training set in a 3D window.
In the Jobs Explorer window, double-click the completed Calculate Molecular Properties
job.
This opens the job folder in the Files Explorer window and the report file in a new
window. You can close the report window.
In the Output folder, locate the file dbh02_1.sd. Drag and drop this file into the 3D
window.
2. To plot the predicted results
If the Data Table associated with the 3D window is not open, press Ctrl-T to make it
visible.

Press and hold CTRL and click the neg_log_IC50 and MLRdbhModel column headers to
select both columns.

From the menu bar, choose Chart | Point Plot.
This opens a graphics window with a point chart showing the actual -log(IC50) and
predicted value for each molecule.
Drag the tab of the chart window so that it displays side by side with the 3D window.

Select a row in the Data Table of the 3D window to highlight that molecule in the point
plot. Conversely, selecting one or more points in the point plot selects the corresponding
molecules in the Data Table and in the 3D window.
This interactivity shows that dbh02_1 lies in the mid range of neg_log_IC50, whereas
the original molecule dbh02 is among the lowest.

Docking small molecules with LibDock tutorial
Purpose: Explore docking and analysis with a set of ligands and a specific protein active
site using LibDock.

Required functionality and modules: Discovery Studio Visualizer client, DS LibDock,
and DS Catalyst Conformation.

Required data files: 1kim_protH.msv and TK-xray-ligs.sd.

Time: 20 minutes.
Background
LibDock is an algorithm for docking small molecules into a receptor active site. First, a
hotspot map of polar and apolar groups is calculated for the receptor site. This hotspot
map is then used to rigidly align ligand conformations so that they form favorable
interactions. After a final energy minimization step, the top-scoring ligand poses are saved.
Introduction
In this tutorial, you will dock a set of ligands into the thymidine kinase receptor.
Preparing the molecular systems for docking and performing the docking run
Analyzing docked poses
Preparing the molecular systems for docking and performing
the docking run
1. To define the protein as the receptor
In the Files Explorer, navigate to and double-click the 1kim_protH.msv data file.
In the Hierarchy View, expand <Cell>, then click 1kim_proth to select it.
In the Tools Explorer, in the Binding Site tool panel, in the Definition tools group, click
Define Selected Molecule as Receptor.
This defines the selected protein as the receptor for the next step.
2. To find potential binding sites in the receptor
In the Binding Site tool panel, click Find Sites from Receptor Cavities.
In the Hierarchy View, expand 1kim_proth.
Nine potential binding sites are identified, with the largest potential binding site
displayed.
Note. Use Next Site and Previous Site in the Display tools group to view the other
binding sites.
3. To define the sphere from the binding site
In the Hierarchy View, click Site 1 to select it.

In the View toolbar, click the Fit To Screen button.
This zooms the Graphics View to the binding site.
In the Binding Site tool panel, in the Definition tools group, click Define Sphere from
Selection.
The SBD_Site_Sphere is displayed as a red sphere.
4. To expand the sphere from 5 Å to 9 Å to encompass the ligand
In the Hierarchy View, click SBD_Site_Sphere to select it. Right-click
SBD_Site_Sphere and choose Attributes of SBD_Site_Sphere....
This opens the SphereObjects Attributes dialog.
On the dialog, in the Radius row, enter 9 for the value.

The radius is increased to ensure that most of the binding site points are contained
within the radius of the sphere. This can be verified visually in the Graphics View.
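The visual check can also be expressed numerically: count the fraction of binding-site points that fall inside the sphere. The sketch below assumes the site point coordinates are available as (x, y, z) tuples. Exporting them from Discovery Studio is not covered in this tutorial, so the input is hypothetical.

```python
import math

def fraction_inside(points, center, radius):
    """Fraction of (x, y, z) points lying within a sphere of the given radius."""
    if not points:
        return 0.0
    inside = sum(1 for p in points if math.dist(p, center) <= radius)
    return inside / len(points)
```

Comparing the fraction at 5 Å and at 9 Å shows how much of the site the larger sphere captures.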
In the Hierarchy View, clear the checkbox beside Site 1.
This hides Site 1 from the 3D Window, as it is not needed for subsequent steps of this
tutorial.
Press CTRL+T.
This hides the Data Table.
5. To open the ligand file in a Table Browser
In the Files Explorer, navigate to and double-click the TK-xray-ligs.sd data file.
6. To open the protocol and modify the parameters
In the Protocols Explorer, expand the Receptor-Ligand Interactions folder and
double-click the Dock Ligands (LibDock) protocol.
In the Parameters Explorer, click the Input Receptor parameter and choose
1kim_protH:1kim_proth from the dropdown list.
Click the Input Ligands parameter and choose TK-xray-ligs from the dropdown list.

Click the Input Site Sphere parameter and choose the coordinates of the sphere

Click the Docking Preferences parameter and choose User Specified from the
dropdown list.
This allows you to change specific parameters in the docking run.
Expand the Docking Preferences parameter. Click the Max Hits to Save parameter
and enter 10 for the value.
Note. For this tutorial, the default setting is lowered to reduce the number of poses
reported.
If prompted, click No to skip saving the files.

This opens the results of the docking run.
The table shows the top scoring pose alignments for each input ligand; the poses are
sorted by the LibDock score.

8. To browse through the docked poses
Click TK-xray-ligs - Table Browser tab to make it active. Press CTRL+1 to hide the
Protocols Explorer and the Jobs Explorer.

In the Graphics View, click the surface of the binding site sphere to select it.

Right-click in the Graphics View and choose Hide.
This hides the sphere display.
In the Table Browser, click 1e2k in the first row.

Press CTRL+H to open the Hierarchy View.
This adds the ligand to the Hierarchy View and displays it in the Graphics View.
In the Hierarchy View, click 1e2k to select it. From the View toolbar, click the Fit To
Screen button.
This zooms the Graphics View to the binding site.
In the Graphics View, right-click and choose Receptor-Ligand Hydrogen Bonds.
This displays green dashes that represent the hydrogen bonds between the ligand
poses and the receptor atoms.
Tip. Use the Rotate tool in the View toolbar to rotate the molecule for better viewing of
the pose and bonds.

In the Table Browser, click through some of the other ligand poses.
Analyzing docked poses
Use the Analyze Ligand Poses protocol to calculate RMSD values, to count the hydrogen
bonds made by the poses to the receptor molecule, and to count close contacts (van der
Waals clashes) between the poses and the receptor molecule.
1. To open the Analyze Ligand Poses protocol and modify the parameters
Press CTRL+1.
This re-displays the Protocols Explorer and the Jobs Explorer.
In the Protocols Explorer, in the Receptor-Ligand Interactions folder, double-click the
Analyze Ligand Poses protocol.
The parameters display in the Parameters Explorer.
Click the Input Ligands parameter and choose TK-xray-ligs from the dropdown list.

Click the Input Receptor parameter and choose TK-xray-ligs:1kim_proth from the
dropdown list.

Expand the Input Receptor parameter. Click the Hydrogen Bond parameter and
choose True from the dropdown list.

Expand the Hydrogen Bond parameter. Click the Scope parameter and check the
Residue checkbox.
This counts the hydrogen bonds made by each docked pose to the entire receptor and to
each individual receptor residue, and reports both sets of counts when the analysis is
carried out.
Click the Contacts parameter and choose True from the dropdown list.

Expand the Contacts parameter. Click the Type parameter and choose All from the
dropdown list.
This is the default option and counts both nonpolar and polar contacts between each
docked pose and the entire receptor as well as each docked pose and individual
residues in the receptor.
Click the Scope parameter and check the Residue checkbox.
2. To run the protocol and view the results
If prompted, click No to skip saving the files.
This opens the results of the analysis run in a Table Browser with columns that list the
new properties calculated:
o HBOND Count for the entire receptor
o HBOND Count for each of the 304 residues in the receptor
o Contacts to the entire receptor
o Contacts to each of the 304 residues in the receptor
Note. If HBOND Count 1 is not the first HBOND column, turn off the sort properties on
input option for the Table Browser Window in the Preferences dialog.
Click the TK-xray-ligs - Table Browser tab to make it active. Press CTRL+1 to hide
the explorers.
3. To generate a Heat Map of hydrogen bond count
In the Table Browser, click the HBOND Count 2: 1kim_proth A MET46 column
header.
This selects all rows in this column.
Use the scroll bar to scroll to the end of the HBOND Count columns, press and hold
SHIFT, and click the HBOND Count 305: 1kim_proth A ALA375 column header.
This selects all the cells in the TK-xray-ligs - Table Browser Window that contain a
count of the hydrogen bonds, with one column per residue, and one row per docked
pose.
From the menu bar, choose Chart | Heat Map.
This generates a Heat Map with hotspots (blue = 0, red = maximum number of hydrogen bonds)
corresponding to hydrogen bonds between the ligand poses (Y axis) and the residues
of the receptor molecule (X axis). You can see, for example, that Gln125 makes
hydrogen bonds with many of the docked poses.
Identifying the prevalence of such hydrogen bonds can be very challenging when
manually browsing through all docked poses of the ligands. The Heat Map quickly
reveals patterns in hydrogen bond formation that can provide insight into (for
example) the importance of specific residues for binding.
Hover the mouse over the Heat Map for information about specific cells (e.g., residue
name, pose number, and hydrogen bond count).
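The data behind this Heat Map is a pose-by-residue matrix of hydrogen bond counts. Counting how many poses have at least one hydrogen bond to each residue reveals residues such as Gln125 that interact with many poses. The sketch below builds that matrix and ranking from a nested dictionary. The input layout is an assumption, and the counts in the test are made up. They are not results from this tutorial.

```python
def hbond_matrix(counts, residues):
    """Build the pose x residue count matrix behind a Heat Map and rank residues.

    counts: dict mapping pose name -> dict of residue -> H-bond count (missing = 0).
    residues: column order, e.g. receptor residues in sequence order.
    Returns (matrix, ranking); ranking lists (residue, number of poses with >= 1
    H-bond), most frequently contacted residue first.
    """
    poses = sorted(counts)
    matrix = [[counts[p].get(r, 0) for r in residues] for p in poses]
    ranking = []
    for col, r in enumerate(residues):
        n_poses = sum(1 for row in matrix if row[col] > 0)
        ranking.append((r, n_poses))
    ranking.sort(key=lambda item: item[1], reverse=True)
    return matrix, ranking
```

A residue whose column is non-zero for many rows appears as the vertical stripe you see in the Heat Map.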
4. To generate a Heat Map of close contact count
Click the TK-xray-ligs - Table Browser tab to make it active.
In the Table Browser, click the Contacts 2: 1kim_proth A MET36 column header.
This selects all rows in this column.
Use the scroll bar to scroll to the end of the Contacts columns, press and hold SHIFT,
and click the Contacts 305: 1kim_proth A ALA375 column header.
This selects all the cells in the TK-xray-ligs - Table Browser Window with contact count
information.
From the menu bar, choose Chart | Heat Map.
This generates a Heat Map colored by the number of close contacts between the ligand
poses (Y axis) and the residues of the receptor molecule (X axis). You can see the
number of close contacts with a pose by hovering over the points in the map. Vertical
stripes indicate residues with many close contacts with several poses.
5. To view close contacts in the Graphics View
Click the TK-xray-ligs - Table Browser tab to make it active.
In the Table Browser, click row 1.
In the Graphics View, right-click and choose Receptor-Ligand Bumps.
This displays purple dashes that represent the bumps (close contacts) between the
current pose and the receptor atoms.
Tip. Use the Rotate tool in the toolbar to rotate the molecule for better viewing of the
pose and close contacts.
Click the other cells in the Table Browser Window to display the close contacts between
the selected pose and the receptor atoms.
Note. Not all poses in this tutorial exhibit close contacts.
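A close contact (bump) is an atom pair that sits much closer together than the two atoms' van der Waals radii allow. The sketch below counts ligand-receptor pairs whose distance falls below a scaled sum of the radii. The scale factor of 0.7 is an illustrative assumption, not the threshold Discovery Studio uses to draw Receptor-Ligand Bumps. The atom tuple layout is also hypothetical.

```python
import math

def count_bumps(ligand_atoms, receptor_atoms, scale=0.7):
    """Count ligand-receptor atom pairs closer than scale * (sum of vdW radii).

    Atoms are ((x, y, z), vdw_radius) tuples. The default scale is an
    illustrative assumption, not Discovery Studio's bump criterion.
    """
    bumps = 0
    for l_xyz, l_r in ligand_atoms:
        for r_xyz, r_r in receptor_atoms:
            if math.dist(l_xyz, r_xyz) < scale * (l_r + r_r):
                bumps += 1
    return bumps
```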

Simulation of a small peptide with restraints (protein simulation) tutorial
Purpose: Build a small peptide and run the Standard Dynamics Cascade protocol with
distance and dihedral restraints.

Required functionality and modules: Discovery Studio Visualizer client and CHARMm.

Required data files: none.

Time: 20 minutes.
Background
CHARMm is designed to simulate molecular systems and provides capabilities to analyze
such simulations. CHARMm requires atomic coordinates and forcefield parameters,
including partial charges, as input. Discovery Studio provides these inputs, allowing you to
simulate and analyze molecular systems using CHARMm. In this tutorial, an input molecule
is built using the Protein Builder, forcefield parameters and partial atomic charges are
assigned to the system, a molecular dynamics calculation is carried out, and the resulting
dynamics trajectory is animated.
Introduction
Build a small peptide and run the Standard Dynamics Cascade protocol with distance and
dihedral restraints.
1. To build enkephalin in a new 3D Window
From the menu bar, choose File | New | 3D Window.
This opens a new 3D Window.
In the Tools Explorer, in the Build and Edit Protein tool panel, in the Choose Build
Action tools group, choose Create/Grow Chain from the dropdown list.

In the Choose Amino Acid tools group, click the following commands (in the indicated
order): Tyr, Gly, Gly, Phe, Met.
This builds met-enkephalin. As each residue is clicked, it is added to the chain in the
3D Window. The Choose Conformation setting on the tool panel specifies the
conformation.
2. To assign forcefield parameters
In the Tools Explorer, choose Simulations from the dropdown list.
In the Forcefield tool panel, choose CHARMm from the dropdown list. Click Apply
Forcefield.
This selects the CHARMm all-atom forcefield for this calculation and then assigns the
CHARMm all-atom forcefield parameterization to your system. A message that the
molecule is typed is displayed in the Status group (Molecule typed with CHARMm).
3. To define a distance monitor
In the Hierarchy View, expand Molecule and then expand chain A.

Next, expand TYR1 and MET5.

In TYR1, click N to select the atom. Next, press and hold the CTRL key, and in MET5,
click C to select the atom.
This selects the terminal ammonium nitrogen (the N in the NH₃⁺ group of Tyr) and the
carboxylate carbon (the C in the COO⁻ group of Met).
From the menu bar, choose Structure | Monitor | Distance.
A green line connecting the selected atoms is created, indicating that the distance
between the atoms is monitored.
4. To set up a distance restraint
In the Hierarchy View, reselect the atoms that you selected in the previous step.

In the Tools Explorer, in the Constraints tool panel, click Create Distance Restraint.
A DistanceRestraint entry is added to the Hierarchy View and to the Data Table.
5. To modify the distance parameters of the distance restraint
In the Data Table View, click the DistanceRestraint tab, scroll to the
ThresholdMaximum column, and enter 17.0 for the value. Next, scroll to the
ThresholdMinimum column and enter 16.0 for the value.
This allows the N and C atoms to move without penalty anywhere within a distance of 16 to
17 Å. Blue, ball-shaped objects are displayed on the two selected atoms.
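"Without penalty within 16 to 17 Å" describes a flat-bottom restraint: the penalty is zero inside the window and grows outside it. The sketch below uses a generic harmonic flat-bottom form with an arbitrary force constant k. It illustrates the behavior only and is not necessarily the exact expression or units CHARMm applies to a DistanceRestraint.

```python
def flat_bottom_energy(d, d_min=16.0, d_max=17.0, k=1.0):
    """Generic harmonic flat-bottom restraint energy for a distance d (Å).

    Zero for d_min <= d <= d_max; k * (violation)**2 outside that window.
    k is an arbitrary illustrative force constant.
    """
    if d < d_min:
        return k * (d_min - d) ** 2
    if d > d_max:
        return k * (d - d_max) ** 2
    return 0.0
```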
6. To set up a dihedral restraint
In the Hierarchy View, expand Gly2 and Gly3.

In Gly2, click C to select the atom. Next, press and hold down the CTRL key, and in
Gly3, click N, CA, and C to select the atoms.
This selects four atoms in the middle of the peptide chain defining the Phi angle in the
second Gly residue of the enkephalin molecule.
In the Tools Explorer, in the Constraint tool panel, click Create Dihedral Restraint.
Blue, ball-shaped graphical objects are displayed on the four selected atoms. A
DihedralRestraint entry is added to the Hierarchy View and to the Data Table View. By
default, the dihedral restraint value is the current angle.
7. To modify the dihedral restraint value
In the Data Table View, click the DihedralRestraint tab, scroll to the Force Constant
column and enter 500 for the value.

Scroll to the Periodicity column and enter 3 for the value.

Scroll to the Reference Angle column and enter -57 for the value.
If the Periodicity is set to 0, the constraint is treated as a planar (pseudo-dihedral)
constraint. A Periodicity of 3 means that a 360° rotation has three energy minima, one
every 120° starting from the Reference Angle.
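One periodic form that behaves as described is k * (1 - cos(n * (phi - phi_ref))). With n = 3 and a Reference Angle of -57°, it has minima at -57°, 63°, and -177° (the same angle as 183°). The sketch below is an assumed functional form chosen to illustrate the periodicity. It is not necessarily the expression or force-constant units CHARMm uses, and it does not cover the Periodicity 0 (planar) case.

```python
import math

def dihedral_restraint_energy(phi_deg, k=500.0, n=3, ref_deg=-57.0):
    """Assumed periodic restraint k * (1 - cos(n * (phi - ref))), angles in degrees."""
    return k * (1.0 - math.cos(math.radians(n * (phi_deg - ref_deg))))

def restraint_minima(n=3, ref_deg=-57.0):
    """Angles (degrees, wrapped to [-180, 180)) of the n energy minima."""
    return sorted(((ref_deg + i * 360.0 / n + 180.0) % 360.0) - 180.0 for i in range(n))
```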
8. To open the Standard Dynamics Cascade protocol
In the Protocols Explorer, expand the Simulation folder and double-click the
Standard Dynamics Cascade protocol.
9. To disable the distance and dihedral restraints during the production stage
In the Parameters Explorer, expand the Production parameter group and click the
Constraints parameter.
This displays a listbox of constraints.
In the listbox, uncheck the Dihedral and Distance restraints.
This removes the restraints from the production stage of the cascade.
10. To run the Standard Dynamics Cascade
The default value for the Steps parameters in the Equilibration and the Production
parameter groups is 1000 steps. In most cases, the Equilibration and Production steps
will need to be much longer. The default value for the Save Results Frequency
parameter in the Production parameter group is set to 100 steps. This could be
changed to save only once every 1,000 or 10,000 steps. Remember, all frames from
the Production phase are saved in the final MSV file, which can lead to an extremely
large file.
On the Protocols toolbar, click the Run button and wait for the job to complete.
The job should take about three minutes on a single-processor Pentium 4 machine
(2.8 GHz, 2 GB RAM).
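The trajectory size follows from simple arithmetic: the number of saved frames is roughly the production steps divided by the save frequency, and the simulated time is steps times the timestep. The sketch below assumes a 1 fs timestep, which matches the 1 picosecond production trajectory for 1000 steps described later. It also ignores whether an initial frame is written in addition to the periodic saves.

```python
def production_summary(steps, save_every, timestep_fs):
    """Return (saved_frames, simulated_time_ps) for a production run.

    Assumes one frame per `save_every` steps; an extra initial frame, if
    written, is not counted.
    """
    frames = steps // save_every
    return frames, steps * timestep_fs / 1000.0
```

With the defaults (1000 steps, saving every 100 steps) this gives about 10 frames. A 100,000-step run saved every 1,000 steps would give about 100 frames instead of 1,000.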
11. To inspect the results from your minimization and dynamics calculation
Browse through the report to inspect the results from various stages of the simulation
cascade.
12. To animate frames from the simulation
In the Html Window, in the Output Files section, click the Molecule.msv link.
This opens the final structure.
From the menu bar, choose Structure | Animation | Play.
Frames from the enkephalin molecule production dynamics trajectory are displayed in
the 3D Window in sequence.
In this short (1 picosecond) animation, you will see that the distance between the
amino and carboxylic acid groups diminishes during the production stage of the
simulation because the distance restraint is disabled. Molecular dynamics calculations
such as this allow you to explore the time-dependent conformational characteristics of
molecular structures.

Using the Calculate Energy (QM-MM) protocol to
determine energies, rank order poses and compare
interaction energies before and after minimization
Purpose: To learn how to calculate the QM-MM interaction energy in protein-ligand
complexes.

Required functionality and modules: Discovery Studio client, Calculate Energy (QM-
MM), Minimization (QM-MM).

Required data files: 1AQ1_pose01.msv, 1AQ1_pose10.msv, 1AQ1_pose13.msv, and
1AQ1_pose_xray.msv.

Time: Several hours.
Background
Applying QM-MM methods to a simulation system involves dividing the input system into
two regions:
The central region (e.g., a ligand) to be treated by a quantum mechanical (QM)
calculation.
The outer region (e.g., a protein) whose constituents are treated by a CHARMm
forcefield molecular mechanics (MM) calculation.
Typically, a quantum mechanical treatment is desired for the ligand and the surrounding
residues when high precision is needed in that region (e.g., if a chemical reaction occurs or if polarization
effects in the protein play an important role). The remaining bulk of the structure is
described using a forcefield. Combining a QM treatment with a forcefield allows
precise calculations to be carried out on the region of interest without evaluating the bulk
in this computationally expensive manner. The forcefield provides a much less time-consuming
way to evaluate the energy of the remainder of the structure. The way in which the two regions
are allowed to interact, and how the total QM-MM energy is evaluated, define the specific QM-MM
method employed. Special care must be taken in handling the interactions at the boundary zone
when covalent bonds between QM and MM atoms are cut.
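The following is a sketch only. The text does not state which scheme the Discovery Studio protocols use. In a common additive QM-MM scheme, the total energy is the QM energy of the central region plus the MM energy of the outer region plus coupling terms for their interaction:

```latex
E_{\text{total}} = E_{\text{QM}}(\text{central}) + E_{\text{MM}}(\text{outer})
  + E_{\text{QM-MM}}^{\text{elec}} + E_{\text{QM-MM}}^{\text{vdW}} + E_{\text{QM-MM}}^{\text{bonded}}
```

The last three terms describe electrostatic, van der Waals, and bonded interactions between the two regions. The bonded term applies only where covalent bonds cross the boundary, which is where the boundary treatment mentioned above matters.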
Introduction
In this tutorial, three docking poses of the 1AQ1.pdb complex generated by the Flexible Docking
protocol are used. The poses selected are the three best poses based on a ranking of
RMSD to the X-ray pose. The protein residues surrounding the ligand were allowed to move
during docking, so each input file has a unique protein structure.
Preparing the structures
Running the Calculate Energy (QM-MM) protocol
Analyzing the results of the energy calculation
Preparing a pose for minimization
Running the Minimization (QM-MM) protocol
Analyzing the results of the minimization
Preparing the structures
1. To open the input structures
In the Files Explorer, navigate to and double-click the 1AQ1_pose01.msv,
1AQ1_pose10.msv, and 1AQ1_pose13.msv data files.
This opens the data files in three individual 3D Windows.
Press CTRL+H.
This opens the Hierarchy View in the 3D Window.
2. To create two groups: one for the ligand and one for the protein
In the Graphics View, click the 1AQ1_pose01 - 3D Window tab to make it active. In
the Hierarchy View for 1AQ1_pose01, expand 1AQ1_pose_01 and click the chain
labeled protein01 to select it.

From the menu bar, choose Edit | Group....
This opens the Edit Group dialog.
On the dialog, in the Group name text field, enter protein01.

Click the Define button.

Next, repeat these steps for the ligand chain by selecting it in the Hierarchy View and
naming the group pose01.

In the Graphics View, click the 1AQ1_pose10 - 3D Window tab to make it active.
Create a group called protein10 for the protein chain and another called pose10 for
the ligand chain.

Repeat this exercise for 1AQ1_pose_13 and maintain the naming conventions.
Note. Each complex will already have two groups listed in the Hierarchy View for the
fixed and flexible residues. These will not be used in this tutorial.
3. To type the complexes with the CHARMm forcefield
In the Graphics View, click the 1AQ1_pose01 - 3D Window tab to make it active.

Deselect anything that was selected by clicking an empty area of the 3D Window.

In the Tools Explorer, choose Simulation from the dropdown list.
On the ForceField tool panel, ensure that the Forcefield is set to CHARMm and click
the Apply Forcefield button.
The Status message in the tool panel reports 1AQ1_pose01 typed with CHARMm.
Repeat this process for the two remaining complexes, 1AQ1_pose10 and
1AQ1_pose_13.
Running the Calculate Energy (QM-MM) protocol
1. To open the protocol and set the parameters
In the Protocols Explorer, expand the Simulations folder and double-click the
Calculate Energy (QM-MM) protocol.
In the Parameters Explorer, click the Input Typed Molecule parameter and choose
1AQ1_pose01: 1AQ1_pose_01 from the dropdown list.

Click the Quantum Atoms parameter, and choose pose01 from the dropdown list.
For this tutorial we will leave the remaining options set to their defaults.
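2. To run the job
On the Protocols toolbar, click the Run button.
Repeat this process for the two remaining complexes, 1AQ1_pose10 and
1AQ1_pose_13, setting the Input Typed Molecule and Quantum Atoms parameters
(pose10 or pose13) accordingly each time.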
Depending on the speed of your workstation, each protocol will take between 15 and
45 minutes to complete.

The Job Completed dialog displays for each job when complete.
Analyzing the results of the energy calculation
The calculations performed return a series of energy values that can be used to rank order
the ligand poses. For this tutorial, the QM-MM interaction energy is the quantity of interest
and we will compare this with the CHARMm interaction energy.
1. To review the results
In the Jobs Explorer, double-click each completed job.
This opens the Report.htm files in separate Html Windows.
The following table lists the rank ordering of poses based on RMSD to the X-ray
structure, QM-MM Interaction Energy and MM Interaction Energy:
Pose     RMSD (Å)     QM-MM Int E (kcal/mol)    CHARMm Int E (kcal/mol)
Pose13   0.85 (1st)   -94.6 (1st)               -129.2 (3rd)
Pose10   1.55 (2nd)   -92.2 (2nd)               -135.3 (2nd)
Pose01   2.54 (3rd)   -82.2 (3rd)               -138.0 (1st)
In this example, the rank ordering of the poses by RMSD to the X-ray structure is
Pose13, Pose10, Pose01. The rank ordering by QM-MM interaction energy (the lowest energy is
ranked first) matches the rank ordering by RMSD. The energy calculations using the CHARMm
forcefield alone rank the poses in the reverse order. This suggests that combining QM-MM
methods with the CHARMm forcefield leads to more accurate identification of the pose present
in the crystal structure (in this case, Pose13).
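The comparison in this section can be reproduced from the values in the results table. The following Python sketch ranks the poses by each quantity, treating lower values as better, and checks each energy ranking against the RMSD ranking. The variable and function names are illustrative only.

```python
# Values from the results table in this section:
# RMSD to the X-ray pose (Å), QM-MM and CHARMm interaction energies (kcal/mol).
poses = {
    "Pose13": {"rmsd": 0.85, "qmmm": -94.6, "charmm": -129.2},
    "Pose10": {"rmsd": 1.55, "qmmm": -92.2, "charmm": -135.3},
    "Pose01": {"rmsd": 2.54, "qmmm": -82.2, "charmm": -138.0},
}


def rank_by(quantity: str) -> list[str]:
    """Pose names ordered best to worst; lower RMSD and lower energy rank first."""
    return sorted(poses, key=lambda name: poses[name][quantity])


reference = rank_by("rmsd")
for quantity in ("qmmm", "charmm"):
    order = rank_by(quantity)
    verdict = "matches" if order == reference else "differs from"
    print(f"{quantity}: {order} {verdict} the RMSD ranking")
```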
Preparing a pose for minimization
1. To open the input structure
Close all the open windows from the energy calculation, saving the files when
prompted.

In the Files Explorer, reopen 1AQ1_pose_13.msv from the Input folder of the
appropriate CalculateEnergy(QM-MM) job folder.
2. To add a fixed atom constraint for all atoms in the protein
On the ForceField tool panel, ensure that Forcefield is set to CHARMm and the Status
message in the tool panel reports 1AQ1_pose_13 typed with CHARMm.
In the Hierarchy View, click the protein13 group to select it.

In the Tools Explorer, in the Constraints tool panel, click the Create Fixed Atom
Constraint button.
A series of small blue spheres appear in the 3D Window, corresponding to this new
constraint of all protein atoms, along with a new entry in the Hierarchy View labeled
Fixed:fix_1_protein13.
Running the Minimization (QM-MM) protocol
1. To open the protocol and set the parameters
In the Protocols Explorer, expand the Simulations folder and double-click the
Minimization (QM-MM) protocol.
In the Parameters Explorer, click the Input Typed Molecule parameter and choose
1AQ1_pose_13: 1AQ1_pose_13 from the dropdown list.

Click the Quantum Atoms parameter and choose pose13 from the list.

Expand the Advanced | Minimization parameters group, click the Minimization
Constraints parameter, and choose Fixed:fix_1_protein13 from the list.
This completes the parameter setup in preparation for the protocol run. The accuracy
of the DMol3 part of the calculation remains set to Medium. Finer calculations require
more time, while coarser calculations are likely to be less accurate.
2. To run the job
On the Protocols toolbar, click the Run button.
The job takes several hours to complete on a Pentium 4 machine (2.8 GHz, 2 GB RAM).
Analyzing the results of the minimization
1. To review the results
In the Jobs Explorer, double-click the completed job.
This opens the Report.htm file in an Html Window.
For the QM-MM results, observe the initial and final QM-MM Interaction Energy values.
In the Report.htm file, right-click the link to the output .msv file and choose Save
Target As... from the context menu.
On the dialog, save the file as 1AQ1_pose13_qmmm_opt.msv.

2. To calculate RMSD values for the input, output and X-ray poses
In the Files Explorer, navigate to and double-click the 1AQ1_pose_xray.msv data
file.
This opens the X-ray structure in a separate 3D Window.
From the menu bar, choose File | Insert From | File.
This opens an Open dialog.
Select the other two ligands, 1AQ1_pose13 and 1AQ1_pose13_qmmm_opt.

The 3D Window now contains all three ligand conformations.
From the menu bar, choose Structure | RMSD | Heavy Atoms.
This opens a new RMSDReport tab, which shows an RMSD matrix of values for the
three conformations. The RMSD values are also reported in the Molecule tab of the
Data Table as a series of columns labeled Heavy Atom RMSD to ....
Save the new 3D Window with the three ligand conformations as an MSV file. From the
menu bar, choose File | Save As..., and enter 1AQ1_3poses as the file name.

You will observe that the RMSD to the X-ray pose decreases from about 0.8 Å to 0.6 Å
after the QM-MM minimization is performed.
For a full description of the RMSD calculation, refer to the Structure | RMSD Help topic.
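A heavy-atom RMSD compares matched non-hydrogen atom positions. The Python sketch below is a generic illustration, not the Discovery Studio implementation. It assumes the poses are already in a common frame, as docking poses in the same receptor are, and that atoms are matched by order, so no superposition is performed. The function names and coordinates are hypothetical.

```python
import math

Coord = tuple[float, float, float]


def heavy_atom_coords(atoms: list[tuple[str, Coord]]) -> list[Coord]:
    """Keep the coordinates of non-hydrogen atoms; each atom is (element, (x, y, z))."""
    return [xyz for element, xyz in atoms if element.upper() != "H"]


def rmsd_in_place(a: list[Coord], b: list[Coord]) -> float:
    """RMSD (Å) between two matched coordinate lists, without any superposition.

    Assumes both poses are already in the same frame and that atoms appear
    in the same order in both lists.
    """
    if not a or len(a) != len(b):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(a, b))
    return math.sqrt(squared / len(a))


# Hypothetical three-atom fragment: every heavy atom shifted by 0.6 Å along x.
xray = [("C", (0.0, 0.0, 0.0)), ("O", (1.2, 0.0, 0.0)), ("H", (-0.9, 0.5, 0.0))]
pose = [("C", (0.6, 0.0, 0.0)), ("O", (1.8, 0.0, 0.0)), ("H", (-0.3, 0.5, 0.0))]
print(round(rmsd_in_place(heavy_atom_coords(xray), heavy_atom_coords(pose)), 3))
```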
3. To compare the energy before and after minimization
Close all open Windows.

Reopen the Report.htm files in the Output folders of the CalculateEnergy(QM-MM) folder
for pose 13 and the Minimization(QM-MM) folder created during this tutorial.

In the Minimization(QM-MM) - Html Window, scroll to the bottom and locate the
Initial QM/MM Interaction Energy.
This should be the same as the QM/MM Interaction Energy reported in the Calculate
Energy(QM-MM) - Html Window.
Compare the Initial QM/MM Interaction Energy in the Minimization(QM-MM) -
Html Window with the QM/MM Interaction Energy reported in the same window.
The QM/MM interaction energy after minimization is lower than the corresponding
initial energy, for example:
Initial QM/MM Interaction Energy    QM/MM Interaction Energy (after minimization)
-94.6 kcal/mol                      -116.1 kcal/mol
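The size of the improvement is the difference between the two reported values. A negative change means the interaction became more favorable. A minimal Python check using the example values follows. Your own values may differ slightly, as the note below explains.

```python
# Energies from the example above, in kcal/mol.
initial_qmmm = -94.6    # Initial QM/MM Interaction Energy
final_qmmm = -116.1     # QM/MM Interaction Energy after minimization

change = final_qmmm - initial_qmmm
print(f"Change on minimization: {change:.1f} kcal/mol")  # Change on minimization: -21.5 kcal/mol
```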

Note. There may be slight differences in the results depending on the operating
system used.

Appending a Final Minimization Stage to the Standard
Dynamics Cascade protocol
Purpose: Learn how to customize a protocol by modifying the Standard Dynamics Cascade
protocol.

Required functionality and modules: Discovery Studio and Discovery Studio Developer client.

Required data files: none

Time: 30 minutes.
Background
Discovery Studio users can take advantage of the combined capabilities of Discovery Studio and
Discovery Studio Developer to create and customize protocols. Using Discovery Studio
Developer, a custom protocol designer can access the individual DS/PP components and
parameters directly, modify them, promote the appropriate parameters to the top level of the
protocol, and publish the custom protocol for use in Discovery Studio.
Introduction
The Standard Dynamics Cascade (SDC) protocol is a Discovery Studio protocol. It runs a CHARMm
simulation cascade that includes minimization and dynamics stages. This tutorial uses the Pipeline
Pilot Client application to add another pipeline that selects the N lowest potential energy
conformations from the trajectory frames generated by the SDC protocol, minimizes those
conformations, and writes them out to a mol2 file.
Opening a protocol
Building a pipeline
Setting parameters
Promoting parameters
Testing the protocol and publishing in Discovery Studio
Opening a protocol
Open Discovery Studio Developer client.

In the Protocols tab, navigate to Accelrys Discovery Studio/Discovery
Studio/Simulation/Standard Dynamics Cascade.

Double-click to open the Standard Dynamics Cascade protocol.
The Standard Dynamics Cascade (SDC) protocol is displayed in the workspace, shown as pipeline 1.

In the second pipeline, click Add Help to Output Files and delete it.
The Add Help to Output Files component adds some description about the output files in the report.
For simplicity this component is not used for this tutorial.
For this tutorial, you will build the second pipeline for the purpose of selecting a number of low
energy conformations from the MD trajectory file generated by the first pipeline (the SDC
protocol), minimize those conformations, and write them out to a mol2 file. The pipeline will look
like the following after it is completed.

Building a pipeline
Switch to the Components tab in the Protocols Explorer. Open the Accelrys Discovery Studio
folder, and open the Readers folder.

Double-click the DS Molecule Reader component (or drag and drop it into the workspace). It is
shown as the first component of pipeline 2.
Note. You should use the DS Molecule Reader component because this reader component includes
the Read Conformations parameter that you can use to read the conformations of the MSV file.
While still in the Components tab, search for the Count and Index Data component and append it
to the pipeline. Start typing "Count" in the search bar at the top, and press the F3 key
repeatedly until you see the Count and Index Data component. Double-click it to add it.
It will be connected to the DS Molecule Reader component automatically.
Similarly, search and open the Top N Filter, and then the CHARMm Minimization components.

Search and open the Mol2 Writer component.
It will be automatically connected to the CHARMm Minimization component.
This completes the second pipeline. Notice that three components are shown in red, which means
that they have required parameters that still need to be set.

The rest of the tutorial steps involve editing the parameters in each of the components to specify
appropriate parameter values, and promoting the parameters so that they are exposed at the
protocol level.
Setting parameters
Click the Update MSV component in the workspace.
The purpose of this component is to update the MSV file after the SDC run.
Note. In the Parameters window below, the Output MSV File parameter is set to
$(RunDirectory)/$(Input Molecule).msv. This is the file path that you will use as the Source
parameter for the DS Molecule Reader component.
Copy and paste this file path to the Source parameter of the DS Molecule Reader component.

Open the Source parameter in the Parameters Explorer and set the parameter Read
Conformations to True.
The following shows the parameter settings:

Click the next component, Count and Index Data.
This component will put a property named "Index" on the data stream to hold the data record
count. The data record in our case is the frame number (conformation number) of the trajectory
MSV file. Nothing needs to change in this component.
Click the next component, Top N Filter.

Enter the following in the Expression parameter: Property('Potential Energy')
This specifies that the Potential Energy property value will be used as the criterion for selecting
the data records sent out to the Pass port.
Change the Max or Min parameter to Min.
Now promote the Number to Keep parameter in this component to the top level of the protocol, so
that users can change this parameter value at the protocol level.
Right-click on the Number to Keep parameter and select the Edit command from the context
menu.
This launches the Edit dialog:

Click the Number to Keep parameter and click the Promote Parameter button in this dialog.
This brings up the Promote Parameter dialog.
Click the OK button to accept the default parameter name.
The Number to Keep parameter has been promoted to the parent protocol level.
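Conceptually, the Count and Index Data and Top N Filter components configured above number the trajectory frames and keep the Number to Keep frames with the lowest Potential Energy. The Python sketch below mimics that record flow under stated assumptions. The function names are hypothetical and are not Pipeline Pilot APIs. Numbering from 1 is assumed, and the energies are invented for illustration.

```python
from typing import Iterable


def count_and_index(records: Iterable[dict]) -> list[dict]:
    """Like Count and Index Data: tag each record with its position in the stream.
    Numbering from 1 is an assumption made for this sketch."""
    return [{**record, "Index": i} for i, record in enumerate(records, start=1)]


def top_n(records: list[dict], prop: str, n: int, mode: str = "Min") -> list[dict]:
    """Like a Top N Filter: keep the n records with the smallest ("Min") or
    largest ("Max") value of `prop`."""
    if mode not in ("Min", "Max"):
        raise ValueError("mode must be 'Min' or 'Max'")
    return sorted(records, key=lambda r: r[prop], reverse=(mode == "Max"))[:n]


# Hypothetical trajectory frames with made-up potential energies (kcal/mol).
frames = [{"Potential Energy": e} for e in (-310.2, -325.7, -318.4, -330.1)]
kept = top_n(count_and_index(frames), "Potential Energy", n=2)
print([frame["Index"] for frame in kept])  # [4, 2]
```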
Next, click the Mol2 Writer component. Enter the following for the Destination parameter:
$(RunDirectory)/$(Input Molecule)_min.mol2
This specifies the mol2 file name for saving the final minimized conformations.
Open the Additional Options parameter and set the WriteProperties parameter to True.
Notice in the workspace, no more components are shown in red (i.e., their required parameters are
set).
Promoting parameters
Click the white space of the protocol workspace.
The protocol level parameters currently are those parameters from pipeline 1 (the SDC protocol)
and the parameter Number to Keep that was promoted from the Top N Filter component in pipeline
2. None of the parameters from the CHARMm Minimization component of pipeline 2 are promoted
to the protocol level.
If you want the user to be able to access these parameters for this final minimization at the
protocol level, you can promote the parameters in this CHARMm Minimization component to the
top protocol level, using the same promotion procedure described above.
Click the CHARMm Minimization component. Right-click and select Edit.
See that there are five parameters under the Minimization group parameter: Minimization
Algorithm, Minimization Max Steps, Minimization RMS Gradient, Minimization Energy Change,
Minimization Constraints. If desired, you could promote each parameter individually.
Select the Minimization Algorithm parameter and click the Promote Parameter button.

In the dialog shown below, select Link parameter to a new parameter on the parent and enter
a name for the parameter.

Click OK.

Similarly, promote the other parameters: Minimization Max Steps, Minimization RMS
Gradient, and Minimization Energy Change.

Create a new parameter called Final Minimization, where the Parameter type is set to Group
Type, and then drag those promoted parameters into it.
In the protocol workspace, right-click the white space and choose Edit.

In the Edit dialog, click the Add Parameter button.

In the Define Parameter dialog, fill in the appropriate selection as shown in the screenshot
below.

Click OK.

In the Edit dialog, move up the Final Minimization parameter, and drag and drop those promoted
parameters into it.

Tip. When you select a parameter and drag it around, notice the horizontal bar that is displayed. It
moves up and down as you drag an item in the list. Use it as a visual indicator of where the item
will be placed in the list when you release the mouse button. A parameter name may be highlighted
as you drag an item up and down the list. If you release the mouse button while a parameter is
highlighted, the parameter you are moving is placed as a child of that highlighted parameter.
For online documentation, use Pipeline Pilot Client Help > Scitegic Server Home Page > SciTegic Help
Center (User Resources). Search for 'parameter group' and read the sections Parameter Groups and
Dragging and Dropping to Group Parameters for more details.
The following illustrates the new parameter grouping:

You can use the same procedures to promote other parameters in the CHARMm Minimization
component to the protocol level.
Open the Edit dialog, click the Help Text tab and enter text to describe how the protocol or
component works.
To test run a protocol from the Pipeline Pilot client
Enter the required parameters and click the now active Run Protocol (green) button. If you want
to save a working copy of the protocol .xml file to a local directory, use the File > Export
Protocol command from the menu.
To publish the custom protocol in Discovery Studio
Choose the File > Save Protocol As command from the menu and use the Up One Level button
to browse to your User Name folder. Name the protocol SDC with Final Minimization.

Click OK.
This will save the protocol to your user protocols folder.
Start up Discovery Studio.

Connect to the Pipeline Pilot server. If you are already connected, switch to the Protocols tab
and press the F5 key or select Refresh from the context menu to refresh the view.
The new protocol should appear in your user folder in the Protocols tab and is ready to be used just
like the other Discovery Studio protocols.

Calculating molecular properties on the lowest energy
conformation extracted from a Catalyst database
Purpose: Learn how to use the Discovery Studio Developer client to create a new protocol.

Required functionality and modules: Discovery Studio and Discovery Studio Developer client.

Required data files: none

Time: 30 minutes.
Background
Discovery Studio users can take advantage of the combined capabilities of Discovery Studio and
Pipeline Pilot to create and customize protocols. Using the Pipeline Pilot Client application, a custom
protocol designer can access the individual DS/PP components and parameters directly, modify them,
promote the appropriate parameters to the top level of the protocol, and publish the custom
protocol for use in the Discovery Studio client.
Introduction
Accessing the Discovery Studio components through the Pipeline Pilot Client also makes it possible
to combine several different steps into a single workflow. In this example, you will use the Search
3D Database component to extract the lowest energy conformer of each compound from a Catalyst
database and use these structures to calculate some 3D molecular properties with the components
of the Calculate Molecular Properties protocol.
The Calculate Molecular Properties protocol under the QSAR folder in Discovery Studio calculates
various molecular properties for ligands ranging from traditional molecular descriptors,
semiempirical QM descriptors, density functional QM descriptors, to user-built models. Some of
these descriptors are 2D, but others are 3D descriptors. The input ligands for the protocol can be
specified either from the Discovery Studio interface or from an external sd file.
This tutorial illustrates how this protocol can be combined with the lowest energy conformers
stored in an already built Catalyst database. Because a Catalyst database is multi-conformational,
the conformers of each compound can be sorted by energy and only the lowest energy conformation
extracted, which can then be used as input for calculating molecular properties.
Opening a protocol
Replacing components
Exposing parameters
Ranking structures and retaining the lowest energy conformer

Opening a protocol
In Discovery Studio Developer client, in the Protocols tab, navigate to the Accelrys Discovery
Studio/Discovery Studio/QSAR/Calculate Molecular Properties.

Double-click to open the Calculate Molecular Properties protocol.
This opens and displays the protocol in the workspace. Notice that the protocol consists of several
pipelines and each pipeline consists of several different components and subprotocols, each
labeled with a brief description of its function.
The first pipeline sets up the job as it is described in the yellow sticky note. You can show/hide the
sticky note using the toolbar icon or the View > Sticky Note menu option.
The second pipeline specifies the input ligand that will be used by the DS Ligand Reader
component for the property calculation.

In this tutorial, you will modify the input ligands that are passed onto the property calculation, so
the modifications will occur in this pipeline. The rest of the steps carried out in the protocol will be
unchanged.
Replacing components
You will use structures stored in a Catalyst database and extract the lowest energy conformer
for each structure as input for the property calculation step. To access data in a Catalyst database,
you can use the Search 3D Database component, which searches and extracts structures from a
specified database using the catSearch program.
From the Accelrys Discovery Studio/Discovery Studio/Pharmacophore folder in the
Protocols tab, open the Search 3D Database protocol.
The second pipeline of the Search 3D Database protocol shows the steps necessary to access and
search the database and write out the resulting structures to an sd file in the SD Writer component
at the end of the pipeline.

You will use the same four components to modify the original DS Ligand Reader component of the
Calculate Molecular Properties protocol.
Copy the first three components, Input Hitlist, No-op, DS Ligand Reader, and replace the DS
Ligand Reader of the Calculate Molecular Properties protocol with these three components.

From the Components tab, in the Accelrys Discovery Studio/Pharmacophore/Database
folder, add the 3D Database Search component so that the first four components now match
the components you saw in the Search 3D Database protocol.

Exposing parameters
Notice that the added 3D Database Search component is displayed in red. No new parameters are
exposed at the protocol's top level that would allow the user to specify the input database for the
protocol. You must individually promote the parameters that should be accessible when running the
protocol in Discovery Studio.
Click the Search 3D Database component. Right-click and select Edit.

Select the Input Database parameter and then click the Promote Parameter button.

Accept the default choice Link parameter to a new parameter on the parent.
This creates a parameter with this name on the top level of the protocol.
To be able to search a subset of compounds from the Catalyst database and also specify the
number of compounds to extract from the database, you need to add two more parameters. The
Maximum Hits parameter allows the user to enter the number of compounds that will be
returned from the database.
Repeat the steps used to promote the Input Database parameter to promote the Maximum Hits
parameter.
To allow only a subset of the database to be searched you can use the Input Hitlist parameter.
Right-click the white space of workspace and choose Edit, and then click on the first cross icon on
the panel.

Add a new parameter here called Input Hitlist and set the type to URL.

Right-click the white space of the Search 3D Database protocol and click on the Input Hitlist
parameter and click the Edit button. Copy the help text of the corresponding parameter from the
Search 3D Database protocol and paste it in the Help tab.
The following is the help text: "The search will be restricted to the ligands in the database that
have the same Names as these ligands."
Click Delete (red cross icon) to remove the Input Ligands parameter on the Edit dialog.
This parameter is no longer exposed at the top level of the protocol.
Drag the Input Database parameter to the top of the other parameters in the Edit panel.
This makes the Input Database parameter the first input parameter in the protocol.
Drag the Maximum Hits and Input Hitlist parameters so these are under the Input Database.

Note. You could also drag these so these parameters are a level below the Input Database
parameter.
Since the protocol no longer uses an sd file as input, you need to change the output sd file in the
SD Writer component at the end of the fifth pipeline.
Select the SD Writer component and set the Destination parameter to:
$(RunDirectory)\low_energy_confs_withProperties.sd
Note. If you run this protocol in its current state, it will only take the specified number of
compounds given in the Maximum Hits parameter from the database and return the calculated
properties but it will not select only the lowest energy conformer.
To add an option to consider only the lowest energy conformer, the protocol needs to extract all
the conformers contained in the database for each compound and sort them according to energies
and then keep only the lowest energy conformer.
On the Search 3D Database component, set the What to Output parameter to All
Conformations.
This ensures that all the conformers are extracted for each compound from the database.
Ranking structures and retaining the lowest energy conformer
To be able to rank the structures and retain only the lowest energy conformer, you need to add
some additional steps after the Search 3D Database component before the data is passed onto the
Cache writer in this pipeline. As there are multiple conformers for each structure being extracted
from the database, the first step is to group the molecules by "Name" using the Group Data by Tag
component.
From the Components tab, begin typing "Group Data by Tag" to find this component quickly.

Drag and drop it after the Search 3D Database component.

Click on the component and set the Group Using parameter to Name.
After this, the protocol needs to access the energies for each of the conformers and retain only the
lowest energy conformer.
Add the following components to the pipeline: Ungroup Data, Diverse Conformation
Generator, and Top N Filter.
These steps need to be carried out for each compound separately, so you must set them to "Run to
Completion". This option causes the subprotocol to be re-initialized and executed in its entirety
for each record it processes.
Click the Diverse Conformation Generator component and set the Run to Completion
parameter to True. Repeat this for the Top N Filter component.
After the structures are grouped by name, they need to be ungrouped so that the conformer
energies can be compared.

On the Diverse Conformation Generator component set the Maximum Conformations number to
0.
This enables the protocol to access only the energies for each conformer, but not generate any
further conformations for the structure.
Check that the Discard Existing Conformations parameter is set to False.
The Top N Filter component can be used to sort the data records according to a specific property
and pass only a set number of records at the top or bottom of the list. On the Top N Filter
component in the Expression line, you can specify the property on which to base the selection. In
this case, the Relative Energy value is used because the lowest energy conformer has a relative
energy of 0.
In the Expression line add the following: property('Relative Energy')

Set the Number to Keep parameter to 1 and the Max or Min parameter to Min.

Select these three components. Right-click and select Collapse to Subprotocol.
This groups the components into a single subprotocol, which allows you to use the Run to
completion option. Notice that all three components are now represented as a single icon.
Rename the subprotocol to reflect the tasks that it carries out: Compute and extract lowest
energy conf.

Click this subprotocol and, on the Implementation tab, set the RunToCompletion parameter
to True.
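The effect of grouping by Name and running the subprotocol to completion can be sketched in plain Python. This illustrates the logic only. The function, the record layout (keys Name, Conf, Relative Energy), and the values are hypothetical. Applying a Top N Filter with Number to Keep = 1 once per compound keeps one conformer per compound. A single filter over the whole stream would keep only one record overall.

```python
from collections import defaultdict


def lowest_energy_per_compound(conformers, name_key="Name", energy_key="Relative Energy"):
    """Group conformers by compound name, then keep the single lowest-energy
    conformer of each group (Number to Keep = 1, Max or Min = Min, applied
    once per compound)."""
    groups = defaultdict(list)
    for conformer in conformers:
        groups[conformer[name_key]].append(conformer)
    return [min(members, key=lambda c: c[energy_key]) for members in groups.values()]


# Hypothetical database records: two compounds with two conformers each.
conformers = [
    {"Name": "cpd_A", "Conf": 1, "Relative Energy": 2.1},
    {"Name": "cpd_A", "Conf": 2, "Relative Energy": 0.0},
    {"Name": "cpd_B", "Conf": 1, "Relative Energy": 0.0},
    {"Name": "cpd_B", "Conf": 2, "Relative Energy": 3.4},
]
per_compound = lowest_energy_per_compound(conformers)
print([(c["Name"], c["Conf"]) for c in per_compound])  # [('cpd_A', 2), ('cpd_B', 1)]

# Without per-compound processing, one filter over the whole stream keeps a single record.
print(min(conformers, key=lambda c: c["Relative Energy"])["Name"])  # cpd_A
```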
After these changes the pipeline should look similar to the following:

To test run a protocol from the Pipeline Pilot client
To thoroughly test this protocol, you should run it in both Pipeline Pilot and Discovery Studio. To
run the protocol in Pipeline Pilot you must specify the path to the Catalyst database.
For the Input Database parameter, navigate to the following location: <SciTegic installation
directory>/apps/accelrys/ds/public/data/CatalystDB/MiniMaybridge/MiniMaybridge.bdb
After you select the MiniMaybridge.bdb file, the parameter shows the following:
data\CatalystDB\MiniMaybridge\MiniMaybridge.bdb
For test purposes, reduce the value of the Maximum Hits parameter to 10.
Run the protocol.
To publish the custom protocol in Discovery Studio
Choose the File > Save Protocol As command from the menu and use the Up One Level button
to browse to your User name folder. Give the protocol a descriptive name, for example
Calculate Properties of Lowest Energy Conformers.

Click OK.
This saves the protocol to your user protocols folder.
Start up Discovery Studio.

Connect to the Pipeline Pilot server. If you are already connected, switch to the Protocols tab
and press the F5 key or select Refresh from the context menu to refresh the view.
The new protocol should appear in your user folder in the Protocols tab and is ready to be used just
like the other Discovery Studio protocols.

Contacting Accelrys Scientific and Technical Support
Accelrys is committed to providing you with the best overall product experience. We
place the highest priority on our customer interactions and understand that issues
may arise that require technical support. Feel free to contact us by phone or email
and we will direct your inquiry to a support representative.
Americas Support
7 am - 5 pm (Pacific Time)
Tel: 1 800 756 4674 (Toll free within USA)
Tel: 1 858 799 5509
Fax: 1 858 799 5102
Email: support@accelrys.com
Europe Support
09:00 - 17:30 (UK Time)
Tel: +44 845 741 3375 (Local rate in UK)
Tel: +44 1223 228822
Fax: +44 1223 228501
Email: support-eu@accelrys.com
Japan & Asia Pacific Support
10:00 - 17:00 (Japan Time)
Tel: +81 3 3578 3861
Fax: +81 3 3578 3873
Email: support-jp@accelrys.com
Accelrys Advantage
Our online support system is available 24/7, offering a knowledge base, interactive
request tracking, and personal bug tracking.
Web: http://customer.accelrys.com

Accelrys Locations
San Diego Office
Accelrys, Inc.
10188 Telesis Court, Suite 100
San Diego, CA 92121-4779
U.S.A.
Tel: +1 (858) 799-5000
Fax: +1 (858) 799-5100
Cambridge Office
Accelrys, Ltd.
334 Cambridge Science Park
Cambridge
CB4 0WN
United Kingdom
Tel: +44 1223 228500
Fax: +44 1223 228501
Tokyo Office
Accelrys KK
Nishi-shimbashi TS bldg 11F
Nishi-shimbashi 3-3-1, Minato-ku
Tokyo, 105-0003
Japan
Tel: +81 3 3578 3861
Fax: +81 3 3578 3873
Web: http://www.accelrys.com