DOCK 5.1.0

 

 

10/15/2002

 

 

 

Demetri Moustakas

Kuntz Laboratory, UCSF

demer@francisco.compchem.ucsf.edu

 

 


Foreword

 

I would like to thank a lot of people for their help and support with this project.  Members of the Kuntz and Kollman labs provided many fruitful discussions and advice during the initial design stages of the code, and helped me with debugging and validation of the DOCK 5 modules as they were being developed.  I owe a debt of gratitude to group members past and present whose ideas, advice, and lively debates have shaped this project and helped make DOCK 5 what it is today.  Specifically, I would like to acknowledge people who contributed significantly to the project.  Fernando Martin designed and wrote the simplex minimizer and the optimization framework classes.  Scott Pegg has worked on code and algorithm optimization, leading to significant speedup in the code.  Geoff Skillman was instrumental in the early design stages; I owe him particular thanks for introducing me to the OELib, which became the basis for the DOCK 5 architecture.  All my thanks to OpenEye Scientific Software: Matt Stahl, Ant Nicholls, Geoff Skillman, Mark McGann, Joe Corkery, and Roger Sayle, who answered countless questions and provided a wealth of chemistry, physics, and programming advice.  Xiaoqin Zou provided me with the SDOCK source code, which was incorporated into the GB/SA scoring class.  Jim Frazine provided me endless help with many matters related to Linux and SGI systems, and built and rebuilt test clusters to get the MPI code working.  I have to thank my wife Katie for putting up with me these last few years through many late nights in front of the computer.  And finally I would like to thank Tack, for allowing me the opportunity to work on DOCK, and for providing many wonderful experiences these past few years.


Introduction

            This is the release of DOCK 5.1.0, the latest full release of DOCK built on the new C++ codebase.  It contains all of the major functionality from DOCK 4, as well as a number of new functions.  The next minor release will include MPI parallelization and an interface to the ZAP PB/SA library from OpenEye Scientific Software.  All DOCK 5 licensees will be alerted to incremental releases via email, and notices will be posted on the DOCK web site as well.

 

This release contains ligand I/O, rigid orienting, anchor first search, energy & contact scoring, GB/SA scoring, and simplex minimization.  The main additions to this release are:

 

·        Automated matching

·        Internal energy (used in flexible docking)

·        Scoring function hierarchy

·        New minimizer termination criteria

·        Bugs addressed in

o       Minimizer & optimizer classes

o       GB/SA scoring

o       Anchor & Grow

o       Rigid orienting

o       Atom typing

 

This version of DOCK is written in C++, and each of the major DOCK functions has been implemented as a class.  Most classes are designed for maximum ease of debugging and validation, and are continually being optimized for performance.  The DOCK 5 developer's manual will describe the API for each class in detail.  The DOCK 5 code manual will describe in detail the data structures and algorithms used in each class.  These additional manuals will be completed soon, and released with the first minor incremental release.

 

I would like to ask for feedback in several areas.  Please report any bugs to demer@francisco.compchem.ucsf.edu.  Additionally, please report any suggestions for new features, or new ways to combine or use the existing features.  Thanks, and happy docking! 


General Overview

            The major features of DOCK 5.1.0 include rigid orienting of ligands to receptor spheres, AMBER energy scoring, GB/SA solvation scoring, contact scoring, internal non-bonded energy scoring, ligand flexibility, and both rigid and torsional simplex minimization.  Each DOCK function is implemented as a C++ class, and molecules are represented by a molecule class (based on the OELib's OEMol) whose instances are passed from one functional class to another.  Much of the theory behind the DOCK functions is described in the advanced section of the DOCK 4 manual.  I recommend that users wanting to know more about the theory behind the algorithms refer to it.

 

Ligand File I/O

            Currently, only MOL2 file I/O is supported.  Ligands are read in from a single MOL2 database file.  Atom and bond types are assigned using the DOCK 4 atom/bond typing parameter files (vdw.defn, flex.defn, flex_drive.tbl).  There are several ligand output options, which write molecules to files whose names are formed using the ligand_outfile_prefix parameter:

 

Users can choose to write out orientations, which creates a file called outputprefix_orients.mol2 containing the molecules after they have been rigidly oriented and optimized.  If anchor & grow is being used, this option writes out only the anchor fragment.  Every orientation generated is written out, so the file can become very large for big databases or high orientation counts.

 

Users can also write out conformers prior to final optimization, which creates a file called outputprefix_confs.mol2.  The number of molecules in this file equals the database size * the # of anchors per molecule * the # of orientations per anchor * the # of conformers per cycle.  The file can grow quite large, so only use this option on small databases.
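As a rough illustration of the size estimate above, the following Python sketch multiplies the four factors.  The function name and the example numbers are hypothetical, chosen only to show how quickly the count grows; the result is an upper bound, since not every molecule need produce every anchor, orientation, or conformer.

```python
def conformer_output_count(database_size, anchors_per_molecule,
                           orientations_per_anchor, conformers_per_cycle):
    """Upper-bound estimate of the molecules written to the conformer file."""
    return (database_size * anchors_per_molecule
            * orientations_per_anchor * conformers_per_cycle)


# Hypothetical run: 100 ligands, 2 anchors each, 1000 orientations per
# anchor, and 25 conformers kept per cycle.
print(conformer_output_count(100, 2, 1000, 25))
```

Even this small hypothetical database comes to five million molecules in the output file.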

 

DOCK will always write out a scored molecules output file, which contains the best scoring pose for each molecule in the database.  This will create a file called outputprefix_scored.mol2.  In DOCK 5.1.0, users can use molecule ranking, which writes out the top N molecules in the database in a file called outputprefix_ranked.mol2.  This option disables the scored molecule output file by default, though users can override this and write out the best pose for each molecule as well.
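The output naming rules described in this section can be summarized in a short Python sketch.  The argument names mirror the Ligand I/O parameters listed later in this manual; the function itself is hypothetical and only illustrates which files receive molecules for a given combination of flags.

```python
def ligand_output_files(prefix="output", write_orientations=False,
                        write_conformations=False, rank_ligands=False,
                        scored_mol_output_override=False):
    """Return the output files that receive molecules for these settings."""
    files = []
    if write_orientations:
        files.append(f"{prefix}_orients.mol2")
    if write_conformations:
        files.append(f"{prefix}_confs.mol2")
    if rank_ligands:
        files.append(f"{prefix}_ranked.mol2")
    # Ranking turns off the per-molecule scored output unless overridden.
    if not rank_ligands or scored_mol_output_override:
        files.append(f"{prefix}_scored.mol2")
    return files


print(ligand_output_files("output", rank_ligands=True))
```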

 

            The ligand class also handles the MPI parallelization of DOCK over SMP and distributed clusters.  When DOCK is compiled and run in parallel mode, a master processor distributes molecules to client processors, each of which performs the desired docking.  Each client node returns its top-scoring molecules to the master node, which writes them to a file (default filename = output_mpi.mol2).  Due to discrepancies among the different MPI implementations, it was not possible to easily use a command-line flag to enable or disable MPI.  Instead, a #define statement in the dock.cpp file enables or disables MPI, so DOCK 5 compiles into separate single processor and MPI versions.  The code for this release is set to disable MPI while a bug related to parallel file access is worked out.  MPI will be fully functional in the next incremental release.
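The master/client distribution can be pictured as splitting the database into workunits.  The Python sketch below is not the DOCK 5 MPI code; it only shows batching under the assumption that the master hands out ligands in database order, with at most max_send_queue_size ligands (a parameter described later in this manual) per workunit.  The function name is hypothetical.

```python
def workunits(ligands, max_send_queue_size=10):
    """Yield successive batches of at most max_send_queue_size ligands."""
    batch = []
    for ligand in ligands:
        batch.append(ligand)
        if len(batch) == max_send_queue_size:
            yield batch
            batch = []
    if batch:
        # The final workunit may be smaller than the others.
        yield batch


# 25 hypothetical ligands split into workunits of at most 10.
for unit in workunits(range(25), max_send_queue_size=10):
    print(len(unit))
```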

 

Rigid Orienting

            DOCK 5 uses receptor spheres and ligand heavy atom centers to rigidly orient ligands in the receptor.  Cliques of receptor spheres & ligand centers are identified using the maximum subgraph clique detection algorithm from DOCK 4.  All cliques that satisfy the matching parameters are generated in the matching step, and can be sorted or ordered prior to the loop where the program cycles through the orientations.  This leaves open the possibility for the orientational sampling of the site to be directed by a function (e.g. uniform sphere sampling, uniform Cartesian sampling, spatially weighted, etc…).  For details on the theory of sphere matching, please see the included DOCK4 manual.

 

            Both automated and manual matching are available in DOCK 5.  The sphere/center matches are determined by two parameters:

 

1)      The distance tolerance is the tolerance in angstroms within which a pair of spheres is considered equivalent to a pair of centers

2)      The distance minimum is the shortest distance allowed between 2 spheres (any sphere pair with a shorter distance is disregarded)

 

Manual matching will create as many matches as possible given the specified parameters, and sort the matches according to the RMS error between the spheres and centers in the match.  The matches are provided as orientations until either max_orientations orientations have been generated or the end of the match list is reached.
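The two matching parameters and the sorting step can be sketched in Python as follows.  This is a minimal illustration, not the DOCK clique detection code: the function names are hypothetical, the comparison against the tolerance is assumed to be inclusive, and each match is assumed to arrive as an (rms_error, match) pair.

```python
import math


def pair_is_compatible(sphere_a, sphere_b, center_a, center_b,
                       distance_tolerance=0.25, distance_minimum=2.0):
    """Check whether a sphere pair can be matched to a ligand center pair."""
    sphere_distance = math.dist(sphere_a, sphere_b)
    if sphere_distance < distance_minimum:
        return False  # sphere pairs that are too close are disregarded
    center_distance = math.dist(center_a, center_b)
    return abs(sphere_distance - center_distance) <= distance_tolerance


def orientations_from_matches(matches, max_orientations=1000):
    """Sort (rms_error, match) pairs and keep at most max_orientations."""
    ranked = sorted(matches, key=lambda item: item[0])
    return ranked[:max_orientations]


print(pair_is_compatible((0, 0, 0), (3, 0, 0), (0, 0, 0), (3.2, 0, 0)))
print(orientations_from_matches([(0.3, "b"), (0.1, "a"), (0.2, "c")], 2))
```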

 

            Automated matching starts with the default values for the distance tolerance and distance minimum.  A list of matches is generated, and if the # of matches is less than max_orientations, the distance tolerance is increased and the matching is repeated until there are at least max_orientations matches in the list.  The list is then sorted, and orientations are generated.
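The automated matching loop can be sketched in Python as shown below.  The manual does not state how much the tolerance grows per pass or whether there is an upper limit, so tolerance_step and tolerance_limit are assumptions, as are the function names and the generate_matches callable, which stands in for the clique detection step and is assumed to return (rms_error, match) pairs.

```python
def automated_matching(generate_matches, max_orientations=1000,
                       distance_tolerance=0.25, distance_minimum=2.0,
                       tolerance_step=0.25, tolerance_limit=5.0):
    """Loosen the distance tolerance until enough matches are found."""
    while True:
        matches = generate_matches(distance_tolerance, distance_minimum)
        if len(matches) >= max_orientations:
            break
        if distance_tolerance + tolerance_step > tolerance_limit:
            break  # assumed safeguard; not described in the manual
        distance_tolerance += tolerance_step
    ranked = sorted(matches, key=lambda item: item[0])
    return ranked[:max_orientations], distance_tolerance


def toy_generator(distance_tolerance, distance_minimum):
    """Hypothetical stand-in: a match is found once the tolerance covers its error."""
    errors = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7]
    return [(e, i) for i, e in enumerate(errors) if e <= distance_tolerance]


matches, final_tolerance = automated_matching(toy_generator, max_orientations=4)
print(len(matches), final_tolerance)
```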

 

 

Ligand Flexibility

            Ligand flexibility in DOCK 5 uses the anchor first search introduced in DOCK 4.  Rotatable bonds (not contained in rings) are used to partition the molecule into rigid segments, from which all anchors that meet the criteria are selected, beginning with the largest anchor segment.  If no segments meet the anchor criteria, the largest segment is selected as the only anchor.  All anchor orientations (or the starting orientation only, if no orienting is selected) are used as starting configurations onto which the first flexible layer is appended and conformationally expanded.  The total population of conformers is then reduced to the number specified by number_confs_per_cycle (Nc), and the process is repeated until the last layer is reached.
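The anchor selection rule and the pruning step can be sketched in Python.  This is a simplified illustration with hypothetical function names: segments are represented only by their heavy atom counts, and pruning is assumed to keep the lowest (best) scores, which the text above does not spell out.

```python
def select_anchors(segment_sizes, min_anchor_size=10):
    """Return segment indices usable as anchors, largest segment first."""
    ranked = sorted(range(len(segment_sizes)),
                    key=lambda i: segment_sizes[i], reverse=True)
    anchors = [i for i in ranked if segment_sizes[i] >= min_anchor_size]
    # Fall back to the single largest segment when none is big enough.
    return anchors or ranked[:1]


def prune_conformers(scored_conformers, number_confs_per_cycle=25):
    """Keep the best-scoring (score, conformer) pairs for the next layer."""
    ranked = sorted(scored_conformers, key=lambda item: item[0])
    return ranked[:number_confs_per_cycle]


print(select_anchors([4, 12, 10, 3]))
print(select_anchors([4, 6, 3]))
print(prune_conformers([(-3.0, "a"), (-9.5, "b"), (1.2, "c")], 2))
```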

 

            The conformer generator class now integrates score optimization in the anchor & grow algorithm.  The anchors can be rigidly optimized, the final conformations can be either rigidly, torsionally, or completely optimized, and the partially grown conformers can be completely optimized.  Additionally, a look ahead heuristic designed to optimize the conformation-pruning step has been developed, and is currently being validated.  It will be included (pending validation) in an incremental release.  The anchor & grow steps use whichever scoring function the user selects as the primary scoring function.  The final minimization step uses the secondary scoring function.

 

Scoring Functions

            This release of DOCK5 implements a hierarchical scoring function strategy.  A master score class manages all scoring functions that DOCK uses.  Any of the DOCK scoring functions can be selected as the primary and/or the secondary scoring function.  The primary scoring function is used during the rigid minimization, and anchor & grow steps, which typically make many calls to the scoring function.  The secondary scoring function is used in the final minimization, scoring, and ranking of the molecules.  If no secondary scoring function is selected, the primary scoring function is used as the secondary.
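The primary/secondary fallback can be sketched in Python.  The flag names follow the scoring parameters listed later in this manual; the function is hypothetical, and the behavior when more than one function is flagged for the same role (here, the first in the list wins) is an assumption.

```python
SCORING_FUNCTIONS = ("energy", "contact", "gbsa")


def select_scores(parameters):
    """Pick the primary and secondary scoring functions from yes/no flags."""
    def first_enabled(role):
        for name in SCORING_FUNCTIONS:
            if parameters.get(f"{name}_score_{role}") == "yes":
                return name
        return None

    primary = first_enabled("primary")
    secondary = first_enabled("secondary")
    if secondary is None:
        secondary = primary  # the primary function doubles as the secondary
    return primary, secondary


print(select_scores({"energy_score_primary": "yes"}))
print(select_scores({"contact_score_primary": "yes",
                     "gbsa_score_secondary": "yes"}))
```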

 

            This release contains intermolecular AMBER energy scoring (vdw + Coulombic terms only), contact scoring, and bump filtering as implemented in DOCK 4.  It also contains GB/SA scoring as implemented in SDOCK by Dr. Xiaoqin Zou (ZouX@missouri.edu).  The scoring functions currently compute only grid based scores; continuum scoring for the AMBER energy score will be implemented in an incremental release.  Scoring grids are created using the GRID program distributed with DOCK 4.  Scoring grids for GB/SA require that the SDOCK accessory chemgrid be run.  This program is included in the utilities/GBSA_Grids/ directory for both Linux and SGI platforms.  There is a README file in this directory with instructions on creating GB/SA grids.

 

            One important note regarding the implementation is that each scoring function is a completely separate class and shares no data with the others.  As a result, a path to the grid prefix must be supplied to each scoring function during parameter input.

 

            This release also includes an internal energy scoring function that is used during the anchor & grow flexible search.  This function computes the Lennard-Jones and Coulombic energy between all ligand atom pairs, excluding all 1-2, 1-3, and 1-4 pairs.  This energy is not included in the final reported score.
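The pair exclusion rule can be illustrated with a Python sketch.  This is not the DOCK 5 implementation: the function names, the generalized Lennard-Jones form, the combining rules (summed radii, geometric mean well depth), the constant dielectric, and the 332.0 unit conversion factor are all assumptions chosen for the example.  The default exponents and dielectric follow the internal energy parameters listed later in this manual.

```python
import math
from collections import deque


def bonds_apart(adjacency, start):
    """Breadth-first search: number of bonds from start to each reachable atom."""
    separation = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbor in adjacency[atom]:
            if neighbor not in separation:
                separation[neighbor] = separation[atom] + 1
                queue.append(neighbor)
    return separation


def internal_energy(coords, charges, radii, well_depths, bonds,
                    att_exp=6, rep_exp=12, dielectric=4.0):
    """Sum Lennard-Jones and Coulombic terms over non-excluded atom pairs."""
    n = len(coords)
    adjacency = [[] for _ in range(n)]
    for i, j in bonds:
        adjacency[i].append(j)
        adjacency[j].append(i)
    span = rep_exp - att_exp
    total = 0.0
    for i in range(n):
        separation = bonds_apart(adjacency, i)
        for j in range(i + 1, n):
            if separation.get(j, math.inf) <= 3:
                continue  # skip 1-2, 1-3, and 1-4 pairs
            r = math.dist(coords[i], coords[j])
            ratio = (radii[i] + radii[j]) / r
            depth = math.sqrt(well_depths[i] * well_depths[j])
            lj = depth * (att_exp / span * ratio ** rep_exp
                          - rep_exp / span * ratio ** att_exp)
            coulomb = 332.0 * charges[i] * charges[j] / (dielectric * r)
            total += lj + coulomb
    return total
```

In a linear chain of five atoms, only the pair formed by the first and last atom is separated by more than three bonds, so it is the only pair that contributes.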

 

Score Optimization

            Score optimization is implemented using a simplex minimizer based on the DOCK 4 minimizer.  Users can choose to minimize the rigid anchors, minimize during flexible growth, and minimize the final conformation.  The anchor minimization is always done rigidly; if no flexible growth is being done, this step will minimize the entire molecule.  The minimization during the flexible growth is a complete (torsions + rigid) minimization.  The final minimization can be rigid, torsions only, or complete.  There are two termination criteria that the simplex minimizer can use to end minimization before the maximum number of iterations has been reached.  One is a window based termination scheme that evaluates a window of steps in the minimization, and terminates the minimization when the largest difference between the energies in the window is within a user-specified tolerance.  The other termination criterion is the scaled range termination scheme.  This is the termination criterion used in DOCK 4, where the difference between the highest and lowest point in the simplex is compared to a tolerance specified by the user.  When the simplex "shrinks" enough so that the highest and lowest points are within the tolerance, the minimizer terminates.  Unlike the previous version of DOCK 5, the minimizer will optimize any scoring function that is used as the primary or secondary score.
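The two termination tests can be sketched in Python.  The argument names follow the Score Optimization parameters listed later in this manual, and the scaled range expression is the one given there, (hi - low)/max(hi, fsize).  The function names are hypothetical, and treating a non-positive denominator as "not converged" is an assumption made so the sketch never divides by zero.

```python
def window_converged(scores, window_size=55, window_delta=1.0):
    """True when the last window_size scores span less than window_delta."""
    if len(scores) < window_size:
        return False
    window = scores[-window_size:]
    return max(window) - min(window) < window_delta


def scaled_range_converged(simplex_scores, scaled_range_fsize=0.0,
                           scaled_range_tolerance=1.0):
    """True when (hi - low) / max(hi, fsize) falls below the tolerance."""
    hi, low = max(simplex_scores), min(simplex_scores)
    scale = max(hi, scaled_range_fsize)
    if scale <= 0.0:
        return False  # assumed guard against a zero or negative denominator
    return (hi - low) / scale < scaled_range_tolerance


print(window_converged([10.0, 9.5, 9.4, 9.3], window_size=3))
print(scaled_range_converged([10.0, 9.5, 9.0], scaled_range_tolerance=0.05))
```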

 


User Instructions

 

Installation Instructions

            This DOCK 5 beta release has been built and tested on SGI, Linux (both AMD and Intel chips), and Windows 2000 (Intel chips) platforms.  The Windows distribution is not included in this release; however, I can provide it to any user who desires it, and it will be provided by default in all future beta releases.  Binaries and makefiles are included for the Irix and Linux platforms.  The binaries are located in the bin/ subdirectory.  If the binaries work on your system, and you have no desire to recompile the program, feel free to skip the rest of this section.  Otherwise I'll assume you have either a good spirit of adventure, or the need to compile DOCK 5 on a system other than the ones listed above.  In the event the latter is the case, please feel free to contact me regarding compilation problems/successes on different platforms.

 

The dock5 directory contains the following subdirectories:

 

REQUIRED_LIBRARIES/

bin/

demo/

docs/

mpich/

oelib/

parameters/

src/

utilities/

            accessories/

            grid/

            GBSA_Grids/

 

DOCK 5 is built upon two libraries.  The first is the OELib, provided by OpenEye Scientific Software (www.eyesopen.com).  The version of the OELib used by DOCK 5 is open source and free of charge.  Redistribution is restricted to uses allowed by the GNU General Public License, or through arrangement with OpenEye.  The second required library is the MPICH library, provided freely by Argonne National Laboratory (http://www-unix.mcs.anl.gov/mpi/mpich/).  The MPI library must be built in order to compile DOCK 5; however, it only needs to be installed and running on the system if the MPI features are to be used.

 

            The directory REQUIRED_LIBRARIES/ contains tar.gz archives of both the oelib/ and the mpich/ install directories.  The directories oelib/ and mpich/ contain the unpacked install directories for each library.  If the libraries are built in these directories, then the provided makefiles should work with no modification.  If the library locations are customized, then the makefile include and library paths will require modification.  Since the libraries need to be built specifically for one computing platform, if you plan to compile DOCK 5 on multiple platforms, it is advisable to create one copy of the dock_v5.0b1 directory for each platform you wish to compile on.  Above all else, make sure that the platform you are compiling DOCK 5 on is the same platform used to build the required libraries.

 

Building the OELib: (on both SGI & Linux platforms)

            From the dock_v5.0b1 directory:

            cd oelib

            ./configure

            make

            make install

 

Building MPICH: (on SGI platforms)

            From the dock_v5.0b1 directory:

            cd mpich/

            ./configure --with-arch=IRIXN32

            make

 

Building MPICH: (on Linux platforms)

            From the dock_v5.0b1 directory:

            cd mpich/

            ./configure

            make

 

            Once the required libraries are built, change into the src/ directory.  Two makefiles are provided (Makefile.sgi & Makefile.linux), which differ primarily in their use of the CC compiler on SGI platforms and the g++ compiler on Linux platforms.

 

Building DOCK 5: (all platforms)

            From the dock_v5.0b1 directory:

            cd src/

            make -f  Makefile.(sgi or linux)  clean

            make -f  Makefile.(sgi or linux)  dock

            make -f  Makefile.(sgi or linux)  install

 

            The install command will move an executable named dock5.sgi or dock5.linux into the bin/ directory, where it will be ready for use.

 

            To build the utilities, simply change into the utilities/accessories directory, and type:

make all

 

Then change into the utilities/grid directory, and depending on whether you are using a Linux or SGI system, type either:

make -f   Makefile.linux   grid

or:

make -f   Makefile.sgi   grid

 

This will install all of the dock utilities (grid, sphgen, showsphere, etc…) into the bin directory.  See the DOCK 4 manual for instructions on how to use these programs.

 

Running DOCK 5

            DOCK 5 reads a parameter file containing field/value pairs similar to the DOCK 4 infile.  The program is run as follows:

 

            ./dock5  -i   parameter.in  [-v1]   [-v2]

 

If the parameter file exists, any parameter values found in it will be read, and the user will be prompted via stdin/stdout for any required parameters that are missing.  An important note regarding MPI use is that the stdin/stdout interfaces are disabled across MPI, so the parameter file must be complete for the job to work properly.  It is advisable to test the parameter file on a single processor job prior to launching an MPI job.  If an MPI job is launched with missing parameters, the job will wait indefinitely on user input for the missing parameters.  The next beta release will determine whether the program is running as an MPI job, and return an error if parameters are missing.
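One way to check a parameter file for completeness before launching an MPI job is sketched below in Python.  This is not part of DOCK: the functions are hypothetical, the file layout (one whitespace-separated name and value per line) is an assumption based on the DOCK 4 infile style, and the list of required parameters in the example is made up, since the parameters DOCK actually requires depend on the options selected.

```python
def read_parameters(text):
    """Parse 'name value' lines; lines without a value are ignored."""
    parameters = {}
    for line in text.splitlines():
        fields = line.split(None, 1)
        if len(fields) == 2:
            parameters[fields[0]] = fields[1].strip()
    return parameters


def missing_parameters(parameters, required):
    """Return the required parameter names that have no value."""
    return [name for name in required if name not in parameters]


example = """ligand_atom_file    database.mol2
orient_ligand       yes
max_orientations
"""
found = read_parameters(example)
print(missing_parameters(found, ["ligand_atom_file", "max_orientations"]))
```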

 

            DOCK 5 outputs the job parameters to the screen at the start of the job, and prints summary information for each molecule processed.  Additional summary information will be included in future releases.  The -v1 flag turns on low level verbosity.  This will print out a histogram of sphere matching information, as well as other useful output that will be added in the future (minimization statistics, molecule statistics, etc.).  The -v2 flag turns on high level verbosity, printing details about the breakdown of the GB/SA terms, and in the future, atom type, bond type, and atom by atom breakdown of energy scores.

 

DOCK 5 Parameters

            The DOCK 5 parameter parser requires that the values entered for a parameter exactly match one of the legal values if any legal values are specified.  For example:

 

            param_a                 [5] ():

            param_b                 [5] (0 5 10):

 

Here param_a can be assigned any value, whereas param_b can only be assigned 0, 5, or 10.  If no value is entered, both will default to a value of 5.  Listed below are all DOCK 5 parameters, their default values, legal values, and a brief description of each.  The parameters are listed in order of function.  Also, for questions requiring a yes/no answer, please use the full word (yes or no) as opposed to y or n.  It's inconvenient, but it prevents problems with the parser in the long run.
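The prompt convention above (name, default in square brackets, legal values in parentheses) can be mimicked with a short Python sketch.  This is not the DOCK 5 parser; the function names and the regular expression are assumptions that only illustrate the default and legal value rules.

```python
import re

# name, [default], (space-separated legal values), trailing colon
PROMPT = re.compile(r"^\s*(\S+)\s+\[(.*?)\]\s+\((.*?)\):\s*$")


def parse_prompt(line):
    """Split a prompt line into its name, default, and list of legal values."""
    match = PROMPT.match(line)
    if match is None:
        raise ValueError(f"not a parameter prompt: {line!r}")
    name, default, legal = match.groups()
    return name, default, legal.split()


def resolve_value(line, entered):
    """Apply the default and the exact-match rule to a user entry."""
    name, default, legal = parse_prompt(line)
    value = entered.strip() or default
    if legal and value not in legal:
        raise ValueError(f"{name}: {value!r} is not one of {legal}")
    return value


print(resolve_value("param_a                 [5] ():", "42"))
print(resolve_value("param_b                 [5] (0 5 10):", ""))
```

Under these rules an entry of 7 for param_b, or y for a yes/no parameter, is rejected because it does not exactly match a legal value.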

 

Ligand I/O Parameters

 

Parameter Name                Default Value   Legal Values   Description
----------------------------  --------------  -------------  -----------
ligand_atom_file              database.mol2                  The ligand input filename
ligand_outfile_prefix         output                         The prefix that all output files will use
write_orientations            no              yes, no        Flag to write orientations
write_conformations           no              yes, no        Flag to write conformations
calculate_rmsd                no              yes, no        Flag to perform an RMSD calculation between the final molecule pose and its initial structure.  This value is reported in the outfile_scored.mol2 file
rank_ligands                  no              yes, no        Flag to enable a ligand top-score list.  These ligands will be written to outfile_ranked.mol2, and outfile_scored.mol2 will be empty by default
max_ranked_ligands            500                            The # of ligands to be stored in the top score list
scored_mol_output_override    no              yes, no        This flag causes all ligands to be written to outfile_scored.mol2, even when rank_ligands is true
max_send_queue_size           10                             The maximum number of ligands sent in a workunit to an MPI client
max_recv_queue_size           10000                          The maximum number of ligands returned in one message from an MPI client

 

Orient Ligand Parameters

 

Parameter Name                Default Value   Legal Values   Description
----------------------------  --------------  -------------  -----------
orient_ligand                 no              yes, no        Flag to orient ligand to spheres
automated_matching            no              yes, no        Flag to perform automated matching instead of manual matching
distance_tolerence            0.25                           The distance tolerance applied to each edge in a clique
distance_minimum              2.0                            The minimum size for an edge in a clique
nodes_minimum                 3                              The minimum # of nodes in a clique
nodes_maximum                 10                             The maximum # of nodes in a clique
receptor_site_file            receptor.sph                   The file containing the receptor spheres
max_orientations              1000                           The maximum # of orientations that will be cycled through

 

Flexible Ligand Parameters

 

Parameter Name                Default Value   Legal Values   Description
----------------------------  --------------  -------------  -----------
flexible_ligand               no              yes, no        Flag to perform anchor first search
min_anchor_size               10                             The minimum # of heavy atoms for an anchor segment
number_confs_per_cycle        25                             The maximum number of conformations carried forward in the anchor & grow search

 

Scoring Ligand Parameters

 

Parameter Name                Default Value   Legal Values   Description
----------------------------  --------------  -------------  -----------
bump_filter                   no              yes, no        Flag to perform bump filtering
bump_grid_prefix              grid                           The prefix to the grid file(s) containing the desired bump grid
max_bumps                     0                              The maximum allowed # of bumps for a molecule to pass the filter
score_molecules               no              yes, no        Enables scoring of molecules
energy_score_primary          no              yes, no        Flag to perform energy scoring as the primary scoring function
energy_score_secondary        no              yes, no        Flag to perform energy scoring as the secondary scoring function
vdw_scale                     1                              Scalar multiplier of the vdw energy component
es_scale                      1                              Scalar multiplier of the electrostatic energy component
nrg_grid_prefix               grid                           The prefix to the grid files containing the desired nrg grid
contact_score_primary         no              yes, no        Flag to perform contact scoring as the primary scoring function
contact_score_secondary       no              yes, no        Flag to perform contact scoring as the secondary scoring function
contact_cutoff_distance       4.5                            The distance threshold defining a contact
contact_clash_overlap         0.75                           Contact definition for use with intramolecular scoring
contact_clash_penalty         50                             The penalty for each contact overlap made
cnt_grid_prefix               grid                           The prefix to the grid files containing the desired cnt grid
gbsa_score_primary            no              yes, no        Toggles whether or not to use GB/SA scoring as the primary scoring function
gbsa_score_secondary          no              yes, no        Toggles whether or not to use GB/SA scoring as the secondary scoring function
gb_grid_prefix                gb_grid                        The path to the pairwise GB grids
sa_grid_prefix                sa_grid                        The path to the SA grids
screen_file                   screen.in                      GB parameter file for electrostatic screening.  It's located in the parameters dir by default
solvent_dielectric            78.300003                      The value for the solvent dielectric
vdw_grid_prefix               grid                           The path to the dock4 nrg grids, used for the vdw portion of the GB/SA calculation

 

Score Optimization Parameters

 

Parameter Name                Default Value   Legal Values   Description
----------------------------  --------------  -------------  -----------
minimize_ligand               no              yes, no        Flag to perform score optimization
minimize_rigid_anchor         no              yes, no        Flag to perform rigid optimization of the anchor
minimize_layer_growth         no              yes, no        Flag to perform complete optimization during conformational search
minimize_final_pose           yes             yes, no        Flag to perform minimization of the final ligand pose
minimze_final_pose_rigid      no              yes, no        Flag to perform rigid minimization of the final pose
minimze_final_pose_rigid      no              yes, no        Flag to perform torsional minimization of the final pose
minimize_final_pose_complete  yes             yes, no        Flag to perform complete minimization of the final pose
minimizer_choice              0               0, 1           Chooses whether to use the Simplex (0) minimizer or none (1).  This will allow other minimizers to be used in the future
initial_translation           1.0                            Initial translation step size
initial_rotation              1.0                            Initial rigid rotation step size
initial_torsion               10.0                           Initial torsion angle step size
maximum_iterations            100                            Maximum # of simplex iterations / cycle
maximum_function_calls        500                            Maximum # of function calls / cycle
window_based_termination      no              yes, no        Flag to use the score window termination criterion
window_size                   55                             The width of the window (the # of iterations)
window_delta                  1.0                            The threshold energy for the scores in the window: when (highest score - lowest score) is less than window_delta, the minimizer will terminate
scaled_range_termination      no              yes, no        Flag to use the scaled range termination criterion (the DOCK 4 termination criterion)
scaled_range_fsize            0.0                            The maximum score value to be considered in the scaled range calculation
scaled_range_tolerance        1.0                            When the fraction (hi - low)/max(hi, fsize) is less than tolerance, the minimizer terminates (where hi and low are the highest and lowest score values in the simplex)
multiple_simplex_cycles       no              yes, no        Flag to use multiple cycles of minimization
maximum_cycles                5                              Maximum # of minimization cycles allowed
random_number_generator       0               0, 1           Choice of internal RNG (0) or system RNG (1)
random_number_seed            2002                           Seed for RNG

 

Atom & Bond Typing Parameters

 

Parameter Name                Default Value   Legal Values   Description
----------------------------  --------------  -------------  -----------
atom_model                    all             all, united    Choice of all atom or united atom models
vdw_defn_file                 vdw.defn                       File containing vdw parameters for atom types
flex_defn_file                flex.defn                      File containing bond definition parameters
flex_drive_file               flex_drive.tbl                 File containing conformational search parameters
calc_internal_energy          no              yes, no        Flag to calculate the internal energy (only used during the anchor & grow)
internal_energy_att_exp       6                              L-J attractive exponent
Internal_energy_rep_exp       12                             L-J repulsive exponent
Internal_energy_dielectric    4.0                            Dielectric value for Coulombic calculation