SubPEx
Generate a diverse ensemble of protein conformations for use in ensemble docking
Documentation
What is it?¶
Subpocket explorer (SubPEx) is a tool that uses weighted ensemble (WE) path sampling, as implemented in WESTPA, to maximize pocket conformational search. SubPEx's goal is to produce a diverse ensemble of protein conformations for use in ensemble docking.
As with any WE implementation, SubPEx uses a progress coordinate to focus computational power on sampling phase space. The available progress coordinates are:
- composite RMSD (a linear combination of backbone and pocket heavy-atom RMSD)
- pocket heavy atoms RMSD
- backbone RMSD
- Jaccard distance of pocket volumes (jd)
We highly recommend using the composite RMSD progress coordinate. Use of other coordinates may have inadequate performance.
Installation¶
The first step is to download, clone, or copy the repository.
The repository includes lock files to build the same environment we use for development. Simply run this command (after ensuring you have Makefile tools installed):
To activate the new conda
environment, run:
Some users may wish to create their own environments or to use an existing WESTPA environment. If so, install the following packages so SubPEx can calculate the progress coordinate:
- loguru
- mdanalysis
- westpa
- numpy
- scipy
- scikit-learn
- jinja2
- pydantic
- pydantic-settings
Usage¶
TODO: Needs to be updated to new version
Users should take advantage of our autobuilder (wizard.py
) to setup their
SubPEx simulations. In some cases, however, users may wish to manually set up
their simulations by editing key SubPEx/WESTPA files. This approach is not
officially supported, but we provide the below instructions for
advanced/adventurous users.
Link your preliminary, equilibrated simulation
-
SubPEx assumes you have already run preliminary simulations to equilibrate your system. Soft link or copy your preliminary, equilibrated trajectories and necessary restart files to the
./reference/
directory. Rename the filesmol
with the appropriate extension. (Note that./reference/
already contains thenamd.md.conf
andamber.prod_npt.in
template files, which SubPEX uses to interface with the NAMD and AMBER MD engines, respectively.)- If using NAMD, soft link the
.dcd
file of the final equilibration run. NAMD requires other files to restart simulations as well. Be sure to soft link the.xsc
,.coor
, and.inpcrd
files as well. Remember the.prmtop
file as well.
ln -s /file/path/to/simulation/my_namd_file.dcd /WEST/ROOT/reference/mol.dcd ln -s /file/path/to/simulation/my_namd_file.xsc /WEST/ROOT/reference/mol.xsc ln -s /file/path/to/simulation/my_namd_file.coor /WEST/ROOT/reference/mol.coor ln -s /file/path/to/simulation/my_namd_file.inpcrd /WEST/ROOT/reference/mol.inpcrd ln -s /file/path/to/simulation/my_namd_file.prmtop /WEST/ROOT/reference/mol.prmtop
- If using Amber, the filetype that works with the SubPEx algorithm is
.nc
. You need to soft link the.rst
file of the final equilibration run as well. Remember the.prmtop
file as well.
- If using NAMD, soft link the
-
Extract the last frame of the preliminary, equilibrated trajectory as a
pdb
file with your preferred molecular analysis program (e.g., VMD). Soft link that to the./reference/
directory as well, and name the linklast_frame.pdb
.
Edit the west.cfg
file
- Edit the following parameters in the
west.cfg
file:- the directory portion of the path variables, though the basename itself should not change. NOTE: Be sure to use full (not relative) paths.
reference
: the PDB file that will be used in EVERY SINGLE progress-coordinate calculation (the last frame of the preliminary, equilibrated simulation mentioned above).selection_file
: path to a text file that will contain the pocket selection string (MDAnalysis selection notation). This file will be automatically generated in a subsequent step, but specify its future path here.reference_fop
: path to anxyz
file that will contain the field of points needed to calculate thejd
progress coordinate. This file is also useful for visualizing the selected pocket. It will be automatically generated in a subsequent step.west_home
: home directory of the SubPEx run. You'll most likely want to use the same directory that contains thewest.cfg
file itself.topology
: topology file needed for the MD simulations (likely the same topology file used in the preliminary, equilibrated simulations).- the progress coordinate (
pcoord
) to use. composite
: composite RMSD (recommended)prmsd
: pocket heavy atoms RMSD (not officially supported)bb
: backbone RMSD (not officially supported)jd
: Jaccard distance (not officially supported)- the auxiliary data (
auxdata
) to calculate and save. composite
: composite RMSDprmsd
: pocket heavy atoms RMSD*pvol
: pocket volume (requiresjd
too)bb
: backbone RMSDrog
: radius of gyration of the pocket (requiresjd
too)jd
: Jaccard distance- make sure that the WESTPA progress coordinate and auxdata match the SubPEx
ones (these sections are both found in the
west.cfg
file). - The WESTPA progress coordinate is specified at
west -> data -> datasets
,subpex -> pcoord
, and inadaptive_binning/adaptive.py
- The WESTPA auxiliary data is at
west -> executable -> datasets
- The SubPEx progress coordinate is at
subpex -> pcoord
- The SubPEx auxiliary data is at
subpex -> auxdata
Define the pocket to sample
- You must define the location of the binding pocket you wish to sample. Find
the coordinates of the pocket center and radius using the extracted last
frame.
- Visual inspection is often useful at this step. You might first create a PDB file with a CA dummy atom. Load that together with the extracted last frame of the previous step into your preferred visualization software (ChimeraX, PyMol, VMD, etc.). Then manually move the dummy atom to the pocket center and measure its location. Similarly, use the dummy atom to determine the radius from that center required to encompass the pocket of interest.
- Return to the
west.cfg
file and edit the following parameters: center
: the pocket centerradius
: the pocket radiusresolution
: the distance between adjacent pocket-filling grid points (especially important if using thejd
progress coordinate)- Run
python westpa_scripts/get_reference_fop.py west.cfg
. This script will generate the files specified by theselection_file
andreference_fop
parameters in thewest.cfg
file. - Visually inspect the pocket field of points (fop) and/or the selection string (MDAnalysis selection syntax).
- Ensure that the points in the fop (
reference_fop
) file entirely fill the pocket of interest. - Ensure that the residues (
selection_file
) truly line the pocket of interest. - Note that the popular molecular visualization program VMD can load
xyz
files and select residues. - After visual inspection, adjust the
west.cfg
file (center
,radius
, andresolution
parameters) and re-run thewestpa_scripts/get_reference_fop.py
script. Continue to recalculate the pocket as needed to fine-tune your pocket.
Setup the progress coordinate calculations
- Update the variables in the
adaptive_binning/adaptive.py
file to indicate the number of walkers per bin, the bins' minimum and maximum values, etc.- This file controls the adaptive binning scheme that SubPEx uses.
- A detailed description of each variable is given in the file itself.
- Change
westpa_scripts/get_pcoord.sh
.- This script runs when calculating initial progress coordinates for new initial states (istates).
- A the beginning of the file, modify the line
export ENGINE="NAMD"
to match your MD engine (NAMD
orAMBER
, in capital letters).
- Modify the
westpa_scripts/runseg.sh
file.- This file runs each WESTPA/SubPEx walker (segment). It creates the needed directory, runs the walker simulation, and calculates the progress coordinate.
- At the beginning of the file, modify the line
export ENGINE="NAMD"
to match your MD engine (NAMD
orAMBER
, in capital letters).
Setup the environment
- Revise the
env.sh
file. - The file itself contains further instructions as comments.
- Among other things, be sure to set the environmental variables required to run the NAMD or AMBER executables, as well as the appropriate WORKMANAGER.
- Setting the appropriate variables may be complicated if using a supercomputing center. You may need to consult with an IT administrator.
- Modify the appropriate MD configuration file in
./reference/
directory (./reference/amber.prod_npt.in
if using AMBER,./reference/namd.md.conf
if using NAMD). - Make sure the number of frames saved per simulation equals
pcoordlength
minus one (pcoordlength
is defined in theadaptive_binning/adaptive.py
file). For example:- If using AMBER, modify
./reference/amber.prod_npt.in
to make surenstlim
/ntwx
=pcoordlength
- 1. - If using NAMD, modify
./reference/namd.md.conf
to make surerun
/dcdfreq
=pcoordlength
- 1.
- If using AMBER, modify
- Activate the WESTPA conda environment and source the init.sh file.
- Execute the
. init.sh
file. Note that this will delete any data from previous SubPEx runs.
Running SubPEx
- To run SubPEx, execute the
./run.sh
file from the command line. - You can also run SubPEx on a supercomputing cluster. See the
./aux_scripts/run.slurm.sh
for an example submission script for the slurm job scheduler. Note that you will likely need to modify the submission script for your specific cluster. Please check with your IT administrator to troubleshoot any cluster-specific problems. - If errors occur during execution, check the
./job_logs
directory. (If there is no./job_logs
directory, that alone will cause WESTPA/SubPEx to fail.)
Notes: WESTPA simulations are not easy to set up. You are likely to
encounter errors on your first attempt. To debug, check the west.log
file and
the output in the job_logs
directory.
Important scripts and files and what they do¶
- env.sh sets up some environmental variables.
- init.sh initializes the run. Creates the basis and initial states and
calculates progress coordinates for those, using the
bstate.py
script. - gen_istate.sh makes
istates
directory. - get_pcoord.sh copies and links necessary files. Then calls on
bstate.py
script. This script gets called byinit.sh
. - run.sh starts the run.
- runseg.sh WESTPA runs this script for each trajectory segment. The script
has three jobs:
- Link the necessary files for MD simulations
- Run the MD simulation
- Calculate the progress coordinate using the
pcoord.py
script.
- west.cfg the file containing the run's configuration.
- west.h5 contains all the results of the WE run.
- get_reference_fop.py calculates the initial field of points for JD progress coordinate and creates a selection string for MDAnalysis.
- pcoord_istate.py calculates the progress coordinate for the initial states.
- pcoord.py calculates the progress coordinate for the production run.
Deploying¶
We use bump-my-version to release a new version.
This will create a git tag that is used by poetry-dynamic-version to generate version strings and update CHANGELOG.md
.
For example, to bump the minor
version you would run the following command.
After releasing a new version, you need to push and include all tags.
License¶
This project is released under the MIT License as specified in LICENSE.md
.