Commit 572235e6 authored by Axel Kohlmeyer, committed by GitHub

Merge pull request #1660 from tanmoy7989/reorder_remd_traj

python tool to reorder replica traj
parents fb7a439c 28b634f2
+18 −1
@@ -76,6 +76,7 @@ Post-processing tools :h3
"pymol_asphere"_#pymol,
"python"_#pythontools,
"reax"_#reax_tool,
"replica"_#replica,
"smd"_#smd,
"spin"_#spin,
"xmgrace"_#xmgrace :tb(c=6,ea=c,a=l) 
@@ -485,6 +486,21 @@ README for more info on Pizza.py and how to use these scripts.

:line

replica tool :h4,link(replica)

The tools/replica directory contains the reorder_remd_traj Python script which
can be used to reorder the replica trajectories (resulting from the use of the 
temper command) according to temperature. This will produce discontinuous
trajectories with all frames at the same temperature in each trajectory.
Additional options can be used to calculate the canonical configurational
log-weight for each frame at each temperature using the pymbar package. See
the README.md file for further details. Try out the peptide example provided.

This tool was written by (and is maintained by) Tanmoy Sanyal, 
while at the Shell lab at UC Santa Barbara. (tanmoy dot 7989 at gmail.com) 

:line

reax tool :h4,link(reax_tool)

The reax sub-directory contains stand-alone codes that can
@@ -549,3 +565,4 @@ simulation.
See the README file for details.

These files were provided by Vikas Varshney (vv0210 at gmail.com)
+7 −1
@@ -110,7 +110,13 @@ the information from the log.lammps file. E.g. you could produce one
dump file with snapshots at 300K (from all replicas), another with
snapshots at 310K, etc.  Note that these new dump files will not
contain "continuous trajectories" for individual atoms, because two
successive snapshots (in time) may be from different replicas.
successive snapshots (in time) may be from different replicas. The
reorder_remd_traj Python script can do the reordering for you
(and can additionally calculate configurational log-weights of
trajectory snapshots in the canonical ensemble). The script can be found
in the tools/replica directory, while instructions on how to use it are
available in doc/Tools (in brief) and as a README file in tools/replica
(in detail).

The last argument {index} in the temper command is optional and is
used when restarting a tempering run from a set of restart files (one
+5 −0
@@ -2238,6 +2238,7 @@ Py
pydir
pylammps
PyLammps
pymbar
pymodule
pymol
pypar
@@ -2324,6 +2325,7 @@ reinit
relink
relTol
remappings
remd
Ren
Rendon
reneighbor
@@ -2455,6 +2457,7 @@ Sandia
sandybrown
Sanitizer
sanitizers
Sanyal
sc
scafacos
SCAFACOS
@@ -2688,6 +2691,8 @@ Tajkhorshid
Tamaskovics
Tanaka
tanh
tanmoy
Tanmoy
Tartakovsky
taskset
taubi
+1 −0
@@ -38,6 +38,7 @@ polybond Python tool for programmable polymer bonding
pymol_asphere	       convert LAMMPS output of ellipsoids to PyMol format
python		       Python scripts for post-processing LAMMPS output
reax	       	       Tools for analyzing output of ReaxFF simulations
replica        tool to reorder LAMMPS replica trajectories according to temperature
smd                    convert Smooth Mach Dynamics triangles to VTK
spin                   perform a cubic polynomial interpolation of a GNEB MEP
vim		       add-ons to VIM editor for editing LAMMPS input scripts
+86 −0
## reorder_remd_traj

LAMMPS Replica Exchange Molecular Dynamics (REMD) trajectories (implemented using the temper command) are arranged by replica, i.e., each trajectory follows one replica continuously and records all of its ups and downs in temperature. Often, however, the requirement is that trajectories be continuous in temperature. This requires the LAMMPS REMD trajectories to be re-ordered, which LAMMPS does not do automatically (see the discussion [here](https://lammps.sandia.gov/threads/msg60440.html)). The `reorder_remd_traj` tool does exactly this in parallel (using MPI).

(Protein folding trajectories in [Sanyal, Mittal and Shell, J. Chem. Phys., 2019, 151(4), 044111](https://aip.scitation.org/doi/abs/10.1063/1.5108761) were ordered in temperature space using this tool.)

#### Author

Tanmoy Sanyal, Shell lab, UC Santa Barbara

(currently at UC San Francisco)

email: tanmoy dot 7989 at gmail.com

#### Features

- reorder LAMMPS REMD trajectories by temperature, keeping only the desired frames.
  Note: this only handles LAMMPS format trajectories (i.e., lammpstrj format).
  Trajectories can be gzipped or bz2-compressed. The trajectories are assumed to
  be named as `<prefix>.%d.lammpstrj[.gz or .bz2]`

- (optionally) calculate configurational weights for each frame at each
  temperature if potential energies are supplied (only implemented for the canonical (NVT) ensemble)
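
The naming and compression conventions above can be illustrated with a minimal sketch. The helper names `replica_traj_name` and `open_traj` are hypothetical and are not taken from the tool itself; the sketch only shows one way to open a plain, gzipped, or bz2-compressed trajectory as text using the Python standard library.

```python
import bz2
import gzip


def replica_traj_name(prefix, n, compression=""):
    """Build '<prefix>.<n>.lammpstrj[.gz or .bz2]' (hypothetical helper)."""
    return "%s.%d.lammpstrj%s" % (prefix, n, compression)


def open_traj(path, mode="rt"):
    """Open a trajectory as text, choosing the opener from the file extension."""
    if path.endswith(".gz"):
        return gzip.open(path, mode)
    if path.endswith(".bz2"):
        return bz2.open(path, mode)
    return open(path, mode)
```

For example, `replica_traj_name("peptide", 3, ".gz")` gives `peptide.3.lammpstrj.gz`, which `open_traj` would then read through `gzip`.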

#### Dependencies

[`mpi4py`](https://mpi4py.readthedocs.io/en/stable/)  
[`pymbar`](https://pymbar.readthedocs.io/en/master/) (for getting configurational weights)  
[`tqdm`](https://github.com/tqdm/tqdm) (for printing pretty progress bars)  
[`StringIO`](https://docs.python.org/2/library/stringio.html) (or [`io`](https://docs.python.org/3/library/io.html) if in Python 3.x)

#### Example

###### REMD Simulation specs 
Suppose you ran a REMD simulation for the peptide example using the CHARMM force field (see lammps/examples/peptide) in LAMMPS with the following settings:

- number of replicas = 16
- temperatures used (in K): 200 209 219 230 241 252 264 276 289 303 317 332 348 365 382 400 (i.e., exponentially distributed in the range 200-400 K)
- timestep = 2 fs
- total number of timesteps simulated using temper = 2000 (i.e. 4 ps)
- swap frequency = number of steps between attempted temperature swaps = `ns` = 10 (i.e. 20 fs)
- write frequency = number of steps between trajectory frames written to disk (using the dump command) = `nw` = 20 (i.e. 40 fs)
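
As a quick check of these settings, the sketch below rebuilds the temperature ladder as a geometric progression between 200 K and 400 K and works out the swap and frame counts. The function name `geometric_ladder` is hypothetical, and rounding to whole kelvin is an assumption that happens to reproduce the list above.

```python
def geometric_ladder(t_min, t_max, n_replicas):
    """Exponentially spaced temperatures, rounded to the nearest kelvin."""
    ratio = (t_max / t_min) ** (1.0 / (n_replicas - 1))
    return [int(round(t_min * ratio ** i)) for i in range(n_replicas)]


temps = geometric_ladder(200.0, 400.0, 16)

n_steps, ns, nw, dt_fs = 2000, 10, 20, 2
n_swap_attempts = n_steps // ns   # swaps are attempted every ns steps
n_frames = n_steps // nw          # frames per replica, not counting any frame at step 0
total_time_ps = n_steps * dt_fs / 1000.0

print(temps)
print(n_swap_attempts, n_frames, total_time_ps)
```

Writing `temps` one value per line to a text file would give a file in the spirit of the temperature file described further below.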

###### LAMMPS output
So, when the dust settles,

- You'll have 16 replica trajectories. For this tool to work, each replica trajectory must be named `<prefix>.<n>.lammpstrj[.gz or .bz2]`, where
  - `prefix` = some common prefix for all your trajectories (say it is called "peptide")
  - `n` = replica number (0-15 in this case). Note: trajectories **must be in default LAMMPS format** (so formats like dcd won't work)

- You will also have a master LAMMPS log file (`logfn`) that contains the swap history of all the replicas
  (for more details see [here](https://lammps.sandia.gov/doc/temper.html)). Assume that this is called `log.peptide`

- Further you must have a txt file that numpy can read which stores all the temperature values (say this is called `temps.txt`)

######  Your desired output
- The total number of timesteps you want to consider as production (i.e. after equilibration) = 1000 (i.e. last 2 ps)

- Reordered trajectories at temperatures 200 K, 276 K, 400 K

- Configurational log-weight calculation (using [`pymbar`](https://github.com/choderalab/pymbar)). Here, this is limited to the canonical (NVT) ensemble **and without biasing restraints** in your simulation. To do this you'd need to have a file (say called `ene.peptide`) that stores a 2D (K x N) array of total potential energies, where,

  - K = total number of replicas = 16, and N = total number of frames in each replica trajectory (= 1000 / 20 = 50 in this case) 

  - `ene[k,n]` = energy from n-th frame of k-th replica.

###### Using the tool (description of the workflow)
Assume you have 16 processors at your disposal. When you run the following:

```bash
mpirun -np 16 python reorder_remd_traj.py peptide -logfn log.peptide -tfn temps.txt -ns 10 -nw 20 -np 1000 -ot 200 276 400 -logw -e ene.peptide -od ./output
```

1. First the temperature swap history file (`log.peptide` in this case) is read. This is done on one processor since it is usually fast.
2. Then the (compressed or otherwise) LAMMPS replica trajectories are read in parallel. So if you have fewer processors than replicas at this stage, it'll be slower.
3. Then, using the frame ordering generated in (1), trajectory frames read in (2) are re-ordered and written to disk in parallel. Each processor writes one trajectory. So, if you request reordered trajectories for fewer temperatures (3 in this case) than the total number of temperatures (16), then 16-3 = 13 processors will sit idle.
4. If you have further requested configurational log-weight calculation, it will be done on a single processor since pymbar is pretty fast.
5. Finally you will have 3 LAMMPS trajectories of the form ``peptide.<temp>.lammpstrj.gz``, each with 1000 / 20 = 50 frames, where `<temp>` = 200, 276, 400. If you request reordering at a temperature, say 280 K, which is not present in the supplied temperature schedule (as written in `temps.txt`), the closest temperature (276 K) will be chosen.
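
The reordering idea in steps 1 and 3, and the closest-temperature lookup in step 5, can be sketched on toy data. This is not the tool's implementation: the function names are hypothetical, and the sketch assumes the swap history has already been parsed into rows where `swap_rows[j][r]` is the temperature index held by replica `r` from step `swap_steps[j]` onward.

```python
import bisect


def closest_temp_index(temps, requested):
    """Index of the schedule temperature nearest to the requested one."""
    return min(range(len(temps)), key=lambda i: abs(temps[i] - requested))


def frames_at_temperature(swap_steps, swap_rows, temp_index, frame_steps):
    """For each written frame, find which replica held temp_index at that step."""
    picks = []
    for step in frame_steps:
        # last swap record at or before this frame's timestep
        j = bisect.bisect_right(swap_steps, step) - 1
        picks.append((step, swap_rows[j].index(temp_index)))
    return picks


# Toy run: 3 replicas, swaps every 10 steps, frames every 20 steps.
temps = [200, 276, 400]
swap_steps = [0, 10, 20, 30, 40]
swap_rows = [[0, 1, 2], [1, 0, 2], [1, 2, 0], [0, 2, 1], [0, 1, 2]]
frame_steps = [0, 20, 40]

ti = closest_temp_index(temps, 280)   # 280 K is not in the schedule
picks = frames_at_temperature(swap_steps, swap_rows, ti, frame_steps)
print(temps[ti], picks)
```

Each `(step, replica)` pair says which replica trajectory supplies the frame at that step; stitching those frames together gives a trajectory at constant temperature that is discontinuous in replica space.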

For more details, use the help menu generated by the tool by running `python reorder_remd_traj.py -h`

###### Caveats
- This tool crawls through the replica trajectories and creates hidden index files. These are called `.byteind_<replicanum>.gz`. You may delete these if you want, but subsequent replica reads will be slow in that case.

- When writing trajectories to disk, the trajectories are first written to a buffer in memory, and then finally dumped all-at-once to the disk. While this makes the tool very fast, it can cause out-of-memory errors for very large trajectories. A useful feature might be to write to the buffer in batches and empty it to disk whenever some (predefined) max-buffer-size is exceeded.
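
The batched-buffer idea in the last caveat could look roughly like the sketch below. It is not part of the tool; the class name `BatchedWriter` and the `max_buffer_chars` parameter are hypothetical, and the size limit is counted in characters rather than bytes for simplicity.

```python
import io


class BatchedWriter:
    """Collect text in memory and flush to the output once a size limit is hit."""

    def __init__(self, fileobj, max_buffer_chars=1000000):
        self._out = fileobj
        self._max = max_buffer_chars
        self._buf = io.StringIO()
        self._size = 0
        self.flush_count = 0

    def write(self, text):
        self._buf.write(text)
        self._size += len(text)
        if self._size >= self._max:
            self.flush()

    def flush(self):
        # empty the in-memory buffer to the underlying file object
        if self._size:
            self._out.write(self._buf.getvalue())
            self.flush_count += 1
        self._buf = io.StringIO()
        self._size = 0
```

A caller would write frames through `BatchedWriter.write` and call `flush()` once at the end, so peak memory is bounded by roughly `max_buffer_chars` plus one frame instead of the whole trajectory.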