Commit 21699b89 authored by tanmoy.7989's avatar tanmoy.7989
Browse files

python tool to reorder replica traj

parent 0b34db78
Loading
Loading
Loading
Loading
+17 −1
Original line number Diff line number Diff line
@@ -78,7 +78,8 @@ Post-processing tools :h3
"reax"_#reax_tool,
"smd"_#smd,
"spin"_#spin,
"xmgrace"_#xmgrace :tb(c=6,ea=c,a=l)
"xmgrace"_#xmgrace,
"replica"_#replica :tb(c=6,ea=c,a=l)

Miscellaneous tools :h3

@@ -549,3 +550,18 @@ simulation.
See the README file for details.

These files were provided by Vikas Varshney (vv0210 at gmail.com)

:line

replica tool :h4,link(replica)

The tools/replica directory contains the reorder_remd_traj python script which
can be used to reorder the replica trajectories (resulting from the use of the 
temper command) according to temperature. This will produce discontinuous
trajectories with all frames at the same temperature in each trajectory.
Additional options can be used to calculate the canonical configurational
log-weight for each frame at each temperature using the pymbar package. See
the README.md file for further details. Try out the peptide example provided.

This tool was written by Tanmoy Sanyal, 
while at the Shell lab at UC Santa Barbara. (tanmoy dot 7989 at gmail.com) 
+86 −0
Original line number Diff line number Diff line
## reorder_remd_traj

LAMMPS Replica Exchange Molecular Dynamics (REMD) trajectories (implemented using the temper command) are arranged by replica, i.e., each trajectory is a continuous replica that records all the ups and downs in temperature. However, often the requirement is  that trajectories be continuous in temperature. This requires the LAMMPS REMD trajectories to be re-ordered, which LAMMPS does not do automatically. (see the discussion [here](https://lammps.sandia.gov/threads/msg60440.html)). The reorderLAMMPSREMD tool does exactly this in parallel (using MPI)

(Protein folding trajectories in [Sanyal, Mittal and Shell, JPC, 2019, 151(4), 044111](https://aip.scitation.org/doi/abs/10.1063/1.5108761) were ordered in temperature space using this tool)

#### Author

Tanmoy Sanyal, Shell lab, UC Santa Barbara

(currently at UC San Francisco)

email: tanmoy dot 7989 at gmail.com

#### Features

- reorder LAMMPS REMD trajectories by temperature keeping only desired frames.
  Note: this only handles LAMMPS format trajectories (i.e., lammpstrj format)
  Trajectories can be gzipped or bz2-compressed. The trajectories are assumed to
  be named as \<prefix>\.%d.lammpstrj[.gz or .bz2]

- (optionally) calculate configurational weights for each frame at each
  temperature if potential energies are supplied (only implemented for the canonical (NVT) ensemble)

#### Dependencies

[`mpi4py`](https://mpi4py.readthedocs.io/en/stable/)  
[`pymbar`](https://pymbar.readthedocs.io/en/master/) (for getting configurational weights)  
[`tqdm`](https://github.com/tqdm/tqdm) (for printing pretty progress bars)  
[`StringIO`](https://docs.python.org/2/library/stringio.html) (or [`io`](https://docs.python.org/3/library/io.html) if in Python 3.x)

#### Example

###### REMD Simulation specs 
Suppose you ran a REMD simulation for the peptide example using the CHARMM forcefield (see lammps/examples/peptide) in Lammps with the following settings:

- number of replicas = 16
- temperatures used (in K): 200 209 219 230 241 252 264 276 289 303 317 332 348 365 382 400 (i.e., exponentially distributed in the range 270-400 K)
- timestep = 2 fs
- total number of timesteps simulated using temper = 2000 (i.e. 4 ps)
- swap frequency = temperatures swapped after every this many steps = `ns` = 10 (i.e. 20 fs)
- write frequency = trajectory frame written to disk after this many steps (using the dump command) = `nw` = 20 (i.e. 40 fs)

###### LAMMPS output
So, when the dust settles,

- You'll have 16 replica trajectories. For this tool to work, each replica traj must be named: `<prefix>.<n>.lammpstrj[.gz or .bz2]`, where,
  - `prefix` = some common prefix for all your trajectories and (say it is called "peptide")` 
  - `n` = replica number (0-15 in this case). Note: trajectories **must be in default LAMMPS format **(so stuff like dcd won't work)

- You will also have a master LAMMPS log file (`logfn`) that contains the swap history of all the replicas
  (for more details see [here](https://lammps.sandia.gov/doc/temper.html). Assume that this is called `log.peptide`

- Further you must have a txt file that numpy can read which stores all the temperature values (say this is called `temps.txt`)

######  Your desired output
- The total number of timesteps you want consider as production (i.e. after equilbration)  = 1000 (i.e. last 2 ps)

- Reordered trajectories at temperatures 200 K, 276 K, 400 K

- Configurational log-weight calculation (using [`pymbar`](https://github.com/choderalab/pymbar)). Here, this is limited to the canonical (NVT) ensemble **and without biasing restraints** in your simulation. To do this you'd need to have a file (say called `ene.dat`) that stores a 2D  (K X N) array of total potential energies, where,

  - K = total number of replicas = 16, and N = total number of frames in each replica trajectory (= 1000 / 20 = 50 in this case) 

  - `ene[k,n]` = energy from n-th frame of k-th replica.

###### Using the tool (description of the workflow)
Assume you have 16 processors at your disposal. When you run the following:

```bash
mpirun -np 16 python reorder_remd_traj.py peptide -logfn log.peptide -tfn temps.txt -ns 10 -nw 20 -np 1000 -ot 200 276 400 -logw -e ene.peptide -od ./output
```

1. First the temperature swap history file (`log.peptide` in this case) is read. This is done on one processor since it is usually fast.
2. Then the (compressed or otherwise) LAMMPS replica trajectories are read in parallel. So if you have less processors than replicas at this stage, it'll be slower.
3. Then using the frame ordering generated in (1), trajectory frames read in (2) are re-ordered and written to disk in parallel. Each processor writes one trajectory. So, If you request reordered trajectories for less temperatures (3 in this case) than the total number of temperatures (16), then 16-3 = 13 processors will be retired.
4. If you have further requested configurational log-weight calculation, then they will be done on a single processor since pymbar is pretty fast.
5. Finally you will have 3 LAMMPS trajectories of the form ``peptide.<temp>.lammpstrj.gz`` each with 1000 / 20 = 50 frames,  where `<temp>` = 200, 276, 400. If you request reordering at a temperature like say 280 K which is not present in the supplied temp schedule (as written in `temps.txt`), the closest temperature (276 K) will be chosen.

For more details, use the help menu generated by the tool by using:
python reorder_remd_traj.py -h

###### Caveats
- This tool crawls through the replica trajectories and creates index files that are hidden. These are called .byteind_`<replicanum>'.gz files. You may delete these if you want, but subsequent replica reads will be slow in that case.

- When writing trajectories to disk, the trajectories are first written to a buffer in memory, and then finally dumped all-at-once to the disk. While this makes the tool very fast, it can cause out-of-memory errors for very large trajectories. A useful feature might be to write to the buffer in batches and emptying to disk when some (predefined) max-buffer-size is exceeded.
+6531 −0

File added.

Preview size limit exceeded, changes collapsed.

+50 −0
Original line number Diff line number Diff line
# Solvated 5-mer peptide

#---------------------------------- 
# Taken as is from examples/peptide 

units		    real
atom_style	    full
boundary        p p p 

pair_style	    lj/charmm/coul/long 8.0 10.0 10.0
bond_style      harmonic
angle_style     charmm
dihedral_style  charmm
improper_style  harmonic
kspace_style	pppm 0.0001

read_data	    data.peptide

neighbor	    2.0 bin
neigh_modify	delay 5

timestep	    2.0
#----------------------------------


# temperature schedule for REMD
variable        idx world 0     1     2     3     4     5     6     7     8     9     10    11    12    13    14    15
variable        t   world 200.0 209.0 219.0 230.0 241.0 252.0 264.0 276.0 289.0 303.0 317.0 332.0 348.0 365.0 382.0 400.0

# thermostat
fix             thermostat all langevin $t $t 1000 772530

# log-file output before minimization
thermo_style	custom step temp ke pe
thermo		    20

# minimization
minimize        1e-4 0.0 1000 1000

# change logfile output after minimization
thermo_style	custom step temp pe
thermo		    20

# trajectory style
dump            myDump all atom 20 peptide.${idx}.lammpstrj.gz
dump_modify     myDump sort id scale no

# run REMD (for realistic results run for 100000000 steps with 10000 frequency) 
reset_timestep  0
temper          2000 10 $t thermostat 3847 5382
+30 −0
Original line number Diff line number Diff line
#!/usr/bin/env python

import os, sys, numpy as np

tempfn = os.path.abspath(sys.argv[1])
logfnprefix = os.path.abspath(sys.argv[2])

temps = np.loadtxt(tempfn)
ntemps = len(temps)
u_kn = []
start_token = 'Step Temp PotEng'
end_token = 'Loop time'

for i in range(ntemps):
    logfn = '%s.%d' % (logfnprefix, i)
    with open(logfn, 'r') as of:
        lines = of.readlines()
    
    # extract relevant lines from logfile
    start = [lines.index(line) for line in lines if line.startswith(start_token)][0]
    lines = lines[(start+1) : ]
    stop = [lines.index(line) for line in lines if line.startswith(end_token)][0]
    lines = lines[:stop]
    
    # store the potential energies
    pe = [float(line.strip().split()[-1]) for line in lines]
    u_kn.append(pe)
    
u_kn = np.array(u_kn)
np.savetxt('ene.peptide', u_kn, fmt = '%5.5f')
Loading