python tool to reorder replica traj (21699b89) · Commits · 郑智淋 / lammps

doc/src/Tools.txt

+17 −1

Original line number	Diff line number	Diff line
		@@ -78,7 +78,8 @@ Post-processing tools :h3
		"reax"_#reax_tool,
		"smd"_#smd,
		"spin"_#spin,
		"xmgrace"_#xmgrace :tb(c=6,ea=c,a=l)
		"xmgrace"_#xmgrace,
		"replica"_#replica :tb(c=6,ea=c,a=l)

		Miscellaneous tools :h3

		@@ -549,3 +550,18 @@ simulation.
		See the README file for details.

		These files were provided by Vikas Varshney (vv0210 at gmail.com)

		:line

		replica tool :h4,link(replica)

		The tools/replica directory contains the reorder_remd_traj python script which
		can be used to reorder the replica trajectories (resulting from the use of the
		temper command) according to temperature. This will produce discontinuous
		trajectories with all frames at the same temperature in each trajectory.
		Additional options can be used to calculate the canonical configurational
		log-weight for each frame at each temperature using the pymbar package. See
		the README.md file for further details. Try out the peptide example provided.

		This tool was written by Tanmoy Sanyal,
		while at the Shell lab at UC Santa Barbara. (tanmoy dot 7989 at gmail.com)

tools/replica/README.md

0 → 100644

+86 −0

Original line number	Diff line number	Diff line
		## reorder_remd_traj

		LAMMPS Replica Exchange Molecular Dynamics (REMD) trajectories (implemented using the temper command) are arranged by replica, i.e., each trajectory is a continuous replica that records all the ups and downs in temperature. However, often the requirement is that trajectories be continuous in temperature. This requires the LAMMPS REMD trajectories to be re-ordered, which LAMMPS does not do automatically. (see the discussion [here](https://lammps.sandia.gov/threads/msg60440.html)). The reorderLAMMPSREMD tool does exactly this in parallel (using MPI)

		(Protein folding trajectories in [Sanyal, Mittal and Shell, JPC, 2019, 151(4), 044111](https://aip.scitation.org/doi/abs/10.1063/1.5108761) were ordered in temperature space using this tool)

		#### Author

		Tanmoy Sanyal, Shell lab, UC Santa Barbara

		(currently at UC San Francisco)

		email: tanmoy dot 7989 at gmail.com

		#### Features

		- reorder LAMMPS REMD trajectories by temperature keeping only desired frames.
		Note: this only handles LAMMPS format trajectories (i.e., lammpstrj format)
		Trajectories can be gzipped or bz2-compressed. The trajectories are assumed to
		be named as \<prefix>\.%d.lammpstrj[.gz or .bz2]

		- (optionally) calculate configurational weights for each frame at each
		temperature if potential energies are supplied (only implemented for the canonical (NVT) ensemble)

		#### Dependencies

		[`mpi4py`](https://mpi4py.readthedocs.io/en/stable/)
		[`pymbar`](https://pymbar.readthedocs.io/en/master/) (for getting configurational weights)
		[`tqdm`](https://github.com/tqdm/tqdm) (for printing pretty progress bars)
		[`StringIO`](https://docs.python.org/2/library/stringio.html) (or [`io`](https://docs.python.org/3/library/io.html) if in Python 3.x)

		#### Example

		###### REMD Simulation specs
		Suppose you ran a REMD simulation for the peptide example using the CHARMM forcefield (see lammps/examples/peptide) in Lammps with the following settings:

		- number of replicas = 16
		- temperatures used (in K): 200 209 219 230 241 252 264 276 289 303 317 332 348 365 382 400 (i.e., exponentially distributed in the range 270-400 K)
		- timestep = 2 fs
		- total number of timesteps simulated using temper = 2000 (i.e. 4 ps)
		- swap frequency = temperatures swapped after every this many steps = `ns` = 10 (i.e. 20 fs)
		- write frequency = trajectory frame written to disk after this many steps (using the dump command) = `nw` = 20 (i.e. 40 fs)

		###### LAMMPS output
		So, when the dust settles,

		- You'll have 16 replica trajectories. For this tool to work, each replica traj must be named: `<prefix>.<n>.lammpstrj[.gz or .bz2]`, where,
		- `prefix` = some common prefix for all your trajectories and (say it is called "peptide")`
		- `n` = replica number (0-15 in this case). Note: trajectories must be in default LAMMPS format (so stuff like dcd won't work)

		- You will also have a master LAMMPS log file (`logfn`) that contains the swap history of all the replicas
		(for more details see [here](https://lammps.sandia.gov/doc/temper.html). Assume that this is called `log.peptide`

		- Further you must have a txt file that numpy can read which stores all the temperature values (say this is called `temps.txt`)

		###### Your desired output
		- The total number of timesteps you want consider as production (i.e. after equilbration) = 1000 (i.e. last 2 ps)

		- Reordered trajectories at temperatures 200 K, 276 K, 400 K

		- Configurational log-weight calculation (using [`pymbar`](https://github.com/choderalab/pymbar)). Here, this is limited to the canonical (NVT) ensemble and without biasing restraints in your simulation. To do this you'd need to have a file (say called `ene.dat`) that stores a 2D (K X N) array of total potential energies, where,

		- K = total number of replicas = 16, and N = total number of frames in each replica trajectory (= 1000 / 20 = 50 in this case)

		- `ene[k,n]` = energy from n-th frame of k-th replica.

		###### Using the tool (description of the workflow)
		Assume you have 16 processors at your disposal. When you run the following:

		```bash
		mpirun -np 16 python reorder_remd_traj.py peptide -logfn log.peptide -tfn temps.txt -ns 10 -nw 20 -np 1000 -ot 200 276 400 -logw -e ene.peptide -od ./output
		```

		1. First the temperature swap history file (`log.peptide` in this case) is read. This is done on one processor since it is usually fast.
		2. Then the (compressed or otherwise) LAMMPS replica trajectories are read in parallel. So if you have less processors than replicas at this stage, it'll be slower.
		3. Then using the frame ordering generated in (1), trajectory frames read in (2) are re-ordered and written to disk in parallel. Each processor writes one trajectory. So, If you request reordered trajectories for less temperatures (3 in this case) than the total number of temperatures (16), then 16-3 = 13 processors will be retired.
		4. If you have further requested configurational log-weight calculation, then they will be done on a single processor since pymbar is pretty fast.
		5. Finally you will have 3 LAMMPS trajectories of the form ``peptide.<temp>.lammpstrj.gz`` each with 1000 / 20 = 50 frames, where `<temp>` = 200, 276, 400. If you request reordering at a temperature like say 280 K which is not present in the supplied temp schedule (as written in `temps.txt`), the closest temperature (276 K) will be chosen.

		For more details, use the help menu generated by the tool by using:
		python reorder_remd_traj.py -h

		###### Caveats
		- This tool crawls through the replica trajectories and creates index files that are hidden. These are called .byteind_`<replicanum>'.gz files. You may delete these if you want, but subsequent replica reads will be slow in that case.

		- When writing trajectories to disk, the trajectories are first written to a buffer in memory, and then finally dumped all-at-once to the disk. While this makes the tool very fast, it can cause out-of-memory errors for very large trajectories. A useful feature might be to write to the buffer in batches and emptying to disk when some (predefined) max-buffer-size is exceeded.

tools/replica/example/data.peptide

0 → 100644

+6531 −0

File added.

Preview size limit exceeded, changes collapsed.

tools/replica/example/in.peptide

0 → 100644

+50 −0

Original line number	Diff line number	Diff line
		# Solvated 5-mer peptide

		#----------------------------------
		# Taken as is from examples/peptide

		units real
		atom_style full
		boundary p p p

		pair_style lj/charmm/coul/long 8.0 10.0 10.0
		bond_style harmonic
		angle_style charmm
		dihedral_style charmm
		improper_style harmonic
		kspace_style pppm 0.0001

		read_data data.peptide

		neighbor 2.0 bin
		neigh_modify delay 5

		timestep 2.0
		#----------------------------------


		# temperature schedule for REMD
		variable idx world 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
		variable t world 200.0 209.0 219.0 230.0 241.0 252.0 264.0 276.0 289.0 303.0 317.0 332.0 348.0 365.0 382.0 400.0

		# thermostat
		fix thermostat all langevin $t $t 1000 772530

		# log-file output before minimization
		thermo_style custom step temp ke pe
		thermo 20

		# minimization
		minimize 1e-4 0.0 1000 1000

		# change logfile output after minimization
		thermo_style custom step temp pe
		thermo 20

		# trajectory style
		dump myDump all atom 20 peptide.${idx}.lammpstrj.gz
		dump_modify myDump sort id scale no

		# run REMD (for realistic results run for 100000000 steps with 10000 frequency)
		reset_timestep 0
		temper 2000 10 $t thermostat 3847 5382

tools/replica/example/parse_ene.py

0 → 100644

+30 −0

Original line number	Diff line number	Diff line
		#!/usr/bin/env python

		import os, sys, numpy as np

		tempfn = os.path.abspath(sys.argv[1])
		logfnprefix = os.path.abspath(sys.argv[2])

		temps = np.loadtxt(tempfn)
		ntemps = len(temps)
		u_kn = []
		start_token = 'Step Temp PotEng'
		end_token = 'Loop time'

		for i in range(ntemps):
		logfn = '%s.%d' % (logfnprefix, i)
		with open(logfn, 'r') as of:
		lines = of.readlines()

		# extract relevant lines from logfile
		start = [lines.index(line) for line in lines if line.startswith(start_token)][0]
		lines = lines[(start+1) : ]
		stop = [lines.index(line) for line in lines if line.startswith(end_token)][0]
		lines = lines[:stop]

		# store the potential energies
		pe = [float(line.strip().split()[-1]) for line in lines]
		u_kn.append(pe)

		u_kn = np.array(u_kn)
		np.savetxt('ene.peptide', u_kn, fmt = '%5.5f')

Admin message