Merge pull request #1660 from tanmoy7989/reorder_remd_traj (572235e6)

doc/src/Tools.txt

+18 −1

		@@ -76,6 +76,7 @@ Post-processing tools :h3
		"pymol_asphere"_#pymol,
		"python"_#pythontools,
		"reax"_#reax_tool,
		"replica"_#replica,
		"smd"_#smd,
		"spin"_#spin,
		"xmgrace"_#xmgrace :tb(c=6,ea=c,a=l)
		@@ -485,6 +486,21 @@ README for more info on Pizza.py and how to use these scripts.

		:line

		replica tool :h4,link(replica)

		The tools/replica directory contains the reorder_remd_traj Python script, which
		can be used to reorder the replica trajectories (resulting from the use of the
		temper command) according to temperature. This will produce discontinuous
		trajectories with all frames at the same temperature in each trajectory.
		Additional options can be used to calculate the canonical configurational
		log-weight for each frame at each temperature using the pymbar package. See
		the README.md file for further details. Try out the peptide example provided.

		This tool was written by (and is maintained by) Tanmoy Sanyal,
		while at the Shell lab at UC Santa Barbara. (tanmoy dot 7989 at gmail.com)

		:line

		reax tool :h4,link(reax_tool)

		The reax sub-directory contains stand-alone codes that can
		@@ -549,3 +565,4 @@ simulation.
		See the README file for details.

		These files were provided by Vikas Varshney (vv0210 at gmail.com)

doc/src/temper.txt

+7 −1

		@@ -110,7 +110,13 @@ the information from the log.lammps file. E.g. you could produce one
		dump file with snapshots at 300K (from all replicas), another with
		snapshots at 310K, etc. Note that these new dump files will not
		contain "continuous trajectories" for individual atoms, because two
		successive snapshots (in time) may be from different replicas. The
		reorder_remd_traj Python script can do the reordering for you
		(and can additionally calculate configurational log-weights of
		trajectory snapshots in the canonical ensemble). The script can be found
		in the tools/replica directory, while instructions on how to use it are
		available in doc/Tools (in brief) and as a README file in tools/replica
		(in detail).

		The last argument {index} in the temper command is optional and is
		used when restarting a tempering run from a set of restart files (one

doc/utils/sphinx-config/false_positives.txt

+5 −0

		@@ -2238,6 +2238,7 @@ Py
		pydir
		pylammps
		PyLammps
		pymbar
		pymodule
		pymol
		pypar
		@@ -2324,6 +2325,7 @@ reinit
		relink
		relTol
		remappings
		remd
		Ren
		Rendon
		reneighbor
		@@ -2455,6 +2457,7 @@ Sandia
		sandybrown
		Sanitizer
		sanitizers
		Sanyal
		sc
		scafacos
		SCAFACOS
		@@ -2688,6 +2691,8 @@ Tajkhorshid
		Tamaskovics
		Tanaka
		tanh
		tanmoy
		Tanmoy
		Tartakovsky
		taskset
		taubi

tools/README

+1 −0

		@@ -38,6 +38,7 @@ polybond Python tool for programmable polymer bonding
		pymol_asphere convert LAMMPS output of ellipsoids to PyMol format
		python Python scripts for post-processing LAMMPS output
		reax Tools for analyzing output of ReaxFF simulations
		replica tool to reorder LAMMPS replica trajectories according to temperature
		smd convert Smooth Mach Dynamics triangles to VTK
		spin perform a cubic polynomial interpolation of a GNEB MEP
		vim add-ons to VIM editor for editing LAMMPS input scripts

tools/replica/README.md

0 → 100644

+86 −0

		## reorder_remd_traj

		LAMMPS Replica Exchange Molecular Dynamics (REMD) trajectories (implemented using the temper command) are arranged by replica, i.e., each trajectory follows one replica continuously and records all of its ups and downs in temperature. Often, however, the requirement is that trajectories be continuous in temperature. This requires the LAMMPS REMD trajectories to be reordered, which LAMMPS does not do automatically (see the discussion [here](https://lammps.sandia.gov/threads/msg60440.html)). The reorder_remd_traj tool does exactly this, in parallel (using MPI).

		(Protein folding trajectories in [Sanyal, Mittal and Shell, J. Chem. Phys., 2019, 151(4), 044111](https://aip.scitation.org/doi/abs/10.1063/1.5108761) were ordered in temperature space using this tool.)

		#### Author

		Tanmoy Sanyal, Shell lab, UC Santa Barbara

		(currently at UC San Francisco)

		email: tanmoy dot 7989 at gmail.com

		#### Features

		- Reorder LAMMPS REMD trajectories by temperature, keeping only the desired frames.
		Note: this only handles LAMMPS-format trajectories (i.e., lammpstrj format).
		Trajectories can be gzipped or bz2-compressed. The trajectories are assumed to
		be named `<prefix>.%d.lammpstrj[.gz or .bz2]`.

		- (optionally) calculate configurational weights for each frame at each
		temperature if potential energies are supplied (only implemented for the canonical (NVT) ensemble)
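
The naming convention above can be handled with the standard library alone. The following is a minimal sketch, not the tool's own code: the helper name `find_and_open` and the search order (plain, then `.gz`, then `.bz2`) are assumptions made for illustration.

```python
import bz2
import gzip
import os
import tempfile


def find_and_open(prefix, replica, directory="."):
    """Open <prefix>.<replica>.lammpstrj[.gz|.bz2] in text mode, whichever exists.

    Hypothetical helper: the real tool may locate and open files differently.
    """
    base = os.path.join(directory, "{}.{:d}.lammpstrj".format(prefix, replica))
    openers = (("", open), (".gz", gzip.open), (".bz2", bz2.open))
    for suffix, opener in openers:
        path = base + suffix
        if os.path.exists(path):
            return opener(path, "rt")
    raise IOError("no trajectory found for " + base)


# Self-check with a throwaway gzipped file (the contents are placeholders).
workdir = tempfile.mkdtemp()
with gzip.open(os.path.join(workdir, "peptide.3.lammpstrj.gz"), "wt") as fh:
    fh.write("ITEM: TIMESTEP\n0\n")
with find_and_open("peptide", 3, workdir) as fh:
    first_line = fh.readline().strip()
print(first_line)
```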

		#### Dependencies

		[`mpi4py`](https://mpi4py.readthedocs.io/en/stable/)
		[`pymbar`](https://pymbar.readthedocs.io/en/master/) (for getting configurational weights)
		[`tqdm`](https://github.com/tqdm/tqdm) (for printing pretty progress bars)
		[`StringIO`](https://docs.python.org/2/library/stringio.html) (or [`io`](https://docs.python.org/3/library/io.html) if in Python 3.x)

		#### Example

		###### REMD Simulation specs
		Suppose you ran a REMD simulation for the peptide example using the CHARMM forcefield (see lammps/examples/peptide) in LAMMPS with the following settings:

		- number of replicas = 16
		- temperatures used (in K): 200 209 219 230 241 252 264 276 289 303 317 332 348 365 382 400 (i.e., exponentially distributed in the range 200-400 K)
		- timestep = 2 fs
		- total number of timesteps simulated using temper = 2000 (i.e. 4 ps)
		- swap frequency = number of steps between attempted temperature swaps = `ns` = 10 (i.e. 20 fs)
		- write frequency = number of steps between trajectory frames written to disk (using the dump command) = `nw` = 20 (i.e. 40 fs)
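
As a quick check on the temperature schedule listed above, an exponentially (geometrically) spaced ladder between the lowest and highest temperatures can be generated as follows. This is a minimal sketch; the function name `geometric_ladder` and the rounding to whole kelvin are illustrative choices, not part of the tool.

```python
def geometric_ladder(t_min, t_max, n_replicas):
    """Return n_replicas temperatures spaced by a constant ratio, rounded to whole K."""
    ratio = (t_max / t_min) ** (1.0 / (n_replicas - 1))
    return [int(round(t_min * ratio ** i)) for i in range(n_replicas)]


ladder = geometric_ladder(200.0, 400.0, 16)
print(" ".join(str(t) for t in ladder))
```

The resulting values could then be written, whitespace-separated, to a plain text file to serve as the temperature file the tool reads.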

		###### LAMMPS output
		So, when the dust settles,

		- You'll have 16 replica trajectories. For this tool to work, each replica trajectory must be named `<prefix>.<n>.lammpstrj[.gz or .bz2]`, where
		  - `prefix` = some common prefix for all your trajectories (say it is called "peptide")
		  - `n` = replica number (0-15 in this case). Note: trajectories must be in the default LAMMPS format (so formats like dcd won't work)

		- You will also have a master LAMMPS log file (`logfn`) that contains the swap history of all the replicas
		(for more details see [here](https://lammps.sandia.gov/doc/temper.html)). Assume that this is called `log.peptide`.

		- Further you must have a txt file that numpy can read which stores all the temperature values (say this is called `temps.txt`)

		###### Your desired output
		- The total number of timesteps you want to consider as production (i.e. after equilibration) = 1000 (i.e. last 2 ps)

		- Reordered trajectories at temperatures 200 K, 276 K, 400 K

		- Configurational log-weight calculation (using [`pymbar`](https://github.com/choderalab/pymbar)). Here, this is limited to the canonical (NVT) ensemble and without biasing restraints in your simulation. To do this you'd need to have a file (say called `ene.dat`) that stores a 2D (K X N) array of total potential energies, where,

		  - K = total number of replicas = 16, and N = total number of frames in each replica trajectory (= 1000 / 20 = 50 in this case)

		  - `ene[k,n]` = energy from n-th frame of k-th replica.
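
The K x N bookkeeping above can be checked before running the tool. The sketch below assumes the energy file is whitespace-delimited text with one row per replica; that layout, and the helper names `expected_shape` and `parse_energy_table`, are assumptions for illustration only. The zeros are placeholders, not real energies.

```python
def expected_shape(n_replicas, n_prod_steps, write_freq):
    """Return (K, N) for the energy table: replicas x production frames."""
    if n_prod_steps % write_freq != 0:
        raise ValueError("production steps must be a multiple of the write frequency")
    return n_replicas, n_prod_steps // write_freq


def parse_energy_table(text, shape):
    """Parse whitespace-delimited rows (one per replica) and check the shape."""
    table = [[float(tok) for tok in line.split()]
             for line in text.splitlines() if line.strip()]
    n_rows, n_cols = shape
    if len(table) != n_rows or any(len(row) != n_cols for row in table):
        raise ValueError("energy table does not have shape {} x {}".format(n_rows, n_cols))
    return table


shape = expected_shape(n_replicas=16, n_prod_steps=1000, write_freq=20)
# Placeholder zeros stand in for real potential energies.
placeholder = "\n".join(" ".join("0.0" for _ in range(shape[1]))
                        for _ in range(shape[0]))
ene = parse_energy_table(placeholder, shape)
print(shape)
```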

		###### Using the tool (description of the workflow)
		Assume you have 16 processors at your disposal. When you run the following:

		```bash
		mpirun -np 16 python reorder_remd_traj.py peptide -logfn log.peptide -tfn temps.txt -ns 10 -nw 20 -np 1000 -ot 200 276 400 -logw -e ene.dat -od ./output
		```

		1. First, the temperature swap history file (`log.peptide` in this case) is read. This is done on one processor since it is usually fast.
		2. Then the (compressed or otherwise) LAMMPS replica trajectories are read in parallel. If you have fewer processors than replicas, this stage will be slower.
		3. Then, using the frame ordering generated in (1), the trajectory frames read in (2) are reordered and written to disk in parallel. Each processor writes one trajectory, so if you request reordered trajectories for fewer temperatures (3 in this case) than the total number of temperatures (16), then 16 - 3 = 13 processors will have no trajectory to write.
		4. If you have also requested the configurational log-weight calculation, it is done on a single processor since pymbar is pretty fast.
		5. Finally, you will have 3 LAMMPS trajectories of the form `peptide.<temp>.lammpstrj.gz`, each with 1000 / 20 = 50 frames, where `<temp>` = 200, 276, 400. If you request reordering at a temperature such as 280 K that is not present in the supplied temperature schedule (as written in `temps.txt`), the closest temperature (276 K) will be chosen.
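
The core of steps 1 and 3, and the closest-temperature rule in step 5, can be sketched in a few lines. This is an illustration, not the tool's implementation. It assumes the swap history has already been parsed into a mapping from timestep to a list whose entry `r` is the temperature index held by replica `r` at that step. The function names and the toy three-replica history are hypothetical.

```python
def nearest_temperature_index(temps, requested):
    """Index of the schedule temperature closest to the requested one."""
    return min(range(len(temps)), key=lambda i: abs(temps[i] - requested))


def frames_at_temperature(swap_history, temp_index, write_freq):
    """For each written step, find which replica held the given temperature.

    swap_history maps step -> list where entry r is the temperature index
    of replica r at that step (assumed layout of the parsed log).
    """
    sources = []
    for step in sorted(swap_history):
        if step % write_freq == 0:  # only steps at which a frame was dumped
            replica = swap_history[step].index(temp_index)
            sources.append((step, replica))
    return sources


temps = [200, 209, 219, 230, 241, 252, 264, 276, 289, 303, 317, 332, 348, 365, 382, 400]
chosen = temps[nearest_temperature_index(temps, 280)]

# Toy history for 3 replicas, swaps every 10 steps, frames every 20 steps.
toy_history = {0: [0, 1, 2], 10: [1, 0, 2], 20: [1, 2, 0], 30: [2, 1, 0], 40: [2, 1, 0]}
lowest_temp_frames = frames_at_temperature(toy_history, temp_index=0, write_freq=20)
print(chosen, lowest_temp_frames)
```

Each `(step, replica)` pair says which replica trajectory supplies the frame at that step; concatenating those frames in step order gives a trajectory that is continuous in temperature but discontinuous in time for individual atoms.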

		For more details, use the help menu generated by the tool: `python reorder_remd_traj.py -h`

		###### Caveats
		- This tool crawls through the replica trajectories and creates hidden index files, named `.byteind_<replicanum>.gz`. You may delete these if you want, but subsequent replica reads will then be slow.

		- When writing trajectories to disk, the trajectories are first written to a buffer in memory and then dumped all at once to disk. While this makes the tool very fast, it can cause out-of-memory errors for very large trajectories. A useful feature might be to write to the buffer in batches and empty it to disk whenever some (predefined) maximum buffer size is exceeded.
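
The batched-buffer idea in the last caveat might look like the sketch below. This is not a feature of the tool: the class name `BatchedWriter`, the `max_chars` threshold, and the placeholder frames are all hypothetical.

```python
import io
import os
import tempfile


class BatchedWriter:
    """Accumulate frames in memory and flush to disk once a size limit is hit."""

    def __init__(self, path, max_chars=1000000):
        self.path = path
        self.max_chars = max_chars
        self._buffer = io.StringIO()
        self._size = 0
        self.flush_count = 0
        open(path, "w").close()  # start from an empty file

    def write(self, frame_text):
        self._buffer.write(frame_text)
        self._size += len(frame_text)
        if self._size >= self.max_chars:
            self.flush()

    def flush(self):
        if self._size == 0:
            return
        with open(self.path, "a") as fh:
            fh.write(self._buffer.getvalue())
        self._buffer = io.StringIO()  # drop the flushed data from memory
        self._size = 0
        self.flush_count += 1


# Write five placeholder frames with a deliberately tiny buffer limit.
out_path = os.path.join(tempfile.mkdtemp(), "peptide.200.lammpstrj")
frames = ["ITEM: TIMESTEP\n{}\n".format(n) for n in range(5)]
writer = BatchedWriter(out_path, max_chars=40)
for frame in frames:
    writer.write(frame)
writer.flush()  # write whatever is left in the buffer
with open(out_path) as fh:
    written = fh.read()
```

The trade-off is more, smaller disk writes in exchange for a bounded memory footprint; the threshold would need tuning against frame size.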
