Commit 3cd597e9 authored by sjplimp, committed by GitHub

Merge pull request #585 from akohlmey/make-py-manual-cleanup

Make.py removal and manual cleanup
parents eca61226 bdd2f3a6
+7 −59
These are input scripts used to run versions of several of the
-benchmarks in the top-level bench directory using the GPU and
-USER-CUDA accelerator packages.  The results of running these scripts
-on two different machines (a desktop with 2 Tesla GPUs and the ORNL
-Titan supercomputer) are shown on the "GPU (Fermi)" section of the
-Benchmark page of the LAMMPS WWW site: lammps.sandia.gov/bench.
+benchmarks in the top-level bench directory using the GPU accelerator
+package.  The results of running these scripts on two different machines
+(a desktop with 2 Tesla GPUs and the ORNL Titan supercomputer) are shown
+on the "GPU (Fermi)" section of the Benchmark page of the LAMMPS WWW
+site: lammps.sandia.gov/bench.

Examples are shown below of how to run these scripts.  This assumes
-you have built 3 executables with both the GPU and USER-CUDA packages
+you have built 3 executables with the GPU package
installed, e.g.

lmp_linux_single
lmp_linux_mixed
lmp_linux_double

-The precision (single, mixed, double) refers to the GPU and USER-CUDA
-package precision.  See the README files in the lib/gpu and lib/cuda
-directories for instructions on how to build the packages with
-different precisions.  The GPU and USER-CUDA sub-sections of the
-doc/Section_accelerate.html file also describes this process.

-Make.py -d ~/lammps -j 16 -p #all orig -m linux -o cpu -a exe
-Make.py -d ~/lammps -j 16 -p #all opt orig -m linux -o opt -a exe
-Make.py -d ~/lammps -j 16 -p #all omp orig -m linux -o omp -a exe
-Make.py -d ~/lammps -j 16 -p #all gpu orig -m linux \
-        -gpu mode=double arch=20 -o gpu_double -a libs exe
-Make.py -d ~/lammps -j 16 -p #all gpu orig -m linux \
-        -gpu mode=mixed arch=20 -o gpu_mixed -a libs exe
-Make.py -d ~/lammps -j 16 -p #all gpu orig -m linux \
-        -gpu mode=single arch=20 -o gpu_single -a libs exe
-Make.py -d ~/lammps -j 16 -p #all cuda orig -m linux \
-        -cuda mode=double arch=20 -o cuda_double -a libs exe
-Make.py -d ~/lammps -j 16 -p #all cuda orig -m linux \
-        -cuda mode=mixed arch=20 -o cuda_mixed -a libs exe
-Make.py -d ~/lammps -j 16 -p #all cuda orig -m linux \
-        -cuda mode=single arch=20 -o cuda_single -a libs exe
-Make.py -d ~/lammps -j 16 -p #all intel orig -m linux -o intel_cpu -a exe
-Make.py -d ~/lammps -j 16 -p #all kokkos orig -m linux -o kokkos_omp -a exe
-Make.py -d ~/lammps -j 16 -p #all kokkos orig -kokkos cuda arch=20 \
-        -m cuda -o kokkos_cuda -a exe
-
-Make.py -d ~/lammps -j 16 -p #all opt omp gpu cuda intel kokkos orig \
-        -gpu mode=double arch=20 -cuda mode=double arch=20 -m linux \
-        -o all -a libs exe
-
-Make.py -d ~/lammps -j 16 -p #all opt omp gpu cuda intel kokkos orig \
-        -kokkos cuda arch=20 -gpu mode=double arch=20 \
-        -cuda mode=double arch=20 -m cuda -o all_cuda -a libs exe
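For reference, a rough conventional-make equivalent of the double-precision
GPU build above could look like the sketch below; the lib/gpu Makefile name
and the CUDA_ARCH/CUDA_PRECISION settings are assumptions, so check
lib/gpu/README for the values your machine needs:

cd lammps/lib/gpu
# edit Makefile.linux first: set CUDA_ARCH for your card and
# CUDA_PRECISION (e.g. -D_DOUBLE_DOUBLE) for the desired precision
make -f Makefile.linux
cd ../../src
make yes-gpu
make mpi          # then rename the resulting lmp_mpi, e.g. to lmp_linux_double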

------------------------------------------------------------------------

-To run on just CPUs (without using the GPU or USER-CUDA styles),
+To run on just CPUs (without using the GPU styles),
do something like the following:

mpirun -np 1 lmp_linux_double -v x 8 -v y 8 -v z 8 -v t 100 < in.lj
@@ -81,23 +47,5 @@ node via a "-ppn" setting.

------------------------------------------------------------------------

-To run with the USER-CUDA package, do something like the following:
-
-mpirun -np 1 lmp_linux_single -c on -sf cuda -v x 16 -v y 16 -v z 16 -v t 100 < in.lj
-mpirun -np 2 lmp_linux_double -c on -sf cuda -pk cuda 2 -v x 32 -v y 64 -v z 64 -v t 100 < in.eam
-
-The "xyz" settings determine the problem size.  The "t" setting
-determines the number of timesteps.  The "np" setting determines how
-many MPI tasks (per node) the problem will run on.  The numeric
-argument to the "-pk" setting is the number of GPUs (per node); 1 GPU
-is the default.  Note that the number of MPI tasks must equal the
-number of GPUs (both per node) with the USER-CUDA package.
-
-These mpirun commands run on a single node.  To run on multiple nodes,
-scale up the "-np" setting, and control the number of MPI tasks per
-node via a "-ppn" setting.

------------------------------------------------------------------------

If the script has "titan" in its name, it was run on the Titan
supercomputer at ORNL.
+15 −31
@@ -71,49 +71,33 @@ integration

----------------------------------------------------------------------

-Here is a src/Make.py command which will perform a parallel build of a
-LAMMPS executable "lmp_mpi" with all the packages needed by all the
-examples.  This assumes you have an MPI installed on your machine so
-that "mpicxx" can be used as the wrapper compiler.  It also assumes
-you have an Intel compiler to use as the base compiler.  You can leave
-off the "-cc mpi wrap=icc" switch if that is not the case.  You can
-also leave off the "-fft fftw3" switch if you do not have the FFTW
-(v3) installed as an FFT package, in which case the default KISS FFT
-library will be used.
-
-cd src
-Make.py -j 16 -p none molecule manybody kspace granular rigid orig \
-  -cc mpi wrap=icc -fft fftw3 -a file mpi
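A hedged sketch of the same build done with the conventional in-tree make;
the compiler wrapper and FFTW choices are set by editing
src/MAKE/Makefile.mpi (e.g. FFT_INC = -DFFT_FFTW3) rather than on the
command line:

cd src
make yes-molecule yes-manybody yes-kspace yes-granular yes-rigid
make -j 16 mpi          # builds the lmp_mpi executable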

----------------------------------------------------------------------

Here is how to run each problem, assuming the LAMMPS executable is
named lmp_mpi, and you are using the mpirun command to launch parallel
runs:

Serial (one processor runs):

-lmp_mpi < in.lj
-lmp_mpi < in.chain
-lmp_mpi < in.eam
-lmp_mpi < in.chute
-lmp_mpi < in.rhodo
+lmp_mpi -in in.lj
+lmp_mpi -in in.chain
+lmp_mpi -in in.eam
+lmp_mpi -in in.chute
+lmp_mpi -in in.rhodo

Parallel fixed-size runs (on 8 procs in this case):

-mpirun -np 8 lmp_mpi < in.lj
-mpirun -np 8 lmp_mpi < in.chain
-mpirun -np 8 lmp_mpi < in.eam
-mpirun -np 8 lmp_mpi < in.chute
-mpirun -np 8 lmp_mpi < in.rhodo
+mpirun -np 8 lmp_mpi -in in.lj
+mpirun -np 8 lmp_mpi -in in.chain
+mpirun -np 8 lmp_mpi -in in.eam
+mpirun -np 8 lmp_mpi -in in.chute
+mpirun -np 8 lmp_mpi -in in.rhodo

Parallel scaled-size runs (on 16 procs in this case):

-mpirun -np 16 lmp_mpi -var x 2 -var y 2 -var z 4 < in.lj
-mpirun -np 16 lmp_mpi -var x 2 -var y 2 -var z 4 < in.chain.scaled
-mpirun -np 16 lmp_mpi -var x 2 -var y 2 -var z 4 < in.eam
-mpirun -np 16 lmp_mpi -var x 4 -var y 4 < in.chute.scaled
-mpirun -np 16 lmp_mpi -var x 2 -var y 2 -var z 4 < in.rhodo.scaled
+mpirun -np 16 lmp_mpi -var x 2 -var y 2 -var z 4 -in in.lj
+mpirun -np 16 lmp_mpi -var x 2 -var y 2 -var z 4 -in in.chain.scaled
+mpirun -np 16 lmp_mpi -var x 2 -var y 2 -var z 4 -in in.eam
+mpirun -np 16 lmp_mpi -var x 4 -var y 4 -in in.chute.scaled
+mpirun -np 16 lmp_mpi -var x 2 -var y 2 -var z 4 -in in.rhodo.scaled

For each of the scaled-size runs you must set 3 variables as -var
command line switches.  The variables x,y,z are used in the input
+0 −1
@@ -261,7 +261,6 @@ END_RST -->
:link(start_6,Section_start.html#start_6)
:link(start_7,Section_start.html#start_7)
:link(start_8,Section_start.html#start_8)
-:link(start_9,Section_start.html#start_9)

:link(cmd_1,Section_commands.html#cmd_1)
:link(cmd_2,Section_commands.html#cmd_2)
+5 −5
@@ -56,7 +56,7 @@ timings; you can simply extrapolate from short runs.

For the set of runs, look at the timing data printed to the screen and
log file at the end of each LAMMPS run.  "This
section"_Section_start.html#start_8 of the manual has an overview.
section"_Section_start.html#start_7 of the manual has an overview.

Running on one (or a few processors) should give a good estimate of
the serial performance and what portions of the timestep are taking
@@ -226,16 +226,16 @@ re-build LAMMPS |
  make machine |
prepare and test a regular LAMMPS simulation |
  lmp_machine -in in.script; mpirun -np 32 lmp_machine -in in.script |
-enable specific accelerator support via '-k on' "command-line switch"_Section_start.html#start_7, |
+enable specific accelerator support via '-k on' "command-line switch"_Section_start.html#start_6, |
  only needed for KOKKOS package |
-set any needed options for the package via "-pk" "command-line switch"_Section_start.html#start_7 or "package"_package.html command, |
+set any needed options for the package via "-pk" "command-line switch"_Section_start.html#start_6 or "package"_package.html command, |
  only if defaults need to be changed |
-use accelerated styles in your input via "-sf" "command-line switch"_Section_start.html#start_7 or "suffix"_suffix.html command | lmp_machine -in in.script -sf gpu
+use accelerated styles in your input via "-sf" "command-line switch"_Section_start.html#start_6 or "suffix"_suffix.html command | lmp_machine -in in.script -sf gpu
:tb(c=2,s=|)
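Put together, the last rows of the table amount to a single launch line.
For example (processor and GPU counts are only placeholders; the second
line shows the KOKKOS variant with its "-k on" switch):

mpirun -np 8 lmp_machine -sf gpu -pk gpu 2 -in in.script
mpirun -np 2 lmp_machine -k on g 2 -sf kk -in in.script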

Note that the first 4 steps can be done as a single command, using the
src/Make.py tool.  This tool is discussed in "Section
2.4"_Section_start.html#start_4 of the manual, and its use is
4"_Section_packages.html of the manual, and its use is
illustrated in the individual accelerator sections.  Typically these
steps only need to be done once, to create an executable that uses one
or more accelerator packages.
+1 −1
@@ -71,7 +71,7 @@ style", with ... being fix, compute, pair, etc, it means that you
mistyped the style name or that the command is part of an optional
package which was not compiled into your executable.  The list of
available styles in your executable can be listed by using "the -h
command-line argument"_Section_start.html#start_7.  The installation
command-line argument"_Section_start.html#start_6.  The installation
and compilation of optional packages is explained in the "installation
instructions"_Section_start.html#start_3.
