Commit 073f0034 authored by Stan Moore

Doc tweak

parent 618547b7
+16 −14
@@ -46,7 +46,7 @@ software version 7.5 or later must be installed on your system. See
the discussion for the "GPU package"_Speed_gpu.html for details of how
to check and do this.

NOTE: Kokkos with CUDA currently implicitly assumes, that the MPI
NOTE: Kokkos with CUDA currently implicitly assumes that the MPI
library is CUDA-aware and has support for GPU-direct. This is not
always the case, especially when using pre-compiled MPI libraries
provided by a Linux distribution. This is not a problem when using
@@ -207,19 +207,21 @@ supports.

[Running on GPUs:]

Use the "-k" "command-line switch"_Run_options.html to
specify the number of GPUs per node. Typically the -np setting of the
mpirun command should set the number of MPI tasks/node to be equal to
the number of physical GPUs on the node.  You can assign multiple MPI
tasks to the same GPU with the KOKKOS package, but this is usually
only faster if significant portions of the input script have not
been ported to use Kokkos. Using CUDA MPS is recommended in this
scenario. Using a CUDA-aware MPI library with support for GPU-direct
is highly recommended. GPU-direct use can be avoided by using
"-pk kokkos gpu/direct no"_package.html.
As above for multi-core CPUs (and no GPU), if N is the number of
physical cores/node, then the number of MPI tasks/node should not
exceed N.
Use the "-k" "command-line switch"_Run_options.html to specify the 
number of GPUs per node. Typically the -np setting of the mpirun command 
should set the number of MPI tasks/node to be equal to the number of 
physical GPUs on the node. You can assign multiple MPI tasks to the same 
GPU with the KOKKOS package, but this is usually only faster if some 
portions of the input script have not been ported to use Kokkos. In this 
case, packing/unpacking communication buffers on the host may also give a 
speedup (see the KOKKOS "package"_package.html command). Using CUDA MPS 
is recommended in this scenario.

Using a CUDA-aware MPI library with 
support for GPU-direct is highly recommended. GPU-direct use can be 
avoided by using "-pk kokkos gpu/direct no"_package.html. As above for 
multi-core CPUs (and no GPU), if N is the number of physical cores/node, 
then the number of MPI tasks/node should not exceed N.

-k on g Ng :pre
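As a concrete sketch (the executable name lmp_kokkos_cuda_mpi and the input 
script in.lj are assumptions for illustration, not part of the documentation 
above), a run on a single node with 2 GPUs and one MPI task per GPU might 
look like:

mpirun -np 2 lmp_kokkos_cuda_mpi -k on g 2 -sf kk -in in.lj :pre

Here -sf kk applies the KOKKOS suffix so that supported styles use their 
Kokkos variants (see the "command-line switch"_Run_options.html doc).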

+14 −13
@@ -513,14 +513,14 @@ identically. When using GPUs, the {device} value is the default since it
will typically be optimal if all of your styles used in your input 
script are supported by the KOKKOS package. In this case data can stay 
on the GPU for many timesteps without being moved between the host and 
GPU, if you use the {device} value. This requires that your MPI is able 
to access GPU memory directly. Currently that is true for OpenMPI 1.8 
(or later versions), Mvapich2 1.9 (or later), and CrayMPI. If your 
script uses styles (e.g. fixes) which are not yet supported by the 
KOKKOS package, then data has to be move between the host and device 
anyway, so it is typically faster to let the host handle communication, 
by using the {host} value. Using {host} instead of {no} will enable use 
of multiple threads to pack/unpack communicated data. 
GPU, if you use the {device} value. If your script uses styles (e.g. 
fixes) which are not yet supported by the KOKKOS package, then data has 
to be moved between the host and device anyway, so it is typically faster 
to let the host handle communication by using the {host} value. Using 
{host} instead of {no} will enable the use of multiple threads to 
pack/unpack communicated data. When running small systems on a GPU, 
performing the exchange pack/unpack on the host CPU can give a speedup 
since it reduces the number of CUDA kernel launches.
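
A minimal sketch of selecting host-side packing/unpacking (placement near 
the top of the input script, before the simulation box is defined, is an 
assumption here):

package kokkos comm host :pre

The same setting can be made from the command line with the "-pk kokkos 
comm host"_package.html switch.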

The {gpu/direct} keyword chooses whether GPU-direct will be used. When 
this keyword is set to {on}, buffers in GPU memory are passed directly 
@@ -533,7 +533,8 @@ the {gpu/direct} keyword is automatically set to {off} by default. When
the {gpu/direct} keyword is set to {off} while any of the {comm} 
keywords are set to {device}, the value for these {comm} keywords will 
be automatically changed to {host}. This setting has no effect if not 
running on GPUs.
running on GPUs. GPU-direct is available for OpenMPI 1.8 (or later 
versions), Mvapich2 1.9 (or later), and CrayMPI.
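
For example, if the MPI library in use is not CUDA-aware, a possible 
work-around (a sketch based on the keywords above) is to disable GPU-direct 
explicitly:

package kokkos gpu/direct off :pre

With this setting, any {comm} keywords set to {device} are changed to 
{host} automatically, as described above.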

:line