Commit 3cd597e9 authored by sjplimp, committed by GitHub

Merge pull request #585 from akohlmey/make-py-manual-cleanup

Make.py removal and manual cleanup
parents eca61226 bdd2f3a6
+7 −59
These are input scripts used to run versions of several of the
-benchmarks in the top-level bench directory using the GPU and
-USER-CUDA accelerator packages.  The results of running these scripts
-on two different machines (a desktop with 2 Tesla GPUs and the ORNL
-Titan supercomputer) are shown on the "GPU (Fermi)" section of the
-Benchmark page of the LAMMPS WWW site: lammps.sandia.gov/bench.
+benchmarks in the top-level bench directory using the GPU accelerator
+package.  The results of running these scripts on two different machines
+(a desktop with 2 Tesla GPUs and the ORNL Titan supercomputer) are shown
+on the "GPU (Fermi)" section of the Benchmark page of the LAMMPS WWW
+site: lammps.sandia.gov/bench.

Examples are shown below of how to run these scripts.  This assumes
-you have built 3 executables with both the GPU and USER-CUDA packages
+you have built 3 executables with the GPU package
installed, e.g.

lmp_linux_single
lmp_linux_mixed
lmp_linux_double

-The precision (single, mixed, double) refers to the GPU and USER-CUDA
-package precision.  See the README files in the lib/gpu and lib/cuda
-directories for instructions on how to build the packages with
-different precisions.  The GPU and USER-CUDA sub-sections of the
-doc/Section_accelerate.html file also describes this process.

-Make.py -d ~/lammps -j 16 -p #all orig -m linux -o cpu -a exe
-Make.py -d ~/lammps -j 16 -p #all opt orig -m linux -o opt -a exe
-Make.py -d ~/lammps -j 16 -p #all omp orig -m linux -o omp -a exe
-Make.py -d ~/lammps -j 16 -p #all gpu orig -m linux \
-        -gpu mode=double arch=20 -o gpu_double -a libs exe
-Make.py -d ~/lammps -j 16 -p #all gpu orig -m linux \
-        -gpu mode=mixed arch=20 -o gpu_mixed -a libs exe
-Make.py -d ~/lammps -j 16 -p #all gpu orig -m linux \
-        -gpu mode=single arch=20 -o gpu_single -a libs exe
-Make.py -d ~/lammps -j 16 -p #all cuda orig -m linux \
-        -cuda mode=double arch=20 -o cuda_double -a libs exe
-Make.py -d ~/lammps -j 16 -p #all cuda orig -m linux \
-        -cuda mode=mixed arch=20 -o cuda_mixed -a libs exe
-Make.py -d ~/lammps -j 16 -p #all cuda orig -m linux \
-        -cuda mode=single arch=20 -o cuda_single -a libs exe
-Make.py -d ~/lammps -j 16 -p #all intel orig -m linux -o intel_cpu -a exe
-Make.py -d ~/lammps -j 16 -p #all kokkos orig -m linux -o kokkos_omp -a exe
-Make.py -d ~/lammps -j 16 -p #all kokkos orig -kokkos cuda arch=20 \
-        -m cuda -o kokkos_cuda -a exe
-
-Make.py -d ~/lammps -j 16 -p #all opt omp gpu cuda intel kokkos orig \
-        -gpu mode=double arch=20 -cuda mode=double arch=20 -m linux \
-        -o all -a libs exe
-
-Make.py -d ~/lammps -j 16 -p #all opt omp gpu cuda intel kokkos orig \
-        -kokkos cuda arch=20 -gpu mode=double arch=20 \
-        -cuda mode=double arch=20 -m cuda -o all_cuda -a libs exe
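For reference, a rough conventional-make equivalent of the double-precision
GPU build above could look like the sketch below; the lib/gpu Makefile name
and the CUDA_ARCH/CUDA_PRECISION settings are assumptions, so check
lib/gpu/README for the values your machine needs:

cd lammps/lib/gpu
# edit Makefile.linux first: set CUDA_ARCH for your card and
# CUDA_PRECISION (e.g. -D_DOUBLE_DOUBLE) for the desired precision
make -f Makefile.linux
cd ../../src
make yes-gpu
make mpi          # then rename the resulting lmp_mpi, e.g. to lmp_linux_double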

------------------------------------------------------------------------

-To run on just CPUs (without using the GPU or USER-CUDA styles),
+To run on just CPUs (without using the GPU styles),
do something like the following:

mpirun -np 1 lmp_linux_double -v x 8 -v y 8 -v z 8 -v t 100 < in.lj
@@ -81,23 +47,5 @@ node via a "-ppn" setting.

------------------------------------------------------------------------

-To run with the USER-CUDA package, do something like the following:
-
-mpirun -np 1 lmp_linux_single -c on -sf cuda -v x 16 -v y 16 -v z 16 -v t 100 < in.lj
-mpirun -np 2 lmp_linux_double -c on -sf cuda -pk cuda 2 -v x 32 -v y 64 -v z 64 -v t 100 < in.eam
-
-The "xyz" settings determine the problem size.  The "t" setting
-determines the number of timesteps.  The "np" setting determines how
-many MPI tasks (per node) the problem will run on.  The numeric
-argument to the "-pk" setting is the number of GPUs (per node); 1 GPU
-is the default.  Note that the number of MPI tasks must equal the
-number of GPUs (both per node) with the USER-CUDA package.
-
-These mpirun commands run on a single node.  To run on multiple nodes,
-scale up the "-np" setting, and control the number of MPI tasks per
-node via a "-ppn" setting.

------------------------------------------------------------------------

If the script has "titan" in its name, it was run on the Titan
supercomputer at ORNL.
+15 −31
@@ -71,49 +71,33 @@ integration

----------------------------------------------------------------------

-Here is a src/Make.py command which will perform a parallel build of a
-LAMMPS executable "lmp_mpi" with all the packages needed by all the
-examples.  This assumes you have an MPI installed on your machine so
-that "mpicxx" can be used as the wrapper compiler.  It also assumes
-you have an Intel compiler to use as the base compiler.  You can leave
-off the "-cc mpi wrap=icc" switch if that is not the case.  You can
-also leave off the "-fft fftw3" switch if you do not have the FFTW
-(v3) installed as an FFT package, in which case the default KISS FFT
-library will be used.
-
-cd src
-Make.py -j 16 -p none molecule manybody kspace granular rigid orig \
-  -cc mpi wrap=icc -fft fftw3 -a file mpi
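A hedged sketch of the same build done with the conventional in-tree make;
the compiler wrapper and FFTW choices are set by editing
src/MAKE/Makefile.mpi (e.g. FFT_INC = -DFFT_FFTW3) rather than on the
command line:

cd src
make yes-molecule yes-manybody yes-kspace yes-granular yes-rigid
make -j 16 mpi          # builds the lmp_mpi executable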

----------------------------------------------------------------------

Here is how to run each problem, assuming the LAMMPS executable is
named lmp_mpi, and you are using the mpirun command to launch parallel
runs:

Serial (one processor runs):

-lmp_mpi < in.lj
-lmp_mpi < in.chain
-lmp_mpi < in.eam
-lmp_mpi < in.chute
-lmp_mpi < in.rhodo
+lmp_mpi -in in.lj
+lmp_mpi -in in.chain
+lmp_mpi -in in.eam
+lmp_mpi -in in.chute
+lmp_mpi -in in.rhodo

Parallel fixed-size runs (on 8 procs in this case):

-mpirun -np 8 lmp_mpi < in.lj
-mpirun -np 8 lmp_mpi < in.chain
-mpirun -np 8 lmp_mpi < in.eam
-mpirun -np 8 lmp_mpi < in.chute
-mpirun -np 8 lmp_mpi < in.rhodo
+mpirun -np 8 lmp_mpi -in in.lj
+mpirun -np 8 lmp_mpi -in in.chain
+mpirun -np 8 lmp_mpi -in in.eam
+mpirun -np 8 lmp_mpi -in in.chute
+mpirun -np 8 lmp_mpi -in in.rhodo

Parallel scaled-size runs (on 16 procs in this case):

-mpirun -np 16 lmp_mpi -var x 2 -var y 2 -var z 4 < in.lj
-mpirun -np 16 lmp_mpi -var x 2 -var y 2 -var z 4 < in.chain.scaled
-mpirun -np 16 lmp_mpi -var x 2 -var y 2 -var z 4 < in.eam
-mpirun -np 16 lmp_mpi -var x 4 -var y 4 < in.chute.scaled
-mpirun -np 16 lmp_mpi -var x 2 -var y 2 -var z 4 < in.rhodo.scaled
+mpirun -np 16 lmp_mpi -var x 2 -var y 2 -var z 4 -in in.lj
+mpirun -np 16 lmp_mpi -var x 2 -var y 2 -var z 4 -in in.chain.scaled
+mpirun -np 16 lmp_mpi -var x 2 -var y 2 -var z 4 -in in.eam
+mpirun -np 16 lmp_mpi -var x 4 -var y 4 -in in.chute.scaled
+mpirun -np 16 lmp_mpi -var x 2 -var y 2 -var z 4 -in in.rhodo.scaled

For each of the scaled-size runs you must set 3 variables as -var
command line switches.  The variables x,y,z are used in the input
+0 −1
@@ -261,7 +261,6 @@ END_RST -->
:link(start_6,Section_start.html#start_6)
:link(start_7,Section_start.html#start_7)
:link(start_8,Section_start.html#start_8)
-:link(start_9,Section_start.html#start_9)

:link(cmd_1,Section_commands.html#cmd_1)
:link(cmd_2,Section_commands.html#cmd_2)
+5 −5
@@ -56,7 +56,7 @@ timings; you can simply extrapolate from short runs.

For the set of runs, look at the timing data printed to the screen and
log file at the end of each LAMMPS run.  "This
section"_Section_start.html#start_8 of the manual has an overview.
section"_Section_start.html#start_7 of the manual has an overview.

Running on one (or a few processors) should give a good estimate of
the serial performance and what portions of the timestep are taking
@@ -226,16 +226,16 @@ re-build LAMMPS |
  make machine |
prepare and test a regular LAMMPS simulation |
  lmp_machine -in in.script; mpirun -np 32 lmp_machine -in in.script |
-enable specific accelerator support via '-k on' "command-line switch"_Section_start.html#start_7, |
+enable specific accelerator support via '-k on' "command-line switch"_Section_start.html#start_6, |
  only needed for KOKKOS package |
-set any needed options for the package via "-pk" "command-line switch"_Section_start.html#start_7 or "package"_package.html command, |
+set any needed options for the package via "-pk" "command-line switch"_Section_start.html#start_6 or "package"_package.html command, |
  only if defaults need to be changed |
-use accelerated styles in your input via "-sf" "command-line switch"_Section_start.html#start_7 or "suffix"_suffix.html command | lmp_machine -in in.script -sf gpu
+use accelerated styles in your input via "-sf" "command-line switch"_Section_start.html#start_6 or "suffix"_suffix.html command | lmp_machine -in in.script -sf gpu
:tb(c=2,s=|)
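Put together, the last rows of the table amount to a single launch line.
For example (processor and GPU counts are only placeholders; the second
line shows the KOKKOS variant with its "-k on" switch):

mpirun -np 8 lmp_machine -sf gpu -pk gpu 2 -in in.script
mpirun -np 2 lmp_machine -k on g 2 -sf kk -in in.script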

Note that the first 4 steps can be done as a single command, using the
src/Make.py tool.  This tool is discussed in "Section
2.4"_Section_start.html#start_4 of the manual, and its use is
4"_Section_packages.html of the manual, and its use is
illustrated in the individual accelerator sections.  Typically these
steps only need to be done once, to create an executable that uses one
or more accelerator packages.
+1 −1
@@ -71,7 +71,7 @@ style", with ... being fix, compute, pair, etc, it means that you
mistyped the style name or that the command is part of an optional
package which was not compiled into your executable.  The list of
available styles in your executable can be listed by using "the -h
command-line argument"_Section_start.html#start_7.  The installation
command-line argument"_Section_start.html#start_6.  The installation
and compilation of optional packages is explained in the "installation
instructions"_Section_start.html#start_3.
