Commit 7e1a3bd4 authored by Axel Kohlmeyer, committed by GitHub

Merge pull request #2302 from akohlmey/consistent-doc-headers

Consistent subsection headers for commands
parents 1879106c b776e1ee
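
The header change this commit applies throughout the documentation (a bold pseudo-header such as ``**Example:**`` becomes a real reStructuredText subsection title underlined with ``"``) can be sketched as a small script. This is only an illustration under stated assumptions, not the tool used for the commit: the function name ``convert_headers`` and the regular expression are hypothetical, and wording changes such as "Speed-ups" to "Speed-up" would still have to be made by hand.

```python
import re

# Hypothetical pattern: a line that is a bold pseudo-header such as
# "**Example:**", optionally followed by inline text ("**Default:** none").
BOLD_HEADER = re.compile(r"^\*\*(?P<title>[^*]+?):\*\*[ \t]*(?P<rest>.*)$")


def convert_headers(text, underline='"'):
    """Rewrite bold pseudo-headers as RST titles with a matching underline."""
    out = []
    for line in text.splitlines():
        match = BOLD_HEADER.match(line)
        if match is None:
            # Not a pseudo-header: keep the line unchanged.
            out.append(line)
            continue
        title = match.group("title").strip()
        out.append(title)
        # The underline is exactly as long as the title text.
        out.append(underline * len(title))
        rest = match.group("rest").strip()
        if rest:
            # Inline text moves to its own paragraph below the title.
            out.append("")
            out.append(rest)
    result = "\n".join(out)
    # Preserve a trailing newline if the input had one.
    if text.endswith("\n"):
        result += "\n"
    return result
```

Called on ``"**Default:** none"``, the sketch yields the title ``Default``, a seven-character underline, a blank line, and ``none``, which is the shape of the last hunk in this commit.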
+3 −1
@@ -136,6 +136,7 @@ src directory.
.. _cmake_presets:

CMake presets for installing many packages
""""""""""""""""""""""""""""""""""""""""""

Instead of specifying all the CMake options via the command-line,
CMake allows initializing its settings cache using script files.
@@ -168,7 +169,8 @@ one of them as a starting point and customize it to your needs.
   in a single cmake run, or change settings incrementally by running
   cmake with new flags.

**Example:**
Example
"""""""

.. code-block:: bash

+27 −14
@@ -12,7 +12,8 @@ When offloading to a co-processor from a CPU, the same routine is run
twice, once on the CPU and once with an offload flag. This allows
LAMMPS to run on the CPU cores and co-processor cores simultaneously.

**Currently Available USER-INTEL Styles:**
Currently Available USER-INTEL Styles
"""""""""""""""""""""""""""""""""""""

* Angle Styles: charmm, harmonic
* Bond Styles: fene, fourier, harmonic
@@ -31,9 +32,10 @@ LAMMPS to run on the CPU cores and co-processor cores simultaneously.
   support computing per-atom stress.  If any compute or fix in your
   input requires it, LAMMPS will abort with an error message.

**Speed-ups to expect:**
Speed-up to expect
"""""""""""""""""""

The speedups will depend on your simulation, the hardware, which
The speedup will depend on your simulation, the hardware, which
styles are used, the number of atoms, and the floating-point
precision mode. Performance improvements are shown compared to
LAMMPS *without using other acceleration packages* as these are
@@ -59,7 +61,8 @@ instructions to reproduce.

----------

**Accuracy and order of operations:**
Accuracy and order of operations
""""""""""""""""""""""""""""""""

In most molecular dynamics software, parallelization parameters
(# of MPI, OpenMP, and vectorization) can change the results due
@@ -96,7 +99,8 @@ mode should not be used without appropriate validation.

----------

**Quick Start for Experienced Users:**
Quick Start for Experienced Users
"""""""""""""""""""""""""""""""""

LAMMPS should be built with the USER-INTEL package installed.
Simulations should be run with 1 MPI task per physical *core*\ ,
@@ -136,7 +140,8 @@ For Intel Xeon Phi co-processors (Offload):

----------

**Required hardware/software:**
Required hardware/software
""""""""""""""""""""""""""

When using Intel compilers, version 16.0 or later is required.

@@ -159,7 +164,8 @@ For best performance, we recommend that the MCDRAM is configured in
"Cache" mode can also be used, although the performance might be
slightly lower.

**Notes about Simultaneous Multithreading:**
Notes about Simultaneous Multithreading
"""""""""""""""""""""""""""""""""""""""

Modern CPUs often support Simultaneous Multithreading (SMT). On
Intel processors, this is called Hyper-Threading (HT) technology.
@@ -196,7 +202,8 @@ this information can normally be obtained with:

   cat /proc/cpuinfo

**Building LAMMPS with the USER-INTEL package:**
Building LAMMPS with the USER-INTEL package
"""""""""""""""""""""""""""""""""""""""""""

See the :ref:`Build extras <user-intel>` doc page for
instructions.  Some additional details are covered here.
@@ -263,7 +270,8 @@ recommended CCFLAG options for best performance are "-O2 -fno-alias
   in most of the example Makefiles is to use "-xHost", however this
   should not be used when cross-compiling.

**Running LAMMPS with the USER-INTEL package:**
Running LAMMPS with the USER-INTEL package
""""""""""""""""""""""""""""""""""""""""""

Running LAMMPS with the USER-INTEL package is similar to normal use
with the exceptions that one should 1) specify that LAMMPS should use
@@ -304,7 +312,8 @@ almost all cases.
   recommended, especially when running on a machine with Intel
   Hyper-Threading technology disabled.

**Run with the USER-INTEL package from the command line:**
Run with the USER-INTEL package from the command line
"""""""""""""""""""""""""""""""""""""""""""""""""""""

To enable USER-INTEL optimizations for all available styles used in
the input script, the "-sf intel" :doc:`command-line switch <Run_options>` can be used without any requirement for
@@ -339,7 +348,8 @@ launching MPI applications):
   mpirun -np 72 -ppn 36 lmp_machine -sf intel -in in.script                                 # 2 nodes, 36 MPI tasks/node, $OMP_NUM_THREADS OpenMP Threads
   mpirun -np 72 -ppn 36 lmp_machine -sf intel -in in.script -pk intel 0 omp 2 mode double   # Don't use any co-processors that might be available, use 2 OpenMP threads for each task, use double precision

**Or run with the USER-INTEL package by editing an input script:**
Or run with the USER-INTEL package by editing an input script
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

As an alternative to adding command-line arguments, the input script
can be edited to enable the USER-INTEL package. This requires adding
@@ -361,7 +371,8 @@ Alternatively, the :doc:`suffix intel <suffix>` command can be added to
the input script to enable USER-INTEL styles for the commands that
follow in the input script.

**Tuning for Performance:**
Tuning for Performance
""""""""""""""""""""""

.. note::

@@ -431,7 +442,8 @@ series processors will always perform better using MCDRAM. Please
consult your system documentation for the best approach to specify
that MPI runs are performed in MCDRAM.

**Tuning for Offload Performance:**
Tuning for Offload Performance
""""""""""""""""""""""""""""""

The default settings for offload should give good performance.

@@ -521,7 +533,8 @@ the pair styles in the USER-INTEL package currently support the
:doc:`run_style respa <run_style>` command; only the "pair" option is
supported.

**References:**
References
""""""""""

* Brown, W.M., Carrillo, J.-M.Y., Mishra, B., Gavhane, N., Thakkar, F.M., De Kraker, A.R., Yamada, M., Ang, J.A., Plimpton, S.J., "Optimizing Classical Molecular Dynamics in LAMMPS," in Intel Xeon Phi Processor High Performance Programming: Knights Landing Edition, J. Jeffers, J. Reinders, A. Sodani, Eds. Morgan Kaufmann.
* Brown, W. M., Semin, A., Hebenstreit, M., Khvostov, S., Raman, K., Plimpton, S.J. `Increasing Molecular Dynamics Simulation Rates with an 8-Fold Increase in Electrical Power Efficiency. <http://dl.acm.org/citation.cfm?id=3014915>`_ 2016 High Performance Computing, Networking, Storage and Analysis, SC16: International Conference (pp. 82-95).
+12 −6
@@ -8,18 +8,21 @@ improper), several Kspace styles, and a few fix styles. It uses
the OpenMP interface for multi-threading, but can also be compiled
without OpenMP support, providing optimized serial styles in that case.

**Required hardware/software:**
Required hardware/software
""""""""""""""""""""""""""

To enable multi-threading, your compiler must support the OpenMP interface.
You should have one or more multi-core CPUs, as multiple threads can only be
launched by each MPI task on the local node (using shared memory).

**Building LAMMPS with the USER-OMP package:**
Building LAMMPS with the USER-OMP package
"""""""""""""""""""""""""""""""""""""""""

See the :ref:`Build extras <user-omp>` doc page for
instructions.

**Run with the USER-OMP package from the command line:**
Run with the USER-OMP package from the command line
"""""""""""""""""""""""""""""""""""""""""""""""""""

These examples assume one or more 16-core nodes.

@@ -52,7 +55,8 @@ details, including the default values used if it is not specified. It
also gives more details on how to set the number of threads via the
OMP_NUM_THREADS environment variable.

**Or run with the USER-OMP package by editing an input script:**
Or run with the USER-OMP package by editing an input script
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

The discussion above for the mpirun/mpiexec command, MPI tasks/node,
and threads/MPI task is the same.
@@ -70,7 +74,8 @@ per MPI task to use. The command doc page explains other options and
how to set the number of threads via the OMP_NUM_THREADS environment
variable.

**Speed-ups to expect:**
Speed-up to expect
""""""""""""""""""

Depending on which styles are accelerated, you should look for a
reduction in the "Pair time", "Bond time", "KSpace time", and "Loop
@@ -92,7 +97,8 @@ sub-section.
A description of the multi-threading strategy used in the USER-OMP
package and some performance examples are `presented here <http://sites.google.com/site/akohlmey/software/lammps-icms/lammps-icms-tms2011-talk.pdf?attredirects=0&d=1>`_

**Guidelines for best performance:**
Guidelines for best performance
"""""""""""""""""""""""""""""""

For many problems on current generation CPUs, running the USER-OMP
package with a single thread/task is faster than running with multiple
+13 −7
@@ -7,15 +7,18 @@ Technologies). It contains a handful of pair styles whose compute()
methods were rewritten in C++ templated form to reduce the overhead
due to if tests and other conditional code.

**Required hardware/software:**
Required hardware/software
""""""""""""""""""""""""""

None.
Any hardware. Any compiler.

**Building LAMMPS with the OPT package:**
Building LAMMPS with the OPT package
""""""""""""""""""""""""""""""""""""

See the :ref:`Build extras <opt>` doc page for instructions.

**Run with the OPT package from the command line:**
Run with the OPT package from the command line
""""""""""""""""""""""""""""""""""""""""""""""

.. code-block:: bash

@@ -25,7 +28,8 @@ See the :ref:`Build extras <opt>` doc page for instructions.
Use the "-sf opt" :doc:`command-line switch <Run_options>`, which will
automatically append "opt" to styles that support it.

**Or run with the OPT package by editing an input script:**
Or run with the OPT package by editing an input script
""""""""""""""""""""""""""""""""""""""""""""""""""""""

Use the :doc:`suffix opt <suffix>` command, or you can explicitly add an
"opt" suffix to individual styles in your input script, e.g.
@@ -34,13 +38,15 @@ Use the :doc:`suffix opt <suffix>` command, or you can explicitly add an

   pair_style lj/cut/opt 2.5

**Speed-ups to expect:**
Speed-up to expect
""""""""""""""""""

You should see a reduction in the "Pair time" value printed at the end
of a run.  On most machines for reasonable problem sizes, it will be a
5 to 20% savings.

**Guidelines for best performance:**
Guidelines for best performance
"""""""""""""""""""""""""""""""

Just try out an OPT pair style to see how it performs.

+4 −1
@@ -92,7 +92,10 @@ Related commands

:doc:`angle_coeff <angle_coeff>`

**Default:** none
Default
"""""""

none

----------
