Commit b6187173 authored by Christoph Junghans

Merge branch 'master' into HEAD

parents 11eed234 88a33edb
+8 −2
@@ -37,6 +37,10 @@ enable_language(CXX)
#####################################################################
include(CheckCCompilerFlag)

+if (${CMAKE_CXX_COMPILER_ID} STREQUAL "Intel")
+  set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -restrict")
+endif()

########################################################################
# User input options                                                   #
########################################################################
@@ -76,7 +80,7 @@ add_definitions(-DLAMMPS_MEMALIGN=${LAMMPS_MEMALIGN})
option(LAMMPS_EXCEPTIONS "enable the use of C++ exceptions for error messages (useful for library interface)" OFF)
if(LAMMPS_EXCEPTIONS)
  add_definitions(-DLAMMPS_EXCEPTIONS)
-  set(LAMMPS_API_DEFINES "${LAMMPS_API_DEFINES -DLAMMPS_EXCEPTIONS")
+  set(LAMMPS_API_DEFINES "${LAMMPS_API_DEFINES} -DLAMMPS_EXCEPTIONS")
endif()

set(LAMMPS_MACHINE "" CACHE STRING "Suffix to append to lmp binary and liblammps (WON'T enable any features automatically")
@@ -665,7 +669,9 @@ include_directories(${LAMMPS_STYLE_HEADERS_DIR})
############################################
add_library(lammps ${LIB_SOURCES})
target_link_libraries(lammps ${LAMMPS_LINK_LIBS})
+if(LAMMPS_DEPS)
  add_dependencies(lammps ${LAMMPS_DEPS})
+endif()
set_target_properties(lammps PROPERTIES OUTPUT_NAME lammps${LAMMPS_MACHINE})
if(BUILD_SHARED_LIBS)
  set_target_properties(lammps PROPERTIES SOVERSION ${SOVERSION})
−963 B (19.1 KiB)
+2 −1
@@ -706,7 +706,7 @@ dynamics can be run with LAMMPS using density-functional tight-binding
quantum forces calculated by LATTE.

More information on LATTE can be found at this web site:
-"https://github.com/lanl/LATTE"_#latte_home.  A brief technical
+"https://github.com/lanl/LATTE"_latte_home.  A brief technical
description is given with the "fix latte"_fix_latte.html command.

:link(latte_home,https://github.com/lanl/LATTE)
@@ -729,6 +729,7 @@ make lib-latte args="-b" # download and build in lib/latte/LATTE-
make lib-latte args="-p $HOME/latte"    # use existing LATTE installation in $HOME/latte
make lib-latte args="-b -m gfortran"    # download and build in lib/latte and 
                                        #   copy Makefile.lammps.gfortran to Makefile.lammps
+:pre

Note that 3 symbolic (soft) links, "includelink" and "liblink" and
"filelink", are created in lib/latte to point into the LATTE home dir.
+30 −21
@@ -25,14 +25,14 @@ LAMMPS to run on the CPU cores and coprocessor cores simultaneously.
[Currently Available USER-INTEL Styles:]

Angle Styles: charmm, harmonic :ulb,l
-Bond Styles: fene, harmonic :l
+Bond Styles: fene, fourier, harmonic :l
Dihedral Styles: charmm, harmonic, opls :l
-Fixes: nve, npt, nvt, nvt/sllod :l
+Fixes: nve, npt, nvt, nvt/sllod, nve/asphere :l
Improper Styles: cvff, harmonic :l
Pair Styles: airebo, airebo/morse, buck/coul/cut, buck/coul/long, 
-buck, eam, eam/alloy, eam/fs, gayberne, lj/charmm/coul/charmm, 
-lj/charmm/coul/long, lj/cut, lj/cut/coul/long, lj/long/coul/long, rebo,
-sw, tersoff :l
+buck, dpd, eam, eam/alloy, eam/fs, gayberne, lj/charmm/coul/charmm, 
+lj/charmm/coul/long, lj/cut, lj/cut/coul/long, lj/long/coul/long, 
+rebo, sw, tersoff :l
K-Space Styles: pppm, pppm/disp :l
:ule

@@ -54,11 +54,12 @@ warmup run (for use with offload benchmarks).
:c,image(JPG/user_intel.png)

Results are speedups obtained on Intel Xeon E5-2697v4 processors
-(code-named Broadwell) and Intel Xeon Phi 7250 processors
-(code-named Knights Landing) with "June 2017" LAMMPS built with
-Intel Parallel Studio 2017 update 2. Results are with 1 MPI task
-per physical core. See {src/USER-INTEL/TEST/README} for the raw
-simulation rates and instructions to reproduce.
+(code-named Broadwell), Intel Xeon Phi 7250 processors (code-named
+Knights Landing), and Intel Xeon Gold 6148 processors (code-named
+Skylake) with "June 2017" LAMMPS built with Intel Parallel Studio
+2017 update 2. Results are with 1 MPI task per physical core. See
+{src/USER-INTEL/TEST/README} for the raw simulation rates and
+instructions to reproduce.

:line

@@ -82,6 +83,11 @@ this order :l
The {newton} setting applies to all atoms, not just atoms shared
between MPI tasks :l
Vectorization can change the order for adding pairwise forces :l
+When using the -DLMP_USE_MKL_RNG define (all included Intel-optimized
+makefiles do) at build time, the random number generator for
+dissipative particle dynamics (pair style dpd/intel) uses the Mersenne
+Twister generator included in the Intel MKL library (that should be
+more robust than the default Marsaglia random number generator) :l
:ule

The precision mode (described below) used with the USER-INTEL
@@ -119,8 +125,8 @@ For Intel Xeon Phi CPUs:
Runs should be performed using MCDRAM. :ulb,l
:ule

-For simulations using {kspace_style pppm} on Intel CPUs
-supporting AVX-512:
+For simulations using {kspace_style pppm} on Intel CPUs supporting
+AVX-512:

Add "kspace_modify diff ad" to the input script :ulb,l
The command-line option should be changed to
@@ -237,14 +243,17 @@ However, if you do not have coprocessors on your system, building
without offload support will produce a smaller binary.

The general requirements for Makefiles with the USER-INTEL package
-are as follows. "-DLAMMPS_MEMALIGN=64" is required for CCFLAGS. When
-using Intel compilers, "-restrict" is required and "-qopenmp" is
-highly recommended for CCFLAGS and LINKFLAGS. LIB should include
-"-ltbbmalloc". For builds supporting offload, "-DLMP_INTEL_OFFLOAD"
-is required for CCFLAGS and "-qoffload" is required for LINKFLAGS.
-Other recommended CCFLAG options for best performance are
-"-O2 -fno-alias -ansi-alias -qoverride-limits fp-model fast=2
--no-prec-div".
+are as follows. When using Intel compilers, "-restrict" is required
+and "-qopenmp" is highly recommended for CCFLAGS and LINKFLAGS.
+CCFLAGS should include "-DLMP_INTEL_USELRT" (unless POSIX Threads
+are not supported in the build environment) and "-DLMP_USE_MKL_RNG"
+(unless Intel Math Kernel Library (MKL) is not available in the build
+environment). For Intel compilers, LIB should include "-ltbbmalloc"
+or, if the library is not available, "-DLMP_INTEL_NO_TBB" can be added
+to CCFLAGS. For builds supporting offload, "-DLMP_INTEL_OFFLOAD" is
+required for CCFLAGS and "-qoffload" is required for LINKFLAGS. Other
+recommended CCFLAG options for best performance are "-O2 -fno-alias
+-ansi-alias -qoverride-limits -fp-model fast=2 -no-prec-div".
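
The flag rules in the updated paragraph can be summarized as a small decision sketch. This is an illustration in Python, not part of the LAMMPS build system: the function `user_intel_flags` and its parameters are hypothetical names, and placing "-restrict" only in CCFLAGS is one reading of the text. Only the flag strings themselves come from the documentation above.

```python
from typing import Dict, List


def user_intel_flags(intel_compiler: bool,
                     have_pthreads: bool = True,
                     have_mkl: bool = True,
                     have_tbb: bool = True,
                     offload: bool = False) -> Dict[str, List[str]]:
    """Collect the Makefile flags described for the USER-INTEL package.

    Hypothetical helper for illustration only; the parameter names are
    assumptions, the flag strings are taken from the documentation text.
    """
    ccflags: List[str] = []
    linkflags: List[str] = []
    lib: List[str] = []

    if intel_compiler:
        # "-restrict" is required; "-qopenmp" is highly recommended.
        ccflags += ["-restrict", "-qopenmp"]
        linkflags += ["-qopenmp"]
        if have_tbb:
            lib.append("-ltbbmalloc")
        else:
            # Fallback when the TBB malloc library is unavailable.
            ccflags.append("-DLMP_INTEL_NO_TBB")

    if have_pthreads:
        ccflags.append("-DLMP_INTEL_USELRT")
    if have_mkl:
        ccflags.append("-DLMP_USE_MKL_RNG")

    if offload:
        ccflags.append("-DLMP_INTEL_OFFLOAD")
        linkflags.append("-qoffload")

    return {"CCFLAGS": ccflags, "LINKFLAGS": linkflags, "LIB": lib}


if __name__ == "__main__":
    for name, values in user_intel_flags(intel_compiler=True).items():
        print(name, "=", " ".join(values))
```

The optional performance flags ("-O2 -fno-alias ...") are left out on purpose, since the text only recommends them.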

NOTE: The vectorization and math capabilities can differ depending on
the CPU. For Intel compilers, the "-x" flag specifies the type of
+29 −25
@@ -16,7 +16,7 @@ atom_modify keyword values ... :pre
one or more keyword/value pairs may be appended :ulb,l
keyword = {id} or {map} or {first} or {sort} :l
   {id} value = {yes} or {no}
-   {map} value = {array} or {hash}
+   {map} value = {yes} or {array} or {hash}
   {first} value = group-ID = group whose atoms will appear first in internal atom lists
   {sort} values = Nfreq binsize
     Nfreq = sort atoms spatially every this many time steps
@@ -25,8 +25,8 @@ keyword = {id} or {map} or {first} or {sort} :l

[Examples:]

-atom_modify map hash
-atom_modify map array sort 10000 2.0
+atom_modify map yes
+atom_modify map hash sort 10000 2.0
atom_modify first colloid :pre

[Description:]
@@ -62,29 +62,33 @@ switch. This is described in "Section 2.2"_Section_start.html#start_2
of the manual.  If atom IDs are not used, they must be specified as 0
for all atoms, e.g. in a data or restart file.

-The {map} keyword determines how atom ID lookup is done for molecular
-atom styles.  Lookups are performed by bond (angle, etc) routines in
-LAMMPS to find the local atom index associated with a global atom ID.
-
-When the {array} value is used, each processor stores a lookup table
-of length N, where N is the largest atom ID in the system.  This is a
+The {map} keyword determines how atoms with specific IDs are found
+when required.  Examples are the bond (angle, etc) methods, which
+need to find the local index of an atom with a specific global ID
+which is a bond (angle, etc) partner.  LAMMPS performs this operation
+efficiently by creating a "map", which is either an {array} or {hash}
+table, as described below.
+
+When the {map} keyword is not specified in your input script, LAMMPS
+only creates a map for "atom_styles"_atom_style.html for molecular
+systems which have permanent bonds (angles, etc).  No map is created
+for atomic systems, since it is normally not needed.  However some
+LAMMPS commands require a map, even for atomic systems, and will
+generate an error if one does not exist.  The {map} keyword thus
+allows you to force the creation of a map.  The {yes} value will
+create either an {array} or {hash} style map, as explained in the next
+paragraph.  The {array} and {hash} values create an array-style or
+hash-style map respectively.
+
+For an {array}-style map, each processor stores a lookup table of
+length N, where N is the largest atom ID in the system.  This is a
fast, simple method for many simulations, but requires too much memory
-for large simulations.  The {hash} value uses a hash table to perform
-the lookups.  This can be slightly slower than the {array} method, but
-its memory cost is proportional to the number of atoms owned by a
-processor, i.e. N/P when N is the total number of atoms in the system
-and P is the number of processors.
-
-When this setting is not specified in your input script, LAMMPS
-creates a map, if one is needed, as an array or hash.  See the
-discussion of default values below for how LAMMPS chooses which kind
-of map to build.  Note that atomic systems do not normally need to
-create a map.  However, even in this case some LAMMPS commands will
-create a map to find atoms (and then destroy it), or require a
-permanent map.  An example of the former is the "velocity loop
-all"_velocity.html command, which uses a map when looping over all
-atoms and insuring the same velocity values are assigned to an atom
-ID, no matter which processor owns it.
+for large simulations.  For a {hash}-style map, a hash table is
+created on each processor, which finds an atom ID in constant time
+(independent of the global number of atom IDs).  It can be slightly
+slower than the {array} map, but its memory cost is proportional to
+the number of atoms owned by a processor, i.e. N/P when N is the total
+number of atoms in the system and P is the number of processors.
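
The trade-off between the two map styles can be sketched in a few lines. This is a toy Python model under stated assumptions, not the LAMMPS implementation: the function names, the `-1` "not owned" sentinel, and the example atom IDs are all hypothetical. It only shows that an array-style table grows with the largest global ID, while a hash-style table grows with the number of atoms a processor owns.

```python
from typing import Dict, List


def build_array_map(owned_ids: List[int], max_id: int) -> List[int]:
    """Array-style map: one slot per possible global ID.

    Slot i holds the local index of global ID i, or -1 if this
    (hypothetical) processor does not own that atom. The table has
    max_id + 1 slots so a global ID can be used directly as the index.
    """
    table = [-1] * (max_id + 1)
    for local_index, global_id in enumerate(owned_ids):
        table[global_id] = local_index
    return table


def build_hash_map(owned_ids: List[int]) -> Dict[int, int]:
    """Hash-style map: one entry per atom owned by this processor."""
    return {global_id: local_index
            for local_index, global_id in enumerate(owned_ids)}


# Assumed example: a processor owning 3 atoms of a system whose
# largest atom ID is 1,000,000.
owned = [7, 500_000, 1_000_000]
array_map = build_array_map(owned, max_id=1_000_000)
hash_map = build_hash_map(owned)

# Both styles answer the same question: global ID -> local index.
assert array_map[500_000] == hash_map[500_000] == 1

# Memory cost differs: about N slots versus the number of owned atoms.
print(len(array_map), len(hash_map))
```

In this model both lookups are constant time; the array only wins by avoiding the hashing step, which matches the "slightly slower" remark above.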

The {first} keyword allows a "group"_group.html to be specified whose
atoms will be maintained as the first atoms in each processor's list