Commit 4cfa88b7 authored by Steve Plimpton, committed by GitHub

Merge pull request #674 from wmbrownIntel/user-intel-update

Mike Brown has added DPD to the USER-INTEL package with speedups over 3X for Xeon Phi and over 1.7X for some Xeon processors. 
parents 439c2fd9 05847a0e
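
For orientation, here is a minimal sketch of how the new dpd/intel style might be selected from an input script, assuming a LAMMPS build that includes the USER-INTEL package. The temperature, cutoff, seed, and pair coefficients are illustrative values, not taken from this commit, and the fragment omits the rest of a working script (units, box, atoms, run).

```
# Fragment only; all numeric values are illustrative assumptions
package     intel 0              # 0 coprocessors per node: run on the CPU only
suffix      intel                # use /intel variants of styles where they exist
pair_style  dpd 1.0 1.0 34387    # temperature, global cutoff, random seed
pair_coeff  * * 25.0 4.5         # A (force prefactor), gamma (friction)
```

The same selection can be made without editing the script by passing the -sf intel command-line switch.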
Image diff: −963 B (19.1 KiB)
+30 −21
@@ -25,14 +25,14 @@ LAMMPS to run on the CPU cores and coprocessor cores simultaneously.
[Currently Available USER-INTEL Styles:]

Angle Styles: charmm, harmonic :ulb,l
Bond Styles: fene, harmonic :l
Bond Styles: fene, fourier, harmonic :l
Dihedral Styles: charmm, harmonic, opls :l
Fixes: nve, npt, nvt, nvt/sllod :l
Fixes: nve, npt, nvt, nvt/sllod, nve/asphere :l
Improper Styles: cvff, harmonic :l
Pair Styles: airebo, airebo/morse, buck/coul/cut, buck/coul/long, 
buck, eam, eam/alloy, eam/fs, gayberne, lj/charmm/coul/charmm, 
lj/charmm/coul/long, lj/cut, lj/cut/coul/long, lj/long/coul/long, rebo,
sw, tersoff :l
buck, dpd, eam, eam/alloy, eam/fs, gayberne, lj/charmm/coul/charmm, 
lj/charmm/coul/long, lj/cut, lj/cut/coul/long, lj/long/coul/long, 
rebo, sw, tersoff :l
K-Space Styles: pppm, pppm/disp :l
:ule

@@ -54,11 +54,12 @@ warmup run (for use with offload benchmarks).
:c,image(JPG/user_intel.png)

Results are speedups obtained on Intel Xeon E5-2697v4 processors
(code-named Broadwell) and Intel Xeon Phi 7250 processors
(code-named Knights Landing) with "June 2017" LAMMPS built with
Intel Parallel Studio 2017 update 2. Results are with 1 MPI task
per physical core. See {src/USER-INTEL/TEST/README} for the raw
simulation rates and instructions to reproduce.
(code-named Broadwell), Intel Xeon Phi 7250 processors (code-named
Knights Landing), and Intel Xeon Gold 6148 processors (code-named
Skylake) with "June 2017" LAMMPS built with Intel Parallel Studio
2017 update 2. Results are with 1 MPI task per physical core. See
{src/USER-INTEL/TEST/README} for the raw simulation rates and
instructions to reproduce.

:line

@@ -82,6 +83,11 @@ this order :l
The {newton} setting applies to all atoms, not just atoms shared
between MPI tasks :l
Vectorization can change the order for adding pairwise forces :l
When using the -DLMP_USE_MKL_RNG define at build time (all included
Intel-optimized makefiles do), the random number generator for
dissipative particle dynamics (pair style dpd/intel) uses the Mersenne
Twister generator included in the Intel MKL library (that should be
more robust than the default Marsaglia random number generator) :l
:ule

The precision mode (described below) used with the USER-INTEL
@@ -119,8 +125,8 @@ For Intel Xeon Phi CPUs:
Runs should be performed using MCDRAM. :ulb,l
:ule

For simulations using {kspace_style pppm} on Intel CPUs
supporting AVX-512:
For simulations using {kspace_style pppm} on Intel CPUs supporting
AVX-512:

Add "kspace_modify diff ad" to the input script :ulb,l
The command-line option should be changed to
@@ -237,14 +243,17 @@ However, if you do not have coprocessors on your system, building
without offload support will produce a smaller binary.

The general requirements for Makefiles with the USER-INTEL package
are as follows. "-DLAMMPS_MEMALIGN=64" is required for CCFLAGS. When
using Intel compilers, "-restrict" is required and "-qopenmp" is
highly recommended for CCFLAGS and LINKFLAGS. LIB should include
"-ltbbmalloc". For builds supporting offload, "-DLMP_INTEL_OFFLOAD"
is required for CCFLAGS and "-qoffload" is required for LINKFLAGS.
Other recommended CCFLAG options for best performance are
"-O2 -fno-alias -ansi-alias -qoverride-limits -fp-model fast=2
-no-prec-div".
are as follows. When using Intel compilers, "-restrict" is required 
and "-qopenmp" is highly recommended for CCFLAGS and LINKFLAGS. 
CCFLAGS should include "-DLMP_INTEL_USELRT" (unless POSIX Threads
are not supported in the build environment) and "-DLMP_USE_MKL_RNG"
(unless Intel Math Kernel Library (MKL) is not available in the build
environment). For Intel compilers, LIB should include "-ltbbmalloc" 
or if the library is not available, "-DLMP_INTEL_NO_TBB" can be added
to CCFLAGS. For builds supporting offload, "-DLMP_INTEL_OFFLOAD" is
required for CCFLAGS and "-qoffload" is required for LINKFLAGS. Other
recommended CCFLAG options for best performance are "-O2 -fno-alias
-ansi-alias -qoverride-limits -fp-model fast=2 -no-prec-div".

NOTE: The vectorization and math capabilities can differ depending on
the CPU. For Intel compilers, the "-x" flag specifies the type of
+1 −0
@@ -7,6 +7,7 @@
:line

dihedral_style fourier command :h3
dihedral_style fourier/intel command :h3
dihedral_style fourier/omp command :h3

[Syntax:]
+1 −0
@@ -8,6 +8,7 @@

pair_style dpd command :h3
pair_style dpd/gpu command :h3
pair_style dpd/intel command :h3
pair_style dpd/omp command :h3
pair_style dpd/tstat command :h3
pair_style dpd/tstat/gpu command :h3
+4 −3
@@ -15,13 +15,14 @@ SHELL = /bin/sh

CC =		CC
OPTFLAGS =      -xMIC-AVX512 -O2 -fp-model fast=2 -no-prec-div -qoverride-limits
CCFLAGS =	-g -qopenmp -DLAMMPS_MEMALIGN=64 -qno-offload \
                -fno-alias -ansi-alias -restrict $(OPTFLAGS) -DLMP_INTEL_NO_TBB
CCFLAGS =	-qopenmp -qno-offload -fno-alias -ansi-alias -restrict \
                -DLMP_INTEL_USELRT -DLMP_USE_MKL_RNG -DLMP_INTEL_NO_TBB \
                $(OPTFLAGS)
SHFLAGS =	-fPIC
DEPFLAGS =	-M

LINK =		CC
LINKFLAGS =	-g -qopenmp $(OPTFLAGS)
LINKFLAGS =	-qopenmp $(OPTFLAGS)
LIB =           
SIZE =		size
