Merge pull request #674 from wmbrownIntel/user-intel-update (4cfa88b7) · Commits · 郑智淋 / lammps

doc/src/JPG/user_intel.png

−963 B (19.1 KiB)

doc/src/accelerate_intel.txt

+30 −21

Original line number	Diff line number	Diff line
		@@ -25,14 +25,14 @@ LAMMPS to run on the CPU cores and coprocessor cores simultaneously.
		[Currently Available USER-INTEL Styles:]

		Angle Styles: charmm, harmonic :ulb,l
		Bond Styles: fene, harmonic :l
		Bond Styles: fene, fourier, harmonic :l
		Dihedral Styles: charmm, harmonic, opls :l
		Fixes: nve, npt, nvt, nvt/sllod :l
		Fixes: nve, npt, nvt, nvt/sllod, nve/asphere :l
		Improper Styles: cvff, harmonic :l
		Pair Styles: airebo, airebo/morse, buck/coul/cut, buck/coul/long,
		buck, eam, eam/alloy, eam/fs, gayberne, lj/charmm/coul/charmm,
		lj/charmm/coul/long, lj/cut, lj/cut/coul/long, lj/long/coul/long, rebo,
		sw, tersoff :l
		buck, dpd, eam, eam/alloy, eam/fs, gayberne, lj/charmm/coul/charmm,
		lj/charmm/coul/long, lj/cut, lj/cut/coul/long, lj/long/coul/long,
		rebo, sw, tersoff :l
		K-Space Styles: pppm, pppm/disp :l
		:ule

		@@ -54,11 +54,12 @@ warmup run (for use with offload benchmarks).
		:c,image(JPG/user_intel.png)

		Results are speedups obtained on Intel Xeon E5-2697v4 processors
		(code-named Broadwell) and Intel Xeon Phi 7250 processors
		(code-named Knights Landing) with "June 2017" LAMMPS built with
		Intel Parallel Studio 2017 update 2. Results are with 1 MPI task
		per physical core. See {src/USER-INTEL/TEST/README} for the raw
		simulation rates and instructions to reproduce.
		(code-named Broadwell), Intel Xeon Phi 7250 processors (code-named
		Knights Landing), and Intel Xeon Gold 6148 processors (code-named
		Skylake) with "June 2017" LAMMPS built with Intel Parallel Studio
		2017 update 2. Results are with 1 MPI task per physical core. See
		{src/USER-INTEL/TEST/README} for the raw simulation rates and
		instructions to reproduce.

		:line

		@@ -82,6 +83,11 @@ this order :l
		The {newton} setting applies to all atoms, not just atoms shared
		between MPI tasks :l
		Vectorization can change the order for adding pairwise forces :l
		When using the -DLMP_USE_MKL_RNG define (all included Intel-optimized
		makefiles do) at build time, the random number generator for
		dissipative particle dynamics (pair style dpd/intel) uses the Mersenne
		Twister generator included in the Intel MKL library (that should be
		more robust than the default Marsaglia random number generator) :l
		:ule

		The precision mode (described below) used with the USER-INTEL
		@@ -119,8 +125,8 @@ For Intel Xeon Phi CPUs:
		Runs should be performed using MCDRAM. :ulb,l
		:ule

		For simulations using {kspace_style pppm} on Intel CPUs
		supporting AVX-512:
		For simulations using {kspace_style pppm} on Intel CPUs supporting
		AVX-512:

		Add "kspace_modify diff ad" to the input script :ulb,l
		The command-line option should be changed to
		@@ -237,14 +243,17 @@ However, if you do not have coprocessors on your system, building
		without offload support will produce a smaller binary.

		The general requirements for Makefiles with the USER-INTEL package
		are as follows. "-DLAMMPS_MEMALIGN=64" is required for CCFLAGS. When
		using Intel compilers, "-restrict" is required and "-qopenmp" is
		highly recommended for CCFLAGS and LINKFLAGS. LIB should include
		"-ltbbmalloc". For builds supporting offload, "-DLMP_INTEL_OFFLOAD"
		is required for CCFLAGS and "-qoffload" is required for LINKFLAGS.
		Other recommended CCFLAG options for best performance are
		"-O2 -fno-alias -ansi-alias -qoverride-limits fp-model fast=2
		-no-prec-div".
		are as follows. When using Intel compilers, "-restrict" is required
		and "-qopenmp" is highly recommended for CCFLAGS and LINKFLAGS.
		CCFLAGS should include "-DLMP_INTEL_USELRT" (unless POSIX Threads
		are not supported in the build environment) and "-DLMP_USE_MKL_RNG"
		(unless Intel Math Kernel Library (MKL) is not available in the build
		environment). For Intel compilers, LIB should include "-ltbbmalloc"
		or if the library is not available, "-DLMP_INTEL_NO_TBB" can be added
		to CCFLAGS. For builds supporting offload, "-DLMP_INTEL_OFFLOAD" is
		required for CCFLAGS and "-qoffload" is required for LINKFLAGS. Other
		recommended CCFLAGS options for best performance are "-O2 -fno-alias
		-ansi-alias -qoverride-limits -fp-model fast=2 -no-prec-div".

		NOTE: The vectorization and math capabilities can differ depending on
		the CPU. For Intel compilers, the "-x" flag specifies the type of
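
The Makefile requirements described in the hunk above can be sketched as the following fragment. It is only an illustration, assuming an Intel compiler build without offload in an environment where POSIX Threads, MKL, and the TBB malloc library are all available. The grouping of the optimization flags into OPTFLAGS mirrors src/MAKE/MACHINES/Makefile.cori2 from this commit; any MKL include or link settings are site-specific and are deliberately left out.

```make
# Hypothetical fragment, not one of the makefiles shipped with LAMMPS.
# Assumes Intel compilers, no coprocessor offload, and that POSIX Threads,
# Intel MKL, and the TBB malloc library are present on the build system.

# Recommended optimization options from the documentation text
OPTFLAGS =	-O2 -fno-alias -ansi-alias -qoverride-limits \
		-fp-model fast=2 -no-prec-div

# -restrict is required; -qopenmp is highly recommended.
# -DLMP_INTEL_USELRT: drop this if POSIX Threads are not supported.
# -DLMP_USE_MKL_RNG:  drop this if MKL is not available.
CCFLAGS =	-qopenmp -restrict -DLMP_INTEL_USELRT -DLMP_USE_MKL_RNG \
		$(OPTFLAGS)

LINKFLAGS =	-qopenmp $(OPTFLAGS)

# If libtbbmalloc is not available, leave LIB empty and add
# -DLMP_INTEL_NO_TBB to CCFLAGS instead (as Makefile.cori2 does).
LIB =		-ltbbmalloc
```

For an offload build, the text above states that "-DLMP_INTEL_OFFLOAD" would additionally go into CCFLAGS and "-qoffload" into LINKFLAGS.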

doc/src/dihedral_fourier.txt

+1 −0

		@@ -7,6 +7,7 @@
		:line

		dihedral_style fourier command :h3
		dihedral_style fourier/intel command :h3
		dihedral_style fourier/omp command :h3

		[Syntax:]

doc/src/pair_dpd.txt

+1 −0

		@@ -8,6 +8,7 @@

		pair_style dpd command :h3
		pair_style dpd/gpu command :h3
		pair_style dpd/intel command :h3
		pair_style dpd/omp command :h3
		pair_style dpd/tstat command :h3
		pair_style dpd/tstat/gpu command :h3

src/MAKE/MACHINES/Makefile.cori2

+4 −3

		@@ -15,13 +15,14 @@ SHELL = /bin/sh

		CC = CC
		OPTFLAGS = -xMIC-AVX512 -O2 -fp-model fast=2 -no-prec-div -qoverride-limits
		CCFLAGS = -g -qopenmp -DLAMMPS_MEMALIGN=64 -qno-offload \
		-fno-alias -ansi-alias -restrict $(OPTFLAGS) -DLMP_INTEL_NO_TBB
		CCFLAGS = -qopenmp -qno-offload -fno-alias -ansi-alias -restrict \
		-DLMP_INTEL_USELRT -DLMP_USE_MKL_RNG -DLMP_INTEL_NO_TBB \
		$(OPTFLAGS)
		SHFLAGS = -fPIC
		DEPFLAGS = -M

		LINK = CC
		LINKFLAGS = -g -qopenmp $(OPTFLAGS)
		LINKFLAGS = -qopenmp $(OPTFLAGS)
		LIB =
		SIZE = size
