(So many cores on AMD EPYC, Yes! (Such a complicated build-environment setup, a bit less Yes...)

Server hardware and software summary:

$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 1
Core(s) per socket: 64
Socket(s): 2
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 25
Model: 1
Model name: AMD EPYC-Milan Processor
Stepping: 1
CPU MHz: 1996.250
BogoMIPS: 3992.50
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 512K
L3 cache: 32768K
NUMA node0 CPU(s): 0-127
Flags: ...
$ cat /etc/redhat-release
Red Hat Enterprise Linux release 8.4 (Ootpa)

Default Cray environment:

$ module list
Currently Loaded Modulefiles:
1) craype-x86-rome 5) cce/13.0.2 9) cray-libsci/21.08.1.2
2) libfabric/1.11.0.4.125 6) craype/2.7.15 10) cray-pals/1.1.6
3) craype-network-ofi 7) cray-dsmml/0.2.2 11) PrgEnv-cray/8.3.3
4) perftools-base/22.04.0 8) cray-mpich/8.1.15

1. AMD CPU + Intel oneAPI build

Important note: this build environment is based on the guides file provided by the supercomputer administrators.

1.1 Loading the build environment

$ module swap PrgEnv-cray PrgEnv-intel
$ module swap craype-x86-rome craype-x86-milan
$ module load mkl/2024.0
$ module load cray-hdf5-parallel
$ module rm cray-libsci
$ module list
Currently Loaded Modulefiles:
1) craype-x86-milan 5) intel/2024.0 9) cray-pals/1.1.6
2) libfabric/1.11.0.4.125 6) craype/2.7.15 10) PrgEnv-intel/8.3.3
3) craype-network-ofi 7) cray-dsmml/0.2.2 11) mkl/2024.0
4) perftools-base/22.04.0 8) cray-mpich/8.1.15 12) cray-hdf5-parallel/1.12.1.1
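Before copying the template, it is worth a quick sanity check that the Cray wrappers now dispatch to the Intel compilers and that the MKL module exported its root (exact output varies by system):

$ ftn --version # the Fortran wrapper should now report an Intel compiler (ifx)
$ cc --version # wrapped C compiler
$ CC --version # wrapped C++ compiler
$ echo $MKLROOT # should point at the mkl/2024.0 installation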
$ cp arch/makefile.include.oneapi_omp makefile.include
1.2 Modifying makefile.include

Note that the template copied above has the .oneapi suffix, not .aocc. I have not tested how VASP 6.5.0 built with older compilers (e.g., oneAPI <= 2023 or Parallel Studio XE) performs at runtime; readers can test that themselves. The usual expectation is that new software pairs with a new compiler.

I use the oneAPI + OpenMP combination, arch/makefile.include.oneapi_omp. The main changes (line numbers refer to my copy and may differ in yours):

Line 2: change the value of -DHOST to AMDIFC (optional).
Line 8: add -Duse_bse_te \ to enable support for BSE triplet excitations (optional).
Lines 15-16: in the values of the Fortran compiler FC and linker FCL, replace mpiifort -fc=ifx with ftn (mandatory).
Line 29: change the value of CC_LIB to cc, the wrapped C compiler of the HPE Cray environment.
Line 37: change the value of CXX_PARS to CC, the wrapped C++ compiler of the HPE Cray environment.
Lines 60-63: uncomment to enable HDF5 support (optional). The HDF5 library must be built with the same compiler family as the one building VASP, in a backward-compatible version, or the build fails: for example, GCC-built HDF5 + oneAPI-built VASP fails, while HDF5 built with an older oneAPI works with VASP built with a newer oneAPI. Also make sure the HDF5 installation root is exported in the HDF5_ROOT environment variable, or edit HDF5_ROOT in the makefile to the correct path by hand; a quick way to check an HDF5 installation is sketched below.
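To check which compiler a given HDF5 installation was built with, its libhdf5.settings file records the build configuration (the path below assumes a standard HDF5 install layout):

$ grep -i 'compiler' $HDF5_ROOT/lib/libhdf5.settings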
# Default precompiler options, ! revised from arch/makefile.include.oneapi_omp
CPP_OPTIONS = -DHOST=\"AMDIFC\" \
-DMPI -DMPI_BLOCK=8000 -Duse_collective \
-DscaLAPACK \
-DCACHE_SIZE=4000 \
-Davoidalloc \
-Dvasp6 \
-Duse_bse_te \
-Dtbdyn \
-Dfock_dblbuf \
-D_OPENMP
CPP = fpp -f_com=no -free -w0 $*$(FUFFIX) $*$(SUFFIX) $(CPP_OPTIONS)
FC = ftn -qopenmp -diag-disable=10448
FCL = ftn -diag-disable=10448
FREE = -free -names lowercase
FFLAGS = -assume byterecl -w
OFLAG = -O2
OFLAG_IN = $(OFLAG)
DEBUG = -O0
# For what used to be vasp.5.lib
CPP_LIB = $(CPP)
FC_LIB = $(FC)
CC_LIB = cc #icx
CFLAGS_LIB = -O
FFLAGS_LIB = -O1
FREE_LIB = $(FREE)
OBJECTS_LIB = linpack_double.o
# For the parser library
CXX_PARS = CC #icpx
LLIBS = -lstdc++
##
## Customize as of this point! Of course you may change the preceding
## part of this file as well if you like, but it should rarely be
## necessary ...
##
# When compiling on the target machine itself, change this to the
# relevant target when cross-compiling for another architecture
#VASP_TARGET_CPU ?= -xHOST
#VASP_TARGET_CPU ?= -march=core-avx2
#FFLAGS += $(VASP_TARGET_CPU)
# Intel MKL (FFTW, BLAS, LAPACK, and scaLAPACK)
# (Note: for Intel Parallel Studio's MKL use -mkl instead of -qmkl)
FCL += -qmkl
MKLROOT ?= /path/to/your/mkl/installation
LLIBS += -L$(MKLROOT)/lib/intel64 -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64
INCS =-I$(MKLROOT)/include/fftw
# HDF5-support (optional but strongly recommended, and mandatory for some features)
CPP_OPTIONS+= -DVASP_HDF5
HDF5_ROOT ?= /path/to/your/hdf5/installation
LLIBS += -L$(HDF5_ROOT)/lib -lhdf5_fortran
INCS += -I$(HDF5_ROOT)/include
# For the VASP-2-Wannier90 interface (optional)
#CPP_OPTIONS += -DVASP2WANNIER90
#WANNIER90_ROOT ?= /path/to/your/wannier90/installation
#LLIBS += -L$(WANNIER90_ROOT)/lib -lwannier
# For the fftlib library (hardly any benefit in combination with MKL's FFTs)
#FCL = mpiifort fftlib.o -qmkl
#CXX_FFTLIB = icpc -qopenmp -std=c++11 -DFFTLIB_USE_MKL -DFFTLIB_THREADSAFE
#INCS_FFTLIB = -I./include -I$(MKLROOT)/include/fftw
#LIBS += fftlib
# For machine learning library vaspml (experimental)
#CPP_OPTIONS += -Dlibvaspml
#CPP_OPTIONS += -DVASPML_USE_CBLAS
#CPP_OPTIONS += -DVASPML_USE_MKL
#CPP_OPTIONS += -DVASPML_DEBUG_LEVEL=3
#CXX_ML = mpiicpc -cxx=icpx -qopenmp
#CXXFLAGS_ML = -O3 -std=c++17 -Wall
#INCLUDE_ML =
1.3 Compiling

Compile on the login node, where each user is limited to 4 cores. Note: add DEPS=1 so that make resolves file dependencies; otherwise the parallel build fails.

$ make DEPS=1 -j4 all
......
$ ls bin/
vasp_gam vasp_ncl vasp_std
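As an extra sanity check (my own suggestion, not part of the admin guides), confirm that the binaries actually linked against MKL and HDF5:

$ ldd bin/vasp_std | grep -iE 'mkl|hdf5' # expect libmkl_* and libhdf5_fortran entries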
2. AMD CPU + NVIDIA A100 GPU build

This build environment is also based on the guides file provided by the supercomputer administrators.

Note: you must compile on a node with GPU hardware and driver installed (i.e., where the nvidia-smi command exists). Otherwise the build fails with "libcuda.so.1 not found" once it reaches code that requires the GPU. Workaround: start the build on a CPU node first, and only after it errors out log in to a GPU node and resume there; this saves some precious machine hours.

2.1 Loading the build environment

$ module swap PrgEnv-cray PrgEnv-nvhpc
$ module swap craype-x86-rome craype-x86-milan
$ module load craype-accel-nvidia80
$ module swap nvhpc nvhpc/23.7
$ module swap cuda cuda/11.8.0
$ module rm cray-libsci # cray-libsci may interfere with math libs
$ module load hdf5/1.12.1-nvhpc
$ module load mkl/2024.0
$ module list
Currently Loaded Modulefiles:
1) craype-x86-milan 6) craype/2.7.15 11) cuda/11.8.0
2) libfabric/1.11.0.4.125 7) cray-dsmml/0.2.2 12) craype-accel-nvidia80
3) craype-network-ofi 8) cray-mpich/8.1.15 13) hdf5/1.12.1-nvhpc
4) perftools-base/22.04.0 9) cray-pals/1.1.6 14) mkl/2024.0
5) nvhpc/23.7 10) PrgEnv-nvhpc/8.3.3
$ cp arch/makefile.include.nvhpc_ompi_mkl_omp_acc makefile.include
2.2 Modifying makefile.include

Copy the template arch/makefile.include.nvhpc_ompi_mkl_omp_acc. The main changes (line numbers refer to my copy and may differ in yours):

Line 2: change the value of -DHOST to LinuxNVGPU (optional).
Line 8: add -Duse_bse_te \ to enable support for BSE triplet excitations (optional).
Lines 21-23: change the C compiler CC from mpicc to cc; in the values of the Fortran compiler FC and linker FCL, replace mpif90 with ftn; then adjust -gpu= to your GPU architecture and CUDA version (mandatory).
-gpu= specifies the GPU compute capability and CUDA version. My GPU is an A100 (Ampere architecture, code cc80) and the CUDA version is 11.8:

Pascal: cc60 (e.g., Tesla P100, GTX 1080)
Volta: cc70 (e.g., Tesla V100)
Turing: cc75 (e.g., RTX 2080)
Ampere: cc80 (e.g., A100, RTX 3080)

So in my case -gpu=cc80,cuda11.8. (A quick way to query the compute capability is sketched below.)

Line 50: change nvc++ to CC.

Regarding MKL, either of the following works:

Option 1: comment out the original MKLLIBS line and the LLIBS_MKL line below it; use a single line instead:
LLIBS_MKL = -Mmkl -L$(MKLROOT)/lib/intel64 -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64

Option 2: uncomment #MKLLIBS = -Mmkl; change MKLLIBS = on the line below it to MKLLIBS += (add a plus sign); then in the line LLIBS_MKL = -L$(MKLROOT)/lib -lmkl_scalapack_lp64 -lmkl_blacs_openmpi_lp64 $(MKLLIBS) further down, replace -lmkl_blacs_openmpi_lp64 with -lmkl_blacs_intelmpi_lp64 (openmpi --> intelmpi, since cray-mpich is MPICH-based).

Without one of these changes you may hit an undefined-reference error such as: libmkl_blacs_openmpi_lp64.so: undefined reference

Lines 105-108: uncomment to enable HDF5 support (optional).
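To confirm the compute capability of your own card, either of the following should work (nvaccelinfo ships with the NVHPC SDK; the nvidia-smi query needs a reasonably recent driver):

$ nvaccelinfo | grep 'Default Target' # e.g., Default Target: cc80
$ nvidia-smi --query-gpu=compute_cap --format=csv,noheader # e.g., 8.0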
# Default precompiler options, ! revised from arch/makefile.include.nvhpc_ompi_mkl_omp_acc
CPP_OPTIONS = -DHOST=\"LinuxNVGPU\" \
-DMPI -DMPI_INPLACE -DMPI_BLOCK=8000 -Duse_collective \
-DscaLAPACK \
-DCACHE_SIZE=4000 \
-Davoidalloc \
-Dvasp6 \
-Duse_bse_te \
-Dtbdyn \
-Dqd_emulate \
-Dfock_dblbuf \
-D_OPENMP \
-DACC_OFFLOAD \
-DNVCUDA \
-DUSENCCL
CPP = nvfortran -Mpreprocess -Mfree -Mextend -E $(CPP_OPTIONS) $*$(FUFFIX) > $*$(SUFFIX)
# N.B.: you might need to change the cuda-version here
# to one that comes with your NVIDIA-HPC SDK
CC = cc -acc -gpu=cc80,cuda11.8 -mp
FC = ftn -acc -gpu=cc80,cuda11.8 -mp
FCL = ftn -acc -gpu=cc80,cuda11.8 -mp -c++libs
FREE = -Mfree
FFLAGS = -Mbackslash -Mlarge_arrays
OFLAG = -fast
DEBUG = -Mfree -O0 -traceback
LLIBS = -cudalib=cublas,cusolver,cufft,nccl -cuda
# Redefine the standard list of O1 and O2 objects
SOURCE_O1 := pade_fit.o minimax_dependence.o wave_window.o
SOURCE_O2 := pead.o
# For what used to be vasp.5.lib
CPP_LIB = $(CPP)
FC_LIB = $(FC)
CC_LIB = $(CC)
CFLAGS_LIB = -O -w
FFLAGS_LIB = -O1 -Mfixed
FREE_LIB = $(FREE)
OBJECTS_LIB = linpack_double.o
# For the parser library
CXX_PARS = CC --no_warnings #nvc++ --no_warnings
##
## Customize as of this point! Of course you may change the preceding
## part of this file as well if you like, but it should rarely be
## necessary ...
##
# When compiling on the target machine itself, change this to the
# relevant target when cross-compiling for another architecture
VASP_TARGET_CPU ?= -tp host
FFLAGS += $(VASP_TARGET_CPU)
# Specify your NV HPC-SDK installation (mandatory)
#... first try to set it automatically
NVROOT =$(shell which nvfortran | awk -F /compilers/bin/nvfortran '{ print $$1 }')
# If the above fails, then NVROOT needs to be set manually
#NVHPC ?= /opt/nvidia/hpc_sdk
#NVVERSION = 21.11
#NVROOT = $(NVHPC)/Linux_x86_64/$(NVVERSION)
## Improves performance when using NV HPC-SDK >=21.11 and CUDA >11.2
#OFLAG_IN = -fast -Mwarperf
#SOURCE_IN := nonlr.o
# Software emulation of quadruple precision (mandatory)
QD ?= $(NVROOT)/compilers/extras/qd
LLIBS += -L$(QD)/lib -lqdmod -lqd
INCS += -I$(QD)/include/qd
# Intel MKL for FFTW, BLAS, LAPACK, and scaLAPACK
MKLROOT ?= /path/to/your/mkl/installation
#MKLLIBS = -Mmkl
#MKLLIBS += -lmkl_intel_lp64 -lmkl_pgi_thread -lmkl_core -pgf90libs -mp -lpthread -lm -ldl
LLIBS_MKL = -Mmkl -L$(MKLROOT)/lib/intel64 -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64
INCS += -I$(MKLROOT)/include/fftw
# If you want to use scaLAPACK from MKL
#LLIBS_MKL = -L$(MKLROOT)/lib -lmkl_scalapack_lp64 -lmkl_blacs_openmpi_lp64 $(MKLLIBS)
# Use a separate scaLAPACK installation (optional but recommended in combination with OpenMPI)
# Comment out the two lines below if you want to use scaLAPACK from MKL instead
#SCALAPACK_ROOT ?= /path/to/your/scalapack/installation
#LLIBS_MKL = -L$(SCALAPACK_ROOT)/lib -lscalapack $(MKLLIBS)
LLIBS += $(LLIBS_MKL)
INCS += -I$(MKLROOT)/include/fftw
# Use cusolvermp (optional)
# supported as of NVHPC-SDK 24.1 (and needs CUDA-11.8)
#CPP_OPTIONS+= -DCUSOLVERMP -DCUBLASMP
#LLIBS += -cudalib=cusolvermp,cublasmp -lnvhpcwrapcal
# HDF5-support (optional but strongly recommended, and mandatory for some features)
CPP_OPTIONS+= -DVASP_HDF5
HDF5_ROOT ?= /path/to/your/hdf5/installation
LLIBS += -L$(HDF5_ROOT)/lib -lhdf5_fortran
INCS += -I$(HDF5_ROOT)/include
# For the VASP-2-Wannier90 interface (optional)
#CPP_OPTIONS += -DVASP2WANNIER90
#WANNIER90_ROOT ?= /path/to/your/wannier90/installation
#LLIBS += -L$(WANNIER90_ROOT)/lib -lwannier
# For the fftlib library (hardly any benefit for the OpenACC GPU port, especially in combination with MKL's FFTs)
#CPP_OPTIONS+= -Dsysv
#FCL += fftlib.o
#CXX_FFTLIB = nvc++ -mp --no_warnings -std=c++11 -DFFTLIB_USE_MKL -DFFTLIB_THREADSAFE
#INCS_FFTLIB = -I./include -I$(MKLROOT)/include/fftw
#LIBS += fftlib
#LLIBS += -ldl
# For machine learning library vaspml (experimental)
#CPP_OPTIONS += -Dlibvaspml
#CPP_OPTIONS += -DVASPML_USE_CBLAS
#CPP_OPTIONS += -DVASPML_DEBUG_LEVEL=3
#CXX_ML = mpic++ -mp
#CXXFLAGS_ML = -O3 -std=c++17 -Wall -Wextra
#INCLUDE_ML =
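The qd emulation library is mandatory, so before building it is worth confirming that the automatic NVROOT detection actually finds it (this mirrors the awk logic in the makefile above; paths depend on your NVHPC SDK install):

$ NVROOT=$(which nvfortran | awk -F /compilers/bin/nvfortran '{ print $1 }')
$ ls $NVROOT/compilers/extras/qd/lib # expect libqdmod and libqd here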
2.3 Compiling

Start the build on the login node; after it hits libcuda.so.1 not found, request a GPU node, reload the build environment there, and continue the build.

$ make DEPS=1 -j4 all
... (the build errors out here)
$ qsub -I ... # request an interactive job on a GPU node
$ # reload the build environment here (see section 2.1)
$ nvidia-smi
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.154.05 Driver Version: 535.154.05 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA A100-SXM4-40GB On | 00000000:41:00.0 Off | 0 |
| N/A 41C P0 55W / 400W | 0MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
$ make DEPS=1 -j16 all
... possible errors here (see below)
2.4 Fixing build errors

OpenACC (if CPP_OPTIONS in makefile.include contains -D_OPENACC) or NVCUDA (if it contains -DNVCUDA) leads to errors related to MPIX_Query_cuda_support. The error message gives the offending file and line number; it usually points into these files (the same for gam/std/ncl): ./build/{gam,std,ncl}/{openacc,nvcuda}.f90

Fix: open ./build/{gam,std,ncl}/{openacc,nvcuda}.F with vim +<line number> and comment out the following 4 lines (prepend an exclamation mark):

! INTERFACE
! INTEGER(c_int) FUNCTION MPIX_Query_cuda_support() BIND(C, name="MPIX_Query_cuda_support")
! END FUNCTION
! END INTERFACE

Then change the line below it, CUDA_AWARE_SUPPORT = MPIX_Query_cuda_support() == 1, to CUDA_AWARE_SUPPORT = .TRUE., and continue the build.
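If you would rather not edit three (or six) files by hand, the assignment change can be scripted; a sed sketch, assuming the spacing in your files matches the line quoted above exactly (the INTERFACE block still has to be commented out manually):

$ sed -i 's/CUDA_AWARE_SUPPORT = MPIX_Query_cuda_support() == 1/CUDA_AWARE_SUPPORT = .TRUE./' ./build/{gam,std,ncl}/{openacc,nvcuda}.F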
2.5 Testing: SCF on monolayer graphene

INCAR:

SYSTEM = graphene
ISTART = 0; ICHARG = 2
ENCUT = 520
ISIF = 3
ISMEAR = -5 ; SIGMA = 0.05
ALGO = Fast
# NPAR = 3
#########
EDIFF = 1E-7
PREC = Accurate
EDIFFG = -0.01
#########
#ISPIN = 2
#MAGMOM =
LCHARG = .TRUE.
LWAVE = .TRUE.
LORBIT = 11
LREAL = .FALSE.
#########
SYMPREC = 1E-4
ISYM = 1
NELM = 200
#########
NSW = 0
POTIM = 0.5
IBRION = -1
######### VDW = DFT-D2
#LVDW = .TRUE.
#IVDW = 1

KPOINTS:

K-POINTS
0
Gamma-Centered
25 25 1
0 0 0

POSCAR (note: the lattice matrix carries a little floating-point error; for testing only):

graphene
1.00000000000000
2.4677557588200547 0.0000000001951262 -0.0000000000000000
-1.2338785942720587 2.1371404153443971 -0.0000000000000000
0.0000000000000000 0.0000000000000000 14.9975103391044442
C
2
Direct
0.3333328829999971 0.6666671669999999 0.2000000060000033
0.6666671540000024 0.3333328579999986 0.2000000060000033

$ export OMP_NUM_THREADS=16 # number of CPU cores
$ mpirun -np 1 --cpu-bind depth -d $OMP_NUM_THREADS vasp_std | tee vasp_run.out
running 1 mpi-ranks, with 16 threads/rank, on 1 nodes
distrk: each k-point on 1 cores, 1 groups
distr: one band on 1 cores, 1 groups
Offloading initialized ... 1 GPUs detected
vasp.6.5.0 16Dec24 (build ?? 2025 ??) complex
POSCAR found type information on POSCAR C
POSCAR found : 1 types and 2 ions
Reading from existing POTCAR
scaLAPACK will be used selectively (only on CPU)
Reading from existing POTCAR
LDA part: xc-table for (Slater+PW92), standard interpolation
POSCAR, INCAR and KPOINTS ok, starting setup
FFT: planning ... GRIDC
FFT: planning ... GRID_SOFT
FFT: planning ... GRID
WAVECAR not read
entering main loop
$ head OUTCAR
vasp.6.5.0 16Dec24 (build ??) complex
executed on LinuxNVGPU date 2025 ??
running 1 mpi-ranks, with 16 threads/rank, on 1 nodes
distrk: each k-point on 1 cores, 1 groups
distr: one band on NCORE= 1 cores, 1 groups
Offloading initialized ... 1 GPUs detected
$ tail -14 OUTCAR
General timing and accounting informations for this job:
========================================================
Total CPU time used (sec): 26.150
User time (sec): 25.026
System time (sec): 1.125
Elapsed time (sec): 25.703
Maximum memory used (kb): 1377240.
Average memory used (kb): N/A
Minor page faults: 263714
Major page faults: 0
Voluntary context switches: 18839

The GPU is detected successfully and the SCF calculation runs on the GPU.
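For production runs, the same launch line goes into a batch job. A minimal PBS sketch (the resource syntax and the ncpus/ngpus names are assumptions; adapt them to your site's scheduler):

#!/bin/bash
#PBS -N graphene_scf
#PBS -l select=1:ncpus=16:ngpus=1 # resource syntax is site-specific
#PBS -l walltime=00:30:00
cd $PBS_O_WORKDIR
# reload the runtime environment (full module list in section 2.1)
module swap PrgEnv-cray PrgEnv-nvhpc
module swap craype-x86-rome craype-x86-milan
module load craype-accel-nvidia80 mkl/2024.0
export OMP_NUM_THREADS=16 # threads per MPI rank
mpirun -np 1 --cpu-bind depth -d $OMP_NUM_THREADS vasp_std | tee vasp_run.out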
3. Closing remarks

Note: renaming the Fortran/C/C++ compilers to ftn, cc, and CC as done above only applies to the HPE Cray supercomputing environment.

That is all for this post; when I find time I will add my experience building VASP 6.5.0 with an Intel CPU + NVIDIA A40 GPU on our own cluster.

Please credit the source when reposting. Comments and exchanges are welcome.

PS: please do not PM me or reply asking for the VASP source code; I will not answer such requests. Remember that VASP is commercial software, thanks :)