Computational Chemistry Commune (计算化学公社)

Title: VASP6.5.0+AMD CPU/NVIDIA A100 GPU build

Author: Tosykie  Time: 2025-2-23 02:08
Last edited by Tosykie on 2025-2-23 02:08

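Following my earlier post on compiling VASP 6.5.0 with the Intel compiler suite, this time I tried building VASP for AMD CPUs and NVIDIA GPUs, on hardware at a supercomputing center.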

(Lots of cores on AMD EPYC. Yes!)

(A complicated build environment. Not so Yes any more.)

Server hardware/software overview:

CPU: dual-socket AMD EPYC-Milan 7713 (64 cores per CPU, 128 cores per node)
GPU: NVIDIA A100 40 GB
OS: RHEL 8.4
Software environment: HPE Cray

$ lscpu
Architecture:       x86_64
CPU op-mode(s):    32-bit, 64-bit
Byte Order:       Little Endian
CPU(s):             128
On-line CPU(s) list: 0-127
Thread(s) per core:  1
Core(s) per socket:  64
Socket(s):          2
NUMA node(s):       1
Vendor ID:          AuthenticAMD
CPU family:       25
Model:             1
Model name:       AMD EPYC-Milan Processor
Stepping:          1
CPU MHz:          1996.250
BogoMIPS:          3992.50
Hypervisor vendor: KVM
Virtualization type: full
L1d cache:          32K
L1i cache:          32K
L2 cache:          512K
L3 cache:          32768K
NUMA node0 CPU(s): 0-127
Flags:             ...

$ cat /etc/redhat-release
Red Hat Enterprise Linux release 8.4 (Ootpa)

Default Cray environment:

$ module list
Currently Loaded Modulefiles:
1) craype-x86-rome       5) cce/13.0.2             9) cray-libsci/21.08.1.2
2) libfabric/1.11.0.4.125 6) craype/2.7.15          10) cray-pals/1.1.6
3) craype-network-ofi    7) cray-dsmml/0.2.2       11) PrgEnv-cray/8.3.3
4) perftools-base/22.04.0 8) cray-mpich/8.1.15

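Important notes:

In the HPE Cray environment, all Fortran compilers are wrapped as ftn; the C compiler is cc (lowercase cc) and the C++ compiler is CC (uppercase CC).
Each hardware/software environment is packaged as a module, usually named PrgEnv-xxx, where xxx can be cray, intel, gnu, aocc or nvhpc; the default is PrgEnv-cray.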
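To confirm which compiler the wrappers actually call after a module swap, a small sketch. It assumes the Cray wrappers forward --version to the underlying compiler (I believe they do, but treat it as an assumption); show_cray_wrappers is a hypothetical helper name.

```shell
# Sketch: print the first version line of the compiler behind each Cray
# wrapper, e.g. to check that ftn is the Intel compiler after PrgEnv-intel
# or nvfortran after PrgEnv-nvhpc.
show_cray_wrappers() {
    local w
    for w in ftn cc CC; do
        if command -v "$w" >/dev/null 2>&1; then
            printf '%s: %s\n' "$w" "$("$w" --version 2>&1 | head -n 1)"
        else
            printf '%s: not found (is a PrgEnv-* module loaded?)\n' "$w"
        fi
    done
}

# Hypothetical usage:
#   module swap PrgEnv-cray PrgEnv-intel && show_cray_wrappers
```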

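1. AMD CPU + Intel oneAPI version

The build environment follows the guide files provided by the supercomputer's administrators.

1.1 Loading the build environment

Compiler: Intel oneAPI 2024.0
Math library: MKL from Intel oneAPI 2024.0
MPI: Cray MPICH 8.1.15
I/O enhancement: HDF5 built with the Intel compiler (parallel, version 1.12.1.1)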

$ module swap PrgEnv-cray PrgEnv-intel
$ module swap craype-x86-rome craype-x86-milan
$ module load mkl/2024.0
$ module load cray-hdf5-parallel
$ module rm cray-libsci
$ module list
Currently Loaded Modulefiles:
1) craype-x86-milan             5) intel/2024.0                9) cray-pals/1.1.6
2) libfabric/1.11.0.4.125       6) craype/2.7.15             10) PrgEnv-intel/8.3.3
3) craype-network-ofi          7) cray-dsmml/0.2.2          11) mkl/2024.0
4) perftools-base/22.04.0       8) cray-mpich/8.1.15          12) cray-hdf5-parallel/1.12.1.1

$ cp arch/makefile.include.oneapi_omp makefile.include
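The environment has to be set up again in every new shell. A minimal sketch that keeps the module commands of this section in one function; the function name and the idea of sourcing it from a small file are my own assumptions, and the module names/versions are exactly the ones on this machine.

```shell
# Sketch: the section 1.1 module commands wrapped in one function so the
# environment can be restored in a new shell before running make.
# Module names/versions are those on this machine; adjust to yours.
load_vasp_cpu_env() {
    module swap PrgEnv-cray PrgEnv-intel || return 1
    module swap craype-x86-rome craype-x86-milan || return 1
    module load mkl/2024.0 || return 1
    module load cray-hdf5-parallel || return 1
    # cray-libsci may interfere with MKL, so it is unloaded
    module rm cray-libsci || return 1
}

# Hypothetical usage (file name is an assumption):
#   . ./vasp_cpu_env.sh && load_vasp_cpu_env && module list
```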

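1.2 Modifying makefile.include

The makefile.include template copied here is the .oneapi one, not an .aocc one. I have not tested how VASP 6.5.0 performs when built with older compilers (e.g. oneAPI <= 2023 or Parallel Studio XE); readers can test this themselves. The general expectation is: new software with a new compiler.

I used the oneAPI + OpenMP combination, arch/makefile.include.oneapi_omp. The main changes are: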

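Line 2: I changed the value of -DHOST to AMDIFC (optional).
Line 8: add -Duse_bse_te \ to enable support for BSE triplet excitations (optional).
Lines 15-16 (line numbers depend on your own file): in the values of the Fortran compiler FC and the linker FCL, replace mpiifort -fc=ifx with ftn (mandatory).
On the same two lines, you can add -diag-disable=10448 to suppress the warning that the Intel® Fortran Compiler Classic (ifort) is being deprecated (optional); see the Intel® Fortran Compiler Release Notes: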

Support Removed
Intel® Fortran Compiler Classic (ifort) is now discontinued in oneAPI 2025 release.

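Line 29: set CC_LIB to cc, the C compiler wrapper of the HPE Cray environment.
Line 37: set CXX_PARS to CC, the C++ compiler wrapper of the HPE Cray environment.
Line 48: comment out VASP_TARGET_CPU ?= -xHOST, or change it to VASP_TARGET_CPU ?= -march=core-avx2 as on line 49 (mandatory). This is presumably due to instruction-set differences between AMD and Intel CPUs; see Problem of installation of vasp632 with intel oneapi compiler.
Lines 60-63: uncomment them to enable HDF5 support (optional). The HDF5 library must be built with the same compiler family as VASP (Fortran .mod files are compiler-specific), and only backward compatibility can be relied on; otherwise the build fails. For example, GCC-built HDF5 + VASP built with Intel oneAPI fails, while HDF5 built with an older Intel oneAPI + VASP built with a newer oneAPI works. Also, the HDF5_ROOT environment variable must point to the HDF5 installation root, or you must change the path manually.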
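The edits above can be scripted. A sketch, assuming the template lines look like the ones in the revised file below (I have not checked every line of the 6.5.0 template, so diff the result); patch_oneapi_makefile is a hypothetical name. It takes the -march=core-avx2 route rather than commenting the line out, and HDF5_ROOT still has to be set.

```shell
# Sketch: section 1.2 edits for arch/makefile.include.oneapi_omp on HPE Cray.
# Assumes the template lines look like the ones quoted in this post;
# diff the result against the template afterwards.
patch_oneapi_makefile() {
    local f
    f=${1:-makefile.include}
    [ -f "$f" ] || { echo "no such file: $f" >&2; return 1; }
    sed -E -i \
        -e 's/mpiifort -fc=ifx/ftn -diag-disable=10448/' \
        -e 's/^(CC_LIB[[:space:]]*=[[:space:]]*)icx/\1cc/' \
        -e 's/^(CXX_PARS[[:space:]]*=[[:space:]]*)icpx/\1CC/' \
        -e 's/^VASP_TARGET_CPU([[:space:]]*)\?=([[:space:]]*)-xHOST/VASP_TARGET_CPU\1?=\2-march=core-avx2/' \
        -e 's/^#(CPP_OPTIONS.*-DVASP_HDF5|HDF5_ROOT|LLIBS.*-lhdf5|INCS.*HDF5_ROOT)/\1/' \
        "$f"
}

# Hypothetical usage:
#   cp arch/makefile.include.oneapi_omp makefile.include
#   patch_oneapi_makefile makefile.include
#   diff arch/makefile.include.oneapi_omp makefile.include
```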

# Default precompiler options, ! revised from arch/makefile.include.oneapi_omp
CPP_OPTIONS = -DHOST=\"AMDIFC\" \
            -DMPI -DMPI_BLOCK=8000 -Duse_collective \
            -DscaLAPACK \
            -DCACHE_SIZE=4000 \
            -Davoidalloc \
            -Dvasp6 \
            -Duse_bse_te \
            -Dtbdyn \
            -Dfock_dblbuf \
            -D_OPENMP

CPP       = fpp -f_com=no -free -w0  $*$(FUFFIX) $*$(SUFFIX) $(CPP_OPTIONS)

FC       = ftn -qopenmp -diag-disable=10448
FCL       = ftn -diag-disable=10448

FREE       = -free -names lowercase

FFLAGS    = -assume byterecl -w

OFLAG    = -O2
OFLAG_IN = $(OFLAG)
DEBUG    = -O0

# For what used to be vasp.5.lib
CPP_LIB    = $(CPP)
FC_LIB    = $(FC)
CC_LIB    = cc #icx
CFLAGS_LIB  = -O
FFLAGS_LIB  = -O1
FREE_LIB = $(FREE)

OBJECTS_LIB = linpack_double.o

# For the parser library
CXX_PARS = CC #icpx
LLIBS    = -lstdc++

##
## Customize as of this point! Of course you may change the preceding
## part of this file as well if you like, but it should rarely be
## necessary ...
##

# When compiling on the target machine itself, change this to the
# relevant target when cross-compiling for another architecture
#VASP_TARGET_CPU ?= -xHOST
#VASP_TARGET_CPU ?= -march=core-avx2
#FFLAGS    += $(VASP_TARGET_CPU)

# Intel MKL (FFTW, BLAS, LAPACK, and scaLAPACK)
# (Note: for Intel Parallel Studio's MKL use -mkl instead of -qmkl)
FCL       += -qmkl
MKLROOT ?= /path/to/your/mkl/installation
LLIBS    += -L$(MKLROOT)/lib/intel64 -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64
INCS       =-I$(MKLROOT)/include/fftw

# HDF5-support (optional but strongly recommended, and mandatory for some features)
CPP_OPTIONS+= -DVASP_HDF5
HDF5_ROOT  ?= /path/to/your/hdf5/installation
LLIBS    += -L$(HDF5_ROOT)/lib -lhdf5_fortran
INCS    += -I$(HDF5_ROOT)/include

# For the VASP-2-Wannier90 interface (optional)
#CPP_OPTIONS += -DVASP2WANNIER90
#WANNIER90_ROOT ?= /path/to/your/wannier90/installation
#LLIBS       += -L$(WANNIER90_ROOT)/lib -lwannier

# For the fftlib library (hardly any benefit in combination with MKL's FFTs)
#FCL       = mpiifort fftlib.o -qmkl
#CXX_FFTLIB  = icpc -qopenmp -std=c++11 -DFFTLIB_USE_MKL -DFFTLIB_THREADSAFE
#INCS_FFTLIB = -I./include -I$(MKLROOT)/include/fftw
#LIBS    += fftlib

# For machine learning library vaspml (experimental)
#CPP_OPTIONS += -Dlibvaspml
#CPP_OPTIONS += -DVASPML_USE_CBLAS
#CPP_OPTIONS += -DVASPML_USE_MKL
#CPP_OPTIONS += -DVASPML_DEBUG_LEVEL=3
#CXX_ML    = mpiicpc -cxx=icpx -qopenmp
#CXXFLAGS_ML = -O3 -std=c++17 -Wall
#INCLUDE_ML  =

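1.3 Compiling

I compiled on the login node, where each user is limited to 4 cores.

Note: add DEPS=1 so that make honors the file dependencies; otherwise the parallel build fails.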

$ make DEPS=1 -j4 all
......
$ ls bin/
vasp_gam vasp_ncl vasp_std
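A quick check that all three executables were produced (check_vasp_bins is a hypothetical helper; the same check applies to the GPU build in section 2):

```shell
# Sketch: confirm that `make all` produced vasp_std, vasp_gam and vasp_ncl.
# Returns non-zero if any of them is missing or not executable.
check_vasp_bins() {
    local dir exe missing
    dir=${1:-bin}
    missing=0
    for exe in vasp_std vasp_gam vasp_ncl; do
        if [ -x "$dir/$exe" ]; then
            echo "ok: $dir/$exe"
        else
            echo "missing: $dir/$exe" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Hypothetical usage:
#   make DEPS=1 -j4 all && check_vasp_bins bin
```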

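2. AMD CPU + NVIDIA A100 GPU version

The build environment follows the guide files provided by the supercomputer's administrators.

Note: you need to compile on a node that has GPU hardware and drivers (i.e. where the nvidia-smi command can be found). Otherwise the build fails with libcuda.so.1 not found when it reaches code that needs the GPU. Workaround: start the build on a CPU node, and only after this error log in to a GPU node to continue; this saves precious GPU allocation time.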
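A trivial check for this condition. on_gpu_node is a hypothetical helper; it only tests that nvidia-smi is on PATH (the criterion stated above) and does not verify that libcuda.so.1 itself is loadable.

```shell
# Sketch: succeed only if nvidia-smi can be found, i.e. (per this post)
# the node should be able to finish the GPU build.
on_gpu_node() {
    command -v nvidia-smi >/dev/null 2>&1
}

# Hypothetical usage on the login node:
#   make DEPS=1 -j4 all || { on_gpu_node || echo "continue on a GPU node (qsub -I ...)"; }
```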

2.1 Loading the build environment

Compiler suite: NVHPC 23.7
CUDA: 11.8
Math library: MKL from Intel oneAPI 2024.0
MPI: Cray MPICH 8.1.15
I/O enhancement: HDF5 built with the matching NVHPC (parallel, version 1.12.1)

$ module swap PrgEnv-cray PrgEnv-nvhpc
$ module swap craype-x86-rome craype-x86-milan
$ module load craype-accel-nvidia80
$ module swap nvhpc nvhpc/23.7
$ module swap cuda cuda/11.8.0
$ module rm cray-libsci # cray-libsci may interfere with math libs
$ module load hdf5/1.12.1-nvhpc
$ module load mkl/2024.0
$ module list
Currently Loaded Modulefiles:
1) craype-x86-milan       6) craype/2.7.15          11) cuda/11.8.0
2) libfabric/1.11.0.4.125 7) cray-dsmml/0.2.2       12) craype-accel-nvidia80
3) craype-network-ofi    8) cray-mpich/8.1.15    13) hdf5/1.12.1-nvhpc
4) perftools-base/22.04.0 9) cray-pals/1.1.6       14) mkl/2024.0
5) nvhpc/23.7             10) PrgEnv-nvhpc/8.3.3

$ cp arch/makefile.include.nvhpc_ompi_mkl_omp_acc makefile.include
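Because the environment must be reloaded in the fresh shell on the GPU node (see 2.3), it helps to keep these commands in one function. A sketch with a hypothetical function name; the module names and versions are exactly the ones above, and it assumes it starts from the default PrgEnv-cray environment.

```shell
# Sketch: the section 2.1 module commands in one function, to be run on the
# login node and again in the new shell on the GPU node.
load_vasp_gpu_env() {
    module swap PrgEnv-cray PrgEnv-nvhpc || return 1
    module swap craype-x86-rome craype-x86-milan || return 1
    module load craype-accel-nvidia80 || return 1
    module swap nvhpc nvhpc/23.7 || return 1
    module swap cuda cuda/11.8.0 || return 1
    module rm cray-libsci || return 1      # may interfere with MKL
    module load hdf5/1.12.1-nvhpc || return 1
    module load mkl/2024.0 || return 1
}

# Hypothetical usage (file name is an assumption):
#   . ./vasp_gpu_env.sh && load_vasp_gpu_env && module list
```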

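2.2 Modifying makefile.include

Starting from the template arch/makefile.include.nvhpc_ompi_mkl_omp_acc, the main changes are:

Line 2: I changed the value of -DHOST to LinuxNVGPU (optional).
Line 8: add -Duse_bse_te \ to enable support for BSE triplet excitations (optional).
Lines 21-23 (line numbers depend on your own file): change the C compiler CC from mpicc to cc; in the values of the Fortran compiler FC and the linker FCL, replace mpif90 with ftn; then adjust -gpu= to your GPU architecture and CUDA version (mandatory).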

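-gpu= specifies the GPU compute capability and the CUDA version. My GPU is an A100 (Ampere architecture, compute capability cc80), and the CUDA version is 11.8.

Pascal: cc60 (e.g., Tesla P100; the GeForce GTX 1080 is cc61)

Volta: cc70 (e.g., Tesla V100)

Turing: cc75 (e.g., RTX 2080)

Ampere: cc80 (e.g., A100; the GeForce RTX 3080 is cc86)

So my setting is -gpu=cc80,cuda11.8.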

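Line 50: change nvc++ to CC.
About MKL:
Option 1: comment out the original MKLLIBS and the LLIBS_MKL below it, and use only this single line instead:
LLIBS_MKL = -Mmkl -L$(MKLROOT)/lib/intel64 -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64
Option 2: uncomment #MKLLIBS = -Mmkl; on the following line change MKLLIBS = to MKLLIBS += (add a plus sign); then, in LLIBS_MKL = -L$(MKLROOT)/lib -lmkl_scalapack_lp64 -lmkl_blacs_openmpi_lp64 $(MKLLIBS) further down, change -lmkl_blacs_openmpi_lp64 to -lmkl_blacs_intelmpi_lp64 (openmpi --> intelmpi).
Without these changes you may get undefined-reference errors (libmkl_blacs_openmpi_lp64.so: undefined reference): the template assumes Open MPI, while Cray MPICH is MPICH-based and needs the Intel MPI flavor of the MKL BLACS.
Lines 105-108: uncomment them to enable HDF5 support (optional).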
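The 2.2 edits can also be scripted. A sketch under assumptions: the template lines are taken to look like those quoted in this post (diff the result), MKL is handled with option 2, and cc_flag_from_cap, detect_gpu_cc and patch_nvhpc_makefile are hypothetical helper names. The nvidia-smi compute_cap query field only exists on reasonably recent drivers, so on older ones pass the -gpu= value by hand.

```shell
# Sketch, not a tested recipe: section 2.2 edits for the Cray/NVHPC build.

# "8.0" -> "cc80", "8.6" -> "cc86": the ccXY token used in -gpu=.
cc_flag_from_cap() {
    printf 'cc%s\n' "$(printf '%s' "$1" | tr -d '. ')"
}

# Compute capability of the first GPU (run on a GPU node; assumes the
# driver's nvidia-smi supports the compute_cap query field).
detect_gpu_cc() {
    local cap
    cap=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null | head -n 1)
    [ -n "$cap" ] || return 1
    cc_flag_from_cap "$cap"
}

# Patch a copy of arch/makefile.include.nvhpc_ompi_mkl_omp_acc in place:
# mpicc/mpif90 -> cc/ftn, -gpu= value, nvc++ -> CC, MKL option 2 with
# Intel-MPI BLACS, and the optional HDF5 block.
patch_nvhpc_makefile() {
    local f gpu
    f=${1:-makefile.include}
    gpu=${2:-cc80,cuda11.8}
    [ -f "$f" ] || { echo "no such file: $f" >&2; return 1; }
    sed -E -i \
        -e 's/^(CC[[:space:]]*=[[:space:]]*)mpicc/\1cc/' \
        -e 's/^(FCL?[[:space:]]*=[[:space:]]*)mpif90/\1ftn/' \
        -e "s/-gpu=[^[:space:]]*/-gpu=$gpu/" \
        -e 's/^(CXX_PARS[[:space:]]*=[[:space:]]*)nvc\+\+/\1CC/' \
        -e 's/^#(MKLLIBS[[:space:]]*=[[:space:]]*-Mmkl)/\1/' \
        -e 's/^MKLLIBS([[:space:]]*)=([[:space:]]*-lmkl_intel_lp64)/MKLLIBS\1+=\2/' \
        -e 's/-lmkl_blacs_openmpi_lp64/-lmkl_blacs_intelmpi_lp64/g' \
        -e 's/^#(CPP_OPTIONS.*-DVASP_HDF5|HDF5_ROOT|LLIBS.*-lhdf5|INCS.*HDF5_ROOT)/\1/' \
        "$f"
}

# Hypothetical usage on a GPU node, CUDA 11.8 as in this post:
#   cp arch/makefile.include.nvhpc_ompi_mkl_omp_acc makefile.include
#   patch_nvhpc_makefile makefile.include "$(detect_gpu_cc),cuda11.8"
#   diff arch/makefile.include.nvhpc_ompi_mkl_omp_acc makefile.include
```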

# Default precompiler options, ! revised from arch/makefile.include.nvhpc_ompi_mkl_omp_acc
CPP_OPTIONS = -DHOST=\"LinuxNVGPU\" \
            -DMPI -DMPI_INPLACE -DMPI_BLOCK=8000 -Duse_collective \
            -DscaLAPACK \
            -DCACHE_SIZE=4000 \
            -Davoidalloc \
            -Dvasp6 \
            -Duse_bse_te \
            -Dtbdyn \
            -Dqd_emulate \
            -Dfock_dblbuf \
            -D_OPENMP \
            -DACC_OFFLOAD \
            -DNVCUDA \
            -DUSENCCL

CPP       = nvfortran -Mpreprocess -Mfree -Mextend -E $(CPP_OPTIONS) $*$(FUFFIX)  > $*$(SUFFIX)

# N.B.: you might need to change the cuda-version here
#    to one that comes with your NVIDIA-HPC SDK
CC       = cc  -acc -gpu=cc80,cuda11.8 -mp
FC       = ftn -acc -gpu=cc80,cuda11.8 -mp
FCL       = ftn -acc -gpu=cc80,cuda11.8 -mp -c++libs

FREE       = -Mfree

FFLAGS    = -Mbackslash -Mlarge_arrays

OFLAG    = -fast

DEBUG    = -Mfree -O0 -traceback

LLIBS    = -cudalib=cublas,cusolver,cufft,nccl -cuda

# Redefine the standard list of O1 and O2 objects
SOURCE_O1  := pade_fit.o minimax_dependence.o wave_window.o
SOURCE_O2  := pead.o

# For what used to be vasp.5.lib
CPP_LIB    = $(CPP)
FC_LIB    = $(FC)
CC_LIB    = $(CC)
CFLAGS_LIB  = -O -w
FFLAGS_LIB  = -O1 -Mfixed
FREE_LIB = $(FREE)

OBJECTS_LIB = linpack_double.o

# For the parser library
CXX_PARS = CC --no_warnings #nvc++ --no_warnings

##
## Customize as of this point! Of course you may change the preceding
## part of this file as well if you like, but it should rarely be
## necessary ...
##
# When compiling on the target machine itself , change this to the
# relevant target when cross-compiling for another architecture
VASP_TARGET_CPU ?= -tp host
FFLAGS    += $(VASP_TARGET_CPU)

# Specify your NV HPC-SDK installation (mandatory)
#... first try to set it automatically
NVROOT    =$(shell which nvfortran | awk -F /compilers/bin/nvfortran '{ print $$1 }')

# If the above fails, then NVROOT needs to be set manually
#NVHPC    ?= /opt/nvidia/hpc_sdk
#NVVERSION = 21.11
#NVROOT    = $(NVHPC)/Linux_x86_64/$(NVVERSION)

## Improves performance when using NV HPC-SDK >=21.11 and CUDA >11.2
#OFLAG_IN = -fast -Mwarperf
#SOURCE_IN  := nonlr.o

# Software emulation of quadruple precision (mandatory)
QD       ?= $(NVROOT)/compilers/extras/qd
LLIBS    += -L$(QD)/lib -lqdmod -lqd
INCS    += -I$(QD)/include/qd

# Intel MKL for FFTW, BLAS, LAPACK, and scaLAPACK
MKLROOT ?= /path/to/your/mkl/installation
#MKLLIBS    = -Mmkl
#MKLLIBS += -lmkl_intel_lp64 -lmkl_pgi_thread -lmkl_core -pgf90libs -mp -lpthread -lm -ldl
LLIBS_MKL = -Mmkl -L$(MKLROOT)/lib/intel64 -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64
INCS    += -I$(MKLROOT)/include/fftw

# If you want to use scaLAPACK from MKL
#LLIBS_MKL = -L$(MKLROOT)/lib -lmkl_scalapack_lp64 -lmkl_blacs_openmpi_lp64 $(MKLLIBS)

# Use a separate scaLAPACK installation (optional but recommended in combination with OpenMPI)
# Comment out the two lines below if you want to use scaLAPACK from MKL instead
#SCALAPACK_ROOT ?= /path/to/your/scalapack/installation
#LLIBS_MKL = -L$(SCALAPACK_ROOT)/lib -lscalapack $(MKLLIBS)

LLIBS    += $(LLIBS_MKL)

INCS    += -I$(MKLROOT)/include/fftw

# Use cusolvermp (optional)
# supported as of NVHPC-SDK 24.1 (and needs CUDA-11.8)
#CPP_OPTIONS+= -DCUSOLVERMP -DCUBLASMP
#LLIBS    += -cudalib=cusolvermp,cublasmp -lnvhpcwrapcal

# HDF5-support (optional but strongly recommended, and mandatory for some features)
CPP_OPTIONS+= -DVASP_HDF5
HDF5_ROOT  ?= /path/to/your/hdf5/installation
LLIBS    += -L$(HDF5_ROOT)/lib -lhdf5_fortran
INCS    += -I$(HDF5_ROOT)/include

# For the VASP-2-Wannier90 interface (optional)
#CPP_OPTIONS += -DVASP2WANNIER90
#WANNIER90_ROOT ?= /path/to/your/wannier90/installation
#LLIBS       += -L$(WANNIER90_ROOT)/lib -lwannier

# For the fftlib library (hardly any benefit for the OpenACC GPU port, especially in combination with MKL's FFTs)
#CPP_OPTIONS+= -Dsysv
#FCL       += fftlib.o
#CXX_FFTLIB  = nvc++ -mp --no_warnings -std=c++11 -DFFTLIB_USE_MKL -DFFTLIB_THREADSAFE
#INCS_FFTLIB = -I./include -I$(MKLROOT)/include/fftw
#LIBS    += fftlib
#LLIBS    += -ldl

# For machine learning library vaspml (experimental)
#CPP_OPTIONS += -Dlibvaspml
#CPP_OPTIONS += -DVASPML_USE_CBLAS
#CPP_OPTIONS += -DVASPML_DEBUG_LEVEL=3
#CXX_ML    = mpic++ -mp
#CXXFLAGS_ML = -O3 -std=c++17 -Wall -Wextra
#INCLUDE_ML  =

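2.3 Compiling

Start the build on the login node; when it stops with libcuda.so.1 not found, request a GPU node, reload the build environment there and continue the build.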

$ make DEPS=1 -j4 all
... error
$ qsub -I ... # request an interactive job
$ nvidia-smi
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.154.05          Driver Version: 535.154.05 CUDA Version: 12.2    |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                Persistence-M | Bus-Id       Disp.A | Volatile Uncorr. ECC |
| Fan  Temp Perf       Pwr:Usage/Cap |       Memory-Usage | GPU-Util  Compute M. |
|                                        |                   |             MIG M. |
|=========================================+======================+======================|
| 0  NVIDIA A100-SXM4-40GB       On  | 00000000:41:00.0 Off |                   0 |
| N/A 41C P0             55W / 400W |    0MiB / 40960MiB |    0%    Default |
|                                        |                   |          Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                         |
|  GPU GI CI       PID Type Process name                         GPU Memory |
|       ID ID                                                          Usage    |
|=======================================================================================|
|  No running processes found                                                          |
+---------------------------------------------------------------------------------------+
$ # reload the build environment here (the module commands from 2.1)
$ make DEPS=1 -j16 all
... further errors are possible (see below)

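2.4 Troubleshooting

With OpenACC (if CPP_OPTIONS in makefile.include contains -D_OPENACC) or NVCUDA (if CPP_OPTIONS contains -DNVCUDA), the build fails with an error about MPIX_Query_cuda_support. MPIX_Query_cuda_support is an Open MPI extension, which this Cray MPICH version apparently does not provide. The error message gives the file and line number; it usually comes from the following files (the same for gam/std/ncl): ./build/{gam,std,ncl}/{openacc,nvcuda}.f90

Fix:

Open ./build/{gam,std,ncl}/{openacc,nvcuda}.F with vim +<line number> and comment out the following 4 lines (put an exclamation mark at the start of each line):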

! INTERFACE
! INTEGER(c_int) FUNCTION MPIX_Query_cuda_support() BIND(C, name="MPIX_Query_cuda_support")
! END FUNCTION
! END INTERFACE

Then change CUDA_AWARE_SUPPORT = MPIX_Query_cuda_support() == 1 just below to CUDA_AWARE_SUPPORT = .TRUE.

Then continue the build.

Reference: Error: undefined reference to `MPIX_Query_cuda_support'
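Editing up to six files by hand is error-prone, so here is a sketch of the same edit as an awk filter. patch_mpix_query is a hypothetical name; it assumes the declaration sits in a bare INTERFACE ... END INTERFACE block and that the assignment reads exactly as quoted above. Forcing .TRUE. makes VASP treat the MPI library as CUDA-aware; with Cray MPICH, GPU support is, as far as I know, enabled at run time with MPICH_GPU_SUPPORT_ENABLED=1.

```shell
# Sketch: apply the section 2.4 workaround to one source file.
# 1) Prefix "!" to every line of an INTERFACE ... END INTERFACE block that
#    declares MPIX_Query_cuda_support (other INTERFACE blocks are untouched).
# 2) Turn "CUDA_AWARE_SUPPORT = MPIX_Query_cuda_support() == 1"
#    into "CUDA_AWARE_SUPPORT = .TRUE.".
patch_mpix_query() {
    local f tmp
    f=$1
    [ -f "$f" ] || { echo "no such file: $f" >&2; return 1; }
    tmp=$(mktemp) || return 1
    awk '
        toupper($0) ~ /^[ \t]*INTERFACE[ \t]*$/ { inblk = 1; n = 0 }
        inblk {
            buf[++n] = $0
            if (toupper($0) ~ /^[ \t]*END[ \t]*INTERFACE/) {
                hit = 0
                for (i = 1; i <= n; i++)
                    if (buf[i] ~ /MPIX_Query_cuda_support/) hit = 1
                for (i = 1; i <= n; i++)
                    printf "%s%s\n", (hit ? "!" : ""), buf[i]
                inblk = 0
            }
            next
        }
        /CUDA_AWARE_SUPPORT[ \t]*=[ \t]*MPIX_Query_cuda_support/ {
            sub(/=.*/, "= .TRUE.")
        }
        { print }
        # Flush an unterminated block unchanged
        END { if (inblk) for (i = 1; i <= n; i++) print buf[i] }
    ' "$f" > "$tmp" && mv "$tmp" "$f"
}

# Hypothetical usage from the VASP root directory:
#   for f in build/*/openacc.F build/*/nvcuda.F; do patch_mpix_query "$f"; done
```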

2.5 Testing

SCF of monolayer graphene

INCAR:

SYSTEM = graphene
ISTART = 0; ICHARG = 2
ENCUT = 520
ISIF = 3
ISMEAR = -5 ; SIGMA = 0.05
ALGO = Fast
# NPAR = 3
#########
EDIFF = 1E-7
PREC = Accurate
EDIFFG = -0.01
#########
#ISPIN = 2
#MAGMOM =
LCHARG = .TRUE.
LWAVE = .TRUE.
LORBIT = 11
LREAL = .FALSE.
#########
SYMPREC = 1E-4
ISYM = 1
NELM = 200
#########
NSW = 0
POTIM = 0.5
IBRION = -1
#########VDW=DFT-D2
#LVDW = .TRUE.
#IVDW = 1

KPOINTS:

K-POINTS
  0
Gamma-Centered
  25 25 1
  0 0 0

POSCAR: note that the lattice matrix contains a little floating-point noise; for testing only.

graphene
1.00000000000000
   2.4677557588200547 0.0000000001951262 -0.0000000000000000
   -1.2338785942720587 2.1371404153443971 -0.0000000000000000
   0.0000000000000000 0.0000000000000000 14.9975103391044442
C
   2
Direct
0.3333328829999971  0.6666671669999999  0.2000000060000033
0.6666671540000024  0.3333328579999986  0.2000000060000033

Run:

$ export OMP_NUM_THREADS=16 # number of CPU cores (OpenMP threads per MPI rank)
$ mpirun -np 1 --cpu-bind depth -d $OMP_NUM_THREADS vasp_std | tee vasp_run.out
  running 1 mpi-ranks, with 16 threads/rank, on 1 nodes
  distrk:  each k-point on 1 cores, 1 groups
  distr:  one band on 1 cores, 1 groups
  Offloading initialized ... 1 GPUs detected
  vasp.6.5.0 16Dec24 (build ?? 2025 ??) complex
  POSCAR found type information on POSCAR C
  POSCAR found :  1 types and    2 ions
  Reading from existing POTCAR
  scaLAPACK will be used selectively (only on CPU)
  Reading from existing POTCAR
  LDA part: xc-table for (Slater+PW92), standard interpolation
  POSCAR, INCAR and KPOINTS ok, starting setup
  FFT: planning ... GRIDC
  FFT: planning ... GRID_SOFT
  FFT: planning ... GRID
  WAVECAR not read
  entering main loop

$ head OUTCAR
  vasp.6.5.0 16Dec24 (build ??) complex
  executed on          LinuxNVGPU date 2025 ??
  running 1 mpi-ranks, with 16 threads/rank, on 1 nodes
  distrk:  each k-point on 1 cores, 1 groups
  distr:  one band on NCORE= 1 cores, 1 groups
  Offloading initialized ... 1 GPUs detected

$ tail -14 OUTCAR
  General timing and accounting informations for this job:
  ========================================================

               Total CPU time used (sec):    26.150
                           User time (sec):    25.026
                        System time (sec):       1.125
                        Elapsed time (sec):    25.703

                  Maximum memory used (kb):    1377240.
                  Average memory used (kb):       N/A

                        Minor page faults:    263714
                        Major page faults:          0
               Voluntary context switches:       18839

The GPU is detected and the SCF calculation runs on it successfully.
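To check this from the output files rather than by eye, a sketch that looks for the "Offloading initialized" line and prints the elapsed wall time. Both strings are taken from the OUTCAR excerpts above; check_gpu_outcar is a hypothetical name.

```shell
# Sketch: fail if OUTCAR does not report GPU offloading, otherwise print
# the "Elapsed time (sec):" value as elapsed_s=<seconds>.
check_gpu_outcar() {
    local outcar
    outcar=${1:-OUTCAR}
    [ -f "$outcar" ] || { echo "no such file: $outcar" >&2; return 1; }
    if ! grep -q 'Offloading initialized' "$outcar"; then
        echo "GPU offloading not reported in $outcar" >&2
        return 1
    fi
    awk '/Elapsed time \(sec\):/ { print "elapsed_s=" $4 }' "$outcar"
}

# Hypothetical usage:
#   check_gpu_outcar OUTCAR
```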

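3. Closing remarks

Note: renaming the Fortran/C/C++ compilers to ftn, cc and CC as above applies only to HPE Cray supercomputer environments.

To add a module environment, follow my previous post, "VASP6.5.0 + Intel CPU build with a module environment"; just make sure the required dependencies are loaded.

That's all for this post. When I have time, I will share my experience building VASP 6.5.0 + Intel CPU + NVIDIA A40 GPU on our own cluster.

Please credit the source when reposting.

Comments and discussion are welcome.

PS: please don't PM me or reply asking for the VASP source code; I won't respond to such requests. Remember that VASP is commercial software. Thanks :)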

Author: abin  Time: 2025-3-25 14:18

OFLAG = -fast


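This isn't benchmarking for high scores.
In a production environment, there's no need for this kind of optimization flag, is there?

-fast by default includes -fp-model fast=2.
fast=2 may produce faster and less accurate results; roughly, it optimizes floating-point arithmetic and allows deviations from strict IEEE semantics.

In my humble opinion, -fast should not appear in builds for scientific computing.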

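Author: Tosykie  Time: 2025-3-30 03:19

abin posted on 2025-3-25 14:18
This isn't benchmarking for high scores.
In a production environment, there's no need for this kind of optimization flag, is there?

You're right. However, OFLAG = -fast is the default OFLAG that the VASP source uses for NVHPC GPU builds. If you don't think it's appropriate, you can use -O1 or -O2 instead. Thanks ;)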
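For anyone who prefers the more conservative level, a sketch that rewrites only the OFLAG line of makefile.include (set_oflag is a hypothetical name). Objects that were already compiled keep their old flags, so a clean rebuild is needed; I believe VASP's top-level makefile provides a veryclean target for this.

```shell
# Sketch: replace "OFLAG = <anything>" with "OFLAG = <level>" (default -O2).
# OFLAG_IN and commented lines are left alone.
set_oflag() {
    local f level
    f=${1:-makefile.include}
    level=${2:--O2}
    [ -f "$f" ] || { echo "no such file: $f" >&2; return 1; }
    sed -E -i "s/^(OFLAG[[:space:]]*=).*/\1 $level/" "$f"
}

# Hypothetical usage:
#   set_oflag makefile.include -O2 && make veryclean && make DEPS=1 -j4 all
```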

Welcome to Computational Chemistry Commune (http://bbs.keinsci.com/)