Last edited by y597690 on 2025-5-31 16:38
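Our server has an AMD EPYC 7B12 CPU and two NVIDIA Tesla V100 16 GB GPUs. We had already relaxed an 8x8x2 fcc Cu substrate successfully on the CPU, so we wanted to test GPU performance on the same system.
However, when we tried to relax this substrate on the GPUs, we ran out of GPU memory. How can we reduce GPU memory usage without sacrificing accuracy?
VASP version: 6.5.1, compiled with the omp_acc (OpenACC + OpenMP) configuration.
The run command and the stdout error are shown below: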
$ mpirun -np 2 --bind-to core \
> -x OMP_NUM_THREADS=32 \
> -x OMP_PLACES=cores \
> -x OMP_PROC_BIND=close \
> --report-bindings \
> ~/vasp651gpu/vasp_gam
MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.]
MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./././././././././././././././././././././././././././././././././././././././././././././././././././././././././././././.]
running 2 mpi-ranks, with 32 threads/rank, on 1 nodes
distrk: each k-point on 2 cores, 1 groups
distr: one band on 1 cores, 2 groups
Offloading initialized ... 2 GPUs detected
vasp.6.5.1 10Mar25 (build May 27 2025 20:02:56) gamma-only
POSCAR found type information on POSCAR Cu
POSCAR found : 1 types and 512 ions
scaLAPACK will be used selectively (only on CPU)
LDA part: xc-table for (Slater+PW92), standard interpolation
POSCAR, INCAR and KPOINTS ok, starting setup
FFT: planning ... GRIDC
FFT: planning ... GRID_SOFT
FFT: planning ... GRID
WAVECAR not read
entering main loop
N E dE d eps ncg rms rms(c)
Out of memory allocating 311500800 bytes of device memory
Failing in Thread:1
total/free CUDA memory: 16928342016/109051904
Present table dump for device[1]: NVIDIA Tesla GPU 0, compute capability 7.0, threadid=1
Hint: specify 0x800 bit in NV_ACC_DEBUG for verbose info.
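Separately from the memory problem: the `--report-bindings` lines above show rank 0 bound to core 0 and rank 1 bound to core 1 only, so each rank's 32 OpenMP threads appear to share a single core. This affects CPU-side speed rather than GPU memory, but it is worth fixing before benchmarking. Below is a minimal sketch of a launch that gives each rank 32 cores, assuming Open MPI 4.x mapping syntax (`--map-by slot:PE=N`; check `mpirun --help` for your version). The `RUN` switch and the variable names are illustrative only; by default the script just prints the command.

```shell
#!/usr/bin/env bash
# Sketch: one MPI rank per V100, each rank owning 32 cores for its OpenMP threads.
# Assumes Open MPI 4.x syntax for --map-by slot:PE=N (verify on your installation).
# Set RUN=1 to actually launch; otherwise the command is only printed.
NRANKS=2                  # one rank per GPU
THREADS_PER_RANK=32       # 2 ranks x 32 threads = 64 cores on the EPYC 7B12
VASP_BIN=~/vasp651gpu/vasp_gam

cmd=(mpirun -np "$NRANKS"
     --map-by "slot:PE=${THREADS_PER_RANK}" --bind-to core
     -x OMP_NUM_THREADS="$THREADS_PER_RANK"
     -x OMP_PLACES=cores
     -x OMP_PROC_BIND=close
     --report-bindings
     "$VASP_BIN")

if [ "${RUN:-0}" = "1" ]; then
  "${cmd[@]}"                 # real launch
else
  printf '%s ' "${cmd[@]}"    # dry run: show the command line
  echo
fi
```

With the fix, `--report-bindings` should show a block of 32 `B` entries per rank instead of a single one.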
INCAR:
# === Global Parameters ===
ISTART = 0 # Start from scratch (no WAVECAR)
ICHARG = 2 # Charge density from atomic superposition
ISPIN = 1 # Non-spin-polarized
LREAL = Auto # Use real-space projection for speed on large systems
PREC = Accurate # Full precision for reliable forces
LWAVE = .TRUE. # Write WAVECAR (used in later single-point/molecule run)
LCHARG = .TRUE. # Write CHGCAR (for charge inspection or reuse)
ADDGRID = .TRUE. # Extra support grid for augmentation charges
LASPH = .TRUE. # Needed for non-spherical corrections with PAW
#NSIM = 1
LREAL = AUTO
# === Electronic Relaxation ===
ISMEAR = 1 # Methfessel-Paxton smearing, order 1 (good for metals)
SIGMA = 0.2 # Smearing width in eV
NELM = 150 # Max SCF steps (increased to avoid premature stop)
NELMIN = 6 # Min SCF steps
EDIFF = 1E-6 # Electronic convergence (tighter than the 1E-4 default)
#ALGO = Fast # Good balance of speed and robustness
#AMIX = 0.2 # Mixing amplitude (better for metallic systems)
#BMIX = 0.0001 # Mixing damping (prevents charge oscillation)
# === Ionic Relaxation ===
NSW = 100 # Max ionic steps
IBRION = 2 # Conjugate gradient (stable geometry optimization)
ISIF = 2 # Relax ions only
EDIFFG = -0.02 # Stop if all forces < 0.02 eV/Å
ISYM = 0 # Turn off symmetry (important for steps, surfaces)
# === (Optional tweaks) ===
ENCUT = 500 # Only if POTCAR recommends a higher cutoff
# NGXF/YF/ZF # Set only if you want to enforce FFT grid manually
#NCORE = 64
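`LREAL` is set twice in this INCAR (`Auto` on one line, `AUTO` a few lines later). Keeping a single copy of each tag avoids any ambiguity about which value is used. Here is a small sketch that lists repeated tags, assuming one `TAG = value` per line and `#` or `!` comments; the function name `find_dup_tags` is hypothetical, and `;`-separated tags on one line are not handled.

```shell
#!/usr/bin/env bash
# Sketch: print every INCAR tag that is set more than once.
# Assumes one "TAG = value" per line; "#" or "!" start a comment.
find_dup_tags() {
  awk '
    { sub(/[#!].*/, "") }          # strip comments
    /=/ {
      tag = toupper($0)            # VASP tags are case-insensitive
      sub(/=.*/, "", tag)          # keep the part before "="
      gsub(/[ \t]/, "", tag)       # trim whitespace
      if (tag != "" && ++seen[tag] == 2) print tag
    }
  ' "$1"
}

incar=${1:-INCAR}
if [ -f "$incar" ]; then
  dups=$(find_dup_tags "$incar")
  if [ -n "$dups" ]; then
    printf 'Duplicate INCAR tags:\n%s\n' "$dups"
  else
    echo "No duplicate tags in $incar"
  fi
fi
```

Run it as `bash check_incar.sh INCAR` (the script name is arbitrary). For the INCAR above it would report `LREAL`.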
OUTCAR:
total amount of memory used by VASP MPI-rank0 16517326. kBytes
=======================================================================
base : 30000. kBytes
nonlr-proj : 532224. kBytes
fftplans : 5690184. kBytes
grid : 2253312. kBytes
one-center : 3981. kBytes
wavefun : 8007625. kBytes
INWAV: cpu time 0.0000: real time 0.0000
Broyden mixing: mesh for mixing (old mesh)
NGX =115 NGY =115 NGZ = 83
(NGX =480 NGY =480 NGZ =336)
gives a total of ****** points
initial charge density was supplied:
charge density of overlapping atoms calculated
number of electron 5632.0000000 magnetization
keeping initial charge density in first step
--------------------------------------------------------------------------------------------------------
Maximum index for non-local projection operator 6566
Maximum index for augmentation-charges 47379 (set IRDMAX)
--------------------------------------------------------------------------------------------------------
First call to EWALD: gamma= 0.062
Maximum number of real-space cells 3x 3x 3
Maximum number of reciprocal cells 3x 3x 2
FEWALD: cpu time 6.3257: real time 6.3293
--------------------------------------- Ionic step 1 -------------------------------------------
--------------------------------------- Iteration 1( 1) ---------------------------------------
After this point, the run crashed with the out-of-GPU-memory error shown above.
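For scale, the OUTCAR memory breakdown can be compared with the numbers in the stdout error. Below is a rough sketch using bash integer arithmetic. It assumes VASP's kBytes are 1024 bytes and treats the host-side estimate for MPI rank 0 only as a proxy for what that rank places on its GPU, since the two need not match.

```shell
#!/usr/bin/env bash
# Sketch: rank-0 memory estimate from OUTCAR vs. V100 capacity from the OOM message.
# Assumption: 1 kByte = 1024 bytes; the host estimate is only a rough GPU proxy.

# Breakdown for MPI rank 0 from OUTCAR (kBytes)
base=30000; nonlr_proj=532224; fftplans=5690184
grid=2253312; one_center=3981; wavefun=8007625
reported_kb=16517326            # "total amount of memory used by VASP MPI-rank0"

sum_kb=$(( base + nonlr_proj + fftplans + grid + one_center + wavefun ))
est_bytes=$(( sum_kb * 1024 ))

# From the stdout error (bytes)
gpu_total=16928342016           # total CUDA memory per card
gpu_free=109051904              # free memory when the allocation failed
request=311500800               # size of the failed allocation

mib() { echo $(( $1 / 1024 / 1024 )); }   # bytes -> MiB, rounded down

echo "sum of parts    : $sum_kb kB (OUTCAR total: $reported_kb kB)"
echo "rank-0 estimate : $(mib "$est_bytes") MiB"
echo "V100 capacity   : $(mib "$gpu_total") MiB"
echo "free at failure : $(mib "$gpu_free") MiB, requested: $(mib "$request") MiB"
```

With these numbers, the parts add up exactly to the reported 16517326 kB, or about 16130 MiB. That is within roughly 14 MiB of the 16144 MiB the driver reports per card. When the run failed, only 104 MiB were free while 297 MiB were requested.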