|
|
最近利用超算中心进行gromacs protein-ligand的动力学模拟,显卡是[backcolor=rgba(0, 0, 0, 0.04)]NVIDIA® Tesla® V100,核数96,卡数8,736GB/节点。大概计算14万个原子的体系,预计用时24h,显卡利用率0%,只有内存2.8%利用率。请问各位前辈出现了啥情况:
#!/bin/bash
#SBATCH --job-name = md_out
#SBATCH --error = %j.err
module add GROMACS/2023-dev #加载软件
export GMX_GPU_DD_COMMS=true
export GMX_GPU_PME_PP_COMMS=true
export GMX_FORCE_UPDATE_DEFAULT_GPU=true
gmx_mpi mdrun -v -pin on -nb gpu -bonded gpu -pme gpu -update gpu -deffnm md
sbatch -N 1 -p g-v100-1 -n 10 -c 1 md.sh
Running on 1 node with total 48 cores, 96 logical cores, 8 compatible GPUs
Hardware detected on host g-v100-8-worker0001 (the node of MPI rank 0):
CPU info:
Vendor: Intel
Brand: Intel(R) Xeon(R) Platinum 8163 CPU @ 2.50GHz
Family: 6 Model: 85 Stepping: 4
Features: aes apic avx avx2 avx512f avx512cd avx512bw avx512vl avx512secondFMA clfsh cmov cx8 cx16 f16c fma hle htt intel lahf mmx msr nonstop_tsc pcid pclmuldq pdpe1gb popcnt pse rdrnd rdtscp rtm sse2 sse3 sse4.1 sse4.2 ssse3 x2apic
Number of AVX-512 FMA units: 2
...
...
When checking whether update groups are usable:
Domain decomposition is not active, so there is no need for update groups
At least one moleculetype does not conform to the requirements for using update groups
1 GPU selected for this run.
Mapping of GPU IDs to the 2 GPU tasks in the 1 rank on this node:
PP:0,PME:0
PP tasks will do (non-perturbed) short-ranged and most bonded interactions on the GPU
PP task will update and constrain coordinates on the GPU
PME tasks will do all aspects on the GPU
Using 1 MPI process
Using 48 OpenMP threads
Pinning threads with an auto-selected logical core stride of 2
System total charge: 0.000
Will do PME sum in reciprocal space for electrostatic interactions.
There are: 145256 Atoms
title = Protein-ligand complex MD simulation
; Run parameters
integrator = md ; leap-frog integrator
nsteps = 50000000 ; 2 * 25000000 = 50000 ps (50 ns)
dt = 0.002 ; 2 fs
; Output control
nstenergy = 5000 ; save energies every 10.0 ps
nstlog = 5000 ; update log file every 10.0 ps
nstxout-compressed = 5000 ; save coordinates every 10.0 ps
; Bond parameters
continuation = yes ; continuing from NPT
constraint_algorithm = lincs ; holonomic constraints
constraints = h-bonds ; bonds to H are constrained
lincs_iter = 1 ; accuracy of LINCS
lincs_order = 4 ; also related to accuracy
; Neighbor searching and vdW
cutoff-scheme = Verlet
ns_type = grid ; search neighboring grid cells
nstlist = 20 ; largely irrelevant with Verlet
rlist = 1.2
vdwtype = cutoff
vdw-modifier = force-switch
rvdw-switch = 1.0
rvdw = 1.2 ; short-range van der Waals cutoff (in nm)
; Electrostatics
coulombtype = PME ; Particle Mesh Ewald for long-range electrostatics
rcoulomb = 1.2
pme_order = 4 ; cubic interpolation
fourierspacing = 0.16 ; grid spacing for FFT
; Temperature coupling
tcoupl = V-rescale ; modified Berendsen thermostat
tc-grps = Protein_2xt Water_NA ; two coupling groups - more accurate
tau_t = 0.1 0.1 ; time constant, in ps
ref_t = 300 300 ; reference temperature, one for each group, in K
; Pressure coupling
pcoupl = Parrinello-Rahman ; pressure coupling is on for NPT
pcoupltype = isotropic ; uniform scaling of box vectors
tau_p = 2.0 ; time constant, in ps
ref_p = 1.0 ; reference pressure, in bar
compressibility = 4.5e-5 ; isothermal compressibility of water, bar^-1
; Periodic boundary conditions
pbc = xyz ; 3-D PBC
; Dispersion correction is not used for proteins with the C36 additive FF
DispCorr = no
; Velocity generation
gen_vel = no ; continuing from NPT equilibration
还望各位前辈帮忙指点指点。(PS:同样的体系,在自己电脑上采用1660 super几乎满gpu利用率,用时27h左右)
|
|