Help: low power draw and low performance with GROMACS 2023 on a 4090 GPU

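Zbin · Posted on 2023-7-11 11:13:13

Last edited by Zbin on 2023-7-11 11:12

I am running MD simulations with GROMACS on a 4090 GPU (CPU: AMD EPYC 7763) and found that the speed is the same as on a 3090 (same system and parameter settings), with no significant improvement. Below is the command used to run the MD simulation.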

gmx mdrun -v -deffnm md -ntmpi 1 -ntomp 8 -gpu_id 0 -pin on -pinoffset 0 -update gpu


Below is the GROMACS build information:

GROMACS version: 2023
Precision: mixed
Memory model: 64 bit
MPI library: thread_mpi
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 128)
GPU support: CUDA
NB cluster size: 8
SIMD instructions: AVX2_256
CPU FFT library: fftw-3.3.8-sse2-avx-avx2-avx2_128
GPU FFT library: cuFFT
Multi-GPU FFT: none
RDTSCP usage: enabled
TNG support: enabled
Hwloc support: disabled
Tracing support: disabled
C compiler: /usr/bin/cc GNU 11.3.0
C compiler flags: -fexcess-precision=fast -funroll-all-loops -mavx2 -mfma -Wno-missing-field-initializers -O3 -DNDEBUG
C++ compiler: /usr/bin/c++ GNU 11.3.0
C++ compiler flags: -fexcess-precision=fast -funroll-all-loops -mavx2 -mfma -Wno-missing-field-initializers -Wno-cast-function-type-strict -fopenmp -O3 -DNDEBUG
BLAS library:
LAPACK library:
CUDA compiler: /usr/local/cuda/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2022 NVIDIA Corporation;Built on Mon_Oct_24_19:12:58_PDT_2022;Cuda compilation tools, release 12.0, V12.0.76;Build cuda_12.0.r12.0/compiler.31968024_0
CUDA compiler flags:-std=c++17;--generate-code=arch=compute_50,code=sm_50;--generate-code=arch=compute_52,code=sm_52;--generate-code=arch=compute_60,code=sm_60;--generate-code=arch=compute_61,code=sm_61;--generate-code=arch=compute_70,code=sm_70;--generate-code=arch=compute_75,code=sm_75;--generate-code=arch=compute_80,code=sm_80;--generate-code=arch=compute_86,code=sm_86;--generate-code=arch=compute_89,code=sm_89;--generate-code=arch=compute_90,code=sm_90;-Wno-deprecated-gpu-targets;--generate-code=arch=compute_53,code=sm_53;--generate-code=arch=compute_80,code=sm_80;-use_fast_math;-Xptxas;-warn-double-usage;-Xptxas;-Werror;-D_FORCE_INLINES;-fexcess-precision=fast -funroll-all-loops -mavx2 -mfma -Wno-missing-field-initializers -Wno-cast-function-type-strict -fopenmp -O3 -DNDEBUG
CUDA driver: 12.0
CUDA runtime: 12.20


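GPU utilization during the simulation: [screenshot not included]

With other software I can push the 4090's power draw to 450 W (almost fully loaded).
I do not yet know the cause of this behavior. Does anyone have a solution?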

dzdhp · Posted on 2023-7-11 11:39:06

http://bbs.keinsci.com/forum.php ... &highlight=4090
Have a look at this user's benchmarks.

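Zbin · Posted on 2023-7-11 11:44:20

dzdhp posted on 2023-7-11 11:39
http://bbs.keinsci.com/forum.php?mod=viewthread&tid=33296&highlight=4090
Have a look at this user's benchmarks.

We have been following that thread for a long time, but it does not solve the problem we are seeing.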

牧生 · Posted on 2023-7-11 12:56:07

Last edited by 牧生 on 2023-7-11 12:58

http://bbs.keinsci.com/thread-37587-1-1.html

Read from the 7th post onward and see whether your case is the same.

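Entropy.S.I · Posted on 2023-7-11 13:35:45

With an EPYC 7763 as the CPU there is not much that can be done; with this CPU, just treat the 4090 as a 4080. Who told you to buy a 7763, whose single-core performance is so poor? If you read my article and still bought this CPU, all I can do is respect the choice and wish you luck.

First use nvidia-smi -q to check what speed the PCIe link is running at, then try -bonded gpu. Normally, with -bonded gpu the 4090 should be at least 90% faster than the 3090. Of course, with this CPU you cannot expect too much; even with -bonded gpu I would estimate only 70~80%.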
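The two suggestions above can be sketched as a short shell snippet. This is only an illustration under stated assumptions: the function names check_pcie_link and build_mdrun_cmd are hypothetical helpers, GPU index 0 and the file prefix md are taken from the original command, and the only change to that command is the added -bonded gpu flag. The query fields used are the condensed equivalent of the PCIe section that nvidia-smi -q prints.

```shell
#!/usr/bin/env bash
# Sketch only: the helper names below are hypothetical and are not part of
# GROMACS or the NVIDIA tools.

# Print the current and maximum PCIe generation and link width for GPU 0.
# Run this while a simulation is active, since the link may report a lower
# generation when the GPU is idle.
check_pcie_link() {
    nvidia-smi --id=0 \
        --query-gpu=name,pcie.link.gen.current,pcie.link.gen.max,pcie.link.width.current,pcie.link.width.max \
        --format=csv
}

# Build the mdrun command from the original post, with bonded interactions
# additionally offloaded to the GPU. The command is printed, not executed.
build_mdrun_cmd() {
    local deffnm="${1:-md}"   # file prefix passed to -deffnm
    local ntomp="${2:-8}"     # number of OpenMP threads
    echo "gmx mdrun -v -deffnm ${deffnm} -ntmpi 1 -ntomp ${ntomp} -gpu_id 0 -pin on -pinoffset 0 -update gpu -bonded gpu"
}

# check_pcie_link        # uncomment on the GPU node
build_mdrun_cmd md 8
```

If the reported current link generation or width stays well below the maximum under load, the slot or riser is worth checking before tuning mdrun flags any further.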

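Zbin · Posted on 2023-7-12 10:56:33

Entropy.S.I posted on 2023-7-11 13:35
With an EPYC 7763 as the CPU there is not much that can be done; with this CPU, just treat the 4090 as a 4080. Who told you to buy a 7763, whose single-core performance is so poor? ...

Thanks. After adding -bonded gpu, performance is roughly on par with a 5950/13900 + 4090, and the speed meets our needs. Also, this configuration was chosen mainly to accommodate multi-GPU PyTorch work as well; the machine is not used exclusively for MD.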

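youknowdcf · Posted on 2025-1-3 14:36:45

Entropy.S.I posted on 2023-7-11 13:35
With an EPYC 7763 as the CPU there is not much that can be done; with this CPU, just treat the 4090 as a 4080. Who told you to buy a 7763, whose single-core performance is so poor? ...

May I ask which GPU-accelerated GROMACS version is currently the better choice for production simulations? I am still using 2019.3, which feels somewhat outdated.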

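Entropy.S.I · Posted on 2025-1-3 15:29:11

youknowdcf posted on 2025-1-3 14:36
May I ask which GPU-accelerated GROMACS version is currently the better choice for production simulations? I am still using 2019.3, which feels somewhat outdated ...

2023.5 or 2024.4, depending on whether you need the features introduced in 2024.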
