计算化学公社

Title: Help: GROMACS 2023 shows low power draw and low performance on an RTX 4090

Author: Zbin  Time: 2023-7-11 11:13
Last edited by Zbin on 2023-7-11 11:12

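I am running MD simulations with GROMACS on an RTX 4090 (CPU: AMD EPYC 7763) and find that it runs at the same speed as a 3090 (same system and same parameter settings), with no significant speedup. The command I use to run the MD simulation is: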

gmx mdrun -v -deffnm md -ntmpi 1 -ntomp 8 -gpu_id 0 -pin on -pinoffset 0 -update gpu


The GROMACS build information is as follows:

GROMACS version: 2023
Precision: mixed
Memory model: 64 bit
MPI library: thread_mpi
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 128)
GPU support: CUDA
NB cluster size: 8
SIMD instructions: AVX2_256
CPU FFT library: fftw-3.3.8-sse2-avx-avx2-avx2_128
GPU FFT library: cuFFT
Multi-GPU FFT: none
RDTSCP usage: enabled
TNG support: enabled
Hwloc support: disabled
Tracing support: disabled
C compiler: /usr/bin/cc GNU 11.3.0
C compiler flags: -fexcess-precision=fast -funroll-all-loops -mavx2 -mfma -Wno-missing-field-initializers -O3 -DNDEBUG
C++ compiler: /usr/bin/c++ GNU 11.3.0
C++ compiler flags: -fexcess-precision=fast -funroll-all-loops -mavx2 -mfma -Wno-missing-field-initializers -Wno-cast-function-type-strict -fopenmp -O3 -DNDEBUG
BLAS library:
LAPACK library:
CUDA compiler: /usr/local/cuda/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2022 NVIDIA Corporation;Built on Mon_Oct_24_19:12:58_PDT_2022;Cuda compilation tools, release 12.0, V12.0.76;Build cuda_12.0.r12.0/compiler.31968024_0
CUDA compiler flags:-std=c++17;--generate-code=arch=compute_50,code=sm_50;--generate-code=arch=compute_52,code=sm_52;--generate-code=arch=compute_60,code=sm_60;--generate-code=arch=compute_61,code=sm_61;--generate-code=arch=compute_70,code=sm_70;--generate-code=arch=compute_75,code=sm_75;--generate-code=arch=compute_80,code=sm_80;--generate-code=arch=compute_86,code=sm_86;--generate-code=arch=compute_89,code=sm_89;--generate-code=arch=compute_90,code=sm_90;-Wno-deprecated-gpu-targets;--generate-code=arch=compute_53,code=sm_53;--generate-code=arch=compute_80,code=sm_80;-use_fast_math;-Xptxas;-warn-double-usage;-Xptxas;-Werror;-D_FORCE_INLINES;-fexcess-precision=fast -funroll-all-loops -mavx2 -mfma -Wno-missing-field-initializers -Wno-cast-function-type-strict -fopenmp -O3 -DNDEBUG
CUDA driver: 12.0
CUDA runtime: 12.20


GPU utilization during the simulation:
[attached screenshot]

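With other software I can push the 4090 to a power draw of 450 W (nearly the full power limit).
I do not yet know what causes this behavior. Does anyone have a solution?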
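
To put numbers on the low power draw described above, one option is to sample the GPU with nvidia-smi while mdrun is running. The following is only a minimal sketch: the query properties are standard nvidia-smi fields, but the helper name gpu_log_cmd, the 1-second interval, and the GPU index 0 are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Standard nvidia-smi --query-gpu properties: power, utilization, SM clock.
GPU_FIELDS="timestamp,power.draw,power.limit,utilization.gpu,clocks.sm"

# gpu_log_cmd is a hypothetical helper: it prints the sampling command for a
# given GPU index (default 0) and sampling interval in seconds (default 1).
gpu_log_cmd() {
    local gpu_id="${1:-0}"
    local interval="${2:-1}"
    echo "nvidia-smi --id=${gpu_id} --query-gpu=${GPU_FIELDS} --format=csv -l ${interval}"
}

# Show the command that would be used to sample GPU 0 once per second.
gpu_log_cmd 0 1
```

To record a log during a run, the printed command could be started in the background with its output redirected to a file (for example gpu_log.csv, a hypothetical name) before launching gmx mdrun, and stopped once the run finishes.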

Author: dzdhp  Time: 2023-7-11 11:39
http://bbs.keinsci.com/forum.php ... &highlight=4090
Take a look at this member's benchmarks.

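Author: Zbin  Time: 2023-7-11 11:44

dzdhp posted on 2023-7-11 11:39
http://bbs.keinsci.com/forum.php?mod=viewthread&tid=33296&highlight=4090
Take a look at this member's benchmarks.

We have been following that thread for a long time, but it does not solve the problem we are seeing.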

Author: 牧生  Time: 2023-7-11 12:56
Last edited by 牧生 on 2023-7-11 12:58

http://bbs.keinsci.com/thread-37587-1-1.html

Read from post #7 onward and see whether your situation is the same.

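Author: Entropy.S.I  Time: 2023-7-11 13:35
With an EPYC 7763 as the CPU there is not much that can be done; on this CPU, just treat the 4090 as a 4080. Nobody made you buy a 7763 with its dismal single-core performance. If you read my article and still bought this CPU, all I can do is respect the choice and wish you luck.

First run nvidia-smi -q to check what speed the PCIe link is running at, then try -bonded gpu. Normally, with -bonded gpu the 4090 should be at least 90% faster than the 3090. Of course, you cannot expect too much from this CPU: even with -bonded gpu I would estimate a gain of only 70~80%.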
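
The two steps suggested above can be sketched as a small dry-run script. This is an illustration under stated assumptions, not a tuned setup: the mdrun line is the command from the original post with -bonded gpu appended, the PCIe properties are standard nvidia-smi query fields (a compact alternative to reading the full nvidia-smi -q output), and the names DRY_RUN, run, check_pcie, and mdrun_bonded_gpu are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Sketch only: by default the commands are printed, not executed.
# Set DRY_RUN=0 to actually run them.
DRY_RUN="${DRY_RUN:-1}"

# run is a hypothetical helper: print the command, execute it only if DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Step 1: compare the current PCIe link generation and width with the maximum.
check_pcie() {
    run nvidia-smi --id=0 --query-gpu=name,pcie.link.gen.current,pcie.link.gen.max,pcie.link.width.current,pcie.link.width.max --format=csv
}

# Step 2: the original mdrun command with bonded forces offloaded to the GPU.
mdrun_bonded_gpu() {
    run gmx mdrun -v -deffnm md -ntmpi 1 -ntomp 8 -gpu_id 0 -pin on -pinoffset 0 -update gpu -bonded gpu
}

check_pcie
mdrun_bonded_gpu
```

The link generation reported for an idle GPU may be lower than the maximum because of power management, so the PCIe check is more informative while a simulation is running.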

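Author: Zbin  Time: 2023-7-12 10:56

Entropy.S.I posted on 2023-7-11 13:35
With an EPYC 7763 as the CPU there is not much that can be done; on this CPU, just treat the 4090 as a 4080. Nobody made you buy a 7763 with its dismal single-core performance ...

Thanks. After adding -bonded gpu, performance is roughly on par with a 5950/13900 + 4090, and the speed meets our needs. Also, this configuration was chosen mainly to accommodate multi-GPU PyTorch work as well; it is not used exclusively for MD.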

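Author: youknowdcf  Time: 2025-1-3 14:36

Entropy.S.I posted on 2023-7-11 13:35
With an EPYC 7763 as the CPU there is not much that can be done; on this CPU, just treat the 4090 as a 4080. Nobody made you buy a 7763 with its dismal single-core performance ...

May I ask which GPU-accelerated GROMACS version is currently the better choice for production simulations? I am still using 2019.3, which feels somewhat outdated.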

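Author: Entropy.S.I  Time: 2025-1-3 15:29

youknowdcf posted on 2025-1-3 14:36
May I ask which GPU-accelerated GROMACS version is currently the better choice for production simulations? I am still using 2019.3, which feels somewhat outdated ...

2023.5 or 2024.4, depending on whether you need the features added in 2024.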

Welcome to 计算化学公社 (http://bbs.keinsci.com/)
