GROMACS (2025.2, GPU build) parallel efficiency test results

stishovite · Posted on 2025-7-15 17:44:03

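Following the post "GROMACS (2019.3 GPU build) parallel efficiency testing and tuning approach" in the Molecular Modeling board of 计算化学公社,

I tested the parallel efficiency of GROMACS 2025.2 on an AMD Ryzen 9 7950X with 64 GB of RAM and an NVIDIA GeForce RTX 5080.

The conclusion first: of the configurations tested, the thread-MPI build of GROMACS with -ntmpi 1 -ntomp 14 was the fastest, reaching 539.78 ns/day. The test output follows.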

==================================================================
>> System environment check
==================================================================
CPU Model: AMD Ryzen 9 7950X 16-Core Processor (16 Cores)
GPU Model: NVIDIA GeForce RTX 5080
GPU Memory: 16303 MiB
NVIDIA Driver: 575.64.03
GROMACS (thread-MPI): /usr/local/gromacs/bin/gmx
GROMACS (Open MPI): /usr/local/gromacs/bin/gmx_mpi
==================================================================
>> GROMACS benchmark (v16.0)
==================================================================
Test system: topol.tpr
Test parameters:
- OMP thread counts: 2 4 6 8 10 12 14
- Test gmx_mpi: true
All files will be generated in: /home/kang/Desktop/MD/gmxbench/gmx_benchmark_run_20250715_171115/
==================================================================
>> Step 1: Generating benchmark job scripts...
==================================================================
Job scripts generated.
==================================================================
>> Step 2: Automatically running all benchmark jobs...
==================================================================
==> Executing gmx_threadmpi-pme_gpu-omp_2...
...Success.
==> Executing gmx_threadmpi-pme_gpu-omp_4...
...Success.
==> Executing gmx_threadmpi-pme_gpu-omp_6...
...Success.
==> Executing gmx_threadmpi-pme_gpu-omp_8...
...Success.
==> Executing gmx_threadmpi-pme_gpu-omp_10...
...Success.
==> Executing gmx_threadmpi-pme_gpu-omp_12...
...Success.
==> Executing gmx_threadmpi-pme_gpu-omp_14...
...Success.
==> Executing gmx_openmpi-pme_gpu-omp_2...
...Success.
==> Executing gmx_openmpi-pme_gpu-omp_4...
...Success.
==> Executing gmx_openmpi-pme_gpu-omp_6...
...Success.
==> Executing gmx_openmpi-pme_gpu-omp_8...
...Success.
==> Executing gmx_openmpi-pme_gpu-omp_10...
...Success.
==> Executing gmx_openmpi-pme_gpu-omp_12...
...Success.
==> Executing gmx_openmpi-pme_gpu-omp_14...
...Success.
All benchmark jobs finished.
==================================================================
>> Step 3: Analyzing results and generating report...
==================================================================
--- Benchmark result analysis ---
Configuration | Performance (ns/day) | GPU Wait (%) | Bottleneck analysis
-------------------------------------------------------------------------------------------------------
gmx_threadmpi-pme_gpu-omp_14 | 539.78 | 1.1 | Optimal (GPU is the bottleneck)
gmx_threadmpi-pme_gpu-omp_12 | 508.29 | 1.1 | Optimal (GPU is the bottleneck)
gmx_threadmpi-pme_gpu-omp_10 | 465.31 | 1.1 | Optimal (GPU is the bottleneck)
gmx_threadmpi-pme_gpu-omp_8 | 432.93 | 1.1 | Optimal (GPU is the bottleneck)
gmx_threadmpi-pme_gpu-omp_6 | 360.62 | 1.1 | Optimal (GPU is the bottleneck)
gmx_openmpi-pme_gpu-omp_8 | 292.19 | 1.0 | Optimal (GPU is the bottleneck)
gmx_openmpi-pme_gpu-omp_12 | 269.03 | 1.0 | Optimal (GPU is the bottleneck)
gmx_threadmpi-pme_gpu-omp_4 | 267.41 | 1.0 | Optimal (GPU is the bottleneck)
gmx_openmpi-pme_gpu-omp_10 | 266.81 | 1.0 | Optimal (GPU is the bottleneck)
gmx_openmpi-pme_gpu-omp_14 | 259.61 | 1.0 | Optimal (GPU is the bottleneck)
gmx_openmpi-pme_gpu-omp_6 | 257.86 | 1.0 | Optimal (GPU is the bottleneck)
gmx_openmpi-pme_gpu-omp_4 | 220.63 | 1.0 | Optimal (GPU is the bottleneck)
gmx_threadmpi-pme_gpu-omp_2 | 150.55 | 0.9 | Optimal (GPU is the bottleneck)
gmx_openmpi-pme_gpu-omp_2 | 136.03 | 0.7 | Optimal (GPU is the bottleneck)
--- Benchmark complete! Best configuration ---
Configuration: gmx_threadmpi-pme_gpu-omp_14
Performance: 539.779 ns/day
--- Recommended production run command ---
gmx mdrun -deffnm YOUR_TPR_NAME -ntmpi 1 -ntomp 14 -nb gpu -pme gpu -pin on -dlb no
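
Since the thread is about parallel efficiency, the table above can also be read as speedup and efficiency per thread count. The following is only a rough sketch: the ns/day values are copied from the results table, while the choice of the 2-thread run as the baseline and the function name `scaling_table` are assumptions made for illustration, not part of the original benchmark script.

```python
# Throughput in ns/day, copied from the results table above,
# keyed by build and by the OMP thread count of each run.
RESULTS = {
    "gmx_threadmpi": {2: 150.55, 4: 267.41, 6: 360.62, 8: 432.93,
                      10: 465.31, 12: 508.29, 14: 539.78},
    "gmx_openmpi": {2: 136.03, 4: 220.63, 6: 257.86, 8: 292.19,
                    10: 266.81, 12: 269.03, 14: 259.61},
}


def scaling_table(perf_by_threads):
    """Return (threads, ns_per_day, speedup, efficiency) rows.

    Assumption: the run with the fewest threads is the baseline, so
    speedup = perf / baseline_perf and
    efficiency = speedup / (threads / baseline_threads).
    """
    base_threads = min(perf_by_threads)
    base_perf = perf_by_threads[base_threads]
    rows = []
    for threads in sorted(perf_by_threads):
        perf = perf_by_threads[threads]
        speedup = perf / base_perf
        efficiency = speedup / (threads / base_threads)
        rows.append((threads, perf, speedup, efficiency))
    return rows


if __name__ == "__main__":
    for build, perf in RESULTS.items():
        print(build)
        for threads, ns_day, speedup, eff in scaling_table(perf):
            print(f"  -ntomp {threads:2d}: {ns_day:7.2f} ns/day  "
                  f"speedup {speedup:4.2f}x  efficiency {eff:5.1%}")
```

Read this way, the thread-MPI build keeps gaining throughput up to 14 threads but with falling per-thread efficiency, while the Open MPI build peaks at 8 threads in this run.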


wbqdssl · Posted on 2025-7-15 19:07:33

I have a 9950X and also found that -ntmpi 1 -ntomp 14 is the most efficient.

Entropy.S.I · Posted on 2025-7-15 19:21:35

With SMT enabled, -ntomp 31 is the fastest in the vast majority of cases.
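
On a 16-core CPU with SMT enabled there are 32 logical CPUs, so -ntomp 31 amounts to using all but one of them. A minimal sketch of that reading is below. The "logical CPUs minus one" rule and the names `suggest_ntomp` and `mdrun_command` are assumptions for illustration; the mdrun flags are the ones from the recommended command in the first post.

```python
import os


def suggest_ntomp(logical_cpus=None, reserve=1):
    """Suggest an -ntomp value: logical CPUs minus `reserve` left free.

    This is a hypothetical heuristic inferred from the "-ntomp 31"
    suggestion, not an official GROMACS rule.
    """
    if logical_cpus is None:
        # os.cpu_count() reports logical CPUs, so SMT siblings are included.
        logical_cpus = os.cpu_count() or 1
    return max(1, logical_cpus - reserve)


def mdrun_command(deffnm, ntomp):
    """Build an mdrun argument list with the flags used in the first post."""
    return ["gmx", "mdrun", "-deffnm", deffnm,
            "-ntmpi", "1", "-ntomp", str(ntomp),
            "-nb", "gpu", "-pme", "gpu", "-pin", "on", "-dlb", "no"]


if __name__ == "__main__":
    ntomp = suggest_ntomp()
    print(" ".join(mdrun_command("YOUR_TPR_NAME", ntomp)))
```

Whether this beats -ntomp 14 on a given system still has to be measured with a benchmark run.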

stishovite · Posted on 2025-7-15 22:14:41

Entropy.S.I posted on 2025-7-15 19:21
With SMT enabled, -ntomp 31 is the fastest in the vast majority of cases.

Thanks a lot, I'll give it a try.

GHL · Posted on 2026-1-26 11:24:23

Does GROMACS 2025.2 work with the 5070 Ti? Does it need to be compiled, and how can I get a precompiled build?

stishovite · Posted on 2026-1-29 15:08:48

GHL posted on 2026-1-26 11:24
Does GROMACS 2025.2 work with the 5070 Ti? Does it need to be compiled, and how can I get a precompiled build?

I haven't tested that.

GHL · Posted on 2026-1-30 19:31:39

stishovite posted on 2026-1-29 15:08
I haven't tested that.

Thanks.

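UW_0728. · Posted on 2026-1-30 20:03:06

GHL posted on 2026-1-26 11:24
Does GROMACS 2025.2 work with the 5070 Ti? Does it need to be compiled, and how can I get a precompiled build?

Yes, it works. However, GROMACS 2025.2 has some bugs in its CUDA acceleration setup, so 2025.3 or a newer version is recommended.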

exity · Posted on 2026-2-23 15:59:51

With 16 physical cores, why is 14 the fastest rather than 16?

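stishovite · Posted 7 days ago

exity posted on 2026-2-23 15:59
With 16 physical cores, why is 14 the fastest rather than 16?

An expert on the forum explained this before, but I don't really understand the reason myself. The test results do come out this way, though.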
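
The thread counts listed in the benchmark output stop at 14, so one way to look at this question directly is to run a short sweep around the top end. The sketch below only generates the commands. The run names, the thread counts chosen, and the step count are hypothetical; `-nsteps`, `-resethway`, and `-noconfout` are standard mdrun options for short benchmark runs, and the remaining flags match the recommended command in the first post.

```python
import shlex


def benchmark_commands(tpr="topol.tpr", thread_counts=(12, 13, 14, 15, 16),
                       nsteps=20000):
    """Return one mdrun command line per OMP thread count.

    nsteps is an illustrative value; pick one long enough for stable timings.
    """
    commands = []
    for n in thread_counts:
        run_name = f"bench_omp_{n}"  # hypothetical output prefix
        args = ["gmx", "mdrun", "-s", tpr, "-deffnm", run_name,
                "-ntmpi", "1", "-ntomp", str(n),
                "-nb", "gpu", "-pme", "gpu", "-pin", "on", "-dlb", "no",
                # Short run; reset timers halfway; skip the final structure.
                "-nsteps", str(nsteps), "-resethway", "-noconfout"]
        commands.append(shlex.join(args))
    return commands


if __name__ == "__main__":
    for line in benchmark_commands():
        print(line)
```

Each run's md.log ends with a "Performance:" line giving ns/day, which can then be compared across thread counts.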
