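Following the post "GROMACS (2019.3 GPU version) parallel efficiency testing and tuning approach" on the Molecular Modeling board of the Computational Chemistry Commune (计算化学公社),
I benchmarked the parallel efficiency of GROMACS 2025.2 on an AMD Ryzen 9 7950X with 64 GB of RAM and an NVIDIA GeForce RTX 5080.
The conclusion first: the thread-MPI build of GROMACS with -ntmpi 1 -ntomp 14 was the fastest, reaching 539.78 ns/day. The full test output follows.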
- ==================================================================
- >> System environment detection
- ==================================================================
- CPU Model: AMD Ryzen 9 7950X 16-Core Processor (16 Cores)
- GPU Model: NVIDIA GeForce RTX 5080
- GPU Memory: 16303 MiB
- NVIDIA Driver: 575.64.03
- GROMACS (thread-MPI): /usr/local/gromacs/bin/gmx
- GROMACS (Open MPI): /usr/local/gromacs/bin/gmx_mpi
- ==================================================================
- >> GROMACS benchmark (v16.0)
- ==================================================================
- Test system: topol.tpr
- Test parameters:
- - OMP thread counts: 2 4 6 8 10 12 14
- - Test gmx_mpi: true
- All files will be generated in: /home/kang/Desktop/MD/gmxbench/gmx_benchmark_run_20250715_171115/
- ==================================================================
- >> Step 1: Generating benchmark task scripts...
- ==================================================================
- Task scripts generated.
- ==================================================================
- >> Step 2: Running all benchmark tasks automatically...
- ==================================================================
- ==> Executing gmx_threadmpi-pme_gpu-omp_2...
- ...Success.
- ==> Executing gmx_threadmpi-pme_gpu-omp_4...
- ...Success.
- ==> Executing gmx_threadmpi-pme_gpu-omp_6...
- ...Success.
- ==> Executing gmx_threadmpi-pme_gpu-omp_8...
- ...Success.
- ==> Executing gmx_threadmpi-pme_gpu-omp_10...
- ...Success.
- ==> Executing gmx_threadmpi-pme_gpu-omp_12...
- ...Success.
- ==> Executing gmx_threadmpi-pme_gpu-omp_14...
- ...Success.
- ==> Executing gmx_openmpi-pme_gpu-omp_2...
- ...Success.
- ==> Executing gmx_openmpi-pme_gpu-omp_4...
- ...Success.
- ==> Executing gmx_openmpi-pme_gpu-omp_6...
- ...Success.
- ==> Executing gmx_openmpi-pme_gpu-omp_8...
- ...Success.
- ==> Executing gmx_openmpi-pme_gpu-omp_10...
- ...Success.
- ==> Executing gmx_openmpi-pme_gpu-omp_12...
- ...Success.
- ==> Executing gmx_openmpi-pme_gpu-omp_14...
- ...Success.
- All benchmark tasks finished.
- ==================================================================
- >> Step 3: Analyzing results and generating the report...
- ==================================================================
- --- Benchmark result analysis ---
- Configuration | Performance (ns/day) | GPU Wait (%) | Bottleneck analysis
- -------------------------------------------------------------------------------------------------------
- gmx_threadmpi-pme_gpu-omp_14 | 539.78 | 1.1 | Optimal (GPU is the bottleneck)
- gmx_threadmpi-pme_gpu-omp_12 | 508.29 | 1.1 | Optimal (GPU is the bottleneck)
- gmx_threadmpi-pme_gpu-omp_10 | 465.31 | 1.1 | Optimal (GPU is the bottleneck)
- gmx_threadmpi-pme_gpu-omp_8 | 432.93 | 1.1 | Optimal (GPU is the bottleneck)
- gmx_threadmpi-pme_gpu-omp_6 | 360.62 | 1.1 | Optimal (GPU is the bottleneck)
- gmx_openmpi-pme_gpu-omp_8 | 292.19 | 1.0 | Optimal (GPU is the bottleneck)
- gmx_openmpi-pme_gpu-omp_12 | 269.03 | 1.0 | Optimal (GPU is the bottleneck)
- gmx_threadmpi-pme_gpu-omp_4 | 267.41 | 1.0 | Optimal (GPU is the bottleneck)
- gmx_openmpi-pme_gpu-omp_10 | 266.81 | 1.0 | Optimal (GPU is the bottleneck)
- gmx_openmpi-pme_gpu-omp_14 | 259.61 | 1.0 | Optimal (GPU is the bottleneck)
- gmx_openmpi-pme_gpu-omp_6 | 257.86 | 1.0 | Optimal (GPU is the bottleneck)
- gmx_openmpi-pme_gpu-omp_4 | 220.63 | 1.0 | Optimal (GPU is the bottleneck)
- gmx_threadmpi-pme_gpu-omp_2 | 150.55 | 0.9 | Optimal (GPU is the bottleneck)
- gmx_openmpi-pme_gpu-omp_2 | 136.03 | 0.7 | Optimal (GPU is the bottleneck)
- --- Benchmark complete! Best configuration ---
- Configuration: gmx_threadmpi-pme_gpu-omp_14
- Performance: 539.779 ns/day
- --- Recommended production run command ---
- gmx mdrun -deffnm YOUR_TPR_NAME -ntmpi 1 -ntomp 14 -nb gpu -pme gpu -pin on -dlb no
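The thread-MPI part of the sweep above can be reproduced by hand with a small helper. This is only a sketch: it is not the benchmark script that produced the log (its source is not shown), the function name `build_mdrun_command` is hypothetical, and the flags are copied from the recommended command, with only `-ntomp` varied. The `topol` prefix assumes the input file is `topol.tpr`, as in the log.

```python
import shlex

# OMP thread counts taken from the "Test parameters" section of the log.
OMP_THREADS = [2, 4, 6, 8, 10, 12, 14]


def build_mdrun_command(deffnm, ntomp, gmx="gmx"):
    """Return the argument list for one thread-MPI run (hypothetical helper).

    The flags mirror the recommended production command in the log;
    only the number of OpenMP threads changes between runs.
    """
    if ntomp < 1:
        raise ValueError("ntomp must be a positive integer")
    return [
        gmx, "mdrun",
        "-deffnm", deffnm,
        "-ntmpi", "1",
        "-ntomp", str(ntomp),
        "-nb", "gpu",
        "-pme", "gpu",
        "-pin", "on",
        "-dlb", "no",
    ]


if __name__ == "__main__":
    # Print one command line per thread count; run them manually or
    # pass the lists to subprocess.run() in a real benchmark.
    for n in OMP_THREADS:
        print(shlex.join(build_mdrun_command("topol", n)))
```

In a real sweep each run would need its own working directory or output prefix so that results are not overwritten.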
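To make the scaling trend in the results table easier to see, the following sketch recomputes the best configuration and the speedup relative to the 2-thread run. The ns/day values are copied from the table; the function names are my own, and the benchmark script itself may do its analysis differently.

```python
# ns/day values copied from the results table: (build, OMP threads) -> ns/day.
RESULTS = {
    ("threadmpi", 2): 150.55,
    ("threadmpi", 4): 267.41,
    ("threadmpi", 6): 360.62,
    ("threadmpi", 8): 432.93,
    ("threadmpi", 10): 465.31,
    ("threadmpi", 12): 508.29,
    ("threadmpi", 14): 539.78,
    ("openmpi", 2): 136.03,
    ("openmpi", 4): 220.63,
    ("openmpi", 6): 257.86,
    ("openmpi", 8): 292.19,
    ("openmpi", 10): 266.81,
    ("openmpi", 12): 269.03,
    ("openmpi", 14): 259.61,
}


def best_config(results, build=None):
    """Return ((build, threads), ns_per_day) for the fastest run.

    If build is given, only runs of that build are considered.
    """
    candidates = {k: v for k, v in results.items() if build in (None, k[0])}
    return max(candidates.items(), key=lambda item: item[1])


def speedup_vs_baseline(results, build, baseline_threads=2):
    """Map thread count -> speedup relative to the baseline run of one build."""
    base = results[(build, baseline_threads)]
    return {
        threads: perf / base
        for (b, threads), perf in sorted(results.items())
        if b == build
    }


if __name__ == "__main__":
    for build in ("threadmpi", "openmpi"):
        (_, threads), perf = best_config(RESULTS, build)
        print(f"{build}: best at {threads} threads, {perf:.2f} ns/day")
        for n, s in speedup_vs_baseline(RESULTS, build).items():
            print(f"  {n:>2} threads: {s:.2f}x vs 2 threads")
```

With these numbers the thread-MPI build keeps gaining up to 14 threads (about 3.6x over its 2-thread run), while the Open MPI build peaks at 8 threads with 292.19 ns/day and falls off beyond that.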