计算化学公社 (Computational Chemistry Commune)

[GROMACS] GROMACS 2019.3 GPU acceleration performance comparison

Posted on 2019-7-17 11:02:49
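Hello everyone,

Our group recently added a server with a Platinum 8173M CPU and an RTX 2080 Ti GPU. I ran a simple speed comparison and am sharing it here for reference.
Test system: 17,000 atoms.
E5-2686 v4 + GTX 1080 Ti machine: 156 ns/day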
  GROMACS version:    2019.3
  Precision:          single
  Memory model:       64 bit
  MPI library:        thread_mpi
  OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 64)
  GPU support:        CUDA
  SIMD instructions:  AVX2_256
  FFT library:        fftw-3.3.8-sse2-avx-avx2-avx2_128
  RDTSCP usage:       enabled
  TNG support:        enabled
  Hwloc support:      disabled
  Tracing support:    disabled
  C compiler:         /usr/bin/cc GNU 4.8.5
  C compiler flags:    -mavx2 -mfma     -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
  C++ compiler:       /usr/bin/c++ GNU 4.8.5
  C++ compiler flags:  -mavx2 -mfma    -std=c++11   -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
  CUDA compiler:      /usr/local/cuda-9.1/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2017 NVIDIA Corporation;Built on Fri_Nov__3_21:07:56_CDT_2017;Cuda compilation tools, release 9.1, V9.1.85
  CUDA compiler flags:-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_70,code=compute_70;-use_fast_math;;; ;-mavx2;-mfma;-std=c++11;-O3;-DNDEBUG;-funroll-all-loops;-fexcess-precision=fast;
  CUDA driver:        9.10
  CUDA runtime:       9.10
Platinum 8173M + RTX 2080 Ti machine, built with GCC 5.5.0 and the AVX_512 instruction set: 154 ns/day
  GROMACS version:    2019.3
  Precision:          single
  Memory model:       64 bit
  MPI library:        thread_mpi
  OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 64)
  GPU support:        CUDA
  SIMD instructions:  AVX_512
  FFT library:        fftw-3.3.8-sse2-avx-avx2-avx2_128-avx512
  RDTSCP usage:       enabled
  TNG support:        enabled
  Hwloc support:      disabled
  Tracing support:    disabled
  C compiler:         /usr/local/bin/gcc GNU 5.5.0
  C compiler flags:    -mavx512f -mfma     -O2 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
  C++ compiler:       /usr/local/bin/g++ GNU 5.5.0
  C++ compiler flags:  -mavx512f -mfma    -std=c++11   -O2 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
  CUDA compiler:      /usr/local/cuda-10.1/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2019 NVIDIA Corporation;Built on Wed_Apr_24_19:10:27_PDT_2019;Cuda compilation tools, release 10.1, V10.1.168
  CUDA compiler flags:-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=compute_75;-use_fast_math;;; ;-mavx512f;-mfma;-std=c++11;-O2;-DNDEBUG;-funroll-all-loops;-fexcess-precision=fast;
  CUDA driver:        10.10
  CUDA runtime:       10.10
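The two build listings above differ in more than the hardware. As a rough sketch (not part of the original post), the fields can be parsed into dictionaries and compared. The helper names `parse_build_info` and `diff_builds` are made up for illustration, and the two strings are abbreviated excerpts of the listings above.

```python
def parse_build_info(text):
    """Parse 'Key: value' lines of GROMACS build info into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only; values such as the nvcc banner
        # contain further colons.
        key, sep, value = line.partition(":")
        if sep:
            # Collapse runs of spaces so padding does not count as a difference.
            info[key.strip()] = " ".join(value.split())
    return info


def diff_builds(a, b):
    """Return {key: (value_a, value_b)} for every key whose value differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Abbreviated excerpts of the two listings above.
old_build = """
GPU support:        CUDA
SIMD instructions:  AVX2_256
C compiler:         /usr/bin/cc GNU 4.8.5
C compiler flags:    -mavx2 -mfma     -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
CUDA runtime:       9.10
"""
new_build = """
GPU support:        CUDA
SIMD instructions:  AVX_512
C compiler:         /usr/local/bin/gcc GNU 5.5.0
C compiler flags:    -mavx512f -mfma     -O2 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
CUDA runtime:       10.10
"""

differences = diff_builds(parse_build_info(old_build), parse_build_info(new_build))
for key, (old, new) in differences.items():
    print(f"{key}: {old!r} -> {new!r}")
```

On these excerpts the differing fields are the SIMD level, the compiler, the optimization flag (-O3 vs -O2), and the CUDA runtime. Any of these might contribute to the measured speed, so the comparison is not purely one of hardware.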


Same machine, built with GCC 5.5.0 and the AVX2_256 instruction set: 152 ns/day
  GROMACS version:    2019.3
  Precision:          single
  Memory model:       64 bit
  MPI library:        thread_mpi
  OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 64)
  GPU support:        CUDA
  SIMD instructions:  AVX2_256
  FFT library:        fftw-3.3.8-sse2-avx-avx2-avx2_128-avx512
  RDTSCP usage:       enabled
  TNG support:        enabled
  Hwloc support:      disabled
  Tracing support:    disabled
  C compiler:         /usr/local/bin/gcc GNU 5.5.0
  C compiler flags:    -mavx2 -mfma     -O2 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
  C++ compiler:       /usr/local/bin/g++ GNU 5.5.0
  C++ compiler flags:  -mavx2 -mfma    -std=c++11   -O2 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
  CUDA compiler:      /usr/local/cuda-10.1/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2019 NVIDIA Corporation;Built on Wed_Apr_24_19:10:27_PDT_2019;Cuda compilation tools, release 10.1, V10.1.168
  CUDA compiler flags:-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=compute_75;-use_fast_math;;; ;-mavx2;-mfma;-std=c++11;-O2;-DNDEBUG;-funroll-all-loops;-fexcess-precision=fast;
  CUDA driver:        10.10
  CUDA runtime:       10.10
It feels like the new machine does not even match the old one...
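To put the three figures side by side, here is a minimal sketch that expresses each result relative to the old machine. The numbers are the ns/day values reported above; the labels and the helper `relative_change` are just illustrative names.

```python
def relative_change(value, baseline):
    """Percentage change of value relative to baseline."""
    return (value - baseline) / baseline * 100.0


# ns/day figures as reported in this post.
results = {
    "E5-2686 v4 + GTX 1080 Ti (AVX2_256)": 156.0,
    "Platinum 8173M + RTX 2080 Ti (AVX_512)": 154.0,
    "Platinum 8173M + RTX 2080 Ti (AVX2_256)": 152.0,
}

baseline = results["E5-2686 v4 + GTX 1080 Ti (AVX2_256)"]
for name, ns_day in results.items():
    change = relative_change(ns_day, baseline)
    print(f"{name}: {ns_day:.0f} ns/day ({change:+.1f}%)")
```

All three results lie within about 3% of each other. Since each figure comes from a single run, differences this small may well be within run-to-run noise.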

Posted on 2019-7-19 08:40:46
This result is quite unexpected.

Posted on 2019-8-8 08:30:02

Maybe the system is too small for the new machine's advantage to show?
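One way to look at the small-system idea is to convert ns/day into wall-clock time per MD step. The timestep is not stated in this thread, so the 2 fs used below is purely an assumption, and `ms_per_step` is an illustrative helper name.

```python
SECONDS_PER_DAY = 86400.0


def ms_per_step(ns_per_day, timestep_fs):
    """Wall-clock milliseconds spent per MD step at a given throughput."""
    steps_per_day = ns_per_day * 1e6 / timestep_fs  # 1 ns = 1e6 fs
    return SECONDS_PER_DAY * 1000.0 / steps_per_day


ASSUMED_TIMESTEP_FS = 2.0  # assumption: not given in the thread

for ns_day in (156.0, 154.0, 152.0):
    print(f"{ns_day:.0f} ns/day -> {ms_per_step(ns_day, ASSUMED_TIMESTEP_FS):.3f} ms/step")
```

Under that assumption each step takes roughly a millisecond. At that scale, fixed per-step costs such as kernel launches and CPU-GPU transfers could plausibly take a noticeable share of the time, which would fit the suggestion that a faster GPU has little room to help on a 17,000-atom system.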

Posted on 2019-8-10 10:41:18
Hello, could you attach the tpr file so that everyone can measure ns/day on their own platform configurations?
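If the tpr file is shared, results from different machines could be collected by reading the `Performance:` line that `gmx mdrun` writes near the end of its log. The following is a minimal sketch; `parse_performance` is an illustrative helper name, and the sample log text is made up for the example from the 156 ns/day figure reported above.

```python
import re

# Matches the summary line near the end of an mdrun log, e.g.
# "Performance:      156.000        0.154"
PERF_RE = re.compile(r"^Performance:\s+([0-9.]+)\s+([0-9.]+)\s*$", re.MULTILINE)


def parse_performance(log_text):
    """Return (ns_per_day, hours_per_ns) from log text, or None if absent."""
    match = PERF_RE.search(log_text)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


# Illustrative sample only, not an actual log from this thread.
sample_log = """\
                 (ns/day)    (hour/ns)
Performance:      156.000        0.154
"""

result = parse_performance(sample_log)
if result is not None:
    ns_per_day, hours_per_ns = result
    print(f"{ns_per_day} ns/day, {hours_per_ns} hour/ns")
```

For real use, the log would be read from the md.log file of each run, for example with `open("md.log").read()`, and the values tabulated per machine.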