计算化学公社 (Computational Chemistry Commune)

Title: GROMACS 2019.3 GPU acceleration comparison

Author: mol    Posted: 2019-7-17 11:02
Hello everyone,

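Our group recently added a new server with a Xeon Platinum 8173M CPU and an RTX 2080 Ti GPU, so I ran a simple speed comparison. Sharing it here for reference.
System size: 17,000 atoms.
Xeon E5-2686 v4 + GTX 1080 Ti machine: 156 ns/day. Build info: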
  GROMACS version:    2019.3
  Precision:          single
  Memory model:       64 bit
  MPI library:        thread_mpi
  OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 64)
  GPU support:        CUDA
  SIMD instructions:  AVX2_256
  FFT library:        fftw-3.3.8-sse2-avx-avx2-avx2_128
  RDTSCP usage:       enabled
  TNG support:        enabled
  Hwloc support:      disabled
  Tracing support:    disabled
  C compiler:         /usr/bin/cc GNU 4.8.5
  C compiler flags:    -mavx2 -mfma     -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
  C++ compiler:       /usr/bin/c++ GNU 4.8.5
  C++ compiler flags:  -mavx2 -mfma    -std=c++11   -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
  CUDA compiler:      /usr/local/cuda-9.1/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2017 NVIDIA Corporation;Built on Fri_Nov__3_21:07:56_CDT_2017;Cuda compilation tools, release 9.1, V9.1.85
  CUDA compiler flags:-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_70,code=compute_70;-use_fast_math;;; ;-mavx2;-mfma;-std=c++11;-O3;-DNDEBUG;-funroll-all-loops;-fexcess-precision=fast;
  CUDA driver:        9.10
  CUDA runtime:       9.10
Xeon Platinum 8173M + RTX 2080 Ti machine, built with GCC 5.5.0 and the AVX_512 SIMD instruction set: 154 ns/day. Build info:
  GROMACS version:    2019.3
  Precision:          single
  Memory model:       64 bit
  MPI library:        thread_mpi
  OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 64)
  GPU support:        CUDA
  SIMD instructions:  AVX_512
  FFT library:        fftw-3.3.8-sse2-avx-avx2-avx2_128-avx512
  RDTSCP usage:       enabled
  TNG support:        enabled
  Hwloc support:      disabled
  Tracing support:    disabled
  C compiler:         /usr/local/bin/gcc GNU 5.5.0
  C compiler flags:    -mavx512f -mfma     -O2 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
  C++ compiler:       /usr/local/bin/g++ GNU 5.5.0
  C++ compiler flags:  -mavx512f -mfma    -std=c++11   -O2 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
  CUDA compiler:      /usr/local/cuda-10.1/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2019 NVIDIA Corporation;Built on Wed_Apr_24_19:10:27_PDT_2019;Cuda compilation tools, release 10.1, V10.1.168
  CUDA compiler flags:-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=compute_75;-use_fast_math;;; ;-mavx512f;-mfma;-std=c++11;-O2;-DNDEBUG;-funroll-all-loops;-fexcess-precision=fast;
  CUDA driver:        10.10
  CUDA runtime:       10.10


Same machine, built with GCC 5.5.0 and the AVX2_256 SIMD instruction set: 152 ns/day. Build info:
  GROMACS version:    2019.3
  Precision:          single
  Memory model:       64 bit
  MPI library:        thread_mpi
  OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 64)
  GPU support:        CUDA
  SIMD instructions:  AVX2_256
  FFT library:        fftw-3.3.8-sse2-avx-avx2-avx2_128-avx512
  RDTSCP usage:       enabled
  TNG support:        enabled
  Hwloc support:      disabled
  Tracing support:    disabled
  C compiler:         /usr/local/bin/gcc GNU 5.5.0
  C compiler flags:    -mavx2 -mfma     -O2 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
  C++ compiler:       /usr/local/bin/g++ GNU 5.5.0
  C++ compiler flags:  -mavx2 -mfma    -std=c++11   -O2 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
  CUDA compiler:      /usr/local/cuda-10.1/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2019 NVIDIA Corporation;Built on Wed_Apr_24_19:10:27_PDT_2019;Cuda compilation tools, release 10.1, V10.1.168
  CUDA compiler flags:-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=compute_75;-use_fast_math;;; ;-mavx2;-mfma;-std=c++11;-O2;-DNDEBUG;-funroll-all-loops;-fexcess-precision=fast;
  CUDA driver:        10.10
  CUDA runtime:       10.10
So the new machine actually seems no faster than the old one...
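To put the three numbers above in perspective, here is a minimal Python sketch that computes the relative differences using only the ns/day values reported in this post. The dictionary labels are just descriptive strings chosen for illustration; nothing here comes from GROMACS itself.

```python
# ns/day values reported above for the 17,000-atom system.
results = {
    "E5-2686 v4 + GTX 1080 Ti, AVX2_256": 156.0,
    "Platinum 8173M + RTX 2080 Ti, AVX_512": 154.0,
    "Platinum 8173M + RTX 2080 Ti, AVX2_256": 152.0,
}


def relative_change(new: float, ref: float) -> float:
    """Percentage change of `new` relative to the reference `ref`."""
    return (new - ref) / ref * 100.0


baseline = results["E5-2686 v4 + GTX 1080 Ti, AVX2_256"]

# Each configuration compared with the old machine.
for label, perf in results.items():
    print(f"{label}: {perf:.0f} ns/day, "
          f"{relative_change(perf, baseline):+.2f}% vs. old machine")

# AVX-512 vs. AVX2 build on the same new machine.
avx512_gain = relative_change(
    results["Platinum 8173M + RTX 2080 Ti, AVX_512"],
    results["Platinum 8173M + RTX 2080 Ti, AVX2_256"],
)
print(f"AVX_512 vs. AVX2_256 on the new machine: {avx512_gain:+.2f}%")
```

With these inputs the new machine comes out about 1.3% (AVX_512 build) and 2.6% (AVX2_256 build) slower than the old one, and the AVX_512 build is about 1.3% faster than the AVX2_256 build on the same hardware.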

Author: puzhongji    Posted: 2019-7-19 08:40
This result is really surprising.
Author: bobosiji    Posted: 2019-8-8 08:30
puzhongji wrote on 2019-7-19 08:40:
This result is really surprising.

Maybe the system is too small for the new machine's advantages to show?
Author: 308866814    Posted: 2019-8-10 10:41
Hi, could you attach the .tpr file so that others can measure ns/day on different platforms and configurations?
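If several people run the same .tpr on different machines, collecting the results by hand gets tedious. Below is a minimal Python sketch that pulls ns/day out of mdrun log files. It assumes the log ends with the usual "Performance:" summary line whose first number is ns/day and second is hour/ns; the function names and the script name are hypothetical, not part of GROMACS.

```python
import re
import sys
from pathlib import Path
from typing import Tuple

# Assumption: the mdrun log contains a line such as
#   "Performance:      156.000        0.154"
# where the first number is ns/day and the second is hour/ns.
PERF_RE = re.compile(r"^Performance:\s+([0-9.]+)\s+([0-9.]+)", re.MULTILINE)


def parse_performance(log_text: str) -> Tuple[float, float]:
    """Return (ns_per_day, hours_per_ns) from the last Performance line."""
    matches = PERF_RE.findall(log_text)
    if not matches:
        raise ValueError("no 'Performance:' line found in log")
    ns_day, hour_ns = matches[-1]  # use the final summary if several exist
    return float(ns_day), float(hour_ns)


def parse_log_file(path: str) -> Tuple[float, float]:
    """Read a log file from disk and parse its performance summary."""
    return parse_performance(Path(path).read_text())


if __name__ == "__main__":
    # Print ns/day for every log file given on the command line.
    for log_path in sys.argv[1:]:
        ns_per_day, _ = parse_log_file(log_path)
        print(f"{log_path}: {ns_per_day:.1f} ns/day")
```

Usage (hypothetical file names): python parse_perf.py machineA/md.log machineB/md.log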
Author: StormSpirts    Posted: 2019-12-19 18:32
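According to the benchmarks in https://onlinelibrary.wiley.com/doi/full/10.1002/jcc.26011, the 2080 Ti clearly outperforms the 1080 Ti. Perhaps the original poster could re-run the test under the conditions used in that paper?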




计算化学公社 (Computational Chemistry Commune): http://bbs.keinsci.com/