gromacs2019.3gpu加速效果对比 - 计算化学公社

GROMACS version: 2019.3
Precision: single
Memory model: 64 bit
MPI library: thread_mpi
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support: CUDA
SIMD instructions: AVX2_256
FFT library: fftw-3.3.8-sse2-avx-avx2-avx2_128
RDTSCP usage: enabled
TNG support: enabled
Hwloc support: disabled
Tracing support: disabled
C compiler: /usr/bin/cc GNU 4.8.5
C compiler flags: -mavx2 -mfma -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
C++ compiler: /usr/bin/c++ GNU 4.8.5
C++ compiler flags: -mavx2 -mfma -std=c++11 -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
CUDA compiler: /usr/local/cuda-9.1/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2017 NVIDIA Corporation;Built on Fri_Nov__3_21:07:56_CDT_2017;Cuda compilation tools, release 9.1, V9.1.85
CUDA compiler flags:-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_70,code=compute_70;-use_fast_math;;; ;-mavx2;-mfma;-std=c++11;-O3;-DNDEBUG;-funroll-all-loops;-fexcess-precision=fast;
CUDA driver: 9.10
CUDA runtime: 9.10

复制代码

GROMACS version: 2019.3
Precision: single
Memory model: 64 bit
MPI library: thread_mpi
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support: CUDA
SIMD instructions: AVX_512
FFT library: fftw-3.3.8-sse2-avx-avx2-avx2_128-avx512
RDTSCP usage: enabled
TNG support: enabled
Hwloc support: disabled
Tracing support: disabled
C compiler: /usr/local/bin/gcc GNU 5.5.0
C compiler flags: -mavx512f -mfma -O2 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
C++ compiler: /usr/local/bin/g++ GNU 5.5.0
C++ compiler flags: -mavx512f -mfma -std=c++11 -O2 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
CUDA compiler: /usr/local/cuda-10.1/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2019 NVIDIA Corporation;Built on Wed_Apr_24_19:10:27_PDT_2019;Cuda compilation tools, release 10.1, V10.1.168
CUDA compiler flags:-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=compute_75;-use_fast_math;;; ;-mavx512f;-mfma;-std=c++11;-O2;-DNDEBUG;-funroll-all-loops;-fexcess-precision=fast;
CUDA driver: 10.10
CUDA runtime: 10.10

复制代码

GROMACS version: 2019.3
Precision: single
Memory model: 64 bit
MPI library: thread_mpi
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support: CUDA
SIMD instructions: AVX2_256
FFT library: fftw-3.3.8-sse2-avx-avx2-avx2_128-avx512
RDTSCP usage: enabled
TNG support: enabled
Hwloc support: disabled
Tracing support: disabled
C compiler: /usr/local/bin/gcc GNU 5.5.0
C compiler flags: -mavx2 -mfma -O2 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
C++ compiler: /usr/local/bin/g++ GNU 5.5.0
C++ compiler flags: -mavx2 -mfma -std=c++11 -O2 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
CUDA compiler: /usr/local/cuda-10.1/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2019 NVIDIA Corporation;Built on Wed_Apr_24_19:10:27_PDT_2019;Cuda compilation tools, release 10.1, V10.1.168
CUDA compiler flags:-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=compute_75;-use_fast_math;;; ;-mavx2;-mfma;-std=c++11;-O2;-DNDEBUG;-funroll-all-loops;-fexcess-precision=fast;
CUDA driver: 10.10
CUDA runtime: 10.10

复制代码