Last edited by Zbin on 2023-7-11 11:12
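I am running MD simulations with GROMACS on a 4090 GPU (CPU: AMD EPYC 7763) and found that the speed is the same as on a 3090 (same system and parameter settings), with no significant improvement. This is the command I use to run the MD simulation: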
- gmx mdrun -v -deffnm md -ntmpi 1 -ntomp 8 -gpu_id 0 -pin on -pinoffset 0 -update gpu
This is the GROMACS build information:
- GROMACS version: 2023
- Precision: mixed
- Memory model: 64 bit
- MPI library: thread_mpi
- OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 128)
- GPU support: CUDA
- NB cluster size: 8
- SIMD instructions: AVX2_256
- CPU FFT library: fftw-3.3.8-sse2-avx-avx2-avx2_128
- GPU FFT library: cuFFT
- Multi-GPU FFT: none
- RDTSCP usage: enabled
- TNG support: enabled
- Hwloc support: disabled
- Tracing support: disabled
- C compiler: /usr/bin/cc GNU 11.3.0
- C compiler flags: -fexcess-precision=fast -funroll-all-loops -mavx2 -mfma -Wno-missing-field-initializers -O3 -DNDEBUG
- C++ compiler: /usr/bin/c++ GNU 11.3.0
- C++ compiler flags: -fexcess-precision=fast -funroll-all-loops -mavx2 -mfma -Wno-missing-field-initializers -Wno-cast-function-type-strict -fopenmp -O3 -DNDEBUG
- BLAS library:
- LAPACK library:
- CUDA compiler: /usr/local/cuda/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2022 NVIDIA Corporation;Built on Mon_Oct_24_19:12:58_PDT_2022;Cuda compilation tools, release 12.0, V12.0.76;Build cuda_12.0.r12.0/compiler.31968024_0
- CUDA compiler flags:-std=c++17;--generate-code=arch=compute_50,code=sm_50;--generate-code=arch=compute_52,code=sm_52;--generate-code=arch=compute_60,code=sm_60;--generate-code=arch=compute_61,code=sm_61;--generate-code=arch=compute_70,code=sm_70;--generate-code=arch=compute_75,code=sm_75;--generate-code=arch=compute_80,code=sm_80;--generate-code=arch=compute_86,code=sm_86;--generate-code=arch=compute_89,code=sm_89;--generate-code=arch=compute_90,code=sm_90;-Wno-deprecated-gpu-targets;--generate-code=arch=compute_53,code=sm_53;--generate-code=arch=compute_80,code=sm_80;-use_fast_math;-Xptxas;-warn-double-usage;-Xptxas;-Werror;-D_FORCE_INLINES;-fexcess-precision=fast -funroll-all-loops -mavx2 -mfma -Wno-missing-field-initializers -Wno-cast-function-type-strict -fopenmp -O3 -DNDEBUG
- CUDA driver: 12.0
- CUDA runtime: 12.20
This is the GPU utilization during the simulation:
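With other software I can push the 4090's power draw to 450 W (almost fully loaded).
I do not yet understand what causes this behavior. Does anyone have a solution?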
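To put a number on the comparison, the ns/day figure can be read from each run's log. The following is only a minimal sketch: it assumes the two runs wrote their logs to `md_3090.log` and `md_4090.log` (hypothetical names; with `-deffnm md` the log is normally `md.log`) and that each log ends with the usual `Performance:` line. The helper name `ns_per_day` is made up for this example.

```shell
#!/bin/sh
# Print the ns/day value from the "Performance:" line that mdrun
# writes near the end of its log file (second field on that line).
ns_per_day() {
    awk '/^Performance:/ { print $2 }' "$1"
}

# Hypothetical file names: one log per card, same system and parameters.
for log in md_3090.log md_4090.log; do
    if [ -f "$log" ]; then
        printf '%s: %s ns/day\n' "$log" "$(ns_per_day "$log")"
    fi
done
```

While the run is in progress, utilization and power draw could be logged alongside it in a second terminal, for example with `nvidia-smi --query-gpu=utilization.gpu,power.draw --format=csv -l 1`.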