Computational Chemistry Commune (计算化学公社)

Title: After nearly 350 ns on GPU, continuing the MD simulation on 80 CPU cores is terminated by a PME rank error

Author: Wqyin Time: 2025-5-16 16:35
This post was last edited by Wqyin on 2025-5-16 16:41

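Hello everyone. I am running the simulation with GROMACS 2020.6 on a supercomputing platform. GPU resources are scarce, so I was not sure whether my GPU job would get scheduled or which option would be faster. I therefore started a CPU-only run with Intel MPI (-np 40) first, queued a GPU job at the same time, and periodically copied the files produced so far (trr, cpt, log, etc.) into the queued GPU job's directory. Once the GPU job started, it continued the run with -ntomp 30. That worked fine and it is still running normally.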
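The 40-core CPU run turned out to be about 20% slower than the GPU run, so I wanted to see whether 80 cores would be faster than the GPU. I periodically copied the files from the GPU run into the directory of a queued 80-core CPU job, and when that job started it produced the error and messages below. What could be the cause? Is it related to the number of MPI ranks? Any help would be much appreciated.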
Run command: gmx_mpi mdrun -s md.tpr -cpi md.cpt -v -deffnm md -dlb yes
starting mdrun 'Protein in water'
500000000 steps, 1000000.0 ps (continuing from step 174963600, 349927.2 ps).
-------------------------------------------------------
Main error: Fatal error:
24 particles communicated to PME rank 13 are more than 2/3 times the cut-off
out of the domain decomposition cell of their charge group in dimension y.
This usually means that your system is not well equilibrated.
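To make the condition in this message concrete, here is a minimal Python sketch of what the text describes: counting particles whose coordinate along one dimension lies more than 2/3 of the cut-off outside a domain decomposition cell. It is only an illustrative reading of the error text, not GROMACS source code. The function name `count_far_outside`, the cell bounds, the cut-off and the coordinates are all hypothetical, and periodic boundaries and charge-group centers are ignored.

```python
def count_far_outside(coords, cell_lo, cell_hi, cutoff):
    """Hypothetical helper: count coordinates lying more than 2/3 * cutoff
    outside the half-open cell [cell_lo, cell_hi) along one dimension.
    Simplified illustration only: no periodic images, no charge groups."""
    limit = 2.0 / 3.0 * cutoff
    count = 0
    for y in coords:
        if y < cell_lo:
            distance_out = cell_lo - y   # below the lower cell boundary
        elif y >= cell_hi:
            distance_out = y - cell_hi   # at or above the upper cell boundary
        else:
            distance_out = 0.0           # inside the cell
        if distance_out > limit:
            count += 1
    return count


# Hypothetical values in nm: a cell spanning y = 2.0 to 4.0 and a 1.2 nm cut-off.
print(count_far_outside([2.5, 1.5, 1.0, 4.9, 4.5], 2.0, 4.0, 1.2))
```

With a 1.2 nm cut-off the threshold is 0.8 nm, so only the particles at y = 1.0 and y = 4.9 are counted, giving 2.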

Other messages:
Compiled SIMD: AVX2_256, but for this host/run AVX_512 might be better (see
log).
Using 80 MPI processes
Non-default thread affinity set, disabling internal thread affinity
Using 1 OpenMP thread per MPI process

MPI rank: 69 (out of 80)
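As a quick consistency check on the log above, the step counts printed by mdrun give the time step and the simulated time at which the run resumed. This small Python sketch uses only those printed numbers:

```python
# Numbers taken from the mdrun output above.
total_steps = 500_000_000
total_time_ps = 1_000_000.0
resume_step = 174_963_600

dt_ps = total_time_ps / total_steps      # time step in ps
resume_time_ps = resume_step * dt_ps     # simulated time at the checkpoint, in ps

print(f"dt = {dt_ps * 1000:.1f} fs")
print(f"resume at {resume_time_ps:.1f} ps = {resume_time_ps / 1000:.1f} ns")
print(f"progress = {100 * resume_step / total_steps:.1f}% of {total_steps} steps")
```

This gives dt = 2.0 fs and a resume point of about 349.9 ns (35.0% of the planned steps), matching the roughly 350 ns of GPU simulation mentioned in the title.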

Author: 七尺贱 Time: 2025-5-17 16:26

Author: hp2002 Time: 2026-7-3 19:56
Did the original poster ever solve this?

Welcome to Computational Chemistry Commune (http://bbs.keinsci.com/)