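+1, I'm running into the same problem. oneAPI 2021.4 had always worked fine, but after I killed a Slurm job, mpirun suddenly started producing no output at all, while an "mpiexec.hydra" process keeps running in the background (this program seems to be almost equivalent to mpirun).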
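To see exactly which mpiexec.hydra processes are still alive (for example, ones left behind by the killed Slurm job), here is a minimal Linux-only sketch in Python that scans /proc, roughly what `pgrep -a mpiexec.hydra` reports. It assumes a standard procfs; `find_processes` is a hypothetical helper, not part of Intel MPI.

```python
import os
from typing import List, Tuple


def find_processes(name: str, proc_root: str = "/proc") -> List[Tuple[int, str]]:
    """Return sorted (pid, cmdline) pairs for processes whose comm equals `name`.

    Linux-only: reads /proc/<pid>/comm and /proc/<pid>/cmdline.
    The kernel truncates comm to 15 characters, so `name` is truncated too.
    Returns an empty list if proc_root does not exist.
    """
    matches = []
    if not os.path.isdir(proc_root):
        return matches
    for entry in os.listdir(proc_root):
        if not entry.isdigit():  # skip non-PID entries such as "self"
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                comm = f.read().strip()
            if comm != name[:15]:
                continue
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            # process exited between listdir() and open(), or access denied
            continue
        # cmdline arguments are NUL-separated
        cmdline = raw.replace(b"\0", b" ").decode(errors="replace").strip()
        matches.append((int(entry), cmdline))
    return sorted(matches)


if __name__ == "__main__":
    for pid, cmd in find_processes("mpiexec.hydra"):
        print(pid, cmd)
```

If this lists an mpiexec.hydra whose parent job no longer exists, that leftover process is a natural first suspect to kill before retrying mpirun.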
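Even mpirun's own check (`mpirun --version`) produces no output, which shows the problem is unrelated to how the application was compiled and comes purely from mpirun itself failing to run.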
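Since a hung mpirun blocks the terminal, a small sketch that runs `mpirun --version` under a timeout can tell a hang apart from a missing binary or an ordinary error. `probe` is a hypothetical helper (POSIX-only); it starts the command in a new session so that, on timeout, the whole process group is killed, including any child the mpirun script forks that stays in that group.

```python
import os
import signal
import subprocess
from typing import List


def probe(cmd: List[str], timeout: float = 10.0) -> str:
    """Run cmd and classify it as 'ok', 'failed', 'timeout' or 'missing'.

    Output is inherited from the caller, so any text the command prints
    (e.g. the mpirun version banner) still appears on the terminal.
    """
    try:
        # start_new_session=True calls setsid(), so the child's PID is also its process-group ID
        proc = subprocess.Popen(cmd, start_new_session=True)
    except FileNotFoundError:
        return "missing"
    try:
        returncode = proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # kill the whole group so forked children (e.g. mpiexec.hydra) do not linger
        os.killpg(proc.pid, signal.SIGKILL)
        proc.wait()
        return "timeout"
    return "ok" if returncode == 0 else "failed"


if __name__ == "__main__":
    print(probe(["mpirun", "--version"], timeout=10))
```

On an affected machine this would be expected to report `timeout` rather than `failed`, which matches the "no output at all" symptom.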
I tried repairing the installation with the installer, and also deleting and reinstalling it, but the problem persists.
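I used strace to trace what mpirun does.
Compared with another machine that works normally, the only difference I found is that several mpiexec.hydra paths show up in the trace: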
```
stat("/usr/local/go/bin/mpiexec.hydra", 0x7ffd56d14120) = -1 ENOENT (No such file or directory)
stat("/home/room/Software/vaspkit.1.3.0/bin//mpiexec.hydra", 0x7ffd56d14120) = -1 ENOENT (No such file or directory)
stat("/home/room/anaconda3/bin/mpiexec.hydra", 0x7ffd56d14120) = -1 ENOENT (No such file or directory)
stat("/opt/intel/oneapi/compiler/2021.4.0/linux/bin/mpiexec.hydra", 0x7ffd56d14120) = -1 ENOENT (No such file or directory)
stat("/home/room/.local/bin/mpiexec.hydra", 0x7ffd56d14120) = -1 ENOENT (No such file or directory)
stat("/usr/local/go/bin/mpiexec.hydra", 0x7ffd56d14120) = -1 ENOENT (No such file or directory)
stat("/home/room/Software/vaspkit.1.3.0/bin/mpiexec.hydra", 0x7ffd56d14120) = -1 ENOENT (No such file or directory)
stat("/home/room/anaconda3/bin/mpiexec.hydra", 0x7ffd56d14120) = -1 ENOENT (No such file or directory)
stat("/opt/intel/oneapi/compiler/2021.4.0/linux/bin/mpiexec.hydra", 0x7ffd56d14120) = -1 ENOENT (No such file or directory)
stat("/opt/intel/oneapi/vtune/2021.7.1/bin64/mpiexec.hydra", 0x7ffd56d14120) = -1 ENOENT (No such file or directory)
stat("/opt/intel/oneapi/vpl/2021.6.0/bin/mpiexec.hydra", 0x7ffd56d14120) = -1 ENOENT (No such file or directory)
stat("/opt/intel/oneapi/mpi/2021.4.0/libfabric/bin/mpiexec.hydra", 0x7ffd56d14120) = -1 ENOENT (No such file or directory)
stat("/opt/intel/oneapi/mpi/2021.4.0/bin/mpiexec.hydra", {st_mode=S_IFREG|0755, st_size=3637909, ...}) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fb61efc9e50) = 234142
wait4(-1,
```
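The stat() calls in the strace output look like the shell walking through PATH one directory at a time until it finds an executable mpiexec.hydra. Only /opt/intel/oneapi/mpi/2021.4.0/bin/mpiexec.hydra returns 0; the repeated directories (go, vaspkit, anaconda3, compiler) suggest duplicated PATH entries rather than several installed copies. After that, the process clones a child and blocks in wait4, i.e. it waits for that child to exit. The sketch below reproduces the lookup so the two machines' PATHs can be compared; `scan_path` is a hypothetical helper and assumes the lookup follows PATH order.

```python
import os
from typing import List, Optional, Tuple


def scan_path(name: str, path: Optional[str] = None) -> List[Tuple[str, bool, bool]]:
    """For each PATH entry return (directory, is_duplicate, has_executable_name).

    Entries that differ only by a trailing slash (e.g. ".../bin/" and ".../bin")
    are treated as duplicates. An empty entry means the current directory.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    seen = set()
    rows = []
    for d in path.split(os.pathsep):
        norm = os.path.normpath(d) if d else "."
        is_dup = norm in seen
        seen.add(norm)
        candidate = os.path.join(d or ".", name)
        found = os.path.isfile(candidate) and os.access(candidate, os.X_OK)
        rows.append((d, is_dup, found))
    return rows


if __name__ == "__main__":
    for d, is_dup, found in scan_path("mpiexec.hydra"):
        flags = ("DUPLICATE " if is_dup else "") + ("FOUND" if found else "")
        print(f"{d:60} {flags}")
```

If both machines resolve the same binary first, PATH ordering is probably not the cause, and the child that wait4 is waiting on becomes the more interesting thing to trace.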
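I also tried commenting out the mpiexec-related environment variables that have nothing to do with oneAPI, but that had no effect either. I hope someone with experience can suggest what the cause might be.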