计算化学公社 (Computational Chemistry Commune)

Title: Help: error when testing QE in parallel

Author:
ycy    Time: 2022-4-22 22:28
I have installed qe-6.4.1 and openmpi-4.0.3 on our cluster. When I test a parallel calculation with the pw module, I get the following error:
WARNING: There was an error initializing an OpenFabrics device.

  Local host:   compute-065
  Local device: mlx5_0
--------------------------------------------------------------------------
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 48 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
[compute-065:37968] 63 more processes have sent help message help-mpi-btl-openib.txt / error in device init
[compute-065:37968] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[compute-065:37968] 31 more processes have sent help message help-mpi-api.txt / mpi-abort

The cluster job script is as follows:
#!/bin/bash
#SBATCH -J test
#SBATCH -p cpu-low
#SBATCH -N 2
#SBATCH -n 64
#SBATCH --ntasks-per-node=32
#SBATCH -t 168:00:00
#SBATCH -o test.out
#SBATCH -e test.err

export OMP_NUM_THREADS=1
mpirun -np 64 neb.x -i neb.in >neb.log
How should I deal with this error?
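The last lines of the log suggest setting the MCA parameter orte_base_help_aggregate to 0 so that every rank's message is shown instead of "63 more processes have sent help message ...". A minimal sketch of how the job script's launch step could do that, assuming the standard Open MPI convention that any MCA parameter can also be set as an OMPI_MCA_<name> environment variable. The guard on SLURM_JOB_ID is only there so the snippet does nothing outside a Slurm allocation.

```shell
#!/bin/bash
# Open MPI reads MCA parameters from OMPI_MCA_<name> environment variables.
# Disabling help aggregation makes mpirun print each rank's help/error message
# individually instead of collapsing them into one summary line.
export OMPI_MCA_orte_base_help_aggregate=0
export OMP_NUM_THREADS=1

# Equivalent command-line form:
#   mpirun --mca orte_base_help_aggregate 0 -np 64 neb.x -i neb.in > neb.log
# Launch only inside a Slurm allocation. Open MPI help messages go to stderr,
# i.e. the file named by "#SBATCH -e".
if [ -n "${SLURM_JOB_ID:-}" ]; then
  mpirun -np "${SLURM_NTASKS:-64}" neb.x -i neb.in > neb.log
fi
```

The full per-rank output may show which device or transport each process failed on, which is more useful than the aggregated summary when deciding whether the build or the runtime settings are at fault.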



Author:
abin    Time: 2022-4-22 22:42
Last edited by abin on 2022-4-22 22:49

Search Google for "openMPI InfiniBand".

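In short, an Open MPI built with the default configure options, or one that was not configured properly, may not support the InfiniBand (IB) network.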
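One way to check what a given Open MPI build actually contains is to look at the transport components listed by ompi_info. A minimal sketch, assuming the ompi_info first in PATH belongs to the same installation as the mpirun used in the job, and treating the openib BTL and the ucx PML as the components of interest for IB (other fabrics use other components, so an empty result is a hint, not proof).

```shell
#!/bin/bash
# ompi_info prints component lines such as
#   "                 MCA btl: openib (MCA v2.1.0, API v3.0.0, Component v4.0.3)"
ib_components() {
  # Keep btl/pml lines that name openib or ucx; print them as "framework:component"
  grep -E 'MCA (btl|pml): (openib|ucx) ' \
    | sed -E 's/.*MCA (btl|pml): ([a-z0-9_]+) .*/\1:\2/'
}

if command -v ompi_info >/dev/null 2>&1; then
  found=$(ompi_info | ib_components)
  if [ -z "$found" ]; then
    echo "No openib or ucx component found; this build probably has no IB transport via them."
  else
    echo "$found"
  fi
fi
```

Running it on a login node and on a compute node (e.g. inside an interactive Slurm job) helps confirm that both see the same installation.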

Intel MPI, on the other hand, supports IB networks out of the box.

Also, the Open MPI bundled with the cp2k v8.2.0 Open MPI build on conda-forge is built for HPC environments, e.g. with IB support. You can look at its build options.
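The build options of an installed Open MPI can be read back from the "Configure command line:" line that ompi_info prints. A minimal sketch, assuming the conda environment containing cp2k and its openmpi package is activated so that its ompi_info comes first in PATH; filtering for --with-ucx / --with-verbs is only an illustration of which options to look for and may print nothing.

```shell
#!/bin/bash
configure_options() {
  # ompi_info prints a line like
  #   "  Configure command line: '--prefix=/opt/ompi' '--with-ucx=/usr' ..."
  # Extract each single-quoted option and print it on its own line.
  grep -i 'Configure command line:' | grep -o "'[^']*'" | tr -d "'"
}

if command -v ompi_info >/dev/null 2>&1; then
  # Show only options that relate to InfiniBand transports (may be empty)
  ompi_info | configure_options | grep -E -- '--with-(ucx|verbs)'
fi
```

Comparing this list with the output of the cluster's own ompi_info shows which options the local build is missing.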

Author:
876449830    Time: 2022-4-23 09:32
Recompile, and build an MPI + openmpi version.
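A rough sketch of what recompiling could look like: rebuild Open MPI with IB transports enabled, then rebuild QE against it. All paths are hypothetical, and the choice of --with-ucx, --with-verbs and --with-slurm is an assumption about this cluster (UCX must already be installed, often as part of the Mellanox OFED stack). The steps are wrapped in functions so nothing runs until you call them from the matching source tree.

```shell
#!/bin/bash
# Hypothetical locations: adjust to your cluster.
PREFIX=${PREFIX:-$HOME/opt/openmpi-4.0.3}
UCX_DIR=${UCX_DIR:-/usr}

build_openmpi() {
  # Run inside the unpacked openmpi-4.0.3 source tree.
  ./configure --prefix="$PREFIX" --with-ucx="$UCX_DIR" --with-verbs --with-slurm
  make -j 8 && make install
}

build_qe() {
  # Run inside the unpacked qe-6.4.1 source tree, with the new Open MPI first in PATH.
  export PATH="$PREFIX/bin:$PATH"
  export LD_LIBRARY_PATH="$PREFIX/lib:${LD_LIBRARY_PATH:-}"
  ./configure MPIF90=mpif90
  make pw neb
}
```

If the rebuilt Open MPI lists a ucx component in ompi_info, one option worth trying (an assumption, not something stated in this thread) is selecting it explicitly in the job script: mpirun --mca pml ucx -np 64 neb.x -i neb.in > neb.log. The same PATH and LD_LIBRARY_PATH settings also need to be in effect in the job script.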




Welcome to 计算化学公社 (http://bbs.keinsci.com/) Powered by Discuz! X3.3