I recently ran a psmp job with CP2K 2024.3 on a cluster. CP2K was installed following sob's installation guide: http://bbs.keinsci.com/thread-21608-1-1.html
I submitted the job following another thread: http://bbs.keinsci.com/forum.php ... ht=cp2k%B2%A2%D0%D0
but I still get the following error:
[icn256:40145] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and MPI will try to terminate your MPI job as well)
[icn256:40154] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and MPI will try to terminate your MPI job as well)
[icn256:40160] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and MPI will try to terminate your MPI job as well)
[icn256:40157] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
Tue Dec 24 15:36:51 CST 2024
How can I solve this problem? Here is my job submission script:
#!/bin/bash
#SBATCH --job-name=jht_bulk
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=48
#SBATCH --cpus-per-task=1
#SBATCH --partition=hcpu48
#SBATCH --exclude=icn201,icn264
# load the environment
module purge
module load cp2k/2024.3
source /data0/software/cp2k/cp2k-2024.3/tools/toolchain/install/setup
# define the output file for the memory log
LOGFILE="memory_usage.log"
echo "Start monitoring memory usage..." > "$LOGFILE"
while true; do
    echo "Timestamp: $(date)" >> "$LOGFILE"
    free -h >> "$LOGFILE"
    echo "========================" >> "$LOGFILE"
    sleep 60 # record every 60 seconds
done &
date
#srun cp2k.psmp input.inp > output.out
mpirun --mca pml ucx --mca btl '^openib' -np 96 cp2k.psmp input.inp > output.out
#cp2k.psmp input.inp > output.out
date
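
The MPI_Init_thread abort in the log happens during MPI start-up, before CP2K reads input.inp. One possible cause (an assumption, the log does not confirm it) is that the mpirun found on PATH belongs to a different MPI installation than the one cp2k.psmp was built against. The sketch below only prints what the job environment resolves, so the two can be compared; report_mpi_env is a hypothetical helper name, not part of CP2K or Open MPI.

```shell
#!/bin/bash
# Hypothetical helper (not from the original post): print which MPI launcher
# and which MPI library a binary would pick up in the current environment.
report_mpi_env() {
    local exe="${1:-cp2k.psmp}"
    local launcher exe_path

    # Path of the mpirun that the shell would actually execute.
    launcher="$(command -v mpirun || true)"
    echo "mpirun: ${launcher:-not found}"

    # Path of the binary itself (cp2k.psmp by default).
    exe_path="$(command -v "$exe" || true)"
    echo "binary: ${exe_path:-not found}"

    # ldd lists the shared libraries the binary resolves at run time;
    # a libmpi line shows which MPI installation it would load.
    if [ -n "$exe_path" ] && command -v ldd >/dev/null 2>&1; then
        ldd "$exe_path" 2>/dev/null | grep -i 'libmpi' || echo "libmpi: not listed by ldd"
    fi
}

report_mpi_env cp2k.psmp
```

Running this inside the batch script, after the module load and source lines, shows the environment the job really sees. If the mpirun path and the libmpi path point to different installations, that mismatch would be worth fixing first.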