Last edited by 千面追风 on 2024-7-22 19:36
As the title says, I get the following error when running an SOC calculation on a server with 128 cores and 512 GB of memory:
Building the sigma vectors ...
Memory handling for direct AO based RPA:
Memory per vector needed ... 373 MB
Memory needed ... 6714 MB
Memory available ... 7000 MB
Number of vectors per batch ... 18
Number of batches ... 1
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 25 with PID 13457 on node a02r08n05 exited on signal 9 (Killed).
--------------------------------------------------------------------------
ORCA finished by error termination in CIS
Calling Command: mpirun -np 32 /work/home/ysuanap125/yeesuan/software/orca_5_0_4_linux_x86-64_shared_openmpi411/orca_cis_mpi design5_SOC.cisinp.tmp design5_SOC
[file orca_tools/qcmsg.cpp, line 465]:
.... aborting the run
[file orca_tools/qcmsg.cpp, line 465]:
.... aborting the run
slurmstepd: error: Detected 149 oom-kill event(s) in StepId=8824917.batch. Some of your processes may have been killed by the cgroup out-of-memory handler.
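The input file has been uploaded. It runs on 32 cores with maxcore 7000.
P.S. I also tried 16 cores with maxcore 4000 and got the same error.
With 512 GB of memory this shouldn't happen, should it? Could anyone suggest a possible cause?
The Slurm script is also attached.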
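A quick sanity check of the numbers in the post: ORCA's %maxcore is a per-process memory budget in MB, so the nominal total is the number of MPI processes times maxcore. The Python sketch below compares that total against a memory limit. The function names are hypothetical, and the 0.75 headroom factor is an assumption reflecting that ORCA's real usage can exceed %maxcore.

```python
# Hypothetical helpers for checking an ORCA memory setup (not part of ORCA).
# Assumption: %maxcore is a per-process budget in MB, so the nominal total
# is nprocs * maxcore; actual usage can exceed this, hence the headroom.

def nominal_request_mb(nprocs: int, maxcore_mb: int) -> int:
    """Total memory (MB) implied by the process count and %maxcore."""
    return nprocs * maxcore_mb


def fits(nprocs: int, maxcore_mb: int, limit_mb: int, headroom: float = 0.75) -> bool:
    """True if the nominal request stays within `headroom` of `limit_mb`."""
    return nominal_request_mb(nprocs, maxcore_mb) <= headroom * limit_mb


NODE_MB = 512 * 1024  # the 512 GB node from the post, expressed in MB

# The two configurations tried in the post.
for nprocs, maxcore in [(32, 7000), (16, 4000)]:
    total = nominal_request_mb(nprocs, maxcore)
    print(f"{nprocs} procs x {maxcore} MB = {total} MB "
          f"({total / NODE_MB:.0%} of the node), fits: {fits(nprocs, maxcore, NODE_MB)}")
```

Both configurations come out well under the node's 512 GB. Since the log reports kills by the cgroup out-of-memory handler, the limit that matters is presumably the one the cgroup enforces for the job (set by the Slurm allocation), not the node total, so passing that value as `limit_mb` is the more relevant check.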