计算化学公社 (Computational Chemistry Commune)

Title: ORCA counterpoise jobs often fail with memory errors; please check my core-count and memory settings

Author:
ktylea    Time: 2024-3-31 09:52
This post was last edited by ktylea on 2024-3-31 10:22

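I am using ORCA to run a counterpoise correction on a 27-atom system. The ORCA keywords were generated by Multiwfn (one inp file computes five energies, one for each of five structures):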
! DLPNO-CCSD(T) tightPNO RIJK aug-cc-pVTZ aug-cc-pVTZ/JK aug-cc-pVTZ/C tightSCF noautostart miniprint nopop
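The machine has 112 logical cores.
MemTotal: 263608308 kB
I wanted to run 2-3 jobs in parallel at the same time, but there was not enough memory and all of them failed; some jobs submitted on their own failed as well.
When submitting jobs one at a time, %maxcore was 1000 (3000 for the first structure in the inp file) and %pal nprocs was 36. I submitted each job after the previous one finished; of the three files I tried, two failed (file 2).
After adjusting the memory, I submitted two jobs at once, one with %maxcore 10000 and %pal nprocs 16, the other with %maxcore 6000 and %pal nprocs 16. Both failed as well (files 3 and 4).
A single job with %maxcore 10000 and %pal nprocs 16 also failed (file 5). Could the experts here please take a look at my memory and core settings?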
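
Note: %maxcore in ORCA is a per-process limit in MB, so the nominal memory request of one job is roughly %maxcore times nprocs. The arithmetic for the settings listed above can be sketched in Python as follows. The function names are made up for illustration, kB in MemTotal is treated as KiB and MB in %maxcore as MiB (an assumption about units), and the 0.75 safety fraction is a rule-of-thumb assumption that does not come from this thread. ORCA can exceed %maxcore at times, so the result is a lower bound on real usage, not an exact footprint.

```python
MEMTOTAL_KB = 263608308  # MemTotal value quoted in the post


def requested_mb(maxcore_mb, nprocs):
    """Nominal request of one ORCA job: %maxcore (MB per process) times nprocs."""
    return maxcore_mb * nprocs


def fraction_of_ram(jobs, memtotal_kb=MEMTOTAL_KB):
    """Fraction of physical RAM nominally requested by jobs running together.

    `jobs` is a list of (maxcore_mb, nprocs) pairs.
    """
    total_mb = sum(requested_mb(m, n) for m, n in jobs)
    return total_mb / (memtotal_kb / 1024)


def fits(jobs, safe_fraction=0.75, memtotal_kb=MEMTOTAL_KB):
    """True if the nominal request stays below an assumed safety fraction of RAM."""
    return fraction_of_ram(jobs, memtotal_kb) <= safe_fraction


if __name__ == "__main__":
    settings = {
        "one job, 1000 MB x 36": [(1000, 36)],
        "one job, 10000 MB x 16": [(10000, 16)],
        "two jobs, 10000 MB x 16 and 6000 MB x 16": [(10000, 16), (6000, 16)],
    }
    for label, jobs in settings.items():
        print(f"{label}: {fraction_of_ram(jobs):.0%} of MemTotal, fits={fits(jobs)}")
```

With these numbers, the two simultaneous jobs nominally ask for about 99% of MemTotal, while the single 10000 MB x 16 job asks for about 62%. A nominal fit does not guarantee success; it only rules out the settings that clearly oversubscribe the machine.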


================
I tried changing the memory settings again, and submitting the job produced this error:
[1] 36984
[yueyang@localhost step3]$ [localhost.localdomain:36985] PMIX ERROR: OUT-OF-RESOURCE in file dstore_segment.c at line 207
[localhost.localdomain:36985] PMIX ERROR: OUT-OF-RESOURCE in file dstore_base.c at line 696
[localhost.localdomain:36985] PMIX ERROR: OUT-OF-RESOURCE in file dstore_base.c at line 1857
[localhost.localdomain:36985] PMIX ERROR: OUT-OF-RESOURCE in file dstore_base.c at line 2846
[localhost.localdomain:36985] PMIX ERROR: OUT-OF-RESOURCE in file dstore_base.c at line 2894
[localhost.localdomain:36985] PMIX ERROR: OUT-OF-RESOURCE in file server/pmix_server.c at line 3423
[localhost.localdomain:36994] PMIX ERROR: OUT-OF-RESOURCE in file client/pmix_client.c at line 277
[localhost.localdomain:36994] OPAL ERROR: Error in file pmix3x_client.c at line 112
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[localhost.localdomain:36994] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[16630,1],1]
  Exit code:    1
--------------------------------------------------------------------------








Author:
sobereva    Time: 2024-3-31 14:57
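DLPNO-CCSD(T) places heavy demands on both memory and disk, especially when combined with a large basis set. If maxcore is set too low or there is not enough free disk space, the job is bound to fail. In addition, running several DLPNO-CCSD(T) jobs at the same time can make them compete for disk I/O, so do not run more than one at once unless it is absolutely necessary.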
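
Note: the free-disk-space point above can be checked before submitting a job. A minimal Python sketch using the standard-library `shutil.disk_usage` follows; the helper names and any threshold passed to them are hypothetical, since the thread does not say how much scratch space this calculation actually needs.

```python
import shutil

GIB = 1024 ** 3  # bytes per GiB


def free_gib(path):
    """Free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / GIB


def has_headroom(path, needed_gib):
    """True if `path` currently has at least `needed_gib` GiB free.

    `needed_gib` is a user-chosen guess, not a value reported by ORCA.
    """
    return free_gib(path) >= needed_gib
```

For example, `has_headroom(".", 200)` asks whether the current working directory has 200 GiB free, where 200 is only a placeholder. The check is a snapshot: a second job writing to the same disk can use up the space after the check has passed.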
Author:
ktylea    Time: 2024-3-31 20:55
OK, thank you. I will adjust the memory and try running the jobs one at a time.
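
Note: running the jobs strictly one after another can be scripted. The Python sketch below is one possible way to do it and is not something used in this thread. The `launcher` default of `"orca"` is a placeholder: parallel ORCA runs are normally started through the full path to the orca binary, so replace it with that path. The function name and the stop-on-first-failure behaviour are illustrative choices.

```python
import subprocess
from pathlib import Path

# String ORCA prints at the end of a successful run.
OK_MARKER = "ORCA TERMINATED NORMALLY"


def run_sequentially(inputs, launcher=("orca",)):
    """Run each input file in turn, writing output next to it as <name>.out.

    Returns a list of (input file name, succeeded) pairs and stops at the
    first failure so that later jobs do not start on a machine that is
    already out of memory or disk.
    """
    results = []
    for inp in inputs:
        inp = Path(inp)
        out = inp.with_suffix(".out")
        with open(out, "w") as fh:
            proc = subprocess.run(
                [*launcher, str(inp)], stdout=fh, stderr=subprocess.STDOUT
            )
        # Success requires both a zero exit code and the normal-termination line.
        ok = proc.returncode == 0 and OK_MARKER in out.read_text(errors="replace")
        results.append((inp.name, ok))
        if not ok:
            break
    return results
```

Because only one job runs at a time, the whole memory budget and all of the disk bandwidth go to that job, which matches the advice given in the reply above.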




Welcome to 计算化学公社 (http://bbs.keinsci.com/) Powered by Discuz! X3.3