计算化学公社
Title:
VASP error question
Author:
hzf
Time:
2023-11-2 09:37
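Last edited by hzf on 2023-11-2 09:57
I am using Intel oneAPI Base Toolkit 2023 and VASP 6.3.0. Both are installed under /opt/software on my head node, and the other nodes all mount this shared directory and run the software from it. On machine 1, VASP calculations fail. These are my commands: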
source /opt/software/intel/oneapi/setvars.sh --force
/opt/software/intel/oneapi/mpi/2021.9.0/bin/mpirun /opt/software/vasp.6.3.0-constr-wannier-intelmpi/bin/vasp_std 2>&1 | tee vasp-cpu.log
This is the error reported on the failing node:
[node2:24059:0:24059] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x54b6)
==== backtrace (tid: 24059) ====
=================================
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
vasp_std 000000000210EECE Unknown Unknown Unknown
libpthread-2.17.s 00007F9883389630 Unknown Unknown Unknown
vasp_std 000000000210EAAC Unknown Unknown Unknown
libpthread-2.17.s 00007F9883389630 Unknown Unknown Unknown
This is the output on a working node:
LDA part: xc-table for Pade appr. of Perdew
POSCAR found type information on POSCAR Fe
POSCAR found : 1 types and 168 ions
POSCAR, INCAR and KPOINTS ok, starting setup
FFT: planning ... GRIDC
FFT: planning ... GRID_SOFT
FFT: planning ... GRID
WAVECAR not read
entering main loop
N E dE d eps ncg rms rms(c)
DAV: 1 0.902727997835E+04 0.90273E+04 -0.39808E+05 23984 0.983E+02
DAV: 2 -0.635668846795E+03 -0.96629E+04 -0.90394E+04 23784 0.303E+02
DAV: 3 -0.141523991939E+04 -0.77957E+03 -0.63745E+03 24760 0.108E+02
DAV: 4 -0.147839391224E+04 -0.63154E+02 -0.55756E+02 31768 0.301E+01
DAV: 5 -0.148104386544E+04 -0.26500E+01 -0.25785E+01 31960 0.615E+00 0.132E+02
RMM: 6 -0.185469084417E+04 -0.37365E+03 -0.28571E+03 23899 0.875E+01 0.304E+02
RMM: 7 -0.153193466175E+04 0.32276E+03 -0.59300E+02 23909 0.327E+01 0.224E+02
RMM: 8 -0.151764831815E+04 0.14286E+02 -0.54976E+02 24353 0.362E+01 0.192E+02
RMM: 9 -0.142933363886E+04 0.88315E+02 -0.12484E+02 23962 0.193E+01 0.156E+02
RMM: 10 -0.142091763761E+04 0.84160E+01 -0.18933E+02 24060 0.228E+01 0.165E+02
RMM: 11 -0.133717936505E+04 0.83738E+02 -0.59597E+01 24032 0.130E+01 0.101E+02
RMM: 12 -0.133064661234E+04 0.65328E+01 -0.60344E+01 24148 0.130E+01 0.971E+01
RMM: 13 -0.133061485103E+04 0.31761E-01 -0.55526E+01 24343 0.111E+01 0.925E+01
RMM: 14 -0.130621152903E+04 0.24403E+02 -0.30659E+01 23896 0.989E+00 0.456E+01
RMM: 15 -0.130904725550E+04 -0.28357E+01 -0.82268E+00 24035 0.389E+00 0.396E+01
RMM: 16 -0.130918910721E+04 -0.14185E+00 -0.24648E+00 24738 0.154E+00 0.382E+01
RMM: 17 -0.130919828747E+04 -0.91803E-02 -0.61068E-01 26538 0.667E-01 0.387E+01
RMM: 18 -0.130897952233E+04 0.21877E+00 -0.11896E+00 26668 0.136E+00 0.421E+01
RMM: 19 -0.130736745368E+04 0.16121E+01 -0.73217E+00 32683 0.368E+00 0.462E+01
RMM: 20 -0.130521937683E+04 0.21481E+01 -0.82858E+00 33032 0.394E+00 0.488E+01
RMM: 21 -0.130369378358E+04 0.15256E+01 -0.61770E+00 32342 0.322E+00 0.467E+01
RMM: 22 -0.130328773671E+04 0.40605E+00 -0.31753E+00 31055 0.245E+00 0.510E+01
RMM: 23 -0.130340716474E+04 -0.11943E+00 -0.12643E+00 26931 0.156E+00 0.515E+01
RMM: 24 -0.130333641890E+04 0.70746E-01 -0.26226E-01 24028 0.759E-01 0.494E+01
How should I troubleshoot this?
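One possible way to start narrowing this down, sketched under assumptions: since every node runs the same binary from the shared /opt/software directory, a difference between the nodes themselves (unresolved shared libraries, kernel version, resource limits) is worth ruling out first. The helper names `missing_libs` and `node_summary` are hypothetical and not part of VASP or oneAPI; the idea is only to run the same checks on the failing node and on a working node and compare the output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read ldd-style output on stdin and print the names
# of shared libraries that could not be resolved.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Hypothetical helper: print a few node properties worth diffing between
# the failing node and a working node.
node_summary() {
    echo "node:   $(uname -n)"
    echo "kernel: $(uname -r)"
    echo "stack:  $(ulimit -s)"
}

# Intended use on each node, after sourcing setvars.sh
# (binary path taken from the original post):
#   ldd /opt/software/vasp.6.3.0-constr-wannier-intelmpi/bin/vasp_std | missing_libs
#   node_summary
```

If `missing_libs` prints nothing on both nodes and the summaries match, the difference probably lies elsewhere and this check does not explain the crash.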
Author:
wsz
Time:
2023-11-2 14:38
Run ulimit -s unlimited first.
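As a hedged illustration of this suggestion: a limit set with ulimit only applies to the current shell and the processes started from it, so it has to be raised in the same script that calls mpirun, before the launch. `raise_stack_limit` is a hypothetical helper name, and the fallback to the hard limit is an assumption for systems where an unprivileged user is not allowed to set the limit to unlimited.

```shell
#!/usr/bin/env bash
# Hypothetical helper: raise the soft stack limit of the current shell.
# If "unlimited" is refused, fall back to the hard limit, which a soft
# limit is always allowed to reach. Prints the resulting soft limit.
raise_stack_limit() {
    ulimit -S -s unlimited 2>/dev/null || ulimit -S -s "$(ulimit -H -s)"
    ulimit -S -s
}

raise_stack_limit

# The launch would then follow in the same shell, for example with the
# command from the original post:
#   /opt/software/intel/oneapi/mpi/2021.9.0/bin/mpirun \
#       /opt/software/vasp.6.3.0-constr-wannier-intelmpi/bin/vasp_std
```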
Author:
hzf
Time:
2023-11-3 09:27
Last edited by hzf on 2023-11-3 09:28
wsz posted on 2023-11-2 14:38
Run ulimit -s unlimited first.
I have already run it.
The error is still the same:
:: initializing oneAPI environment ...
slurm_script: BASH_VERSION = 4.2.46(2)-release
args: Using "$@" for setvars.sh arguments: --force
:: advisor -- latest
:: ccl -- latest
:: clck -- latest
:: compiler -- latest
:: dal -- latest
:: debugger -- latest
:: dev-utilities -- latest
:: dnnl -- latest
:: dpcpp-ct -- latest
:: dpl -- latest
:: inspector -- latest
:: ipp -- latest
:: ippcp -- latest
:: ipp -- latest
:: itac -- latest
:: mkl -- latest
:: mpi -- latest
:: tbb -- latest
:: vtune -- latest
:: oneAPI environment initialized ::
[node2:44714:0:44714] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x54b6)
==== backtrace (tid: 44714) ====
=================================
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
vasp_std 000000000210EECE Unknown Unknown Unknown
libpthread-2.17.s 00007FE5B58ED630 Unknown Unknown Unknown
vasp_std 000000000210EAAC Unknown Unknown Unknown
libpthread-2.17.s 00007FE5B58ED630 Unknown Unknown Unknown
Author:
hzf
Time:
2023-11-3 10:02
hzf posted on 2023-11-3 09:27
I have already run it.
The error is still the same:
[root@node2 software]# ulimit -a unlimited
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 768251
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) 768251
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Above are the CPU and memory limits on our node. As you can see, they are already unlimited.
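The listing above comes from an interactive root shell on node2, while the log earlier in the thread shows the job running from a Slurm script, so the limits seen by the job may differ from these. A minimal sketch for checking them from inside the job itself follows; `report_limits` is a hypothetical helper, and the commented `mpirun -np 1` line assumes that launching a plain shell command through mpirun is acceptable on this cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the node name together with the stack and
# locked-memory limits as seen by the calling process.
report_limits() {
    printf 'host=%s stack=%s memlock=%s\n' \
        "$(uname -n)" "$(ulimit -s)" "$(ulimit -l)"
}

# Place this in the job script just before the VASP launch, so the values
# reflect the job environment rather than a login shell.
report_limits

# To see the limits of a process started by the MPI launcher (assumption:
# running a shell through mpirun is allowed here):
#   mpirun -np 1 bash -c 'ulimit -s; ulimit -l'
```

If the values printed inside the job are lower than those in the interactive listing, the limits are being set somewhere along the job submission path rather than on the node itself.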
Welcome to 计算化学公社 (http://bbs.keinsci.com/)