I copied a Singularity container with ORCA 6 installed onto a new computer running Ubuntu 22.04, but running it produces an error:

wlc@wlc:~/wan$ singularity exec ~/software/orca6 orca orca.inp > orca.out &
[1] 11360
* hwloc 2.0.2rc1-git has detected buggy sysfs package information: Two packages have
* the same physical package id 0 but different core_siblings 0x000000ff and 0x00000100
* hwloc is merging these packages into a single one assuming your Linux kernel
* does not support this processor correctly.
* You may hide this warning by setting HWLOC_HIDE_ERRORS=1 in the environment.
*
* If hwloc does not report the right number of packages,
* please report this error message to the hwloc user's mailing list,
* along with the files generated by the hwloc-gather-topology script.
****************************************************************************
****************************************************************************
* hwloc 2.0.2rc1-git has encountered what looks like an error from the operating system.
*
* L1d (cpuset 0x00001003) intersects with L2 (cpuset 0x0000f000) without inclusion!
* Error occurred in topology.c line 1384
*
* The following FAQ entry in the hwloc documentation may help:
* What should I do when hwloc reports "operating system" warnings?
* Otherwise please report this error message to the hwloc user's mailing list,
* along with the files generated by the hwloc-gather-topology script.
****************************************************************************
[wlc:11400] *** Process received signal ***
[wlc:11400] Signal: Segmentation fault (11)
[wlc:11400] Signal code: Address not mapped (1)
[wlc:11400] Failing at address: (nil)
[wlc:11400] [ 0] /lib64/libpthread.so.0(+0xf630)[0x15383a72b630]
[wlc:11400] [ 1] /centos/openmpi416/lib/libopen-pal.so.40(opal_hwloc201_hwloc_bitmap_copy+0x16)[0x15383b2dd7e6]
[wlc:11400] [ 2] /centos/openmpi416/lib/libopen-pal.so.40(+0xbd192)[0x15383b306192]
[wlc:11400] [ 3] /centos/openmpi416/lib/libopen-pal.so.40(+0xbd2a8)[0x15383b3062a8]
[wlc:11400] [ 4] /centos/openmpi416/lib/libopen-pal.so.40(+0xbd2a8)[0x15383b3062a8]
[wlc:11400] [ 5] /centos/openmpi416/lib/libopen-pal.so.40(+0xbd2a8)[0x15383b3062a8]
[wlc:11400] [ 6] /centos/openmpi416/lib/libopen-pal.so.40(opal_hwloc201_hwloc_topology_load+0x1f3)[0x15383b30d743]
[wlc:11400] [ 7] /centos/openmpi416/lib/libopen-pal.so.40(opal_hwloc_base_get_topology+0xbc9)[0x15383b2da8e9]
[wlc:11400] [ 8] /centos/openmpi416/lib/openmpi/mca_ess_hnp.so(+0x535c)[0x15383912b35c]
[wlc:11400] [ 9] /centos/openmpi416/lib/libopen-rte.so.40(orte_init+0x295)[0x15383b5ebe75]
[wlc:11400] [10] /centos/openmpi416/lib/libopen-rte.so.40(orte_submit_init+0x56c)[0x15383b59caac]
[wlc:11400] [11] mpirun[0x400e2f]
[wlc:11400] [12] /lib64/libc.so.6(__libc_start_main+0xf5)[0x15383a370555]
[wlc:11400] [13] mpirun[0x400cde]
[wlc:11400] *** End of error message ***
[file orca_tools/qcmsg.cpp, line 394]: .... aborting the run

The error message at the end of the generated out file is:

……
……
ORCA finished by error termination in Startup
Calling Command: mpirun -np 4 /orca600/orca_startup_mpi orca.int.tmp orca
[file orca_tools/qcmsg.cpp, line 394]: .... aborting the run
……
……

What I am sure of:
1) The container is fine. It has been used on other computers for a long time without any problems, and on this new computer it also runs normally with a single core. The error only appears when the calculation uses 2 or more cores.
2) The input file is fine. It is a job that has already been computed successfully.
3) The operating system on the new computer is fine. The machine dual-boots Windows 10 and Ubuntu 22.04. In a virtual machine under the host's Windows 10, multi-core ORCA calculations work. Under the host's Ubuntu 22.04, multi-core Gaussian calculations also work.

What I have tried:
1) I Googled it and found very few people reporting this problem. One reply suggested that OpenMPI might not be installed on the host. Does calling the OpenMPI inside the container really require one on the host as well? I installed it anyway, using the same version as in the container (openmpi416), and got the same error.
2) Judging from the error message, hwloc has detected some hardware/software compatibility problem (roughly, I think; I don't fully understand it). I updated hwloc to (2.7.0-2ubuntu1), but the same error appears.
3) I tried replacing Ubuntu 20.04 with Ubuntu 22.04 and with CentOS 8.9, and got the same error.
4) I tried installing ORCA 6 directly on the host's Ubuntu system without the container, and got the same error.

Could the dual-boot setup be the cause? I have not yet tried removing Windows 10 (it is a genuine licensed copy, so I am reluctant to) and installing Ubuntu alone. Apart from switching to a single system, is there any other way?

Any advice would be much appreciated.
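For anyone who wants to check whether the host kernel really exposes the inconsistencies that hwloc complains about, here is a minimal diagnostic sketch. It assumes a Linux host with the usual sysfs CPU topology files under /sys/devices/system/cpu; the function names (`conflicting_packages`, `partial_overlaps`, and so on) are made up for illustration and are not part of hwloc, Open MPI, or ORCA. It only reports what sysfs says and does not fix anything.

```python
from collections import defaultdict
from pathlib import Path

# Standard Linux sysfs location; an assumption about the host layout.
SYSFS_CPU = Path("/sys/devices/system/cpu")


def parse_mask(text):
    """Parse a sysfs CPU mask such as '00000000,000000ff' into an int."""
    return int(text.strip().replace(",", ""), 16)


def read_topology(root=SYSFS_CPU):
    """Return {cpu_name: (package_id, core_siblings_mask)}."""
    result = {}
    for cpu_dir in sorted(root.glob("cpu[0-9]*")):
        topo = cpu_dir / "topology"
        try:
            pkg = int((topo / "physical_package_id").read_text())
            mask = parse_mask((topo / "core_siblings").read_text())
        except (OSError, ValueError):
            continue  # offline CPU or missing/unreadable files
        result[cpu_dir.name] = (pkg, mask)
    return result


def conflicting_packages(topology):
    """Package ids whose CPUs report more than one core_siblings mask."""
    masks = defaultdict(set)
    for pkg, mask in topology.values():
        masks[pkg].add(mask)
    return {pkg: sorted(m) for pkg, m in masks.items() if len(m) > 1}


def read_cache_masks(root=SYSFS_CPU):
    """Return {(level, type): set of shared_cpu_map masks}."""
    caches = defaultdict(set)
    for index_dir in root.glob("cpu[0-9]*/cache/index[0-9]*"):
        try:
            level = int((index_dir / "level").read_text())
            ctype = (index_dir / "type").read_text().strip()
            mask = parse_mask((index_dir / "shared_cpu_map").read_text())
        except (OSError, ValueError):
            continue
        caches[(level, ctype)].add(mask)
    return caches


def partial_overlaps(masks_a, masks_b):
    """Pairs that intersect although neither mask contains the other."""
    return [
        (a, b)
        for a in masks_a
        for b in masks_b
        if (a & b) and (a & b) != a and (a & b) != b
    ]


if __name__ == "__main__":
    for pkg, masks in conflicting_packages(read_topology()).items():
        print(f"package {pkg}: core_siblings differ:", [hex(m) for m in masks])
    caches = read_cache_masks()
    l1d = caches.get((1, "Data"), set())
    l2 = caches.get((2, "Unified"), set())
    for a, b in partial_overlaps(sorted(l1d), sorted(l2)):
        print(f"L1d {hex(a)} intersects L2 {hex(b)} without inclusion")
```

Run it on the host (outside the container). If it prints the same kind of conflicts as the hwloc banner, the inconsistent topology is coming from what the kernel reports in sysfs rather than from the container image.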