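This post was last edited by 熊崪 on 2024-9-18 16:50
Problem description:
I installed CentOS 8.3.2011 in a VMware 16 Pro virtual machine, switched the yum repositories to the Aliyun mirror, and installed the MKL library.
I first built cp2k-2024.1 with OpenMPI. On an i7-12700K workstation the build runs normally. On an 8375C workstation a single-core run fully loads the core, but runs started with mpirun -np have a problem: they do start, yet the CPU is almost completely idle, with utilization rising only for an instant at intervals, so the calculation is extremely slow.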
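One thing that may be worth ruling out for the mpirun slowdown is thread oversubscription: if the binary is an OpenMP build and OMP_NUM_THREADS is unset, each rank may start one thread per visible core. This is only a possible cause and is not confirmed by anything in this post. The sketch below uses a hypothetical helper, check_oversubscription, to compare ranks times threads against the cores the VM exposes.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of CP2K or any MPI distribution).
# Usage: check_oversubscription RANKS [THREADS_PER_RANK] [CORES]
# If THREADS_PER_RANK is omitted, OMP_NUM_THREADS is used; if that is unset
# too, we assume the OpenMP runtime would start one thread per core.
check_oversubscription() {
    local ranks=$1
    local cores=${3:-$(nproc)}
    local threads=${2:-${OMP_NUM_THREADS:-$cores}}
    local total=$(( ranks * threads ))
    if [ "$total" -gt "$cores" ]; then
        echo "oversubscribed: $total threads on $cores cores"
        return 1
    fi
    echo "ok: $total threads on $cores cores"
}

# Example: check_oversubscription 16    # 16 ranks, threads taken from the environment
```

If the helper reports oversubscription, setting OMP_NUM_THREADS=1 before mpirun is a cheap experiment.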
So I tried building with Intel MPI instead.
I first installed intel-basekit and intel-hpckit, sourced setvars.sh, and then reopened the terminal:
:: oneAPI environment initialized ::
The environment initialized without problems.
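The banner only shows that setvars.sh ran. A quick way to see whether the compiler and MPI wrappers are actually on PATH in the current shell is sketched below. The helper missing_tools is hypothetical, and the tool names are the usual Intel wrapper names, which is an assumption about what the toolchain script looks for.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names of the given commands that are not
# found on PATH, separated by spaces (prints an empty line if none are missing).
missing_tools() {
    local missing=""
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="${missing:+$missing }$tool"
        fi
    done
    printf '%s\n' "$missing"
}

# Assumed tool names; adjust to whatever the toolchain actually calls.
echo "missing: $(missing_tools mpiifort mpiicc mpiexec ifort)"
```

An empty list after "missing:" means all four commands resolve in this shell.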
Then I ran:
./install_cp2k_toolchain.sh --with-sirius=no --with-intelmpi=system --with-plumed=install
make -j 16 ARCH=local VERSION="ssmp sopt psmp popt"
The build reported 2 errors.
make -j 16 ARCH=local VERSION=psmp test does not run:
*************************** Testing started ****************************
[mpiexec@localhost.localdomain] match_arg (../../../../../src/pm/i_hydra/libhydra/arg/hydra_arg.c:82): unrecognized argument bind-to
[mpiexec@localhost.localdomain] Similar arguments:
[mpiexec@localhost.localdomain] n
[mpiexec@localhost.localdomain] HYD_arg_parse_array (../../../../../src/pm/i_hydra/libhydra/arg/hydra_arg.c:106): argument matching returned error
[mpiexec@localhost.localdomain] mpiexec_get_parameters (../../../../../src/pm/i_hydra/mpiexec/mpiexec_params.c:1190): error parsing input array
[mpiexec@localhost.localdomain] main (../../../../../src/pm/i_hydra/mpiexec/mpiexec.c:1725): error parsing parameters
Could not parse feature flags.
make[3]: *** [/root/cp2k-2024.1/Makefile:256: test] Error 1
make[2]: *** [/root/cp2k-2024.1/Makefile:151: test] Error 2
make[1]: *** [/root/cp2k-2024.1/Makefile:135: popt] Error 2
make: *** [Makefile:123: test] Error 2
[root@localhost cp2k-2024.1]# make -j1 16 ARCH=local VERSION=popt test
Discovering programs ...
Makefile:134: warning: overriding recipe for target 'popt'
Makefile:128: warning: ignoring old recipe for target 'popt'
make: *** No rule to make target '16'. Stop.
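The first failure in the log above is Intel MPI's mpiexec rejecting a bind-to argument, so something in the test launch path is passing it. A minimal sketch for locating where that argument comes from is shown here. The helper find_flag_usage is hypothetical, and restricting the search to Python files and Makefiles is an assumption about where the test driver lives.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list every line mentioning FLAG in Python files and
# Makefiles below DIR (default: current directory).
# Output format is grep's usual path:line:content.
find_flag_usage() {
    local flag=$1
    local dir=${2:-.}
    grep -rn --include='*.py' --include='Makefile' -e "$flag" "$dir"
}

# Example, using the source tree path that appears in the log:
#   find_flag_usage bind-to /root/cp2k-2024.1
```

The second failure is separate: in `make -j1 16 ...`, make treats the stray `16` as a target name, which is what "No rule to make target '16'" reports.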
Testing ssmp does run, but with a large number of RUNTIME FAIL results; after a little over 1000 tests it exited because more than 50 had failed.
------------------------------- Timings --------------------------------
Plot: name="timings", title="Timing Distribution", ylabel="time [s]"
PlotPoint: name="100th_percentile", plot="timings", label="100th %ile", y=23.71, yerr=0.0
PlotPoint: name="99th_percentile", plot="timings", label="99th %ile", y=11.26, yerr=0.0
PlotPoint: name="98th_percentile", plot="timings", label="98th %ile", y=9.74, yerr=0.0
PlotPoint: name="95th_percentile", plot="timings", label="95th %ile", y=8.10, yerr=0.0
PlotPoint: name="90th_percentile", plot="timings", label="90th %ile", y=6.68, yerr=0.0
PlotPoint: name="80th_percentile", plot="timings", label="80th %ile", y=5.11, yerr=0.0
----------------------------- Slow Tests -------------------------------
Duration threshold (2x 95th %ile): 16.21 sec
Found 0 slow tests (2 suppressed):
------------------------------- Summary --------------------------------
Number of FAILED tests 52
Number of WRONG tests 0
Number of CORRECT tests 1016
Total number of tests 1068
Summary: correct: 1016 / 1068; failed: 52; 8min
Status: FAILED
All of the failures are:
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
libpthread-2.28.s 00007FCC86CB7B20 Unknown Unknown Unknown
cp2k.ssmp 0000000003123A61 pw_methods_mp_pw_ 2581 pw_methods.F
cp2k.ssmp 000000000312A85E pw_poisson_method 306 pw_poisson_methods.F
cp2k.ssmp 00000000011F1161 qs_ks_methods_mp_ 406 qs_ks_methods.F
cp2k.ssmp 00000000011EFC0A qs_ks_methods_mp_ 1215 qs_ks_methods.F
cp2k.ssmp 00000000011EF4F8 qs_ks_methods_mp_ 1111 qs_ks_methods.F
cp2k.ssmp 00000000013AAF71 qs_scf_mp_init_sc 813 qs_scf.F
cp2k.ssmp 00000000013A5250 qs_scf_mp_scf_env 457 qs_scf.F
cp2k.ssmp 000000000139ADA4 qs_scf_mp_scf_ 247 qs_scf.F
cp2k.ssmp 00000000010D8739 qs_energy_mp_qs_e 112 qs_energy.F
cp2k.ssmp 000000000110E003 qs_force_mp_qs_fo 200 qs_force.F
cp2k.ssmp 0000000000B62A59 force_env_methods 255 force_env_methods.F
cp2k.ssmp 0000000000545C40 cp_eval_at_ 142 gopt_f77_methods.F
cp2k.ssmp 0000000000663EEC bfgs_optimizer_mp 286 bfgs_optimizer.F
cp2k.ssmp 000000000054443F geo_opt_mp_cp_geo 90 geo_opt.F
cp2k.ssmp 0000000000456029 cp2k_runs_mp_cp2k 369 cp2k_runs.F
cp2k.ssmp 0000000000454A54 cp2k_runs_mp_run_ 983 cp2k_runs.F
cp2k.ssmp 0000000000452F95 MAIN__ 379 cp2k.F
cp2k.ssmp 000000000045118D Unknown Unknown Unknown
libc-2.28.so 00007FCC7FD367B3 __libc_start_main Unknown Unknown
cp2k.ssmp 00000000004510AE Unknown Unknown Unknown
Actual runs report the same error.
The problem is identical on the 12700K and the 8375C.
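With Intel Fortran builds, one commonly reported cause of `forrtl: severe (174)` segfaults in array-heavy routines is stack exhaustion, either of the main stack or of the per-thread OpenMP stacks. Whether that applies here is an assumption, not a confirmed diagnosis. The sketch below only reports the current limit; report_stack_limit is a hypothetical helper, and the OMP_STACKSIZE value in the comment is a guess.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the soft stack limit of the current shell and
# flag it when it is not unlimited.
report_stack_limit() {
    local limit
    limit=$(ulimit -s)
    if [ "$limit" = "unlimited" ]; then
        echo "stack: unlimited"
    else
        echo "stack: ${limit} KiB (limited)"
    fi
}

report_stack_limit

# Possible experiment before rerunning one failing test (values are guesses):
#   ulimit -s unlimited
#   export OMP_STACKSIZE=64m
```

If a single failing test passes after raising both limits, the stack is the likely culprit; if it still crashes, the cause lies elsewhere.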
Any help from the experts here would be much appreciated.