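Slurm 24 parallel computing: my nodes hang frequently. How should I check the logs, which files are relevant, and how do I analyze them? Below is one case from 2025-01-10: the hang was discovered a little after 3:00 PM, and it definitely happened before 3:54 PM (15:54).
One of the logs is shown below. The system is CentOS 8.5.
/var/log/messages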
- Jan 10 15:03:09 node01 NetworkManager[36915]: <info> [1736492589.7207] agent-manager: agent[b913d79752165acd,:1.86/org.gnome.Shell.NetworkAgent/42]: agent registered
- Jan 10 15:03:10 node01 dbus-daemon[3665]: [system] Activating via systemd: service name='net.reactivated.Fprint' unit='fprintd.service' requested by ':1.86' (uid=42 pid=5880 comm="/usr/bin/gnome-shell ")
- Jan 10 15:03:10 node01 systemd[1]: Starting Fingerprint Authentication Daemon...
- Jan 10 15:03:11 node01 dbus-daemon[3665]: [system] Successfully activated service 'net.reactivated.Fprint'
- Jan 10 15:03:11 node01 systemd[1]: Started Fingerprint Authentication Daemon.
- Jan 10 15:03:16 node01 systemd-logind[5098]: New session 125 of user sutai.
- Jan 10 15:03:16 node01 /usr/libexec/gdm-x-session[5322]: (**) Option "fd" "21"
- Jan 10 15:03:16 node01 /usr/libexec/gdm-x-session[5322]: (II) event0 - Power Button: device removed
- Jan 10 15:03:16 node01 /usr/libexec/gdm-x-session[5322]: (**) Option "fd" "24"
- Jan 10 15:03:16 node01 /usr/libexec/gdm-x-session[5322]: (II) event1 - SIGMACHIP Usb Mouse: device removed
- Jan 10 15:03:16 node01 /usr/libexec/gdm-x-session[5322]: (**) Option "fd" "25"
- Jan 10 15:03:16 node01 /usr/libexec/gdm-x-session[5322]: (II) event2 - Dell KB216 Wired Keyboard: device removed
- Jan 10 15:03:16 node01 /usr/libexec/gdm-x-session[5322]: (**) Option "fd" "26"
- Jan 10 15:03:16 node01 /usr/libexec/gdm-x-session[5322]: (II) event3 - Dell KB216 Wired Keyboard System Control: device removed
- Jan 10 15:03:16 node01 /usr/libexec/gdm-x-session[5322]: (**) Option "fd" "27"
- Jan 10 15:03:16 node01 /usr/libexec/gdm-x-session[5322]: (II) event4 - Dell KB216 Wired Keyboard Consumer Control: device removed
- Jan 10 15:03:16 node01 /usr/libexec/gdm-x-session[5322]: (II) systemd-logind: got pause for 13:67
- Jan 10 15:03:16 node01 /usr/libexec/gdm-x-session[5322]: (II) systemd-logind: got pause for 13:66
- Jan 10 15:03:16 node01 /usr/libexec/gdm-x-session[5322]: (II) systemd-logind: got pause for 13:68
- Jan 10 15:03:16 node01 /usr/libexec/gdm-x-session[5322]: (II) systemd-logind: got pause for 226:0
- Jan 10 15:03:16 node01 /usr/libexec/gdm-x-session[5322]: (II) systemd-logind: got pause for 13:64
- Jan 10 15:03:16 node01 /usr/libexec/gdm-x-session[5322]: (II) systemd-logind: got pause for 13:65
- Jan 10 15:03:16 node01 /usr/libexec/gdm-x-session[40663]: _XSERVTransSocketUNIXCreateListener: ...SocketCreateListener() failed
- Jan 10 15:03:16 node01 /usr/libexec/gdm-x-session[40663]: _XSERVTransMakeAllCOTSServerListeners: server already running
- Jan 10 15:03:16 node01 /usr/libexec/gdm-x-session[40663]: (--) Log file renamed from "/home/sutai/.local/share/xorg/Xorg.pid-40666.log" to "/home/sutai/.local/share/xorg/Xorg.1.log"
- Jan 10 15:03:16 node01 /usr/libexec/gdm-x-session[40663]: X.Org X Server 1.20.11
- Jan 10 15:03:16 node01 /usr/libexec/gdm-x-session[40663]: X Protocol Version 11, Revision 0
- Jan 10 15:03:16 node01 /usr/libexec/gdm-x-session[40663]: Build Operating System: 4.19.34-300.el7.x86_64
- Jan 10 15:03:16 node01 /usr/libexec/gdm-x-session[40663]: Current Operating System: Linux node01 4.18.0-348.7.1.el8_5.x86_64 #1 SMP Wed Dec 22 13:25:12 UTC 2021 x86_64
- Jan 10 15:03:16 node01 /usr/libexec/gdm-x-session[40663]: Kernel command line: BOOT_IMAGE=(hd0,gpt2)/vmlinuz-4.18.0-348.7.1.el8_5.x86_64 root=/dev/mapper/cl-root ro crashkernel=auto resume=/dev/mapper/cl-swap rd.lvm.lv=cl/root rd.lvm.lv=cl/swap rhgb quiet
- Jan 10 15:03:16 node01 /usr/libexec/gdm-x-session[40663]: Build Date: 10 June 2021 11:58:07PM
- Jan 10 15:03:16 node01 /usr/libexec/gdm-x-session[40663]: Build ID: xorg-x11-server 1.20.11-2.el8
- Jan 10 15:03:16 node01 /usr/libexec/gdm-x-session[40663]: Current version of pixman: 0.38.4
- Jan 10 15:03:16 node01 /usr/libexec/gdm-x-session[40663]: #011Before reporting problems, check http://wiki.x.org
- Jan 10 15:03:16 node01 /usr/libexec/gdm-x-session[40663]: #011to make sure that you have the latest version.
- Jan 10 15:03:16 node01 /usr/libexec/gdm-x-session[40663]: Markers: (--) probed, (**) from config file, (==) default setting,
- Jan 10 15:03:16 node01 /usr/libexec/gdm-x-session[40663]: #011(++) from command line, (!!) notice, (II) informational,
- Jan 10 15:03:16 node01 /usr/libexec/gdm-x-session[40663]: #011(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
- Jan 10 15:03:16 node01 /usr/libexec/gdm-x-session[40663]: (==) Log file: "/home/sutai/.local/share/xorg/Xorg.1.log", Time: Fri Jan 10 15:03:16 2025
- Jan 10 15:03:16 node01 /usr/libexec/gdm-x-session[40663]: (==) Using config directory: "/etc/X11/xorg.conf.d"
- Jan 10 15:03:16 node01 /usr/libexec/gdm-x-session[40663]: (==) Using system config directory "/usr/share/X11/xorg.conf.d"
The full log is in the attachment.
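Since the hang is known to fall in a window (after about 15:00 and before 15:54 on 2025-01-10), one way to narrow things down is to filter `/var/log/messages` to that window, note the last timestamp that was written, and flag lines containing typical kernel trouble strings. The following is only a rough sketch under assumptions: the function names `scan` and `scan_file` are hypothetical, the keyword list is my own guess at useful markers and is not exhaustive, and the year has to be supplied because the classic syslog prefix does not carry one.

```python
import re
from datetime import datetime

# Month abbreviations as written in the classic syslog prefix.
MONTHS = {m: i + 1 for i, m in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"])}

# Matches e.g. "Jan 10 15:03:09 node01 systemd[1]: Starting ..."
# An optional leading "- " is tolerated because the pasted excerpt has one.
LINE_RE = re.compile(
    r"^(?:- )?(\w{3})\s+(\d{1,2}) (\d{2}):(\d{2}):(\d{2}) (\S+) "
    r"([^:\[]+)(?:\[\d+\])?: (.*)$")

# Assumption: an illustrative, non-exhaustive list of strings that often
# accompany hangs or crashes. Adjust it for your own hardware and workload.
SUSPECT = ["out of memory", "oom-killer", "call trace", "hardware error",
           "soft lockup", "blocked for more than", "kernel panic", "segfault"]


def scan(lines, start, end, year):
    """Return (hits, last_seen) for syslog lines with start <= time <= end.

    hits is a list of (timestamp, process, message) for suspicious lines;
    last_seen is the newest timestamp found in the window, or None.
    """
    hits, last_seen = [], None
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if not m or m.group(1) not in MONTHS:
            continue  # continuation lines or other formats are skipped
        ts = datetime(year, MONTHS[m.group(1)], int(m.group(2)),
                      int(m.group(3)), int(m.group(4)), int(m.group(5)))
        if not (start <= ts <= end):
            continue
        if last_seen is None or ts > last_seen:
            last_seen = ts
        message = m.group(8)
        if any(key in message.lower() for key in SUSPECT):
            hits.append((ts, m.group(7), message))
    return hits, last_seen


def scan_file(path, start, end, year):
    """Same as scan(), reading from a log file such as /var/log/messages."""
    with open(path, errors="replace") as fh:
        return scan(fh, start, end, year)
```

For this case it could be called as `scan_file("/var/log/messages", datetime(2025, 1, 10, 15, 0), datetime(2025, 1, 10, 15, 54), 2025)`. If `last_seen` stops well before 15:54 and no suspicious lines show up, the node probably froze without managing to log anything, and other sources would need checking.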
Attachments: 日志.png (277.92 KB); messages (536.25 KB)