|
After installation, I started Slurm:
- (base) huan@grape:~/scitools$ sinfo --long
- Tue May 09 22:43:49 2023
- PARTITION AVAIL TIMELIMIT JOB_SIZE ROOT OVERSUBS GROUPS NODES STATE NODELIST
- debug* up infinite 1-infinite no NO all 1 unknown* grape
The node STATE is unknown* rather than idle.
Every job I submit stays in the PENDING (PD) state:
- (base) huan@grape:~/scitools$ squeue
- JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
- 1 debug h2o huan PD 0:00 1 (Nodes required for job are DOWN, DRAINED or reserved for jobs in higher priority partitions)
I checked the status of slurmd and slurmctld; the output is shown below.
It looks like slurmd is not running properly (Active: failed), while slurmctld appears to be running normally (Active: active (running)).
- (base) huan@grape:~/scitools$ sudo systemctl status slurmd
- × slurmd.service - Slurm node daemon
- Loaded: loaded (/lib/systemd/system/slurmd.service; enabled; vendor preset: enabled)
- Active: failed (Result: exit-code) since Tue 2023-05-09 22:35:32 CST; 24s ago
- Docs: man:slurmd(8)
- Process: 145145 ExecStart=/usr/sbin/slurmd -D -s $SLURMD_OPTIONS (code=exited, status=1/FAILURE)
- Main PID: 145145 (code=exited, status=1/FAILURE)
- CPU: 152ms
- May 09 22:35:31 grape systemd[1]: Started Slurm node daemon.
- May 09 22:35:32 grape systemd[1]: slurmd.service: Main process exited, code=exited, status=1/FAILURE
- May 09 22:35:32 grape systemd[1]: slurmd.service: Failed with result 'exit-code'.
- (base) huan@grape:~/scitools$
-
- (base) huan@grape:~/scitools$ sudo systemctl status slurmctld
- ● slurmctld.service - Slurm controller daemon
- Loaded: loaded (/lib/systemd/system/slurmctld.service; enabled; vendor preset: enabled)
- Active: active (running) since Tue 2023-05-09 22:35:25 CST; 1min 12s ago
- Docs: man:slurmctld(8)
- Main PID: 145054 (slurmctld)
- Tasks: 10
- Memory: 2.3M
- CPU: 83ms
- CGroup: /system.slice/slurmctld.service
- ├─145054 /usr/sbin/slurmctld -D -s
- └─145055 "slurmctld: slurmscriptd" "" ""
- May 09 22:35:25 grape slurmctld[145054]: slurmctld: Recovered information about 1 jobs
- May 09 22:35:25 grape slurmctld[145054]: slurmctld: select/cons_tres: part_data_create_array: select/cons_tres: preparing for 1 partitions
- May 09 22:35:25 grape slurmctld[145054]: slurmctld: Recovered state of 0 reservations
- May 09 22:35:25 grape slurmctld[145054]: slurmctld: read_slurm_conf: backup_controller not specified
- May 09 22:35:25 grape slurmctld[145054]: slurmctld: select/cons_tres: select_p_reconfigure: select/cons_tres: reconfigure
- May 09 22:35:25 grape slurmctld[145054]: slurmctld: select/cons_tres: part_data_create_array: select/cons_tres: preparing for 1 partitions
- May 09 22:35:25 grape slurmctld[145054]: slurmctld: Running as primary controller
- May 09 22:35:25 grape slurmctld[145054]: slurmctld: No parameter for mcs plugin, default values set
- May 09 22:35:25 grape slurmctld[145054]: slurmctld: mcs: MCSParameters = (null). ondemand set.
- May 09 22:36:25 grape slurmctld[145054]: slurmctld: SchedulerParameters=default_queue_depth=100,max_rpc_cnt=0,max_sched_time=2,partition_job_depth=0,sched_max_job_start=0,sched>
- (base) huan@grape:~/scitools$
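The systemctl output above only reports that slurmd exited with status 1, not why. A minimal sketch of how the underlying error might be found, assuming slurmd writes its messages to the systemd journal on this machine; `diagnose_slurmd` is a hypothetical helper name, not something from the original post:

```shell
# Hypothetical helper (not from the original post): gather clues about
# why slurmd exits right after starting. Nothing runs until it is called.
diagnose_slurmd() {
    # Last 50 journal lines for the slurmd unit; these may contain the actual error.
    sudo journalctl -u slurmd -n 50 --no-pager

    # Hardware as slurmd detects it, for comparison with the NodeName line in slurm.conf.
    slurmd -C

    # Run slurmd in the foreground with verbose logging; stop it with Ctrl-C.
    sudo slurmd -D -vvv
}
```

Call `diagnose_slurmd` from an interactive shell. If slurmd later starts cleanly but the node still shows as down, `sudo scontrol update NodeName=grape State=RESUME` may be needed to return it to service; whether that applies here is an assumption.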
|
|