Santz posted on 2024-10-28 11:59:
Hello, teacher. I tried all four of these settings; jobs can be submitted, but none of them will queue. Do you know what might be causing this? My slurm.conf file is attached; could you please take a look?

#slurm.conf file generated by configurator easy.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
#ControlAddr=
SlurmctldHost=node0
#DebugFlags=NO_CONF_HASH
#
#MailProg=/bin/mail
MpiDefault=none
#MpiParams=ports=#-#
ProctrackType=proctrack/cgroup
ReturnToService=2
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurm/slurmd
SlurmUser=slurm
#SlurmdUser=root
#SlurmdUser=
StateSaveLocation=/var/spool/slurm/
SwitchType=switch/none
TaskPlugin=task/cgroup
#TaskPlugin=task/affinity
#
#
# TIMERS
KillWait=30
MinJobAge=300
SlurmctldTimeout=120
SlurmdTimeout=300
InactiveLimit=0
WaitTime=0
#
#
# SCHEDULING
SchedulerType=sched/backfill
#SelectType=select/cons_res
SelectType=select/cons_tres
#SelectTypeParameters=CR_CPU_memory
#SelectTypeParameters=CR_CPU
SelectTypeParameters=CR_Core_memory
DefMemPerCPU=6000
#SelectTypeParameters=CR_Core
#
#
# LOGGING AND ACCOUNTING
#AccountingStorageType=accounting_storage/slurmdbd
#AccountingStoreFlags=JobComment
#AccountingStorageEnforce=associations,limits,qos
#AccountingStoragePass=/var/run/munge/munge.socket.2
#AccountingStorageHost=node0
AccountingStoragePort=6819
ClusterName=cluster
#JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurmctld.log
SlurmdDebug=info
SlurmdLogFile=/var/log/slurmd.log
AllowSpecResourcesUsage=YES
#
#
# COMPUTE NODES
#
NodeName=clustermaster CPUs=144 Boards=1 SocketsPerBoard=2 CoresPerSocket=36 ThreadsPerCore=2 RealMemory=515457
NodeName=node0 CPUs=144 Boards=1 SocketsPerBoard=2 CoresPerSocket=36 ThreadsPerCore=2 RealMemory=488448 Procs=2 State=IDLE
NodeName=node1 CPUs=144 Boards=1 SocketsPerBoard=2 CoresPerSocket=36 ThreadsPerCore=2 RealMemory=253952 Procs=2 State=IDLE
NodeName=node2 CPUs=144 Boards=1 SocketsPerBoard=2 CoresPerSocket=36 ThreadsPerCore=2 RealMemory=252928 Procs=2 State=IDLE
NodeName=node3 CPUs=144 Boards=1 SocketsPerBoard=2 CoresPerSocket=36 ThreadsPerCore=2 RealMemory=252928 Procs=2 State=IDLE
NodeName=node4 CPUs=128 Boards=1 SocketsPerBoard=2 CoresPerSocket=32 ThreadsPerCore=2 RealMemory=498688 Procs=2 State=IDLE
PartitionName=test Nodes=node[0-4] Default=YES MaxTime=INFINITE State=UP OverSubscribe=NO
# PartitionName=test Nodes=192.168.1.243,linux[1-32] Default=YES MaxTime=INFINITE State=UP
# DefMemPerNode=1000
# MaxMemPerNode=1000
# DefMemPerCPU=4000
# MaxMemPerCPU=4096
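(Not part of the original post; a minimal diagnostic sketch using standard Slurm commands to see why submitted jobs sit pending instead of running. The job ID 123 is a placeholder, and node1 is just one example node from the config above.)

# Node and partition state as seen by the controller
sinfo -N -l

# Pending jobs together with the scheduler's reason for holding them
squeue -t PENDING -o "%.8i %.9P %.8u %.2t %.10M %R"

# Inspect one stuck job in detail (123 is a placeholder job ID)
scontrol show job 123

# If a node is down or drained, check the recorded reason
scontrol show node node1 | grep -i Reason

The %R column of squeue (e.g. Resources, Priority, or a node-state message) is usually the fastest hint as to why a job will not start.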
Santz posted on 2024-10-28 11:59:
OK, thank you.
A lazy approach: find a few commercial supercomputing or trial platforms and compare their settings directly.
Generally, you use either CR_Core_Memory or CR_Core. I prefer the former together with a DefMemPerCPU setting, because that way memory is also treated as an allocatable resource. With CR_Core, every job by default claims the compute node's maximum memory unless the submission script explicitly specifies how much memory to request. You can check which value is currently in effect with:

[root@master ~]# scontrol show config | grep -i SelectTypeParameters
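Not from the original thread, but as a concrete illustration of the point above: under CR_Core_Memory with DefMemPerCPU=6000, a job that does not ask for memory is allocated 6000 MB per CPU, while an explicit request overrides the default. A minimal sbatch sketch (the partition name matches the config above; ./my_program and the resource numbers are placeholders):

#!/bin/bash
#SBATCH --partition=test
#SBATCH --nodes=1
#SBATCH --ntasks=8              # 8 cores on one node
#SBATCH --mem-per-cpu=6000M     # explicit per-CPU memory; without this line DefMemPerCPU=6000 applies
srun ./my_program               # placeholder executable

With memory as a consumable resource, the scheduler deducts the requested memory from each node's RealMemory, so several jobs can share a node as long as both the cores and the memory fit.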