This post was last edited by KazusaT on 2025-9-29 22:50
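I am trying to set up Slurm to run QE (Quantum ESPRESSO) on a dual-socket E5 machine with 36 cores / 72 threads running Ubuntu 20.04. After installation, my slurm.conf is as follows: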
```
# slurm.conf file generated by configurator easy.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
SlurmctldHost=localhost
#
#MailProg=/bin/mail
MpiDefault=none
#MpiParams=ports=#-#
ProctrackType=proctrack/cgroup
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
#SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
#SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurm/slurmd
SlurmUser=slurm
#SlurmdUser=root
StateSaveLocation=/var/spool/slurm
SwitchType=switch/none
TaskPlugin=task/affinity
#
#
# TIMERS
#KillWait=30
#MinJobAge=300
#SlurmctldTimeout=120
#SlurmdTimeout=300
#
#
# SCHEDULING
SchedulerType=sched/backfill
SelectType=select/cons_tres
SelectTypeParameters=CR_Core
#
#
# LOGGING AND ACCOUNTING
AccountingStorageType=accounting_storage/none
ClusterName=cluster
#JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
#SlurmctldDebug=info
#SlurmctldLogFile=
#SlurmdDebug=info
#SlurmdLogFile=
#
#
# COMPUTE NODES
NodeName=CZK-E5 CPUs=36 Boards=1 SocketsPerBoard=2 CoresPerSocket=18 ThreadsPerCore=1 RealMemory=128000 State=idle
# NodeName=linux[1-32] CPUs=1 State=UNKNOWN
# NodeName=linux1 NodeAddr=128.197.115.158 CPUs=4 State=UNKNOWN
# NodeName=linux2 NodeAddr=128.197.115.7 CPUs=4 State=UNKNOWN
PartitionName=coc Nodes=CZK-E5 Default=YES MaxTime=INFINITE State=UP
# PartitionName=test Nodes=CZK-E5,linux[1-32] Default=YES MaxTime=INFINITE State=UP
# DefMemPerNode=1000
# MaxMemPerNode=1000
# DefMemPerCPU=4000
# MaxMemPerCPU=4096
```
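Slurm generally expects the NodeName line to match the hardware that `slurmd -C` reports for the node. On a 2 x 18-core machine with Hyper-Threading on, that would presumably be 72 CPUs with ThreadsPerCore=2. A sketch of an alternative is shown below, not verified on this machine: declare the real topology, then keep hyperthreads out of jobs at allocation and binding time. The `cgroup.conf` part assumes Slurm's cgroup plugin works on this Ubuntu 20.04 install. Both slurmctld and slurmd would need a restart after the change.

```
# slurm.conf (excerpt): describe the real topology, 2 sockets x 18 cores x 2 threads = 72 logical CPUs
NodeName=CZK-E5 CPUs=72 Boards=1 SocketsPerBoard=2 CoresPerSocket=18 ThreadsPerCore=2 RealMemory=128000 State=UNKNOWN
# unchanged: CR_Core already allocates whole physical cores
SelectTypeParameters=CR_Core
# confine each job to its allocated CPUs via cgroups, in addition to affinity binding
TaskPlugin=task/cgroup,task/affinity

# cgroup.conf (same directory as slurm.conf)
ConstrainCores=yes
```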
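Hyper-Threading is enabled, but to keep Slurm from running on the hyperthreads I set CPUs=36 and ThreadsPerCore=1. I then tested QE with the forum admin's (社长) test files, using the following job script: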
```
#!/bin/bash
#SBATCH -p coc
#SBATCH -J test
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=36
#SBATCH --cpus-per-task=1
module load openmpi4.1.8
export OMPI_ALLOW_RUN_AS_ROOT=1
export OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1
export PATH=$PATH:/home/czk/software/qe741/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/intel/oneapi/mkl/2025.2/lib
export OMP_NUM_THREADS=1
mpirun -n 36 pw.x < pwscf.in > pwscf.out
```
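For comparison, here is a variant of this script intended to keep one rank per physical core. As far as I know, when more than two ranks are launched, OpenMPI's default is to bind each rank to a whole socket or NUMA domain rather than to a single core. A rank can then move across all logical CPUs of that socket, hyperthreads included. The sketch assumes the NodeName line declares ThreadsPerCore=2; otherwise `--hint=nomultithread` has nothing to act on. It keeps the module name and paths from the original and relies on OpenMPI's `--map-by core --bind-to core`. `--report-bindings` makes mpirun print each rank's binding into the job output so it can be checked without htop. It is untested on this machine and only runs under `sbatch`.

```
#!/bin/bash
#SBATCH -p coc
#SBATCH -J test
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=36
#SBATCH --cpus-per-task=1
#SBATCH --hint=nomultithread   # one task per physical core; only meaningful if ThreadsPerCore=2 is declared
module load openmpi4.1.8
# only needed because the job runs as root (kept from the original)
export OMPI_ALLOW_RUN_AS_ROOT=1
export OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1
export PATH=$PATH:/home/czk/software/qe741/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/intel/oneapi/mkl/2025.2/lib
export OMP_NUM_THREADS=1
# map one rank per core, bind each rank to its core, and log the resulting bindings
mpirun -n 36 --map-by core --bind-to core --report-bindings pw.x < pwscf.in > pwscf.out
```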
While the job was running, htop seemed to show it running on the hyperthreads. What is wrong with my configuration?