Last edited by Kamistry on 2024-3-21 20:02
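consearch: a molclus conformer-search script with one-command Slurm submission
Author: Zihan Lin @ USTC (Kamistry @ 计算化学公社). If you use this script in your research, I would be very grateful for a proper citation when the results are published.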
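Preparation
1. Give the script execute permission, for example with chmod +x consearch. It is strongly recommended to put consearch on your PATH so that it does not have to be copied into every calculation folder.
2. Either edit the # commands section of the script, or add xtb, crest, molclus and isostat to your PATH. The latter is recommended, especially if you work in several environments or are unsure how to edit the script.
3. Before running, place the required files in the calculation folder: the input file (e.g. input.xyz); the xtb MD settings (e.g. md.inp; not needed with --traj); and the molclus files template.gjf, settings.ini, and template_SP.inp or template_SP.gjf (not needed with -3). The script reports any missing file (see below).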
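For reference, steps 1 and 2 of the preparation could be scripted roughly as follows. This is only a sketch: install_script is a hypothetical helper, not part of consearch, and the target directory is whatever you choose.

```shell
#!/bin/bash
# install_script SRC DIR (hypothetical helper):
# copy SRC into DIR, mark the copy executable, and print the line
# that would put DIR on PATH when added to a shell startup file.
install_script() {
    local src=$1 dir=$2
    mkdir -p "$dir" || return 1
    cp -f "$src" "$dir/" || return 1
    chmod +x "$dir/$(basename "$src")" || return 1
    # Single quotes keep $PATH literal so it is expanded at login, not now.
    printf 'export PATH="%s:$PATH"\n' "$dir"
}
```

For example, install_script ./consearch "$HOME/bin" >> ~/.bashrc would copy the script and append the PATH line to your startup file. The same PATH approach works for xtb, crest, molclus and isostat.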
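Usage
The following options are available:
--cpu= Number of cores to use; must be a positive integer.
--mem= Memory to request; the unit must be MB or GB.
--traj Use an existing trajectory file. With this option the input file is treated as a ready-made trajectory (obtained, for example, from gromacs or gentor), and xtb is not called to run MD.
--mdinput= Parameter file for the xtb MD run, corresponding to xtb --input. Ignored when --traj is used. If not given, md.inp is used.
--solvent= Solvent for the GFN2-xTB optimization in step 2, corresponding to the xtb --gbsa option. An empty string means a gas-phase calculation. The script default is water.
--chrg= Corresponds to xtb --chrg. Default is 0.
--uhf= Corresponds to xtb --uhf. Default is 0.
--Nout= Corresponds to isostat -Nout. Default is 10.
--Eout= Corresponds to isostat -Eout. Default is 2.4.
--Edis= Corresponds to isostat -Edis. Default is 0.5.
--Gdis= Corresponds to isostat -Gdis. Default is 0.5.
--temp= Temperature in K, corresponding to isostat -T. Default is 298.15.
--clean= Automatic cleanup. 0: keep all files. 1: keep only the final result of each step (00result.xyz and so on in the calculation folder). 2: remove all intermediate files and keep only the final file.
-1 Skip step 1 (GFN0-xTB optimization). The input is then passed directly to the later steps.
-2 Skip step 2 (GFN2-xTB optimization). The input is then passed directly to the later steps.
-3 Skip step 3 (molclus). Using -1, -2 and -3 together amounts to running only the xtb MD. Combining --traj with -1, -2 and -3 is pointless.
-h|--help Show the help text.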
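How to set parameters
1. One-off changes: editing the script for a one-off change is strongly discouraged; use command-line options instead. An option listed with a trailing = takes an argument, separated from the long option by a space or an equals sign. To pass an empty string, for example to request a gas-phase calculation, use --solvent="" or --solvent "".
2. Repeated use:
a. Edit the # default settings section of the script directly (not recommended, except for cpu and mem).
b. Write a wrapper script such as consearch2.sh below (again, putting it on your PATH is recommended):
- #!/bin/bash
- consearch <options to add> "$@"
Then use consearch2.sh in place of consearch, but watch out for conflicting options.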
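To see what a conflict looks like in practice, here is a toy model of such a wrapper. fake_consearch is a made-up stand-in that only understands --cpu; it mimics the way the real parsing loop assigns each option as it is encountered, so the sketch assumes that a repeated option simply overwrites the earlier value.

```shell
#!/bin/bash
# Stand-in for consearch (hypothetical): reports the --cpu value it ends up with.
fake_consearch() {
    local cpu=64                      # built-in default
    while [ $# -gt 0 ]; do
        case "$1" in
            --cpu) cpu=$2; shift ;;   # each occurrence overwrites the previous one
        esac
        shift
    done
    echo "cpu=$cpu"
}

# Wrapper in the style of consearch2.sh: preset options first, then the caller's.
wrapper() {
    fake_consearch --cpu 32 "$@"
}

wrapper input.xyz             # prints cpu=32 (preset from the wrapper)
wrapper --cpu 16 input.xyz    # prints cpu=16 (the later option wins)
```

Putting the preset options before "$@" therefore lets the command line override the wrapper's presets rather than the other way round.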
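Submitting the job
The script submits the job to Slurm automatically. Before submitting, it checks that the required files are present and prints a report like the one below. (Note: on some clusters you may need to add commands that copy files to the compute node inside the sbatch << EOF section; consult your cluster's documentation.)
- --------------------------------------------------------------------------------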
- Check files
- --------------------------------------------------------------------------------
- Input file : found input.xyz
- Molclus settings : found settings.ini
- Molclus template : found template.gjf
- Molclus template_SP : found template_SP.inp
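Once the check passes, the job is submitted to Slurm with resources requested according to cpu and mem. If you do not need Slurm, replace sbatch << EOF with bash << EOF. Output is saved to consearch.log. The intermediate files of each step go into the subfolder 0x (x being the step number), and each step's result is copied to the calculation folder as 0xresult.xyz.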
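Script walkthrough
The following walks through the script and the reasoning behind it, in the hope that it serves as a starting point for others.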
The first line selects bash as the interpreter; the usage comments follow.
- #!/bin/bash
- # author: Zihan Lin
- # Usage:
- # consearch <options> input.xyz
- #
- # Example:
- # consearch input.xyz
- # consearch --traj --cpu 64 --mem 480GB traj.xyz
- # consearch --solvent toluene input.xyz
- #
- # Options:
- # --cpu= : Set the number of cores used
- # --mem= : Set the amount of memory used. Unit is MB or GB.
- # --traj : Use an existing trajectory file
- # --mdinput= : Set --input for xtb. Default is md.inp.
- # --solvent= : Set solvent for xtb. Default is water.
- # --chrg= : Set --chrg for xtb. Default is 0.
- # --uhf= : Set --uhf for xtb. Default is 0.
- # --Nout= : Set -Nout for isostat. Default is 10.
- # --Eout= : Set -Eout for isostat. Default is 2.4.
- # --Edis= : Set -Edis for isostat. Default is 0.5.
- # --Gdis= : Set -Gdis for isostat. Default is 0.5.
- # --temp= : Set temperature for isostat. Default is 298.15.
- # --clean= : Auto clean or not. 0: keep all files. 1: keep only the
- # final result of each step. 2: delete all intermediate files
- # -1 : Skip step 1 (GFN0-xTB optimization).
- # -2 : Skip step 2 (GFN2-xTB optimization).
- # -3 : Skip step 3 (molclus).
- # -h|--help : Show this information.
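The next part sets the commands to call. Adjust them to your setup, or add the programs to your PATH (the latter is recommended: do it once and forget about it).
- # commands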
- # change them or add to environment variables
- xtb=xtb
- crest=crest
- molclus=molclus
- isostat=isostat
Default settings. Apart from cpu and mem, editing these directly is not recommended.
- # default settings
- # not recommended to change this every time, please use the command line argument
- cpu=64 # cpu core
- mem=480GB # memory
- omp_stacksize=4000m # memory for xtb
- inputTraj=0 # whether use existed traj file or not
- mdFile=md.inp # md settings for xtb
- solvent=water # solvent for xtb (step 2)
- chrg=0 # chrg for xtb
- uhf=0 # uhf for xtb
- Nout=10 # Nout for isostat
- Eout=2.4 # Eout for isostat
- Edis=0.5 # Edis for isostat
- Gdis=0.5 # Gdis for isostat
- temperature=298.15 # temperature for isostat
- clean=0 # auto clean after completed
- skip1=0 # skip step 1 (GFN0-xTB optimization)
- skip2=0 # skip step 2 (GFN2-xTB optimization)
- skip3=0 # skip step 3 (molclus)
Other required variables. Do not change them, or the script may break.
- # other variables, DO NOT CHANGE!
- inputFile=""
- divideLine=$(printf '%.0s-' {1..80})
- spTemplate=""
This part prints the help text. Because it may be needed more than once, it is written as a function.
- function printHelp {
- printf "\n%s\n%42s\n%s\n" "$divideLine" "Help" "$divideLine"
- printf "Usage: \n %s\n\n" "consearch <options> input.xyz"
- printf "Example: \n %s\n" "consearch input.xyz"
- printf " %s\n" "consearch --traj --cpu 64 --mem 480GB traj.xyz"
- printf " %s\n\n" "consearch --solvent toluene input.xyz"
- printf "Options:\n"
- printf " %-15s: %s\n" "--cpu=" "Set the number of cores used"
- printf " %-15s: %s\n" "--mem=" "Set the amount of memory used. Unit is MB or GB."
- printf " %-15s: %s\n" "--traj" "Use an existing trajectory file"
- printf " %-15s: %s\n" "--mdinput=" "Set --input for xtb. Default is md.inp."
- printf " %-15s: %s\n" "--solvent=" "Set solvent for xtb. Default is water."
- printf " %-15s: %s\n" "--chrg=" "Set --chrg for xtb. Default is 0."
- printf " %-15s: %s\n" "--uhf=" "Set --uhf for xtb. Default is 0."
- printf " %-15s: %s\n" "--Nout=" "Set -Nout for isostat. Default is 10."
- printf " %-15s: %s\n" "--Eout=" "Set -Eout for isostat. Default is 2.4."
- printf " %-15s: %s\n" "--Edis=" "Set -Edis for isostat. Default is 0.5."
- printf " %-15s: %s\n" "--Gdis=" "Set -Gdis for isostat. Default is 0.5."
- printf " %-15s: %s\n" "--temp=" "Set temperature for isostat. Default is 298.15."
- printf " %-15s: %s\n" "--clean=" "Auto clean or not. 0: keep all files. 1: keep only the"
- printf " %-15s %s\n" "" "final result of each step. 2: delete all intermediate files"
- printf " %-15s: %s\n" "-1" "Skip step 1 (GFN0-xTB optimization)."
- printf " %-15s: %s\n" "-2" "Skip step 2 (GFN2-xTB optimization)."
- printf " %-15s: %s\n" "-3" "Skip step 3 (molclus)."
- printf " %-15s: %s\n" "-h|--help" "Show this information."
- printf "\nAuthor: \n %s\n\n" "Zihan Lin @ USTC"
- }
The main part starts by parsing the command-line arguments.
- # handle options and parameters
- printf "\n%s\n%54s\n%s\n" "$divideLine" "Parse options and parameters" "$divideLine"
- options=$(getopt -q -l cpu:,mem:,traj,mdinput:,chrg:,uhf:,solvent:,Nout:,Eout:,Edis:,Gdis:,temp:,clean:,help h123 "$@")
- printf "command parameters: %s\n" "$options"
- eval set -- "$options"
- while [ -n "$1" ]; do
- case "$1" in
- --cpu)
- if [[ $2 =~ ^[1-9][0-9]*$ ]]; then
- cpu=$2
- printf "%-10s: %s\n" "--cpu" "use $2 core(s)"
- else
- printf "%-10s: %s\n" "--cpu" "invalid --cpu argument, this will be ignored"
- fi
- shift
- ;;
- --mem)
- if [[ $2 =~ ^[1-9][0-9]*[mMgG][bB]?$ ]]; then
- mem=$2
- printf "%-10s: %s\n" "--mem" "use $2 memory"
- else
- printf "%-10s: %s\n" "--mem" "invalid --mem argument, this will be ignored"
- fi
- shift
- ;;
- --traj) # use an existing trajectory file
- printf "%-10s: %s\n" "--traj" "use an existing trajectory file"
- inputTraj=1
- ;;
- --mdinput) # set input for xtb
- if [ -n "$2" ]; then
- mdFile=$2
- if [ "$2" != "md.inp" ]; then
- printf "%-10s: %s\n" "--mdinput" "use $2 instead of md.inp as xtb --input argument"
- fi
- else
- printf "%-10s: %s\n" "--mdinput" "invalid --mdinput argument, this will be ignored"
- fi
- shift
- ;;
- --chrg) # set chrg for xtb
- if [[ $2 =~ ^[+-]?[0-9]+$ ]]; then # optional sign, so anions (e.g. --chrg -1) are accepted
- chrg=$2
- printf "%-10s: %s\n" "--chrg" "use $2 for xtb --chrg argument"
- else
- printf "%-10s: %s\n" "--chrg" "invalid --chrg argument, this will be ignored"
- fi
- shift
- ;;
- --uhf) # set uhf for xtb
- if [[ $2 =~ ^[0-9]+$ ]]; then
- uhf=$2
- printf "%-10s: %s\n" "--uhf" "use $2 for xtb --uhf argument"
- else
- printf "%-10s: %s\n" "--uhf" "invalid --uhf argument, this will be ignored"
- fi
- shift
- ;;
- --solvent) # set solvent for xtb
- solvent=$2
- if [ -n "$solvent" ]; then
- printf "%-10s: %s\n" "--solvent" "use $2 for xtb --gbsa argument"
- else
- printf "%-10s: %s\n" "--solvent" "use gas phase"
- fi
- shift
- ;;
- --Nout) # set Nout for isostat
- if [[ $2 =~ ^[0-9]+$ ]]; then
- Nout=$2
- printf "%-10s: %s\n" "--Nout" "use $2 for isostat -Nout argument"
- else
- printf "%-10s: %s\n" "--Nout" "invalid --Nout argument, this will be ignored"
- fi
- shift
- ;;
- --Eout) # set Eout for isostat
- if [[ $2 =~ ^[0-9]+(\.[0-9]+)?$ ]]; then # decimal part optional, so "3" is accepted as well as "3.0"
- Eout=$2
- printf "%-10s: %s\n" "--Eout" "use $2 for isostat -Eout argument"
- else
- printf "%-10s: %s\n" "--Eout" "invalid --Eout argument, this will be ignored"
- fi
- shift
- ;;
- --Edis) # set Edis for isostat
- if [[ $2 =~ ^[0-9]+(\.[0-9]+)?$ ]]; then # decimal part optional
- Edis=$2
- printf "%-10s: %s\n" "--Edis" "use $2 for isostat -Edis argument"
- else
- printf "%-10s: %s\n" "--Edis" "invalid --Edis argument, this will be ignored"
- fi
- shift
- ;;
- --Gdis) # set Gdis for isostat
- if [[ $2 =~ ^[0-9]+(\.[0-9]+)?$ ]]; then # decimal part optional
- Gdis=$2
- printf "%-10s: %s\n" "--Gdis" "use $2 for isostat -Gdis argument"
- else
- printf "%-10s: %s\n" "--Gdis" "invalid --Gdis argument, this will be ignored"
- fi
- shift
- ;;
- --temp) # set temperature for isostat
- if [[ $2 =~ ^[0-9]+(\.[0-9]+)?$ ]]; then # decimal part optional, so "300" is accepted as well as "300.0"
- temperature=$2
- printf "%-10s: %s\n" "--temp" "use $2 for isostat -T argument"
- else
- printf "%-10s: %s\n" "--temp" "invalid --temp argument, this will be ignored"
- fi
- shift
- ;;
- --clean) # auto clean
- if [[ $2 =~ ^[0-2]$ ]]; then
- clean=$2
- case $clean in
- 0)
- printf "%-10s: %s\n" "--clean" "keep all files"
- ;;
- 1)
- printf "%-10s: %s\n" "--clean" "keep only the final result of each step"
- ;;
- 2)
- printf "%-10s: %s\n" "--clean" "delete all intermediate files"
- ;;
- esac
- else
- printf "%-10s: %s\n" "--clean" "invalid --clean argument, this will be ignored"
- fi
- shift
- ;;
- --)
- shift
- break
- ;;
- --help)
- printHelp
- exit 0
- ;;
- -h)
- printHelp
- exit 0
- ;;
- -1)
- skip1=1
- printf "%-10s: %s\n" "-1" "skip step 1 (GFN0-xTB optimization)"
- ;;
- -2)
- skip2=1
- printf "%-10s: %s\n" "-2" "skip step 2 (GFN2-xTB optimization)"
- ;;
- -3)
- skip3=1
- printf "%-10s: %s\n" "-3" "skip step 3 (molclus)"
- ;;
- *)
- echo "$1 is not an option"
- ;;
- esac
- shift
- done
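The validation in the loop above rests on bash's [[ string =~ regex ]] test. The following standalone sketch shows two patterns of that kind, one for signed integers such as a charge and one for numbers with an optional decimal part such as a temperature. The function names are made up for illustration and do not appear in consearch.

```shell
#!/bin/bash
# is_signed_int STR: succeed if STR is an integer with an optional sign.
is_signed_int() { [[ $1 =~ ^[+-]?[0-9]+$ ]]; }

# is_decimal STR: succeed if STR is a non-negative number;
# the decimal part is optional, so both 300 and 298.15 pass.
is_decimal() { [[ $1 =~ ^[0-9]+(\.[0-9]+)?$ ]]; }

for value in -1 300 298.15 abc; do
    if is_signed_int "$value"; then echo "$value: valid integer"; fi
    if is_decimal "$value"; then echo "$value: valid decimal"; fi
done
```

Keeping the pattern unquoted on the right-hand side of =~ matters: a quoted pattern is matched as a literal string rather than as a regular expression.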
Next, depending on the options given, the script checks that all required files have been provided.
- # check if all files provided
- printf "\n%s\n%45s\n%s\n" "$divideLine" "Check files" "$divideLine"
- allFilesProvided=1
- if [ -f "$1" ]; then
- inputFile="$1"
- printf "%-30s: found %s\n" "Input file" "$1"
- else
- printHelp
- exit 1
- fi
- if [ $inputTraj -ne 1 ]; then
- # if md settings file not exists, input and save to md.inp or specified file
- if [ ! -e "$mdFile" ]; then
- check=N
- until [[ "$check" == [Yy] ]]; do
- echo "Please provide the following parameters:"
- read -r -p "Temperature (K): " temp
- read -r -p "Total simulation time (ps): " time
- read -r -p "Dump frequency (fs): " dump
- read -r -p "Step size (fs): " step
- cat >"$mdFile" <<-EOF
- \$md
- temp=$temp
- time=$time
- dump=$dump
- step=$step
- hmass=1
- shake=1
- \$end
- EOF
- printf "%s\n" "$divideLine"
- read -r -p "Is ok? (Y/N) " check
- done
- printf "%s\n" "$divideLine"
- fi
- printf "%-30s: found %s\n" "MD settings" "$mdFile"
- fi
- if [ $skip3 -ne 1 ]; then
- # check if settings.ini is provided
- printf "%-30s: " "Molclus settings"
- if [ -f "settings.ini" ]; then
- printf "found settings.ini\n"
- else
- printf "missing\n"
- allFilesProvided=0
- fi
- # check if template.gjf is provided
- printf "%-30s: " "Molclus template"
- if [ -f "template.gjf" ]; then
- printf "found template.gjf\n"
- if [ -e template2.gjf ]; then
- printf "%-30s: found template2.gjf\n" "Molclus template2"
- fi
- else
- printf "missing\n"
- allFilesProvided=0
- fi
- # check if template_SP.inp or template_SP.gjf is provided
- printf "%-30s: " "Molclus template_SP"
- if [ -f "template_SP.inp" ]; then
- printf "found template_SP.inp\n"
- spTemplate=template_SP.inp
- elif [ -f "template_SP.gjf" ]; then
- printf "found template_SP.gjf\n"
- spTemplate=template_SP.gjf
- else
- printf "missing\n"
- spTemplate=""
- check=N
- read -r -p "Warning: Single point energy is not calculated. Continue? (Y/N) " check
- if [[ "$check" != [Yy] ]]; then
- allFilesProvided=0
- fi
- fi
- fi
- if [ $allFilesProvided -eq 0 ]; then
- printf "\nMissing file(s). Aborted.\n"
- exit 1
- else
- echo "All files provided. Submit to slurm."
- fi
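The found/missing checks above all follow the same pattern, so they could be folded into one small function if you want to extend the script with further files. A minimal sketch, assuming a hypothetical helper named check_file that keeps the report layout used above:

```shell
#!/bin/bash
# check_file LABEL FILE (hypothetical helper):
# print a report line in the same layout as the script and
# return non-zero if FILE is not a regular file.
check_file() {
    local label=$1 file=$2
    if [ -f "$file" ]; then
        printf '%-30s: found %s\n' "$label" "$file"
    else
        printf '%-30s: missing\n' "$label"
        return 1
    fi
}

# Same bookkeeping as in the script: any missing file clears the flag.
allFilesProvided=1
check_file "Molclus settings" settings.ini || allFilesProvided=0
check_file "Molclus template" template.gjf || allFilesProvided=0
echo "allFilesProvided=$allFilesProvided"
```

The optional files (template2.gjf, the single-point template) would still need their own handling, since their absence is not an error.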
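Submission. A here document feeds the job script to sbatch and runs all the way to the final EOF. This part sets the job resources, redirects output to consearch.log, and exports the variables xtb needs.
- # submit to slurm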
- sbatch <<EOF
- #!/bin/bash
- #SBATCH -J "consearch"
- #SBATCH -n $cpu
- #SBATCH --mem $mem
- # save to consearch.log
- exec 1>consearch.log
- export OMP_NUM_THREADS=$cpu # CPU cores for xtb
- export MKL_NUM_THREADS=$cpu # CPU cores for xtb
- export OMP_STACKSIZE=$omp_stacksize # memory for xtb
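Because the here document is unquoted, the submitting shell expands every plain $var before sbatch sees the text, while \$var survives as $var and is evaluated later inside the job. The toy example below is not part of consearch; it uses bash <<EOF, as suggested above for running without Slurm, and made-up variable names to show the difference.

```shell
#!/bin/bash
# demo_heredoc: generate a tiny script with an unquoted here document
# and run it with bash instead of sbatch.
demo_heredoc() {
    local cpu=8    # known in the submitting shell
    bash <<EOF
count=0
count=\$((count + 1))    # escaped: evaluated when the generated script runs
echo "cores: $cpu"       # unescaped: replaced with 8 before bash reads it
echo "count: \$count"
EOF
}

demo_heredoc
```

Running it prints "cores: 8" followed by "count: 1". If the arithmetic expansion were left unescaped, it would be computed once in the submitting shell and the generated script would contain a fixed number instead of a counter.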
Check that every command is available; if not, abort with an error.
- # check commands are available
- printf "\n%s\n%47s\n%s\n" "$divideLine" "Check Commands" "$divideLine"
- commandsUnavailable=0
- commands=("$xtb" "$crest" "$molclus" "$isostat")
- for cmd in "\${commands[@]}"; do
- if ! command -v "\$cmd" &>/dev/null; then
- printf "Check %-15s: Unavailable\n" "\$cmd"
- commandsUnavailable=\$((commandsUnavailable + 1)) # escaped so the count is done inside the job, not at submission
- else
- printf "Check %-15s: OK\n" "\$cmd"
- fi
- done
- if [ \$commandsUnavailable -gt 0 ]; then
- printf "\n%s\n" "$divideLine"
- printf "Aborted because %d command(s) are not available, please check if environment variables have been added\n" \$commandsUnavailable
- exit 1
- else
- printf "All commands are available. Begin conformation search.\n"
- fi
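The same availability check can be tried on its own outside the here document, where no backslash escaping is needed. A minimal sketch, with count_missing as a hypothetical helper name:

```shell
#!/bin/bash
# count_missing CMD...: report each command on stderr and
# print the number of unavailable commands on stdout.
count_missing() {
    local missing=0 cmd
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            printf 'Check %-15s: OK\n' "$cmd" >&2
        else
            printf 'Check %-15s: Unavailable\n' "$cmd" >&2
            missing=$((missing + 1))
        fi
    done
    echo "$missing"
}

# Try it on the four programs the script needs.
count_missing xtb crest molclus isostat
```

Running this on the login node before submitting is a quick way to confirm that the PATH setup from the preparation section works, although the compute nodes may see a different environment.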
If --traj was not given, xtb is called to run the MD; otherwise the supplied trajectory file is used directly.
- # if trajectory is not provided, generate it by xtb
- printf "\n%s\n%50s\n%s\n" "$divideLine" "Trajectory generation" "$divideLine"
- if [ $inputTraj -ne 1 ]; then
- mkdir -p 00
- # the input file name is only known to the submitting shell, so it is left unescaped here
- cp -f "$mdFile" 00/
- cp -f "$inputFile" 00/
- cd 00
- $xtb "$inputFile" --input "$mdFile" --omd --gfn 0 -P $cpu
- wait
- cd ..
- cp -f 00/xtb.trj 00result.xyz
- inputFile=00result.xyz
- echo "Final result is 00result.xyz"
- else
- echo "Trajectory file provided."
- # unescaped on the right-hand side: the file name is filled in at submission time
- inputFile="$inputFile"
- fi
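If you would rather prepare md.inp in advance than answer the prompts, a small helper along these lines writes the same $md block that the script generates. The function name write_md_input is an illustrative assumption; the keys, including the fixed hmass=1 and shake=1, are the ones the script itself writes.

```shell
#!/bin/bash
# write_md_input FILE TEMP TIME DUMP STEP (hypothetical helper)
# Units follow the script's prompts: temperature in K, total time in ps,
# dump frequency in fs, step size in fs.
write_md_input() {
    local file=$1 temp=$2 time=$3 dump=$4 step=$5
    # \$md and \$end are escaped so they reach the file literally.
    cat >"$file" <<EOF
\$md
temp=$temp
time=$time
dump=$dump
step=$step
hmass=1
shake=1
\$end
EOF
}
```

A call such as write_md_input md.inp 298.15 100 50 1 would create the file; the numbers here only show the argument order and are not recommended settings.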
Unless -1 was given, crest runs the GFN0-xTB optimization.
- # step 1: use crest to optimize under GFN0-xTB
- if [ $skip1 -ne 1 ];then
- printf "\n%s\n%50s\n%s\n" "$divideLine" "GFN0-xTB Optimization" "$divideLine"
- mkdir -p 01
- cp -f \$inputFile 01/traj.xyz
- cd 01
- $crest -mdopt traj.xyz -chrg $chrg -uhf $uhf -gfn 0 -opt normal -T $cpu
- wait
- $isostat crest_ensemble.xyz -Edis $Edis -Gdis $Gdis -nt $cpu -T $temperature
- wait
- cd ..
- cp -f 01/cluster.xyz 01result.xyz
- inputFile=01result.xyz
- cp -f 01result.xyz final.xyz
- echo "Final result is 01result.xyz"
- fi
Unless -2 was given, crest runs the GFN2-xTB optimization.
- # step 2: use crest to optimize under GFN2-xTB
- if [ $skip2 -ne 1 ];then
- printf "\n%s\n%50s\n%s\n" "$divideLine" "GFN2-xTB Optimization" "$divideLine"
- mkdir -p 02
- cp -f \$inputFile 02/traj.xyz
- cd 02
- if [ -n "$solvent" ];then # quoted: an empty value must test false to get gas phase
- $crest -mdopt traj.xyz -chrg $chrg -uhf $uhf -gfn 2 -opt normal --gbsa $solvent -T $cpu
- else
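- $crest -mdopt traj.xyz -chrg $chrg -uhf $uhf -gfn 2 -opt normal -T $cpu
- fi
- # the listing is cut off here in the post; the lines below are reconstructed by analogy with step 1
- wait
- $isostat crest_ensemble.xyz -Edis $Edis -Gdis $Gdis -nt $cpu -T $temperature
- wait
- cd ..
- cp -f 02/cluster.xyz 02result.xyz
- inputFile=02result.xyz
- cp -f 02result.xyz final.xyz
- echo "Final result is 02result.xyz"
- fi
Unless -3 was given, molclus is called.
- # step 3 use molclus with Gaussian to optimize and calculate energy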
- if [ $skip3 -ne 1 ];then
- printf "\n%s\n%43s\n%s\n" "$divideLine" "Molclus" "$divideLine"
- mkdir -p 03
- cp -f \$inputFile 03/03traj.xyz
- cp -f settings.ini 03/
- cp -f template.gjf 03/
- if [ -e template2.gjf ];then
- cp -f template2.gjf 03/
- fi
- if [ -n "$spTemplate" ];then # quoted: skip the copy when no single-point template was found
- cp -f $spTemplate 03/
- fi
- cd 03
- $molclus settings.ini 03traj.xyz
- wait
- $isostat isomers.xyz -Nout $Nout -Eout $Eout -Edis $Edis -Gdis $Gdis -nt $cpu -T $temperature
- wait
- cd ..
- cp -f 03/cluster.xyz final.xyz
- echo "Final result is final.xyz"
- fi
Finally, files are cleaned up according to the --clean argument. The closing EOF on the last line ends the here document and completes the sbatch command.
- # auto clean
- folders=("00" "01" "02" "03")
- files=("00result.xyz" "01result.xyz" "02result.xyz")
- if [ $clean -ge 1 ]; then
- # delete intermediate file
- for folder in "\${folders[@]}"; do
- if [ -d "\$folder" ]; then
- rm -rf "\$folder"
- fi
- done
- fi
- if [ $clean -ge 2 ]; then
- # also delete results of each step
- for file in "\${files[@]}"; do
- if [ -f "\$file" ]; then
- rm "\$file"
- fi
- done
- fi
- EOF