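Last edited by MilkTeaLegend on 2020-5-28 13:55
I recently read that GPU acceleration in QE (Quantum ESPRESSO) now gives a significant speedup, so this time I tried it myself.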
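# Environment
- CentOS 8, i9-9900K, RTX 2080
- PGI 19.10 compiler, CUDA 10.1
- q-e-gpu-qe-gpu-6.5a2.tar.gz
GPU acceleration requires building with the PGI compiler.
The build did hit a few snags. The key point is to make sure the PGI compiler version matches the CUDA version it is paired with.
# Test results
The speedup from using the GPU is significant.
The test uses a simple crystalline silicon input file, with Ecut and the k-point mesh deliberately raised.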
- &control
- calculation = 'relax',
- prefix = 'Si_exc2',
- verbosity = 'high'
- outdir = './Sitmp/'
- pseudo_dir = '../qePotential/'
- /
- &system
- ibrav = 2,
- celldm(1) = 10.348,
- nat = 2,
- ntyp = 1,
- ecutwfc = 200,
- /
- &electrons
- mixing_beta = 0.7
- /
- &IONS
- ion_dynamics= 'bfgs'
- /
- ATOMIC_SPECIES
- Si 28.086 Si.pbe-rrkj.UPF
- ATOMIC_POSITIONS (alat)
- Si 0.0 0.0 0.0
- Si 0.25 0.25 0.25
- K_POINTS (automatic)
- 24 24 24 0 0 0
CPU-only timing: i9-9900K, 8 cores, OpenMPI parallel
- init_run : 120.08s CPU 138.31s WALL ( 1 calls)
- electrons : 1234.34s CPU 1236.01s WALL ( 1 calls)
- forces : 33.85s CPU 31.33s WALL ( 1 calls)
- Called by init_run:
- wfcinit : 119.89s CPU 138.11s WALL ( 1 calls)
- wfcinit:atom : 0.11s CPU 0.08s WALL ( 413 calls)
- wfcinit:wfcr : 119.30s CPU 137.69s WALL ( 413 calls)
- potinit : 0.03s CPU 0.04s WALL ( 1 calls)
- hinit0 : 0.01s CPU 0.01s WALL ( 1 calls)
- Called by electrons:
- c_bands : 1224.15s CPU 1225.82s WALL ( 6 calls)
- sum_band : 9.98s CPU 9.95s WALL ( 6 calls)
- v_of_rho : 0.17s CPU 0.18s WALL ( 6 calls)
- v_h : 0.01s CPU 0.01s WALL ( 6 calls)
- v_xc : 0.16s CPU 0.17s WALL ( 6 calls)
- mix_rho : 0.02s CPU 0.02s WALL ( 6 calls)
- Called by c_bands:
- init_us_2 : 1.47s CPU 1.28s WALL ( 5782 calls)
- cegterg : 1089.74s CPU 1126.66s WALL ( 2478 calls)
- Called by sum_band:
- Called by *egterg:
- h_psi : 475.77s CPU 430.73s WALL ( 7653 calls)
- g_psi : 0.58s CPU 0.46s WALL ( 4762 calls)
- cdiaghg : 15.91s CPU 16.89s WALL ( 6827 calls)
- cegterg:over : 541.66s CPU 647.80s WALL ( 4762 calls)
- cegterg:upda : 4.09s CPU 6.08s WALL ( 4762 calls)
- cegterg:last : 7.65s CPU 10.62s WALL ( 2489 calls)
- Called by h_psi:
- h_psi:calbec : 168.44s CPU 153.30s WALL ( 7653 calls)
- vloc_psi : 292.57s CPU 229.73s WALL ( 7653 calls)
- add_vuspsi : 14.29s CPU 47.29s WALL ( 7653 calls)
- General routines
- calbec : 201.88s CPU 184.18s WALL ( 9305 calls)
- fft : 0.15s CPU 0.14s WALL ( 62 calls)
- ffts : 0.00s CPU 0.00s WALL ( 6 calls)
- fftw : 285.14s CPU 224.69s WALL ( 68524 calls)
-
- Parallel routines
-
- PWSCF : 23m 8.68s CPU 23m26.22s WALL
-
- This run was terminated on: 13:55:40 27May2020
GPU-accelerated timing: a single CPU process with OpenMP threads, accelerated by the RTX 2080
- Writing output data file ./Sitmp/Si_exc2.save/
-
- init_run : 19.37s CPU 19.43s WALL ( 1 calls)
- electrons : 22.95s CPU 27.86s WALL ( 1 calls)
- forces : 0.41s CPU 0.47s WALL ( 1 calls)
- Called by init_run:
- wfcinit : 17.93s CPU 18.19s WALL ( 1 calls)
- 18.19s GPU ( 1 calls)
- wfcinit:atom : 12.71s CPU 13.54s WALL ( 413 calls)
- 13.54s GPU ( 413 calls)
- wfcinit:wfcr : 4.06s CPU 3.75s WALL ( 413 calls)
- 3.75s GPU ( 413 calls)
- potinit : 0.70s CPU 0.49s WALL ( 1 calls)
- hinit0 : 0.30s CPU 0.31s WALL ( 1 calls)
- Called by electrons:
- c_bands : 18.69s CPU 23.42s WALL ( 6 calls)
- sum_band : 2.61s CPU 3.15s WALL ( 6 calls)
- 3.15s GPU ( 6 calls)
- v_of_rho : 2.14s CPU 1.54s WALL ( 6 calls)
- v_h : 0.29s CPU 0.21s WALL ( 6 calls)
- v_xc : 1.85s CPU 1.33s WALL ( 6 calls)
- mix_rho : 0.04s CPU 0.05s WALL ( 6 calls)
- Called by c_bands:
- init_us_2 : 0.31s CPU 0.62s WALL ( 5782 calls)
- 0.60s GPU ( 5782 calls)
- cegterg : 17.78s CPU 22.13s WALL ( 2478 calls)
- Called by sum_band:
- sum_band:wei : 0.00s CPU 0.00s WALL ( 6 calls)
- 0.00s GPU ( 6 calls)
- sum_band:loo : 2.52s CPU 3.06s WALL ( 6 calls)
- 3.06s GPU ( 6 calls)
- sum_band:buf : 0.44s CPU 0.53s WALL ( 2478 calls)
- 0.22s GPU ( 2478 calls)
- sum_band:ini : 0.22s CPU 0.28s WALL ( 2478 calls)
- 0.27s GPU ( 2478 calls)
- Called by *egterg:
- cdiaghg : 1.88s CPU 2.25s WALL ( 6807 calls)
- 2.23s GPU ( 6807 calls)
- cegterg:over : 0.70s CPU 0.85s WALL ( 4742 calls)
- cegterg:upda : 0.79s CPU 0.95s WALL ( 4742 calls)
- cegterg:last : 0.37s CPU 0.44s WALL ( 2488 calls)
- h_psi : 15.51s CPU 18.32s WALL ( 7633 calls)
- 18.25s GPU ( 7633 calls)
- g_psi : 0.01s CPU 0.10s WALL ( 4742 calls)
- 0.09s GPU ( 4742 calls)
- Called by h_psi:
- h_psi:calbec : 0.22s CPU 0.95s WALL ( 7633 calls)
- 0.93s GPU ( 7633 calls)
- vloc_psi : 13.18s CPU 15.54s WALL ( 7633 calls)
- 15.46s GPU ( 7633 calls)
- add_vuspsi : 0.10s CPU 1.67s WALL ( 7633 calls)
- 1.62s GPU ( 7633 calls)
- General routines
- calbec : 0.25s CPU 0.30s WALL ( 9285 calls)
- fft : 0.83s CPU 0.71s WALL ( 62 calls)
- 0.00s GPU ( 2 calls)
- ffts : 0.02s CPU 0.02s WALL ( 6 calls)
- fftw : 0.54s CPU 17.50s WALL ( 17744 calls)
- 15.73s GPU ( 17744 calls)
-
- Parallel routines
-
- PWSCF : 44.89s CPU 50.07s WALL
-
- This run was terminated on: 13:28:57 27May2020
- =------------------------------------------------------------------------------=
- JOB DONE.
- =------------------------------------------------------------------------------=
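To compare the two logs routine by routine, the timing lines can be parsed and divided. The following is a minimal sketch, not part of QE itself: the names `TIMING_RE`, `parse_wall_times` and `speedups` are my own, and the regex assumes the `name : X s CPU Y s WALL ( N calls)` layout shown in the logs above. Only a few lines from each log are copied in as sample data.

```python
import re

# Matches lines such as "c_bands : 1224.15s CPU 1225.82s WALL ( 6 calls)".
# Leading list markers or spaces are skipped by \W*.
TIMING_RE = re.compile(
    r"^\W*(?P<name>[\w:*]+)\s*:\s*(?P<cpu>[\d.]+)s CPU\s+"
    r"(?P<wall>[\d.]+)s WALL\s*\(\s*(?P<calls>\d+) calls\)"
)

def parse_wall_times(report):
    """Return {routine name: WALL seconds} for every timing line found."""
    timings = {}
    for line in report.splitlines():
        match = TIMING_RE.match(line)
        if match:
            timings[match.group("name")] = float(match.group("wall"))
    return timings

def speedups(cpu_report, gpu_report):
    """WALL-time ratio CPU/GPU for routines present in both reports."""
    cpu = parse_wall_times(cpu_report)
    gpu = parse_wall_times(gpu_report)
    return {name: cpu[name] / gpu[name]
            for name in cpu if name in gpu and gpu[name] > 0}

# Sample lines copied from the silicon logs above.
cpu_log = """
- c_bands : 1224.15s CPU 1225.82s WALL ( 6 calls)
- h_psi : 475.77s CPU 430.73s WALL ( 7653 calls)
- fftw : 285.14s CPU 224.69s WALL ( 68524 calls)
"""
gpu_log = """
- c_bands : 18.69s CPU 23.42s WALL ( 6 calls)
- h_psi : 15.51s CPU 18.32s WALL ( 7633 calls)
- fftw : 0.54s CPU 17.50s WALL ( 17744 calls)
"""

result = speedups(cpu_log, gpu_log)
for name, ratio in result.items():
    print(f"{name:10s} {ratio:6.1f}x")
```

WALL time is used rather than CPU time because in the GPU run the CPU column can be far smaller than the elapsed time (see `fftw`: 0.54s CPU against 17.50s WALL).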
# Energy comparison
GPU: E = -15.7419293645 Ry versus CPU: E = -15.7419293674 Ry, a difference of about 2.9e-9 Ry.
# Test on a 2D material
Input file
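To put the size of that difference in more familiar units, here is a small sketch of the arithmetic. The conversion factor `RY_TO_EV` is an approximate value I am supplying, and the variable names are only for illustration.

```python
# Total energies reported by the two runs, in Ry.
e_gpu = -15.7419293645
e_cpu = -15.7419293674

RY_TO_EV = 13.605693  # approximate Rydberg-to-eV conversion (assumed value)

diff_ry = abs(e_gpu - e_cpu)      # absolute difference in Ry
diff_ev = diff_ry * RY_TO_EV      # same difference in eV
rel = diff_ry / abs(e_cpu)        # relative difference

print(f"difference: {diff_ry:.1e} Ry = {diff_ev:.1e} eV (relative {rel:.1e})")
```

A difference at this level is far below any convergence threshold one would normally set, so the two builds can be treated as giving the same energy.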
- &control
- calculation = 'scf',
- verbosity = 'high'
- prefix = '2D'
- outdir = './tmp/'
- pseudo_dir = '/home/kan/opt/qePotential'
- /
- &system
- a = 1.64969e+01
- b = 1.64809e+01
- c = 15
- cosab = -5.00000e-01
- degauss = 1.00000e-02
- ecutrho = 250
- ecutwfc = 25
- ibrav = 12
- nat = 30
- ntyp = 2
- occupations = 'smearing',
- smearing = 'gauss',
- degauss = 0.01
- /
- &electrons
- mixing_beta = 0.7
- diagonalization= 'cg'
- /
- &IONS
- ion_dynamics = 'bfgs'
- /
- ATOMIC_SPECIES
- C 12.01070 C.pbe-rrkjus.UPF
- H 1.00794 H.pbe-rrkjus.UPF
- ATOMIC_POSITIONS {angstrom}
- C 3.759785 7.611155 14.310000
- C 3.760673 9.047720 14.311500
- C 2.543351 9.754656 14.318500
- C 2.540098 11.162104 14.319500
- C 4.977963 9.754085 14.308000
- C 4.981391 11.161391 14.309500
- C 1.295809 11.880316 14.324000
- C 3.760766 11.862617 14.315500
- C 6.225577 11.879459 14.306000
- C -6.953221 13.788314 14.301000
- C -4.488410 13.804585 14.308000
- C 2.531580 0.233219 14.300500
- C 2.534926 1.640667 14.296500
- C -2.023371 13.787029 14.320000
- C 4.972708 0.232220 14.309500
- C 4.969455 1.639668 14.307000
- C 3.752380 2.346747 14.301000
- C 3.753598 3.783312 14.299500
- C 0.234245 12.489482 14.323000
- C 3.757615 6.387113 14.301500
- C 8.482504 13.178148 14.300000
- C 3.755934 5.007354 14.298500
- C -0.961807 13.177863 14.321500
- C 7.286753 12.489625 14.300000
- H 1.602888 9.212001 14.325000
- H 5.918147 9.210859 14.304000
- H 3.761210 12.948498 14.315500
- H -4.488936 12.718847 14.312000
- H 1.594577 2.183608 14.290000
- H 5.909835 2.182466 14.310500
- K_POINTS {automatic}
- 6 6 6 1 1 1
The system has 30 atoms. From the input above, the lattice parameters are roughly a = b = 16.5 Å, c = 15 Å, α = β = 90°, γ = 120°.
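The cell geometry follows directly from `a`, `b`, `c` and `cosab` in the input. As a rough check, the sketch below builds the cell vectors using the ibrav = 12 convention (v1 = (a, 0, 0), v2 = (b cos γ, b sin γ, 0), v3 = (0, 0, c)) and computes the angle and volume. The function name `cell_from_abc` is my own, not a QE routine.

```python
import math

def cell_from_abc(a, b, c, cosab):
    """Cell vectors for a monoclinic cell with unique axis c (ibrav = 12)."""
    sinab = math.sqrt(1.0 - cosab ** 2)
    v1 = (a, 0.0, 0.0)
    v2 = (b * cosab, b * sinab, 0.0)
    v3 = (0.0, 0.0, c)
    return v1, v2, v3

# Values taken from the &system namelist above (Angstrom).
a, b, c, cosab = 16.4969, 16.4809, 15.0, -0.5

v1, v2, v3 = cell_from_abc(a, b, c, cosab)
gamma_deg = math.degrees(math.acos(cosab))
# v3 is along z, so the volume is the in-plane area times c.
volume = (v1[0] * v2[1] - v1[1] * v2[0]) * v3[2]

print(f"gamma = {gamma_deg:.1f} deg, volume = {volume:.1f} A^3")
```

With c = 15 Å and all atoms near z ≈ 14.3 Å, most of this volume is vacuum, which is typical for a 2D slab calculation.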
# Results
CPU
- init_run : 401.09s CPU 353.32s WALL ( 1 calls)
- electrons : 8844.01s CPU 7169.17s WALL ( 1 calls)
- Called by init_run:
- wfcinit : 388.52s CPU 340.92s WALL ( 1 calls)
- potinit : 5.47s CPU 5.41s WALL ( 1 calls)
- hinit0 : 1.37s CPU 1.40s WALL ( 1 calls)
- Called by electrons:
- c_bands : 7654.70s CPU 6098.04s WALL ( 10 calls)
- sum_band : 1147.25s CPU 1032.24s WALL ( 10 calls)
- v_of_rho : 21.04s CPU 20.15s WALL ( 11 calls)
- newd : 16.67s CPU 14.98s WALL ( 11 calls)
- mix_rho : 4.30s CPU 4.16s WALL ( 10 calls)
- Called by c_bands:
- init_us_2 : 138.32s CPU 83.76s WALL ( 2268 calls)
- cegterg : 7498.05s CPU 5989.51s WALL ( 1080 calls)
- Called by sum_band:
- sum_band:bec : 0.28s CPU 0.15s WALL ( 1080 calls)
- addusdens : 21.31s CPU 17.90s WALL ( 10 calls)
- Called by *egterg:
- h_psi : 5827.95s CPU 5188.31s WALL ( 6834 calls)
- s_psi : 513.20s CPU 260.90s WALL ( 6834 calls)
- g_psi : 76.59s CPU 49.46s WALL ( 5646 calls)
- cdiaghg : 124.75s CPU 77.94s WALL ( 6726 calls)
- Called by h_psi:
- h_psi:calbec : 535.22s CPU 273.96s WALL ( 6834 calls)
- vloc_psi : 4704.55s CPU 4583.96s WALL ( 6834 calls)
- add_vuspsi : 530.43s CPU 279.99s WALL ( 6834 calls)
- General routines
- calbec : 672.30s CPU 344.31s WALL ( 7914 calls)
- fft : 22.85s CPU 20.19s WALL ( 140 calls)
- ffts : 0.86s CPU 0.67s WALL ( 21 calls)
- fftw : 4332.56s CPU 4224.93s WALL ( 430228 calls)
- interpolate : 2.67s CPU 2.11s WALL ( 11 calls)
-
- Parallel routines
-
- PWSCF : 2h34m CPU 2h 6m WALL
-
- This run was terminated on: 23:57:34 26May2020
- =------------------------------------------------------------------------------=
- JOB DONE.
- =------------------------------------------------------------------------------=
GPU
- init_run : 69.06s CPU 70.96s WALL ( 1 calls)
- electrons : 869.82s CPU 1094.89s WALL ( 1 calls)
- Called by init_run:
- wfcinit : 57.72s CPU 62.73s WALL ( 1 calls)
- 62.73s GPU ( 1 calls)
- wfcinit:atom : 17.87s CPU 17.65s WALL ( 108 calls)
- 17.65s GPU ( 108 calls)
- wfcinit:wfcr : 25.30s CPU 32.09s WALL ( 108 calls)
- 32.09s GPU ( 108 calls)
- potinit : 8.01s CPU 5.03s WALL ( 1 calls)
- hinit0 : 1.89s CPU 1.75s WALL ( 1 calls)
- Called by electrons:
- c_bands : 788.03s CPU 998.58s WALL ( 10 calls)
- sum_band : 60.12s CPU 76.17s WALL ( 10 calls)
- 76.17s GPU ( 10 calls)
- v_of_rho : 17.67s CPU 16.32s WALL ( 11 calls)
- v_h : 3.17s CPU 3.03s WALL ( 11 calls)
- v_xc : 14.51s CPU 13.29s WALL ( 11 calls)
- newd : 3.26s CPU 2.97s WALL ( 11 calls)
- 2.97s GPU ( 11 calls)
- mix_rho : 1.85s CPU 1.87s WALL ( 10 calls)
- Called by c_bands:
- init_us_2 : 1.58s CPU 3.53s WALL ( 2268 calls)
- 3.52s GPU ( 2268 calls)
- ccgdiagg : 621.69s CPU 786.45s WALL ( 1080 calls)
- wfcrot : 171.90s CPU 222.11s WALL ( 1080 calls)
- 222.10s GPU ( 1080 calls)
- Called by sum_band:
- sum_band:wei : 0.00s CPU 0.00s WALL ( 10 calls)
- 0.00s GPU ( 10 calls)
- sum_band:loo : 58.52s CPU 74.38s WALL ( 10 calls)
- 74.38s GPU ( 10 calls)
- sum_band:buf : 8.23s CPU 9.17s WALL ( 1080 calls)
- 9.17s GPU ( 1080 calls)
- sum_band:ini : 1.46s CPU 1.69s WALL ( 1080 calls)
- 1.69s GPU ( 1080 calls)
- sum_band:cal : 0.01s CPU 20.07s WALL ( 1080 calls)
- 18.99s GPU ( 1080 calls)
- sum_band:bec : 0.48s CPU 0.60s WALL ( 1080 calls)
- 0.60s GPU ( 1080 calls)
- addusdens : 0.59s CPU 0.77s WALL ( 10 calls)
- 0.77s GPU ( 10 calls)
- addusd:skk : 0.00s CPU 0.59s WALL ( 20 calls)
- 0.04s GPU ( 20 calls)
- Called by *cgdiagg:
- h_psi : 471.10s CPU 602.42s WALL ( 175710 calls)
- 594.50s GPU ( 175710 calls)
- s_psi : 80.77s CPU 288.43s WALL ( 350340 calls)
- 209.39s GPU ( 350340 calls)
- Called by h_psi:
- h_psi:calbec : 1.28s CPU 109.14s WALL ( 175710 calls)
- 108.60s GPU ( 175710 calls)
- vloc_psi : 277.59s CPU 360.35s WALL ( 175710 calls)
- 348.22s GPU ( 175710 calls)
- add_vuspsi : 16.65s CPU 129.33s WALL ( 175710 calls)
- 128.79s GPU ( 175710 calls)
- hs_1psi : 433.05s CPU 550.15s WALL ( 174630 calls)
- 542.08s GPU ( 174630 calls)
- s_1psi : 136.31s CPU 172.91s WALL ( 174630 calls)
- 171.61s GPU ( 174630 calls)
- General routines
- calbec : 1.95s CPU 2.56s WALL ( 351420 calls)
- fft : 11.07s CPU 11.02s WALL ( 140 calls)
- 0.43s GPU ( 11 calls)
- ffts : 0.25s CPU 0.25s WALL ( 21 calls)
- fftw : 2.30s CPU 398.12s WALL ( 362868 calls)
- 341.16s GPU ( 362868 calls)
- interpolate : 1.01s CPU 1.01s WALL ( 11 calls)
-
- Parallel routines
-
- PWSCF : 15m51.38s CPU 19m39.30s WALL
-
- This run was terminated on: 13:44:33 28May2020
As these tests show, for current PW calculations with k-point sampling, acceleration on a high-performance GPU is very beneficial.
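The overall speedup in each test can be read off the `PWSCF` WALL lines. The sketch below converts those strings to seconds and takes the ratio. The helper `wall_seconds` is my own, and it assumes the `h`/`m`/`s` suffixes seen in the logs above. One caveat on the 2D case: the CPU log reports `cegterg` while the GPU log reports `ccgdiagg`, so the two runs appear to have used different diagonalization routines and the ratio is only indicative.

```python
import re

UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0}

def wall_seconds(text):
    """Convert a QE time string such as '23m26.22s' or '2h 6m' to seconds."""
    return sum(float(value) * UNIT_SECONDS[unit]
               for value, unit in re.findall(r"([\d.]+)\s*([hms])", text))

# PWSCF WALL times copied from the four logs above.
runs = {
    "Si":  {"cpu": "23m26.22s", "gpu": "50.07s"},
    "2D":  {"cpu": "2h 6m",     "gpu": "19m39.30s"},
}

total_speedup = {
    name: wall_seconds(t["cpu"]) / wall_seconds(t["gpu"])
    for name, t in runs.items()
}

for name, ratio in total_speedup.items():
    print(f"{name}: about {ratio:.1f}x faster on the GPU")
```

Note that the 2D CPU time is only given to the minute in the log, so that ratio carries a little extra rounding uncertainty.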