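Thanks in advance to all the teachers and experienced colleagues passing by. I want to train a DeePMD potential for the InGaAs alloy system, and these are my steps so far:
I. First, I used VASP AIMD to generate 300 K datasets for three 216-atom materials: InAs, GaAs, and In0.5Ga0.5As.
II. Next, I trained a set of four pre-training models using 4-fold cross-validation, i.e., rotating which folds are used for training and which for validation. (Later I plan to use these four models to build my own concurrent-learning loop, similar to the DP-GEN workflow, but controlled so that every labeling round keeps some independent data aside for final testing.)
III. Then, before exploring more alloy compositions, I wanted to use these models to check whether they can accurately predict the phonon dispersions of InAs and GaAs.
I compared four cases in total:
1. The phonon dispersion I previously computed with VASP after a VASP relaxation; legend: DFT;
2. Starting from that DFT-relaxed structure, a further structure optimization with the model in LAMMPS, followed by a phonon calculation; legend: DFT_relax;
3. The Materials Project structure, with only the atomic positions relaxed by the model; legend: MP_atomrelax;
4. The Materials Project structure, with both volume and atomic positions relaxed by the model; legend: MP_fullrelax.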
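The data split behind steps I and II (described in more detail in hypothesis 1 below) can be sketched in a few lines of Python. This is only an illustration of the scheme, not the script actually used. It assumes 2000 frames per material with every MD step kept as a frame (NSW = 2000, POTIM = 0.5 fs) and a fixed shuffle seed. The helpers `make_folds` and `cv_rounds` are hypothetical and not part of DeePMD-kit or dpdata. Fold 0 plays the role of the isolated test set, and folds 1-4 correspond to the subdirectories `/1` ... `/4` in the input.json.

```python
import random

N_FRAMES = 2000   # AIMD frames per material, as stated in the post
POTIM_FS = 0.5    # MD timestep in fs (POTIM in the INCAR)
N_PARTS = 5       # fold 0 = isolated test set, folds 1-4 = cross-validation


def make_folds(n_frames, n_parts, seed=0):
    """Shuffle frame indices and cut them into n_parts equal folds (hypothetical helper)."""
    idx = list(range(n_frames))
    random.Random(seed).shuffle(idx)
    size = n_frames // n_parts
    return [idx[i * size:(i + 1) * size] for i in range(n_parts)]


def cv_rounds(folds):
    """Yield (train, valid, test) index lists, one tuple per CV model.

    Fold 0 is always the test set; each of folds 1..4 takes one turn as
    validation while the other three are used for training (3:1).
    """
    test, rest = folds[0], folds[1:]
    for k, valid in enumerate(rest):
        train = [i for j, fold in enumerate(rest) if j != k for i in fold]
        yield train, valid, test


folds = make_folds(N_FRAMES, N_PARTS)
rounds = list(cv_rounds(folds))
print(f"trajectory per material: {N_FRAMES * POTIM_FS / 1000:.1f} ps")
for n, (train, valid, test) in enumerate(rounds, start=1):
    print(f"model {n}: valid = fold {n}, "
          f"train={len(train)} valid={len(valid)} test={len(test)}")
```

Model 1 here matches the posted input.json (training on subdirectories 2-4, validating on 1). With three materials, each model trains on 3 × 1200 = 3600 frames and validates on 3 × 400 = 1200 frames, all taken from a 1 ps trajectory per material.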
[Figure: phonon dispersions for the four cases above (DFT, DFT_relax, MP_atomrelax, MP_fullrelax)]
Current problems:
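1. For GaAs, when the model relaxes both volume and positions, the phonon dispersion shows imaginary frequencies over a wide range. On inspection, the optimized lattice constant is smaller than the DFT value (~3.83 Å vs. 3.99 Å from DFT).
2. In the other cases, the deviations from the DFT results are also fairly large, as shown in the figure.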
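For scale, the GaAs numbers in problem 1 can be converted into relative errors with a few lines of Python. This assumes the cell shrinks isotropically, which is expected for cubic zinc-blende GaAs. `relative_errors` is only an illustrative helper. It gives a linear error of about -4.0% and a volume error of about -11.6%.

```python
def relative_errors(a_model, a_ref):
    """Linear and volumetric relative error of a cell length.

    Assumes isotropic scaling, so the cell volume goes as a**3.
    """
    scale = a_model / a_ref
    return scale - 1.0, scale ** 3 - 1.0


lin, vol = relative_errors(3.83, 3.99)   # model vs. DFT cell length, in Å
print(f"lattice-constant error: {lin:+.1%}")
print(f"volume error:           {vol:+.1%}")
```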
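My hypotheses:
I have a few ideas about the problems above, but I don't have the resources to burn compute checking them one by one blindly, so I would appreciate insights from anyone with experience.
1. The initial dataset is too small: my AIMD currently produces 2000 frames of 216 atoms for each material. For each material I shuffle the frames and split them into 5 equal parts; one part is held out as an isolated test set, and the remaining four are rotated as training/validation at a 3:1 ratio.
2. The three materials should not be forced together into one training set: I'm a bit confused here. During testing I did find that the energies of the three systems form three separate clusters with some differences. However, I feel that training a complex, general model should not be constrained by this kind of learning order, so I lean toward thinking that this all-in-one "hodgepodge" approach should not matter.
3. The imaginary frequencies appear when the model relaxes both volume and positions, which suggests that the GaAs stress may not have been learned well. I switched off the virial when training the model; do I need to train on the virial?
(Relevant settings:
"loss": {
"start_pref_e": 0.02,
"limit_pref_e": 0.1,
"start_pref_f": 1000,
"limit_pref_f": 100,
"start_pref_v": 0,
"limit_pref_v": 0
})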
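4. My AIMD settings are wrong: I'm honestly not an expert here, so I can't tell whether the problem already arose when the dataset was generated. Or perhaps, because I set the temperature directly to 300 K, the data contain no low-temperature information, which biases the phonon predictions.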
INCAR:
Global Parameters
ISTART = 1 (Read existing wavefunction, if there)
ICHARG = 0 (Charge density from initial wavefunctions)
LREAL = Auto (Projection operators: automatic)
ENCUT = 450 (Cut-off energy for plane wave basis set, in eV)
PREC = Normal (Precision level: Normal or Accurate; set Accurate for lattice relaxations)
LWAVE = .FALSE. (Write WAVECAR or not)
LCHARG = .FALSE. (Write CHGCAR or not)
ADDGRID= .TRUE. (Increase grid, helps GGA convergence)
NWRITE = 2 (Medium-level output)
Electronic Relaxation
ISMEAR = 0
SIGMA = 0.05
EDIFF = 1E-08
Molecular Dynamics
IBRION = 0 (Activate MD)
NSW = 2000 (Max ionic steps)
EDIFFG = -1E-03 (Ionic convergence, eV/A)
POTIM = 0.5 (Timestep in fs)
SMASS = 0 (MD Algorithm: -3-microcanonical ensemble, 0-canonical ensemble)
TEBEG = 300 (Start temperature K)
TEEND = 300 (Final temperature K)
MDALGO = 3 (Langevin thermostat)
ISYM = 0 (Switch symmetry off)
ISIF = 2
LANGEVIN_GAMMA = 50 50 50
LANGEVIN_GAMMA_L = 50
PMASS = 20
PSTRESS = 0
KPOINTS: 1×1×1 for all systems
5. My DeePMD training parameters are wrong. I'll post them here for everyone to look at; I haven't spotted any problem myself.
{
  "_comment": "that's all",
  "model": {
    "type_map": ["In", "Ga", "As"],
    "descriptor": {
      "type": "se_atten_v2",
      "sel": "auto:1.2",
      "rcut_smth": 4.5,
      "rcut": 9.0,
      "neuron": [25, 50, 100],
      "resnet_dt": false,
      "axis_neuron": 16,
      "attn_layer": 0,
      "seed": 1,
      "_comment": " that's all"
    },
    "fitting_net": {
      "neuron": [240, 240, 240],
      "resnet_dt": true,
      "seed": 1,
      "_comment": " that's all"
    },
    "_comment": " that's all"
  },
  "learning_rate": {
    "type": "exp",
    "decay_steps": 2000,
    "start_lr": 0.001,
    "stop_lr": 3.51e-08,
    "_comment": "that's all"
  },
  "loss": {
    "start_pref_e": 0.02,
    "limit_pref_e": 0.1,
    "start_pref_f": 1000,
    "limit_pref_f": 100,
    "start_pref_v": 0,
    "limit_pref_v": 0,
    "_comment": " that's all"
  },
  "training": {
    "stop_batch": 300000,
    "seed": 1,
    "_comment": "that's all",
    "disp_file": "lcurve4.out",
    "disp_freq": 100,
    "numb_test": 10,
    "save_freq": 1000,
    "save_ckpt": "model.ckpt",
    "disp_training": true,
    "time_training": true,
    "profiling": false,
    "profiling_file": "timeline.json",
    "training_data": {
      "systems": [
        "../deepmd_data/InAs_300K/2",
        "../deepmd_data/InAs_300K/3",
        "../deepmd_data/InAs_300K/4",
        "../deepmd_data/GaAs_300K/2",
        "../deepmd_data/GaAs_300K/3",
        "../deepmd_data/GaAs_300K/4",
        "../deepmd_data/In0.5Ga0.5As_300K/2",
        "../deepmd_data/In0.5Ga0.5As_300K/3",
        "../deepmd_data/In0.5Ga0.5As_300K/4"
      ],
      "batch_size": "auto"
    },
    "validation_data": {
      "systems": [
        "../deepmd_data/InAs_300K/1",
        "../deepmd_data/GaAs_300K/1",
        "../deepmd_data/In0.5Ga0.5As_300K/1"
      ],
      "batch_size": "auto"
    }
  }
}
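Hypothesis 3 depends on how the loss prefactors evolve during training. The Python sketch below tabulates the schedule implied by the config above. It assumes the conventions described in the DeePMD-kit 2.x documentation:

- a staircase exponential learning rate, with the decay rate chosen so that it reaches `stop_lr` at the last step;
- each prefactor interpolated as p(t) = p_limit·(1 - lr(t)/lr0) + p_start·lr(t)/lr0.

Please check these against the docs of the installed version. The helper names are hypothetical.

```python
import math

# Values copied from the input.json above.
START_LR, STOP_LR = 1e-3, 3.51e-8
DECAY_STEPS, NUMB_STEPS = 2000, 300_000   # "decay_steps", "stop_batch"
PREFS = {  # term: (start_pref, limit_pref)
    "e": (0.02, 0.1),
    "f": (1000.0, 100.0),
    "v": (0.0, 0.0),
}

# Decay rate chosen so that the learning rate reaches STOP_LR at NUMB_STEPS
# (assumed to match DeePMD-kit's "exp" learning-rate type).
DECAY_RATE = math.exp(math.log(STOP_LR / START_LR) / (NUMB_STEPS / DECAY_STEPS))


def learning_rate(step):
    """Staircase exponential decay: lr0 * rate ** floor(step / decay_steps)."""
    return START_LR * DECAY_RATE ** (step // DECAY_STEPS)


def prefactor(step, start, limit):
    """Move a loss prefactor from `start` toward `limit` as lr/lr0 goes from 1 toward 0."""
    ratio = learning_rate(step) / START_LR
    return limit * (1.0 - ratio) + start * ratio


for step in (0, 50_000, 150_000, 300_000):
    e, f, v = (prefactor(step, *PREFS[k]) for k in ("e", "f", "v"))
    print(f"step {step:>7}: lr={learning_rate(step):.2e} "
          f"pref_e={e:.4f} pref_f={f:.1f} pref_v={v:.1f}")
```

Because `start_pref_v` and `limit_pref_v` are both 0, the virial prefactor stays at zero for the whole run. The virial term therefore contributes nothing to the loss, whatever the learning-rate schedule does. In practice this is what "virial switched off" in hypothesis 3 means.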
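Here is also my loss curve.
Thank you all for your valuable time. My abilities may be limited and what I've taught myself is somewhat fragmented, so please do point out any elementary mistakes!!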