GROMACS protein-ligand MD problems: 1) atom count mismatch between gro and top files; 2) missing CG atom in a residue

SolaireKnight · Posted on 2025-7-9 18:20:45

Last edited by SolaireKnight on 2025-7-9 18:20

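**Goal:**
I need to run MD for each of three proteins with the small molecule Ginsenoside Re. The three proteins are:
- Human Serum Albumin (PDB: 1AO6): the PDB entry is a dimer, but the protein is normally a single chain;
- TNF-alpha (PDB: 1TNF): the PDB entry is a trimer, and the protein is normally a trimer as well;
- AKT1 (PDB: 3O96): the PDB entry is a single chain, and the protein is normally a single chain as well.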

I docked Ginsenoside Re against each of the three proteins with AutoDock and obtained the complexes as .pdbqt files, which I want to use for MD in GROMACS.

**Procedure:**

1. In PyMOL, delete one of the chains from the PDB structure, remove water and add hydrogens, then run AutoDock. From the resulting pdbqt file, extract the ligand with `grep -E '^HETATM' HSA-Re.pdbqt > ligand.pdb` and the protein with `grep -E '^ATOM|^HETATM.*RES A' HSA-Re.pdbqt > HSA.pdb`. Append TER and END to the end of the protein pdb. Run `gmx pdb2gmx -f HSA.pdb -o processed_HSA.gro -water tip3p -ff amber99sb-ildn -ignh` to obtain the top and gro files.
2. Download the ligand pdb, convert it to mol2 format, and generate the .itp, .top and .gro files with Sobtop, using the GAFF force field.
3. Build the complex with `gmx insert-molecules -f processed_HSA.gro -ci ligand.gro -o complex.gro -nmol 1`; the top is updated automatically, and so is complex.gro.
4. Build the box with `gmx editconf -f complex.gro -o box.gro -bt cubic -d 1.0`, solvate with `gmx solvate -cp box.gro -cs spc216.gro -p topol.top -o solv.gro`, preprocess with `gmx grompp -f ions.mdp -c solv.gro -p topol.top -o ions.tpr`, and add ions with `gmx genion -s ions.tpr -neutral -p topol.top -o ions.gro`. Later steps omitted. For the first protein this workflow went smoothly, all the way to the end.
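For anyone who prefers a script to the grep commands in step 1, here is a minimal Python sketch of the same split. It assumes, like the first grep, that every HETATM record in the docked pdbqt is the ligand and every ATOM record is the protein; it does not reproduce the extra `^HETATM.*RES A` pattern. The name `split_pdbqt` is hypothetical. Protein lines are cut at column 66 to drop the pdbqt-only partial-charge and AutoDock atom-type columns (pdb2gmx matches atoms by residue and atom name), and TER/END are appended as in step 1.

```python
def split_pdbqt(lines):
    """Split docked .pdbqt complex lines into (protein, ligand) PDB records.

    Assumption: ATOM records are the receptor and HETATM records the ligand,
    mirroring the grep commands in step 1.
    """
    protein, ligand = [], []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("ATOM"):
            # Columns 67+ of a pdbqt line hold the partial charge and the
            # AutoDock atom type; keep only the standard PDB fields (1-66).
            protein.append(line[:66])
        elif line.startswith("HETATM"):
            ligand.append(line)
    # Terminate the protein as in step 1 so pdb2gmx sees a closed chain.
    protein += ["TER", "END"]
    return protein, ligand
```

Usage: pass an open file handle such as `open("HSA-Re.pdbqt")`, then write each returned list joined with newlines to HSA.pdb and ligand.pdb.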

However, when I processed the second and third proteins with a similar workflow, I ran into the following problems:

**Problems:**
1. With the third protein (a single-chain protein), running `gmx grompp` for the preprocessing step before adding ions gives:

Fatal error:
number of coordinates in coordinate file (solv.gro, 155634)
         does not match topology (topol.top, 162778)

2. With the second protein (a trimer), running `gmx pdb2gmx` to generate the top and gro files gives:

Fatal error:
Residue 1 named ARG of a molecule in the input file was mapped
to an entry in the topology database, but the atom CG used in
that entry is not found in the input file. Perhaps your atom
and/or residue naming needs to be fixed.

I then added the `-missing` option to the end of the pdb2gmx command to ignore the error and ran `gmx insert-molecules` to update the topology and gro automatically, which gives:

WARNING: Masses and atomic (Van der Waals) radii will be guessed
      based on residue and atom names, since they could not be
      definitively assigned from the information in your input
      files. These guessed numbers might deviate from the mass
      and radius of the atom type. Please check the output
      files if necessary. Note, that this functionality may
      be removed in a future GROMACS version. Please, consider
      using another file format for your input.
[...]

Using random seed 2080276028
Try 1Segmentation fault

**Attachments:**
For problem 1 (second protein), I uploaded
- TNF2.pdb: the protein PDB
- ligand_2.gro
- processed_TNF_2.gro
For problem 2 (third protein), I uploaded
- complex_3.gro
- topol_3.top
- solv_3.gro

I added the "_2" and "_3" suffixes to the attachment names to tell the problems apart; the files on the server do not have these suffixes.

Thanks for any help.

student0618 · Posted on 2025-7-9 19:02:54

First, I don't understand why you ran insert-molecules after docking. In this situation you should merge the gro files and edit the top by hand.
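The manual gro merge suggested here can be sketched in Python: keep the protein's box line, write the protein atoms followed by the ligand atoms, and update the atom count on line 2. This assumes ligand.gro was generated from the docked pose (so it shares the protein's coordinate frame, in nm) and that the topology lists the protein before the ligand; `read_gro` and `merge_gro` are hypothetical helper names.

```python
def read_gro(text):
    """Split a .gro file into (title, atom_lines, box_line)."""
    lines = text.splitlines()
    natoms = int(lines[1].strip())  # line 2 holds the atom count
    atoms = lines[2:2 + natoms]
    box = lines[2 + natoms]         # box vectors follow the last atom
    return lines[0], atoms, box


def merge_gro(protein_text, ligand_text, title="Protein-ligand complex"):
    """Concatenate protein then ligand atoms, keeping the protein's box."""
    _, prot_atoms, box = read_gro(protein_text)
    _, lig_atoms, _ = read_gro(ligand_text)
    atoms = prot_atoms + lig_atoms
    out = [title, f"{len(atoms):5d}"]
    for i, line in enumerate(atoms, start=1):
        # Columns 16-20 hold the atom number; renumber so it stays
        # sequential after the merge (.gro wraps numbers at 100000).
        out.append(line[:15] + f"{i % 100000:5d}" + line[20:])
    out.append(box)
    return "\n".join(out) + "\n"
```

The result written to complex.gro can then go straight into `gmx editconf`; the box is recomputed there anyway, so keeping the protein's box line is only a placeholder.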

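1. First check whether the protein pdb is missing any non-terminal residues or atoms (search for "missing" with Ctrl-F in a text editor). If it is, rebuild them first. There are many ways to do this; search the forum.
2. Same cause as 1: rebuild the missing atoms first, then run again.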
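Searching for "missing" only works if the REMARK 465 (missing residues) and REMARK 470 (missing atoms) records from the original PDB entry are still in the file; a structure re-saved by another program may not keep them. As an alternative, here is a minimal sketch that compares the heavy atoms actually present in each residue with an expected set. The table `HEAVY_ATOMS` is deliberately partial and illustrative (extend it for the remaining amino acids), hydrogens and terminal OXT are ignored, and the function name is hypothetical.

```python
from collections import defaultdict

# Heavy atoms expected for a few standard residues (partial table, for
# illustration only; extend it for the other amino acids).
HEAVY_ATOMS = {
    "GLY": {"N", "CA", "C", "O"},
    "ALA": {"N", "CA", "C", "O", "CB"},
    "LYS": {"N", "CA", "C", "O", "CB", "CG", "CD", "CE", "NZ"},
    "GLU": {"N", "CA", "C", "O", "CB", "CG", "CD", "OE1", "OE2"},
    "GLN": {"N", "CA", "C", "O", "CB", "CG", "CD", "OE1", "NE2"},
    "ARG": {"N", "CA", "C", "O", "CB", "CG", "CD", "NE", "CZ", "NH1", "NH2"},
}


def missing_heavy_atoms(pdb_lines):
    """Return {(chain, resseq, resname): sorted missing atom names}."""
    present = defaultdict(set)
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        name = line[12:16].strip()      # atom name, columns 13-16
        resname = line[17:20].strip()   # residue name, columns 18-20
        # chain (col 22) + residue number and insertion code (cols 23-27)
        key = (line[21], line[22:27].strip(), resname)
        present[key].add(name)
    report = {}
    for key, names in present.items():
        expected = HEAVY_ATOMS.get(key[2])
        if expected is None:
            continue  # residue type not covered by the partial table
        missing = expected - names
        if missing:
            report[key] = sorted(missing)
    return report
```

A report such as `{("A", "1", "ARG"): ["CG"]}` would correspond to the pdb2gmx error about residue 1 ARG.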

SolaireKnight · Posted on 2025-7-9 19:27:12

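student0618 posted on 2025-7-9 19:02
First, I don't understand why you ran insert-molecules after docking. In this situation you should merge the gro files and edit the top by hand.

1. First check whether the protein pdb is missing ...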

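Sorry, my understanding of MD is still limited, so my reply may contain conceptual mistakes.
1. The docking is meant to find a conformation with low binding energy that also forms hydrogen bonds, and the MD starts from it. It seems the pdbqt has to be split, each part parameterized with its own force field, and the parts merged again. Running pdb2gmx directly on the whole complex seems to cause problems, because the force field cannot handle the small-molecule part well;
2. I checked the pdb file and it contains no "MISSING" string; the file is attached, if you could double-check it. As for the missing atoms, I tried to rebuild them with PyMOL and Chimera but couldn't get it to work. I'll keep looking for methods; if you know a thread you'd recommend, I'd be glad if you shared it.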

student0618 · Posted on 2025-7-9 20:28:15

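1. pdb2gmx only handles the protein, so processing the parts separately is correct. But the merge should be done by hand, not with insert-molecules: gmx insert-molecules places molecules at random positions instead of putting back the docked pose.
2. Sorry, I can't open attachments on my phone, but in the PDB both 1TNF and 3O96 have missing atoms/residues. Simple cases can be completed with e.g. pdbfixer. Other options: if only atoms are missing, the pdb2pqr or PRAS servers; if a few residues are missing, MODELLER (e.g. via Chimera's Model Loops interface).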
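For the topology side of the manual merge in point 1, a minimal sketch: include the ligand .itp right after the force-field include and append the ligand to `[ molecules ]`. The file name `ligand.itp`, the molecule name `MOL` and the function name are placeholders; `mol_name` must match the `[ moleculetype ]` name inside the .itp Sobtop produced. The protein-then-ligand order must match the merged .gro.

```python
import re


def add_ligand_to_top(top_text, itp_name="ligand.itp", mol_name="MOL", count=1):
    """Include a ligand .itp in a pdb2gmx topology and list it in [ molecules ].

    itp_name and mol_name are placeholders: mol_name must match the
    [ moleculetype ] name inside the ligand .itp.
    """
    if not re.search(r"^\s*\[\s*molecules\s*\]", top_text, re.MULTILINE):
        raise ValueError("no [ molecules ] section found")
    out, inserted = [], False
    for line in top_text.splitlines():
        out.append(line)
        # Include the ligand right after the force-field include, so any
        # [ atomtypes ] it defines come before every [ moleculetype ].
        if not inserted and re.match(r'\s*#include\s+"[^"]*forcefield\.itp"', line):
            out.append(f'#include "{itp_name}"')
            inserted = True
    if not inserted:
        raise ValueError("no force-field #include found")
    # [ molecules ] is the last section of a pdb2gmx topology, so appending
    # puts the ligand after the protein, matching the merged .gro order.
    out.append(f"{mol_name:<20}{count}")
    return "\n".join(out) + "\n"
```

Run this once, before `gmx solvate`; running it twice would list the ligand twice and produce exactly the kind of atom-count mismatch grompp reports.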

tptp · Posted on 2025-7-10 19:05:08

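Problem 1: processed_TNF_2.gro has 7110 atoms and ligand_2.gro has 34 atoms, 7144 in total. In your error message

Fatal error:
number of coordinates in coordinate file (solv.gro, 155634)
does not match topology (topol.top, 162778)

the difference 162778 - 155634 = 7144 is exactly that number. So the topology counts one protein plus one ligand more than solv.gro contains, which means your top file is not referencing your protein and ligand correctly. Reference them properly and it will work.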
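The check above can be repeated for any system with a small sketch: read the atom count from line 2 of the .gro, read the `[ molecules ]` entries from the top, and multiply each by its atoms per molecule. The per-molecule atom counts are an assumption the user supplies (e.g. counted from the `[ atoms ]` section of each .itp, 3 for TIP3P water); all function names are hypothetical.

```python
import re


def gro_atom_count(gro_text):
    """The second line of a .gro file holds the number of atoms."""
    return int(gro_text.splitlines()[1].strip())


def parse_molecules(top_text):
    """Return [(name, count), ...] from the [ molecules ] section."""
    entries, in_section = [], False
    for raw in top_text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop comments
        if not line or line.startswith("#"):
            continue  # skip blanks and preprocessor lines
        header = re.fullmatch(r"\[\s*(\w+)\s*\]", line)
        if header:
            in_section = header.group(1) == "molecules"
            continue
        if in_section:
            name, count = line.split()[:2]
            entries.append((name, int(count)))
    return entries


def topology_atom_count(top_text, atoms_per_mol):
    """Sum count * atoms-per-molecule over the [ molecules ] entries."""
    return sum(n * atoms_per_mol[name] for name, n in parse_molecules(top_text))
```

A positive `topology_atom_count(...) - gro_atom_count(...)` means the top lists more atoms than the coordinate file holds; here it would come out as 7144 = 7110 + 34, one protein plus one ligand.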
Problem 2: install pdbfixer and use it to repair your protein before running gmx pdb2gmx. It's a simple operation, just one command.
