Notes on using AI to help write small analysis scripts

student0618 · Posted on 2025-7-16 16:36:17

Last edited by student0618 on 2026-3-23 00:48

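0. Foreword
I recently needed to write some Python code for a few fairly obscure MD analyses, but I only really know bash; with Python I can read it but not write it. In the past, when I had to write a Python script, I wrote the pseudocode first and then Googled the code line by line. Now AI saves a lot of time on both debugging and documentation. I still plan to learn more of the fundamentals when I have time; for now I am too busy and the work has to get done first.

Chatting with fellow students, I found that quite a few of them also want AI to help write code but cannot get anything usable out of it. Here I share a prompt structure that I arrived at by trial and error and found fairly useful, in the hope that it helps people use these tools more effectively. The generated script still needs adjusting and debugging, but a first prompt with complete logic makes the process more efficient and makes problems easier to track down and fix.

Update, 2026-03-23: added a template example for coding agents and updated the comments to reflect recent trends.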

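1. Software
Chat interface:
(outdated; agents are the current trend)
I have tried prompts with a similar structure in Copilot, ChatGPT, Gemini and others. At the moment I mostly use Grok 3 (free version, think mode), recommended by the forum head (社长). It seems to capture all of my requirements most completely, especially when the analysis is fairly complex. I have not used any AI specialized for coding, and I have not paid to try Grok 4.

Coding agent:
For miscellaneous notes on coding-agent LLMs, see my post in the off-topic board ("Miscellaneous notes on trying coding agents for simple computational chemistry tasks", http://bbs.keinsci.com/thread-58190-1-1.html )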

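2. Prompt structure

Paragraph 1:
Describe all requirements and goals clearly and in detail

Paragraph 2:
(Chat interface) code structure
(Coding agent) suggested steps for completing the task

Attachments / Paragraph 3:
(Chat interface) Attachments: a conda environment yml that includes library versions, plus the reference for any formula to be implemented (if there is one; better still, extract the LaTeX/MathML code of the formula and the protocol section beforehand)
(Coding agent) Paragraph 3: paths to the test files, with explanations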

3. Template examples
3.1. Chat interface

**Aim and general**
Write a python script to perform trajectory analysis using {Python libraries to use, e.g. mdanalysis to load the trajectory, numpy to do numerical calculation, scipy for clustering}. {Instructions learned from experience, e.g. Only load the trajectory once. Improve the code efficiency by minimizing the use of for-loops. Clean up any unnecessary repeating actions. Optimize the analysis on a large dataset.} {Preferred output file format, e.g. For the data files to save, use spaces instead of commas or tabs as the separator.} {Units for the calculation, e.g. Use kJ/mol as the free energy unit, nanometer as distance unit, degree as angle unit.}
{Make sure the script is compatible with the libraries in the Python environment} Refer to the attached conda environment for library versions, ensure compatibility of the script with the environment.
If there is anything uncertain, ask me first before you proceed.
**What the script should do**
1. Estimate memory usage of the run using {library used to measure memory usage}.
2. Get the filenames of {required input files, e.g. topology (e.g. topology.pdb), trajectory (e.g. trajectory.xtc), and weight (e.g. weights.xvg)}; also {required parameters, e.g. residue ID to be analyzed, temperature in K}, and {optional input files and parameters} from command-line options. {Personal preference: keep all default parameters together at the top for easy editing} All the default variables should be defined at the beginning of the script.
3. {How to read the input files, e.g. how receptor and ligand are defined, which column of the xvg file to use}
4. {How to handle missing files, e.g. default values or a warning; preprocessing, e.g. align trajectory on protein C-alpha}
5. {Analyses to perform, e.g. calculate CV using the equation in the attached reference; which files to output, e.g. plots and data files}
6. {Plotting parameters, e.g. font size, colors, labels, error bars}
7. {Optional analyses and outputs}
Finally,
8. {Contents of the log file} Log all steps to analysis.log, report the memory usage of the run after all the tasks have finished.
9. {Write errors to the log for easier troubleshooting} Catch all errors and log them in analysis.log.
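
For orientation, here is a minimal sketch of the kind of skeleton that steps 1, 2, 8 and 9 of the template ask for. It is not the script the AI produced. Everything in it is an assumption made for illustration: the flag names, the default values, the choice of the standard-library tracemalloc module as the memory-usage library, and the placeholder run_analysis function. tracemalloc only sees allocations made through Python's allocator, so treat the reported peak as a rough figure.

```python
import argparse
import logging
import tracemalloc

# All defaults live at the top so they are easy to find and edit (hypothetical values).
DEFAULT_TOPOLOGY = "topology.pdb"
DEFAULT_TRAJECTORY = "trajectory.xtc"
DEFAULT_RESID = 1
DEFAULT_TEMPERATURE_K = 300.0
DEFAULT_LOG = "analysis.log"


def parse_args(argv=None):
    """Read input file names and parameters from command-line options."""
    parser = argparse.ArgumentParser(description="Trajectory analysis skeleton")
    parser.add_argument("--top", default=DEFAULT_TOPOLOGY, help="topology file")
    parser.add_argument("--traj", default=DEFAULT_TRAJECTORY, help="trajectory file")
    parser.add_argument("--resid", type=int, default=DEFAULT_RESID, help="residue ID to analyze")
    parser.add_argument("--temp", type=float, default=DEFAULT_TEMPERATURE_K, help="temperature in K")
    parser.add_argument("--log", default=DEFAULT_LOG, help="log file")
    return parser.parse_args(argv)


def run_analysis(args):
    """Placeholder for the real analysis; it only logs what it would do."""
    logging.info("Would analyze residue %d from %s / %s at %.1f K",
                 args.resid, args.top, args.traj, args.temp)
    return {"resid": args.resid, "temperature": args.temp}


def main(argv=None):
    args = parse_args(argv)
    logging.basicConfig(filename=args.log, level=logging.INFO,
                        format="%(asctime)s %(levelname)s %(message)s",
                        force=True)
    tracemalloc.start()
    result = None
    try:
        result = run_analysis(args)
    except Exception:
        # Step 9: the full traceback goes to the log instead of being lost.
        logging.exception("Analysis failed")
    finally:
        # Step 8: report memory usage once all tasks have finished.
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        logging.info("Peak traced memory: %.2f MiB", peak / 2**20)
    return result
```

To try it, call for example main(["--resid", "42", "--temp", "310"]), or call main() under the usual `if __name__ == "__main__":` guard when saving it as a script. Comparing the generated script against a skeleton like this makes it quicker to spot which requirement the AI skipped.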


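3.2 Coding agent
Notes:

Just state which files to use for testing and let it run.
To be safe, do not let the agent analyze large production trajectories directly, to avoid memory leaks or accidental deletion, and set the command permissions properly. Even top-tier LLMs can accidentally delete or overwrite files.
The prompt below was tested with opencode and budget models from openrouter; in total 19M tokens were used to complete the script, debug it, optimize the plot style, and so on.
debugger, planner, researcher etc. are third-party agents from GitHub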

# Aim
Write a python script to analyze trajectory and gmx_MMPBSA results of [my system]. The following functions should be included:
0. load topology and trajectory, select only atoms in trajectory from topology to load successfully, align frames on receptor backbone
1. rmsd: [what to calculate rmsd for], output raw csv data and png plots (overlap both lines on same plot)
2. [analysis 2] output csv and png plot.
3. [analysis 3] output 2 plots, csv and png plot.
4. [analysis 4] slightly adjusted mmpbsa plots of the same style as in gmx_mmpbsa_ana (e.g. energy, decomposition...) but without the need for a gui
# Workflow
This is complicated multi-stage work. split into research stage, plan stage, execute stage, check stage. Spawn subagents for that
1. research on how to do aims 0-4 + combine. write research context to file
2. plan: read research context, write multiple small plans, check plans
3. execute: execute plans to complete the script
4. check: test, debug the complete script.
# Explanation of test files
This directory has the processed trajectory and topology for mmpbsa, and results of mmpbsa calculation.
## system topology and trajectory
[topology name] - full topology of the solvated system
[trajectory name] stripped trj with only receptor and ligand
[top itp and ndx]
## scripts for trj processing and running gmx_MMPBSA
[scripts and input]
## Results of mmgbsa
[mmpbsa outputs]
# Behavior
* Do not use the rm command, do not edit files listed above
* be professional, clear, concise. be critical of your own work when checking.
* generalize so the code works on [type of system]
- take cmd flags, and define default variables at the top of the script
- do not hardcode paths
- no double quotes around variables that are paths
- the code should be efficient and consistent. Replace for loops with vectorization for speed. use only libraries in the current environment.
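
To illustrate what "replace for loops with vectorization" means in practice, here is a minimal sketch using per-frame RMSD as the example. It assumes the coordinates are already loaded into a NumPy array of shape (n_frames, n_atoms, 3) and already aligned on the reference. The function names and the random test data are made up for illustration and are not output from the agent.

```python
import numpy as np


def rmsd_loop(coords, ref):
    """Per-frame RMSD with an explicit Python loop (slow reference version)."""
    out = []
    for frame in coords:
        diff = frame - ref
        out.append(np.sqrt((diff ** 2).sum() / len(ref)))
    return np.array(out)


def rmsd_vectorized(coords, ref):
    """Per-frame RMSD for all frames at once via broadcasting.

    coords: (n_frames, n_atoms, 3); ref: (n_atoms, 3).
    Frames are assumed to be aligned on the reference already.
    """
    diff = coords - ref  # ref broadcasts across the frame axis
    return np.sqrt((diff ** 2).sum(axis=(1, 2)) / ref.shape[0])


# Synthetic data: 200 frames of 50 atoms jittered around the reference.
rng = np.random.default_rng(0)
ref = rng.normal(size=(50, 3))
coords = ref + rng.normal(scale=0.1, size=(200, 50, 3))

# Both versions must agree; the loop version is kept only as a check.
assert np.allclose(rmsd_loop(coords, ref), rmsd_vectorized(coords, ref))
```

Keeping a slow loop version around as a cross-check is a cheap way to verify that a vectorized rewrite by the agent did not change the numbers.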


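4. Closing remarks

The above are my personal notes on using AI to help write small analysis scripts. If you have useful tricks of your own, feel free to share and discuss them in the replies, and please point out any mistakes.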

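neocc · Posted on 2025-8-2 02:20:12

Has anyone here recently tried having claude code/gemini cli build MD structures? I had prepared pdb and itp files for the various molecules and told cc to read the key information from the pdb files with mdanalysis and build the model with packmol, and the result was a complete mess. Is my prompt unsuitable?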

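student0618 · Posted on 2025-8-2 03:59:39

Last edited by student0618 on 2025-8-3 02:12

My precondition for using AI is that handing it the task actually improves efficiency. For more specific tasks, or scripts that take less than a minute to write, doing it myself is faster; asking AI means first organizing the logic and writing a prompt, which ends up slower.

For model building and running simulations, doing it by hand is currently still faster and better, and for complex systems in particular it is safer. Moreover, even for widely used software with an HTML manual such as gmx, AI very often mixes up commands and versions, so you are better off reading the manual and searching for keywords with Ctrl-F. For these tasks, using AI often lowers efficiency.
For other related comments on the forum, see http://bbs.keinsci.com/thread-54071-1-1.html and http://bbs.keinsci.com/thread-52382-1-1.html

Tasks where I currently feel comfortable using AI, and where it really does noticeably improve efficiency, include:

Having it write Python scripts that analyze data with libraries such as numpy and scipy and plot with matplotlib or seaborn. These all have massive amounts of public material as training data. More specialized tasks, such as writing an mdp file, still do not work well.
Giving it a script that analyzes simulation data from MD software A and having it convert the script to analyze data from MD software B. The data formats or sample files for both A and B need to be attached.
Giving the AI my own pseudocode, plan outline or research direction and having it ask what is still missing. It can often find things to improve and spark new ideas.
Giving it a specified goal and discussing the code framework and the pros and cons of different approaches and ways of writing it.
Talking through research ideas, consolidating information, and first-pass screening for a literature review.
Venting to it for emotional support, to avoid wearing myself down. This requires the right AI in the right mode; otherwise some AIs are more insincere than humans, or just tell you textbook-style to document everything, or repeat a few boilerplate encouragements. I sometimes want the AI to imitate a teacher I rarely contact these days and offer some encouragement; for that I wrote at least a thousand characters of background, tone, example sentences and expected reactions in different situations.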

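student0618 · Posted on 2025-8-2 14:11:56

Last edited by student0618 on 2025-8-2 14:13

neocc posted on 2025-8-2 02:20
Has anyone here recently tried having claude code/gemini cli build MD structures? I had prepared pdb and itp files for the various molecules and told cc ...

And if you really want to try, the steps of the build process, the various if/else settings and so on would most likely all have to be written out in full before it works, which may well mean writing even more text.

You are better off writing it yourself and working out the kinks on one system first. Once it works, if you need to repeat the build for similar systems, write a template input and edit it each time, or automate it with a bash script (e.g. sed to change file names, variables to define how many molecules to add).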
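
The template-input idea can be sketched as follows. The post suggests bash and sed; this sketch does the same substitution in Python with string.Template, only to stay consistent with the other examples in the thread. The Packmol keywords are standard, but the file names, molecule counts, box size and tolerance are all hypothetical placeholders, so adjust them after the first system has been built and checked by hand.

```python
from string import Template

# Hypothetical template: file names, tolerance and the cubic box are placeholders.
PACKMOL_TEMPLATE = Template("""\
tolerance 2.0
filetype pdb
output ${output}

structure ${solute}
  number ${n_solute}
  inside box 0. 0. 0. ${box} ${box} ${box}
end structure

structure ${solvent}
  number ${n_solvent}
  inside box 0. 0. 0. ${box} ${box} ${box}
end structure
""")

# One entry per system to build; only the values that differ are listed.
SYSTEMS = {
    "sys_a": {"solute": "ligand_a.pdb", "n_solute": 20, "n_solvent": 1000},
    "sys_b": {"solute": "ligand_b.pdb", "n_solute": 40, "n_solvent": 800},
}


def render_inputs(systems, solvent="water.pdb", box=40.0):
    """Return {input file name: input text} for every system."""
    rendered = {}
    for name, params in systems.items():
        # substitute() raises KeyError if a placeholder is left unfilled.
        text = PACKMOL_TEMPLATE.substitute(
            output=f"{name}.pdb", solvent=solvent, box=box, **params
        )
        rendered[f"{name}.inp"] = text
    return rendered


inputs = render_inputs(SYSTEMS)
print(inputs["sys_a.inp"])
```

Writing each rendered text to its .inp file and running Packmol on it is left out on purpose, so the sketch has no side effects.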

Graphite · Posted on 2025-8-26 21:08:52

Say what you will, with prompt writing like this you could be a product manager

Though human programmers would find you annoying

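student0618 · Posted on 2025-8-26 22:28:37

Last edited by student0618 on 2025-8-27 03:30

Graphite posted on 2025-8-26 21:08
Say what you will, with prompt writing like this you could be a product manager. Though human programmers would find you annoying

A lot of it came from trial and error (yes, I mean that line "only load the trajectory once"; thanks to the code Copilot gave me back then, which read the trajectory all over again for every analysis added......)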
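
The "only load the trajectory once" pattern can be sketched as follows: every analysis is a function of a single frame, and one loop over the trajectory feeds each frame to all of them. This is a minimal sketch. The fake_trajectory generator is a made-up stand-in for a real trajectory reader, and the two toy analyses are hypothetical.

```python
def analyze_single_pass(frames, analyses):
    """Feed every frame to every analysis while reading the trajectory once."""
    results = {name: [] for name in analyses}
    for frame in frames:  # the only loop over the trajectory
        for name, func in analyses.items():
            results[name].append(func(frame))
    return results


def fake_trajectory(n_frames, counter):
    """Stand-in for a trajectory reader; counts how many frames are read."""
    for i in range(n_frames):
        counter["reads"] += 1
        # Two "atoms" per frame, given as (x, y, z) tuples.
        yield [(float(i), 0.0, 0.0), (0.0, float(i), 0.0)]


def center_x(frame):
    """Toy analysis: mean x coordinate of the frame."""
    return sum(atom[0] for atom in frame) / len(frame)


def max_y(frame):
    """Toy analysis: largest y coordinate in the frame."""
    return max(atom[1] for atom in frame)


counter = {"reads": 0}
results = analyze_single_pass(
    fake_trajectory(5, counter),
    {"center_x": center_x, "max_y": max_y},
)
print(counter["reads"], results["center_x"])
```

Adding a third analysis means adding one more entry to the dictionary; the number of frame reads stays the same.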

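I have recently been planning to take the several hundred Python scripts that a teacher has written since starting their PhD, convert them to Python 3, and use them to train gpt-oss-20b, to see whether that can make things more automated and efficient.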

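student0618 · Posted on 2026-1-22 10:07:36

I recently found that opencode works very well: it is free and open source, runs out of the box after download, and there is no more need for the cumbersome chat interface.
I am keeping this post only as a record; apart from some of the logic, it no longer has practical use.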

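Uus/pMeC6H4-/キ · Posted on 2026-1-22 13:00:58

student0618 posted on 2026-1-22 10:07
I recently found that opencode works very well: it is free and open source, runs out of the box after download, and there is no more need for the cumbersome chat interface.
I am keeping this post only as a record ...

At least in terms of prompt engineering, this post is still a useful reference.

On the subject of interfaces for AI coding (not the AI models themselves): if you already have VS Code, adding the Cline extension is also a good option. There is no extra software to install, and when run with permissions in a suitable environment it can show the whole workflow inside VS Code: thinking and planning, writing code, setting up dependencies, test runs and troubleshooting.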

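方方方 · Posted on 2026-1-22 13:53:53

Uus/pMeC6H4-/キ posted on 2026-1-22 13:00
At least in terms of prompt engineering, this post is still a useful reference.

On the subject of interfaces for AI coding (not the AI models themselves): if you already have VS Cod ...

I recommend Claude code + GLM, it is really pleasant to use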

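student0618 · Posted on 2026-1-22 21:32:32

Last edited by student0618 on 2026-1-23 02:53

Uus/pMeC6H4-/キ posted on 2026-1-22 13:00
At least in terms of prompt engineering, this post is still a useful reference.

On the subject of interfaces for AI coding (not the AI models themselves): if you already have VS Cod ...

Quite a few people in my group use vscode too, but I am used to writing directly in vim and find it hard to adapt to an IDE. The main reason I use the opencode cli/tui is so that I can run it remotely from termux on my phone.

If you have an existing codebase, /init automatically analyzes it and generates the project's AGENTS.md; with a little tidying, that can replace most of the original post.

Of course, getting comfortable with an agent also takes some time, but the focus shifts to things like when to log and when not to, stopping a certain LLM from opening every response with "Perfect" (which is irritating), and burning through the token quota too fast without noticing.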

student0618 · Posted on 2026-3-14 00:14:22

Last edited by student0618 on 2026-3-23 00:59

Admins, please delete this post

student0618 · Posted on 2026-3-23 00:35:15

Update, 2026-03-23: added a template example for coding agents and updated the comments to reflect recent trends.
