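Last edited by student0618 on 2026-3-23 00:48
0. Preface
Recently I needed to write some Python code for a few fairly niche MD analyses, but I only really know bash; my Python is at the "can read, cannot write" level. In the past, when I had to write a Python script, I would write the pseudocode first and then Google the code line by line. Now AI saves a good deal of time on both debugging and documentation. Of course, I still plan to learn more of the fundamentals when I have time; I am just too busy and have to get the work done first.
Chatting with fellow students, I found that quite a few of them also want AI to help write code but cannot get it to work. Here I share a prompt structure that I found useful through trial and error, in the hope that it helps everyone use these tools more effectively. The generated script still needs adjusting and debugging, but starting from a prompt with complete logic is more efficient and makes problems easier to track down and fix.
Addendum, 23 March 2026: added a template example for coding agents and updated the comments to reflect recent trends.
1. Software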
Chat interface:
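(outdated; agents are the current trend)
I have tried prompts with a similar structure in copilot, chatgpt, gemini and others. At present I mostly use grok3 (free version, think mode), recommended by the forum owner. It seems to capture all of my requirements fairly completely, especially when the analysis is complex. I have not used an AI dedicated to writing code, and I have not paid to try grok4.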
Coding agent:
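For miscellaneous notes on coding agent LLMs, see my post in the off-topic board (Notes on trying coding agents for simple computational chemistry tasks, http://bbs.keinsci.com/thread-58190-1-1.html )
2. Prompt structure
Paragraph 1:
Describe all requirements and goals clearly and in detail
Paragraph 2:
(Chat interface) code outline
(Coding agent) suggested steps for completing the task
Attachments / paragraph 3:
(Chat interface) Attachments: a conda environment yml that includes library versions, and the reference for any formula to be implemented (if there is one; it is even better to extract the formula's latex/mathml code and the protocol section beforehand)
(Coding agent) Paragraph 3: paths of the test files, with explanations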
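If the same three-part structure is reused across many analyses, it can be assembled programmatically. The following is only a minimal sketch: the function name `build_prompt`, the section titles, and the example strings are all made up for illustration and are not tied to any particular LLM tool.

```python
def build_prompt(aim, steps, context):
    """Join the three prompt parts into one text block.

    aim     -- paragraph 1: requirements and purpose
    steps   -- paragraph 2: code outline (chat) or suggested steps (agent)
    context -- part 3: attachment descriptions or test-file paths
    """
    numbered = "\n".join(f"{i}. {s}" for i, s in enumerate(steps, start=1))
    listed = "\n".join(f"- {c}" for c in context)
    return (
        f"# Aim\n{aim.strip()}\n\n"
        f"# Steps\n{numbered}\n\n"
        f"# Context\n{listed}\n"
    )


# Hypothetical usage with placeholder content.
example = build_prompt(
    "Write a python script to analyze a trajectory.",
    ["load topology and trajectory", "compute RMSD", "save csv and png"],
    ["environment.yml: conda environment with library versions"],
)
print(example)
```

Keeping the three parts as separate arguments makes it easy to swap only the aim or only the test-file list between runs.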
3. Template examples
3.1. Chat interface
- **Aim and general**
- Write a python script to perform trajectory analysis using {chosen python libraries, e.g. mdanalysis to load the trajectory, numpy to do numerical calculation, scipy for clustering}. {instructions learned from experience, e.g. Only load the trajectory once. Improve the code efficiency by minimizing the use of for-loops. Clean up any unnecessary repeating actions. Optimize the analysis on a large dataset.} {preferred output file format, e.g. For the data files to save, use spaces instead of comma or tab as separator.} {units used in the calculation, e.g. Use kJ/mol as the free energy unit, nanometer as distance unit, degree as angle unit.}
- {make sure the script is compatible with the libraries in the python environment} Refer to the attached conda environment for library versions, ensure compatibility of the script with the environment.
- If there is anything uncertain, ask me first before you proceed.
- **What the script should do**
- 1. Estimate memory usage of the run using {library used to measure memory usage}.
- 2. get the filename of {required input files, e.g. topology (e.g. topology.pdb), trajectory (e.g. trajectory.xtc), and weight (e.g. weights.xvg)}; also {required parameters, e.g. residue ID to be analyzed, temperature in K}, and {optional input files and parameters} from command line option. {personal preference: keep all default parameters together at the top for easy editing} All the default variables should be defined at the beginning of the script.
- 3. {how to read the input files, e.g. how receptor and ligand are defined, which column of the xvg file to use}
- 4. {how to handle missing files, e.g. default values or a warning; preprocessing, e.g. align trajectory on protein C-alpha}
- 5. {the analysis to perform, e.g. calculate CV using the equation in the attached reference; which files to output, e.g. plots and data file}
- 6. {plotting parameters, e.g. font size, colors, labels, error bars}
- 7. {optional analyses and outputs}
- Finally,
- 8. {content of the log file} Log all steps to analysis.log, report the memory usage of the run after all the tasks have finished.
- 9. {write errors to the log for easier troubleshooting} Catch all the errors and log in analysis.log.
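For orientation, here is a rough sketch of the kind of skeleton that items 1, 2, 8 and 9 of the template describe: defaults grouped at the top, command-line options, logging to a file, error catching, and a memory report at the end. It is not the script any AI actually produced. The flag names, the default values and the `run_analysis` placeholder are assumptions, and `tracemalloc` is just one possible choice for the memory-usage library (it only tracks memory allocated by Python, not the total process footprint).

```python
import argparse
import logging
import tracemalloc

# Defaults grouped at the top so they are easy to edit (template item 2).
DEFAULT_TOPOLOGY = "topology.pdb"
DEFAULT_TRAJECTORY = "trajectory.xtc"
DEFAULT_TEMPERATURE = 300.0  # K, illustrative value only
DEFAULT_LOG = "analysis.log"


def parse_args(argv=None):
    """Read input file names and parameters from the command line."""
    parser = argparse.ArgumentParser(description="Trajectory analysis skeleton")
    parser.add_argument("-s", "--topology", default=DEFAULT_TOPOLOGY)
    parser.add_argument("-f", "--trajectory", default=DEFAULT_TRAJECTORY)
    parser.add_argument("-T", "--temperature", type=float,
                        default=DEFAULT_TEMPERATURE)
    parser.add_argument("--log", default=DEFAULT_LOG)
    return parser.parse_args(argv)


def run_analysis(args):
    """Hypothetical placeholder for the real analysis (template items 3-7)."""
    logging.info("Analyzing %s with %s at %.1f K",
                 args.trajectory, args.topology, args.temperature)
    return sum(x * x for x in range(1000))  # dummy workload


def main(argv=None):
    args = parse_args(argv)
    logging.basicConfig(filename=args.log, level=logging.INFO, force=True,
                        format="%(asctime)s %(levelname)s %(message)s")
    tracemalloc.start()
    status = 0
    try:
        run_analysis(args)
    except Exception:
        # Item 9: write the full traceback to the log instead of crashing.
        logging.exception("Analysis failed")
        status = 1
    finally:
        # Item 8: report peak Python-level memory once everything is done.
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        logging.info("Peak traced memory: %.2f MiB", peak / 2**20)
    return status
```

Calling `main()` with no arguments uses the defaults, while `main(["-f", "other.xtc", "-T", "310"])` overrides them; in a real script the call would sit under an `if __name__ == "__main__":` guard.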
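3.2. Coding agent
Notes:
- Just state which files to use for testing and let it run.
- To be safe, it is best not to let the agent analyze large production trajectories directly, to avoid memory leaks or accidental deletion; command permissions should also be set properly. Even the top LLMs may accidentally delete or overwrite files.
- The prompt below was tested with opencode and budget models from openrouter, using 19M tokens in total to complete the script, debug it, optimize the plot style, and so on.
- debugger, planner, researcher, etc. are third-party agents from GitHub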
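One cheap way to notice accidental deletion or overwriting is to record a checksum of every file in the test directory before the agent runs and compare afterwards. The sketch below is only an illustration under that assumption: `snapshot` and `compare` are hypothetical helper names, and this detects changes rather than preventing them, so it complements proper permissions and working on a copy instead of replacing them.

```python
import hashlib
from pathlib import Path


def snapshot(directory):
    """Map each file's path (relative to directory) to its SHA-256 digest."""
    root = Path(directory)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            h = hashlib.sha256()
            with path.open("rb") as fh:
                # Read in 1 MiB chunks so large trajectories fit in memory.
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    h.update(chunk)
            digests[str(path.relative_to(root))] = h.hexdigest()
    return digests


def compare(before, after):
    """Return (deleted, modified) file lists between two snapshots."""
    deleted = sorted(set(before) - set(after))
    modified = sorted(k for k in before
                      if k in after and before[k] != after[k])
    return deleted, modified
```

Typical use would be `before = snapshot("testdir")` ahead of the agent session and `compare(before, snapshot("testdir"))` after it; files the agent newly created are ignored on purpose.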
- # Aim
- Write a python script to analyze trajectory and gmx_MMPBSA results of [my system]. The following functions should be included:
- 0. load topology and trajectory, select only atoms in trajectory from topology to load successfully, align frames on receptor backbone
- 1. rmsd: [what to calculate rmsd for], output raw csv data and png plots (overlap both lines on same plot)
- 2. [analysis 2] output csv and png plot.
- 3. [analysis 3] output 2 plots, csv and png plot.
- 4. [analysis 4] slightly adjusted mmpbsa plots of the same style as in gmx_mmpbsa_ana (e.g. energy, decomposition...) but without the need of gui
- # Workflow
- This is complicated multi-stage work. Split into research stage, plan stage, execute stage, check stage. Spawn subagents for that.
- 1. research on how to do aims 0-4 + combine. write research context to file
- 2. plan: read research context, write multiple small plans, check plans
- 3. execute: execute plans to complete the script
- 4. check: test, debug the complete script.
- # Explanation of test files
- This directory has the processed trajectory and topology for mmpbsa, and results of mmpbsa calculation.
- ## system topology and trajectory
- [topology name] - full topology of the solvated system
- [trajectory name] stripped trj with only receptor and ligand
- [top itp and ndx]
- ## scripts for trj processing and running gmx_MMPBSA
- [scripts and input]
- ## Results of mmgbsa
- [mmpbsa outputs]
- # Behavior
- * Do not use the rm command, do not edit files listed above
- * be professional, clear, concise. be critical to your own work when checking.
- * generalize so the code works on [type of system]
- - take cmd flags, and define default variables at the top of the script
- - do not hardcode paths
- - no double quotes around variables that are paths
- - the code should be efficient and consistent. Replace for loops with vectorization for speed. use only libraries in the current environment.
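To make the "replace for loops with vectorization" instruction concrete, here is a small sketch comparing a per-frame loop with a broadcast version for RMSD. It assumes the frames are already aligned and stored as a plain numpy array of shape (n_frames, n_atoms, 3); the data are synthetic random numbers and no trajectory library is involved, so the function names are illustrative only.

```python
import numpy as np


def rmsd_loop(coords, ref):
    """Per-frame RMSD with an explicit Python loop (slow reference version)."""
    out = []
    for frame in coords:
        diff = frame - ref
        out.append(np.sqrt((diff ** 2).sum() / len(ref)))
    return np.array(out)


def rmsd_vectorized(coords, ref):
    """Same quantity for all frames at once, using broadcasting."""
    diff = coords - ref  # (n_frames, n_atoms, 3) minus (n_atoms, 3)
    return np.sqrt((diff ** 2).sum(axis=(1, 2)) / ref.shape[0])


# Synthetic stand-in for an aligned trajectory: 200 frames, 50 atoms.
rng = np.random.default_rng(0)
ref = rng.normal(size=(50, 3))
coords = ref + rng.normal(scale=0.1, size=(200, 50, 3))

# Both versions should agree to floating-point tolerance.
print(np.allclose(rmsd_loop(coords, ref), rmsd_vectorized(coords, ref)))
```

Checking a vectorized function against a simple loop on a few frames like this is also a handy test to ask the agent to include.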
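4. Conclusion
This post shares my personal notes on using AI to help write small analysis programs. If you have tricks that work well, feel free to leave a comment to share and discuss, and please point out any mistakes.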