计算化学公社 (Computational Chemistry Commune)

Views: 351 | Replies: 4

[Usage experience] What metrics are currently available for evaluating CPU/GPU pairing bottlenecks?


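618 (the June 18 shopping festival) is coming up, and I have been reading a lot of build threads lately. Power supplies, RAM, motherboards and so on are well-worn topics; you grumble about profiteering vendors and move on. But GPU-accelerated computing has become increasingly common in recent years, and more and more scientific software now supports GPU computing. So a new question keeps coming up in discussions: how should the CPU and the GPU be paired?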

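A friend of mine compared CUDA and MKL and found that CUDA gives higher accuracy than MKL, and that cuDSS converges more easily than PARDISO. So I think GPU acceleration is long past the "elephant in the room" stage. At this point, any build should take into account the speedup a GPU might bring to the work. In my own experience, a bad pairing performs disastrously. When I bought my laptop I only considered CPU performance, which was quite good. But once CUDA and cuDSS acceleration enter the picture, its CPU cannot keep up with the GPU at all, to the point that the GPU spends much of the computation not fully loaded.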

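So what metrics can we refer to right now to determine where a configuration's bottleneck lies? For gamers, with the display resolution fixed, the bottleneck can be worked out. But for headless work like scientific computing, is there a comparable metric for locating the bottleneck? I think it is related to CPU frequency, core count, and GPU frequency and throughput (Ops/Sec), but I don't know how exactly it should be calculated. And how should the FPS figures published by 老黄 (Jensen Huang of NVIDIA) be interpreted?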

I hope someone knowledgeable can clear this up for me. Discussion is welcome, as is pointing out anything I have taken for granted!
Superiora de inferioribus, inferiora de superioribus, prodigiorum operatio ex uno, quemadmodum omnia ex uno eodemque ducunt originem, una eademque consilii administratione.

2#
Posted on 2026-6-4 06:35:07
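Different fields and programs often place very different demands on the CPU/GPU pairing. Within each field, what counts as a sensible pairing has long been settled, with plenty of user benchmarks and experience-based discussion available. Spending a few hours reading hardware-configuration material for the relevant field is basically enough to get the general picture and avoid pitfalls; for example, for molecular dynamics see this forum, for running local LLMs see the 抡锤者 forum, and so on. I don't think a so-called unified standard is really needed. If you insist, you can make do with checking GPU utilization/power draw with nvidia-smi or nvtop, which is one way to judge how fully the GPU is being used, although it does not tell the whole story.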
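As a rough illustration of the nvidia-smi check mentioned above, the following is a minimal Python sketch that reads GPU utilization and power draw. The helper names `parse_smi_line` and `sample_gpus` are made up for this example; the `--query-gpu` and `--format` options are standard nvidia-smi options, and the sketch assumes nvidia-smi is on the PATH.

```python
import subprocess

# Properties requested from nvidia-smi via --query-gpu.
QUERY = "utilization.gpu,power.draw"


def parse_smi_line(line):
    """Parse one CSV line such as '87, 215.30' into (util_percent, power_watts).

    Fields reported as unavailable (for example '[N/A]') become None.
    """
    values = []
    for field in line.split(","):
        field = field.strip()
        try:
            values.append(float(field))
        except ValueError:
            values.append(None)
    return tuple(values)


def sample_gpus():
    """Return one (utilization, power) tuple per GPU visible to nvidia-smi."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [parse_smi_line(line) for line in result.stdout.splitlines() if line.strip()]
```

Calling `sample_gpus()` periodically while a job runs gives a crude picture of whether the GPU is kept busy; as the post says, utilization alone does not tell the whole story.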

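A claim like "CUDA is more accurate than MKL" is unreliable; at the very least, a reliable claim would not be phrased so crudely and vaguely. First, on consumer GPUs, CUDA GPU acceleration is commonly run in FP16 (and BF16) or even lower precision, whereas in scientific computing MKL is mainly used on the CPU in double precision. On that point alone, the claim that CUDA is more accurate does not hold. Moreover, the MKL math library supports a huge number of subroutines/functions, and nothing was said about which function of which math library under the CUDA framework is being compared against it. MKL is such a robust, classic, continually developed, and well-maintained math library that if it really had an accuracy problem, it would generally have been fixed long ago.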
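To make the precision gap between these formats concrete, here is a small Python sketch, not tied to CUDA or MKL, that stores the value 0.1 in IEEE 754 half, single, and double precision using the standard `struct` module and measures the relative error. The helper names are illustrative only, and BF16 is left out because `struct` has no format code for it.

```python
import struct


def round_trip(value, fmt):
    """Store a Python float in the given IEEE 754 format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def relative_error(value, fmt):
    """Relative error introduced by storing `value` in format `fmt`."""
    return abs(round_trip(value, fmt) - value) / abs(value)


# struct format codes: 'e' = half, 'f' = single, 'd' = double precision.
FORMATS = {"FP16": "<e", "FP32": "<f", "FP64": "<d"}

errors = {name: relative_error(0.1, fmt) for name, fmt in FORMATS.items()}
for name, err in errors.items():
    print(f"{name}: relative error when storing 0.1 = {err:.3e}")
```

The FP64 error is zero here only because the reference value is itself a Python double; the point is the ordering of the three errors, not the absolute numbers.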

3#
Original poster | Posted on 2026-6-4 10:30:10
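sobereva wrote on 2026-6-4 06:35:
Different fields and programs often place very different demands on the CPU/GPU pairing. Within each field, what counts as a sensible pairing has long been settled, with plenty of user ...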

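I did leave out a lot about the solution accuracy. This is mainly because it is my friend's research and I don't know how much of it has been disclosed, so I shouldn't say too much. His point is that, with both using double-precision units, computing AB+C incurs two roundings with MKL, whereas CUDA, thanks to hardware optimization, incurs only one, hence the higher accuracy. He also ran some related tests, and judging from the results, cuDSS does converge somewhat better.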
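The "one rounding versus two" distinction described above is what a fused multiply-add does in general. The following Python sketch illustrates the arithmetic effect only: it emulates a single rounding with exact rational arithmetic from the standard `fractions` module, and it makes no claim about how any particular MKL or CUDA routine actually evaluates AB+C. The function names and input values are chosen for illustration.

```python
from fractions import Fraction


def two_roundings(a, b, c):
    """Ordinary double arithmetic: round after the multiply, then after the add."""
    return a * b + c


def one_rounding(a, b, c):
    """Emulated fused multiply-add: compute a*b+c exactly, then round once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# Inputs chosen so that the exact product 1 - 2**-60 rounds to 1.0 in double.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

separate = two_roundings(a, b, c)
fused = one_rounding(a, b, c)
print(f"two roundings: {separate!r}")
print(f"one rounding:  {fused!r}")
```

With these inputs the separately rounded result loses the small residual entirely, while the singly rounded result keeps it. Whether that difference matters for solver convergence depends on the problem and is not shown by this sketch.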

4#
Posted on 2026-6-6 08:22:28
The interplay between GPU and CPU performance for GPU accelerated scientific computing workflows is extremely variable from task to task, and very difficult to predict accurately in advance.

My advice is to always benchmark and test your workflows on multiple configurations before investing (usually a lot of) money into hardware to run it. This is very straightforward and inexpensive these days: you can find instances for almost any type of GPU (with a range of different CPU/platform configurations too) on these now very popular GPU rental platforms.
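A minimal sketch of what such a benchmark harness could look like in Python is shown here. The `benchmark` function and its parameters are hypothetical names for this example, and `workload` stands in for whatever launches your real job. For GPU code, the workload must wait for the device to finish before returning, otherwise only the launch overhead is timed.

```python
import statistics
import time


def benchmark(workload, repeats=5, warmup=1):
    """Return the median wall-clock time in seconds of `workload()`.

    Warm-up runs are executed first and discarded, so one-off costs such as
    library initialization do not distort the measurement.
    """
    for _ in range(warmup):
        workload()
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()  # for GPU work, this call must block until the device is done
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def example_workload():
    """Stand-in CPU task; replace with the real workflow being evaluated."""
    return sum(i * i for i in range(100_000))


median_seconds = benchmark(example_workload)
print(f"median runtime: {median_seconds:.6f} s")
```

Running the same harness unchanged on each rented instance keeps the comparison between configurations like-for-like.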

The premise of this thread is correct: lots of new GPU accelerated scientific workflows are being introduced, as well as domain-specific ML/data pipelines, and very little is known about their performance on different hardware configurations (which are constantly changing too). I have always found interesting surprises when benchmarking my workflows on different GPU instances, and this due diligence has saved me thousands of dollars specifying my own hardware stack.

Because GPUs vary enormously in price, you can be making a very expensive mistake if the additional performance becomes bottlenecked by other factors (CPU performance, memory bandwidth/latency, PCIe bus). Even within GPU specifications, sometimes much cheaper cards can perform just as well as more expensive ones, because GPU-specific performance demands differ between workloads.
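One standard way to see why a faster GPU can fail to pay off is Amdahl's law: if only a fraction p of the runtime is spent on the GPU, speeding that part up by a factor s gives an overall speedup of 1 / ((1 - p) + p / s). The sketch below uses made-up numbers purely for illustration; in practice p has to come from profiling the actual workflow.

```python
def amdahl_speedup(gpu_fraction, gpu_speedup):
    """Overall speedup when only `gpu_fraction` of the runtime gets faster.

    gpu_fraction: share of the original runtime spent in GPU work (0..1).
    gpu_speedup:  how many times faster the new GPU runs that share.
    """
    if not 0.0 <= gpu_fraction <= 1.0:
        raise ValueError("gpu_fraction must be between 0 and 1")
    if gpu_speedup <= 0.0:
        raise ValueError("gpu_speedup must be positive")
    return 1.0 / ((1.0 - gpu_fraction) + gpu_fraction / gpu_speedup)


# Hypothetical workflow: 60% of the runtime is GPU work, the rest is
# CPU-side work and data transfer that a new GPU does not accelerate.
for factor in (2.0, 4.0, 100.0):
    overall = amdahl_speedup(0.6, factor)
    print(f"GPU {factor:>5.0f}x faster -> whole job {overall:.2f}x faster")
```

In this hypothetical case the overall speedup can never reach 2.5x however fast the GPU is, because the remaining 40% of the runtime is untouched.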

If more helpful discussions surface, then the knowledge base online will certainly grow, but for now, it is imperative to test your own GPU accelerated workflows to understand how they interact with different hardware configurations.


5#
Original poster | Posted on 2026-6-7 19:17:43
David_R wrote on 2026-6-6 08:22:
The interplay between GPU and CPU performance for GPU accelerated scientific computing workflows is  ...

So testing it myself is still the only method. My idea was: since this interplay plays a more and more important role nowadays, can we propose a benchmark and use it to indicate performance on scientific jobs?

I mean, renting and testing GPUs before actually buying them is still the only method for regular people. But this is time-consuming and wasteful of resources. The number of possible CPU-GPU pairings can be huge, and testing them one by one takes a lot of time and money. I think the interplay is actually determined by several performance parameters, so can we propose an equation or relationship to help predict the performance?
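One widely used relationship of this kind is the roofline model, which bounds the attainable throughput of a single kernel by min(peak compute, memory bandwidth × arithmetic intensity). It does not capture the full CPU-GPU interplay discussed in this thread, so the Python sketch below is only a first-order estimate, and all hardware numbers in it are hypothetical.

```python
def roofline_flops(peak_flops, mem_bandwidth, arithmetic_intensity):
    """Upper bound on attainable FLOP/s under the roofline model.

    peak_flops:           peak compute throughput in FLOP/s.
    mem_bandwidth:        memory bandwidth in bytes/s.
    arithmetic_intensity: FLOPs performed per byte moved from memory.
    """
    return min(peak_flops, mem_bandwidth * arithmetic_intensity)


def limiting_factor(peak_flops, mem_bandwidth, arithmetic_intensity):
    """Report whether the bound comes from memory traffic or from compute."""
    if mem_bandwidth * arithmetic_intensity < peak_flops:
        return "memory-bound"
    return "compute-bound"


# Hypothetical device: 10 TFLOP/s peak, 500 GB/s memory bandwidth.
PEAK = 10e12
BANDWIDTH = 500e9

for intensity in (0.25, 100.0):
    bound = roofline_flops(PEAK, BANDWIDTH, intensity)
    kind = limiting_factor(PEAK, BANDWIDTH, intensity)
    print(f"intensity {intensity:>6.2f} FLOP/byte -> {bound:.3e} FLOP/s ({kind})")
```

The arithmetic intensity has to be estimated or measured for the specific kernel, which is why a model like this supplements rather than replaces benchmarking the real workflow.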

But I gotta say renting is a very inspiring idea. I had been worried that I can't really buy hardware and then return it if it doesn't work well. Renting solves that problem, so thanks a lot!