LetsQu1t posted on 2025-8-2 17:52

Thank you for your comment! These are good points to discuss.

When running the comparisons with and without SMT enabled, all logical cores (twice the number of physical cores) were utilised by running multiple simultaneous ORCA calculations, each parallelised with 4 nprocs, so the Linux scheduler handles the workload without ORCA throwing any errors. My investigations took me down a path of extracting more performance from the hardware by committing more simultaneous calculations, each parallelised across fewer CPU cores, especially since, as you noted, individual calculations don't tend to parallelise efficiently beyond 16-32 cores (regardless of whether SMT is enabled).

My two-fold hypothesis on the relevance of this strategy is as follows:

1. Continued advancements in the efficiency of scheduling with SMT and Linux have ironed out a lot of issues with committing tasks according to logical cores rather than physical cores. Still, in line with what Tian Lu notes in his blog post, the performance increase (if there is one) is fairly marginal.
2. CPU core clocks have increased considerably in these high-core-count chips, and they boost reliably under all-core workloads relevant to computational chemistry. My suspicion is that because the CPUs operate at higher frequencies, bottlenecks in parallelisation overhead are exposed, meaning that optimal parallelisation efficiency occurs at a lower number of CPU cores per calculation. This is also supported by some of my more qualitative observations using older-generation AMD EPYC 7002 and 7003 hardware, as well as the newer EPYC Turin 9005 chips.

Lastly, other bottlenecks associated with committing a large number of tasks according to the number of logical cores are mitigated here, as these systems have adequate RAM and powerful cooling/power delivery. Nonetheless, these might be valid factors for others to take into consideration.
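For readers who want to try the "many small jobs" strategy described above, here is a minimal Python sketch. It is not the script used for the benchmarks. The function names (`concurrent_jobs`, `run_one`, `run_batch`) are hypothetical, and the sketch assumes each job is an external command that already limits itself to `nprocs_per_job` cores (for ORCA, via `%pal nprocs 4 end` in the input file). No core pinning is done; placement is left to the Linux scheduler, as in the post.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def concurrent_jobs(total_cores, nprocs_per_job):
    """Number of jobs that fit side by side without oversubscribing cores."""
    if nprocs_per_job < 1 or total_cores < nprocs_per_job:
        raise ValueError("need 1 <= nprocs_per_job <= total_cores")
    return total_cores // nprocs_per_job


def run_one(job):
    """Run one (command, log_path) pair and return its exit code."""
    cmd, log_path = job
    with open(log_path, "w") as log:
        # Output is redirected to a file, as is usual for ORCA runs.
        return subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT).returncode


def run_batch(jobs, total_cores=None, nprocs_per_job=4):
    """Run all jobs, keeping at most total_cores // nprocs_per_job active."""
    if total_cores is None:
        # os.cpu_count() reports logical CPUs, so SMT threads are counted.
        total_cores = os.cpu_count() or nprocs_per_job
    width = concurrent_jobs(total_cores, nprocs_per_job)
    # Threads only wait on child processes here, so a thread pool is enough.
    with ThreadPoolExecutor(max_workers=width) as pool:
        return list(pool.map(run_one, jobs))
```

As a usage illustration, a call such as `run_batch([(["/path/to/orca", "mol1.inp"], "mol1.out")], nprocs_per_job=4)` would queue one job; the ORCA path and file names here are placeholders. On a machine reporting 128 logical cores, `concurrent_jobs(128, 4)` gives 32 simultaneous 4-core calculations.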
This post was last edited by LetsQu1t on 2025-8-2 09:58

Hi David,

Many thanks for sharing this work. I just want to make a few quick comments on hyperthreading and parallelisation efficiency.

I'm curious that, when SMT was disabled, calling more CPUs than the number of physical cores still resulted in a normal termination of your calculation. To my knowledge, this should lead to an immediate abort of the ORCA job, as the system would never be able to locate the extra cores (those exceeding the number of physical cores, which equals the number of logical threads when SMT is off).

Meanwhile, you conclude that the maximum efficiency was achieved by enabling SMT and specifying all threads in the input file. This, however, is widely regarded as poor practice. Tian Lu has posted a full thread about this (see http://sobereva.com/392). When the number of cores requested exceeds the number of physical cores (with HT/SMT), the performance plunges without exception, as you can tell from his numbers. In addition, disabling SMT can improve performance if it raises the all-core turbo clock speed (common for older Intel chips, and also when the motherboard's power delivery is not robust).

The fact that you started to observe pronounced efficiency loss when requesting > 24 cores was also surprising to me. A 58-atom closed-shell system with RI-DFT and a double-zeta basis set constitutes a reasonably sized, not particularly small job, and I was expecting the turning point at a number > 24.

Sincerely
含光君 posted on 2025-5-8 16:21

It is my pleasure! To make charts, I use various combinations of Python, Origin Pro, MS Excel, and Adobe Illustrator. Most of the 'custom' modifications are done in Adobe Illustrator, as it is powerful vector graphics software. I would expect any open-source vector graphics alternative to work well too.
Participants 1 | eV +5 | Reason |
---|---|---|
| + 5 | Thanks |
Thanks for sharing! Very inspiring and helpful work! The charts in this report look quite nice; could you please share how you plotted them? Did you use Python or other applications?