计算化学公社

Title: PGI Fortran on GPU. The program fails to use the GPU at run time. Help!

Author: didi_dudu Time: 2017-5-16 14:55
Last edited by didi_dudu on 2017-5-16 15:35

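I recently wanted to try GPU acceleration, so I went through the trouble of installing CUDA and PGI and then tried compiling a PGI Fortran example. It compiles without any problem, but running it always reports failed. From my own testing, it looks like the increment subroutine is never actually called. Could anyone tell me where the problem might lie?
-------------------------------------------------------
Program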

http://blog.csdn.net/slow_jiulong/article/details/53105223

---------------------------------------------------------------------
Information reported by pgaccelinfo
CUDA Driver Version:          8000
NVRM version:                NVIDIA UNIX x86_64 Kernel Module  375.26  Thu Dec  8 18:36:43 PST 2016

Device Number:                0
Device Name:                GeForce GTX 1080
Device Revision Number:       6.1
Global Memory Size:          8507555840
Number of Multiprocessors:    20
Concurrent Copy and Execution: Yes
Total Constant Memory:       65536
Total Shared Memory per Block: 49152
Registers per Block:          65536
Warp Size:                   32
Maximum Threads per Block:    1024
Maximum Block Dimensions:    1024, 1024, 64
Maximum Grid Dimensions:    2147483647 x 65535 x 65535
Maximum Memory Pitch:       2147483647B
Texture Alignment:          512B
Clock Rate:                   1733 MHz
Execution Timeout:          No
Integrated Device:          No
Can Map Host Memory:          Yes
Compute Mode:                default
Concurrent Kernels:          Yes
ECC Enabled:                No
Memory Clock Rate:          5005 MHz
Memory Bus Width:             256 bits
L2 Cache Size:                2097152 bytes
Max Threads Per SMP:          2048
Async Engines:                2
Unified Addressing:          Yes
Managed Memory:             Yes
PGI Compiler Option:          -ta=tesla:cc60

Device Number:                1
Device Name:                GeForce GT 730
Device Revision Number:       3.5
Global Memory Size:          1028128768
Number of Multiprocessors:    2
Number of SP Cores:          384
Number of DP Cores:          128
Concurrent Copy and Execution: Yes
Total Constant Memory:       65536
Total Shared Memory per Block: 49152
Registers per Block:          65536
Warp Size:                   32
Maximum Threads per Block:    1024
Maximum Block Dimensions:    1024, 1024, 64
Maximum Grid Dimensions:    2147483647 x 65535 x 65535
Maximum Memory Pitch:       2147483647B
Texture Alignment:          512B
Clock Rate:                   901 MHz
Execution Timeout:          No
Integrated Device:          No
Can Map Host Memory:          Yes
Compute Mode:                default
Concurrent Kernels:          Yes
ECC Enabled:                No
Memory Clock Rate:          900 MHz
Memory Bus Width:             64 bits
L2 Cache Size:                524288 bytes
Max Threads Per SMP:          2048
Async Engines:                1
Unified Addressing:          Yes
Managed Memory:             Yes
PGI Compiler Option:          -ta=tesla:cc35

Author: didi_dudu Time: 2017-5-16 15:34
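Er, I then found that compiling with pgf90 -Mcuda -ta=tesla:cc60 test.f90 makes it run. The compiler option to use is shown right at the end of each device's entry in the pgaccelinfo output... I really should have read it more carefully.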
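
For anyone who hits the same thing, here is a rough sketch of pulling that option out of pgaccelinfo instead of reading it by eye. It assumes pgaccelinfo prints one "Device Number:" line and one "PGI Compiler Option:" line per device, as in the output posted above. The function name pgi_ta_flag is made up for illustration and is not part of the PGI tools.

```shell
#!/bin/sh
# Hypothetical helper (not a PGI tool): print the "PGI Compiler Option"
# value that pgaccelinfo reports for one device.
# Reads pgaccelinfo output on stdin; $1 is the device number (default 0).
pgi_ta_flag() {
    awk -v dev="${1:-0}" '
        BEGIN                   { cur = -1 }   # no device seen yet
        /^Device Number:/       { cur = $3 }   # remember the current device
        /^PGI Compiler Option:/ { if (cur == dev) { print $4; exit } }
    '
}

# Intended use on a machine with the PGI compilers installed
# (left as comments because it needs pgaccelinfo and pgf90):
#   flag=$(pgaccelinfo | pgi_ta_flag 0)
#   pgf90 -Mcuda "$flag" test.f90
```

With the output above, device 0 (the GTX 1080) would give -ta=tesla:cc60 and device 1 (the GT 730) would give -ta=tesla:cc35, so the flag has to match whichever card the program actually runs on.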

Welcome to 计算化学公社 (http://bbs.keinsci.com/)