I happen to be learning MPI lately, so I wrote a hand-rolled parallel version with mpi4py:
#! /usr/bin/env python
import numpy as np
from mpi4py import MPI

# Initialize MPI environment
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Generate random matrices on rank 0, then broadcast to all ranks
dim = 1000
if rank == 0:
    mat_a = np.random.random((dim, dim))
    mat_b = np.random.random((dim, dim))
else:
    mat_a = np.empty((dim, dim), dtype='float')
    mat_b = np.empty((dim, dim), dtype='float')
comm.Bcast(mat_a, root=0)
comm.Bcast(mat_b, root=0)

# Distribute jobs among processes (round-robin over rows)
if rank == 0:
    jobs = [[i for i in range(dim) if i % size == k] for k in range(size)]
else:
    jobs = None
jobs = comm.bcast(jobs, root=0)

# Multiply the matrices: each rank computes only its own rows
mat_work = np.zeros((dim, dim), dtype='float')
rows = jobs[rank]
for i in rows:
    np.matmul(mat_a[i, :], mat_b, out=mat_work[i, :])

# Collect the results on rank 0 (element-wise sum of the partial products)
if rank == 0:
    prod = np.zeros((dim, dim), dtype='float')
else:
    prod = None
comm.Reduce(mat_work, prod, root=0)

#if rank == 0:
#    print(mat_a)
#    print(mat_b)
#    print(prod)
The servers are all booked up with jobs, so I tried it in a VM (i3-4160).
Multiplying two 1000×1000 matrices took a little over two seconds with two processes; anything larger was too slow to test.
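The row-partitioning scheme can be sanity-checked without an MPI launcher: simulate each rank's share of rows in a plain loop and sum the partial products, which is what `comm.Reduce` (defaulting to `MPI.SUM`) does on rank 0. A minimal NumPy-only sketch; the small sizes here are made up for illustration:

```python
import numpy as np

# Simulate the round-robin row distribution from the MPI script:
# "rank" k owns rows i with i % size == k, and the partial
# products sum to the full matrix product.
dim, size = 8, 3
rng = np.random.default_rng(0)
mat_a = rng.random((dim, dim))
mat_b = rng.random((dim, dim))

partials = []
for rank in range(size):
    rows = [i for i in range(dim) if i % size == rank]
    mat_work = np.zeros((dim, dim))
    for i in rows:
        # Row i of the product is (row i of A) times B
        np.matmul(mat_a[i, :], mat_b, out=mat_work[i, :])
    partials.append(mat_work)

# Equivalent of comm.Reduce(..., root=0) with the default MPI.SUM
prod = sum(partials)
assert np.allclose(prod, mat_a @ mat_b)
```

The MPI script itself is launched the usual way, e.g. `mpiexec -n 2 python script.py`.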