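# pdcs-hw1

**Repository Path**: doki-doki/pdcs-hw1

## Basic Information

- **Project Name**: pdcs-hw1
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2023-11-19
- **Last Updated**: 2023-11-20

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Assignment 1

20337218 刘述赜

## Machine Configuration

![Alt text](image.png)

## Source Code

```
import time

import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()

# Matrix size
N = 4096

# Rows per process: the first `remain` ranks take one extra row
base, remain = divmod(N, size)
rows = [base + 1 if r < remain else base for r in range(size)]
local_n = rows[rank]

# Element counts and element offsets of each rank's row block (for Scatterv/Gatherv)
counts = [r * N for r in rows]
displs = [sum(counts[:r]) for r in range(size)]

# Rank 0 initializes A and B; the other ranks only allocate a receive buffer for B
if rank == 0:
    time_start = time.time()  # record the start time
    print("Number of processes: %d" % size)
    A = np.random.rand(N, N)
    B = np.random.rand(N, N)
else:
    A = None
    B = np.empty((N, N), dtype='float64')

# Local row block of A and the matching row block of the result
local_A = np.empty((local_n, N), dtype='float64')
local_C = np.zeros((local_n, N), dtype='float64')

# Send each rank only its own rows of A, and broadcast all of B
sendbuf = [A, counts, displs, MPI.DOUBLE] if rank == 0 else None
comm.Scatterv(sendbuf, local_A, root=0)
comm.Bcast(B, root=0)

# Wait until every process has received its data
comm.Barrier()

# Multiply the local rows by B (element by element; equivalent to local_A @ B)
for i in range(local_n):
    for j in range(N):
        local_C[i][j] = np.sum(local_A[i, :] * B[:, j])

# Gather the row blocks from all processes into C on rank 0
C = None
if rank == 0:
    C = np.empty((N, N), dtype='float64')
recvbuf = [C, counts, displs, MPI.DOUBLE] if rank == 0 else None
comm.Gatherv(local_C, recvbuf, root=0)

if rank == 0:
    print("Result C:")
    print(C)
    time_end = time.time()  # record the end time
    time_sum = time_end - time_start  # elapsed execution time in seconds
    print("time:" + str(time_sum))
```

## Run Command

![Alt text](image-1.png)

## Performance Charts

![](Figure_1.png)
![](Figure_2.png)
![](Figure_3.png)
![](Figure_4.png)
![](Figure_5.png)

## Analysis

In the charts above, the horizontal axis is the number of processes and the vertical axis is the running time.

For the smaller matrices, 8 processes give the shortest running time. As the matrix grows, 16 processes become the fastest configuration.

## Conclusion

The experiment shows that the optimal number of processes increases with the matrix size. Both too many and too few processes degrade performance: with too few, each process has to compute a large share of the rows, while with too many, the cost of distributing the data and gathering the results presumably outweighs the time saved on computation.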
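
The row split used in the source code (the first `N % size` ranks take one extra row) can be checked without launching MPI. The following is a minimal sketch, not part of the original submission: `row_partition` and `blockwise_matmul` are hypothetical helper names introduced here, and the "ranks" are simulated serially in one process with NumPy. It computes how many rows each rank would own and at which row its block starts, then confirms that stacking the per-rank products reproduces the full product.

```python
import numpy as np


def row_partition(n, size):
    """Return (rows, offsets): rows owned by each rank and the first row index of each rank."""
    base, remain = divmod(n, size)
    # The first `remain` ranks take one extra row so that all n rows are covered.
    rows = [base + 1 if r < remain else base for r in range(size)]
    offsets = [sum(rows[:r]) for r in range(size)]
    return rows, offsets


def blockwise_matmul(A, B, size):
    """Simulate the per-rank work serially: each 'rank' multiplies its row block of A by B."""
    rows, offsets = row_partition(A.shape[0], size)
    blocks = [A[off:off + cnt, :] @ B for cnt, off in zip(rows, offsets)]
    # Stacking the blocks in rank order plays the role of the final gather step.
    return np.vstack(blocks)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.random((10, 10))
    B = rng.random((10, 10))
    print(row_partition(10, 4))
    print(np.allclose(blockwise_matmul(A, B, 4), A @ B))
```

Multiplying each entry of `rows` and `offsets` by `N` gives element counts and displacements of the kind that variable-size collectives such as mpi4py's `Scatterv` and `Gatherv` expect, which is one way to support process counts that do not divide the matrix size evenly.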