diff --git a/.gitignore b/.gitignore index f5f4ccba45b462f91932967f32d6e62396b20439..0bdf2b68aad5b0ab2f7b6da79ed5e620ef5396c2 100644 --- a/.gitignore +++ b/.gitignore @@ -3,4 +3,7 @@ kperf.data.* env.sh hostfile .vscode -test.* \ No newline at end of file +test.* +porting* +HPC-info* +tmp \ No newline at end of file diff --git a/README.en.md b/README.en.md deleted file mode 100644 index 562604bd3c5379032163046529af39e46e706d8d..0000000000000000000000000000000000000000 --- a/README.en.md +++ /dev/null @@ -1,36 +0,0 @@ -# hpcrunner - -#### Description -openEuler High Performance Computing(HPC) Runner, provides universal portal for hpc users and developers. - -#### Software Architecture -Software architecture description - -#### Installation - -1. xxxx -2. xxxx -3. xxxx - -#### Instructions - -1. xxxx -2. xxxx -3. xxxx - -#### Contribution - -1. Fork the repository -2. Create Feat_xxx branch -3. Commit your code -4. Create Pull Request - - -#### Gitee Feature - -1. You can use Readme\_XXX.md to support different languages, such as Readme\_en.md, Readme\_zh.md -2. Gitee blog [blog.gitee.com](https://blog.gitee.com) -3. Explore open source project [https://gitee.com/explore](https://gitee.com/explore) -4. The most valuable open source project [GVP](https://gitee.com/gvp) -5. The manual of Gitee [https://gitee.com/help](https://gitee.com/help) -6. The most popular members [https://gitee.com/gitee-stars/](https://gitee.com/gitee-stars/) diff --git a/README.md b/README.md index 239053b5b3028f1c99fafc7723a5d542ab98a938..2cb97f9facd3572b7fc794f90b9eec92068b063e 100644 --- a/README.md +++ b/README.md @@ -1,21 +1,40 @@ -# HPCRunner : 贾维斯辅助系统 -### 项目背景 +# HPCRunner : 贾维斯智能助手 +## ***给每个HPC应用一个温暖的家*** -因为HPC应用的特殊性,其环境配置、编译、运行、CPU/GPU性能采集分析的门槛比较高,导致迁移和调优的工作量大,不同的人在不同的机器上跑同样的软件和算例基本上是重头开始,费时费力,而且很多情况下需要同时部署ARM/X86两套环境进行验证,增加了很多的重复性工作。 +### 项目背景 +因为HPC应用的复杂性,其依赖安装、环境配置、编译、运行、CPU/GPU性能采集分析的门槛比较高,导致迁移和调优的工作量大,不同的人在不同的机器上跑同样的应用和算例基本上是重头开始,费时费力,而且很多情况下需要同时部署鲲鹏/X86两套环境进行验证,增加了很多的重复性工作,无法聚焦软件算法优化。 -### 解决方案 +### 项目特色 -- 提供支持ARM/X86的统一接口,一键生成环境脚本、一键编译、一键运行、一键性能采集、一键Benchmark等功能. +- 支持鲲鹏/X86,一键下载依赖,一键安装依赖、采用业界权威依赖目录结构管理海量依赖,自动生成module file +- 根据HPC配置一键生成环境脚本、一键编译、一键运行、一键性能采集、一键Benchmark. - 所有配置仅用一个文件记录,HPC应用部署到不同的机器仅需修改配置文件. - 日志管理系统自动记录HPC应用部署过程中的所有信息. -- 常用HPC工具软件开箱即用,提供GCC/毕昇/icc版本,支持一键module加载. -- 软件本身开箱即用,仅依赖Python环境. +- 常用HPC工具软件开箱即用. +- 软件本身无需编译开箱即用,仅依赖Python环境. - (未来) 集成HPC领域常用性能调优手段、核心算法. - (未来) 集群性能分析工具. - (未来) 智能调优. - (未来) HPC应用[容器化](https://catalog.ngc.nvidia.com/orgs/hpc/containers/quantum_espresso). +### 目录结构 + +| 目录/文件 | 说明 | 备注 | +| --------- | ---------------------------------- | -------- | +| benchmark | 矩阵运算、OpenMP、MPI、P2P性能测试 | | +| doc | 文档 | | +| downloads | 存放依赖库源码包/压缩包 | | +| examples | 性能小实验 | | +| package | 存放安装脚本和FAQ | | +| software | 依赖库二进制仓库 | 自动生成 | +| src | 贾维斯源码 | | +| templates | 常用HPC应用的配置模板 | | +| test | 贾维斯测试用例 | | +| workload | 常用HPC应用的算例合集 | | +| init.sh | 贾维斯初始化文件 | | +| jarvis | 贾维斯启动入口 | | + ### 已验证HPC应用 分子动力学领域: @@ -36,60 +55,145 @@ - [x] OpenFOAM + ### 使用说明 1.下载包解压之后初始化 -`source init.sh` +``` +source init.sh +``` + +2.修改data.config或者套用现有模板,各配置项说明如下所示: + +| 配置项 | 说明 | 示例 | +| :----------: | :--------------------------------------------------------- | :----------------------------------------------------------- | +| [SERVER] | 服务器节点列表,多节点时用于自动生成hostfile,每行一个节点 | 11.11.11.11 | +| [DOWNLOAD] | 每行一个软件的版本和下载链接,默认下载到downloads目录 | cmake/3.16.4 https://cmake.org/files/v3.16/cmake-3.16.4.tar.gz | +| [DEPENDENCY] | HPC应用依赖安装脚本 | ./jarvis -install gcc/9.3.1 com
module use ./software/modulefiles
module load gcc9 | +| [ENV] | HPC应用编译运行环境配置 | source env.sh | +| [APP] | HPC应用信息,包括应用名、构建路径、二进制路径、算例路径 | app_name = CP2K
build_dir = /home/cp2k-8.2/
binary_dir = /home/CP2K/cp2k-8.2/bin/
case_dir = /home/CP2K/cp2k-8.2/benchmarks/QS/ | +| [BUILD] | HPC应用构建脚本 | make -j 128 | +| [CLEAN] | HPC应用编译清理脚本 | make -j 128 clean | +| [RUN] | HPC应用运行配置,包括前置命令、应用命令和节点个数 | run = mpi
binary = cp2k.psmp H2O-256.inp
nodes = 1 | +| [BATCH] | HPC应用批量运行命令 | #!/bin/bash
nvidia-smi -pm 1
nvidia-smi -ac 1215,1410 | +| [PERF] | 性能工具额外参数 | | + +3.一键下载依赖(仅针对无需鉴权的链接,否则需要自行下载) + +``` +./jarvis -d +``` + +4.安装单个依赖 + +``` +./jarvis -install [name/version/other] [option] +``` + +option支持列表如下所示 + +| 选项值 | 解释 | 安装目录 | +| ------------------ | ----------------------------- | ----------------------- | +| gcc | 使用当前gcc进行编译 | software/libs/gcc | +| gcc+mpi | 使用当前gcc+当前mpi进行编译 | software/libs/gcc/mpi | +| clang(bisheng) | 使用当前clang进行编译 | software/libs/clang | +| clang(bisheng)+mpi | 使用当前clang+当前mpi进行编译 | software/libs/clang/mpi | +| nvc | 使用当前nvc进行编译 | software/libs/nvc | +| nvc+mpi | 使用当前nvc+当前mpi进行编译 | software/libs/nvc/mpi | +| icc | 使用当前icc进行编译 | software/libs/icc | +| icc+mpi | 使用当前icc+当前mpi进行编译 | software/libs/icc/mpi | +| com | 安装编译器 | software/compiler | +| any | 安装工具软件 | software/compiler/utils | + +注意,如果软件为MPI通信软件(如hmpi、openmpi),会安装到software/mpi目录 + +(eg: ./jarvis -install fftw/3.3.8 gcc) +5.一键安装所有依赖 + +``` +./jarvis -dp +``` + +6.一键生成环境变量(脱离贾维斯运行才需要执行) + +``` +./jarvis -e && source ./env.sh +``` + +7.一键编译 + +``` +./jarvis -b +``` + +8.一键运行 + +``` +./jarvis -r +``` -2.修改data.config(ARM)或者data.X86.config(X86) +9.一键性能采集(perf) -3.一键生成环境变量(或者python3 jarvis.py) +``` +./jarvis -p +``` -`./jarvis.py -e` -`source env.sh` -4.一键编译 +10.一键Kperf性能采集(生成TopDown) -`./jarvis.py -b` +``` +./jarvis -kp +``` -5.一键运行 +11.一键GPU性能采集(需安装nsys、ncu) -`./jarvis.py -r` +``` +./jarvis -gp +``` -6.一键性能采集(perf) +12.一键输出服务器信息(包括CPU、网卡、OS、内存等) -`./jarvis.py -p` +``` +./jarvis -i +``` -7.一键GPU性能采集(使用nsys、ncu) +13.一键服务器性能评测(包括MPI、OMP、P2P等) -`./jarvis.py -gp` +``` +./jarvis -bench all #运行所有benchmark +./jarvis -bench mpi #运行MPI benchmark +./jarvis -bench omp #运行OMP benchmark +./jarvis -bench gemm #运行矩阵运算 benchmark +``` -8.一键输出服务器信息(包括CPU、网卡、OS、内存等) +14.切换配置 -`./jarvis.py -i` +``` +./jarvis -use XXX.config +``` -9.切换配置 +15.其它功能查看(网络检测) -`./jarvis.py -use data.XXX.config` +``` +./jarvis -h +``` -10.其它功能查看(多线程下载、网络检测) -`./jarvis.py -h` ### 欢迎贡献 -贾维斯项目欢迎您的热情参与! +贾维斯项目欢迎您的专业技能和热情参与! 
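+
+例如,贡献一个新依赖的安装脚本,通常只需在 package/软件名/版本号/ 目录下新增一个 install.sh。下面是一个示意性的骨架(软件名 foo 与压缩包名均为假设值,仅用于说明脚本的基本结构,实际写法请参考本仓库 package 目录中已有的脚本):
+
+```
+#!/bin/bash
+set -e
+cd ${JARVIS_TMP}
+# 源码包需事先放入 downloads 目录(可用 ./jarvis -d 下载)
+tar -xvf ${JARVIS_DOWNLOAD}/foo-1.0.0.tar.gz
+cd foo-1.0.0
+# $1 为贾维斯传入的安装前缀(对应 software 目录下的安装路径)
+./configure --prefix=$1
+make -j
+make install
+```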
-小的改进或修复总是值得赞赏的;先从文档开始可能是一个很好的起点。如果您正在考虑对源代码的更大贡献,请先提交issue讨论。 +小的改进或修复总是值得赞赏的;先从文档开始可能是一个很好的起点。如果您正在考虑对源代码的更大贡献,请先提交一个issue或者在maillist进行讨论。 编写代码并不是为贾维斯做出贡献的唯一方法。您还可以: -- 贡献小而精的工具(小于10MB>) +- 贡献安装脚本 - 帮助我们测试新的HPC应用 -- 开发教程、演示和其他教育材料 +- 开发教程、演示 - 为我们宣传 - 帮助新的贡献者加入 -请添加OpenEuler SIG微信群了解更多HPC迁移调优知识 +请添加openEuler HPC SIG微信群了解更多HPC迁移调优知识 ![微信群](./wechat-group-qr.png) \ No newline at end of file diff --git a/benchmark/README.md b/benchmark/README.md new file mode 100644 index 0000000000000000000000000000000000000000..a9c072d60856f80821ded643066be7d129b0e8ec --- /dev/null +++ b/benchmark/README.md @@ -0,0 +1,4 @@ +# benchmark +# gemm: blas and MPI performance +# p2p: GPU p2p connectivity and bandwidth check + diff --git a/benchmark/gemm/Makefile b/benchmark/gemm/Makefile new file mode 100644 index 0000000000000000000000000000000000000000..b41aae52cf40d2f9ad7a9d1680a9c082b900d15a --- /dev/null +++ b/benchmark/gemm/Makefile @@ -0,0 +1,19 @@ +CC = mpic++ +CCFLAGS = -O2 -fopenmp +OPENBLAS_PATH = ${JARVIS_LIBS}/gcc9/openblas/0.3.18 +OPENBLAS_INC = -I ${OPENBLAS_PATH}/include +OPENBLAS_LDFLAGS = -L ${OPENBLAS_PATH}/lib -lopenblas + +KML_PATH = /usr/local/kml +KML_INC = -I ${KML_PATH}/include +KML_LDFLAGS = -L ${KML_PATH}/lib/kblas/omp -lkblas +all: gemm + +gemm: gemm.cpp + ${CC} ${CCFLAGS} ${OPENBLAS_INC} gemm.cpp -o gemm ${OPENBLAS_LDFLAGS} + +gemm-kml: gemm.cpp + ${CC} -DUSE_KML ${CCFLAGS} ${KML_INC} gemm.cpp -o gemm-kml ${KML_LDFLAGS} + +clean: + rm -rf gemm* diff --git a/benchmark/gemm/gemm.cpp b/benchmark/gemm/gemm.cpp new file mode 100644 index 0000000000000000000000000000000000000000..53b0974cd0897137db3f9de1319189abb9fad9a6 --- /dev/null +++ b/benchmark/gemm/gemm.cpp @@ -0,0 +1,224 @@ +#include +#include +#include +#include +#include +#include "mpi.h" +#ifdef USE_KML + #include "kblas.h" +#else + #include +#endif +using namespace std; + +void randMat(int rows, int cols, float *&Mat) { + Mat = new float[rows * cols]; + for (int i = 0; i < rows; i++) + for (int j = 0; j < cols; j++) + Mat[i * cols + j] = 1.0; +} + +void openmp_sgemm(int m, int n, int k, float *&leftMat, float *&rightMat, + float *&resultMat) { + // rightMat is transposed +#pragma omp parallel for + for (int row = 0; row < m; row++) { + for (int col = 0; col < k; col++) { + resultMat[row * k + col] = 0.0; + for (int i = 0; i < n; i++) { + resultMat[row * k + col] += + leftMat[row * n + i] * rightMat[col * n + i]; + } + } + } + return; +} + +void blas_sgemm(int m, int n, int k, float *&leftMat, float *&rightMat, + float *&resultMat) { + cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasTrans, m, k, n, 1.0, leftMat, + n, rightMat, n, 0.0, resultMat, k); +} + +void mpi_sgemm(int m, int n, int k, float *&leftMat, float *&rightMat, + float *&resultMat, int rank, int worldsize, bool blas) { + int rowBlock = sqrt(worldsize); + if (rowBlock * rowBlock > worldsize) + rowBlock -= 1; + int colBlock = rowBlock; + + int rowStride = m / rowBlock; + int colStride = k / colBlock; + + worldsize = rowBlock * colBlock; // we abandom some processes. + // so best set process to a square number. 
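+  // rowBlock = colBlock = floor(sqrt(worldsize)): the ranks form a square
+  // process grid and any ranks beyond rowBlock*colBlock stay idle. Rank 0
+  // scatters row blocks of leftMat and column blocks of the transposed
+  // rightMat, each grid rank runs a local sgemm, and rank 0 gathers the tiles.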
+ + float *res; + + if (rank == 0) { + float *buf = new float[k * n]; + // transpose right Mat + for (int r = 0; r < n; r++) { + for (int c = 0; c < k; c++) { + buf[c * n + r] = rightMat[r * k + c]; + } + } + + for (int r = 0; r < k; r++) { + for (int c = 0; c < n; c++) { + rightMat[r * n + c] = buf[r * n + c]; + } + } + + MPI_Request sendRequest[2 * worldsize]; + MPI_Status status[2 * worldsize]; + for (int rowB = 0; rowB < rowBlock; rowB++) { + for (int colB = 0; colB < colBlock; colB++) { + rowStride = (rowB == rowBlock - 1) ? m - (rowBlock - 1) * (m / rowBlock) + : m / rowBlock; + colStride = (colB == colBlock - 1) ? k - (colBlock - 1) * (k / colBlock) + : k / colBlock; + int sendto = rowB * colBlock + colB; + if (sendto == 0) + continue; + MPI_Isend(&leftMat[rowB * (m / rowBlock) * n], rowStride * n, MPI_FLOAT, + sendto, 0, MPI_COMM_WORLD, &sendRequest[sendto]); + MPI_Isend(&rightMat[colB * (k / colBlock) * n], colStride * n, + MPI_FLOAT, sendto, 1, MPI_COMM_WORLD, + &sendRequest[sendto + worldsize]); + } + } + for (int rowB = 0; rowB < rowBlock; rowB++) { + for (int colB = 0; colB < colBlock; colB++) { + int recvfrom = rowB * colBlock + colB; + if (recvfrom == 0) + continue; + MPI_Wait(&sendRequest[recvfrom], &status[recvfrom]); + MPI_Wait(&sendRequest[recvfrom + worldsize], + &status[recvfrom + worldsize]); + } + } + res = new float[(m / rowBlock) * (k / colBlock)]; + } else { + if (rank < worldsize) { + MPI_Status status[2]; + rowStride = ((rank / colBlock) == rowBlock - 1) + ? m - (rowBlock - 1) * (m / rowBlock) + : m / rowBlock; + colStride = ((rank % colBlock) == colBlock - 1) + ? k - (colBlock - 1) * (k / colBlock) + : k / colBlock; + if (rank != 0) { + leftMat = new float[rowStride * n]; + rightMat = new float[colStride * n]; + } + if (rank != 0) { + MPI_Recv(leftMat, rowStride * n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, + &status[0]); + MPI_Recv(rightMat, colStride * n, MPI_FLOAT, 0, 1, MPI_COMM_WORLD, + &status[1]); + } + res = new float[rowStride * colStride]; + } + } + MPI_Barrier(MPI_COMM_WORLD); + + if (rank < worldsize) { + rowStride = ((rank / colBlock) == rowBlock - 1) + ? m - (rowBlock - 1) * (m / rowBlock) + : m / rowBlock; + colStride = ((rank % colBlock) == colBlock - 1) + ? k - (colBlock - 1) * (k / colBlock) + : k / colBlock; + if (!blas) + openmp_sgemm(rowStride, n, colStride, leftMat, rightMat, res); + else + blas_sgemm(rowStride, n, colStride, leftMat, rightMat, res); + } + MPI_Barrier(MPI_COMM_WORLD); + + if (rank == 0) { + MPI_Status status; + float *buf = new float[(m - (rowBlock - 1) * (m / rowBlock)) * + (k - (colBlock - 1) * (k / colBlock))]; + float *temp_res; + for (int rowB = 0; rowB < rowBlock; rowB++) { + for (int colB = 0; colB < colBlock; colB++) { + rowStride = (rowB == rowBlock - 1) ? m - (rowBlock - 1) * (m / rowBlock) + : m / rowBlock; + colStride = (colB == colBlock - 1) ? k - (colBlock - 1) * (k / colBlock) + : k / colBlock; + int recvfrom = rowB * colBlock + colB; + if (recvfrom != 0) { + temp_res = buf; + MPI_Recv(temp_res, rowStride * colStride, MPI_FLOAT, recvfrom, 0, + MPI_COMM_WORLD, &status); + } else { + temp_res = res; + } + for (int r = 0; r < rowStride; r++) + for (int c = 0; c < colStride; c++) + resultMat[rowB * (m / rowBlock) * k + colB * (k / colBlock) + + r * k + c] = temp_res[r * colStride + c]; + } + } + } else { + rowStride = ((rank / colBlock) == rowBlock - 1) + ? m - (rowBlock - 1) * (m / rowBlock) + : m / rowBlock; + colStride = ((rank % colBlock) == colBlock - 1) + ? 
k - (colBlock - 1) * (k / colBlock) + : k / colBlock; + if (rank < worldsize) + MPI_Send(res, rowStride * colStride, MPI_FLOAT, 0, 0, MPI_COMM_WORLD); + } + MPI_Barrier(MPI_COMM_WORLD); + + return; +} + +int main(int argc, char *argv[]) { + if (argc != 5) { + cout << "Usage: " << argv[0] << " M N K use-blas\n"; + exit(-1); + } + + int rank; + int worldSize; + MPI_Init(&argc, &argv); + + MPI_Comm_size(MPI_COMM_WORLD, &worldSize); + MPI_Comm_rank(MPI_COMM_WORLD, &rank); + + int m = atoi(argv[1]); + int n = atoi(argv[2]); + int k = atoi(argv[3]); + int blas = atoi(argv[4]); + + float *leftMat, *rightMat, *resMat; + + struct timeval start, stop; + if (rank == 0) { + randMat(m, n, leftMat); + randMat(n, k, rightMat); + randMat(m, k, resMat); + } + gettimeofday(&start, NULL); + mpi_sgemm(m, n, k, leftMat, rightMat, resMat, rank, worldSize, blas); + gettimeofday(&stop, NULL); + if (rank == 0) { + cout << "mpi matmul: " + << (stop.tv_sec - start.tv_sec) * 1000.0 + + (stop.tv_usec - start.tv_usec) / 1000.0 + << " ms" << endl; + + for (int i = 0; i < m; i++) { + for (int j = 0; j < k; j++) + if (int(resMat[i * k + j]) != n) { + cout << resMat[i * k + j] << "error\n"; + exit(-1); + } + } + } + MPI_Finalize(); +} diff --git a/benchmark/gemm/run.sh b/benchmark/gemm/run.sh new file mode 100644 index 0000000000000000000000000000000000000000..4452ca479124203b951bb9e480b789f0baa88287 --- /dev/null +++ b/benchmark/gemm/run.sh @@ -0,0 +1,30 @@ +flags="**************" +armRun(){ + mpi_cmd="mpirun --allow-run-as-root -x OMP_NUM_THREADS=4 -mca btl ^vader,tcp,openib,uct -np 16" + echo "${flags}benching openblas gemm, best 405ms${flags}" + make + ${mpi_cmd} ./gemm 4024 4024 4024 1 + echo "${flags}benching kml gemm, best 216ms${flags}" + make gemm-kml + ${mpi_cmd} ./gemm-kml 4024 4024 4024 1 + echo "${flags}benching MPI perf, best 1855ms${flags}" + ${mpi_cmd} ./gemm 4024 4024 4024 0 +} + +x86Run(){ + mpi_cmd="mpirun -genv OMP_NUM_THREADS=4 -n 16" + echo "${flags}benching openblas gemm, best 405ms${flags}" + make + ${mpi_cmd} ./gemm 4024 4024 4024 1 + echo "${flags}benching MKL gemm, best 216ms${flags}" + make gemm-MKL + ${mpi_cmd} ./gemm-mkl 4024 4024 4024 1 + echo "${flags}benching MPI perf, best 1855ms${flags}" + ${mpi_cmd} ./gemm 4024 4024 4024 0 +} +# check Arch +if [ x$(arch) = xaarch64 ];then + armRun +else + x86Run +fi \ No newline at end of file diff --git a/benchmark/mpi/reduce_avg.c b/benchmark/mpi/reduce_avg.c new file mode 100644 index 0000000000000000000000000000000000000000..05a576be7505a36a5a0ff7a4ee575a243752c3d1 --- /dev/null +++ b/benchmark/mpi/reduce_avg.c @@ -0,0 +1,74 @@ +// Author: Wes Kendall +// Copyright 2013 www.mpitutorial.com +// This code is provided freely with the tutorials on mpitutorial.com. Feel +// free to modify it for your own use. Any distribution of the code must +// either provide a link to www.mpitutorial.com or keep this header intact. +// +// Program that computes the average of an array of elements in parallel using +// MPI_Reduce. +// +#include +#include +#include +#include +#include + +// Creates an array of random numbers. 
Each number has a value from 0 - 1 +float *create_rand_nums(int num_elements) { + float *rand_nums = (float *)malloc(sizeof(float) * num_elements); + assert(rand_nums != NULL); + int i; + for (i = 0; i < num_elements; i++) { + rand_nums[i] = (rand() / (float)RAND_MAX); + } + return rand_nums; +} + +int main(int argc, char** argv) { + if (argc != 2) { + fprintf(stderr, "Usage: avg num_elements_per_proc\n"); + exit(1); + } + + int num_elements_per_proc = atoi(argv[1]); + + MPI_Init(NULL, NULL); + + int world_rank; + MPI_Comm_rank(MPI_COMM_WORLD, &world_rank); + int world_size; + MPI_Comm_size(MPI_COMM_WORLD, &world_size); + + // Create a random array of elements on all processes. + srand(time(NULL)*world_rank); // Seed the random number generator to get different results each time for each processor + float *rand_nums = NULL; + rand_nums = create_rand_nums(num_elements_per_proc); + + // Sum the numbers locally + float local_sum = 0; + int i; + for (i = 0; i < num_elements_per_proc; i++) { + local_sum += rand_nums[i]; + } + + // Print the random numbers on each process + printf("Local sum for process %d - %f, avg = %f\n", + world_rank, local_sum, local_sum / num_elements_per_proc); + + // Reduce all of the local sums into the global sum + float global_sum; + MPI_Reduce(&local_sum, &global_sum, 1, MPI_FLOAT, MPI_SUM, 0, + MPI_COMM_WORLD); + + // Print the result + if (world_rank == 0) { + printf("Total sum = %f, avg = %f\n", global_sum, + global_sum / (world_size * num_elements_per_proc)); + } + + // Clean up + free(rand_nums); + + MPI_Barrier(MPI_COMM_WORLD); + MPI_Finalize(); +} \ No newline at end of file diff --git a/benchmark/mpi/run.sh b/benchmark/mpi/run.sh new file mode 100644 index 0000000000000000000000000000000000000000..265b300cb6673a879bd2fb961545f7b925b83da2 --- /dev/null +++ b/benchmark/mpi/run.sh @@ -0,0 +1,2 @@ +mpicc reduce_avg.c -o avg +mpirun -n 2 --allow-run-as-root ./avg 2 \ No newline at end of file diff --git a/benchmark/omp/Makefile b/benchmark/omp/Makefile new file mode 100644 index 0000000000000000000000000000000000000000..254eec75896942942a0232b7a51b80a26a7687e2 --- /dev/null +++ b/benchmark/omp/Makefile @@ -0,0 +1,17 @@ +CC = gcc +CCFLAGS = -fopenmp -O2 +NVCFLAGS = + +all: caclPI + +caclPI: caclPI.cpp + ${CC} ${CCFLAGS} caclPI.cpp -o caclPI + +gramSchmidt_gpu: gramSchmidt_gpu.c + nvc -mp=gpu -Minfo=mp -lm gramSchmidt_gpu.c -o gramSchmidt_gpu.o + +gramSchmidt_gpu_f90: gramSchmidt_gpu.F90 + nvfortran -mp=gpu -Minfo=mp -lm gramSchmidt_gpu.F90 -o gramSchmidt_gpu_f.o + +clean: + rm -rf caclPI gramSchmidt_gpu diff --git a/benchmark/omp/caclPI.cpp b/benchmark/omp/caclPI.cpp new file mode 100644 index 0000000000000000000000000000000000000000..b68de200a488065f1e258f36d4f79dceceace29d --- /dev/null +++ b/benchmark/omp/caclPI.cpp @@ -0,0 +1,24 @@ + +#include +#include + +#define NUM_THREADS 32 +static long num_steps = 100000000; + +int main () +{ + int i; + double x, pi, sum = 0.0, step, start_time,end_time; + step = 1.0/(double) num_steps; + omp_set_num_threads(NUM_THREADS); + start_time=omp_get_wtime(); + #pragma omp parallel for reduction(+ : sum) private(x) + for (i=1;i<= num_steps; i++){ + x = (i-0.5)*step; + sum = sum + 4.0/(1.0+x*x); + } + pi = step * sum; + end_time=omp_get_wtime(); + printf("Pi = %16.15f\n Running time:%.3f ms \n", pi, end_time - start_time); + return 1; +} diff --git a/benchmark/omp/gramSchmidt_gpu.F90 b/benchmark/omp/gramSchmidt_gpu.F90 new file mode 100644 index 0000000000000000000000000000000000000000..aa1afd6d6d5abc1de4799fd4e576671d34b6c0d1 --- 
/dev/null +++ b/benchmark/omp/gramSchmidt_gpu.F90 @@ -0,0 +1,34 @@ +! @@name: target_data.3f +! @@type: F-free +! @@compilable: yes +! @@linkable: no +! @@expect: success +! @@version: omp_4.0 +subroutine gramSchmidt(Q,rows,cols) + integer :: rows,cols, i,k + double precision :: Q(rows,cols), tmp + !$omp target data map(Q) + do k=1,cols + tmp = 0.0d0 + !$omp target map(tofrom: tmp) + !$omp parallel do reduction(+:tmp) + do i=1,rows + tmp = tmp + (Q(i,k) * Q(i,k)) + end do + !$omp end target + + tmp = 1.0d0/sqrt(tmp) + + !$omp target + !$omp parallel do + do i=1,rows + Q(i,k) = Q(i,k)*tmp + enddo + !$omp end target + end do + !$omp end target data +end subroutine + +! Note: The variable tmp is now mapped with tofrom, for correct +! execution with 4.5 (and pre-4.5) compliant compilers. See Devices Intro. + \ No newline at end of file diff --git a/benchmark/omp/gramSchmidt_gpu.c b/benchmark/omp/gramSchmidt_gpu.c new file mode 100644 index 0000000000000000000000000000000000000000..9cae585ffa7df5c1a8aeca46504c2bd0ddb88d38 --- /dev/null +++ b/benchmark/omp/gramSchmidt_gpu.c @@ -0,0 +1,59 @@ +#include +#include +#include + +#define COLS 1000 +#define ROWS 1000 +#define FLOAT_T float + +FLOAT_T *getFinput(int scale) +{ + FLOAT_T *input; + if ((input = (FLOAT_T *)malloc(sizeof(FLOAT_T) * scale)) == NULL) + { + fprintf(stderr, "Out of Memory!!\n"); + exit(1); + } + for (int i = 0; i < scale; i++) + { + input[i] = ((FLOAT_T)rand() / (FLOAT_T)RAND_MAX) - 0.5; + } + return input; +} + +FLOAT_T **get2Darr(int M, int N) +{ + FLOAT_T **input; + input = (FLOAT_T **)malloc(M * sizeof(FLOAT_T *)); + for (int i = 0; i < M; i++) + { + input[i] = (FLOAT_T *)malloc(N * sizeof(FLOAT_T)); + } + return input; +} + +void gramSchmidt_gpu(FLOAT_T **Q) +{ + int cols = COLS; + #pragma omp target data map(Q[0:ROWS][0:cols]) + for(int k=0; k < cols; k++) + { + double tmp = 0.0; + #pragma omp target map(tofrom: tmp) + #pragma omp parallel for reduction(+:tmp) + for(int i=0; i < ROWS; i++) + tmp += (Q[i][k] * Q[i][k]); + tmp = 1/sqrt(tmp); + #pragma omp target + #pragma omp parallel for + for(int i=0; i < ROWS; i++) + Q[i][k] *= tmp; + } +} + +int main() +{ + FLOAT_T **Q = get2Darr(ROWS, COLS); + gramSchmidt_gpu(Q); + return; +} diff --git a/benchmark/omp/run.sh b/benchmark/omp/run.sh new file mode 100644 index 0000000000000000000000000000000000000000..3784d96de306443521a0c6fb52abe11e6cc122f9 --- /dev/null +++ b/benchmark/omp/run.sh @@ -0,0 +1,18 @@ +flags="**************" +armRun(){ + echo "${flags}benching omp perf, best 0.023ms${flags}" + make + ./caclPI + make gramSchmidt_gpu + ./gramSchmidt_gpu +} + +x86Run(){ + armRun +} +# check Arch +if [ x$(arch) = xaarch64 ];then + armRun +else + x86Run +fi \ No newline at end of file diff --git a/benchmark/p2p/Makefile b/benchmark/p2p/Makefile new file mode 100644 index 0000000000000000000000000000000000000000..e9c55f7d0572b79b09fb31340e90e2864ec2e83f --- /dev/null +++ b/benchmark/p2p/Makefile @@ -0,0 +1,67 @@ +## + # ===================================================================================== + # + # Filename: Makefile + # + # Description: This microbenchmark is to obtain the latency & uni/bi-directional + # bandwidth for PCI-e, NVLink-V1 in NVIDIA P100 DGX-1 and NVLink-V2 in + # V100 DGX-1. Please see our IISWC-18 paper titled "Tartan: Evaluating + # Modern GPU Interconnect via a Multi-GPU Benchmark Suite". The + # Code is modified from the p2pBandwidthLatencyTest app in + # NVIDIA CUDA-SDK. Please follow NVIDIA's EULA for end usage. 
+ # + # Version: 1.0 + # Created: 01/24/2018 02:12:31 PM + # Revision: none + # Compiler: GNU-Make + # + # Author: Ang Li, PNNL + # Website: http://www.angliphd.com + # + # ===================================================================================== +## + + +################################################################################ +# +# Copyright 1993-2015 NVIDIA Corporation. All rights reserved. +# +# NOTICE TO USER: +# +# This source code is subject to NVIDIA ownership rights under U.S. and +# international Copyright laws. +# +# NVIDIA MAKES NO REPRESENTATION ABOUT THE SUITABILITY OF THIS SOURCE +# CODE FOR ANY PURPOSE. IT IS PROVIDED "AS IS" WITHOUT EXPRESS OR +# IMPLIED WARRANTY OF ANY KIND. NVIDIA DISCLAIMS ALL WARRANTIES WITH +# REGARD TO THIS SOURCE CODE, INCLUDING ALL IMPLIED WARRANTIES OF +# MERCHANTABILITY, NONINFRINGEMENT, AND FITNESS FOR A PARTICULAR PURPOSE. +# IN NO EVENT SHALL NVIDIA BE LIABLE FOR ANY SPECIAL, INDIRECT, INCIDENTAL, +# OR CONSEQUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS +# OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE +# OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE +# OR PERFORMANCE OF THIS SOURCE CODE. +# +# U.S. Government End Users. This source code is a "commercial item" as +# that term is defined at 48 C.F.R. 2.101 (OCT 1995), consisting of +# "commercial computer software" and "commercial computer software +# documentation" as such terms are used in 48 C.F.R. 12.212 (SEPT 1995) +# and is provided to the U.S. Government only as a commercial end item. +# Consistent with 48 C.F.R.12.212 and 48 C.F.R. 227.7202-1 through +# 227.7202-4 (JUNE 1995), all U.S. Government End Users acquire the +# source code with only those rights set forth herein. +# +################################################################################ +# +# Makefile project only supported on Mac OS X and Linux Platforms) +# +################################################################################ + +include shared.mk + +p2pTest: p2pBandwidthLatencyTest.cu + $(NVCC) $(NVCC_FLAGS) $^ -o $@ + +clean: + rm -f p2pTest + diff --git a/benchmark/p2p/p2pBandwidthLatencyTest.cu b/benchmark/p2p/p2pBandwidthLatencyTest.cu new file mode 100644 index 0000000000000000000000000000000000000000..e3aaec08c278e74bb47a98b5a474d515f525164a --- /dev/null +++ b/benchmark/p2p/p2pBandwidthLatencyTest.cu @@ -0,0 +1,653 @@ +/* + * ===================================================================================== + * + * Filename: p2pBandwidthLatencyTest.cu + * + * Description: This microbenchmark is to obtain the latency & uni/bi-directional + * bandwidth for PCI-e, NVLink-V1 in NVIDIA P100 DGX-1 and NVLink-V2 in + * V100 DGX-1. Please see our IISWC-18 paper titled "Tartan: Evaluating + * Modern GPU Interconnects via a Multi-GPU Benchmark Suite". The + * Code is modified from the p2pBandwidthLatencyTest app in + * NVIDIA CUDA-SDK. Please follow NVIDIA's EULA for end usage. + * + * Version: 1.0 + * Created: 01/24/2018 02:12:31 PM + * Revision: none + * Compiler: nvcc + * + * Author: Ang Li, PNNL + * Website: http://www.angliphd.com + * + * ===================================================================================== + */ + +/* + * Copyright 1993-2015 NVIDIA Corporation. All rights reserved. + * + * Please refer to the NVIDIA end user license agreement (EULA) associated + * with this source code for terms and conditions that govern your use of + * this software. 
Any use, reproduction, disclosure, or distribution of + * this software and related documentation outside the terms of the EULA + * is strictly prohibited. + * + */ + +#define ASCENDING + +#include +#include + +using namespace std; + +const char *sSampleName = "P2P (Peer-to-Peer) GPU Bandwidth Latency Test"; + +//Macro for checking cuda errors following a cuda launch or api call +#define cudaCheckError() { \ + cudaError_t e=cudaGetLastError(); \ + if(e!=cudaSuccess) { \ + printf("Cuda failure %s:%d: '%s'\n",__FILE__,__LINE__,cudaGetErrorString(e)); \ + exit(EXIT_SUCCESS); \ + } \ + } +__global__ void delay(int * null) { + float j=threadIdx.x; + for(int i=1;i<10000;i++) + j=(j+1)/j; + + if(threadIdx.x == j) null[0] = j; +} + +void checkP2Paccess(int numGPUs) +{ + for (int i=0; i buffers(numGPUs); + vector start(numGPUs); + vector stop(numGPUs); + + for (int d=0; d bandwidthMatrix(numGPUs*numGPUs); + + for (int i=0; i=0; k--) +#endif + { + cudaDeviceCanAccessPeer(&src2route,i,k); + cudaDeviceCanAccessPeer(&route2dst,k,j); + if (src2route && route2dst) + { + routingnode = k; + break; + } + } + cudaDeviceEnablePeerAccess(routingnode,0 ); + cudaCheckError(); + cudaSetDevice(routingnode); + cudaDeviceEnablePeerAccess(j,0 ); + cudaSetDevice(i); + } + } + + cudaDeviceSynchronize(); + cudaCheckError(); + + if (routingrequired) + { + delay<<<1,1>>>(NULL); + cudaEventRecord(start[i]); + for (int r=0; r>>(NULL); + cudaEventRecord(start[i]); + + for (int r=0; r buffers(numGPUs); + vector start(numGPUs); + vector stop(numGPUs); + vector stream0(numGPUs); + vector stream1(numGPUs); + + for (int d=0; d bandwidthMatrix(numGPUs*numGPUs); + + for (int i=0; i=0; k--) +#endif + { + cudaDeviceCanAccessPeer(&src2route,i,k); + cudaDeviceCanAccessPeer(&route2dst,k,j); + if (src2route && route2dst) + { + routingnode = k; + break; + } + } + cudaSetDevice(i); + cudaDeviceEnablePeerAccess(routingnode,0 ); + cudaCheckError(); + cudaSetDevice(routingnode); + cudaDeviceEnablePeerAccess(i,0 ); + cudaCheckError(); + cudaDeviceEnablePeerAccess(j,0 ); + cudaCheckError(); + cudaSetDevice(j); + cudaDeviceEnablePeerAccess(routingnode,0 ); + cudaSetDevice(i); + cudaCheckError(); + } + } + + cudaSetDevice(i); + cudaDeviceSynchronize(); + cudaCheckError(); + + if (routingrequired) + { + delay<<<1,1>>>(NULL); + cudaEventRecord(start[i]); + for (int r=0; r>>(NULL); + cudaEventRecord(start[i]); + + for (int r=0; r buffers(numGPUs); + vector start(numGPUs); + vector stop(numGPUs); + + for (int d=0; d latencyMatrix(numGPUs*numGPUs); + + for (int i=0; i=0; k--) +#endif + { + cudaDeviceCanAccessPeer(&src2route,i,k); + cudaDeviceCanAccessPeer(&route2dst,k,j); + if (src2route && route2dst) + { + routingnode = k; + break; + } + } + cudaSetDevice(i); + cudaDeviceEnablePeerAccess(routingnode,0 ); + cudaCheckError(); + cudaSetDevice(routingnode); + cudaDeviceEnablePeerAccess(j,0 ); + cudaCheckError(); + cudaSetDevice(i); + } + } + cudaDeviceSynchronize(); + cudaCheckError(); + + + if (routingrequired) + { + delay<<<1,1>>>(NULL); + cudaEventRecord(start[i]); + + for (int r=0; r>>(NULL); + cudaEventRecord(start[i]); + + for (int r=0; r%d=>%d,(access:%d,routingrequired:%d\n",i,routingnode,j,access, routingrequired); + cudaCheckError(); + cudaDeviceDisablePeerAccess(routingnode); + cudaCheckError(); + cudaSetDevice(routingnode); + cudaDeviceDisablePeerAccess(j); + cudaCheckError(); + cudaSetDevice(i); + } + } + } + } + + printf(" D\\D"); + + for (int j=0; j + + + +Software Download: + + +

+X86
+
+ARM

+ bisheng 2.1.0 + + \ No newline at end of file diff --git a/examples/cuda/Makefile b/examples/cuda/Makefile new file mode 100644 index 0000000000000000000000000000000000000000..81e7d3ad511e20b7aadde1e4e792ff03001ddc33 --- /dev/null +++ b/examples/cuda/Makefile @@ -0,0 +1,12 @@ +ARCH=sm_80 +NVCC_FLAGS = -arch=$(ARCH) -O3 +CUDA_DIR = /usr/local/cuda/ +# CUDA compiler +NVCC = $(CUDA_DIR)/bin/nvcc +all: cuda + +cuda: cuda.cu + $(NVCC) $(NVCC_FLAGS) $^ -o $@.o + +clean: + rm -f cuda.o \ No newline at end of file diff --git a/examples/cuda/cuda.cu b/examples/cuda/cuda.cu new file mode 100644 index 0000000000000000000000000000000000000000..c8aace498be089ce805579e61bff0d288933e59e --- /dev/null +++ b/examples/cuda/cuda.cu @@ -0,0 +1,102 @@ +// nvcc cuda_hello.cu -o hello.o +#include +#define MAX_DEVICE 2 +#define RTERROR(status, s) \ + if (status != cudaSuccess) \ + { \ + printf("%s %s\n", s, cudaGetErrorString(status)); \ + cudaDeviceReset(); \ + exit(-1); \ + } + +//HelloFromGPU<<<1, 5>>>(); +__global__ void HelloFromGPU(void) +{ + printf("Hello from GPU\n"); +} + +int getDeviceCount() { + cudaError_t status; + int gpuCount = 0; + status = cudaGetDeviceCount(&gpuCount); + RTERROR(status, "cudaGetDeviceCount failed"); + if (gpuCount == 0) + { + printf("No CUDA-capable devices found, exiting.\n"); + cudaDeviceReset(); + exit(-1); + } + return gpuCount; +} + +cudaDeviceProp getProps(int device) +{ + cudaDeviceProp deviceProp; + cudaGetDeviceProperties(&deviceProp, device); + return deviceProp; +} + +void cudaGetSetDevice(){ + cudaError_t status; + int device = 0; + status = cudaGetDevice(&device); + RTERROR(status, "Error fetching current GPU"); + status = cudaSetDevice(device); + RTERROR(status, "Error setting CUDA device"); + cudaDeviceSynchronize(); +} + +void isSupportP2P(int gpuCount) +{ + int uvaOrdinals[MAX_DEVICE]; + int uvaCount = 0; + int i, j; + cudaDeviceProp prop; + for (i = 0; i < gpuCount; ++i) + { + cudaGetDeviceProperties(&prop, i); + if (prop.unifiedAddressing) + { + uvaOrdinals[uvaCount] = i; + printf(" GPU%d \"%15s\"\n", i, prop.name); + uvaCount += 1; + } + else + printf(" GPU%d \"%15s\" NOT UVA capable\n", i, prop.name); + } + int canAccessPeer_ij, canAccessPeer_ji; + for (i = 0; i < uvaCount; ++i) + { + for (j = i + 1; j < uvaCount; ++j) + { + cudaDeviceCanAccessPeer(&canAccessPeer_ij, uvaOrdinals[i], uvaOrdinals[j]); + cudaDeviceCanAccessPeer(&canAccessPeer_ji, uvaOrdinals[j], uvaOrdinals[i]); + if (canAccessPeer_ij * canAccessPeer_ji) + { + printf(" GPU%d and GPU%d: YES\n", uvaOrdinals[i], uvaOrdinals[j]); + } + else + { + printf(" GPU%d and GPU%d: NO\n", uvaOrdinals[i], uvaOrdinals[j]); + } + } + } +} + +int main(void) +{ + // get GPU Number + int gpuCount = getDeviceCount(); + printf("gpucount:%d\n", gpuCount); + // get SM Number + cudaDeviceProp deviceProp = getProps(0); + printf("SM number:%d\n", deviceProp.multiProcessorCount); + // get Mode info + if (deviceProp.computeMode == cudaComputeModeDefault) + { + printf("GPU is in Compute Mode.\n"); + } + // get P2P support info + isSupportP2P(gpuCount); + return 0; +} diff --git a/examples/false_sharing/Makefile b/examples/false_sharing/Makefile new file mode 100644 index 0000000000000000000000000000000000000000..3039d437ef86aa8994e3a650ebeb4d189f98b195 --- /dev/null +++ b/examples/false_sharing/Makefile @@ -0,0 +1,10 @@ +CC = gcc +LDLIBS = -lnuma -lpthread +binary = false_sharing.exe +source = false_sharing_example.c +.PHONY : clean + +$(binary) : $(source) + $(CC) $(LDLIBS) -o $@ $< +clean : + -rm $(binary) $(objects) 
diff --git a/examples/false_sharing/ReadMe.txt b/examples/false_sharing/ReadMe.txt new file mode 100644 index 0000000000000000000000000000000000000000..d009b54cae33977c0cbdc73496f00d4b16b0c8ec --- /dev/null +++ b/examples/false_sharing/ReadMe.txt @@ -0,0 +1,35 @@ +install numactl-devel, in order to use numa.h +1.rpm -ivh numactl-devel-2.0.13-4.ky10.x86_64.rpm +compile +2.make +start perf... +3.perf c2c record ./false_sharing.exe 2 +start report... +4.perf c2c report -NN -g -c pid,iaddr --stdio + Load Local HITM : 2010 【too High, false_sharing is detected】 + Load Remote HITM : 1315 + Load Remote HIT : 0 + Load Local DRAM : 71 + Load Remote DRAM : 1881 + Load MESI State Exclusive : 1881 + Load MESI State Shared : 71 + Load LLC Misses : 3267 + LLC Misses to Local DRAM : 2.2% + LLC Misses to Remote DRAM : 57.6% + LLC Misses to Remote cache (HIT) : 0.0% + LLC Misses to Remote cache (HITM) : 40.3% +compile no false_sharing code +7.gcc -g false_sharing_example.c -pthread -lnuma -DNO_FALSE_SHARING -o no_false_sharing.exe +8.perf c2c report -NN -g -c pid,iaddr --stdio + Load Local HITM : 6【normal, false_sharing is erased】 + Load Remote HITM : 486 + Load Remote HIT : 0 + Load Local DRAM : 1 + Load Remote DRAM : 498 + Load MESI State Exclusive : 498 + Load MESI State Shared : 1 + Load LLC Misses : 985 + LLC Misses to Local DRAM : 0.1% + LLC Misses to Remote DRAM : 50.6% + LLC Misses to Remote cache (HIT) : 0.0% + LLC Misses to Remote cache (HITM) : 49.3% \ No newline at end of file diff --git a/examples/false_sharing/false_sharing_example.c b/examples/false_sharing/false_sharing_example.c new file mode 100644 index 0000000000000000000000000000000000000000..900f1ee17f5b0f32a0f812b49864bce037966a21 --- /dev/null +++ b/examples/false_sharing/false_sharing_example.c @@ -0,0 +1,268 @@ +/* + * This is an example program to show false sharing between + * numa nodes. + * + * It can be compiled two ways: + * gcc -g false_sharing_example.c -pthread -lnuma -o false_sharing.exe + * gcc -g false_sharing_example.c -pthread -lnuma -DNO_FALSE_SHARING -o no_false_sharing.exe + * + * The -DNO_FALSE_SHARING macro reduces the false sharing by expanding the shared data + * structure into two different cachelines, (and it runs faster). + * + * The usage is: + * ./false_sharing.exe + * ./no_false_sharing.exe + * + * The program will make half the threads writer threads and half reader + * threads. It will pin those threads in round-robin format to the + * different numa nodes in the system. + * + * For example, on a system with 4 numa nodes: + * ./false_sharing.exe 2 + * 12165 mticks, reader_thd (thread 6), on node 2 (cpu 144). + * 12403 mticks, reader_thd (thread 5), on node 1 (cpu 31). + * 12514 mticks, reader_thd (thread 4), on node 0 (cpu 96). + * 12703 mticks, reader_thd (thread 7), on node 3 (cpu 170). + * 12982 mticks, lock_th (thread 0), on node 0 (cpu 1). + * 13018 mticks, lock_th (thread 1), on node 1 (cpu 24). + * 13049 mticks, lock_th (thread 3), on node 3 (cpu 169). + * 13050 mticks, lock_th (thread 2), on node 2 (cpu 49). + * + * # ./no_false_sharing.exe 2 + * 1918 mticks, reader_thd (thread 4), on node 0 (cpu 96). + * 2432 mticks, reader_thd (thread 7), on node 3 (cpu 170). + * 2468 mticks, reader_thd (thread 6), on node 2 (cpu 146). + * 3903 mticks, reader_thd (thread 5), on node 1 (cpu 40). + * 7560 mticks, lock_th (thread 0), on node 0 (cpu 1). + * 7574 mticks, lock_th (thread 2), on node 2 (cpu 145). + * 7602 mticks, lock_th (thread 3), on node 3 (cpu 169). 
+ * 7625 mticks, lock_th (thread 1), on node 1 (cpu 24). + * + */ + +#define _MULTI_THREADED +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include +#include +#include +#include + +/* + * A thread on each numa node seems to provoke cache misses + */ +#define LOOP_CNT (5 * 1024 * 1024) + +#if defined(__x86_64__) || defined(__i386__) +static __inline__ uint64_t rdtsc() { + unsigned hi, lo; + __asm__ __volatile__ ( "rdtsc" : "=a"(lo), "=d"(hi)); + return ( (uint64_t)lo) | ( ((uint64_t)hi) << 32); +} + +#elif defined(__aarch64__) +static __inline__ uint64_t rdtsc(void) +{ + uint64_t val; + + /* + * According to ARM DDI 0487F.c, from Armv8.0 to Armv8.5 inclusive, the + * system counter is at least 56 bits wide; from Armv8.6, the counter + * must be 64 bits wide. So the system counter could be less than 64 + * bits wide and it is attributed with the flag 'cap_user_time_short' + * is true. + */ + asm volatile("mrs %0, cntvct_el0" : "=r" (val)); + + return val; +} +#endif + + +/* + * Create a struct where reader fields share a cacheline with the hot lock field. + * Compiling with -DNO_FALSE_SHARING inserts padding to avoid that sharing. + */ +typedef struct _buf { + long lock0; + long lock1; + long reserved1; +#if defined(NO_FALSE_SHARING) + long pad[5]; // to keep the 'lock*' fields on their own cacheline. +#else + long pad[1]; // to provoke false sharing. +#endif + long reader1; + long reader2; + long reader3; + long reader4; +} buf __attribute__((aligned (64))); + +buf buf1; +buf buf2; + +volatile int wait_to_begin = 1; +struct thread_data *thread; +int max_node_num; +int num_threads; +char * lock_thd_name = "lock_th"; +char * reader_thd_name = "reader_thd"; + +#define checkResults(string, val) { \ + if (val) { \ + printf("Failed with %d at %s", val, string); \ + exit(1); \ + } \ +} + +struct thread_data { + pthread_t tid; + long tix; + long node; + char *name; +}; + +/* + * Bind a thread to the specified numa node. +*/ +void setAffinity(void *parm) { + volatile uint64_t rc, j; + int node = ((struct thread_data *)parm)->node; + char *func_name = ((struct thread_data *)parm)->name; + + numa_run_on_node(node); + pthread_setname_np(pthread_self(),func_name); +} + +/* + * Thread function to simulate the false sharing. + * The "lock" threads will test-n-set the lock field, + * while the reader threads will just read the other fields + * in the struct. + */ +extern void *read_write_func(void *parm) { + + int tix = ((struct thread_data *)parm)->tix; + uint64_t start, stop, j; + char *thd_name = ((struct thread_data *)parm)->name; + + // Pin each thread to a numa node. + setAffinity(parm); + + // Wait for all threads to get created before starting. + while(wait_to_begin) ; + + start = rdtsc(); + for(j=0; j\n", argv[0] ); + printf( "where \"n\" is the number of threads per node\n"); + exit(1); + } + + if ( numa_available() < 0 ) + { + printf( "NUMA not available\n" ); + exit(1); + } + + int thread_cnt = atoi(argv[1]); + + max_node_num = numa_max_node(); + if ( max_node_num == 0 ) + max_node_num = 1; + int node_cnt = max_node_num + 1; + + // Use "thread_cnt" threads per node. + num_threads = (max_node_num +1) * thread_cnt; + + thread = malloc( sizeof(struct thread_data) * num_threads); + + // Create the first half of threads as lock threads. + // Assign each thread a successive round robin node to + // be pinned to (later after it gets created.) 
+ // + for (i=0; i<=(num_threads/2 - 1); i++) { + thread[i].tix = i; + thread[i].node = i%node_cnt; + thread[i].name = lock_thd_name; + rc = pthread_create(&thread[i].tid, NULL, read_write_func, &thread[i]); + checkResults("pthread_create()\n", rc); + usleep(500); + } + + // Create the second half of threads as reader threads. + // Assign each thread a successive round robin node to + // be pinned to (later after it gets created.) + // + for (i=((num_threads/2)); i<(num_threads); i++) { + thread[i].tix = i; + thread[i].node = i%node_cnt; + thread[i].name = reader_thd_name; + rc = pthread_create(&thread[i].tid, NULL, read_write_func, &thread[i]); + checkResults("pthread_create()\n", rc); + usleep(500); + } + + // Sync to let threads start together + usleep(500); + wait_to_begin = 0; + + for (i=0; i length: - print(f"You don't have {nodes} nodes, only {length} nodes available!") - sys.exit() - if nodes <= 1: - return - gen_nodes = '\n'.join(self.avail_ips_list[:nodes]) - print(f"HOSTFILE\n{gen_nodes}\nGENERATED.") - self.write_file('hostfile', gen_nodes) - - # single run - def run(self): - print(f"start run {Data.app_name}") - nodes = int(Data.run_cmd['nodes']) - self.gen_hostfile(nodes) - run_cmd = self.hpc_data.get_run_cmd() - self.exe.exec_raw(run_cmd) - - def batch_run(self): - batch_file = 'Batch_run.sh' - print(f"start batch run {Data.app_name}") - batch_content = f''' -cd {Data.case_dir} -{Data.batch_cmd} -''' - with open(batch_file, 'w') as f: - f.write(batch_content) - run_cmd = f''' -chmod +x {batch_file} -./{batch_file} -''' - self.exe.exec_raw(run_cmd) - - def change_yum_repo(self): - print(f"start yum repo change") - repo_cmd = ''' -cp ./config/yum/*.repo /etc/yum.repos.d/ -yum clean all -yum makecache -''' - self.exe.exec_raw(repo_cmd) - - def get_pid(self): - #get pid - pid_cmd = f'pidof {Data.binary_file}' - result = self.exe.exec_popen(pid_cmd) - if len(result) == 0: - print("failed to get pid.") - sys.exit() - else: - pid_list = result[0].split(' ') - return pid_list[0].strip() - - def perf(self): - print(f"start perf {Data.app_name}") - #get pid - pid = self.get_pid() - #start perf && analysis - perf_cmd = f''' -perf record -a -g -p {pid} -perf report -i perf.data -F period,sample,overhead,symbol,dso,comm -s overhead --percent-limit 0.1% --stdio -''' - self.exe.exec_raw(perf_cmd) - - def gen_wget_url(self, out_dir='./downloads', url=''): - head = "wget --no-check-certificate" - out_para = "-P" - if not os.path.exists(out_dir): - os.makedirs(out_dir) - download_url = f'{head} {out_para} {out_dir} {url}' - return download_url - - def download(self): - print(f"start download") - for url in self.download_list: - download_url = self.gen_wget_url(url=url) - os.popen(download_url) - - def get_arch(self): - arch = 'arm' - if not self.isARM: - arch = 'X86' - return arch - - def get_cur_time(self): - return re.sub(' |:', '-', self.tool.get_time_stamp()) - - def gpu_perf(self): - print(f"start gpu perf") - run_cmd = self.hpc_data.get_run() - gperf_cmd = f''' -cd {Data.case_dir} -nsys profile -y 5s -d 100s -o nsys-{self.get_arch()}-{self.get_cur_time()} {run_cmd} - ''' - self.exe.exec_raw(gperf_cmd) - - def ncu_perf(self, kernel): - print(f"start ncu perf") - run_cmd = self.hpc_data.get_run() - ncu_cmd = f''' - cd {Data.case_dir} - ncu --export ncu-{self.get_arch()}-{self.get_cur_time()} --import-source=yes --set full --kernel-name {kernel} --launch-skip 1735 --launch-count 1 {run_cmd} - ''' - self.exe.exec_raw(ncu_cmd) - - def switch_config(self, config_file): - print(f"Switch config 
file to {config_file}") - with open(Data.meta_file, 'w') as f: - f.write(config_file.strip()) - print("Successfully switched.") - - def main(self): - if self.args.version: - print("V1.0") - - if self.args.info: - self.get_machine_info() - - if self.args.env: - self.env() - - if self.args.clean: - self.clean() - - if self.args.build: - self.build() - - if self.args.run: - self.run() - - if self.args.perf: - self.perf() - - if self.args.rbatch: - self.batch_run() - - if self.args.download: - self.download() - - if self.args.gpuperf: - self.gpu_perf() - - if self.args.ncuperf: - self.ncu_perf(self.args.ncuperf[0]) - - if self.args.use: - self.switch_config(self.args.use[0]) - - if self.args.network: - self.check_network() - - if self.args.yum: - self.change_yum_repo() - - data_list = self.args.compare - if data_list and len(data_list) == 2: - print(f"start compare {Data.app_name}") - self.compare(data_list[0], data_list[1]) - -if __name__ == '__main__': - HPCRunner().main() diff --git a/package/bisheng/1.3.3/install.sh b/package/bisheng/1.3.3/install.sh new file mode 100644 index 0000000000000000000000000000000000000000..118c96b8b082dbd679d71042ae02df4bccd876ea --- /dev/null +++ b/package/bisheng/1.3.3/install.sh @@ -0,0 +1,5 @@ +#!/bin/bash +#download from https://mirrors.huaweicloud.com/kunpeng/archive/compiler/bisheng_compiler/bisheng-compiler-2.1.0-aarch64-linux.tar.gz +set -e +cd ${JARVIS_TMP} +tar xzvf ${JARVIS_DOWNLOAD}/bisheng-compiler-1.3.3-aarch64-linux.tar.gz -C $1 --strip-components=1 \ No newline at end of file diff --git a/package/bisheng/2.1.0/install.sh b/package/bisheng/2.1.0/install.sh new file mode 100644 index 0000000000000000000000000000000000000000..717c1e1931552d3b44b27886383823ea757884d4 --- /dev/null +++ b/package/bisheng/2.1.0/install.sh @@ -0,0 +1,6 @@ +#download from https://mirrors.huaweicloud.com/kunpeng/archive/compiler/bisheng_compiler/bisheng-compiler-2.1.0-aarch64-linux.tar.gz +#!/bin/bash +set -e +cd ${JARVIS_TMP} +yum -y install libatomic libstdc++ libstdc++-devel +tar xzvf ${JARVIS_DOWNLOAD}/bisheng-compiler-2.1.0-aarch64-linux.tar.gz -C $1 --strip-components=1 \ No newline at end of file diff --git a/package/boost/1.72.0/install.sh b/package/boost/1.72.0/install.sh new file mode 100644 index 0000000000000000000000000000000000000000..d95b972fb1aba8b46c13fe644acbdb7b754059e5 --- /dev/null +++ b/package/boost/1.72.0/install.sh @@ -0,0 +1,8 @@ +#!/bin/bash +set -x +set -e +cd ${JARVIS_TMP} +tar -xvf ${JARVIS_DOWNLOAD}/boost_1_72_0.tar.gz +cd boost_1_72_0 +./bootstrap.sh +./b2 install --prefix=$1 \ No newline at end of file diff --git a/package/cmake/3.20.5/install.sh b/package/cmake/3.20.5/install.sh new file mode 100644 index 0000000000000000000000000000000000000000..fb01ef8d0b6c2904f4b859ffa3d6bb0a719d6add --- /dev/null +++ b/package/cmake/3.20.5/install.sh @@ -0,0 +1,4 @@ +#!/bin/bash +set -e +cd ${JARVIS_TMP} +tar -xvf ${JARVIS_DOWNLOAD}/cmake-3.20.5-linux-aarch64.tar.gz -C $1 --strip-components=1 \ No newline at end of file diff --git a/package/fftw/3.3.10/install.sh b/package/fftw/3.3.10/install.sh new file mode 100644 index 0000000000000000000000000000000000000000..d732a7178dc63acb02e3b52415c86737fc976a0f --- /dev/null +++ b/package/fftw/3.3.10/install.sh @@ -0,0 +1,9 @@ +#!/bin/bash +set -x +set -e +cd ${JARVIS_TMP} +rm -rf fftw-3.3.10 +tar -xvf ${JARVIS_DOWNLOAD}/fftw-3.3.10.tar.gz +cd fftw-3.3.10 +./configure --prefix=$1 MPICC=mpicc --enable-shared --enable-threads --enable-openmp --enable-mpi +make -j install \ No newline at end of file diff --git 
a/package/fftw/3.3.8/install.sh b/package/fftw/3.3.8/install.sh new file mode 100644 index 0000000000000000000000000000000000000000..df0242ae8d092bbf939326aa64b2252cd9a50485 --- /dev/null +++ b/package/fftw/3.3.8/install.sh @@ -0,0 +1,8 @@ +#!/bin/bash +set -x +set -e +cd ${JARVIS_TMP} +tar -xvf ${JARVIS_DOWNLOAD}/fftw-3.3.8.tar.gz +cd fftw-3.3.8 +./configure --prefix=$1 MPICC=mpicc --enable-shared --enable-threads --enable-openmp --enable-mpi +make -j install \ No newline at end of file diff --git a/package/gcc/9.3.1/install.sh b/package/gcc/9.3.1/install.sh new file mode 100644 index 0000000000000000000000000000000000000000..c4ac9f88adb8a095ec8536fa73fa38d4757c8132 --- /dev/null +++ b/package/gcc/9.3.1/install.sh @@ -0,0 +1,4 @@ +#!/bin/bash +set -e +cd ${JARVIS_TMP} +tar -xzvf ${JARVIS_DOWNLOAD}/gcc-9.3.1-2021.03-aarch64-linux.tar.gz -C $1 --strip-components=1 \ No newline at end of file diff --git a/package/gmp/6.2.0/install.sh b/package/gmp/6.2.0/install.sh new file mode 100644 index 0000000000000000000000000000000000000000..650efaa20e84ad399ec41d42f714e89060effc8a --- /dev/null +++ b/package/gmp/6.2.0/install.sh @@ -0,0 +1,9 @@ +#!/bin/bash +set -x +set -e +cd ${JARVIS_TMP} +tar -xvf ${JARVIS_DOWNLOAD}/gmp-6.2.0.tar.xz +cd gmp-6.2.0 +./configure --prefix=$1 +make -j +make install \ No newline at end of file diff --git a/package/gsl/2.6/install.sh b/package/gsl/2.6/install.sh new file mode 100644 index 0000000000000000000000000000000000000000..40948323997731d10268d05df597e81f51d39599 --- /dev/null +++ b/package/gsl/2.6/install.sh @@ -0,0 +1,8 @@ +#!/bin/bash +set -e +cd ${JARVIS_TMP} +tar -xvf ${JARVIS_DOWNLOAD}/gsl-2.6.tar.gz +cd gsl-2.6 +./configure --prefix=$1 +make -j +make install diff --git a/package/hmpi/1.1.0/gcc/install.sh b/package/hmpi/1.1.0/gcc/install.sh new file mode 100644 index 0000000000000000000000000000000000000000..254a5d9e3d8a84c33970a2eb70a1e7c395265068 --- /dev/null +++ b/package/hmpi/1.1.0/gcc/install.sh @@ -0,0 +1,4 @@ +#!/bin/bash +set -e +cd ${JARVIS_TMP} +tar -xvf ${JARVIS_DOWNLOAD}/Hyper-MPI_1.1.0_aarch64_CentOS7.6_GCC9.3_MLNX-OFED4.9.tar.gz -C $1 --strip-components=1 \ No newline at end of file diff --git a/package/hmpi/1.1.1/install.sh b/package/hmpi/1.1.1/install.sh new file mode 100644 index 0000000000000000000000000000000000000000..0a1bd108c7b5fd9bd3a40d0fcb29e516ab4e1a0f --- /dev/null +++ b/package/hmpi/1.1.1/install.sh @@ -0,0 +1,23 @@ +#!/bin/bash +set -x +set -e +cd ${JARVIS_TMP} +yum install -y perl-Data-Dumper autoconf automake libtool binutils +rm -rf hmpi-1.1.1-huawei hucx-1.1.1-huawei xucg-1.1.1-huawei +unzip ${JARVIS_DOWNLOAD}/hucx-1.1.1-huawei.zip +unzip ${JARVIS_DOWNLOAD}/xucg-1.1.1-huawei.zip +unzip ${JARVIS_DOWNLOAD}/hmpi-1.1.1-huawei.zip +\cp -rf xucg-1.1.1-huawei/* hucx-1.1.1-huawei/src/ucg/ +sleep 3 +cd hucx-1.1.1-huawei +./autogen.sh +./contrib/configure-opt --prefix=$1/hucx CFLAGS="-DHAVE___CLEAR_CACHE=1" --disable-numa +for file in `find . -name Makefile`;do sed -i "s/-Werror//g" $file;done +for file in `find . 
-name Makefile`;do sed -i "s/-implicit-function-declaration//g" $file;done +make -j64 +make install +cd ../hmpi-1.1.1-huawei +./autogen.pl +./configure --prefix=$1 --with-platform=contrib/platform/mellanox/optimized --enable-mpi1-compatibility --with-ucx=$1/hucx +make -j64 +make install diff --git a/package/hmpi/FAQ.md b/package/hmpi/FAQ.md new file mode 100644 index 0000000000000000000000000000000000000000..f4b848f92c6ccfd4cfb2ab1eb5b96871745da14b --- /dev/null +++ b/package/hmpi/FAQ.md @@ -0,0 +1,7 @@ +Q:hucx/src/ucs/arch/aarch64/cpu.h:259:20:error: redefinition of 'ucs_arch_clear_cache' + +A:报错原因为该函数在其他地方已经被声明过了,无需重复声明, 应将src/ucs/arch/aarch64/cpu.h 中位于259–271行的函数注释或者删除掉 + +Q: builtin.c: 969:21: error: comparison of array 'builtin_op->steps' not equal to a null pointer is always true + +A: builtin_op->steps不可能为空,该判断多余,直接删除即可 \ No newline at end of file diff --git a/package/kgcc/10.3.1/install.sh b/package/kgcc/10.3.1/install.sh new file mode 100644 index 0000000000000000000000000000000000000000..79fe13a6fed626317b220dfabdbfdec96336d7c3 --- /dev/null +++ b/package/kgcc/10.3.1/install.sh @@ -0,0 +1,5 @@ +#!/bin/bash +set -x +set -e +cd ${JARVIS_TMP} +tar -xzvf ${JARVIS_DOWNLOAD}/gcc-10.3.1-2021.09-aarch64-linux.tar.gz -C $1 --strip-components=1 \ No newline at end of file diff --git a/package/kgcc/9.3.1/install.sh b/package/kgcc/9.3.1/install.sh new file mode 100644 index 0000000000000000000000000000000000000000..300c534ea7bcf1fc720a4d38c18ea662bb9e4663 --- /dev/null +++ b/package/kgcc/9.3.1/install.sh @@ -0,0 +1,5 @@ +#!/bin/bash +set -x +set -e +cd ${JARVIS_TMP} +tar -xzvf ${JARVIS_DOWNLOAD}/gcc-9.3.1-2021.03-aarch64-linux.tar.gz -C $1 --strip-components=1 \ No newline at end of file diff --git a/package/kml/1.4.0/bisheng/install.sh b/package/kml/1.4.0/bisheng/install.sh new file mode 100644 index 0000000000000000000000000000000000000000..1eb7fc187bf7bc69e20e6f1125ef43c19789965e --- /dev/null +++ b/package/kml/1.4.0/bisheng/install.sh @@ -0,0 +1,52 @@ +#!/bin/bash +set -x +set -e +cd ${JARVIS_TMP} +rpm -e boostkit-kml +rpm --force --nodeps -ivh ${JARVIS_ROOT}/package/kml/1.4.0/bisheng/*.rpm +# generate full lapack +netlib=${JARVIS_DOWNLOAD}/lapack-3.9.1.tar.gz +klapack=/usr/local/kml/lib/libklapack.a +kservice=/usr/local/kml/lib/libkservice.a +echo $netlib +echo $klapack + +# build netlib lapack +rm -rf netlib +mkdir netlib +cd netlib +tar zxvf $netlib +mkdir build +cd build +cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_POSITION_INDEPENDENT_CODE=ON ../lapack-3.9.1 +make -j +cd ../.. + +cp netlib/build/lib/liblapack.a liblapack_adapt.a + +# get symbols defined both in klapack and netlib lapack +nm -g liblapack_adapt.a | grep 'T ' | grep -oP '\K\w+(?=_$)' | sort | uniq > netlib.sym +nm -g $klapack | grep 'T ' | grep -oP '\K\w+(?=_$)' | sort | uniq > klapack.sym +comm -12 klapack.sym netlib.sym > comm.sym + +objcopy -W dsecnd_ -W second_ liblapack_adapt.a + +# add _netlib_ postfix to symbols in liblapack_adapt.a (e.g. dgetrf_netlib_) +while read sym; do \ + if ! 
nm liblapack_adapt.a | grep -qe " T ${sym}_\$"; then \ + continue; \ + fi; \ + ar x liblapack_adapt.a $sym.f.o; \ + mv $sym.f.o ${sym}_netlib.f.o; \ + objcopy --redefine-sym ${sym}_=${sym}_netlib_ ${sym}_netlib.f.o; \ + ar d liblapack_adapt.a ${sym}.f.o; \ + ar ru liblapack_adapt.a ${sym}_netlib.f.o; \ + rm ${sym}_netlib.f.o; \ +done < comm.sym + +# (optional) build a full lapack shared library +clang -o libklapack_full.so -shared -fPIC -Wl,--whole-archive $klapack liblapack_adapt.a $kservice -Wl,--no-whole-archive -fopenmp -lpthread -lgfortran -lm + +\cp libklapack_full.so /usr/local/kml/lib/ +echo "Generated liblapack_adapt.a and libklapack_full.so" +exit 0 \ No newline at end of file diff --git a/package/kml/1.4.0/gcc/install.sh b/package/kml/1.4.0/gcc/install.sh new file mode 100644 index 0000000000000000000000000000000000000000..084bbef519643fa66dd02980336af8ad07cbc617 --- /dev/null +++ b/package/kml/1.4.0/gcc/install.sh @@ -0,0 +1,7 @@ +#!/bin/bash +set -x +set -e +cd ${JARVIS_TMP} +rpm -e boostkit-kml +rpm --force --nodeps -ivh ${JARVIS_ROOT}/package/kml/1.4.0/gcc/*.rpm +cp -rf ${JARVIS_ROOT}/package/kml/1.4.0/gcc/libklapack_full.so /usr/local/kml/lib \ No newline at end of file diff --git a/package/lapack/3.8.0/install.sh b/package/lapack/3.8.0/install.sh new file mode 100644 index 0000000000000000000000000000000000000000..dc4942aa0ec363708b714490d5897f070c314dc3 --- /dev/null +++ b/package/lapack/3.8.0/install.sh @@ -0,0 +1,10 @@ +#!/bin/bash +set -x +set -e +cd ${JARVIS_TMP} +tar -xvf ${JARVIS_DOWNLOAD}/lapack-3.8.0.tgz +cd lapack-3.8.0 +cp make.inc.example make.inc +make -j +mkdir $1/lib/ +cp *.a $1/lib/ \ No newline at end of file diff --git a/package/libint/2.6.0/install.sh b/package/libint/2.6.0/install.sh new file mode 100644 index 0000000000000000000000000000000000000000..d48e3eb3e4f831057a288f085f6377684c774f65 --- /dev/null +++ b/package/libint/2.6.0/install.sh @@ -0,0 +1,19 @@ +#!/bin/bash +set -x +set -e +cd ${JARVIS_TMP} +export GCC_LIBS=/home/HT3/HPCRunner2/software/libs/kgcc9 +tar -xvf ${JARVIS_DOWNLOAD}/libint-2.6.0.tar.gz +cd libint-2.6.0 +./autogen.sh +mkdir build +cd build +export LDFLAGS="-L${GCC_LIBS}/gmp/6.2.0/lib -L${GCC_LIBS}/boost/1.72.0/lib" +export CPPFLAGS="-I${GCC_LIBS}/gmp/6.2.0/include -I${GCC_LIBS}/boost/1.72.0/include" +../configure CXX=mpicxx --enable-eri=1 --enable-eri2=1 --enable-eri3=1 --with-max-am=4 --with-eri-max-am=4,3 --with-eri2-max-am=6,5 --with-eri3-max-am=6,5 --with-opt-am=3 --enable-generic-code --disable-unrolling --with-libint-exportdir=libint_cp2k_lmax4 +make export +tar -xvf libint_cp2k_lmax4.tgz +cd libint_cp2k_lmax4 +./configure --prefix=$1 CC=mpicc CXX=mpicxx FC=mpifort --enable-fortran --enable-shared +make -j 32 +make install diff --git a/package/libvori/21.04.12/install.sh b/package/libvori/21.04.12/install.sh new file mode 100644 index 0000000000000000000000000000000000000000..9f782329a895fc05ca5b86f2a17aedceee0ee674 --- /dev/null +++ b/package/libvori/21.04.12/install.sh @@ -0,0 +1,12 @@ +#!/bin/bash +set -x +set -e +cd ${JARVIS_TMP} +tar -xzvf ${JARVIS_DOWNLOAD}/libvori-210412.tar.gz +cd libvori-210412 +mkdir build +cd build +cmake .. 
-DCMAKE_INSTALL_PREFIX=$1 +make -j +make install + diff --git a/package/libxc/5.1.4/install.sh b/package/libxc/5.1.4/install.sh new file mode 100644 index 0000000000000000000000000000000000000000..cc4d52340b5470b8358dd0c6a8f1c5fcd8e7b765 --- /dev/null +++ b/package/libxc/5.1.4/install.sh @@ -0,0 +1,9 @@ +#!/bin/bash +set -x +set -e +cd ${JARVIS_TMP} +tar -xvf ${JARVIS_DOWNLOAD}/libxc-5.1.4.tar.gz +cd libxc-5.1.4 +./configure FC=gfortran CC=gcc --prefix=$1 +make -j +make install diff --git a/package/openblas/0.3.18/install.sh b/package/openblas/0.3.18/install.sh new file mode 100644 index 0000000000000000000000000000000000000000..d475d9e78dd32c9d39a627f87615b6e00937e43f --- /dev/null +++ b/package/openblas/0.3.18/install.sh @@ -0,0 +1,8 @@ +#!/bin/bash +set -x +set -e +cd ${JARVIS_TMP} +tar -xzvf ${JARVIS_DOWNLOAD}/OpenBLAS-0.3.18.tar.gz +cd OpenBLAS-0.3.18 +make -j +make PREFIX=$1 install diff --git a/package/openmpi/4.1.2/gpu/install.sh b/package/openmpi/4.1.2/gpu/install.sh new file mode 100644 index 0000000000000000000000000000000000000000..06ff675a366cc66c07b8b6bb3dc13521965d4161 --- /dev/null +++ b/package/openmpi/4.1.2/gpu/install.sh @@ -0,0 +1,25 @@ +#!/bin/bash +set -x +set -e +cd ${JARVIS_TMP} +#install ucx +tar -xvf ${JARVIS_DOWNLOAD}/ucx-1.12.0.tar.gz +cd ucx +./autogen.sh +./contrib/configure-release --prefix=$1/ucx +make -j8 +make install +#install openmpi +tar -xvf ${JARVIS_DOWNLOAD}/openmpi-4.1.2.tar.gz +cd openmpi-4.1.2 +CPP=cpp CC=nvc CFLAGS='-DNDEBUG -O1 -nomp -fPIC -fno-strict-aliasing -tp=haswell' CXX=nvc++ CXXFLAGS='-DNDEBUG -O1 -nomp -fPIC -finline-functions -tp=haswell' F77=nvfortran F90=nvfortran FC=nvfortran FCFLAGS='-O1 -nomp -fPIC -tp=haswell' FFLAGS='-fast -Mipa=fast,inline -tp=haswell' LDFLAGS=-Wl,--as-needed ./configure --prefix=$1 --disable-debug --disable-getpwuid --disable-mem-debug --disable-mem-profile --disable-memchecker --disable-static --enable-mca-no-build=btl-uct --enable-mpi1-compatibility --enable-oshmem --with-cuda=/usr/local/cuda --with-ucx=$1/ucx --enable-mca-no-build=op-avx +make -j8 +make install + +export LIBRARY_PATH=$1/lib:$LIBRARY_PATH +export PATH=$1/bin:$PATH \ +UCX_IB_PCI_RELAXED_ORDERING=on \ +UCX_MAX_RNDV_RAILS=1 \ +UCX_MEMTYPE_CACHE=n \ +UCX_MEMTYPE_REG_WHOLE_ALLOC_TYPES=cuda \ +UCX_TLS=rc_v,sm,cuda_copy,cuda_ipc,gdr_copy (or UCX_TLS=all) diff --git a/package/openmpi/4.1.2/install.sh b/package/openmpi/4.1.2/install.sh new file mode 100644 index 0000000000000000000000000000000000000000..6b93da002460d131b4bd8651555e802edb349b31 --- /dev/null +++ b/package/openmpi/4.1.2/install.sh @@ -0,0 +1,8 @@ +#!/bin/bash +set -x +set -e +cd ${JARVIS_TMP} +tar -xvf ${JARVIS_DOWNLOAD}/openmpi-4.1.2.tar.gz +cd openmpi-4.1.2 +./configure CC=gcc CXX=g++ FC=gfortran --prefix=$1 --enable-pretty-print-stacktrace --enable-orterun-prefix-by-default --with-knem=/opt/knem-1.1.4.90mlnx1/ --with-hcoll=/opt/mellanox/hcoll/ --with-cma --with-ucx --enable-mpi1-compatibility +make -j install diff --git a/package/plumed/2.6.2/install.sh b/package/plumed/2.6.2/install.sh new file mode 100644 index 0000000000000000000000000000000000000000..a75c28b31617c98ccd5263497882ce6078b923d7 --- /dev/null +++ b/package/plumed/2.6.2/install.sh @@ -0,0 +1,9 @@ +#!/bin/bash +set -x +set -e +cd ${JARVIS_TMP} +tar -xvf ${JARVIS_DOWNLOAD}/plumed-2.6.2.tgz +cd plumed-2.6.2 +./configure CXX=mpicxx CC=mpicc FC=mpifort --prefix=$1 --enable-external-blas --enable-gsl --enable-external-lapack LDFLAGS=-L/home//HT3/HPCRunner2/package/lapack/3.8.0/lapack-3.8.0/ LIBS="-lrefblas –llapack" +make 
-j +make install diff --git a/package/python3/3.7.10/install.sh b/package/python3/3.7.10/install.sh new file mode 100644 index 0000000000000000000000000000000000000000..49f88a7350acfa2fb6f1ed98e91a77665b8070de --- /dev/null +++ b/package/python3/3.7.10/install.sh @@ -0,0 +1,12 @@ +#!/bin/bash +# https://repo.huaweicloud.com/python/3.7.10/Python-3.7.10.tgz +set -x +set -e +cd ${JARVIS_TMP} +yum -y install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel gcc make libffi-devel +tar -zxvf ${JARVIS_DOWNLOAD}/Python-3.7.10.tgz +cd Python-3.7.10 +./configure --prefix=${JARVIS_COMPILER}/python3 +make +make install +ln -s ${JARVIS_COMPILER}/python3/bin/python3.7 /usr/local/bin/python3 \ No newline at end of file diff --git a/package/scalapack/2.1.0/install.sh b/package/scalapack/2.1.0/install.sh new file mode 100644 index 0000000000000000000000000000000000000000..e79a4709e95a28c04cf4abdbc0db79914a88ccc4 --- /dev/null +++ b/package/scalapack/2.1.0/install.sh @@ -0,0 +1,10 @@ +#!/bin/bash +set -x +set -e +cd ${JARVIS_TMP} +tar -xvf ${JARVIS_DOWNLOAD}/scalapack-2.1.0.tgz +cd scalapack-2.1.0 +cp SLmake.inc.example SLmake.inc +make -j +mkdir $1/lib +cp *.a $1/lib diff --git a/package/scalapack/2.1.0/kml/install.sh b/package/scalapack/2.1.0/kml/install.sh new file mode 100644 index 0000000000000000000000000000000000000000..26da61aa5d306a2a6c53101f40a0af2fd4e4c70a --- /dev/null +++ b/package/scalapack/2.1.0/kml/install.sh @@ -0,0 +1,13 @@ +#!/bin/bash +set -x +set -e +cd ${JARVIS_TMP} +rm -rf scalapack-2.1.0 +tar -xvf ${JARVIS_DOWNLOAD}/scalapack-2.1.0.tgz +cd scalapack-2.1.0 +rm -rf build +mkdir build +cd build +cmake -DCMAKE_INSTALL_PREFIX=$1 -DCMAKE_BUILD_TYPE=RelWithDebInfo -DBUILD_SHARED_LIBS=ON -DBLAS_LIBRARIES=/usr/local/kml/lib/kblas/omp/libkblas.so -DLAPACK_LIBRARIES=/usr/local/kml/lib/libklapack_full.so -DCMAKE_C_COMPILER=mpicc -DCMAKE_Fortran_COMPILER=mpif90 .. +make -j +make install \ No newline at end of file diff --git a/package/spglib/1.16.0/install.sh b/package/spglib/1.16.0/install.sh new file mode 100644 index 0000000000000000000000000000000000000000..c1877ea7d4ab322b7725445cc7e1b0672fe782b4 --- /dev/null +++ b/package/spglib/1.16.0/install.sh @@ -0,0 +1,11 @@ +#!/bin/bash +set -x +set -e +cd ${JARVIS_TMP} +tar -xvf ${JARVIS_DOWNLOAD}/spglib-1.16.0.tar.gz +cd spglib-1.16.0 +mkdir build +cd build +cmake .. 
-DCMAKE_INSTALL_PREFIX=$1 +make -j +make install diff --git a/package/tau/2.30.0/install.sh b/package/tau/2.30.0/install.sh new file mode 100644 index 0000000000000000000000000000000000000000..e7803c978cc4ab8af3535a8f7b68b3479a84530a --- /dev/null +++ b/package/tau/2.30.0/install.sh @@ -0,0 +1,17 @@ +#!/bin/bash +set -x +set -e +cd ${JARVIS_TMP} +# install PDT +tar -zxvf ${JARVIS_DOWNLOAD}/pdt.tgz +cd pdtoolkit-3.25.1/ +./configure -GNU -prefix=$1/PDT +make -j install +# install TAU, using tau with external package +tar -zxvf ${JARVIS_DOWNLOAD}/tau-2.30.0.tar.gz +cd tau-2.30.0/ +./configure -openmp -bfd=download -unwind=download -mpi -pdt=$1/PDT/ -pdt_c++=g++ -mpi +export PATH=$1/tau-2.30.0/arm64_linux/bin:$PATH + +#usage: mpirun --allow-run-as-root -np 128 -x OMP_NUM_THREADS=1 --mca btl ^openib tau_exec vasp_std +#pprof diff --git a/software/compiler/bisheng/2.1.0/installed b/software/compiler/bisheng/2.1.0/installed new file mode 100644 index 0000000000000000000000000000000000000000..c227083464fb9af8955c90d2924774ee50abb547 --- /dev/null +++ b/software/compiler/bisheng/2.1.0/installed @@ -0,0 +1 @@ +0 \ No newline at end of file diff --git a/software/compiler/gcc/9.3.1/installed b/software/compiler/gcc/9.3.1/installed new file mode 100644 index 0000000000000000000000000000000000000000..c227083464fb9af8955c90d2924774ee50abb547 --- /dev/null +++ b/software/compiler/gcc/9.3.1/installed @@ -0,0 +1 @@ +0 \ No newline at end of file diff --git a/software/compiler/kgcc/10.3.1/installed b/software/compiler/kgcc/10.3.1/installed new file mode 100644 index 0000000000000000000000000000000000000000..c227083464fb9af8955c90d2924774ee50abb547 --- /dev/null +++ b/software/compiler/kgcc/10.3.1/installed @@ -0,0 +1 @@ +0 \ No newline at end of file diff --git a/software/compiler/kgcc/9.3.1/installed b/software/compiler/kgcc/9.3.1/installed new file mode 100644 index 0000000000000000000000000000000000000000..c227083464fb9af8955c90d2924774ee50abb547 --- /dev/null +++ b/software/compiler/kgcc/9.3.1/installed @@ -0,0 +1 @@ +0 \ No newline at end of file diff --git a/software/compiler/python3/installed b/software/compiler/python3/installed new file mode 100644 index 0000000000000000000000000000000000000000..c227083464fb9af8955c90d2924774ee50abb547 --- /dev/null +++ b/software/compiler/python3/installed @@ -0,0 +1 @@ +0 \ No newline at end of file diff --git a/software/libs/bisheng2/openblas/0.3.18/installed b/software/libs/bisheng2/openblas/0.3.18/installed new file mode 100644 index 0000000000000000000000000000000000000000..c227083464fb9af8955c90d2924774ee50abb547 --- /dev/null +++ b/software/libs/bisheng2/openblas/0.3.18/installed @@ -0,0 +1 @@ +0 \ No newline at end of file diff --git a/software/libs/gcc9/fftw/3.3.8/installed b/software/libs/gcc9/fftw/3.3.8/installed new file mode 100644 index 0000000000000000000000000000000000000000..c227083464fb9af8955c90d2924774ee50abb547 --- /dev/null +++ b/software/libs/gcc9/fftw/3.3.8/installed @@ -0,0 +1 @@ +0 \ No newline at end of file diff --git a/software/libs/nvc/installed b/software/libs/nvc/installed new file mode 100644 index 0000000000000000000000000000000000000000..c227083464fb9af8955c90d2924774ee50abb547 --- /dev/null +++ b/software/libs/nvc/installed @@ -0,0 +1 @@ +0 \ No newline at end of file diff --git a/software/moduledeps/gcc9-openmpi4/scalapack/2.1.0 b/software/moduledeps/gcc9-openmpi4/scalapack/2.1.0 new file mode 100644 index 0000000000000000000000000000000000000000..c0af1c6e6d9bab59c55af2d401a7bd2111603148 --- /dev/null +++ 
b/software/moduledeps/gcc9-openmpi4/scalapack/2.1.0 @@ -0,0 +1,5 @@ +#%Module1.0##################################################################### +set rootdir $::env(JARVIS_ROOT) +set version 2.1.0 + +prepend-path LD_LIBRARY_PATH $rootdir/software/libs/gcc9/openmpi4/scalapack/2.1.0 diff --git a/software/moduledeps/gcc9/openblas/0.3.18 b/software/moduledeps/gcc9/openblas/0.3.18 new file mode 100644 index 0000000000000000000000000000000000000000..509b37ca869a54aa8bf89dc996f2a7516979c336 --- /dev/null +++ b/software/moduledeps/gcc9/openblas/0.3.18 @@ -0,0 +1,12 @@ +#%Module1.0##################################################################### +set rootdir $::env(JARVIS_ROOT) +set prefix $rootdir/software/libs/gcc9/openblas/0.3.18 +set version 0.3.18 + +prepend-path PATH $prefix/bin +prepend-path INCLUDE $prefix/include +prepend-path LD_LIBRARY_PATH $prefix/lib + +setenv OPENBLAS_DIR $prefix +setenv OPENBLAS_LIB $prefix/lib +setenv OPENBLAS_INC $prefix/include diff --git a/software/modulefiles/gcc9/9.3.1 b/software/modulefiles/gcc9/9.3.1 new file mode 100644 index 0000000000000000000000000000000000000000..2a2a3f888552d5a362768e9400e1b6126da73b99 --- /dev/null +++ b/software/modulefiles/gcc9/9.3.1 @@ -0,0 +1,10 @@ +#%Module1.0##################################################################### +set rootdir $::env(JARVIS_ROOT) +set prefix $rootdir/software/compiler/gcc/9.3.1 +set version 9.3.1 + +prepend-path PATH $prefix/bin +prepend-path MANPATH $prefix/share/man +prepend-path INCLUDE $prefix/include +prepend-path LD_LIBRARY_PATH $prefix/lib64 +prepend-path MODULEPATH $rootdir/software/moduledeps/gcc9 diff --git a/software/mpi/openmpi4-gcc9/4.1.2/installed b/software/mpi/openmpi4-gcc9/4.1.2/installed new file mode 100644 index 0000000000000000000000000000000000000000..c227083464fb9af8955c90d2924774ee50abb547 --- /dev/null +++ b/software/mpi/openmpi4-gcc9/4.1.2/installed @@ -0,0 +1 @@ +0 \ No newline at end of file diff --git a/software/utils/cmake/3.20.5/installed b/software/utils/cmake/3.20.5/installed new file mode 100644 index 0000000000000000000000000000000000000000..c227083464fb9af8955c90d2924774ee50abb547 --- /dev/null +++ b/software/utils/cmake/3.20.5/installed @@ -0,0 +1 @@ +0 \ No newline at end of file diff --git a/src/analysis.py b/src/analysis.py new file mode 100644 index 0000000000000000000000000000000000000000..1492550b3db69d68f1cc61ebeb532a68183e256f --- /dev/null +++ b/src/analysis.py @@ -0,0 +1,640 @@ +#!/usr/bin/env python3 +# -*- coding: utf-8 -*- +import platform +import sys +import os +import re +from glob import glob + +from data import Data +from tool import Tool +from execute import Execute +from machine import Machine +from bench import Benchmark + +from enum import Enum + +class SType(Enum): + COMPILER = 1 + MPI = 2 + UTIL = 3 + LIB = 4 + +class Install: + def __init__(self): + self.hpc_data = Data() + self.exe = Execute() + self.tool = Tool() + self.ROOT = os.getcwd() + self.PACKAGE_PATH = os.path.join(self.ROOT, 'package') + self.COMPILER_PATH = os.path.join(self.ROOT, 'software/compiler') + self.LIBS_PATH = os.path.join(self.ROOT, 'software/libs') + self.MODULE_DEPS_PATH = os.path.join(self.ROOT, 'software/moduledeps') + self.MODULE_FILES = os.path.join(self.ROOT, 'software/modulefiles') + self.MPI_PATH = os.path.join(self.ROOT, 'software/mpi') + self.UTILS_PATH = os.path.join(self.ROOT, 'software/utils') + + def get_version_info(self, info): + return re.search( r'(\d+)\.(\d+)\.',info).group(1) + + # some command don't generate output, must redirect to a 
tmp file + def get_cmd_output(self, cmd): + tmp_path = os.path.join(self.ROOT, 'tmp') + tmp_file = os.path.join(tmp_path, 'tmp.txt') + self.tool.mkdirs(tmp_path) + cmd += f' &> {tmp_file}' + self.exe.exec_popen(cmd, False) + info_list = self.tool.read_file(tmp_file).split('\n') + return info_list + + def get_gcc_info(self): + gcc_info_list = self.get_cmd_output('gcc -v') + gcc_info = gcc_info_list[-1].strip() + version = self.get_version_info(gcc_info) + name = 'gcc' + if 'kunpeng' in gcc_info.lower(): + name = 'kgcc' + return {"cname": name, "cmversion": version} + + def get_clang_info(self): + clang_info_list = self.get_cmd_output('clang -v') + clang_info = clang_info_list[0].strip() + version = self.get_version_info(clang_info) + name = 'clang' + if 'bisheng' in clang_info.lower(): + name = 'bisheng' + return {"cname": name, "cmversion": version} + + def get_nvc_info(self): + return {"cname": "cuda", "cmversion": "11"} + + def get_icc_info(self): + return {"cname": "icc", "cmversion": "11"} + + def get_mpi_info(self): + mpi_info_list = self.get_cmd_output('mpirun -version') + mpi_info = mpi_info_list[0].strip() + name = 'openmpi' + version = self.get_version_info(mpi_info) + hmpi_info = self.get_cmd_output('ompi_info | grep "MCA coll: ucx"')[0] + if hmpi_info != "": + name = 'hmpi' + version = re.search( r'Component v(\d+)\.(\d+)\.',hmpi_info).group(1) + return {"name": name, "version": version} + + def check_software_path(self, software_path): + abs_software_path = os.path.join(self.PACKAGE_PATH, software_path) + if not os.path.exists(abs_software_path): + print(f"{software_path} not exist, Are you sure the software lies in package dir?") + return False + return abs_software_path + + def check_compiler_mpi(self, compiler_list, compiler_mpi_info): + no_compiler = ["COM","ANY"] + is_valid = False + compiler_mpi_info = compiler_mpi_info.upper() + valid_list = [] + for compiler in compiler_list: + valid_list.append(compiler) + valid_list.append(f'{compiler}+MPI') + valid_list += no_compiler + for valid_para in valid_list: + if compiler_mpi_info == valid_para: + is_valid = True + break + if not is_valid: + print(f"compiler or mpi info error, Only {valid_list.join('/').lower()} is supported") + return False + return compiler_mpi_info + + def get_used_compiler(self, compiler_mpi_info): + return compiler_mpi_info.split('+')[0] + + def get_software_type(self,software_name, compiler_mpi_info): + if self.is_mpi_software(software_name): + return SType.MPI + if compiler_mpi_info == "COM": + return SType.COMPILER + elif compiler_mpi_info == "ANY": + return SType.UTIL + else: + return SType.LIB + + def get_suffix(self, software_info_list): + if len(software_info_list) == 3: + return software_info_list[2] + return "" + + def get_software_info(self, software_path, compiler_mpi_info): + software_info_list = software_path.split('/') + software_name = software_info_list[0] + software_version = software_info_list[1] + software_main_version = self.get_main_version(software_version) + software_type = self.get_software_type(software_name, compiler_mpi_info) + software_info = { + "sname":software_name, + "sversion": software_version, + "mversion": software_main_version, + "type" : software_type, + "suffix": self.get_suffix(software_info_list) + } + if software_type == SType.LIB or software_type == SType.MPI: + software_info["is_use_mpi"] = self.is_contained_mpi(compiler_mpi_info) + software_info["use_compiler"] = self.get_used_compiler(compiler_mpi_info) + return software_info + + def get_compiler_info(self, 
compilers, compiler_mpi_info): + compiler_info = {"cname":None, "cmversion": None} + for compiler, info_func in compilers.items(): + if compiler in compiler_mpi_info: + compiler_info = info_func() + return compiler_info + + def get_main_version(self, version): + return version.split('.')[0] + + def is_mpi_software(self, software_name): + mpis = ['hmpi', 'openmpi', 'hpcx'] + return software_name in mpis + + def add_mpi_path(self, software_info, install_path): + if not software_info['is_use_mpi']: + return install_path + mpi_info = self.get_mpi_info() + if mpi_info["version"] == None: + print("MPI not found!") + return False + mpi_str = mpi_info["name"]+mpi_info["version"] + print("Use MPI: "+mpi_str) + install_path = os.path.join(install_path, mpi_str) + return install_path + + def get_install_path(self, software_info, env_info): + suffix = software_info['suffix'] + sversion = software_info['sversion'] + stype = software_info['type'] + cname = env_info['cname'] + if suffix != "": + software_info['sname'] += '-' + suffix + sname = software_info['sname'] + if stype == SType.MPI: + return os.path.join(self.MPI_PATH, f"{sname}{self.get_main_version(sversion)}-{cname}{env_info['cmversion']}", sversion) + if stype == SType.COMPILER: + install_path = os.path.join(self.COMPILER_PATH, f'{sname}/{sversion}') + elif stype == SType.UTIL: + install_path = os.path.join(self.UTILS_PATH, f'{sname}/{sversion}') + else: + install_path = os.path.join(self.LIBS_PATH, cname+env_info['cmversion']) + # get mpi name and version + install_path = self.add_mpi_path(software_info, install_path) + install_path = os.path.join(install_path, f'{sname}/{sversion}') + return install_path + + def is_contained_mpi(self, compiler_mpi_info): + return "MPI" in compiler_mpi_info + + def get_files(self, abs_path): + file_list = [d for d in glob(abs_path+'/**', recursive=True)] + return file_list + + def get_module_file_content(self, install_path, sversion): + module_file_content = '' + file_list = self.get_files(install_path) + bins_dir_type = ["bin"] + libs_dir_type = ["libs", "lib", "lib64"] + incs_dir_type = ["include"] + bins_dir = [] + libs_dir = [] + incs_dir = [] + bins_str = '' + libs_str = '' + incs_str = '' + for file in file_list: + if not os.path.isdir(file): + continue + last_dir = file.split('/')[-1] + if last_dir in bins_dir_type: + bins_dir.append(file.replace(install_path, "$prefix")) + elif last_dir in libs_dir_type: + libs_dir.append(file.replace(install_path, "$prefix")) + elif last_dir in incs_dir_type: + incs_dir.append(file.replace(install_path, "$prefix")) + if len(bins_dir) >= 1: + bins_str = "prepend-path PATH "+':'.join(bins_dir) + if len(libs_dir) >= 1: + libs_str = "prepend-path LD_LIBRARY_PATH "+':'.join(libs_dir) + if len(incs_dir) >= 1: + incs_str = "prepend-path INCLUDE " + ':'.join(incs_dir) + module_file_content = f'''#%Module1.0##################################################################### +set prefix {install_path} +set version {sversion} + +{bins_str} +{libs_str} +{incs_str} +''' + return module_file_content + + def get_installed_file_path(self, install_path): + return os.path.join(install_path, "installed") + + def is_installed(self, install_path): + installed_file_path = self.get_installed_file_path(install_path) + if not os.path.exists(installed_file_path): + return False + if not self.tool.read_file(installed_file_path) == "1": + return False + return True + + def set_installed_status(self, install_path): + installed_file_path = self.get_installed_file_path(install_path) + 
self.tool.write_file(installed_file_path, "1") + + def gen_module_file(self, install_path, software_info, env_info): + sname = software_info['sname'] + sversion = software_info['sversion'] + stype = software_info['type'] + cname = env_info['cname'] + cmversion = env_info['cmversion'] + software_str = sname + self.get_main_version(sversion) + module_file_content = self.get_module_file_content(install_path, sversion) + if not self.is_installed(install_path): + return + if stype == SType.MPI: + compiler_str = cname + cmversion + module_path = os.path.join(self.MODULE_DEPS_PATH, compiler_str ,software_str) + attach_module_path = os.path.join(self.MODULE_DEPS_PATH, compiler_str+'-'+software_str) + self.tool.mkdirs(attach_module_path) + module_file_content += f"\nprepend-path MODULEPATH {attach_module_path}" + else: + if stype == SType.COMPILER: + module_path = os.path.join(self.MODULE_FILES, software_str) + attach_module_path = os.path.join(self.MODULE_DEPS_PATH, software_str) + self.tool.mkdirs(attach_module_path) + module_file_content += f"\nprepend-path MODULEPATH {attach_module_path}" + elif stype == SType.UTIL: + module_path = os.path.join(self.MODULE_FILES, sname) + else: + compiler_str = cname + cmversion + if software_info['is_use_mpi']: + mpi_info = self.get_mpi_info() + mpi_str = mpi_info['name'] + self.get_main_version(mpi_info['version']) + module_path = os.path.join(self.MODULE_DEPS_PATH, f"{compiler_str}-{mpi_str}" ,sname) + else: + module_path = os.path.join(self.MODULE_DEPS_PATH, compiler_str, sname) + self.tool.mkdirs(module_path) + module_file = os.path.join(module_path, sversion) + self.tool.write_file(module_file, module_file_content) + print(f"module file {module_file} successfully generated") + + def install_package(self, abs_software_path, install_path): + install_script = 'install.sh' + install_script_path = os.path.join(abs_software_path, install_script) + print("start installing..."+ abs_software_path) + if not os.path.exists(install_script_path): + print("install script not exists, skipping...") + return + self.tool.mkdirs(install_path) + if self.is_installed(install_path): + print("already installed, skipping...") + return + install_cmd = f''' +source ./init.sh +cd {abs_software_path} +chmod +x {install_script} +./{install_script} {install_path} +''' + result = self.exe.exec_raw(install_cmd) + if result: + print(f"install to {install_path} successful") + self.set_installed_status(install_path) + else: + print("install failed") + sys.exit() + + def install(self, software_path, compiler_mpi_info): + self.tool.prt_content("INSTALL " + software_path) + compilers = {"GCC":self.get_gcc_info, "CLANG":self.get_clang_info, + "NVC":self.get_nvc_info, "ICC":self.get_icc_info, + "BISHENG":self.get_clang_info} + + # software_path should exists + abs_software_path = self.check_software_path(software_path) + if not abs_software_path: return + compiler_mpi_info = self.check_compiler_mpi(compilers.keys(), compiler_mpi_info) + if not compiler_mpi_info: return + software_info = self.get_software_info(software_path, compiler_mpi_info) + stype = software_info['type'] + # get compiler name and version + env_info = self.get_compiler_info(compilers, compiler_mpi_info) + if stype == SType.LIB or stype == SType.MPI: + cmversion = env_info['cmversion'] + if cmversion == None: + print(f"The specified {software_info['use_compiler']} Compiler not found!") + return False + else: + print(f"Use Compiler: {env_info['cname']} {cmversion}") + + # get install path + install_path = 
self.get_install_path(software_info, env_info) + if not install_path: return + # get install script + self.install_package(abs_software_path, install_path) + # gen module file + self.gen_module_file( install_path, software_info, env_info) + + def install_depend(self): + depend_file = 'depend_install.sh' + print(f"start installing dependendcy of {Data.app_name}") + depend_content = f''' +{Data.dependency} +''' + self.tool.write_file(depend_file, depend_content) + run_cmd = f''' +chmod +x {depend_file} +./{depend_file} +''' + self.exe.exec_raw(run_cmd) + +class Env: + def __init__(self): + self.hpc_data = Data() + self.tool = Tool() + self.ROOT = os.getcwd() + self.exe = Execute() + + def env(self): + print(f"set environment {Data.app_name}") + env_file = os.path.join(self.ROOT, Data.env_file) + self.tool.write_file(env_file, Data.module_content) + print(f"ENV FILE {Data.env_file} GENERATED.") + self.exe.exec_raw(f'chmod +x {Data.env_file}') + +class Build: + def __init__(self): + self.hpc_data = Data() + self.exe = Execute() + + def clean(self): + print(f"start clean {Data.app_name}") + clean_cmd=self.hpc_data.get_clean_cmd() + self.exe.exec_raw(clean_cmd) + + def build(self): + print(f"start build {Data.app_name}") + build_cmd = self.hpc_data.get_build_cmd() + self.exe.exec_raw(build_cmd) + +class Run: + def __init__(self): + self.hpc_data = Data() + self.exe = Execute() + self.tool = Tool() + self.ROOT = os.getcwd() + self.avail_ips_list = self.tool.gen_list(Data.avail_ips) + + def gen_hostfile(self, nodes): + length = len(self.avail_ips_list) + if nodes > length: + print(f"You don't have {nodes} nodes, only {length} nodes available!") + sys.exit() + if nodes <= 1: + return + gen_nodes = '\n'.join(self.avail_ips_list[:nodes]) + print(f"HOSTFILE\n{gen_nodes}\nGENERATED.") + self.tool.write_file('hostfile', gen_nodes) + + # single run + def run(self): + print(f"start run {Data.app_name}") + nodes = int(Data.run_cmd['nodes']) + self.gen_hostfile(nodes) + run_cmd = self.hpc_data.get_run_cmd() + self.exe.exec_raw(run_cmd) + + def batch_run(self): + batch_file = 'batch_run.sh' + batch_file_path = os.path.join(self.ROOT, batch_file) + print(f"start batch run {Data.app_name}") + batch_content = f''' +cd {Data.case_dir} +{Data.batch_cmd} +''' + self.tool.write_file(batch_file_path, batch_content) + run_cmd = f''' +chmod +x {batch_file} +./{batch_file} +''' + self.exe.exec_raw(run_cmd) + +class Perf: + def __init__(self): + self.hpc_data = Data() + self.exe = Execute() + self.tool = Tool() + self.isARM = platform.machine() == 'aarch64' + + def get_pid(self): + #get pid + pid_cmd = f'pidof {Data.binary_file}' + result = self.exe.exec_popen(pid_cmd) + if len(result) == 0: + print("failed to get pid.") + sys.exit() + else: + pid_list = result[0].split(' ') + mid = int(len(pid_list)/2) + return pid_list[mid].strip() + + def perf(self): + print(f"start perf {Data.app_name}") + #get pid + pid = self.get_pid() + #start perf && analysis + perf_cmd = f''' +perf record {Data.perf_para} -a -g -p {pid} +perf report -i ./perf.data -F period,sample,overhead,symbol,dso,comm -s overhead --percent-limit 0.1% --stdio +''' + self.exe.exec_raw(perf_cmd) + + def get_arch(self): + arch = 'arm' + if not self.isARM: + arch = 'X86' + return arch + + def get_cur_time(self): + return re.sub(' |:', '-', self.tool.get_time_stamp()) + + def gpu_perf(self): + print(f"start gpu perf") + run_cmd = self.hpc_data.get_run() + gperf_cmd = f''' +cd {Data.case_dir} +nsys profile -y 5s -d 100s {Data.nsys_para} -o 
nsys-{self.get_arch()}-{self.get_cur_time()} {run_cmd} + ''' + self.exe.exec_raw(gperf_cmd) + + def ncu_perf(self, kernel): + print(f"start ncu perf") + run_cmd = self.hpc_data.get_run() + ncu_cmd = f''' + cd {Data.case_dir} + ncu --export ncu-{self.get_arch()}-{self.get_cur_time()} {Data.ncu_para} --import-source=yes --set full --kernel-name {kernel} --launch-skip 1735 --launch-count 1 {run_cmd} + ''' + self.exe.exec_raw(ncu_cmd) + +class Download: + def __init__(self): + self.hpc_data = Data() + self.exe = Execute() + self.tool = Tool() + self.ROOT = os.getcwd() + self.download_list = self.tool.gen_list(Data.download_info) + self.download_path = os.path.join(self.ROOT, 'downloads') + + def check_network(self): + print(f"start network checking") + network_test_cmd=''' +wget --spider -T 5 -q -t 2 www.baidu.com | echo $? +curl -s -o /dev/null www.baidu.com | echo $? + ''' + self.exe.exec_raw(network_test_cmd) + + def change_yum_repo(self): + print(f"start yum repo change") + repo_cmd = ''' +cp ./templates/yum/*.repo /etc/yum.repos.d/ +yum clean all +yum makecache +''' + self.exe.exec_raw(repo_cmd) + + def gen_wget_url(self, out_dir='./downloads', url=''): + head = "wget --no-check-certificate" + out_para = "-P" + download_url = f'{head} {out_para} {out_dir} {url}' + return download_url + + def download(self): + print(f"start download") + url_links = [] + self.tool.mkdirs(self.download_path) + download_flag = False + # create directory + for url_info in self.download_list: + url_list = url_info.split(' ') + if len(url_list) != 2: + continue + software_info = url_list[0].strip() + url_link = url_list[1].strip() + url_links.append(url_link) + # create software directory + software_path = os.path.join(self.ROOT, 'package', software_info) + self.tool.mkdirs(software_path) + # create install script + install_script = os.path.join(software_path, "install.sh") + self.tool.mkfile(install_script) + # start download + for url in url_links: + download_flag = True + filename = os.path.basename(url) + file_path = os.path.join(self.download_path, filename) + if os.path.exists(file_path): + self.tool.prt_content(f"FILE {filename} already DOWNLOADED") + continue + download_url = self.gen_wget_url(self.download_path, url) + self.tool.prt_content("DOWNLOAD " + filename) + os.popen(download_url) + if not download_flag: + print("The download list is empty!") +class Test: + def __init__(self): + self.exe = Execute() + self.ROOT = os.getcwd() + self.test_dir = os.path.join(self.ROOT, 'test') + + def test(self): + run_cmd = f''' +cd {self.test_dir} +./test-qe.sh +cd {self.test_dir} +./test-util.sh +''' + self.exe.exec_raw(run_cmd) + +class Config: + def __init__(self): + self.exe = Execute() + self.tool = Tool() + self.ROOT = os.getcwd() + + def switch_config(self, config_file): + print(f"Switch config file to {config_file}") + meta_path = os.path.join(self.ROOT, Data.meta_file) + self.tool.write_file(meta_path, config_file.strip()) + print("Successfully switched.") + +class Analysis: + def __init__(self): + self.jmachine = Machine() + self.jtest = Test() + self.jdownload = Download() + self.jbenchmark = Benchmark() + self.jperf = Perf() + self.jrun = Run() + self.jbuild = Build() + self.jenv = Env() + self.jinstall = Install() + self.jconfig = Config() + + def get_machine_info(self): + self.jmachine.output_machine_info() + + def bench(self, bench_case): + self.jbenchmark.output_bench_info(bench_case) + + def switch_config(self, config_file): + self.jconfig.switch_config(config_file) + + def test(self): + 
self.jtest.test() + + def download(self): + self.jdownload.download() + + def check_network(self): + self.jdownload.check_network() + + def gpu_perf(self): + self.jperf.gpu_perf() + + def ncu_perf(self, kernel): + self.jperf.ncu_perf(kernel) + + def perf(self): + self.jperf.perf() + + def kperf(self): + self.jperf.kperf() + + def run(self): + self.jrun.run() + + def batch_run(self): + self.jrun.batch_run() + + def clean(self): + self.jbuild.clean() + + def build(self): + self.jbuild.build() + + def env(self): + self.jenv.env() + + def install(self,software_path, compiler_mpi_info): + self.jinstall.install(software_path, compiler_mpi_info) + + def install_deps(self): + self.jinstall.install_depend() diff --git a/src/bench.py b/src/bench.py new file mode 100644 index 0000000000000000000000000000000000000000..96f55d70c9ac7b4912c28041dd66b919b57b132e --- /dev/null +++ b/src/bench.py @@ -0,0 +1,26 @@ +#!/usr/bin/env python3 +# -*- coding: utf-8 -*- +import platform +import os +from glob import glob + +from execute import Execute + +class Benchmark: + def __init__(self): + self.isARM = platform.machine() == 'aarch64' + self.ROOT = os.getcwd() + self.exe = Execute() + self.RUN_FILE = 'run.sh' + self.ALL = 'all' + + def output_bench_info(self, bench_case): + bench_path = os.path.join(self.ROOT, 'benchmark') + file_list = [d for d in glob(bench_path+'/**', recursive=False)] + for file in file_list: + cur_bench_case = os.path.basename(file) + run_file = os.path.join(file, self.RUN_FILE) + if os.path.isdir(file) and os.path.exists(run_file): + cmd = f"cd {file} && chmod +x {self.RUN_FILE} && ./{self.RUN_FILE}" + if cur_bench_case == self.ALL or cur_bench_case == bench_case: + self.exe.exec_raw(cmd) diff --git a/data.py b/src/data.py similarity index 75% rename from data.py rename to src/data.py index 116348ff86a07c15e82a8a2998f614e750f72538..31bb4d2bd2b3a73e997613057648e35031339528 100644 --- a/data.py +++ b/src/data.py @@ -3,10 +3,13 @@ import os import platform +from tool import Tool + class Data: # Hardware Info avail_ips='' # Dependent Software environment Info + dependency = '' module_content='' env_file = 'env.sh' # Application Info @@ -23,33 +26,36 @@ class Data: batch_cmd = '' #Other Info meta_file = '.meta' - download_urls = ''' -https://www.cp2k.org/static/downloads/libxc-5.1.4.tar.gz -https://www.cp2k.org/static/downloads/fftw-3.3.8.tar.gz -''' - + root_path = os.getcwd() + download_info = '' + #perf info + kperf_para = '' + perf_para = '' + nsys_para = '' + ncu_para = '' + def get_abspath(self, relpath): + return os.path.join(Data.root_path, relpath) + def __init__(self): self.isARM = platform.machine() == 'aarch64' + self.tool = Tool() self.data_process() def get_file_name(self): file_name = 'data.config' if not os.path.exists(Data.meta_file): - if not self.isARM: - file_name = 'data.X86.config' return file_name - with open(Data.meta_file, encoding='utf-8') as file_obj: - contents = file_obj.read() - return contents.strip() + return self.tool.read_file(Data.meta_file) def get_data_config(self): file_name = self.get_file_name() - with open(file_name, encoding='utf-8') as file_obj: + file_path = self.get_abspath(file_name) + with open(file_path, encoding='utf-8') as file_obj: contents = file_obj.read() return contents.strip() - def is_empty(self, content): - return len(content) == 0 or content.isspace() or content == '\n' + def is_empty(self, str): + return len(str) == 0 or str.isspace() or str == '\n' def read_rows(self, rows, start_row): data = '' @@ -81,6 +87,12 @@ 
https://www.cp2k.org/static/downloads/fftw-3.3.8.tar.gz Data.build_dir = data['build_dir'] Data.binary_dir = data['binary_dir'] Data.case_dir = data['case_dir'] + + def set_perf_info(self, data): + Data.kperf_para = data['kperf'] + Data.perf_para = data['perf'] + Data.nsys_para = data['nsys'] + Data.ncu_para = data['ncu'] def split_two_part(self, data): split_list = data.split(' ', 1) @@ -95,10 +107,15 @@ https://www.cp2k.org/static/downloads/fftw-3.3.8.tar.gz rows = contents.split('\n') rowIndex = 0 data = {} + perf_data = {} while rowIndex < len(rows): row = rows[rowIndex].strip() if row == '[SERVER]': rowIndex, Data.avail_ips = self.read_rows(rows, rowIndex+1) + elif row == '[DOWNLOAD]': + rowIndex, Data.download_info = self.read_rows(rows, rowIndex+1) + elif row == '[DEPENDENCY]': + rowIndex, Data.dependency = self.read_rows(rows, rowIndex+1) elif row == '[ENV]': rowIndex, Data.module_content = self.read_rows(rows, rowIndex+1) elif row == '[APP]': @@ -112,6 +129,9 @@ https://www.cp2k.org/static/downloads/fftw-3.3.8.tar.gz rowIndex, Data.run_cmd = self.read_rows_kv(rows, rowIndex+1) elif row == '[BATCH]': rowIndex, Data.batch_cmd = self.read_rows(rows, rowIndex+1) + elif row == '[PERF]': + rowIndex, perf_data = self.read_rows_kv(rows, rowIndex+1) + self.set_perf_info(perf_data) else: rowIndex += 1 Data.binary_file, Data.binary_para = self.split_two_part(Data.run_cmd['binary']) @@ -121,9 +141,14 @@ https://www.cp2k.org/static/downloads/fftw-3.3.8.tar.gz cd {Data.build_dir} {Data.clean_cmd} ''' + def get_env(self): + return f''' +./jarvis -e +source ./{Data.env_file}''' def get_build_cmd(self): return f''' +{self.get_env()} cd {Data.build_dir} {Data.build_cmd} ''' @@ -141,6 +166,7 @@ cd {Data.build_dir} def get_run_cmd(self): return f''' +{self.get_env()} cd {Data.case_dir} {self.get_run()} ''' \ No newline at end of file diff --git a/src/execute.py b/src/execute.py new file mode 100644 index 0000000000000000000000000000000000000000..19e6b50f283d2bfb08a61d6fc1946e8d5782162b --- /dev/null +++ b/src/execute.py @@ -0,0 +1,62 @@ +#!/usr/bin/env python3 +# -*- coding: utf-8 -*- +import os +import logging +from asyncio.log import logger +from datetime import datetime +from tool import Tool + +LOG_FORMAT = "%(asctime)s - %(levelname)s - %(message)s" +DATE_FORMAT = "%m/%d/%Y %H:%M:%S %p" +logging.basicConfig(filename='runner.log', level=logging.DEBUG, format=LOG_FORMAT, datefmt=DATE_FORMAT) + +class Execute: + def __init__(self): + self.cur_time = '' + self.end_time = '' + self.tool = Tool() + self.flags = '*' * 80 + self.end_flag = 'END: ' + + # tools function + def join_cmd(self, arrs): + return " && ".join(arrs) + + def print_cmd(self, cmd): + print(self.flags) + self.cur_time = self.tool.get_time_stamp() + print(f"RUNNING at {self.cur_time}:\n{cmd}") + logging.info(cmd) + print(self.flags) + + # Execute, get output and don't know whether success or not + def exec_popen(self, cmd, isPrint=True): + if isPrint: + self.print_cmd(cmd) + output = os.popen(cmd).readlines() + return output + + def get_duration(self): + time_1_struct = datetime.strptime(self.cur_time, "%Y-%m-%d %H:%M:%S") + time_2_struct = datetime.strptime(self.end_time, "%Y-%m-%d %H:%M:%S") + seconds = (time_2_struct - time_1_struct).seconds + return seconds + + # Execute, get whether success or not + def exec_list(self, cmds): + cmd = self.join_cmd(cmds) + if not cmd.startswith('echo'): + self.print_cmd(cmd) + state = os.system(cmd) + self.end_time = self.tool.get_time_stamp() + print(f"total time used: {self.get_duration()}s") + 
logger.info(self.end_flag + cmd) + if state: + print(f"failed at {self.end_time}:{state}".upper()) + return False + else: + print(f"successfully executed at {self.end_time}, congradulations!!!".upper()) + return True + + def exec_raw(self, rows): + return self.exec_list(self.tool.gen_list(rows)) \ No newline at end of file diff --git a/src/jarvis.py b/src/jarvis.py new file mode 100644 index 0000000000000000000000000000000000000000..5b03d64df3d1a3d54541cec285c74609fe490fbe --- /dev/null +++ b/src/jarvis.py @@ -0,0 +1,105 @@ +#!/usr/bin/env python3 +# -*- coding: utf-8 -*- +import argparse + +from data import Data +from analysis import Analysis + +class Jarvis: + def __init__(self): + self.analysis = Analysis() + # Argparser set + parser = argparse.ArgumentParser(description=f'please put me into CASE directory, used for {Data.app_name} Compiler/Clean/Run/Compare', + usage='%(prog)s [-h] [--build] [--clean] [...]') + parser.add_argument("-v","--version", help=f"get version info", action="store_true") + parser.add_argument("-use","--use", help="Switch config file...", nargs=1) + parser.add_argument("-i","--info", help=f"get machine info", action="store_true") + #accept software_name/version GCC/GCC+MPI/CLANG/CLANG+MPI + parser.add_argument("-install","--install", help=f"install dependency", nargs=2) + # dependency install + parser.add_argument("-dp","--depend", help=f"{Data.app_name} dependency install", action="store_true") + parser.add_argument("-e","--env", help=f"set environment {Data.app_name}", action="store_true") + parser.add_argument("-b","--build", help=f"compile {Data.app_name}", action="store_true") + parser.add_argument("-cls","--clean", help=f"clean {Data.app_name}", action="store_true") + parser.add_argument("-r","--run", help=f"run {Data.app_name}", action="store_true") + parser.add_argument("-p","--perf", help=f"auto perf {Data.app_name}", action="store_true") + parser.add_argument("-kp","--kperf", help=f"auto kperf {Data.app_name}", action="store_true") + # GPU perf + parser.add_argument("-gp","--gpuperf", help="GPU perf...", action="store_true") + + # NCU perf + parser.add_argument("-ncu","--ncuperf", help="NCU perf...", nargs=1) + parser.add_argument("-c","--compare", help=f"compare {Data.app_name}", nargs=2) + # batch run + parser.add_argument("-rb","--rbatch", help=f"run batch {Data.app_name}", action="store_true") + # batch download + parser.add_argument("-d","--download", help="Batch Download...", action="store_true") + parser.add_argument("-net","--network", help="network checking...", action="store_true") + #change yum repo to aliyun + parser.add_argument("-yum","--yum", help="yum repo changing...", action="store_true") + # start benchmark test + parser.add_argument("-bench","--benchmark", help="start benchmark test...", nargs=1) + # start test + parser.add_argument("-t","--test", help="start Jarvis test...", action="store_true") + self.args = parser.parse_args() + + def main(self): + if self.args.version: + print("V1.0") + + if self.args.info: + self.analysis.get_machine_info() + + if self.args.install: + self.analysis.install(self.args.install[0], self.args.install[1]) + + if self.args.env: + self.analysis.env() + + if self.args.clean: + self.analysis.clean() + + if self.args.build: + self.analysis.build() + + if self.args.run: + self.analysis.run() + + if self.args.perf: + self.analysis.perf() + + if self.args.kperf: + self.analysis.kperf() + + if self.args.depend: + self.analysis.install_deps() + + if self.args.rbatch: + self.analysis.batch_run() + + if 
self.args.download: + self.analysis.download() + + if self.args.gpuperf: + self.analysis.gpu_perf() + + if self.args.ncuperf: + self.analysis.ncu_perf(self.args.ncuperf[0]) + + if self.args.use: + self.analysis.switch_config(self.args.use[0]) + + if self.args.network: + self.analysis.check_network() + + if self.args.yum: + self.analysis.change_yum_repo() + + if self.args.benchmark: + self.analysis.bench(self.args.benchmark[0]) + + if self.args.test: + self.analysis.test() + +if __name__ == '__main__': + Jarvis().main() diff --git a/src/machine.py b/src/machine.py new file mode 100644 index 0000000000000000000000000000000000000000..7e40db494628a8e420859baa1ce883f3dc805c18 --- /dev/null +++ b/src/machine.py @@ -0,0 +1,27 @@ +#!/usr/bin/env python3 +# -*- coding: utf-8 -*- +from execute import Execute +from tool import Tool + +class Machine: + def __init__(self): + self.exe = Execute() + self.tool = Tool() + self.info2cmd = { + 'CHECK network adapter':'nmcli d', + 'CHECK Machine Bits':'getconf LONG_BIT', + 'CHECK OS':'cat /proc/version && uname -a', + 'CHECK GPU': 'lspci | grep -i nvidia', + 'CHECK Total Memory':'cat /proc/meminfo | grep MemTotal', + 'CHECK Total Disk Memory':'fdisk -l | grep Disk', + 'CHECK CPU info': 'cat /proc/cpuinfo | grep "processor" | wc -l && lscpu && dmidecode -t 4' + } + + def get_info(self, content, cmd): + self.tool.prt_content(content) + self.exe.exec_raw(cmd) + + def output_machine_info(self): + print("get machine info") + for key, value in self.info2cmd.items(): + self.get_info(key, value) diff --git a/src/tool.py b/src/tool.py new file mode 100644 index 0000000000000000000000000000000000000000..7cd3b62251641058efc57e3732ec8ff07144bed0 --- /dev/null +++ b/src/tool.py @@ -0,0 +1,36 @@ +#!/usr/bin/env python3 +# -*- coding: utf-8 -*- +import time +import os + +class Tool: + def __init__(self): + pass + + def prt_content(self, content): + flags = '*' * 30 + print(f"{flags}{content}{flags}") + + def gen_list(self, data): + return data.strip().split('\n') + + def get_time_stamp(self): + return time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()) + + def read_file(self, filename): + content = '' + with open(filename, encoding='utf-8') as f: + content = f.read().strip() + return content + + def write_file(self, filename, content=""): + with open(filename,'w') as f: + f.write(content) + + def mkdirs(self, path): + if not os.path.exists(path): + os.makedirs(path) + + def mkfile(self, path, content=''): + if not os.path.exists(path): + self.write_file(path, content) diff --git a/templates/data.CP2K.X86.config b/templates/CP2K/8.2/data.CP2K.X86.cpu.config similarity index 100% rename from templates/data.CP2K.X86.config rename to templates/CP2K/8.2/data.CP2K.X86.cpu.config diff --git a/templates/CP2K/8.2/data.CP2K.arm.cpu.config b/templates/CP2K/8.2/data.CP2K.arm.cpu.config new file mode 100644 index 0000000000000000000000000000000000000000..66c49b6a3a1472f2d74f65c963d5dd3dfa1eab43 --- /dev/null +++ b/templates/CP2K/8.2/data.CP2K.arm.cpu.config @@ -0,0 +1,66 @@ +[SERVER] +11.11.11.11 + +[ENV] +source /home/kpgcc-ompi.env +export LIBRARY_PATH=/home/cp2k/EXTRA/gsl/lib:$LIBRARY_PATH +export LD_LIBRARY_PATH=/home/cp2k/EXTRA/gsl/lib:$LD_LIBRARY_PATH +export CPATH=/usr/local/cuda/include:$CPATH + +[APP] +app_name = CP2K +build_dir = /home/cp2k/CP2K/cp2k-8.2/ +binary_dir = /home/cp2k/CP2K/cp2k-8.2/exe/local-cpu/ +case_dir = /home/cp2k/CP2K/cp2k-8.2/benchmarks/QS/ + +[BUILD] +make -j 128 ARCH=local-cpu VERSION=psmp + +[CLEAN] +make -j 128 ARCH=local-cpu VERSION=psmp clean + +[RUN] 
+run = numactl -C 0-63 mpirun --allow-run-as-root -np 64 -map-by ppr:64:node:pe=1 -bind-to core -x OMP_NUM_THREADS=1 +binary = cp2k.psmp H2O-256.inp +nodes = 1 + +[BATCH] +#!/bin/bash + +logfile=cp2k.H2O-256.inp.log + +nvidia-smi -pm 1 +nvidia-smi -ac 1215,1410 + +echo 3 > /proc/sys/vm/drop_caches +echo "===run 32C*GPU===" >> $logfile +mpirun -np 32 -genv OMP_NUM_THREADS=1 -genv CUDA_VISIBLE_DEVICES=0 exe/local-cuda/cp2k.psmp benchmarks/QS/H2O-256.inp > cp2k.H2O-256.inp.log >> $logfile 2>&1 + +echo 3 > /proc/sys/vm/drop_caches +echo "===run 32C*2GPU===" >> $logfile +mpirun -np 32 -genv OMP_NUM_THREADS=1 -genv CUDA_VISIBLE_DEVICES=0,1 exe/local-cuda/cp2k.psmp benchmarks/QS/H2O-256.inp > cp2k.H2O-256.inp.log >> $logfile 2>&1 + + +echo 3 > /proc/sys/vm/drop_caches +echo "===run 64C*GPU===" >> $logfile +mpirun -np 64 -genv OMP_NUM_THREADS=1 -genv CUDA_VISIBLE_DEVICES=0 exe/local-cuda/cp2k.psmp benchmarks/QS/H2O-256.inp > cp2k.H2O-256.inp.log >> $logfile 2>&1 + +echo 3 > /proc/sys/vm/drop_caches +echo "===run 64C*2GPU===" >> $logfile +mpirun -np 32 -genv OMP_NUM_THREADS=1 -genv CUDA_VISIBLE_DEVICES=0,1 exe/local-cuda/cp2k.psmp benchmarks/QS/H2O-256.inp > cp2k.H2O-256.inp.log >> $logfile 2>&1 + + +echo 3 > /proc/sys/vm/drop_caches +echo "===run 128C*GPU===" >> $logfile +mpirun -np 128 -genv OMP_NUM_THREADS=1 -genv CUDA_VISIBLE_DEVICES=0 exe/local-cuda/cp2k.psmp benchmarks/QS/H2O-256.inp > cp2k.H2O-256.inp.log >> $logfile 2>&1 + +echo 3 > /proc/sys/vm/drop_caches +echo "===run 128C*2GPU===" >> $logfile +mpirun -np 128 -genv OMP_NUM_THREADS=1 -genv CUDA_VISIBLE_DEVICES=0,1 exe/local-cuda/cp2k.psmp benchmarks/QS/H2O-256.inp > cp2k.H2O-256.inp.log >> $logfile 2>&1 + + + + + + + diff --git a/templates/CP2K/8.2/data.CP2K.arm.gpu.config b/templates/CP2K/8.2/data.CP2K.arm.gpu.config new file mode 100644 index 0000000000000000000000000000000000000000..2012254a25a02c42d0f5972fed052c8eeccff1fe --- /dev/null +++ b/templates/CP2K/8.2/data.CP2K.arm.gpu.config @@ -0,0 +1,98 @@ +[SERVER] +11.11.11.11 + +[DOWNLOAD] +libint/2.6.0 https://github.com/evaleev/libint/archive/v2.6.0.tar.gz +libXC/5.1.4 https://www.cp2k.org/static/downloads/libxc-5.1.4.tar.gz +fftw/3.3.8 https://www.cp2k.org/static/downloads/fftw-3.3.8.tar.gz +lapack/3.8.0 https://www.cp2k.org/static/downloads/lapack-3.8.0.tgz +scalapack/2.1.0 https://www.cp2k.org/static/downloads/scalapack-2.1.0.tgz +cmake/3.16.4 https://cmake.org/files/v3.16/cmake-3.16.4.tar.gz + +[DEPENDENCY] +./jarvis -install kgcc/9.3.1 com +module purge +module use ./software/modulefiles +module load kgcc9/9.3.1 +export CC=`which gcc` +export CXX=`which g++` +export FC=`which gfortran` +./jarvis -install openmpi/4.1.2 gcc +module load openmpi4/4.1.2 +./jarvis -install gmp/6.2.0 gcc +./jarvis -install boost/1.72.0 gcc +./jarvis -install libint/2.6.0 gcc+mpi +./jarvis -install fftw/3.3.8 gcc+mpi +./jarvis -install openblas/0.3.18 gcc +module load openblas/0.3.18 +./jarvis -install scalapack/2.1.0 gcc+mpi +./jarvis -install spglib/1.16.0 gcc +./jarvis -install libxc/5.1.4 gcc +./jarvis -install gsl/2.6 gcc +module load gsl/2.6 +./jarvis -install plumed/2.6.2 gcc+mpi +./jarvis -install libvori/21.04.12 gcc + +[ENV] +module purge +module load kgcc9/9.3.1 +module load openmpi4/4.1.2 +module load gsl/2.6 + +[APP] +app_name = CP2K +build_dir = /home/HT3/HPCRunner2/cp2k-8.2/ +binary_dir = /home/HT3/HPCRunner2/cp2k-8.2/exe/local-cuda/ +case_dir = /home/HT3/HPCRunner2/cp2k-8.2/benchmarks/QS/ + +[BUILD] +make -j 128 ARCH=local-cuda VERSION=psmp + +[CLEAN] +make -j 128 ARCH=local-cuda 
VERSION=psmp clean + +[RUN] +run = numactl -C 0-63 mpirun --allow-run-as-root -x CUDA_VISIBLE_DEVICES=0,1 -np 64 -x OMP_NUM_THREADS=1 +binary = cp2k.psmp H2O-256.inp +nodes = 1 + +[BATCH] +#!/bin/bash + +logfile=cp2k.H2O-256.inp.log + +nvidia-smi -pm 1 +nvidia-smi -ac 1215,1410 + +echo 3 > /proc/sys/vm/drop_caches +echo "===run 32C*GPU===" >> $logfile +mpirun -np 32 -genv OMP_NUM_THREADS=1 -genv CUDA_VISIBLE_DEVICES=0 exe/local-cuda/cp2k.psmp benchmarks/QS/H2O-256.inp > cp2k.H2O-256.inp.log >> $logfile 2>&1 + +echo 3 > /proc/sys/vm/drop_caches +echo "===run 32C*2GPU===" >> $logfile +mpirun -np 32 -genv OMP_NUM_THREADS=1 -genv CUDA_VISIBLE_DEVICES=0,1 exe/local-cuda/cp2k.psmp benchmarks/QS/H2O-256.inp > cp2k.H2O-256.inp.log >> $logfile 2>&1 + + +echo 3 > /proc/sys/vm/drop_caches +echo "===run 64C*GPU===" >> $logfile +mpirun -np 64 -genv OMP_NUM_THREADS=1 -genv CUDA_VISIBLE_DEVICES=0 exe/local-cuda/cp2k.psmp benchmarks/QS/H2O-256.inp > cp2k.H2O-256.inp.log >> $logfile 2>&1 + +echo 3 > /proc/sys/vm/drop_caches +echo "===run 64C*2GPU===" >> $logfile +mpirun -np 32 -genv OMP_NUM_THREADS=1 -genv CUDA_VISIBLE_DEVICES=0,1 exe/local-cuda/cp2k.psmp benchmarks/QS/H2O-256.inp > cp2k.H2O-256.inp.log >> $logfile 2>&1 + + +echo 3 > /proc/sys/vm/drop_caches +echo "===run 128C*GPU===" >> $logfile +mpirun -np 128 -genv OMP_NUM_THREADS=1 -genv CUDA_VISIBLE_DEVICES=0 exe/local-cuda/cp2k.psmp benchmarks/QS/H2O-256.inp > cp2k.H2O-256.inp.log >> $logfile 2>&1 + +echo 3 > /proc/sys/vm/drop_caches +echo "===run 128C*2GPU===" >> $logfile +mpirun -np 128 -genv OMP_NUM_THREADS=1 -genv CUDA_VISIBLE_DEVICES=0,1 exe/local-cuda/cp2k.psmp benchmarks/QS/H2O-256.inp > cp2k.H2O-256.inp.log >> $logfile 2>&1 + + + + + + + diff --git a/templates/data.amber.config b/templates/amber/20/data.amber.arm.gpu.config similarity index 100% rename from templates/data.amber.config rename to templates/amber/20/data.amber.arm.gpu.config diff --git a/templates/data.openfoam.config b/templates/openfoam/1960/data.openfoam.arm.cpu.config similarity index 100% rename from templates/data.openfoam.config rename to templates/openfoam/1960/data.openfoam.arm.cpu.config diff --git a/templates/openfoam/1960/data.openfoam.arm.cpu.opt.config b/templates/openfoam/1960/data.openfoam.arm.cpu.opt.config new file mode 100644 index 0000000000000000000000000000000000000000..25abc6f8b8218dcdc6c328106aafbc4a3a062d70 --- /dev/null +++ b/templates/openfoam/1960/data.openfoam.arm.cpu.opt.config @@ -0,0 +1,34 @@ +[SERVER] +11.11.11.11 + +[DEPENDENCY] +./jarvis -install bisheng/2.1.0 com +module use ./software/modulefiles +module load bisheng2 +./jarvis -install hmpi/1.1.1 clang +module load hmpi1/1.1.1 + +[ENV] +# add gcc/mpi +source /home/Jarvis3-4/HPCRunner/opt-OpenFOAM/opt_codes/OpenFOAM-v1906/etc/bashrc +module use ./software/modulefiles +module load bisheng2 +module load hmpi1/1.1.1 + +[APP] +app_name = OpenFOAM +build_dir = /home/Jarvis3-4/HPCRunner/opt-OpenFOAM/opt_codes/OpenFOAM-v1906/ +binary_dir = +case_dir = /home/Jarvis3-4/HPCRunner/case/openfoam/audi/ + +[BUILD] +source /home/Jarvis3-4/HPCRunner/opt-OpenFOAM/opt_codes/OpenFOAM-v1906/etc/bashrc +./Allwmake -j 64 + +[CLEAN] +rm -rf build + +[RUN] +run = mpirun --allow-run-as-root -x PATH -x LD_LIBRARY_PATH -x WM_PROJECT_DIR -x WM_PROJECT_USER_DIR -np 128 +binary = pisoFoam –parallel 2 +nodes = 1 diff --git a/templates/qe/6.4/data.qe.test.config b/templates/qe/6.4/data.qe.test.config new file mode 100644 index 0000000000000000000000000000000000000000..59254e0e8b21c6888b0d364119129e0c3721e7a5 --- /dev/null 
+++ b/templates/qe/6.4/data.qe.test.config @@ -0,0 +1,40 @@ +[SERVER] +11.11.11.11 + +[DEPENDENCY] +./jarvis -install kgcc/9.3.1 com +module purge +module use ./software/modulefiles +module load kgcc9/9.3.1 +export CC=`which gcc` +export CXX=`which g++` +export FC=`which gfortran` +./jarvis -install openmpi/4.1.2/ gcc +module load openmpi4/4.1.2 +#test if mpi is normal +./jarvis -bench mpi + +[ENV] +module purge +module use ./software/modulefiles +module load kgcc9 +module load openmpi4/4.1.2 + +[APP] +app_name = QE +build_dir = /tmp/q-e-qe-6.4.1/ +binary_dir = /tmp/q-e-qe-6.4.1/bin/ +case_dir = /tmp/qe-test + +[BUILD] +./configure F90=gfortran F77=gfortran MPIF90=mpifort MPIF77=mpifort CC=mpicc FCFLAGS="-O3" CFLAGS="-O3" --with-scalapack=no --enable-openmp +make -j 96 pwall +make install + +[CLEAN] +make clean + +[RUN] +run = mpirun --allow-run-as-root -x OMP_NUM_THREADS=1 -mca coll ^hcoll -mca btl ^vader,tcp,openib,uct -np 128 +binary = pw.x -input test_3.in +nodes = 1 \ No newline at end of file diff --git a/templates/qe/6.4/data.qe.test.opt.config b/templates/qe/6.4/data.qe.test.opt.config new file mode 100644 index 0000000000000000000000000000000000000000..4b6d44762ffb73fffe94ead7fdb2d2ebd675a534 --- /dev/null +++ b/templates/qe/6.4/data.qe.test.opt.config @@ -0,0 +1,46 @@ +[SERVER] +11.11.11.11 + +[DEPENDENCY] +./jarvis -install bisheng/2.1.0 com +module purge +module use ./software/modulefiles +module load bisheng2/2.1.0 +export CC=`which clang` +export CXX=`which clang++` +export FC=`which flang` +./jarvis -install hmpi/1.1.1 bisheng +module load hmpi1/1.1.1 +./jarvis -bench mpi +./jarvis -install kml/1.4.0/bisheng bisheng + +[ENV] +source /etc/profile +module purge +module use ./software/modulefiles +module load bisheng2/2.1.0 +export CC=`which clang` +export CXX=`which clang++` +export FC=`which flang` +module load hmpi1/1.1.1 +export BLAS_LIBS="-L/usr/local/kml/lib/kblas/omp -lkblas" +export LAPACK_LIBS="-L/usr/local/kml/lib/ -lklapack_full" + +[APP] +app_name = QE +build_dir = /tmp/q-e-qe-6.4.1/ +binary_dir = /tmp/q-e-qe-6.4.1/bin/ +case_dir = /tmp/qe-test/ + +[BUILD] +./configure F90=flang F77=flang MPIF90=mpifort MPIF77=mpifort CC=mpicc FCFLAGS="-O3" CFLAGS="-O3" --with-scalapack=no --enable-openmp +make -j 96 pwall +make install + +[CLEAN] +make clean + +[RUN] +run = mpirun --allow-run-as-root -x OMP_NUM_THREADS=1 -np 128 +binary = pw.x -input test_3.in +nodes = 1 diff --git a/templates/qe/6.4/qe.block.opt.config b/templates/qe/6.4/qe.block.opt.config new file mode 100644 index 0000000000000000000000000000000000000000..6eee58f4acc01efffaa31eae0e0c6ea730e492d9 --- /dev/null +++ b/templates/qe/6.4/qe.block.opt.config @@ -0,0 +1,56 @@ +[SERVER] +11.11.11.11 + +[DEPENDENCY] +./jarvis -install bisheng/2.1.0 com +module purge +module use ./software/modulefiles +module load bisheng2/2.1.0 +export CC=`which clang` +export CXX=`which clang++` +export FC=`which flang` +./jarvis -install hmpi/1.1.1 bisheng +module load hmpi1/1.1.1 +./jarvis -install cmake/3.20.5 bisheng +module load cmake/3.20.5 +./jarvis -install kml/1.4.0/bisheng bisheng +./jarvis -install scalapack/2.1.0/kml bisheng +./jarvis -install fftw/3.3.10 bisheng +module load fftw/3.3.10 scalapack/2.1.0 cmake/3.20.5 +#修改fortran_single的CMakeLists.txt,第10行,第74行,第75行 +./jarvis -install block-davidson/3.14 bisheng +module load block-davidson/3.14 + +[ENV] +source /etc/profile +module purge +module use ./software/modulefiles +module load bisheng2/2.1.0 +export CC=`which clang` +export CXX=`which clang++` +export FC=`which flang` 
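+# The KML BLAS/LAPACK paths exported below assume the default /usr/local/kml prefix, which is
+# where the package/kml install scripts place the KML libraries and the generated libklapack_full.so.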
+module load hmpi1/1.1.1
+module load fftw/3.3.10 scalapack/2.1.0 block-davidson/3.14
+export BLAS_LIBS="-L/usr/local/kml/lib/kblas/omp -lkblas"
+export LAPACK_LIBS="-L/usr/local/kml/lib -lklapack_full"
+export SCALAPACK_LIBS="-L/home/fang/HT1/HPCRunner-master/software/libs/bisheng2/scalapack/2.1.0/lib/ -lscalapack"
+
+[APP]
+app_name = QE
+build_dir = /home/fang/HT1/HPCRunner-master/q-e-qe-6.4.1/
+binary_dir = /home/fang/HT1/HPCRunner-master/q-e-qe-6.4.1/bin
+case_dir = /home/fang/HT1/HPCRunner-master/workload/QE/GRIR443/
+
+[BUILD]
+# add tunning/QE/6.4/q-e-6.4.blockmesh.patch here
+./configure F90=flang F77=flang MPIF90=mpifort MPIF77=mpifort CC=mpicc FCFLAGS="-O3" CFLAGS="-O3" --with-scalapack=yes --enable-openmp
+make -j 96 pw
+make install
+
+[CLEAN]
+make clean
+
+[RUN]
+run = mpirun --allow-run-as-root -x OMP_NUM_THREADS=1 -np 128
+binary = pw.x -input grir443.in
+nodes = 1
diff --git a/templates/qe/6.5/data.qe.X86.cpu.config b/templates/qe/6.5/data.qe.X86.cpu.config
new file mode 100644
index 0000000000000000000000000000000000000000..22bf2f482097b971310769a8d46204d8ff20f88b
--- /dev/null
+++ b/templates/qe/6.5/data.qe.X86.cpu.config
@@ -0,0 +1,29 @@
+[SERVER]
+11.11.11.11
+
+[ENV]
+#add oneapi(include icc/mpi)
+source /workspace/cc/env/intel2021.4/setvars.sh
+# add cmake
+module use ./modules
+module add icc/cmake
+export LAPACK_LIBS="$MKLROOT/lib/intel64/libmkl_intel_lp64.a $MKLROOT/lib/intel64/libmkl_core.a"
+export BLAS_LIBS="$MKLROOT/lib/intel64/libmkl_sequential.a $MKLROOT/lib/intel64/libmkl_blacs_intelmpi_lp64.a -Wl,--end-group"
+
+[APP]
+app_name = QE
+build_dir = /home/csouser/HPCRunner/q-e-qe-6.5/
+binary_dir = /home/csouser/HPCRunner/q-e-qe-6.5/bin/
+case_dir = /home/csouser/HPCRunner/qe_large/
+
+[BUILD]
+./configure F90=ifort F77=ifort MPIF90=mpiifort MPIF77=mpiifort CC=mpicc FCFLAGS="-O3" CFLAGS="-O3" --with-scalapack=no
+make -j 40 pwall install
+
+[CLEAN]
+make clean
+
+[RUN]
+run = mpirun -n 40
+binary = pw.x -nk 8 -input scf.in
+nodes = 1
\ No newline at end of file
diff --git a/templates/qe/6.5/data.qe.arm.cpu.config b/templates/qe/6.5/data.qe.arm.cpu.config
new file mode 100644
index 0000000000000000000000000000000000000000..918aabcbd06b381d96b1fb623b4d88e85a4cd22f
--- /dev/null
+++ b/templates/qe/6.5/data.qe.arm.cpu.config
@@ -0,0 +1,29 @@
+[SERVER]
+11.11.11.11
+
+[ENV]
+source /etc/profile
+module use /opt/modulefile/
+module load gcc-9.3.1
+module load openmpi-4.1.1
+export BLAS_LIBS="-L/usr/local/kml/lib/kblas/omp -lkblas"
+export LAPACK_LIBS="-L/usr/local/kml/lib/ -lklapack_full"
+
+[APP]
+app_name = QE
+build_dir = /home/Jarvis3-4/HPCRunner/q-e-qe-6.5/
+binary_dir = /home/Jarvis3-4/HPCRunner/q-e-qe-6.5/bin/
+case_dir = /home/Jarvis3-4/HPCRunner/workload/QE/qe-large/
+
+[BUILD]
+./configure F90=gfortran F77=gfortran MPIF90=mpifort MPIF77=mpifort CC=mpicc FCFLAGS="-O3" CFLAGS="-O3" --with-scalapack=no --enable-openmp
+make -j 96 pwall
+make install
+
+[CLEAN]
+make clean
+
+[RUN]
+run = mpirun --allow-run-as-root -mca btl ^vader,tcp,openib,uct -np 128
+binary = pw.x -nk 8 -input scf.in
+nodes = 1
\ No newline at end of file
diff --git a/templates/qe/6.5/data.qe.arm.cpu.opt.config b/templates/qe/6.5/data.qe.arm.cpu.opt.config
new file mode 100644
index 0000000000000000000000000000000000000000..bd5d524380d8fea1da70a70161c6603573aa2e95
--- /dev/null
+++ b/templates/qe/6.5/data.qe.arm.cpu.opt.config
@@ -0,0 +1,46 @@
+[SERVER]
+11.11.11.11
+
+[DEPENDENCY]
+./jarvis -install bisheng/2.1.0 com
+module purge
+module use ./software/modulefiles
+module load bisheng2/2.1.0
+export CC=`which clang`
+export CXX=`which clang++`
+export FC=`which flang`
+./jarvis -install hmpi/1.1.1 bisheng
+module load hmpi1/1.1.1
+./jarvis -install kml/1.4.0/bisheng bisheng
+
+[ENV]
+source /etc/profile
+module purge
+module use ./software/modulefiles
+module load bisheng2/2.1.0
+export CC=`which clang`
+export CXX=`which clang++`
+export FC=`which flang`
+module load hmpi1/1.1.1
+export BLAS_LIBS="-L/usr/local/kml/lib/kblas/omp -lkblas"
+export LAPACK_LIBS="-L/usr/local/kml/lib/ -lklapack_full"
+
+[APP]
+app_name = QE
+build_dir = /tmp/q-e-qe-6.5/
+binary_dir = /tmp/q-e-qe-6.5/bin/
+case_dir = /tmp/qe-test/
+
+[BUILD]
+./configure F90=flang F77=flang MPIF90=mpifort MPIF77=mpifort CC=mpicc FCFLAGS="-O3" CFLAGS="-O3" --with-scalapack=no --enable-openmp
+sed -i "s/gfortran/flang/g" make.inc
+make -j 96 pwall
+make install
+
+[CLEAN]
+make clean
+
+[RUN]
+run = mpirun --allow-run-as-root -x OMP_NUM_THREADS=1 -np 128
+binary = pw.x -input test_3.in
+nodes = 1
diff --git a/templates/qe/6.8/data.qe.arm.cpu.config b/templates/qe/6.8/data.qe.arm.cpu.config
new file mode 100644
index 0000000000000000000000000000000000000000..bbe0749e04a9b100d0969a80344fb49321f8b24d
--- /dev/null
+++ b/templates/qe/6.8/data.qe.arm.cpu.config
@@ -0,0 +1,37 @@
+[SERVER]
+11.11.11.11
+
+[DEPENDENCY]
+./jarvis -install kgcc/9.3.1 com
+module use ./software/modulefiles
+module load kgcc9
+./jarvis -install hmpi/1.1.0/gcc gcc
+module load hmpi1/1.1.0
+./jarvis -install kml/1.4.0/gcc gcc
+
+[ENV]
+source /etc/profile
+module use ./software/modulefiles
+module load kgcc9
+module load hmpi1/1.1.0
+export BLAS_LIBS="-L/usr/local/kml/lib/kblas/omp -lkblas"
+export LAPACK_LIBS="-L/usr/local/kml/lib/ -lklapack_full"
+
+[APP]
+app_name = QE
+build_dir = /tmp/q-e-qe-6.8/
+binary_dir = /tmp/q-e-qe-6.8/bin/
+case_dir = /tmp/qe-large/
+
+[BUILD]
+./configure F90=gfortran F77=gfortran MPIF90=mpifort MPIF77=mpifort CC=mpicc FCFLAGS="-O3" CFLAGS="-O3" --with-scalapack=no --enable-openmp
+make -j 96 pwall
+make install
+
+[CLEAN]
+make clean
+
+[RUN]
+run = mpirun --allow-run-as-root -mca btl ^vader,tcp,openib,uct -np 128
+binary = pw.x -nk 8 -input scf.in
+nodes = 1
\ No newline at end of file
diff --git a/templates/data.qe.gpu.config b/templates/qe/6.8/data.qe.arm.gpu.config
similarity index 97%
rename from templates/data.qe.gpu.config
rename to templates/qe/6.8/data.qe.arm.gpu.config
index 5d00bfe8af6fbecfb104583c9c344d8c8051f4e5..60b78c182f58ef19bb66d11afff73d970b41f476 100644
--- a/templates/data.qe.gpu.config
+++ b/templates/qe/6.8/data.qe.arm.gpu.config
@@ -21,7 +21,7 @@ module load nvhpc/21.9
 app_name = QE
 build_dir = /home/HPCRunner-master/q-e-qe-6.8/
 binary_dir = /home/HPCRunner-master/q-e-qe-6.8/bin/
-case_dir = /home/HPCRunner-master/jiancong/
+case_dir = /home/HPCRunner-master/qe-large/
 
 [BUILD]
 ./configure --with-cuda=yes --with-cuda-runtime=11.4 --with-cuda-cc=80 --enable-openmp --with-scalapack=no
diff --git a/templates/data.vasp.config b/templates/vasp/5.4.4/data.vasp.arm.cpu.config
similarity index 100%
rename from templates/data.vasp.config
rename to templates/vasp/5.4.4/data.vasp.arm.cpu.config
diff --git a/templates/data.vasp6.1.gpu.x86.config b/templates/vasp/6.1.0/data.vasp.x86.gpu.config
similarity index 100%
rename from templates/data.vasp6.1.gpu.x86.config
rename to templates/vasp/6.1.0/data.vasp.x86.gpu.config
diff --git a/templates/yum/aliyun-Centos-7.repo b/templates/yum/aliyun-Centos-7.repo
new file mode 100644
index 0000000000000000000000000000000000000000..df18245ddb57fed48bf1dee61c24d0159d054312
--- /dev/null
+++ b/templates/yum/aliyun-Centos-7.repo
@@ -0,0 +1,62 @@
+# CentOS-Base.repo
+#
+# The mirror system uses the connecting IP address of the client and the
+# update status of each mirror to pick mirrors that are updated to and
+# geographically close to the client. You should use this for CentOS updates
+# unless you are manually picking other mirrors.
+#
+# If the mirrorlist= does not work for you, as a fall back you can try the
+# remarked out baseurl= line instead.
+#
+#
+
+[base]
+name=CentOS-$releasever - Base - mirrors.aliyun.com
+failovermethod=priority
+baseurl=http://mirrors.aliyun.com/centos/$releasever/os/$basearch/
+        http://mirrors.aliyuncs.com/centos/$releasever/os/$basearch/
+        http://mirrors.cloud.aliyuncs.com/centos/$releasever/os/$basearch/
+gpgcheck=1
+gpgkey=http://mirrors.aliyun.com/centos/RPM-GPG-KEY-CentOS-7
+
+#released updates
+[updates]
+name=CentOS-$releasever - Updates - mirrors.aliyun.com
+failovermethod=priority
+baseurl=http://mirrors.aliyun.com/centos/$releasever/updates/$basearch/
+        http://mirrors.aliyuncs.com/centos/$releasever/updates/$basearch/
+        http://mirrors.cloud.aliyuncs.com/centos/$releasever/updates/$basearch/
+gpgcheck=1
+gpgkey=http://mirrors.aliyun.com/centos/RPM-GPG-KEY-CentOS-7
+
+#additional packages that may be useful
+[extras]
+name=CentOS-$releasever - Extras - mirrors.aliyun.com
+failovermethod=priority
+baseurl=http://mirrors.aliyun.com/centos/$releasever/extras/$basearch/
+        http://mirrors.aliyuncs.com/centos/$releasever/extras/$basearch/
+        http://mirrors.cloud.aliyuncs.com/centos/$releasever/extras/$basearch/
+gpgcheck=1
+gpgkey=http://mirrors.aliyun.com/centos/RPM-GPG-KEY-CentOS-7
+
+#additional packages that extend functionality of existing packages
+[centosplus]
+name=CentOS-$releasever - Plus - mirrors.aliyun.com
+failovermethod=priority
+baseurl=http://mirrors.aliyun.com/centos/$releasever/centosplus/$basearch/
+        http://mirrors.aliyuncs.com/centos/$releasever/centosplus/$basearch/
+        http://mirrors.cloud.aliyuncs.com/centos/$releasever/centosplus/$basearch/
+gpgcheck=1
+enabled=0
+gpgkey=http://mirrors.aliyun.com/centos/RPM-GPG-KEY-CentOS-7
+
+#contrib - packages by Centos Users
+[contrib]
+name=CentOS-$releasever - Contrib - mirrors.aliyun.com
+failovermethod=priority
+baseurl=http://mirrors.aliyun.com/centos/$releasever/contrib/$basearch/
+        http://mirrors.aliyuncs.com/centos/$releasever/contrib/$basearch/
+        http://mirrors.cloud.aliyuncs.com/centos/$releasever/contrib/$basearch/
+gpgcheck=1
+enabled=0
+gpgkey=http://mirrors.aliyun.com/centos/RPM-GPG-KEY-CentOS-7
diff --git a/templates/yum/hw-Centos-7.repo b/templates/yum/hw-Centos-7.repo
new file mode 100644
index 0000000000000000000000000000000000000000..4e43bbc6094d09ab211147221db6756ab854c370
--- /dev/null
+++ b/templates/yum/hw-Centos-7.repo
@@ -0,0 +1,43 @@
+# CentOS-Base.repo
+#
+# The mirror system uses the connecting IP address of the client and the
+# update status of each mirror to pick mirrors that are updated to and
+# geographically close to the client. You should use this for CentOS updates
+# unless you are manually picking other mirrors.
+#
+# If the mirrorlist= does not work for you, as a fall back you can try the
+# remarked out baseurl= line instead.
+#
+#
+
+[base]
+name=CentOS-$releasever - Base
+#mirrorlist=http://mirrors.tools.huawei.com/?release=$releasever&arch=$basearch&repo=os
+baseurl=http://mirrors.tools.huawei.com/centos/$releasever/os/$basearch/
+gpgcheck=1
+gpgkey=http://mirrors.tools.huawei.com/centos/RPM-GPG-KEY-CentOS-7
+
+#released updates
+[updates]
+name=CentOS-$releasever - Updates
+# mirrorlist=http://mirrors.tools.huawei.com/?release=$releasever&arch=$basearch&repo=updates
+baseurl=http://mirrors.tools.huawei.com/centos/$releasever/updates/$basearch/
+gpgcheck=1
+gpgkey=http://mirrors.tools.huawei.com/centos/RPM-GPG-KEY-CentOS-7
+
+#additional packages that may be useful
+[extras]
+name=CentOS-$releasever - Extras
+# mirrorlist=http://mirrors.tools.huawei.com/?release=$releasever&arch=$basearch&repo=extras
+baseurl=http://mirrors.tools.huawei.com/centos/$releasever/extras/$basearch/
+gpgcheck=1
+gpgkey=http://mirrors.tools.huawei.com/centos/RPM-GPG-KEY-CentOS-7
+
+#additional packages that extend functionality of existing packages
+[centosplus]
+name=CentOS-$releasever - Plus
+# mirrorlist=http://mirrors.tools.huawei.com/?release=$releasever&arch=$basearch&repo=centosplus
+baseurl=http://mirrors.tools.huawei.com/centos/$releasever/centosplus/$basearch/
+gpgcheck=1
+enabled=0
+gpgkey=http://mirrors.tools.huawei.com/centos/RPM-GPG-KEY-CentOS-7
\ No newline at end of file
diff --git a/templates/yum/kylin_aarch64.repo b/templates/yum/kylin_aarch64.repo
new file mode 100644
index 0000000000000000000000000000000000000000..e298fcb2586baa607ad7b4121618c69a11e76180
--- /dev/null
+++ b/templates/yum/kylin_aarch64.repo
@@ -0,0 +1,22 @@
+###Kylin Linux Advanced Server 10 - os repo###
+
+[ks10-adv-os]
+name = Kylin Linux Advanced Server 10 - Os
+baseurl = http://update.cs2c.com.cn:8080/NS/V10/V10SP2/os/adv/lic/base/$basearch/
+gpgcheck = 1
+gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-kylin
+enabled = 1
+
+[ks10-adv-updates]
+name = Kylin Linux Advanced Server 10 - Updates
+baseurl = http://update.cs2c.com.cn:8080/NS/V10/V10SP2/os/adv/lic/updates/$basearch/
+gpgcheck = 1
+gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-kylin
+enabled = 1
+
+[ks10-adv-addons]
+name = Kylin Linux Advanced Server 10 - Addons
+baseurl = http://update.cs2c.com.cn:8080/NS/V10/V10SP2/os/adv/lic/addons/$basearch/
+gpgcheck = 1
+gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-kylin
+enabled = 0
diff --git a/test/test-qe-opt.sh b/test/test-qe-opt.sh
new file mode 100644
index 0000000000000000000000000000000000000000..1ae031bc47bcb8f0eda191e6c95d71044da3fe8c
--- /dev/null
+++ b/test/test-qe-opt.sh
@@ -0,0 +1,28 @@
+#!/bin/bash
+# back to root
+cd ..
+# release qe src code
+rm -rf /tmp/q-e-qe-6.4.1
+tar xzvf ./downloads/q-e-qe-6.4.1.tar.gz -C /tmp/
+# copy workload
+cp -rf ./workload/QE/qe-test /tmp
+# copy templates
+cp -rf ./templates/qe/6.4/data.qe.test.opt.config ./
+# switch to config
+./jarvis -use data.qe.test.opt.config
+# install dependency
+./jarvis -dp
+# generate environment
+./jarvis -e
+# environment setup
+source env.sh
+# build
+./jarvis -b
+# run
+./jarvis -r
+# perf
+./jarvis -p
+# kperf
+./jarvis -kp
+# gpu nsysperf
+./jarvis -gp
\ No newline at end of file
diff --git a/test/test-qe.sh b/test/test-qe.sh
new file mode 100644
index 0000000000000000000000000000000000000000..0248590ef415193ab7fed8e01f0dd500ee7252c4
--- /dev/null
+++ b/test/test-qe.sh
@@ -0,0 +1,27 @@
+#!/bin/bash
+# back to root
+cd ..
+# release qe src code
+tar xzvf ./downloads/q-e-qe-6.4.1.tar.gz -C /tmp/
+# copy workload
+cp -rf ./workload/QE/qe-test /tmp
+# copy templates
+cp -rf ./templates/qe/6.4/data.qe.test.config ./
+# switch to config
+./jarvis -use data.qe.test.config
+# install dependency
+./jarvis -dp
+# generate environment
+./jarvis -e
+# environment setup
+source env.sh
+# build
+./jarvis -b
+# run
+./jarvis -r
+# perf
+./jarvis -p
+# kperf
+./jarvis -kp
+# gpu nsysperf
+./jarvis -gp
\ No newline at end of file
diff --git a/test/test-util.sh b/test/test-util.sh
new file mode 100644
index 0000000000000000000000000000000000000000..210a8fb5e941adb78d36bbfeee6f91593667357f
--- /dev/null
+++ b/test/test-util.sh
@@ -0,0 +1,8 @@
+#!/bin/bash
+cd ..
+# check machine info
+./jarvis -i
+# gpu nsysperf
+./jarvis -gp
+# benchmark
+./jarvis -bench all
\ No newline at end of file
diff --git a/workloads/ReadMe.md b/workloads/ReadMe.md
new file mode 100644
index 0000000000000000000000000000000000000000..5f19fe69f65129bc1d1c90d9002ebfc99f97ab3f
--- /dev/null
+++ b/workloads/ReadMe.md
@@ -0,0 +1 @@
+存放常用的HPC应用小规模算例:通常小于1MB
\ No newline at end of file
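
Usage sketch (not part of the patch above): the new test-qe.sh shows the intended jarvis flow for the QE 6.4 template; the same steps can be applied to the QE 6.8 Kunpeng template added here. The tarball name q-e-qe-6.8.tar.gz and the workload copy step are illustrative assumptions; the /tmp paths come from the template's build_dir and case_dir, and the jarvis flags are the ones used in the test scripts.

# assumption: the QE 6.8 source archive sits in ./downloads and the qe-large workload is available under ./workload/QE
tar xzvf ./downloads/q-e-qe-6.8.tar.gz -C /tmp/            # unpack to /tmp/q-e-qe-6.8/ (build_dir in the template)
cp -rf ./workload/QE/qe-large /tmp                         # /tmp/qe-large/ is case_dir in the template
cp -rf ./templates/qe/6.8/data.qe.arm.cpu.config ./        # copy the template next to jarvis
./jarvis -use data.qe.arm.cpu.config                       # switch the active config
./jarvis -dp                                               # install dependencies (kgcc, hmpi, kml)
./jarvis -e                                                # generate env.sh
source env.sh                                              # load the environment
./jarvis -b                                                # build pw.x
./jarvis -r                                                # run the scf.in case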