diff --git a/.gitignore b/.gitignore
index f5f4ccba45b462f91932967f32d6e62396b20439..0bdf2b68aad5b0ab2f7b6da79ed5e620ef5396c2 100644
--- a/.gitignore
+++ b/.gitignore
@@ -3,4 +3,7 @@ kperf.data.*
env.sh
hostfile
.vscode
-test.*
\ No newline at end of file
+test.*
+porting*
+HPC-info*
+tmp
\ No newline at end of file
diff --git a/README.en.md b/README.en.md
deleted file mode 100644
index 562604bd3c5379032163046529af39e46e706d8d..0000000000000000000000000000000000000000
--- a/README.en.md
+++ /dev/null
@@ -1,36 +0,0 @@
-# hpcrunner
-
-#### Description
-openEuler High Performance Computing(HPC) Runner, provides universal portal for hpc users and developers.
-
-#### Software Architecture
-Software architecture description
-
-#### Installation
-
-1. xxxx
-2. xxxx
-3. xxxx
-
-#### Instructions
-
-1. xxxx
-2. xxxx
-3. xxxx
-
-#### Contribution
-
-1. Fork the repository
-2. Create Feat_xxx branch
-3. Commit your code
-4. Create Pull Request
-
-
-#### Gitee Feature
-
-1. You can use Readme\_XXX.md to support different languages, such as Readme\_en.md, Readme\_zh.md
-2. Gitee blog [blog.gitee.com](https://blog.gitee.com)
-3. Explore open source project [https://gitee.com/explore](https://gitee.com/explore)
-4. The most valuable open source project [GVP](https://gitee.com/gvp)
-5. The manual of Gitee [https://gitee.com/help](https://gitee.com/help)
-6. The most popular members [https://gitee.com/gitee-stars/](https://gitee.com/gitee-stars/)
diff --git a/README.md b/README.md
index 239053b5b3028f1c99fafc7723a5d542ab98a938..2cb97f9facd3572b7fc794f90b9eec92068b063e 100644
--- a/README.md
+++ b/README.md
@@ -1,21 +1,40 @@
-# HPCRunner : 贾维斯辅助系统
-### 项目背景
+# HPCRunner : 贾维斯智能助手
+## ***给每个HPC应用一个温暖的家***
-因为HPC应用的特殊性,其环境配置、编译、运行、CPU/GPU性能采集分析的门槛比较高,导致迁移和调优的工作量大,不同的人在不同的机器上跑同样的软件和算例基本上是重头开始,费时费力,而且很多情况下需要同时部署ARM/X86两套环境进行验证,增加了很多的重复性工作。
+### 项目背景
+因为HPC应用的复杂性,其依赖安装、环境配置、编译、运行、CPU/GPU性能采集分析的门槛比较高,导致迁移和调优的工作量大,不同的人在不同的机器上跑同样的应用和算例基本上是重头开始,费时费力,而且很多情况下需要同时部署鲲鹏/X86两套环境进行验证,增加了很多的重复性工作,无法聚焦软件算法优化。
-### 解决方案
+### 项目特色
-- 提供支持ARM/X86的统一接口,一键生成环境脚本、一键编译、一键运行、一键性能采集、一键Benchmark等功能.
+- 支持鲲鹏/X86,一键下载依赖,一键安装依赖、采用业界权威依赖目录结构管理海量依赖,自动生成module file
+- 根据HPC配置一键生成环境脚本、一键编译、一键运行、一键性能采集、一键Benchmark.
- 所有配置仅用一个文件记录,HPC应用部署到不同的机器仅需修改配置文件.
- 日志管理系统自动记录HPC应用部署过程中的所有信息.
-- 常用HPC工具软件开箱即用,提供GCC/毕昇/icc版本,支持一键module加载.
-- 软件本身开箱即用,仅依赖Python环境.
+- 常用HPC工具软件开箱即用.
+- 软件本身无需编译开箱即用,仅依赖Python环境.
- (未来) 集成HPC领域常用性能调优手段、核心算法.
- (未来) 集群性能分析工具.
- (未来) 智能调优.
- (未来) HPC应用[容器化](https://catalog.ngc.nvidia.com/orgs/hpc/containers/quantum_espresso).
+### 目录结构
+
+| 目录/文件 | 说明 | 备注 |
+| --------- | ---------------------------------- | -------- |
+| benchmark | 矩阵运算、OpenMP、MPI、P2P性能测试 | |
+| doc | 文档 | |
+| downloads | 存放依赖库源码包/压缩包 | |
+| examples | 性能小实验 | |
+| package | 存放安装脚本和FAQ | |
+| software | 依赖库二进制仓库 | 自动生成 |
+| src | 贾维斯源码 | |
+| templates | 常用HPC应用的配置模板 | |
+| test | 贾维斯测试用例 | |
+| workload | 常用HPC应用的算例合集 | |
+| init.sh | 贾维斯初始化文件 | |
+| jarvis | 贾维斯启动入口 | |
+
### 已验证HPC应用
分子动力学领域:
@@ -36,60 +55,145 @@
- [x] OpenFOAM
+
### 使用说明
1.下载包解压之后初始化
-`source init.sh`
+```
+source init.sh
+```
+
+2.修改data.config或者套用现有模板,各配置项说明如下所示:
+
+| 配置项 | 说明 | 示例 |
+| :----------: | :--------------------------------------------------------- | :----------------------------------------------------------- |
+| [SERVER] | 服务器节点列表,多节点时用于自动生成hostfile,每行一个节点 | 11.11.11.11 |
+| [DOWNLOAD] | 每行一个软件的版本和下载链接,默认下载到downloads目录 | cmake/3.16.4 https://cmake.org/files/v3.16/cmake-3.16.4.tar.gz |
+| [DEPENDENCY] | HPC应用依赖安装脚本 | ./jarvis -install gcc/9.3.1 com<br>module use ./software/modulefiles<br>module load gcc9 |
+| [ENV]        | HPC应用编译运行环境配置 | source env.sh |
+| [APP]        | HPC应用信息,包括应用名、构建路径、二进制路径、算例路径 | app_name = CP2K<br>build_dir = /home/cp2k-8.2/<br>binary_dir = /home/CP2K/cp2k-8.2/bin/<br>case_dir = /home/CP2K/cp2k-8.2/benchmarks/QS/ |
+| [BUILD]      | HPC应用构建脚本 | make -j 128 |
+| [CLEAN]      | HPC应用编译清理脚本 | make -j 128 clean |
+| [RUN]        | HPC应用运行配置,包括前置命令、应用命令和节点个数 | run = mpi<br>binary = cp2k.psmp H2O-256.inp<br>nodes = 1 |
+| [BATCH]      | HPC应用批量运行命令 | #!/bin/bash<br>nvidia-smi -pm 1<br>nvidia-smi -ac 1215,1410 |
+| [PERF] | 性能工具额外参数 | |
+
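+将上表各示例合并起来,可以得到一个示意性的data.config片段(应用、路径与版本均为示例,请按实际环境修改):
+
+```
+[SERVER]
+11.11.11.11
+
+[DOWNLOAD]
+cmake/3.16.4 https://cmake.org/files/v3.16/cmake-3.16.4.tar.gz
+
+[DEPENDENCY]
+./jarvis -install gcc/9.3.1 com
+module use ./software/modulefiles
+module load gcc9
+
+[ENV]
+source env.sh
+
+[APP]
+app_name = CP2K
+build_dir = /home/cp2k-8.2/
+binary_dir = /home/CP2K/cp2k-8.2/bin/
+case_dir = /home/CP2K/cp2k-8.2/benchmarks/QS/
+
+[BUILD]
+make -j 128
+
+[CLEAN]
+make -j 128 clean
+
+[RUN]
+run = mpi
+binary = cp2k.psmp H2O-256.inp
+nodes = 1
+```
+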
+3.一键下载依赖(仅针对无需鉴权的链接,否则需要自行下载)
+
+```
+./jarvis -d
+```
+
+4.安装单个依赖
+
+```
+./jarvis -install [name/version/other] [option]
+```
+
+option支持列表如下所示
+
+| 选项值 | 解释 | 安装目录 |
+| ------------------ | ----------------------------- | ----------------------- |
+| gcc | 使用当前gcc进行编译 | software/libs/gcc |
+| gcc+mpi | 使用当前gcc+当前mpi进行编译 | software/libs/gcc/mpi |
+| clang(bisheng) | 使用当前clang进行编译 | software/libs/clang |
+| clang(bisheng)+mpi | 使用当前clang+当前mpi进行编译 | software/libs/clang/mpi |
+| nvc | 使用当前nvc进行编译 | software/libs/nvc |
+| nvc+mpi | 使用当前nvc+当前mpi进行编译 | software/libs/nvc/mpi |
+| icc | 使用当前icc进行编译 | software/libs/icc |
+| icc+mpi | 使用当前icc+当前mpi进行编译 | software/libs/icc/mpi |
+| com | 安装编译器 | software/compiler |
+| any | 安装工具软件 | software/compiler/utils |
+
+注意,如果软件为MPI通信软件(如hmpi、openmpi),会安装到software/mpi目录
+
+(eg: ./jarvis -install fftw/3.3.8 gcc)
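+
+下面再给出几个选项组合的示意(包名/版本以package目录中实际提供的安装脚本为准):
+
+```
+./jarvis -install bisheng/2.1.0 com       # 安装编译器到 software/compiler
+./jarvis -install openblas/0.3.18 gcc     # 使用当前gcc编译,安装到 software/libs/gcc
+./jarvis -install fftw/3.3.8 gcc+mpi      # 使用当前gcc+当前mpi编译,安装到 software/libs/gcc/mpi
+./jarvis -install openmpi/4.1.2 gcc       # MPI通信软件,安装到 software/mpi
+```
+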
+5.一键安装所有依赖
+
+```
+./jarvis -dp
+```
+
+6.一键生成环境变量(脱离贾维斯运行才需要执行)
+
+```
+./jarvis -e && source ./env.sh
+```
+
+7.一键编译
+
+```
+./jarvis -b
+```
+
+8.一键运行
+
+```
+./jarvis -r
+```
-2.修改data.config(ARM)或者data.X86.config(X86)
+9.一键性能采集(perf)
-3.一键生成环境变量(或者python3 jarvis.py)
+```
+./jarvis -p
+```
-`./jarvis.py -e`
-`source env.sh`
-4.一键编译
+10.一键Kperf性能采集(生成TopDown)
-`./jarvis.py -b`
+```
+./jarvis -kp
+```
-5.一键运行
+11.一键GPU性能采集(需安装nsys、ncu)
-`./jarvis.py -r`
+```
+./jarvis -gp
+```
-6.一键性能采集(perf)
+12.一键输出服务器信息(包括CPU、网卡、OS、内存等)
-`./jarvis.py -p`
+```
+./jarvis -i
+```
-7.一键GPU性能采集(使用nsys、ncu)
+13.一键服务器性能评测(包括MPI、OMP、P2P等)
-`./jarvis.py -gp`
+```
+./jarvis -bench all #运行所有benchmark
+./jarvis -bench mpi #运行MPI benchmark
+./jarvis -bench omp #运行OMP benchmark
+./jarvis -bench gemm #运行矩阵运算 benchmark
+```
-8.一键输出服务器信息(包括CPU、网卡、OS、内存等)
+14.切换配置
-`./jarvis.py -i`
+```
+./jarvis -use XXX.config
+```
-9.切换配置
+15.其它功能查看(网络检测)
-`./jarvis.py -use data.XXX.config`
+```
+./jarvis -h
+```
-10.其它功能查看(多线程下载、网络检测)
-`./jarvis.py -h`
### 欢迎贡献
-贾维斯项目欢迎您的热情参与!
+贾维斯项目欢迎您的专业技能和热情参与!
-小的改进或修复总是值得赞赏的;先从文档开始可能是一个很好的起点。如果您正在考虑对源代码的更大贡献,请先提交issue讨论。
+小的改进或修复总是值得赞赏的;先从文档开始可能是一个很好的起点。如果您正在考虑对源代码的更大贡献,请先提交一个issue或者在maillist进行讨论。
编写代码并不是为贾维斯做出贡献的唯一方法。您还可以:
-- 贡献小而精的工具(小于10MB>)
+- 贡献安装脚本
- 帮助我们测试新的HPC应用
-- 开发教程、演示和其他教育材料
+- 开发教程、演示
- 为我们宣传
- 帮助新的贡献者加入
-请添加OpenEuler SIG微信群了解更多HPC迁移调优知识
+请添加openEuler HPC SIG微信群了解更多HPC迁移调优知识

\ No newline at end of file
diff --git a/benchmark/README.md b/benchmark/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..a9c072d60856f80821ded643066be7d129b0e8ec
--- /dev/null
+++ b/benchmark/README.md
@@ -0,0 +1,4 @@
+# benchmark
+- gemm: BLAS (OpenBLAS/KML) and MPI GEMM performance
+- mpi: MPI_Reduce average micro-benchmark
+- omp: OpenMP PI calculation
+- p2p: GPU p2p connectivity and bandwidth check
+
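+A typical (illustrative) way to drive them, assuming a compiler and MPI are already in PATH:
+
+```
+cd gemm && bash run.sh            # OpenBLAS/KML GEMM and MPI block-matmul timing
+cd ../mpi && bash run.sh          # MPI_Reduce average example
+cd ../omp && bash run.sh          # OpenMP PI micro-benchmark
+cd ../p2p && make && ./p2pTest    # GPU P2P latency/bandwidth (CUDA required)
+```
+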
diff --git a/benchmark/gemm/Makefile b/benchmark/gemm/Makefile
new file mode 100644
index 0000000000000000000000000000000000000000..b41aae52cf40d2f9ad7a9d1680a9c082b900d15a
--- /dev/null
+++ b/benchmark/gemm/Makefile
@@ -0,0 +1,19 @@
+CC = mpic++
+CCFLAGS = -O2 -fopenmp
+OPENBLAS_PATH = ${JARVIS_LIBS}/gcc9/openblas/0.3.18
+OPENBLAS_INC = -I ${OPENBLAS_PATH}/include
+OPENBLAS_LDFLAGS = -L ${OPENBLAS_PATH}/lib -lopenblas
+
+KML_PATH = /usr/local/kml
+KML_INC = -I ${KML_PATH}/include
+KML_LDFLAGS = -L ${KML_PATH}/lib/kblas/omp -lkblas
+all: gemm
+
+gemm: gemm.cpp
+ ${CC} ${CCFLAGS} ${OPENBLAS_INC} gemm.cpp -o gemm ${OPENBLAS_LDFLAGS}
+
+gemm-kml: gemm.cpp
+ ${CC} -DUSE_KML ${CCFLAGS} ${KML_INC} gemm.cpp -o gemm-kml ${KML_LDFLAGS}
+
+clean:
+ rm -f gemm gemm-kml
diff --git a/benchmark/gemm/gemm.cpp b/benchmark/gemm/gemm.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..53b0974cd0897137db3f9de1319189abb9fad9a6
--- /dev/null
+++ b/benchmark/gemm/gemm.cpp
@@ -0,0 +1,224 @@
+#include <iostream>
+#include <cstdlib>
+#include <cmath>
+#include <sys/time.h>
+#include "mpi.h"
+#ifdef USE_KML
+ #include "kblas.h"
+#else
+  #include <cblas.h>
+#endif
+using namespace std;
+
+void randMat(int rows, int cols, float *&Mat) {
+ Mat = new float[rows * cols];
+ for (int i = 0; i < rows; i++)
+ for (int j = 0; j < cols; j++)
+ Mat[i * cols + j] = 1.0;
+}
+
+void openmp_sgemm(int m, int n, int k, float *&leftMat, float *&rightMat,
+ float *&resultMat) {
+ // rightMat is transposed
+#pragma omp parallel for
+ for (int row = 0; row < m; row++) {
+ for (int col = 0; col < k; col++) {
+ resultMat[row * k + col] = 0.0;
+ for (int i = 0; i < n; i++) {
+ resultMat[row * k + col] +=
+ leftMat[row * n + i] * rightMat[col * n + i];
+ }
+ }
+ }
+ return;
+}
+
+void blas_sgemm(int m, int n, int k, float *&leftMat, float *&rightMat,
+ float *&resultMat) {
+ cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasTrans, m, k, n, 1.0, leftMat,
+ n, rightMat, n, 0.0, resultMat, k);
+}
+
+void mpi_sgemm(int m, int n, int k, float *&leftMat, float *&rightMat,
+ float *&resultMat, int rank, int worldsize, bool blas) {
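+  // Decomposition sketch: rank 0 holds the full matrices, transposes rightMat
+  // once, then sends row blocks of leftMat and (transposed) column blocks of
+  // rightMat to a rowBlock x colBlock grid of ranks; every rank multiplies its
+  // own tile (OpenMP or BLAS) and rank 0 gathers the tiles back into resultMat.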
+ int rowBlock = sqrt(worldsize);
+ if (rowBlock * rowBlock > worldsize)
+ rowBlock -= 1;
+ int colBlock = rowBlock;
+
+ int rowStride = m / rowBlock;
+ int colStride = k / colBlock;
+
+  worldsize = rowBlock * colBlock; // we abandon the extra processes,
+                                   // so it is best to run with a square number of processes.
+
+ float *res;
+
+ if (rank == 0) {
+ float *buf = new float[k * n];
+ // transpose right Mat
+ for (int r = 0; r < n; r++) {
+ for (int c = 0; c < k; c++) {
+ buf[c * n + r] = rightMat[r * k + c];
+ }
+ }
+
+ for (int r = 0; r < k; r++) {
+ for (int c = 0; c < n; c++) {
+ rightMat[r * n + c] = buf[r * n + c];
+ }
+ }
+
+ MPI_Request sendRequest[2 * worldsize];
+ MPI_Status status[2 * worldsize];
+ for (int rowB = 0; rowB < rowBlock; rowB++) {
+ for (int colB = 0; colB < colBlock; colB++) {
+ rowStride = (rowB == rowBlock - 1) ? m - (rowBlock - 1) * (m / rowBlock)
+ : m / rowBlock;
+ colStride = (colB == colBlock - 1) ? k - (colBlock - 1) * (k / colBlock)
+ : k / colBlock;
+ int sendto = rowB * colBlock + colB;
+ if (sendto == 0)
+ continue;
+ MPI_Isend(&leftMat[rowB * (m / rowBlock) * n], rowStride * n, MPI_FLOAT,
+ sendto, 0, MPI_COMM_WORLD, &sendRequest[sendto]);
+ MPI_Isend(&rightMat[colB * (k / colBlock) * n], colStride * n,
+ MPI_FLOAT, sendto, 1, MPI_COMM_WORLD,
+ &sendRequest[sendto + worldsize]);
+ }
+ }
+ for (int rowB = 0; rowB < rowBlock; rowB++) {
+ for (int colB = 0; colB < colBlock; colB++) {
+ int recvfrom = rowB * colBlock + colB;
+ if (recvfrom == 0)
+ continue;
+ MPI_Wait(&sendRequest[recvfrom], &status[recvfrom]);
+ MPI_Wait(&sendRequest[recvfrom + worldsize],
+ &status[recvfrom + worldsize]);
+ }
+ }
+ res = new float[(m / rowBlock) * (k / colBlock)];
+ } else {
+ if (rank < worldsize) {
+ MPI_Status status[2];
+ rowStride = ((rank / colBlock) == rowBlock - 1)
+ ? m - (rowBlock - 1) * (m / rowBlock)
+ : m / rowBlock;
+ colStride = ((rank % colBlock) == colBlock - 1)
+ ? k - (colBlock - 1) * (k / colBlock)
+ : k / colBlock;
+ if (rank != 0) {
+ leftMat = new float[rowStride * n];
+ rightMat = new float[colStride * n];
+ }
+ if (rank != 0) {
+ MPI_Recv(leftMat, rowStride * n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD,
+ &status[0]);
+ MPI_Recv(rightMat, colStride * n, MPI_FLOAT, 0, 1, MPI_COMM_WORLD,
+ &status[1]);
+ }
+ res = new float[rowStride * colStride];
+ }
+ }
+ MPI_Barrier(MPI_COMM_WORLD);
+
+ if (rank < worldsize) {
+ rowStride = ((rank / colBlock) == rowBlock - 1)
+ ? m - (rowBlock - 1) * (m / rowBlock)
+ : m / rowBlock;
+ colStride = ((rank % colBlock) == colBlock - 1)
+ ? k - (colBlock - 1) * (k / colBlock)
+ : k / colBlock;
+ if (!blas)
+ openmp_sgemm(rowStride, n, colStride, leftMat, rightMat, res);
+ else
+ blas_sgemm(rowStride, n, colStride, leftMat, rightMat, res);
+ }
+ MPI_Barrier(MPI_COMM_WORLD);
+
+ if (rank == 0) {
+ MPI_Status status;
+ float *buf = new float[(m - (rowBlock - 1) * (m / rowBlock)) *
+ (k - (colBlock - 1) * (k / colBlock))];
+ float *temp_res;
+ for (int rowB = 0; rowB < rowBlock; rowB++) {
+ for (int colB = 0; colB < colBlock; colB++) {
+ rowStride = (rowB == rowBlock - 1) ? m - (rowBlock - 1) * (m / rowBlock)
+ : m / rowBlock;
+ colStride = (colB == colBlock - 1) ? k - (colBlock - 1) * (k / colBlock)
+ : k / colBlock;
+ int recvfrom = rowB * colBlock + colB;
+ if (recvfrom != 0) {
+ temp_res = buf;
+ MPI_Recv(temp_res, rowStride * colStride, MPI_FLOAT, recvfrom, 0,
+ MPI_COMM_WORLD, &status);
+ } else {
+ temp_res = res;
+ }
+ for (int r = 0; r < rowStride; r++)
+ for (int c = 0; c < colStride; c++)
+ resultMat[rowB * (m / rowBlock) * k + colB * (k / colBlock) +
+ r * k + c] = temp_res[r * colStride + c];
+ }
+ }
+ } else {
+ rowStride = ((rank / colBlock) == rowBlock - 1)
+ ? m - (rowBlock - 1) * (m / rowBlock)
+ : m / rowBlock;
+ colStride = ((rank % colBlock) == colBlock - 1)
+ ? k - (colBlock - 1) * (k / colBlock)
+ : k / colBlock;
+ if (rank < worldsize)
+ MPI_Send(res, rowStride * colStride, MPI_FLOAT, 0, 0, MPI_COMM_WORLD);
+ }
+ MPI_Barrier(MPI_COMM_WORLD);
+
+ return;
+}
+
+int main(int argc, char *argv[]) {
+ if (argc != 5) {
+ cout << "Usage: " << argv[0] << " M N K use-blas\n";
+ exit(-1);
+ }
+
+ int rank;
+ int worldSize;
+ MPI_Init(&argc, &argv);
+
+ MPI_Comm_size(MPI_COMM_WORLD, &worldSize);
+ MPI_Comm_rank(MPI_COMM_WORLD, &rank);
+
+ int m = atoi(argv[1]);
+ int n = atoi(argv[2]);
+ int k = atoi(argv[3]);
+ int blas = atoi(argv[4]);
+
+ float *leftMat, *rightMat, *resMat;
+
+ struct timeval start, stop;
+ if (rank == 0) {
+ randMat(m, n, leftMat);
+ randMat(n, k, rightMat);
+ randMat(m, k, resMat);
+ }
+ gettimeofday(&start, NULL);
+ mpi_sgemm(m, n, k, leftMat, rightMat, resMat, rank, worldSize, blas);
+ gettimeofday(&stop, NULL);
+ if (rank == 0) {
+ cout << "mpi matmul: "
+ << (stop.tv_sec - start.tv_sec) * 1000.0 +
+ (stop.tv_usec - start.tv_usec) / 1000.0
+ << " ms" << endl;
+
+ for (int i = 0; i < m; i++) {
+ for (int j = 0; j < k; j++)
+ if (int(resMat[i * k + j]) != n) {
+ cout << resMat[i * k + j] << "error\n";
+ exit(-1);
+ }
+ }
+ }
+ MPI_Finalize();
+}
diff --git a/benchmark/gemm/run.sh b/benchmark/gemm/run.sh
new file mode 100644
index 0000000000000000000000000000000000000000..4452ca479124203b951bb9e480b789f0baa88287
--- /dev/null
+++ b/benchmark/gemm/run.sh
@@ -0,0 +1,30 @@
+flags="**************"
+armRun(){
+ mpi_cmd="mpirun --allow-run-as-root -x OMP_NUM_THREADS=4 -mca btl ^vader,tcp,openib,uct -np 16"
+ echo "${flags}benching openblas gemm, best 405ms${flags}"
+ make
+ ${mpi_cmd} ./gemm 4024 4024 4024 1
+ echo "${flags}benching kml gemm, best 216ms${flags}"
+ make gemm-kml
+ ${mpi_cmd} ./gemm-kml 4024 4024 4024 1
+ echo "${flags}benching MPI perf, best 1855ms${flags}"
+ ${mpi_cmd} ./gemm 4024 4024 4024 0
+}
+
+x86Run(){
+ mpi_cmd="mpirun -genv OMP_NUM_THREADS=4 -n 16"
+ echo "${flags}benching openblas gemm, best 405ms${flags}"
+ make
+ ${mpi_cmd} ./gemm 4024 4024 4024 1
+ echo "${flags}benching MKL gemm, best 216ms${flags}"
+ make gemm-mkl
+ ${mpi_cmd} ./gemm-mkl 4024 4024 4024 1
+ echo "${flags}benching MPI perf, best 1855ms${flags}"
+ ${mpi_cmd} ./gemm 4024 4024 4024 0
+}
+# check Arch
+if [ x$(arch) = xaarch64 ];then
+ armRun
+else
+ x86Run
+fi
\ No newline at end of file
diff --git a/benchmark/mpi/reduce_avg.c b/benchmark/mpi/reduce_avg.c
new file mode 100644
index 0000000000000000000000000000000000000000..05a576be7505a36a5a0ff7a4ee575a243752c3d1
--- /dev/null
+++ b/benchmark/mpi/reduce_avg.c
@@ -0,0 +1,74 @@
+// Author: Wes Kendall
+// Copyright 2013 www.mpitutorial.com
+// This code is provided freely with the tutorials on mpitutorial.com. Feel
+// free to modify it for your own use. Any distribution of the code must
+// either provide a link to www.mpitutorial.com or keep this header intact.
+//
+// Program that computes the average of an array of elements in parallel using
+// MPI_Reduce.
+//
+#include <stdio.h>
+#include <stdlib.h>
+#include <time.h>
+#include <assert.h>
+#include <mpi.h>
+
+// Creates an array of random numbers. Each number has a value from 0 - 1
+float *create_rand_nums(int num_elements) {
+ float *rand_nums = (float *)malloc(sizeof(float) * num_elements);
+ assert(rand_nums != NULL);
+ int i;
+ for (i = 0; i < num_elements; i++) {
+ rand_nums[i] = (rand() / (float)RAND_MAX);
+ }
+ return rand_nums;
+}
+
+int main(int argc, char** argv) {
+ if (argc != 2) {
+ fprintf(stderr, "Usage: avg num_elements_per_proc\n");
+ exit(1);
+ }
+
+ int num_elements_per_proc = atoi(argv[1]);
+
+ MPI_Init(NULL, NULL);
+
+ int world_rank;
+ MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
+ int world_size;
+ MPI_Comm_size(MPI_COMM_WORLD, &world_size);
+
+ // Create a random array of elements on all processes.
+ srand(time(NULL)*world_rank); // Seed the random number generator to get different results each time for each processor
+ float *rand_nums = NULL;
+ rand_nums = create_rand_nums(num_elements_per_proc);
+
+ // Sum the numbers locally
+ float local_sum = 0;
+ int i;
+ for (i = 0; i < num_elements_per_proc; i++) {
+ local_sum += rand_nums[i];
+ }
+
+ // Print the random numbers on each process
+ printf("Local sum for process %d - %f, avg = %f\n",
+ world_rank, local_sum, local_sum / num_elements_per_proc);
+
+ // Reduce all of the local sums into the global sum
+ float global_sum;
+ MPI_Reduce(&local_sum, &global_sum, 1, MPI_FLOAT, MPI_SUM, 0,
+ MPI_COMM_WORLD);
+
+ // Print the result
+ if (world_rank == 0) {
+ printf("Total sum = %f, avg = %f\n", global_sum,
+ global_sum / (world_size * num_elements_per_proc));
+ }
+
+ // Clean up
+ free(rand_nums);
+
+ MPI_Barrier(MPI_COMM_WORLD);
+ MPI_Finalize();
+}
\ No newline at end of file
diff --git a/benchmark/mpi/run.sh b/benchmark/mpi/run.sh
new file mode 100644
index 0000000000000000000000000000000000000000..265b300cb6673a879bd2fb961545f7b925b83da2
--- /dev/null
+++ b/benchmark/mpi/run.sh
@@ -0,0 +1,2 @@
+mpicc reduce_avg.c -o avg
+mpirun -n 2 --allow-run-as-root ./avg 2
\ No newline at end of file
diff --git a/benchmark/omp/Makefile b/benchmark/omp/Makefile
new file mode 100644
index 0000000000000000000000000000000000000000..254eec75896942942a0232b7a51b80a26a7687e2
--- /dev/null
+++ b/benchmark/omp/Makefile
@@ -0,0 +1,17 @@
+CC = gcc
+CCFLAGS = -fopenmp -O2
+NVCFLAGS =
+
+all: caclPI
+
+caclPI: caclPI.cpp
+ ${CC} ${CCFLAGS} caclPI.cpp -o caclPI
+
+gramSchmidt_gpu: gramSchmidt_gpu.c
+ nvc -mp=gpu -Minfo=mp -lm gramSchmidt_gpu.c -o gramSchmidt_gpu
+
+gramSchmidt_gpu_f90: gramSchmidt_gpu.F90
+ nvfortran -mp=gpu -Minfo=mp -lm gramSchmidt_gpu.F90 -o gramSchmidt_gpu_f90
+
+clean:
+ rm -rf caclPI gramSchmidt_gpu gramSchmidt_gpu_f90
diff --git a/benchmark/omp/caclPI.cpp b/benchmark/omp/caclPI.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..b68de200a488065f1e258f36d4f79dceceace29d
--- /dev/null
+++ b/benchmark/omp/caclPI.cpp
@@ -0,0 +1,24 @@
+
+#include <stdio.h>
+#include <omp.h>
+
+#define NUM_THREADS 32
+static long num_steps = 100000000;
+
+int main ()
+{
+ int i;
+ double x, pi, sum = 0.0, step, start_time,end_time;
+ step = 1.0/(double) num_steps;
+ omp_set_num_threads(NUM_THREADS);
+ start_time=omp_get_wtime();
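+ // midpoint-rule integration of 4/(1+x*x) over [0,1]; the exact value of this integral is PI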
+ #pragma omp parallel for reduction(+ : sum) private(x)
+ for (i=1;i<= num_steps; i++){
+ x = (i-0.5)*step;
+ sum = sum + 4.0/(1.0+x*x);
+ }
+ pi = step * sum;
+ end_time=omp_get_wtime();
+ printf("Pi = %16.15f\n Running time:%.3f s \n", pi, end_time - start_time);
+ return 0;
+}
diff --git a/benchmark/omp/gramSchmidt_gpu.F90 b/benchmark/omp/gramSchmidt_gpu.F90
new file mode 100644
index 0000000000000000000000000000000000000000..aa1afd6d6d5abc1de4799fd4e576671d34b6c0d1
--- /dev/null
+++ b/benchmark/omp/gramSchmidt_gpu.F90
@@ -0,0 +1,34 @@
+! @@name: target_data.3f
+! @@type: F-free
+! @@compilable: yes
+! @@linkable: no
+! @@expect: success
+! @@version: omp_4.0
+subroutine gramSchmidt(Q,rows,cols)
+ integer :: rows,cols, i,k
+ double precision :: Q(rows,cols), tmp
+ !$omp target data map(Q)
+ do k=1,cols
+ tmp = 0.0d0
+ !$omp target map(tofrom: tmp)
+ !$omp parallel do reduction(+:tmp)
+ do i=1,rows
+ tmp = tmp + (Q(i,k) * Q(i,k))
+ end do
+ !$omp end target
+
+ tmp = 1.0d0/sqrt(tmp)
+
+ !$omp target
+ !$omp parallel do
+ do i=1,rows
+ Q(i,k) = Q(i,k)*tmp
+ enddo
+ !$omp end target
+ end do
+ !$omp end target data
+end subroutine
+
+! Note: The variable tmp is now mapped with tofrom, for correct
+! execution with 4.5 (and pre-4.5) compliant compilers. See Devices Intro.
+
\ No newline at end of file
diff --git a/benchmark/omp/gramSchmidt_gpu.c b/benchmark/omp/gramSchmidt_gpu.c
new file mode 100644
index 0000000000000000000000000000000000000000..9cae585ffa7df5c1a8aeca46504c2bd0ddb88d38
--- /dev/null
+++ b/benchmark/omp/gramSchmidt_gpu.c
@@ -0,0 +1,59 @@
+#include <stdio.h>
+#include <stdlib.h>
+#include <math.h>
+
+#define COLS 1000
+#define ROWS 1000
+#define FLOAT_T float
+
+FLOAT_T *getFinput(int scale)
+{
+ FLOAT_T *input;
+ if ((input = (FLOAT_T *)malloc(sizeof(FLOAT_T) * scale)) == NULL)
+ {
+ fprintf(stderr, "Out of Memory!!\n");
+ exit(1);
+ }
+ for (int i = 0; i < scale; i++)
+ {
+ input[i] = ((FLOAT_T)rand() / (FLOAT_T)RAND_MAX) - 0.5;
+ }
+ return input;
+}
+
+FLOAT_T **get2Darr(int M, int N)
+{
+ FLOAT_T **input;
+ input = (FLOAT_T **)malloc(M * sizeof(FLOAT_T *));
+ for (int i = 0; i < M; i++)
+ {
+ input[i] = (FLOAT_T *)malloc(N * sizeof(FLOAT_T));
+ }
+ return input;
+}
+
+void gramSchmidt_gpu(FLOAT_T **Q)
+{
+ int cols = COLS;
+ #pragma omp target data map(Q[0:ROWS][0:cols])
+ for(int k=0; k < cols; k++)
+ {
+ double tmp = 0.0;
+ #pragma omp target map(tofrom: tmp)
+ #pragma omp parallel for reduction(+:tmp)
+ for(int i=0; i < ROWS; i++)
+ tmp += (Q[i][k] * Q[i][k]);
+ tmp = 1/sqrt(tmp);
+ #pragma omp target
+ #pragma omp parallel for
+ for(int i=0; i < ROWS; i++)
+ Q[i][k] *= tmp;
+ }
+}
+
+int main()
+{
+ FLOAT_T **Q = get2Darr(ROWS, COLS);
+ gramSchmidt_gpu(Q);
+    return 0;
+}
diff --git a/benchmark/omp/run.sh b/benchmark/omp/run.sh
new file mode 100644
index 0000000000000000000000000000000000000000..3784d96de306443521a0c6fb52abe11e6cc122f9
--- /dev/null
+++ b/benchmark/omp/run.sh
@@ -0,0 +1,18 @@
+flags="**************"
+armRun(){
+ echo "${flags}benching omp perf, best 0.023ms${flags}"
+ make
+ ./caclPI
+ make gramSchmidt_gpu
+ ./gramSchmidt_gpu
+}
+
+x86Run(){
+ armRun
+}
+# check Arch
+if [ x$(arch) = xaarch64 ];then
+ armRun
+else
+ x86Run
+fi
\ No newline at end of file
diff --git a/benchmark/p2p/Makefile b/benchmark/p2p/Makefile
new file mode 100644
index 0000000000000000000000000000000000000000..e9c55f7d0572b79b09fb31340e90e2864ec2e83f
--- /dev/null
+++ b/benchmark/p2p/Makefile
@@ -0,0 +1,67 @@
+##
+ # =====================================================================================
+ #
+ # Filename: Makefile
+ #
+ # Description: This microbenchmark is to obtain the latency & uni/bi-directional
+ # bandwidth for PCI-e, NVLink-V1 in NVIDIA P100 DGX-1 and NVLink-V2 in
+ # V100 DGX-1. Please see our IISWC-18 paper titled "Tartan: Evaluating
+ # Modern GPU Interconnect via a Multi-GPU Benchmark Suite". The
+ # Code is modified from the p2pBandwidthLatencyTest app in
+ # NVIDIA CUDA-SDK. Please follow NVIDIA's EULA for end usage.
+ #
+ # Version: 1.0
+ # Created: 01/24/2018 02:12:31 PM
+ # Revision: none
+ # Compiler: GNU-Make
+ #
+ # Author: Ang Li, PNNL
+ # Website: http://www.angliphd.com
+ #
+ # =====================================================================================
+##
+
+
+################################################################################
+#
+# Copyright 1993-2015 NVIDIA Corporation. All rights reserved.
+#
+# NOTICE TO USER:
+#
+# This source code is subject to NVIDIA ownership rights under U.S. and
+# international Copyright laws.
+#
+# NVIDIA MAKES NO REPRESENTATION ABOUT THE SUITABILITY OF THIS SOURCE
+# CODE FOR ANY PURPOSE. IT IS PROVIDED "AS IS" WITHOUT EXPRESS OR
+# IMPLIED WARRANTY OF ANY KIND. NVIDIA DISCLAIMS ALL WARRANTIES WITH
+# REGARD TO THIS SOURCE CODE, INCLUDING ALL IMPLIED WARRANTIES OF
+# MERCHANTABILITY, NONINFRINGEMENT, AND FITNESS FOR A PARTICULAR PURPOSE.
+# IN NO EVENT SHALL NVIDIA BE LIABLE FOR ANY SPECIAL, INDIRECT, INCIDENTAL,
+# OR CONSEQUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS
+# OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE
+# OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE
+# OR PERFORMANCE OF THIS SOURCE CODE.
+#
+# U.S. Government End Users. This source code is a "commercial item" as
+# that term is defined at 48 C.F.R. 2.101 (OCT 1995), consisting of
+# "commercial computer software" and "commercial computer software
+# documentation" as such terms are used in 48 C.F.R. 12.212 (SEPT 1995)
+# and is provided to the U.S. Government only as a commercial end item.
+# Consistent with 48 C.F.R.12.212 and 48 C.F.R. 227.7202-1 through
+# 227.7202-4 (JUNE 1995), all U.S. Government End Users acquire the
+# source code with only those rights set forth herein.
+#
+################################################################################
+#
+# Makefile project only supported on Mac OS X and Linux Platforms)
+#
+################################################################################
+
+include shared.mk
+
+p2pTest: p2pBandwidthLatencyTest.cu
+ $(NVCC) $(NVCC_FLAGS) $^ -o $@
+
+clean:
+ rm -f p2pTest
+
diff --git a/benchmark/p2p/p2pBandwidthLatencyTest.cu b/benchmark/p2p/p2pBandwidthLatencyTest.cu
new file mode 100644
index 0000000000000000000000000000000000000000..e3aaec08c278e74bb47a98b5a474d515f525164a
--- /dev/null
+++ b/benchmark/p2p/p2pBandwidthLatencyTest.cu
@@ -0,0 +1,653 @@
+/*
+ * =====================================================================================
+ *
+ * Filename: p2pBandwidthLatencyTest.cu
+ *
+ * Description: This microbenchmark is to obtain the latency & uni/bi-directional
+ * bandwidth for PCI-e, NVLink-V1 in NVIDIA P100 DGX-1 and NVLink-V2 in
+ * V100 DGX-1. Please see our IISWC-18 paper titled "Tartan: Evaluating
+ * Modern GPU Interconnects via a Multi-GPU Benchmark Suite". The
+ * Code is modified from the p2pBandwidthLatencyTest app in
+ * NVIDIA CUDA-SDK. Please follow NVIDIA's EULA for end usage.
+ *
+ * Version: 1.0
+ * Created: 01/24/2018 02:12:31 PM
+ * Revision: none
+ * Compiler: nvcc
+ *
+ * Author: Ang Li, PNNL
+ * Website: http://www.angliphd.com
+ *
+ * =====================================================================================
+ */
+
+/*
+ * Copyright 1993-2015 NVIDIA Corporation. All rights reserved.
+ *
+ * Please refer to the NVIDIA end user license agreement (EULA) associated
+ * with this source code for terms and conditions that govern your use of
+ * this software. Any use, reproduction, disclosure, or distribution of
+ * this software and related documentation outside the terms of the EULA
+ * is strictly prohibited.
+ *
+ */
+
+#define ASCENDING
+
+#include <cstdio>
+#include <vector>
+
+using namespace std;
+
+const char *sSampleName = "P2P (Peer-to-Peer) GPU Bandwidth Latency Test";
+
+//Macro for checking cuda errors following a cuda launch or api call
+#define cudaCheckError() { \
+ cudaError_t e=cudaGetLastError(); \
+ if(e!=cudaSuccess) { \
+ printf("Cuda failure %s:%d: '%s'\n",__FILE__,__LINE__,cudaGetErrorString(e)); \
+ exit(EXIT_SUCCESS); \
+ } \
+ }
+__global__ void delay(int * null) {
+ float j=threadIdx.x;
+ for(int i=1;i<10000;i++)
+ j=(j+1)/j;
+
+ if(threadIdx.x == j) null[0] = j;
+}
+
+void checkP2Paccess(int numGPUs)
+{
+ for (int i=0; i buffers(numGPUs);
+ vector start(numGPUs);
+ vector stop(numGPUs);
+
+ for (int d=0; d bandwidthMatrix(numGPUs*numGPUs);
+
+ for (int i=0; i=0; k--)
+#endif
+ {
+ cudaDeviceCanAccessPeer(&src2route,i,k);
+ cudaDeviceCanAccessPeer(&route2dst,k,j);
+ if (src2route && route2dst)
+ {
+ routingnode = k;
+ break;
+ }
+ }
+ cudaDeviceEnablePeerAccess(routingnode,0 );
+ cudaCheckError();
+ cudaSetDevice(routingnode);
+ cudaDeviceEnablePeerAccess(j,0 );
+ cudaSetDevice(i);
+ }
+ }
+
+ cudaDeviceSynchronize();
+ cudaCheckError();
+
+ if (routingrequired)
+ {
+ delay<<<1,1>>>(NULL);
+ cudaEventRecord(start[i]);
+ for (int r=0; r>>(NULL);
+ cudaEventRecord(start[i]);
+
+ for (int r=0; r buffers(numGPUs);
+ vector start(numGPUs);
+ vector stop(numGPUs);
+ vector stream0(numGPUs);
+ vector stream1(numGPUs);
+
+ for (int d=0; d bandwidthMatrix(numGPUs*numGPUs);
+
+ for (int i=0; i=0; k--)
+#endif
+ {
+ cudaDeviceCanAccessPeer(&src2route,i,k);
+ cudaDeviceCanAccessPeer(&route2dst,k,j);
+ if (src2route && route2dst)
+ {
+ routingnode = k;
+ break;
+ }
+ }
+ cudaSetDevice(i);
+ cudaDeviceEnablePeerAccess(routingnode,0 );
+ cudaCheckError();
+ cudaSetDevice(routingnode);
+ cudaDeviceEnablePeerAccess(i,0 );
+ cudaCheckError();
+ cudaDeviceEnablePeerAccess(j,0 );
+ cudaCheckError();
+ cudaSetDevice(j);
+ cudaDeviceEnablePeerAccess(routingnode,0 );
+ cudaSetDevice(i);
+ cudaCheckError();
+ }
+ }
+
+ cudaSetDevice(i);
+ cudaDeviceSynchronize();
+ cudaCheckError();
+
+ if (routingrequired)
+ {
+ delay<<<1,1>>>(NULL);
+ cudaEventRecord(start[i]);
+ for (int r=0; r>>(NULL);
+ cudaEventRecord(start[i]);
+
+ for (int r=0; r buffers(numGPUs);
+ vector start(numGPUs);
+ vector stop(numGPUs);
+
+ for (int d=0; d latencyMatrix(numGPUs*numGPUs);
+
+ for (int i=0; i=0; k--)
+#endif
+ {
+ cudaDeviceCanAccessPeer(&src2route,i,k);
+ cudaDeviceCanAccessPeer(&route2dst,k,j);
+ if (src2route && route2dst)
+ {
+ routingnode = k;
+ break;
+ }
+ }
+ cudaSetDevice(i);
+ cudaDeviceEnablePeerAccess(routingnode,0 );
+ cudaCheckError();
+ cudaSetDevice(routingnode);
+ cudaDeviceEnablePeerAccess(j,0 );
+ cudaCheckError();
+ cudaSetDevice(i);
+ }
+ }
+ cudaDeviceSynchronize();
+ cudaCheckError();
+
+
+ if (routingrequired)
+ {
+ delay<<<1,1>>>(NULL);
+ cudaEventRecord(start[i]);
+
+ for (int r=0; r>>(NULL);
+ cudaEventRecord(start[i]);
+
+ for (int r=0; r%d=>%d,(access:%d,routingrequired:%d\n",i,routingnode,j,access, routingrequired);
+ cudaCheckError();
+ cudaDeviceDisablePeerAccess(routingnode);
+ cudaCheckError();
+ cudaSetDevice(routingnode);
+ cudaDeviceDisablePeerAccess(j);
+ cudaCheckError();
+ cudaSetDevice(i);
+ }
+ }
+ }
+ }
+
+ printf(" D\\D");
+
+ for (int j=0; j
+
+
+
+Software Download:
+
+
+ X86
+ ARM
+ bisheng 2.1.0
+
+
\ No newline at end of file
diff --git a/examples/cuda/Makefile b/examples/cuda/Makefile
new file mode 100644
index 0000000000000000000000000000000000000000..81e7d3ad511e20b7aadde1e4e792ff03001ddc33
--- /dev/null
+++ b/examples/cuda/Makefile
@@ -0,0 +1,12 @@
+ARCH=sm_80
+NVCC_FLAGS = -arch=$(ARCH) -O3
+CUDA_DIR = /usr/local/cuda/
+# CUDA compiler
+NVCC = $(CUDA_DIR)/bin/nvcc
+all: cuda
+
+cuda: cuda.cu
+ $(NVCC) $(NVCC_FLAGS) $^ -o $@.o
+
+clean:
+ rm -f cuda.o
\ No newline at end of file
diff --git a/examples/cuda/cuda.cu b/examples/cuda/cuda.cu
new file mode 100644
index 0000000000000000000000000000000000000000..c8aace498be089ce805579e61bff0d288933e59e
--- /dev/null
+++ b/examples/cuda/cuda.cu
@@ -0,0 +1,102 @@
+// nvcc cuda_hello.cu -o hello.o
+#include <stdio.h>
+#define MAX_DEVICE 2
+#define RTERROR(status, s) \
+ if (status != cudaSuccess) \
+ { \
+ printf("%s %s\n", s, cudaGetErrorString(status)); \
+ cudaDeviceReset(); \
+ exit(-1); \
+ }
+
+//HelloFromGPU<<<1, 5>>>();
+__global__ void HelloFromGPU(void)
+{
+ printf("Hello from GPU\n");
+}
+
+int getDeviceCount() {
+ cudaError_t status;
+ int gpuCount = 0;
+ status = cudaGetDeviceCount(&gpuCount);
+ RTERROR(status, "cudaGetDeviceCount failed");
+ if (gpuCount == 0)
+ {
+ printf("No CUDA-capable devices found, exiting.\n");
+ cudaDeviceReset();
+ exit(-1);
+ }
+ return gpuCount;
+}
+
+cudaDeviceProp getProps(int device)
+{
+ cudaDeviceProp deviceProp;
+ cudaGetDeviceProperties(&deviceProp, device);
+ return deviceProp;
+}
+
+void cudaGetSetDevice(){
+ cudaError_t status;
+ int device = 0;
+ status = cudaGetDevice(&device);
+ RTERROR(status, "Error fetching current GPU");
+ status = cudaSetDevice(device);
+ RTERROR(status, "Error setting CUDA device");
+ cudaDeviceSynchronize();
+}
+
+void isSupportP2P(int gpuCount)
+{
+ int uvaOrdinals[MAX_DEVICE];
+ int uvaCount = 0;
+ int i, j;
+ cudaDeviceProp prop;
+ for (i = 0; i < gpuCount; ++i)
+ {
+ cudaGetDeviceProperties(&prop, i);
+ if (prop.unifiedAddressing)
+ {
+ uvaOrdinals[uvaCount] = i;
+ printf(" GPU%d \"%15s\"\n", i, prop.name);
+ uvaCount += 1;
+ }
+ else
+ printf(" GPU%d \"%15s\" NOT UVA capable\n", i, prop.name);
+ }
+ int canAccessPeer_ij, canAccessPeer_ji;
+ for (i = 0; i < uvaCount; ++i)
+ {
+ for (j = i + 1; j < uvaCount; ++j)
+ {
+ cudaDeviceCanAccessPeer(&canAccessPeer_ij, uvaOrdinals[i], uvaOrdinals[j]);
+ cudaDeviceCanAccessPeer(&canAccessPeer_ji, uvaOrdinals[j], uvaOrdinals[i]);
+ if (canAccessPeer_ij * canAccessPeer_ji)
+ {
+ printf(" GPU%d and GPU%d: YES\n", uvaOrdinals[i], uvaOrdinals[j]);
+ }
+ else
+ {
+ printf(" GPU%d and GPU%d: NO\n", uvaOrdinals[i], uvaOrdinals[j]);
+ }
+ }
+ }
+}
+
+int main(void)
+{
+ // get GPU Number
+ int gpuCount = getDeviceCount();
+ printf("gpucount:%d\n", gpuCount);
+ // get SM Number
+ cudaDeviceProp deviceProp = getProps(0);
+ printf("SM number:%d\n", deviceProp.multiProcessorCount);
+ // get Mode info
+ if (deviceProp.computeMode == cudaComputeModeDefault)
+ {
+ printf("GPU is in Compute Mode.\n");
+ }
+ // get P2P support info
+ isSupportP2P(gpuCount);
+ return 0;
+}
diff --git a/examples/false_sharing/Makefile b/examples/false_sharing/Makefile
new file mode 100644
index 0000000000000000000000000000000000000000..3039d437ef86aa8994e3a650ebeb4d189f98b195
--- /dev/null
+++ b/examples/false_sharing/Makefile
@@ -0,0 +1,10 @@
+CC = gcc
+LDLIBS = -lnuma -lpthread
+binary = false_sharing.exe
+source = false_sharing_example.c
+.PHONY : clean
+
+$(binary) : $(source)
+ $(CC) $(LDLIBS) -o $@ $<
+clean :
+ -rm $(binary) $(objects)
diff --git a/examples/false_sharing/ReadMe.txt b/examples/false_sharing/ReadMe.txt
new file mode 100644
index 0000000000000000000000000000000000000000..d009b54cae33977c0cbdc73496f00d4b16b0c8ec
--- /dev/null
+++ b/examples/false_sharing/ReadMe.txt
@@ -0,0 +1,35 @@
+install numactl-devel, in order to use numa.h
+1.rpm -ivh numactl-devel-2.0.13-4.ky10.x86_64.rpm
+compile
+2.make
+start perf...
+3.perf c2c record ./false_sharing.exe 2
+start report...
+4.perf c2c report -NN -g -c pid,iaddr --stdio
+ Load Local HITM : 2010 【too High, false_sharing is detected】
+ Load Remote HITM : 1315
+ Load Remote HIT : 0
+ Load Local DRAM : 71
+ Load Remote DRAM : 1881
+ Load MESI State Exclusive : 1881
+ Load MESI State Shared : 71
+ Load LLC Misses : 3267
+ LLC Misses to Local DRAM : 2.2%
+ LLC Misses to Remote DRAM : 57.6%
+ LLC Misses to Remote cache (HIT) : 0.0%
+ LLC Misses to Remote cache (HITM) : 40.3%
+compile no false_sharing code
+7.gcc -g false_sharing_example.c -pthread -lnuma -DNO_FALSE_SHARING -o no_false_sharing.exe
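+(record again before step 8, e.g. "perf c2c record ./no_false_sharing.exe 2", since the report needs data collected from the new binary)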
+8.perf c2c report -NN -g -c pid,iaddr --stdio
+ Load Local HITM : 6【normal, false_sharing is erased】
+ Load Remote HITM : 486
+ Load Remote HIT : 0
+ Load Local DRAM : 1
+ Load Remote DRAM : 498
+ Load MESI State Exclusive : 498
+ Load MESI State Shared : 1
+ Load LLC Misses : 985
+ LLC Misses to Local DRAM : 0.1%
+ LLC Misses to Remote DRAM : 50.6%
+ LLC Misses to Remote cache (HIT) : 0.0%
+ LLC Misses to Remote cache (HITM) : 49.3%
\ No newline at end of file
diff --git a/examples/false_sharing/false_sharing_example.c b/examples/false_sharing/false_sharing_example.c
new file mode 100644
index 0000000000000000000000000000000000000000..900f1ee17f5b0f32a0f812b49864bce037966a21
--- /dev/null
+++ b/examples/false_sharing/false_sharing_example.c
@@ -0,0 +1,268 @@
+/*
+ * This is an example program to show false sharing between
+ * numa nodes.
+ *
+ * It can be compiled two ways:
+ * gcc -g false_sharing_example.c -pthread -lnuma -o false_sharing.exe
+ * gcc -g false_sharing_example.c -pthread -lnuma -DNO_FALSE_SHARING -o no_false_sharing.exe
+ *
+ * The -DNO_FALSE_SHARING macro reduces the false sharing by expanding the shared data
+ * structure into two different cachelines, (and it runs faster).
+ *
+ * The usage is:
+ * ./false_sharing.exe
+ * ./no_false_sharing.exe
+ *
+ * The program will make half the threads writer threads and half reader
+ * threads. It will pin those threads in round-robin format to the
+ * different numa nodes in the system.
+ *
+ * For example, on a system with 4 numa nodes:
+ * ./false_sharing.exe 2
+ * 12165 mticks, reader_thd (thread 6), on node 2 (cpu 144).
+ * 12403 mticks, reader_thd (thread 5), on node 1 (cpu 31).
+ * 12514 mticks, reader_thd (thread 4), on node 0 (cpu 96).
+ * 12703 mticks, reader_thd (thread 7), on node 3 (cpu 170).
+ * 12982 mticks, lock_th (thread 0), on node 0 (cpu 1).
+ * 13018 mticks, lock_th (thread 1), on node 1 (cpu 24).
+ * 13049 mticks, lock_th (thread 3), on node 3 (cpu 169).
+ * 13050 mticks, lock_th (thread 2), on node 2 (cpu 49).
+ *
+ * # ./no_false_sharing.exe 2
+ * 1918 mticks, reader_thd (thread 4), on node 0 (cpu 96).
+ * 2432 mticks, reader_thd (thread 7), on node 3 (cpu 170).
+ * 2468 mticks, reader_thd (thread 6), on node 2 (cpu 146).
+ * 3903 mticks, reader_thd (thread 5), on node 1 (cpu 40).
+ * 7560 mticks, lock_th (thread 0), on node 0 (cpu 1).
+ * 7574 mticks, lock_th (thread 2), on node 2 (cpu 145).
+ * 7602 mticks, lock_th (thread 3), on node 3 (cpu 169).
+ * 7625 mticks, lock_th (thread 1), on node 1 (cpu 24).
+ *
+ */
+
+#define _MULTI_THREADED
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <pthread.h>
+#include <sched.h>
+#include <sys/types.h>
+#include <numa.h>
+
+/*
+ * A thread on each numa node seems to provoke cache misses
+ */
+#define LOOP_CNT (5 * 1024 * 1024)
+
+#if defined(__x86_64__) || defined(__i386__)
+static __inline__ uint64_t rdtsc() {
+ unsigned hi, lo;
+ __asm__ __volatile__ ( "rdtsc" : "=a"(lo), "=d"(hi));
+ return ( (uint64_t)lo) | ( ((uint64_t)hi) << 32);
+}
+
+#elif defined(__aarch64__)
+static __inline__ uint64_t rdtsc(void)
+{
+ uint64_t val;
+
+ /*
+ * According to ARM DDI 0487F.c, from Armv8.0 to Armv8.5 inclusive, the
+ * system counter is at least 56 bits wide; from Armv8.6, the counter
+ * must be 64 bits wide. So the system counter could be less than 64
+ * bits wide and it is attributed with the flag 'cap_user_time_short'
+ * is true.
+ */
+ asm volatile("mrs %0, cntvct_el0" : "=r" (val));
+
+ return val;
+}
+#endif
+
+
+/*
+ * Create a struct where reader fields share a cacheline with the hot lock field.
+ * Compiling with -DNO_FALSE_SHARING inserts padding to avoid that sharing.
+ */
+typedef struct _buf {
+ long lock0;
+ long lock1;
+ long reserved1;
+#if defined(NO_FALSE_SHARING)
+ long pad[5]; // to keep the 'lock*' fields on their own cacheline.
+#else
+ long pad[1]; // to provoke false sharing.
+#endif
+ long reader1;
+ long reader2;
+ long reader3;
+ long reader4;
+} buf __attribute__((aligned (64)));
+
+buf buf1;
+buf buf2;
+
+volatile int wait_to_begin = 1;
+struct thread_data *thread;
+int max_node_num;
+int num_threads;
+char * lock_thd_name = "lock_th";
+char * reader_thd_name = "reader_thd";
+
+#define checkResults(string, val) { \
+ if (val) { \
+ printf("Failed with %d at %s", val, string); \
+ exit(1); \
+ } \
+}
+
+struct thread_data {
+ pthread_t tid;
+ long tix;
+ long node;
+ char *name;
+};
+
+/*
+ * Bind a thread to the specified numa node.
+*/
+void setAffinity(void *parm) {
+ volatile uint64_t rc, j;
+ int node = ((struct thread_data *)parm)->node;
+ char *func_name = ((struct thread_data *)parm)->name;
+
+ numa_run_on_node(node);
+ pthread_setname_np(pthread_self(),func_name);
+}
+
+/*
+ * Thread function to simulate the false sharing.
+ * The "lock" threads will test-n-set the lock field,
+ * while the reader threads will just read the other fields
+ * in the struct.
+ */
+extern void *read_write_func(void *parm) {
+
+ int tix = ((struct thread_data *)parm)->tix;
+ uint64_t start, stop, j;
+ char *thd_name = ((struct thread_data *)parm)->name;
+
+ // Pin each thread to a numa node.
+ setAffinity(parm);
+
+ // Wait for all threads to get created before starting.
+ while(wait_to_begin) ;
+
+ start = rdtsc();
+ for(j=0; j\n", argv[0] );
+ printf( "where \"n\" is the number of threads per node\n");
+ exit(1);
+ }
+
+ if ( numa_available() < 0 )
+ {
+ printf( "NUMA not available\n" );
+ exit(1);
+ }
+
+ int thread_cnt = atoi(argv[1]);
+
+ max_node_num = numa_max_node();
+ if ( max_node_num == 0 )
+ max_node_num = 1;
+ int node_cnt = max_node_num + 1;
+
+ // Use "thread_cnt" threads per node.
+ num_threads = (max_node_num +1) * thread_cnt;
+
+ thread = malloc( sizeof(struct thread_data) * num_threads);
+
+ // Create the first half of threads as lock threads.
+ // Assign each thread a successive round robin node to
+ // be pinned to (later after it gets created.)
+ //
+ for (i=0; i<=(num_threads/2 - 1); i++) {
+ thread[i].tix = i;
+ thread[i].node = i%node_cnt;
+ thread[i].name = lock_thd_name;
+ rc = pthread_create(&thread[i].tid, NULL, read_write_func, &thread[i]);
+ checkResults("pthread_create()\n", rc);
+ usleep(500);
+ }
+
+ // Create the second half of threads as reader threads.
+ // Assign each thread a successive round robin node to
+ // be pinned to (later after it gets created.)
+ //
+ for (i=((num_threads/2)); i<(num_threads); i++) {
+ thread[i].tix = i;
+ thread[i].node = i%node_cnt;
+ thread[i].name = reader_thd_name;
+ rc = pthread_create(&thread[i].tid, NULL, read_write_func, &thread[i]);
+ checkResults("pthread_create()\n", rc);
+ usleep(500);
+ }
+
+ // Sync to let threads start together
+ usleep(500);
+ wait_to_begin = 0;
+
+ for (i=0; i length:
- print(f"You don't have {nodes} nodes, only {length} nodes available!")
- sys.exit()
- if nodes <= 1:
- return
- gen_nodes = '\n'.join(self.avail_ips_list[:nodes])
- print(f"HOSTFILE\n{gen_nodes}\nGENERATED.")
- self.write_file('hostfile', gen_nodes)
-
- # single run
- def run(self):
- print(f"start run {Data.app_name}")
- nodes = int(Data.run_cmd['nodes'])
- self.gen_hostfile(nodes)
- run_cmd = self.hpc_data.get_run_cmd()
- self.exe.exec_raw(run_cmd)
-
- def batch_run(self):
- batch_file = 'Batch_run.sh'
- print(f"start batch run {Data.app_name}")
- batch_content = f'''
-cd {Data.case_dir}
-{Data.batch_cmd}
-'''
- with open(batch_file, 'w') as f:
- f.write(batch_content)
- run_cmd = f'''
-chmod +x {batch_file}
-./{batch_file}
-'''
- self.exe.exec_raw(run_cmd)
-
- def change_yum_repo(self):
- print(f"start yum repo change")
- repo_cmd = '''
-cp ./config/yum/*.repo /etc/yum.repos.d/
-yum clean all
-yum makecache
-'''
- self.exe.exec_raw(repo_cmd)
-
- def get_pid(self):
- #get pid
- pid_cmd = f'pidof {Data.binary_file}'
- result = self.exe.exec_popen(pid_cmd)
- if len(result) == 0:
- print("failed to get pid.")
- sys.exit()
- else:
- pid_list = result[0].split(' ')
- return pid_list[0].strip()
-
- def perf(self):
- print(f"start perf {Data.app_name}")
- #get pid
- pid = self.get_pid()
- #start perf && analysis
- perf_cmd = f'''
-perf record -a -g -p {pid}
-perf report -i perf.data -F period,sample,overhead,symbol,dso,comm -s overhead --percent-limit 0.1% --stdio
-'''
- self.exe.exec_raw(perf_cmd)
-
- def gen_wget_url(self, out_dir='./downloads', url=''):
- head = "wget --no-check-certificate"
- out_para = "-P"
- if not os.path.exists(out_dir):
- os.makedirs(out_dir)
- download_url = f'{head} {out_para} {out_dir} {url}'
- return download_url
-
- def download(self):
- print(f"start download")
- for url in self.download_list:
- download_url = self.gen_wget_url(url=url)
- os.popen(download_url)
-
- def get_arch(self):
- arch = 'arm'
- if not self.isARM:
- arch = 'X86'
- return arch
-
- def get_cur_time(self):
- return re.sub(' |:', '-', self.tool.get_time_stamp())
-
- def gpu_perf(self):
- print(f"start gpu perf")
- run_cmd = self.hpc_data.get_run()
- gperf_cmd = f'''
-cd {Data.case_dir}
-nsys profile -y 5s -d 100s -o nsys-{self.get_arch()}-{self.get_cur_time()} {run_cmd}
- '''
- self.exe.exec_raw(gperf_cmd)
-
- def ncu_perf(self, kernel):
- print(f"start ncu perf")
- run_cmd = self.hpc_data.get_run()
- ncu_cmd = f'''
- cd {Data.case_dir}
- ncu --export ncu-{self.get_arch()}-{self.get_cur_time()} --import-source=yes --set full --kernel-name {kernel} --launch-skip 1735 --launch-count 1 {run_cmd}
- '''
- self.exe.exec_raw(ncu_cmd)
-
- def switch_config(self, config_file):
- print(f"Switch config file to {config_file}")
- with open(Data.meta_file, 'w') as f:
- f.write(config_file.strip())
- print("Successfully switched.")
-
- def main(self):
- if self.args.version:
- print("V1.0")
-
- if self.args.info:
- self.get_machine_info()
-
- if self.args.env:
- self.env()
-
- if self.args.clean:
- self.clean()
-
- if self.args.build:
- self.build()
-
- if self.args.run:
- self.run()
-
- if self.args.perf:
- self.perf()
-
- if self.args.rbatch:
- self.batch_run()
-
- if self.args.download:
- self.download()
-
- if self.args.gpuperf:
- self.gpu_perf()
-
- if self.args.ncuperf:
- self.ncu_perf(self.args.ncuperf[0])
-
- if self.args.use:
- self.switch_config(self.args.use[0])
-
- if self.args.network:
- self.check_network()
-
- if self.args.yum:
- self.change_yum_repo()
-
- data_list = self.args.compare
- if data_list and len(data_list) == 2:
- print(f"start compare {Data.app_name}")
- self.compare(data_list[0], data_list[1])
-
-if __name__ == '__main__':
- HPCRunner().main()
diff --git a/package/bisheng/1.3.3/install.sh b/package/bisheng/1.3.3/install.sh
new file mode 100644
index 0000000000000000000000000000000000000000..118c96b8b082dbd679d71042ae02df4bccd876ea
--- /dev/null
+++ b/package/bisheng/1.3.3/install.sh
@@ -0,0 +1,5 @@
+#!/bin/bash
+#download bisheng-compiler-1.3.3-aarch64-linux.tar.gz from https://mirrors.huaweicloud.com/kunpeng/archive/compiler/bisheng_compiler/
+set -e
+cd ${JARVIS_TMP}
+tar xzvf ${JARVIS_DOWNLOAD}/bisheng-compiler-1.3.3-aarch64-linux.tar.gz -C $1 --strip-components=1
\ No newline at end of file
diff --git a/package/bisheng/2.1.0/install.sh b/package/bisheng/2.1.0/install.sh
new file mode 100644
index 0000000000000000000000000000000000000000..717c1e1931552d3b44b27886383823ea757884d4
--- /dev/null
+++ b/package/bisheng/2.1.0/install.sh
@@ -0,0 +1,6 @@
+#!/bin/bash
+#download from https://mirrors.huaweicloud.com/kunpeng/archive/compiler/bisheng_compiler/bisheng-compiler-2.1.0-aarch64-linux.tar.gz
+set -e
+cd ${JARVIS_TMP}
+yum -y install libatomic libstdc++ libstdc++-devel
+tar xzvf ${JARVIS_DOWNLOAD}/bisheng-compiler-2.1.0-aarch64-linux.tar.gz -C $1 --strip-components=1
\ No newline at end of file
diff --git a/package/boost/1.72.0/install.sh b/package/boost/1.72.0/install.sh
new file mode 100644
index 0000000000000000000000000000000000000000..d95b972fb1aba8b46c13fe644acbdb7b754059e5
--- /dev/null
+++ b/package/boost/1.72.0/install.sh
@@ -0,0 +1,8 @@
+#!/bin/bash
+set -x
+set -e
+cd ${JARVIS_TMP}
+tar -xvf ${JARVIS_DOWNLOAD}/boost_1_72_0.tar.gz
+cd boost_1_72_0
+./bootstrap.sh
+./b2 install --prefix=$1
\ No newline at end of file
diff --git a/package/cmake/3.20.5/install.sh b/package/cmake/3.20.5/install.sh
new file mode 100644
index 0000000000000000000000000000000000000000..fb01ef8d0b6c2904f4b859ffa3d6bb0a719d6add
--- /dev/null
+++ b/package/cmake/3.20.5/install.sh
@@ -0,0 +1,4 @@
+#!/bin/bash
+set -e
+cd ${JARVIS_TMP}
+tar -xvf ${JARVIS_DOWNLOAD}/cmake-3.20.5-linux-aarch64.tar.gz -C $1 --strip-components=1
\ No newline at end of file
diff --git a/package/fftw/3.3.10/install.sh b/package/fftw/3.3.10/install.sh
new file mode 100644
index 0000000000000000000000000000000000000000..d732a7178dc63acb02e3b52415c86737fc976a0f
--- /dev/null
+++ b/package/fftw/3.3.10/install.sh
@@ -0,0 +1,9 @@
+#!/bin/bash
+set -x
+set -e
+cd ${JARVIS_TMP}
+rm -rf fftw-3.3.10
+tar -xvf ${JARVIS_DOWNLOAD}/fftw-3.3.10.tar.gz
+cd fftw-3.3.10
+./configure --prefix=$1 MPICC=mpicc --enable-shared --enable-threads --enable-openmp --enable-mpi
+make -j install
\ No newline at end of file
diff --git a/package/fftw/3.3.8/install.sh b/package/fftw/3.3.8/install.sh
new file mode 100644
index 0000000000000000000000000000000000000000..df0242ae8d092bbf939326aa64b2252cd9a50485
--- /dev/null
+++ b/package/fftw/3.3.8/install.sh
@@ -0,0 +1,8 @@
+#!/bin/bash
+set -x
+set -e
+cd ${JARVIS_TMP}
+tar -xvf ${JARVIS_DOWNLOAD}/fftw-3.3.8.tar.gz
+cd fftw-3.3.8
+./configure --prefix=$1 MPICC=mpicc --enable-shared --enable-threads --enable-openmp --enable-mpi
+make -j install
\ No newline at end of file
diff --git a/package/gcc/9.3.1/install.sh b/package/gcc/9.3.1/install.sh
new file mode 100644
index 0000000000000000000000000000000000000000..c4ac9f88adb8a095ec8536fa73fa38d4757c8132
--- /dev/null
+++ b/package/gcc/9.3.1/install.sh
@@ -0,0 +1,4 @@
+#!/bin/bash
+set -e
+cd ${JARVIS_TMP}
+tar -xzvf ${JARVIS_DOWNLOAD}/gcc-9.3.1-2021.03-aarch64-linux.tar.gz -C $1 --strip-components=1
\ No newline at end of file
diff --git a/package/gmp/6.2.0/install.sh b/package/gmp/6.2.0/install.sh
new file mode 100644
index 0000000000000000000000000000000000000000..650efaa20e84ad399ec41d42f714e89060effc8a
--- /dev/null
+++ b/package/gmp/6.2.0/install.sh
@@ -0,0 +1,9 @@
+#!/bin/bash
+set -x
+set -e
+cd ${JARVIS_TMP}
+tar -xvf ${JARVIS_DOWNLOAD}/gmp-6.2.0.tar.xz
+cd gmp-6.2.0
+./configure --prefix=$1
+make -j
+make install
\ No newline at end of file
diff --git a/package/gsl/2.6/install.sh b/package/gsl/2.6/install.sh
new file mode 100644
index 0000000000000000000000000000000000000000..40948323997731d10268d05df597e81f51d39599
--- /dev/null
+++ b/package/gsl/2.6/install.sh
@@ -0,0 +1,8 @@
+#!/bin/bash
+set -e
+cd ${JARVIS_TMP}
+tar -xvf ${JARVIS_DOWNLOAD}/gsl-2.6.tar.gz
+cd gsl-2.6
+./configure --prefix=$1
+make -j
+make install
diff --git a/package/hmpi/1.1.0/gcc/install.sh b/package/hmpi/1.1.0/gcc/install.sh
new file mode 100644
index 0000000000000000000000000000000000000000..254a5d9e3d8a84c33970a2eb70a1e7c395265068
--- /dev/null
+++ b/package/hmpi/1.1.0/gcc/install.sh
@@ -0,0 +1,4 @@
+#!/bin/bash
+set -e
+cd ${JARVIS_TMP}
+tar -xvf ${JARVIS_DOWNLOAD}/Hyper-MPI_1.1.0_aarch64_CentOS7.6_GCC9.3_MLNX-OFED4.9.tar.gz -C $1 --strip-components=1
\ No newline at end of file
diff --git a/package/hmpi/1.1.1/install.sh b/package/hmpi/1.1.1/install.sh
new file mode 100644
index 0000000000000000000000000000000000000000..0a1bd108c7b5fd9bd3a40d0fcb29e516ab4e1a0f
--- /dev/null
+++ b/package/hmpi/1.1.1/install.sh
@@ -0,0 +1,23 @@
+#!/bin/bash
+set -x
+set -e
+cd ${JARVIS_TMP}
+yum install -y perl-Data-Dumper autoconf automake libtool binutils
+rm -rf hmpi-1.1.1-huawei hucx-1.1.1-huawei xucg-1.1.1-huawei
+unzip ${JARVIS_DOWNLOAD}/hucx-1.1.1-huawei.zip
+unzip ${JARVIS_DOWNLOAD}/xucg-1.1.1-huawei.zip
+unzip ${JARVIS_DOWNLOAD}/hmpi-1.1.1-huawei.zip
+\cp -rf xucg-1.1.1-huawei/* hucx-1.1.1-huawei/src/ucg/
+sleep 3
+cd hucx-1.1.1-huawei
+./autogen.sh
+./contrib/configure-opt --prefix=$1/hucx CFLAGS="-DHAVE___CLEAR_CACHE=1" --disable-numa
+for file in `find . -name Makefile`;do sed -i "s/-Werror//g" $file;done
+for file in `find . -name Makefile`;do sed -i "s/-implicit-function-declaration//g" $file;done
+make -j64
+make install
+cd ../hmpi-1.1.1-huawei
+./autogen.pl
+./configure --prefix=$1 --with-platform=contrib/platform/mellanox/optimized --enable-mpi1-compatibility --with-ucx=$1/hucx
+make -j64
+make install
diff --git a/package/hmpi/FAQ.md b/package/hmpi/FAQ.md
new file mode 100644
index 0000000000000000000000000000000000000000..f4b848f92c6ccfd4cfb2ab1eb5b96871745da14b
--- /dev/null
+++ b/package/hmpi/FAQ.md
@@ -0,0 +1,7 @@
+Q:hucx/src/ucs/arch/aarch64/cpu.h:259:20:error: redefinition of 'ucs_arch_clear_cache'
+
+A:报错原因为该函数在其他地方已经被声明过了,无需重复声明, 应将src/ucs/arch/aarch64/cpu.h 中位于259–271行的函数注释或者删除掉
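+
+一个示意性的处理方式(行号请以实际使用的hucx源码为准):
+
+```
+sed -i '259,271 s|^|// |' src/ucs/arch/aarch64/cpu.h
+```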
+
+Q: builtin.c: 969:21: error: comparison of array 'builtin_op->steps' not equal to a null pointer is always true
+
+A: builtin_op->steps不可能为空,该判断多余,直接删除即可
\ No newline at end of file
diff --git a/package/kgcc/10.3.1/install.sh b/package/kgcc/10.3.1/install.sh
new file mode 100644
index 0000000000000000000000000000000000000000..79fe13a6fed626317b220dfabdbfdec96336d7c3
--- /dev/null
+++ b/package/kgcc/10.3.1/install.sh
@@ -0,0 +1,5 @@
+#!/bin/bash
+set -x
+set -e
+cd ${JARVIS_TMP}
+tar -xzvf ${JARVIS_DOWNLOAD}/gcc-10.3.1-2021.09-aarch64-linux.tar.gz -C $1 --strip-components=1
\ No newline at end of file
diff --git a/package/kgcc/9.3.1/install.sh b/package/kgcc/9.3.1/install.sh
new file mode 100644
index 0000000000000000000000000000000000000000..300c534ea7bcf1fc720a4d38c18ea662bb9e4663
--- /dev/null
+++ b/package/kgcc/9.3.1/install.sh
@@ -0,0 +1,5 @@
+#!/bin/bash
+set -x
+set -e
+cd ${JARVIS_TMP}
+tar -xzvf ${JARVIS_DOWNLOAD}/gcc-9.3.1-2021.03-aarch64-linux.tar.gz -C $1 --strip-components=1
\ No newline at end of file
diff --git a/package/kml/1.4.0/bisheng/install.sh b/package/kml/1.4.0/bisheng/install.sh
new file mode 100644
index 0000000000000000000000000000000000000000..1eb7fc187bf7bc69e20e6f1125ef43c19789965e
--- /dev/null
+++ b/package/kml/1.4.0/bisheng/install.sh
@@ -0,0 +1,52 @@
+#!/bin/bash
+set -x
+set -e
+cd ${JARVIS_TMP}
+rpm -e boostkit-kml
+rpm --force --nodeps -ivh ${JARVIS_ROOT}/package/kml/1.4.0/bisheng/*.rpm
+# generate full lapack
+netlib=${JARVIS_DOWNLOAD}/lapack-3.9.1.tar.gz
+klapack=/usr/local/kml/lib/libklapack.a
+kservice=/usr/local/kml/lib/libkservice.a
+echo $netlib
+echo $klapack
+
+# build netlib lapack
+rm -rf netlib
+mkdir netlib
+cd netlib
+tar zxvf $netlib
+mkdir build
+cd build
+cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_POSITION_INDEPENDENT_CODE=ON ../lapack-3.9.1
+make -j
+cd ../..
+
+cp netlib/build/lib/liblapack.a liblapack_adapt.a
+
+# get symbols defined both in klapack and netlib lapack
+nm -g liblapack_adapt.a | grep 'T ' | grep -oP '\K\w+(?=_$)' | sort | uniq > netlib.sym
+nm -g $klapack | grep 'T ' | grep -oP '\K\w+(?=_$)' | sort | uniq > klapack.sym
+comm -12 klapack.sym netlib.sym > comm.sym
+
+objcopy -W dsecnd_ -W second_ liblapack_adapt.a
+
+# add _netlib_ postfix to symbols in liblapack_adapt.a (e.g. dgetrf_netlib_)
+while read sym; do \
+ if ! nm liblapack_adapt.a | grep -qe " T ${sym}_\$"; then \
+ continue; \
+ fi; \
+ ar x liblapack_adapt.a $sym.f.o; \
+ mv $sym.f.o ${sym}_netlib.f.o; \
+ objcopy --redefine-sym ${sym}_=${sym}_netlib_ ${sym}_netlib.f.o; \
+ ar d liblapack_adapt.a ${sym}.f.o; \
+ ar ru liblapack_adapt.a ${sym}_netlib.f.o; \
+ rm ${sym}_netlib.f.o; \
+done < comm.sym
+
+# (optional) build a full lapack shared library
+clang -o libklapack_full.so -shared -fPIC -Wl,--whole-archive $klapack liblapack_adapt.a $kservice -Wl,--no-whole-archive -fopenmp -lpthread -lgfortran -lm
+
+\cp libklapack_full.so /usr/local/kml/lib/
+echo "Generated liblapack_adapt.a and libklapack_full.so"
+exit 0
\ No newline at end of file
diff --git a/package/kml/1.4.0/gcc/install.sh b/package/kml/1.4.0/gcc/install.sh
new file mode 100644
index 0000000000000000000000000000000000000000..084bbef519643fa66dd02980336af8ad07cbc617
--- /dev/null
+++ b/package/kml/1.4.0/gcc/install.sh
@@ -0,0 +1,7 @@
+#!/bin/bash
+set -x
+set -e
+cd ${JARVIS_TMP}
+rpm -e boostkit-kml
+rpm --force --nodeps -ivh ${JARVIS_ROOT}/package/kml/1.4.0/gcc/*.rpm
+cp -rf ${JARVIS_ROOT}/package/kml/1.4.0/gcc/libklapack_full.so /usr/local/kml/lib
\ No newline at end of file
diff --git a/package/lapack/3.8.0/install.sh b/package/lapack/3.8.0/install.sh
new file mode 100644
index 0000000000000000000000000000000000000000..dc4942aa0ec363708b714490d5897f070c314dc3
--- /dev/null
+++ b/package/lapack/3.8.0/install.sh
@@ -0,0 +1,10 @@
+#!/bin/bash
+set -x
+set -e
+cd ${JARVIS_TMP}
+tar -xvf ${JARVIS_DOWNLOAD}/lapack-3.8.0.tgz
+cd lapack-3.8.0
+cp make.inc.example make.inc
+make -j
+mkdir $1/lib/
+cp *.a $1/lib/
\ No newline at end of file
diff --git a/package/libint/2.6.0/install.sh b/package/libint/2.6.0/install.sh
new file mode 100644
index 0000000000000000000000000000000000000000..d48e3eb3e4f831057a288f085f6377684c774f65
--- /dev/null
+++ b/package/libint/2.6.0/install.sh
@@ -0,0 +1,19 @@
+#!/bin/bash
+set -x
+set -e
+cd ${JARVIS_TMP}
+export GCC_LIBS=${JARVIS_LIBS}/kgcc9
+tar -xvf ${JARVIS_DOWNLOAD}/libint-2.6.0.tar.gz
+cd libint-2.6.0
+./autogen.sh
+mkdir build
+cd build
+export LDFLAGS="-L${GCC_LIBS}/gmp/6.2.0/lib -L${GCC_LIBS}/boost/1.72.0/lib"
+export CPPFLAGS="-I${GCC_LIBS}/gmp/6.2.0/include -I${GCC_LIBS}/boost/1.72.0/include"
+../configure CXX=mpicxx --enable-eri=1 --enable-eri2=1 --enable-eri3=1 --with-max-am=4 --with-eri-max-am=4,3 --with-eri2-max-am=6,5 --with-eri3-max-am=6,5 --with-opt-am=3 --enable-generic-code --disable-unrolling --with-libint-exportdir=libint_cp2k_lmax4
+make export
+tar -xvf libint_cp2k_lmax4.tgz
+cd libint_cp2k_lmax4
+./configure --prefix=$1 CC=mpicc CXX=mpicxx FC=mpifort --enable-fortran --enable-shared
+make -j 32
+make install
diff --git a/package/libvori/21.04.12/install.sh b/package/libvori/21.04.12/install.sh
new file mode 100644
index 0000000000000000000000000000000000000000..9f782329a895fc05ca5b86f2a17aedceee0ee674
--- /dev/null
+++ b/package/libvori/21.04.12/install.sh
@@ -0,0 +1,12 @@
+#!/bin/bash
+set -x
+set -e
+cd ${JARVIS_TMP}
+tar -xzvf ${JARVIS_DOWNLOAD}/libvori-210412.tar.gz
+cd libvori-210412
+mkdir build
+cd build
+cmake .. -DCMAKE_INSTALL_PREFIX=$1
+make -j
+make install
+
diff --git a/package/libxc/5.1.4/install.sh b/package/libxc/5.1.4/install.sh
new file mode 100644
index 0000000000000000000000000000000000000000..cc4d52340b5470b8358dd0c6a8f1c5fcd8e7b765
--- /dev/null
+++ b/package/libxc/5.1.4/install.sh
@@ -0,0 +1,9 @@
+#!/bin/bash
+set -x
+set -e
+cd ${JARVIS_TMP}
+tar -xvf ${JARVIS_DOWNLOAD}/libxc-5.1.4.tar.gz
+cd libxc-5.1.4
+./configure FC=gfortran CC=gcc --prefix=$1
+make -j
+make install
diff --git a/package/openblas/0.3.18/install.sh b/package/openblas/0.3.18/install.sh
new file mode 100644
index 0000000000000000000000000000000000000000..d475d9e78dd32c9d39a627f87615b6e00937e43f
--- /dev/null
+++ b/package/openblas/0.3.18/install.sh
@@ -0,0 +1,8 @@
+#!/bin/bash
+set -x
+set -e
+cd ${JARVIS_TMP}
+tar -xzvf ${JARVIS_DOWNLOAD}/OpenBLAS-0.3.18.tar.gz
+cd OpenBLAS-0.3.18
+make -j
+make PREFIX=$1 install
diff --git a/package/openmpi/4.1.2/gpu/install.sh b/package/openmpi/4.1.2/gpu/install.sh
new file mode 100644
index 0000000000000000000000000000000000000000..06ff675a366cc66c07b8b6bb3dc13521965d4161
--- /dev/null
+++ b/package/openmpi/4.1.2/gpu/install.sh
@@ -0,0 +1,25 @@
+#!/bin/bash
+set -x
+set -e
+cd ${JARVIS_TMP}
+#install ucx
+tar -xvf ${JARVIS_DOWNLOAD}/ucx-1.12.0.tar.gz
+cd ucx-1.12.0
+./autogen.sh
+./contrib/configure-release --prefix=$1/ucx
+make -j8
+make install
+#install openmpi
+tar -xvf ${JARVIS_DOWNLOAD}/openmpi-4.1.2.tar.gz
+cd openmpi-4.1.2
+CPP=cpp CC=nvc CFLAGS='-DNDEBUG -O1 -nomp -fPIC -fno-strict-aliasing -tp=haswell' CXX=nvc++ CXXFLAGS='-DNDEBUG -O1 -nomp -fPIC -finline-functions -tp=haswell' F77=nvfortran F90=nvfortran FC=nvfortran FCFLAGS='-O1 -nomp -fPIC -tp=haswell' FFLAGS='-fast -Mipa=fast,inline -tp=haswell' LDFLAGS=-Wl,--as-needed ./configure --prefix=$1 --disable-debug --disable-getpwuid --disable-mem-debug --disable-mem-profile --disable-memchecker --disable-static --enable-mca-no-build=btl-uct,op-avx --enable-mpi1-compatibility --enable-oshmem --with-cuda=/usr/local/cuda --with-ucx=$1/ucx
+make -j8
+make install
+
+export LIBRARY_PATH=$1/lib:$LIBRARY_PATH
+export PATH=$1/bin:$PATH
+# recommended UCX runtime settings for CUDA-aware runs (export these before mpirun):
+# UCX_IB_PCI_RELAXED_ORDERING=on
+# UCX_MAX_RNDV_RAILS=1
+# UCX_MEMTYPE_CACHE=n
+# UCX_MEMTYPE_REG_WHOLE_ALLOC_TYPES=cuda
+# UCX_TLS=rc_v,sm,cuda_copy,cuda_ipc,gdr_copy (or UCX_TLS=all)
diff --git a/package/openmpi/4.1.2/install.sh b/package/openmpi/4.1.2/install.sh
new file mode 100644
index 0000000000000000000000000000000000000000..6b93da002460d131b4bd8651555e802edb349b31
--- /dev/null
+++ b/package/openmpi/4.1.2/install.sh
@@ -0,0 +1,8 @@
+#!/bin/bash
+set -x
+set -e
+cd ${JARVIS_TMP}
+tar -xvf ${JARVIS_DOWNLOAD}/openmpi-4.1.2.tar.gz
+cd openmpi-4.1.2
+./configure CC=gcc CXX=g++ FC=gfortran --prefix=$1 --enable-pretty-print-stacktrace --enable-orterun-prefix-by-default --with-knem=/opt/knem-1.1.4.90mlnx1/ --with-hcoll=/opt/mellanox/hcoll/ --with-cma --with-ucx --enable-mpi1-compatibility
+make -j install
diff --git a/package/plumed/2.6.2/install.sh b/package/plumed/2.6.2/install.sh
new file mode 100644
index 0000000000000000000000000000000000000000..a75c28b31617c98ccd5263497882ce6078b923d7
--- /dev/null
+++ b/package/plumed/2.6.2/install.sh
@@ -0,0 +1,9 @@
+#!/bin/bash
+set -x
+set -e
+cd ${JARVIS_TMP}
+tar -xvf ${JARVIS_DOWNLOAD}/plumed-2.6.2.tgz
+cd plumed-2.6.2
+./configure CXX=mpicxx CC=mpicc FC=mpifort --prefix=$1 --enable-external-blas --enable-gsl --enable-external-lapack LDFLAGS=-L/home/HT3/HPCRunner2/package/lapack/3.8.0/lapack-3.8.0/ LIBS="-lrefblas -llapack"
+make -j
+make install
diff --git a/package/python3/3.7.10/install.sh b/package/python3/3.7.10/install.sh
new file mode 100644
index 0000000000000000000000000000000000000000..49f88a7350acfa2fb6f1ed98e91a77665b8070de
--- /dev/null
+++ b/package/python3/3.7.10/install.sh
@@ -0,0 +1,12 @@
+#!/bin/bash
+# https://repo.huaweicloud.com/python/3.7.10/Python-3.7.10.tgz
+set -x
+set -e
+cd ${JARVIS_TMP}
+yum -y install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel gcc make libffi-devel
+tar -zxvf ${JARVIS_DOWNLOAD}/Python-3.7.10.tgz
+cd Python-3.7.10
+./configure --prefix=${JARVIS_COMPILER}/python3
+make
+make install
+ln -sf ${JARVIS_COMPILER}/python3/bin/python3.7 /usr/local/bin/python3
\ No newline at end of file
diff --git a/package/scalapack/2.1.0/install.sh b/package/scalapack/2.1.0/install.sh
new file mode 100644
index 0000000000000000000000000000000000000000..e79a4709e95a28c04cf4abdbc0db79914a88ccc4
--- /dev/null
+++ b/package/scalapack/2.1.0/install.sh
@@ -0,0 +1,10 @@
+#!/bin/bash
+set -x
+set -e
+cd ${JARVIS_TMP}
+tar -xvf ${JARVIS_DOWNLOAD}/scalapack-2.1.0.tgz
+cd scalapack-2.1.0
+cp SLmake.inc.example SLmake.inc
+make -j
+mkdir $1/lib
+cp *.a $1/lib
diff --git a/package/scalapack/2.1.0/kml/install.sh b/package/scalapack/2.1.0/kml/install.sh
new file mode 100644
index 0000000000000000000000000000000000000000..26da61aa5d306a2a6c53101f40a0af2fd4e4c70a
--- /dev/null
+++ b/package/scalapack/2.1.0/kml/install.sh
@@ -0,0 +1,13 @@
+#!/bin/bash
+set -x
+set -e
+cd ${JARVIS_TMP}
+rm -rf scalapack-2.1.0
+tar -xvf ${JARVIS_DOWNLOAD}/scalapack-2.1.0.tgz
+cd scalapack-2.1.0
+rm -rf build
+mkdir build
+cd build
+cmake -DCMAKE_INSTALL_PREFIX=$1 -DCMAKE_BUILD_TYPE=RelWithDebInfo -DBUILD_SHARED_LIBS=ON -DBLAS_LIBRARIES=/usr/local/kml/lib/kblas/omp/libkblas.so -DLAPACK_LIBRARIES=/usr/local/kml/lib/libklapack_full.so -DCMAKE_C_COMPILER=mpicc -DCMAKE_Fortran_COMPILER=mpif90 ..
+make -j
+make install
\ No newline at end of file
diff --git a/package/spglib/1.16.0/install.sh b/package/spglib/1.16.0/install.sh
new file mode 100644
index 0000000000000000000000000000000000000000..c1877ea7d4ab322b7725445cc7e1b0672fe782b4
--- /dev/null
+++ b/package/spglib/1.16.0/install.sh
@@ -0,0 +1,11 @@
+#!/bin/bash
+set -x
+set -e
+cd ${JARVIS_TMP}
+tar -xvf ${JARVIS_DOWNLOAD}/spglib-1.16.0.tar.gz
+cd spglib-1.16.0
+mkdir build
+cd build
+cmake .. -DCMAKE_INSTALL_PREFIX=$1
+make -j
+make install
diff --git a/package/tau/2.30.0/install.sh b/package/tau/2.30.0/install.sh
new file mode 100644
index 0000000000000000000000000000000000000000..e7803c978cc4ab8af3535a8f7b68b3479a84530a
--- /dev/null
+++ b/package/tau/2.30.0/install.sh
@@ -0,0 +1,17 @@
+#!/bin/bash
+set -x
+set -e
+cd ${JARVIS_TMP}
+# install PDT
+tar -zxvf ${JARVIS_DOWNLOAD}/pdt.tgz
+cd pdtoolkit-3.25.1/
+./configure -GNU -prefix=$1/PDT
+make -j install
+# install TAU, using tau with external package
+tar -zxvf ${JARVIS_DOWNLOAD}/tau-2.30.0.tar.gz
+cd tau-2.30.0/
+./configure -openmp -bfd=download -unwind=download -mpi -pdt=$1/PDT/ -pdt_c++=g++
+make install
+export PATH=$1/tau-2.30.0/arm64_linux/bin:$PATH
+
+#usage: mpirun --allow-run-as-root -np 128 -x OMP_NUM_THREADS=1 --mca btl ^openib tau_exec vasp_std
+#pprof
diff --git a/software/compiler/bisheng/2.1.0/installed b/software/compiler/bisheng/2.1.0/installed
new file mode 100644
index 0000000000000000000000000000000000000000..c227083464fb9af8955c90d2924774ee50abb547
--- /dev/null
+++ b/software/compiler/bisheng/2.1.0/installed
@@ -0,0 +1 @@
+0
\ No newline at end of file
diff --git a/software/compiler/gcc/9.3.1/installed b/software/compiler/gcc/9.3.1/installed
new file mode 100644
index 0000000000000000000000000000000000000000..c227083464fb9af8955c90d2924774ee50abb547
--- /dev/null
+++ b/software/compiler/gcc/9.3.1/installed
@@ -0,0 +1 @@
+0
\ No newline at end of file
diff --git a/software/compiler/kgcc/10.3.1/installed b/software/compiler/kgcc/10.3.1/installed
new file mode 100644
index 0000000000000000000000000000000000000000..c227083464fb9af8955c90d2924774ee50abb547
--- /dev/null
+++ b/software/compiler/kgcc/10.3.1/installed
@@ -0,0 +1 @@
+0
\ No newline at end of file
diff --git a/software/compiler/kgcc/9.3.1/installed b/software/compiler/kgcc/9.3.1/installed
new file mode 100644
index 0000000000000000000000000000000000000000..c227083464fb9af8955c90d2924774ee50abb547
--- /dev/null
+++ b/software/compiler/kgcc/9.3.1/installed
@@ -0,0 +1 @@
+0
\ No newline at end of file
diff --git a/software/compiler/python3/installed b/software/compiler/python3/installed
new file mode 100644
index 0000000000000000000000000000000000000000..c227083464fb9af8955c90d2924774ee50abb547
--- /dev/null
+++ b/software/compiler/python3/installed
@@ -0,0 +1 @@
+0
\ No newline at end of file
diff --git a/software/libs/bisheng2/openblas/0.3.18/installed b/software/libs/bisheng2/openblas/0.3.18/installed
new file mode 100644
index 0000000000000000000000000000000000000000..c227083464fb9af8955c90d2924774ee50abb547
--- /dev/null
+++ b/software/libs/bisheng2/openblas/0.3.18/installed
@@ -0,0 +1 @@
+0
\ No newline at end of file
diff --git a/software/libs/gcc9/fftw/3.3.8/installed b/software/libs/gcc9/fftw/3.3.8/installed
new file mode 100644
index 0000000000000000000000000000000000000000..c227083464fb9af8955c90d2924774ee50abb547
--- /dev/null
+++ b/software/libs/gcc9/fftw/3.3.8/installed
@@ -0,0 +1 @@
+0
\ No newline at end of file
diff --git a/software/libs/nvc/installed b/software/libs/nvc/installed
new file mode 100644
index 0000000000000000000000000000000000000000..c227083464fb9af8955c90d2924774ee50abb547
--- /dev/null
+++ b/software/libs/nvc/installed
@@ -0,0 +1 @@
+0
\ No newline at end of file
diff --git a/software/moduledeps/gcc9-openmpi4/scalapack/2.1.0 b/software/moduledeps/gcc9-openmpi4/scalapack/2.1.0
new file mode 100644
index 0000000000000000000000000000000000000000..c0af1c6e6d9bab59c55af2d401a7bd2111603148
--- /dev/null
+++ b/software/moduledeps/gcc9-openmpi4/scalapack/2.1.0
@@ -0,0 +1,5 @@
+#%Module1.0#####################################################################
+set rootdir $::env(JARVIS_ROOT)
+set version 2.1.0
+
+prepend-path LD_LIBRARY_PATH $rootdir/software/libs/gcc9/openmpi4/scalapack/2.1.0
diff --git a/software/moduledeps/gcc9/openblas/0.3.18 b/software/moduledeps/gcc9/openblas/0.3.18
new file mode 100644
index 0000000000000000000000000000000000000000..509b37ca869a54aa8bf89dc996f2a7516979c336
--- /dev/null
+++ b/software/moduledeps/gcc9/openblas/0.3.18
@@ -0,0 +1,12 @@
+#%Module1.0#####################################################################
+set rootdir $::env(JARVIS_ROOT)
+set prefix $rootdir/software/libs/gcc9/openblas/0.3.18
+set version 0.3.18
+
+prepend-path PATH $prefix/bin
+prepend-path INCLUDE $prefix/include
+prepend-path LD_LIBRARY_PATH $prefix/lib
+
+setenv OPENBLAS_DIR $prefix
+setenv OPENBLAS_LIB $prefix/lib
+setenv OPENBLAS_INC $prefix/include
diff --git a/software/modulefiles/gcc9/9.3.1 b/software/modulefiles/gcc9/9.3.1
new file mode 100644
index 0000000000000000000000000000000000000000..2a2a3f888552d5a362768e9400e1b6126da73b99
--- /dev/null
+++ b/software/modulefiles/gcc9/9.3.1
@@ -0,0 +1,10 @@
+#%Module1.0#####################################################################
+set rootdir $::env(JARVIS_ROOT)
+set prefix $rootdir/software/compiler/gcc/9.3.1
+set version 9.3.1
+
+prepend-path PATH $prefix/bin
+prepend-path MANPATH $prefix/share/man
+prepend-path INCLUDE $prefix/include
+prepend-path LD_LIBRARY_PATH $prefix/lib64
+prepend-path MODULEPATH $rootdir/software/moduledeps/gcc9
diff --git a/software/mpi/openmpi4-gcc9/4.1.2/installed b/software/mpi/openmpi4-gcc9/4.1.2/installed
new file mode 100644
index 0000000000000000000000000000000000000000..c227083464fb9af8955c90d2924774ee50abb547
--- /dev/null
+++ b/software/mpi/openmpi4-gcc9/4.1.2/installed
@@ -0,0 +1 @@
+0
\ No newline at end of file
diff --git a/software/utils/cmake/3.20.5/installed b/software/utils/cmake/3.20.5/installed
new file mode 100644
index 0000000000000000000000000000000000000000..c227083464fb9af8955c90d2924774ee50abb547
--- /dev/null
+++ b/software/utils/cmake/3.20.5/installed
@@ -0,0 +1 @@
+0
\ No newline at end of file
diff --git a/src/analysis.py b/src/analysis.py
new file mode 100644
index 0000000000000000000000000000000000000000..1492550b3db69d68f1cc61ebeb532a68183e256f
--- /dev/null
+++ b/src/analysis.py
@@ -0,0 +1,640 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+import platform
+import sys
+import os
+import re
+from glob import glob
+
+from data import Data
+from tool import Tool
+from execute import Execute
+from machine import Machine
+from bench import Benchmark
+
+from enum import Enum
+
+class SType(Enum):
+ COMPILER = 1
+ MPI = 2
+ UTIL = 3
+ LIB = 4
+
+class Install:
+ def __init__(self):
+ self.hpc_data = Data()
+ self.exe = Execute()
+ self.tool = Tool()
+ self.ROOT = os.getcwd()
+ self.PACKAGE_PATH = os.path.join(self.ROOT, 'package')
+ self.COMPILER_PATH = os.path.join(self.ROOT, 'software/compiler')
+ self.LIBS_PATH = os.path.join(self.ROOT, 'software/libs')
+ self.MODULE_DEPS_PATH = os.path.join(self.ROOT, 'software/moduledeps')
+ self.MODULE_FILES = os.path.join(self.ROOT, 'software/modulefiles')
+ self.MPI_PATH = os.path.join(self.ROOT, 'software/mpi')
+ self.UTILS_PATH = os.path.join(self.ROOT, 'software/utils')
+
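+    # returns the major version only, e.g. '9' from 'gcc version 9.3.1'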
+ def get_version_info(self, info):
+ return re.search( r'(\d+)\.(\d+)\.',info).group(1)
+
+ # some command don't generate output, must redirect to a tmp file
+ def get_cmd_output(self, cmd):
+ tmp_path = os.path.join(self.ROOT, 'tmp')
+ tmp_file = os.path.join(tmp_path, 'tmp.txt')
+ self.tool.mkdirs(tmp_path)
+ cmd += f' &> {tmp_file}'
+ self.exe.exec_popen(cmd, False)
+ info_list = self.tool.read_file(tmp_file).split('\n')
+ return info_list
+
+ def get_gcc_info(self):
+ gcc_info_list = self.get_cmd_output('gcc -v')
+ gcc_info = gcc_info_list[-1].strip()
+ version = self.get_version_info(gcc_info)
+ name = 'gcc'
+ if 'kunpeng' in gcc_info.lower():
+ name = 'kgcc'
+ return {"cname": name, "cmversion": version}
+
+ def get_clang_info(self):
+ clang_info_list = self.get_cmd_output('clang -v')
+ clang_info = clang_info_list[0].strip()
+ version = self.get_version_info(clang_info)
+ name = 'clang'
+ if 'bisheng' in clang_info.lower():
+ name = 'bisheng'
+ return {"cname": name, "cmversion": version}
+
+ def get_nvc_info(self):
+ return {"cname": "cuda", "cmversion": "11"}
+
+ def get_icc_info(self):
+ return {"cname": "icc", "cmversion": "11"}
+
+ def get_mpi_info(self):
+ mpi_info_list = self.get_cmd_output('mpirun -version')
+ mpi_info = mpi_info_list[0].strip()
+ name = 'openmpi'
+ version = self.get_version_info(mpi_info)
+ hmpi_info = self.get_cmd_output('ompi_info | grep "MCA coll: ucx"')[0]
+ if hmpi_info != "":
+ name = 'hmpi'
+ version = re.search( r'Component v(\d+)\.(\d+)\.',hmpi_info).group(1)
+ return {"name": name, "version": version}
+
+ def check_software_path(self, software_path):
+ abs_software_path = os.path.join(self.PACKAGE_PATH, software_path)
+ if not os.path.exists(abs_software_path):
+            print(f"{software_path} does not exist. Are you sure the software lies in the package dir?")
+ return False
+ return abs_software_path
+
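+    # valid 2nd argument of `jarvis -install`: <compiler>, <compiler>+MPI, COM (install a compiler itself) or ANY (generic utility); case-insensitive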
+ def check_compiler_mpi(self, compiler_list, compiler_mpi_info):
+ no_compiler = ["COM","ANY"]
+ is_valid = False
+ compiler_mpi_info = compiler_mpi_info.upper()
+ valid_list = []
+ for compiler in compiler_list:
+ valid_list.append(compiler)
+ valid_list.append(f'{compiler}+MPI')
+ valid_list += no_compiler
+ for valid_para in valid_list:
+ if compiler_mpi_info == valid_para:
+ is_valid = True
+ break
+ if not is_valid:
+            print(f"compiler or mpi info error, only {'/'.join(valid_list).lower()} is supported")
+ return False
+ return compiler_mpi_info
+
+ def get_used_compiler(self, compiler_mpi_info):
+ return compiler_mpi_info.split('+')[0]
+
+ def get_software_type(self,software_name, compiler_mpi_info):
+ if self.is_mpi_software(software_name):
+ return SType.MPI
+ if compiler_mpi_info == "COM":
+ return SType.COMPILER
+ elif compiler_mpi_info == "ANY":
+ return SType.UTIL
+ else:
+ return SType.LIB
+
+ def get_suffix(self, software_info_list):
+ if len(software_info_list) == 3:
+ return software_info_list[2]
+ return ""
+
+ def get_software_info(self, software_path, compiler_mpi_info):
+ software_info_list = software_path.split('/')
+ software_name = software_info_list[0]
+ software_version = software_info_list[1]
+ software_main_version = self.get_main_version(software_version)
+ software_type = self.get_software_type(software_name, compiler_mpi_info)
+ software_info = {
+ "sname":software_name,
+ "sversion": software_version,
+ "mversion": software_main_version,
+ "type" : software_type,
+ "suffix": self.get_suffix(software_info_list)
+ }
+ if software_type == SType.LIB or software_type == SType.MPI:
+ software_info["is_use_mpi"] = self.is_contained_mpi(compiler_mpi_info)
+ software_info["use_compiler"] = self.get_used_compiler(compiler_mpi_info)
+ return software_info
+
+ def get_compiler_info(self, compilers, compiler_mpi_info):
+ compiler_info = {"cname":None, "cmversion": None}
+ for compiler, info_func in compilers.items():
+ if compiler in compiler_mpi_info:
+ compiler_info = info_func()
+ return compiler_info
+
+ def get_main_version(self, version):
+ return version.split('.')[0]
+
+ def is_mpi_software(self, software_name):
+ mpis = ['hmpi', 'openmpi', 'hpcx']
+ return software_name in mpis
+
+ def add_mpi_path(self, software_info, install_path):
+ if not software_info['is_use_mpi']:
+ return install_path
+ mpi_info = self.get_mpi_info()
+        if mpi_info["version"] is None:
+ print("MPI not found!")
+ return False
+ mpi_str = mpi_info["name"]+mpi_info["version"]
+ print("Use MPI: "+mpi_str)
+ install_path = os.path.join(install_path, mpi_str)
+ return install_path
+
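+    # install layout under software/:
+    #   compiler -> software/compiler/<name>/<version>
+    #   util     -> software/utils/<name>/<version>
+    #   mpi      -> software/mpi/<name><major>-<compiler><major>/<version>
+    #   lib      -> software/libs/<compiler><major>[/<mpi><major>]/<name>/<version>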
+ def get_install_path(self, software_info, env_info):
+ suffix = software_info['suffix']
+ sversion = software_info['sversion']
+ stype = software_info['type']
+ cname = env_info['cname']
+ if suffix != "":
+ software_info['sname'] += '-' + suffix
+ sname = software_info['sname']
+ if stype == SType.MPI:
+ return os.path.join(self.MPI_PATH, f"{sname}{self.get_main_version(sversion)}-{cname}{env_info['cmversion']}", sversion)
+ if stype == SType.COMPILER:
+ install_path = os.path.join(self.COMPILER_PATH, f'{sname}/{sversion}')
+ elif stype == SType.UTIL:
+ install_path = os.path.join(self.UTILS_PATH, f'{sname}/{sversion}')
+ else:
+ install_path = os.path.join(self.LIBS_PATH, cname+env_info['cmversion'])
+ # get mpi name and version
+ install_path = self.add_mpi_path(software_info, install_path)
+ install_path = os.path.join(install_path, f'{sname}/{sversion}')
+ return install_path
+
+ def is_contained_mpi(self, compiler_mpi_info):
+ return "MPI" in compiler_mpi_info
+
+ def get_files(self, abs_path):
+ file_list = [d for d in glob(abs_path+'/**', recursive=True)]
+ return file_list
+
+ def get_module_file_content(self, install_path, sversion):
+ module_file_content = ''
+ file_list = self.get_files(install_path)
+ bins_dir_type = ["bin"]
+ libs_dir_type = ["libs", "lib", "lib64"]
+ incs_dir_type = ["include"]
+ bins_dir = []
+ libs_dir = []
+ incs_dir = []
+ bins_str = ''
+ libs_str = ''
+ incs_str = ''
+ for file in file_list:
+ if not os.path.isdir(file):
+ continue
+ last_dir = file.split('/')[-1]
+ if last_dir in bins_dir_type:
+ bins_dir.append(file.replace(install_path, "$prefix"))
+ elif last_dir in libs_dir_type:
+ libs_dir.append(file.replace(install_path, "$prefix"))
+ elif last_dir in incs_dir_type:
+ incs_dir.append(file.replace(install_path, "$prefix"))
+ if len(bins_dir) >= 1:
+ bins_str = "prepend-path PATH "+':'.join(bins_dir)
+ if len(libs_dir) >= 1:
+ libs_str = "prepend-path LD_LIBRARY_PATH "+':'.join(libs_dir)
+ if len(incs_dir) >= 1:
+ incs_str = "prepend-path INCLUDE " + ':'.join(incs_dir)
+ module_file_content = f'''#%Module1.0#####################################################################
+set prefix {install_path}
+set version {sversion}
+
+{bins_str}
+{libs_str}
+{incs_str}
+'''
+ return module_file_content
+
+ def get_installed_file_path(self, install_path):
+ return os.path.join(install_path, "installed")
+
+ def is_installed(self, install_path):
+ installed_file_path = self.get_installed_file_path(install_path)
+ if not os.path.exists(installed_file_path):
+ return False
+ if not self.tool.read_file(installed_file_path) == "1":
+ return False
+ return True
+
+ def set_installed_status(self, install_path):
+ installed_file_path = self.get_installed_file_path(install_path)
+ self.tool.write_file(installed_file_path, "1")
+
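+    # module files: compilers and utils go to software/modulefiles, MPI and libs go to
+    # software/moduledeps/<compiler>[-<mpi>], mirroring the install layout above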
+ def gen_module_file(self, install_path, software_info, env_info):
+ sname = software_info['sname']
+ sversion = software_info['sversion']
+ stype = software_info['type']
+ cname = env_info['cname']
+ cmversion = env_info['cmversion']
+ software_str = sname + self.get_main_version(sversion)
+ module_file_content = self.get_module_file_content(install_path, sversion)
+ if not self.is_installed(install_path):
+ return
+ if stype == SType.MPI:
+ compiler_str = cname + cmversion
+ module_path = os.path.join(self.MODULE_DEPS_PATH, compiler_str ,software_str)
+ attach_module_path = os.path.join(self.MODULE_DEPS_PATH, compiler_str+'-'+software_str)
+ self.tool.mkdirs(attach_module_path)
+ module_file_content += f"\nprepend-path MODULEPATH {attach_module_path}"
+ else:
+ if stype == SType.COMPILER:
+ module_path = os.path.join(self.MODULE_FILES, software_str)
+ attach_module_path = os.path.join(self.MODULE_DEPS_PATH, software_str)
+ self.tool.mkdirs(attach_module_path)
+ module_file_content += f"\nprepend-path MODULEPATH {attach_module_path}"
+ elif stype == SType.UTIL:
+ module_path = os.path.join(self.MODULE_FILES, sname)
+ else:
+ compiler_str = cname + cmversion
+ if software_info['is_use_mpi']:
+ mpi_info = self.get_mpi_info()
+ mpi_str = mpi_info['name'] + self.get_main_version(mpi_info['version'])
+ module_path = os.path.join(self.MODULE_DEPS_PATH, f"{compiler_str}-{mpi_str}" ,sname)
+ else:
+ module_path = os.path.join(self.MODULE_DEPS_PATH, compiler_str, sname)
+ self.tool.mkdirs(module_path)
+ module_file = os.path.join(module_path, sversion)
+ self.tool.write_file(module_file, module_file_content)
+ print(f"module file {module_file} successfully generated")
+
+ def install_package(self, abs_software_path, install_path):
+ install_script = 'install.sh'
+ install_script_path = os.path.join(abs_software_path, install_script)
+ print("start installing..."+ abs_software_path)
+ if not os.path.exists(install_script_path):
+            print("install script does not exist, skipping...")
+ return
+ self.tool.mkdirs(install_path)
+ if self.is_installed(install_path):
+ print("already installed, skipping...")
+ return
+ install_cmd = f'''
+source ./init.sh
+cd {abs_software_path}
+chmod +x {install_script}
+./{install_script} {install_path}
+'''
+ result = self.exe.exec_raw(install_cmd)
+ if result:
+            print(f"installed to {install_path} successfully")
+ self.set_installed_status(install_path)
+ else:
+ print("install failed")
+ sys.exit()
+
+ def install(self, software_path, compiler_mpi_info):
+ self.tool.prt_content("INSTALL " + software_path)
+ compilers = {"GCC":self.get_gcc_info, "CLANG":self.get_clang_info,
+ "NVC":self.get_nvc_info, "ICC":self.get_icc_info,
+ "BISHENG":self.get_clang_info}
+
+ # software_path should exists
+ abs_software_path = self.check_software_path(software_path)
+ if not abs_software_path: return
+ compiler_mpi_info = self.check_compiler_mpi(compilers.keys(), compiler_mpi_info)
+ if not compiler_mpi_info: return
+ software_info = self.get_software_info(software_path, compiler_mpi_info)
+ stype = software_info['type']
+ # get compiler name and version
+ env_info = self.get_compiler_info(compilers, compiler_mpi_info)
+ if stype == SType.LIB or stype == SType.MPI:
+ cmversion = env_info['cmversion']
+            if cmversion is None:
+                print(f"The specified {software_info['use_compiler']} compiler was not found!")
+ return False
+ else:
+ print(f"Use Compiler: {env_info['cname']} {cmversion}")
+
+ # get install path
+ install_path = self.get_install_path(software_info, env_info)
+ if not install_path: return
+ # get install script
+ self.install_package(abs_software_path, install_path)
+ # gen module file
+ self.gen_module_file( install_path, software_info, env_info)
+
+ def install_depend(self):
+ depend_file = 'depend_install.sh'
+        print(f"start installing dependency of {Data.app_name}")
+ depend_content = f'''
+{Data.dependency}
+'''
+ self.tool.write_file(depend_file, depend_content)
+ run_cmd = f'''
+chmod +x {depend_file}
+./{depend_file}
+'''
+ self.exe.exec_raw(run_cmd)
+
+class Env:
+ def __init__(self):
+ self.hpc_data = Data()
+ self.tool = Tool()
+ self.ROOT = os.getcwd()
+ self.exe = Execute()
+
+ def env(self):
+ print(f"set environment {Data.app_name}")
+ env_file = os.path.join(self.ROOT, Data.env_file)
+ self.tool.write_file(env_file, Data.module_content)
+ print(f"ENV FILE {Data.env_file} GENERATED.")
+ self.exe.exec_raw(f'chmod +x {Data.env_file}')
+
+class Build:
+ def __init__(self):
+ self.hpc_data = Data()
+ self.exe = Execute()
+
+ def clean(self):
+ print(f"start clean {Data.app_name}")
+ clean_cmd=self.hpc_data.get_clean_cmd()
+ self.exe.exec_raw(clean_cmd)
+
+ def build(self):
+ print(f"start build {Data.app_name}")
+ build_cmd = self.hpc_data.get_build_cmd()
+ self.exe.exec_raw(build_cmd)
+
+class Run:
+ def __init__(self):
+ self.hpc_data = Data()
+ self.exe = Execute()
+ self.tool = Tool()
+ self.ROOT = os.getcwd()
+ self.avail_ips_list = self.tool.gen_list(Data.avail_ips)
+
+ def gen_hostfile(self, nodes):
+ length = len(self.avail_ips_list)
+ if nodes > length:
+ print(f"You don't have {nodes} nodes, only {length} nodes available!")
+ sys.exit()
+ if nodes <= 1:
+ return
+ gen_nodes = '\n'.join(self.avail_ips_list[:nodes])
+ print(f"HOSTFILE\n{gen_nodes}\nGENERATED.")
+ self.tool.write_file('hostfile', gen_nodes)
+
+ # single run
+ def run(self):
+ print(f"start run {Data.app_name}")
+ nodes = int(Data.run_cmd['nodes'])
+ self.gen_hostfile(nodes)
+ run_cmd = self.hpc_data.get_run_cmd()
+ self.exe.exec_raw(run_cmd)
+
+ def batch_run(self):
+ batch_file = 'batch_run.sh'
+ batch_file_path = os.path.join(self.ROOT, batch_file)
+ print(f"start batch run {Data.app_name}")
+ batch_content = f'''
+cd {Data.case_dir}
+{Data.batch_cmd}
+'''
+ self.tool.write_file(batch_file_path, batch_content)
+ run_cmd = f'''
+chmod +x {batch_file}
+./{batch_file}
+'''
+ self.exe.exec_raw(run_cmd)
+
+class Perf:
+ def __init__(self):
+ self.hpc_data = Data()
+ self.exe = Execute()
+ self.tool = Tool()
+ self.isARM = platform.machine() == 'aarch64'
+
+ def get_pid(self):
+ #get pid
+ pid_cmd = f'pidof {Data.binary_file}'
+ result = self.exe.exec_popen(pid_cmd)
+ if len(result) == 0:
+ print("failed to get pid.")
+ sys.exit()
+ else:
+ pid_list = result[0].split(' ')
+ mid = int(len(pid_list)/2)
+ return pid_list[mid].strip()
+
+ def perf(self):
+ print(f"start perf {Data.app_name}")
+ #get pid
+ pid = self.get_pid()
+ #start perf && analysis
+ perf_cmd = f'''
+perf record {Data.perf_para} -a -g -p {pid}
+perf report -i ./perf.data -F period,sample,overhead,symbol,dso,comm -s overhead --percent-limit 0.1% --stdio
+'''
+ self.exe.exec_raw(perf_cmd)
+
+ def get_arch(self):
+ arch = 'arm'
+ if not self.isARM:
+ arch = 'X86'
+ return arch
+
+ def get_cur_time(self):
+ return re.sub(' |:', '-', self.tool.get_time_stamp())
+
+ def gpu_perf(self):
+ print(f"start gpu perf")
+ run_cmd = self.hpc_data.get_run()
+ gperf_cmd = f'''
+cd {Data.case_dir}
+nsys profile -y 5s -d 100s {Data.nsys_para} -o nsys-{self.get_arch()}-{self.get_cur_time()} {run_cmd}
+ '''
+ self.exe.exec_raw(gperf_cmd)
+
+ def ncu_perf(self, kernel):
+ print(f"start ncu perf")
+ run_cmd = self.hpc_data.get_run()
+ ncu_cmd = f'''
+ cd {Data.case_dir}
+ ncu --export ncu-{self.get_arch()}-{self.get_cur_time()} {Data.ncu_para} --import-source=yes --set full --kernel-name {kernel} --launch-skip 1735 --launch-count 1 {run_cmd}
+ '''
+ self.exe.exec_raw(ncu_cmd)
+
+class Download:
+ def __init__(self):
+ self.hpc_data = Data()
+ self.exe = Execute()
+ self.tool = Tool()
+ self.ROOT = os.getcwd()
+ self.download_list = self.tool.gen_list(Data.download_info)
+ self.download_path = os.path.join(self.ROOT, 'downloads')
+
+ def check_network(self):
+ print(f"start network checking")
+ network_test_cmd='''
+wget --spider -T 5 -q -t 2 www.baidu.com; echo $?
+curl -s -o /dev/null www.baidu.com; echo $?
+ '''
+ self.exe.exec_raw(network_test_cmd)
+
+ def change_yum_repo(self):
+ print(f"start yum repo change")
+ repo_cmd = '''
+cp ./templates/yum/*.repo /etc/yum.repos.d/
+yum clean all
+yum makecache
+'''
+ self.exe.exec_raw(repo_cmd)
+
+ def gen_wget_url(self, out_dir='./downloads', url=''):
+ head = "wget --no-check-certificate"
+ out_para = "-P"
+ download_url = f'{head} {out_para} {out_dir} {url}'
+ return download_url
+
+ def download(self):
+ print(f"start download")
+ url_links = []
+ self.tool.mkdirs(self.download_path)
+ download_flag = False
+ # create directory
+ for url_info in self.download_list:
+ url_list = url_info.split(' ')
+ if len(url_list) != 2:
+ continue
+ software_info = url_list[0].strip()
+ url_link = url_list[1].strip()
+ url_links.append(url_link)
+ # create software directory
+ software_path = os.path.join(self.ROOT, 'package', software_info)
+ self.tool.mkdirs(software_path)
+ # create install script
+ install_script = os.path.join(software_path, "install.sh")
+ self.tool.mkfile(install_script)
+ # start download
+ for url in url_links:
+ download_flag = True
+ filename = os.path.basename(url)
+ file_path = os.path.join(self.download_path, filename)
+ if os.path.exists(file_path):
+ self.tool.prt_content(f"FILE {filename} already DOWNLOADED")
+ continue
+ download_url = self.gen_wget_url(self.download_path, url)
+ self.tool.prt_content("DOWNLOAD " + filename)
+            self.exe.exec_raw(download_url)
+ if not download_flag:
+ print("The download list is empty!")
+
+
+class Test:
+ def __init__(self):
+ self.exe = Execute()
+ self.ROOT = os.getcwd()
+ self.test_dir = os.path.join(self.ROOT, 'test')
+
+ def test(self):
+ run_cmd = f'''
+cd {self.test_dir}
+./test-qe.sh
+cd {self.test_dir}
+./test-util.sh
+'''
+ self.exe.exec_raw(run_cmd)
+
+class Config:
+ def __init__(self):
+ self.exe = Execute()
+ self.tool = Tool()
+ self.ROOT = os.getcwd()
+
+ def switch_config(self, config_file):
+ print(f"Switch config file to {config_file}")
+ meta_path = os.path.join(self.ROOT, Data.meta_file)
+ self.tool.write_file(meta_path, config_file.strip())
+ print("Successfully switched.")
+
+class Analysis:
+ def __init__(self):
+ self.jmachine = Machine()
+ self.jtest = Test()
+ self.jdownload = Download()
+ self.jbenchmark = Benchmark()
+ self.jperf = Perf()
+ self.jrun = Run()
+ self.jbuild = Build()
+ self.jenv = Env()
+ self.jinstall = Install()
+ self.jconfig = Config()
+
+ def get_machine_info(self):
+ self.jmachine.output_machine_info()
+
+ def bench(self, bench_case):
+ self.jbenchmark.output_bench_info(bench_case)
+
+ def switch_config(self, config_file):
+ self.jconfig.switch_config(config_file)
+
+ def test(self):
+ self.jtest.test()
+
+ def download(self):
+ self.jdownload.download()
+
+ def check_network(self):
+ self.jdownload.check_network()
+
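+    def change_yum_repo(self):
+        # delegate to Download so that the -yum option in jarvis.py resolves
+        self.jdownload.change_yum_repo()
+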
+ def gpu_perf(self):
+ self.jperf.gpu_perf()
+
+ def ncu_perf(self, kernel):
+ self.jperf.ncu_perf(kernel)
+
+ def perf(self):
+ self.jperf.perf()
+
+ def kperf(self):
+ self.jperf.kperf()
+
+ def run(self):
+ self.jrun.run()
+
+ def batch_run(self):
+ self.jrun.batch_run()
+
+ def clean(self):
+ self.jbuild.clean()
+
+ def build(self):
+ self.jbuild.build()
+
+ def env(self):
+ self.jenv.env()
+
+ def install(self,software_path, compiler_mpi_info):
+ self.jinstall.install(software_path, compiler_mpi_info)
+
+ def install_deps(self):
+ self.jinstall.install_depend()
diff --git a/src/bench.py b/src/bench.py
new file mode 100644
index 0000000000000000000000000000000000000000..96f55d70c9ac7b4912c28041dd66b919b57b132e
--- /dev/null
+++ b/src/bench.py
@@ -0,0 +1,26 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+import platform
+import os
+from glob import glob
+
+from execute import Execute
+
+class Benchmark:
+ def __init__(self):
+ self.isARM = platform.machine() == 'aarch64'
+ self.ROOT = os.getcwd()
+ self.exe = Execute()
+ self.RUN_FILE = 'run.sh'
+ self.ALL = 'all'
+
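+    # run ./benchmark/<case>/run.sh for the given case; pass 'all' to run every benchmark directory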
+ def output_bench_info(self, bench_case):
+ bench_path = os.path.join(self.ROOT, 'benchmark')
+ file_list = [d for d in glob(bench_path+'/**', recursive=False)]
+ for file in file_list:
+ cur_bench_case = os.path.basename(file)
+ run_file = os.path.join(file, self.RUN_FILE)
+ if os.path.isdir(file) and os.path.exists(run_file):
+ cmd = f"cd {file} && chmod +x {self.RUN_FILE} && ./{self.RUN_FILE}"
+                if bench_case == self.ALL or cur_bench_case == bench_case:
+ self.exe.exec_raw(cmd)
diff --git a/data.py b/src/data.py
similarity index 75%
rename from data.py
rename to src/data.py
index 116348ff86a07c15e82a8a2998f614e750f72538..31bb4d2bd2b3a73e997613057648e35031339528 100644
--- a/data.py
+++ b/src/data.py
@@ -3,10 +3,13 @@
import os
import platform
+from tool import Tool
+
class Data:
# Hardware Info
avail_ips=''
# Dependent Software environment Info
+ dependency = ''
module_content=''
env_file = 'env.sh'
# Application Info
@@ -23,33 +26,36 @@ class Data:
batch_cmd = ''
#Other Info
meta_file = '.meta'
- download_urls = '''
-https://www.cp2k.org/static/downloads/libxc-5.1.4.tar.gz
-https://www.cp2k.org/static/downloads/fftw-3.3.8.tar.gz
-'''
-
+ root_path = os.getcwd()
+ download_info = ''
+ #perf info
+ kperf_para = ''
+ perf_para = ''
+ nsys_para = ''
+ ncu_para = ''
+ def get_abspath(self, relpath):
+ return os.path.join(Data.root_path, relpath)
+
def __init__(self):
self.isARM = platform.machine() == 'aarch64'
+ self.tool = Tool()
self.data_process()
def get_file_name(self):
file_name = 'data.config'
if not os.path.exists(Data.meta_file):
- if not self.isARM:
- file_name = 'data.X86.config'
return file_name
- with open(Data.meta_file, encoding='utf-8') as file_obj:
- contents = file_obj.read()
- return contents.strip()
+ return self.tool.read_file(Data.meta_file)
def get_data_config(self):
file_name = self.get_file_name()
- with open(file_name, encoding='utf-8') as file_obj:
+ file_path = self.get_abspath(file_name)
+ with open(file_path, encoding='utf-8') as file_obj:
contents = file_obj.read()
return contents.strip()
- def is_empty(self, content):
- return len(content) == 0 or content.isspace() or content == '\n'
+    def is_empty(self, text):
+        return len(text) == 0 or text.isspace() or text == '\n'
def read_rows(self, rows, start_row):
data = ''
@@ -81,6 +87,12 @@ https://www.cp2k.org/static/downloads/fftw-3.3.8.tar.gz
Data.build_dir = data['build_dir']
Data.binary_dir = data['binary_dir']
Data.case_dir = data['case_dir']
+
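+    # [PERF] keys in data.config: kperf / perf / nsys / ncu -> extra CLI options passed to each profiler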
+ def set_perf_info(self, data):
+ Data.kperf_para = data['kperf']
+ Data.perf_para = data['perf']
+ Data.nsys_para = data['nsys']
+ Data.ncu_para = data['ncu']
def split_two_part(self, data):
split_list = data.split(' ', 1)
@@ -95,10 +107,15 @@ https://www.cp2k.org/static/downloads/fftw-3.3.8.tar.gz
rows = contents.split('\n')
rowIndex = 0
data = {}
+ perf_data = {}
while rowIndex < len(rows):
row = rows[rowIndex].strip()
if row == '[SERVER]':
rowIndex, Data.avail_ips = self.read_rows(rows, rowIndex+1)
+ elif row == '[DOWNLOAD]':
+ rowIndex, Data.download_info = self.read_rows(rows, rowIndex+1)
+ elif row == '[DEPENDENCY]':
+ rowIndex, Data.dependency = self.read_rows(rows, rowIndex+1)
elif row == '[ENV]':
rowIndex, Data.module_content = self.read_rows(rows, rowIndex+1)
elif row == '[APP]':
@@ -112,6 +129,9 @@ https://www.cp2k.org/static/downloads/fftw-3.3.8.tar.gz
rowIndex, Data.run_cmd = self.read_rows_kv(rows, rowIndex+1)
elif row == '[BATCH]':
rowIndex, Data.batch_cmd = self.read_rows(rows, rowIndex+1)
+ elif row == '[PERF]':
+ rowIndex, perf_data = self.read_rows_kv(rows, rowIndex+1)
+ self.set_perf_info(perf_data)
else:
rowIndex += 1
Data.binary_file, Data.binary_para = self.split_two_part(Data.run_cmd['binary'])
@@ -121,9 +141,14 @@ https://www.cp2k.org/static/downloads/fftw-3.3.8.tar.gz
cd {Data.build_dir}
{Data.clean_cmd}
'''
+ def get_env(self):
+ return f'''
+./jarvis -e
+source ./{Data.env_file}'''
def get_build_cmd(self):
return f'''
+{self.get_env()}
cd {Data.build_dir}
{Data.build_cmd}
'''
@@ -141,6 +166,7 @@ cd {Data.build_dir}
def get_run_cmd(self):
return f'''
+{self.get_env()}
cd {Data.case_dir}
{self.get_run()}
'''
\ No newline at end of file
diff --git a/src/execute.py b/src/execute.py
new file mode 100644
index 0000000000000000000000000000000000000000..19e6b50f283d2bfb08a61d6fc1946e8d5782162b
--- /dev/null
+++ b/src/execute.py
@@ -0,0 +1,62 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+import os
+import logging
+from asyncio.log import logger
+from datetime import datetime
+from tool import Tool
+
+LOG_FORMAT = "%(asctime)s - %(levelname)s - %(message)s"
+DATE_FORMAT = "%m/%d/%Y %H:%M:%S %p"
+logging.basicConfig(filename='runner.log', level=logging.DEBUG, format=LOG_FORMAT, datefmt=DATE_FORMAT)
+
+class Execute:
+ def __init__(self):
+ self.cur_time = ''
+ self.end_time = ''
+ self.tool = Tool()
+ self.flags = '*' * 80
+ self.end_flag = 'END: '
+
+ # tools function
+ def join_cmd(self, arrs):
+ return " && ".join(arrs)
+
+ def print_cmd(self, cmd):
+ print(self.flags)
+ self.cur_time = self.tool.get_time_stamp()
+ print(f"RUNNING at {self.cur_time}:\n{cmd}")
+ logging.info(cmd)
+ print(self.flags)
+
+ # Execute, get output and don't know whether success or not
+ def exec_popen(self, cmd, isPrint=True):
+ if isPrint:
+ self.print_cmd(cmd)
+ output = os.popen(cmd).readlines()
+ return output
+
+ def get_duration(self):
+ time_1_struct = datetime.strptime(self.cur_time, "%Y-%m-%d %H:%M:%S")
+ time_2_struct = datetime.strptime(self.end_time, "%Y-%m-%d %H:%M:%S")
+ seconds = (time_2_struct - time_1_struct).seconds
+ return seconds
+
+ # Execute, get whether success or not
+ def exec_list(self, cmds):
+ cmd = self.join_cmd(cmds)
+ if not cmd.startswith('echo'):
+ self.print_cmd(cmd)
+ state = os.system(cmd)
+ self.end_time = self.tool.get_time_stamp()
+ print(f"total time used: {self.get_duration()}s")
+ logger.info(self.end_flag + cmd)
+ if state:
+ print(f"failed at {self.end_time}:{state}".upper())
+ return False
+ else:
+            print(f"successfully executed at {self.end_time}, congratulations!!!".upper())
+ return True
+
+ def exec_raw(self, rows):
+ return self.exec_list(self.tool.gen_list(rows))
\ No newline at end of file
diff --git a/src/jarvis.py b/src/jarvis.py
new file mode 100644
index 0000000000000000000000000000000000000000..5b03d64df3d1a3d54541cec285c74609fe490fbe
--- /dev/null
+++ b/src/jarvis.py
@@ -0,0 +1,105 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+import argparse
+
+from data import Data
+from analysis import Analysis
+
+class Jarvis:
+ def __init__(self):
+ self.analysis = Analysis()
+ # Argparser set
+        parser = argparse.ArgumentParser(description=f'please run me in the case directory; used to compile/clean/run/compare {Data.app_name}',
+ usage='%(prog)s [-h] [--build] [--clean] [...]')
+ parser.add_argument("-v","--version", help=f"get version info", action="store_true")
+ parser.add_argument("-use","--use", help="Switch config file...", nargs=1)
+ parser.add_argument("-i","--info", help=f"get machine info", action="store_true")
+ #accept software_name/version GCC/GCC+MPI/CLANG/CLANG+MPI
+ parser.add_argument("-install","--install", help=f"install dependency", nargs=2)
+ # dependency install
+ parser.add_argument("-dp","--depend", help=f"{Data.app_name} dependency install", action="store_true")
+ parser.add_argument("-e","--env", help=f"set environment {Data.app_name}", action="store_true")
+ parser.add_argument("-b","--build", help=f"compile {Data.app_name}", action="store_true")
+ parser.add_argument("-cls","--clean", help=f"clean {Data.app_name}", action="store_true")
+ parser.add_argument("-r","--run", help=f"run {Data.app_name}", action="store_true")
+ parser.add_argument("-p","--perf", help=f"auto perf {Data.app_name}", action="store_true")
+ parser.add_argument("-kp","--kperf", help=f"auto kperf {Data.app_name}", action="store_true")
+ # GPU perf
+ parser.add_argument("-gp","--gpuperf", help="GPU perf...", action="store_true")
+
+ # NCU perf
+ parser.add_argument("-ncu","--ncuperf", help="NCU perf...", nargs=1)
+ parser.add_argument("-c","--compare", help=f"compare {Data.app_name}", nargs=2)
+ # batch run
+ parser.add_argument("-rb","--rbatch", help=f"run batch {Data.app_name}", action="store_true")
+ # batch download
+ parser.add_argument("-d","--download", help="Batch Download...", action="store_true")
+ parser.add_argument("-net","--network", help="network checking...", action="store_true")
+ #change yum repo to aliyun
+ parser.add_argument("-yum","--yum", help="yum repo changing...", action="store_true")
+ # start benchmark test
+ parser.add_argument("-bench","--benchmark", help="start benchmark test...", nargs=1)
+ # start test
+ parser.add_argument("-t","--test", help="start Jarvis test...", action="store_true")
+ self.args = parser.parse_args()
+
+ def main(self):
+ if self.args.version:
+ print("V1.0")
+
+ if self.args.info:
+ self.analysis.get_machine_info()
+
+ if self.args.install:
+ self.analysis.install(self.args.install[0], self.args.install[1])
+
+ if self.args.env:
+ self.analysis.env()
+
+ if self.args.clean:
+ self.analysis.clean()
+
+ if self.args.build:
+ self.analysis.build()
+
+ if self.args.run:
+ self.analysis.run()
+
+ if self.args.perf:
+ self.analysis.perf()
+
+ if self.args.kperf:
+ self.analysis.kperf()
+
+ if self.args.depend:
+ self.analysis.install_deps()
+
+ if self.args.rbatch:
+ self.analysis.batch_run()
+
+ if self.args.download:
+ self.analysis.download()
+
+ if self.args.gpuperf:
+ self.analysis.gpu_perf()
+
+ if self.args.ncuperf:
+ self.analysis.ncu_perf(self.args.ncuperf[0])
+
+ if self.args.use:
+ self.analysis.switch_config(self.args.use[0])
+
+ if self.args.network:
+ self.analysis.check_network()
+
+ if self.args.yum:
+ self.analysis.change_yum_repo()
+
+ if self.args.benchmark:
+ self.analysis.bench(self.args.benchmark[0])
+
+ if self.args.test:
+ self.analysis.test()
+
+if __name__ == '__main__':
+ Jarvis().main()
diff --git a/src/machine.py b/src/machine.py
new file mode 100644
index 0000000000000000000000000000000000000000..7e40db494628a8e420859baa1ce883f3dc805c18
--- /dev/null
+++ b/src/machine.py
@@ -0,0 +1,27 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+from execute import Execute
+from tool import Tool
+
+class Machine:
+ def __init__(self):
+ self.exe = Execute()
+ self.tool = Tool()
+ self.info2cmd = {
+ 'CHECK network adapter':'nmcli d',
+ 'CHECK Machine Bits':'getconf LONG_BIT',
+ 'CHECK OS':'cat /proc/version && uname -a',
+ 'CHECK GPU': 'lspci | grep -i nvidia',
+ 'CHECK Total Memory':'cat /proc/meminfo | grep MemTotal',
+ 'CHECK Total Disk Memory':'fdisk -l | grep Disk',
+ 'CHECK CPU info': 'cat /proc/cpuinfo | grep "processor" | wc -l && lscpu && dmidecode -t 4'
+ }
+
+ def get_info(self, content, cmd):
+ self.tool.prt_content(content)
+ self.exe.exec_raw(cmd)
+
+ def output_machine_info(self):
+ print("get machine info")
+ for key, value in self.info2cmd.items():
+ self.get_info(key, value)
diff --git a/src/tool.py b/src/tool.py
new file mode 100644
index 0000000000000000000000000000000000000000..7cd3b62251641058efc57e3732ec8ff07144bed0
--- /dev/null
+++ b/src/tool.py
@@ -0,0 +1,36 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+import time
+import os
+
+class Tool:
+ def __init__(self):
+ pass
+
+ def prt_content(self, content):
+ flags = '*' * 30
+ print(f"{flags}{content}{flags}")
+
+ def gen_list(self, data):
+ return data.strip().split('\n')
+
+ def get_time_stamp(self):
+ return time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
+
+ def read_file(self, filename):
+ content = ''
+ with open(filename, encoding='utf-8') as f:
+ content = f.read().strip()
+ return content
+
+ def write_file(self, filename, content=""):
+ with open(filename,'w') as f:
+ f.write(content)
+
+ def mkdirs(self, path):
+ if not os.path.exists(path):
+ os.makedirs(path)
+
+ def mkfile(self, path, content=''):
+ if not os.path.exists(path):
+ self.write_file(path, content)
diff --git a/templates/data.CP2K.X86.config b/templates/CP2K/8.2/data.CP2K.X86.cpu.config
similarity index 100%
rename from templates/data.CP2K.X86.config
rename to templates/CP2K/8.2/data.CP2K.X86.cpu.config
diff --git a/templates/CP2K/8.2/data.CP2K.arm.cpu.config b/templates/CP2K/8.2/data.CP2K.arm.cpu.config
new file mode 100644
index 0000000000000000000000000000000000000000..66c49b6a3a1472f2d74f65c963d5dd3dfa1eab43
--- /dev/null
+++ b/templates/CP2K/8.2/data.CP2K.arm.cpu.config
@@ -0,0 +1,66 @@
+[SERVER]
+11.11.11.11
+
+[ENV]
+source /home/kpgcc-ompi.env
+export LIBRARY_PATH=/home/cp2k/EXTRA/gsl/lib:$LIBRARY_PATH
+export LD_LIBRARY_PATH=/home/cp2k/EXTRA/gsl/lib:$LD_LIBRARY_PATH
+export CPATH=/usr/local/cuda/include:$CPATH
+
+[APP]
+app_name = CP2K
+build_dir = /home/cp2k/CP2K/cp2k-8.2/
+binary_dir = /home/cp2k/CP2K/cp2k-8.2/exe/local-cpu/
+case_dir = /home/cp2k/CP2K/cp2k-8.2/benchmarks/QS/
+
+[BUILD]
+make -j 128 ARCH=local-cpu VERSION=psmp
+
+[CLEAN]
+make -j 128 ARCH=local-cpu VERSION=psmp clean
+
+[RUN]
+run = numactl -C 0-63 mpirun --allow-run-as-root -np 64 -map-by ppr:64:node:pe=1 -bind-to core -x OMP_NUM_THREADS=1
+binary = cp2k.psmp H2O-256.inp
+nodes = 1
+
+[BATCH]
+#!/bin/bash
+
+logfile=cp2k.H2O-256.inp.log
+
+nvidia-smi -pm 1
+nvidia-smi -ac 1215,1410
+
+echo 3 > /proc/sys/vm/drop_caches
+echo "===run 32C*GPU===" >> $logfile
+mpirun -np 32 -genv OMP_NUM_THREADS=1 -genv CUDA_VISIBLE_DEVICES=0 exe/local-cuda/cp2k.psmp benchmarks/QS/H2O-256.inp >> $logfile 2>&1
+
+echo 3 > /proc/sys/vm/drop_caches
+echo "===run 32C*2GPU===" >> $logfile
+mpirun -np 32 -genv OMP_NUM_THREADS=1 -genv CUDA_VISIBLE_DEVICES=0,1 exe/local-cuda/cp2k.psmp benchmarks/QS/H2O-256.inp >> $logfile 2>&1
+
+echo 3 > /proc/sys/vm/drop_caches
+echo "===run 64C*GPU===" >> $logfile
+mpirun -np 64 -genv OMP_NUM_THREADS=1 -genv CUDA_VISIBLE_DEVICES=0 exe/local-cuda/cp2k.psmp benchmarks/QS/H2O-256.inp >> $logfile 2>&1
+
+echo 3 > /proc/sys/vm/drop_caches
+echo "===run 64C*2GPU===" >> $logfile
+mpirun -np 32 -genv OMP_NUM_THREADS=1 -genv CUDA_VISIBLE_DEVICES=0,1 exe/local-cuda/cp2k.psmp benchmarks/QS/H2O-256.inp >> $logfile 2>&1
+
+echo 3 > /proc/sys/vm/drop_caches
+echo "===run 128C*GPU===" >> $logfile
+mpirun -np 128 -genv OMP_NUM_THREADS=1 -genv CUDA_VISIBLE_DEVICES=0 exe/local-cuda/cp2k.psmp benchmarks/QS/H2O-256.inp >> $logfile 2>&1
+
+echo 3 > /proc/sys/vm/drop_caches
+echo "===run 128C*2GPU===" >> $logfile
+mpirun -np 128 -genv OMP_NUM_THREADS=1 -genv CUDA_VISIBLE_DEVICES=0,1 exe/local-cuda/cp2k.psmp benchmarks/QS/H2O-256.inp >> $logfile 2>&1
+
diff --git a/templates/CP2K/8.2/data.CP2K.arm.gpu.config b/templates/CP2K/8.2/data.CP2K.arm.gpu.config
new file mode 100644
index 0000000000000000000000000000000000000000..2012254a25a02c42d0f5972fed052c8eeccff1fe
--- /dev/null
+++ b/templates/CP2K/8.2/data.CP2K.arm.gpu.config
@@ -0,0 +1,98 @@
+[SERVER]
+11.11.11.11
+
+[DOWNLOAD]
+libint/2.6.0 https://github.com/evaleev/libint/archive/v2.6.0.tar.gz
+libXC/5.1.4 https://www.cp2k.org/static/downloads/libxc-5.1.4.tar.gz
+fftw/3.3.8 https://www.cp2k.org/static/downloads/fftw-3.3.8.tar.gz
+lapack/3.8.0 https://www.cp2k.org/static/downloads/lapack-3.8.0.tgz
+scalapack/2.1.0 https://www.cp2k.org/static/downloads/scalapack-2.1.0.tgz
+cmake/3.16.4 https://cmake.org/files/v3.16/cmake-3.16.4.tar.gz
+
+[DEPENDENCY]
+./jarvis -install kgcc/9.3.1 com
+module purge
+module use ./software/modulefiles
+module load kgcc9/9.3.1
+export CC=`which gcc`
+export CXX=`which g++`
+export FC=`which gfortran`
+./jarvis -install openmpi/4.1.2 gcc
+module load openmpi4/4.1.2
+./jarvis -install gmp/6.2.0 gcc
+./jarvis -install boost/1.72.0 gcc
+./jarvis -install libint/2.6.0 gcc+mpi
+./jarvis -install fftw/3.3.8 gcc+mpi
+./jarvis -install openblas/0.3.18 gcc
+module load openblas/0.3.18
+./jarvis -install scalapack/2.1.0 gcc+mpi
+./jarvis -install spglib/1.16.0 gcc
+./jarvis -install libxc/5.1.4 gcc
+./jarvis -install gsl/2.6 gcc
+module load gsl/2.6
+./jarvis -install plumed/2.6.2 gcc+mpi
+./jarvis -install libvori/21.04.12 gcc
+
+[ENV]
+module purge
+module load kgcc9/9.3.1
+module load openmpi4/4.1.2
+module load gsl/2.6
+
+[APP]
+app_name = CP2K
+build_dir = /home/HT3/HPCRunner2/cp2k-8.2/
+binary_dir = /home/HT3/HPCRunner2/cp2k-8.2/exe/local-cuda/
+case_dir = /home/HT3/HPCRunner2/cp2k-8.2/benchmarks/QS/
+
+[BUILD]
+make -j 128 ARCH=local-cuda VERSION=psmp
+
+[CLEAN]
+make -j 128 ARCH=local-cuda VERSION=psmp clean
+
+[RUN]
+run = numactl -C 0-63 mpirun --allow-run-as-root -x CUDA_VISIBLE_DEVICES=0,1 -np 64 -x OMP_NUM_THREADS=1
+binary = cp2k.psmp H2O-256.inp
+nodes = 1
+
+[BATCH]
+#!/bin/bash
+
+logfile=cp2k.H2O-256.inp.log
+
+nvidia-smi -pm 1
+nvidia-smi -ac 1215,1410
+
+echo 3 > /proc/sys/vm/drop_caches
+echo "===run 32C*GPU===" >> $logfile
+mpirun -np 32 -genv OMP_NUM_THREADS=1 -genv CUDA_VISIBLE_DEVICES=0 exe/local-cuda/cp2k.psmp benchmarks/QS/H2O-256.inp >> $logfile 2>&1
+
+echo 3 > /proc/sys/vm/drop_caches
+echo "===run 32C*2GPU===" >> $logfile
+mpirun -np 32 -genv OMP_NUM_THREADS=1 -genv CUDA_VISIBLE_DEVICES=0,1 exe/local-cuda/cp2k.psmp benchmarks/QS/H2O-256.inp >> $logfile 2>&1
+
+echo 3 > /proc/sys/vm/drop_caches
+echo "===run 64C*GPU===" >> $logfile
+mpirun -np 64 -genv OMP_NUM_THREADS=1 -genv CUDA_VISIBLE_DEVICES=0 exe/local-cuda/cp2k.psmp benchmarks/QS/H2O-256.inp >> $logfile 2>&1
+
+echo 3 > /proc/sys/vm/drop_caches
+echo "===run 64C*2GPU===" >> $logfile
+mpirun -np 32 -genv OMP_NUM_THREADS=1 -genv CUDA_VISIBLE_DEVICES=0,1 exe/local-cuda/cp2k.psmp benchmarks/QS/H2O-256.inp >> $logfile 2>&1
+
+echo 3 > /proc/sys/vm/drop_caches
+echo "===run 128C*GPU===" >> $logfile
+mpirun -np 128 -genv OMP_NUM_THREADS=1 -genv CUDA_VISIBLE_DEVICES=0 exe/local-cuda/cp2k.psmp benchmarks/QS/H2O-256.inp >> $logfile 2>&1
+
+echo 3 > /proc/sys/vm/drop_caches
+echo "===run 128C*2GPU===" >> $logfile
+mpirun -np 128 -genv OMP_NUM_THREADS=1 -genv CUDA_VISIBLE_DEVICES=0,1 exe/local-cuda/cp2k.psmp benchmarks/QS/H2O-256.inp >> $logfile 2>&1
+
diff --git a/templates/data.amber.config b/templates/amber/20/data.amber.arm.gpu.config
similarity index 100%
rename from templates/data.amber.config
rename to templates/amber/20/data.amber.arm.gpu.config
diff --git a/templates/data.openfoam.config b/templates/openfoam/1960/data.openfoam.arm.cpu.config
similarity index 100%
rename from templates/data.openfoam.config
rename to templates/openfoam/1960/data.openfoam.arm.cpu.config
diff --git a/templates/openfoam/1960/data.openfoam.arm.cpu.opt.config b/templates/openfoam/1960/data.openfoam.arm.cpu.opt.config
new file mode 100644
index 0000000000000000000000000000000000000000..25abc6f8b8218dcdc6c328106aafbc4a3a062d70
--- /dev/null
+++ b/templates/openfoam/1960/data.openfoam.arm.cpu.opt.config
@@ -0,0 +1,34 @@
+[SERVER]
+11.11.11.11
+
+[DEPENDENCY]
+./jarvis -install bisheng/2.1.0 com
+module use ./software/modulefiles
+module load bisheng2
+./jarvis -install hmpi/1.1.1 clang
+module load hmpi1/1.1.1
+
+[ENV]
+# add compiler/MPI environment
+source /home/Jarvis3-4/HPCRunner/opt-OpenFOAM/opt_codes/OpenFOAM-v1906/etc/bashrc
+module use ./software/modulefiles
+module load bisheng2
+module load hmpi1/1.1.1
+
+[APP]
+app_name = OpenFOAM
+build_dir = /home/Jarvis3-4/HPCRunner/opt-OpenFOAM/opt_codes/OpenFOAM-v1906/
+binary_dir =
+case_dir = /home/Jarvis3-4/HPCRunner/case/openfoam/audi/
+
+[BUILD]
+source /home/Jarvis3-4/HPCRunner/opt-OpenFOAM/opt_codes/OpenFOAM-v1906/etc/bashrc
+./Allwmake -j 64
+
+[CLEAN]
+rm -rf build
+
+[RUN]
+run = mpirun --allow-run-as-root -x PATH -x LD_LIBRARY_PATH -x WM_PROJECT_DIR -x WM_PROJECT_USER_DIR -np 128
+binary = pisoFoam -parallel 2
+nodes = 1
diff --git a/templates/qe/6.4/data.qe.test.config b/templates/qe/6.4/data.qe.test.config
new file mode 100644
index 0000000000000000000000000000000000000000..59254e0e8b21c6888b0d364119129e0c3721e7a5
--- /dev/null
+++ b/templates/qe/6.4/data.qe.test.config
@@ -0,0 +1,40 @@
+[SERVER]
+11.11.11.11
+
+[DEPENDENCY]
+./jarvis -install kgcc/9.3.1 com
+module purge
+module use ./software/modulefiles
+module load kgcc9/9.3.1
+export CC=`which gcc`
+export CXX=`which g++`
+export FC=`which gfortran`
+./jarvis -install openmpi/4.1.2/ gcc
+module load openmpi4/4.1.2
+#test if mpi is normal
+./jarvis -bench mpi
+
+[ENV]
+module purge
+module use ./software/modulefiles
+module load kgcc9
+module load openmpi4/4.1.2
+
+[APP]
+app_name = QE
+build_dir = /tmp/q-e-qe-6.4.1/
+binary_dir = /tmp/q-e-qe-6.4.1/bin/
+case_dir = /tmp/qe-test
+
+[BUILD]
+./configure F90=gfortran F77=gfortran MPIF90=mpifort MPIF77=mpifort CC=mpicc FCFLAGS="-O3" CFLAGS="-O3" --with-scalapack=no --enable-openmp
+make -j 96 pwall
+make install
+
+[CLEAN]
+make clean
+
+[RUN]
+run = mpirun --allow-run-as-root -x OMP_NUM_THREADS=1 -mca coll ^hcoll -mca btl ^vader,tcp,openib,uct -np 128
+binary = pw.x -input test_3.in
+nodes = 1
\ No newline at end of file
diff --git a/templates/qe/6.4/data.qe.test.opt.config b/templates/qe/6.4/data.qe.test.opt.config
new file mode 100644
index 0000000000000000000000000000000000000000..4b6d44762ffb73fffe94ead7fdb2d2ebd675a534
--- /dev/null
+++ b/templates/qe/6.4/data.qe.test.opt.config
@@ -0,0 +1,46 @@
+[SERVER]
+11.11.11.11
+
+[DEPENDENCY]
+./jarvis -install bisheng/2.1.0 com
+module purge
+module use ./software/modulefiles
+module load bisheng2/2.1.0
+export CC=`which clang`
+export CXX=`which clang++`
+export FC=`which flang`
+./jarvis -install hmpi/1.1.1 bisheng
+module load hmpi1/1.1.1
+./jarvis -bench mpi
+./jarvis -install kml/1.4.0/bisheng bisheng
+
+[ENV]
+source /etc/profile
+module purge
+module use ./software/modulefiles
+module load bisheng2/2.1.0
+export CC=`which clang`
+export CXX=`which clang++`
+export FC=`which flang`
+module load hmpi1/1.1.1
+export BLAS_LIBS="-L/usr/local/kml/lib/kblas/omp -lkblas"
+export LAPACK_LIBS="-L/usr/local/kml/lib/ -lklapack_full"
+
+[APP]
+app_name = QE
+build_dir = /tmp/q-e-qe-6.4.1/
+binary_dir = /tmp/q-e-qe-6.4.1/bin/
+case_dir = /tmp/qe-test/
+
+[BUILD]
+./configure F90=flang F77=flang MPIF90=mpifort MPIF77=mpifort CC=mpicc FCFLAGS="-O3" CFLAGS="-O3" --with-scalapack=no --enable-openmp
+make -j 96 pwall
+make install
+
+[CLEAN]
+make clean
+
+[RUN]
+run = mpirun --allow-run-as-root -x OMP_NUM_THREADS=1 -np 128
+binary = pw.x -input test_3.in
+nodes = 1
diff --git a/templates/qe/6.4/qe.block.opt.config b/templates/qe/6.4/qe.block.opt.config
new file mode 100644
index 0000000000000000000000000000000000000000..6eee58f4acc01efffaa31eae0e0c6ea730e492d9
--- /dev/null
+++ b/templates/qe/6.4/qe.block.opt.config
@@ -0,0 +1,56 @@
+[SERVER]
+11.11.11.11
+
+[DEPENDENCY]
+./jarvis -install bisheng/2.1.0 com
+module purge
+module use ./software/modulefiles
+module load bisheng2/2.1.0
+export CC=`which clang`
+export CXX=`which clang++`
+export FC=`which flang`
+./jarvis -install hmpi/1.1.1 bisheng
+module load hmpi1/1.1.1
+./jarvis -install cmake/3.20.5 bisheng
+module load cmake/3.20.5
+./jarvis -install kml/1.4.0/bisheng bisheng
+./jarvis -install scalapack/2.1.0/kml bisheng
+./jarvis -install fftw/3.3.10 bisheng
+module load fftw/3.3.10 scalapack/2.1.0 cmake/3.20.5
+#modify the CMakeLists.txt of fortran_single: lines 10, 74 and 75
+./jarvis -install block-davidson/3.14 bisheng
+module load block-davidson/3.14
+
+[ENV]
+source /etc/profile
+module purge
+module use ./software/modulefiles
+module load bisheng2/2.1.0
+export CC=`which clang`
+export CXX=`which clang++`
+export FC=`which flang`
+module load hmpi1/1.1.1
+module load fftw/3.3.10 scalapack/2.1.0 block-davidson/3.14
+export BLAS_LIBS="-L/usr/local/kml/lib/kblas/omp -lkblas"
+export LAPACK_LIBS="-L/usr/local/kml/lib -lklapack_full"
+export SCALAPACK_LIBS="-L/home/fang/HT1/HPCRunner-master/software/libs/bisheng2/scalapack/2.1.0/lib/ -lscalapack"
+
+[APP]
+app_name = QE
+build_dir = /home/fang/HT1/HPCRunner-master/q-e-qe-6.4.1/
+binary_dir = /home/fang/HT1/HPCRunner-master/q-e-qe-6.4.1/bin
+case_dir = /home/fang/HT1/HPCRunner-master/workload/QE/GRIR443/
+
+[BUILD]
+# add tunning/QE/6.4/q-e-6.4.blockmesh.patch here
+./configure F90=flang F77=flang MPIF90=mpifort MPIF77=mpifort CC=mpicc FCFLAGS="-O3" CFLAGS="-O3" --with-scalapack=yes --enable-openmp
+make -j 96 pw
+make install
+
+[CLEAN]
+make clean
+
+[RUN]
+run = mpirun --allow-run-as-root -x OMP_NUM_THREADS=1 -np 128
+binary = pw.x -input grir443.in
+nodes = 1
diff --git a/templates/qe/6.5/data.qe.X86.cpu.config b/templates/qe/6.5/data.qe.X86.cpu.config
new file mode 100644
index 0000000000000000000000000000000000000000..22bf2f482097b971310769a8d46204d8ff20f88b
--- /dev/null
+++ b/templates/qe/6.5/data.qe.X86.cpu.config
@@ -0,0 +1,29 @@
+[SERVER]
+11.11.11.11
+
+[ENV]
+#add oneapi(include icc/mpi)
+source /workspace/cc/env/intel2021.4/setvars.sh
+# add cmake
+module use ./modules
+module add icc/cmake
+export LAPACK_LIBS="$MKLROOT/lib/intel64/libmkl_intel_lp64.a $MKLROOT/lib/intel64/libmkl_core.a"
+export BLAS_LIBS="$MKLROOT/lib/intel64/libmkl_sequential.a $MKLROOT/lib/intel64/libmkl_blacs_intelmpi_lp64.a -Wl,--end-group"
+
+[APP]
+app_name = QE
+build_dir = /home/csouser/HPCRunner/q-e-qe-6.5/
+binary_dir = /home/csouser/HPCRunner/q-e-qe-6.5/bin/
+case_dir = /home/csouser/HPCRunner/qe_large/
+
+[BUILD]
+./configure F90=ifort F77=ifort MPIF90=mpiifort MPIF77=mpiifort CC=mpicc FCFLAGS="-O3" CFLAGS="-O3" --with-scalapack=no
+make -j 40 pwall install
+
+[CLEAN]
+make clean
+
+[RUN]
+run = mpirun -n 40
+binary = pw.x -nk 8 -input scf.in
+nodes = 1
\ No newline at end of file
diff --git a/templates/qe/6.5/data.qe.arm.cpu.config b/templates/qe/6.5/data.qe.arm.cpu.config
new file mode 100644
index 0000000000000000000000000000000000000000..918aabcbd06b381d96b1fb623b4d88e85a4cd22f
--- /dev/null
+++ b/templates/qe/6.5/data.qe.arm.cpu.config
@@ -0,0 +1,29 @@
+[SERVER]
+11.11.11.11
+
+[ENV]
+source /etc/profile
+module use /opt/modulefile/
+module load gcc-9.3.1
+module load openmpi-4.1.1
+export BLAS_LIBS="-L/usr/local/kml/lib/kblas/omp -lkblas"
+export LAPACK_LIBS="-L/usr/local/kml/lib/ -lklapack_full"
+
+[APP]
+app_name = QE
+build_dir = /home/Jarvis3-4/HPCRunner/q-e-qe-6.5/
+binary_dir = /home/Jarvis3-4/HPCRunner/q-e-qe-6.5/bin/
+case_dir = /home/Jarvis3-4/HPCRunner/workload/QE/qe-large/
+
+[BUILD]
+./configure F90=gfortran F77=gfortran MPIF90=mpifort MPIF77=mpifort CC=mpicc FCFLAGS="-O3" CFLAGS="-O3" --with-scalapack=no --enable-openmp
+make -j 96 pwall
+make install
+
+[CLEAN]
+make clean
+
+[RUN]
+run = mpirun --allow-run-as-root -mca btl ^vader,tcp,openib,uct -np 128
+binary = pw.x -nk 8 -input scf.in
+nodes = 1
\ No newline at end of file
diff --git a/templates/qe/6.5/data.qe.arm.cpu.opt.config b/templates/qe/6.5/data.qe.arm.cpu.opt.config
new file mode 100644
index 0000000000000000000000000000000000000000..bd5d524380d8fea1da70a70161c6603573aa2e95
--- /dev/null
+++ b/templates/qe/6.5/data.qe.arm.cpu.opt.config
@@ -0,0 +1,46 @@
+[SERVER]
+11.11.11.11
+
+[DEPENDENCY]
+./jarvis -install bisheng/2.1.0 com
+module purge
+module use ./software/modulefiles
+module load bisheng2/2.1.0
+export CC=`which clang`
+export CXX=`which clang++`
+export FC=`which flang`
+./jarvis -install hmpi/1.1.1 bisheng
+module load hmpi1/1.1.1
+./jarvis -install kml/1.4.0/bisheng bisheng
+
+[ENV]
+source /etc/profile
+module purge
+module use ./software/modulefiles
+module load bisheng2/2.1.0
+export CC=`which clang`
+export CXX=`which clang++`
+export FC=`which flang`
+module load hmpi1/1.1.1
+export BLAS_LIBS="-L/usr/local/kml/lib/kblas/omp -lkblas"
+export LAPACK_LIBS="-L/usr/local/kml/lib/ -lklapack_full"
+
+[APP]
+app_name = QE
+build_dir = /tmp/q-e-qe-6.5/
+binary_dir = /tmp/q-e-qe-6.5/bin/
+case_dir = /tmp/qe-test/
+
+[BUILD]
+./configure F90=flang F77=flang MPIF90=mpifort MPIF77=mpifort CC=mpicc FCFLAGS="-O3" CFLAGS="-O3" --with-scalapack=no --enable-openmp
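+# configure may still write gfortran into make.inc; force flang so the whole build uses the bisheng toolchain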
+sed -i "s/gfortran/flang/g" make.inc
+make -j 96 pwall
+make install
+
+[CLEAN]
+make clean
+
+[RUN]
+run = mpirun --allow-run-as-root -x OMP_NUM_THREADS=1 -np 128
+binary = pw.x -input test_3.in
+nodes = 1
diff --git a/templates/qe/6.8/data.qe.arm.cpu.config b/templates/qe/6.8/data.qe.arm.cpu.config
new file mode 100644
index 0000000000000000000000000000000000000000..bbe0749e04a9b100d0969a80344fb49321f8b24d
--- /dev/null
+++ b/templates/qe/6.8/data.qe.arm.cpu.config
@@ -0,0 +1,37 @@
+[SERVER]
+11.11.11.11
+
+[DEPENDENCY]
+./jarvis -install kgcc/9.3.1 com
+module use ./software/modulefiles
+module load kgcc9
+./jarvis -install hmpi/1.1.0/gcc gcc
+module load hmpi1/1.1.0
+./jarvis -install kml/1.4.0/gcc gcc
+
+[ENV]
+source /etc/profile
+module use ./software/modulefiles
+module load kgcc9
+module load hmpi1/1.1.0
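+# assumption: KML (installed above via ./jarvis -install kml/1.4.0/gcc) provides kblas/klapack under /usr/local/kml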
+export BLAS_LIBS="-L/usr/local/kml/lib/kblas/omp -lkblas"
+export LAPACK_LIBS="-L/usr/local/kml/lib/ -lklapack_full"
+
+[APP]
+app_name = QE
+build_dir = /tmp/q-e-qe-6.8/
+binary_dir = /tmp/q-e-qe-6.8/bin/
+case_dir = /tmp/qe-large/
+
+[BUILD]
+./configure F90=gfortran F77=gfortran MPIF90=mpifort MPIF77=mpifort CC=mpicc FCFLAGS="-O3" CFLAGS="-O3" --with-scalapack=no --enable-openmp
+make -j 96 pwall
+make install
+
+[CLEAN]
+make clean
+
+[RUN]
+run = mpirun --allow-run-as-root -mca btl ^vader,tcp,openib,uct -np 128
+binary = pw.x -nk 8 -input scf.in
+nodes = 1
\ No newline at end of file
diff --git a/templates/data.qe.gpu.config b/templates/qe/6.8/data.qe.arm.gpu.config
similarity index 97%
rename from templates/data.qe.gpu.config
rename to templates/qe/6.8/data.qe.arm.gpu.config
index 5d00bfe8af6fbecfb104583c9c344d8c8051f4e5..60b78c182f58ef19bb66d11afff73d970b41f476 100644
--- a/templates/data.qe.gpu.config
+++ b/templates/qe/6.8/data.qe.arm.gpu.config
@@ -21,7 +21,7 @@ module load nvhpc/21.9
app_name = QE
build_dir = /home/HPCRunner-master/q-e-qe-6.8/
binary_dir = /home/HPCRunner-master/q-e-qe-6.8/bin/
-case_dir = /home/HPCRunner-master/jiancong/
+case_dir = /home/HPCRunner-master/qe-large/
[BUILD]
./configure --with-cuda=yes --with-cuda-runtime=11.4 --with-cuda-cc=80 --enable-openmp --with-scalapack=no
diff --git a/templates/data.vasp.config b/templates/vasp/5.4.4/data.vasp.arm.cpu.config
similarity index 100%
rename from templates/data.vasp.config
rename to templates/vasp/5.4.4/data.vasp.arm.cpu.config
diff --git a/templates/data.vasp6.1.gpu.x86.config b/templates/vasp/6.1.0/data.vasp.x86.gpu.config
similarity index 100%
rename from templates/data.vasp6.1.gpu.x86.config
rename to templates/vasp/6.1.0/data.vasp.x86.gpu.config
diff --git a/templates/yum/aliyun-Centos-7.repo b/templates/yum/aliyun-Centos-7.repo
new file mode 100644
index 0000000000000000000000000000000000000000..df18245ddb57fed48bf1dee61c24d0159d054312
--- /dev/null
+++ b/templates/yum/aliyun-Centos-7.repo
@@ -0,0 +1,62 @@
+# CentOS-Base.repo
+#
+# The mirror system uses the connecting IP address of the client and the
+# update status of each mirror to pick mirrors that are updated to and
+# geographically close to the client. You should use this for CentOS updates
+# unless you are manually picking other mirrors.
+#
+# If the mirrorlist= does not work for you, as a fall back you can try the
+# remarked out baseurl= line instead.
+#
+#
+
+[base]
+name=CentOS-$releasever - Base - mirrors.aliyun.com
+failovermethod=priority
+baseurl=http://mirrors.aliyun.com/centos/$releasever/os/$basearch/
+ http://mirrors.aliyuncs.com/centos/$releasever/os/$basearch/
+ http://mirrors.cloud.aliyuncs.com/centos/$releasever/os/$basearch/
+gpgcheck=1
+gpgkey=http://mirrors.aliyun.com/centos/RPM-GPG-KEY-CentOS-7
+
+#released updates
+[updates]
+name=CentOS-$releasever - Updates - mirrors.aliyun.com
+failovermethod=priority
+baseurl=http://mirrors.aliyun.com/centos/$releasever/updates/$basearch/
+ http://mirrors.aliyuncs.com/centos/$releasever/updates/$basearch/
+ http://mirrors.cloud.aliyuncs.com/centos/$releasever/updates/$basearch/
+gpgcheck=1
+gpgkey=http://mirrors.aliyun.com/centos/RPM-GPG-KEY-CentOS-7
+
+#additional packages that may be useful
+[extras]
+name=CentOS-$releasever - Extras - mirrors.aliyun.com
+failovermethod=priority
+baseurl=http://mirrors.aliyun.com/centos/$releasever/extras/$basearch/
+ http://mirrors.aliyuncs.com/centos/$releasever/extras/$basearch/
+ http://mirrors.cloud.aliyuncs.com/centos/$releasever/extras/$basearch/
+gpgcheck=1
+gpgkey=http://mirrors.aliyun.com/centos/RPM-GPG-KEY-CentOS-7
+
+#additional packages that extend functionality of existing packages
+[centosplus]
+name=CentOS-$releasever - Plus - mirrors.aliyun.com
+failovermethod=priority
+baseurl=http://mirrors.aliyun.com/centos/$releasever/centosplus/$basearch/
+ http://mirrors.aliyuncs.com/centos/$releasever/centosplus/$basearch/
+ http://mirrors.cloud.aliyuncs.com/centos/$releasever/centosplus/$basearch/
+gpgcheck=1
+enabled=0
+gpgkey=http://mirrors.aliyun.com/centos/RPM-GPG-KEY-CentOS-7
+
+#contrib - packages by Centos Users
+[contrib]
+name=CentOS-$releasever - Contrib - mirrors.aliyun.com
+failovermethod=priority
+baseurl=http://mirrors.aliyun.com/centos/$releasever/contrib/$basearch/
+ http://mirrors.aliyuncs.com/centos/$releasever/contrib/$basearch/
+ http://mirrors.cloud.aliyuncs.com/centos/$releasever/contrib/$basearch/
+gpgcheck=1
+enabled=0
+gpgkey=http://mirrors.aliyun.com/centos/RPM-GPG-KEY-CentOS-7
diff --git a/templates/yum/hw-Centos-7.repo b/templates/yum/hw-Centos-7.repo
new file mode 100644
index 0000000000000000000000000000000000000000..4e43bbc6094d09ab211147221db6756ab854c370
--- /dev/null
+++ b/templates/yum/hw-Centos-7.repo
@@ -0,0 +1,43 @@
+# CentOS-Base.repo
+#
+# The mirror system uses the connecting IP address of the client and the
+# update status of each mirror to pick mirrors that are updated to and
+# geographically close to the client. You should use this for CentOS updates
+# unless you are manually picking other mirrors.
+#
+# If the mirrorlist= does not work for you, as a fall back you can try the
+# remarked out baseurl= line instead.
+#
+#
+
+[base]
+name=CentOS-$releasever - Base
+#mirrorlist=http://mirrors.tools.huawei.com/?release=$releasever&arch=$basearch&repo=os
+baseurl=http://mirrors.tools.huawei.com/centos/$releasever/os/$basearch/
+gpgcheck=1
+gpgkey=http://mirrors.tools.huawei.com/centos/RPM-GPG-KEY-CentOS-7
+
+#released updates
+[updates]
+name=CentOS-$releasever - Updates
+# mirrorlist=http://mirrors.tools.huawei.com/?release=$releasever&arch=$basearch&repo=updates
+baseurl=http://mirrors.tools.huawei.com/centos/$releasever/updates/$basearch/
+gpgcheck=1
+gpgkey=http://mirrors.tools.huawei.com/centos/RPM-GPG-KEY-CentOS-7
+
+#additional packages that may be useful
+[extras]
+name=CentOS-$releasever - Extras
+# mirrorlist=http://mirrors.tools.huawei.com/?release=$releasever&arch=$basearch&repo=extras
+baseurl=http://mirrors.tools.huawei.com/centos/$releasever/extras/$basearch/
+gpgcheck=1
+gpgkey=http://mirrors.tools.huawei.com/centos/RPM-GPG-KEY-CentOS-7
+
+#additional packages that extend functionality of existing packages
+[centosplus]
+name=CentOS-$releasever - Plus
+# mirrorlist=http://mirrors.tools.huawei.com/?release=$releasever&arch=$basearch&repo=centosplus
+baseurl=http://mirrors.tools.huawei.com/centos/$releasever/centosplus/$basearch/
+gpgcheck=1
+enabled=0
+gpgkey=http://mirrors.tools.huawei.com/centos/RPM-GPG-KEY-CentOS-7
\ No newline at end of file
diff --git a/templates/yum/kylin_aarch64.repo b/templates/yum/kylin_aarch64.repo
new file mode 100644
index 0000000000000000000000000000000000000000..e298fcb2586baa607ad7b4121618c69a11e76180
--- /dev/null
+++ b/templates/yum/kylin_aarch64.repo
@@ -0,0 +1,22 @@
+###Kylin Linux Advanced Server 10 - os repo###
+
+[ks10-adv-os]
+name = Kylin Linux Advanced Server 10 - Os
+baseurl = http://update.cs2c.com.cn:8080/NS/V10/V10SP2/os/adv/lic/base/$basearch/
+gpgcheck = 1
+gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-kylin
+enabled = 1
+
+[ks10-adv-updates]
+name = Kylin Linux Advanced Server 10 - Updates
+baseurl = http://update.cs2c.com.cn:8080/NS/V10/V10SP2/os/adv/lic/updates/$basearch/
+gpgcheck = 1
+gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-kylin
+enabled = 1
+
+[ks10-adv-addons]
+name = Kylin Linux Advanced Server 10 - Addons
+baseurl = http://update.cs2c.com.cn:8080/NS/V10/V10SP2/os/adv/lic/addons/$basearch/
+gpgcheck = 1
+gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-kylin
+enabled = 0
diff --git a/test/test-qe-opt.sh b/test/test-qe-opt.sh
new file mode 100644
index 0000000000000000000000000000000000000000..1ae031bc47bcb8f0eda191e6c95d71044da3fe8c
--- /dev/null
+++ b/test/test-qe-opt.sh
@@ -0,0 +1,28 @@
+#!/bin/bash
+# back to root
+cd ..
+# extract qe source code
+rm -rf /tmp/q-e-qe-6.4.1
+tar xzvf ./downloads/q-e-qe-6.4.1.tar.gz -C /tmp/
+# copy workload
+cp -rf ./workload/QE/qe-test /tmp
+# copy templates
+cp -rf ./templates/qe/6.4/data.qe.test.opt.config ./
+# switch to config
+./jarvis -use data.qe.test.opt.config
+# install dependency
+./jarvis -dp
+# generate environment
+./jarvis -e
+# environment setup
+source env.sh
+# build
+./jarvis -b
+# run
+./jarvis -r
+# perf
+./jarvis -p
+# kperf
+./jarvis -kp
+# gpu nsysperf
+./jarvis -gp
\ No newline at end of file
diff --git a/test/test-qe.sh b/test/test-qe.sh
new file mode 100644
index 0000000000000000000000000000000000000000..0248590ef415193ab7fed8e01f0dd500ee7252c4
--- /dev/null
+++ b/test/test-qe.sh
@@ -0,0 +1,27 @@
+#!/bin/bash
+# back to root
+cd ..
+# extract qe source code
+tar xzvf ./downloads/q-e-qe-6.4.1.tar.gz -C /tmp/
+# copy workload
+cp -rf ./workload/QE/qe-test /tmp
+# copy templates
+cp -rf ./templates/qe/6.4/data.qe.test.config ./
+# switch to config
+./jarvis -use data.qe.test.config
+# install dependency
+./jarvis -dp
+# generate environment
+./jarvis -e
+# environment setup
+source env.sh
+# build
+./jarvis -b
+# run
+./jarvis -r
+# perf
+./jarvis -p
+# kperf
+./jarvis -kp
+# gpu nsysperf
+./jarvis -gp
\ No newline at end of file
diff --git a/test/test-util.sh b/test/test-util.sh
new file mode 100644
index 0000000000000000000000000000000000000000..210a8fb5e941adb78d36bbfeee6f91593667357f
--- /dev/null
+++ b/test/test-util.sh
@@ -0,0 +1,8 @@
+#!/bin/bash
+cd ..
+# check machine info
+./jarvis -i
+# gpu nsysperf
+./jarvis -gp
+# benchmark
+./jarvis -bench all
\ No newline at end of file
diff --git a/workloads/ReadMe.md b/workloads/ReadMe.md
new file mode 100644
index 0000000000000000000000000000000000000000..5f19fe69f65129bc1d1c90d9002ebfc99f97ab3f
--- /dev/null
+++ b/workloads/ReadMe.md
@@ -0,0 +1 @@
+Small-scale test cases for common HPC applications, typically under 1 MB each.
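+For example (see test/test-qe.sh): workload/QE/qe-test, the small QE input set copied to /tmp for the smoke test.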
\ No newline at end of file