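diff --git a/README.md b/README.md index 8db7efcfb3ebd73bb325ce8d2d8ed8dc754492e0..d2dc0a499778e5994dbe581c36ff38a789fe9340 100644 --- a/README.md +++ b/README.md @@ -1,14 +1,16 @@ # HPCRunner : Jarvis Intelligent Assistant -## ***Give every HPC application a warm home*** +## ***Vision: deploy optimally tuned HPC applications in any directory on any machine*** ### Project Background -Because HPC applications are complex, the barrier to dependency installation, environment configuration, compilation, running, and CPU/GPU performance collection and analysis is fairly high, so migration and tuning take a lot of work. Different people running the same application and cases on different machines essentially start from scratch, which costs time and effort. In many cases both ARM and X86 environments must also be deployed for verification, which adds a great deal of repetitive work and makes it impossible to focus on optimizing software algorithms. +HPC is often described as the "jewel at the top of the pyramid" of the IT industry. The barrier to deploying, compiling, running, and collecting and analyzing performance data is very high, and deploying HPC applications on different machines consumes a great deal of effort. In many cases both ARM and X86 environments must also be deployed for verification, which adds a great deal of repetitive work and makes it impossible to focus on optimizing core algorithms. + +![Jarvis](./images/jarvis.png) ### Project Features -- Supports Kunpeng/X86, one-click dependency download, and one-click dependency installation; manages large numbers of dependencies with an industry-standard dependency directory layout and generates module files automatically -- Based on the HPC configuration: one-click environment script generation, one-click build, one-click run, one-click performance collection, one-click Benchmark. +- Supports ARM/X86 and one-click deployment; manages large numbers of dependencies with an industry-standard dependency directory layout and generates module files automatically +- Based on the HPC configuration: one-click build and run, one-click CPU/GPU performance collection, one-click Benchmark. - All configuration is recorded in a single file; deploying an HPC application to a different machine only requires modifying the configuration file. - The log management system automatically records all information produced while deploying the HPC application. - The software itself needs no compilation and works out of the box; it depends only on a Python environment. @@ -68,7 +70,7 @@ source ./init.sh | Configuration item | Description | Example | | :----------: | :----------------------------------------------------------- | :----------------------------------------------------------- | | [SERVER] | List of server nodes, one node per line; used to generate the hostfile automatically for multi-node runs | 11.11.11.11 | -| [DOWNLOAD] | One software version and download link per line; downloaded to the downloads directory by default (an alias can be set) | cmake/3.16.4 https://cmake.org/files/v3.16/cmake-3.16.4.tar.gz alias | +| [DOWNLOAD] | One software version and download link per line; downloaded to the downloads directory by default (an alias can be set) | cp2k/8.2 https://xxx cp2k.8.2.tar.gz | | [DEPENDENCY] | Installation script for the HPC application's dependencies | ./jarvis -install gcc/9.3.1 com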
module use ./software/modulefiles
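module load gcc9 | | [ENV] | Build and runtime environment configuration for the HPC application | source env.sh | | [APP] | HPC application information: application name, build path, binary path, and case path | app_name = CP2K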
build_dir = /home/cp2k-8.2/
binary_dir = /home/CP2K/cp2k-8.2/bin/
case_dir = /home/CP2K/cp2k-8.2/benchmarks/QS/ | @@ -78,7 +80,7 @@ source ./init.sh | [BATCH] | Batch run commands for the HPC application | #!/bin/bash
nvidia-smi -pm 1
nvidia-smi -ac 1215,1410 | | [PERF] | 性能工具额外参数 | perf= -o
nsys=
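ncu=--target-processes all --launch-skip 71434 --launch-count 1 | -3. One-click dependency download (only for links that need no authentication; otherwise download the files into the downloads directory yourself) +3. One-click HPC application download (only for links that need no authentication; otherwise download the files into the downloads directory yourself) ``` ./jarvis -d @@ -87,7 +89,7 @@ source ./init.sh 4. Install a single dependency ``` -./jarvis -install [name/version/other] [option] +./jarvis -install [package/][name/version/other] [option] ``` The supported options are listed below @@ -113,6 +115,7 @@ eg: ``` ./jarvis -install bisheng/2.1.0 com #install the BiSheng compiler +./jarvis -install package/bisheng/2.1.0 com #install the BiSheng compiler ./jarvis -install fftw/3.3.8 gcc+mpi #build fftw 3.3.8 with the current gcc and mpi ./jarvis -install openmpi/4.1.2 gcc #build openmpi 4.1.2 with the current gcc ``` @@ -123,31 +126,31 @@ eg: ./jarvis -remove openblas/0.3.18 ``` -6. One-click installation of all dependencies +6. One-click download and installation of all dependencies (reads the [DEPENDENCY] section of the configuration file and executes its contents in order) ``` ./jarvis -dp ``` -7. One-click environment variable generation (only needed when running outside Jarvis) +7. One-click environment variable generation (reads the [ENV] section of the configuration file, then generates and runs the env.sh script; generated automatically by default) ``` ./jarvis -e && source ./env.sh ``` -8. One-click build +8. One-click build (reads the [BUILD] section of the configuration file, then generates and runs the build.sh script) ``` ./jarvis -b ``` -9. One-click run +9. One-click run (reads the [RUN] section of the configuration file, then generates and runs the run.sh script) ``` ./jarvis -r ``` -10. One-click performance collection (perf) +10. One-click performance collection (reads the perf value from the [PERF] section of the configuration file) ``` ./jarvis -p @@ -180,19 +183,21 @@ eg: ./jarvis -use XXX.config ``` -15. View other features (network check) +15. Generate a Singularity container definition file from the current configuration ``` -./jarvis -h +./jarvis -container docker-hub-address ``` -16. Generate a Singularity container definition file from the current configuration +16. View other features (network check, etc.) ``` -./jarvis -container docker-hub-address +./jarvis -h ``` +### Roadmap +![RoadMap](./images/roadmap.png) ### Contributions Welcome @@ -210,8 +215,10 @@ eg: Join the openEuler HPC SIG WeChat group to learn more about HPC migration and tuning -![WeChat group](./wechat-group-qr.png) +![WeChat group](./images/wechat-group-qr.png) ### Technical Articles -Unveiling the Mystery of HPC Applications: https://zhuanlan.zhihu.com/p/489828346 \ No newline at end of file +Unveiling the Mystery of HPC Applications: https://zhuanlan.zhihu.com/p/489828346 + +A Date with Containers: https://zhuanlan.zhihu.com/p/489828346 \ No newline at end of file diff --git a/images/jarvis.png b/images/jarvis.png new file mode 100644 index 0000000000000000000000000000000000000000..1889eec3ae9f5f81fa30d5943067746eb8db27bb Binary files /dev/null and b/images/jarvis.png differ diff --git a/images/roadmap.png 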
b/images/roadmap.png new file mode 100644 index 0000000000000000000000000000000000000000..8081d3ac0a3fdf3ca14eb0996771ead30d711463 Binary files /dev/null and b/images/roadmap.png differ diff --git a/images/wechat-group-qr.png b/images/wechat-group-qr.png new file mode 100644 index 0000000000000000000000000000000000000000..a7bab6c908197e6d26634b9fd316b03390d1123c Binary files /dev/null and b/images/wechat-group-qr.png differ diff --git a/package/bisheng/1.3.3/install.sh b/package/bisheng/1.3.3/install.sh index 118c96b8b082dbd679d71042ae02df4bccd876ea..ba7c3d3527f638f67b8895a687209c56fafea097 100644 --- a/package/bisheng/1.3.3/install.sh +++ b/package/bisheng/1.3.3/install.sh @@ -1,5 +1,5 @@ #!/bin/bash -#download from https://mirrors.huaweicloud.com/kunpeng/archive/compiler/bisheng_compiler/bisheng-compiler-2.1.0-aarch64-linux.tar.gz set -e +. ${DOWNLOAD_TOOL} -u https://mirrors.huaweicloud.com/kunpeng/archive/compiler/bisheng_compiler/bisheng-compiler-1.3.3-aarch64-linux.tar.gz cd ${JARVIS_TMP} tar xzvf ${JARVIS_DOWNLOAD}/bisheng-compiler-1.3.3-aarch64-linux.tar.gz -C $1 --strip-components=1 \ No newline at end of file diff --git a/package/bisheng/2.1.0/install.sh b/package/bisheng/2.1.0/install.sh index 717c1e1931552d3b44b27886383823ea757884d4..9bf1856bd0960f15d92f6d45d2fbd2e8d320190c 100644 --- a/package/bisheng/2.1.0/install.sh +++ b/package/bisheng/2.1.0/install.sh @@ -1,6 +1,7 @@ #download from https://mirrors.huaweicloud.com/kunpeng/archive/compiler/bisheng_compiler/bisheng-compiler-2.1.0-aarch64-linux.tar.gz #!/bin/bash set -e +. 
${DOWNLOAD_TOOL} -u https://mirrors.huaweicloud.com/kunpeng/archive/compiler/bisheng_compiler/bisheng-compiler-2.1.0-aarch64-linux.tar.gz cd ${JARVIS_TMP} yum -y install libatomic libstdc++ libstdc++-devel tar xzvf ${JARVIS_DOWNLOAD}/bisheng-compiler-2.1.0-aarch64-linux.tar.gz -C $1 --strip-components=1 \ No newline at end of file diff --git a/package/cmake/3.20.5/install.sh b/package/cmake/3.20.5/install.sh deleted file mode 100644 index fb01ef8d0b6c2904f4b859ffa3d6bb0a719d6add..0000000000000000000000000000000000000000 --- a/package/cmake/3.20.5/install.sh +++ /dev/null @@ -1,4 +0,0 @@ -#!/bin/bash -set -e -cd ${JARVIS_TMP} -tar -xvf ${JARVIS_DOWNLOAD}/cmake-3.20.5-linux-aarch64.tar.gz -C $1 --strip-components=1 \ No newline at end of file diff --git a/package/cmake/3.23.1/install.sh b/package/cmake/3.23.1/install.sh new file mode 100644 index 0000000000000000000000000000000000000000..48f77a2c6f6f46445be4481583e141dd9d59d3cb --- /dev/null +++ b/package/cmake/3.23.1/install.sh @@ -0,0 +1,5 @@ +#!/bin/bash +set -e +. 
${DOWNLOAD_TOOL} -u https://github.com/Kitware/CMake/releases/download/v3.23.1/cmake-3.23.1-linux-aarch64.tar.gz +cd ${JARVIS_TMP} +tar -xvf ${JARVIS_DOWNLOAD}/cmake-3.23.1-linux-aarch64.tar.gz -C $1 --strip-components=1 \ No newline at end of file diff --git a/package/hmpi/1.1.0/gcc/install.sh b/package/hmpi/1.1.0/gcc/install.sh deleted file mode 100644 index 254a5d9e3d8a84c33970a2eb70a1e7c395265068..0000000000000000000000000000000000000000 --- a/package/hmpi/1.1.0/gcc/install.sh +++ /dev/null @@ -1,4 +0,0 @@ -#!/bin/bash -set -e -cd ${JARVIS_TMP} -tar -xvf ${JARVIS_DOWNLOAD}/Hyper-MPI_1.1.0_aarch64_CentOS7.6_GCC9.3_MLNX-OFED4.9.tar.gz -C $1 --strip-components=1 \ No newline at end of file diff --git a/package/hmpi/1.1.1/install.sh b/package/hmpi/1.1.1/install.sh index 0a1bd108c7b5fd9bd3a40d0fcb29e516ab4e1a0f..235fe842fbd75482eafe4e956ffbef861747e72d 100644 --- a/package/hmpi/1.1.1/install.sh +++ b/package/hmpi/1.1.1/install.sh @@ -1,6 +1,9 @@ #!/bin/bash set -x set -e +. ${DOWNLOAD_TOOL} -u https://github.com/kunpengcompute/hucx/archive/refs/tags/v1.1.1-huawei.zip -f hucx-1.1.1-huawei.zip +. ${DOWNLOAD_TOOL} -u https://github.com/kunpengcompute/xucg/archive/refs/tags/v1.1.1-huawei.zip -f xucg-1.1.1-huawei.zip +. ${DOWNLOAD_TOOL} -u https://github.com/kunpengcompute/hmpi/archive/refs/tags/v1.1.1-huawei.zip -f hmpi-1.1.1-huawei.zip cd ${JARVIS_TMP} yum install -y perl-Data-Dumper autoconf automake libtool binutils rm -rf hmpi-1.1.1-huawei hucx-1.1.1-huawei xucg-1.1.1-huawei diff --git a/package/kml/1.4.0/bisheng/install.sh b/package/kml/1.4.0/bisheng/install.sh index 129c8eaf1b3ba04aa344007dc4278786e1364f2d..94154c678cfb6e250396af7f8261b15af745024d 100644 --- a/package/kml/1.4.0/bisheng/install.sh +++ b/package/kml/1.4.0/bisheng/install.sh @@ -1,11 +1,13 @@ #!/bin/bash set -x set -e +. 
${DOWNLOAD_TOOL} -u https://kunpeng-repo.obs.cn-north-4.myhuaweicloud.com/Kunpeng%20BoostKit/Kunpeng%20BoostKit%2021.0.1/BoostKit-kml_1.4.0_bisheng.zip cd ${JARVIS_TMP} if [ -d /usr/local/kml ];then rpm -e boostkit-kml fi -rpm --force --nodeps -ivh ${JARVIS_ROOT}/package/kml/1.4.0/bisheng/*.rpm +unzip -o ${JARVIS_DOWNLOAD}/BoostKit-kml_1.4.0_bisheng.zip +rpm --force --nodeps -ivh boostkit-kml-1.4.0-1.aarch64.rpm # generate full lapack netlib=${JARVIS_DOWNLOAD}/lapack-3.9.1.tar.gz klapack=/usr/local/kml/lib/libklapack.a diff --git a/package/kml/1.4.0/gcc/install.sh b/package/kml/1.4.0/gcc/install.sh index 2f80fe7cf1a7c0cde96355b16757fd44937c9ead..5e0b92d2ef15bd975100bf079ba51d6500799cab 100644 --- a/package/kml/1.4.0/gcc/install.sh +++ b/package/kml/1.4.0/gcc/install.sh @@ -1,11 +1,13 @@ #!/bin/bash set -x set -e +. ${DOWNLOAD_TOOL} -u https://kunpeng-repo.obs.cn-north-4.myhuaweicloud.com/Kunpeng%20BoostKit/Kunpeng%20BoostKit%2021.0.1/BoostKit-kml_1.4.0.zip -f BoostKit-kml_1.4.0-gcc.zip cd ${JARVIS_TMP} if [ -d /usr/local/kml ];then rpm -e boostkit-kml fi -rpm --force --nodeps -ivh ${JARVIS_ROOT}/package/kml/1.4.0/gcc/*.rpm +unzip -o ${JARVIS_DOWNLOAD}/BoostKit-kml_1.4.0-gcc.zip +rpm --force --nodeps -ivh boostkit-kml-1.4.0-1.aarch64.rpm # generate full lapack netlib=${JARVIS_DOWNLOAD}/lapack-3.9.1.tar.gz diff --git a/package/openblas/0.3.18/install.sh b/package/openblas/0.3.18/install.sh index d475d9e78dd32c9d39a627f87615b6e00937e43f..edc231ae5ab6c2f2ebc2b8d54ffd4e15b378e95f 100644 --- a/package/openblas/0.3.18/install.sh +++ b/package/openblas/0.3.18/install.sh @@ -1,6 +1,7 @@ #!/bin/bash set -x set -e +. 
${DOWNLOAD_TOOL} -u https://github.com/xianyi/OpenBLAS/releases/download/v0.3.18/OpenBLAS-0.3.18.tar.gz cd ${JARVIS_TMP} tar -xzvf ${JARVIS_DOWNLOAD}/OpenBLAS-0.3.18.tar.gz cd OpenBLAS-0.3.18 diff --git a/package/scalapack/2.1.0/install.sh b/package/scalapack/2.1.0/install.sh index e79a4709e95a28c04cf4abdbc0db79914a88ccc4..bee6239d78c45a09cdd1b86e776176cba423cfdf 100644 --- a/package/scalapack/2.1.0/install.sh +++ b/package/scalapack/2.1.0/install.sh @@ -2,6 +2,7 @@ set -x set -e cd ${JARVIS_TMP} +. ${DOWNLOAD_TOOL} -u http://www.netlib.org/scalapack/scalapack-2.1.0.tgz tar -xvf ${JARVIS_DOWNLOAD}/scalapack-2.1.0.tgz cd scalapack-2.1.0 cp SLmake.inc.example SLmake.inc diff --git a/package/scalapack/2.1.0/kml/install.sh b/package/scalapack/2.1.0/kml/install.sh index 26da61aa5d306a2a6c53101f40a0af2fd4e4c70a..d0dff549d8b23838b6cc6959418264b9f058270f 100644 --- a/package/scalapack/2.1.0/kml/install.sh +++ b/package/scalapack/2.1.0/kml/install.sh @@ -1,6 +1,7 @@ #!/bin/bash set -x set -e +. ${DOWNLOAD_TOOL} -u http://www.netlib.org/scalapack/scalapack-2.1.0.tgz cd ${JARVIS_TMP} rm -rf scalapack-2.1.0 tar -xvf ${JARVIS_DOWNLOAD}/scalapack-2.1.0.tgz diff --git a/software/compiler/bisheng/2.1.0/installed b/software/compiler/bisheng/2.1.0/installed index c227083464fb9af8955c90d2924774ee50abb547..56a6051ca2b02b04ef92d5150c9ef600403cb1de 100644 --- a/software/compiler/bisheng/2.1.0/installed +++ b/software/compiler/bisheng/2.1.0/installed @@ -1 +1 @@ -0 \ No newline at end of file +1 \ No newline at end of file diff --git a/src/installService.py b/src/installService.py index a044b72a78983f7da5230dde923081b9d31788ef..9b50b62879f3abdd23a16862b80910be2f1024ac 100644 --- a/src/installService.py +++ b/src/installService.py @@ -33,7 +33,10 @@ class InstallService: self.UTILS_PATH = os.path.join(self.SOFTWARE_PATH, 'utils') def get_version_info(self, info): - return re.search( r'(\d+)\.(\d+)\.',info).group(1) + matched_group = re.search( r'(\d+)\.(\d+)\.',info) + if not 
matched_group: + return None + return matched_group.group(1) # some command don't generate output, must redirect to a tmp file def get_cmd_output(self, cmd): @@ -49,6 +52,9 @@ class InstallService: gcc_info_list = self.get_cmd_output('gcc -v') gcc_info = gcc_info_list[-1].strip() version = self.get_version_info(gcc_info) + if not version: + print("GCC not found, please install gcc first") + sys.exit() name = 'gcc' if 'kunpeng' in gcc_info.lower(): name = 'kgcc' @@ -58,6 +64,9 @@ class InstallService: clang_info_list = self.get_cmd_output('clang -v') clang_info = clang_info_list[0].strip() version = self.get_version_info(clang_info) + if not version: + print("clang not found, please install clang first") + sys.exit() name = 'clang' if 'bisheng' in clang_info.lower(): name = 'bisheng' @@ -74,6 +83,9 @@ class InstallService: mpi_info = mpi_info_list[0].strip() name = 'openmpi' version = self.get_version_info(mpi_info) + if not version: + print("MPI not found, please install MPI first.") + sys.exit() hmpi_info = self.get_cmd_output('ompi_info | grep "MCA coll: ucx"')[0] if hmpi_info != "": name = 'hmpi' diff --git a/templates/CP2K/8.2/data.CP2K.arm.gpu.config b/templates/CP2K/8.2/data.CP2K.arm.gpu.config index 2012254a25a02c42d0f5972fed052c8eeccff1fe..d2314db8402cd4ca3557bdb6e6825cdb8c1354ad 100644 --- a/templates/CP2K/8.2/data.CP2K.arm.gpu.config +++ b/templates/CP2K/8.2/data.CP2K.arm.gpu.config @@ -2,12 +2,7 @@ 11.11.11.11 [DOWNLOAD] -libint/2.6.0 https://github.com/evaleev/libint/archive/v2.6.0.tar.gz -libXC/5.1.4 https://www.cp2k.org/static/downloads/libxc-5.1.4.tar.gz -fftw/3.3.8 https://www.cp2k.org/static/downloads/fftw-3.3.8.tar.gz -lapack/3.8.0 https://www.cp2k.org/static/downloads/lapack-3.8.0.tgz -scalapack/2.1.0 https://www.cp2k.org/static/downloads/scalapack-2.1.0.tgz -cmake/3.16.4 https://cmake.org/files/v3.16/cmake-3.16.4.tar.gz +cp2k/8.2 https://github.com/cp2k/cp2k/releases/download/v8.2.0/cp2k-8.2.tar.bz2 [DEPENDENCY] ./jarvis -install kgcc/9.3.1 com 
@@ -32,6 +27,8 @@ module load openblas/0.3.18 module load gsl/2.6 ./jarvis -install plumed/2.6.2 gcc+mpi ./jarvis -install libvori/21.04.12 gcc +#release CP2K +tar -jxvf downloads/cp2k-8.2.tar.bz2 [ENV] module purge @@ -41,9 +38,9 @@ module load gsl/2.6 [APP] app_name = CP2K -build_dir = /home/HT3/HPCRunner2/cp2k-8.2/ -binary_dir = /home/HT3/HPCRunner2/cp2k-8.2/exe/local-cuda/ -case_dir = /home/HT3/HPCRunner2/cp2k-8.2/benchmarks/QS/ +build_dir = ${JARVIS_ROOT}/cp2k-8.2/ +binary_dir = ${JARVIS_ROOT}/cp2k-8.2/exe/local-cuda/ +case_dir = ${JARVIS_ROOT}/cp2k-8.2/benchmarks/QS/ [BUILD] make -j 128 ARCH=local-cuda VERSION=psmp diff --git a/templates/qe/6.4/data.qe.test.config b/templates/qe/6.4/data.qe.test.config index b46531a8738f5e287285fb69a2bffcd92f281df0..bcab7d4a116d994a0be7c2bc39d3629bcebe276c 100644 --- a/templates/qe/6.4/data.qe.test.config +++ b/templates/qe/6.4/data.qe.test.config @@ -1,10 +1,6 @@ [SERVER] 11.11.11.11 -[DOWNLOAD] -kgcc/9.3.1 https://mirrors.huaweicloud.com/kunpeng/archive/compiler/kunpeng_gcc/gcc-9.3.1-2021.03-aarch64-linux.tar.gz -openmpi/4.1.2 https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-4.1.2.tar.gz - [DEPENDENCY] ./jarvis -install kgcc/9.3.1 com module purge diff --git a/templates/qe/6.4/data.qe.test.opt.config b/templates/qe/6.4/data.qe.test.opt.config index 78cf7e01af1340095e60369ba277a622123887fa..b191dcb5b6b7a92e60b412fba6ed046f2da15061 100644 --- a/templates/qe/6.4/data.qe.test.opt.config +++ b/templates/qe/6.4/data.qe.test.opt.config @@ -1,13 +1,6 @@ [SERVER] 11.11.11.11 -[DOWNLOAD] -bisheng/2.1.0 https://mirrors.huaweicloud.com/kunpeng/archive/compiler/bisheng_compiler/bisheng-compiler-2.1.0-aarch64-linux.tar.gz -hmpi/1.1.1 https://github.com/kunpengcompute/hucx/archive/refs/tags/v1.1.1-huawei.zip hucx-1.1.1-huawei.zip -hmpi/1.1.1 https://github.com/kunpengcompute/hmpi/archive/refs/tags/v1.1.1-huawei.zip hmpi-1.1.1-huawei.zip -hmpi/1.1.1 https://github.com/kunpengcompute/xucg/archive/refs/tags/v1.1.1-huawei.zip 
xucg-1.1.1-huawei.zip -openblas/0.3.18 https://github.com/xianyi/OpenBLAS/releases/download/v0.3.18/OpenBLAS-0.3.18.tar.gz - [DEPENDENCY] set -x set -e diff --git a/wechat-group-qr.png b/wechat-group-qr.png deleted file mode 100644 index 558033342971cfcb4a72434d89ffbff19a737bb1..0000000000000000000000000000000000000000 Binary files a/wechat-group-qr.png and /dev/null differ
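
The guard added to `get_version_info` in `src/installService.py` can be exercised in isolation. The following is a minimal standalone sketch of that pattern, not the repository's code: the function names `parse_major_version` and `require_version` are hypothetical, and the sample banner strings are made up for illustration. Unlike the diff, which calls `sys.exit()` with no argument, this sketch assumes a non-zero exit status is wanted so that calling scripts can detect the failure.

```python
import re
import sys
from typing import Optional

# Same regular expression as in the diff: "<major>.<minor>." anywhere in the text.
VERSION_PATTERN = re.compile(r'(\d+)\.(\d+)\.')


def parse_major_version(info: str) -> Optional[str]:
    """Return the major version found in `info`, or None if nothing matches."""
    matched = VERSION_PATTERN.search(info)
    if not matched:
        # re.search returns None on failure; calling .group() on it would raise.
        return None
    return matched.group(1)


def require_version(tool: str, info: str) -> str:
    """Hypothetical helper: return the major version or exit with status 1."""
    version = parse_major_version(info)
    if not version:
        print(f"{tool} not found, please install {tool} first", file=sys.stderr)
        sys.exit(1)  # assumption: signal failure to the caller, unlike the diff
    return version
```

Called as `require_version("gcc", banner)`, the helper either returns the major version string or stops the program, which mirrors the three call sites patched in the diff (gcc, clang, MPI) while keeping the check in one place.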