# openlb Software Test Report on openEuler

- [openlb Software Test Report on openEuler](#openlb-software-test-report-on-openeuler)
  - [1. Code Style Self-Check](#1-code-style-self-check)
    - [1.1 Selecting File Types to Count](#11-selecting-file-types-to-count)
    - [1.2 Counting Source Code Size](#12-counting-source-code-size)
    - [1.3 Counting Non-Conforming Lines](#13-counting-non-conforming-lines)
    - [1.4 Results](#14-results)
  - [2. Functional Testing](#2-functional-testing)
    - [2.1 Test Cases](#21-test-cases)
    - [2.2 Functional Tests](#22-functional-tests)
  - [3. Performance Testing](#3-performance-testing)
    - [3.1 Test Platform Comparison](#31-test-platform-comparison)
    - [3.2 Software Environment Comparison](#32-software-environment-comparison)
    - [3.3 System Performance Comparison](#33-system-performance-comparison)
      - [3.3.1 Memory Bandwidth Test](#331-memory-bandwidth-test)
      - [3.3.2 Infiniband Network Benchmark](#332-infiniband-network-benchmark)
    - [3.4 Benchmark Selection](#34-benchmark-selection)
    - [3.5 Choice of Performance Metrics](#35-choice-of-performance-metrics)
      - [3.5.1 Candidate Metrics](#351-candidate-metrics)
      - [3.5.2 MLUPs & MLUPps](#352-mlups--mlupps)
      - [3.5.3 measured time](#353-measured-time)
      - [3.5.4 Metric Analysis](#354-metric-analysis)
    - [3.6 Performance Analysis](#36-performance-analysis)
      - [3.6.1 Test Method](#361-test-method)
      - [3.6.2 GCC Baseline Test](#362-gcc-baseline-test)
      - [3.6.3 Execution Time (runtime) Test](#363-execution-time-runtime-test)
      - [3.6.4 MLUPs Program Performance Test](#364-mlups-program-performance-test)
      - [3.6.5 MLUPps Scalability Test](#365-mlupps-scalability-test)
    - [3.7 Performance Test Conclusions](#37-performance-test-conclusions)
  - [4. Accuracy Testing](#4-accuracy-testing)
    - [4.1 Choice of Example Program](#41-choice-of-example-program)
    - [4.2 Data Analysis](#42-data-analysis)
    - [4.3 Analysis Results](#43-analysis-results)

## 1. Code Style Self-Check

The project's source files are formatted with Artistic Style.

Artistic Style (AStyle) is a source code formatter and beautifier for the C, C++, Objective-C, C# and Java programming languages. Coding conventions vary between developers and projects, and editors handle indentation and line breaks differently. Formatting the code to one common style therefore makes it easier to read and to collaborate on. AStyle can be configured in detail through option files and command-line options, so it can enforce a project-wide style.

The formatting style is based on the Google coding style. The option file `config.ini` contains the following (the AStyle run in Section 1.3 loads its options from `styles/google.ini`):

```ini
# Google Coding Style Options
# braces and indent
style=google
indent=spaces=2
# indentation
indent-switches
indent-continuation=2
indent-preproc-define
min-conditional-indent=0
max-continuation-indent=80
# padding
pad-oper
pad-header
unpad-paren
align-pointer=type
# formatting
break-one-line-headers
keep-one-line-blocks
keep-one-line-statements
convert-tabs
# objective-c
pad-method-prefix
unpad-return-type
unpad-param-type
align-method-colon
pad-method-colon=none
```

To check the project's code style, reformat all source files with AStyle. Then use git to see which lines were changed.

### 1.1 Selecting File Types to Count

A Python script counts the project's files by extension and sorts the result by frequency:

```python
# -*- coding: utf-8 -*-
"""Count the files under the current directory by extension, most frequent first."""

import os
from collections import Counter

# Absolute path of this script, so that it can be excluded from the count
me = os.path.abspath(__file__)
print("Analyzing folder: " + os.getcwd())


def get_all_files(target_dir):
    """Recursively collect the paths of all regular files under target_dir."""
    files = []
    for root, _dirs, names in os.walk(target_dir):
        for name in names:
            path = os.path.join(root, name)
            if os.path.isfile(path) and os.path.abspath(path) != me:
                files.append(path)
    return files


type_count = Counter()
for file in get_all_files(os.curdir):
    ext = os.path.splitext(file)[1]
    # Files without an extension are grouped under "(none)"
    type_count[ext or "(none)"] += 1

# most_common() lists the extensions by count, in descending order
for ext, count in type_count.most_common():
    print("Files with extension [%s] under the current folder: %d" % (ext, count))
```

Running the script from the OpenLB project root produces the following counts (the script itself is excluded):

```shell
Analyzing folder: D:\Git\PortOpenLB\package\olb-1.4r0
Files with extension [.h] under the current folder: 512
Files with extension [.hh] under the current folder: 428
Files with extension [.cpp] under the current folder: 265
Files with extension [.mk] under the current folder: 101
Files with extension [(none)] under the current folder: 61
Files with extension [.c] under the current folder: 15
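Files with extension [.xml] under the current folder: 8
Files with extension [.dsp] under the current folder: 5
Files with extension [.vcproj] under the current folder: 4
Files with extension [.nix] under the current folder: 3
Files with extension [.stl] under the current folder: 3
Files with extension [.sh] under the current folder: 3
Files with extension [.inp] under the current folder: 3
Files with extension [.py] under the current folder: 2
Files with extension [.txt] under the current folder: 2
Files with extension [.hpp] under the current folder: 2
Files with extension [.fcstd] under the current folder: 1
Files with extension [.p] under the current folder: 1
Files with extension [.dsw] under the current folder: 1
Files with extension [.sln] under the current folder: 1
Files with extension [.gif] under the current folder: 1
```

Most source files have the extensions `.h`, `.hh` and `.cpp`, and there are a few C files (`.c`). The project is therefore written mainly in C++, with some C.

### 1.2 Counting Source Code Size

Line count:

```shell
find ./ -regex ".*\.hpp\|.*\.h\|.*\.cpp\|.*\.c\|.*\.hh" | xargs wc -l
```

Reported line count:

```shell
67334 total
```

Character count:

```shell
find ./ -regex ".*\.hpp\|.*\.h\|.*\.cpp\|.*\.c\|.*\.hh" | xargs wc -m
```

Reported character count:

```shell
2517196 total
```

GNU `find` uses Emacs-style regular expressions by default. In that syntax, alternation is written `\|` and a bare `|` matches a literal `|` character. The commands as first recorded joined the last three alternatives with a bare `|`, so they matched only `.hpp` and `.h` files. The commands above use `\|` throughout, but the totals shown are the ones recorded in that original run. If `xargs` has to split a long file list, `wc` runs more than once and prints one `total` line per run.

### 1.3 Counting Non-Conforming Lines

Format the source files (extensions `cpp`, `hpp`, `h`, `hh` and `c`) with AStyle. The options work as follows:

- `-R` recurses into subdirectories.
- `-n` (`--suffix=none`) skips the `.orig` backup copies.
- `-Q` (`--formatted`) lists only the files that were changed.
- `-v` (`--verbose`) prints the version and summary lines.

The output is:

```shell
royenheart@LAPTOP-TDKNUURL:/mnt/d/Git/PortOpenLB/package/olb-1.4r0$ astyle --project=styles/google.ini -R ./*.cpp,*.c,*.h,*.hh,*.hpp -Qnv
Artistic Style 3.1 08/29/2022
Project option file /mnt/d/Git/PortOpenLB/package/olb-1.4r0/styles/google.ini
------------------------------------------------------------
Directory ./*.cpp,*.c,*.h,*.hh,*.hpp
------------------------------------------------------------
Formatted examples/laminar/bstep2d/bstep2d.cpp
Formatted examples/laminar/bstep3d/bstep3d.cpp
Formatted examples/laminar/cavity2d/cavity2d.cpp
Formatted examples/laminar/cavity3d/cavity3d.cpp
Formatted examples/laminar/cylinder2d/cylinder2d.cpp
Formatted examples/laminar/cylinder3d/cylinder3d.cpp
Formatted examples/laminar/poiseuille2d/poiseuille2d.cpp
Formatted examples/laminar/poiseuille3d/poiseuille3d.cpp
Formatted examples/laminar/powerLaw2d/powerLaw2d.cpp
Formatted examples/multiComponent/contactAngle2d/contactAngle2d.cpp
......... (remaining files omitted for brevity)
------------------------------------------------------------
 1,032 formatted 190 unchanged 2.6 seconds 203,326 lines
```

Use git to summarize the changes made by the formatting:

```shell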
royenheart@LAPTOP-TDKNUURL:/mnt/d/Git/PortOpenLB/package/olb-1.4r0$ git commit -m "openlb format"
[master cfbf5ca] openlb format
 1032 files changed, 52268 insertions(+), 58634 deletions(-)
 rewrite examples/multiComponent/microFluidics2d/microFluidics2d.cpp (66%)
 rewrite examples/particles/bifurcation3d/eulerEuler/bifurcation3d.cpp (64%)
 rewrite examples/particles/bifurcation3d/eulerLagrange/bifurcation3d.cpp (64%)
 rewrite examples/thermal/galliumMelting2d/galliumMelting2d.cpp (74%)
 rewrite examples/thermal/stefanMelting2d/stefanMelting2d.cpp (74%)
 rewrite src/boundary/setBoundaryCondition2D.cpp (75%)
 rewrite src/boundary/setBoundaryCondition3D.cpp (69%)
 rewrite src/dynamics/freeEnergyPostProcessor2D.hh (70%)
 rewrite src/dynamics/freeEnergyPostProcessor3D.hh (78%)
 rewrite src/dynamics/mrtHelpers2D.h (61%)
 rewrite src/dynamics/mrtHelpers3D.h (81%)
 rewrite src/external/tinyxml/tinyxml.cpp (60%)
 rewrite src/external/zlib/adler32.c (65%)
 rewrite src/external/zlib/deflate.c (75%)
 rewrite src/external/zlib/deflate.h (66%)
 rewrite src/external/zlib/gzlib.c (79%)
 rewrite src/external/zlib/gzread.c (78%)
 rewrite src/external/zlib/gzwrite.c (85%)
 rewrite src/external/zlib/infback.c (77%)
 rewrite src/external/zlib/inffast.c (76%)
 rewrite src/external/zlib/inffixed.h (99%)
 rewrite src/external/zlib/inflate.c (71%)
 rewrite src/external/zlib/inflate.h (72%)
 rewrite src/external/zlib/inftrees.c (87%)
 rewrite src/external/zlib/trees.c (70%)
 rewrite src/external/zlib/trees.h (96%)
 rewrite src/functors/analytical/analyticalF.cpp (68%)
 rewrite src/functors/lattice/integral/superPlaneIntegralFluxF2D.cpp (70%)
 rewrite src/functors/lattice/integral/superPlaneIntegralFluxF3D.cpp (83%)
 rewrite src/functors/lattice/latticeFrameChangeF3D.hh (60%)
```

### 1.4 Results

Each line deleted by the formatting commit counts as a non-conforming line. The pass rate is the share of the counted source lines that did not need to change. On this basis, the code style self-check gives:

Pass rate: $$\frac{67334 - 58634}{67334} \times 100\% = 12.92\%$$

Fail rate: $$\frac{58634}{67334} \times 100\% = 87.08\%$$

## 2. Functional Testing

### 2.1 Test Cases

OpenLB ships example simulations that cover a range of fluid-dynamics areas and scenarios. You can use them as an introduction to the corresponding experiments and to verify that the project works correctly.

The example programs are organized as follows:

```shell
olb-1.4r0/examples
├─laminar
│ ├─bstep2d
│ ├─bstep3d
│ ├─cavity2d
│ ├─cavity3d
│ ├─cylinder2d
│ ├─cylinder3d
│ ├─poiseuille2d
│ ├─poiseuille3d
│ └─powerLaw2d
├─multiComponent
│ ├─contactAngle2d
│ ├─contactAngle3d
│ ├─microFluidics2d
│ ├─phaseSeparation2d
│ ├─phaseSeparation3d
│ ├─rayleighTaylor2d
│ ├─rayleighTaylor3d
│ ├─youngLaplace2d
│ └─youngLaplace3d
├─particles
│ ├─bifurcation3d
│ │ ├─eulerEuler
│ │ └─eulerLagrange
│ ├─dkt2d
│ ├─magneticParticles3d
│ └─settlingCube3d
├─porousMedia
│ ├─porousPoiseuille2d
│ └─porousPoiseuille3d
├─thermal
│ ├─advectionDiffusion1d
│ ├─advectionDiffusion2d
│ ├─galliumMelting2d
│ ├─porousPlate2d
│ ├─porousPlate3d
│ ├─rayleighBenard2d
│ ├─rayleighBenard3d
│ ├─squareCavity2d
│ ├─squareCavity3d
│ └─stefanMelting2d
└─turbulence
  ├─aorta3d
  ├─channel3d
  ├─nozzle3d
  ├─tgv3d
  └─venturi3d
```

The fluid-dynamics area covered by each example program is shown below:

*[Figure: Program list 1]*

*[Figure: Program list 2]*
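The example cases listed above can also be enumerated by a script, for instance to build or run each one in turn. The sketch below is a minimal Python example and is not part of OpenLB. It assumes that every example case is a directory that directly contains at least one `.cpp` source file, as `laminar/bstep2d/bstep2d.cpp` does. The function name `find_example_cases` is hypothetical.

```python
import os
import sys


def find_example_cases(examples_root):
    """Return the example case directories under examples_root.

    Assumption (not confirmed by OpenLB): a directory is an example case
    if it directly contains at least one .cpp file, e.g.
    laminar/bstep2d/bstep2d.cpp. Paths are relative to examples_root
    and sorted.
    """
    cases = []
    for root, _dirs, files in os.walk(examples_root):
        if any(name.endswith(".cpp") for name in files):
            cases.append(os.path.relpath(root, examples_root))
    return sorted(cases)


if __name__ == "__main__":
    # Usage: python3 <script> [path/to/examples]
    examples_root = sys.argv[1] if len(sys.argv) > 1 else "examples"
    cases = find_example_cases(examples_root)
    for case in cases:
        print(case)
    print("%d example cases found" % len(cases))
```

The script takes the path to the `examples` directory as its argument. It prints one case per line, followed by the total. It matches on `.cpp` files rather than on leaf directories. That keeps both variants of `particles/bifurcation3d` in the list, and excludes output subdirectories (such as `tmp/`) that running the examples may create.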
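### 2.2 Functional Tests

OpenLB is only a framework: building it produces the library and header files. The `examples` directory provides test programs for the various application areas. After compilation, you run each one from its own example directory. Some examples need external input files, which OpenLB provides in the corresponding directories.

Every example program calls into the OpenLB framework by linking against its library. Running a subset of them is therefore enough to check the framework's correctness. **porousPoiseuille3d** and **contactAngle2d** are tested here; together they cover both 2D and 3D simulations with the olb framework.

```shell
# From the project root, after setting the build options in config.mk,
# build the OpenLB framework and all example programs
make samples -j
```

- Running the tests

```shell
#!/bin/bash
# Test command: run the example program given as the first argument with
# 12 MPI ranks (4 per node, i.e. 3 nodes) and 24 OpenMP threads per rank

testFile=$1

export OMP_NUM_THREADS=24
export OMP_PROC_BIND=true
export OMP_PLACES=cores

# The host list is read from the file "nodes". The vader, tcp and openib BTLs
# are excluded; the UCX_* variables select the UCX transports and network
# device and set the UCX built-in collective algorithm parameters.
mpirun -machinefile nodes -np 12 -npernode 4 --bind-to numa --mca btl ^vader,tcp,openib \
  --map-by numa --rank-by numa \
  -x UCX_TLS=sm,ud_x -x UCX_NET_DEVICES=mlx5_0:1 \
  -x UCX_BUILTIN_BCAST_ALGORITHM=3 \
  -x UCX_BUILTIN_ALLREDUCE_ALGORITHM=6 \
  -x UCX_BUILTIN_BARRIER_ALGORITHM=5 \
  -x UCX_BUILTIN_DEGREE_INTRA_FANOUT=3 \
  -x UCX_BUILTIN_DEGREE_INTRA_FANIN=2 \
  -x UCX_BUILTIN_DEGREE_INTER_FANOUT=7 \
  -x UCX_BUILTIN_DEGREE_INTER_FANIN=7 \
  "${testFile}"
```

- Test results (porousPoiseuille3d)

```shell
Warning: Permanently added 'n2,192.168.0.3' (ECDSA) to the list of known hosts.
Warning: Permanently added 'n3,192.168.0.4' (ECDSA) to the list of known hosts.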
[MpiManager] Sucessfully initialized, numThreads=12
[[[[[[OmpManager] [[[OmpManager] [[[OmpManager] OmpManager] OmpManager] OmpManagerOmpManager] ] [OmpManager] Sucessfully initialized, numThreads=24
OmpManager] [Sucessfully initialized, numThreads=24
Sucessfully initialized, numThreads=24
Sucessfully initialized, numThreads=24
Sucessfully initialized, numThreads=24
Sucessfully initialized, numThreads=24
Sucessfully initialized, numThreads=24
OmpManager] Sucessfully initialized, numThreads=24
OmpManager] OmpManager] [Sucessfully initialized, numThreads=24
OmpManager] [OmpManager] Sucessfully initialized, numThreads=24
Sucessfully initialized, numThreads=24
Sucessfully initialized, numThreads=24
OmpManagerSucessfully initialized, numThreads=24
[OmpManager] OmpManager] Sucessfully initialized, numThreads=24
Sucessfully initialized, numThreads=24
] [OmpManager] Sucessfully initialized, numThreads=24
Sucessfully initialized, numThreads=24
Sucessfully initialized, numThreads=24
[OmpManager] Sucessfully initialized, numThreads=24
[OmpManager] Sucessfully initialized, numThreads=24
[OmpManager] Sucessfully initialized, numThreads=24
[OmpManager] Sucessfully initialized, numThreads=24
[OmpManager] Sucessfully initialized, numThreads=24
[OmpManager] Sucessfully initialized, numThreads=24
[prepareGeometry] Prepare Geometry ...
[SuperGeometry3D] cleaned 4104 outer boundary voxel(s)
[SuperGeometry3D] cleaned 0 inner boundary voxel(s)
[SuperGeometry3D] the model is correct!
[CuboidGeometry3D] ---Cuboid Stucture Statistics---
[CuboidGeometry3D] Number of Cuboids: 24
[CuboidGeometry3D] Delta (min): 0.047619
[CuboidGeometry3D] (max): 0.047619
[CuboidGeometry3D] Ratio (min): 0.636364
[CuboidGeometry3D] (max): 1.71429
[CuboidGeometry3D] Nodes (min): 847
[CuboidGeometry3D] (max): 1056
[CuboidGeometry3D] Weight (min): 660
[CuboidGeometry3D] (max): 1056
[CuboidGeometry3D] --------------------------------
[SuperGeometryStatistics3D] materialNumber=0; count=3060; minPhysR=(-0.047619,0,0); maxPhysR=(2.04762,1,1)
[SuperGeometryStatistics3D] materialNumber=1; count=14276; minPhysR=(0,0.047619,0.047619); maxPhysR=(2,0.952381,0.952381)
[SuperGeometryStatistics3D] materialNumber=2; count=3780; minPhysR=(-0.047619,0,0); maxPhysR=(2.04762,1,1)
[SuperGeometryStatistics3D] materialNumber=3; count=332; minPhysR=(-0.047619,0.047619,0.047619); maxPhysR=(-0.047619,0.952381,0.952381)
[SuperGeometryStatistics3D] materialNumber=4; count=332; minPhysR=(2.04762,0.047619,0.047619); maxPhysR=(2.04762,0.952381,0.952381)
[prepareGeometry] Prepare Geometry ... OK
[prepareLattice] Prepare Lattice ...
[prepareLattice] Lattice Porosity: 0.981859
[prepareLattice] Kmin: 0.000181406
[prepareLattice] Prepare Lattice ... OK
[main] starting simulation...
[Timer] step=0; percent=0; passedTime=1.252; remTime=110425; MLUPs=0
[LatticeStatistics] step=0; t=0; uMax=0.000733084; avEnergy=1.4351e-07; avRho=1.00106
[SuperPlaneIntegralFluxVelocity3D] regionName=Inflow; regionSize[m^2]=0.589569; volumetricFlowRate[m^3/s]=0.0739175; meanVelocity[m/s]=0.125375
[SuperPlaneIntegralFluxPressure3D] regionName=Inflow; regionSize[m^2]=0.589569; force[N]=18.7033; meanPressure[Pa]=31.7237
[SuperPlaneIntegralFluxVelocity3D] regionName=Outflow; regionSize[m^2]=0.664399; volumetricFlowRate[m^3/s]=0.0770278; meanVelocity[m/s]=0.115936
[SuperPlaneIntegralFluxPressure3D] regionName=Outflow; regionSize[m^2]=0.664399; force[N]=-0.0626416; meanPressure[Pa]=0.0942832
[getResults] pressure1=16.3205; pressure2=15.6805; pressureDrop=0.64
[error] velocity-L1-error(abs)=0.00819255; velocity-L1-error(rel)=0.0496816
[error] velocity-L2-error(abs)=0.00950415; velocity-L2-error(rel)=0.0671756
[error] velocity-Linf-error(abs)=0.0377358; velocity-Linf-error(rel)=0.245121
[error] pressure-L1-error(abs)=0.730704; pressure-L1-error(rel)=0.029626
[error] pressure-L2-error(abs)=1.17404; pressure-L2-error(rel)=0.0508801
[error] pressure-Linf-error(abs)=3.89607; pressure-Linf-error(rel)=0.121752
[ValueTracer] average=1.47662e-07; stdDev/average=0.0138736
[main] Simulation converged.
[Timer] step=1570; percent=1.78005; passedTime=4.495; remTime=248.027; MLUPs=9.06272
[LatticeStatistics] step=1570; t=0.356009; uMax=0.000772759; avEnergy=1.48448e-07; avRho=1.0011
[SuperPlaneIntegralFluxVelocity3D] regionName=Inflow; regionSize[m^2]=0.589569; volumetricFlowRate[m^3/s]=0.0744295; meanVelocity[m/s]=0.126244
[SuperPlaneIntegralFluxPressure3D] regionName=Inflow; regionSize[m^2]=0.589569; force[N]=19.2009; meanPressure[Pa]=32.5677
[SuperPlaneIntegralFluxVelocity3D] regionName=Outflow; regionSize[m^2]=0.664399; volumetricFlowRate[m^3/s]=0.0790515; meanVelocity[m/s]=0.118982
[SuperPlaneIntegralFluxPressure3D] regionName=Outflow; regionSize[m^2]=0.664399; force[N]=-0.166278; meanPressure[Pa]=0.250269
[getResults] pressure1=16.4704; pressure2=15.8142; pressureDrop=0.656244
[error] velocity-L1-error(abs)=0.00501721; velocity-L1-error(rel)=0.0304256
[error] velocity-L2-error(abs)=0.00417934; velocity-L2-error(rel)=0.0295397
[error] velocity-Linf-error(abs)=0.0177805; velocity-Linf-error(rel)=0.115497
[error] pressure-L1-error(abs)=0.356608; pressure-L1-error(rel)=0.0144585
[error] pressure-L2-error(abs)=0.352035; pressure-L2-error(rel)=0.0152564
[error] pressure-Linf-error(abs)=1.13082; pressure-Linf-error(rel)=0.035338
[Timer]
[Timer] ----------------Summary:Timer----------------
[Timer] measured time (rt) : 5.200s
[Timer] measured time (cpu): 61.404s
[Timer] average MLUPs : 5.652
[Timer] average MLUPps: 0.020
[Timer] ---------------------------------------------
```

- Test results (contactAngle2d)

```shell
Warning: Permanently added 'n2,192.168.0.3' (ECDSA) to the list of known hosts.
Warning: Permanently added 'n3,192.168.0.4' (ECDSA) to the list of known hosts.
[MpiManager] Sucessfully initialized, numThreads=12
[[[[OmpManager] [[[OmpManager] [[[OmpManager] OmpManager] OmpManagerSucessfully initialized, numThreads=24
[OmpManager] [OmpManager] Sucessfully initialized, numThreads=24
] [OmpManager] [Sucessfully initialized, numThreads=24
Sucessfully initialized, numThreads=24
Sucessfully initialized, numThreads=24
Sucessfully initialized, numThreads=24
[OmpManager] [OmpManager] Sucessfully initialized, numThreads=24
OmpManager] [Sucessfully initialized, numThreads=24
OmpManager] Sucessfully initialized, numThreads=24
[OmpManager] OmpManager] Sucessfully initialized, numThreads=24
[OmpManager] Sucessfully initialized, numThreads=24
OmpManager] [OmpManager] Sucessfully initialized, numThreads=24
Sucessfully initialized, numThreads=24
Sucessfully initialized, numThreads=24
OmpManager] OmpManager] Sucessfully initialized, numThreads=24
OmpManager] Sucessfully initialized, numThreads=24
Sucessfully initialized, numThreads=24
Sucessfully initialized, numThreads=24
[OmpManager] Sucessfully initialized, numThreads=24
Sucessfully initialized, numThreads=24
Sucessfully initialized, numThreads=24
[OmpManager] Sucessfully initialized, numThreads=24
[OmpManager] Sucessfully initialized, numThreads=24
[OmpManager] Sucessfully initialized, numThreads=24
[UnitConverter] ----------------- UnitConverter information -----------------
[UnitConverter] -- Parameters:
[UnitConverter] Resolution: N= 75
[UnitConverter] Lattice velocity: latticeU= 1663.34
[UnitConverter] Lattice relaxation frequency: omega= 1
[UnitConverter] Lattice relaxation time: tau= 1
[UnitConverter] Characteristical length(m): charL= 75
[UnitConverter] Characteristical speed(m/s): charU= 0.0001
[UnitConverter] Phys. kinematic viscosity(m^2/s): charNu= 1.002e-08
[UnitConverter] Phys. density(kg/m^d): charRho= 1
[UnitConverter] Characteristical pressure(N/m^2): charPressure= 0
[UnitConverter] Mach number: machNumber= 2880.99
[UnitConverter] Reynolds number: reynoldsNumber= 748503
[UnitConverter] Knudsen number: knudsenNumber= 0.003849
[UnitConverter]
[UnitConverter] -- Conversion factors:
[UnitConverter] Voxel length(m): physDeltaX= 1
[UnitConverter] Time step(s): physDeltaT= 1.66334e+07
[UnitConverter] Velocity factor(m/s): physVelocity= 6.012e-08
[UnitConverter] Density factor(kg/m^3): physDensity= 1
[UnitConverter] Mass factor(kg): physMass= 1
[UnitConverter] Viscosity factor(m^2/s): physViscosity= 6.012e-08
[UnitConverter] Force factor(N): physForce= 3.61441e-15
[UnitConverter] Pressure factor(N/m^2): physPressure= 3.61441e-15
[UnitConverter] -------------------------------------------------------------
[LoadBalancer] glob[0]=0
[LoadBalancer] loc[0]=0
[LoadBalancer] rank[0]=0
[LoadBalancer] rank[1]=1
[LoadBalancer] rank[2]=2
[LoadBalancer] rank[3]=3
[LoadBalancer] rank[4]=4
[LoadBalancer] rank[5]=5
[LoadBalancer] rank[6]=6
[LoadBalancer] rank[7]=7
[LoadBalancer] rank[8]=8
[LoadBalancer] rank[9]=9
[LoadBalancer] rank[10]=10
[LoadBalancer] rank[11]=11
[prepareGeometry] Prepare Geometry ...
[SuperGeometry2D] cleaned 0 inner boundary voxel(s)
[SuperGeometry2D] the model is correct!
[CuboidGeometry2D] ---Cuboid Stucture Statistics---
[CuboidGeometry2D] Number of Cuboids: 12
[CuboidGeometry2D] Delta (min): 1
[CuboidGeometry2D] (max): 1
[CuboidGeometry2D] Ratio (min): 1
[CuboidGeometry2D] (max): 1.11765
[CuboidGeometry2D] Nodes (min): 323
[CuboidGeometry2D] (max): 323
[CuboidGeometry2D] --------------------------------
[SuperGeometryStatistics2D] materialNumber=1; count=3724; minPhysR=(0,1); maxPhysR=(75,49)
[SuperGeometryStatistics2D] materialNumber=2; count=152; minPhysR=(0,0); maxPhysR=(75,50)
[prepareGeometry] Prepare Geometry ... OK
[prepareLattice] Prepare Lattice ...
[prepareLattice] Prepare Lattice ... OK
[prepareCoupling] Add lattice coupling
[prepareCoupling] Add lattice coupling ... OK!
[main] starting simulation...
[Timer] step=0; percent=0; passedTime=0.724; remTime=50679.3; MLUPs=0
[LatticeStatistics] step=0; t=0; uMax=0; avEnergy=0; avRho=1
[LatticeStatistics] step=0; t=0; uMax=0; avEnergy=0; avRho=1
[getResults] ----->>>>> Contact angle: 89.9697 ; Analytical contact angle: 100.001
[getResults] ----->>>>> Difference to previous: -0.03031
[Timer] step=1000; percent=1.42857; passedTime=1.482; remTime=102.258; MLUPs=5.10672
[LatticeStatistics] step=1000; t=1.66334e+10; uMax=0.000147109; avEnergy=7.93412e-10; avRho=1
[LatticeStatistics] step=1000; t=1.66334e+10; uMax=0.000147109; avEnergy=7.93412e-10; avRho=0.709499
[getResults] ----->>>>> Contact angle: 90.4014 ; Analytical contact angle: 100.001
[getResults] ----->>>>> Difference to previous: 0.431718
[Timer] step=2000; percent=2.85714; passedTime=2.253; remTime=76.602; MLUPs=5.02724
[LatticeStatistics] step=2000; t=3.32668e+10; uMax=0.000134568; avEnergy=9.72497e-10; avRho=1
[LatticeStatistics] step=2000; t=3.32668e+10; uMax=0.000134568; avEnergy=9.72497e-10; avRho=0.709499
[getResults] ----->>>>> Contact angle: 91.1413 ; Analytical contact angle: 100.001
[getResults] ----->>>>> Difference to previous: 0.739916
[Timer] step=3000; percent=4.28571; passedTime=3.002; remTime=67.0447; MLUPs=5.168
[LatticeStatistics] step=3000; t=4.99002e+10; uMax=0.00012501; avEnergy=9.61883e-10; avRho=1
[LatticeStatistics] step=3000; t=4.99002e+10; uMax=0.00012501; avEnergy=9.61883e-10; avRho=0.709498
[getResults] ----->>>>> Contact angle: 91.8496 ; Analytical contact angle: 100.001
[getResults] ----->>>>> Difference to previous: 0.70832
[Timer] step=4000; percent=5.71429; passedTime=4.379; remTime=72.2535; MLUPs=2.81686
[LatticeStatistics] step=4000; t=6.65336e+10; uMax=0.000115249; avEnergy=9.0075e-10; avRho=1
[LatticeStatistics] step=4000; t=6.65336e+10; uMax=0.000115249; avEnergy=9.0075e-10; avRho=0.709498
[getResults] ----->>>>> Contact angle: 92.529 ; Analytical contact angle: 100.001
[getResults] ----->>>>> Difference to previous: 0.679393
[Timer] step=5000; percent=7.14286; passedTime=5.135; remTime=66.755; MLUPs=5.12021
[LatticeStatistics] step=5000; t=8.3167e+10; uMax=0.000106256; avEnergy=8.16662e-10; avRho=1
[LatticeStatistics] step=5000; t=8.3167e+10; uMax=0.000106256; avEnergy=8.16662e-10; avRho=0.709499
[getResults] ----->>>>> Contact angle: 93.1843 ; Analytical contact angle: 100.001
[getResults] ----->>>>> Difference to previous: 0.655271
[Timer] step=6000; percent=8.57143; passedTime=5.887; remTime=62.7947; MLUPs=5.15426
[LatticeStatistics] step=6000; t=9.98004e+10; uMax=9.80639e-05; avEnergy=7.24968e-10; avRho=1
[LatticeStatistics] step=6000; t=9.98004e+10; uMax=9.80639e-05; avEnergy=7.24968e-10; avRho=0.709499
[getResults] ----->>>>> Contact angle: 93.819 ; Analytical contact angle: 100.001
[getResults] ----->>>>> Difference to previous: 0.634676
[Timer] step=7000; percent=10; passedTime=6.825; remTime=61.425; MLUPs=4.1322
[LatticeStatistics] step=7000; t=1.16434e+11; uMax=9.0582e-05; avEnergy=6.35505e-10; avRho=1
[LatticeStatistics] step=7000; t=1.16434e+11; uMax=9.0582e-05; avEnergy=6.35505e-10; avRho=0.709499
[getResults] ----->>>>> Contact angle: 94.4338 ; Analytical contact angle: 100.001
[getResults] ----->>>>> Difference to previous: 0.614772
[Timer] step=8000; percent=11.4286; passedTime=7.569; remTime=58.6598; MLUPs=5.20268
[LatticeStatistics] step=8000; t=1.33067e+11; uMax=8.37055e-05; avEnergy=5.52735e-10; avRho=1
[LatticeStatistics] step=8000; t=1.33067e+11; uMax=8.37055e-05; avEnergy=5.52735e-10; avRho=0.709499
[getResults] ----->>>>> Contact angle: 95.019 ; Analytical contact angle: 100.001
[getResults] ----->>>>> Difference to previous: 0.585227
[Timer] step=9000; percent=12.8571; passedTime=8.31; remTime=56.3233; MLUPs=5.22372
[LatticeStatistics] step=9000; t=1.49701e+11; uMax=7.7349e-05; avEnergy=4.78139e-10; avRho=1
[LatticeStatistics] step=9000; t=1.49701e+11; uMax=7.7349e-05; avEnergy=4.78139e-10; avRho=0.709499
[getResults] ----->>>>> Contact angle: 95.5568 ; Analytical contact angle: 100.001
[getResults] ----->>>>> Difference to previous: 0.537832
[Timer] step=10000; percent=14.2857; passedTime=9.065; remTime=54.39; MLUPs=5.12698
[LatticeStatistics] step=10000; t=1.66334e+11; uMax=7.14523e-05; avEnergy=4.11917e-10; avRho=1
[LatticeStatistics] step=10000; t=1.66334e+11; uMax=7.14523e-05; avEnergy=4.11917e-10; avRho=0.709499
[getResults] ----->>>>> Contact angle: 96.0394 ; Analytical contact angle: 100.001
[getResults] ----->>>>> Difference to previous: 0.482621
[Timer] step=11000; percent=15.7143; passedTime=9.806; remTime=52.5958; MLUPs=5.23784
[LatticeStatistics] step=11000; t=1.82967e+11; uMax=6.59763e-05; avEnergy=3.53741e-10; avRho=1
[LatticeStatistics] step=11000; t=1.82967e+11; uMax=6.59763e-05; avEnergy=3.53741e-10; avRho=0.709499
[getResults] ----->>>>> Contact angle: 96.4749 ; Analytical contact angle: 100.001
[getResults] ----->>>>> Difference to previous: 0.435462
[Timer] step=12000; percent=17.1429; passedTime=10.573; remTime=51.1028; MLUPs=5.04688
[LatticeStatistics] step=12000; t=1.99601e+11; uMax=6.08952e-05; avEnergy=3.03042e-10; avRho=1
[LatticeStatistics] step=12000; t=1.99601e+11; uMax=6.08952e-05; avEnergy=3.03042e-10; avRho=0.709499
[getResults] ----->>>>> Contact angle: 96.8695 ; Analytical contact angle: 100.001
[getResults] ----->>>>> Difference to previous: 0.394642
[Timer] step=13000; percent=18.5714; passedTime=11.323; remTime=49.647; MLUPs=5.168
[LatticeStatistics] step=13000; t=2.16234e+11; uMax=5.61893e-05; avEnergy=2.59154e-10; avRho=1
[LatticeStatistics] step=13000; t=2.16234e+11; uMax=5.61893e-05; avEnergy=2.59154e-10; avRho=0.709499
[getResults] ----->>>>> Contact angle: 97.2286 ; Analytical contact angle: 100.001
[getResults] ----->>>>> Difference to previous: 0.35907
[Timer] step=14000; percent=20; passedTime=12.096; remTime=48.384; MLUPs=5.01423
[LatticeStatistics] step=14000; t=2.32868e+11; uMax=5.1841e-05; avEnergy=2.21368e-10; avRho=1
[LatticeStatistics] step=14000; t=2.32868e+11; uMax=5.1841e-05; avEnergy=2.21368e-10; avRho=0.709499
[getResults] ----->>>>> Contact angle: 97.5565 ; Analytical contact angle: 100.001
[getResults] ----->>>>> Difference to previous: 0.327872
[Timer] step=15000; percent=21.4286; passedTime=12.846; remTime=47.102; MLUPs=5.168
[LatticeStatistics] step=15000; t=2.49501e+11; uMax=4.78319e-05; avEnergy=1.88978e-10; avRho=1
[LatticeStatistics] step=15000; t=2.49501e+11; uMax=4.78319e-05; avEnergy=1.88978e-10; avRho=0.7095
[getResults] ----->>>>> Contact angle: 97.8568 ; Analytical contact angle: 100.001
[getResults] ----->>>>> Difference to previous: 0.300336
[Timer] step=16000; percent=22.8571; passedTime=13.676; remTime=46.1565; MLUPs=4.66426
[LatticeStatistics] step=16000; t=2.66134e+11; uMax=4.41423e-05; avEnergy=1.61305e-10; avRho=1
[LatticeStatistics] step=16000; t=2.66134e+11; uMax=4.41423e-05; avEnergy=1.61305e-10; avRho=0.7095
[getResults] ----->>>>> Contact angle: 98.1327 ; Analytical contact angle: 100.001
[getResults] ----->>>>> Difference to previous: 0.275886
[Timer] step=17000; percent=24.2857; passedTime=14.446; remTime=45.0375; MLUPs=5.03377
[LatticeStatistics] step=17000; t=2.82768e+11; uMax=4.07514e-05; avEnergy=1.37718e-10; avRho=1
[LatticeStatistics] step=17000; t=2.82768e+11; uMax=4.07514e-05; avEnergy=1.37718e-10; avRho=0.7095
[getResults] ----->>>>> Contact angle: 98.3868 ; Analytical contact angle: 100.001
[getResults] ----->>>>> Difference to previous: 0.254051
[Timer] step=18000; percent=25.7143; passedTime=15.217; remTime=43.9602; MLUPs=5.02724
[LatticeStatistics] step=18000; t=2.99401e+11; uMax=3.76471e-05; avEnergy=1.17643e-10; avRho=1
[LatticeStatistics] step=18000; t=2.99401e+11; uMax=3.76471e-05; avEnergy=1.17643e-10; avRho=0.7095
[getResults] ----->>>>> Contact angle: 98.6212 ; Analytical contact angle: 100.001
[getResults] ----->>>>> Difference to previous: 0.234443
[Timer] step=19000; percent=27.1429; passedTime=15.987; remTime=42.9125; MLUPs=5.04031
[LatticeStatistics] step=19000; t=3.16035e+11; uMax=3.47941e-05; avEnergy=1.00569e-10; avRho=1
[LatticeStatistics] step=19000; t=3.16035e+11; uMax=3.47941e-05; avEnergy=1.00569e-10; avRho=0.7095
[getResults] ----->>>>> Contact angle: 98.8379 ; Analytical contact angle: 100.001
[getResults] ----->>>>> Difference to previous: 0.216745
[Timer] step=20000; percent=28.5714; passedTime=16.763; remTime=41.9075; MLUPs=4.98842
[LatticeStatistics] step=20000; t=3.32668e+11; uMax=3.2172e-05; avEnergy=8.60493e-11; avRho=1
[LatticeStatistics] step=20000; t=3.32668e+11; uMax=3.2172e-05; avEnergy=8.60493e-11; avRho=0.7095
[getResults] ----->>>>> Contact angle: 99.0386 ; Analytical contact angle: 100.001
[getResults] ----->>>>> Difference to previous: 0.200695
[Timer] step=21000; percent=30; passedTime=17.706; remTime=41.314; MLUPs=4.10593
[LatticeStatistics] step=21000; t=3.49301e+11; uMax=2.97619e-05; avEnergy=7.36973e-11; avRho=1
[LatticeStatistics] step=21000; t=3.49301e+11; uMax=2.97619e-05; avEnergy=7.36973e-11; avRho=0.7095
[getResults] ----->>>>> Contact angle: 99.2247 ; Analytical contact angle: 100.001
[getResults] ----->>>>> Difference to previous: 0.186079
[Timer] step=22000; percent=31.4286; passedTime=18.64; remTime=40.6691; MLUPs=4.14989
[LatticeStatistics] step=22000; t=3.65935e+11; uMax=2.75456e-05; avEnergy=6.31827e-11; avRho=1
[LatticeStatistics] step=22000; t=3.65935e+11; uMax=2.75456e-05; avEnergy=6.31827e-11; avRho=0.7095
[getResults] ----->>>>> Contact angle: 99.3974 ; Analytical contact angle: 100.001
[getResults] ----->>>>> Difference to previous: 0.172717
[Timer] step=23000; percent=32.8571; passedTime=19.405; remTime=39.6537; MLUPs=5.06005
[LatticeStatistics] step=23000; t=3.82568e+11; uMax=2.55065e-05; avEnergy=5.42243e-11; avRho=1
[LatticeStatistics] step=23000; t=3.82568e+11; uMax=2.55065e-05; avEnergy=5.42243e-11; avRho=0.7095
[getResults] ----->>>>> Contact angle: 99.5579 ; Analytical contact angle: 100.001
[getResults] ----->>>>> Difference to previous: 0.16046
[Timer] step=24000; percent=34.2857; passedTime=20.24; remTime=38.7933; MLUPs=4.64192
[LatticeStatistics] step=24000; t=3.99202e+11; uMax=2.36294e-05; avEnergy=4.65841e-11; avRho=1
[LatticeStatistics] step=24000; t=3.99202e+11; uMax=2.36294e-05; avEnergy=4.65841e-11; avRho=0.7095
[getResults] ----->>>>> Contact angle: 99.7071 ; Analytical contact angle: 100.001
[getResults] ----->>>>> Difference to previous: 0.149184
[Timer] step=25000; percent=35.7143; passedTime=20.987; remTime=37.7766; MLUPs=5.18876
[LatticeStatistics] step=25000; t=4.15835e+11; uMax=2.19005e-05; avEnergy=4.00613e-11; avRho=1
[LatticeStatistics] step=25000; t=4.15835e+11; uMax=2.19005e-05; avEnergy=4.00613e-11; avRho=0.7095
[getResults] ----->>>>> Contact angle: 99.8459 ; Analytical contact angle: 100.001
[getResults] ----->>>>> Difference to previous: 0.138786
[Timer] step=26000; percent=37.1429; passedTime=21.729; remTime=36.7722; MLUPs=5.22372
[LatticeStatistics] step=26000; t=4.32468e+11; uMax=2.0307e-05; avEnergy=3.44863e-11; avRho=1
[LatticeStatistics] step=26000; t=4.32468e+11; uMax=2.0307e-05; avEnergy=3.44863e-11; avRho=0.7095
[getResults] ----->>>>> Contact angle: 99.975 ; Analytical contact angle: 100.001
[getResults] ----->>>>> Difference to previous: 0.129177
[Timer] step=27000; percent=38.5714; passedTime=22.477; remTime=35.7967; MLUPs=5.18182
[LatticeStatistics] step=27000; t=4.49102e+11; uMax=1.88376e-05; avEnergy=2.97162e-11; avRho=1
[LatticeStatistics] step=27000; t=4.49102e+11; uMax=1.88376e-05; avEnergy=2.97162e-11; avRho=0.7095
[getResults] ----->>>>> Contact angle: 100.095 ; Analytical contact angle: 100.001
[getResults] ----->>>>> Difference to previous: 0.120281
[Timer] step=28000; percent=40; passedTime=23.225; remTime=34.8375; MLUPs=5.18182
[LatticeStatistics] step=28000; t=4.65735e+11; uMax=1.74819e-05; avEnergy=2.56305e-11; avRho=1
[LatticeStatistics] step=28000; t=4.65735e+11; uMax=1.74819e-05; avEnergy=2.56305e-11; avRho=0.7095
[getResults] ----->>>>> Contact angle: 100.207 ; Analytical contact angle: 100.001
[getResults] ----->>>>> Difference to previous: 0.112035
[Timer] step=29000; percent=41.4286; passedTime=23.978; remTime=33.8999; MLUPs=5.15426
[LatticeStatistics] step=29000; t=4.82369e+11; uMax=1.62305e-05; avEnergy=2.21276e-11; avRho=1
[LatticeStatistics] step=29000; t=4.82369e+11; uMax=1.62305e-05; avEnergy=2.21276e-11; avRho=0.7095
[getResults] ----->>>>> Contact angle: 100.312 ; Analytical contact angle: 100.001
[getResults] ----->>>>> Difference to previous: 0.104433
[Timer] step=30000; percent=42.8571; passedTime=24.837; remTime=33.116; MLUPs=4.50698
[LatticeStatistics] step=30000; t=4.99002e+11; uMax=1.5075e-05; avEnergy=1.91214e-11; avRho=1
[LatticeStatistics] step=30000; t=4.99002e+11; uMax=1.5075e-05; avEnergy=1.91214e-11; avRho=0.7095
[getResults] ----->>>>> Contact angle: 100.409 ; Analytical contact angle: 100.001
[getResults] ----->>>>> Difference to previous: 0.0970788
[Timer] step=31000; percent=44.2857; passedTime=25.613; remTime=32.2228; MLUPs=4.99485
[LatticeStatistics] step=31000; t=5.15635e+11; uMax=1.40076e-05; avEnergy=1.65392e-11; avRho=1
[LatticeStatistics] step=31000; t=5.15635e+11; uMax=1.40076e-05; avEnergy=1.65392e-11; avRho=0.7095
[getResults] ----->>>>> Contact angle: 100.499 ; Analytical contact angle: 100.001
[getResults] ----->>>>> Difference to previous: 0.0902512
[Timer] step=32000; percent=45.7143; passedTime=26.364; remTime=31.3072; MLUPs=5.15426
[LatticeStatistics] step=32000; t=5.32269e+11; uMax=1.30212e-05; avEnergy=1.43192e-11; avRho=1
[LatticeStatistics] step=32000; t=5.32269e+11; uMax=1.30212e-05; avEnergy=1.43192e-11; avRho=0.709501
+[getResults] ----->>>>> Contact angle: 100.583 ; Analytical contact angle: 100.001 +[getResults] ----->>>>> Difference to previous: 0.0839243 +[Timer] step=33000; percent=47.1429; passedTime=27.133; remTime=30.4218; MLUPs=5.04031 +[LatticeStatistics] step=33000; t=5.48902e+11; uMax=1.21104e-05; avEnergy=1.2409e-11; avRho=1 +[LatticeStatistics] step=33000; t=5.48902e+11; uMax=1.21104e-05; avEnergy=1.2409e-11; avRho=0.709501 +[getResults] ----->>>>> Contact angle: 100.661 ; Analytical contact angle: 100.001 +[getResults] ----->>>>> Difference to previous: 0.078059 +[Timer] step=34000; percent=48.5714; passedTime=27.887; remTime=29.5274; MLUPs=5.14058 +[LatticeStatistics] step=34000; t=5.65536e+11; uMax=1.12698e-05; avEnergy=1.07641e-11; avRho=1 +[LatticeStatistics] step=34000; t=5.65536e+11; uMax=1.12698e-05; avEnergy=1.07641e-11; avRho=0.709501 +[getResults] ----->>>>> Contact angle: 100.734 ; Analytical contact angle: 100.001 +[getResults] ----->>>>> Difference to previous: 0.0726202 +[Timer] step=35000; percent=50; passedTime=28.644; remTime=28.644; MLUPs=5.11346 +[LatticeStatistics] step=35000; t=5.82169e+11; uMax=1.04925e-05; avEnergy=9.34643e-12; avRho=1 +[LatticeStatistics] step=35000; t=5.82169e+11; uMax=1.04925e-05; avEnergy=9.34643e-12; avRho=0.709501 +[getResults] ----->>>>> Contact angle: 100.801 ; Analytical contact angle: 100.001 +[getResults] ----->>>>> Difference to previous: 0.0675756 +[Timer] step=36000; percent=51.4286; passedTime=29.437; remTime=27.8016; MLUPs=4.88161 +[LatticeStatistics] step=36000; t=5.98802e+11; uMax=9.77346e-06; avEnergy=8.12375e-12; avRho=1 +[LatticeStatistics] step=36000; t=5.98802e+11; uMax=9.77346e-06; avEnergy=8.12375e-12; avRho=0.709501 +[getResults] ----->>>>> Contact angle: 100.864 ; Analytical contact angle: 100.001 +[getResults] ----->>>>> Difference to previous: 0.0628962 +[Timer] step=37000; percent=52.8571; passedTime=30.254; remTime=26.9833; MLUPs=4.74419 +[LatticeStatistics] step=37000; t=6.15436e+11; 
uMax=9.10824e-06; avEnergy=7.06841e-12; avRho=1 +[LatticeStatistics] step=37000; t=6.15436e+11; uMax=9.10824e-06; avEnergy=7.06841e-12; avRho=0.709501 +[getResults] ----->>>>> Contact angle: 100.922 ; Analytical contact angle: 100.001 +[getResults] ----->>>>> Difference to previous: 0.0580909 +[Timer] step=38000; percent=54.2857; passedTime=31.024; remTime=26.1255; MLUPs=5.02724 +[LatticeStatistics] step=38000; t=6.32069e+11; uMax=8.49266e-06; avEnergy=6.15681e-12; avRho=1 +[LatticeStatistics] step=38000; t=6.32069e+11; uMax=8.49266e-06; avEnergy=6.15681e-12; avRho=0.709501 +[getResults] ----->>>>> Contact angle: 100.976 ; Analytical contact angle: 100.001 +[getResults] ----->>>>> Difference to previous: 0.0534643 +[Timer] step=39000; percent=55.7143; passedTime=31.764; remTime=25.2483; MLUPs=5.24493 +[LatticeStatistics] step=39000; t=6.48703e+11; uMax=7.92289e-06; avEnergy=5.36876e-12; avRho=1 +[LatticeStatistics] step=39000; t=6.48703e+11; uMax=7.92289e-06; avEnergy=5.36876e-12; avRho=0.709501 +[getResults] ----->>>>> Contact angle: 101.025 ; Analytical contact angle: 100.001 +[getResults] ----->>>>> Difference to previous: 0.0493713 +[Timer] step=40000; percent=57.1429; passedTime=32.612; remTime=24.459; MLUPs=4.56537 +[LatticeStatistics] step=40000; t=6.65336e+11; uMax=7.39543e-06; avEnergy=4.68698e-12; avRho=1 +[LatticeStatistics] step=40000; t=6.65336e+11; uMax=7.39543e-06; avEnergy=4.68698e-12; avRho=0.709501 +[getResults] ----->>>>> Contact angle: 101.071 ; Analytical contact angle: 100.001 +[getResults] ----->>>>> Difference to previous: 0.0456135 +[Timer] step=41000; percent=58.5714; passedTime=33.964; remTime=24.0233; MLUPs=2.86899 +[LatticeStatistics] step=41000; t=6.81969e+11; uMax=6.90703e-06; avEnergy=4.09667e-12; avRho=1 +[LatticeStatistics] step=41000; t=6.81969e+11; uMax=6.90703e-06; avEnergy=4.09667e-12; avRho=0.709501 +[getResults] ----->>>>> Contact angle: 101.113 ; Analytical contact angle: 100.001 +[getResults] ----->>>>> Difference to 
previous: 0.0421616 +[Timer] step=42000; percent=60; passedTime=34.717; remTime=23.1447; MLUPs=5.14058 +[LatticeStatistics] step=42000; t=6.98603e+11; uMax=6.45471e-06; avEnergy=3.58514e-12; avRho=1 +[LatticeStatistics] step=42000; t=6.98603e+11; uMax=6.45471e-06; avEnergy=3.58514e-12; avRho=0.709501 +[getResults] ----->>>>> Contact angle: 101.152 ; Analytical contact angle: 100.001 +[getResults] ----->>>>> Difference to previous: 0.038989 +[Timer] step=43000; percent=61.4286; passedTime=35.477; remTime=22.2763; MLUPs=5.1 +[LatticeStatistics] step=43000; t=7.15236e+11; uMax=6.03573e-06; avEnergy=3.14151e-12; avRho=1 +[LatticeStatistics] step=43000; t=7.15236e+11; uMax=6.03573e-06; avEnergy=3.14151e-12; avRho=0.709501 +[getResults] ----->>>>> Contact angle: 101.188 ; Analytical contact angle: 100.001 +[getResults] ----->>>>> Difference to previous: 0.0360718 +[Timer] step=44000; percent=62.8571; passedTime=36.241; remTime=21.4151; MLUPs=5.0733 +[LatticeStatistics] step=44000; t=7.3187e+11; uMax=5.64753e-06; avEnergy=2.75642e-12; avRho=1 +[LatticeStatistics] step=44000; t=7.3187e+11; uMax=5.64753e-06; avEnergy=2.75642e-12; avRho=0.709501 +[getResults] ----->>>>> Contact angle: 101.221 ; Analytical contact angle: 100.001 +[getResults] ----->>>>> Difference to previous: 0.0333882 +[Timer] step=45000; percent=64.2857; passedTime=37.001; remTime=20.5561; MLUPs=5.1 +[LatticeStatistics] step=45000; t=7.48503e+11; uMax=5.28778e-06; avEnergy=2.42184e-12; avRho=1 +[LatticeStatistics] step=45000; t=7.48503e+11; uMax=5.28778e-06; avEnergy=2.42184e-12; avRho=0.709501 +[getResults] ----->>>>> Contact angle: 101.252 ; Analytical contact angle: 100.001 +[getResults] ----->>>>> Difference to previous: 0.0309186 +[Timer] step=46000; percent=65.7143; passedTime=37.812; remTime=19.728; MLUPs=4.77928 +[LatticeStatistics] step=46000; t=7.65136e+11; uMax=4.95432e-06; avEnergy=2.13089e-12; avRho=1 +[LatticeStatistics] step=46000; t=7.65136e+11; uMax=4.95432e-06; avEnergy=2.13089e-12; 
avRho=0.709501 +[getResults] ----->>>>> Contact angle: 101.281 ; Analytical contact angle: 100.001 +[getResults] ----->>>>> Difference to previous: 0.028645 +[Timer] step=47000; percent=67.1429; passedTime=38.58; remTime=18.8796; MLUPs=5.04031 +[LatticeStatistics] step=47000; t=7.8177e+11; uMax=4.64516e-06; avEnergy=1.87763e-12; avRho=1 +[LatticeStatistics] step=47000; t=7.8177e+11; uMax=4.64516e-06; avEnergy=1.87763e-12; avRho=0.709501 +[getResults] ----->>>>> Contact angle: 101.307 ; Analytical contact angle: 100.001 +[getResults] ----->>>>> Difference to previous: 0.0265511 +[Timer] step=48000; percent=68.5714; passedTime=39.333; remTime=18.0276; MLUPs=5.14741 +[LatticeStatistics] step=48000; t=7.98403e+11; uMax=4.35844e-06; avEnergy=1.65695e-12; avRho=1 +[LatticeStatistics] step=48000; t=7.98403e+11; uMax=4.35844e-06; avEnergy=1.65695e-12; avRho=0.709501 +[getResults] ----->>>>> Contact angle: 101.332 ; Analytical contact angle: 100.001 +[getResults] ----->>>>> Difference to previous: 0.0246222 +[Timer] step=49000; percent=70; passedTime=40.109; remTime=17.1896; MLUPs=4.98842 +[LatticeStatistics] step=49000; t=8.15037e+11; uMax=4.09248e-06; avEnergy=1.46446e-12; avRho=1 +[LatticeStatistics] step=49000; t=8.15037e+11; uMax=4.09248e-06; avEnergy=1.46446e-12; avRho=0.709501 +[getResults] ----->>>>> Contact angle: 101.355 ; Analytical contact angle: 100.001 +[getResults] ----->>>>> Difference to previous: 0.0228445 +[Timer] step=50000; percent=71.4286; passedTime=40.863; remTime=16.3452; MLUPs=5.14058 +[LatticeStatistics] step=50000; t=8.3167e+11; uMax=3.84569e-06; avEnergy=1.29639e-12; avRho=1 +[LatticeStatistics] step=50000; t=8.3167e+11; uMax=3.84569e-06; avEnergy=1.29639e-12; avRho=0.709501 +[getResults] ----->>>>> Contact angle: 101.376 ; Analytical contact angle: 100.001 +[getResults] ----->>>>> Difference to previous: 0.0212059 +[Timer] step=51000; percent=72.8571; passedTime=41.803; remTime=15.5737; MLUPs=4.1234 +[LatticeStatistics] step=51000; 
t=8.48303e+11; uMax=3.61662e-06; avEnergy=1.14946e-12; avRho=1 +[LatticeStatistics] step=51000; t=8.48303e+11; uMax=3.61662e-06; avEnergy=1.14946e-12; avRho=0.709501 +[getResults] ----->>>>> Contact angle: 101.396 ; Analytical contact angle: 100.001 +[getResults] ----->>>>> Difference to previous: 0.0196948 +[Timer] step=52000; percent=74.2857; passedTime=42.56; remTime=14.7323; MLUPs=5.11346 +[LatticeStatistics] step=52000; t=8.64937e+11; uMax=3.40394e-06; avEnergy=1.02088e-12; avRho=1 +[LatticeStatistics] step=52000; t=8.64937e+11; uMax=3.40394e-06; avEnergy=1.02088e-12; avRho=0.709502 +[getResults] ----->>>>> Contact angle: 101.414 ; Analytical contact angle: 100.001 +[getResults] ----->>>>> Difference to previous: 0.0183011 +[Timer] step=53000; percent=75.7143; passedTime=43.348; remTime=13.9041; MLUPs=4.91878 +[LatticeStatistics] step=53000; t=8.8157e+11; uMax=3.20788e-06; avEnergy=9.08224e-13; avRho=1 +[LatticeStatistics] step=53000; t=8.8157e+11; uMax=3.20788e-06; avEnergy=9.08224e-13; avRho=0.709502 +[getResults] ----->>>>> Contact angle: 101.431 ; Analytical contact angle: 100.001 +[getResults] ----->>>>> Difference to previous: 0.0170152 +[Timer] step=54000; percent=77.1429; passedTime=44.105; remTime=13.0681; MLUPs=5.11346 +[LatticeStatistics] step=54000; t=8.98204e+11; uMax=3.02617e-06; avEnergy=8.09391e-13; avRho=1 +[LatticeStatistics] step=54000; t=8.98204e+11; uMax=3.02617e-06; avEnergy=8.09391e-13; avRho=0.709502 +[getResults] ----->>>>> Contact angle: 101.447 ; Analytical contact angle: 100.001 +[getResults] ----->>>>> Difference to previous: 0.0158284 +[Timer] step=55000; percent=78.5714; passedTime=44.933; remTime=12.2545; MLUPs=4.68116 +[LatticeStatistics] step=55000; t=9.14837e+11; uMax=2.85723e-06; avEnergy=7.22579e-13; avRho=1 +[LatticeStatistics] step=55000; t=9.14837e+11; uMax=2.85723e-06; avEnergy=7.22579e-13; avRho=0.709502 +[getResults] ----->>>>> Contact angle: 101.462 ; Analytical contact angle: 100.001 +[getResults] ----->>>>> 
Difference to previous: 0.0147329 +[Timer] step=56000; percent=80; passedTime=45.731; remTime=11.4328; MLUPs=4.85714 +[LatticeStatistics] step=56000; t=9.3147e+11; uMax=2.70009e-06; avEnergy=6.46228e-13; avRho=1 +[LatticeStatistics] step=56000; t=9.3147e+11; uMax=2.70009e-06; avEnergy=6.46228e-13; avRho=0.709502 +[getResults] ----->>>>> Contact angle: 101.475 ; Analytical contact angle: 100.001 +[getResults] ----->>>>> Difference to previous: 0.0137212 +[Timer] step=57000; percent=81.4286; passedTime=46.491; remTime=10.6032; MLUPs=5.0933 +[LatticeStatistics] step=57000; t=9.48104e+11; uMax=2.55387e-06; avEnergy=5.78988e-13; avRho=1 +[LatticeStatistics] step=57000; t=9.48104e+11; uMax=2.55387e-06; avEnergy=5.78988e-13; avRho=0.709502 +[getResults] ----->>>>> Contact angle: 101.488 ; Analytical contact angle: 100.001 +[getResults] ----->>>>> Difference to previous: 0.0127867 +[Timer] step=58000; percent=82.8571; passedTime=47.29; remTime=9.78414; MLUPs=4.85106 +[LatticeStatistics] step=58000; t=9.64737e+11; uMax=2.41774e-06; avEnergy=5.19691e-13; avRho=1 +[LatticeStatistics] step=58000; t=9.64737e+11; uMax=2.41774e-06; avEnergy=5.19691e-13; avRho=0.709502 +[getResults] ----->>>>> Contact angle: 101.5 ; Analytical contact angle: 100.001 +[getResults] ----->>>>> Difference to previous: 0.0119233 +[Timer] step=59000; percent=84.2857; passedTime=48.069; remTime=8.96202; MLUPs=4.96923 +[LatticeStatistics] step=59000; t=9.81371e+11; uMax=2.29096e-06; avEnergy=4.67325e-13; avRho=1 +[LatticeStatistics] step=59000; t=9.81371e+11; uMax=2.29096e-06; avEnergy=4.67325e-13; avRho=0.709502 +[getResults] ----->>>>> Contact angle: 101.511 ; Analytical contact angle: 100.001 +[getResults] ----->>>>> Difference to previous: 0.0111253 +[Timer] step=60000; percent=85.7143; passedTime=48.812; remTime=8.13533; MLUPs=5.21669 +[LatticeStatistics] step=60000; t=9.98004e+11; uMax=2.17282e-06; avEnergy=4.21015e-13; avRho=1 +[LatticeStatistics] step=60000; t=9.98004e+11; uMax=2.17282e-06; 
avEnergy=4.21015e-13; avRho=0.709502 +[getResults] ----->>>>> Contact angle: 101.522 ; Analytical contact angle: 100.001 +[getResults] ----->>>>> Difference to previous: 0.0103875 +[Timer] step=61000; percent=87.1429; passedTime=49.588; remTime=7.31626; MLUPs=4.99485 +[LatticeStatistics] step=61000; t=1.01464e+12; uMax=2.06267e-06; avEnergy=3.80001e-13; avRho=1 +[LatticeStatistics] step=61000; t=1.01464e+12; uMax=2.06267e-06; avEnergy=3.80001e-13; avRho=0.709502 +[getResults] ----->>>>> Contact angle: 101.531 ; Analytical contact angle: 100.001 +[getResults] ----->>>>> Difference to previous: 0.00970526 +[Timer] step=62000; percent=88.5714; passedTime=50.361; remTime=6.49819; MLUPs=5.00775 +[LatticeStatistics] step=62000; t=1.03127e+12; uMax=1.95992e-06; avEnergy=3.43622e-13; avRho=1 +[LatticeStatistics] step=62000; t=1.03127e+12; uMax=1.95992e-06; avEnergy=3.43622e-13; avRho=0.709502 +[getResults] ----->>>>> Contact angle: 101.54 ; Analytical contact angle: 100.001 +[getResults] ----->>>>> Difference to previous: 0.00907408 +[Timer] step=63000; percent=90; passedTime=51.215; remTime=5.69056; MLUPs=4.53864 +[LatticeStatistics] step=63000; t=1.0479e+12; uMax=1.86402e-06; avEnergy=3.11307e-13; avRho=1 +[LatticeStatistics] step=63000; t=1.0479e+12; uMax=1.86402e-06; avEnergy=3.11307e-13; avRho=0.709502 +[getResults] ----->>>>> Contact angle: 101.549 ; Analytical contact angle: 100.001 +[getResults] ----->>>>> Difference to previous: 0.00849 +[Timer] step=64000; percent=91.4286; passedTime=51.987; remTime=4.87378; MLUPs=5.02073 +[LatticeStatistics] step=64000; t=1.06454e+12; uMax=1.77445e-06; avEnergy=2.82557e-13; avRho=1 +[LatticeStatistics] step=64000; t=1.06454e+12; uMax=1.77445e-06; avEnergy=2.82557e-13; avRho=0.709502 +[getResults] ----->>>>> Contact angle: 101.557 ; Analytical contact angle: 100.001 +[getResults] ----->>>>> Difference to previous: 0.0079493 +[Timer] step=65000; percent=92.8571; passedTime=52.744; remTime=4.05723; MLUPs=5.12021 
+[LatticeStatistics] step=65000; t=1.08117e+12; uMax=1.69076e-06; avEnergy=2.56939e-13; avRho=1 +[LatticeStatistics] step=65000; t=1.08117e+12; uMax=1.69076e-06; avEnergy=2.56939e-13; avRho=0.709502 +[getResults] ----->>>>> Contact angle: 101.564 ; Analytical contact angle: 100.001 +[getResults] ----->>>>> Difference to previous: 0.0074486 +[Timer] step=66000; percent=94.2857; passedTime=53.512; remTime=3.24315; MLUPs=5.04031 +[LatticeStatistics] step=66000; t=1.0978e+12; uMax=1.6125e-06; avEnergy=2.34076e-13; avRho=1 +[LatticeStatistics] step=66000; t=1.0978e+12; uMax=1.6125e-06; avEnergy=2.34076e-13; avRho=0.709502 +[getResults] ----->>>>> Contact angle: 101.571 ; Analytical contact angle: 100.001 +[getResults] ----->>>>> Difference to previous: 0.00698476 +[Timer] step=67000; percent=95.7143; passedTime=54.282; remTime=2.43054; MLUPs=5.02724 +[LatticeStatistics] step=67000; t=1.11444e+12; uMax=1.53927e-06; avEnergy=2.13639e-13; avRho=1 +[LatticeStatistics] step=67000; t=1.11444e+12; uMax=1.53927e-06; avEnergy=2.13639e-13; avRho=0.709502 +[getResults] ----->>>>> Contact angle: 101.578 ; Analytical contact angle: 100.001 +[getResults] ----->>>>> Difference to previous: 0.00655492 +[Timer] step=68000; percent=97.1429; passedTime=55.045; remTime=1.61897; MLUPs=5.07995 +[LatticeStatistics] step=68000; t=1.13107e+12; uMax=1.4707e-06; avEnergy=1.95341e-13; avRho=1 +[LatticeStatistics] step=68000; t=1.13107e+12; uMax=1.4707e-06; avEnergy=1.95341e-13; avRho=0.709502 +[getResults] ----->>>>> Contact angle: 101.584 ; Analytical contact angle: 100.001 +[getResults] ----->>>>> Difference to previous: 0.00615643 +[Timer] step=69000; percent=98.5714; passedTime=55.805; remTime=0.808768; MLUPs=5.1 +[LatticeStatistics] step=69000; t=1.1477e+12; uMax=1.40645e-06; avEnergy=1.78932e-13; avRho=1 +[LatticeStatistics] step=69000; t=1.1477e+12; uMax=1.40645e-06; avEnergy=1.78932e-13; avRho=0.709502 +[getResults] ----->>>>> Contact angle: 101.59 ; Analytical contact angle: 100.001 
+[getResults] ----->>>>> Difference to previous: 0.00578685 +[Timer] step=70000; percent=100; passedTime=56.543; remTime=0; MLUPs=5.25203 +[LatticeStatistics] step=70000; t=1.16434e+12; uMax=1.34621e-06; avEnergy=1.64192e-13; avRho=1 +[LatticeStatistics] step=70000; t=1.16434e+12; uMax=1.34621e-06; avEnergy=1.64192e-13; avRho=0.709502 +[getResults] ----->>>>> Contact angle: 101.595 ; Analytical contact angle: 100.001 +[getResults] ----->>>>> Difference to previous: 0.00544394 +[Timer] +[Timer] ----------------Summary:Timer---------------- +[Timer] measured time (rt) : 56.801s +[Timer] measured time (cpu): 1218.501s +[Timer] average MLUPs : 4.777 +[Timer] average MLUPps: 0.017 +[Timer] --------------------------------------------- +``` + +模型验证且运行成功,测试通过。 + +## 3. 性能测试 + +### 3.1 测试平台对比 + +具体参数: + +| | 鲲鹏920集群 | Intel x86集群 | +| -------------------------- | ------------------------------------------------- | ------------------------------------------------- | +| 服务器型号 | TaiShan 2280 V2 | XH321 V5 | +| CPU | 2 * Kunpeng 920 | 2 * Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz | +| 内存 | 488G | 186G | +| Infiniband | Mellanox Technologies MT27800 Family [ConnectX-5] | Mellanox Technologies MT27700 Family [ConnectX-4] | +| 总节点数(不包括控制节点) | 3 | 4 | +| 系统 | openEuler-21.03 | CentOS-7 | +| Kernel | 4.18.0-147.el8.aarch64 | 4.18.0-26-generic | + +系统拓扑图比较: + +
+ +
+
鲲鹏920集群
+
+ +
+ +
+
Intel x86集群
+
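+### 3.2 Software Environment Comparison
+
+| | Kunpeng 920 cluster | Intel x86 cluster |
+| -------- | ------------------------------------------------ | ----------------------------------------------------------------- |
+| gcc | gcc version 9.3.1 (GCC) | gcc version 7.5.0 (binhub) |
+| compiler | Bisheng Compiler 1.3.3.b023 clang version 10.0.1 | icpc (ICC) 2021.5.0 20211109 |
+| mpi | Hyper MPI v1.0.1 | Intel(R) MPI Library for Linux* OS, Version 2021.5 Build 20211102 |
+| OpenBLAS | tsv110p-0.3.21 | not used |
+| KML | 1.2.0 | not used |
+| OpenLB | 1.4r0 | 1.4r0 |
+
+### 3.3 System Performance Comparison
+
+#### 3.3.1 Memory Bandwidth Test
+
+Memory bandwidth was measured with the [STREAM](https://www.cs.virginia.edu/stream/) benchmark. Memory bandwidth is the rate at which the processor can read data from, or write data to, main memory. STREAM measures it with four kernels:
+
+| Kernel | Operation | bytes/iter | flops/iter |
+| ----- | ----------------- | ---------- | ---------- |
+| Copy | a(i) = b(i) | 16 | 0 |
+| Scale | a(i) = p * b(i) | 16 | 1 |
+| Add | a(i) = b(i) + c(i) | 24 | 1 |
+| Triad | a(i) = b(i) + p * c(i) | 24 | 2 |
+
+- Copy: reads one value and writes it back, which is two memory accesses per element and the simplest access pattern.
+- Scale: reads one value, multiplies it by a scalar and writes the result, which is two memory accesses per element.
+- Add: reads two values, adds them and writes the result, which is three memory accesses per element.
+- Triad: reads two values, combines them with a multiply-add and writes the result, which is three memory accesses per element.
+
+The STREAM array size is given by `STREAM_ARRAY_SIZE`, which is passed to the compiler as `-DSTREAM_ARRAY_SIZE`. Each array element is 8 bytes. The size is computed as:
+
+$$
+\mathrm{STREAM\_ARRAY\_SIZE} = \left\lfloor \frac{L3 \times 1024 \times 1024 \times 4.1 \times N_{CPU}}{8} \right\rfloor
+$$
+
+Here $L3$ is the L3 cache size of the CPU in MB and $N_{CPU}$ is the number of CPU sockets. The factor 4.1 keeps each array slightly above four times the cache size. This follows STREAM's rule that every array must be at least four times the size of the available cache.
+
+The Kunpeng 920 cluster has a 192 MB L3 cache, which gives:
+
+$$
+\mathrm{STREAM\_ARRAY\_SIZE} = \left\lfloor \frac{192 \times 1024 \times 1024 \times 4.1 \times 2}{8} \right\rfloor = 206359756
+$$
+
+The Intel x86 cluster has a 28160 KB (27.5 MB) L3 cache. Because this size is given in KB, one factor of 1024 drops out:
+
+$$
+\mathrm{STREAM\_ARRAY\_SIZE} = \left\lfloor \frac{28160 \times 1024 \times 4.1 \times 2}{8} \right\rfloor = 29556736
+$$
+
+The compile command and STREAM output on the Kunpeng 920 cluster:
+
+```shell
+[xiehz20@n1 stream]$ clang -Ofast -fopenmp -march=armv8-a -mtune=tsv110 -mcmodel=large -DSTREAM_ARRAY_SIZE=206359756 -DNTIMES=30 stream.c -o arm/stream
+/usr/bin/ld: /opt/app/kunpeng/bisheng/bin/../lib/libomp.so: .dynsym local symbol at index 26 (>= sh_info of 1)
+[xiehz20@n1 stream]$ cd arm/
+[xiehz20@n1 arm]$ ./stream
+-------------------------------------------------------------
+STREAM version $Revision: 5.10 $
+-------------------------------------------------------------
+This system uses 8 bytes per array element.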
+-------------------------------------------------------------
+Array size = 206359756 (elements), Offset = 0 (elements)
+Memory per array = 1574.4 MiB (= 1.5 GiB).
+Total memory required = 4723.2 MiB (= 4.6 GiB).
+Each kernel will be executed 30 times.
+ The *best* time for each kernel (excluding the first iteration)
+ will be used to compute the reported bandwidth.
+-------------------------------------------------------------
+Number of Threads requested = 96
+Number of Threads counted = 96
+-------------------------------------------------------------
+Your clock granularity/precision appears to be 1 microseconds.
+Each test below will take on the order of 38268 microseconds.
+ (= 38268 clock ticks)
+Increase the size of the arrays if this shows that
+you are not getting at least 20 clock ticks per test.
+-------------------------------------------------------------
+WARNING -- The above is only a rough guideline.
+For best results, please be sure you know the
+precision of your system timer.
+-------------------------------------------------------------
+Function Best Rate MB/s Avg time Min time Max time
+Copy: 130370.1 0.033824 0.025326 0.044649
+Scale: 100388.3 0.049020 0.032890 0.063459
+Add: 109868.0 0.071518 0.045078 0.217662
+Triad: 109668.5 0.053110 0.045160 0.075044
+-------------------------------------------------------------
+Solution Validates: avg error less than 1.000000e-13 on all three arrays
+-------------------------------------------------------------
+```
+
+The compile command and STREAM output on the Intel x86 cluster:
+
+```shell
+[xiehz20@n1 stream]$ icc -Ofast -qopenmp -xHost -mcmodel=medium -DSTREAM_ARRAY_SIZE=29556736 -DNTIMES=30 stream.c -o x86/stream
+[xiehz20@n1 stream]$ cd x86/
+[xiehz20@n1 x86]$ ./stream
+-------------------------------------------------------------
+STREAM version $Revision: 5.10 $
+-------------------------------------------------------------
+This system uses 8 bytes per array element.
+------------------------------------------------------------- +Array size = 29556736 (elements), Offset = 0 (elements) +Memory per array = 225.5 MiB (= 0.2 GiB). +Total memory required = 676.5 MiB (= 0.7 GiB). +Each kernel will be executed 30 times. + The *best* time for each kernel (excluding the first iteration) + will be used to compute the reported bandwidth. +------------------------------------------------------------- +Number of Threads requested = 40 +Number of Threads counted = 40 +------------------------------------------------------------- +Your clock granularity/precision appears to be 1 microseconds. +Each test below will take on the order of 3487 microseconds. + (= 3487 clock ticks) +Increase the size of the arrays if this shows that +you are not getting at least 20 clock ticks per test. +------------------------------------------------------------- +WARNING -- The above is only a rough guideline. +For best results, please be sure you know the +precision of your system timer. 
+-------------------------------------------------------------
+Function Best Rate MB/s Avg time Min time Max time
+Copy: 127779.4 0.004375 0.003701 0.017825
+Scale: 117229.3 0.004689 0.004034 0.016958
+Add: 130637.9 0.006610 0.005430 0.018949
+Triad: 134707.2 0.005888 0.005266 0.018521
+-------------------------------------------------------------
+Solution Validates: avg error less than 1.000000e-13 on all three arrays
+-------------------------------------------------------------
+```
+
+#### 3.3.2 InfiniBand Network Benchmark
+
+InfiniBand communication between nodes was benchmarked with the perftest tools `ib_read_bw`, `ib_write_bw` and `ib_send_bw`. These measure RDMA read, RDMA write and send bandwidth respectively. In each pair of commands, the first one starts the server on node `n1`. The second one runs on `n2` and connects to it as the client.
+
+InfiniBand test on the Kunpeng 920 cluster:
+
+```shell
+[xiehz20@n1 ~]$ ib_read_bw -F
+[xiehz20@n2 ~]$ ib_read_bw -F n1
+---------------------------------------------------------------------------------------
+ RDMA_Read BW Test
+ Dual-port : OFF Device : mlx5_0
+ Number of qps : 1 Transport type : IB
+ Connection type : RC Using SRQ : OFF
+ TX depth : 128
+ CQ Moderation : 100
+ Mtu : 4096[B]
+ Link type : IB
+ Outstand reads : 16
+ rdma_cm QPs : OFF
+ Data ex. method : Ethernet
+---------------------------------------------------------------------------------------
+ #bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps]
+ 65536 1000 11418.94 11398.46 0.182375
+---------------------------------------------------------------------------------------
+[xiehz20@n1 ~]$ ib_write_bw -F
+[xiehz20@n2 ~]$ ib_write_bw -F n1
+---------------------------------------------------------------------------------------
+ RDMA_Write BW Test
+ Dual-port : OFF Device : mlx5_0
+ Number of qps : 1 Transport type : IB
+ Connection type : RC Using SRQ : OFF
+ TX depth : 128
+ CQ Moderation : 100
+ Mtu : 4096[B]
+ Link type : IB
+ Max inline data : 0[B]
+ rdma_cm QPs : OFF
+ Data ex. method : Ethernet
+---------------------------------------------------------------------------------------
+ #bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps]
+ 65536 5000 11509.98 11484.90 0.183758
+---------------------------------------------------------------------------------------
+[xiehz20@n1 ~]$ ib_send_bw -F
+[xiehz20@n2 ~]$ ib_send_bw -F n1
+---------------------------------------------------------------------------------------
+ Send BW Test
+ Dual-port : OFF Device : mlx5_0
+ Number of qps : 1 Transport type : IB
+ Connection type : RC Using SRQ : OFF
+ TX depth : 128
+ CQ Moderation : 100
+ Mtu : 4096[B]
+ Link type : IB
+ Max inline data : 0[B]
+ rdma_cm QPs : OFF
+ Data ex. method : Ethernet
+---------------------------------------------------------------------------------------
+ #bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps]
+ 65536 1000 11488.91 11472.76 0.183564
+---------------------------------------------------------------------------------------
+```
+
+InfiniBand test on the Intel x86 cluster:
+
+```shell
+[xiehz20@n1 ~]$ ib_read_bw -F -d mlx5_1
+[xiehz20@n2 ~]$ ib_read_bw -F -d mlx5_1 n1
+---------------------------------------------------------------------------------------
+ RDMA_Read BW Test
+ Dual-port : OFF Device : mlx5_1
+ Number of qps : 1 Transport type : IB
+ Connection type : RC Using SRQ : OFF
+ TX depth : 128
+ CQ Moderation : 100
+ Mtu : 4096[B]
+ Link type : IB
+ Outstand reads : 16
+ rdma_cm QPs : OFF
+ Data ex. method : Ethernet
+---------------------------------------------------------------------------------------
+ #bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps]
+ 65536 1000 10837.63 10832.09 0.173313
+---------------------------------------------------------------------------------------
+[xiehz20@n1 ~]$ ib_write_bw -F -d mlx5_1
+[xiehz20@n2 ~]$ ib_write_bw -F -d mlx5_1 n1
+---------------------------------------------------------------------------------------
+ RDMA_Write BW Test
+ Dual-port : OFF Device : mlx5_1
+ Number of qps : 1 Transport type : IB
+ Connection type : RC Using SRQ : OFF
+ TX depth : 128
+ CQ Moderation : 100
+ Mtu : 4096[B]
+ Link type : IB
+ Max inline data : 0[B]
+ rdma_cm QPs : OFF
+ Data ex. method : Ethernet
+---------------------------------------------------------------------------------------
+ #bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps]
+ 65536 5000 11165.22 11142.22 0.178275
+---------------------------------------------------------------------------------------
+[xiehz20@n1 ~]$ ib_send_bw -F -d mlx5_1
+[xiehz20@n2 ~]$ ib_send_bw -F -d mlx5_1 n1
+---------------------------------------------------------------------------------------
+ Send BW Test
+ Dual-port : OFF Device : mlx5_1
+ Number of qps : 1 Transport type : IB
+ Connection type : RC Using SRQ : OFF
+ TX depth : 128
+ CQ Moderation : 100
+ Mtu : 4096[B]
+ Link type : IB
+ Max inline data : 0[B]
+ rdma_cm QPs : OFF
+ Data ex. method : Ethernet
+---------------------------------------------------------------------------------------
+ #bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps]
+ 65536 1000 11600.28 11599.48 0.185592
+---------------------------------------------------------------------------------------
+```
+
+### 3.4 Benchmark Selection
+
+Three programs from the `examples` directory serve as the benchmarks for this port. They were chosen to balance runtime against the range of functionality they exercise:
+
+- **multiComponent/contactAngle2d/contactAngle2d**
+  In this example a semi-circular droplet of fluid is initialised within a different fluid at a solid boundary. The contact angle is measured as the droplet comes to equilibrium. This is compared with the analytical angle (100 degrees) predicted by the parameters set for the boundary.
+
+  This example demonstrates how to use the wetting solid boundaries for the free-energy model with two fluid components.
+- **porousMedia/porousPoiseuille3d/porousPoiseuille3d**
+  This example examines a 3D Poiseuille flow with porous media. Two porous media LB methods can be used here:
+  - Spaid and Phelan (doi:10.1063/1.869392), or
+  - Guo and Zhao (doi:10.1103/PhysRevE.66.036304)
+- **laminar/bstep3d/bstep3d**
+  The implementation of a backward facing step. It is furthermore shown how to use checkpointing to save the state of the simulation regularly.
+
+In practice, **contactAngle2d** and **porousPoiseuille3d** work on small data sets and finish quickly. **bstep3d** runs many more iterations and takes considerably longer.
+
+### 3.5 Choice of Performance Metrics
+
+#### 3.5.1 Candidate Metrics
+
+The most important simulation parameters of the OpenLB test programs are:
+
+1. `N`, the problem size. This is the number of lattice cells per dimension, so a 2D simulation has N^2 cells in total and a 3D simulation N^3.
+2. The cell size `x`. In a 2D simulation each cell covers an area of x^2.
+3. The computational domain, i.e. the product of the cell size `x` and the problem size `N`. The domain extends N·x along each dimension.
+
+Every OpenLB example reports four performance metrics at the end of a run. Their values depend on the simulation parameters above:
+
+1. measured time (rt, real time)
+2. measured time (cpu)
+3. average MLUPs
+4. average MLUPps
+
+These four metrics are used to evaluate the performance of the OpenLB port.
+
+#### 3.5.2 MLUPs & MLUPps
+
+OpenLB is based on the lattice Boltzmann method (LBM), a widely used class of mesoscopic methods for numerical fluid simulation. MLUPs is the standard performance metric for LBM codes.
+
+LBM performance is measured in million lattice (cell) updates per second (Million Lattice UPdates per second). Every cell is updated once per time step, so:
+
+$$
+MLUPs = \frac{n_l \times n_t}{T \times 10^{6}}
+$$
+
+Here $n_l$ is the number of lattice cells, $n_t$ the number of time steps performed, and $T$ the wall-clock compute time in seconds.
+
+MLUPps (Million Lattice UPdates per core and second) normalizes MLUPs by the number of compute units (cores):
+
+$$
+MLUPps = \frac{MLUPs}{cores}
+$$
+
+#### 3.5.3 measured time
+
+`measured time` reports two values. The first is the real (wall-clock) time. The second is the total CPU time of the process: the CPU time spent in user-mode code plus the CPU time spent in the kernel, including system calls.
+
+The two relate as follows:
+
+> - real < CPU: the process is compute-bound (CPU bound) and benefits from parallel execution on multiple cores.
+> - real ≈ CPU: the process is compute-bound but does not run in parallel.
+> - real > CPU: the process is I/O-bound (I/O bound), and running on multiple cores brings little benefit.
+
+In the parallel test runs, OpenLB's real time is far below its CPU time. OpenLB is therefore a compute-bound application.
+
+#### 3.5.4 Metric Analysis
+
+MLUPs is used to compare OpenLB performance.
+
+MLUPps is used to compare scalability. It shows whether each compute unit still carries a reasonable share of the work when the program runs in parallel across many nodes and cores.
+
+measured time (runtime and CPU time) gives a rough picture of execution time and supports the performance analysis.
+
+### 3.6 Performance Analysis
+
+#### 3.6.1 Test Method
+
+In the OpenLB root directory, edit the relevant `config.mk` file to set the build options. The tests compare three builds:
+
+- a serial GCC build
+- an Intel MPI parallel build
+- a Hyper MPI parallel build using the Bisheng compiler
+
+The `config.mk` settings used are:
+
+```makefile
+# GCC serial build settings
+CXX := g++
+CC := gcc
+OPTIM := -O3 -Wall -march=native -mtune=native
+DEBUG := -g -Wall -DOLB_DEBUG
+CXXFLAGS := $(OPTIM)
+CXXFLAGS += -std=c++14
+ARPRG := ar
+LDFLAGS :=
+PARALLEL_MODE := OFF
+MPIFLAGS :=
+OMPFLAGS := -fopenmp
+BUILDTYPE := generic
+FEATURES :=
+```
+
+```makefile
+# Intel MPI parallel build settings
+CXX := mpiicpc
+CC := mpiicc # necessary for zlib, for Intel use icc
+OPTIM := -O3 -Wall -xHost # optional for Intel compiler
+DEBUG := -g -Wall -DOLB_DEBUG
+CXXFLAGS := $(OPTIM)
+CXXFLAGS += -std=c++14
+ARPRG := xiar # mandatory for intel compiler
+LDFLAGS :=
+PARALLEL_MODE := HYBRID
+MPIFLAGS :=
+OMPFLAGS := -qopenmp
+BUILDTYPE := generic
+FEATURES :=
+```
+
+```makefile
+# Hyper MPI parallel build settings
+## The placeholders below must be adapted to your platform and system
+# MTUNE_PARAM - value for -mtune; choose one that suits the machine
+# MARCH_PARAM - value for -march; choose one that suits the machine
+# CPU_FEATURE_PARAM - platform extension features appended to -march, e.g. crc
+# BISHENG_INCLUDE - include directory of the Bisheng compiler
+# KML_INCLUDE - include directory of the KML math library
+# OPENBLAS_INCLUDE - include directory of the OpenBLAS linear algebra library
+# BISHENG_LIB - library directory of the Bisheng compiler
+# KML_LIB - library directory of the KML math library
+# OPENBLAS_LIB - library directory of the OpenBLAS library
+# IPM_LIB - library directory of IPM, which profiles MPI programs through instrumentation; enable as needed
+## -------------------------------------
+CXX := mpicxx
+CC := mpicc # necessary for zlib, for Intel use icc
+OPTIM := -Ofast -ffast-math -finline-functions -ffp-contract=fast -Wall -mtune=MTUNE_PARAM -march=MARCH_PARAM+CPU_FEATURE_PARAM -I BISHENG_INCLUDE -I KML_INCLUDE -I OPENBLAS_INCLUDE
+DEBUG := -g -Wall -DOLB_DEBUG
+DEBUGNoWall := -g -DOLB_DEBUG
+CXXFLAGS := $(OPTIM)
+# for debug mode
+#CXXFLAGS += $(DEBUGNoWall)
+#CXXFLAGS := $(DEBUG)
+
+# PGO (profile-guided optimization) flags
+PGOCollect := -fprofile-instr-generate
+PGOOptim := -fprofile-instr-use=code.profdata
+
+CXXFLAGS += -std=c++14
+ARPRG := ar
+LDFLAGS := -fuse-ld=lld -flto -L OPENBLAS_LIB -lopenblas -L KML_LIB -lkm -lm -lkfft -L BISHENG_LIB -ljemalloc -Wl,-z,muldefs
+# for IPM analysis (static linking)
+# LDFLAGS += -LIPM_LIB -lipm
+# for pgo optimize
+#LDFLAGS += $(PGOCollect)
+#LDFLAGS += $(PGOOptim)
+PARALLEL_MODE := HYBRID
+MPIFLAGS :=
+OMPFLAGS := -fopenmp
+BUILDTYPE := generic
+FEATURES := OPENBLAS
+```
+
+Build with `make samples -j`, then run the test programs in the `examples` directory. Each program was run three times in the same way, and the results were averaged.
+
+The run commands for the three builds are listed below. As written, the Intel MPI script starts 8 ranks × 20 OpenMP threads = 160 cores on 4 nodes. The Hyper MPI script starts 12 ranks × 24 threads = 288 cores on 3 nodes.
+
+- GCC
+
+```shell
+#!/bin/bash
+
+testFile=$1
+
+# run serially, without MPI
+${testFile}
+```
+
+- Intel MPI
+
+```shell
+#!/bin/bash
+
+testFile=$1
+
+export I_MPI_PIN_DOMAIN=omp
+export OMP_NUM_THREADS=20
+
+mpirun --print-rank-map -np 8 -ppn 2 --map-by ppr:1:numa:pe=20 -machinefile nodes ${testFile}
+```
+
+- Hyper MPI
+
+```shell
+#!/bin/bash
+
+testFile=$1
+
+export OMP_NUM_THREADS=24
+export OMP_PROC_BIND=true
+export OMP_PLACES=cores
+
+mpirun -machinefile nodes -np 12 -npernode 4 --bind-to numa --mca btl ^vader,tcp,openib --map-by numa --rank-by numa \
+ -x UCX_TLS=sm,ud_x -x UCX_NET_DEVICES=mlx5_0:1 \
+ -x UCX_BUILTIN_BCAST_ALGORITHM=3 \
+ -x UCX_BUILTIN_ALLREDUCE_ALGORITHM=6 \
+ -x UCX_BUILTIN_BARRIER_ALGORITHM=5 \
+ -x
UCX_BUILTIN_DEGREE_INTRA_FANOUT=3 \ + -x UCX_BUILTIN_DEGREE_INTRA_FANIN=2 \ + -x UCX_BUILTIN_DEGREE_INTER_FANOUT=7 \ + -x UCX_BUILTIN_DEGREE_INTER_FANIN=7 \ + --report-bindings ${testFile} +``` + +Hardware limits on the Intel x86 cluster prevented runs with more than 160 cores, so no such data are reported for that platform. + +#### 3.6.2 GCC Baseline + +The default GCC build configuration was tested and serves as the baseline for comparison. The results are: + +| Program | runtime (s) | CPU time (s) | Average MLUPs | Average MLUPps | +| ------------------ | ----------- | ------------ | ------------- | -------------- | +| bstep3d | 960.747 | 850.372 | 6.238 | 6.238 | +| contactAngle2d | 171.512 | 159.074 | 1.584 | 1.584 | +| porousPoiseuille3d | 11.818 | 8.712 | 2.487 | 2.487 | + +These are serial runs on a single CPU core. When their CPU time and Average MLUPps are compared with the parallel runs, the difference in core count must be taken into account. + +#### 3.6.3 Execution Time (runtime) Test + +The three programs are compared individually. For both runtime and CPU time, lower is better. + +
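The runtime versus CPU-time relationship described in 3.5.3 can be checked numerically. The Python sketch below is for illustration only. The helper name `cpu_utilization` is hypothetical and not part of OpenLB. The helper divides the reported CPU time by the wall-clock runtime, and then by the number of cores used. Applied to the serial GCC baseline from 3.6.2, it gives CPU/real ratios of about 0.885, 0.927 and 0.737, all below 1. A well-parallelized run would instead show a ratio well above 1 and a per-core utilization close to 100%.

```python
def cpu_utilization(runtime_s: float, cpu_time_s: float, cores: int = 1) -> dict:
    """Relate wall-clock runtime to total CPU time (hypothetical helper).

    ratio       = CPU time / runtime  (> 1 means several cores were busy at once)
    utilization = ratio / cores       (close to 1.0 means every core was kept busy)
    """
    if runtime_s <= 0 or cores <= 0:
        raise ValueError("runtime and core count must be positive")
    ratio = cpu_time_s / runtime_s
    return {"ratio": ratio, "utilization": ratio / cores}


# Serial GCC baseline from the table in 3.6.2: (runtime in s, CPU time in s), 1 core.
baseline = {
    "bstep3d": (960.747, 850.372),
    "contactAngle2d": (171.512, 159.074),
    "porousPoiseuille3d": (11.818, 8.712),
}

for name, (runtime, cpu) in baseline.items():
    stats = cpu_utilization(runtime, cpu, cores=1)
    print(f"{name}: CPU/real = {stats['ratio']:.3f}, "
          f"per-core utilization = {stats['utilization']:.1%}")
```

For the parallel runs, pass the total number of cores, counted as MPI ranks × OpenMP threads, as `cores`.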
+ +
+
bstep3d runtime
+
+ +
+ +
+
contactAngle2d runtime
+
+ +
+ +
+
porousPoiseuille3d runtime
+
+ +
+ +
+
bstep3d CPU Time
+
+ +
+ +
+
contactAngle2d CPU Time
+
+ +
+ +
+
porousPoiseuille3d CPU Time
+
+ +#### 3.6.4 MLUPs Performance Test + +MLUPs is the number of million lattice (cell) updates per second. Higher is better. + +
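As a cross-check of the formula in 3.5.2, MLUPs can be computed from three quantities: the lattice size, the number of time steps and the elapsed wall-clock time. The following is a minimal Python sketch. The function name and the example lattice are hypothetical and do not correspond to any of the tested cases.

```python
def mlups(n_cells: int, n_steps: int, seconds: float) -> float:
    """Million lattice (cell) updates per second: n_cells * n_steps / T / 10**6."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return n_cells * n_steps / seconds / 1e6


# Hypothetical example: a 100 x 100 x 100 lattice advanced 1000 time steps in 50 s.
print(mlups(100 * 100 * 100, 1000, 50.0))  # 20.0
```

The function divides by the time before scaling by 10^6. This keeps the arithmetic exact for round inputs like the example above.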
+ +
+
bstep3d performance
+
+ +
+ +
+
contactAngle2d performance
+
+ +
+ +
+
porousPoiseuille3d performance
+
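+ +#### 3.6.5 MLUPps Scalability Test + +MLUPps measures the computational load handled by each processing unit. Because it is derived from MLUPs, it indicates only whether the load is balanced effectively across units, not absolute performance. + +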
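The following is a minimal Python sketch of how MLUPps, and a scaling efficiency derived from it, could be computed. It rests on two assumptions. First, `cores` counts every processing unit in use; for hybrid MPI+OpenMP runs that is MPI ranks × OpenMP threads. Second, efficiency is measured against a single-core MLUPps baseline such as the GCC run in 3.6.2. The helper names and the numbers in the example are hypothetical, not measured results.

```python
def mlupps(total_mlups: float, cores: int) -> float:
    """MLUPs per processing unit (core), as defined in 3.5.2."""
    if cores <= 0:
        raise ValueError("core count must be positive")
    return total_mlups / cores


def scaling_efficiency(total_mlups: float, cores: int, baseline_mlupps: float) -> float:
    """Per-core throughput relative to a single-core baseline (1.0 = ideal)."""
    return mlupps(total_mlups, cores) / baseline_mlupps


# Hypothetical run: 4 MPI ranks x 12 OpenMP threads = 48 cores reaching 240 MLUPs,
# compared against a hypothetical single-core baseline of 6.0 MLUPps.
cores = 4 * 12
print(mlupps(240.0, cores))                   # 5.0
print(scaling_efficiency(240.0, cores, 6.0))  # 0.8333333333333334
```

If the per-core value stays roughly flat as the core count grows, the work is being distributed evenly.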
+ +
+
bstep3d MPI scalability
+
+ +
+ +
+
contactAngle2d MPI scalability
+
+ +
+ +
+
porousPoiseuille3d MPI scalability
+
+ +### 3.7 Performance Test Conclusions + +The runs were compared on runtime, CPU time and the LBM metrics MLUPs and MLUPps. After thorough tuning, the OpenLB port running on openEuler on the Kunpeng 920 cluster performs far better than the serial GCC build on the x86 platform, which is a single-core baseline (see 3.6.2). Compared with the Intel cluster running CentOS, the Kunpeng platform is on average **2 - 5** MLUPs higher. Scalability on the Kunpeng platform is also good: the load stays stably balanced at high core counts. This holds for both the smaller, short-running programs (**contactAngle2d** and **porousPoiseuille3d**) and the larger, long-running program (**bstep3d**). In every case, the Kunpeng platform keeps a consistent performance and scalability advantage over the Intel platform. + +## 4. Accuracy Test + +### 4.1 Test Program + +The porousPoiseuille3d program was chosen for the accuracy test. It simulates and checks a 3D Poiseuille flow through a porous medium. + +When run, the program writes gnuplot data for plotting to the tmp directory, in the following form: + +```shell +0 0 0 +0.01 0.0136234 0.01069 +0.02 0.026053 0.0213801 +0.03 0.0373948 0.0320701 +0.04 0.0477452 0.0427601 +0.05 0.0571921 0.05288 +0.06 0.0658154 0.0611754 +0.07 0.0736879 0.0694709 +0.08 0.0808759 0.0777663 +0.09 0.0874397 0.0860617 +0.1 0.0934344 0.0929219 +0.11 0.0989099 0.0982035 +0.12 0.103912 0.103485 +0.13 0.108482 0.108767 +0.14 0.112657 0.114048 +0.15 0.116473 0.117975 +0.16 0.119961 0.121359 +0.17 0.123148 0.124744 +0.18 0.126062 0.128128 +0.19 0.128726 0.131512 +0.2 0.131161 0.133749 +0.21 0.133388 0.135928 +0.22 0.135423 0.138107 +0.23 0.137284 0.140285 +0.24 0.138986 0.142317 +0.25 0.140541 0.14372 +0.26 0.141962 0.145123 +0.27 0.14326 0.146526 +0.28 0.144446 0.147929 +0.29 0.145528 0.149116 +0.3 0.146516 0.150013 +0.31 0.147416 0.150909 +0.32 0.148236 0.151806 +0.33 0.148982 0.152703 +0.34 0.149659 0.153375 +0.35 0.150273 0.153934 +0.36 0.150829 0.154493 +0.37 0.151329 0.155052 +0.38 0.151779 0.155611 +0.39 0.152181 0.155958 +0.4 0.152539 0.156282 +0.41 0.152854 0.156606 +0.42 0.153131 0.156931 +0.43 0.153369 0.15723 +0.44 0.153572 0.157379 +0.45 0.153741 0.157528 +0.46 0.153878 0.157677 +0.47 0.153983 0.157826 +0.48 0.154057 0.157918 +0.49 0.154101 0.157918 +0.5 0.154116 0.157918 +0.51 0.154101 0.157918 +0.52 0.154057 0.157918 +0.53 0.153983 0.157826 +0.54 0.153878 0.157677 +0.55 0.153741 0.157528 +0.56 0.153572 0.15738 +0.57 0.153369 0.157231 +0.58 0.153131 0.156932 +0.59 0.152854 0.156607 +0.6 0.152539 0.156283 +0.61 0.152181 0.155959 +0.62 0.151779
0.155612 +0.63 0.151329 0.155053 +0.64 0.150829 0.154494 +0.65 0.150273 0.153935 +0.66 0.149659 0.153376 +0.67 0.148982 0.152704 +0.68 0.148236 0.151808 +0.69 0.147416 0.150911 +0.7 0.146516 0.150014 +0.71 0.145528 0.149117 +0.72 0.144446 0.147931 +0.73 0.14326 0.146528 +0.74 0.141962 0.145125 +0.75 0.140541 0.143721 +0.76 0.138986 0.142318 +0.77 0.137284 0.140287 +0.78 0.135423 0.138108 +0.79 0.133388 0.135929 +0.8 0.131161 0.13375 +0.81 0.128726 0.131514 +0.82 0.126062 0.12813 +0.83 0.123148 0.124745 +0.84 0.119961 0.121361 +0.85 0.116473 0.117976 +0.86 0.112657 0.11405 +0.87 0.108482 0.108768 +0.88 0.103912 0.103487 +0.89 0.0989099 0.0982049 +0.9 0.0934344 0.0929233 +0.91 0.0874397 0.0860629 +0.92 0.0808759 0.0777674 +0.93 0.0736879 0.0694719 +0.94 0.0658154 0.0611763 +0.95 0.0571921 0.0528808 +0.96 0.0477452 0.0427608 +0.97 0.0373948 0.0320706 +0.98 0.026053 0.0213804 +0.99 0.0136234 0.0106902 +1 0 0 +``` + +The data has three columns. The first is the normalized position along the sampled profile, running from 0 to 1. The second is the analytical solution. The third is the numerical solution computed by OpenLB. This accuracy test compares the third column of each build against the output of the GCC build, which serves as the baseline. + +gnuplot renders this data as the following plot: + +
+ +
+
Example output plot
+
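The comparison described in 4.1 checks the third column of each run against the GCC baseline, which amounts to a point-wise relative deviation. The Python sketch below is an illustrative alternative to the script used in 4.2. It is not the script that produced the reported numbers. It assumes whitespace-separated three-column data as shown above. Points where both values are zero, such as the channel walls, are treated as exact agreement rather than skipped.

```python
from typing import List


def numerical_column(text: str) -> List[float]:
    """Extract the third (numerical) column from whitespace-separated profile data."""
    return [float(line.split()[2]) for line in text.splitlines() if line.strip()]


def relative_errors_percent(candidate: List[float], baseline: List[float]) -> List[float]:
    """Point-wise |candidate - baseline| / |baseline| * 100."""
    if len(candidate) != len(baseline):
        raise ValueError("profiles have different numbers of points")
    errors = []
    for c, b in zip(candidate, baseline):
        if b == 0.0:
            # Both zero (e.g. at a wall) counts as exact agreement.
            errors.append(0.0 if c == 0.0 else float("inf"))
        else:
            errors.append(abs(c - b) / abs(b) * 100.0)
    return errors


# Three rows of the Hyper MPI output and of the GCC baseline (values from this report).
hyper_mpi = numerical_column("0 0 0\n0.02 0.026053 0.0213801\n0.5 0.154116 0.157918\n")
gcc = numerical_column("0 0 0\n0.02 0.026053 0.02138\n0.5 0.154116 0.157918\n")
print(max(relative_errors_percent(hyper_mpi, gcc)))  # about 0.00047 (%)
```

Reading the whole files with `numerical_column(open(path).read())` would give the full profiles, assuming the files use the three-column format shown in 4.1.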
+ +### 4.2 数据分析 + +使用编译优化分析时的编译参数,对GCC串行运行、Intel MPI并行版本和OpenEuler系统上的Hyper MPI并行运行的结果进行比较,各个输出结果如下: + +- GCC + + ```shell + 0 0 0 + 0.01 0.0136234 0.01069 + 0.02 0.026053 0.02138 + 0.03 0.0373948 0.03207 + 0.04 0.0477452 0.04276 + 0.05 0.0571921 0.0528798 + 0.06 0.0658154 0.0611752 + 0.07 0.0736879 0.0694706 + 0.08 0.0808759 0.077766 + 0.09 0.0874397 0.0860614 + 0.1 0.0934344 0.0929216 + 0.11 0.0989099 0.0982031 + 0.12 0.103912 0.103485 + 0.13 0.108482 0.108766 + 0.14 0.112657 0.114048 + 0.15 0.116473 0.117974 + 0.16 0.119961 0.121359 + 0.17 0.123148 0.124743 + 0.18 0.126062 0.128128 + 0.19 0.128726 0.131512 + 0.2 0.131161 0.133748 + 0.21 0.133388 0.135927 + 0.22 0.135423 0.138106 + 0.23 0.137284 0.140285 + 0.24 0.138986 0.142316 + 0.25 0.140541 0.143719 + 0.26 0.141962 0.145123 + 0.27 0.14326 0.146526 + 0.28 0.144446 0.147929 + 0.29 0.145528 0.149115 + 0.3 0.146516 0.150012 + 0.31 0.147416 0.150909 + 0.32 0.148236 0.151806 + 0.33 0.148982 0.152703 + 0.34 0.149659 0.153374 + 0.35 0.150273 0.153934 + 0.36 0.150829 0.154493 + 0.37 0.151329 0.155052 + 0.38 0.151779 0.155611 + 0.39 0.152181 0.155958 + 0.4 0.152539 0.156282 + 0.41 0.152854 0.156606 + 0.42 0.153131 0.156931 + 0.43 0.153369 0.15723 + 0.44 0.153572 0.157379 + 0.45 0.153741 0.157528 + 0.46 0.153878 0.157677 + 0.47 0.153983 0.157826 + 0.48 0.154057 0.157918 + 0.49 0.154101 0.157918 + 0.5 0.154116 0.157918 + 0.51 0.154101 0.157918 + 0.52 0.154057 0.157918 + 0.53 0.153983 0.157826 + 0.54 0.153878 0.157677 + 0.55 0.153741 0.157528 + 0.56 0.153572 0.15738 + 0.57 0.153369 0.157231 + 0.58 0.153131 0.156932 + 0.59 0.152854 0.156607 + 0.6 0.152539 0.156283 + 0.61 0.152181 0.155959 + 0.62 0.151779 0.155612 + 0.63 0.151329 0.155053 + 0.64 0.150829 0.154494 + 0.65 0.150273 0.153935 + 0.66 0.149659 0.153376 + 0.67 0.148982 0.152705 + 0.68 0.148236 0.151808 + 0.69 0.147416 0.150911 + 0.7 0.146516 0.150014 + 0.71 0.145528 0.149117 + 0.72 0.144446 0.147931 + 0.73 0.14326 0.146528 + 0.74 0.141962 0.145125 + 0.75 0.140541 
0.143722 + 0.76 0.138986 0.142318 + 0.77 0.137284 0.140287 + 0.78 0.135423 0.138108 + 0.79 0.133388 0.135929 + 0.8 0.131161 0.133751 + 0.81 0.128726 0.131514 + 0.82 0.126062 0.12813 + 0.83 0.123148 0.124745 + 0.84 0.119961 0.121361 + 0.85 0.116473 0.117976 + 0.86 0.112657 0.11405 + 0.87 0.108482 0.108768 + 0.88 0.103912 0.103487 + 0.89 0.0989099 0.0982049 + 0.9 0.0934344 0.0929233 + 0.91 0.0874397 0.086063 + 0.92 0.0808759 0.0777674 + 0.93 0.0736879 0.0694719 + 0.94 0.0658154 0.0611764 + 0.95 0.0571921 0.0528808 + 0.96 0.0477452 0.0427608 + 0.97 0.0373948 0.0320706 + 0.98 0.026053 0.0213804 + 0.99 0.0136234 0.0106902 + 1 0 0 + ``` + +- 鲲鹏920(OpenEuler上的Hyper MPI) + + ```shell + 0 0 0 + 0.01 0.0136234 0.01069 + 0.02 0.026053 0.0213801 + 0.03 0.0373948 0.0320701 + 0.04 0.0477452 0.0427601 + 0.05 0.0571921 0.05288 + 0.06 0.0658154 0.0611754 + 0.07 0.0736879 0.0694709 + 0.08 0.0808759 0.0777663 + 0.09 0.0874397 0.0860617 + 0.1 0.0934344 0.0929219 + 0.11 0.0989099 0.0982035 + 0.12 0.103912 0.103485 + 0.13 0.108482 0.108767 + 0.14 0.112657 0.114048 + 0.15 0.116473 0.117975 + 0.16 0.119961 0.121359 + 0.17 0.123148 0.124744 + 0.18 0.126062 0.128128 + 0.19 0.128726 0.131512 + 0.2 0.131161 0.133749 + 0.21 0.133388 0.135928 + 0.22 0.135423 0.138107 + 0.23 0.137284 0.140285 + 0.24 0.138986 0.142317 + 0.25 0.140541 0.14372 + 0.26 0.141962 0.145123 + 0.27 0.14326 0.146526 + 0.28 0.144446 0.147929 + 0.29 0.145528 0.149116 + 0.3 0.146516 0.150013 + 0.31 0.147416 0.150909 + 0.32 0.148236 0.151806 + 0.33 0.148982 0.152703 + 0.34 0.149659 0.153375 + 0.35 0.150273 0.153934 + 0.36 0.150829 0.154493 + 0.37 0.151329 0.155052 + 0.38 0.151779 0.155611 + 0.39 0.152181 0.155958 + 0.4 0.152539 0.156282 + 0.41 0.152854 0.156606 + 0.42 0.153131 0.156931 + 0.43 0.153369 0.15723 + 0.44 0.153572 0.157379 + 0.45 0.153741 0.157528 + 0.46 0.153878 0.157677 + 0.47 0.153983 0.157826 + 0.48 0.154057 0.157918 + 0.49 0.154101 0.157918 + 0.5 0.154116 0.157918 + 0.51 0.154101 0.157918 + 0.52 0.154057 
0.157918 + 0.53 0.153983 0.157826 + 0.54 0.153878 0.157677 + 0.55 0.153741 0.157528 + 0.56 0.153572 0.15738 + 0.57 0.153369 0.157231 + 0.58 0.153131 0.156932 + 0.59 0.152854 0.156607 + 0.6 0.152539 0.156283 + 0.61 0.152181 0.155959 + 0.62 0.151779 0.155612 + 0.63 0.151329 0.155053 + 0.64 0.150829 0.154494 + 0.65 0.150273 0.153935 + 0.66 0.149659 0.153376 + 0.67 0.148982 0.152704 + 0.68 0.148236 0.151808 + 0.69 0.147416 0.150911 + 0.7 0.146516 0.150014 + 0.71 0.145528 0.149117 + 0.72 0.144446 0.147931 + 0.73 0.14326 0.146528 + 0.74 0.141962 0.145125 + 0.75 0.140541 0.143721 + 0.76 0.138986 0.142318 + 0.77 0.137284 0.140287 + 0.78 0.135423 0.138108 + 0.79 0.133388 0.135929 + 0.8 0.131161 0.13375 + 0.81 0.128726 0.131514 + 0.82 0.126062 0.12813 + 0.83 0.123148 0.124745 + 0.84 0.119961 0.121361 + 0.85 0.116473 0.117976 + 0.86 0.112657 0.11405 + 0.87 0.108482 0.108768 + 0.88 0.103912 0.103487 + 0.89 0.0989099 0.0982049 + 0.9 0.0934344 0.0929233 + 0.91 0.0874397 0.0860629 + 0.92 0.0808759 0.0777674 + 0.93 0.0736879 0.0694719 + 0.94 0.0658154 0.0611763 + 0.95 0.0571921 0.0528808 + 0.96 0.0477452 0.0427608 + 0.97 0.0373948 0.0320706 + 0.98 0.026053 0.0213804 + 0.99 0.0136234 0.0106902 + 1 0 0 + ``` + +- Intel x86集群(CentOS上的Intel MPI) + + ```shell + 0 0 0 + 0.01 0.0136234 0.01069 + 0.02 0.026053 0.02138 + 0.03 0.0373948 0.0320701 + 0.04 0.0477452 0.0427601 + 0.05 0.0571921 0.0528799 + 0.06 0.0658154 0.0611753 + 0.07 0.0736879 0.0694707 + 0.08 0.0808759 0.0777662 + 0.09 0.0874397 0.0860616 + 0.1 0.0934344 0.0929218 + 0.11 0.0989099 0.0982033 + 0.12 0.103912 0.103485 + 0.13 0.108482 0.108766 + 0.14 0.112657 0.114048 + 0.15 0.116473 0.117974 + 0.16 0.119961 0.121359 + 0.17 0.123148 0.124743 + 0.18 0.126062 0.128128 + 0.19 0.128726 0.131512 + 0.2 0.131161 0.133749 + 0.21 0.133388 0.135927 + 0.22 0.135423 0.138106 + 0.23 0.137284 0.140285 + 0.24 0.138986 0.142316 + 0.25 0.140541 0.14372 + 0.26 0.141962 0.145123 + 0.27 0.14326 0.146526 + 0.28 0.144446 0.147929 + 0.29 0.145528 
0.149116 + 0.3 0.146516 0.150012 + 0.31 0.147416 0.150909 + 0.32 0.148236 0.151806 + 0.33 0.148982 0.152703 + 0.34 0.149659 0.153375 + 0.35 0.150273 0.153934 + 0.36 0.150829 0.154493 + 0.37 0.151329 0.155052 + 0.38 0.151779 0.155611 + 0.39 0.152181 0.155958 + 0.4 0.152539 0.156282 + 0.41 0.152854 0.156606 + 0.42 0.153131 0.156931 + 0.43 0.153369 0.15723 + 0.44 0.153572 0.157379 + 0.45 0.153741 0.157528 + 0.46 0.153878 0.157677 + 0.47 0.153983 0.157826 + 0.48 0.154057 0.157918 + 0.49 0.154101 0.157918 + 0.5 0.154116 0.157918 + 0.51 0.154101 0.157918 + 0.52 0.154057 0.157918 + 0.53 0.153983 0.157826 + 0.54 0.153878 0.157677 + 0.55 0.153741 0.157528 + 0.56 0.153572 0.15738 + 0.57 0.153369 0.157231 + 0.58 0.153131 0.156932 + 0.59 0.152854 0.156607 + 0.6 0.152539 0.156283 + 0.61 0.152181 0.155959 + 0.62 0.151779 0.155612 + 0.63 0.151329 0.155053 + 0.64 0.150829 0.154494 + 0.65 0.150273 0.153935 + 0.66 0.149659 0.153376 + 0.67 0.148982 0.152704 + 0.68 0.148236 0.151807 + 0.69 0.147416 0.150911 + 0.7 0.146516 0.150014 + 0.71 0.145528 0.149117 + 0.72 0.144446 0.147931 + 0.73 0.14326 0.146528 + 0.74 0.141962 0.145124 + 0.75 0.140541 0.143721 + 0.76 0.138986 0.142318 + 0.77 0.137284 0.140287 + 0.78 0.135423 0.138108 + 0.79 0.133388 0.135929 + 0.8 0.131161 0.13375 + 0.81 0.128726 0.131514 + 0.82 0.126062 0.128129 + 0.83 0.123148 0.124745 + 0.84 0.119961 0.121361 + 0.85 0.116473 0.117976 + 0.86 0.112657 0.11405 + 0.87 0.108482 0.108768 + 0.88 0.103912 0.103486 + 0.89 0.0989099 0.0982047 + 0.9 0.0934344 0.0929231 + 0.91 0.0874397 0.0860627 + 0.92 0.0808759 0.0777672 + 0.93 0.0736879 0.0694717 + 0.94 0.0658154 0.0611762 + 0.95 0.0571921 0.0528807 + 0.96 0.0477452 0.0427607 + 0.97 0.0373948 0.0320705 + 0.98 0.026053 0.0213803 + 0.99 0.0136234 0.0106902 + 1 0 0 + ``` + +以GCC为基准,python测试脚本如下: + +```python +armData = [] +x86Data = [] +defaultData = [] + +with open("arm.velocityProfile.dat", "r") as armfile: + for line in armfile: + armData.append(float(line.replace("\n", "").split(" 
").pop())) + +with open("x86.velocityProfile.dat", "r") as x86file: + for line in x86file: + x86Data.append(float(line.replace("\n", "").split(" ").pop())) + +with open("gcc.velocityProfile.dat", "r") as defaultfile: + for line in defaultfile: + defaultData.append(float(line.replace("\n", "").split(" ").pop())) + +print("arm data: \n" , armData) +print("x86 data: \n" , x86Data) +print("default data: \n" , defaultData) + +print("====now print accuracy between arm and gcc data!====") + +for i in range(len(armData)): + try: + print(abs((armData[i]-defaultData[i]) / defaultData[i]) * 100, '%') + except ZeroDivisionError: + print("All zero!") + +print("====now print accuracy between x86 and gcc data!====") + +for i in range(len(x86Data)): + try: + print(abs((x86Data[i]-defaultData[i]) / defaultData[i]) * 100, '%') + except ZeroDivisionError: + print("All zero!") +``` + +运行该脚本(数据已放至正确位置),得到结果: + +```shell +/usr/bin/python3.8 /mnt/data/Git/PortOpenLB/UseOfOpenLB/精度测试/对比文件/data/accuracy_test.py +arm data: + [0.0, 0.01069, 0.0213801, 0.0320701, 0.0427601, 0.05288, 0.0611754, 0.0694709, 0.0777663, 0.0860617, 0.0929219, 0.0982035, 0.103485, 0.108767, 0.114048, 0.117975, 0.121359, 0.124744, 0.128128, 0.131512, 0.133749, 0.135928, 0.138107, 0.140285, 0.142317, 0.14372, 0.145123, 0.146526, 0.147929, 0.149116, 0.150013, 0.150909, 0.151806, 0.152703, 0.153375, 0.153934, 0.154493, 0.155052, 0.155611, 0.155958, 0.156282, 0.156606, 0.156931, 0.15723, 0.157379, 0.157528, 0.157677, 0.157826, 0.157918, 0.157918, 0.157918, 0.157918, 0.157918, 0.157826, 0.157677, 0.157528, 0.15738, 0.157231, 0.156932, 0.156607, 0.156283, 0.155959, 0.155612, 0.155053, 0.154494, 0.153935, 0.153376, 0.152704, 0.151808, 0.150911, 0.150014, 0.149117, 0.147931, 0.146528, 0.145125, 0.143721, 0.142318, 0.140287, 0.138108, 0.135929, 0.13375, 0.131514, 0.12813, 0.124745, 0.121361, 0.117976, 0.11405, 0.108768, 0.103487, 0.0982049, 0.0929233, 0.0860629, 0.0777674, 0.0694719, 0.0611763, 0.0528808, 0.0427608, 
0.0320706, 0.0213804, 0.0106902, 0.0] +x86 data: + [0.0, 0.01069, 0.02138, 0.0320701, 0.0427601, 0.0528799, 0.0611753, 0.0694707, 0.0777662, 0.0860616, 0.0929218, 0.0982033, 0.103485, 0.108766, 0.114048, 0.117974, 0.121359, 0.124743, 0.128128, 0.131512, 0.133749, 0.135927, 0.138106, 0.140285, 0.142316, 0.14372, 0.145123, 0.146526, 0.147929, 0.149116, 0.150012, 0.150909, 0.151806, 0.152703, 0.153375, 0.153934, 0.154493, 0.155052, 0.155611, 0.155958, 0.156282, 0.156606, 0.156931, 0.15723, 0.157379, 0.157528, 0.157677, 0.157826, 0.157918, 0.157918, 0.157918, 0.157918, 0.157918, 0.157826, 0.157677, 0.157528, 0.15738, 0.157231, 0.156932, 0.156607, 0.156283, 0.155959, 0.155612, 0.155053, 0.154494, 0.153935, 0.153376, 0.152704, 0.151807, 0.150911, 0.150014, 0.149117, 0.147931, 0.146528, 0.145124, 0.143721, 0.142318, 0.140287, 0.138108, 0.135929, 0.13375, 0.131514, 0.128129, 0.124745, 0.121361, 0.117976, 0.11405, 0.108768, 0.103486, 0.0982047, 0.0929231, 0.0860627, 0.0777672, 0.0694717, 0.0611762, 0.0528807, 0.0427607, 0.0320705, 0.0213803, 0.0106902, 0.0] +default data: + [0.0, 0.01069, 0.02138, 0.03207, 0.04276, 0.0528798, 0.0611752, 0.0694706, 0.077766, 0.0860614, 0.0929216, 0.0982031, 0.103485, 0.108766, 0.114048, 0.117974, 0.121359, 0.124743, 0.128128, 0.131512, 0.133748, 0.135927, 0.138106, 0.140285, 0.142316, 0.143719, 0.145123, 0.146526, 0.147929, 0.149115, 0.150012, 0.150909, 0.151806, 0.152703, 0.153374, 0.153934, 0.154493, 0.155052, 0.155611, 0.155958, 0.156282, 0.156606, 0.156931, 0.15723, 0.157379, 0.157528, 0.157677, 0.157826, 0.157918, 0.157918, 0.157918, 0.157918, 0.157918, 0.157826, 0.157677, 0.157528, 0.15738, 0.157231, 0.156932, 0.156607, 0.156283, 0.155959, 0.155612, 0.155053, 0.154494, 0.153935, 0.153376, 0.152705, 0.151808, 0.150911, 0.150014, 0.149117, 0.147931, 0.146528, 0.145125, 0.143722, 0.142318, 0.140287, 0.138108, 0.135929, 0.133751, 0.131514, 0.12813, 0.124745, 0.121361, 0.117976, 0.11405, 0.108768, 0.103487, 0.0982049, 0.0929233, 0.086063, 
0.0777674, 0.0694719, 0.0611764, 0.0528808, 0.0427608, 0.0320706, 0.0213804, 0.0106902, 0.0] +====now print accuracy between arm and gcc data!==== +All zero! +0.0 % +0.00046772684751827 % +0.000311817898334695 % +0.00023386342376724873 % +0.0003782162565020124 % +0.0003269298670029885 % +0.0004318373527918669 % +0.00038577270271680287 % +0.00034858833345568016 % +0.00032285281356393637 % +0.0004073191172148582 % +0.0 % +0.0009194049611100896 % +0.0 % +0.0008476443962237443 % +0.0 % +0.0008016481886655942 % +0.0 % +0.0 % +0.0007476747315855191 % +0.0007356890095426223 % +0.0007240815026146583 % +0.0 % +0.0007026616824538352 % +0.0006958022251568994 % +0.0 % +0.0 % +0.0 % +0.0006706233443992892 % +0.0006666133376003253 % +0.0 % +0.0 % +0.0 % +0.0006520009910421584 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0006548574048007597 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0006957877012572885 % +0.0 % +0.0 % +0.0 % +0.0 % +0.00074765796143655 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.00011619395094625515 % +0.0 % +0.0 % +0.00016346172706458155 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +All zero! +====now print accuracy between x86 and gcc data!==== +All zero! 
+0.0 % +0.0 % +0.000311817898334695 % +0.00023386342376724873 % +0.0001891081282510062 % +0.0001634649335071656 % +0.00014394578426395565 % +0.00025718180180525337 % +0.00023239222230378675 % +0.00021523520904262426 % +0.00020365955860036325 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0007476747315855191 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0006958022251568994 % +0.0 % +0.0 % +0.0 % +0.0006706233443992892 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0006520009910421584 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0006548574048007597 % +0.0006587268128168477 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0006890611541781223 % +0.0006957877012572885 % +0.0 % +0.0 % +0.0 % +0.0 % +0.00074765796143655 % +0.0 % +0.000780457348006712 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0 % +0.0009663049465159875 % +0.00020365582571936163 % +0.0002152312713880707 % +0.0003485818528226403 % +0.00025717717193290653 % +0.00028788618132763195 % +0.00032692345414050554 % +0.00018910455211709482 % +0.00023385904848102833 % +0.00031181206461973487 % +0.00046771809694582945 % +0.0 % +All zero! 
+ +Process finished with exit code 0 + ``` + +### 4.3 Results + +Every relative deviation from the GCC baseline is far below 1%. The largest, about 0.00097%, occurs in the Intel MPI results. The ported builds therefore pass the accuracy test. \ No newline at end of file diff --git "a/doc/openlb/1.4/\343\200\212\345\237\272\344\272\216openEuler\347\232\204openlb\350\275\257\344\273\266\346\265\213\350\257\225\346\212\245\345\221\212\343\200\213.pdf" "b/doc/openlb/1.4/\343\200\212\345\237\272\344\272\216openEuler\347\232\204openlb\350\275\257\344\273\266\346\265\213\350\257\225\346\212\245\345\221\212\343\200\213.pdf" new file mode 100644 index 0000000000000000000000000000000000000000..3f8d1b21e326d01b2932b4b18958cedabb8f6ee0 Binary files /dev/null and "b/doc/openlb/1.4/\343\200\212\345\237\272\344\272\216openEuler\347\232\204openlb\350\275\257\344\273\266\346\265\213\350\257\225\346\212\245\345\221\212\343\200\213.pdf" differ diff --git "a/doc/openlb/1.4/\343\200\212\345\237\272\344\272\216openEuler\347\232\204openlb\350\275\257\344\273\266\347\247\273\346\244\215\346\214\207\345\215\227\343\200\213.md" "b/doc/openlb/1.4/\343\200\212\345\237\272\344\272\216openEuler\347\232\204openlb\350\275\257\344\273\266\347\247\273\346\244\215\346\214\207\345\215\227\343\200\213.md" new file mode 100644 index 0000000000000000000000000000000000000000..6383b7dcdd6849ca6174753182285c25f14df6eb --- /dev/null +++ "b/doc/openlb/1.4/\343\200\212\345\237\272\344\272\216openEuler\347\232\204openlb\350\275\257\344\273\266\347\247\273\346\244\215\346\214\207\345\215\227\343\200\213.md" @@ -0,0 +1,696 @@ +# OpenLB Porting Guide for openEuler + +- [OpenLB Porting Guide for openEuler](#openlb-porting-guide-for-openeuler) + - [1. Introduction](#1-introduction) + - [2. 
Setting Up the Build and Runtime Environment](#2-setting-up-the-build-and-runtime-environment) + - [2.1 Hardware Platform and Operating System](#21-hardware-platform-and-operating-system) + - [2.2 Dependencies and Build Environment](#22-dependencies-and-build-environment) + - [2.3 Environment Management](#23-environment-management) + - [2.4 Download and Install the BiSheng Compiler](#24-download-and-install-the-bisheng-compiler) + - [2.5 Download and Build Hyper MPI](#25-download-and-build-hyper-mpi) + - [2.6 Download and Install the KML Math Library](#26-download-and-install-the-kml-math-library) + - [2.7 Download and Build the OpenBLAS Linear Algebra Library](#27-download-and-build-the-openblas-linear-algebra-library) + - [3.
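Compiling and Optimizing OpenLB](#3-compiling-and-optimizing-openlb) + - [3.1 Download and Build OpenLB](#31-download-and-build-openlb) + - [3.2 Run the OpenLB Test Programs](#32-run-the-openlb-test-programs) + - [3.3 OpenLB Optimization Approach](#33-openlb-optimization-approach) + - [3.4 Data Visualization](#34-数据可视化) + - [4. One-Click Installation of OpenLB with hpcrunner](#4-使用hpcrunner一键安装openlb) + - [4.1 Download and Install hpcrunner](#41-下载安装hpcrunner) + - [4.2 Initialize hpcrunner](#42-初始化hpcrunner) + - [4.3 Install Required Packages](#43-安装必要软件包) + - [4.4 Select the Configuration File for Your Platform](#44-选择平台对应配置文件) + - [4.5 Configure Dependencies](#45-配置依赖环境) + - [4.6 Build](#46-进行编译) + - [4.7 Run Tests](#47-运行测试) + +## 1. Introduction + +- Overview: The lattice Boltzmann method (LBM) is a widely used class of computational fluid dynamics (CFD) algorithms. The open-source OpenLB project, developed at the Karlsruhe Institute of Technology (KIT) in Germany, provides a generic C++ framework that implements this method. It can be applied to problems in many fields, such as fluid flow. At the time of writing, the latest OpenLB release is 1.5r0 (released 2022/04/14). Its main additions are GPU (CUDA) support and SIMD vectorization of the LBM collision step for the AVX2/AVX512 instruction sets. CUDA and AVX are features of particular platforms, and few viable alternatives exist. This port therefore uses OpenLB 1.4r0 (released 2020/11/17), which is still current and practical. The porting method also applies to 1.5r0. + +- Official website: https://www.openlb.net/ +- OpenLB 1.4r0 user guide: https://www.openlb.net/wp-content/uploads/2020/12/olb_ug-1.4r0.pdf +- Source download: https://www.openlb.net/download/ + +- [Release highlights](https://www.openlb.net/news/page/2/): +- 1. Improved user experience: core data structures overhauled and boundary handling improved. +- 2. More multiphysics models, e.g. mixed-scale diffusivity turbulence models in 2D and 3D. +- 3. Better workload performance through a new propagation pattern and communication scheme, with an average speedup of 28% on the example programs. +- 4. New example programs. + +## 2.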
Setting Up the Build and Runtime Environment + +### 2.1 Hardware Platform and Operating System + +- Operating system: openEuler +- Hardware platform: Kunpeng (Arm) + +This guide uses openEuler on a Kunpeng 920 cluster as the example. Adapt the build and deployment steps to your own platform. + +### 2.2 Dependencies and Build Environment + +- Compiler: BiSheng compiler +- MPI implementation: Hyper MPI +- MPI communication libraries: hucx, xucg +- Math library: KML +- Open-source linear algebra library: OpenBLAS + +The versions below are for reference only; newer versions may be used instead. + +| Name | Version | Download URL | +| :--------------- | :---------------------- | :------------------------------------------------------------------------------------------------------------------------------------ | +| Bisheng Compiler | 1.3.3 | https://mirrors.huaweicloud.com/kunpeng/archive/compiler/bisheng_compiler/bisheng-compiler-1.3.3-aarch64-linux.tar.gz | +| Hyper MPI | 1.1.1 (binary package) | https://mirrors.huaweicloud.com/kunpeng/archive/HyperMPI/1.1.1/Hyper-MPI_1.1.1_aarch64_OpenEuler20.03-LTS_Bisheng2.1.0_MLNX-OFED5.4.tar.gz | +| | 1.1.1 (source) | https://github.com/kunpengcompute/hmpi/archive/refs/tags/v1.1.1-huawei.tar.gz | +| KML | 1.6.0 (BiSheng compiler build) | https://kunpeng-repo.obs.cn-north-4.myhuaweicloud.com/Kunpeng%20BoostKit/Kunpeng%20BoostKit%2022.0.RC3/BoostKit-kml_1.6.0_bisheng.zip | +| | 1.6.0 (GCC build) | https://kunpeng-repo.obs.cn-north-4.myhuaweicloud.com/Kunpeng%20BoostKit/Kunpeng%20BoostKit%2022.0.RC3/BoostKit-kml_1.6.0.zip | +| OpenBlas | 0.3.21 | https://github.com/xianyi/OpenBLAS/archive/refs/tags/v0.3.21.tar.gz | +| OpenLB | 1.4r0 | https://www.openlb.net/wp-content/uploads/2020/11/olb-1.4r0.tgz | + +### 2.3 Environment Management + +Only one generic way of installing packages and configuring the environment is given here. Adapt it to the platform you build and install on. + +- Create directories for downloads, builds and installations, and export them as environment variables + + ```shell + # download directory + mkdir -p ${HOME}/downloads + # build directory + mkdir -p ${HOME}/builds + # install directory + mkdir -p ${HOME}/softwares + # export as environment variables + export OLB_PACKAGE_DIR=${HOME}/downloads + export OLB_BUILD_DIR=${HOME}/builds + export OLB_INSTALL_DIR=${HOME}/softwares + ``` + +### 2.4 Download and Install the BiSheng Compiler + +- Download and unpack the BiSheng compiler + + ```shell + cd ${OLB_PACKAGE_DIR} + wget -c https://mirrors.huaweicloud.com/kunpeng/archive/compiler/bisheng_compiler/bisheng-compiler-1.3.3-aarch64-linux.tar.gz + tar xzvf bisheng-compiler-1.3.3-aarch64-linux.tar.gz -C
${OLB_INSTALL_DIR} + ``` + +- Configure the BiSheng compiler environment (environment modules can be used instead). Escape `\$PATH` and `\$LD_LIBRARY_PATH` so that they are expanded when `~/.bashrc` is sourced rather than frozen when `echo` runs + + ```shell + echo "export PATH=$OLB_INSTALL_DIR/bisheng-compiler-1.3.3-aarch64-linux/bin:\$PATH" >> ~/.bashrc + echo "export LD_LIBRARY_PATH=$OLB_INSTALL_DIR/bisheng-compiler-1.3.3-aarch64-linux/lib:\$LD_LIBRARY_PATH" >> ~/.bashrc + source ~/.bashrc + export CC=`which clang` + export CXX=`which clang++` + ``` + +### 2.5 Download and Build Hyper MPI + +- Install the required prerequisites + + ```shell + yum -y install libstdc++ libstdc++-devel + yum -y install unzip make autoconf automake git libtool + yum -y install flex + ``` + +- Download and unpack the sources + + ```shell + wget https://github.com/kunpengcompute/hucx/archive/refs/tags/v1.1.1-huawei.zip -O $OLB_PACKAGE_DIR/hucx-1.1.1-huawei.zip + wget https://github.com/kunpengcompute/xucg/archive/refs/tags/v1.1.1-huawei.zip -O $OLB_PACKAGE_DIR/xucg-1.1.1-huawei.zip + wget https://github.com/kunpengcompute/hmpi/archive/refs/tags/v1.1.1-huawei.zip -O $OLB_PACKAGE_DIR/hmpi-1.1.1-huawei.zip + + cd $OLB_BUILD_DIR + unzip -q $OLB_PACKAGE_DIR/hucx-1.1.1-huawei.zip + unzip -q $OLB_PACKAGE_DIR/xucg-1.1.1-huawei.zip + unzip -q $OLB_PACKAGE_DIR/hmpi-1.1.1-huawei.zip + cp -rf xucg-1.1.1-huawei/* hucx-1.1.1-huawei/src/ucg/ + ``` + +- Build hucx + + ```shell + cd $OLB_BUILD_DIR/hucx-1.1.1-huawei + ./autogen.sh + # --disable-numa may reduce performance; to drop it, install libnuma-devel (or build numactl from source) and set the corresponding build environment variables. + ./contrib/configure-opt --prefix=$OLB_INSTALL_DIR/hmpi/hucx CFLAGS="-DHAVE___CLEAR_CACHE=1" --disable-numa --without-java + find . -name Makefile | xargs -I {} sed -i "s/-Werror//g" {} + find .
-name Makefile | xargs -I {} sed -i "s/-implicit-function-declaration//g" {} + make -j + make install + ``` + +- Build Hyper MPI + + ```shell + cd $OLB_BUILD_DIR/hmpi-1.1.1-huawei + ./autogen.pl + ./configure --prefix=$OLB_INSTALL_DIR/hmpi --with-platform=contrib/platform/mellanox/optimized --enable-mpi1-compatibility --with-ucx=$OLB_INSTALL_DIR/hmpi/hucx + make -j + make install + ``` + +- Add Hyper MPI to the environment + + ```shell + echo "export PATH=$OLB_INSTALL_DIR/hmpi/bin:\$PATH" >> ~/.bashrc + echo "export LD_LIBRARY_PATH=$OLB_INSTALL_DIR/hmpi/lib:\$LD_LIBRARY_PATH" >> ~/.bashrc + source ~/.bashrc + export CC=mpicc CXX=mpicxx + ``` + +### 2.6 Download and Install the KML Math Library + +- Download and install KML + + ```shell + cd ${OLB_PACKAGE_DIR} + wget -c https://kunpeng-repo.obs.cn-north-4.myhuaweicloud.com/Kunpeng%20BoostKit/Kunpeng%20BoostKit%2022.0.RC3/BoostKit-kml_1.6.0_bisheng.zip + # remove a previously installed KML package + if [ -d /usr/local/kml ];then + rpm -e boostkit-kml + fi + unzip -o BoostKit-kml_1.6.0_bisheng.zip + # install the RPM system-wide (requires root) + rpm --force --nodeps -ivh boostkit-kml-1.6.0-1.aarch64.rpm + ``` + +- If you lack permission to install into /usr/local with rpm, unpack the RPM into the install directory instead + + ```shell + rpm2cpio boostkit-kml-1.6.0-1.aarch64.rpm > boostkit-kml-1.6.0-1.aarch64.cpio + mv boostkit-kml-1.6.0-1.aarch64.cpio $OLB_INSTALL_DIR + cd $OLB_INSTALL_DIR + cpio -i --make-directories < boostkit-kml-1.6.0-1.aarch64.cpio + ``` + +Then add the KML include and lib directories to the appropriate environment variables, such as LD_LIBRARY_PATH and LDFLAGS. + +### 2.7 Download and Build the OpenBLAS Linear Algebra Library + +- Build and install OpenBLAS + + ```shell + cd ${OLB_PACKAGE_DIR} + wget -c https://github.com/xianyi/OpenBLAS/archive/refs/tags/v0.3.21.tar.gz -O openblas-0.3.21.tar.gz + cd ${OLB_BUILD_DIR} + tar -xvf ${OLB_PACKAGE_DIR}/openblas-0.3.21.tar.gz -C .
+ # the GitHub archive unpacks into OpenBLAS-0.3.21 + cd OpenBLAS-0.3.21 + # tuned for Kunpeng 920 (TARGET=TSV110) and the BiSheng compiler; adjust for your platform + make CC=clang CXX=clang++ FC=flang TARGET=TSV110 USE_OPENMP=1 + make install PREFIX=${OLB_INSTALL_DIR}/openblas + ``` + +- Add OpenBLAS to the environment + + ```shell + echo "export PATH=$OLB_INSTALL_DIR/openblas/bin:\$PATH" >> ~/.bashrc + echo "export LD_LIBRARY_PATH=$OLB_INSTALL_DIR/openblas/lib:\$LD_LIBRARY_PATH" >> ~/.bashrc + source ~/.bashrc + ``` + +## 3. Compiling and Optimizing OpenLB + +### 3.1 Download and Build OpenLB + +- Get the source package + + ```shell + cd ${OLB_PACKAGE_DIR} + wget -c https://www.openlb.net/wp-content/uploads/2020/11/olb-1.4r0.tgz + ``` + +- Unpack and build (simplest method; see [how to optimize the build](#33-openlb-optimization-approach)) + + ```shell + cd ${OLB_BUILD_DIR} + tar -xvf ${OLB_PACKAGE_DIR}/olb-1.4r0.tgz -C . + cd olb-1.4r0 + make samples -j + ``` + +### 3.2 Run the OpenLB Test Programs + +- Test script + +OpenLB does not ship a one-click test script, so a reference script is provided here. It backs up the built executables and runs the tests. Run it in the OpenLB root directory after completing the build in the previous step. + +```shell +#!/bin/bash + +# do not exit on errors; keep going and log them +set +e + +# envs + +name="default" +root=. +outdir="logs" +backdir="backs" +# do not run every test by default +isTest=false +# run the example benchmarks by default +isTestExamples=true +# produce visualization data by default +isVisual=true +# paths of the backed-up executables (filled in by find_files) +testsFilelists=() +# run script used for each test +benchmark="benchmarks/mpi-openmp-run.sh" +# example programs used as benchmarks +# multiComponent/contactAngle2d +# porousMedia/porousPoiseuille3d +# laminar/bstep3d +examplesBench=("multiComponent/contactAngle2d/contactAngle2d" "porousMedia/porousPoiseuille3d/porousPoiseuille3d" "laminar/bstep3d/bstep3d") + +# functions + +## usage +help() { + echo "--name: name of this build/test run; backups and logs are stored under this name" + echo "--backdir: backup directory" + echo "--outdir: test output directory" + echo "--benchmark: run script used to execute each test" + echo "--test: run all tests (default: false)" + echo "--visual: produce visualization data (requires gnuplot, default: true)" + echo "--examples: run the example benchmarks (default: true)" +} + +## walk the directory tree looking for executables, +## back them up and record their paths (used to generate the test scripts) +find_files() { + local path=$1 + local d=`ls ${path}` + for file in ${d[@]} + do + if test -d ${path}/${file} + then + find_files "${path}/${file}" + elif [[ !
${file} =~ \.cpp$|\.o$|Makefile$|\.mk$|\.py$|\.sh$ ]]; + then + echo "${path}/${file}" + mkdir -p ${backdir}/${name}/${path} + cp ${path}/${file} ${backdir}/${name}/${path}/${file} + if test -x ${path}/${file} + then + testsFilelists[${#testsFilelists[*]}]="${backdir}/${name}/${path}/${file}" + fi + fi + done +} + +## generate a test script that runs the given executables +generate_script() { + local lists=$1 + local benchFile=$2 + echo "#!/bin/bash" >> ${benchFile} + echo "testFilelists=(${lists[@]})" >> ${benchFile} + echo "useBenchmark=${useBenchmark}" >> ${benchFile} + echo "outdir=${outdir}" >> ${benchFile} + echo "name=${name}" >> ${benchFile} + echo 'for testfile in ${testFilelists[@]}' >> ${benchFile} + echo 'do' >> ${benchFile} + echo 'cd $(dirname ${testfile})' >> ${benchFile} + echo 'genDate=`date "+%Y%m%d_%H%M"`' >> ${benchFile} + echo 'logName=$(basename ${testfile})_${genDate}' >> ${benchFile} + echo '${useBenchmark} ${testfile} 2>&1 | tee ${outdir}/${name}/${logName}.log' >> ${benchFile} + echo 'mv ./tmp ${outdir}/${name}/tmp_${logName}' >> ${benchFile} + echo 'done' >> ${benchFile} +} + +## parse command-line arguments +read_config() { + until [ $# -eq 0 ] + do + case $1 in + "--name") + shift + name=$1 + ;; + "--backdir") + shift + backdir=$1 + ;; + "--outdir") + shift + outdir=$1 + ;; + "--benchmark") + shift + benchmark=$1 + ;; + "--test") + shift + isTest=$1 + ;; + "--help") + help + exit 0 + ;; + "--examples") + shift + isTestExamples=$1 + ;; + "--visual") + shift + isVisual=$1 + ;; + *) + echo "$1 options not recognized!" + exit -1 + ;; + esac + shift + done +} + +# back up the executables + +read_config "$@" + +# convert all paths to absolute paths +root=$(pwd) +backdir=$(cd $(dirname ${backdir}) && pwd)/$(basename ${backdir}) +outdir=$(cd $(dirname ${outdir}) && pwd)/$(basename ${outdir}) +useBenchmark=$(cd $(dirname ${benchmark}) && pwd)/$(basename ${benchmark}) + +if ! test -e ${useBenchmark} +then + echo "${useBenchmark} does not exist! Please check that ${benchmark} exists on this machine or in the current folder!"
+ exit -1 +fi + +echo "===info===" +echo "openlb root: ${root}" +echo "name: ${name}" +echo "backdir: ${backdir}" +echo "outdir: ${outdir}" +echo "benchmark: ${useBenchmark}" +echo "test all?: ${isTest}" +echo "test examples?: ${isTestExamples}" +echo "open visual data: ${isVisual}" +echo "===info===" + +mkdir -p ${backdir}/${name} +mkdir -p ${outdir}/${name} + +cd ./examples + +if [[ $? == 1 ]]; then + echo "There isn't any test file in your current folder, check if you're in the openlb source code folder!, current folder is ${root}" + exit -1 +fi + +find_files . + +cd .. + +for (( i=0;i<${#examplesBench[@]};i++ )) do + # prefix each example path with the backup directory + examplesBench[${i}]="${backdir}/${name}/${examplesBench[${i}]}" +done + +if [[ ${isTest} == true ]]; then + for testfile in ${testsFilelists[@]} + do + cd $(dirname ${testfile}) + genDate=`date "+%Y%m%d_%H%M"` + logName=$(basename ${testfile})_${genDate} + ${useBenchmark} ${testfile} 2>&1 | tee ${outdir}/${name}/${logName}.log + mv ./tmp ${outdir}/${name}/tmp_${logName} + done +elif [[ ${isTestExamples} == true ]]; then + for testfile in ${examplesBench[@]} + do + cd $(dirname ${testfile}) + genDate=`date "+%Y%m%d_%H%M"` + logName=$(basename ${testfile})_${genDate} + ${useBenchmark} ${testfile} 2>&1 | tee ${outdir}/${name}/${logName}.log + mv ./tmp ${outdir}/${name}/tmp_${logName} + done +else + echo "generate all test benchmark script..." + benchFile=${backdir}/${name}-benchmark.sh + if test -f ${benchFile} + then + rm ${benchFile} + fi + touch ${benchFile} + generate_script "${testsFilelists[*]}" "${benchFile}" + echo "all test benchmark script ${benchFile} done!" + echo "generate examples test benchmark script..." + benchFile=${backdir}/${name}-examples-benchmark.sh + if test -f ${benchFile} + then + rm ${benchFile} + fi + touch ${benchFile} + generate_script "${examplesBench[*]}" "${benchFile}" + echo "examples test benchmark script ${benchFile} done!"
+fi +``` + +- Run the script to back up and test the programs + +Assuming the script from the previous step is saved as `BackupAndTest.sh` in the OpenLB project root directory, run the following commands: + +```shell +cd ${OLB_BUILD_DIR}/olb-1.4r0 +# Assumes OpenLB has already been built; if not, go back to the OpenLB build step first +# Adjust the arguments to your environment +chmod +x ./BackupAndTest.sh && ./BackupAndTest.sh --name default --backdir ./backs --outdir ./logs --benchmark run.sh --examples false +``` + +When it finishes, the script creates the `${OLB_BUILD_DIR}/olb-1.4r0/backs/default` directory, which holds the compiled test programs from `examples` together with their data files. It also generates two automated test scripts in `backs`, prefixed with `default`: one runs all tests and one runs the three representative programs. These test scripts write program output to the `${OLB_BUILD_DIR}/olb-1.4r0/logs/default` folder. **run.sh** is the test launcher. It must be an executable script that takes the program under test as its first argument and then runs and tests that program automatically. A template for reference: + +```shell +#!/bin/bash + +# using Hyper MPI + +testFile=$1 + +export OMP_NUM_THREADS=24 +export OMP_PROC_BIND=true +export OMP_PLACES=cores + +mpirun -machinefile nodes -np 12 -npernode 4 --bind-to numa --mca btl ^vader,tcp,openib --map-by numa --rank-by numa \ + -x UCX_TLS=sm,ud_x -x UCX_NET_DEVICES=mlx5_0:1 \ + -x UCX_BUILTIN_BCAST_ALGORITHM=3 \ + -x UCX_BUILTIN_ALLREDUCE_ALGORITHM=6 \ + -x UCX_BUILTIN_BARRIER_ALGORITHM=5 \ + -x UCX_BUILTIN_DEGREE_INTRA_FANOUT=3 \ + -x UCX_BUILTIN_DEGREE_INTRA_FANIN=2 \ + -x UCX_BUILTIN_DEGREE_INTER_FANOUT=7 \ + -x UCX_BUILTIN_DEGREE_INTER_FANIN=7 \ + --report-bindings ${testFile} +``` + +### 3.3 OpenLB Optimization Approach + +- The stock `config.mk` + +The `config.mk` file in the OpenLB root directory is the main build configuration file. Build optimizations are applied by editing its parameters. The defaults are: + +```makefile +# This file is part of the OpenLB library +# +# Copyright (C) 2017 Markus Mohrhard, Mathias Krause +# E-mail contact: info@openlb.net +# The most recent release of OpenLB can be downloaded at +# +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License +# as published by the Free Software Foundation; either version 2 +# of the License, or (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public +# License along with this program; if not, write to the Free +# Software Foundation, Inc., 51 Franklin Street, Fifth Floor, +# Boston, MA 02110-1301, USA. + +########################################################################### +########################################################################### + +CXX := g++ +#CXX := icpc -D__aligned__=ignored +#CXX := mpiCC +#CXX := mpic++ + +CC := gcc # necessary for zlib, for Intel use icc + +OPTIM := -O3 -Wall -march=native -mtune=native # for gcc +#OPTIM := -O3 -Wall -xHost # for Intel compiler +#OPTIM := -O3 -Wall -xHost -ipo # optional for Intel compiler +DEBUG := -g -Wall -DOLB_DEBUG + +CXXFLAGS := $(OPTIM) +#CXXFLAGS := $(DEBUG) + +# compilation requires support for C++14 +# works in: +# * gcc 5 or later (https://gcc.gnu.org/projects/cxx-status.html#cxx14) +# * icc 17.0 or later (https://software.intel.com/en-us/articles/c14-features-supported-by-intel-c-compiler) +# * clang 3.4 or later (https://clang.llvm.org/cxx_status.html#cxx14) +CXXFLAGS += -std=c++14 + +ARPRG := ar +#ARPRG := xiar # mandatory for intel compiler + +LDFLAGS := + +PARALLEL_MODE := OFF +#PARALLEL_MODE := MPI +#PARALLEL_MODE := OMP +#PARALLEL_MODE := HYBRID + +MPIFLAGS := +OMPFLAGS := -fopenmp + +#BUILDTYPE := precompiled +BUILDTYPE := generic + +FEATURES := +``` + +- Parameter reference and optimization approach + +The parameters in `config.mk` have the following meanings. Most of them match the variables of the same name used in conventional builds: + +1. CXX/CC: the C++ and C compilers. +2. OPTIM: compiler optimization flags. `CXXFLAGS` is initialized from them (`CXXFLAGS := $(OPTIM)`). +3. DEBUG: debug-mode flags. Selecting them instead (`CXXFLAGS := $(DEBUG)`) builds OpenLB with its debug code enabled (`-DOLB_DEBUG`). +4. ARPRG: the archiver used to build static libraries (`ar`, or `xiar` for the Intel compiler). +5. LDFLAGS: linker flags. +6. PARALLEL_MODE: the parallelization mode, one of `OFF` (serial), `MPI` (multi-process), `OMP` (OpenMP multi-threaded) or `HYBRID` (MPI + OpenMP). `HYBRID` usually gives the best performance. +7.
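FEATURES: enables optional features. In practice, it defines macros that compile additional code. OpenLB 1.4 offers an `OPENBLAS` option, which uses the OpenBLAS linear algebra library in parts of the code to speed up computation. Other features are described in the user guide and the source code. + +Optimization has to be tailored to the target platform. Below is one optimized configuration for the Kunpeng 920 platform. Replace the uppercase placeholders such as `MTUNE_PARAM`, `BISHENG_INCLUDE` and `OPENBLAS_LIB` with the tuning values and the header and library paths of your installation: + +```makefile +CXX := mpicxx +CC := mpicc # necessary for zlib, for Intel use icc + +OPTIM := -Ofast -ffast-math -finline-functions -ffp-contract=fast -Wall -mtune=MTUNE_PARAM -march=MARCH_PARAM+CPU_FEATURE_PARAM -I BISHENG_INCLUDE -I KML_INCLUDE -I OPENBLAS_INCLUDE +DEBUG := -g -Wall -DOLB_DEBUG +DEBUGNoWall := -g -DOLB_DEBUG + +CXXFLAGS := $(OPTIM) +# for debug mode +# CXXFLAGS += $(DEBUGNoWall) +# CXXFLAGS := $(DEBUG) + +# profile-guided optimization (PGO) flags +PGOCollect := -fprofile-instr-generate +PGOOptim := -fprofile-instr-use=code.profdata + +CXXFLAGS += -std=c++14 + +ARPRG := ar +#ARPRG := xiar # mandatory for intel compiler + +LDFLAGS := -fuse-ld=lld -flto -L OPENBLAS_LIB -lopenblas -L KML_LIB -lkm -lm -lkfft -L BISHENG_LIB -ljemalloc -Wl,-z,muldefs +# for IPM profiling (statically linked) +# LDFLAGS += -LIPM_LIB -lipm +# for PGO builds +#LDFLAGS += $(PGOCollect) +#LDFLAGS += $(PGOOptim) + +#PARALLEL_MODE := OFF +#PARALLEL_MODE := MPI +#PARALLEL_MODE := OMP +PARALLEL_MODE := HYBRID +MPIFLAGS := +OMPFLAGS := -fopenmp +#BUILDTYPE := precompiled +BUILDTYPE := generic +FEATURES := OPENBLAS +``` + +In addition, each example program's folder contains its own Makefile, which can be edited to optimize that specific program in more detail. + +### 3.4 Data Visualization + +OpenLB can use [gnuplot](https://sourceforge.net/projects/gnuplot) to read the gnuplot data that OpenLB generates and render images of the corresponding simulation. After it runs, each OpenLB simulation creates a `tmp` folder in the current directory. This folder holds the corresponding gnuplot scripts and, if gnuplot was found and ran successfully during the run, the generated images. + +To generate the images manually, install gnuplot with PNG support and run `gnuplot -c ./tmp/gnuplotData/data/plotPNG.p` in the corresponding directory. + +OpenLB also generates gnuplot scripts for PDF output. With a gnuplot build that supports PDF, follow the same steps to produce the PDF files. + +OpenLB also writes data that ParaView can read, so you can use ParaView for more detailed scientific analysis. + +## 4.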
Installing OpenLB in One Step with hpcrunner + +Use hpcrunner to build and deploy OpenLB. + +### 4.1 Download and Install hpcrunner + +Install hpcrunner by cloning its Git repository: + +```bash +git clone https://gitee.com/openeuler/hpcrunner.git +``` + +### 4.2 Initialize hpcrunner + +Initialize the project helper: + +```bash +cd hpcrunner +source init.sh +``` + +### 4.3 Install Required Packages + +ARM and x86 require different packages. Choose the set that matches your environment and build requirements: + +```bash +# arm +yum install -y environment-modules git wget unzip make flex tar +# x86 +yum install -y environment-modules git wget unzip make flex +yum install -y gcc gcc-c++ glibc-devel +yum install -y tcsh tcl lsof tk bc +``` + +### 4.4 Select the Platform Configuration File + +- On ARM, use `templates/openlb/1.4/data.openlb.arm.cpu.config` or its MPI-optimized variant `templates/openlb/1.4/data.openlb.arm.cpu.opt.config`. The optimized variant is built with Hyper MPI, and the package includes patch files for other build configurations that you can select or modify as needed. + + ```shell + # default config + ./jarvis -use templates/openlb/1.4/data.openlb.arm.cpu.config + # optimization config + ./jarvis -use templates/openlb/1.4/data.openlb.arm.cpu.opt.config + ``` + +- On x86, use `templates/openlb/1.4/data.openlb.amd.cpu.config` or its MPI-optimized variant `templates/openlb/1.4/data.openlb.amd.cpu.oneapi.config`. The optimized variant is built with Intel MPI, and the package includes patch files for other build configurations that you can select or modify as needed. + + ```shell + # default config + ./jarvis -use templates/openlb/1.4/data.openlb.amd.cpu.config + # optimization config + ./jarvis -use templates/openlb/1.4/data.openlb.amd.cpu.oneapi.config + ``` + +### 4.5 Install Dependencies + +```shell +./jarvis -dp +``` + +### 4.6 Build + +```shell +./jarvis -b +``` + +### 4.7 Run Tests + +```shell +./jarvis -r +``` \ No newline at end of file diff --git "a/doc/openlb/1.4/\343\200\212\345\237\272\344\272\216openEuler\347\232\204openlb\350\275\257\344\273\266\347\247\273\346\244\215\346\214\207\345\215\227\343\200\213.pdf" "b/doc/openlb/1.4/\343\200\212\345\237\272\344\272\216openEuler\347\232\204openlb\350\275\257\344\273\266\347\247\273\346\244\215\346\214\207\345\215\227\343\200\213.pdf" new file mode 100644 index 0000000000000000000000000000000000000000..73e0981094a0ecda841b025914b5e892c327bd8d Binary files /dev/null and
"b/doc/openlb/1.4/\343\200\212\345\237\272\344\272\216openEuler\347\232\204openlb\350\275\257\344\273\266\347\247\273\346\244\215\346\214\207\345\215\227\343\200\213.pdf" differ diff --git a/package/openlb/1.4/install.sh b/package/openlb/1.4/install.sh new file mode 100644 index 0000000000000000000000000000000000000000..e8f6450e64ddffeeca0167c19a538d47e3817f8b --- /dev/null +++ b/package/openlb/1.4/install.sh @@ -0,0 +1,9 @@ +#!/bin/bash +set -x +set -e +. ${DOWNLOAD_TOOL} -u https://www.openlb.net/wp-content/uploads/2020/11/olb-1.4r0.tgz -f olb-1.4r0.tgz +cd ${JARVIS_TMP} +rm -rf olb-1.4r0 +tar -xvf ${JARVIS_DOWNLOAD}/olb-1.4r0.tgz +cd olb-1.4r0 +make samples -j \ No newline at end of file diff --git a/templates/openlb/1.4/data.openlb.amd.cpu.config b/templates/openlb/1.4/data.openlb.amd.cpu.config new file mode 100644 index 0000000000000000000000000000000000000000..806c9a7fd88ced39be144bffb44e56944eb7ba4f --- /dev/null +++ b/templates/openlb/1.4/data.openlb.amd.cpu.config @@ -0,0 +1,47 @@ +[SERVER] +11.11.11.11 + +[DOWNLOAD] +openlb/1.4 https://www.openlb.net/wp-content/uploads/2020/11/olb-1.4r0.tgz + +[DEPENDENCY] +set -e +set -x +module purge +module use ./software/modulefiles +./jarvis -install gcc/9.3.0 com +module load gcc9/9.3.0 +export CC=`which gcc` +export CXX=`which g++` +export FC=`which gfortran` +./jarvis -install hmpi/1.1.1 gcc +module load hmpi1/1.1.1 +rm -rf ${JARVIS_TMP}/olb-1.4r0 +tar -xvf ${JARVIS_DOWNLOAD}/olb-1.4r0.tgz -C ${JARVIS_TMP} + +[ENV] +module purge +module use ${JARVIS_ROOT}/software/modulefiles +module load gcc9/9.3.0 +module load hmpi1/1.1.1 +export CC=`which mpicc` +export CXX=`which mpicxx` + +[APP] +app_name = openlb +build_dir = ${JARVIS_TMP}/openlb-1.4 +binary_dir = +case_dir = ${JARVIS_TMP}/openlb-1.4/examples + +[BUILD] +cd ${JARVIS_TMP}/olb-1.4r0 +patch -p0 config.mk ${JARVIS_ROOT}/tunning/openlb/hmpi-gcc-default-hybrid.patch +make samples -j + +[CLEAN] +make clean + +[RUN] +run = +binary = +nodes = 1 diff --git 
a/templates/openlb/1.4/data.openlb.arm.cpu.config b/templates/openlb/1.4/data.openlb.arm.cpu.config new file mode 100644 index 0000000000000000000000000000000000000000..1ed546608aa534997908f0fc9f981d253f7ae076 --- /dev/null +++ b/templates/openlb/1.4/data.openlb.arm.cpu.config @@ -0,0 +1,47 @@ +[SERVER] +11.11.11.11 + +[DOWNLOAD] +openlb/1.4 https://www.openlb.net/wp-content/uploads/2020/11/olb-1.4r0.tgz + +[DEPENDENCY] +set -e +set -x +module purge +module use ./software/modulefiles +./jarvis -install bisheng/2.1.0 com +module load bisheng2/2.1.0 +export CC=`which clang` +export CXX=`which clang++` +export FC=`which flang` +./jarvis -install hmpi/1.1.1 clang +module load hmpi1/1.1.1 +rm -rf ${JARVIS_TMP}/olb-1.4r0 +tar -xvf ${JARVIS_DOWNLOAD}/olb-1.4r0.tgz -C ${JARVIS_TMP} + +[ENV] +module purge +module use ${JARVIS_ROOT}/software/modulefiles +module load bisheng2/2.1.0 +module load hmpi1/1.1.1 +export CC=`which mpicc` +export CXX=`which mpicxx` + +[APP] +app_name = openlb +build_dir = ${JARVIS_TMP}/openlb-1.4 +binary_dir = +case_dir = ${JARVIS_TMP}/openlb-1.4/examples + +[BUILD] +cd ${JARVIS_TMP}/olb-1.4r0 +patch -p0 config.mk ${JARVIS_ROOT}/tunning/openlb/hmpi-bisheng-default-hybrid.patch +make samples -j + +[CLEAN] +make clean + +[RUN] +run = +binary = +nodes = 1 \ No newline at end of file diff --git a/templates/openlb/1.4/data.openlb.arm.cpu.opt.config b/templates/openlb/1.4/data.openlb.arm.cpu.opt.config new file mode 100644 index 0000000000000000000000000000000000000000..3b9284affd51e2e6049762d74bd5975e83580101 --- /dev/null +++ b/templates/openlb/1.4/data.openlb.arm.cpu.opt.config @@ -0,0 +1,61 @@ +[SERVER] +11.11.11.11 + +[DOWNLOAD] +openlb/1.4 https://www.openlb.net/wp-content/uploads/2020/11/olb-1.4r0.tgz + +[DEPENDENCY] +set -e +set -x +module purge +module use ./software/modulefiles +./jarvis -install bisheng/2.1.0 com +module load bisheng2/2.1.0 +export CC=clang CXX=clang++ FC=flang +./jarvis -install hmpi/1.1.1 clang +module load hmpi1/1.1.1 +export 
CC=mpicc CXX=mpicxx FC=mpifort F77=mpifort +./jarvis -install openblas/0.3.18 bisheng +./jarvis -install kml/1.6.0/bisheng bisheng +./jarvis -install IPM/2.0.6 bisheng +rm -rf ${JARVIS_TMP}/olb-1.4r0 +tar -xvf ${JARVIS_DOWNLOAD}/olb-1.4r0.tgz -C ${JARVIS_TMP} + +[ENV] +module purge +module use ${JARVIS_ROOT}/software/modulefiles +module load bisheng2/2.1.0 +module load hmpi1/1.1.1 +module load openblas/0.3.18 +module load kml1/1.6.0 +export CC=`which mpicc` +export CXX=`which mpicxx` + +[APP] +app_name = openlb +build_dir = ${JARVIS_TMP}/openlb-1.4 +binary_dir = +case_dir = ${JARVIS_TMP}/openlb-1.4/examples + +[BUILD] +cd ${JARVIS_TMP}/olb-1.4r0 +patch -p0 global.mk ${JARVIS_ROOT}/tunning/openlb/global-openblas.patch +patch -p0 config.mk ${JARVIS_ROOT}/tunning/openlb/hmpi-bisheng-opt-hybrid.patch +sed -i 's/MTUNE_PARAM/tsv110/g' config.mk +sed -i 's/MARCH_PARAM/armv8-a+crc/g' config.mk +sed -i "s/BISHENG_INCLUDE/${JARVIS_COMPILER}\/bisheng\/include/g" config.mk +sed -i "s/KML_INCLUDE/\/usr\/local\/kml\/include/g" config.mk +sed -i "s/OPENBLAS_INCLUDE/${JARVIS_LIBS}\/bisheng2\/openblas\/include/g" config.mk +sed -i "s/OPENBLAS_LIB/${JARVIS_LIBS}\/bisheng2\/openblas\/lib/g" config.mk +sed -i "s/KML_LIB/\/usr\/local\/kml\/lib/g" config.mk +sed -i "s/BISHENG_LIB/${JARVIS_COMPILER}\/bisheng\/lib/g" config.mk +sed -i "s/IPM_LIB/${JARVIS_LIBS}\/bisheng2\/IPM\/lib/g" config.mk +make samples -j + +[CLEAN] +make clean + +[RUN] +run = +binary = +nodes = 1 \ No newline at end of file diff --git a/test/test-openlb.sh b/test/test-openlb.sh new file mode 100644 index 0000000000000000000000000000000000000000..ce359ca939788cdf38833e29ecfffa05a940d5ac --- /dev/null +++ b/test/test-openlb.sh @@ -0,0 +1,13 @@ +#!/bin/bash +cd .. 
+# release openlb src code +rm -rf tmp/olb-1.4r0 +tar xzvf ./downloads/olb-1.4r0.tar.gz -C tmp/ +# copy templates +cp -rf templates/openlb/1.4/data.openlb.amd.cpu.config ./ +# switch to config +./jarvis -use data.openlb.amd.cpu.config +# install dependency +./jarvis -dp +# build +./jarvis -b diff --git a/tunning/openlb/bisheng-default.patch b/tunning/openlb/bisheng-default.patch new file mode 100644 index 0000000000000000000000000000000000000000..b031aa49ad9aa3a003c92dc7eab9a19bfe1f88a1 --- /dev/null +++ b/tunning/openlb/bisheng-default.patch @@ -0,0 +1,122 @@ +1,64c1,56 +< # This file is part of the OpenLB library +< # +< # Copyright (C) 2017 Markus Mohrhard, Mathias Krause +< # E-mail contact: info@openlb.net +< # The most recent release of OpenLB can be downloaded at +< # +< # +< # This program is free software; you can redistribute it and/or +< # modify it under the terms of the GNU General Public License +< # as published by the Free Software Foundation; either version 2 +< # of the License, or (at your option) any later version. +< # +< # This program is distributed in the hope that it will be useful, +< # but WITHOUT ANY WARRANTY; without even the implied warranty of +< # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +< # GNU General Public License for more details. +< # +< # You should have received a copy of the GNU General Public +< # License along with this program; if not, write to the Free +< # Software Foundation, Inc., 51 Franklin Street, Fifth Floor, +< # Boston, MA 02110-1301, USA. 
+< +< ########################################################################### +< ########################################################################### +< +< CXX := g++ +< #CXX := icpc -D__aligned__=ignored +< #CXX := mpiCC +< #CXX := mpic++ +< +< CC := gcc # necessary for zlib, for Intel use icc +< +< OPTIM := -O3 -Wall -march=native -mtune=native # for gcc +< #OPTIM := -O3 -Wall -xHost # for Intel compiler +< #OPTIM := -O3 -Wall -xHost -ipo # optional for Intel compiler +< DEBUG := -g -Wall -DOLB_DEBUG +< +< CXXFLAGS := $(OPTIM) +< #CXXFLAGS := $(DEBUG) +< +< # compilation requires support for C++14 +< # works in: +< # * gcc 5 or later (https://gcc.gnu.org/projects/cxx-status.html#cxx14) +< # * icc 17.0 or later (https://software.intel.com/en-us/articles/c14-features-supported-by-intel-c-compiler) +< # * clang 3.4 or later (https://clang.llvm.org/cxx_status.html#cxx14) +< CXXFLAGS += -std=c++14 +< +< ARPRG := ar +< #ARPRG := xiar # mandatory for intel compiler +< +< LDFLAGS := +< +< PARALLEL_MODE := OFF +< #PARALLEL_MODE := MPI +< #PARALLEL_MODE := OMP +< #PARALLEL_MODE := HYBRID +< +< MPIFLAGS := +< OMPFLAGS := -fopenmp +< +< #BUILDTYPE := precompiled +< BUILDTYPE := generic +< +< FEATURES := +--- +> # This file is part of the OpenLB library +> # +> # Copyright (C) 2017 Markus Mohrhard, Mathias Krause +> # E-mail contact: info@openlb.net +> # The most recent release of OpenLB can be downloaded at +> # +> # +> # This program is free software; you can redistribute it and/or +> # modify it under the terms of the GNU General Public License +> # as published by the Free Software Foundation; either version 2 +> # of the License, or (at your option) any later version. +> # +> # This program is distributed in the hope that it will be useful, +> # but WITHOUT ANY WARRANTY; without even the implied warranty of +> # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +> # GNU General Public License for more details. 
+> # +> # You should have received a copy of the GNU General Public +> # License along with this program; if not, write to the Free +> # Software Foundation, Inc., 51 Franklin Street, Fifth Floor, +> # Boston, MA 02110-1301, USA. +> +> ########################################################################### +> ########################################################################### +> +> CXX := clang++ +> CC := clang # necessary for zlib, for Intel use icc +> OPTIM := -O3 -Wall -mtune=native +> DEBUG := -g -Wall -DOLB_DEBUG +> CXXFLAGS := $(OPTIM) +> #CXXFLAGS := $(DEBUG) +> +> # compilation requires support for C++14 +> # works in: +> # * gcc 5 or later (https://gcc.gnu.org/projects/cxx-status.html#cxx14) +> # * icc 17.0 or later (https://software.intel.com/en-us/articles/c14-features-supported-by-intel-c-compiler) +> # * clang 3.4 or later (https://clang.llvm.org/cxx_status.html#cxx14) +> CXXFLAGS += -std=c++14 +> +> ARPRG := ar +> #ARPRG := xiar # mandatory for intel compiler +> +> LDFLAGS := +> +> PARALLEL_MODE := OFF +> #PARALLEL_MODE := MPI +> #PARALLEL_MODE := OMP +> #PARALLEL_MODE := HYBRID +> +> MPIFLAGS := +> OMPFLAGS := -fopenmp +> +> #BUILDTYPE := precompiled +> BUILDTYPE := generic +> +> FEATURES := diff --git a/tunning/openlb/gcc-default.patch b/tunning/openlb/gcc-default.patch new file mode 100644 index 0000000000000000000000000000000000000000..7323f907d3ec3c8a122c6a92870f32bd1d5b126b --- /dev/null +++ b/tunning/openlb/gcc-default.patch @@ -0,0 +1,13 @@ +27,35c27,28 +< #CXX := icpc -D__aligned__=ignored +< #CXX := mpiCC +< #CXX := mpic++ +< +< CC := gcc # necessary for zlib, for Intel use icc +< +< OPTIM := -O3 -Wall -march=native -mtune=native # for gcc +< #OPTIM := -O3 -Wall -xHost # for Intel compiler +< #OPTIM := -O3 -Wall -xHost -ipo # optional for Intel compiler +--- +> CC := gcc +> OPTIM := -O3 -Wall -march=native -mtune=native diff --git a/tunning/openlb/hmpi-bisheng-default-hybrid.patch 
b/tunning/openlb/hmpi-bisheng-default-hybrid.patch new file mode 100644 index 0000000000000000000000000000000000000000..767b154ecbfc7a264ee96212025dcdf0a562cf06 --- /dev/null +++ b/tunning/openlb/hmpi-bisheng-default-hybrid.patch @@ -0,0 +1,28 @@ +26,35c26,29 +< CXX := g++ +< #CXX := icpc -D__aligned__=ignored +< #CXX := mpiCC +< #CXX := mpic++ +< +< CC := gcc # necessary for zlib, for Intel use icc +< +< OPTIM := -O3 -Wall -march=native -mtune=native # for gcc +< #OPTIM := -O3 -Wall -xHost # for Intel compiler +< #OPTIM := -O3 -Wall -xHost -ipo # optional for Intel compiler +--- +> CXX := mpicxx +> CC := mpicc +> +> OPTIM := -O3 -Wall -mtune=native +51c45 +< LDFLAGS := +--- +> LDFLAGS := -Wl,-z,muldefs +53c47 +< PARALLEL_MODE := OFF +--- +> #PARALLEL_MODE := OFF +56c50 +< #PARALLEL_MODE := HYBRID +--- +> PARALLEL_MODE := HYBRID diff --git a/tunning/openlb/hmpi-bisheng-opt-hybrid.patch b/tunning/openlb/hmpi-bisheng-opt-hybrid.patch new file mode 100644 index 0000000000000000000000000000000000000000..015f2190251b961d4c96e215ff781a63374c5c6c --- /dev/null +++ b/tunning/openlb/hmpi-bisheng-opt-hybrid.patch @@ -0,0 +1,47 @@ +26,35c26,29 +< CXX := g++ +< #CXX := icpc -D__aligned__=ignored +< #CXX := mpiCC +< #CXX := mpic++ +< +< CC := gcc # necessary for zlib, for Intel use icc +< +< OPTIM := -O3 -Wall -march=native -mtune=native # for gcc +< #OPTIM := -O3 -Wall -xHost # for Intel compiler +< #OPTIM := -O3 -Wall -xHost -ipo # optional for Intel compiler +--- +> CXX := mpicxx +> CC := mpicc # necessary for zlib, for Intel use icc +> +> OPTIM := -Ofast -ffast-math -finline-functions -ffp-contract=fast -Wall -mtune=MTUNE_PARAM -march=MARCH_PARAM -mcpu=MTUNE_PARAM -I BISHENG_INCLUDE -I KML_INCLUDE -I OPENBLAS_INCLUDE +36a31 +> DEBUGNoWall := -g -DOLB_DEBUG +38a34,35 +> # for debug mode +> #CXXFLAGS += $(DEBUGNoWall) +40a38,41 +> # open pgo optimize +> PGOCollect := -fprofile-instr-generate +> PGOOptim := -fprofile-instr-use=code.profdata +> +51c52,57 +< LDFLAGS := +--- +> 
LDFLAGS := -fuse-ld=lld -flto -L OPENBLAS_LIB -lopenblas -L KML_LIB -lkm -lm -lkfft -L BISHENG_LIB -ljemalloc -Wl,-z,muldefs +> # for IPM analysis(static used) +> # LDFLAGS += -LIPM_LIB -lipm +> # for pgo optimize +> #LDFLAGS += $(PGOCollect) +> #LDFLAGS += $(PGOOptim) +53c59 +< PARALLEL_MODE := OFF +--- +> #PARALLEL_MODE := OFF +56c62 +< #PARALLEL_MODE := HYBRID +--- +> PARALLEL_MODE := HYBRID +64c70 +< FEATURES := +--- +> FEATURES := OPENBLAS diff --git a/tunning/openlb/hmpi-gcc-default-hybrid.patch b/tunning/openlb/hmpi-gcc-default-hybrid.patch new file mode 100644 index 0000000000000000000000000000000000000000..767b154ecbfc7a264ee96212025dcdf0a562cf06 --- /dev/null +++ b/tunning/openlb/hmpi-gcc-default-hybrid.patch @@ -0,0 +1,28 @@ +26,35c26,29 +< CXX := g++ +< #CXX := icpc -D__aligned__=ignored +< #CXX := mpiCC +< #CXX := mpic++ +< +< CC := gcc # necessary for zlib, for Intel use icc +< +< OPTIM := -O3 -Wall -march=native -mtune=native # for gcc +< #OPTIM := -O3 -Wall -xHost # for Intel compiler +< #OPTIM := -O3 -Wall -xHost -ipo # optional for Intel compiler +--- +> CXX := mpicxx +> CC := mpicc +> +> OPTIM := -O3 -Wall -mtune=native +51c45 +< LDFLAGS := +--- +> LDFLAGS := -Wl,-z,muldefs +53c47 +< PARALLEL_MODE := OFF +--- +> #PARALLEL_MODE := OFF +56c50 +< #PARALLEL_MODE := HYBRID +--- +> PARALLEL_MODE := HYBRID diff --git a/tunning/openlb/icc-default.patch b/tunning/openlb/icc-default.patch new file mode 100644 index 0000000000000000000000000000000000000000..1b315db41bb5a0f5ac11116ad34a7893f492616a --- /dev/null +++ b/tunning/openlb/icc-default.patch @@ -0,0 +1,19 @@ +26,35c26,28 +< CXX := g++ +< #CXX := icpc -D__aligned__=ignored +< #CXX := mpiCC +< #CXX := mpic++ +< +< CC := gcc # necessary for zlib, for Intel use icc +< +< OPTIM := -O3 -Wall -march=native -mtune=native # for gcc +< #OPTIM := -O3 -Wall -xHost # for Intel compiler +< #OPTIM := -O3 -Wall -xHost -ipo # optional for Intel compiler +--- +> CXX := icpc -D__aligned__=ignored +> CC := icc # 
necessary for zlib, for Intel use icc +> OPTIM := -O3 -Wall -xHost -ipo # optional for Intel compiler +59c52 +< OMPFLAGS := -fopenmp +--- +> OMPFLAGS := -qopenmp diff --git a/tunning/openlb/intelmpi-hybrid-default.patch b/tunning/openlb/intelmpi-hybrid-default.patch new file mode 100644 index 0000000000000000000000000000000000000000..091b6a19cb1982534c7b14081111043448c53ead --- /dev/null +++ b/tunning/openlb/intelmpi-hybrid-default.patch @@ -0,0 +1,33 @@ +26,35c26,28 +< CXX := g++ +< #CXX := icpc -D__aligned__=ignored +< #CXX := mpiCC +< #CXX := mpic++ +< +< CC := gcc # necessary for zlib, for Intel use icc +< +< OPTIM := -O3 -Wall -march=native -mtune=native # for gcc +< #OPTIM := -O3 -Wall -xHost # for Intel compiler +< #OPTIM := -O3 -Wall -xHost -ipo # optional for Intel compiler +--- +> CXX := mpiicpc +> CC := mpiicc # necessary for zlib, for Intel use icc +> OPTIM := -O3 -Wall -xHost # optional for Intel compiler +48,49c41,42 +< ARPRG := ar +< #ARPRG := xiar # mandatory for intel compiler +--- +> #ARPRG := ar +> ARPRG := xiar # mandatory for intel compiler +53c46 +< PARALLEL_MODE := OFF +--- +> #PARALLEL_MODE := OFF +56c49 +< #PARALLEL_MODE := HYBRID +--- +> PARALLEL_MODE := HYBRID +59c52 +< OMPFLAGS := -fopenmp +--- +> OMPFLAGS := -qopenmp diff --git a/tunning/openlb/ompi-hybrid-default.patch b/tunning/openlb/ompi-hybrid-default.patch new file mode 100644 index 0000000000000000000000000000000000000000..e058c3443a3dd732f20699fd005874f0176be91e --- /dev/null +++ b/tunning/openlb/ompi-hybrid-default.patch @@ -0,0 +1,31 @@ +26,35c26,28 +< CXX := g++ +< #CXX := icpc -D__aligned__=ignored +< #CXX := mpiCC +< #CXX := mpic++ +< +< CC := gcc # necessary for zlib, for Intel use icc +< +< OPTIM := -O3 -Wall -march=native -mtune=native # for gcc +< #OPTIM := -O3 -Wall -xHost # for Intel compiler +< #OPTIM := -O3 -Wall -xHost -ipo # optional for Intel compiler +--- +> CXX := mpicxx +> CC := mpicc # necessary for zlib, for Intel use icc +> OPTIM := -O3 -Wall -xHost # 
optional for Intel compiler +51c44 +< LDFLAGS := +--- +> LDFLAGS := -Wl,-z,muldefs +53c46 +< PARALLEL_MODE := OFF +--- +> #PARALLEL_MODE := OFF +56c49 +< #PARALLEL_MODE := HYBRID +--- +> PARALLEL_MODE := HYBRID +59c52 +< OMPFLAGS := -fopenmp +--- +> OMPFLAGS := -qopenmp