diff --git a/ACL_PyTorch/contrib/cv/face/ReId-MGN-master/README.md b/ACL_PyTorch/contrib/cv/face/ReId-MGN-master/README.md
index ec150673e082db0c0e8db0c57a4c140ae8082d7d..63f1efc18d5fe9acecd4b7d8b6ce4739707342d9 100644
--- a/ACL_PyTorch/contrib/cv/face/ReId-MGN-master/README.md
+++ b/ACL_PyTorch/contrib/cv/face/ReId-MGN-master/README.md
@@ -44,20 +44,22 @@
 ### 2.1 Deep learning framework
 
 ```
-CANN 5.0.1
-python = 3.7.5
-pytorch >= 1.5.0
-torchvision >= 0.6.0
-onnx >= 1.7.0
+CANN 5.0.4
+python == 3.7.5
+pytorch >= 1.8.1
+torchvision >= 0.8.1
+onnx >= 1.9.0
 ```
 
 ### 2.2 Third-party Python libraries
 
 ```
-numpy == 1.20.3
+numpy == 1.19.2
 Pillow == 8.2.0
 opencv-python == 4.5.2.54
-albumentations == 0.5.2
+skl2onnx == 1.8.0
+scikit-learn == 0.24.1
+h5py == 3.3.0
 ```
 **Note:**
 > X86 architecture: pytorch, torchvision, and onnx can be installed from the officially provided whl packages; the other packages can be installed with pip3.7 install <package name>
@@ -109,10 +111,11 @@ export LD_LIBRARY_PATH=${install_path}/atc/lib64:${install_path}/acllib/lib64:$L
 export ASCEND_OPP_PATH=${install_path}/opp
 ```
 
-2. Use atc to convert the onnx model into an om model file. For tool usage, see the CANN 5.0.1 Development Auxiliary Tool Guide (Inference) 01
+2. Use atc to convert the onnx model into an om model file. For tool usage, see the CANN 5.0.4 Development Auxiliary Tool Guide (Inference) 01
 ```
-atc --framework=5 --model=./model/model_mkt1501.onnx --input_format=NCHW --input_shape="image:1,3,284,128" --output=mgn_mkt1501_bs1 --log=debug --soc_version=Ascend310
+atc --framework=5 --model=./model/model_mkt1501_bs1.onnx --input_format=NCHW --input_shape="image:1,3,384,128" --output=mgn_mkt1501_bs1 --log=debug --soc_version=Ascend${chip_name}
 ```
+${chip_name} can be looked up with npu-smi info (the Name field).
 
 ## 4 Dataset preprocessing
 
@@ -194,7 +197,7 @@ python3.7 ./postprocess_MGN.py --mode evaluate_om --data_path ./data/market1501
 The first argument is the run mode of the main function, the second is the raw data directory, and the third is the directory containing the model.
 Check the output:
 ```
-mAP: 0.9433
+mAP: 0.9423
 ```
 Testing the bs8 om model shows no accuracy difference at batch 8; the accuracy figures are the same as above.
 
@@ -216,79 +219,137 @@ MGN 0.9433
 - **[NPU performance data](#71-npu-performance-data)**
 
 ### 7.1 NPU performance data
-1. Run the benchmark tool over the whole dataset to obtain performance data
+1. Run the benchmark tool over the whole dataset to obtain performance data (the optimization was done on the 310p, so the comparison uses the 310p baseline data)
 
 batch1 baseline performance:
 ```
-[e2e] throughputRate: 84.5598, latency: 39829.8
-[data read] throughputRate: 473.148, moduleLatency: 2.1135
-[preprocess] throughputRate: 393.066, moduleLatency: 2.5441
-[infer] throughputRate: 88.2882, Interface throughputRate: 111.277, moduleLatency: 10.4851
-[post] throughputRate: 88.2756, moduleLatency: 11.3282
+[e2e] throughputRate: 214.165, latency: 15726.2
+[data read] throughputRate: 658.017, moduleLatency: 1.51972
+[preprocess] throughputRate: 600.498, moduleLatency: 1.66529
+[infer] throughputRate: 233.504, Interface throughputRate: 362.752, moduleLatency: 3.65513
+[post] throughputRate: 233.402, moduleLatency: 4.28445
 ```
-batch1 310 single-card throughput: 111.277 * 4 = 445.108 fps
+batch1 310p single-card throughput: 362.752 fps
 
 batch1 performance after optimization:
-[e2e] throughputRate: 96.8416, latency: 34778.5
-[data read] throughputRate: 446.222, moduleLatency: 2.24104
-[preprocess] throughputRate: 422.735, moduleLatency: 2.36555
-[infer] throughputRate: 99.7554, Interface throughputRate: 128.15, moduleLatency: 9.25689
-[post] throughputRate: 99.7383, moduleLatency: 10.0262
+[e2e] throughputRate: 230.161, latency: 14633.2
+[data read] throughputRate: 259.293, moduleLatency: 3.85663
+[preprocess] throughputRate: 252.886, moduleLatency: 3.95434
+[infer] throughputRate: 255.317, Interface throughputRate: 640.829, moduleLatency: 2.14933
+[post] throughputRate: 255.057, moduleLatency: 3.92069
 
-batch1 310 single-card throughput: 128.15 * 4 = 512.6 fps
+batch1 310p single-card throughput: 640.829 fps
 
 batch4 performance:
 ```
-[e2e] throughputRate: 106.025, latency: 31766.1
-[data read] throughputRate: 500.107, moduleLatency: 1.99957
-[preprocess] throughputRate: 471.221, moduleLatency: 2.12215
-[infer] throughputRate: 108.979, Interface throughputRate: 153.797, moduleLatency: 8.03481
-[post] throughputRate: 27.2346, moduleLatency: 36.7181
+[e2e] throughputRate: 142.958, latency: 23559.4
+[data read] throughputRate: 151.628, moduleLatency: 6.59509
+[preprocess] throughputRate: 148.151, moduleLatency: 6.74988
+[infer] throughputRate: 148.239, Interface throughputRate: 1228.46, moduleLatency: 3.08466
+[post] throughputRate: 37.0346, moduleLatency: 27.0018
 ```
-batch4 310 single-card throughput: 153.797 * 4 = 615.188 fps
+batch4 310p single-card throughput: 1228.46 fps
+
+batch4 performance after optimization:
+```
+[e2e] throughputRate: 163.526, latency: 20596.1
+[data read] throughputRate: 176.734, moduleLatency: 5.65823
+[preprocess] throughputRate: 173.679, moduleLatency: 5.75776
+[infer] throughputRate: 174.828, Interface throughputRate: 1453.49, moduleLatency: 1.61705
+[post] throughputRate: 43.5671, moduleLatency: 22.9531
+```
+batch4 310p single-card throughput: 1453.49 fps
+
 batch8 performance:
 ```
-[e2e] throughputRate: 106.324, latency: 149665
-[data read] throughputRate: 121.058, moduleLatency: 8.2605
-[preprocess] throughputRate: 120.662, moduleLatency: 8.28762
-[infer] throughputRate: 107.422, Interface throughputRate: 149.297, moduleLatency: 8.18964
-[post] throughputRate: 13.4334, moduleLatency: 74.4414
+[e2e] throughputRate: 147.6, latency: 22818.4
+[data read] throughputRate: 160.112, moduleLatency: 6.24561
+[preprocess] throughputRate: 156.976, moduleLatency: 6.37038
+[infer] throughputRate: 158.972, Interface throughputRate: 1259.76, moduleLatency: 2.43329
+[post] throughputRate: 19.7538, moduleLatency: 50.6232
+```
+batch8 310p single-card throughput: 1259.76 fps
+
+batch8 performance after optimization:
+```
+[e2e] throughputRate: 163.348, latency: 20618.6
+[data read] throughputRate: 176.473, moduleLatency: 5.66657
+[preprocess] throughputRate: 173.037, moduleLatency: 5.77912
+[infer] throughputRate: 173.664, Interface throughputRate: 1519.17, moduleLatency: 1.517
+[post] throughputRate: 21.525, moduleLatency: 46.4577
 ```
-batch8 310 single-card throughput: 149.297 * 4 = 597.188 fps
+batch8 310p single-card throughput: 1519.17 fps
+
 batch16 baseline performance:
 ```
-[e2e] throughputRate: 103.095, latency: 32668.8
-[data read] throughputRate: 138.066, moduleLatency: 7.24292
-[preprocess] throughputRate: 135.594, moduleLatency: 7.37498
-[infer] throughputRate: 107.451, Interface throughputRate: 147.867, moduleLatency: 8.19638
-[post] throughputRate: 6.72704, moduleLatency: 148.654
+[e2e] throughputRate: 156.114, latency: 21574
+[data read] throughputRate: 166.261, moduleLatency: 6.01463
+[preprocess] throughputRate: 163.089, moduleLatency: 6.13161
+[infer] throughputRate: 164.341, Interface throughputRate: 1164.02, moduleLatency: 1.75733
+[post] throughputRate: 10.2309, moduleLatency: 97.7429
 ```
-batch16 310 single-card throughput: 147.867 * 4 = 591.468 fps
+batch16 310p single-card throughput: 1164.02 fps
 
 batch16 performance after optimization:
 ```
-[e2e] throughputRate: 121.183, latency: 27792.7
-[data read] throughputRate: 138.209, moduleLatency: 7.23544
-[preprocess] throughputRate: 135.553, moduleLatency: 7.37721
-[infer] throughputRate: 125.617, Interface throughputRate: 184.74, moduleLatency: 6.86643
-[post] throughputRate: 7.86374, moduleLatency: 127.166
+[e2e] throughputRate: 136.404, latency: 24691.3
+[data read] throughputRate: 146.602, moduleLatency: 6.82119
+[preprocess] throughputRate: 143.933, moduleLatency: 6.94767
+[infer] throughputRate: 144.405, Interface throughputRate: 1388.27, moduleLatency: 2.77892
+[post] throughputRate: 8.92648, moduleLatency: 112.026
 ```
-batch16 310 single-card throughput: 184.74 * 4 = 738.96 fps
+batch16 310p single-card throughput: 1388.27 fps
 
 batch32 performance:
 ```
-[e2e] throughputRate: 109.639, latency: 30719.1
-[data read] throughputRate: 144.87, moduleLatency: 6.90276
-[preprocess] throughputRate: 141.787, moduleLatency: 7.05281
-[infer] throughputRate: 112.348, Interface throughputRate: 159.033, moduleLatency: 7.70075
-[post] throughputRate: 3.53321, moduleLatency: 283.029
+[e2e] throughputRate: 156.208, latency: 21561
+[data read] throughputRate: 167.791, moduleLatency: 5.95981
+[preprocess] throughputRate: 164.575, moduleLatency: 6.07628
+[infer] throughputRate: 165.504, Interface throughputRate: 1083.25, moduleLatency: 1.79117
+[post] throughputRate: 5.17701, moduleLatency: 193.162
+```
+batch32 310p single-card throughput: 1083.25 fps
+
+batch32 performance after optimization:
 ```
-batch32 310 single-card throughput: 159.033 * 4 = 636.132 fps
+[e2e] throughputRate: 131.874, latency: 25539.5
+[data read] throughputRate: 138.982, moduleLatency: 7.19518
+[preprocess] throughputRate: 136.698, moduleLatency: 7.31541
+[infer] throughputRate: 137.432, Interface throughputRate: 1367.63, moduleLatency: 2.00064
+[post] throughputRate: 4.31595, moduleLatency: 231.699
+```
+batch32 310p single-card throughput: 1367.63 fps
+
+
+batch64 performance:
+```
+[e2e] throughputRate: 170.586, latency: 19743.8
+[data read] throughputRate: 186.76, moduleLatency: 5.35446
+[preprocess] throughputRate: 182.521, moduleLatency: 5.47882
+[infer] throughputRate: 183.87, Interface throughputRate: 1100.35, moduleLatency: 1.74579
+[post] throughputRate: 2.87689, moduleLatency: 347.597
+```
+batch64 310p single-card throughput: 1100.35 fps
+
+batch64 performance after optimization:
+```
+[e2e] throughputRate: 171.391, latency: 19651
+[data read] throughputRate: 185.601, moduleLatency: 5.3879
+[preprocess] throughputRate: 181.592, moduleLatency: 5.50685
+[infer] throughputRate: 181.555, Interface throughputRate: 1364.84, moduleLatency: 1.55183
+[post] throughputRate: 2.83082, moduleLatency: 353.255
+```
+batch64 310p single-card throughput: 1364.84 fps
+
 ```
-MGN model   310 before any optimization (single-card throughput)   310 after optimizing transdata and transpose (single-card throughput)
-bs1         445.108 fps                                            512.6 fps
-bs16        591.468 fps                                            738.96 fps
+MGN model   310p before any optimization (single-card throughput)   310p after optimization (single-card throughput)
+bs1         362.752 fps                                             640.829 fps
+bs4         2156.73 fps                                             1453.49 fps
+bs8         1281.93 fps                                             1519.17 fps
+bs16        1167.81 fps                                             1388.27 fps
+bs32        1096.42 fps                                             1367.63 fps
+bs64        1107.77 fps                                             1364.84 fps
 ```
\ No newline at end of file
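
The single-card figures in this patch are taken from the `Interface throughputRate` field of each `[infer]` log line. The removed 310 lines multiplied that field by 4, while the 310p lines report it directly. A minimal sketch of that arithmetic follows, assuming the log format shown in the hunks above. The helper names (`interface_throughput`, `card_throughput`, `speedup`) and the `chips_per_card` parameter are hypothetical and are not part of the benchmark tool or the repository.

```python
import re

# Matches the "Interface throughputRate" field of a benchmark "[infer]" log line.
# The plain "throughputRate" field earlier on the same line is skipped on purpose.
_INTERFACE_RE = re.compile(r"Interface throughputRate:\s*([0-9]+(?:\.[0-9]+)?)")


def interface_throughput(log_line):
    """Return the Interface throughputRate value (fps) from one log line."""
    match = _INTERFACE_RE.search(log_line)
    if match is None:
        raise ValueError("no Interface throughputRate field in line")
    return float(match.group(1))


def card_throughput(interface_fps, chips_per_card=1):
    """Scale the per-chip rate to a card: 4 for the old 310 lines, 1 for 310p."""
    return interface_fps * chips_per_card


def speedup(baseline_fps, optimized_fps):
    """Ratio of optimized to baseline throughput."""
    return optimized_fps / baseline_fps


# batch1 [infer] lines copied from the added side of the diff.
baseline = "[infer] throughputRate: 233.504, Interface throughputRate: 362.752, moduleLatency: 3.65513"
optimized = "[infer] throughputRate: 255.317, Interface throughputRate: 640.829, moduleLatency: 2.14933"

base_fps = card_throughput(interface_throughput(baseline))
opt_fps = card_throughput(interface_throughput(optimized))
print(f"bs1: {base_fps} -> {opt_fps} fps, speedup {speedup(base_fps, opt_fps):.2f}x")
```

Running the same helpers over the `[infer]` line of every batch size gives a quick way to cross-check the rows of the summary table against the logs quoted above it.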