diff --git a/ACL_PyTorch/built-in/audio/EspNet_for_Pytoch/README.md b/ACL_PyTorch/built-in/audio/EspNet_for_Pytoch/README.md index b351e22c87306f4cc73abed7f68592f0d2a6a74d..f3620edbb26f65c3018b235609aadb4f821cd458 100644 --- a/ACL_PyTorch/built-in/audio/EspNet_for_Pytoch/README.md +++ b/ACL_PyTorch/built-in/audio/EspNet_for_Pytoch/README.md @@ -19,7 +19,7 @@ pip3.7 install -r requirements.txt git checkout v0.10.5 ``` -3. 下载网络权重文件 +4. 下载网络权重文件 下载路径:https://github.com/espnet/espnet/blob/master/egs/aishell/asr1/RESULTS.md @@ -27,7 +27,7 @@ pip3.7 install -r requirements.txt 解压,将对应的conf,data, exp文件夹置于espnet/egs/aishell/asr1 -4. 数据集下载: +5. 数据集下载: 在espnet/egs/aishell/asr1/文件夹下运行bash run.sh --stage -1 –stop_stage -1下载数据集 @@ -39,7 +39,7 @@ pip3.7 install -r requirements.txt 运行bash run.sh --stage 3 --stop_stage 3处理数据集 -5. 导出onnx,生成om离线文件 +6. 导出onnx,生成om离线文件 将export_onnx.diff放在espnet根目录下, @@ -51,8 +51,15 @@ pip3.7 install -r requirements.txt 生成encoder.onnx,运行python3.7.5 adaptespnet.py生成encoder_revise.onnx -6. 运行 bash encoder.sh生成离线om模型, encoder_262_1478.om +7. 
运行encoder.sh生成离线om模型, encoder_262_1478.om + + ${chip_name}可通过`npu-smi info`指令查看 + ![Image](https://gitee.com/Ronnie_zheng/ascend-pytorch-crowdintelligence-doc/raw/master/Ascend-PyTorch%E7%A6%BB%E7%BA%BF%E6%8E%A8%E7%90%86%E6%8C%87%E5%AF%BC/images/310P3.png) + + ``` + bash encoder.sh Ascend${chip_name} # Ascend310P3 + ``` ## 2 离线推理 @@ -95,6 +102,6 @@ export ASCEND_GLOBAL_LOG_LEVEL=3 -| 模型 | 官网pth精度 | 710离线推理精度 | gpu性能 | 710性能 | +| 模型 | 官网pth精度 | 310P离线推理精度 | gpu性能 | 310P性能 | | :--------------: | :---------: | :-------------: | :-----: | :-----: | | espnet_conformer | 5.1% | 5.4% | 261fps | 430fps | diff --git a/ACL_PyTorch/built-in/audio/EspNet_for_Pytoch/encoder.sh b/ACL_PyTorch/built-in/audio/EspNet_for_Pytoch/encoder.sh index 3bc01ebc3a62058062fb70843d72290c2571cb8c..3baf17f0367666b83e9201e1b63ed1b6232b2f5a 100644 --- a/ACL_PyTorch/built-in/audio/EspNet_for_Pytoch/encoder.sh +++ b/ACL_PyTorch/built-in/audio/EspNet_for_Pytoch/encoder.sh @@ -8,4 +8,4 @@ export LD_LIBRARY_PATH=${install_path}/acllib/lib64:$LD_LIBRARY_PATH atc --model=encoder_revise.onnx --framework=5 --output=encoder_262_1478 --input_format=ND \ --input_shape="input:-1,83" --log=error --optypelist_for_implmode="Sigmoid" --op_select_implmode=high_performance \ --dynamic_dims="262;326;390;454;518;582;646;710;774;838;902;966;1028;1284;1478" \ ---soc_version=Ascend710 +--soc_version=$1 diff --git a/ACL_PyTorch/built-in/audio/TDNN_for_Pytorch/ReadMe.md b/ACL_PyTorch/built-in/audio/TDNN_for_Pytorch/ReadMe.md index 4ea7fca8b3b280b392827ae443e4193f3a0f22cc..d3bbd4cd2bc4792fe50dcebf40f715146e5e665d 100644 --- a/ACL_PyTorch/built-in/audio/TDNN_for_Pytorch/ReadMe.md +++ b/ACL_PyTorch/built-in/audio/TDNN_for_Pytorch/ReadMe.md @@ -34,6 +34,11 @@ python3 tdnn_preprocess.py ``` ## 2 模型转换 + +${chip_name}可通过`npu-smi info`指令查看 + +![Image](https://gitee.com/Ronnie_zheng/ascend-pytorch-crowdintelligence-doc/raw/master/Ascend-PyTorch%E7%A6%BB%E7%BA%BF%E6%8E%A8%E7%90%86%E6%8C%87%E5%AF%BC/images/310P3.png) + ```shell # 
生成tdnn_bs64.onnx python3 tdnn_pth2onnx.py 64 @@ -41,9 +46,10 @@ python3 tdnn_pth2onnx.py 64 python3 -m onnxsim tdnn_bs64.onnx tdnn_bs64s.onnx python3 modify_onnx.py tdnn_bs64s.onnx # 生成om模型 -bash atc.sh tdnn_bs64s.onnx +bash atc.sh tdnn_bs64s.onnx Ascend${chip_name} # Ascend310P3 ``` + ## 3 离线推理 ```shell @@ -53,7 +59,7 @@ python3 tdnn_postprocess.py **评测结果:** 由于TensorRT不支持原模型,故只能对比修改后的模型性能。 -| 模型 | pth精度 | 710离线推理精度 | 基准性能 | 710性能 | +| 模型 | pth精度 | 310P离线推理精度 | 基准性能 | 310P性能 | | :------: | :------: | :------: | :------: | :------: | | TDNN bs64 | 99.93% | 99.93% | - | 2467fps | | TDNN修改 bs64 | - | - | 2345.179 fps | 3815.886fps | \ No newline at end of file diff --git a/ACL_PyTorch/built-in/audio/TDNN_for_Pytorch/atc.sh b/ACL_PyTorch/built-in/audio/TDNN_for_Pytorch/atc.sh index de48fc986b39b6e237b461bfeb2eb1cc84070d99..6bc0510d6a0cdbe5f115f43ae2cf9555ae339f74 100644 --- a/ACL_PyTorch/built-in/audio/TDNN_for_Pytorch/atc.sh +++ b/ACL_PyTorch/built-in/audio/TDNN_for_Pytorch/atc.sh @@ -6,8 +6,8 @@ bs=`echo ${model} | tr -cd "[0-9]" ` if [ `echo $model | grep "mod"` ] then - atc --model=$model --framework=5 --input_format=ND --input_shape="feats:${bs},-1,23;random:${bs},1500" --dynamic_dims='200;300;400;500;600;700;800;900;1000;1100;1200;1300;1400;1500;1600;1700;1800' --output=./tdnn_bs${bs}_mods --soc_version=Ascend710 --log=error + atc --model=$model --framework=5 --input_format=ND --input_shape="feats:${bs},-1,23;random:${bs},1500" --dynamic_dims='200;300;400;500;600;700;800;900;1000;1100;1200;1300;1400;1500;1600;1700;1800' --output=./tdnn_bs${bs}_mods --soc_version=$2 --log=error else - atc --model=$model --framework=5 --input_format=ND --input_shape="feats:${bs},-1,23" --dynamic_dims='200;300;400;500;600;700;800;900;1000;1100;1200;1300;1400;1500;1600;1700;1800' --output=./tdnn_bs${bs}s --soc_version=Ascend710 --log=error + atc --model=$model --framework=5 --input_format=ND --input_shape="feats:${bs},-1,23" 
--dynamic_dims='200;300;400;500;600;700;800;900;1000;1100;1200;1300;1400;1500;1600;1700;1800' --output=./tdnn_bs${bs}s --soc_version=$2 --log=error fi diff --git a/ACL_PyTorch/built-in/audio/Wenet_for_Pytorch/README.md b/ACL_PyTorch/built-in/audio/Wenet_for_Pytorch/README.md index 91c8165508b45c54b6a6a030422ac3c3af64f9c4..cf6a46044cdfd20d0e695854f01e7253ac726a24 100644 --- a/ACL_PyTorch/built-in/audio/Wenet_for_Pytorch/README.md +++ b/ACL_PyTorch/built-in/audio/Wenet_for_Pytorch/README.md @@ -89,19 +89,21 @@ bash export_onnx.sh exp/conformer_u2/train.yaml exp/conformer_u2/final.pt om_gener工具修改onnx模型,生成decoder_final.onnx、encoder_revise.onnx、no_flash_encoder_revise.onnx,并运行相应脚本生成om模型,注意配置环境变量 + ${chip_name}可通过`npu-smi info`指令查看 + + ![Image](https://gitee.com/Ronnie_zheng/ascend-pytorch-crowdintelligence-doc/raw/master/Ascend-PyTorch%E7%A6%BB%E7%BA%BF%E6%8E%A8%E7%90%86%E6%8C%87%E5%AF%BC/images/310P3.png) + ``` cd ${code_path} cp ${wenet_path}/examples/aishell/s0/onnx/* ${code_path}/ python3 adaptdecoder.py python3 adaptencoder.py python3 adaptnoflashencoder.py -bash encoder.sh -bash decoder.sh -bash no_flash_encoder.sh +bash encoder.sh Ascend${chip_name} +bash decoder.sh Ascend${chip_name} +bash no_flash_encoder.sh Ascend${chip_name} # Ascend310P3 ``` -若设备为710设备,修改sh脚本中的--soc_version=Ascend710即可 - ## 3 离线推理 ### 动态shape场景: @@ -193,7 +195,7 @@ bash run_attention_rescoring.sh --model_path ./decoder_final.om --json_path enc ### **评测结果:** -| 模型 | 官网pth精度 | 710/310离线推理精度 | gpu性能 | 710性能 | 310性能 | +| 模型 | 官网pth精度 | 310P/310离线推理精度 | gpu性能 | 310P性能 | 310性能 | | :---: | :----------------------------: | :-------------------------: | :-----: | :-----: | ------- | | wenet | GPU流式:5.94%, 非流式:4.64% | 流式:5.66%, 非流式:4.78% | | 7.69 | 11.6fps | diff --git a/ACL_PyTorch/built-in/audio/Wenet_for_Pytorch/decoder.sh b/ACL_PyTorch/built-in/audio/Wenet_for_Pytorch/decoder.sh index 9073e9d52de925775bd542b7c9a0fef17520b95d..6e29bfbbc8d838776d5f74d4e8661e3b347ba786 100644 --- 
a/ACL_PyTorch/built-in/audio/Wenet_for_Pytorch/decoder.sh +++ b/ACL_PyTorch/built-in/audio/Wenet_for_Pytorch/decoder.sh @@ -5,6 +5,6 @@ export LD_LIBRARY_PATH=${install_path}/atc/lib64:$LD_LIBRARY_PATH export ASCEND_OPP_PATH=${install_path}/opp atc --model=decoder_final.onnx --framework=5 --output=decoder_final --input_format=ND \ - --input_shape_range="memory:[10,1~1500,256];memory_mask:[10,1,1~1500];ys_in_pad:[10,1~1500];ys_in_lens:[10];r_ys_in_pad:[10,1~1500]" --out_nodes="Add_488:0;Add_977:0" --log=error --soc_version=Ascend710 + --input_shape_range="memory:[10,1~1500,256];memory_mask:[10,1,1~1500];ys_in_pad:[10,1~1500];ys_in_lens:[10];r_ys_in_pad:[10,1~1500]" --out_nodes="Add_488:0;Add_977:0" --log=error --soc_version=$1 diff --git a/ACL_PyTorch/built-in/audio/Wenet_for_Pytorch/encoder.sh b/ACL_PyTorch/built-in/audio/Wenet_for_Pytorch/encoder.sh index b7bcad526cf51bdaca4b36e8d5e11453a2a53e5a..7c4e438d321888bbbe2ce38c3579de4184a56fbd 100644 --- a/ACL_PyTorch/built-in/audio/Wenet_for_Pytorch/encoder.sh +++ b/ACL_PyTorch/built-in/audio/Wenet_for_Pytorch/encoder.sh @@ -5,5 +5,5 @@ export ASCEND_OPP_PATH=${install_path}/opp export PYTHONPATH=${install_path}/atc/python/site-packages:$PYTHONPATH export LD_LIBRARY_PATH=${install_path}/acllib/lib64:$LD_LIBRARY_PATH -atc --model=encoder_revise.onnx --framework=5 --output=encoder_revise --input_format=ND --input_shape_range="input:[1,1~1500,80];offset:[1];subsampling_cache:[1,1~1500,256];elayers_cache:[12,1,1~1500,256];conformer_cnn_cache:[12,1,256,7]" --log=error --soc_version=Ascend710 +atc --model=encoder_revise.onnx --framework=5 --output=encoder_revise --input_format=ND --input_shape_range="input:[1,1~1500,80];offset:[1];subsampling_cache:[1,1~1500,256];elayers_cache:[12,1,1~1500,256];conformer_cnn_cache:[12,1,256,7]" --log=error --soc_version=$1 diff --git a/ACL_PyTorch/built-in/audio/Wenet_for_Pytorch/no_flash_encoder.sh b/ACL_PyTorch/built-in/audio/Wenet_for_Pytorch/no_flash_encoder.sh index 
699754af2f15ccb2ef034ef6f0f4f6b90ef33cad..5151f27f8e11a29cab563881ce1b3eb4fb8f8748 100644 --- a/ACL_PyTorch/built-in/audio/Wenet_for_Pytorch/no_flash_encoder.sh +++ b/ACL_PyTorch/built-in/audio/Wenet_for_Pytorch/no_flash_encoder.sh @@ -3,5 +3,5 @@ export PATH=/usr/local/python3.7.5/bin:${install_path}/atc/ccec_compiler/bin:${i export PYTHONPATH=${install_path}/atc/python/site-packages:$PYTHONPATH export LD_LIBRARY_PATH=${install_path}/atc/lib64:$LD_LIBRARY_PATH export ASCEND_OPP_PATH=${install_path}/opp -atc --model=no_flash_encoder_revise.onnx --framework=5 --output=no_flash_encoder_revise --input_format=ND --input_shape_range="xs_input:[1,1~1500,80];xs_input_lens:[1]" --log=error --soc_version=Ascend710 +atc --model=no_flash_encoder_revise.onnx --framework=5 --output=no_flash_encoder_revise --input_format=ND --input_shape_range="xs_input:[1,1~1500,80];xs_input_lens:[1]" --log=error --soc_version=$1 diff --git a/ACL_PyTorch/built-in/audio/Wenet_for_Pytorch/static_decoder.sh b/ACL_PyTorch/built-in/audio/Wenet_for_Pytorch/static_decoder.sh index 3f6c53b365ea6359ee3458a6e2426856bbb3bae1..7fa996d48ce66462c46dcf1de7f0ae4c775c947a 100644 --- a/ACL_PyTorch/built-in/audio/Wenet_for_Pytorch/static_decoder.sh +++ b/ACL_PyTorch/built-in/audio/Wenet_for_Pytorch/static_decoder.sh @@ -15,5 +15,5 @@ atc --model=decoder_final.onnx --framework=5 --output=decoder_fendang --input_fo 384,384,16,16;384,384,17,17;384,384,18,18;384,384,19,19;384,384,20,20;384,384,21,21;384,384,22,22;384,384,23,23;\ 384,384,24,24;384,384,25,25;384,384,26,26;384,384,27,27;384,384,28,28;384,384,29,29;384,384,30,30;384,384,31,31;\ 384,384,32,32;384,384,33,33;384,384,34,34;384,384,35,35;384,384,36,36;384,384,37,37;384,384,38,38;384,384,39,39;384,384,40,40;384,384,41,41;" \ ---soc_version=Ascend710 +--soc_version=$1 diff --git a/ACL_PyTorch/built-in/audio/Wenet_for_Pytorch/static_encoder.sh b/ACL_PyTorch/built-in/audio/Wenet_for_Pytorch/static_encoder.sh index 
9fd70f05556715dddbb9352e942d7a97ae353ab1..90dc1532146d44a0f03b8533aa6f8eb98133e500 100644 --- a/ACL_PyTorch/built-in/audio/Wenet_for_Pytorch/static_encoder.sh +++ b/ACL_PyTorch/built-in/audio/Wenet_for_Pytorch/static_encoder.sh @@ -7,4 +7,4 @@ export ASCEND_OPP_PATH=${install_path}/opp atc --model=no_flash_encoder_revise.onnx --framework=5 --output=encoder_fendang_262_1478_static --input_format=ND \ --input_shape="xs_input:1,-1,80;xs_input_lens:1" --log=error \ --dynamic_dims="262;326;390;454;518;582;646;710;774;838;902;966;1028;1284;1478" \ ---soc_version=Ascend710 +--soc_version=$1 diff --git a/ACL_PyTorch/built-in/cv/3DUnet_for_PyTorch/README.md b/ACL_PyTorch/built-in/cv/3DUnet_for_PyTorch/README.md index eb90f8469be621ecc39e4f7ad70540710247d5b0..fc95a94fc3d56a2b28abe6db982ebd09dc448ccb 100644 --- a/ACL_PyTorch/built-in/cv/3DUnet_for_PyTorch/README.md +++ b/ACL_PyTorch/built-in/cv/3DUnet_for_PyTorch/README.md @@ -117,7 +117,7 @@ python3 preprocess.py **评测结果:** -| 模型 | 官网pth精度 | 710/310离线推理精度 | gpu性能 | 710性能 | 310性能 | +| 模型 | 官网pth精度 | 310P/310离线推理精度 | gpu性能 | 310P性能 | 310性能 | | :--------: | :---------------: | :-----------------: | :-----: | :---------------------: | ------- | | 3DUNet bs1 | mean tumor:0.8530 | mean tumor:0.8530 | 0.5fps | ~~4.4fps~~
6.26fps | 0.78fps | diff --git a/ACL_PyTorch/built-in/cv/CRAFT_for_Pytorch/README.md b/ACL_PyTorch/built-in/cv/CRAFT_for_Pytorch/README.md index ea0d36c5d7fd4d9bebec88bf5d866bd41bc6800e..89d3cfd4c65c1413ac13d439fe13b351984b0984 100644 --- a/ACL_PyTorch/built-in/cv/CRAFT_for_Pytorch/README.md +++ b/ACL_PyTorch/built-in/cv/CRAFT_for_Pytorch/README.md @@ -44,9 +44,13 @@ pip3.7 install -r requirements.txt 7. 运行 bash craft_atc.sh生成离线om模型, craft.om + ${chip_name}可通过`npu-smi info`指令查看 + + ![Image](https://gitee.com/Ronnie_zheng/ascend-pytorch-crowdintelligence-doc/raw/master/Ascend-PyTorch%E7%A6%BB%E7%BA%BF%E6%8E%A8%E7%90%86%E6%8C%87%E5%AF%BC/images/310P3.png) + ``` cp craft_atc.sh CRAFT-pytorch/ - bash craft_atc.sh + bash craft_atc.sh Ascend${chip_name} # Ascend310P3 ``` @@ -79,7 +83,7 @@ source /usr/local/Ascend/ascend-toolkit/set_env.sh ``` -| 模型 | 官网pth精度 | 710离线推理精度 | gpu性能 | 710性能 | +| 模型 | 官网pth精度 | 310P离线推理精度 | gpu性能 | 310P性能 | | :-----------: | :---------: | :-------------: | :-----: | :-----: | | CRAFT_General | | | | 148fps | diff --git a/ACL_PyTorch/built-in/cv/CRAFT_for_Pytorch/craft_atc.sh b/ACL_PyTorch/built-in/cv/CRAFT_for_Pytorch/craft_atc.sh index 1656330ecc71e91e402a7a2a457c0573ff2fe35d..081a1f7dc615e529ebb9368e6627609eba82c93f 100644 --- a/ACL_PyTorch/built-in/cv/CRAFT_for_Pytorch/craft_atc.sh +++ b/ACL_PyTorch/built-in/cv/CRAFT_for_Pytorch/craft_atc.sh @@ -1,3 +1,3 @@ source /usr/local/Ascend/ascend-toolkit/set_env.sh atc --model=craft.onnx --framework=5 --output=craft --input_format=NCHW \ ---input_shape="input:1,3,640,640" --log=error --soc_version=Ascend710 +--input_shape="input:1,3,640,640" --log=error --soc_version=$1 diff --git a/ACL_PyTorch/built-in/cv/CascadeRCNN-DCN-101_for_Pytorch/README.md b/ACL_PyTorch/built-in/cv/CascadeRCNN-DCN-101_for_Pytorch/README.md index a029751b237a8d1ae1807504d56ce2818ec7cdd7..52dbc0b0ee7f24e02e48a35ec5ee41670312a3ba 100644 --- a/ACL_PyTorch/built-in/cv/CascadeRCNN-DCN-101_for_Pytorch/README.md +++ 
b/ACL_PyTorch/built-in/cv/CascadeRCNN-DCN-101_for_Pytorch/README.md @@ -46,9 +46,14 @@ cp ./pytorch_code_change/deform_conv.py /root/anaconda3/envs/dcn/lib/python3.7/s 7. 运行atc.sh脚本,完成onnx到om模型的转换,注意输出节点可能需要根据实际的onnx修改,若设备为310,则需要修改atc.sh脚本中的--soc_version为Ascend310 + ${chip_name}可通过`npu-smi info`指令查看 + + ![Image](https://gitee.com/Ronnie_zheng/ascend-pytorch-crowdintelligence-doc/raw/master/Ascend-PyTorch%E7%A6%BB%E7%BA%BF%E6%8E%A8%E7%90%86%E6%8C%87%E5%AF%BC/images/310P3.png) + ``` - bash atc.sh cascadeRCNNDCN.onnx cascadeRCNNDCN + bash atc.sh cascadeRCNNDCN.onnx cascadeRCNNDCN Ascend${chip_name} # Ascend310P3 ``` + @@ -89,7 +94,7 @@ python3.7 mmdetection_coco_postprocess.py --bin_data_path=result/dumpOutput_devi **评测结果:** -| 模型 | 官网pth精度 | 310离线推理精度 | gpu性能 | 310性能/710性能 | +| 模型 | 官网pth精度 | 310离线推理精度 | gpu性能 | 310性能/310P性能 | | :-----------------: | :---------: | :-------------: | :-----: | :---------------------: | | CascadedRCNNDCN bs1 | map:0.45 | map:0.45 | 4.6fps | 1.9258fps/fps/2.9534fps | diff --git a/ACL_PyTorch/built-in/cv/CascadeRCNN-DCN-101_for_Pytorch/atc.sh b/ACL_PyTorch/built-in/cv/CascadeRCNN-DCN-101_for_Pytorch/atc.sh index 385e645696766f625bb356166a80aec1a17043d5..f9f06b54354daf6f246e65b4449bafb201dd24c5 100644 --- a/ACL_PyTorch/built-in/cv/CascadeRCNN-DCN-101_for_Pytorch/atc.sh +++ b/ACL_PyTorch/built-in/cv/CascadeRCNN-DCN-101_for_Pytorch/atc.sh @@ -5,4 +5,4 @@ export ASCEND_OPP_PATH=/usr/local/Ascend/ascend-toolkit/latest/opp # export DUMP_GE_GRAPH=2 export ASCEND_SLOG_PRINT_TO_STDOUT=1 -/usr/local/Ascend/ascend-toolkit/latest/atc/bin/atc --model=$1 --framework=5 --output=$2 --input_format=NCHW --input_shape="input:1,3,1216,1216" --log=info --soc_version=Ascend710 --out_nodes="Concat_1036:0;Reshape_1038:0" \ No newline at end of file +/usr/local/Ascend/ascend-toolkit/latest/atc/bin/atc --model=$1 --framework=5 --output=$2 --input_format=NCHW --input_shape="input:1,3,1216,1216" --log=info --soc_version=$3 --out_nodes="Concat_1036:0;Reshape_1038:0" 
\ No newline at end of file diff --git a/ACL_PyTorch/built-in/cv/DPN131_for_Pytorch/README.md b/ACL_PyTorch/built-in/cv/DPN131_for_Pytorch/README.md index d936eba64214db4ddc2228a2995a3afe6f6ed9ae..c446cc25bb625c67dc25cf2a2b809418febec303 100644 --- a/ACL_PyTorch/built-in/cv/DPN131_for_Pytorch/README.md +++ b/ACL_PyTorch/built-in/cv/DPN131_for_Pytorch/README.md @@ -21,9 +21,10 @@ - [6.2 开源TopN精度](#62-开源topn精度) - [6.3 精度对比](#63-精度对比) - [7 性能对比](#7-性能对比) - - [7.1 npu性能数据](#71-npu性能数据) + - [7.1 npu性能数据-Ascend310性能数据](#71-npu性能数据-ascend310性能数据) - [7.2 T4性能数据](#72-t4性能数据) - [7.3 性能对比](#73-性能对比) + - [7.4 npu性能数据-Ascend310P性能数据](#74-npu性能数据-ascend310p性能数据) @@ -100,18 +101,19 @@ python3.7 dpn131_pth2onnx.py ./dpn131-7af84be88.pth dpn131.onnx ### 3.2 onnx转om模型 -1.设置环境变量 -``` -source env.sh -``` -2.使用atc将onnx模型转换为om模型文件 -``` -##Ascend310 -atc --framework=5 --model=./dpn131.onnx --output=dpn131_bs1 --input_format=NCHW --input_shape="image:1,3,224,224" --log=debug --soc_version=Ascend310 -##Ascend710 -atc --framework=5 --model=./dpn131.onnx --output=dpn131_bs1 --input_format=NCHW --input_shape="image:1,3,224,224" --log=debug --soc_version=Ascend710 +1. 设置环境变量 + ``` + source env.sh + ``` +2. 
使用atc将onnx模型转换为om模型文件 + + ${chip_name}可通过`npu-smi info`指令查看 + + ![Image](https://gitee.com/Ronnie_zheng/ascend-pytorch-crowdintelligence-doc/raw/master/Ascend-PyTorch%E7%A6%BB%E7%BA%BF%E6%8E%A8%E7%90%86%E6%8C%87%E5%AF%BC/images/310P3.png) + ``` + atc --framework=5 --model=./dpn131.onnx --output=dpn131_bs1 --input_format=NCHW --input_shape="image:1,3,224,224" --log=debug --soc_version=Ascend${chip_name} # Ascend310P3 + ``` -``` ## 4 数据集预处理 @@ -147,7 +149,7 @@ python3.7 gen_dataset_info.py bin ./prep_dataset ./dpn131_prep_bin.info 224 224 ### 5.1 benchmark工具概述 -benchmark工具为华为自研的模型推理工具,支持多种模型的离线推理,能够迅速统计出模型在Ascend310/Ascend710上的性能,支持真实数据和纯推理两种模式,配合后处理脚本,可以实现诸多模型的端到端过程。 +benchmark工具为华为自研的模型推理工具,支持多种模型的离线推理,能够迅速统计出模型在Ascend310/Ascend310P上的性能,支持真实数据和纯推理两种模式,配合后处理脚本,可以实现诸多模型的端到端过程。 ### 5.2 离线推理 1.设置环境变量 ``` @@ -196,7 +198,7 @@ dpn131 79.432 94.574 - **[310npu性能数据](#71-Ascend310性能数据)** - **[T4性能数据](#72-T4性能数据)** - **[性能对比](#73-性能对比)** -- **[710npu性能数据](#74-Ascend710性能数据)** +- **[310Pnpu性能数据](#74-Ascend310P性能数据)** ### 7.1 npu性能数据-Ascend310性能数据 benchmark工具在整个数据集上推理时也会统计性能数据,但是推理整个数据集较慢,如果这么测性能那么整个推理期间需要确保独占device,使用npu-smi info可以查看device是否空闲。也可以使用benchmark纯推理功能测得性能数据,但是由于随机数不能模拟数据分布,纯推理功能测的有些模型性能数据可能不太准,benchmark纯推理功能测性能仅为快速获取大概的性能数据以便调试优化使用,可初步确认benchmark工具在整个数据集上推理时由于device也被其它推理任务使用了导致的性能不准的问题。模型的性能以使用benchmark工具在整个数据集上推理得到bs1与bs16的性能数据为准,对于使用benchmark工具测试的batch4,8,32的性能数据在README.md中如下作记录即可。 1.benchmark工具在整个数据集上推理获得性能数据 @@ -289,8 +291,8 @@ batch1:37.3972x4=149.5888 < 1000/(5.51384/1) batch16:39.2882x4=157.1528 < 1000/(54.2503/16) 310单个device的吞吐率乘4即单卡吞吐率比T4单卡的吞吐率小,310性能低于T4性能,性能不达标。 对于batch1与batch16,310性能均低于T4性能,该模型放在Research/cv/classification目录下。 -### 7.4 npu性能数据-Ascend710性能数据 -详细测试方法与310相同-下面仅简单记录fp16各个batch的性能数据作为参考,需特别说明的是710的数据就是benchmark工具输出的 Interface throughputRate 的值,不需要任何计算。 +### 7.4 npu性能数据-Ascend310P性能数据 +详细测试方法与310相同-下面仅简单记录fp16各个batch的性能数据作为参考,需特别说明的是310P的数据就是benchmark工具输出的 Interface throughputRate 的值,不需要任何计算。 ``` batch1:158.108 batch4:371.75 diff --git 
a/ACL_PyTorch/built-in/cv/DPN131_for_Pytorch/atc.sh b/ACL_PyTorch/built-in/cv/DPN131_for_Pytorch/atc.sh index e2b89f1c77aad184e64ba35df3634d9944c94a74..c2f4dad1f2b29a237661646353e450ee878e9a31 100644 --- a/ACL_PyTorch/built-in/cv/DPN131_for_Pytorch/atc.sh +++ b/ACL_PyTorch/built-in/cv/DPN131_for_Pytorch/atc.sh @@ -5,4 +5,4 @@ export PYTHONPATH=${install_path}/atc/python/site-packages:$PYTHONPATH export ASCEND_OPP_PATH=${install_path}/opp export ASCEND_AICPU_PATH=${install_path} -atc --model=./dpn131.onnx --framework=5 --output=dpn131_fp16_bs8 --input_format=NCHW --input_shape="image:8,3,224,224" --log=error --soc_version=Ascend710 +atc --model=./dpn131.onnx --framework=5 --output=dpn131_fp16_bs8 --input_format=NCHW --input_shape="image:8,3,224,224" --log=error --soc_version=$1 diff --git a/ACL_PyTorch/built-in/cv/EfficientNet_b7_for_Pytorch/ReadMe.md b/ACL_PyTorch/built-in/cv/EfficientNet_b7_for_Pytorch/ReadMe.md index f4576f18684a26d5efaea9f0ee837c64cd758ad2..6549743cc76f8b68b57dc7d3d5ccb0b5fcae2933 100644 --- a/ACL_PyTorch/built-in/cv/EfficientNet_b7_for_Pytorch/ReadMe.md +++ b/ACL_PyTorch/built-in/cv/EfficientNet_b7_for_Pytorch/ReadMe.md @@ -49,19 +49,19 @@ pip3.7 install efficientnet_pytorch==0.7.1 2. 将onnx转为om模型 - 1. 安装mindx-toolbox工具 - - 前往https://support.huawei.com/enterprise/zh/ascend-computing/mindx-pid-252501207/software,根据自己的环境下载对应的mindx-toolbox安装包 - - 2. 配置环境变量 + 1. 配置环境变量 ``` source /usr/local/Ascend/ascend-toolkit/set_env.sh ``` - 3. 使用ATC工具将onnx模型转om模型 + 2. 
使用ATC工具将onnx模型转om模型
+
+      ${chip_name}可通过`npu-smi info`指令查看
+
+      ![Image](https://gitee.com/Ronnie_zheng/ascend-pytorch-crowdintelligence-doc/raw/master/Ascend-PyTorch%E7%A6%BB%E7%BA%BF%E6%8E%A8%E7%90%86%E6%8C%87%E5%AF%BC/images/310P3.png)
      ```
-     bash atc.sh efficientnet_b7_dym_600_sim.onnx efficientnet_b7_32_600_sim
+     bash atc.sh efficientnet_b7_dym_600_sim.onnx efficientnet_b7_32_600_sim Ascend${chip_name} # Ascend310P3
      ```
      运行成功后生成efficientnet_b7_32_600_sim.om模型文件
diff --git a/ACL_PyTorch/built-in/cv/EfficientNet_b7_for_Pytorch/atc.sh b/ACL_PyTorch/built-in/cv/EfficientNet_b7_for_Pytorch/atc.sh
index 0b3ad67630c9db5d121a5817119369ce037c5cf6..f0df7533433d0e8bff579519d7bcabd85f4c7826 100644
--- a/ACL_PyTorch/built-in/cv/EfficientNet_b7_for_Pytorch/atc.sh
+++ b/ACL_PyTorch/built-in/cv/EfficientNet_b7_for_Pytorch/atc.sh
@@ -1,9 +1,7 @@
-source /usr/local/Ascend/toolbox/set_env.sh
-soc_version=`ascend-dmi -i -dt | grep -m 1 'Chip Name' | awk -F ': ' '{print $2}' | sed 's/ //g'`
-
-source /usr/local/Ascend/ascend-toolkit/set_env.sh
 onnx_model=$1
 output_model=$2
+soc_version=$3
+
 atc --model=$onnx_model \
     --framework=5 \
     --input_format=NCHW \
diff --git a/ACL_PyTorch/built-in/cv/Flownet2_for_Pytorch/README.md b/ACL_PyTorch/built-in/cv/Flownet2_for_Pytorch/README.md
index b7f1b552b0689a12e8b8a6414ee5e31f18c4a3ba..0118d1a9d471ce42201997449b0a86a1820021e4 100644
--- a/ACL_PyTorch/built-in/cv/Flownet2_for_Pytorch/README.md
+++ b/ACL_PyTorch/built-in/cv/Flownet2_for_Pytorch/README.md
@@ -38,16 +38,19 @@ cd ..
 1. 
模型转换 -``` -# convert onnx -python3.7 pth2onnx.py --batch_size 1 --input_path ./FlowNet2_checkpoint.pth.tar --out_path ./models/flownet2_bs1.onnx --batch_size 1 -# optimize onnx -python3.7 -m onnxsim ./models/flownet2_bs1.onnx ./models/flownet2_bs1_sim.onnx -python3.7 fix_onnx.py ./models/flownet2_bs1_sim.onnx ./models/flownet2_bs1_sim_fix.onnx -# 310需要采用混合精度,否则有精度问题;710上采用FP16精度正常 -atc --framework=5 --model=models/flownet2_bs1_sim_fix.onnx --output=models/flownet2_bs1_sim_fix --input_format=NCHW --input_shape="x1:1,3,448,1024;x2:1,3,448,1024" --log=debug --soc_version=Ascend310 --precision_mode=allow_mix_precision -atc --framework=5 --model=models/flownet2_bs1_sim_fix.onnx --output=models/flownet2_bs1_sim_fix_710 --input_format=NCHW --input_shape="x1:1,3,448,1024;x2:1,3,448,1024" --log=debug --soc_version=Ascend710 -``` + ${chip_name}可通过`npu-smi info`指令查看 + + ![Image](https://gitee.com/Ronnie_zheng/ascend-pytorch-crowdintelligence-doc/raw/master/Ascend-PyTorch%E7%A6%BB%E7%BA%BF%E6%8E%A8%E7%90%86%E6%8C%87%E5%AF%BC/images/310P3.png) + ``` + # convert onnx + python3.7 pth2onnx.py --batch_size 1 --input_path ./FlowNet2_checkpoint.pth.tar --out_path ./models/flownet2_bs1.onnx --batch_size 1 + # optimize onnx + python3.7 -m onnxsim ./models/flownet2_bs1.onnx ./models/flownet2_bs1_sim.onnx + python3.7 fix_onnx.py ./models/flownet2_bs1_sim.onnx ./models/flownet2_bs1_sim_fix.onnx + # 310需要采用混合精度,否则有精度问题;310P上采用FP16精度正常 + atc --framework=5 --model=models/flownet2_bs1_sim_fix.onnx --output=models/flownet2_bs1_sim_fix --input_format=NCHW --input_shape="x1:1,3,448,1024;x2:1,3,448,1024" --log=debug --soc_version=Ascend310 --precision_mode=allow_mix_precision + atc --framework=5 --model=models/flownet2_bs1_sim_fix.onnx --output=models/flownet2_bs1_sim_fix_310P --input_format=NCHW --input_shape="x1:1,3,448,1024;x2:1,3,448,1024" --log=debug --soc_version=Ascend${chip_name} # Ascend310P3 + ``` 2. 
数据预处理 @@ -73,6 +76,6 @@ python3.7 evaluate.py --gt_path ./data_preprocessed_bs1/gt --output_path ./outpu bs16占据内存过大,无法导出 -| 模型 | pth精度 | 310离线推理精度 | 710离线推理精度 | 基准性能 | 310性能 | 710性能 | +| 模型 | pth精度 | 310离线推理精度 | 310P离线推理精度 | 基准性能 | 310性能 | 310P性能 | | :----------: | :----------------: | :----------------: | :-----------------: | :-------: | :-----: | :------: | | flownet2 bs1 | Average EPE: 2.150 | Average EPE: 2.184 | Average EPE: 2.1578 | 11.65fps | 3.07fps | 16.81fps | \ No newline at end of file diff --git a/ACL_PyTorch/built-in/cv/Flownet2_for_Pytorch/test/pth2om.sh b/ACL_PyTorch/built-in/cv/Flownet2_for_Pytorch/test/pth2om.sh index fb2cdc3b6d0a850060d9c8e446e848d7172659e9..9545b424dacbcb49db96712361485682c772a416 100644 --- a/ACL_PyTorch/built-in/cv/Flownet2_for_Pytorch/test/pth2om.sh +++ b/ACL_PyTorch/built-in/cv/Flownet2_for_Pytorch/test/pth2om.sh @@ -8,6 +8,6 @@ python3.7 pth2onnx.py --batch_size 1 --input_path ./FlowNet2_checkpoint.pth.tar # optimize onnx python3.7 -m onnxsim ./models/flownet2_bs1.onnx ./models/flownet2_bs1_sim.onnx python3.7 fix_onnx.py ./models/flownet2_bs1_sim.onnx ./models/flownet2_bs1_sim_fix.onnx -# 310需要采用混合精度,否则有精度问题;710上采用FP16精度正常 +# 310需要采用混合精度,否则有精度问题;310P上采用FP16精度正常 # atc --framework=5 --model=models/flownet2_bs1_sim_fix.onnx --output=models/flownet2_bs1_sim_fix --input_format=NCHW --input_shape="x1:1,3,448,1024;x2:1,3,448,1024" --log=debug --soc_version=Ascend310 --precision_mode=allow_mix_precision -atc --framework=5 --model=models/flownet2_bs1_sim_fix.onnx --output=models/flownet2_bs1_sim_fix --input_format=NCHW --input_shape="x1:1,3,448,1024;x2:1,3,448,1024" --log=debug --soc_version=Ascend710 +atc --framework=5 --model=models/flownet2_bs1_sim_fix.onnx --output=models/flownet2_bs1_sim_fix --input_format=NCHW --input_shape="x1:1,3,448,1024;x2:1,3,448,1024" --log=debug --soc_version=$1 diff --git a/ACL_PyTorch/built-in/cv/InceptionV3_for_Pytorch/ReadMe.md b/ACL_PyTorch/built-in/cv/InceptionV3_for_Pytorch/ReadMe.md index 
ec24e9ba99349208babf44d30998a10b12777715..3e873f528243b733245222631a91dfc48b4cff51 100644 --- a/ACL_PyTorch/built-in/cv/InceptionV3_for_Pytorch/ReadMe.md +++ b/ACL_PyTorch/built-in/cv/InceptionV3_for_Pytorch/ReadMe.md @@ -46,6 +46,14 @@ 3. 运行inceptionv3_atc.sh脚本转换om模型 + ${chip_name}可通过`npu-smi info`指令查看 + + ![Image](https://gitee.com/Ronnie_zheng/ascend-pytorch-crowdintelligence-doc/raw/master/Ascend-PyTorch%E7%A6%BB%E7%BA%BF%E6%8E%A8%E7%90%86%E6%8C%87%E5%AF%BC/images/310P3.png) + + ``` + bash inceptionv3_atc.sh Ascend${chip_name} # Ascend310P3 + ``` + 4. 用imagenet_torch_preprocess.py脚本处理数据集 ``` diff --git a/ACL_PyTorch/built-in/cv/InceptionV3_for_Pytorch/inceptionv3_atc.sh b/ACL_PyTorch/built-in/cv/InceptionV3_for_Pytorch/inceptionv3_atc.sh index 16fdfb1a414ab16dbca501a009f84df46df2f1dd..ec02ff0ccdb17ab8e31c425af3d0b74fdbeccef2 100644 --- a/ACL_PyTorch/built-in/cv/InceptionV3_for_Pytorch/inceptionv3_atc.sh +++ b/ACL_PyTorch/built-in/cv/InceptionV3_for_Pytorch/inceptionv3_atc.sh @@ -8,6 +8,5 @@ export ASCEND_OPP_PATH=${install_path}/opp #export SLOG_PRINT_TO_STDOUT=0 #print log to tenminal #export DUMP_GE_GRAPH=0 #dump ge -atc --model=./inceptionv3.onnx --framework=5 --output=inceptionv3_bs8 --input_format=NCHW --input_shape="actual_input_1:8,3,299,299" --log=info --soc_version=Ascend310 --insert_op_conf=aipp_inceptionv3_pth.config +atc --model=./inceptionv3.onnx --framework=5 --output=inceptionv3_bs8 --input_format=NCHW --input_shape="actual_input_1:8,3,299,299" --log=info --soc_version=$1 --insert_op_conf=aipp_inceptionv3_pth.config -#atc --model=./inceptionv3.onnx --framework=5 --output=inceptionv3_bs8 --input_format=NCHW --input_shape="actual_input_1:8,3,299,299" --log=info --soc_version=Ascend710 --insert_op_conf=aipp_inceptionv3_pth.config diff --git a/ACL_PyTorch/built-in/cv/Pelee_for_Pytorch/ReadMe.md b/ACL_PyTorch/built-in/cv/Pelee_for_Pytorch/ReadMe.md index ebdd1de32ae2a32afb964dd84702c805b042ac33..abe5ec0098416ae1f9c2f5bb16f68101365d4697 100644 --- 
a/ACL_PyTorch/built-in/cv/Pelee_for_Pytorch/ReadMe.md +++ b/ACL_PyTorch/built-in/cv/Pelee_for_Pytorch/ReadMe.md @@ -71,17 +71,17 @@ bash make.sh - [x] 用onnx-simplifier简化模型 -```shell -python3.7.5 -m onnxsim pelee_dynamic_bs.onnx pelee_dynamic_bs_sim.onnx --input-shape 1,3,304,304 -``` + ```shell + python3.7.5 -m onnxsim pelee_dynamic_bs.onnx pelee_dynamic_bs_sim.onnx --input-shape 1,3,304,304 + ``` - [x] 改图优化 ​ 修改softmax节点,在softmax前插入transpose -```shell -python3.7.5 softmax.py pelee_dynamic_bs_sim.onnx pelee_dynamic_bs_modify.onnx -``` + ```shell + python3.7.5 softmax.py pelee_dynamic_bs_sim.onnx pelee_dynamic_bs_modify.onnx + ``` ![img](file:///C:\Users\C00444~1\AppData\Local\Temp\ksohtml124560\wps2.jpg) @@ -91,24 +91,29 @@ python3.7.5 softmax.py pelee_dynamic_bs_sim.onnx pelee_dynamic_bs_modify.onnx + - [x] 使用ATC工具将ONNX模型转OM模型。 a. 修改atc.sh脚本,通过ATC工具使用脚本完成转换,具体的脚本示例如下: -```shell -# 配置环境变量 -export install_path=/usr/local/Ascend/ascend-toolkit/latest -export PATH=/usr/local/python3.7.5/bin:${install_path}/atc/ccec_compiler/bin:${install_path}/atc/bin:$PATH -export PYTHONPATH=${install_path}/atc/python/site-packages:$PYTHONPATH -export LD_LIBRARY_PATH=${install_path}/atc/lib64:${install_path}/acllib/lib64:$LD_LIBRARY_PATH -export ASCEND_OPP_PATH=${install_path}/opp - -# 使用二进制输入时,执行如下命令。不开启aipp,用于精度测试 -${install_path}/atc/bin/atc --model=./pelee_dynamic_bs_modify.onnx --framework=5 --output=pelee_bs1 --input_format=NCHW --input_shape="image:1,3,304,304" --log=info --soc_version=Ascend710 --enable_small_channel=1 - -# 使用二进制输入时,执行如下命令。开启aipp,用于性能测试 -${install_path}/atc/bin/atc --model=./pelee_dynamic_bs_modify.onnx --framework=5 --output=pelee_bs32 --input_format=NCHW --input_shape="image:32,3,304,304" --log=info --soc_version=Ascend710 --insert_op_conf=aipp.config --enable_small_channel=1 -``` + ${chip_name}可通过`npu-smi info`指令查看 + + 
![Image](https://gitee.com/Ronnie_zheng/ascend-pytorch-crowdintelligence-doc/raw/master/Ascend-PyTorch%E7%A6%BB%E7%BA%BF%E6%8E%A8%E7%90%86%E6%8C%87%E5%AF%BC/images/310P3.png) + + ```shell + # 配置环境变量 + export install_path=/usr/local/Ascend/ascend-toolkit/latest + export PATH=/usr/local/python3.7.5/bin:${install_path}/atc/ccec_compiler/bin:${install_path}/atc/bin:$PATH + export PYTHONPATH=${install_path}/atc/python/site-packages:$PYTHONPATH + export LD_LIBRARY_PATH=${install_path}/atc/lib64:${install_path}/acllib/lib64:$LD_LIBRARY_PATH + export ASCEND_OPP_PATH=${install_path}/opp + + # 使用二进制输入时,执行如下命令。不开启aipp,用于精度测试 + ${install_path}/atc/bin/atc --model=./pelee_dynamic_bs_modify.onnx --framework=5 --output=pelee_bs1 --input_format=NCHW --input_shape="image:1,3,304,304" --log=info --soc_version=Ascend${chip_name} --enable_small_channel=1 # Ascend310P3 + + # 使用二进制输入时,执行如下命令。开启aipp,用于性能测试 + ${install_path}/atc/bin/atc --model=./pelee_dynamic_bs_modify.onnx --framework=5 --output=pelee_bs32 --input_format=NCHW --input_shape="image:32,3,304,304" --log=info --soc_version=Ascend${chip_name} --insert_op_conf=aipp.config --enable_small_channel=1 # Ascend310P3 + ``` diff --git a/ACL_PyTorch/built-in/cv/Pelee_for_Pytorch/atc.sh b/ACL_PyTorch/built-in/cv/Pelee_for_Pytorch/atc.sh index 486b05cb0c4eb33d0d72667967ab4786ffc45763..63fb08b3a228a4a6df6e066ecd42b2fbbeac1a78 100644 --- a/ACL_PyTorch/built-in/cv/Pelee_for_Pytorch/atc.sh +++ b/ACL_PyTorch/built-in/cv/Pelee_for_Pytorch/atc.sh @@ -6,7 +6,7 @@ export LD_LIBRARY_PATH=${install_path}/atc/lib64:${install_path}/acllib/lib64:$L export ASCEND_OPP_PATH=${install_path}/opp # 使用二进制输入时,执行如下命令。不开启aipp,用于精度测试 -${install_path}/atc/bin/atc --model=./pelee_dynamic_bs_modify.onnx --framework=5 --output=pelee_bs1 --input_format=NCHW --input_shape="image:1,3,304,304" --log=info --soc_version=Ascend710 --enable_small_channel=1 +${install_path}/atc/bin/atc --model=./pelee_dynamic_bs_modify.onnx --framework=5 --output=pelee_bs1 
--input_format=NCHW --input_shape="image:1,3,304,304" --log=info --soc_version=$1 --enable_small_channel=1 # 使用二进制输入时,执行如下命令。开启aipp,用于性能测试 -${install_path}/atc/bin/atc --model=./pelee_dynamic_bs_modify.onnx --framework=5 --output=pelee_bs32 --input_format=NCHW --input_shape="image:32,3,304,304" --log=info --soc_version=Ascend710 --enable_small_channel=1 --insert_op_conf=aipp.config \ No newline at end of file +${install_path}/atc/bin/atc --model=./pelee_dynamic_bs_modify.onnx --framework=5 --output=pelee_bs32 --input_format=NCHW --input_shape="image:32,3,304,304" --log=info --soc_version=$1 --enable_small_channel=1 --insert_op_conf=aipp.config \ No newline at end of file diff --git a/ACL_PyTorch/built-in/cv/Res2Net_v1b_101_for_PyTorch/README.md b/ACL_PyTorch/built-in/cv/Res2Net_v1b_101_for_PyTorch/README.md index 69fa898a117ea50952a777d8b53a7596c4653c9c..5c99a67a5d636f822fd82bea04aaf29c7e70d5c2 100644 --- a/ACL_PyTorch/built-in/cv/Res2Net_v1b_101_for_PyTorch/README.md +++ b/ACL_PyTorch/built-in/cv/Res2Net_v1b_101_for_PyTorch/README.md @@ -1,11 +1,12 @@ # Res2Net_v1b_101模型测试指导 -- [1 文件说明](#1-文件说明) -- [2 设置环境变量](#2-设置环境变量) -- [3 端到端推理步骤](#3-端到端推理步骤) - - [3.1 下载代码](#31-下载代码) - - [3.2 om模型转换](#32-om模型转换) - - [3.3 om模型推理](#33-om模型推理) +- [Res2Net_v1b_101模型测试指导](#res2net_v1b_101模型测试指导) + - [1 文件说明](#1-文件说明) + - [2 设置环境变量](#2-设置环境变量) + - [3 端到端推理步骤](#3-端到端推理步骤) + - [3.1 下载代码](#31-下载代码) + - [3.2 om模型转换](#32-om模型转换) + - [3.3 om模型推理](#33-om模型推理) ------ @@ -51,10 +52,14 @@ python3.7 pth2onnx.py -m ./res2net101_v1b_26w_4s-0812c246.pth -o ./res2net.onnx python3.7 pth2onnx.py -m ./res2net101_v1b_26w_4s-0812c246.pth -o ./res2net.onnx --optimizer ``` -利用ATC工具转换为om模型 -```shell -bash atc.sh -``` + +利用ATC工具转换为om模型, ${chip_name}可通过`npu-smi info`指令查看 + + ![Image](https://gitee.com/Ronnie_zheng/ascend-pytorch-crowdintelligence-doc/raw/master/Ascend-PyTorch%E7%A6%BB%E7%BA%BF%E6%8E%A8%E7%90%86%E6%8C%87%E5%AF%BC/images/310P3.png) + + ```shell + bash atc.sh Ascend${chip_name} # Ascend310P3 + 
``` ### 3.3 om模型推理 diff --git a/ACL_PyTorch/built-in/cv/Res2Net_v1b_101_for_PyTorch/atc.sh b/ACL_PyTorch/built-in/cv/Res2Net_v1b_101_for_PyTorch/atc.sh index e98af05173364eed0b1a581105d3685178fe157b..a376037eb7b1b7efd567f135c9948a7b216c5caa 100644 --- a/ACL_PyTorch/built-in/cv/Res2Net_v1b_101_for_PyTorch/atc.sh +++ b/ACL_PyTorch/built-in/cv/Res2Net_v1b_101_for_PyTorch/atc.sh @@ -5,12 +5,12 @@ export PYTHONPATH=${install_path}/atc/python/site-packages:$PYTHONPATH export LD_LIBRARY_PATH=${install_path}/atc/lib64:${install_path}/acllib/lib64:$LD_LIBRARY_PATH export ASCEND_OPP_PATH=${install_path}/opp -# 710 fp16,执行如下命令 +# 310P fp16,执行如下命令 atc --model=./res2net.onnx \ --framework=5 \ --output=res2net_bs16 \ --input_format=NCHW \ --input_shape="x:16,3,224,224" \ --log=error \ - --soc_version=Ascend710 \ + --soc_version=$1 \ --enable_small_channel=1 diff --git a/ACL_PyTorch/built-in/cv/Resnet101_Pytorch_Infer/ReadMe.md b/ACL_PyTorch/built-in/cv/Resnet101_Pytorch_Infer/ReadMe.md index b89cfb0e6e79831f8da11db45d48085a48276ca1..ab8dc9b37d570df5c219eac8de18baf05ca38969 100644 --- a/ACL_PyTorch/built-in/cv/Resnet101_Pytorch_Infer/ReadMe.md +++ b/ACL_PyTorch/built-in/cv/Resnet101_Pytorch_Infer/ReadMe.md @@ -1,29 +1,30 @@ # ResNet101 Onnx模型端到端推理指导 -- [1 模型概述](#1-模型概述) - - [1.1 论文地址](#11-论文地址) - - [1.2 代码地址](#12-代码地址) -- [2 环境说明](#2-环境说明) - - [2.1 深度学习框架](#21-深度学习框架) - - [2.2 python第三方库](#22-python第三方库) -- [3 模型转换](#3-模型转换) - - [3.1 pth转onnx模型](#31-pth转onnx模型) - - [3.2 onnx模型量化](#32-onnx模型量化) - - [3.3 onnx转om模型](#33-onnx转om模型) -- [4 数据集预处理](#4-数据集预处理) - - [4.1 数据集获取](#41-数据集获取) - - [4.2 数据集预处理](#42-数据集预处理) - - [4.3 生成数据集信息文件](#43-生成数据集信息文件) -- [5 离线推理](#5-离线推理) - - [5.1 benchmark工具概述](#51-benchmark工具概述) - - [5.2 离线推理](#52-离线推理) -- [6 精度对比](#6-精度对比) - - [6.1 离线推理TopN精度统计](#61-离线推理TopN精度统计) - - [6.2 开源TopN精度](#62-开源TopN精度) - - [6.3 精度对比](#63-精度对比) -- [7 性能对比](#7-性能对比) - - [7.1 npu性能数据](#71-npu性能数据) - - [7.2 T4性能数据](#72-T4性能数据) - - [7.3 性能对比](#73-性能对比) +- [ResNet101 
Onnx模型端到端推理指导](#resnet101-onnx模型端到端推理指导) + - [1 模型概述](#1-模型概述) + - [1.1 论文地址](#11-论文地址) + - [1.2 代码地址](#12-代码地址) + - [2 环境说明](#2-环境说明) + - [2.1 深度学习框架](#21-深度学习框架) + - [2.2 python第三方库](#22-python第三方库) + - [3 模型转换](#3-模型转换) + - [3.1 pth转onnx模型](#31-pth转onnx模型) + - [3.2 onnx模型量化](#32-onnx模型量化) + - [3.3 onnx转om模型](#33-onnx转om模型) + - [4 数据集预处理](#4-数据集预处理) + - [4.1 数据集获取](#41-数据集获取) + - [4.2 数据集预处理](#42-数据集预处理) + - [4.3 生成数据集信息文件](#43-生成数据集信息文件) + - [5 离线推理](#5-离线推理) + - [5.1 benchmark工具概述](#51-benchmark工具概述) + - [5.2 离线推理](#52-离线推理) + - [6 精度对比](#6-精度对比) + - [6.1 离线推理TopN精度统计](#61-离线推理topn精度统计) + - [6.2 开源TopN精度](#62-开源topn精度) + - [6.3 精度对比](#63-精度对比) + - [7 性能对比](#7-性能对比) + - [7.1 npu性能数据](#71-npu性能数据) + - [7.2 T4性能数据](#72-t4性能数据) + - [7.3 性能对比](#73-性能对比) @@ -131,20 +132,21 @@ amct_onnx calibration --model resnet101.onnx --save_path ./result/resnet101 -- ### 3.3 onnx转om模型 -1.设置环境变量 +1. 设置环境变量 -``` -source env.sh -``` -2.使用atc将onnx模型转换为om模型文件,工具使用方法可以参考《[CANN 开发辅助工具指南 01](https://support.huawei.com/enterprise/zh/ascend-computing/cann-pid-251168373?category=developer-documents&subcategory=auxiliary-development-tools)》中的ATC工具使用指南章节 + ``` + source env.sh + ``` +2. 
使用atc将onnx模型转换为om模型文件,工具使用方法可以参考《[CANN 开发辅助工具指南 01](https://support.huawei.com/enterprise/zh/ascend-computing/cann-pid-251168373?category=developer-documents&subcategory=auxiliary-development-tools)》中的ATC工具使用指南章节 -``` -atc --framework=5 --model=./resnet101.onnx --output=resnet101_bs16 --input_format=NCHW --input_shape="image:16,3,224,224" --log=debug --soc_version=Ascend310 --insert_op_conf=aipp.config -``` + ``` + atc --framework=5 --model=./resnet101.onnx --output=resnet101_bs16 --input_format=NCHW --input_shape="image:16,3,224,224" --log=debug --soc_version=Ascend310 --insert_op_conf=aipp.config + ``` **说明:** -> 若设备类型为Ascend710,设置soc_version==Ascend710即可; +> 若设备类型为Ascend310P,设置--soc_version=Ascend${chip_name}(Ascend310P3), ${chip_name}可通过`npu-smi info`指令查看; +> ![Image](https://gitee.com/Ronnie_zheng/ascend-pytorch-crowdintelligence-doc/raw/master/Ascend-PyTorch%E7%A6%BB%E7%BA%BF%E6%8E%A8%E7%90%86%E6%8C%87%E5%AF%BC/images/310P3.png) > > aipp.config是AIPP工具数据集预处理配置文件,详细说明可参考"ATC工具使用指南"中的"AIPP配置"章节。 @@ -183,7 +185,7 @@ python3.7 gen_dataset_info.py bin ./prep_dataset ./resnet101_prep_bin.info 224 2 ### 5.1 benchmark工具概述 -benchmark工具为华为自研的模型推理工具,支持多种模型的离线推理,能够迅速统计出模型在Ascend310、710上的性能,支持真实数据和纯推理两种模式,配合后处理脚本,可以实现诸多模型的端到端过程,获取工具及使用方法可以参考《[CANN 推理benchmark工具用户指南 01](https://support.huawei.com/enterprise/zh/ascend-computing/cann-pid-251168373?category=developer-documents&subcategory=auxiliary-development-tools)》 +benchmark工具为华为自研的模型推理工具,支持多种模型的离线推理,能够迅速统计出模型在Ascend310、310P上的性能,支持真实数据和纯推理两种模式,配合后处理脚本,可以实现诸多模型的端到端过程,获取工具及使用方法可以参考《[CANN 推理benchmark工具用户指南 01](https://support.huawei.com/enterprise/zh/ascend-computing/cann-pid-251168373?category=developer-documents&subcategory=auxiliary-development-tools)》 ### 5.2 离线推理 1.设置环境变量 ``` @@ -285,7 +287,7 @@ Interface throughputRate: 370.456,370.456x4=1481.82既是batch32 310单卡吞 **说明:** -> 注意如果设备为Ascend710,则Interface throughputRate的值就是710的单卡吞吐率,不需要像310那样x4 +> 注意如果设备为Ascend310P,则Interface throughputRate的值就是310P的单卡吞吐率,不需要像310那样x4 ### 7.2 
T4性能数据 在装有T4卡的服务器上测试gpu性能,测试过程请确保卡没有运行其他任务,TensorRT版本:7.2.3.4,cuda版本:11.0,cudnn版本:8.2 @@ -377,5 +379,5 @@ batch4,8,32的npu性能也都大于T4 310单个device的吞吐率乘4即单卡吞吐率比T4单卡的吞吐率大,故310性能高于T4性能,性能达标。 对于batch1的310性能高于T4性能2.08倍,batch16的310性能高于T4性能1.46倍,对于batch1与batch16,310性能均高于T4性能1.2倍,该模型放在Benchmark/cv/classification目录下。 -710单卡吞吐率要求最优batchsize情况下为310的1.5倍,当前已符合要求,具体数据不在此赘述。 +310P单卡吞吐率要求最优batchsize情况下为310的1.5倍,当前已符合要求,具体数据不在此赘述。 diff --git a/ACL_PyTorch/built-in/cv/Resnet18_for_PyTorch/README.md b/ACL_PyTorch/built-in/cv/Resnet18_for_PyTorch/README.md index 9495982f9e63226d3fa8b3b0d32444de0003df2d..99a5be2be458bcd8df72ccbacf4f657bafac3f3c 100644 --- a/ACL_PyTorch/built-in/cv/Resnet18_for_PyTorch/README.md +++ b/ACL_PyTorch/built-in/cv/Resnet18_for_PyTorch/README.md @@ -114,21 +114,25 @@ python3.7.5 calibration_bin.py prep_dataset calibration_bin 64 ### 4.3 onnx转om模型 -1.设置环境变量 -``` -source env.sh -``` +1. 设置环境变量 + ``` + source env.sh + ``` **说明** >此脚本中环境变量只供参考,请以实际安装环境配置环境变量 -2.使用atc将onnx模型转换为om模型文件,工具使用方法可以参考[CANN 开发辅助工具指南 (推理)](https://support.huawei.com/enterprise/zh/ascend-computing/cann-pid-251168373?category=developer-documents&subcategory=auxiliary-development-tools) -``` -atc --framework=5 --model=./resnet18.onnx --output=resnet18_bs1 --input_format=NCHW --input_shape="image:1,3,224,224" --log=debug --soc_version=Ascend310 --insert_op_conf=aipp.config --enable_small_channel=1 +2. 
使用atc将onnx模型转换为om模型文件,工具使用方法可以参考[CANN 开发辅助工具指南 (推理)](https://support.huawei.com/enterprise/zh/ascend-computing/cann-pid-251168373?category=developer-documents&subcategory=auxiliary-development-tools) + + ${chip_name}可通过`npu-smi info`指令查看,例:310P3 + ![Image](https://gitee.com/Ronnie_zheng/ascend-pytorch-crowdintelligence-doc/raw/master/Ascend-PyTorch%E7%A6%BB%E7%BA%BF%E6%8E%A8%E7%90%86%E6%8C%87%E5%AF%BC/images/310P3.png) + ``` + atc --framework=5 --model=./resnet18.onnx --output=resnet18_bs1 --input_format=NCHW --input_shape="image:1,3,224,224" --log=debug --soc_version=Ascend${chip_name} --insert_op_conf=aipp.config --enable_small_channel=1 # Ascend310P3 -## Int8量化(可选) -atc --framework=5 --model=./result/resnet18_deploy_model.onnx --output=resnet18_bs1_int8 --input_format=NCHW --input_shape="image:1,3,224,224" --log=debug --soc_version=Ascend710 --insert_op_conf=aipp.config --enable_small_channel=1 -``` + + ## Int8量化(可选) + atc --framework=5 --model=./result/resnet18_deploy_model.onnx --output=resnet18_bs1_int8 --input_format=NCHW --input_shape="image:1,3,224,224" --log=debug --soc_version=Ascend${chip_name} --insert_op_conf=aipp.config --enable_small_channel=1 # Ascend310P3 + ``` ### 4.4 模型离线推理 diff --git a/ACL_PyTorch/built-in/cv/Resnet34_for_Pytorch/ReadMe.md b/ACL_PyTorch/built-in/cv/Resnet34_for_Pytorch/ReadMe.md index 321a4e6bfca1abfaedb18f5a7259973908de38c3..5963662c4eab6c22d8140b8adae7f0a2e05c2cf0 100644 --- a/ACL_PyTorch/built-in/cv/Resnet34_for_Pytorch/ReadMe.md +++ b/ACL_PyTorch/built-in/cv/Resnet34_for_Pytorch/ReadMe.md @@ -16,50 +16,21 @@ ## 推理端到端步骤: 1. 
数据预处理,把ImageNet 50000张图片转为二进制文件(.bin)
-
-   ```shell
-   python3.7 pytorch_transfer.py resnet /home/HwHiAiUser/dataset/ImageNet/ILSVRC2012_img_val ./prep_bin
-   ```
+   ```shell
+   python3.7 pytorch_transfer.py resnet /home/HwHiAiUser/dataset/ImageNet/ILSVRC2012_img_val ./prep_bin
+   ```
2. 生成数据集info文件
-
-   ```shell
-   python3.7 get_info.py bin ./prep_bin ./BinaryImageNet.info 256 256
-   ```
+   ```shell
+   python3.7 get_info.py bin ./prep_bin ./BinaryImageNet.info 256 256
+   ```
3. 从torchvision下载resnet34模型或者指定自己训练好的pth文件路径,通过pth2onnx.py脚本转化为onnx模型
-
-   ```shell
-   python3.7 pth2onnx.py ./resnet34-333f7ec4.pth ./resnet34_dynamic.onnx
-   ```
+   ```shell
+   python3.7 pth2onnx.py ./resnet34-333f7ec4.pth ./resnet34_dynamic.onnx
+   ```
4. 支持脚本将.onnx文件转为离线推理模型文件.om文件
-   ```shell
-   bash resnet34_atc.sh
-   ```
+   ${chip_name}可通过`npu-smi info`指令查看
+
+   ![Image](https://gitee.com/Ronnie_zheng/ascend-pytorch-crowdintelligence-doc/raw/master/Ascend-PyTorch%E7%A6%BB%E7%BA%BF%E6%8E%A8%E7%90%86%E6%8C%87%E5%AF%BC/images/310P3.png)
+
+   ```shell
+   bash resnet34_atc.sh Ascend${chip_name} # Ascend310P3
+   ```
-5. 使用Benchmark工具进行推理
-   ```shell
-   export install_path=/usr/local/Ascend/ascend-toolkit/latest
-   export PATH=${install_path}/atc/ccec_compiler/bin:$PATH
-   export LD_LIBRARY_PATH=${install_path}/atc/lib64:${install_path}/fwkacllib/lib64:${install_path}/acllib/lib64:${install_path}/atc/lib64/plugin/opskernel/:/usr/local/Ascend/aoe/lib64/:$LD_LIBRARY_PATH
-   export PYTHONPATH=${install_path}/atc/python/site-packages:$PYTHONPATH
-   export ASCEND_OPP_PATH=${install_path}/opp
-   export ASCEND_AICPU_PATH=${install_path}
-   ./benchmark -model_type=vision -om_path=resnet34_fp16_bs16.om -device_id=0 -batch_size=16 -input_text_path=BinaryImageNet.info -input_width=256 -input_height=256 -useDvpp=false -output_binary=false
-   ```
+5. 使用Benchmark工具进行推理
+   ```shell
+   export install_path=/usr/local/Ascend/ascend-toolkit/latest
+   export PATH=${install_path}/atc/ccec_compiler/bin:$PATH
+   export LD_LIBRARY_PATH=${install_path}/atc/lib64:${install_path}/fwkacllib/lib64:${install_path}/acllib/lib64:${install_path}/atc/lib64/plugin/opskernel/:/usr/local/Ascend/aoe/lib64/:$LD_LIBRARY_PATH
+   export PYTHONPATH=${install_path}/atc/python/site-packages:$PYTHONPATH
+   export ASCEND_OPP_PATH=${install_path}/opp
+   export ASCEND_AICPU_PATH=${install_path}
+   ./benchmark -model_type=vision -om_path=resnet34_fp16_bs16.om -device_id=0 -batch_size=16 -input_text_path=BinaryImageNet.info -input_width=256 -input_height=256 -useDvpp=false -output_binary=false
+   ```
6. 精度验证,调用vision_metric_ImageNet.py脚本与数据集标签val_label.txt比对,可以获得Accuracy数据,结果保存在result.json中
-
-   ```shell
-   python3.7 vision_metric_ImageNet.py result/dumpOutput_device0/ ./val_label.txt ./ result.json
-   ```
-7. 模型量化:
-   a.生成量化数据:
-   '''shell
-   mkdir amct_prep_bin
-   python3.7 pytorch_transfer_amct.py /home/HwHiAiUser/dataset/ImageNet/ILSVRC2012_img_val ./amct_prep_bin
-   mkdir data_bs64
-   python3.7 calibration_bin ./amct_prep_bin data_bs64 64
-   b. 量化模型转换:
-   amct_onnx calibration --model resnet34_dynamic.onnx --save_path ./result/resnet34 --input_shape="actual_input_1:64,3,224,224" --data_dir "./data_bs64/" --data_type "float32"
\ No newline at end of file
+   ```shell
+   python3.7 vision_metric_ImageNet.py result/dumpOutput_device0/ ./val_label.txt ./ result.json
+   ```
+7. 模型量化:
+   a. 生成量化数据:
+   ```shell
+   mkdir amct_prep_bin
+   python3.7 pytorch_transfer_amct.py /home/HwHiAiUser/dataset/ImageNet/ILSVRC2012_img_val ./amct_prep_bin
+   mkdir data_bs64
+   python3.7 calibration_bin ./amct_prep_bin data_bs64 64
+   ```
+   b.
量化模型转换:![Image](https://gitee.com/Ronnie_zheng/ascend-pytorch-crowdintelligence-doc/raw/master/Ascend-PyTorch%E7%A6%BB%E7%BA%BF%E6%8E%A8%E7%90%86%E6%8C%87%E5%AF%BC/images/310P3.png) amct_onnx calibration --model resnet34_dynamic.onnx --save_path ./result/resnet34 --input_shape="actual_input_1:64,3,224,224" --data_dir "./data_bs64/" --data_type "float32" \ No newline at end of file diff --git a/ACL_PyTorch/built-in/cv/Resnet34_for_Pytorch/resnet34_atc.sh b/ACL_PyTorch/built-in/cv/Resnet34_for_Pytorch/resnet34_atc.sh index 9077d46372dda673deaf6060e2564d22a40b3748..df334dbed8c20bad9dcde6a1641d4553f6ee09e0 100644 --- a/ACL_PyTorch/built-in/cv/Resnet34_for_Pytorch/resnet34_atc.sh +++ b/ACL_PyTorch/built-in/cv/Resnet34_for_Pytorch/resnet34_atc.sh @@ -11,14 +11,14 @@ export ASCEND_SLOG_PRINT_TO_STDOUT=0 /usr/local/Ascend/driver/tools/msnpureport -g error -d 0 /usr/local/Ascend/driver/tools/msnpureport -g error -d 1 /usr/local/Ascend/driver/tools/msnpureport -g error -d 2 -# 710 fp16,执行如下命令 -atc --model=./resnet34_dynamic.onnx --framework=5 --output=resnet34_fp16_bs8 --input_format=NCHW --input_shape="actual_input_1:8,3,224,224" --log=info --soc_version=Ascend710 --insert_op_conf=resnet34_aipp.config +# 310P fp16,执行如下命令 +atc --model=./resnet34_dynamic.onnx --framework=5 --output=resnet34_fp16_bs8 --input_format=NCHW --input_shape="actual_input_1:8,3,224,224" --log=info --soc_version=$1 --insert_op_conf=resnet34_aipp.config # 310 fp16,执行如下命令 # atc --model=./resnet34_dynamic.onnx --framework=5 --output=resnet34_fp16_bs16 --input_format=NCHW --input_shape="actual_input_1:16,3,224,224" --log=info --soc_version=Ascend310 --insert_op_conf=resnet34_aipp.config -# 710 int8,执行如下命令 -atc --model=./resnet34_deploy_model.onnx --framework=5 --output=resnet34_int8_bs16 --input_format=NCHW --input_shape="actual_input_1:16,3,224,224" --log=info --soc_version=Ascend710 --insert_op_conf=resnet34_aipp.config +# 310P int8,执行如下命令 +atc --model=./resnet34_deploy_model.onnx --framework=5 
--output=resnet34_int8_bs16 --input_format=NCHW --input_shape="actual_input_1:16,3,224,224" --log=info --soc_version=$1 --insert_op_conf=resnet34_aipp.config # 310 int8,执行如下命令 # atc --model=./resnet34_deploy_model.onnx --framework=5 --output=resnet34_int8_bs16 --input_format=NCHW --input_shape="actual_input_1:16,3,224,224" --log=info --soc_version=Ascend310 --insert_op_conf=resnet34_aipp.config diff --git a/ACL_PyTorch/built-in/cv/Resnet50_Pytorch_Infer/ReadMe.md b/ACL_PyTorch/built-in/cv/Resnet50_Pytorch_Infer/ReadMe.md index 8519e34de85aad43f2fa371c639c5e5bc6a8870e..aca1a3a8c2b7641dc6ae240420117ae1d6610cd1 100644 --- a/ACL_PyTorch/built-in/cv/Resnet50_Pytorch_Infer/ReadMe.md +++ b/ACL_PyTorch/built-in/cv/Resnet50_Pytorch_Infer/ReadMe.md @@ -16,7 +16,7 @@ 8.benchmark工具源码地址:https://gitee.com/ascend/cann-benchmark/tree/master/infer -710增加文件说明: +310P增加文件说明: 1.pthtar2onx_dynamic.py:用于转换pth.tar文件到动态onnx文件; @@ -26,7 +26,7 @@ 4.gen_resnet50_64bs_bin.py:基于数据预处理结果合成量化所需的64bs输入; -5.aipp_resnet50_710.aippconfig:aipp配置文件 +5.aipp_resnet50_310P.aippconfig:aipp配置文件 推理端到端步骤: @@ -54,7 +54,7 @@ 验证推理结果 -710精度验证步骤: +310P精度验证步骤: (1)python3.7.5 imagenet_torch_preprocess.py resnet ./ImageNet/val_union ./prep_dataset 数据集处理; diff --git a/ACL_PyTorch/built-in/cv/Resnet50_Pytorch_Infer/aipp_resnet50_710.aippconfig b/ACL_PyTorch/built-in/cv/Resnet50_Pytorch_Infer/aipp_resnet50_310P.aippconfig similarity index 95% rename from ACL_PyTorch/built-in/cv/Resnet50_Pytorch_Infer/aipp_resnet50_710.aippconfig rename to ACL_PyTorch/built-in/cv/Resnet50_Pytorch_Infer/aipp_resnet50_310P.aippconfig index 71e0923f2ae25bb4ece78356a9bd9ee865f6bcc0..173c2d80353dc9de5b252a0b612cec5cde113361 100644 --- a/ACL_PyTorch/built-in/cv/Resnet50_Pytorch_Infer/aipp_resnet50_710.aippconfig +++ b/ACL_PyTorch/built-in/cv/Resnet50_Pytorch_Infer/aipp_resnet50_310P.aippconfig @@ -1,20 +1,20 @@ -aipp_op{ - aipp_mode:static - input_format : RGB888_U8 - - src_image_size_w : 256 - src_image_size_h : 256 - - crop: true - 
load_start_pos_h : 16 - load_start_pos_w : 16 - crop_size_w : 224 - crop_size_h: 224 - - min_chn_0 : 123.675 - min_chn_1 : 116.28 - min_chn_2 : 103.53 - var_reci_chn_0: 0.0171247538316637 - var_reci_chn_1: 0.0175070028011204 - var_reci_chn_2: 0.0174291938997821 +aipp_op{ + aipp_mode:static + input_format : RGB888_U8 + + src_image_size_w : 256 + src_image_size_h : 256 + + crop: true + load_start_pos_h : 16 + load_start_pos_w : 16 + crop_size_w : 224 + crop_size_h: 224 + + min_chn_0 : 123.675 + min_chn_1 : 116.28 + min_chn_2 : 103.53 + var_reci_chn_0: 0.0171247538316637 + var_reci_chn_1: 0.0175070028011204 + var_reci_chn_2: 0.0174291938997821 } \ No newline at end of file diff --git a/ACL_PyTorch/built-in/cv/SE_ResNet50_Pytorch_Infer/README.md b/ACL_PyTorch/built-in/cv/SE_ResNet50_Pytorch_Infer/README.md index d9e1c75e635f0b818038f7f385f7ac8b0141483b..e22a8816003cf16b217347df690b7593b3be685e 100644 --- a/ACL_PyTorch/built-in/cv/SE_ResNet50_Pytorch_Infer/README.md +++ b/ACL_PyTorch/built-in/cv/SE_ResNet50_Pytorch_Infer/README.md @@ -1,25 +1,26 @@ # SE_ResNet50 Onnx模型端到端推理指导 -- [1 模型概述](#1-模型概述) - - [1.1 论文地址](#11-论文地址) - - [1.2 代码地址](#12-代码地址) -- [2 环境说明](#2-环境说明) - - [2.1 推理硬件设备](#21-推理硬件设备) - - [2.2 深度学习框架](#22-深度学习框架) - - [2.3 Python第三方库](#23-Python第三方库) -- [3 模型转换](#3-模型转换) - - [3.1 获取pth权重文件](#31-获取pth权重文件) - - [3.2 获取pth权重文件](#32-pth转onnx模型) - - [3.3 pth转om模型](#33-onnx转om模型) -- [4 数据集预处理](#4-数据集预处理) - - [4.1 数据集获取](#41-数据集获取) - - [4.2 数据集预处理](#42-数据集预处理) - - [4.3 生成数据集信息文件](#43-生成数据集信息文件) -- [5 离线推理](#5-离线推理) - - [5.1 benchmark工具概述](#51-benchmark工具概述) - - [5.2 离线推理](#52-离线推理) - - [5.3 性能验证](#53-性能验证) -- [6 评测结果](#6-评测结果) -- [7 test目录说明](#7-test目录说明) +- [SE_ResNet50 Onnx模型端到端推理指导](#se_resnet50-onnx模型端到端推理指导) + - [1 模型概述](#1-模型概述) + - [1.1 论文地址](#11-论文地址) + - [1.2 代码地址](#12-代码地址) + - [2 环境说明](#2-环境说明) + - [2.1 推理硬件设备](#21-推理硬件设备) + - [2.2 深度学习框架](#22-深度学习框架) + - [2.3 Python第三方库](#23-python第三方库) + - [3 模型转换](#3-模型转换) + - [3.1 获取pth权重文件](#31-获取pth权重文件) + - [3.2 
pth转onnx模型](#32-pth转onnx模型) + - [3.3 onnx转om模型](#33-onnx转om模型) + - [4 数据集预处理](#4-数据集预处理) + - [4.1 数据集获取](#41-数据集获取) + - [4.2 数据集预处理](#42-数据集预处理) + - [4.3 生成数据集信息文件](#43-生成数据集信息文件) + - [5 离线推理](#5-离线推理) + - [5.1 benchmark工具概述](#51-benchmark工具概述) + - [5.2 离线推理](#52-离线推理) + - [5.3 性能验证](#53-性能验证) + - [6 评测结果](#6-评测结果) + - [7 test目录说明](#7-test目录说明) ## 1 模型概述 @@ -43,7 +44,7 @@ ### 2.1 推理硬件设备 ``` -Ascend710 +Ascend310P ``` ### 2.2 深度学习框架 @@ -106,8 +107,12 @@ source /usr/local/Ascend/ascend-toolkit/set_env.sh b.执行atc模型转换命令: +${chip_name}可通过`npu-smi info`指令查看,例:310P3 + +![Image](https://gitee.com/Ronnie_zheng/ascend-pytorch-crowdintelligence-doc/raw/master/Ascend-PyTorch%E7%A6%BB%E7%BA%BF%E6%8E%A8%E7%90%86%E6%8C%87%E5%AF%BC/images/310P3.png) + ``` -atc --model=./se_resnet50_dynamic_bs.onnx --framework=5 --input_format=NCHW --input_shape="image:32,3,224,224" --output=./se_resnet50_fp16_bs32 --log=error --soc_version=Ascend710 --insert_op_conf=./aipp_SE_ResNet50_pth.config --enable_small_channel=1 +atc --model=./se_resnet50_dynamic_bs.onnx --framework=5 --input_format=NCHW --input_shape="image:32,3,224,224" --output=./se_resnet50_fp16_bs32 --log=error --soc_version=Ascend${chip_name} --insert_op_conf=./aipp_SE_ResNet50_pth.config --enable_small_channel=1 ``` 参数说明: @@ -117,7 +122,7 @@ atc --model=./se_resnet50_dynamic_bs.onnx --framework=5 --input_format=NCHW --in --input_shape:输入数据的shape。 --output:输出的OM模型。 --log:日志级别。 - --soc_version:处理器型号,Ascend310或Ascend710。 + --soc_version:处理器型号。 --insert_op_config:插入算子的配置文件路径与文件名,例如aipp预处理算子。 --enable_small_channel:Set enable small channel. 
0(default): disable; 1: enable @@ -166,7 +171,7 @@ python3 ./gen_dataset_info.py bin ./data/ImageNet_bin ./data/ImageNet_bin.info 2 ### 5.1 benchmark工具概述 -benchmark工具为华为自研的模型推理工具,支持多种模型的离线推理,能够迅速统计出模型在Ascend710上的性能,支持真实数据和纯推理两种模式,配合后处理脚本,可以实现诸多模型的端到端过程,获取工具及使用方法可以参考[CANN V100R020C10 推理benchmark工具用户指南 01](https://support.huawei.com/enterprise/zh/doc/EDOC1100164874?idPath=23710424%7C251366513%7C22892968%7C251168373) +benchmark工具为华为自研的模型推理工具,支持多种模型的离线推理,能够迅速统计出模型在Ascend310P上的性能,支持真实数据和纯推理两种模式,配合后处理脚本,可以实现诸多模型的端到端过程,获取工具及使用方法可以参考[CANN V100R020C10 推理benchmark工具用户指南 01](https://support.huawei.com/enterprise/zh/doc/EDOC1100164874?idPath=23710424%7C251366513%7C22892968%7C251168373) ### 5.2 离线推理 1.设置环境变量: @@ -210,7 +215,7 @@ source /usr/local/Ascend/ascend-toolkit/set_env.sh ## 6 评测结果 评测结果 -| 模型 | pth精度 | 710精度 | 性能基准 | 710性能 | +| 模型 | pth精度 | 310P精度 | 性能基准 | 310P性能 | | --------------- | ---------------------- | ------------------------- | ------------ | ----------- | | SE_ResNet50 bs32 | Acc@1 77.63,Acc@5 93.64| Acc@1 77.36,Acc@5 93.76 | 1554.726fps | 2690.43fps | diff --git a/ACL_PyTorch/built-in/cv/SE_ResNet50_Pytorch_Infer/test/pth2om.sh b/ACL_PyTorch/built-in/cv/SE_ResNet50_Pytorch_Infer/test/pth2om.sh index 76d1a6fb3caada4fef07ad485906a8969841a1d6..934f5e402effeafccbd4c36517f3f70f88d9efcb 100644 --- a/ACL_PyTorch/built-in/cv/SE_ResNet50_Pytorch_Infer/test/pth2om.sh +++ b/ACL_PyTorch/built-in/cv/SE_ResNet50_Pytorch_Infer/test/pth2om.sh @@ -13,7 +13,7 @@ fi echo 'onnx -> om batch32' rm -rf ./se_resnet50_bs32.om -atc --model=./se_resnet50_dynamic_bs.onnx --framework=5 --input_format=NCHW --input_shape="image:32,3,224,224" --output=./se_resnet50_fp16_bs32 --log=error --soc_version=Ascend710 --insert_op_conf=./aipp_SE_ResNet50_pth.config --enable_small_channel=1 +atc --model=./se_resnet50_dynamic_bs.onnx --framework=5 --input_format=NCHW --input_shape="image:32,3,224,224" --output=./se_resnet50_fp16_bs32 --log=error --soc_version=$2 
--insert_op_conf=./aipp_SE_ResNet50_pth.config --enable_small_channel=1
if [ $? != 0 ]; then
    echo "fail!"
    exit -1
diff --git a/ACL_PyTorch/built-in/cv/U2-Net_for_PyTorch/README.md b/ACL_PyTorch/built-in/cv/U2-Net_for_PyTorch/README.md
index 235e2006106bd32d8a1621d330fd23f9216da7ab..0a69239c641645ab67b1efa7fa62609f5e89cdc6 100644
--- a/ACL_PyTorch/built-in/cv/U2-Net_for_PyTorch/README.md
+++ b/ACL_PyTorch/built-in/cv/U2-Net_for_PyTorch/README.md
@@ -41,9 +41,14 @@ cp -r ECSSD ./datasets/
## 2 离线推理
-710上执行,执行时使npu-smi info查看设备状态,确保device空闲:
+310P上执行,执行时使用npu-smi info查看设备状态,确保device空闲:
+
+${chip_name}可通过`npu-smi info`指令查看
+
+![Image](https://gitee.com/Ronnie_zheng/ascend-pytorch-crowdintelligence-doc/raw/master/Ascend-PyTorch%E7%A6%BB%E7%BA%BF%E6%8E%A8%E7%90%86%E6%8C%87%E5%AF%BC/images/310P3.png)
+
```
-bash test/pth2om.sh
+bash test/pth2om.sh Ascend${chip_name} # Ascend310P3
bash test/eval_acc_perf.sh
```
@@ -56,7 +61,7 @@ bash perf_g.sh
**评测结果:**
-| 模型 | pth精度 | 710离线推理精度 | 基准性能 | 710性能 |
+| 模型 | pth精度 | 310P离线推理精度 | 基准性能 | 310P性能 |
| :------: | :------: | :------: | :------: | :------: |
| U2-Net bs1 | maxF:95.1% MAE:0.033 | maxF:94.8% MAE:0.033 | 111.147 fps | 254 fps |
| U2-Net bs16 | maxF:95.1% MAE:0.033 | maxF:94.8% MAE:0.033 | 141.465 fps | 227 fps |
@@ -67,4 +72,4 @@ bash perf_g.sh
|-------------|----------------------|----------------------|-------------|----------|
最优bs比较:
-710/T4=254/141.465=1.795,符合要求
+310P/T4=254/141.465=1.795,符合要求
diff --git a/ACL_PyTorch/built-in/cv/U2-Net_for_PyTorch/test/pth2om.sh b/ACL_PyTorch/built-in/cv/U2-Net_for_PyTorch/test/pth2om.sh
index efdebe0e07cad2fb87725b0e4bd80c09f647b897..c06ad14d61aff786377c508dc53ac03e1dd54e15 100644
--- a/ACL_PyTorch/built-in/cv/U2-Net_for_PyTorch/test/pth2om.sh
+++ b/ACL_PyTorch/built-in/cv/U2-Net_for_PyTorch/test/pth2om.sh
@@ -16,5 +16,5 @@ for bs in 1 4 8 16 32 64; do
    python3.7 -m onnxsim models/u2net.onnx models/u2net_sim_bs${bs}.onnx --input-shape "image:${bs},3,320,320" &> log
    python3.7
fix_onnx.py models/u2net_sim_bs${bs}.onnx models/u2net_sim_bs${bs}_fixv2.onnx &> log - atc --framework=5 --model=models/u2net_sim_bs${bs}_fixv2.onnx --output=models/u2net_sim_bs${bs}_fixv2 --input_format=NCHW --input_shape="image:${bs},3,320,320" --out_nodes='Sigmoid_1048:0' --log=error --soc_version=Ascend710 + atc --framework=5 --model=models/u2net_sim_bs${bs}_fixv2.onnx --output=models/u2net_sim_bs${bs}_fixv2 --input_format=NCHW --input_shape="image:${bs},3,320,320" --out_nodes='Sigmoid_1048:0' --log=error --soc_version=$1 done diff --git a/ACL_PyTorch/built-in/cv/VGG16_SSD_for_PyTorch/ReadMe.md b/ACL_PyTorch/built-in/cv/VGG16_SSD_for_PyTorch/ReadMe.md index 95c7aff52a80974fd1518d6184780dd232916fb9..4602eea52562940ca5965b544ba93fc8043e07f3 100644 --- a/ACL_PyTorch/built-in/cv/VGG16_SSD_for_PyTorch/ReadMe.md +++ b/ACL_PyTorch/built-in/cv/VGG16_SSD_for_PyTorch/ReadMe.md @@ -23,7 +23,16 @@ 注意:脚本中导入sys.path.append(r"./pytorch-ssd")即是下载的源码 3. 运行vgg16_ssd_atc.sh脚本转换om模型 - 可将--input_shape="actual_input_1:1,3,300,300" 改成想测试的shape,如 (16,3,300,300)测试16batch的onnx + + 可将--input_shape="actual_input_1:1,3,300,300" 改成想测试的shape,如 (16,3,300,300)测试16batch的onnx + + ${chip_name}可通过`npu-smi info`指令查看 + + ![Image](https://gitee.com/Ronnie_zheng/ascend-pytorch-crowdintelligence-doc/raw/master/Ascend-PyTorch%E7%A6%BB%E7%BA%BF%E6%8E%A8%E7%90%86%E6%8C%87%E5%AF%BC/images/310P3.png) + + ``` + bash vgg16_ssd_atc.sh Ascend${chip_name} # Ascend310P3 + ``` 4. 
用ssd_pth_preprocess.py脚本处理数据集,
diff --git a/ACL_PyTorch/built-in/cv/VGG16_SSD_for_PyTorch/auto_atc.sh b/ACL_PyTorch/built-in/cv/VGG16_SSD_for_PyTorch/auto_atc.sh
index d65dc17a2b26656d60c64e80acf37353076a8b9f..06c3d86178096e30baed3901ecbe5e19157985f1 100644
--- a/ACL_PyTorch/built-in/cv/VGG16_SSD_for_PyTorch/auto_atc.sh
+++ b/ACL_PyTorch/built-in/cv/VGG16_SSD_for_PyTorch/auto_atc.sh
@@ -8,7 +8,7 @@ export ASCEND_AICPU_PATH=${install_path}
# modify model_name run_model soc_version
model_name="ssd_vgg16"
run_model=(vgg16_ssd.onnx result_amct/vgg16_ssd_deploy_model.onnx) ## FP16 INT8
-soc_version="Ascend710" ## 710
+soc_version=$1 ## 310P
batch_sizes="1 4 8 16 32 64"
format_types=(fp16, int8)
diff --git a/ACL_PyTorch/built-in/cv/VGG16_SSD_for_PyTorch/vgg16_ssd_atc.sh b/ACL_PyTorch/built-in/cv/VGG16_SSD_for_PyTorch/vgg16_ssd_atc.sh
index 4c6c8d59539943430fbcc6bfd0b4f29cf1fda9de..2c2f5b248012b3a3766feefe43fb7da51c38e1bb 100644
--- a/ACL_PyTorch/built-in/cv/VGG16_SSD_for_PyTorch/vgg16_ssd_atc.sh
+++ b/ACL_PyTorch/built-in/cv/VGG16_SSD_for_PyTorch/vgg16_ssd_atc.sh
@@ -8,6 +8,6 @@ export ASCEND_AICPU_PATH=${install_path}
export SLOG_PRINT_TO_STDOUT=1
export REPEAT_TUNE=True
-atc --model=./vgg16_ssd.onnx --framework=5 --output=vgg16_ssd --input_format=NCHW --input_shape="actual_input_1:1,3,300,300" --log=info --soc_version=Ascend710
+atc --model=./vgg16_ssd.onnx --framework=5 --output=vgg16_ssd --input_format=NCHW --input_shape="actual_input_1:1,3,300,300" --log=info --soc_version=$1
-atc --model= ./result_amct/vgg16_ssd_deploy_model.onnx --framework=5 --output=vgg16_ssd_deploy_model --input_format=NCHW --input_shape="actual_input_1:1,3,300,300" --log=info --soc_version=Ascend710
\ No newline at end of file
+atc --model=./result_amct/vgg16_ssd_deploy_model.onnx --framework=5 --output=vgg16_ssd_deploy_model --input_format=NCHW --input_shape="actual_input_1:1,3,300,300" --log=info --soc_version=$1
\ No newline at end of file
diff --git
a/ACL_PyTorch/built-in/cv/YoloXs_for_Pytorch/readme.md b/ACL_PyTorch/built-in/cv/YoloXs_for_Pytorch/readme.md
index 9ad0df1a2e3dfa33dc2fe3bd72876cd23ecc32ba..870da72ec4fc29a00cd90f4aae27c907360e938d 100644
--- a/ACL_PyTorch/built-in/cv/YoloXs_for_Pytorch/readme.md
+++ b/ACL_PyTorch/built-in/cv/YoloXs_for_Pytorch/readme.md
@@ -31,20 +31,24 @@ pip3 install -v -e .
### 2. 离线推理
-710上执行,执行时使npu-smi info查看设备状态,确保device空闲,设置环境变量后运行对应脚本即可,将modelzoo下载的本模型(YoloXs_for_Pytorch)下的文件及文件夹拷贝到YOLOX目录下
+310P上执行,执行时使用npu-smi info查看设备状态,确保device空闲,设置环境变量后运行对应脚本即可,将modelzoo下载的本模型(YoloXs_for_Pytorch)下的文件及文件夹拷贝到YOLOX目录下
+
+${chip_name}可通过`npu-smi info`指令查看
+
+![Image](https://gitee.com/Ronnie_zheng/ascend-pytorch-crowdintelligence-doc/raw/master/Ascend-PyTorch%E7%A6%BB%E7%BA%BF%E6%8E%A8%E7%90%86%E6%8C%87%E5%AF%BC/images/310P3.png)
```bash
source /usr/local/Ascend/ascend-toolkit/set_env.sh
cd YOLOX
cp yolox_s.pth ./YOLOX
cp -r YoloXs_for_Pytorch/* ./YOLOX
-bash test/pth2om.sh
+bash test/pth2om.sh Ascend${chip_name} # Ascend310P3
bash test/eval-acc-perf.sh --datasets_path=/root/datasets
```
**评测结果:**
-| 模型 | pth精度 | 710离线推理精度 | 性能基准 | 710性能 |
+| 模型 | pth精度 | 310P离线推理精度 | 性能基准 | 310P性能 |
| ----------- | --------- | --------------- | --------- | ------- |
| yolox-s | map:40.5% | map:40.1% | | 950fps |
diff --git a/ACL_PyTorch/built-in/cv/YoloXs_for_Pytorch/test/pth2om.sh b/ACL_PyTorch/built-in/cv/YoloXs_for_Pytorch/test/pth2om.sh
index 75629dc73bb7eff8164ad1adb81966025ce4a63e..df197fe6fb7d864eef6d6d934340695077256232 100644
--- a/ACL_PyTorch/built-in/cv/YoloXs_for_Pytorch/test/pth2om.sh
+++ b/ACL_PyTorch/built-in/cv/YoloXs_for_Pytorch/test/pth2om.sh
@@ -15,7 +15,7 @@ fi
rm -rf models/*.om
atc --model=yolox.onnx --framework=5 --output=./models/yolox --input_format=NCHW --optypelist_for_implmode="Sigmoid" \
- --op_select_implmode=high_performance --input_shape='images:4,3,640,640' --log=info --soc_version=Ascend710
+ --op_select_implmode=high_performance --input_shape='images:4,3,640,640'
--log=info --soc_version=$1 if [ -f "models/yolox.om" ]; then echo "success" diff --git a/ACL_PyTorch/built-in/cv/Yolov4_for_Pytorch/README.md b/ACL_PyTorch/built-in/cv/Yolov4_for_Pytorch/README.md index bfd8ba1dbeeb6dd01466c8be91a444d3daffec44..9617a3ef844f2f5d73d6db6fe4367479e174c0a6 100644 --- a/ACL_PyTorch/built-in/cv/Yolov4_for_Pytorch/README.md +++ b/ACL_PyTorch/built-in/cv/Yolov4_for_Pytorch/README.md @@ -27,19 +27,24 @@ python3.7 dy_resize.py yolov4_-1_3_608_608_dynamic.onnx (3)配置环境变量转换om模型 +${chip_name}可通过`npu-smi info`指令查看 + +![Image](https://gitee.com/Ronnie_zheng/ascend-pytorch-crowdintelligence-doc/raw/master/Ascend-PyTorch%E7%A6%BB%E7%BA%BF%E6%8E%A8%E7%90%86%E6%8C%87%E5%AF%BC/images/310P3.png) + ``` source env.sh -# soc_version:支持Ascend310和Ascend710 +# soc_version:支持Ascend310和Ascend310P[1-4] # 二进制输入 -atc --model=yolov4_-1_3_608_608_dynamic_dbs.onnx --framework=5 --output=yolov4_bs1 --input_format=NCHW --log=info --soc_version=Ascend310 --input_shape="input:1,3,608,608" --insert_op_conf=aipp.config --enable_small_channel=1 +atc --model=yolov4_-1_3_608_608_dynamic_dbs.onnx --framework=5 --output=yolov4_bs1 --input_format=NCHW --log=info --soc_version=Ascend${chip_name} --input_shape="input:1,3,608,608" --insert_op_conf=aipp.config --enable_small_channel=1 # 二进制输入 int8量化 -atc --model=yolov4_deploy_model.onnx --framework=5 --output=yolov4_bs1 --input_format=NCHW --log=info --soc_version=Ascend310 --input_shape="input:1,3,608,608" --insert_op_conf=aipp.config --enable_small_channel=1 +atc --model=yolov4_deploy_model.onnx --framework=5 --output=yolov4_bs1 --input_format=NCHW --log=info --soc_version=Ascend${chip_name} --input_shape="input:1,3,608,608" --insert_op_conf=aipp.config --enable_small_channel=1 ``` + (4)解析数据集 下载coco2014数据集val2014和label文件**instances_valminusminival2014.json**,运行**parse_json.py**解析数据集 diff --git a/ACL_PyTorch/built-in/cv/Yolov5_for_Pytorch/README.md b/ACL_PyTorch/built-in/cv/Yolov5_for_Pytorch/README.md index 
a9329efbb511c1f223d9f016d5045c2a7bdf53b7..0488751d31fd84d8691f9a02e52dc8e34abfc8d5 100644 --- a/ACL_PyTorch/built-in/cv/Yolov5_for_Pytorch/README.md +++ b/ACL_PyTorch/built-in/cv/Yolov5_for_Pytorch/README.md @@ -1,13 +1,18 @@ # Yolov5模型推理 -- [1 环境准备](#1-环境准备) -- [2 推理步骤](#2-推理步骤) - - [2.1 设置环境变量](#21-设置环境变量) - - [2.2 pt导出om模型](#22-pt导出om模型) - - [2.3 om模型推理](#23-om模型推理) -- [3 端到端推理Demo](#3-端到端推理Demo) -- [4 量化(可选)](#4-量化(可选)) -- [5 FAQ](#5-FAQ) +- [Yolov5模型推理](#yolov5模型推理) + - [文件说明](#文件说明) + - [1 环境准备](#1-环境准备) + - [1.1 下载pytorch源码,切换到对应分支](#11-下载pytorch源码切换到对应分支) + - [1.2 准备以下文件,放到pytorch源码根目录](#12-准备以下文件放到pytorch源码根目录) + - [安装依赖](#安装依赖) + - [2 推理步骤](#2-推理步骤) + - [2.1 设置环境变量](#21-设置环境变量) + - [2.2 pt导出om模型](#22-pt导出om模型) + - [2.3 om模型推理](#23-om模型推理) + - [3 端到端推理Demo](#3---端到端推理demo) + - [4 量化(可选)](#4-量化可选) + - [FAQ](#faq) ------ diff --git a/ACL_PyTorch/built-in/nlp/Bert_Base_Uncased_for_Pytorch/ReadMe.md b/ACL_PyTorch/built-in/nlp/Bert_Base_Uncased_for_Pytorch/ReadMe.md index 865a190861676cb9ce559a47d9002bbe6ad2dd58..b141f39d606d6d942b73d3c22c654e8aa0b8ee0f 100644 --- a/ACL_PyTorch/built-in/nlp/Bert_Base_Uncased_for_Pytorch/ReadMe.md +++ b/ACL_PyTorch/built-in/nlp/Bert_Base_Uncased_for_Pytorch/ReadMe.md @@ -170,7 +170,7 @@ 3. ###### 使用ATC工具将ONNX模型转OM模型。 1. 
修改bert_base_uncased_atc.sh脚本,通过ATC工具使用脚本完成转换,具体的脚本示例如下: - + ``` # 配置环境变量 export install_path=/usr/local/Ascend/ascend-toolkit/latest @@ -180,7 +180,7 @@ export ASCEND_OPP_PATH=${install_path}/opp # 使用二进制输入时,执行如下命令 - atc --input_format=ND --framework=5 --model=bert_base_batch_8.onnx --input_shape="input_ids:8,512;token_type_ids:8,512;attention_mask:8,512" --output=bert_base_batch_8_auto --log=info --soc_version=Ascend710 --optypelist_for_implmode="Gelu" --op_select_implmode=high_performance --input_fp16_nodes="attention_mask" + atc --input_format=ND --framework=5 --model=bert_base_batch_8.onnx --input_shape="input_ids:8,512;token_type_ids:8,512;attention_mask:8,512" --output=bert_base_batch_8_auto --log=info --soc_version=$1 --optypelist_for_implmode="Gelu" --op_select_implmode=high_performance --input_fp16_nodes="attention_mask" ``` ![img](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/turing/resourcecenter/img/public_sys-resources/note_3.0-zh-cn.png) @@ -200,8 +200,12 @@ 2. 执行atc转换脚本,将.onnx文件转为离线推理模型文件.om文件。 + ${chip_name}可通过`npu-smi info`指令查看 + + ![Image](https://gitee.com/Ronnie_zheng/ascend-pytorch-crowdintelligence-doc/raw/master/Ascend-PyTorch%E7%A6%BB%E7%BA%BF%E6%8E%A8%E7%90%86%E6%8C%87%E5%AF%BC/images/310P3.png) + ``` - bash atc_bert_base_uncased.sh + bash atc_bert_base_uncased.sh Ascend${chip_name} # Ascend310P3 ``` 运行成功后生成bert_base_batch_8_auto.om用于二进制输入推理的模型文件。 diff --git a/ACL_PyTorch/built-in/nlp/Bert_Base_Uncased_for_Pytorch/atc_bert_base_uncased.sh b/ACL_PyTorch/built-in/nlp/Bert_Base_Uncased_for_Pytorch/atc_bert_base_uncased.sh index 3b208c71dac34fa639c7f5bee5f4d39cabf24c5f..01af5830b1a634d07ada00cb4b239705884e4e22 100644 --- a/ACL_PyTorch/built-in/nlp/Bert_Base_Uncased_for_Pytorch/atc_bert_base_uncased.sh +++ b/ACL_PyTorch/built-in/nlp/Bert_Base_Uncased_for_Pytorch/atc_bert_base_uncased.sh @@ -9,5 +9,5 @@ for i in $(seq 0 7); do /usr/local/Ascend/driver/tools/msnpureport -g error -d $ atc --input_format=ND --framework=5 
--model=bert_base_batch_8.onnx\ --input_shape="input_ids:8,512;token_type_ids:8,512;attention_mask:8,512"\ --output=bert_base_batch_8_auto\ - --log=error --soc_version=Ascend710 --optypelist_for_implmode="Gelu"\ + --log=error --soc_version=$1 --optypelist_for_implmode="Gelu"\ --op_select_implmode=high_performance --input_fp16_nodes="attention_mask" \ No newline at end of file diff --git a/ACL_PyTorch/built-in/nlp/CNN_Transformer_for_Pytorch/ReadMe.md b/ACL_PyTorch/built-in/nlp/CNN_Transformer_for_Pytorch/ReadMe.md index 38b94eb75703250254eb5f8efd66c4585a1350b0..1c8739ee66f93257f51e9568e0ed8960181c9c08 100644 --- a/ACL_PyTorch/built-in/nlp/CNN_Transformer_for_Pytorch/ReadMe.md +++ b/ACL_PyTorch/built-in/nlp/CNN_Transformer_for_Pytorch/ReadMe.md @@ -61,12 +61,12 @@ python3.7 preprocess.py --pre_data_save_path=./pre_data/clean --which_dataset=cl ``` #!/bin/bash export install_path=/usr/local/Ascend/ascend-toolkit/latest - export PATH=/usr/local/python3.7.5/bin:${install_path}/atc/ccec_compiler/bin:${install_path}/atc/bin:$PATH + export PATH=/usr/local/python3.7.5/bin:${install_path}/atc/ccec_compiler/bin:${install_path}/atc/bin:$PATH export PYTHONPATH=${install_path}/atc/python/site-packages:${install_path}/pyACL/python/site-packages/acl:$PYTHONPATH - export LD_LIBRARY_PATH=${install_path}/atc/lib64:${install_path}/acllib/lib64:$LD_LIBRARY_PATH + export LD_LIBRARY_PATH=${install_path}/atc/lib64:${install_path}/acllib/lib64:$LD_LIBRARY_PATH export ASCEND_OPP_PATH=${install_path}/opp - atc --framework=5 --model=./models/wav2vec2-base-960h.onnx --output=./models/wav2vec2-base-960h --input_format=ND --input_shape="input:1,-1" 
--dynamic_dims="10000;20000;30000;40000;50000;60000;70000;80000;90000;100000;110000;120000;130000;140000;150000;160000;170000;180000;190000;200000;210000;220000;230000;240000;250000;260000;270000;280000;290000;300000;310000;320000;330000;340000;350000;360000;370000;380000;390000;400000;410000;420000;430000;440000;450000;460000;470000;480000;490000;500000;510000;520000;530000;540000;550000;560000" --log=error --soc_version=Ascend710 + atc --framework=5 --model=./models/wav2vec2-base-960h.onnx --output=./models/wav2vec2-base-960h --input_format=ND --input_shape="input:1,-1" --dynamic_dims="10000;20000;30000;40000;50000;60000;70000;80000;90000;100000;110000;120000;130000;140000;150000;160000;170000;180000;190000;200000;210000;220000;230000;240000;250000;260000;270000;280000;290000;300000;310000;320000;330000;340000;350000;360000;370000;380000;390000;400000;410000;420000;430000;440000;450000;460000;470000;480000;490000;500000;510000;520000;530000;540000;550000;560000" --log=error --soc_version=$1 ``` 参数说明: @@ -81,8 +81,12 @@ python3.7 preprocess.py --pre_data_save_path=./pre_data/clean --which_dataset=cl 2. 执行onnx2om.sh脚本,将onnx文件转为离线推理模型文件om文件。 + ${chip_name}可通过`npu-smi info`指令查看 + + ![Image](https://gitee.com/Ronnie_zheng/ascend-pytorch-crowdintelligence-doc/raw/master/Ascend-PyTorch%E7%A6%BB%E7%BA%BF%E6%8E%A8%E7%90%86%E6%8C%87%E5%AF%BC/images/310P3.png) + ``` - bash onnx2om.sh + bash onnx2om.sh Ascend${chip_name} # Ascend310P ``` 运行成功后在`models`目录下生成`wav2vec2-base-960h.om`模型文件。 @@ -101,7 +105,7 @@ python3.7 preprocess.py --pre_data_save_path=./pre_data/clean --which_dataset=cl `install_path`请修改为Toolkit的实际安装路径。 - 2. 运行`pyacl_infer.py`进行推理,同时输出推理性能数据。 + 1. 运行`pyacl_infer.py`进行推理,同时输出推理性能数据。 ``` python3.7 pyacl_infer.py \ @@ -130,7 +134,7 @@ python3.7 preprocess.py --pre_data_save_path=./pre_data/clean --which_dataset=cl - e.g. 模型有多个输入:--input_dtypes=float32,float32,float32(需要和bin_info文件多输入排列一致) - --infer_res_save_path:推理结果保存目录 - --res_save_type:推理结果保存类型,bin或npy - 3. 
推理数据后处理与精度统计。 + 1. 推理数据后处理与精度统计。 运行`postprocess.py`,会进行推理数据后处理,并进行精度统计。 @@ -147,7 +151,7 @@ python3.7 preprocess.py --pre_data_save_path=./pre_data/clean --which_dataset=cl - --res_save_path:后处理结果存放txt文件 - --which_dataset:精度统计所用的数据集,参看preprocess.py的参数说明 -4. 性能测试 +3. 性能测试 由于TensorRT无法运行`wav2vec2-base-960h.onnx`模型,所以性能测试以pyacl得到的om推理性能和pytorch在线推理性能作比较。 diff --git a/ACL_PyTorch/built-in/nlp/CNN_Transformer_for_Pytorch/onnx2om.sh b/ACL_PyTorch/built-in/nlp/CNN_Transformer_for_Pytorch/onnx2om.sh index 1ee1ae734c908af3f09b50ed901d72a0454e5d5e..c78afe982d5b4316397c0a7fe6b0421e88c6103a 100644 --- a/ACL_PyTorch/built-in/nlp/CNN_Transformer_for_Pytorch/onnx2om.sh +++ b/ACL_PyTorch/built-in/nlp/CNN_Transformer_for_Pytorch/onnx2om.sh @@ -6,4 +6,4 @@ export PYTHONPATH=${install_path}/atc/python/site-packages:${install_path}/pyACL export LD_LIBRARY_PATH=${install_path}/atc/lib64:${install_path}/acllib/lib64:$LD_LIBRARY_PATH export ASCEND_OPP_PATH=${install_path}/opp -atc --framework=5 --model=./models/wav2vec2-base-960h.onnx --output=./models/wav2vec2-base-960h --input_format=ND --input_shape="input:1,-1" --dynamic_dims="10000;20000;30000;40000;50000;60000;70000;80000;90000;100000;110000;120000;130000;140000;150000;160000;170000;180000;190000;200000;210000;220000;230000;240000;250000;260000;270000;280000;290000;300000;310000;320000;330000;340000;350000;360000;370000;380000;390000;400000;410000;420000;430000;440000;450000;460000;470000;480000;490000;500000;510000;520000;530000;540000;550000;560000" --log=error --soc_version=Ascend710 +atc --framework=5 --model=./models/wav2vec2-base-960h.onnx --output=./models/wav2vec2-base-960h --input_format=ND --input_shape="input:1,-1" 
--dynamic_dims="10000;20000;30000;40000;50000;60000;70000;80000;90000;100000;110000;120000;130000;140000;150000;160000;170000;180000;190000;200000;210000;220000;230000;240000;250000;260000;270000;280000;290000;300000;310000;320000;330000;340000;350000;360000;370000;380000;390000;400000;410000;420000;430000;440000;450000;460000;470000;480000;490000;500000;510000;520000;530000;540000;550000;560000" --log=error --soc_version=$1 diff --git a/ACL_PyTorch/built-in/nlp/textcnn/README.md b/ACL_PyTorch/built-in/nlp/textcnn/README.md index 0d1a1be488841150f7e7f32d8b5d0f8d9c547e29..3806cc8b33da1132f3623050accf0af11ce61237 100644 --- a/ACL_PyTorch/built-in/nlp/textcnn/README.md +++ b/ACL_PyTorch/built-in/nlp/textcnn/README.md @@ -32,12 +32,16 @@ python3 TextCNN_pth2onnx.py --weight_path ./TextCNN_9045_seed460473.pth --onnx_p 3. 转om -``` -cd .. -bash onnxsim.sh -bash onnx2mgonnx.sh -bash onnx2om.sh -``` + ${chip_name}可通过`npu-smi info`指令查看 + + ![Image](https://gitee.com/Ronnie_zheng/ascend-pytorch-crowdintelligence-doc/raw/master/Ascend-PyTorch%E7%A6%BB%E7%BA%BF%E6%8E%A8%E7%90%86%E6%8C%87%E5%AF%BC/images/310P3.png) + + ``` + cd .. + bash onnxsim.sh + bash onnx2mgonnx.sh + bash onnx2om.sh Ascend${chip_name} # Ascend310P3 + ``` 4. 
后处理得到精度 @@ -56,7 +60,7 @@ python3 ascend-textcnn/TextCNN_postprocess.py result/dumpOutput_device0 >result_ ``` ## 3 自验 -| 模型 | 官网精度 | 710离线推理精度 | 710性能 | +| 模型 | 官网精度 | 310P离线推理精度 | 310P性能 | |--------------|--------|-----------|-------| | Textcnn 64bs | [91.22%](https://gitee.com/huangyd8/Chinese-Text-Classification-Pytorch) | 90.47% | 27242.83 | diff --git a/ACL_PyTorch/built-in/nlp/textcnn/onnx2om.sh b/ACL_PyTorch/built-in/nlp/textcnn/onnx2om.sh index bc9eefcb40734b206e80d67e57da967558fdadc8..30704a487b39b5ad158166383bda390a2b33090a 100644 --- a/ACL_PyTorch/built-in/nlp/textcnn/onnx2om.sh +++ b/ACL_PyTorch/built-in/nlp/textcnn/onnx2om.sh @@ -7,5 +7,5 @@ fi for i in 4 8 16 32 64 do - atc --model=mg_onnx_dir/textcnn_${i}bs_mg.onnx --framework=5 --output=mg_om_dir/textcnn_${i}bs_mg --output_type=FP16 --soc_version=Ascend710 --enable_small_channel=1 + atc --model=mg_onnx_dir/textcnn_${i}bs_mg.onnx --framework=5 --output=mg_om_dir/textcnn_${i}bs_mg --output_type=FP16 --soc_version=$1 --enable_small_channel=1 done diff --git a/ACL_PyTorch/contrib/audio/Ecapa_Tdnn/README.md b/ACL_PyTorch/contrib/audio/Ecapa_Tdnn/README.md index a5d5732a9ccbaf64a445e5588b9f6970aad01cce..8eab74e2c9b2141fa0dd19faa41be2e139f57d64 100644 --- a/ACL_PyTorch/contrib/audio/Ecapa_Tdnn/README.md +++ b/ACL_PyTorch/contrib/audio/Ecapa_Tdnn/README.md @@ -100,13 +100,17 @@ python fix_conv1d.py ecapa_tdnn.onnx ecapa_tdnn_sim.onnx ``` ### 2.2 onnx模型转om模型,以batch_size=16为例 -在710环境下,运行to_om.sh脚本,其中--model和--output参数自行修改,下面仅作参考 +在310P环境下,运行to_om.sh脚本,其中--model和--output参数自行修改,下面仅作参考 + +${chip_name}可通过`npu-smi info`指令查看 + +![Image](https://gitee.com/Ronnie_zheng/ascend-pytorch-crowdintelligence-doc/raw/master/Ascend-PyTorch%E7%A6%BB%E7%BA%BF%E6%8E%A8%E7%90%86%E6%8C%87%E5%AF%BC/images/310P3.png) ``` sudo apt install dos2unix dos2unix ./*.sh chmod +x *.sh -./to_om.sh +./to_om.sh Ascend${chip_name} # Ascend310P3 ``` ## 3.数据集预处理 @@ -117,11 +121,11 @@ chmod +x *.sh python preprocess.py VoxCeleb input_bs4/ 
speaker_bs4/ 4
```
-执行完成后将Ecapa_Tdnn/input_bs4/下内容传至710环境中
+执行完成后将Ecapa_Tdnn/input_bs4/下内容传至310P环境中
## 4.模型推理
-在710环境中,cd至msame文件夹下含.masame文件的路径下
+在310P环境中,cd至msame文件夹下含msame文件的路径下
执行推理,其中--model为之前转化好的bs为4的om模型,--input为第三步中得到的前处理后的数据路径
@@ -177,7 +181,7 @@ trtexec --onnx=ecapa_tdnn.onnx --fp16 --shapes=mel:4x80x200
./msame --model "om/ecapa_tdnn_bs4.om" --output "result" --outfmt TXT --loop 100
```
-| Model | batch_size | T4Throughput/Card | 710Throughput/Card |
+| Model | batch_size | T4Throughput/Card | 310PThroughput/Card |
|------------|------------|-------------------|--------------------|
| ECAPA-TDNN | 1 | 485.43 | 764.52 |
| ECAPA-TDNN | 4 | 705.46 | 1408.45 |
diff --git a/ACL_PyTorch/contrib/audio/Ecapa_Tdnn/test/pth2om.sh b/ACL_PyTorch/contrib/audio/Ecapa_Tdnn/test/pth2om.sh
index 52afb83c0014b4efff41560ebe97f866f23ceb48..cb8a558e1f237b5d0d59a585d889ccad6ba53daf 100644
--- a/ACL_PyTorch/contrib/audio/Ecapa_Tdnn/test/pth2om.sh
+++ b/ACL_PyTorch/contrib/audio/Ecapa_Tdnn/test/pth2om.sh
@@ -7,19 +7,19 @@ rm -rf ./om
mkdir om
echo om_bs=1
-atc --framework=5 --model=ecapa_tdnn_sim.onnx --output=om/ecapa_tdnn_bs1 --input_format=ND --input_shape="mel:1,80,200" --log=debug --soc_version=Ascend710>after_bs1.log
+atc --framework=5 --model=ecapa_tdnn_sim.onnx --output=om/ecapa_tdnn_bs1 --input_format=ND --input_shape="mel:1,80,200" --log=debug --soc_version=$1>after_bs1.log
echo om_bs=8
-atc --framework=5 --model=ecapa_tdnn_sim.onnx --output=om/ecapa_tdnn_bs8 --input_format=ND --input_shape="mel:8,80,200" --log=debug --soc_version=Ascend710>after_bs8.log
+atc --framework=5 --model=ecapa_tdnn_sim.onnx --output=om/ecapa_tdnn_bs8 --input_format=ND --input_shape="mel:8,80,200" --log=debug --soc_version=$1>after_bs8.log
echo om_bs=16
-atc --framework=5 --model=ecapa_tdnn_sim.onnx --output=om/ecapa_tdnn_bs16 --input_format=ND --input_shape="mel:16,80,200" --log=debug --soc_version=Ascend710>after_bs16.log
+atc --framework=5 --model=ecapa_tdnn_sim.onnx
--output=om/ecapa_tdnn_bs16 --input_format=ND --input_shape="mel:16,80,200" --log=debug --soc_version=$1>after_bs16.log echo om_bs=32 -atc --framework=5 --model=ecapa_tdnn_sim.onnx --output=om/ecapa_tdnn_bs32 --input_format=ND --input_shape="mel:32,80,200" --log=debug --soc_version=Ascend710>after_bs32.log +atc --framework=5 --model=ecapa_tdnn_sim.onnx --output=om/ecapa_tdnn_bs32 --input_format=ND --input_shape="mel:32,80,200" --log=debug --soc_version=$1>after_bs32.log echo om_bs=64 -atc --framework=5 --model=ecapa_tdnn_sim.onnx --output=om/ecapa_tdnn_bs64 --input_format=ND --input_shape="mel:64,80,200" --log=debug --soc_version=Ascend710>after_bs64.log +atc --framework=5 --model=ecapa_tdnn_sim.onnx --output=om/ecapa_tdnn_bs64 --input_format=ND --input_shape="mel:64,80,200" --log=debug --soc_version=$1>after_bs64.log echo om_bs=4 rm -rf ./om_aoe @@ -39,4 +39,4 @@ chmod 777 ./aoe_result_bs4 aoe --model=ecapa_tdnn_sim.onnx --framework=5 --input_format=ND --output=./om_aoe/ecapa_tdnn_bs4_jt1 --job_type=1 --input_shape="mel:4,80,200" aoe --model=ecapa_tdnn_sim.onnx --framework=5 --input_format=ND --output=./om_aoe/ecapa_tdnn_bs4_jt12 --job_type=2 --input_shape="mel:4,80,200" -atc --framework=5 --model=ecapa_tdnn_sim.onnx --output=./om/ecapa_tdnn_bs4 --input_format=ND --input_shape="mel:4,80,200" --log=debug --soc_version=Ascend710 +atc --framework=5 --model=ecapa_tdnn_sim.onnx --output=./om/ecapa_tdnn_bs4 --input_format=ND --input_shape="mel:4,80,200" --log=debug --soc_version=$1 diff --git a/ACL_PyTorch/contrib/audio/Ecapa_Tdnn/to_om.sh b/ACL_PyTorch/contrib/audio/Ecapa_Tdnn/to_om.sh index e49e75fcf96d21cf06ce95909adb4d1fb2e6d70f..41363036ed4fca98768b4979a5385c143b12dc1a 100644 --- a/ACL_PyTorch/contrib/audio/Ecapa_Tdnn/to_om.sh +++ b/ACL_PyTorch/contrib/audio/Ecapa_Tdnn/to_om.sh @@ -13,4 +13,4 @@ mkdir om chmod 777 ./aoe_result_bs4 -atc --framework=5 --model=ecapa_tdnn_sim.onnx --output=./om/ecapa_tdnn_bs4 --input_format=ND --input_shape="mel:4,80,200" --log=debug 
--soc_version=Ascend710 \ No newline at end of file +atc --framework=5 --model=ecapa_tdnn_sim.onnx --output=./om/ecapa_tdnn_bs4 --input_format=ND --input_shape="mel:4,80,200" --log=debug --soc_version=$1 \ No newline at end of file diff --git a/ACL_PyTorch/contrib/cv/classfication/Shufflenetv2+/README.md b/ACL_PyTorch/contrib/cv/classfication/Shufflenetv2+/README.md index b2e7bf5d221fe58f5d4fc1cef88fda8a631dc92b..41b20540980458b8721fcd5199136c9bfdd37756 100644 --- a/ACL_PyTorch/contrib/cv/classfication/Shufflenetv2+/README.md +++ b/ACL_PyTorch/contrib/cv/classfication/Shufflenetv2+/README.md @@ -1,25 +1,27 @@ Shufflenetv2+ Onnx模型端到端推理指导 -- [1 模型概述](#1-模型概述) - - [1.1 论文地址](#11-论文地址) - - [1.2 代码地址](#12-代码地址) -- [2 环境说明](#2-环境说明) - - [2.1 深度学习框架](#21-深度学习框架) - - [2.2 python第三方库](#22-python第三方库) -- [3 模型转换](#3-模型转换) - - [3.1 pth转onnx模型](#31-pth转onnx模型) - - [3.2 onnx转om模型](#32-onnx转om模型) -- [4 数据集预处理](#4-数据集预处理) - - [4.1 数据集获取](#41-数据集获取) - - [4.2 数据集预处理](#42-数据集预处理) - - [4.3 生成数据集信息文件](#43-生成数据集信息文件) -- [5 离线推理](#5-离线推理) - - [5.1 benchmark工具概述](#51-benchmark工具概述) - - [5.2 离线推理](#52-离线推理) -- [6 精度对比](#6-精度对比) - - [6.1 离线推理TopN精度统计](#61-离线推理TopN精度统计) - - [6.2 开源TopN精度](#62-开源TopN精度) - - [6.3 精度对比](#63-精度对比) +- [1 模型概述](#1-模型概述) + - [1.1 论文地址](#11-论文地址) + - [1.2 代码地址](#12-代码地址) +- [2 环境说明](#2-环境说明) + - [2.1 深度学习框架](#21-深度学习框架) + - [2.2 python第三方库](#22-python第三方库) +- [3 模型转换](#3-模型转换) + - [3.1 pth转onnx模型](#31-pth转onnx模型) + - [3.2 onnx转om模型](#32-onnx转om模型) +- [4 数据集预处理](#4-数据集预处理) + - [4.1 数据集获取](#41-数据集获取) + - [4.2 数据集预处理](#42-数据集预处理) + - [4.3 生成数据集信息文件](#43-生成数据集信息文件) +- [5 离线推理](#5-离线推理) + - [5.1 benchmark工具概述](#51-benchmark工具概述) + - [5.2 离线推理](#52-离线推理) +- [6 精度对比](#6-精度对比) + - [6.1 离线推理TopN精度统计](#61-离线推理topn精度统计) + - [6.2 开源TopN精度](#62-开源topn精度) + - [6.3 精度对比](#63-精度对比) +- [7 性能对比](#7-性能对比) + - [7.1 npu性能数据](#71-npu性能数据) @@ -110,7 +112,7 @@ python3.7 shufflenetv2_pth2onnx.py ShuffleNetV2+.Small.pth.tar shufflenetv2_bs1. ### 3.2 onnx转om模型 -1.设置环境变量 +1. 
设置环境变量 ``` source /usr/local/Ascend/ascend-toolkit/set_env.sh ``` @@ -118,13 +120,16 @@ source /usr/local/Ascend/ascend-toolkit/set_env.sh **说明:** >环境变量影响atc命令是否成功,在测试时如报错需验证环境变量的正确性 -2.使用atc将onnx模型转换为om模型文件,工具使用方法可以参考CANN 5.0.1 开发辅助工具指南 (推理) 01 +2. 使用atc将onnx模型转换为om模型文件,工具使用方法可以参考CANN 5.0.1 开发辅助工具指南 (推理) 01 ``` -atc --framework=5 --model=./shufflenetv2_bs1.onnx --input_format=NCHW --input_shape="image:1,3,224,224" --output=shufflenetv2_bs1 --log=debug --soc_version=Ascend310 +atc --framework=5 --model=./shufflenetv2_bs1.onnx --input_format=NCHW --input_shape="image:1,3,224,224" --output=shufflenetv2_bs1 --log=debug --soc_version=Ascend${chip_name} # Ascend310P3 ``` 针对不同bs的onnx模型,需修改--model, --input_shape, --output 三个参数中的bs值; -针对不同的芯片(310/310P),需修改参数--soc_version,分别为Ascend310、Ascend710; +针对不同的芯片(310/310P),需修改参数--soc_version,${chip_name}可通过`npu-smi info`指令查看; + +![Image](https://gitee.com/Ronnie_zheng/ascend-pytorch-crowdintelligence-doc/raw/master/Ascend-PyTorch%E7%A6%BB%E7%BA%BF%E6%8E%A8%E7%90%86%E6%8C%87%E5%AF%BC/images/310P3.png) + ## 4 数据集预处理 - **[数据集获取](#41-数据集获取)** diff --git a/ACL_PyTorch/contrib/cv/classfication/convmixer_1536_20/README.md b/ACL_PyTorch/contrib/cv/classfication/convmixer_1536_20/README.md index 6caf0b8617ef1bea6f1c9f9dae4054441c3c02d5..52a3801f45a3b9341d5d63a94f6be0fbf154f462 100644 --- a/ACL_PyTorch/contrib/cv/classfication/convmixer_1536_20/README.md +++ b/ACL_PyTorch/contrib/cv/classfication/convmixer_1536_20/README.md @@ -52,9 +52,9 @@ Pillow==9.0.1 > 其他第三方库: 可以通过 pip3.7 install -r requirements.txt 进行安装 ## 3. 
模型转换
-一步式从pth.tar权重文件转om模型的脚本,能够由pth.tar权重文件生成bacth为1的om模型:
+一步式从pth.tar权重文件转om模型的脚本,能够由pth.tar权重文件生成batch为1的om模型,${chip_name}为实际芯片版本,可通过`npu-smi info`指令查看:
```bash
-bash ./test/pth2om.sh --batch_size=1 --not_skip_onnx=true
+bash ./test/pth2om.sh --batch_size=1 --not_skip_onnx=true --soc_version=Ascend${chip_name}
```
运行后会生成如下文件:
```bash
@@ -78,10 +78,15 @@ python3.7 convmixer_pth2onnx.py --source "./convmixer_1536_20_ks9_p7.pth.tar" --
其中"source"表示模型加载权重的地址和名称,"target"表示转换后生成的onnx模型的存储地址和名称
### 3.2 onnx转om模型
+
1. 使用atc将onnx模型转换为om模型文件,工具使用方法可以参考[CANN V100R020C10 开发辅助工具指南 (推理) 01](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/51RC2alpha002/infacldevg/atctool)
+
+${chip_name}可通过`npu-smi info`指令查看
+
+![Image](https://gitee.com/Ronnie_zheng/ascend-pytorch-crowdintelligence-doc/raw/master/Ascend-PyTorch%E7%A6%BB%E7%BA%BF%E6%8E%A8%E7%90%86%E6%8C%87%E5%AF%BC/images/310P3.png)
+
```bash
-atc --framework=5 --model=./convmixer_1536_20.onnx --output=./convmixer_1536_20_bs1 --input_format=NCHW --input_shape="image:1,3,224,224" --log=error --soc_version=Ascend710 --op_select_implmode=high_performance --optypelist_for_implmode="Gelu"
+atc --framework=5 --model=./convmixer_1536_20.onnx --output=./convmixer_1536_20_bs1 --input_format=NCHW --input_shape="image:1,3,224,224" --log=error --soc_version=Ascend${chip_name} --op_select_implmode=high_performance --optypelist_for_implmode="Gelu" # Ascend310P3
```
## 4.
数据预处理 diff --git a/ACL_PyTorch/contrib/cv/classfication/convmixer_1536_20/test/pth2om.sh b/ACL_PyTorch/contrib/cv/classfication/convmixer_1536_20/test/pth2om.sh index de1e3064d62fb0fa6089bc549f7b799c4fe7dffa..dbef42784205a42240f1105e1ca858651bb0688e 100644 --- a/ACL_PyTorch/contrib/cv/classfication/convmixer_1536_20/test/pth2om.sh +++ b/ACL_PyTorch/contrib/cv/classfication/convmixer_1536_20/test/pth2om.sh @@ -13,6 +13,9 @@ do if [[ $para == --not_skip_onnx* ]]; then not_skip_onnx=`echo ${para#*=}` fi + if [[ $para == --soc_version* ]]; then + soc_version=`echo ${para#*=}` + fi done # ======================= convert onnx ======================================= @@ -35,7 +38,7 @@ rm -rf convmixer_1536_20_bs${batch_size}.om atc --framework=5 --model=./convmixer_1536_20.onnx \ --output=./convmixer_1536_20_bs${batch_size} \ --input_format=NCHW --input_shape="image:${batch_size},3,224,224" \ - --log=error --soc_version=Ascend710 \ + --log=error --soc_version=${soc_version} \ --op_select_implmode=high_performance --optypelist_for_implmode="Gelu" if [ -f "convmixer_1536_20_bs${batch_size}.om" ] ; then echo "==> 2. creating om model successfully." diff --git a/ACL_PyTorch/contrib/cv/detection/FCENet/readme.md b/ACL_PyTorch/contrib/cv/detection/FCENet/readme.md index 70b1f0ba23d5f00cbff32d78f7506815942068a4..7cbc1a6e743e009531f3241c43af8c64605603ea 100644 --- a/ACL_PyTorch/contrib/cv/detection/FCENet/readme.md +++ b/ACL_PyTorch/contrib/cv/detection/FCENet/readme.md @@ -31,14 +31,19 @@ cd .. 将msame工具放到当前工作目录下。 ### **2. 
离线推理**
-710上执行,执行时使npu-smi info查看设备状态,确保device空闲
+
+310P上执行,执行时使用npu-smi info查看设备状态,确保device空闲
+
+${chip_name}可通过`npu-smi info`指令查看,例:310P3
+
+![Image](https://gitee.com/Ronnie_zheng/ascend-pytorch-crowdintelligence-doc/raw/master/Ascend-PyTorch%E7%A6%BB%E7%BA%BF%E6%8E%A8%E7%90%86%E6%8C%87%E5%AF%BC/images/310P3.png)
```
-bash test/pth2om.sh --batch_size=1 #转onnx、om模型
+bash test/pth2om.sh --batch_size=1 --soc_version=Ascend${chip_name} #转onnx、om模型
bash test/eval_acc_perf.sh #精度统计
```
**评测结果**
-| 模型 | pth精度 | 710离线推理精度 | 710性能 |
+| 模型 | pth精度 | 310P离线推理精度 | 310P性能 |
|-------------|-------|-----------|------------|
| FCENet bs1 | 0.880 | 0.872 | fps 39.404 |
diff --git a/ACL_PyTorch/contrib/cv/detection/FCENet/test/pth2om.sh b/ACL_PyTorch/contrib/cv/detection/FCENet/test/pth2om.sh
index 1669085f86fb1de74bbbcec500c2ad58c99af2ac..ecd52ae70b1170f0ba502ed173af2b6966ce2f2f 100644
--- a/ACL_PyTorch/contrib/cv/detection/FCENet/test/pth2om.sh
+++ b/ACL_PyTorch/contrib/cv/detection/FCENet/test/pth2om.sh
@@ -5,6 +5,9 @@ do
    if [[ $para == --batch_size* ]]; then
        batch_size=`echo ${para#*=}`
    fi
+    if [[ $para == --soc_version* ]]; then
+        soc_version=`echo ${para#*=}`
+    fi
done
if [ -f "fcenet_dynamicbs.onnx" ]; then
@@ -28,7 +31,7 @@ rm -f fcenet.om
source /usr/local/Ascend/ascend-toolkit/set_env.sh
atc --framework=5 --model=./fcenet_dynamicbs.onnx --output=./fcenet_bs$batch_size --input_format=NCHW \
---input_shape="input:$batch_size,3,1280,2272" --log=error --soc_version=Ascend710
+--input_shape="input:$batch_size,3,1280,2272" --log=error --soc_version=${soc_version}
if [ -f "fcenet_bs$batch_size.om" ]; then
    echo "success"
diff --git a/ACL_PyTorch/contrib/cv/detection/RefineDet/README.md b/ACL_PyTorch/contrib/cv/detection/RefineDet/README.md
index e2153e4e22855270ac8b027e74efb8eb0379172c..8d78f666e08830b7ce3c4d64c5be9ce51a8825f8 100644
--- a/ACL_PyTorch/contrib/cv/detection/RefineDet/README.md
+++ b/ACL_PyTorch/contrib/cv/detection/RefineDet/README.md
@@ -128,20 +128,14 @@ source
/usr/local/Ascend/ascend-toolkit/set_env.sh b. 执行命令得到om模型 -Ascend710: +${chip_name}可通过`npu-smi info`指令查看 -``` -atc --framework=5 --out_nodes="Reshape_239:0;Softmax_246:0;Reshape_226:0;Softmax_233:0" --model=RefineDet320_VOC_final_no_nms.onnx ---output=refinedet_voc_320_non_nms_bs1_710 --input_format=NCHW --input_shape="image:1,3,320,320" --log=debug ---soc_version=Ascend710 --precision_mode allow_fp32_to_fp16 -``` - -Ascend310: +![Image](https://gitee.com/Ronnie_zheng/ascend-pytorch-crowdintelligence-doc/raw/master/Ascend-PyTorch%E7%A6%BB%E7%BA%BF%E6%8E%A8%E7%90%86%E6%8C%87%E5%AF%BC/images/310P3.png) ``` atc --framework=5 --out_nodes="Reshape_239:0;Softmax_246:0;Reshape_226:0;Softmax_233:0" --model=RefineDet320_VOC_final_no_nms.onnx ---output=refinedet_voc_320_non_nms_bs1_310 --input_format=NCHW --input_shape="image:1,3,320,320" --log=debug ---soc_version=Ascend310 --precision_mode allow_fp32_to_fp16 +--output=refinedet_voc_320_non_nms_bs1 --input_format=NCHW --input_shape="image:1,3,320,320" --log=debug +--soc_version=Ascend${chip_name} --precision_mode allow_fp32_to_fp16 # Ascend310P3 ``` 提示:切换不同batchsize时需要修改参数--output中的文件名"和--input_shape="batchsize,3,320,320" @@ -159,7 +153,7 @@ atc --framework=5 --out_nodes="Reshape_239:0;Softmax_246:0;Reshape_226:0;Softmax 若atc执行出错,错误代码为E10016,请使用Netron工具查看对应Reshape节点和Softmax节点,并修改代码。 ``` -3. 开始推理验证 +1. 开始推理验证 a. 使用Benchmark工具进行推理 @@ -176,12 +170,11 @@ benchmark.${arch}:选择对应操作系统的benchmark工具,如benchmark.x8 b. 
执行推理 ``` -./benchmark.x86_64 -model_type=vision -device_id=0 -batch_size=1 -om_path=./refinedet_voc_320_non_nms_bs1_710.om +./benchmark.x86_64 -model_type=vision -device_id=0 -batch_size=1 -om_path=./refinedet_voc_320_non_nms_bs1.om -input_text_path=./voc07test.info -input_width=320 -input_height=320 -output_binary=True -useDvpp=False ``` - 切换不同batchsize时需要修改参数 -batch_size=和对应的om文件名即参数 -om_path -- 默认推理的是Ascend710处理器下转换的om文件,如要推理Ascend310处理器下转换的om文件,应当修改对应参数"--om_path="的后缀为310 推理后的输出默认在当前目录result下。 @@ -225,7 +218,7 @@ python3.7 RefineDet_postprocess.py --datasets_path '/root/datasets/VOCdevkit/' - **评测结果:** -| 模型 | pth精度 | 310精度 |310性能 |710精度 |710性能 +| 模型 | pth精度 | 310精度 |310性能 |310P精度 |310P性能 |:--------------:| :------: | :------: | :------: | :------: | :------: | | RefineDet bs1 | [mAP:79.81%](https://github.com/luuuyi/RefineDet.PyTorch) | mAP:79.56%|166.06fps|mAP:79.58%|269.125fps | RefineDet bs32 | [mAP:79.81%](https://github.com/luuuyi/RefineDet.PyTorch) |mAP:79.56% | 232.18fps|mAP:79.58%|374.522fps diff --git a/ACL_PyTorch/contrib/cv/detection/Retinanet/README.md b/ACL_PyTorch/contrib/cv/detection/Retinanet/README.md index a4443fbf01b8aeeacf8af44dec9598633a9ffda0..24dcd2d1e7bb7947f629d1dd230dd614cb959f54 100644 --- a/ACL_PyTorch/contrib/cv/detection/Retinanet/README.md +++ b/ACL_PyTorch/contrib/cv/detection/Retinanet/README.md @@ -1,26 +1,27 @@ # 基于detectron2训练的retinanet Onnx模型端到端推理指导 -- [1 模型概述](#1-模型概述) - - [1.1 论文地址](#11-论文地址) - - [1.2 代码地址](#12-代码地址) -- [2 环境说明](#2-环境说明) - - [2.1 深度学习框架](#21-深度学习框架) - - [2.2 python第三方库](#22-python第三方库) -- [3 模型转换](#3-模型转换) - - [3.1 pth转onnx模型](#31-pth转onnx模型) - - [3.2 onnx转om模型](#32-onnx转om模型) -- [4 数据集预处理](#4-数据集预处理) - - [4.1 数据集获取](#41-数据集获取) - - [4.2 数据集预处理](#42-数据集预处理) - - [4.3 生成数据集信息文件](#43-生成数据集信息文件) -- [5 离线推理](#5-离线推理) - - [5.1 benchmark工具概述](#51-benchmark工具概述) - - [5.2 离线推理](#52-离线推理) -- [6 精度对比](#6-精度对比) - - [6.1 离线推理精度统计](#61-离线推理精度统计) - - [6.2 开源精度](#62-开源精度) - - [6.3 精度对比](#63-精度对比) -- [7 性能对比](#7-性能对比) - - 
[7.1 npu性能数据](#71-npu性能数据) +- [基于detectron2训练的retinanet Onnx模型端到端推理指导](#基于detectron2训练的retinanet-onnx模型端到端推理指导) + - [1 模型概述](#1-模型概述) + - [1.1 论文地址](#11-论文地址) + - [1.2 代码地址](#12-代码地址) + - [2 环境说明](#2-环境说明) + - [2.1 深度学习框架](#21-深度学习框架) + - [2.2 python第三方库](#22-python第三方库) + - [3 模型转换](#3-模型转换) + - [3.1 pth转onnx模型](#31-pth转onnx模型) + - [3.2 onnx转om模型](#32-onnx转om模型) + - [4 数据集预处理](#4-数据集预处理) + - [4.1 数据集获取](#41-数据集获取) + - [4.2 数据集预处理](#42-数据集预处理) + - [4.3 生成预处理数据集信息文件](#43-生成预处理数据集信息文件) + - [5 离线推理](#5-离线推理) + - [5.1 benchmark工具概述](#51-benchmark工具概述) + - [5.2 离线推理](#52-离线推理) + - [6 精度对比](#6-精度对比) + - [6.1 离线推理精度统计](#61-离线推理精度统计) + - [6.2 开源精度](#62-开源精度) + - [6.3 精度对比](#63-精度对比) + - [7 性能对比](#7-性能对比) + - [7.1 npu性能数据](#71-npu性能数据) ## 1 模型概述 @@ -168,16 +169,20 @@ export ASCEND_AICPU_PATH=/usr/local/Ascend/ascend-toolkit/latest/ 4. 生成的retinanet_revise.onnx和retinanet_int8_revise.onnx即为用于转om离线模型的onnx文件 -3. 使用atc将onnx模型(包括量化模型和非量化模型)转换为om模型文件,工具使用方法可以参考[CANN V100R020C10 开发辅助工具指南 (推理) 01](https://support.huawei.com/enterprise/zh/doc/EDOC1100164868?idPath=23710424%7C251366513%7C22892968%7C251168373),需要指定输出节点以去除无用输出,使用netron开源可视化工具查看具体的输出节点名,如使用的设备是710,则将--soc_version设置为Ascend710: +4. 
使用atc将onnx模型(包括量化模型和非量化模型)转换为om模型文件,工具使用方法可以参考[CANN V100R020C10 开发辅助工具指南 (推理) 01](https://support.huawei.com/enterprise/zh/doc/EDOC1100164868?idPath=23710424%7C251366513%7C22892968%7C251168373),需要指定输出节点以去除无用输出,使用netron开源可视化工具查看具体的输出节点名: + +${chip_name}可通过`npu-smi info`指令查看 + +![Image](https://gitee.com/Ronnie_zheng/ascend-pytorch-crowdintelligence-doc/raw/master/Ascend-PyTorch%E7%A6%BB%E7%BA%BF%E6%8E%A8%E7%90%86%E6%8C%87%E5%AF%BC/images/310P3.png) ```shell -atc --model=retinanet_revise.onnx --framework=5 --output=retinanet_detectron2_npu --input_format=NCHW --input_shape="input0:1,3,1344,1344" --out_nodes="Cast_1224:0;Reshape_1218:0;Gather_1226:0" --log=info --soc_version=Ascend310 +atc --model=retinanet_revise.onnx --framework=5 --output=retinanet_detectron2_npu --input_format=NCHW --input_shape="input0:1,3,1344,1344" --out_nodes="Cast_1224:0;Reshape_1218:0;Gather_1226:0" --log=info --soc_version=Ascend${chip_name} # Ascend310P3 ``` 量化模型转om(注意输出节点名字已改变,使用netron打开后手动修改) ``` -atc --model=retinanet_int8_revise.onnx --framework=5 --output=retinanet_detectron2_npu --input_format=NCHW --input_shape="input0:1,3,1344,1344" --out_nodes="Cast_1229_sg2:0;Reshape_1223_sg2:0;Gather_1231_sg2:0" --log=info --soc_version=Ascend310 +atc --model=retinanet_int8_revise.onnx --framework=5 --output=retinanet_detectron2_npu --input_format=NCHW --input_shape="input0:1,3,1344,1344" --out_nodes="Cast_1229_sg2:0;Reshape_1223_sg2:0;Gather_1231_sg2:0" --log=info --soc_version=Ascend${chip_name} # Ascend310P3 ``` @@ -340,7 +345,7 @@ AP,AP50,AP75,APs,APm,APl ### 6.3 精度对比 -310上om推理box map精度为0.384,官方开源pth推理box map精度为0.387,精度下降在1个点之内,因此可视为精度达标,710上fp16精度0.383, int8 0.382,可视为精度达标 +310上om推理box map精度为0.384,官方开源pth推理box map精度为0.387,精度下降在1个点之内,因此可视为精度达标,310P上fp16精度0.383, int8 0.382,可视为精度达标 ## 7 性能对比 @@ -349,7 +354,7 @@ AP,AP50,AP75,APs,APm,APl ### 7.1 npu性能数据 batch1的性能: -5.2步骤中,离线推理的Interface throughputRate即为吞吐量,对于310,需要乘以4,710只有一颗芯片,FPS为该值本身 +5.2步骤中,离线推理的Interface 
throughputRate即为吞吐量,对于310,需要乘以4,310P只有一颗芯片,FPS为该值本身 retinanet detectron2不支持多batch diff --git a/ACL_PyTorch/contrib/cv/detection/YOLOF/readme.md b/ACL_PyTorch/contrib/cv/detection/YOLOF/readme.md index 960872cef8254c43ebfd5b2a2906fa56c68c4d4f..0ef938f2b154cdaafff1b87e9d306accc0f8a587 100644 --- a/ACL_PyTorch/contrib/cv/detection/YOLOF/readme.md +++ b/ACL_PyTorch/contrib/cv/detection/YOLOF/readme.md @@ -32,16 +32,20 @@ cd .. ### 2. 离线推理 -710上执行,执行时使npu-smi info查看设备状态,确保device空闲 +310P上执行,执行时使npu-smi info查看设备状态,确保device空闲 + +${chip_name}可通过`npu-smi info`指令查看 + +![Image](https://gitee.com/Ronnie_zheng/ascend-pytorch-crowdintelligence-doc/raw/master/Ascend-PyTorch%E7%A6%BB%E7%BA%BF%E6%8E%A8%E7%90%86%E6%8C%87%E5%AF%BC/images/310P3.png) ```bash -bash test/pth2om.sh --batch_size=1 +bash test/pth2om.sh --batch_size=1 --soc_version=Ascend${chip_name} # Ascend310P3 bash test/eval_acc_perf.sh --batch_size=1 ``` **评测结果:** -| 模型 | pth精度 | 710离线推理精度 | 710性能 | +| 模型 | pth精度 | 310P离线推理精度 | 310P性能 | | ---------- | ----------- | --------------- | ---------- | | YOLOF bs1 | box AP:50.9 | box AP:51.0 | fps 27.697 | | YOLOF bs16 | box AP:50.9 | box AP:51.0 | fps 38.069 | \ No newline at end of file diff --git a/ACL_PyTorch/contrib/cv/detection/YOLOF/test/pth2om.sh b/ACL_PyTorch/contrib/cv/detection/YOLOF/test/pth2om.sh index 3d89780778278e6566abd568033949484210fb3f..2cd9bc934668ce1ed1e2020fb15f991ff97fa5cd 100644 --- a/ACL_PyTorch/contrib/cv/detection/YOLOF/test/pth2om.sh +++ b/ACL_PyTorch/contrib/cv/detection/YOLOF/test/pth2om.sh @@ -8,6 +8,9 @@ do if [[ $para == --batch_size* ]]; then batch_size=`echo ${para#*=}` fi + if [[ $para == --soc_version* ]]; then + soc_version=`echo ${para#*=}` + fi done @@ -28,7 +31,7 @@ rm -f yolof.om source /usr/local/Ascend/ascend-toolkit/set_env.sh atc --framework=5 --model=yolof.onnx --output=yolof --input_format=NCHW \ ---input_shape="input:$batch_size,3,608,608" --log=error --soc_version=Ascend710 +--input_shape="input:$batch_size,3,608,608" 
--log=error --soc_version=${soc_version} if [ -f "yolof.om" ]; then echo "success" diff --git a/ACL_PyTorch/contrib/cv/detection/YOLOX-mmdetection/readme.md b/ACL_PyTorch/contrib/cv/detection/YOLOX-mmdetection/readme.md index 645cc073d17cc9c6290570fb32fdfdc09ecb5f4f..3fe7456efde62ee7f99d9e8cd0ad7a8931192337 100644 --- a/ACL_PyTorch/contrib/cv/detection/YOLOX-mmdetection/readme.md +++ b/ACL_PyTorch/contrib/cv/detection/YOLOX-mmdetection/readme.md @@ -33,16 +33,20 @@ cd .. ### 2. 离线推理 -710上执行,执行时使npu-smi info查看设备状态,确保device空闲 +310P上执行,执行时使npu-smi info查看设备状态,确保device空闲 + +${chip_name}可通过`npu-smi info`指令查看 + +![Image](https://gitee.com/Ronnie_zheng/ascend-pytorch-crowdintelligence-doc/raw/master/Ascend-PyTorch%E7%A6%BB%E7%BA%BF%E6%8E%A8%E7%90%86%E6%8C%87%E5%AF%BC/images/310P3.png) ```bash -bash test/pth2om.sh --batch_size=1 +bash test/pth2om.sh --batch_size=1 --soc_version=Ascend${chip_name} # Ascend310P3 bash test/eval_acc_perf.sh --datasets_path=/root/datasets --batch_size=1 ``` **评测结果:** -| 模型 | pth精度 | 710离线推理精度 | 性能基准 | 710性能 | +| 模型 | pth精度 | 310P离线推理精度 | 性能基准 | 310P性能 | | ----------- | --------- | --------------- | --------- | ------- | | YOLOX bs1 | box AP:50.9 | box AP:51.0 | fps 11.828 | fps 27.697 | | YOLOX bs16 | box AP:50.9 | box AP:51.0 | fps 14.480 | fps 38.069 | diff --git a/ACL_PyTorch/contrib/cv/detection/YOLOX-mmdetection/test/pth2om.sh b/ACL_PyTorch/contrib/cv/detection/YOLOX-mmdetection/test/pth2om.sh index e7eece1b2736e18bcb5d1426afadb5bc606596d4..658b1ff477cef072153bafb0e371713adb5c41c1 100644 --- a/ACL_PyTorch/contrib/cv/detection/YOLOX-mmdetection/test/pth2om.sh +++ b/ACL_PyTorch/contrib/cv/detection/YOLOX-mmdetection/test/pth2om.sh @@ -8,6 +8,9 @@ do if [[ $para == --batch_size* ]]; then batch_size=`echo ${para#*=}` fi + if [[ $para == --soc_version* ]]; then + soc_version=`echo ${para#*=}` + fi done @@ -31,7 +34,7 @@ rm -f yolox.om source /usr/local/Ascend/ascend-toolkit/set_env.sh atc --framework=5 --model=yolox.onnx --output=yolox 
--input_format=NCHW \ ---input_shape="input:$batch_size,3,640,640" --log=error --soc_version=Ascend710 +--input_shape="input:$batch_size,3,640,640" --log=error --soc_version=${soc_version} if [ -f "yolox.om" ]; then echo "success" diff --git a/ACL_PyTorch/contrib/cv/face/Retinaface/README.md b/ACL_PyTorch/contrib/cv/face/Retinaface/README.md index 6e585c1f7033758dd357c426c9963f0fd01e9cb1..1cb019fc6f43022b7e3723f82fae28612fca04d1 100644 --- a/ACL_PyTorch/contrib/cv/face/Retinaface/README.md +++ b/ACL_PyTorch/contrib/cv/face/Retinaface/README.md @@ -1,28 +1,29 @@ # Retinface Onnx模型端到端推理指导 -- [1 模型概述](#1-模型概述) - - [1.1 论文地址](#11-论文地址) - - [1.2 代码地址](#12-代码地址) -- [2 环境说明](#2-环境说明) - - [2.1 深度学习框架](#21-深度学习框架) - - [2.2 python第三方库](#22-python第三方库) -- [3 模型转换](#3-模型转换) - - [3.1 pth转onnx模型](#31-pth转onnx模型) - - [3.2 onnx转om模型](#32-onnx转om模型) -- [4 数据集预处理](#4-数据集预处理) - - [4.1 数据集获取](#41-数据集获取) - - [4.2 数据集预处理](#42-数据集预处理) - - [4.3 生成数据集信息文件](#43-生成数据集信息文件) -- [5 离线推理](#5-离线推理) - - [5.1 benchmark工具概述](#51-benchmark工具概述) - - [5.2 离线推理](#52-离线推理) -- [6 精度对比](#6-精度对比) - - [6.1 离线推理精度统计](#61-离线推理精度统计) - - [6.2 开源精度](#62-开源精度) - - [6.3 精度对比](#63-精度对比) -- [7 性能对比](#7-性能对比) - - [7.1 npu性能数据](#71-npu性能数据) - - [7.2 T4性能数据](#72-T4性能数据) - - [7.3 性能对比](#73-性能对比) +- [Retinface Onnx模型端到端推理指导](#retinface-onnx模型端到端推理指导) + - [1 模型概述](#1-模型概述) + - [1.1 论文地址](#11-论文地址) + - [1.2 代码地址](#12-代码地址) + - [2 环境说明](#2-环境说明) + - [2.1 深度学习框架](#21-深度学习框架) + - [2.2 python第三方库](#22-python第三方库) + - [3 模型转换](#3-模型转换) + - [3.1 pth转onnx模型](#31-pth转onnx模型) + - [3.2 onnx转om模型](#32-onnx转om模型) + - [4 数据集预处理](#4-数据集预处理) + - [4.1 数据集获取](#41-数据集获取) + - [4.2 数据集预处理](#42-数据集预处理) + - [4.3 生成数据集信息文件](#43-生成数据集信息文件) + - [5 离线推理](#5-离线推理) + - [5.1 benchmark工具概述](#51-benchmark工具概述) + - [5.2 离线推理](#52-离线推理) + - [6 精度对比](#6-精度对比) + - [6.1 离线推理精度统计](#61-离线推理精度统计) + - [6.2 开源精度](#62-开源精度) + - [6.3 精度对比](#63-精度对比) + - [7 性能对比](#7-性能对比) + - [7.1 npu性能数据](#71-npu性能数据) + - [7.2 T4性能数据](#72-t4性能数据) + - [7.3 性能对比](#73-性能对比) @@ -106,8 
+107,13 @@ python3.7 pth2onnx.py -m mobilenet0.25_Final.pth source /usr/local/Ascend/ascend-toolkit/set_env.sh ``` 2.使用atc工具将onnx模型转换为om模型文件,工具使用方法可以参考CANN 5.0.1 开发辅助工具指南 (推理) 01 + +${chip_name}可通过`npu-smi info`指令查看 + +![Image](https://gitee.com/Ronnie_zheng/ascend-pytorch-crowdintelligence-doc/raw/master/Ascend-PyTorch%E7%A6%BB%E7%BA%BF%E6%8E%A8%E7%90%86%E6%8C%87%E5%AF%BC/images/310P3.png) + ``` -atc --framework 5 --model retinaface.onnx --input_shape "image:16,3,1000,1000" --soc_version Ascend710 --output retinaface_bs16 --log error --out-nodes="Concat_205:0;Softmax_206:0;Concat_155:0" --enable_small_channel=1 --insert_op_conf=./aipp.cfg +atc --framework 5 --model retinaface.onnx --input_shape "image:16,3,1000,1000" --soc_version Ascend${chip_name} --output retinaface_bs16 --log error --out-nodes="Concat_205:0;Softmax_206:0;Concat_155:0" --enable_small_channel=1 --insert_op_conf=./aipp.cfg # Ascend310P3 ``` ## 4 数据集预处理 diff --git a/ACL_PyTorch/contrib/cv/face/Retinaface/test/pth2om.sh b/ACL_PyTorch/contrib/cv/face/Retinaface/test/pth2om.sh index 9d9d0d6d529a22618ab2d78cec96c7087c10af5a..d7b0dfa64e8109537cae2b62cb4a963b6b1ded4a 100644 --- a/ACL_PyTorch/contrib/cv/face/Retinaface/test/pth2om.sh +++ b/ACL_PyTorch/contrib/cv/face/Retinaface/test/pth2om.sh @@ -9,8 +9,8 @@ if [ $? 
!= 0 ]; then fi source test/env.sh rm -rf retinaface_bs1.om retinaface_bs16.om -atc --framework 5 --model retinaface.onnx --input_shape "image:1,3,1000,1000" --soc_version Ascend710 --output retinaface_bs1 --log error --out-nodes="Concat_205:0;Softmax_206:0;Concat_155:0" --enable_small_channel=1 --insert_op_conf=./aipp.cfg -atc --framework 5 --model retinaface.onnx --input_shape "image:16,3,1000,1000" --soc_version Ascend710 --output retinaface_bs16 --log error --out-nodes="Concat_205:0;Softmax_206:0;Concat_155:0" --enable_small_channel=1 --insert_op_conf=./aipp.cfg +atc --framework 5 --model retinaface.onnx --input_shape "image:1,3,1000,1000" --soc_version $1 --output retinaface_bs1 --log error --out-nodes="Concat_205:0;Softmax_206:0;Concat_155:0" --enable_small_channel=1 --insert_op_conf=./aipp.cfg +atc --framework 5 --model retinaface.onnx --input_shape "image:16,3,1000,1000" --soc_version $1 --output retinaface_bs16 --log error --out-nodes="Concat_205:0;Softmax_206:0;Concat_155:0" --enable_small_channel=1 --insert_op_conf=./aipp.cfg if [ -f "retinaface_bs1.om" ] && [ -f "retinaface_bs16.om" ]; then echo "success" diff --git a/ACL_PyTorch/contrib/cv/pose_estimation/UniFormer/readme.md b/ACL_PyTorch/contrib/cv/pose_estimation/UniFormer/readme.md index 016d16e098108495f69e23d87878c08d49b608e9..5a5c49a079ae98eb70b84fd2dc394b5a04bf182e 100644 --- a/ACL_PyTorch/contrib/cv/pose_estimation/UniFormer/readme.md +++ b/ACL_PyTorch/contrib/cv/pose_estimation/UniFormer/readme.md @@ -34,16 +34,20 @@ cd ../.. ### 2. 
离线推理 -710上执行,执行时使npu-smi info查看设备状态,确保device空闲 +310P上执行,执行时使npu-smi info查看设备状态,确保device空闲 + +${chip_name}可通过`npu-smi info`指令查看 + +![Image](https://gitee.com/Ronnie_zheng/ascend-pytorch-crowdintelligence-doc/raw/master/Ascend-PyTorch%E7%A6%BB%E7%BA%BF%E6%8E%A8%E7%90%86%E6%8C%87%E5%AF%BC/images/310P3.png) ```bash -bash test/pth2om.sh --batch_size=1 +bash test/pth2om.sh --batch_size=1 --soc_version=Ascend${chip_name} # Ascend310P3 bash test/eval_acc_perf.sh --datasets_path=data/coco --batch_size=1 ``` **评测结果:** -| 模型 | pth精度 | 710离线推理精度 | 性能基准 | 710性能 | +| 模型 | pth精度 | 310P离线推理精度 | 性能基准 | 310P性能 | | ----------- | --------- | --------------- | --------- | ------- | | UniFormer bs1 | AP50=93.6 | AP50=93.5 | 88.914 fps | 162.601 fps | | UniFormer bs16 | AP50=93.6 | AP50=93.5 | 116.939 fps | 277.441 fps | diff --git a/ACL_PyTorch/contrib/cv/pose_estimation/UniFormer/test/pth2om.sh b/ACL_PyTorch/contrib/cv/pose_estimation/UniFormer/test/pth2om.sh index d0fb930b85736685b183f6b77e6030057ca9fab6..d20f1ac1a49735e61947567041b532b99d519807 100644 --- a/ACL_PyTorch/contrib/cv/pose_estimation/UniFormer/test/pth2om.sh +++ b/ACL_PyTorch/contrib/cv/pose_estimation/UniFormer/test/pth2om.sh @@ -8,6 +8,9 @@ do if [[ $para == --batch_size* ]]; then batch_size=`echo ${para#*=}` fi + if [[ $para == --soc_version* ]]; then + soc_version=`echo ${para#*=}` + fi done rm -f uniformer_dybs.onnx @@ -29,7 +32,7 @@ rm -f uniformer_bs$batch_size.om source /usr/local/Ascend/ascend-toolkit/set_env.sh atc --framework=5 --model=uniformer_dybs.onnx --output=uniformer_bs$batch_size --input_format=NCHW \ - --input_shape="input:$batch_size,3,256,192" --log=error --soc_version=Ascend710 + --input_shape="input:$batch_size,3,256,192" --log=error --soc_version=${soc_version} if [ -f "uniformer_bs$batch_size.om" ]; then echo "success" diff --git a/ACL_PyTorch/contrib/cv/super_resolution/RCAN/README.md b/ACL_PyTorch/contrib/cv/super_resolution/RCAN/README.md index 
677804284da368e6f9284631fa76cbe4afe5903e..d31f9cd68222fe5330403d2a0ff6bffe8552a887 100644 --- a/ACL_PyTorch/contrib/cv/super_resolution/RCAN/README.md +++ b/ACL_PyTorch/contrib/cv/super_resolution/RCAN/README.md @@ -121,15 +121,15 @@ scipy == 1.7.3 | :--: | :---------------------------: | :---------------------: | | bs1 | 0.7245 | 9.3220 | + 3. 使用atc工具将onnx模型转换为om模型,命令参考 - 310: - ```bash - atc --framework=5 --model=rcan.onnx --output=rcan_1bs --input_format=NCHW --input_shape="image:1,3,256,256" --fusion_switch_file=switch.cfg --log=debug --soc_version=Ascend310 - ``` - 310p: + ${chip_name}可通过`npu-smi info`指令查看 + + ![Image](https://gitee.com/Ronnie_zheng/ascend-pytorch-crowdintelligence-doc/raw/master/Ascend-PyTorch%E7%A6%BB%E7%BA%BF%E6%8E%A8%E7%90%86%E6%8C%87%E5%AF%BC/images/310P3.png) + ```bash - atc --framework=5 --model=rcan.onnx --output=rcan_1bs --input_format=NCHW --input_shape="image:1,3,256,256" --fusion_switch_file=switch.cfg --log=debug --soc_version=Ascend710 + atc --framework=5 --model=rcan.onnx --output=rcan_1bs --input_format=NCHW --input_shape="image:1,3,256,256" --fusion_switch_file=switch.cfg --log=debug --soc_version=Ascend${chip_name} # Ascend310P3 ``` 此命令将在运行路径下生成一个rcan_1bs.om文件,此文件即为目标om模型文件
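Each patched `pth2om.sh` adds the same argument-parsing loop and forwards the parsed value to atc's `--soc_version` flag. A minimal standalone sketch of that pattern follows; the `parse_args` wrapper and the final `echo` are illustrative only (the real scripts parse at the top level and then invoke atc directly):

```shell
#!/bin/bash
# Sketch of the --batch_size / --soc_version parsing added to each pth2om.sh.
# ${para#*=} strips everything up to and including the first '=',
# leaving the option's value, exactly as in the patched scripts.
parse_args() {
    batch_size=1
    soc_version=""
    for para in "$@"; do
        if [[ $para == --batch_size* ]]; then
            batch_size=${para#*=}
        fi
        if [[ $para == --soc_version* ]]; then
            soc_version=${para#*=}
        fi
    done
}

parse_args --batch_size=16 --soc_version=Ascend310P3
echo "atc would be invoked with --soc_version=$soc_version for batch size $batch_size"
```

Passing the chip name through a variable (or `$1`, as in the Retinaface script) keeps one script usable for Ascend310, Ascend310P3, and other targets instead of hard-coding `--soc_version=Ascend710`.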