diff --git "a/AscendPyTorch\346\250\241\345\236\213\344\274\227\346\231\272\346\226\207\346\241\243-\347\246\273\347\272\277\346\216\250\347\220\206.md" "b/AscendPyTorch\346\250\241\345\236\213\344\274\227\346\231\272\346\226\207\346\241\243-\347\246\273\347\272\277\346\216\250\347\220\206.md"
index a5ecdc4a5dab6a5ad34d8160b4055f6ba1251ae3..dc11974c7846225dbe43ee39900e70c43ff119b5 100644
--- "a/AscendPyTorch\346\250\241\345\236\213\344\274\227\346\231\272\346\226\207\346\241\243-\347\246\273\347\272\277\346\216\250\347\220\206.md"
+++ "b/AscendPyTorch\346\250\241\345\236\213\344\274\227\346\231\272\346\226\207\346\241\243-\347\246\273\347\272\277\346\216\250\347\220\206.md"
@@ -1,4 +1,4 @@
-# Ascend PyTorch模型端到端推理指导
+# Ascend PyTorch Model Crowdsourcing Documentation: Offline Inference
 -   [1 资源与端到端推理流程](#1-资源与端到端推理流程)
 	-   [1.1 Ascend文档与软件包网址](#11-Ascend文档与软件包网址)
 	-   [1.2 端到端推理流程与交付标准](#12-端到端推理流程与交付标准)
@@ -38,7 +38,7 @@
 
 **Ascend相关文档与软件发布在华为云[support地址](https://support.huawei.com/enterprise/zh/category/ascend-computing-pid-1557196528909)CANN和A300-3010**
 
->![](public_sys-resources/icon-note.gif) 
+>![](https://gitee.com/wangjiangben_hw/ascend-pytorch-crowdintelligence-doc/raw/master/public_sys-resources/icon-note.gif) 
 >**说明：** 
 >
 >开发者除编程语言知识外，应对如下基础知识有一定的了解和熟悉：
@@ -67,7 +67,7 @@ npu 310单颗芯片上模型推理性能的吞吐率乘以4颗即单卡吞吐率
 
 -   **[深度学习框架与第三方库](#22-深度学习框架与第三方库)**  
 
->![](public_sys-resources/icon-note.gif) 
+>![](https://gitee.com/wangjiangben_hw/ascend-pytorch-crowdintelligence-doc/raw/master/public_sys-resources/icon-note.gif) 
 **说明：** 
 >
 > **若使用搭建完成的环境，可跳过此步骤。一般情况下，华为默认会提供搭建完成的环境。**
@@ -82,7 +82,7 @@ npu 310单颗芯片上模型推理性能的吞吐率乘以4颗即单卡吞吐率
 
     请参考[《CANN V100R020C10 软件安装指南》](https://support.huawei.com/enterprise/zh/doc/EDOC1100164870/59fb2d06)的”安装昇腾芯片驱动和固件“ -\>“安装开发套件和框架插件包”章节，完成安装。
 
-    >![](public_sys-resources/icon-note.gif) **说明：** 
+    >![](https://gitee.com/wangjiangben_hw/ascend-pytorch-crowdintelligence-doc/raw/master/public_sys-resources/icon-note.gif) **说明：** 
     >
     >安装驱动和固件需要root用户，若使用默认路径安装：
     >
@@ -114,7 +114,7 @@ numpy == 1.18.5
 Pillow == 7.2.0
 opencv-python == 4.2.0.34
 ```
->![](public_sys-resources/icon-note.gif) 
+>![](https://gitee.com/wangjiangben_hw/ascend-pytorch-crowdintelligence-doc/raw/master/public_sys-resources/icon-note.gif) 
 **说明：** 
 > 
 > X86架构：pytorch和torchvision可以通过官方下载whl包安装，其他可以通过pip install 包名 安装
@@ -156,7 +156,7 @@ git clone https://github.com/lukemelas/EfficientNet-PyTorch
 cd EfficientNet-Pytorch
 pip install -e .
 ```
->![](public_sys-resources/icon-note.gif) 
+>![](https://gitee.com/wangjiangben_hw/ascend-pytorch-crowdintelligence-doc/raw/master/public_sys-resources/icon-note.gif) 
 **说明：** 
 >
 > 有些模型代码库没有提供安装脚本，可以在python脚本中通过添加如下代码引用EfficientNet-PyTorch的EfficientNetlei类：
@@ -205,7 +205,7 @@ def convert():
     dummy_input = torch.randn(1, 3, 224, 224)
     torch.onnx.export(model, dummy_input, output_file, input_names = input_names, output_names = output_names, opset_version=11, verbose=True)
 ```
->![](public_sys-resources/icon-note.gif) 
+>![](https://gitee.com/wangjiangben_hw/ascend-pytorch-crowdintelligence-doc/raw/master/public_sys-resources/icon-note.gif) 
  **说明：** 
 >
 >注意目前ATC支持的onnx算子版本为11
@@ -348,7 +348,7 @@ export ASCEND_OPP_PATH=${install_path}/opp
 
 atc --framework=5 --model=efficientnet-b0_sim.onnx --output=efficientnet-b0_bs1 --input_format=NCHW --input_shape="image:1,3,224,224" --log=debug --soc_version=Ascend310
 ```
->![](public_sys-resources/icon-note.gif) 
+>![](https://gitee.com/wangjiangben_hw/ascend-pytorch-crowdintelligence-doc/raw/master/public_sys-resources/icon-note.gif) 
  **说明：** 
 >
 >为使性能达标有些模型需要开启autotune或repeat autotune
@@ -456,7 +456,7 @@ python3 get_info.py jpg dataset/ImageNet/val_union ImageNet.info
 2 dataset/ImageNet/val_union/ILSVRC2012_val_00004213.jpeg 116 87
 ...
 ```
->![](public_sys-resources/icon-note.gif) 
+>![](https://gitee.com/wangjiangben_hw/ascend-pytorch-crowdintelligence-doc/raw/master/public_sys-resources/icon-note.gif) 
  **说明：** 
 >
 > 这里只给出示例代码，前后处理，配置与评价等脚本来源：
@@ -484,7 +484,7 @@ benchmark工具为华为自研的模型推理工具，支持多种模型的离
 ```
 ImageNet.info为图片信息，注意这里的“input_height”和“input_weight”与AIPP节点输入一致，值为256因为AIPP中做了裁剪，参数-useDvpp=true。
 输出结果默认保存在当前目录result/dumpOutput_device0，模型只有一个名为class的输出，shape为bs * 1000，数据类型为FP32，对应1000个分类的预测结果，每个输入对应的输出对应一个_x.bin文件。
->![](public_sys-resources/icon-note.gif) 
+>![](https://gitee.com/wangjiangben_hw/ascend-pytorch-crowdintelligence-doc/raw/master/public_sys-resources/icon-note.gif) 
  **说明：** 
 >
 > 若benchmark执行失败可以通过查看系统输出日志初步定位原因，参考[CANN V100R020C10 日志参考 (推理) 01](https://support.huawei.com/enterprise/zh/doc/EDOC1100164869?idPath=23710424%7C251366513%7C22892968%7C251168373)
@@ -585,7 +585,7 @@ cd /usr/local/Ascend/ascend-toolkit/latest/x86_64-linux/toolkit/tools/profiler/p
 python3.7 msprof.py import -dir /home/HwHiAiUser/test/生成的profiling目录
 python3.7 msprof.py export summary -dir /home/HwHiAiUser/test/生成的profiling目录
 ```
->![](public_sys-resources/icon-note.gif) 
+>![](https://gitee.com/wangjiangben_hw/ascend-pytorch-crowdintelligence-doc/raw/master/public_sys-resources/icon-note.gif) 
  **说明：** 
 >
 >查看aicore算子运行时间整体统计，对影响性能可以融合的算子进行融合，参考[CANN V100R020C10 图融合和UB融合规则参考 (推理) 01](https://support.huawei.com/enterprise/zh/doc/EDOC1100164873?idPath=23710424%7C251366513%7C22892968%7C251168373)等
@@ -1329,3 +1329,4 @@ https://gitee.com/ascend/tools/tree/master/msquickcmp
         ```
 
 
+
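Review note on the hunk at line 156 of the offline inference guide: the note there says that when a model repository ships no install script, a Python script can still reference its classes by adding code to the script. The sketch below shows one common way to do that, by extending `sys.path`. It is an illustration under assumptions, not the code from the guide: `add_repo_to_path`, the temporary directory, and the `fake_model` module are hypothetical stand-ins for a cloned repository such as EfficientNet-PyTorch.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def add_repo_to_path(repo_dir: str) -> None:
    """Make modules under repo_dir importable without installing the repository."""
    resolved = str(Path(repo_dir).resolve())
    if resolved not in sys.path:
        sys.path.append(resolved)


# Demo: a throwaway directory stands in for a cloned model repository.
with tempfile.TemporaryDirectory() as repo_dir:
    # Hypothetical module; a real repository would ship its own packages.
    Path(repo_dir, "fake_model.py").write_text("NAME = 'demo-model'\n")
    add_repo_to_path(repo_dir)
    importlib.invalidate_caches()
    fake_model = importlib.import_module("fake_model")
    imported_name = fake_model.NAME
    print(imported_name)
```

In a real conversion script the same call would point at the cloned repository directory before the `import` of the model class.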
diff --git "a/AscendPyTorch\346\250\241\345\236\213\346\216\250\347\220\206\344\274\227\346\231\272\351\252\214\346\224\266\346\214\207\345\215\227.md" "b/AscendPyTorch\346\250\241\345\236\213\346\216\250\347\220\206\344\274\227\346\231\272\351\252\214\346\224\266\346\214\207\345\215\227.md"
index 25773b1525e2e791d1984cd5f49d4fb9ea6db87a..e0449aaa17214ddce93b90e67c663e7f3fabfe80 100644
--- "a/AscendPyTorch\346\250\241\345\236\213\346\216\250\347\220\206\344\274\227\346\231\272\351\252\214\346\224\266\346\214\207\345\215\227.md"
+++ "b/AscendPyTorch\346\250\241\345\236\213\346\216\250\347\220\206\344\274\227\346\231\272\351\252\214\346\224\266\346\214\207\345\215\227.md"
@@ -2,18 +2,35 @@
 
 1. 先上gitee管理平台，将验收目标调整至验收状态
 2. 检查PR内容，文件夹路径和文件结构
-    - PR末班和文件路径结构都在下面附件里有详细说明，请仔细check
-3. 按照验收脚本在交付文件夹下进行验收
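+    - The PR template and the file path structure are described in detail in the appendices below; please check them carefully
+    - See the appendix "PR review" and check carefully
+3. Run acceptance in the delivery folder according to the acceptance scripts  
+    Acceptance machine: 192.168.88.45  
+    See the [ResNext50 test instructions](https://gitee.com/ascend/modelzoo/blob/master/built-in/ACL_PyTorch/Benchmark/cv/classification/ResNext50/test/README.md)  
+    Prepare the environment: 
+    ``` 
+    1. Pull the model PR submitted to modelzoo, copy the model folder ResNext50 to /home/verify_models on the acceptance machine, and enter /home/verify_models/ResNext50  
+    2. Install the required dependencies according to requirements.txt  
+    3. git clone torchvision, the open-source repository that contains the ResNext50 model definition  
+    4. If the open-source model code was modified through a patch, apply the patch; if the open-source model code needs to be installed, install it  
+    5. Obtain the trained weight file
+    6. Obtain the dataset path
+    7. Obtain the benchmark tool
+    ```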
 
+    
     ```shell
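+    # Prepare the environment
+    # In the delivered code folder, obtain the open-source code of the model definition, install the required dependencies, obtain the weight file provided by training, obtain the dataset path, and obtain the benchmark tool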
+    
     # pth是否能正确转换为om
     bash test/pth2om.sh
     
-    # 精度数据是否达标（需要显示官网精度与om模型的精度）
-    # npu性能数据(如果模型支持多batch，测试bs1与bs16，否则只测试bs1，性能数据以单卡吞吐率为标准)
-    bash test/eval_acc_perf.sh
-    
-    # 在t4环境测试性能数据(如果模型支持多batch，测试bs1与bs16，否则只测试bs1，如果导出的onnx模型因含自定义算子等不能离线推理，则在t4上测试pytorch模型的在线推理性能，性能数据以单卡吞吐率为标准)
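+    # Whether accuracy meets the target (must show both the officially published pth accuracy and the om model accuracy)
+    # NPU performance data (test only while the device is idle; if the model supports multiple batch sizes, test bs1 and bs16, otherwise test bs1 only; performance is measured as single-card throughput). If no dataset directory is specified, /root/datasets is used by default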
+    bash test/eval_acc_perf.sh --datasets_path=/root/datasets
+        
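+    # Test performance in the T4 environment (test only while the GPU is idle; if the model supports multiple batch sizes, test bs1 and bs16, otherwise test bs1 only; if the exported onnx model cannot run offline inference, for example because it contains custom operators, measure the online inference performance of the PyTorch model on T4 instead; performance is measured as single-card throughput)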
     bash test/perf_t4.sh
     ```
 
@@ -32,24 +49,34 @@
     # 验收结果： OK / Failed
     # 备注： 成功生成om，无运行报错，报错日志xx 等
     
-    # 精度数据是否达标（需要显示官网精度与om模型的精度）
-    # npu性能数据(如果模型支持多batch，测试bs1与bs16，否则只测试bs1，性能数据以单卡吞吐率为标准)
-    bash test/eval_acc_perf.sh
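+    # Whether accuracy meets the target (must show both the officially published pth accuracy and the om model accuracy)
+    # NPU performance data (test only while the device is idle; if the model supports multiple batch sizes, test bs1 and bs16, otherwise test bs1 only; performance is measured as single-card throughput)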
+    bash test/eval_acc_perf.sh --datasets_path=/root/datasets
     # 验收结果： 是 / 否
-    # 备注： 目标精度top1:77.62% top5:93.70%；bs1,bs16验收精度top1:77.62% top5:93.69%；精度下降不超过1%；无运行报错，报错日志xx 等
-    # 备注： 验收测试性能bs1:1497.252FPS bs16:2096.376FPS；无运行报错，报错日志xx 等
-    
-    # 在t4环境测试性能数据(如果模型支持多batch，测试bs1与bs16，否则只测试bs1，如果导出的onnx模型因含自定义算子等不能离线推理，则在t4上测试pytorch模型的在线推理性能，性能数据以单卡吞吐率为标准)
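+    # Remarks: target pth accuracy top1:77.62% top5:93.70%; bs1 and bs16 acceptance om accuracy top1:77.62% top5:93.69%; accuracy drop does not exceed 1%; no runtime errors, error log xx, etc.
+    # Remarks: acceptance 310 performance bs1:1497.252FPS bs16:2096.376FPS; no runtime errors, error log xx, etc.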
+        
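+    # Test performance in the T4 environment (test only while the GPU is idle; if the model supports multiple batch sizes, test bs1 and bs16, otherwise test bs1 only; if the exported onnx model cannot run offline inference, for example because it contains custom operators, measure the online inference performance of the PyTorch model on T4 instead; performance is measured as single-card throughput). This step verifies that the T4 performance data shown by eval_acc_perf.sh is correct; the performance data written into that script should be close to the performance actually measured on T4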
     bash test/perf_t4.sh
     # 验收结果： OK / Failed
-    # 备注： 验收测试性能bs1:763.044FPS bs16:1234.940FPS；无运行报错，报错日志xx 等
-    
+    # Remarks: acceptance T4 performance bs1:763.044FPS bs16:1234.940FPS, consistent with the T4 performance data shown by eval_acc_perf.sh; no runtime errors, error log xx, etc.
+        
     # 310性能是否超过t4： 是 / 否
-    bs1:310=1.96倍t4
-    bs16:310=1.70倍t4
+    bs1: 310 = (1497.252/763.044) 1.96x T4
+    bs16: 310 = (2096.376/1234.940) 1.70x T4
     ```      
     - 示例链接 https://gitee.com/ascend/modelzoo/pulls/836#note_4814643
-5. 验收完成后，上gitee管理平台，将验收目标调整至完成状态
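+5. After acceptance is complete, carry out the following steps
+   - Report the acceptance result in the PR comment section using the template above
+   - On the gitee management platform, move the acceptance target to the completed state
+   - In Team Space > Test Management > PyTorch model crowdsourcing acceptance tracking sheet, update the model acceptance data
+   - Complete the acceptance test report document and archive it to OBS
+   - Collect the deliverables required for acceptance and archive them to OBS: archive the /home/verify_models/{model name} directory, first deleting the useless folders under it that take up disk space, namely the preprocessed dataset prep_dataset, result/dumpOutput_device0 and result/dumpOutput_device1
+6. Acceptance archiving and statistics  
+  1. The /home/verify_models/modelzoo directory is used to pull modelzoo code PRs  
+  2. Models that have passed the tests above must be kept under /home/verify_models   
+  3. Fill in the model's test data in /home/verify_models/models_result.xlsx; take the bs4, bs8 and bs32 performance data from README.md. If accuracy or performance misses the target on the blue-zone version but meets it in yellow-zone testing, state the yellow-zone version in the remarks; if yellow-zone testing also misses the target, state that yellow-zone accuracy or performance does not meet the target  
+  4. /home/verify_models is only for models that passed testing, models_result.xlsx and the modelzoo code; do not store other useless files in this directory  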
     
     
     
@@ -61,14 +88,47 @@
     贴上验收报告
        
     ```
+    - Put the issue link in the PR description field to associate the PR with that issue; the issue is closed automatically once the problem is resolved
     - 示例链接 https://gitee.com/ascend/modelzoo/issues/I3FI5L?from=project-issue
     
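+### Appendix: PR review
+
+- PR review:  
+1. Title format: [华为大学昇腾学院][高校贡献][Pytorch离线推理][Cascade_RCNN]-初次提交  
+2. Include the weight (pth) accuracy and the om accuracy for bs1 and bs16, and the T4 and 310 performance data for bs1 and bs16; express performance in fps  
+3. Remarks: if accuracy or performance misses the target on the blue-zone version but meets it on the latest CANN version, state the reason and the latest CANN package version here, and test with the latest version. If an operator defect that cannot be worked around causes performance to miss the target, add the reason and the solution here. If the onnx model cannot run inference because it contains custom operators, state that performance was measured as online inference on T4. If the model does not support batch 16, state that as well  
+4. Self-verification report: check that the CANN package version and the accuracy and performance data are correct  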
+
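+- Code conventions:  
+See [ResNext50](https://gitee.com/ascend/modelzoo/tree/master/built-in/ACL_PyTorch/Benchmark/cv/classification/ResNext50)  
+1. The pipeline must pass; fix as many defect-scan and convention-scan findings as possible  
+2. Python script file headers must include a License declaration  
+3. The PR must not include the open-source model's code or weight files  
+Note:  
+4. Python scripts must not contain code that downloads weights from the internet; for example, a function called with pretrained set to true usually downloads weights  
+5. Python scripts should avoid depending on unnecessary third-party libraries  
+6. requirements.txt lists the exact versions of all open-source libraries installed on the server that this model requires  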
 
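+- Model README.md review:  
+For the template see [README.md](https://gitee.com/ascend/modelzoo/tree/master/built-in/ACL_PyTorch/Benchmark/cv/classification/ResNext50/README.md)  
+1. 1.2 Code address -> give the address of the open-source model code used, together with its branch and commit id  
+2. 2 Environment description -> give the exact versions of all open-source libraries installed on the server that this model requires  
+3. 3.1 pth to onnx conversion -> prefer the weight file provided by training; if the training weight file can be obtained online, give the URL, otherwise state where to obtain it. If training did not provide weights, use the weight file from the open-source repository. Give the weight file name and its md5sum value  
+4. 3.1 pth to onnx conversion -> if the open-source model code needs to be modified, make the modification as a patch  
+5. 3.1 Key points of model conversion: -> if a CANN operator problem makes model conversion fail, or requires a workaround for conversion to succeed, describe the main troubleshooting steps, the cause and the fix under the model conversion key points  
+6. 6.1 Offline inference TopN accuracy statistics -> accuracy must be tested for bs1 and bs16  
+7. 6.1 Accuracy debugging: -> if a CANN operator problem makes accuracy miss the target, or requires a workaround to meet it, describe the main troubleshooting steps, the cause and the fix under accuracy debugging  
+8. 7 Performance comparison -> performance must be measured for bs1, 16, 4, 8 and 32, and single-card throughput must be calculated  
+9. 7 Performance optimization: -> if a CANN operator problem makes performance miss the target, or requires a workaround to meet it, describe the main troubleshooting steps, the cause and the fix under performance optimization  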
+
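+- test/README.md review:  
+This file is the acceptance test description and mainly covers environment preparation. pip3.7 install -r requirements.txt may reinstall a particular PyTorch version, so decide during acceptance whether to run it  
+See the template [test/README.md](https://gitee.com/ascend/modelzoo/tree/master/built-in/ACL_PyTorch/Benchmark/cv/classification/ResNext50/test/README.md)  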
 
 ### 附： 模型推理指导中的交付标准与规范
 - 交付标准
     - 精度：  
- om模型推理的精度与PyTorch预训练模型github代码仓README.md或官网文档公布的精度对比，精度下降不超过1%则认为精度达标
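+ Compare the om model's inference accuracy with the accuracy of the weights trained on Ascend 910, or with the accuracy published in the README.md of the PyTorch pretrained model's github repository or in the official documentation; accuracy meets the target if the drop does not exceed 1%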
     - 性能：  
  Ascend benchmark工具在数据集上推理测的NPU 310单颗device吞吐率乘以4颗即单卡吞吐率大于TensorRT工具测的GPU T4单卡吞吐率则认为性能达标  
  如若交付要求中对性能有要求(易模型)，310的性能必须高于t4的性能  
@@ -87,25 +147,33 @@
 
      说明：
      ```
-     1.如果开源代码仓提供了多个权重文件，使用常用的基础的那个配置的权重文件即可；如果开源代码仓没有提供pth权重文件，则需要该模型的训练同学提供pth权重文件，或者使用开源代码仓训练脚本简单训练一个pth权重文件，然后对比om精度与该pth权重文件的精度  
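+     1. If a weight file from Ascend 910 training is already available, prefer it for offline inference and align accuracy with the accuracy achieved by the 910 training; if the open-source repository provides several weight files, use the one for the common baseline configuration; if the open-source repository provides no pth weight file, ask the person who trained the model to provide one, or briefly train one with the repository's training script, and then compare the om accuracy with the accuracy of that pth weight file  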
+
      2.由于随机数可能不能模拟数据分布，Ascend benchmark工具纯推理功能测的有些模型性能数据可能不太准，所以模型测试脚本与提交代码的描述中的性能数据以Ascend benchmark在数据集上推理时得到性能数据为准  
-     3.如果模型支持多batch，需要测试batch1,4,8,16,32的精度与性能，写在模型名称_Onnx端到端推理指导.md里，模型测试脚本与提交代码的描述只需提供bs1和bs16的精度性能数据  
+
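+     3. If the model supports multiple batch sizes, test accuracy and performance for batch 1, 4, 8, 16 and 32 and record them in README.md; the model test scripts and the PR description only need the bs1 and bs16 accuracy and performance data  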
+
      4.如果导出的onnx因包含自定义算子等而不能推理，则在t4上运行开源评测脚本测试pth模型在线推理性能  
+
      5.对于性能不达标的模型，需要进行如下工作：
        1）优化修改onnx模型去掉影响性能的冗余pad，用Ascend atc的相关优化选项尝试一下，尝试使用最近邻替换双线性的resize重新训练，降低图片分辨率等使性能达标。  
        2）对于算子导致的性能问题，需要使用profiling分析定位引起性能下降的原因，具体到引起性能下降的算子。优先修改模型代码以使其选择性能好的npu算子替换性能差的npu算子使性能达标，然后在modelzoo上提issue，等修复版本发布后再重测性能，继续优化。  
-       3）需要交付profiling性能数据，对经过上述方法性能可以达标的模型，在交付文档中写明问题原因与达标需要执行的操作；对经过上述方法性能仍不达标的模型，在交付文档中写明问题原因与简要的定位过程。  
-     6.工作量为简单模型2-3个工作日，复杂模型5-10个工作日，个别难度大的模型15-20个工作日。
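+       3) Profiling performance data must be delivered. For models that reach the target using the methods above, state the cause of the problem and the steps needed to reach the target in the delivery document; for models that still miss the target after those methods, state the cause and a brief account of the troubleshooting process in the delivered README.md  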
+
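+     6. git clone the open-source model repository into the working directory. If the repository has no install command and pth2onnx.py needs functions or classes from it, add the search path with sys.path.append(r"./<repository directory>"). If the open-source code must be modified, turn the changes into a patch file with git diff; do not deliver the repository's code itself, deliver only the patch file. See section 3.5 of this document, the maskrcnn end-to-end inference guide (inference with open-source detectron2 loading npu weights)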
+
+     7. Datasets are stored uniformly under the /root/datasets/ directory
      ```
 
 - 交付件
     - 交付件参考：[ResNeXt50_Onnx模型端到端推理指导.md](https://gitee.com/ascend/modelzoo/tree/master/built-in/ACL_PyTorch/Benchmark/cv/classification/ResNext50)
     - 最终交付件：  
-      包含以上交付标准的代码，模型名称_Onnx端到端推理指导.md，以及验收脚本  
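+      Code that meets the delivery standards above, README.md, and the acceptance scripts  
+      Non-code deliverables such as weight files and profiling performance data are packed into one compressed archive and sent by email  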
     - 最终交付形式：  
       gitee网址：https://gitee.com/ascend/modelzoo/tree/master/contrib/ACL_PyTorch/Research  
       commit信息格式：【高校贡献-${学校学院名称}】【Pytorch离线推理-${模型名称}】${PR内容摘要}  
-                      模型命名风格为大驼峰，模型名含多个字符串时使用横杠或下划线连接，当上下文用横杠时模型名用下划线连接，否则用横杠连接  
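+      Model names use UpperCamelCase; when a model name consists of several strings, join them with hyphens or underscores: use underscores when the surrounding context uses hyphens, otherwise use hyphens  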
       对于batch1与batch16，npu性能均高于T4性能1.2倍的模型，放在Benchmark目录下，1-1.2倍对应Official目录，低于1倍放在Research目录，目前都放在contrib/ACL_PyTorch/Research下即可  
 
 - gitee仓PR贡献流程
@@ -124,24 +192,26 @@
             > **提交前请确保自验通过！确保直接执行以下脚本就可运行！**
     
         ```shell
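+        # Prepare the environment
+        # In the delivered code folder, obtain the open-source code of the model definition, install the required dependencies, obtain the weight file provided by training, obtain the dataset path, and obtain the benchmark tool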
         
         # pth是否能正确转换为om
         bash test/pth2om.sh
         
-        # 精度数据是否达标（需要显示官网精度与om模型的精度）
-        # npu性能数据(如果模型支持多batch，测试bs1与bs16，否则只测试bs1，性能数据以单卡吞吐率为标准)
-        bash test/eval_acc_perf.sh
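+        # Whether accuracy meets the target (must show both the officially published pth accuracy and the om model accuracy)
+        # NPU performance data (test only while the device is idle; if the model supports multiple batch sizes, test bs1 and bs16, otherwise test bs1 only; performance is measured as single-card throughput). If no dataset directory is specified, /root/datasets is used by default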
+        bash test/eval_acc_perf.sh --datasets_path=/root/datasets
         
-        # 在t4环境测试性能数据(如果模型支持多batch，测试bs1与bs16，否则只测试bs1，如果导出的onnx模型因含自定义算子等不能离线推理，则在t4上测试pytorch模型的在线推理性能，性能数据以单卡吞吐率为标准)
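+        # Test performance in the T4 environment (test only while the GPU is idle; if the model supports multiple batch sizes, test bs1 and bs16, otherwise test bs1 only; if the exported onnx model cannot run offline inference, for example because it contains custom operators, measure the online inference performance of the PyTorch model on T4 instead; performance is measured as single-card throughput)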
         bash test/perf_t4.sh
         ```
     - PR内容模板  
         - PR示例链接 https://gitee.com/ascend/modelzoo/pulls/887
         - PR名称
-            - 【高校贡献-${学校学院名称}】【Pytorch离线推理-${模型名称}】${PR内容摘要}
-                - 举例说明：【高校贡献-华为大学昇腾学院】【Pytorch离线推理-ResNeXt50】初次提交。
-        ```
-      
+            - [school and college name][高校贡献][Pytorch离线推理][model name]-PR summary
+                - Example: [华为大学昇腾学院][高校贡献][Pytorch离线推理][ResNeXt50]-初次提交
+
+      ```      
         <!--  Thanks for sending a pull request!  Here are some tips for you:
         # 首次必看，看完请删除这部分tips
         1) If this is your first time, please read our contributor guidelines: https://gitee.com/ascend/modelzoo/blob/master/contrib/CONTRIBUTING.md
@@ -161,8 +231,7 @@
         | ResNeXt50 bs16 | top1:77.62% top5:93.70% | top1:77.62% top5:93.69% | 1234.940fps | 2096.376fps | 
         # 如果是无法规避的算子缺陷导致性能不达标，这里需要添加性能不达标的原因与解决方案
 
-        # 自验报告
-        ```shell    
+        Self-verification report
         # 第X次验收测试   
         # 验收结果 OK / Failed
         # 验收环境: A + K / CANN 5.0.1
@@ -173,22 +242,22 @@
         # 验收结果： OK / Failed
         # 备注： 成功生成om，无运行报错，报错日志xx 等
         
-        # 精度数据是否达标（需要显示官网精度与om模型的精度）
-        # npu性能数据(如果模型支持多batch，测试bs1与bs16，否则只测试bs1，性能数据以单卡吞吐率为标准)
-        bash test/eval_acc_perf.sh
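+        # Whether accuracy meets the target (must show both the officially published pth accuracy and the om model accuracy)
+        # NPU performance data (test only while the device is idle; if the model supports multiple batch sizes, test bs1 and bs16, otherwise test bs1 only; performance is measured as single-card throughput)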
+        bash test/eval_acc_perf.sh --datasets_path=/root/datasets
         # 验收结果： 是 / 否
-        # 备注： 目标精度top1:77.62% top5:93.70%；bs1,bs16验收精度top1:77.62% top5:93.69%；精度下降不超过1%；无运行报错，报错日志xx 等
-        # 备注： 验收测试性能bs1:1497.252FPS bs16:2096.376FPS；无运行报错，报错日志xx 等
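+        # Remarks: target pth accuracy top1:77.62% top5:93.70%; bs1 and bs16 acceptance om accuracy top1:77.62% top5:93.69%; accuracy drop does not exceed 1%; no runtime errors, error log xx, etc.
+        # Remarks: acceptance 310 performance bs1:1497.252FPS bs16:2096.376FPS; no runtime errors, error log xx, etc.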
         
-        # 在t4环境测试性能数据(如果模型支持多batch，测试bs1与bs16，否则只测试bs1，如果导出的onnx模型因含自定义算子等不能离线推理，则在t4上测试pytorch模型的在线推理性能，性能数据以单卡吞吐率为标准)
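+        # Test performance in the T4 environment (test only while the GPU is idle; if the model supports multiple batch sizes, test bs1 and bs16, otherwise test bs1 only; if the exported onnx model cannot run offline inference, for example because it contains custom operators, measure the online inference performance of the PyTorch model on T4 instead; performance is measured as single-card throughput). This step verifies that the T4 performance data shown by eval_acc_perf.sh is correct; the performance data written into that script should be close to the performance actually measured on T4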
         bash test/perf_t4.sh
         # 验收结果： OK / Failed
-        # 备注： 验收测试性能bs1:763.044FPS bs16:1234.940FPS；无运行报错，报错日志xx 等
+        # Remarks: acceptance T4 performance bs1:763.044FPS bs16:1234.940FPS, consistent with the T4 performance data shown by eval_acc_perf.sh; no runtime errors, error log xx, etc.
         
         # 310性能是否超过t4： 是 / 否
-        bs1:310=1.96倍t4
-        bs16:310=1.70倍t4
-        ```  
+        bs1: 310 = (1497.252/763.044) 1.96x T4
+        bs16: 310 = (2096.376/1234.940) 1.70x T4
+
         - 示例链接 https://gitee.com/ascend/modelzoo/pulls/836#note_4750681
         
         **Which issue(s) this PR fixes**:
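Review note on the acceptance guide diff above: the guide states three numeric rules. Single-card throughput is single-device throughput times 4, accuracy passes when the drop is at most 1%, and the target directory depends on the 310/T4 ratio (above 1.2 goes to Benchmark, 1 to 1.2 to Official, below 1 to Research). The sketch below works through the ResNeXt50 figures from the report template. It is only an illustration: the function names are hypothetical, the 1% drop is assumed to mean absolute percentage points, and using the worst ratio across batch sizes to pick the directory is an assumption for cases the guide does not spell out.

```python
DEVICES_PER_CARD = 4  # the guide multiplies single-device throughput by 4


def card_throughput(device_fps: float) -> float:
    """Single-card throughput from single-device throughput."""
    return device_fps * DEVICES_PER_CARD


def speedup(npu_card_fps: float, t4_fps: float) -> float:
    """Ratio of 310 single-card throughput to T4 single-card throughput."""
    return npu_card_fps / t4_fps


def accuracy_ok(reference: float, measured: float, max_drop: float = 1.0) -> bool:
    """Assumption: the 1% limit is read as absolute percentage points."""
    return reference - measured <= max_drop


def target_directory(ratios) -> str:
    """Assumption: the worst ratio across batch sizes decides the directory."""
    worst = min(ratios)
    if worst > 1.2:
        return "Benchmark"
    if worst >= 1.0:
        return "Official"
    return "Research"


# Figures copied from the report template in the diff above.
bs1_ratio = speedup(1497.252, 763.044)
bs16_ratio = speedup(2096.376, 1234.940)
print(f"bs1: {bs1_ratio:.2f}x, bs16: {bs16_ratio:.2f}x")
print(accuracy_ok(77.62, 77.62), accuracy_ok(93.70, 93.69))
print(target_directory([bs1_ratio, bs16_ratio]))
```

The guide itself notes that, for now, all contributions are placed under contrib/ACL_PyTorch/Research regardless of the ratio, so the directory rule is informational.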
diff --git a/docs/.keep b/docs/.keep
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git "a/docs/XxxxNet\347\275\221\347\273\234\346\250\241\345\236\213[\344\272\244\344\273\230\345\206\205\345\256\271]\346\265\213\350\257\225\346\212\245\345\221\212.docx" "b/docs/XxxxNet\347\275\221\347\273\234\346\250\241\345\236\213[\344\272\244\344\273\230\345\206\205\345\256\271]\346\265\213\350\257\225\346\212\245\345\221\212.docx"
new file mode 100644
index 0000000000000000000000000000000000000000..870d22278c0a820f5a2ae736c4696015ce467a11
Binary files /dev/null and "b/docs/XxxxNet\347\275\221\347\273\234\346\250\241\345\236\213[\344\272\244\344\273\230\345\206\205\345\256\271]\346\265\213\350\257\225\346\212\245\345\221\212.docx" differ
diff --git a/docs/models_result.xlsx b/docs/models_result.xlsx
new file mode 100644
index 0000000000000000000000000000000000000000..6569e5c353d1669a4fe7bf95610263775ccf0e6b
Binary files /dev/null and b/docs/models_result.xlsx differ
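Review note on the archiving step in the acceptance guide: before the /home/verify_models/{model name} directory is archived, the guide asks for prep_dataset, result/dumpOutput_device0 and result/dumpOutput_device1 to be deleted. A minimal sketch of that cleanup follows. The helper name `clean_for_archive` is hypothetical, and the demo runs against a temporary directory instead of the real acceptance machine.

```python
import shutil
import tempfile
from pathlib import Path

# Folders the guide says to delete before archiving.
DISPOSABLE = (
    "prep_dataset",
    "result/dumpOutput_device0",
    "result/dumpOutput_device1",
)


def clean_for_archive(model_dir):
    """Delete the disposable folders under model_dir; return the ones removed."""
    removed = []
    for rel in DISPOSABLE:
        target = Path(model_dir) / rel
        if target.is_dir():
            shutil.rmtree(target)
            removed.append(rel)
    return removed


# Demo on a throwaway directory that mimics a verified model folder.
with tempfile.TemporaryDirectory() as model_dir:
    for rel in DISPOSABLE:
        (Path(model_dir) / rel).mkdir(parents=True)
    (Path(model_dir) / "README.md").write_text("demo\n")
    removed = clean_for_archive(model_dir)
    readme_kept = (Path(model_dir) / "README.md").exists()
    result_dir_kept = (Path(model_dir) / "result").is_dir()
    print(removed)
```

Only the listed folders are removed; other files, and the parent result directory, are left in place.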
diff --git "a/onnx\347\253\257\345\210\260\347\253\257\346\216\250\347\220\206\346\214\207\345\257\274/benchmark/cv/segmentation/ssd_detection.diff" "b/onnx\347\253\257\345\210\260\347\253\257\346\216\250\347\220\206\346\214\207\345\257\274/benchmark/cv/segmentation/ssd_detection.diff"
new file mode 100644
index 0000000000000000000000000000000000000000..6c8e0123a9b73a5cd07a1d4f55d3235b655c0486
--- /dev/null
+++ "b/onnx\347\253\257\345\210\260\347\253\257\346\216\250\347\220\206\346\214\207\345\257\274/benchmark/cv/segmentation/ssd_detection.diff"
@@ -0,0 +1,140 @@
+diff --git a/mmdet/core/anchor/anchor_generator.py b/mmdet/core/anchor/anchor_generator.py
+index 3c2fd5a0..f6d11fa7 100644
+--- a/mmdet/core/anchor/anchor_generator.py
++++ b/mmdet/core/anchor/anchor_generator.py
+@@ -197,6 +197,8 @@ class AnchorGenerator:
+             tuple[torch.Tensor]: The mesh grids of x and y.
+         """
+         # use shape instead of len to keep tracing while exporting to onnx
++        x = x.to(dtype=torch.int32)
++        y = y.to(dtype=torch.int32)
+         xx = x.repeat(y.shape[0])
+         yy = y.view(-1, 1).repeat(1, x.shape[0]).view(-1)
+         if row_major:
+diff --git a/mmdet/core/bbox/coder/delta_xywh_bbox_coder.py b/mmdet/core/bbox/coder/delta_xywh_bbox_coder.py
+index 98d30906..48bcdae3 100644
+--- a/mmdet/core/bbox/coder/delta_xywh_bbox_coder.py
++++ b/mmdet/core/bbox/coder/delta_xywh_bbox_coder.py
+@@ -207,10 +207,22 @@ def delta2bbox(rois,
+                                                      deltas.size(-1) // 4)
+     stds = deltas.new_tensor(stds).view(1, -1).repeat(1, deltas.size(-1) // 4)
+     denorm_deltas = deltas * stds + means
+-    dx = denorm_deltas[..., 0::4]
++    '''dx = denorm_deltas[..., 0::4]
+     dy = denorm_deltas[..., 1::4]
+     dw = denorm_deltas[..., 2::4]
+-    dh = denorm_deltas[..., 3::4]
++    dh = denorm_deltas[..., 3::4]'''
++    if denorm_deltas.shape[2] > 4:
++        # NOTE: 80 is hard-coded below; adjust it for your model when shape[2] > 4
++        denorm_deltas = denorm_deltas.view(-1, 80, 4)
++        dx = denorm_deltas[:, :, 0:1:].view(-1, 80)
++        dy = denorm_deltas[:, :, 1:2:].view(-1, 80)
++        dw = denorm_deltas[:, :, 2:3:].view(-1, 80)
++        dh = denorm_deltas[:, :, 3:4:].view(-1, 80)
++    else:
++        dx = denorm_deltas[..., 0:1:]
++        dy = denorm_deltas[..., 1:2:]
++        dw = denorm_deltas[..., 2:3:]
++        dh = denorm_deltas[..., 3:4:]
+ 
+     x1, y1 = rois[..., 0], rois[..., 1]
+     x2, y2 = rois[..., 2], rois[..., 3]
+diff --git a/mmdet/models/dense_heads/anchor_head.py b/mmdet/models/dense_heads/anchor_head.py
+index e7c975f5..e2d057e9 100644
+--- a/mmdet/models/dense_heads/anchor_head.py
++++ b/mmdet/models/dense_heads/anchor_head.py
+@@ -9,6 +9,55 @@ from ..builder import HEADS, build_loss
+ from .base_dense_head import BaseDenseHead
+ from .dense_test_mixins import BBoxTestMixin
+ 
++class BatchNMSOp(torch.autograd.Function):
++    @staticmethod
++    def forward(ctx, bboxes, scores, score_threshold, iou_threshold, max_size_per_class, max_total_size):
++        """
++        bboxes (torch.Tensor): boxes in shape (batch, N, C, 4).
++        scores (torch.Tensor): scores in shape (batch, N, C).
++        return:
++            nmsed_boxes: (1, N, 4)
++            nmsed_scores: (1, N)
++            nmsed_classes: (1, N)
++            nmsed_num: (1,)
++        """
++
++        # Phony implementation for onnx export
++        nmsed_boxes = bboxes[:, :max_total_size, 0, :]
++        nmsed_scores = scores[:, :max_total_size, 0]
++        nmsed_classes = torch.arange(max_total_size, dtype=torch.long)
++        nmsed_num = torch.Tensor([max_total_size])
++
++        return nmsed_boxes, nmsed_scores, nmsed_classes, nmsed_num
++
++    @staticmethod
++    def symbolic(g, bboxes, scores, score_thr, iou_thr, max_size_p_class, max_t_size):
++        nmsed_boxes, nmsed_scores, nmsed_classes, nmsed_num = g.op('BatchMultiClassNMS',
++            bboxes, scores, score_threshold_f=score_thr, iou_threshold_f=iou_thr,
++            max_size_per_class_i=max_size_p_class, max_total_size_i=max_t_size, outputs=4)
++        return nmsed_boxes, nmsed_scores, nmsed_classes, nmsed_num
++
++def batch_nms_op(bboxes, scores, score_threshold, iou_threshold, max_size_per_class, max_total_size):
++    """
++    bboxes (torch.Tensor): boxes in shape (N, 4).
++    scores (torch.Tensor): scores in shape (N, ).
++    """
++
++    if bboxes.dtype == torch.float32:
++        bboxes = bboxes.reshape(bboxes.size(0), bboxes.shape[1].numpy(), -1, 4).half()
++        scores = scores.reshape(scores.size(0), scores.shape[1].numpy(), -1).half()
++    else:
++        bboxes = bboxes.reshape(bboxes.size(0), bboxes.shape[1].numpy(), -1, 4)
++        scores = scores.reshape(scores.size(0), scores.shape[1].numpy(), -1)
++
++    nmsed_boxes, nmsed_scores, nmsed_classes, nmsed_num = BatchNMSOp.apply(bboxes, scores,
++        score_threshold, iou_threshold, max_size_per_class, max_total_size)
++    nmsed_boxes = nmsed_boxes.float()
++    nmsed_scores = nmsed_scores.float()
++    nmsed_classes = nmsed_classes.long()
++    dets = torch.cat((nmsed_boxes.reshape((bboxes.size(0), max_total_size, 4)), nmsed_scores.reshape((bboxes.size(0), max_total_size, 1))), -1)
++    labels = nmsed_classes.reshape((bboxes.size(0), max_total_size))
++    return dets, labels
+ 
+ @HEADS.register_module()
+ class AnchorHead(BaseDenseHead, BBoxTestMixin):
+@@ -653,7 +702,10 @@ class AnchorHead(BaseDenseHead, BBoxTestMixin):
+             anchors = anchors.expand_as(bbox_pred)
+             # Always keep topk op for dynamic input in onnx
+             from mmdet.core.export import get_k_for_topk
+-            nms_pre = get_k_for_topk(nms_pre_tensor, bbox_pred.shape[1])
++            #nms_pre = get_k_for_topk(nms_pre_tensor, bbox_pred.shape[1])
++            nms_pre = bbox_pred.shape[1]
++            if nms_pre_tensor > 0 and bbox_pred.shape[1] > nms_pre_tensor:
++                nms_pre = nms_pre_tensor
+             if nms_pre > 0:
+                 # Get maximum scores for foreground classes.
+                 if self.use_sigmoid_cls:
+@@ -662,11 +714,14 @@ class AnchorHead(BaseDenseHead, BBoxTestMixin):
+                     # remind that we set FG labels to [0, num_class-1]
+                     # since mmdet v2.0
+                     # BG cat_id: num_class
+-                    max_scores, _ = scores[..., :-1].max(-1)
++                    scores_tmp = scores.permute(2, 1, 0)
++                    max_scores, _ = scores_tmp[:-1, ...].max(0)
++                    max_scores = max_scores.permute(1, 0)
+ 
+                 _, topk_inds = max_scores.topk(nms_pre)
+                 batch_inds = torch.arange(batch_size).view(
+-                    -1, 1).expand_as(topk_inds)
++                    -1, 1).to(dtype=torch.int32).expand_as(topk_inds)
++                batch_inds = batch_inds.to(dtype=torch.int64)
+                 anchors = anchors[batch_inds, topk_inds, :]
+                 bbox_pred = bbox_pred[batch_inds, topk_inds, :]
+                 scores = scores[batch_inds, topk_inds, :]
+@@ -694,6 +749,8 @@ class AnchorHead(BaseDenseHead, BBoxTestMixin):
+             iou_threshold = cfg.nms.get('iou_threshold', 0.5)
+             score_threshold = cfg.score_thr
+             nms_pre = cfg.get('deploy_nms_pre', -1)
++            dets, labels = batch_nms_op(batch_mlvl_bboxes, batch_mlvl_scores, score_threshold, iou_threshold, cfg.max_per_img, cfg.max_per_img)
++            return dets, labels
+             return add_dummy_nms_for_onnx(batch_mlvl_bboxes, batch_mlvl_scores,
+                                           max_output_boxes_per_class,
+                                           iou_threshold, score_threshold,
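Review note on the delta2bbox hunk in ssd_detection.diff: the patch replaces the stride-4 slices (`0::4`, `1::4`, ...) with a reshape into groups of 4 followed by unit-stride slices. The patch does not state why. The sketch below only checks that the two forms pick the same values, using plain Python lists as a stand-in for torch tensors; `strided_pick` and `grouped_pick` are hypothetical names, and the sample deltas are made up.

```python
def strided_pick(flat, offset, stride=4):
    """Original form: every stride-th value starting at offset."""
    return flat[offset::stride]


def grouped_pick(flat, offset, group=4):
    """Patched form: split into groups of `group` values, then take one column."""
    groups = [flat[i:i + group] for i in range(0, len(flat), group)]
    return [g[offset] for g in groups]


# Three sets of (dx, dy, dw, dh) deltas laid out contiguously.
deltas = [0.1, 0.2, 0.3, 0.4,
          1.1, 1.2, 1.3, 1.4,
          2.1, 2.2, 2.3, 2.4]

results = {
    name: (strided_pick(deltas, k), grouped_pick(deltas, k))
    for k, name in enumerate(("dx", "dy", "dw", "dh"))
}
for name, (old, new) in results.items():
    print(name, old == new)
```

Unlike this sketch, the patched code fixes the number of groups at 80, so it matches the original slicing only when the last dimension holds exactly 80 sets of deltas.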