From 697821555aed9264d6af59fbfccd1f51f61b2157 Mon Sep 17 00:00:00 2001
From: exias152 <15851197936@163.com>
Date: Sat, 20 Aug 2022 16:35:04 +0800
Subject: [PATCH 1/2] !1525 [Tsinghua Shenzhen International Graduate School][University Contribution][PyTorch migration to 1.8][FCN8S] - Initial commit
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 .../cv/semantic_segmentation/FCN8s/README.md  | 277 ++++++++++++------
 .../mmseg/models/decode_heads/decode_head.py  |  17 ++
 .../FCN8s/test/train_full_1p.sh               |   2 +-
 .../FCN8s/test/train_full_8p.sh               |   2 +-
 .../FCN8s/test/train_performance_1p.sh        |   2 +-
 .../FCN8s/test/train_performance_8p.sh        |   2 +-
 .../FCN8s/tools/train.py                      |  11 +-
 7 files changed, 216 insertions(+), 97 deletions(-)

diff --git a/PyTorch/contrib/cv/semantic_segmentation/FCN8s/README.md b/PyTorch/contrib/cv/semantic_segmentation/FCN8s/README.md
index 08f26f52ad..99334d81c7 100644
--- a/PyTorch/contrib/cv/semantic_segmentation/FCN8s/README.md
+++ b/PyTorch/contrib/cv/semantic_segmentation/FCN8s/README.md
@@ -1,151 +1,244 @@
-# FCN8s
+# XXX for PyTorch\_Owner
-This implements training of PSPNet on the PASCAL VOC Aug dataset, mainly modified from [mmsegmentation](https://github.com/open-mmlab/mmsegmentation).
+- [Basic Information](交付件基本信息.md)
+- [Overview](概述.md)
+- [Preparing the Training Environment](准备训练环境.md)
+- [Start Training](开始训练.md)
+- [Training Results](训练结果展示.md)
+- [Release Notes](版本说明.md)
-**ps**
+# Basic Information
-1. As of the current date, Ascend-Pytorch doesn't support SyncBN, the backbone uses BN instead. To get a similar performance to SyncBN, we set a larger batch size of 16 rather than 4 in mmsegmentation.
-2. Semantic segmentation is trained by iteration. The model trained on 1 NPU is useless, so we do not give the evaluation script for the 1p model.
+Application Domain: Image Segmentation
+Model Version: 1.1
-## Environment preparation
+Modified: 2022.8.18
-The latest Ascend-Pytorch version is 1.5.0. MMSegmentation 0.10.0 and mmcv 1.2.7 are chosen as they support pytorch1.5.0.
+Size: 378 MB
-1. Install the latest Ascend-Pytorch.
+Framework: PyTorch\_1.8.1
-2. Downding the repository of Ascned Model Zoo to the folder of `$YOURMODELZOO`
+Model Format: pth
-```
-# download source code
-cd $YOURMODELZOO
-git clone https://gitee.com/KevinKe/modelzoo
-# go to fcn8s
-cd contrib/PyTorch/Research/cv/semantic_segmentation/FCN8s
-```
+Precision: Mixed
-Denote `$FCN` as the path of `$YOURMODELZOO/contrib/PyTorch/Research/cv/semantic_segmentation/FCN8s`.
+Processor: Ascend 910
-3. Build mmcv using
+Categories: Research
-Firstly, download [mmcv1.2.7](https://github.com/open-mmlab/mmcv/tree/v1.2.7) to the path `$YOURMMVCPATH`. Then, copy the `mmcv_replace` to `$YOURMMVCPATH/mmcv`.
+Description: Training of the FCN8S semantic segmentation network with the PyTorch framework
-Check the numpy version is 1.21.2.
+# Overview
-```
-# configure
-cd $FCN
-source env_npu.sh
-
-# copy
-rm -rf $YOURMMVCPATH/mmcv
-mkdir mmcv
-cp -r mmcv_replace/* $YOURMMVCPATH/mmcv/
-
-# compile
-cd $YOURMMVCPATH
-export MMCV_WITH_OPS=1
-export MAX_JOBS=8
-python3.7.5 setup.py build_ext
-python3.7.5 setup.py develop
-pip3.7.5 list | grep mmcv
-```
+## Summary
+FCN8S is a classic semantic segmentation network. It uses a fully convolutional structure that accepts input images of any size, and it upsamples the feature map of the last layer with deconvolution to produce an output of the same size as the input image, giving a per-pixel prediction. This code is mainly modified from [mmsegmentation](https://github.com/open-mmlab/mmsegmentation) and is trained on the PASCAL VOC dataset with PSPNet as the backbone.
-Then go back to the $PSPNET folder
-```
-cd $FCN
-```
+- Reference implementation:
-4. Permission configuration
-```
-chmod -R 777 ./
-```
+  ```
+  url=https://github.com/pytorch/vision.git
+  ```
-5. remove the `mmcv_replace` folder
-```
-rm -rf mmcv_replace
-```
+- Implementation adapted to the Ascend AI Processor:
+
+  ```
+  url=https://gitee.com/ascend/ModelZoo-PyTorch.git
+  code_path=PyTorch/contrib/cv/semantic_segmentation/FCN8s
+  ```
+
+- To obtain the code with Git:
+
+  ```
+  git clone https://gitee.com/exias152/ModelZoo-PyTorch # clone the repository
+  cd contrib/PyTorch/Research/cv/semantic_segmentation/FCN8s # change to the model code directory
+  # Denote `$FCN` as the path of `$YOURMODELZOO/contrib/PyTorch/Research/cv/semantic_segmentation/FCN8s`.
+  ```
+
+- Alternatively, click "Download Now" to download the source package.
+
+# Preparing the Training Environment
+
+## Environment Setup
+
+1. The firmware and driver, CANN, and PyTorch versions supported by this model are listed in the table below.
+
+   **Table 1** Version compatibility
+
+   | Component | Version |
+   | ---------- | ------------------------------------------------------------ |
+   | Firmware and driver | [1.0.15](https://www.hiascend.com/hardware/firmware-drivers?tag=commercial) |
+   | CANN | [5.1.RC1](https://www.hiascend.com/software/cann/commercial?version=5.1.RC1) |
+   | PyTorch | [1.8](https://gitee.com/ascend/pytorch/tree/master/) |
+
+- Environment setup guide.
+
+  See [Preparing the PyTorch Training Environment](https://www.hiascend.com/document/detail/zh/ModelZoo/pytorchframework/ptes).
+
+2. Install the dependencies.
+
+   ```
+   pip install -r requirements.txt
+   ```
+
+3. Build mmcv.
+   Download [mmcv1.2.7](https://github.com/open-mmlab/mmcv/tree/v1.2.7) to the path `$YOURMMVCPATH`. Then copy `mmcv_replace` to `$YOURMMVCPATH/mmcv`.
+   ```
+   # configure
+   cd $FCN
+   source env_npu.sh
+
+   # copy
+   rm -rf $YOURMMVCPATH/mmcv
+   mkdir -p $YOURMMVCPATH/mmcv
+   cp -r mmcv_replace/* $YOURMMVCPATH/mmcv/
+
+   # compile
+   cd $YOURMMVCPATH
+   export MMCV_WITH_OPS=1
+   export MAX_JOBS=8
+   python3.7.5 setup.py build_ext
+   python3.7.5 setup.py develop
+   pip3.7.5 list | grep mmcv
+   ```
+
+   Then go back to the `$FCN` folder.
+   ```
+   cd $FCN
+
+   ```
+
+4. Configure permissions.
+   ```
+   chmod -R 777 ./
+   ```
+5. Remove the `mmcv_replace` folder.
+   ```
+   rm -rf mmcv_replace
+   ```
-## Dataset Preparation
+## Dataset Preparation
-1. Download the training and validation set of [PASCAL VOC 2012 dataset](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar) and [PASCAL VOC2010 dataset](https://ascend-test-dataset.obs.cn-north-4.myhuaweicloud.com/train/zip/VOCtrainval_03-May-2010.tar).
+1. Obtain the dataset.
+
+   Download the training and validation sets of the [PASCAL VOC 2012 dataset](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar) and the [PASCAL VOC2010 dataset](https://ascend-test-dataset.obs.cn-north-4.myhuaweicloud.com/train/zip/VOCtrainval_03-May-2010.tar).
+   After extraction, the dataset directory structure is as follows.
-After decompression, the structure of the dataset folder should be:
 ```none
 ├── VOCdevkit
 │ │ ├── VOC2012
 │ │ ├── VOC2010
 ```
-2. Download [PASCALAug dataset](http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/semantic_contours/benchmark.tgz).
-After depressing, copy `benchmark_REALSE/dataset` to `VOCaug` in the `VOCdevkit` folder.
-The structure of the dataset folder should be:
+   > **Note:**
+   > The training scripts for this dataset are provided only as a reference example.
+
+2. Preprocess the data.
+Download the [PASCALAug dataset](http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/semantic_contours/benchmark.tgz). After extraction, copy `benchmark_RELEASE/dataset` to `VOCaug` in the `VOCdevkit` folder.
+The dataset directory structure is as follows.
 ```none
 ├── VOCdevkit
 │ │ ├── VOC2012
 │ │ ├── VOC2010
 │ │ ├── VOCaug
 ```
-3. Convert the VOCAug dataset using
+3. Convert the VOCAug dataset.
 ```
 cd $PSPNET
 python tools/convert_datasets/voc_aug.py data/VOCdevkit data/VOCdevkit/VOCaug --nproc 8
 ```
+**Note:** `Segmentation fault (core dumped)` may be reported because mmcv needs PyTorch support. Return to the root directory of the source package and run `source env_npu.sh` first.
-**Note: ** `Segmentation fault (core dumped)` may rise. The reason is that mmcv needs the support of pytorch. Go back to repo folder and run `source env_npu.sh` first.
-
-4. [Optional] Make a soft link of the dataset to the folder of mmseg100
+4. [Optional] Create a soft link from the dataset to the mmseg100 folder.
 ```
 cd $FCN
 mkdir data
 ln -s VOCdevkit data
 # data_path=./data/VOCdevkit/VOC2012
 ```
-## Training
- **Note [Optional]:** When running scripts, the error `$'\r': command not found` may rise. Use `dos2unix script_file_name` to change it from window format to Unix format first.
+# Start Training
+## Train the Model
+
+ **Note [Optional]:** Running the scripts may fail with `$'\r': command not found`. Use `dos2unix script_file_name` to convert the script from Windows format to Unix format.
-```bash
-cd $FCN
-source npu_env.sh
+1. Go to the root directory of the extracted source package.
-# training 1p accuracy
-bash ./test/train_full_1p.sh --data_path=xxx
-# --data_path=data/VOCdevkit/VOC2012
+   ```
+   cd $FCN
+   ```
-# training 1p performance
-bash ./test/train_performance_1p.sh --data_path=xxx
+2. Run the training scripts.
-# training 8p accuracy
-bash ./test/train_full_8p.sh --data_path=xxx
+   The model supports single-node single-device training and single-node 8-device training.
-# training 8p performance
-bash ./test/train_performance_8p.sh --data_path=xxx
+   - Single-node, single-device training
-# evaluation 8p accuracy
-bash ./test/train_val_8p.sh --data_path=xxx
-```
+     Start single-device training.
-Log and checkpoint path:
-```
-./output/devie_id/FCN/train_${device_id}.log # training detail log
-./output/devie_id/FCN/FCN_bs16_8p_acc.log # 8p training performance result log
-./output/devie_id/FCN/ckpt # checkpoits
-./output/devie_id/FCN_prof/FCN_bs16_8p_acc.log # 8p training accuracy result log
+     ```
+     # training 1p accuracy
+     bash ./test/train_full_1p.sh --data_path=xxx --device_id=xxx
+     # --data_path=data/VOCdevkit/VOC2012
+     # training 1p performance
+     bash ./test/train_performance_1p.sh --data_path=xxx --device_id=xxx
+     ```
-```
-
+   - Single-node, 8-device training
+
+     Start 8-device training.
+
+     ```
+     # training 8p accuracy
+     bash ./test/train_full_8p.sh --data_path=xxx
+
+     # training 8p performance
+     bash ./test/train_performance_8p.sh --data_path=xxx
+
+     # evaluation 8p accuracy
+     bash ./test/train_val_8p.sh --data_path=xxx
+     ```
+
+   The --data\_path argument specifies the dataset path.
+
+
+   After training completes, the logs and weight files are saved in the following paths.
+   ```
+   ./output/device_id/FCN/train_${device_id}.log # training detail log
+   ./output/device_id/FCN/FCN_bs16_8p_acc.log # 8p training performance result log
+   ./output/device_id/FCN/ckpt # checkpoints
+   ./output/device_id/FCN_prof/FCN_bs16_8p_acc.log # 8p training accuracy result log
+
+   ```
-## Evaluation Details
+# Training Results
-### FCN with 8p
+**Table 2** Training results
-| device | fps | aAcc | mIoU | mAcc |
+| Name | FPS | aAcc | mIoU | mAcc |
 | :------: | :------: | :------: | :------: | :------: |
-|mmsegmentaion| |-- | 67.08| -- |
-|GPU-8p| 82.296| 93.16 | 69.19 | 78.7 |
-|NPU-8p| 135.19 | 93.23 | 69.36 | 78.88 |
+| 1p-competitor | ----- | ----- | ----- | ----- |
+| 1p-NPU | 20.99 | 93.28 | 69.62 | 79.29 |
+| 8p-competitor | 135.19 | 93.23 | 69.36 | 78.88 |
+| 8p-NPU | 156.30 | 93.24 | 69.54 | 78.69 |
+
+
+# Release Notes
+
+## Changes
+
+2022.8.18: Updated content and republished.
+
+2020.07.08: First release.
+
+
+
+
+
+
+
+
+
-ps: 2x training data are used for training on NPU (`bs*#NPU*#iter=16*8*10000`) v.s. GPU (`bs*#GPU*#iter=4*8*20000`)
\ No newline at end of file
diff --git a/PyTorch/contrib/cv/semantic_segmentation/FCN8s/mmseg/models/decode_heads/decode_head.py b/PyTorch/contrib/cv/semantic_segmentation/FCN8s/mmseg/models/decode_heads/decode_head.py
index b5a58aef8f..a132ea830e 100644
--- a/PyTorch/contrib/cv/semantic_segmentation/FCN8s/mmseg/models/decode_heads/decode_head.py
+++ b/PyTorch/contrib/cv/semantic_segmentation/FCN8s/mmseg/models/decode_heads/decode_head.py
@@ -1,3 +1,19 @@
+# Copyright (c) Soumith Chintala 2016,
+# All rights reserved
+#
+# Copyright 2020 Huawei Technologies Co., Ltd
+#
+# Licensed under the BSD 3-Clause License (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# https://spdx.org/licenses/BSD-3-Clause.html
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
 from abc import ABCMeta, abstractmethod
 import torch
@@ -207,6 +223,7 @@ class BaseDecodeHead(nn.Module, metaclass=ABCMeta):
     def cls_seg(self, feat):
         """Classify each pixel."""
         if self.dropout is not None:
+            torch.npu.synchronize()
             feat = self.dropout(feat)
         output = self.conv_seg(feat)
         return output
diff --git a/PyTorch/contrib/cv/semantic_segmentation/FCN8s/test/train_full_1p.sh b/PyTorch/contrib/cv/semantic_segmentation/FCN8s/test/train_full_1p.sh
index dd1170d992..61c629a750 100644
--- a/PyTorch/contrib/cv/semantic_segmentation/FCN8s/test/train_full_1p.sh
+++ b/PyTorch/contrib/cv/semantic_segmentation/FCN8s/test/train_full_1p.sh
@@ -65,7 +65,7 @@ if [ x"${etp_flag}" != x"true" ];then
 fi
 #执行训练脚本,以下传参不需要修改,其他需要模型审视修改
-python3.7.5 ./tools/train.py ./configs/fcn/fcn_r50-d8_512x512_20k_voc12aug.py \
+python3 ./tools/train.py ./configs/fcn/fcn_r50-d8_512x512_20k_voc12aug.py \
     --work-dir=${cur_path}/output/${Network}/$ASCEND_DEVICE_ID/ckpt \
     --device="npu" \
     --amp \
diff --git a/PyTorch/contrib/cv/semantic_segmentation/FCN8s/test/train_full_8p.sh b/PyTorch/contrib/cv/semantic_segmentation/FCN8s/test/train_full_8p.sh
index fbddc37fba..f6a51d8d35 100644
--- a/PyTorch/contrib/cv/semantic_segmentation/FCN8s/test/train_full_8p.sh
+++ b/PyTorch/contrib/cv/semantic_segmentation/FCN8s/test/train_full_8p.sh
@@ -98,7 +98,7 @@ CaseName=${Network}_bs${BatchSize}_${RANK_SIZE}'p'_'acc'
 ##获取性能数据,不需要修改
 #吞吐量
-ActualFPS=`awk 'BEGIN{printf "%.2f\n", '${RANK_SIZE}'*'${FPS}'}'`
+ActualFPS=`awk 'BEGIN{printf "%.2f\n", '${FPS}'}'`
 #单迭代训练时长
 TrainingTime=`awk 'BEGIN{printf "%.2f\n", '${batch_size}'*1000/'${FPS}'}'`
diff --git a/PyTorch/contrib/cv/semantic_segmentation/FCN8s/test/train_performance_1p.sh b/PyTorch/contrib/cv/semantic_segmentation/FCN8s/test/train_performance_1p.sh
index 158e51c0ff..185d5fe90d 100644
--- a/PyTorch/contrib/cv/semantic_segmentation/FCN8s/test/train_performance_1p.sh
+++ b/PyTorch/contrib/cv/semantic_segmentation/FCN8s/test/train_performance_1p.sh
@@ -65,7 +65,7 @@ if [ x"${etp_flag}" != x"true" ];then
 fi
 #执行训练脚本,以下传参不需要修改,其他需要模型审视修改
-python3.7.5 ./tools/train.py ./configs/fcn/fcn_r50-d8_512x512_20k_voc12aug.py \
+python3 ./tools/train.py ./configs/fcn/fcn_r50-d8_512x512_20k_voc12aug.py \
     --work-dir=${cur_path}/output/${Network}/$ASCEND_DEVICE_ID/ckpt \
     --device="npu" \
     --amp \
diff --git a/PyTorch/contrib/cv/semantic_segmentation/FCN8s/test/train_performance_8p.sh b/PyTorch/contrib/cv/semantic_segmentation/FCN8s/test/train_performance_8p.sh
index a280efbf93..0cd81e278f 100644
--- a/PyTorch/contrib/cv/semantic_segmentation/FCN8s/test/train_performance_8p.sh
+++ b/PyTorch/contrib/cv/semantic_segmentation/FCN8s/test/train_performance_8p.sh
@@ -98,7 +98,7 @@ CaseName=${Network}_bs${BatchSize}_${RANK_SIZE}'p'_'acc'
 ##获取性能数据,不需要修改
 #吞吐量
-ActualFPS=`awk 'BEGIN{printf "%.2f\n", '${RANK_SIZE}'*'${FPS}'}'`
+ActualFPS=`awk 'BEGIN{printf "%.2f\n", '${FPS}'}'`
 #单迭代训练时长
 TrainingTime=`awk 'BEGIN{printf "%.2f\n", '${batch_size}'*1000/'${FPS}'}'`
diff --git a/PyTorch/contrib/cv/semantic_segmentation/FCN8s/tools/train.py b/PyTorch/contrib/cv/semantic_segmentation/FCN8s/tools/train.py
index 89c4205595..3575c68e94 100644
--- a/PyTorch/contrib/cv/semantic_segmentation/FCN8s/tools/train.py
+++ b/PyTorch/contrib/cv/semantic_segmentation/FCN8s/tools/train.py
@@ -38,6 +38,9 @@ import time
 import mmcv
 import torch
+if tuple(int(p) for p in torch.__version__.split(".")[:2]) >= (1, 8):
+    import torch_npu
+#print(torch.__version__)
 from mmcv.runner import get_dist_info, init_dist
 from mmcv.utils import Config, DictAction, get_git_hash
@@ -115,10 +118,11 @@ def parse_args():
 def main():
+
     args = parse_args()
     os.environ['MASTER_ADDR'] = '127.0.0.1' # 可以使用当前真实ip或者'127.0.0.1'
-    os.environ['MASTER_PORT'] = '29688' # 随意一个可使用的port即可
+    os.environ['MASTER_PORT'] = '29338' # any available port works
     cfg = Config.fromfile(args.config)
     if args.options is not None:
@@ -182,6 +186,8 @@ def main():
     cfg.warm_up_epochs = args.warm_up_epochs
     # weik add end
+
+
     # create work_dir
     mmcv.mkdir_or_exist(osp.abspath(cfg.work_dir))
     # dump config
@@ -218,6 +224,7 @@ def main():
     model = build_segmentor(
         cfg.model, train_cfg=cfg.train_cfg, test_cfg=cfg.test_cfg)
+
     logger.info(model)
     datasets = [build_dataset(cfg.data.train)]
@@ -235,6 +242,8 @@ def main():
             PALETTE=datasets[0].PALETTE)
     # add an attribute for visualization convenience
     model.CLASSES = datasets[0].CLASSES
+
+
     train_segmentor(
         model,
         datasets,
-- 
Gitee

From baf8e17c827e822e7cb58ec505a2dd299b7a4eb5 Mon Sep 17 00:00:00 2001
From: exias152 <15851197936@163.com>
Date: Wed, 21 Sep 2022 07:33:48 +0000
Subject: [PATCH 2/2] update PyTorch/contrib/cv/semantic_segmentation/FCN8s/README.md.

Signed-off-by: exias152 <15851197936@163.com>
---
 .../cv/semantic_segmentation/FCN8s/README.md | 20 +++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/PyTorch/contrib/cv/semantic_segmentation/FCN8s/README.md b/PyTorch/contrib/cv/semantic_segmentation/FCN8s/README.md
index 4b51fbc967..6b68516deb 100644
--- a/PyTorch/contrib/cv/semantic_segmentation/FCN8s/README.md
+++ b/PyTorch/contrib/cv/semantic_segmentation/FCN8s/README.md
@@ -104,7 +104,17 @@ FCN8s是一个经典的语义分割网络,FCN8s使用全卷积结构,可以
 ```none
 ├── VOCdevkit
 │ │ ├── VOC2012
+ | | │ │ ├── Annotations
+ | | │ │ ├── ImageSets
+ | | │ │ ├── JPEGImages
+ | | │ │ ├── SegmentationClass
+ | | │ │ ├── SegmentationObject
 │ │ ├── VOC2010
+ | | │ │ ├── Annotations
+ | | │ │ ├── ImageSets
+ | | │ │ ├── JPEGImages
+ | | │ │ ├── SegmentationClass
+ | | │ │ ├── SegmentationObject
 ```
@@ -116,7 +126,17 @@ FCN8s是一个经典的语义分割网络,FCN8s使用全卷积结构,可以
 ```none
 ├── VOCdevkit
 │ │ ├── VOC2012
+ | | │ │ ├── Annotations
+ | | │ │ ├── ImageSets
+ | | │ │ ├── JPEGImages
+ | | │ │ ├── SegmentationClass
+ | | │ │ ├── SegmentationObject
 │ │ ├── VOC2010
+ | | │ │ ├── Annotations
+ | | │ │ ├── ImageSets
+ | | │ │ ├── JPEGImages
+ | | │ │ ├── SegmentationClass
+ | | │ │ ├── SegmentationObject
 │ │ ├── VOCaug
 ```
 2. 数据预处理。
-- 
Gitee
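
When an import is gated on the PyTorch version, as with `torch_npu` in `tools/train.py`, comparing version strings character by character is fragile, because "1.10" sorts before "1.8" as text. The following is a minimal sketch of a numeric comparison, assuming only the leading major.minor pair matters. The helper name `at_least` is hypothetical and is not part of this patch or of PyTorch.

```python
import re


def at_least(version: str, minimum: tuple) -> bool:
    """Return True if the leading major.minor of `version` is >= `minimum`.

    Hypothetical helper for illustration; not part of the patch or of PyTorch.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum


# Plain string comparison orders "1.10.0" before "1.8" ...
string_compare = "1.10.0" >= "1.8"
# ... while the numeric comparison orders them as intended.
numeric_compare = at_least("1.10.0", (1, 8))
```

In a training script the gate would then read `if at_least(torch.__version__, (1, 8)):`, which also tolerates local suffixes after the numeric prefix.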