diff --git a/TensorFlow2/built-in/GAN_ID2351_for_TensorFlow2.X/README.md b/TensorFlow2/built-in/GAN_ID2351_for_TensorFlow2.X/README.md
index 0603392e57756570da27a8ff476e610cc8b88b66..6524016533d4e4d28a87aededdfb0b9a3da91d80 100644
--- a/TensorFlow2/built-in/GAN_ID2351_for_TensorFlow2.X/README.md
+++ b/TensorFlow2/built-in/GAN_ID2351_for_TensorFlow2.X/README.md
@@ -1,6 +1,183 @@
-# GANs_Tensorflow_V2
- GANs with tensorflow2.1, Using Customization Models
- Giving samples to show the template for GANs‘ codes when use Tensorflow2.0, Customization Models, Python. Most on MNIST.
+- [Basic Information](#基本信息.md)
+- [Overview](#概述.md)
+- [Training Environment Preparation](#训练环境准备.md)
+- [Quick Start](#快速上手.md)
+- [Transfer Learning Guide](#迁移学习指导.md)
+- [Advanced Reference](#高级参考.md)
- Provides examples of building GANs with custom models in TensorFlow 2.0, showing the general pattern for reproducing GANs in TensorFlow 2.0. Based on Google's official code.
+

Basic Information

+**Publisher: Huawei**
+
+**Application Domain: Image Synthesis**
+
+**Version: 1.1**
+
+**Modified: 2022.04.11**
+
+**Size: 6.9M**
+
+**Framework: TensorFlow_2.4.1**
+
+**Model Format: ckpt**
+
+**Precision: Mixed**
+
+**Processor: Ascend 910**
+
+**Categories: Research**
+
+**Description: A generative adversarial network based on the Wasserstein loss**
+
+

Overview

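+
+    In theory, when the real and generated distributions do not overlap, the JS divergence used by a traditional GAN saturates at a constant and is not continuous with respect to the generator parameters, so the gradient becomes 0. WGAN addresses this with the Wasserstein loss, which is continuous everywhere and differentiable almost everywhere.
+
+- Reference paper:
+
+    [https://arxiv.org/abs/1701.07875](https://arxiv.org/abs/1701.07875)
+
+- Reference implementation:
+
+    [https://github.com/Zhaopudark/GANs_TensorflowV2](https://github.com/Zhaopudark/GANs_TensorflowV2)
+
+- Implementation adapted for the Ascend AI processor:
+
+    [https://gitee.com/ascend/ModelZoo-TensorFlow/tree/master/TensorFlow2/built-in/GAN_ID2351_for_TensorFlow2.X](https://gitee.com/ascend/ModelZoo-TensorFlow/tree/master/TensorFlow2/built-in/GAN_ID2351_for_TensorFlow2.X)
+
+- To fetch the code at a given commit\_id with Git:
+    ```
+    git clone {repository_url}    # clone the repository
+    cd {repository_name}    # enter the model's repository directory
+    git checkout {branch}    # switch to the target branch
+    git reset --hard {commit_id}    # reset the code to the target commit_id
+    cd {code_path}    # enter the model code path; not needed if the repository contains only this model
+    ```
+
+## Default Configuration
+
+- Main training hyperparameters (single card):
+  - batch_size: 128
+  - epochs: 400
+  - lr: 0.001
+
+## Supported Features
+
+| Feature | Supported |
+| ---------- | -------- |
+| Distributed training | No |
+| Mixed precision | Yes |
+| Data parallelism | No |
+
+## Mixed-Precision Training
+
+The Ascend 910 AI processor provides automatic mixed precision. Following a built-in optimization policy, it automatically lowers some float32 operators across the network to float16, which improves system performance and reduces memory usage with very little loss of accuracy.
+
+## Enabling Mixed Precision
+
+
+```
+    npu_device.global_options().precision_mode='allow_mix_precision'
+    npu_device.open().as_default()
+```
+
+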

Training Environment Preparation

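+
+- For hardware and runtime environment setup, see the [CANN Software Installation Guide](https://support.huawei.com/enterprise/zh/ascend-computing/cann-pid-251168373?category=installation-update).
+- Run the following command to install the dependencies.
+```
+pip3 install -r requirements.txt
+```
+Note: the dependency file requirements.txt is located in the model's root directory.
+
+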

Quick Start

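+
+## Dataset Preparation
+
+1. Download the MNIST training dataset yourself. It should have the following structure:
+    ```
+    MNIST/
+    ├── mnist.npz
+    ├── t10k-images.idx3-ubyte
+    ├── t10k-labels.idx1-ubyte
+    ├── train-images.idx3-ubyte
+    ├── train-labels.idx1-ubyte
+    └── ...
+    ```
+
+## Model Training
+
+- Click "Download Now" and choose a suitable download method to get the source package.
+- Start training
+
+    1. Before starting training, configure the environment variables the program needs.
+
+        For the environment variable settings, see:
+
+        [Environment variable settings for the Ascend 910 training platform](https://gitee.com/ascend/modelzoo/wikis/Ascend%20910%E8%AE%AD%E7%BB%83%E5%B9%B3%E5%8F%B0%E7%8E%AF%E5%A2%83%E5%8F%98%E9%87%8F%E8%AE%BE%E7%BD%AE?sort_id=3148819)
+
+    2. Single-card training
+
+        2.1 Set `data_path` in train_full_1p.sh (script path: GAN_ID2351_for_TensorFlow2.X/test/train_full_1p.sh) to your actual dataset path. The dataset parameter looks like this:
+
+            --data_path=/home/MNIST
+
+        2.2 The 1p command is as follows:
+
+            bash train_full_1p.sh --data_path=/home/MNIST
+
+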

Transfer Learning Guide

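+
+- Dataset preparation.
+
+    1. Obtain the data.
+    See "Dataset Preparation" under "Quick Start".
+
+- Model training.
+
+    See the training steps in "Model Training".
+
+- Model evaluation.
+
+    See the validation steps in "Model Training".
+
+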

Advanced Reference

+
+## Scripts and Sample Code
+
+```
+GAN_ID2351_for_TensorFlow2.X/
+├── LICENSE
+├── modelzoo_level.txt
+├── README.md
+├── requirements.txt
+├── tf_v2_03_WGAN.py
+├── test
+│   ├── train_full_1p.sh
+│   ├── train_performance_1p_static_eval.sh
+│   ├── train_performance_1p_dynamic_eval.sh
+
+```
+
+## Script Parameters
+
+```
+--data_path      Path to the training dataset
+--train_epochs   Number of training epochs
+--batch_size     Training batch size
+```
+
+## Training Process
+
+1. Start single-card training with the training command in "Model Training".
+2. Set data_path in the training script (train_full_1p.sh) to the path of the training dataset. See the example in "Model Training" for the detailed steps.
+3. The model is saved under "curpath/output/ASCEND_DEVICE_ID", which also contains the training log file.
+4. Taking single-card training as an example, the loss information is in the file curpath/output/{ASCEND_DEVICE_ID}/train_${ASCEND_DEVICE_ID}.log.
+
+## Inference/Validation Process
+
+```
+ NA
+
+```
diff --git a/TensorFlow2/built-in/GAN_ID2351_for_TensorFlow2.X/configs/ops_info.json b/TensorFlow2/built-in/GAN_ID2351_for_TensorFlow2.X/configs/ops_info.json
new file mode 100644
index 0000000000000000000000000000000000000000..d0ed0b5c214d886e3e7b4d2823b5f1cc38e9f0eb
--- /dev/null
+++ b/TensorFlow2/built-in/GAN_ID2351_for_TensorFlow2.X/configs/ops_info.json
@@ -0,0 +1,21 @@
+{
+  "black-list":{
+    "to-add":[
+      "SquaredDifference",
+      "AddN",
+      "Add",
+      "Relu",
+      "Sigmoid",
+      "Assign",
+      "Minimum",
+      "Square",
+      "Sub",
+      "Mul",
+      "RealDiv",
+      "ConfusionSoftmaxGrad",
+      "ReduceSumD",
+      "SoftmaxCrossEntropyWithLogits",
+      "StridedSliceD"
+    ]
+  }
+}
\ No newline at end of file
diff --git a/TensorFlow2/built-in/GAN_ID2351_for_TensorFlow2.X/test/train_full_1p.sh b/TensorFlow2/built-in/GAN_ID2351_for_TensorFlow2.X/test/train_full_1p.sh
index cc587acec8d29a73367fa02bea32aa8530b1c80f..b309bfe1639355b508f84590a9f1ba9398153a98 100644
--- a/TensorFlow2/built-in/GAN_ID2351_for_TensorFlow2.X/test/train_full_1p.sh
+++ b/TensorFlow2/built-in/GAN_ID2351_for_TensorFlow2.X/test/train_full_1p.sh
@@ -34,7 +34,7 @@ fi
 data_dump_flag=False
 data_dump_step="10"
 profiling=False
-use_mixlist=False
+use_mixlist=True
 mixlist_file="./configs/ops_info.json"
 fusion_off_flag=False
 fusion_off_file="./configs/fusion_switch.cfg"
@@ -153,7 +153,7 @@ CaseName=${Network}_bs${batch_size}_${RankSize}'p'_'acc'
 TrainingTime=`grep "Time" $cur_path/test/output/$ASCEND_DEVICE_ID/train_$ASCEND_DEVICE_ID.log |awk 'END{print $6}'`
 wait
 ActualFPS=`awk 'BEGIN{printf "%.2f\n",'${batch_size}'/'${TrainingTime}'}'`
-train_accuracy="None"
+train_accuracy=`grep -a 'd_loss:' $cur_path/test/output/$ASCEND_DEVICE_ID/train_${ASCEND_DEVICE_ID}.log | awk 'END{print $2}'`
 ## Get performance data; no modification needed
 # Extract Loss from train_$ASCEND_DEVICE_ID.log into train_${CaseName}_loss.txt; review per model
diff --git a/TensorFlow2/built-in/GAN_ID2351_for_TensorFlow2.X/test/train_performance_1p.sh b/TensorFlow2/built-in/GAN_ID2351_for_TensorFlow2.X/test/train_performance_1p.sh
index 1dc2fa1b1e9c21a8c7fb836de163810920ef53d0..1f809a00304e0cf637ea5da4f6d722d6d646b08b 100644
--- a/TensorFlow2/built-in/GAN_ID2351_for_TensorFlow2.X/test/train_performance_1p.sh
+++ b/TensorFlow2/built-in/GAN_ID2351_for_TensorFlow2.X/test/train_performance_1p.sh
@@ -34,7 +34,7 @@ fi
 data_dump_flag=False
 data_dump_step="10"
 profiling=False
-use_mixlist=False
+use_mixlist=True
 mixlist_file="./configs/ops_info.json"
 fusion_off_flag=False
 fusion_off_file="./configs/fusion_switch.cfg"
@@ -153,7 +153,7 @@ CaseName=${Network}_bs${batch_size}_${RankSize}'p'_'perf'
 TrainingTime=`grep "Time" $cur_path/test/output/$ASCEND_DEVICE_ID/train_$ASCEND_DEVICE_ID.log |awk 'END{print $6}'`
 wait
 ActualFPS=`awk 'BEGIN{printf "%.2f\n",'${batch_size}'/'${TrainingTime}'}'`
-train_accuracy="None"
+train_accuracy=`grep -a 'd_loss:' $cur_path/test/output/$ASCEND_DEVICE_ID/train_${ASCEND_DEVICE_ID}.log | awk 'END{print $2}'`
 ## Get performance data; no modification needed
 # Extract Loss from train_$ASCEND_DEVICE_ID.log into train_${CaseName}_loss.txt; review per model
diff --git a/TensorFlow2/built-in/cv/image_classification/ResNet50_ID0360_for_TensorFlow2.X/README.md b/TensorFlow2/built-in/cv/image_classification/ResNet50_ID0360_for_TensorFlow2.X/README.md
index 20ec3f19cbc6e1e1f6f3e8150f1b10208b84f2c1..f1d3bf7ab1a2ad7254b53d403878d039a4e57f16 100644
--- a/TensorFlow2/built-in/cv/image_classification/ResNet50_ID0360_for_TensorFlow2.X/README.md
+++ b/TensorFlow2/built-in/cv/image_classification/ResNet50_ID0360_for_TensorFlow2.X/README.md
@@ -240,7 +240,7 @@ npu_device.global_options().precision_mode=FLAGS.precision_mode
 3.1 8-card training command (the script is located at ResNet50_ID0360_for_TensorFlow2.X/test/train_full_8p_256bs_SGD.sh). Make sure to change "--data_path" in the example below to the path of your ImageNet dataset.
-        bash test/train_full_8p_128bs.sh --data_path=/home/ImageNet
+        bash test/train_full_8p_256bs_SGD.sh --data_path=/home/ImageNet