diff --git a/TensorFlow/contrib/cv/LMTCNN_ID1278_for_TensorFlow/README.md b/TensorFlow/contrib/cv/LMTCNN_ID1278_for_TensorFlow/README.md index 5845a54387dee110a2985fdc7511185592ac7801..8b80e895d7b0e11cc217c78ab96a229b75a950c7 100644 --- a/TensorFlow/contrib/cv/LMTCNN_ID1278_for_TensorFlow/README.md +++ b/TensorFlow/contrib/cv/LMTCNN_ID1278_for_TensorFlow/README.md @@ -1,176 +1,214 @@ -# agegenderLMTCNN - -## 概述 - -agegenderLMTCNN是一种同时预测年龄和性别的网络。 - -- 参考论文 [Joint Estimation of Age and Gender from Unconstrained Face Images using Lightweight Multi-task CNN for Mobile Applications](https://arxiv.org/abs/1806.02023)。 - -- 参考项目https://github.com/ivclab/agegenderLMTCNN - -## 默认配置 - -- 数据图片resize为227*227 -- 训练超参: - - image_size:227 - - batch_size:32 - - max_steps:50000 - - steps_per_decay:10000 - - lr:0.01 - - eta_decay_rate:0.1 - -## 支持特性 - -| 特性列表 | 是否支持 | -| ---------- | -------- | -| 分布式训练 | 否 | -| 混合精度 | 否 | -| 数据并行 | 否 | - -## 文件目录 - -data.py:读取训练数据 - -datapreparation.py:将原始数据拆分为训练集、验证集和测试集,以进行五折交叉验证。此项目已在 DataPreparation/FiveFolds/train_val_test_per_fold_agegender 中生成此 txt 文件。 - -download_adiencedb.py:下载Adience 数据集。 - -download_model.py:下载训练好的模型。 - -eval.py:评估 LMTCNN 模型。 - -model.py:定义网络。 - -- multipreproc.py:预处理原始数据,在tfrecord目录下生成训练集、验证集和测试集的tfrecord文件。 - - -train.py:训练模型文件 - -util.py - -## 训练环境准备 - -- Python 3.0及以上 -- Numpy -- OpenCV -- [TensorFlow](https://www.tensorflow.org/install/install_linux) 1.15 -- 昇腾NPU环境 - -## 快速上手 - -### 数据集准备 - -从[谷歌云端](https://docs.google.com/uc?export=download&id=11Zv__6WvbjtovcQzZOjELdscmyx63Gov )下载对齐后的Adience数据集,或者从Adience官网下载数据集并自行对齐。运行multipreproc.py文件生成train, test,valid的tfrecord文件。生成的数据集按照五等分交叉验证分别存储在test_fold_is_0到test_fold_is_4文件夹下。 - -### 模型训练 - -参考下方的训练过程。参数可自行定义。 - -预训练模型下载链接:https://pan.baidu.com/s/1JniERocb6wBcOG23qiRbYA -提取码:e722 - -## 高级参考 - -### 脚本和示例代码 - -``` -|-- LICENSE -|-- README.md -|-- data.py 数据读入 -|-- datapreparation.py 拆分为训练集、验证集和测试集 -|-- download_adiencedb.py 下载adience数据库 -|-- download_model.py 下载预训练好的模型 
-|-- eval.py 测试模型 -|-- model.py 网络定义文件 -|-- multipreproc.py 生成tfrecord -|-- modelzoo_level.txt -|-- requirements.txt -|-- script 运行脚本 -| |-- evalfold1.sh -| |-- evalfold2.sh -| |-- evalfold3.sh -| |-- evalfold4.sh -| |-- evalfold5.sh -| |-- trainfold1.sh -| |-- trainfold2.sh -| |-- trainfold3.sh -| |-- trainfold4.sh -| `-- trainfold5.sh -|-- train.py 训练模型 -`-- utils.py - -``` - -### 脚本参数 - -``` - --model_type LMTCNN-1-1/LMTCNN-2-1 - --pre_checkpoint_path restore this pretrained model before beginning any training - --data_dir the root path of tfrecords - --model_dir the root of store models - --batch_size batch size for training - --image_size 227 - --eta Learning rate - --pdrop Dropout probability - --max_steps Number of iterations - --steps_per_decay step for starting decay learning_rate - --eta_decay_rate learning rate decay - --epochs Number of epochs -``` - -### 训练过程 - -```bash -# 根据数据集路径修改train_dir的值。根据数据集的不同划分分别运行trainfold1.sh到trainfold5.sh -$ ./script/trainfold1.sh ~ $ ./script/trainfold5t.sh : -``` - -### 验证过程 - -```bash -# 根据数据集路径修改train_dir的值。根据数据集的不同划分分别运行trainfold1.sh到trainfold5.sh -$ ./script/evalfold1.sh ~ $ ./script/evalfold5.sh -``` - -## 训练精度 - -五等分交叉验证的GPU与NPU运行结果如下: - -| GPU | Age(Top-1)(Acc) | Age(Top-2)(Acc) | Gender(Acc) | -| ---- | ------------------- | ------------------- | ------------- | -| 0 | 49.06 | 73.42 | 83.63 | -| 1 | 37.08 | 60.61 | 80.89 | -| 2 | 44.81 | 70.06 | 79.34 | -| 3 | 40.73 | 65.35 | 80.74 | -| 4 | 38.92 | 64.18 | 77.60 | -| Ave | 42.12 | 66.72 | 80.44 | - - - -| NPU | Age(Top-1)(Acc) | Age(Top-2)(Acc) | Gender(Acc) | -| ---- | ------------------- | ------------------- | ------------- | -| 0 | 45.11 | 68.75 | 80.95 | -| 1 | 36.43 | 58.50 | 78.52 | -| 2 | 41.09 | 66.37 | 77.25 | -| 3 | 41.68 | 64.37 | 80.64 | -| 4 | 40.10 | 65.25 | 79.25 | -| Ave | 40.88 | 64.65 | 79.32 | - - - -| Ave | Age(Top-1)(Acc) | Age(Top-2)(Acc) | Gender(Acc) | -| ---- | ------------------- | ------------------- | ------------- | -| 论文 | 40.84 | 
66.10 | 82.04 |
-| GPU | 42.12 | 66.72 | 80.44 |
-| NPU | 40.88 | 64.65 | 79.32 |
-
-
-
-## 训练性能对比
-
-经过1000个batch训练后的平均性能分别如下:
-
-| GPU NVIDIA V100 | NPU |
-| ------------- | -------------- |
-| 0.084 s/batch | 0.139 s/batch |
-
+# agegenderLMTCNN
+
+## Overview
+
+agegenderLMTCNN is a network that predicts age and gender simultaneously.
+
+- Reference paper: [Joint Estimation of Age and Gender from Unconstrained Face Images using Lightweight Multi-task CNN for Mobile Applications](https://arxiv.org/abs/1806.02023)
+
+- Reference implementation: https://github.com/ivclab/agegenderLMTCNN
+
+## Default configuration
+
+- Input images are resized to 227*227
+- Training hyperparameters:
+  - image_size: 227
+  - batch_size: 32
+  - max_steps: 50000
+  - steps_per_decay: 10000
+  - lr: 0.01
+  - eta_decay_rate: 0.1
+
+## Supported features
+
+| Feature | Supported |
+| ---------- | -------- |
+| Distributed training | No |
+| Mixed precision | Yes |
+| Data parallelism | No |
+
+## Mixed-precision training
+
+The Ascend 910 AI processor provides automatic mixed precision: following a built-in optimization strategy, it automatically lowers selected float32 operators in the graph to float16, which improves performance and reduces memory usage with very little accuracy loss.
+
+## Enabling mixed precision
+
+The scripts enable mixed precision by default; the precision_mode parameter is set as follows.
+
+```python
+config = tf.ConfigProto()
+custom_op = config.graph_options.rewrite_options.custom_optimizers.add()
+custom_op.name = "NpuOptimizer"
+custom_op.parameter_map["use_off_line"].b = True
+custom_op.parameter_map["precision_mode"].s = tf.compat.as_bytes("allow_mix_precision")
+config.graph_options.rewrite_options.remapping = RewriterConfig.OFF  # disable the remap pass
+config.graph_options.rewrite_options.memory_optimization = RewriterConfig.OFF
+```
+
+## File layout
+
+data.py: reads the training data.
+
+datapreparation.py: splits the raw data into training, validation and test sets for five-fold cross-validation. The resulting txt files are already generated under DataPreparation/FiveFolds/train_val_test_per_fold_agegender.
+
+download_adiencedb.py: downloads the Adience dataset.
+
+download_model.py: downloads the trained model.
+
+eval.py: evaluates the LMTCNN model.
+
+model.py: defines the network.
+
+multipreproc.py: preprocesses the raw data and writes the training, validation and test tfrecord files under the tfrecord directory.
+
+train.py: trains the model.
+
+util.py
+
+## Environment setup
+
+- Python 3.0 or later
+- Numpy
+- OpenCV
+- [TensorFlow](https://www.tensorflow.org/install/install_linux) 1.15
+- Ascend NPU environment
+
+## Quick start
+
+### Dataset preparation
+
+Download the aligned Adience dataset from [Google Drive](https://docs.google.com/uc?export=download&id=11Zv__6WvbjtovcQzZOjELdscmyx63Gov), or download the dataset from the Adience website and align it yourself. Run multipreproc.py to generate the train, validation and test tfrecord files. The generated data is stored per five-fold cross-validation split in the folders test_fold_is_0 through test_fold_is_4.
+
+### Model training
+
+See the training procedure below. Parameters can be customized.
+
+Pretrained model download: https://pan.baidu.com/s/1JniERocb6wBcOG23qiRbYA
+Extraction code: e722
+
+## Advanced reference
+
+### Scripts and sample code
+
+```
+|-- LICENSE
+|-- README.md
+|-- data.py                  data input
+|-- datapreparation.py       split into training, validation and test sets
+|-- download_adiencedb.py    download the Adience database
+|-- download_model.py        download the pretrained model
+|-- eval.py                  evaluate the model
+|-- model.py                 network definition
+|-- modelarts_entry_acc.py
+|-- modelarts_entry_perf.py
+|-- multipreproc.py          generate tfrecords
+|-- modelzoo_level.txt
+|-- requirements.txt
+|-- script                   run scripts
+| |-- evalfold1.sh
+| |-- evalfold2.sh
+| |-- evalfold3.sh
+| |-- evalfold4.sh
+| |-- evalfold5.sh
+| |-- trainfold1.sh
+| |-- trainfold2.sh
+| |-- trainfold3.sh
+| |-- trainfold4.sh
+| `-- trainfold5.sh
+|-- test
+| |-- train_full_1p.sh
+| `-- train_performance_1p.sh
+|-- train.py                 train the model
+`-- utils.py
+
+```
+
+### Script parameters
+
+```
+ --model_type           LMTCNN-1-1/LMTCNN-2-1
+ --pre_checkpoint_path  restore this pretrained model before beginning any training
+ --data_dir             the root path of the tfrecords
+ --model_dir            the root path for storing models
+ --batch_size           batch size for training
+ --image_size           227
+ --eta                  learning rate
+ --pdrop                dropout probability
+ --max_steps            number of iterations
+ --steps_per_decay      number of steps before the learning rate decays
+ --eta_decay_rate       learning rate decay factor
+ --epochs               number of epochs
+```
+
+### Training procedure
+
+```bash
+# Set train_dir according to your dataset path. Run trainfold1.sh through trainfold5.sh, one per cross-validation split.
+$ ./script/trainfold1.sh ~ $ ./script/trainfold5.sh
+```
+
+A partial log of training on the Ascend 910 chip:
+
+```bash
+step 1790, ageloss= 0.934, genderloss= 0.129 , totalloss= 1.065 (297.0 examples/sec; 0.108 sec/batch)
+step 1800, ageloss= 0.664, genderloss= 0.089 , totalloss= 0.755 (295.0 examples/sec; 0.108 sec/batch)
+step 1810, ageloss= 0.555, genderloss= 0.093 , totalloss= 0.650 (297.0 examples/sec; 0.108 sec/batch)
+step 1820, ageloss= 0.545, genderloss= 0.072 , totalloss= 0.619 (347.5 examples/sec; 0.092 sec/batch)
+step 1830, ageloss= 1.193, genderloss= 0.088 , totalloss= 1.283 (321.4 examples/sec; 0.100 sec/batch)
+step 1840, ageloss= 0.486, genderloss= 0.402 , totalloss= 0.890 (241.0 examples/sec; 0.133 sec/batch)
+step 1850, ageloss= 0.600, genderloss= 0.138 , totalloss= 0.740 (307.1 examples/sec; 0.104 sec/batch)
+step 1860, ageloss= 0.530, genderloss= 0.087 , totalloss= 0.618 (310.9 examples/sec; 0.103 sec/batch)
+```
+
+### Evaluation procedure
+
+```bash
+# Set the dataset path accordingly. Run evalfold1.sh through evalfold5.sh, one per cross-validation split.
+$ ./script/evalfold1.sh ~ $ ./script/evalfold5.sh
+```
+
+## Training accuracy
+
+Five-fold cross-validation results on GPU and NPU:
+
+| GPU | Age(Top-1)(Acc) | Age(Top-2)(Acc) | Gender(Acc) |
+| ---- | ------------------- | ------------------- | ------------- |
+| 0 | 49.06 | 73.42 | 83.63 |
+| 1 | 37.08 | 60.61 | 80.89 |
+| 2 | 44.81 | 70.06 | 79.34 |
+| 3 | 40.73 | 65.35 | 80.74 |
+| 4 | 38.92 | 64.18 | 77.60 |
+| Ave | 42.12 | 66.72 | 80.44 |
+
+| NPU | Age(Top-1)(Acc) | Age(Top-2)(Acc) | Gender(Acc) |
+| ---- | ------------------- | ------------------- | ------------- |
+| 0 | 45.11 | 68.75 | 80.95 |
+| 1 | 36.43 | 58.50 | 78.52 |
+| 2 | 41.09 | 66.37 | 77.25 |
+| 3 | 41.68 | 64.37 | 80.64 |
+| 4 | 40.10 | 65.25 | 79.25 |
+| Ave | 40.88 | 64.65 | 79.32 |
+
+| Ave | Age(Top-1)(Acc) | Age(Top-2)(Acc) | Gender(Acc) |
+| ---- | ------------------- | ------------------- | ------------- |
+| Paper | 40.84 | 66.10 | 82.04 |
+| GPU | 42.12 | 66.72 | 80.44 |
+| NPU | 40.88 | 64.65 | 79.32 |
+
+## Training performance comparison
+
+The average per-batch training performance is as follows:
+
+| GPU NVIDIA V100 | NPU |
+| --------------- | ------------ |
+| 0.084 s/batch | 0.04 s/batch |
+
diff --git a/TensorFlow/contrib/cv/LMTCNN_ID1278_for_TensorFlow/modelarts_entry_acc.py b/TensorFlow/contrib/cv/LMTCNN_ID1278_for_TensorFlow/modelarts_entry_acc.py
new file mode 100644
index 
0000000000000000000000000000000000000000..13077b10e660de32d6f7861257a50e1a01ede9ba --- /dev/null +++ b/TensorFlow/contrib/cv/LMTCNN_ID1278_for_TensorFlow/modelarts_entry_acc.py @@ -0,0 +1,63 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+
+import os
+import argparse
+import sys
+
+# Parse the data_url/train_url input arguments
+parser = argparse.ArgumentParser()
+parser.add_argument("--data_url", type=str, default="/home/ma-user/modelarts/inputs/data_url_0")
+parser.add_argument("--train_url", type=str, default="/home/ma-user/modelarts/outputs/train_url_0/")
+config = parser.parse_args()
+
+print("[CANN-Modelzoo] code_dir path is [%s]" % (sys.path[0]))
+code_dir = sys.path[0]
+os.chdir(code_dir)
+print("[CANN-Modelzoo] work_dir path is [%s]" % (os.getcwd()))
+
+print("[CANN-Modelzoo] before train - list my run files:")
+os.system("ls -al /usr/local/Ascend/ascend-toolkit/")
+
+print("[CANN-Modelzoo] before train - list my dataset files:")
+os.system("ls -al %s" % config.data_url)
+
+print("[CANN-Modelzoo] start run train shell")
+# Convert the shell scripts to Unix line endings so they run on Linux
+os.system("dos2unix ./test/*")
+
+# Run train_full_1p.sh or train_performance_1p.sh; the user chooses which.
+# Difference: the performance script runs only a few steps (within 15 minutes) and mainly measures FPS.
+os.system("bash ./test/train_full_1p.sh --data_path=%s --output_path=%s " % (config.data_url, config.train_url))
+
+print("[CANN-Modelzoo] finish run train shell")
+
+# Copy all files in the current working directory to the OBS output path as a backup
+print("[CANN-Modelzoo] after train - list my output files:")
+os.system("cp -r %s %s " % (code_dir, config.train_url))
+os.system("ls -al %s" % config.train_url)
diff --git a/TensorFlow/contrib/cv/LMTCNN_ID1278_for_TensorFlow/modelarts_entry_perf.py b/TensorFlow/contrib/cv/LMTCNN_ID1278_for_TensorFlow/modelarts_entry_perf.py
new file mode 100644
index 0000000000000000000000000000000000000000..14384e227a0fa90a514254590aef5078c62ff700
--- /dev/null
+++ b/TensorFlow/contrib/cv/LMTCNN_ID1278_for_TensorFlow/modelarts_entry_perf.py
@@ -0,0 +1,63 @@
+# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+
+import os
+import argparse
+import sys
+
+# Parse the data_url/train_url input arguments
+parser = argparse.ArgumentParser()
+parser.add_argument("--data_url", type=str, default="/home/ma-user/modelarts/inputs/data_url_0")
+parser.add_argument("--train_url", type=str, default="/home/ma-user/modelarts/outputs/train_url_0/")
+config = parser.parse_args()
+
+print("[CANN-Modelzoo] code_dir path is [%s]" % (sys.path[0]))
+code_dir = sys.path[0]
+os.chdir(code_dir)
+print("[CANN-Modelzoo] work_dir path is [%s]" % (os.getcwd()))
+
+print("[CANN-Modelzoo] before train - list my run files:")
+os.system("ls -al /usr/local/Ascend/ascend-toolkit/")
+
+print("[CANN-Modelzoo] before train - list my dataset files:")
+os.system("ls -al %s" % config.data_url)
+
+print("[CANN-Modelzoo] start run train shell")
+# Convert the shell scripts to Unix line endings so they run on Linux
+os.system("dos2unix ./test/*")
+
+# Run train_full_1p.sh or train_performance_1p.sh; the user chooses which.
+# Difference: the performance script runs only a few steps (within 15 minutes) and mainly measures FPS.
+os.system("bash ./test/train_performance_1p.sh --data_path=%s --output_path=%s " % (config.data_url, config.train_url))
+
+print("[CANN-Modelzoo] finish run train shell")
+
+# Copy all files in the current working directory to the OBS output path as a backup
+print("[CANN-Modelzoo] after train - list my output files:")
+os.system("cp -r %s %s " % (code_dir, config.train_url))
+os.system("ls -al %s" % config.train_url)
diff --git a/TensorFlow/contrib/cv/LMTCNN_ID1278_for_TensorFlow/modelzoo_level.txt b/TensorFlow/contrib/cv/LMTCNN_ID1278_for_TensorFlow/modelzoo_level.txt
index 95fe04bd41c44a3ebef4b5830425c55eb7fa01e3..9836fdf6176262bcbbc7d1ea4bd856edc6b8b1c6 100644
--- a/TensorFlow/contrib/cv/LMTCNN_ID1278_for_TensorFlow/modelzoo_level.txt
+++ b/TensorFlow/contrib/cv/LMTCNN_ID1278_for_TensorFlow/modelzoo_level.txt
@@ -1,5 +1,5 @@
-FuncStatus:OK
-PerfStatus:POK
-GPUStatus:OK
-NPUMigrationStatus:POK
+FuncStatus:OK
+PerfStatus:OK
+GPUStatus:OK
+NPUMigrationStatus:POK
+PrecisionStatus:OK
\ No newline at end of file
diff --git a/TensorFlow/contrib/cv/LMTCNN_ID1278_for_TensorFlow/train.py 
b/TensorFlow/contrib/cv/LMTCNN_ID1278_for_TensorFlow/train.py index a02c7034510bb0e983bf2a0d30255092cb842c19..f1dabf63ba0386b280733503135ed2665e94f1c7 100644 --- a/TensorFlow/contrib/cv/LMTCNN_ID1278_for_TensorFlow/train.py +++ b/TensorFlow/contrib/cv/LMTCNN_ID1278_for_TensorFlow/train.py @@ -1,275 +1,305 @@ -""" Rude Carnie: Age and Gender Deep Learning with Tensorflow found at -https://github.com/dpressel/rude-carnie -""" -# ============================================================================== -# MIT License -# -# Modifications copyright (c) 2018 Image & Vision Computing Lab, Institute of Information Science, Academia Sinica -# -# Permission is hereby granted, free of charge, to any person obtaining a copy -# of this software and associated documentation files (the "Software"), to deal -# in the Software without restriction, including without limitation the rights -# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell -# copies of the Software, and to permit persons to whom the Software is -# furnished to do so, subject to the following conditions: -# -# The above copyright notice and this permission notice shall be included in all -# copies or substantial portions of the Software. -# -# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR -# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, -# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE -# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER -# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, -# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE -# SOFTWARE. 
-# ============================================================================== -#!/usr/bin/env python -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function -from npu_bridge.npu_init import * -from tensorflow.core.protobuf.rewriter_config_pb2 import RewriterConfig -from six.moves import xrange -from datetime import datetime -import time -import os -import numpy as np -import tensorflow as tf -from data import multiinputs -from model import select_model -import json -import re -import sys -code_dir = os.path.dirname(__file__) -work_path = os.path.join(code_dir ,'../') -sys.path.append(work_path) -#import moxing as mox -from pdb import set_trace as bp - -LAMBDA = 0.01 -MOM = 0.9 - -tf.app.flags.DEFINE_boolean('multitask', True, 'Whether utilize multitask model') -tf.app.flags.DEFINE_string('model_type', 'LMTCNN-1-1','choose model structure. LMTCNN and mobilenet_multitask for multitask. inception, levi_hassner_bn and levi_hassner for singletask ') -tf.app.flags.DEFINE_string('class_type', '','select which single task to train (Age or Gender), only be utilized when multitask=False and choose single task model_type') - -tf.app.flags.DEFINE_string('pre_checkpoint_path', '','if specified, restore this pretrained model before beginning any training.') -tf.app.flags.DEFINE_string('data_dir','./tfrecord/train_val_test_per_fold_agegender/test_fold_is_0','training age and gender directory.') -tf.app.flags.DEFINE_string('model_dir','./models','store models before training') -tf.app.flags.DEFINE_boolean('log_device_placement', False,'Whether to log device placement.') - -tf.app.flags.DEFINE_integer('num_preprocess_threads', 4, 'Number of preprocessing threads') -tf.app.flags.DEFINE_string('optim', 'Momentum','Optimizer') -tf.app.flags.DEFINE_integer('image_size', 227, 'Image size') -tf.app.flags.DEFINE_float('eta', 0.01,'Learning rate') -tf.app.flags.DEFINE_float('pdrop', 0.,'Dropout probability') 
-tf.app.flags.DEFINE_integer('max_steps', 50000,'Number of iterations') -tf.app.flags.DEFINE_integer('steps_per_decay', 10000,'Number of steps before learning rate decay') -tf.app.flags.DEFINE_float('eta_decay_rate', 0.1, 'learning rate decay') -tf.app.flags.DEFINE_integer('epochs', -1,'Number of epochs') -tf.app.flags.DEFINE_integer('batch_size', 32,'Batch size') -tf.app.flags.DEFINE_string('checkpoint', 'checkpoint','Checkpoint name') -#tf.app.flags.DEFINE_string('data_url', './dataset', "data_root") -tf.app.flags.DEFINE_string('train_url', './log', "output_root") -# inception_v3.ckpt -tf.app.flags.DEFINE_string('pre_model','False', 'checkpoint file') - -FLAGS = tf.app.flags.FLAGS - -def exponential_staircase_decay(at_step=10000, decay_rate=0.1): - - print('decay [%f] every [%d] steps' % (decay_rate, at_step)) - - def _decay(lr, global_step): - return tf.train.exponential_decay(lr, global_step, at_step, decay_rate, staircase=True) - - return _decay - -def optimizer(optim, eta, loss_fn, at_step, decay_rate): - - global_step = tf.Variable(0, trainable=False) - optz = optim - if optim == 'Adadelta': - optz = lambda lr: tf.train.AdadeltaOptimizer(lr, 0.95, 1e-6) - lr_decay_fn = None - elif optim == 'Momentum': - optz = lambda lr: tf.train.MomentumOptimizer(lr, MOM) - lr_decay_fn = exponential_staircase_decay(at_step, decay_rate) - - return tf.contrib.layers.optimize_loss(loss_fn, global_step, eta, optz, clip_gradients=4., learning_rate_decay_fn=lr_decay_fn) - -def loss(logits, labels): - - labels = tf.cast(labels, tf.int32) - cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits( - logits=logits, labels=labels, name='cross_entropy_per_example') - cross_entropy_mean = tf.reduce_mean(cross_entropy, name='cross_entropy') - tf.add_to_collection('losses', cross_entropy_mean) - losses = tf.get_collection('losses') - regularization_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES) - total_loss = cross_entropy_mean + LAMBDA * sum(regularization_losses) 
- tf.summary.scalar('tl (raw)', total_loss) - loss_averages = tf.train.ExponentialMovingAverage(0.9, name='avg') - loss_averages_op = loss_averages.apply(losses + [total_loss]) - for l in losses + [total_loss]: - tf.summary.scalar(l.op.name + ' (raw)', l) - tf.summary.scalar(l.op.name, loss_averages.average(l)) - with tf.control_dependencies([loss_averages_op]): - total_loss = tf.identity(total_loss) - - return total_loss - -def multiloss(agelogits, agelabels, genderlogits, genderlabels): - - agelabels = tf.cast(agelabels, tf.int32) - genderlabels = tf.cast(genderlabels, tf.int32) - - age_cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits( - logits=agelogits, labels=agelabels, name='cross_entropy_per_example_age') - gender_cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits( - logits=genderlogits, labels=genderlabels, name='cross_entropy_per_example_gender') - - age_cross_entropy_mean = tf.reduce_mean(age_cross_entropy, name='cross_entropy_age') - gender_cross_entropy_mean = tf.reduce_mean(gender_cross_entropy, name='cross_entropy_gender') - - tf.add_to_collection('agelosses', age_cross_entropy_mean) - tf.add_to_collection('genderlosses', gender_cross_entropy_mean) - - agelosses = tf.get_collection('agelosses') - genderlosses = tf.get_collection('genderlosses') - regularization_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES) - totallosses = age_cross_entropy_mean+gender_cross_entropy_mean+LAMBDA*sum(regularization_losses) - tf.summary.scalar('tl total (raw)', totallosses) - - loss_averages = tf.train.ExponentialMovingAverage(0.9, name='avg') - loss_averages_op = loss_averages.apply(agelosses+genderlosses+[totallosses]) - - for l in agelosses+genderlosses+[totallosses]: - tf.summary.scalar(l.op.name + '(raw)', l) - tf.summary.scalar(l.op.name , loss_averages.average(l)) - with tf.control_dependencies([loss_averages_op]): - totallosses=tf.identity(totallosses) - - return agelosses, genderlosses, totallosses - -def 
main(argv=None): - - if not os.path.exists(FLAGS.model_dir): - os.mkdir(FLAGS.model_dir) - folddirlist = FLAGS.data_dir.split(os.sep) - subdir = FLAGS.model_dir+os.sep+folddirlist[-2] - if not os.path.exists(subdir): - os.mkdir(subdir) - savemodeldir = subdir+os.sep+folddirlist[-1] - if not os.path.exists(savemodeldir): - os.mkdir(savemodeldir) - - if FLAGS.multitask: - - with tf.Graph().as_default(): - model_fn = select_model(FLAGS.model_type) - # Open the metadata file and figure out nlabels, and size of epoch - input_file_age = os.path.join(FLAGS.data_dir, 'mdage.json') - input_file_gender = os.path.join(FLAGS.data_dir, 'mdgender.json') - with open(input_file_age,'r') as fage: - mdage = json.load(fage) - with open(input_file_gender,'r') as fgender: - mdgender = json.load(fgender) - with tf.device('/cpu:0'): - images_holder = tf.placeholder(tf.float32, shape=[FLAGS.batch_size, 227, 227, 3]) - # agelabels_holder = tf.placeholder(tf.int64,shape = [FLAGS.batch_size],name = 'agelabels_holder') - # genderlabels_holder = tf.placeholder(tf.int64,shape = [FLAGS.batch_size],name = 'genderlabels_holder') - - agelabels_holder = tf.placeholder(tf.int32, shape=[FLAGS.batch_size]) - genderlabels_holder = tf.placeholder(tf.int32, shape=[FLAGS.batch_size]) - - agelogits, genderlogits = model_fn(mdage['nlabels'], images_holder, mdgender['nlabels'], images_holder,1 - FLAGS.pdrop, True) - agelosses, genderlosses, totallosses = multiloss(agelogits, agelabels_holder, genderlogits,genderlabels_holder) - agegendertrain_op = optimizer(FLAGS.optim, FLAGS.eta, totallosses, FLAGS.steps_per_decay,FLAGS.eta_decay_rate) - - saver = tf.train.Saver(tf.global_variables()) - summary_op = tf.summary.merge_all() - - sess = tf.Session(config=npu_config_proto(config_proto=tf.ConfigProto(allow_soft_placement = True, log_device_placement=FLAGS.log_device_placement))) - - - tf.global_variables_initializer().run(session=sess) - - # fine-tune dp_multitask and mobilenet_multitask - if 
FLAGS.pre_checkpoint_path: - print('Trying to restore checkpoint from %s ' % FLAGS.pre_checkpoint_path) - - if FLAGS.model_type is 'LMTCNN': - all_variables = tf.get_collection(tf.GraphKeys.VARIABLES, scope="multitaskdpcnn") - elif FLAGS.model_type is 'mobilenet_multitask': - all_variables = tf.get_collection(tf.GraphKeys.VARIABLES, scope="MobileNetmultitask") - - age_variables = tf.get_collection(tf.GraphKeys.VARIABLES, scope="ageoutput") - gender_variables = tf.get_collection(tf.GraphKeys.VARIABLES, scope="genderoutput") - all_variables.extend(age_variables) - all_variables.extend(gender_variables) - restorer = tf.train.Saver(all_variables) - restorer.restore(sess, FLAGS.pre_checkpoint_path) - - print('%s: Pre-trained model restored from %s' % (datetime.now(), FLAGS.pre_checkpoint_path)) - - #run_dir = '%s/%s-run-%d' %(savemodeldir, FLAGS.model_type, os.getpid()) - run_dir = '%s/%s-run' %(savemodeldir, FLAGS.model_type) - checkpoint_path = '%s/%s' % (run_dir, FLAGS.checkpoint) - if tf.gfile.Exists(run_dir) is False: - print('Creating %s' % run_dir) - tf.gfile.MakeDirs(run_dir) - - tf.train.write_graph(sess.graph_def, run_dir, 'agegendermodel.pb', as_text=True) - tf.train.start_queue_runners(sess=sess) - summary_writer = tf.summary.FileWriter(run_dir, sess.graph) - - steps_per_train_epoch = int(mdage['train_counts'] / FLAGS.batch_size) - num_steps = FLAGS.max_steps if FLAGS.epochs < 1 else FLAGS.epochs * steps_per_train_epoch - print('Requested number of steps [%d]' % num_steps) - - dataset = multiinputs(data_dir=os.path.join(FLAGS.data_dir, 'train.tfrecord'), batch_size=FLAGS.batch_size,train=True, num_epochs=FLAGS.epochs) - iterator = dataset.make_one_shot_iterator() - images0, agelabels0, genderlabels0 = iterator.get_next() - for step in range(num_steps): - start_time = time.time() - - images, agelabels_1, genderlabels_1 = sess.run([images0, agelabels0, genderlabels0]) - # images1 = (np.reshape(images,(FLAGS.batch_size,227,227,3))).astype(np.float32) - 
-                agelabels_1 = (np.reshape(agelabels_1, (FLAGS.batch_size))).astype(np.int32)
-                genderlabels_1 = (np.reshape(genderlabels_1, (FLAGS.batch_size))).astype(np.int32)
-
-                _, totallossvalue, agelossvalue, genderlossvalue = sess.run([agegendertrain_op, totallosses, agelosses, genderlosses],
-                    feed_dict={images_holder: images, agelabels_holder: agelabels_1, genderlabels_holder: genderlabels_1})
-                duration = time.time() - start_time
-
-                assert not np.isnan(agelossvalue), 'Model diverged with ageloss = NaN'
-                assert not np.isnan(genderlossvalue), 'Model diverged with genderloss = NaN'
-                assert not np.isnan(totallossvalue), 'Model diverged with totallossvalue= NaN'
-
-                if step % 10 == 0:
-                    num_examples_per_step = FLAGS.batch_size
-                    examples_per_sec = num_examples_per_step / duration
-                    sec_per_batch = float(duration)
-
-                    format_str = ('%s: step %d , ageloss= %.3f , genderloss= %.3f , totalloss= %.3f (%.1f examples/sec ; %.3f ' 'sec/batch)')
-                    print(format_str % (datetime.now(), step, agelossvalue[0], genderlossvalue[0], totallossvalue, examples_per_sec, sec_per_batch))
-
-                # loss evaluated every 100 steps
-                if step % 100 == 0:
-                    summary_str = sess.run(summary_op, feed_dict={images_holder: images, agelabels_holder: agelabels_1, genderlabels_holder: genderlabels_1})
-                    summary_writer.add_summary(summary_str, step)
-
-                if step % 1000 == 0 or (step+1) == num_steps:
-                    saver.save(sess, checkpoint_path, global_step=step)
-                    #mox.file.copy_parallel(FLAGS.model_dir, FLAGS.train_url)
-
-
-if __name__ == '__main__':
-    os.environ['ASCEND_GLOBAL_LOG_LEVEL'] = "0"
-
-    try:
-        tf.app.run()
-    finally:
-        print('''download log''')
+""" Rude Carnie: Age and Gender Deep Learning with Tensorflow found at
+https://github.com/dpressel/rude-carnie
+"""
+# ==============================================================================
+# MIT License
+#
+# Modifications copyright (c) 2018 Image & Vision Computing Lab, Institute of Information Science, Academia Sinica
+#
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the "Software"), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included in all
+# copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# SOFTWARE.
+# ==============================================================================
+#!/usr/bin/env python
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+from npu_bridge.npu_init import *
+from tensorflow.core.protobuf.rewriter_config_pb2 import RewriterConfig
+from six.moves import xrange
+from datetime import datetime
+import time
+import os
+import numpy as np
+import tensorflow as tf
+from data import multiinputs
+from model import select_model
+import json
+import re
+import sys
+code_dir = os.path.dirname(__file__)
+work_path = os.path.join(code_dir, '../')
+sys.path.append(work_path)
+import moxing as mox
+from pdb import set_trace as bp
+
+LAMBDA = 0.01
+MOM = 0.9
+
+tf.app.flags.DEFINE_boolean('multitask', True, 'Whether to utilize the multitask model')
+tf.app.flags.DEFINE_string('model_type', 'LMTCNN-1-1', 'choose the model structure: LMTCNN and mobilenet_multitask for multitask; inception, levi_hassner_bn and levi_hassner for singletask')
+tf.app.flags.DEFINE_string('class_type', '', 'select which single task to train (Age or Gender); only utilized when multitask=False and a single-task model_type is chosen')
+
+tf.app.flags.DEFINE_string('pre_checkpoint_path', '', 'if specified, restore this pretrained model before beginning any training.')
+tf.app.flags.DEFINE_string('data_dir', './tfrecord/train_val_test_per_fold_agegender/test_fold_is_0', 'training age and gender directory.')
+tf.app.flags.DEFINE_string('model_dir', './models', 'where to store models during training')
+tf.app.flags.DEFINE_boolean('log_device_placement', False, 'Whether to log device placement.')
+
+tf.app.flags.DEFINE_integer('num_preprocess_threads', 4, 'Number of preprocessing threads')
+tf.app.flags.DEFINE_string('optim', 'Momentum', 'Optimizer')
+tf.app.flags.DEFINE_integer('image_size', 227, 'Image size')
+tf.app.flags.DEFINE_float('eta', 0.01, 'Learning rate')
+tf.app.flags.DEFINE_float('pdrop', 0., 'Dropout probability')
+tf.app.flags.DEFINE_integer('max_steps', 50000, 'Number of iterations')
+tf.app.flags.DEFINE_integer('steps_per_decay', 10000, 'Number of steps before learning rate decay')
+tf.app.flags.DEFINE_float('eta_decay_rate', 0.1, 'learning rate decay')
+tf.app.flags.DEFINE_integer('epochs', -1, 'Number of epochs')
+tf.app.flags.DEFINE_integer('batch_size', 32, 'Batch size')
+tf.app.flags.DEFINE_string('checkpoint', 'checkpoint', 'Checkpoint name')
+tf.app.flags.DEFINE_string('data_url', './dataset', "data_root")
+tf.app.flags.DEFINE_string('train_url', './log', "output_root")
+# inception_v3.ckpt
+tf.app.flags.DEFINE_string('pre_model', 'False', 'checkpoint file')
+
+FLAGS = tf.app.flags.FLAGS
+
+def before():
+    # Create the data directory inside the ModelArts container
+    data_dir = "/cache/dataset"
+    if not os.path.exists(data_dir):
+        os.makedirs(data_dir)
+    # Copy the dataset from OBS into the ModelArts container
+    mox.file.copy_parallel(FLAGS.data_url, data_dir)
+    FLAGS.data_dir = '/cache/dataset/'
+
+
+def exponential_staircase_decay(at_step=10000, decay_rate=0.1):
+
+    print('decay [%f] every [%d] steps' % (decay_rate, at_step))
+
+    def _decay(lr, global_step):
+        return tf.train.exponential_decay(lr, global_step, at_step, decay_rate, staircase=True)
+
+    return _decay
+
+def optimizer(optim, eta, loss_fn, at_step, decay_rate):
+
+    global_step = tf.Variable(0, trainable=False)
+    optz = optim
+    if optim == 'Adadelta':
+        optz = lambda lr: tf.train.AdadeltaOptimizer(lr, 0.95, 1e-6)
+        lr_decay_fn = None
+    elif optim == 'Momentum':
+        optz = lambda lr: tf.train.MomentumOptimizer(lr, MOM)
+        lr_decay_fn = exponential_staircase_decay(at_step, decay_rate)
+
+    return tf.contrib.layers.optimize_loss(loss_fn, global_step, eta, optz, clip_gradients=4., learning_rate_decay_fn=lr_decay_fn)
+
+def loss(logits, labels):
+
+    labels = tf.cast(labels, tf.int32)
+    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
+        logits=logits, labels=labels, name='cross_entropy_per_example')
+    cross_entropy_mean = tf.reduce_mean(cross_entropy, name='cross_entropy')
+    tf.add_to_collection('losses', cross_entropy_mean)
+    losses = tf.get_collection('losses')
+    regularization_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
+    total_loss = cross_entropy_mean + LAMBDA * sum(regularization_losses)
+    tf.summary.scalar('tl (raw)', total_loss)
+    loss_averages = tf.train.ExponentialMovingAverage(0.9, name='avg')
+    loss_averages_op = loss_averages.apply(losses + [total_loss])
+    for l in losses + [total_loss]:
+        tf.summary.scalar(l.op.name + ' (raw)', l)
+        tf.summary.scalar(l.op.name, loss_averages.average(l))
+    with tf.control_dependencies([loss_averages_op]):
+        total_loss = tf.identity(total_loss)
+
+    return total_loss
+
+def multiloss(agelogits, agelabels, genderlogits, genderlabels):
+
+    agelabels = tf.cast(agelabels, tf.int32)
+    genderlabels = tf.cast(genderlabels, tf.int32)
+
+    age_cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
+        logits=agelogits, labels=agelabels, name='cross_entropy_per_example_age')
+    gender_cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
+        logits=genderlogits, labels=genderlabels, name='cross_entropy_per_example_gender')
+
+    age_cross_entropy_mean = tf.reduce_mean(age_cross_entropy, name='cross_entropy_age')
+    gender_cross_entropy_mean = tf.reduce_mean(gender_cross_entropy, name='cross_entropy_gender')
+
+    tf.add_to_collection('agelosses', age_cross_entropy_mean)
+    tf.add_to_collection('genderlosses', gender_cross_entropy_mean)
+
+    agelosses = tf.get_collection('agelosses')
+    genderlosses = tf.get_collection('genderlosses')
+    regularization_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
+    totallosses = age_cross_entropy_mean + gender_cross_entropy_mean + LAMBDA * sum(regularization_losses)
+    tf.summary.scalar('tl total (raw)', totallosses)
+
+    loss_averages = tf.train.ExponentialMovingAverage(0.9, name='avg')
+    loss_averages_op = loss_averages.apply(agelosses + genderlosses + [totallosses])
+
+    for l in agelosses + genderlosses + [totallosses]:
+        tf.summary.scalar(l.op.name + ' (raw)', l)
+        tf.summary.scalar(l.op.name, loss_averages.average(l))
+    with tf.control_dependencies([loss_averages_op]):
+        totallosses = tf.identity(totallosses)
+
+    return agelosses, genderlosses, totallosses
+
+def main(argv=None):
+
+    if not os.path.exists(FLAGS.model_dir):
+        os.mkdir(FLAGS.model_dir)
+    folddirlist = FLAGS.data_dir.split(os.sep)
+    subdir = FLAGS.model_dir + os.sep + folddirlist[-2]
+    if not os.path.exists(subdir):
+        os.mkdir(subdir)
+    savemodeldir = subdir + os.sep + folddirlist[-1]
+    if not os.path.exists(savemodeldir):
+        os.mkdir(savemodeldir)
+
+    if FLAGS.multitask:
+
+        with tf.Graph().as_default():
+            model_fn = select_model(FLAGS.model_type)
+            # Open the metadata file and figure out nlabels, and size of epoch
+            input_file_age = os.path.join(FLAGS.data_dir, 'mdage.json')
+            input_file_gender = os.path.join(FLAGS.data_dir, 'mdgender.json')
+            with open(input_file_age, 'r') as fage:
+                mdage = json.load(fage)
+            with open(input_file_gender, 'r') as fgender:
+                mdgender = json.load(fgender)
+            with tf.device('/cpu:0'):
+                images_holder = tf.placeholder(tf.float32, shape=[FLAGS.batch_size, 227, 227, 3])
+                # agelabels_holder = tf.placeholder(tf.int64, shape=[FLAGS.batch_size], name='agelabels_holder')
+                # genderlabels_holder = tf.placeholder(tf.int64, shape=[FLAGS.batch_size], name='genderlabels_holder')
+
+                agelabels_holder = tf.placeholder(tf.int32, shape=[FLAGS.batch_size])
+                genderlabels_holder = tf.placeholder(tf.int32, shape=[FLAGS.batch_size])
+
+                agelogits, genderlogits = model_fn(mdage['nlabels'], images_holder, mdgender['nlabels'], images_holder, 1 - FLAGS.pdrop, True)
+                agelosses, genderlosses, totallosses = multiloss(agelogits, agelabels_holder, genderlogits, genderlabels_holder)
+                agegendertrain_op = optimizer(FLAGS.optim, FLAGS.eta, totallosses, FLAGS.steps_per_decay, FLAGS.eta_decay_rate)
+
+                saver = tf.train.Saver(tf.global_variables())
+                summary_op = tf.summary.merge_all()
+
+                # sess = tf.Session(config=npu_config_proto(config_proto=tf.ConfigProto(allow_soft_placement=True, log_device_placement=FLAGS.log_device_placement)))
+                config = tf.ConfigProto()
+                custom_op = config.graph_options.rewrite_options.custom_optimizers.add()
+                custom_op.name = "NpuOptimizer"
+                custom_op.parameter_map["use_off_line"].b = True
+                # custom_op.parameter_map["profiling_mode"].b = True
+                custom_op.parameter_map["precision_mode"].s = tf.compat.as_bytes("allow_mix_precision")
+                # custom_op.parameter_map['enable_data_pre_proc'].b = True
+                # custom_op.parameter_map["profiling_options"].s = tf.compat.as_bytes('{"output":"/cache/xingneng/","task_trace":"on","aicpu":"on","fp_point":"","bp_point":""}')
+                config.graph_options.rewrite_options.remapping = RewriterConfig.OFF  # disable the remap pass
+                config.graph_options.rewrite_options.memory_optimization = RewriterConfig.OFF
+
+                sess = tf.Session(config=npu_config_proto(config_proto=config))
+                tf.global_variables_initializer().run(session=sess)
+
+                # fine-tune dp_multitask and mobilenet_multitask
+                if FLAGS.pre_checkpoint_path:
+                    print('Trying to restore checkpoint from %s ' % FLAGS.pre_checkpoint_path)
+
+                    if FLAGS.model_type == 'LMTCNN':
+                        all_variables = tf.get_collection(tf.GraphKeys.VARIABLES, scope="multitaskdpcnn")
+                    elif FLAGS.model_type == 'mobilenet_multitask':
+                        all_variables = tf.get_collection(tf.GraphKeys.VARIABLES, scope="MobileNetmultitask")
+
+                    age_variables = tf.get_collection(tf.GraphKeys.VARIABLES, scope="ageoutput")
+                    gender_variables = tf.get_collection(tf.GraphKeys.VARIABLES, scope="genderoutput")
+                    all_variables.extend(age_variables)
+                    all_variables.extend(gender_variables)
+                    restorer = tf.train.Saver(all_variables)
+                    restorer.restore(sess, FLAGS.pre_checkpoint_path)
+
+                    print('%s: Pre-trained model restored from %s' % (datetime.now(), FLAGS.pre_checkpoint_path))
+
+                run_dir = '%s/%s-run-%d' % (savemodeldir, FLAGS.model_type, os.getpid())
+                checkpoint_path = '%s/%s' % (run_dir, FLAGS.checkpoint)
+                if not tf.gfile.Exists(run_dir):
+                    print('Creating %s' % run_dir)
+                    tf.gfile.MakeDirs(run_dir)
+
+                tf.train.write_graph(sess.graph_def, run_dir, 'agegendermodel.pb', as_text=True)
+                tf.train.start_queue_runners(sess=sess)
+                summary_writer = tf.summary.FileWriter(run_dir, sess.graph)
+
+                steps_per_train_epoch = int(mdage['train_counts'] / FLAGS.batch_size)
+                num_steps = FLAGS.max_steps if FLAGS.epochs < 1 else FLAGS.epochs * steps_per_train_epoch
+                print('Requested number of steps [%d]' % num_steps)
+
+                dataset = multiinputs(data_dir=os.path.join(FLAGS.data_dir, 'train.tfrecord'), batch_size=FLAGS.batch_size, train=True, num_epochs=FLAGS.epochs)
+                # iterator = dataset.make_one_shot_iterator()
+                iterator = dataset.make_initializable_iterator()
+                sess.run(iterator.initializer)
+                images0, agelabels0, genderlabels0 = iterator.get_next()
+                for step in range(num_steps):
+                    start_time = time.time()
+
+                    images, agelabels_1, genderlabels_1 = sess.run([images0, agelabels0, genderlabels0])
+                    # images1 = (np.reshape(images, (FLAGS.batch_size, 227, 227, 3))).astype(np.float32)
+                    agelabels_1 = (np.reshape(agelabels_1, (FLAGS.batch_size))).astype(np.int32)
+                    genderlabels_1 = (np.reshape(genderlabels_1, (FLAGS.batch_size))).astype(np.int32)
+
+                    _, totallossvalue, agelossvalue, genderlossvalue = sess.run([agegendertrain_op, totallosses, agelosses, genderlosses],
+                        feed_dict={images_holder: images, agelabels_holder: agelabels_1, genderlabels_holder: genderlabels_1})
+                    duration = time.time() - start_time
+
+                    assert not np.isnan(agelossvalue), 'Model diverged with ageloss = NaN'
+                    assert not np.isnan(genderlossvalue), 'Model diverged with genderloss = NaN'
+                    assert not np.isnan(totallossvalue), 'Model diverged with totallossvalue= NaN'
+
+                    if step % 10 == 0:
+                        num_examples_per_step = FLAGS.batch_size
+                        examples_per_sec = num_examples_per_step / duration
+                        sec_per_batch = float(duration)
+
+                        format_str = ('%s: step %d, ageloss= %.3f, genderloss= %.3f, totalloss= %.3f (%.1f examples/sec; %.3f sec/step)')
+                        print(format_str % (datetime.now(), step, agelossvalue[0], genderlossvalue[0], totallossvalue, examples_per_sec, sec_per_batch))
+
+                    # summaries evaluated every 100 steps
+                    if step % 100 == 0:
+                        summary_str = sess.run(summary_op, feed_dict={images_holder: images, agelabels_holder: agelabels_1, genderlabels_holder: genderlabels_1})
+                        summary_writer.add_summary(summary_str, step)
+
+                    if step % 1000 == 0 or (step+1) == num_steps:
+                        saver.save(sess, checkpoint_path, global_step=step)
+                        mox.file.copy_parallel(FLAGS.model_dir, FLAGS.train_url)
+
+
+if __name__ == '__main__':
+    os.environ['ASCEND_GLOBAL_LOG_LEVEL'] = "0"
+    (npu_sess, npu_shutdown) = init_resource()
+    before()
+    try:
+        tf.app.run()
+    finally:
+        print('''download log''')
+
+        shutdown_resource(npu_sess, npu_shutdown)
+        close_session(npu_sess)
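The `exponential_staircase_decay` schedule in the diff drops the learning rate by `eta_decay_rate` every `steps_per_decay` steps (`tf.train.exponential_decay` with `staircase=True`). A minimal NumPy-free sketch of the same staircase rule, `lr * decay_rate ** floor(step / at_step)`, independent of TensorFlow (function name `staircase_lr` is illustrative, not from the source):

```python
import math

def staircase_lr(base_lr, step, at_step=10000, decay_rate=0.1):
    """Staircase decay: scale base_lr by decay_rate once per completed at_step window."""
    return base_lr * decay_rate ** math.floor(step / at_step)

# With the script's defaults (eta=0.01, steps_per_decay=10000, eta_decay_rate=0.1):
for step in (0, 9999, 10000, 25000):
    print(step, round(staircase_lr(0.01, step), 6))
# 0 0.01
# 9999 0.01
# 10000 0.001
# 25000 0.0001
```

The rate is constant within each 10000-step window and drops by a factor of 10 at the window boundary, matching the `steps_per_decay`/`eta_decay_rate` flags above.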
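The `multiloss` function combines the two task heads as: mean sparse softmax cross-entropy for age, plus the same for gender, plus `LAMBDA` times the sum of regularization losses. A small NumPy sketch of that combination (the logits, labels, and regularization values here are made-up illustrative numbers, not the model's real tensors):

```python
import numpy as np

def sparse_softmax_xent(logits, labels):
    """Mean sparse softmax cross-entropy over a batch, computed with the
    usual max-shift for numerical stability (mirrors the TF op's math)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

LAMBDA = 0.01  # same weight-decay multiplier as in the script
age_logits = np.array([[2.0, 0.5, -1.0], [0.1, 1.2, 0.3]])  # batch of 2, 3 age bins
gender_logits = np.array([[1.5, -0.5], [-0.2, 0.9]])        # batch of 2, 2 gender classes
age_labels = np.array([0, 1])
gender_labels = np.array([0, 1])
reg_losses = [0.3, 0.2]  # stand-ins for the graph's regularization losses

total = (sparse_softmax_xent(age_logits, age_labels)
         + sparse_softmax_xent(gender_logits, gender_labels)
         + LAMBDA * sum(reg_losses))
print(round(total, 4))
```

Because both cross-entropy terms enter the total with equal weight, gradients from the shared trunk are driven by the sum of the two tasks, which is the joint-training idea behind the multitask model.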