From 57642dd281899c22fccf7740e064eaf12ea91619 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E5=BC=A0=E6=B4=8B=E6=B4=8B?= <584244991@qq.com>
Date: Thu, 11 Aug 2022 03:34:41 +0000
Subject: [PATCH] update
TensorFlow/contrib/nlp/quantum_sample_learning_ID2036_for_Tensorflow/README.md.
---
.../README.md | 331 ++++++++----------
1 file changed, 139 insertions(+), 192 deletions(-)
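
Note (placed before the diff body, so it does not change what the patch applies): the overview added to the README describes sampling by feeding each sampled bit back into the model as the next input. A minimal sketch of that loop follows, assuming a hypothetical `next_bit_prob` callable that stands in for the trained LSTM and a hypothetical start-token id of 2. Neither name is taken from `run_lm.py`.

```python
import numpy as np

START_TOKEN = 2  # hypothetical start-of-sequence id; real bits are 0 and 1


def sample_bitstring(next_bit_prob, num_qubits, rng):
    """Draw one bitstring by feeding each sampled bit back as the next input."""
    sequence = [START_TOKEN]
    for _ in range(num_qubits):
        p_one = next_bit_prob(sequence)   # P(next bit = 1 | bits so far)
        bit = int(rng.random() < p_one)   # sample from the logistic (Bernoulli) output
        sequence.append(bit)
    return sequence[1:]                   # drop the start token


def toy_model(sequence):
    """Stand-in for the trained LSTM: tends to repeat the previous bit."""
    if sequence[-1] == START_TOKEN:
        return 0.5
    return 0.9 if sequence[-1] == 1 else 0.1


rng = np.random.default_rng(0)
samples = [sample_bitstring(toy_model, 12, rng) for _ in range(5)]
```

Here `12` mirrors the `num_qubits` default listed in the README; replacing `toy_model` with a call into a trained network would leave the loop unchanged.
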
diff --git a/TensorFlow/contrib/nlp/quantum_sample_learning_ID2036_for_Tensorflow/README.md b/TensorFlow/contrib/nlp/quantum_sample_learning_ID2036_for_Tensorflow/README.md
index 067f718e2..c5eea9eb1 100644
--- a/TensorFlow/contrib/nlp/quantum_sample_learning_ID2036_for_Tensorflow/README.md
+++ b/TensorFlow/contrib/nlp/quantum_sample_learning_ID2036_for_Tensorflow/README.md
@@ -1,180 +1,170 @@
+- [Basic Information](#基本信息.md)
+- [Overview](#概述.md)
+- [Training Environment Preparation](#训练环境准备.md)
+- [Quick Start](#快速上手.md)
+- [Transfer Learning Guide](#迁移学习指导.md)
+- [Advanced Reference](#高级参考.md)
基本信息
-发布者(Publisher):Huawei
+**Publisher: Huawei**
-版本(Version):1.1
+**Application Domain: Natural Language Processing**
-修改时间(Modified) :2022.7.19
+**Version: 1.1**
-大小(Size):74M
+**Modified: 2022.8.11**
-框架(Framework):TensorFlow 1.15.0
+**Size: 104**
-模型格式(Model Format):ckpt
+**Framework: TensorFlow_1.15**
-精度(Precision):Mixed
+**Model Format: ckpt**
-处理器(Processor):昇腾910
+**Precision: FP32**
-应用级别(Categories):Official
+**Processor: Ascend 910**
+**Categories: Official**
+**Description: Training code for the quantum_sample_learning network, based on the TensorFlow framework**
概述
-Here we use byte as the unit of our model for quantum sample learning. In our case, the inputs to the language model are samples of bitstrings
-and the model is trained to predict the next bit given the bits observed so far,
-starting with a start of sequence token. We use a standard LSTM language model
-with a logistic output layer. To sample from the model, we input the start of
-sequence token, sample from the output distribution then input the result as
-the next timestep. This is repeated until the required number of samples is
-obtained.
+## Summary
-- 参考论文:[[2010.11983\] Learnability and Complexity of Quantum Samples (arxiv.org)](https://arxiv.org/abs/2010.11983)
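+Here we use the byte as the unit of our model for quantum sample learning. The inputs to the language model are samples of bitstrings, and the model is trained to predict the next bit given the bits observed so far, beginning with a start-of-sequence token. We use a standard LSTM language model with a logistic output layer. To sample from the model, we input the start-of-sequence token, sample from the output distribution, and then feed the result back in as the next timestep. This is repeated until the required number of samples is obtained.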
-- 参考实现:https://github.com/google-research/google-research/tree/master/quantum_sample_learning
-
-- 适配昇腾 AI 处理器的实现:
-
- [TensorFlow/contrib/nlp/quantum_sample_learning_ID2036_for_Tensorflow · Ascend/ModelZoo-TensorFlow - 码云 - 开源中国 (gitee.com)](https://gitee.com/ascend/ModelZoo-TensorFlow/tree/master/TensorFlow/contrib/nlp/quantum_sample_learning_ID2036_for_Tensorflow)
-
-
-- 通过Git获取对应commit\_id的代码方法如下:
+- Reference paper:
- ```
- git clone {repository_url} # 克隆仓库的代码
- cd {repository_name} # 切换到模型的代码仓目录
- git checkout {branch} # 切换到对应分支
- git reset --hard {commit_id} # 代码设置到对应的commit_id
- cd {code_path} # 切换到模型代码所在路径,若仓库下只有该模型,则无需切换
- ```
+ [Learnability and Complexity of Quantum Samples](https://arxiv.org/abs/2010.11983)
+- Reference implementation:
+ https://github.com/google-research/google-research/tree/master/quantum_sample_learning
-## 默认配置
-
-- 训练超参
-
- - epoch:20
- - batch_size:64
- - learning_rate:0.001
- - num_qubits:12
- - rnn_units:256
-
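+- Implementation adapted for the Ascend AI processor:
+ https://gitee.com/ascend/ModelZoo-TensorFlow/tree/master/TensorFlow/contrib/nlp/quantum_sample_learning_ID2036_for_Tensorflow
+- To fetch the code at a specific commit\_id with Git:
+
+ git clone {repository_url} # clone the repository
+ cd {repository_name} # switch to the model's repository directory
+ git checkout {branch} # switch to the target branch
+ git reset --hard {commit_id} # reset the code to the target commit_id
+ cd {code_path} # switch to the model code path; not needed if the repository contains only this model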
+
-训练环境准备
-
-1. 硬件环境准备请参见各硬件产品文档"[驱动和固件安装升级指南]( https://support.huawei.com/enterprise/zh/category/ai-computing-platform-pid-1557196528909)"。需要在硬件设备上安装与CANN版本配套的固件与驱动。
-
-2. requirements
-
- ```
- python==3.6
- absl-py
- cirq==0.8.0
- numpy==1.16.4
- scipy==1.2.1
- tensorflow==1.15
-
- Ascend: 1*Ascend 910
- CPU: 24vCPUs 96GiB
- ```
-
-
-
-## 快速上手
-
-- 数据集准备
- - 模型训练使用q12c0数据集
-
-
-
-## 模型训练
-
-- 单击“立即下载”,并选择合适的下载方式下载源码包。
-
-- 启动训练之前,首先要配置程序运行相关环境变量。
+## Default Configuration
-- 单卡训练
+- Training hyperparameters (single device):
+ - batch_size: 64
+ - epochs: 20
+ - learning_rate: 0.001
+ - checkpoint_dir: ./checkpoint
+ - probabilities_path: ./data/q12c0.txt
+ - eval_samples: 500000
+ - training_eval_samples: 4000
+ - train_size: 500000
- 1. 配置训练参数。
- 在`run_lm.py`中,配置checkpoint保存路径,请用户根据实际路径配置,参数如下所示:
+## Supported Features
- ```
- flags.DEFINE_string('checkpoint_dir', './checkpoint',
- 'Where to save checkpoints')
- ```
+| Feature | Supported |
+| ---------- | -------- |
+| Distributed training | Yes |
+| Mixed precision | No |
+| Data parallelism | Yes |
- 2. 启动训练。
- ```
- python run_lm.py
- ```
+## Mixed-Precision Training
- 3. 在`evaluate.py`中配置进行验证的checkpoint路径
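+The Ascend 910 AI processor provides automatic mixed precision. Following a built-in optimization policy, it automatically lowers some float32 operators across the network to float16, which improves system performance and reduces memory usage at the cost of only a small loss in accuracy.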
- ```
- flags.DEFINE_string('checkpoint_dir', './checkpoint',
- 'Where to save checkpoints')
- ```
+## Enabling Mixed Precision
- 4. 进行验证
+The launch script accepts the following parameters:
- ```
- python evaluate.py
- ```
+```
+ ./train_full_1p.sh --help
+
+parameter explain:
+ --precision_mode #precision mode(allow_fp32_to_fp16/force_fp16/must_keep_origin_dtype/allow_mix_precision)
+ --data_path # dataset of training
+ --output_path # output of training
+ --train_steps # max_step for training
+ --train_epochs # max_epoch for training
+ --batch_size # batch size
+ -h/--help show help message
+```
+Example of the mixed-precision setting:
+ ```
+ precision_mode="allow_mix_precision"
-## 训练结果
+ ```
-论文
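+Training Environment Preparation
+- To prepare the hardware and runtime environments, see the [CANN Software Installation Guide](https://support.huawei.com/enterprise/zh/ascend-computing/cann-pid-251168373?category=installation-update).
+- Run the following command to install the dependencies.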
```
-Linear Fidelity: 0.982864
-Logistic Fidelity: 0.979632
-theoretical_linear_xeb: 1.018109
-theoretical_logistic_xeb: 1.006301
-linear_xeb: 0.982864
-logistic_xeb: 0.979632
-kl_div: 0.021964
+pip3 install -r requirements.txt
```
+Note: the dependency file requirements.txt is located in the model's root directory.
-GPU
+Quick Start
-```
-Linear Fidelity: 1.006005
-Logistic Fidelity: 1.001702
-theoretical_linear_xeb: 1.015967
-theoretical_logistic_xeb: 1.005840
-linear_xeb: 1.006005
-logistic_xeb: 1.001702
-kl_div: 0.004158
-```
+## Dataset Preparation
-NPU
+1. Dataset link: https://pan.baidu.com/s/1WAl4C_EnQi4wp6l684yI9w (extraction code: unp5)
-```
-Linear Fidelity: 1.058425
-Logistic Fidelity: 1.029696
-theoretical_linear_xeb: 1.020069
-theoretical_logistic_xeb: 1.006884
-linear_xeb: 1.058425
-logistic_xeb: 1.029696
-kl_div: 0.004556
-```
+2. For the models and datasets used in quantum_sample_learning training, see the reference implementation linked in the overview.
-+ 精度对比
- | | 论文 | GPU | NPU |
- | ------------ | -------- | -------- | -------- |
- | linear_xeb | 0.982864 | 1.006005 | 1.058425 |
- | logistic_xeb | 0.979632 | 1.001702 | 1.029696 |
+## Model Training
-
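+- Click "Download Now" and choose a suitable download method to get the source package.
+- Start training.
+
+ - Before starting training, configure the environment variables that the program needs.
+
+ For the environment variable settings, see:
+
+ [Environment variable settings for the Ascend 910 training platform](https://gitee.com/ascend/modelzoo/wikis/Ascend%20910%E8%AE%AD%E7%BB%83%E5%B9%B3%E5%8F%B0%E7%8E%AF%E5%A2%83%E5%8F%98%E9%87%8F%E8%AE%BE%E7%BD%AE?sort_id=3148819)
+
+ - Single-device training
+
+
+ 1. Configure the training parameters.
+
+ In `run_lm.py`, set the checkpoint save path to match your environment. The parameter is defined as follows: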
+
+ ```
+ flags.DEFINE_string('checkpoint_dir', './checkpoint',
+ 'Where to save checkpoints')
+ ```
+
+ 2. Start training.
+
+ ```
+ python run_lm.py
+ ```
+
+ 3. In `evaluate.py`, set the path of the checkpoint to evaluate.
+
+ ```
+ flags.DEFINE_string('checkpoint_dir', './checkpoint',
+ 'Where to save checkpoints')
+ ```
+
+ 4. Run the evaluation.
+
+ ```
+ python evaluate.py
+
+ ```
高级参考
@@ -194,71 +184,28 @@ Quantum Sample Learning
├─evaluate.py 模型评估程序入口
```
-
-
-## 脚本参数
+## Script Parameters
```
-flags.DEFINE_string('data_url', './dataset',
- 'Where to save datasets')
-flags.DEFINE_string('train_url', './output',
- 'Where to save Output')
-flags.DEFINE_string('checkpoint_dir', './checkpoint',
- 'Where to save checkpoints')
-flags.DEFINE_string('save_data', '', 'Where to generate data (optional).')
-flags.DEFINE_string('eval_sample_file', '',
- 'A file of samples to evaluate (optional).')
-flags.DEFINE_boolean(
- 'eval_has_separator', False,
- 'Set if the numbers in the samples are separated by spaces.')
-flags.DEFINE_integer('epochs', 20, 'Number of epochs to train.')
-flags.DEFINE_integer('eval_samples', 500000,
- 'Number of samples for evaluation.')
-flags.DEFINE_integer('training_eval_samples', 4000,
- 'Number of samples for evaluation during training.')
-flags.DEFINE_integer('num_qubits', 12, 'Number of qubits to be learnt')
-flags.DEFINE_integer('rnn_units', 256, 'Number of RNN hidden units.')
-flags.DEFINE_integer(
- 'num_moments', -2,
- 'If > 12, then use training data generated with this number of moments.')
-flags.DEFINE_integer('batch_size', 64, 'Batch size')
-flags.DEFINE_float('learning_rate', 0.001, 'Learning rate')
-flags.DEFINE_boolean('use_adamax', False,
- 'Use the Adamax optimizer.')
-flags.DEFINE_boolean('eval_during_training', False,
- 'Perform eval while training.')
-flags.DEFINE_float('kl_smoothing', 1, 'The KL smoothing factor.')
-flags.DEFINE_boolean(
- 'save_test_counts', False, 'Whether to save test counts distribution.')
-flags.DEFINE_string(
- 'probabilities_path', './data/q12c0.txt',
- 'The path of the theoretical distribution')
-flags.DEFINE_string(
- 'experimental_bitstrings_path',
- 'quantum_sample_learning/data/experimental_samples_q12c0d14.txt',
- 'The path of the experiment measurements')
-flags.DEFINE_integer('train_size', 500000, 'Training set size to generate')
-flags.DEFINE_boolean('use_theoretical_distribution', True,
- 'Use the theoretical bitstring distribution.')
-flags.DEFINE_integer(
- 'subset_parity_size', 0,
- 'size of the subset for reordering the bit strings according to the '
- 'parity defined by the bit string of length specified here')
-flags.DEFINE_boolean('random_subset', False,
- 'Randomly choose which subset of bits to '
- 'evaluate the subset parity on.')
-flags.DEFINE_boolean('porter_thomas', False,
- 'Sample from Poter-Thomas distribution')
+--checkpoint_dir, './checkpoint' # checkpoint save path
+--save_data
+--eval_sample_file
+--eval_has_separator, False
+--epochs, 20
+--eval_samples, 500000
+--training_eval_samples, 4000
+--num_qubits, 12 # number of qubits to be learnt
+--rnn_units, 256 # number of RNN hidden units
+--batch_size, 64 # batch size
+--learning_rate, 0.001
+--save_test_counts, False
+--probabilities_path, './data/q12c0.txt' # dataset path
+--experimental_bitstrings_path, 'quantum_sample_learning/data/experimental_samples_q12c0d14.txt'
+--train_size, 500000
+--random_subset, False
+--porter_thomas, False
```
+## Training Process
-
-## 下载链接
-
-### 数据集下载
-
-链接:https://pan.baidu.com/s/1WAl4C_EnQi4wp6l684yI9w 提取码:unp5
-
-### checkpoint文件
-
-链接:https://pan.baidu.com/s/1wckJSk7sNv0HvFuzJKWSdA 提取码:s4yz
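+Start single-device or multi-device training with the training commands from the model training section. Single-device and 8-device training are supported through separate scripts. Models are saved under ${cur_path}/output/$ASCEND_DEVICE_ID, which holds the training logs and checkpoint files. For single-device training, for example, the loss is written to ${cur_path}/output/${ASCEND_DEVICE_ID}/train_${ASCEND_DEVICE_ID}.log.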
\ No newline at end of file
--
Gitee