From 57642dd281899c22fccf7740e064eaf12ea91619 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E5=BC=A0=E6=B4=8B=E6=B4=8B?= <584244991@qq.com>
Date: Thu, 11 Aug 2022 03:34:41 +0000
Subject: [PATCH] update TensorFlow/contrib/nlp/quantum_sample_learning_ID2036_for_Tensorflow/README.md.

---
 .../README.md | 331 ++++++++----------
 1 file changed, 139 insertions(+), 192 deletions(-)

diff --git a/TensorFlow/contrib/nlp/quantum_sample_learning_ID2036_for_Tensorflow/README.md b/TensorFlow/contrib/nlp/quantum_sample_learning_ID2036_for_Tensorflow/README.md
index 067f718e2..c5eea9eb1 100644
--- a/TensorFlow/contrib/nlp/quantum_sample_learning_ID2036_for_Tensorflow/README.md
+++ b/TensorFlow/contrib/nlp/quantum_sample_learning_ID2036_for_Tensorflow/README.md
@@ -1,180 +1,170 @@
+- [Basic Information](#基本信息.md)
+- [Overview](#概述.md)
+- [Training Environment Preparation](#训练环境准备.md)
+- [Quick Start](#快速上手.md)
+- [Transfer Learning Guide](#迁移学习指导.md)
+- [Advanced Reference](#高级参考.md)

Basic Information

-Publisher: Huawei
+**Publisher: Huawei**
-Version: 1.1
+**Application Domain: Natural Language Processing**
-Modified: 2022.7.19
+**Version: 1.1**
-Size: 74M
+**Modified: 2022.8.11**
-Framework: TensorFlow 1.15.0
+**Size: 104**
-Model Format: ckpt
+**Framework: TensorFlow_1.15**
-Precision: Mixed
+**Model Format: ckpt**
-Processor: Ascend 910
+**Precision: FP32**
-Categories: Official
+**Processor: Ascend 910**
+**Categories: Official**
+**Description: Training code for the quantum_sample_learning network, based on the TensorFlow framework**

Overview

-Here we use byte as the unit of our model for quantum sample learning. In our case, the inputs to the language model are samples of bitstrings
-and the model is trained to predict the next bit given the bits observed so far,
-starting with a start of sequence token. We use a standard LSTM language model
-with a logistic output layer. To sample from the model, we input the start of
-sequence token, sample from the output distribution then input the result as
-the next timestep. This is repeated until the required number of samples is
-obtained.
+## Brief Description

-- Reference paper: [[2010.11983\] Learnability and Complexity of Quantum Samples (arxiv.org)](https://arxiv.org/abs/2010.11983)
+Here we use bytes as the unit of our model for quantum sample learning. In our case, the inputs to the language model are samples of bitstrings, and the model is trained to predict the next bit given the bits observed so far, starting from a start-of-sequence token. We use a standard LSTM language model with a logistic output layer. To sample from the model, we input the start-of-sequence token, sample from the output distribution, then feed the result back in as the next timestep. This is repeated until the required number of samples is obtained.

-- Reference implementation: https://github.com/google-research/google-research/tree/master/quantum_sample_learning
-
-- Implementation adapted for the Ascend AI processor:
-
-  [TensorFlow/contrib/nlp/quantum_sample_learning_ID2036_for_Tensorflow · Ascend/ModelZoo-TensorFlow - Gitee - OSCHINA (gitee.com)](https://gitee.com/ascend/ModelZoo-TensorFlow/tree/master/TensorFlow/contrib/nlp/quantum_sample_learning_ID2036_for_Tensorflow)
-
-
-- To obtain the code at a given commit\_id with Git:
+- Reference paper:

-  ```
-  git clone {repository_url}    # clone the repository
-  cd {repository_name}    # enter the model's repository directory
-  git checkout {branch}    # switch to the corresponding branch
-  git reset --hard {commit_id}    # reset the code to the corresponding commit_id
-  cd {code_path}    # enter the model code path; not needed if the repository contains only this model
-  ```
+    [Learnability and Complexity of Quantum Samples](https://arxiv.org/abs/2010.11983)

+- Reference implementation:

+    https://github.com/google-research/google-research/tree/master/quantum_sample_learning

-## Default Configuration
-
-- Training hyperparameters
-
-  - epoch:20
-  - batch_size:64
-  - learning_rate:0.001
-  - num_qubits:12
-  - rnn_units:256
-
+- Implementation adapted for the Ascend AI processor:

+    https://gitee.com/ascend/ModelZoo-TensorFlow/edit/master/TensorFlow/contrib/nlp/quantum_sample_learning_ID2036_for_Tensorflow

+- To obtain the code at a given commit\_id with Git:
+
+        git clone {repository_url}    # clone the repository
+        cd {repository_name}    # enter the model's repository directory
+        git checkout {branch}    # switch to the corresponding branch
+        git reset --hard {commit_id}    # reset the code to the corresponding commit_id
+        cd {code_path}    # enter the model code path; not needed if the repository contains only this model
+
-
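
The sampling loop described in the overview (feed the start token, sample a bit from the output distribution, feed that bit back in, repeat) can be sketched in plain Python. This is only an illustration under stated assumptions: `next_bit_prob` is a hypothetical stand-in for the trained LSTM, `toy_next_bit_prob` is an invented toy rule, and none of these names come from `run_lm.py`.

```python
import random


def sample_bitstring(next_bit_prob, num_bits, rng):
    """Draw one bitstring, one bit at a time, from an autoregressive model.

    next_bit_prob(prefix) is a hypothetical stand-in for the trained LSTM:
    given the list of bits sampled so far, it returns P(next bit = 1).
    An empty prefix plays the role of the start-of-sequence token.
    """
    bits = []
    for _ in range(num_bits):
        p_one = next_bit_prob(bits)
        # Sample from the Bernoulli output, then feed the result back in.
        bits.append(1 if rng.random() < p_one else 0)
    return bits


def toy_next_bit_prob(prefix):
    # Invented toy rule: the first bit is fair, and each later bit
    # repeats the previous bit with probability 0.9.
    if not prefix:
        return 0.5
    return 0.9 if prefix[-1] == 1 else 0.1


rng = random.Random(0)
# 12 bits per sample mirrors num_qubits=12 in the default configuration.
samples = [sample_bitstring(toy_next_bit_prob, 12, rng) for _ in range(1000)]
print(samples[0])
```

In the real model the probability for each step would come from the logistic output layer, and a whole batch of bitstrings would normally be sampled in parallel rather than one at a time.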

Training Environment Preparation

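-
-1. For hardware environment preparation, see the hardware product documentation "[Driver and Firmware Installation and Upgrade Guide](https://support.huawei.com/enterprise/zh/category/ai-computing-platform-pid-1557196528909)". Firmware and drivers matching the CANN version must be installed on the hardware device.
-
-2. requirements
-
-  ```
-  python==3.6
-  absl-py
-  cirq==0.8.0
-  numpy==1.16.4
-  scipy==1.2.1
-  tensorflow==1.15
-
-  Ascend: 1*Ascend 910
-  CPU: 24vCPUs 96GiB
-  ```
-
-
-
-## Quick Start
-
-- Dataset preparation
-
-  Model training uses the q12c0 dataset
-
-
-
-## Model Training
-
-- Click "Download Now" and choose a suitable method to download the source package.
-
-- Before starting training, configure the environment variables the program needs to run.
+## Default Configuration

-- Single-device training
+- Training hyperparameters (single device):
+    - Batch size: 64
+    - epoch:20
+    - learning_rate:0.001
+    - checkpoint_dir
+    - probabilities_path
+    - eval_samples=500000
+    - training_eval_samples=4000
+    - train_size=500000

-  1. Configure the training parameters.

-    In `run_lm.py`, set the checkpoint save path to match your actual path. The parameter is shown below:
+## Supported Features

-    ```
-    flags.DEFINE_string('checkpoint_dir', './checkpoint',
-    'Where to save checkpoints')
-    ```
+| Feature | Supported |
+| ---------- | -------- |
+| Distributed training | Yes |
+| Mixed precision | No |
+| Data parallelism | Yes |

-  2. Start training.

-    ```
-    python run_lm.py
-    ```
+## Mixed Precision Training

-  3. In `evaluate.py`, set the path of the checkpoint to validate.
+The Ascend 910 AI processor provides automatic mixed precision. For float32 operators across the network, it follows built-in optimization policies to automatically lower some float32 operators to float16, improving system performance and reducing memory use with very little loss of accuracy.

-    ```
-    flags.DEFINE_string('checkpoint_dir', './checkpoint',
-    'Where to save checkpoints')
-    ```
+## Enabling Mixed Precision

-  4. Run validation.
+In the launch script:

-    ```
-    python evaluate.py
-    ```
+```
+ ./train_full_1p.sh --help
+
+parameter explain:
+ --precision_mode #precision mode(allow_fp32_to_fp16/force_fp16/must_keep_origin_dtype/allow_mix_precision)
+ --data_path # dataset of training
+ --output_path # output of training
+ --train_steps # max_step for training
+ --train_epochs # max_epoch for training
+ --batch_size # batch size
+ -h/--help show help message
+```


+Example of the mixed-precision setting:

+ ```
+ precision_mode="allow_mix_precision"

-## Training Results
+ ```

-Paper
+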

Training Environment Preparation

+- For hardware and runtime environment preparation, see the [CANN Software Installation Guide](https://support.huawei.com/enterprise/zh/ascend-computing/cann-pid-251168373?category=installation-update).
+- Run the following command to install the dependencies.

 ```
-Linear Fidelity: 0.982864
-Logistic Fidelity: 0.979632
-theoretical_linear_xeb: 1.018109
-theoretical_logistic_xeb: 1.006301
-linear_xeb: 0.982864
-logistic_xeb: 0.979632
-kl_div: 0.021964
+pip3 install -r requirements.txt
 ```

+Note: the dependency file requirements.txt is located in the model's root directory.

-GPU

+

Quick Start

-```
-Linear Fidelity: 1.006005
-Logistic Fidelity: 1.001702
-theoretical_linear_xeb: 1.015967
-theoretical_logistic_xeb: 1.005840
-linear_xeb: 1.006005
-logistic_xeb: 1.001702
-kl_div: 0.004158
-```
+## Dataset Preparation

-NPU
+1. Dataset link: https://pan.baidu.com/s/1WAl4C_EnQi4wp6l684yI9w (extraction code: unp5)

-```
-Linear Fidelity: 1.058425
-Logistic Fidelity: 1.029696
-theoretical_linear_xeb: 1.020069
-theoretical_logistic_xeb: 1.006884
-linear_xeb: 1.058425
-logistic_xeb: 1.029696
-kl_div: 0.004556
-```
+2. For the model trained by quantum_sample_learning and its dataset, see "Brief Description -> Reference implementation".

-+ Accuracy comparison

-  |              | Paper    | GPU      | NPU      |
-  | ------------ | -------- | -------- | -------- |
-  | linear_xeb   | 0.982864 | 1.006005 | 1.058425 |
-  | logistic_xeb | 0.979632 | 1.001702 | 1.029696 |
+## Model Training

-
+- Click "Download Now" and choose a suitable method to download the source package.
+- Start training.
+
+  - Before starting training, configure the environment variables the program needs to run.
+
+    For environment variable settings, see:
+
+    [Environment variable settings for the Ascend 910 training platform](https://gitee.com/ascend/modelzoo/wikis/Ascend%20910%E8%AE%AD%E7%BB%83%E5%B9%B3%E5%8F%B0%E7%8E%AF%E5%A2%83%E5%8F%98%E9%87%8F%E8%AE%BE%E7%BD%AE?sort_id=3148819)
+
+  - Single-device training
+
+
+    1. Configure the training parameters.
+
+      In `run_lm.py`, set the checkpoint save path to match your actual path. The parameter is shown below:
+
+      ```
+      flags.DEFINE_string('checkpoint_dir', './checkpoint',
+      'Where to save checkpoints')
+      ```
+
+    2. Start training.
+
+      ```
+      python run_lm.py
+      ```
+
+    3. In `evaluate.py`, set the path of the checkpoint to validate.
+
+      ```
+      flags.DEFINE_string('checkpoint_dir', './checkpoint',
+      'Where to save checkpoints')
+      ```
+
+    4. Run validation.
+
+      ```
+      python evaluate.py
+
+      ```
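
The `linear_xeb` and `logistic_xeb` values in the results that this patch removes are cross-entropy benchmarking scores. The sketch below uses the standard definitions from the random circuit sampling literature, where `sample_probs` holds the theoretical probability of each sampled bitstring. It is an assumption that `evaluate.py` computes its metrics the same way, and the function names here are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def linear_xeb(sample_probs, num_qubits):
    """Linear XEB: 2^n * mean(p(x_i)) - 1.

    sample_probs holds the theoretical probability p(x_i) of each sampled
    bitstring x_i. Samples from the ideal distribution score near 1 for a
    Porter-Thomas-like circuit; uniformly random bitstrings score near 0.
    """
    mean_p = sum(sample_probs) / len(sample_probs)
    return (2 ** num_qubits) * mean_p - 1.0


def logistic_xeb(sample_probs, num_qubits):
    """Logarithmic XEB: log(2^n) + gamma + mean(log p(x_i))."""
    mean_log_p = sum(math.log(p) for p in sample_probs) / len(sample_probs)
    return num_qubits * math.log(2) + EULER_GAMMA + mean_log_p


# Toy check with 12 qubits: if every sample had exactly the uniform
# probability 1/2^12, the linear score would be 0.
uniform = [1.0 / 4096] * 10
print(linear_xeb(uniform, 12))
```

Both scores compare the sampled bitstrings against the theoretical distribution (the file passed as `probabilities_path`), which is why values close to 1 in the paper, GPU, and NPU runs indicate that the learned model reproduces the target distribution well.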

Advanced Reference

@@ -194,71 +184,28 @@ Quantum Sample Learning
 ├─evaluate.py    Model evaluation entry point
 ```

-
-
-## Script Parameters
+## Script Parameters

 ```
-flags.DEFINE_string('data_url', './dataset',
-                    'Where to save datasets')
-flags.DEFINE_string('train_url', './output',
-                    'Where to save Output')
-flags.DEFINE_string('checkpoint_dir', './checkpoint',
-                    'Where to save checkpoints')
-flags.DEFINE_string('save_data', '', 'Where to generate data (optional).')
-flags.DEFINE_string('eval_sample_file', '',
-                    'A file of samples to evaluate (optional).')
-flags.DEFINE_boolean(
-    'eval_has_separator', False,
-    'Set if the numbers in the samples are separated by spaces.')
-flags.DEFINE_integer('epochs', 20, 'Number of epochs to train.')
-flags.DEFINE_integer('eval_samples', 500000,
-                     'Number of samples for evaluation.')
-flags.DEFINE_integer('training_eval_samples', 4000,
-                     'Number of samples for evaluation during training.')
-flags.DEFINE_integer('num_qubits', 12, 'Number of qubits to be learnt')
-flags.DEFINE_integer('rnn_units', 256, 'Number of RNN hidden units.')
-flags.DEFINE_integer(
-    'num_moments', -2,
-    'If > 12, then use training data generated with this number of moments.')
-flags.DEFINE_integer('batch_size', 64, 'Batch size')
-flags.DEFINE_float('learning_rate', 0.001, 'Learning rate')
-flags.DEFINE_boolean('use_adamax', False,
-                     'Use the Adamax optimizer.')
-flags.DEFINE_boolean('eval_during_training', False,
-                     'Perform eval while training.')
-flags.DEFINE_float('kl_smoothing', 1, 'The KL smoothing factor.')
-flags.DEFINE_boolean(
-    'save_test_counts', False, 'Whether to save test counts distribution.')
-flags.DEFINE_string(
-    'probabilities_path', './data/q12c0.txt',
-    'The path of the theoretical distribution')
-flags.DEFINE_string(
-    'experimental_bitstrings_path',
-    'quantum_sample_learning/data/experimental_samples_q12c0d14.txt',
-    'The path of the experiment measurements')
-flags.DEFINE_integer('train_size', 500000, 'Training set size to generate')
-flags.DEFINE_boolean('use_theoretical_distribution', True,
-                     'Use the theoretical bitstring distribution.')
-flags.DEFINE_integer(
-    'subset_parity_size', 0,
-    'size of the subset for reordering the bit strings according to the '
-    'parity defined by the bit string of length specified here')
-flags.DEFINE_boolean('random_subset', False,
-                     'Randomly choose which subset of bits to '
-                     'evaluate the subset parity on.')
-flags.DEFINE_boolean('porter_thomas', False,
-                     'Sample from Poter-Thomas distribution')
+--checkpoint_dir, './checkpoint'    # model save path
+--save_data
+--eval_sample_file
+--eval_has_separator, False
+--epochs, 20
+--eval_samples, 500000
+--training_eval_samples, 4000
+--num_qubits, 12    # Number of qubits to be learnt
+--rnn_units, 256    # Number of RNN hidden units
+--batch_size, 64    # Batch size
+--learning_rate, 0.001
+--save_test_counts, False
+--probabilities_path, './data/q12c0.txt'    # dataset path
+--experimental_bitstrings_path, 'quantum_sample_learning/data/experimental_samples_q12c0d14.txt'
+--train_size, 500000
+--random_subset, False
+--porter_thomas, False
 ```

+## Training Process

-
-## Download Links
-
-### Dataset download
-
-Link: https://pan.baidu.com/s/1WAl4C_EnQi4wp6l684yI9w (extraction code: unp5)
-
-### Checkpoint files
-
-Link: https://pan.baidu.com/s/1wckJSk7sNv0HvFuzJKWSdA (extraction code: s4yz)
+Start single-device or multi-device training with the training commands in "Model Training". Single-device and multi-device runs use different scripts; single-device and 8-device training are supported. The model is stored under ${cur_path}/output/$ASCEND_DEVICE_ID, which holds the training log and the checkpoint files. For single-device training, for example, the loss information is in ${cur_path}/output/${ASCEND_DEVICE_ID}/train_${ASCEND_DEVICE_ID}.log.
\ No newline at end of file
--
Gitee