diff --git a/tutorials/source_en/advanced_use/host_device_training.md b/tutorials/source_en/advanced_use/host_device_training.md
index cb9ae8e6a5b113ee7faed9e5549319f0d0589d78..befc7740437de4e3f60ecaa714b85f44e68ced84 100644
--- a/tutorials/source_en/advanced_use/host_device_training.md
+++ b/tutorials/source_en/advanced_use/host_device_training.md
@@ -72,6 +72,12 @@ This tutorial introduces how to train [Wide&Deep](https://gitee.com/mindspore/mi
     ```python
     self.embedding_table = Parameter(Tensor(np.random.normal(loc=0.0, scale=0.01, size=[184968, 80]).astype(dtype=np_type)), name='V_l2', sparse_grad='V_l2')
     ```
+
+    In the same file, add the following import at the top:
+
+    ```python
+    from mindspore import Tensor
+    ```
 
 In the `construct` function of `class WideDeepModel(nn.Cell)` of file `src/wide_and_deep.py`, to adapt for sparse parameters, replace the return value as:
 
@@ -103,10 +109,9 @@ This tutorial introduces how to train [Wide&Deep](https://gitee.com/mindspore/mi
 ## Training the Model
 
-Use the script `script/run_auto_parallel_train.sh`, and run the command `bash run_auto_parallel_train.sh 1 1 DATASET RANK_TABLE_FILE MINDSPORE_HCCL_CONFIG_PATH`,
+Use the script `script/run_auto_parallel_train.sh`. Run the command `bash run_auto_parallel_train.sh 1 1 DATASET RANK_TABLE_FILE`,
 where the first `1` is the number of accelerators, the second `1` is the number of epochs, `DATASET` is the path of dataset,
-and `RANK_TABLE_FILE` and `MINDSPORE_HCCL_CONFIG_PATH` is the path of the above `rank_table_1p_0.json` file.
-
+and `RANK_TABLE_FILE` is the path of the above `rank_table_1p_0.json` file.
 
 The running log is in the directory of `device_0`, where `loss.log` contains every loss value of every step in the epoch:
diff --git a/tutorials/source_zh_cn/advanced_use/host_device_training.md b/tutorials/source_zh_cn/advanced_use/host_device_training.md
index 7e169c1bcaf2ff8f32c6186b40fca2bfef22cdcc..9dd8750a4666c0a548811fb3cf5f42bd57fd07bd 100644
--- a/tutorials/source_zh_cn/advanced_use/host_device_training.md
+++ b/tutorials/source_zh_cn/advanced_use/host_device_training.md
@@ -70,7 +70,11 @@
     ```python
     self.embedding_table = Parameter(Tensor(np.random.normal(loc=0.0, scale=0.01, size=[184968, 80]).astype(dtype=np_type)), name='V_l2', sparse_grad='V_l2')
     ```
+    In addition, add the following to the imports at the top of `src/wide_and_deep.py`:
+    ```python
+    from mindspore import Tensor
+    ```
 In the `construct` function of the class `WideDeepModel(nn.Cell)` in file `src/wide_and_deep.py`, replace the function's return value with the following to adapt to the sparsity of the parameters:
 ```
@@ -101,9 +105,8 @@
 ## Training the Model
 
-Use the training script `script/run_auto_parallel_train.sh`,
-and execute the command: `bash run_auto_parallel_train.sh 1 1 DATASET RANK_TABLE_FILE MINDSPORE_HCCL_CONFIG_PATH`,
-where the first `1` is the number of devices used by the example, the second `1` is the number of training epochs, `DATASET` is the path of the dataset, and `RANK_TABLE_FILE` and `MINDSPORE_HCCL_CONFIG_PATH` are the path of the above `rank_table_1p_0.json` file.
+Use the training script `script/run_auto_parallel_train.sh`. Execute the command: `bash run_auto_parallel_train.sh 1 1 DATASET RANK_TABLE_FILE`,
+where the first `1` is the number of devices used by the example, the second `1` is the number of training epochs, `DATASET` is the path of the dataset, and `RANK_TABLE_FILE` is the path of the above `rank_table_1p_0.json` file.
 
 The running log is saved in the `device_0` directory, where `loss.log` records the loss values within one epoch, as follows:
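
For readers applying this patch by hand, here is a minimal sketch of how the two additions to `src/wide_and_deep.py` fit together: the new `Tensor` import plus the sparse embedding table from the hunk above. It targets the old MindSpore 0.x `Parameter` signature that accepts `sparse_grad`; the `np_type` alias is an assumption standing in for the tutorial's dtype variable, and the parameter is shown at module level rather than inside the model class for brevity:

```python
import numpy as np

from mindspore import Parameter, Tensor  # Tensor is the import the patch adds

# Assumption: np_type mirrors the tutorial's dtype alias; float32 is the usual choice.
np_type = np.float32

# The patched embedding table: `sparse_grad` tags the parameter so its gradient is
# produced in sparse form, which is what lets the host handle this large table
# (184968 x 80) in the host-device training scheme this tutorial describes.
embedding_table = Parameter(
    Tensor(np.random.normal(loc=0.0, scale=0.01, size=[184968, 80]).astype(dtype=np_type)),
    name='V_l2',
    sparse_grad='V_l2')
```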
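As a usage sketch of the revised command line (the two paths below are hypothetical placeholders), `bash run_auto_parallel_train.sh 1 1 /path/to/dataset ./rank_table_1p_0.json` launches a single-accelerator, single-epoch run; per the tutorial text above, the per-step loss values then appear in `device_0/loss.log`.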