From 6f9f21c51855b83e8c923bba461dfcbf8d6d926e Mon Sep 17 00:00:00 2001
From: huan <3174348550@qq.com>
Date: Tue, 3 Sep 2024 10:36:24 +0800
Subject: [PATCH] modify the error links

---
 README.md | 2 +-
 README_CN.md | 4 ++--
 official/cv/CTPN/README.md | 2 +-
 official/cv/CTPN/README_CN.md | 2 +-
 official/cv/DeepText/README.md | 2 +-
 official/cv/DeepText/README_CN.md | 2 +-
 official/cv/Inception/inceptionv4/README.md | 2 +-
 official/cv/Inception/inceptionv4/README_CN.md | 2 +-
 official/cv/Inception/xception/README.md | 2 +-
 official/cv/Inception/xception/README_CN.md | 2 +-
 official/cv/ResNet/README.md | 2 +-
 official/cv/RetinaNet/README.md | 2 +-
 official/cv/RetinaNet/README_CN.md | 2 +-
 official/cv/Unet/README_CN.md | 2 +-
 official/cv/VGG/vgg16/README.md | 2 +-
 official/cv/VGG/vgg16/README_CN.md | 2 +-
 official/cv/VGG/vgg19/README.md | 2 +-
 official/cv/VGG/vgg19/README_CN.md | 2 +-
 official/cv/VIT/README_CN.md | 2 +-
 official/nlp/Pangu_alpha/README.md | 8 ++++----
 official/nlp/Pangu_alpha/README_CN.md | 8 ++++----
 research/cv/3D_DenseNet/README.md | 2 +-
 research/cv/3D_DenseNet/README_CN.md | 2 +-
 research/cv/AlignedReID++/README_CN.md | 2 +-
 research/cv/C3D/README.md | 2 +-
 research/cv/C3D/README_CN.md | 2 +-
 research/cv/EGnet/README_CN.md | 2 +-
 research/cv/LightCNN/README.md | 2 +-
 research/cv/LightCNN/README_CN.md | 2 +-
 research/cv/Unet3d/README.md | 2 +-
 research/cv/Unet3d/README_CN.md | 2 +-
 research/cv/cnnctc/README_CN.md | 4 ++--
 research/cv/cspdarknet53/README.md | 2 +-
 research/cv/dlinknet/README_CN.md | 2 +-
 research/cv/east/README.md | 2 +-
 research/cv/googlenet/README_CN.md | 2 +-
 research/cv/hardnet/README_CN.md | 4 ++--
 research/cv/inception_resnet_v2/README.md | 2 +-
 research/cv/inception_resnet_v2/README_CN.md | 2 +-
 research/cv/nas-fpn/README_CN.md | 2 +-
 research/cv/osnet/README.md | 2 +-
 research/cv/retinanet_resnet101/README.md | 2 +-
 research/cv/retinanet_resnet101/README_CN.md | 2 +-
 research/cv/retinanet_resnet152/README.md | 2 +-
 research/cv/retinanet_resnet152/README_CN.md | 2 +-
 research/cv/sphereface/README_CN.md | 2 +-
 research/cv/squeezenet/README.md | 2 +-
 research/cv/squeezenet1_1/README.md | 2 +-
 research/cv/textfusenet/README.md | 2 +-
 research/cv/textfusenet/README_CN.md | 2 +-
 research/cv/tinydarknet/README_CN.md | 2 +-
 research/cv/vnet/README_CN.md | 2 +-
 research/cv/wideresnet/README.md | 2 +-
 research/cv/wideresnet/README_CN.md | 2 +-
 research/nlp/mass/README.md | 4 ++--
 research/nlp/mass/README_CN.md | 4 ++--
 research/nlp/rotate/README_CN.md | 2 +-
 research/recommend/ncf/README.md | 4 ++--
 58 files changed, 70 insertions(+), 70 deletions(-)

diff --git a/README.md b/README.md
index b6ac17dec..c36c978a1 100644
--- a/README.md
+++ b/README.md
@@ -50,7 +50,7 @@ For more information about `MindSpore` framework, please refer to [FAQ](https://

 - **Q: What is the *RANK_TABLE_FILE* mentioned in many models?**

-  **A**: *RANK_TABLE_FILE* is the config file of cluster on Ascend while running distributed training. For more information, you could refer to the generator [hccl_tools](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools) and [Parallel Distributed Training Example](https://www.mindspore.cn/tutorials/experts/en/master/parallel/rank_table.html)
+  **A**: *RANK_TABLE_FILE* is the config file of cluster on Ascend while running distributed training. For more information, you could refer to the generator [hccl_tools](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools) and [Parallel Distributed Training Example](https://www.mindspore.cn/docs/en/master/model_train/parallel/rank_table.html)

 - **Q: How to run the scripts on Windows system?**
diff --git a/README_CN.md b/README_CN.md
index 0428a6864..a655cd380 100644
--- a/README_CN.md
+++ b/README_CN.md
@@ -50,11 +50,11 @@ MindSpore已获得Apache 2.0许可,请参见LICENSE文件。

 - **Q: 一些模型描述中提到的*RANK_TABLE_FILE*文件,是什么?**

-  **A**: *RANK_TABLE_FILE*是一个Ascend环境上用于指定分布式集群信息的文件,更多信息可以参考生成工具[hccl_tools](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools)和[分布式并行训练教程](https://www.mindspore.cn/tutorials/experts/zh-CN/master/parallel/rank_table.html)
+  **A**: *RANK_TABLE_FILE*是一个Ascend环境上用于指定分布式集群信息的文件,更多信息可以参考生成工具[hccl_tools](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools)和[分布式并行训练教程](https://www.mindspore.cn/docs/zh-CN/master/model_train/parallel/rank_table.html)

 - **Q: 如何使用多机多卡运行脚本**

-  **A**: 本仓内所提供的分布式(distribute)运行启动默认为单机多卡,如需多机多卡启动需要在单机多卡的基础上进行一定程度的适配,可参考[多机多卡分布式教程](https://www.mindspore.cn/tutorials/experts/zh-CN/master/parallel/rank_table.html#%E5%A4%9A%E6%9C%BA%E5%A4%9A%E5%8D%A1)
+  **A**: 本仓内所提供的分布式(distribute)运行启动默认为单机多卡,如需多机多卡启动需要在单机多卡的基础上进行一定程度的适配,可参考[多机多卡分布式教程](https://www.mindspore.cn/docs/zh-CN/master/model_train/parallel/rank_table.html#%E5%A4%9A%E6%9C%BA%E5%A4%9A%E5%8D%A1)

 - **Q: 在windows环境上要怎么运行网络脚本?**
diff --git a/official/cv/CTPN/README.md b/official/cv/CTPN/README.md
index 47be063e7..e05fc84d6 100644
--- a/official/cv/CTPN/README.md
+++ b/official/cv/CTPN/README.md
@@ -246,7 +246,7 @@ imagenet_cfg = edict({

 Then you can train it with ImageNet2012.

 > Notes:
-> RANK_TABLE_FILE can refer to [Link](https://www.mindspore.cn/tutorials/experts/en/master/parallel/rank_table.html) , and the device_ip can be got as [Link](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools). For large models like InceptionV4, it's better to export an external environment variable `export HCCL_CONNECT_TIMEOUT=600` to extend hccl connection checking time from the default 120 seconds to 600 seconds. Otherwise, the connection could be timeout since compiling time increases with the growth of model size.
+> RANK_TABLE_FILE can refer to [Link](https://www.mindspore.cn/docs/en/master/model_train/parallel/rank_table.html) , and the device_ip can be got as [Link](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools). For large models like InceptionV4, it's better to export an external environment variable `export HCCL_CONNECT_TIMEOUT=600` to extend hccl connection checking time from the default 120 seconds to 600 seconds. Otherwise, the connection could be timeout since compiling time increases with the growth of model size.
 >
 > This is processor cores binding operation regarding the `device_num` and total processor numbers.
If you are not expect to do it, remove the operations `taskset` in `scripts/run_distribute_train.sh` > diff --git a/official/cv/CTPN/README_CN.md b/official/cv/CTPN/README_CN.md index 5484d8145..bb04f79b0 100644 --- a/official/cv/CTPN/README_CN.md +++ b/official/cv/CTPN/README_CN.md @@ -234,7 +234,7 @@ imagenet_cfg = edict({ 然后,您可以使用ImageNet2012训练它。 > 注: -> RANK_TABLE_FILE文件,请参考[链接](https://www.mindspore.cn/tutorials/experts/en/master/parallel/rank_table.html)。如需获取设备IP,请点击[链接](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools)。对于InceptionV4等大模型,最好导出外部环境变量`export HCCL_CONNECT_TIMEOUT=600`,将hccl连接检查时间从默认的120秒延长到600秒。否则,连接可能会超时,因为随着模型增大,编译时间也会增加。 +> RANK_TABLE_FILE文件,请参考[链接](https://www.mindspore.cn/docs/en/master/model_train/parallel/rank_table.html)。如需获取设备IP,请点击[链接](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools)。对于InceptionV4等大模型,最好导出外部环境变量`export HCCL_CONNECT_TIMEOUT=600`,将hccl连接检查时间从默认的120秒延长到600秒。否则,连接可能会超时,因为随着模型增大,编译时间也会增加。 > > 处理器绑核操作取决于`device_num`和总处理器数。如果不希望这样做,请删除`scripts/run_distribute_train.sh`中的`taskset`操作。 > diff --git a/official/cv/DeepText/README.md b/official/cv/DeepText/README.md index 73854bf83..42fffda7a 100644 --- a/official/cv/DeepText/README.md +++ b/official/cv/DeepText/README.md @@ -143,7 +143,7 @@ Here we used 4 datasets for training, and 1 datasets for Evaluation. ``` > Notes: -> RANK_TABLE_FILE can refer to [Link](https://www.mindspore.cn/tutorials/experts/en/master/parallel/rank_table.html) , and the device_ip can be got as [Link](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools). For large models like InceptionV4, it's better to export an external environment variable `export HCCL_CONNECT_TIMEOUT=600` to extend hccl connection checking time from the default 120 seconds to 600 seconds. Otherwise, the connection could be timeout since compiling time increases with the growth of model size. +> RANK_TABLE_FILE can refer to [Link](https://www.mindspore.cn/docs/en/master/model_train/parallel/rank_table.html) , and the device_ip can be got as [Link](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools). For large models like InceptionV4, it's better to export an external environment variable `export HCCL_CONNECT_TIMEOUT=600` to extend hccl connection checking time from the default 120 seconds to 600 seconds. Otherwise, the connection could be timeout since compiling time increases with the growth of model size. > > This is processor cores binding operation regarding the `device_num` and total processor numbers. 
If you are not expect to do it, remove the operations `taskset` in `scripts/run_distribute_train.sh` > diff --git a/official/cv/DeepText/README_CN.md b/official/cv/DeepText/README_CN.md index 2a69543b7..604a3517a 100644 --- a/official/cv/DeepText/README_CN.md +++ b/official/cv/DeepText/README_CN.md @@ -133,7 +133,7 @@ InceptionV4的整体网络架构如下: ``` > 注: -> RANK_TABLE_FILE文件,请参考[链接](https://www.mindspore.cn/tutorials/experts/en/master/parallel/rank_table.html)。如需获取设备IP,请点击[链接](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools)。对于InceptionV4等大模型,最好导出外部环境变量`export HCCL_CONNECT_TIMEOUT=600`,将hccl连接检查时间从默认的120秒延长到600秒。否则,连接可能会超时,因为随着模型增大,编译时间也会增加。 +> RANK_TABLE_FILE文件,请参考[链接](https://www.mindspore.cn/docs/en/master/model_train/parallel/rank_table.html)。如需获取设备IP,请点击[链接](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools)。对于InceptionV4等大模型,最好导出外部环境变量`export HCCL_CONNECT_TIMEOUT=600`,将hccl连接检查时间从默认的120秒延长到600秒。否则,连接可能会超时,因为随着模型增大,编译时间也会增加。 > > 处理器绑核操作取决于`device_num`和总处理器数。如果不希望这样做,请删除`scripts/run_distribute_train.sh`中的`taskset`操作。 > diff --git a/official/cv/Inception/inceptionv4/README.md b/official/cv/Inception/inceptionv4/README.md index 7b96795d6..df52f98bf 100644 --- a/official/cv/Inception/inceptionv4/README.md +++ b/official/cv/Inception/inceptionv4/README.md @@ -279,7 +279,7 @@ You can start training using python or shell scripts. The usage of shell scripts ``` > Notes: -> RANK_TABLE_FILE can refer to [Link](https://www.mindspore.cn/tutorials/experts/en/master/parallel/rank_table.html) , and the device_ip can be got as [Link](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools). For large models like InceptionV4, it's better to export an external environment variable `export HCCL_CONNECT_TIMEOUT=600` to extend hccl connection checking time from the default 120 seconds to 600 seconds. Otherwise, the connection could be timeout since compiling time increases with the growth of model size. +> RANK_TABLE_FILE can refer to [Link](https://www.mindspore.cn/docs/en/master/model_train/parallel/rank_table.html) , and the device_ip can be got as [Link](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools). For large models like InceptionV4, it's better to export an external environment variable `export HCCL_CONNECT_TIMEOUT=600` to extend hccl connection checking time from the default 120 seconds to 600 seconds. Otherwise, the connection could be timeout since compiling time increases with the growth of model size. > > This is processor cores binding operation regarding the `device_num` and total processor numbers. 
If you are not expect to do it, remove the operations `taskset` in `scripts/run_distribute_train.sh` diff --git a/official/cv/Inception/inceptionv4/README_CN.md b/official/cv/Inception/inceptionv4/README_CN.md index 9478adfd0..9502b7626 100644 --- a/official/cv/Inception/inceptionv4/README_CN.md +++ b/official/cv/Inception/inceptionv4/README_CN.md @@ -267,7 +267,7 @@ train.py和config.py中的主要涉及如下参数: ``` > 注: -> 有关RANK_TABLE_FILE,可参考[链接](https://www.mindspore.cn/tutorials/experts/zh-CN/master/parallel/rank_table.html)。设备IP可参考[链接](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools)。对于像InceptionV4这样的大型模型,最好设置外部环境变量`export HCCL_CONNECT_TIMEOUT=600`,将hccl连接检查时间从默认的120秒延长到600秒。否则,可能会连接超时,因为编译时间会随着模型增大而增加。 +> 有关RANK_TABLE_FILE,可参考[链接](https://www.mindspore.cn/docs/zh-CN/master/model_train/parallel/rank_table.html)。设备IP可参考[链接](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools)。对于像InceptionV4这样的大型模型,最好设置外部环境变量`export HCCL_CONNECT_TIMEOUT=600`,将hccl连接检查时间从默认的120秒延长到600秒。否则,可能会连接超时,因为编译时间会随着模型增大而增加。 > > 绑核操作取决于`device_num`参数值及处理器总数。如果不需要,删除`scripts/run_distribute_train.sh`脚本中的`taskset`操作任务集即可。 diff --git a/official/cv/Inception/xception/README.md b/official/cv/Inception/xception/README.md index 28482d028..a0181e25a 100644 --- a/official/cv/Inception/xception/README.md +++ b/official/cv/Inception/xception/README.md @@ -189,7 +189,7 @@ You can start training using python or shell scripts. The usage of shell scripts bash run_infer_310.sh MINDIR_PATH DATA_PATH LABEL_FILE DEVICE_ID ``` -> Notes: RANK_TABLE_FILE can refer to [Link](https://www.mindspore.cn/tutorials/experts/en/master/parallel/rank_table.html), and the device_ip can be got as [Link](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools). +> Notes: RANK_TABLE_FILE can refer to [Link](https://www.mindspore.cn/docs/en/master/model_train/parallel/rank_table.html), and the device_ip can be got as [Link](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools). ### Launch diff --git a/official/cv/Inception/xception/README_CN.md b/official/cv/Inception/xception/README_CN.md index 02cc8207b..f36fc15c1 100644 --- a/official/cv/Inception/xception/README_CN.md +++ b/official/cv/Inception/xception/README_CN.md @@ -189,7 +189,7 @@ Xception的整体网络架构如下: bash run_infer_310.sh MINDIR_PATH DATA_PATH LABEL_FILE DEVICE_ID ``` -> 注:RANK_TABLE_FILE可以参考[链接](https://www.mindspore.cn/tutorials/experts/en/master/parallel/rank_table.html),device_ip可以参考[链接](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools)。 +> 注:RANK_TABLE_FILE可以参考[链接](https://www.mindspore.cn/docs/en/master/model_train/parallel/rank_table.html),device_ip可以参考[链接](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools)。 ### 启动 diff --git a/official/cv/ResNet/README.md b/official/cv/ResNet/README.md index 680ef11ae..26c078d21 100644 --- a/official/cv/ResNet/README.md +++ b/official/cv/ResNet/README.md @@ -480,7 +480,7 @@ bash run_eval_gpu_resnet_benchmark.sh [DATASET_PATH] [CKPT_PATH] [BATCH_SIZE](op For distributed training, a hostfile configuration needs to be created in advance. -Please follow the instructions in the link [GPU-Multi-Host](https://www.mindspore.cn/tutorials/experts/en/master/parallel/mpirun.html). +Please follow the instructions in the link [GPU-Multi-Host](https://www.mindspore.cn/docs/en/master/model_train/parallel/mpirun.html). 
#### Running parameter server mode training
diff --git a/official/cv/RetinaNet/README.md b/official/cv/RetinaNet/README.md
index 76e6bd72c..51b9f5503 100644
--- a/official/cv/RetinaNet/README.md
+++ b/official/cv/RetinaNet/README.md
@@ -208,7 +208,7 @@ bash scripts/run_single_train.sh DEVICE_ID MINDRECORD_DIR CONFIG_PATH PRE_TRAINE

 > Note:

-  For details about RANK_TABLE_FILE, see [Link](https://www.mindspore.cn/tutorials/experts/en/master/parallel/rank_table.html). For details about how to obtain device IP address, see [Link](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools).
+  For details about RANK_TABLE_FILE, see [Link](https://www.mindspore.cn/docs/en/master/model_train/parallel/rank_table.html). For details about how to obtain device IP address, see [Link](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools).

 #### Running
diff --git a/official/cv/RetinaNet/README_CN.md b/official/cv/RetinaNet/README_CN.md
index 837efcf31..ee093f339 100644
--- a/official/cv/RetinaNet/README_CN.md
+++ b/official/cv/RetinaNet/README_CN.md
@@ -203,7 +203,7 @@ bash scripts/run_single_train.sh DEVICE_ID MINDRECORD_DIR CONFIG_PATH PRE_TRAINE

 > 注意:

-  RANK_TABLE_FILE相关参考资料见[链接](https://www.mindspore.cn/tutorials/experts/zh-CN/master/parallel/rank_table.html), 获取device_ip方法详见[链接](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools)。
+  RANK_TABLE_FILE相关参考资料见[链接](https://www.mindspore.cn/docs/zh-CN/master/model_train/parallel/rank_table.html), 获取device_ip方法详见[链接](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools)。

 #### 运行
diff --git a/official/cv/Unet/README_CN.md b/official/cv/Unet/README_CN.md
index ecf29e430..cd3aea7da 100644
--- a/official/cv/Unet/README_CN.md
+++ b/official/cv/Unet/README_CN.md
@@ -607,7 +607,7 @@ bash ./scripts/run_eval_onnx.sh [DATASET_PATH] [ONNX_MODEL] [DEVICE_TARGET] [CON

 **推理前需参照 [MindSpore C++推理部署指南](https://gitee.com/mindspore/models/blob/master/utils/cpp_infer/README_CN.md) 进行环境变量设置。**

-如果您需要使用训练好的模型在Ascend 910、Ascend 310等多个硬件平台上进行推理,可参考此[链接](https://www.mindspore.cn/tutorials/experts/zh-CN/master/infer/inference.html)。下面是一个简单的操作步骤示例:
+如果您需要使用训练好的模型在Ascend 910、Ascend 310等多个硬件平台上进行推理,可参考此[链接](https://www.mindspore.cn/docs/zh-CN/master/model_infer/ms_infer/overview.html)。下面是一个简单的操作步骤示例:

 ### 继续训练预训练模型
diff --git a/official/cv/VGG/vgg16/README.md b/official/cv/VGG/vgg16/README.md
index 163b06eca..931d40a70 100644
--- a/official/cv/VGG/vgg16/README.md
+++ b/official/cv/VGG/vgg16/README.md
@@ -530,7 +530,7 @@ train_parallel1/log:epcoh: 2 step: 97, loss is 1.7133579
 ...
 ```

-> About rank_table.json, you can refer to the [distributed training tutorial](https://www.mindspore.cn/tutorials/experts/en/master/parallel/overview.html).
+> About rank_table.json, you can refer to the [distributed training tutorial](https://www.mindspore.cn/docs/en/master/model_train/parallel/overview.html).
 > **Attention** This will bind the processor cores according to the `device_num` and total processor numbers. If you don't expect to run pretraining with binding processor cores, remove the operations about `taskset` in `scripts/run_distribute_train.sh`

 ##### Run vgg16 on GPU
diff --git a/official/cv/VGG/vgg16/README_CN.md b/official/cv/VGG/vgg16/README_CN.md
index a8fb19e75..12d1fae1c 100644
--- a/official/cv/VGG/vgg16/README_CN.md
+++ b/official/cv/VGG/vgg16/README_CN.md
@@ -530,7 +530,7 @@ train_parallel1/log:epcoh: 2 step: 97, loss is 1.7133579
 ...
``` -> 关于rank_table.json,可以参考[分布式并行训练](https://www.mindspore.cn/tutorials/experts/zh-CN/master/parallel/overview.html)。 +> 关于rank_table.json,可以参考[分布式并行训练](https://www.mindspore.cn/docs/zh-CN/master/model_train/parallel/overview.html)。 > **注意** 将根据`device_num`和处理器总数绑定处理器核。如果您不希望预训练中绑定处理器内核,请在`scripts/run_distribute_train.sh`脚本中移除`taskset`相关操作。 ##### GPU处理器环境运行VGG16 diff --git a/official/cv/VGG/vgg19/README.md b/official/cv/VGG/vgg19/README.md index d0221fc19..bd2a9724e 100644 --- a/official/cv/VGG/vgg19/README.md +++ b/official/cv/VGG/vgg19/README.md @@ -453,7 +453,7 @@ train_parallel1/log:epcoh: 2 step: 97, loss is 1.7133579 ... ``` -> About rank_table.json, you can refer to the [distributed training tutorial](https://www.mindspore.cn/tutorials/experts/en/master/parallel/overview.html). +> About rank_table.json, you can refer to the [distributed training tutorial](https://www.mindspore.cn/docs/en/master/model_train/parallel/overview.html). > **Attention** This will bind the processor cores according to the `device_num` and total processor numbers. If you don't expect to run pretraining with binding processor cores, remove the operations about `taskset` in `scripts/run_distribute_train.sh` ##### Run vgg19 on GPU diff --git a/official/cv/VGG/vgg19/README_CN.md b/official/cv/VGG/vgg19/README_CN.md index 5b046bb76..7d9b1710b 100644 --- a/official/cv/VGG/vgg19/README_CN.md +++ b/official/cv/VGG/vgg19/README_CN.md @@ -466,7 +466,7 @@ train_parallel1/log:epcoh: 2 step: 97, loss is 1.7133579 ... ``` -> 关于rank_table.json,可以参考[分布式并行训练](https://www.mindspore.cn/tutorials/experts/zh-CN/master/parallel/overview.html)。 +> 关于rank_table.json,可以参考[分布式并行训练](https://www.mindspore.cn/docs/zh-CN/master/model_train/parallel/overview.html)。 > **注意** 将根据`device_num`和处理器总数绑定处理器核。如果您不希望预训练中绑定处理器内核,请在`scripts/run_distribute_train.sh`脚本中移除`taskset`相关操作。 ##### GPU处理器环境运行VGG19 diff --git a/official/cv/VIT/README_CN.md b/official/cv/VIT/README_CN.md index 601a9508a..120ac25c7 100644 --- a/official/cv/VIT/README_CN.md +++ b/official/cv/VIT/README_CN.md @@ -451,7 +451,7 @@ python export.py --config_path=[CONFIG_PATH] ### 推理 -如果您需要使用此训练模型在GPU、Ascend 910、Ascend 310等多个硬件平台上进行推理,可参考此[链接](https://www.mindspore.cn/tutorials/experts/zh-CN/master/infer/inference.html)。下面是操作步骤示例: +如果您需要使用此训练模型在GPU、Ascend 910、Ascend 310等多个硬件平台上进行推理,可参考此[链接](https://www.mindspore.cn/docs/zh-CN/master/model_infer/ms_infer/overview.html)。下面是操作步骤示例: - Ascend处理器环境运行 diff --git a/official/nlp/Pangu_alpha/README.md b/official/nlp/Pangu_alpha/README.md index c07ba9f07..118a38c42 100644 --- a/official/nlp/Pangu_alpha/README.md +++ b/official/nlp/Pangu_alpha/README.md @@ -51,7 +51,7 @@ with our parallel setting. We summarized the training tricks as following: 2. Pipeline Model Parallelism 3. Optimizer Model Parallelism -The above features can be found [here](https://www.mindspore.cn/tutorials/experts/en/master/parallel/overview.html). +The above features can be found [here](https://www.mindspore.cn/docs/en/master/model_train/parallel/overview.html). More amazing features are still under developing. The technical report and checkpoint file can be found [here](https://git.openi.org.cn/PCL-Platform.Intelligence/PanGu-AIpha). @@ -157,7 +157,7 @@ bash scripts/run_distribute_train.sh /data/pangu_30_step_ba64/ /root/hccl_8p.jso The above command involves some `args` described below: - DATASET: The path to the mindrecord files's parent directory . For example: `/home/work/mindrecord/`. 
-- RANK_TABLE: The details of the rank table can be found [here](https://www.mindspore.cn/tutorials/experts/en/master/parallel/rank_table.html). It's a json file describes the `device id`, `service ip` and `rank`. +- RANK_TABLE: The details of the rank table can be found [here](https://www.mindspore.cn/docs/en/master/model_train/parallel/rank_table.html). It's a json file describes the `device id`, `service ip` and `rank`. - RANK_SIZE: The device number. This can be your total device numbers. For example, 8, 16, 32 ... - TYPE: The param init type. The parameters will be initialized with float32. Or you can replace it with `fp16`. This will save a little memory used on the device. - MODE: The configure mode. This mode will set the `hidden size` and `layers` to make the parameter number near 2.6 billions. The other mode can be `13B` (`hidden size` 5120 and `layers` 40, which needs at least 16 cards to train.) and `200B`. @@ -206,7 +206,7 @@ bash scripts/run_distribute_train_gpu.sh RANK_SIZE HOSTFILE DATASET PER_BATCH MO ``` - RANK_SIZE: The device number. This can be your total device numbers. For example, 8, 16, 32 ... -- HOSTFILE: It's a text file describes the host ip and its devices. Please see our [tutorial](https://www.mindspore.cn/tutorials/experts/en/master/parallel/mpirun.html) or [OpenMPI](https://www.open-mpi.org/) for more details. +- HOSTFILE: It's a text file describes the host ip and its devices. Please see our [tutorial](https://www.mindspore.cn/docs/en/master/model_train/parallel/mpirun.html) or [OpenMPI](https://www.open-mpi.org/) for more details. - DATASET: The path to the mindrecord files's parent directory . For example: `/home/work/mindrecord/`. - PER_BATCH: The batch size for each data parallel-way. - MODE: Can be `1.3B` `2.6B`, `13B` and `200B`. @@ -228,7 +228,7 @@ bash scripts/run_distribute_train_moe_host_device.sh DATASET RANK_TABLE RANK_SIZ The above command involves some `args` described below: - DATASET: The path to the mindrecord files's parent directory . For example: `/home/work/mindrecord/`. -- RANK_TABLE: The details of the rank table can be found [here](https://www.mindspore.cn/tutorials/experts/en/master/parallel/rank_table.html). It's a json file describes the `device id`, `service ip` and `rank`. +- RANK_TABLE: The details of the rank table can be found [here](https://www.mindspore.cn/docs/en/master/model_train/parallel/rank_table.html). It's a json file describes the `device id`, `service ip` and `rank`. - RANK_SIZE: The device number. This can be your total device numbers. For example, 8, 16, 32 ... - TYPE: The param init type. The parameters will be initialized with float32. Or you can replace it with `fp16`. This will save a little memory used on the device. - MODE: The configure mode. This mode will set the `hidden size` and `layers` to make the parameter number near 2.6 billions. The other mode can be `13B` (`hidden size` 5120 and `layers` 40, which needs at least 16 cards to train.) and `200B`. diff --git a/official/nlp/Pangu_alpha/README_CN.md b/official/nlp/Pangu_alpha/README_CN.md index dde68a147..9e272de85 100644 --- a/official/nlp/Pangu_alpha/README_CN.md +++ b/official/nlp/Pangu_alpha/README_CN.md @@ -51,7 +51,7 @@ 2. 流水线模型并行 3. 
优化器模型并行 -有关上述特性,请点击[此处](https://www.mindspore.cn/tutorials/experts/en/master/parallel/overview.html)查看详情。 +有关上述特性,请点击[此处](https://www.mindspore.cn/docs/en/master/model_train/parallel/overview.html)查看详情。 更多特性敬请期待。 详细技术报告和检查点文件,可点击[此处](https://git.openi.org.cn/PCL-Platform.Intelligence/PanGu-AIpha)查看。 @@ -156,7 +156,7 @@ bash scripts/run_distribute_train.sh /data/pangu_30_step_ba64/ /root/hccl_8p.jso 上述命令涉及以下`args`: - DATASET:mindrecord文件父目录的路径。例如:`/home/work/mindrecord/`。 -- RANK_TABLE:rank table的详细信息,请点击[此处](https://www.mindspore.cn/tutorials/experts/en/master/parallel/rank_table.html)查看。该.json文件描述了`device id`、`service ip`和`rank`。 +- RANK_TABLE:rank table的详细信息,请点击[此处](https://www.mindspore.cn/docs/en/master/model_train/parallel/rank_table.html)查看。该.json文件描述了`device id`、`service ip`和`rank`。 - RANK_SIZE:设备编号,也可以表示设备总数。例如,8、16、32 ... - TYPE:参数初始化类型。参数使用单精度(FP32) 或半精度(FP16)初始化。可以节省设备占用内存。 - MODE:配置模式。通过设置`hidden size`和`layers`,将参数量增至26亿。还可以选择13B(`hidden size`为5120和`layers`为40,训练至少需要16卡)和200B模式。 @@ -205,7 +205,7 @@ bash scripts/run_distribute_train_gpu.sh RANK_SIZE HOSTFILE DATASET PER_BATCH MO ``` - RANK_SIZE:设备编号,也可以表示设备总数。例如,8、16、32 ... -- HOSTFILE:描述主机IP及其设备的文本文件。有关更多详细信息,请参见我们的[教程](https://www.mindspore.cn/tutorials/experts/en/master/parallel/mpirun.html) or [OpenMPI](https://www.open-mpi.org/)。 +- HOSTFILE:描述主机IP及其设备的文本文件。有关更多详细信息,请参见我们的[教程](https://www.mindspore.cn/docs/en/master/model_train/parallel/mpirun.html) or [OpenMPI](https://www.open-mpi.org/)。 - DATASET:mindrecord文件父目录的路径。例如:`/home/work/mindrecord/`。 - PER_BATCH:每个数据并行的批处理大小, - MODE:可以是`1.3B`、`2.6B`、`13B`或`200B`。 @@ -227,7 +227,7 @@ bash scripts/run_distribute_train_moe_host_device.sh DATASET RANK_TABLE RANK_SIZ 上述命令涉及以下args: - DATASET:mindrecord文件父目录的路径。例如:`/home/work/mindrecord/`。 -- RANK_TABLE:rank table的详细信息,请点击[此处](https://www.mindspore.cn/tutorials/experts/en/master/parallel/rank_table.html)查看。该.json文件描述了device id、service ip和rank。 +- RANK_TABLE:rank table的详细信息,请点击[此处](https://www.mindspore.cn/docs/en/master/model_train/parallel/rank_table.html)查看。该.json文件描述了device id、service ip和rank。 - RANK_SIZE:设备编号,也可以是您的设备总数。例如,8、16、32 ... - TYPE:参数初始化类型。参数使用单精度(FP32) 或半精度(FP16)初始化。可以节省设备占用内存。 - MODE:配置模式。通过设置`hidden size`和`layers`,将参数量增至26亿。还可以选择`13B`(`hidden size`为5120和`layers`为40,训练至少需要16卡)和`200B`模式。 diff --git a/research/cv/3D_DenseNet/README.md b/research/cv/3D_DenseNet/README.md index a7d502906..1b8a78ffe 100644 --- a/research/cv/3D_DenseNet/README.md +++ b/research/cv/3D_DenseNet/README.md @@ -222,7 +222,7 @@ Dice Coefficient (DC) for 9th subject (9 subjects for training and 1 subject for |-------------------|:-------------------:|:---------------------:|:-----:|:--------------:| |3D-SkipDenseSeg | 93.66| 90.80 | 90.65 | 91.70 | -Notes: RANK_TABLE_FILE can refer to [Link](https://www.mindspore.cn/tutorials/experts/en/master/parallel/rank_table.html) , and the device_ip can be got as [Link](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools) For large models like InceptionV4, it's better to export an external environment variable export HCCL_CONNECT_TIMEOUT=600 to extend hccl connection checking time from the default 120 seconds to 600 seconds. Otherwise, the connection could be timeout since compiling time increases with the growth of model size. 
To avoid ops error,you should change the code like below: +Notes: RANK_TABLE_FILE can refer to [Link](https://www.mindspore.cn/docs/en/master/model_train/parallel/rank_table.html) , and the device_ip can be got as [Link](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools) For large models like InceptionV4, it's better to export an external environment variable export HCCL_CONNECT_TIMEOUT=600 to extend hccl connection checking time from the default 120 seconds to 600 seconds. Otherwise, the connection could be timeout since compiling time increases with the growth of model size. To avoid ops error,you should change the code like below: in train.py: diff --git a/research/cv/3D_DenseNet/README_CN.md b/research/cv/3D_DenseNet/README_CN.md index b46f10699..022157600 100644 --- a/research/cv/3D_DenseNet/README_CN.md +++ b/research/cv/3D_DenseNet/README_CN.md @@ -212,7 +212,7 @@ bash run_eval.sh 3D-DenseSeg-20000_36.ckpt data/data_val |-------------------|:-------------------:|:---------------------:|:-----:|:--------------:| |3D-SkipDenseSeg | 93.66| 90.80 | 90.65 | 91.70 | -Notes: 分布式训练需要一个RANK_TABLE_FILE,文件的删除方式可以参考该链接[Link](https://www.mindspore.cn/tutorials/experts/en/master/parallel/rank_table.html) ,device_ip的设置参考该链接 [Link](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools) 对于像InceptionV4这样的大模型来说, 最好导出一个外部环境变量,export HCCL_CONNECT_TIMEOUT=600,以将hccl连接检查时间从默认的120秒延长到600秒。否则,连接可能会超时,因为编译时间会随着模型大小的增加而增加。在1.3.0版本下,3D算子可能存在一些问题,您可能需要更改context.set_auto_parallel_context的部分代码: +Notes: 分布式训练需要一个RANK_TABLE_FILE,文件的删除方式可以参考该链接[Link](https://www.mindspore.cn/docs/en/master/model_train/parallel/rank_table.html) ,device_ip的设置参考该链接 [Link](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools) 对于像InceptionV4这样的大模型来说, 最好导出一个外部环境变量,export HCCL_CONNECT_TIMEOUT=600,以将hccl连接检查时间从默认的120秒延长到600秒。否则,连接可能会超时,因为编译时间会随着模型大小的增加而增加。在1.3.0版本下,3D算子可能存在一些问题,您可能需要更改context.set_auto_parallel_context的部分代码: in train.py: diff --git a/research/cv/AlignedReID++/README_CN.md b/research/cv/AlignedReID++/README_CN.md index ff8c6ae81..559a94c11 100644 --- a/research/cv/AlignedReID++/README_CN.md +++ b/research/cv/AlignedReID++/README_CN.md @@ -405,7 +405,7 @@ market1501上评估AlignedReID++ ### 推理 -如果您需要使用此训练模型在GPU、Ascend 910、Ascend 310等多个硬件平台上进行推理,可参考此[链接](https://www.mindspore.cn/tutorials/experts/zh-CN/master/infer/inference.html)。下面是操作步骤示例: +如果您需要使用此训练模型在GPU、Ascend 910、Ascend 310等多个硬件平台上进行推理,可参考此[链接](https://www.mindspore.cn/docs/zh-CN/master/model_infer/ms_infer/overview.html)。下面是操作步骤示例: 在进行推理之前我们需要先导出模型,mindir可以在本地环境上导出。batch_size默认为1。 diff --git a/research/cv/C3D/README.md b/research/cv/C3D/README.md index 3c710bdfc..6102a83e4 100644 --- a/research/cv/C3D/README.md +++ b/research/cv/C3D/README.md @@ -465,7 +465,7 @@ The above shell script will run distribute training in the background. You can v #### Distributed training on Ascend > Notes: -> RANK_TABLE_FILE can refer to [Link](https://www.mindspore.cn/tutorials/experts/en/master/parallel/rank_table.html) , and the device_ip can be got as [Link](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools). For large models like InceptionV4, it's better to export an external environment variable `export HCCL_CONNECT_TIMEOUT=600` to extend hccl connection checking time from the default 120 seconds to 600 seconds. Otherwise, the connection could be timeout since compiling time increases with the growth of model size. 
+> RANK_TABLE_FILE can refer to [Link](https://www.mindspore.cn/docs/en/master/model_train/parallel/rank_table.html) , and the device_ip can be got as [Link](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools). For large models like InceptionV4, it's better to export an external environment variable `export HCCL_CONNECT_TIMEOUT=600` to extend hccl connection checking time from the default 120 seconds to 600 seconds. Otherwise, the connection could be timeout since compiling time increases with the growth of model size. > ```text diff --git a/research/cv/C3D/README_CN.md b/research/cv/C3D/README_CN.md index 574da58dc..7678cd917 100644 --- a/research/cv/C3D/README_CN.md +++ b/research/cv/C3D/README_CN.md @@ -456,7 +456,7 @@ bash run_standalone_train_gpu.sh [CONFIG_PATH] [DEVICE_ID] #### Ascend分布式训练 > 注: -> RANK_TABLE_FILE文件,请参考[链接](https://www.mindspore.cn/tutorials/experts/en/master/parallel/rank_table.html)。如需获取设备IP,请点击[链接](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools)。对于InceptionV4等大模型,最好导出外部环境变量`export HCCL_CONNECT_TIMEOUT=600`,将hccl连接检查时间从默认的120秒延长到600秒。否则,连接可能会超时,因为随着模型增大,编译时间也会增加。 +> RANK_TABLE_FILE文件,请参考[链接](https://www.mindspore.cn/docs/en/master/model_train/parallel/rank_table.html)。如需获取设备IP,请点击[链接](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools)。对于InceptionV4等大模型,最好导出外部环境变量`export HCCL_CONNECT_TIMEOUT=600`,将hccl连接检查时间从默认的120秒延长到600秒。否则,连接可能会超时,因为随着模型增大,编译时间也会增加。 > ```text diff --git a/research/cv/EGnet/README_CN.md b/research/cv/EGnet/README_CN.md index fcd0d32c4..e486c8f6c 100644 --- a/research/cv/EGnet/README_CN.md +++ b/research/cv/EGnet/README_CN.md @@ -363,7 +363,7 @@ bash run_standalone_train_gpu.sh bash run_distribute_train.sh 8 [RANK_TABLE_FILE] ``` -线下运行分布式训练请参照[rank table启动](https://www.mindspore.cn/tutorials/experts/zh-CN/master/parallel/rank_table.html) +线下运行分布式训练请参照[rank table启动](https://www.mindspore.cn/docs/zh-CN/master/model_train/parallel/rank_table.html) - 线上modelarts分布式训练 diff --git a/research/cv/LightCNN/README.md b/research/cv/LightCNN/README.md index 409d83b0e..c2de4a523 100644 --- a/research/cv/LightCNN/README.md +++ b/research/cv/LightCNN/README.md @@ -139,7 +139,7 @@ reduce precision" to view the operators with reduced precision. - Generate config json file for 8-card training - [Simple tutorial](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools) - For detailed configuration method, please refer to - the [rank table Startup](https://www.mindspore.cn/tutorials/experts/en/master/parallel/rank_table.html). + the [rank table Startup](https://www.mindspore.cn/docs/en/master/model_train/parallel/rank_table.html). 
# [Quick start](#Quickstart) diff --git a/research/cv/LightCNN/README_CN.md b/research/cv/LightCNN/README_CN.md index 94e9e5837..d363114b8 100644 --- a/research/cv/LightCNN/README_CN.md +++ b/research/cv/LightCNN/README_CN.md @@ -107,7 +107,7 @@ LightCNN适用于有大量噪声的人脸识别数据集,提出了maxout 的 - [MindSpore Python API](https://www.mindspore.cn/docs/zh-CN/master/api_python/mindspore.html) - 生成config json文件用于8卡训练。 - [简易教程](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools) - - 详细配置方法请参照[rank table启动](https://www.mindspore.cn/tutorials/experts/zh-CN/master/parallel/rank_table.html)。 + - 详细配置方法请参照[rank table启动](https://www.mindspore.cn/docs/zh-CN/master/model_train/parallel/rank_table.html)。 # 快速入门 diff --git a/research/cv/Unet3d/README.md b/research/cv/Unet3d/README.md index 203b77eb1..4e823c207 100644 --- a/research/cv/Unet3d/README.md +++ b/research/cv/Unet3d/README.md @@ -312,7 +312,7 @@ After training, you'll get some checkpoint files under the `train_parallel_fp[32 #### Distributed training on Ascend > Notes: -> RANK_TABLE_FILE can refer to [Link](https://www.mindspore.cn/tutorials/experts/en/master/parallel/rank_table.html) , and the device_ip can be got as [Link](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools). For large models like InceptionV4, it's better to export an external environment variable `export HCCL_CONNECT_TIMEOUT=600` to extend hccl connection checking time from the default 120 seconds to 600 seconds. Otherwise, the connection could be timeout since compiling time increases with the growth of model size. +> RANK_TABLE_FILE can refer to [Link](https://www.mindspore.cn/docs/en/master/model_train/parallel/rank_table.html) , and the device_ip can be got as [Link](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools). For large models like InceptionV4, it's better to export an external environment variable `export HCCL_CONNECT_TIMEOUT=600` to extend hccl connection checking time from the default 120 seconds to 600 seconds. Otherwise, the connection could be timeout since compiling time increases with the growth of model size. > ```shell diff --git a/research/cv/Unet3d/README_CN.md b/research/cv/Unet3d/README_CN.md index f1211bd14..1c27edd34 100644 --- a/research/cv/Unet3d/README_CN.md +++ b/research/cv/Unet3d/README_CN.md @@ -312,7 +312,7 @@ bash ./run_distribute_train_gpu_fp16.sh /path_prefix/LUNA16/train #### 在Ascend上进行分布式训练 > 注: -> RANK_TABLE_FILE参考[链接](https://www.mindspore.cn/tutorials/experts/en/master/parallel/rank_table.html),device_ip参考[链接](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools)。对于像InceptionV4这样的大模型,最好导出外部环境变量`export HCCL_CONNECT_TIMEOUT=600`,将HCCL连接检查时间从默认的120秒延长到600秒。否则,连接可能会超时,因为编译时间会随着模型大小的增长而增加。 +> RANK_TABLE_FILE参考[链接](https://www.mindspore.cn/docs/en/master/model_train/parallel/rank_table.html),device_ip参考[链接](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools)。对于像InceptionV4这样的大模型,最好导出外部环境变量`export HCCL_CONNECT_TIMEOUT=600`,将HCCL连接检查时间从默认的120秒延长到600秒。否则,连接可能会超时,因为编译时间会随着模型大小的增长而增加。 > ```shell diff --git a/research/cv/cnnctc/README_CN.md b/research/cv/cnnctc/README_CN.md index 54b9eb6a1..1454105dd 100644 --- a/research/cv/cnnctc/README_CN.md +++ b/research/cv/cnnctc/README_CN.md @@ -261,7 +261,7 @@ bash scripts/run_distribute_train_ascend.sh [RANK_TABLE_FILE] [PRETRAINED_CKPT(o > 注意: - RANK_TABLE_FILE相关参考资料见[链接](https://www.mindspore.cn/tutorials/experts/zh-CN/master/parallel/rank_table.html), 获取device_ip方法详见[链接](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools). 
+  RANK_TABLE_FILE相关参考资料见[链接](https://www.mindspore.cn/docs/zh-CN/master/model_train/parallel/rank_table.html), 获取device_ip方法详见[链接](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools).

 ### 训练结果
@@ -485,7 +485,7 @@ accuracy: 0.8427

 ### 推理

-如果您需要在GPU、Ascend 910、Ascend 310等多个硬件平台上使用训练好的模型进行推理,请参考此[链接](https://www.mindspore.cn/tutorials/experts/zh-CN/master/infer/inference.html)。以下为简单示例:
+如果您需要在GPU、Ascend 910、Ascend 310等多个硬件平台上使用训练好的模型进行推理,请参考此[链接](https://www.mindspore.cn/docs/zh-CN/master/model_infer/ms_infer/overview.html)。以下为简单示例:

 - Ascend处理器环境运行
diff --git a/research/cv/cspdarknet53/README.md b/research/cv/cspdarknet53/README.md
index f5130a3b1..5ddf567a9 100644
--- a/research/cv/cspdarknet53/README.md
+++ b/research/cv/cspdarknet53/README.md
@@ -206,7 +206,7 @@ bash run_distribute_train.sh [RANK_TABLE_FILE] [DATA_DIR] (option)[PATH_CHECKPOI

 bash run_standalone_train.sh [DEVICE_ID] [DATA_DIR] (option)[PATH_CHECKPOINT]
 ```

-> Notes: RANK_TABLE_FILE can refer to [Link](https://www.mindspore.cn/tutorials/experts/en/master/parallel/rank_table.html), and the device_ip can be got as [Link](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools). For large models like InceptionV3, it's better to export an external environment variable `export HCCL_CONNECT_TIMEOUT=600` to extend hccl connection checking time from the default 120 seconds to 600 seconds. Otherwise, the connection could be timeout since compiling time increases with the growth of model size.
+> Notes: RANK_TABLE_FILE can refer to [Link](https://www.mindspore.cn/docs/en/master/model_train/parallel/rank_table.html), and the device_ip can be got as [Link](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools). For large models like InceptionV3, it's better to export an external environment variable `export HCCL_CONNECT_TIMEOUT=600` to extend hccl connection checking time from the default 120 seconds to 600 seconds. Otherwise, the connection could be timeout since compiling time increases with the growth of model size.
 >
 > This is processor cores binding operation regarding the `device_num` and total processor numbers. If you do not expect to do this, remove the `taskset` operations in `scripts/run_distribute_train.sh`
diff --git a/research/cv/dlinknet/README_CN.md b/research/cv/dlinknet/README_CN.md
index 139e0b13d..86a13f42c 100644
--- a/research/cv/dlinknet/README_CN.md
+++ b/research/cv/dlinknet/README_CN.md
@@ -333,7 +333,7 @@ bash scripts/run_distribute_gpu_train.sh [DATASET] [CONFIG_PATH] [DEVICE_NUM] [C

 #### 推理

-如果您需要使用训练好的模型在Ascend 910、Ascend 310等多个硬件平台上进行推理,可参考此[链接](https://www.mindspore.cn/tutorials/experts/zh-CN/master/infer/inference.html)。下面是一个简单的操作步骤示例:
+如果您需要使用训练好的模型在Ascend 910、Ascend 310等多个硬件平台上进行推理,可参考此[链接](https://www.mindspore.cn/docs/zh-CN/master/model_infer/ms_infer/overview.html)。下面是一个简单的操作步骤示例:

 ##### Ascend 310环境运行
diff --git a/research/cv/east/README.md b/research/cv/east/README.md
index f9b125a59..bfe3231db 100644
--- a/research/cv/east/README.md
+++ b/research/cv/east/README.md
@@ -134,7 +134,7 @@ bash run_eval_gpu.sh [DATASET_PATH] [CKPT_PATH] [DEVICE_ID]
 ```

 > Notes:
-> RANK_TABLE_FILE can refer to [Link](https://www.mindspore.cn/tutorials/experts/en/master/parallel/rank_table.html) , and the device_ip can be got as [Link](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools).
For large models like InceptionV4, it's better to export an external environment variable `export HCCL_CONNECT_TIMEOUT=600` to extend hccl connection checking time from the default 120 seconds to 600 seconds. Otherwise, the connection could be timeout since compiling time increases with the growth of model size. +> RANK_TABLE_FILE can refer to [Link](https://www.mindspore.cn/docs/en/master/model_train/parallel/rank_table.html) , and the device_ip can be got as [Link](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools). For large models like InceptionV4, it's better to export an external environment variable `export HCCL_CONNECT_TIMEOUT=600` to extend hccl connection checking time from the default 120 seconds to 600 seconds. Otherwise, the connection could be timeout since compiling time increases with the growth of model size. > > This is processor cores binding operation regarding the `device_num` and total processor numbers. If you are not expect to do it, remove the operations `taskset` in `scripts/run_distribute_train.sh` > diff --git a/research/cv/googlenet/README_CN.md b/research/cv/googlenet/README_CN.md index 1c60a9eba..883104b4f 100644 --- a/research/cv/googlenet/README_CN.md +++ b/research/cv/googlenet/README_CN.md @@ -598,7 +598,7 @@ python export.py --config_path [CONFIG_PATH] ### 推理 -如果您需要使用此训练模型在GPU、Ascend 910、Ascend 310等多个硬件平台上进行推理,可参考此[链接](https://www.mindspore.cn/tutorials/experts/zh-CN/master/infer/inference.html)。下面是操作步骤示例: +如果您需要使用此训练模型在GPU、Ascend 910、Ascend 310等多个硬件平台上进行推理,可参考此[链接](https://www.mindspore.cn/docs/zh-CN/master/model_infer/ms_infer/overview.html)。下面是操作步骤示例: - Ascend处理器环境运行 diff --git a/research/cv/hardnet/README_CN.md b/research/cv/hardnet/README_CN.md index 65b6a7d2b..44f397ae0 100644 --- a/research/cv/hardnet/README_CN.md +++ b/research/cv/hardnet/README_CN.md @@ -449,7 +449,7 @@ bash run_infer_310.sh [MINDIR_PATH] [DATASET_PATH] [DEVICE_ID] ### 推理 -如果您需要使用此训练模型在Ascend 910上进行推理,可参考此[链接](https://www.mindspore.cn/tutorials/experts/zh-CN/master/infer/inference.html)。下面是操作步骤示例: +如果您需要使用此训练模型在Ascend 910上进行推理,可参考此[链接](https://www.mindspore.cn/docs/zh-CN/master/model_infer/ms_infer/overview.html)。下面是操作步骤示例: - Ascend处理器环境运行 @@ -486,7 +486,7 @@ bash run_infer_310.sh [MINDIR_PATH] [DATASET_PATH] [DEVICE_ID] print("==============Acc: {} ==============".format(acc)) ``` -如果您需要使用此训练模型在GPU上进行推理,可参考此[链接](https://www.mindspore.cn/tutorials/experts/zh-CN/master/infer/inference.html)。下面是操作步骤示例: +如果您需要使用此训练模型在GPU上进行推理,可参考此[链接](https://www.mindspore.cn/docs/zh-CN/master/model_infer/ms_infer/overview.html)。下面是操作步骤示例: - GPU处理器环境运行 diff --git a/research/cv/inception_resnet_v2/README.md b/research/cv/inception_resnet_v2/README.md index 4ef422773..d7a335067 100644 --- a/research/cv/inception_resnet_v2/README.md +++ b/research/cv/inception_resnet_v2/README.md @@ -122,7 +122,7 @@ bash scripts/run_standalone_train_ascend.sh DEVICE_ID DATA_DIR ``` > Notes: -> RANK_TABLE_FILE can refer to [Link](https://www.mindspore.cn/tutorials/experts/en/master/parallel/rank_table.html) , and the device_ip can be got as [Link](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools). For large models like InceptionV4, it's better to export an external environment variable `export HCCL_CONNECT_TIMEOUT=600` to extend hccl connection checking time from the default 120 seconds to 600 seconds. Otherwise, the connection could be timeout since compiling time increases with the growth of model size. 
+> RANK_TABLE_FILE can refer to [Link](https://www.mindspore.cn/docs/en/master/model_train/parallel/rank_table.html) , and the device_ip can be got as [Link](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools). For large models like InceptionV4, it's better to export an external environment variable `export HCCL_CONNECT_TIMEOUT=600` to extend hccl connection checking time from the default 120 seconds to 600 seconds. Otherwise, the connection could be timeout since compiling time increases with the growth of model size. > > This is processor cores binding operation regarding the `device_num` and total processor numbers. If you are not expect to do it, remove the operations `taskset` in `scripts/run_distribute_train.sh` diff --git a/research/cv/inception_resnet_v2/README_CN.md b/research/cv/inception_resnet_v2/README_CN.md index 70943fca3..2358f9ae4 100644 --- a/research/cv/inception_resnet_v2/README_CN.md +++ b/research/cv/inception_resnet_v2/README_CN.md @@ -133,7 +133,7 @@ bash scripts/run_distribute_train_ascend.sh RANK_TABLE_FILE DATA_DIR bash scripts/run_standalone_train_ascend.sh DEVICE_ID DATA_DIR ``` -> 注:RANK_TABLE_FILE可参考[链接]( https://www.mindspore.cn/tutorials/experts/zh-CN/master/parallel/rank_table.html)。device_ip可以通过[链接](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools)获取 +> 注:RANK_TABLE_FILE可参考[链接]( https://www.mindspore.cn/docs/zh-CN/master/model_train/parallel/rank_table.html)。device_ip可以通过[链接](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools)获取 - GPU: diff --git a/research/cv/nas-fpn/README_CN.md b/research/cv/nas-fpn/README_CN.md index 83c05f1e5..d0a1a21fe 100644 --- a/research/cv/nas-fpn/README_CN.md +++ b/research/cv/nas-fpn/README_CN.md @@ -161,7 +161,7 @@ bash scripts/run_single_train.sh DEVICE_ID MINDRECORD_DIR PRE_TRAINED(optional) ``` > 注意: -RANK_TABLE_FILE相关参考资料见[链接](https://www.mindspore.cn/tutorials/experts/zh-CN/master/parallel/rank_table.html), 获取device_ip方法详见[链接](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools). +RANK_TABLE_FILE相关参考资料见[链接](https://www.mindspore.cn/docs/zh-CN/master/model_train/parallel/rank_table.html), 获取device_ip方法详见[链接](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools). #### 运行 diff --git a/research/cv/osnet/README.md b/research/cv/osnet/README.md index 2bbafb479..6303d8f5a 100644 --- a/research/cv/osnet/README.md +++ b/research/cv/osnet/README.md @@ -160,7 +160,7 @@ bash run_eval_ascend.sh [DATASET] [CHECKPOINT_PATH] [DEVICE_ID] ``` > Notes: -> RANK_TABLE_FILE can refer to [Link](https://www.mindspore.cn/tutorials/experts/en/master/parallel/rank_table.html) , and the device_ip can be got as [Link](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools). For large models like InceptionV4, it's better to export an external environment variable `export HCCL_CONNECT_TIMEOUT=600` to extend hccl connection checking time from the default 120 seconds to 600 seconds. Otherwise, the connection could be timeout since compiling time increases with the growth of model size. +> RANK_TABLE_FILE can refer to [Link](https://www.mindspore.cn/docs/en/master/model_train/parallel/rank_table.html) , and the device_ip can be got as [Link](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools). For large models like InceptionV4, it's better to export an external environment variable `export HCCL_CONNECT_TIMEOUT=600` to extend hccl connection checking time from the default 120 seconds to 600 seconds. 
Otherwise, the connection could be timeout since compiling time increases with the growth of model size. > > This is processor cores binding operation regarding the `device_num` and total processor numbers. If you are not expect to do it, remove the operations `taskset` in `scripts/run_train_distribute_ascend.sh` > diff --git a/research/cv/retinanet_resnet101/README.md b/research/cv/retinanet_resnet101/README.md index 67bef72b1..5df618a2f 100644 --- a/research/cv/retinanet_resnet101/README.md +++ b/research/cv/retinanet_resnet101/README.md @@ -287,7 +287,7 @@ bash run_distribute_train.sh [DEVICE_NUM] [EPOCH_SIZE] [LR] [DATASET] [RANK_TABL bash run_single_train.sh [DEVICE_ID] [EPOCH_SIZE] [LR] [DATASET] [PRE_TRAINED](optional) [PRE_TRAINED_EPOCH_SIZE](optional) ``` -> Note: RANK_TABLE_FILE related reference materials see in this [link](https://www.mindspore.cn/tutorials/experts/en/master/parallel/rank_table.html), for details on how to get device_ip check this [link](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools). +> Note: RANK_TABLE_FILE related reference materials see in this [link](https://www.mindspore.cn/docs/en/master/model_train/parallel/rank_table.html), for details on how to get device_ip check this [link](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools). - GPU diff --git a/research/cv/retinanet_resnet101/README_CN.md b/research/cv/retinanet_resnet101/README_CN.md index 9b99e8518..c237d8cf0 100644 --- a/research/cv/retinanet_resnet101/README_CN.md +++ b/research/cv/retinanet_resnet101/README_CN.md @@ -292,7 +292,7 @@ bash run_distribute_train.sh [DEVICE_NUM] [EPOCH_SIZE] [LR] [DATASET] [RANK_TABL bash run_single_train.sh [DEVICE_ID] [EPOCH_SIZE] [LR] [DATASET] [PRE_TRAINED](optional) [PRE_TRAINED_EPOCH_SIZE](optional) ``` -> 注意: RANK_TABLE_FILE相关参考资料见[链接](https://www.mindspore.cn/tutorials/experts/zh-CN/master/parallel/rank_table.html), 获取device_ip方法详见[链接](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools). +> 注意: RANK_TABLE_FILE相关参考资料见[链接](https://www.mindspore.cn/docs/zh-CN/master/model_train/parallel/rank_table.html), 获取device_ip方法详见[链接](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools). - GPU diff --git a/research/cv/retinanet_resnet152/README.md b/research/cv/retinanet_resnet152/README.md index c64afc484..6ef8d5d6f 100644 --- a/research/cv/retinanet_resnet152/README.md +++ b/research/cv/retinanet_resnet152/README.md @@ -291,7 +291,7 @@ bash run_distribute_train.sh DEVICE_NUM EPOCH_SIZE LR DATASET RANK_TABLE_FILE PR bash run_distribute_train.sh DEVICE_ID EPOCH_SIZE LR DATASET PRE_TRAINED(optional) PRE_TRAINED_EPOCH_SIZE(optional) ``` -> Note: RANK_TABLE_FILE related reference materials see in this [link](https://www.mindspore.cn/tutorials/experts/zh-CN/master/parallel/rank_table.html), +> Note: RANK_TABLE_FILE related reference materials see in this [link](https://www.mindspore.cn/docs/zh-CN/master/model_train/parallel/rank_table.html), > for details on how to get device_ip check this [link](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools). 
- GPU: diff --git a/research/cv/retinanet_resnet152/README_CN.md b/research/cv/retinanet_resnet152/README_CN.md index 9120e7a59..8d9ae8df6 100644 --- a/research/cv/retinanet_resnet152/README_CN.md +++ b/research/cv/retinanet_resnet152/README_CN.md @@ -285,7 +285,7 @@ bash run_distribute_train.sh DEVICE_NUM EPOCH_SIZE LR DATASET RANK_TABLE_FILE PR bash run_distribute_train.sh DEVICE_ID EPOCH_SIZE LR DATASET PRE_TRAINED(optional) PRE_TRAINED_EPOCH_SIZE(optional) ``` -> 注意: RANK_TABLE_FILE相关参考资料见[链接](https://www.mindspore.cn/tutorials/experts/zh-CN/master/parallel/rank_table.html), +> 注意: RANK_TABLE_FILE相关参考资料见[链接](https://www.mindspore.cn/docs/zh-CN/master/model_train/parallel/rank_table.html), > 获取device_ip方法详见[链接](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools). - GPU: diff --git a/research/cv/sphereface/README_CN.md b/research/cv/sphereface/README_CN.md index 8946eace2..8a24fde6e 100644 --- a/research/cv/sphereface/README_CN.md +++ b/research/cv/sphereface/README_CN.md @@ -476,7 +476,7 @@ sphereface网络使用LFW推理得到的结果如下: ### 推理 -如果您需要使用此训练模型在GPU、Ascend 910、Ascend 310等多个硬件平台上进行推理,可参考此[链接](https://www.mindspore.cn/tutorials/experts/zh-CN/master/infer/inference.html)。下面是操作步骤示例: +如果您需要使用此训练模型在GPU、Ascend 910、Ascend 310等多个硬件平台上进行推理,可参考此[链接](https://www.mindspore.cn/docs/zh-CN/master/model_infer/ms_infer/overview.html)。下面是操作步骤示例: - Ascend、GPU处理器环境运行 diff --git a/research/cv/squeezenet/README.md b/research/cv/squeezenet/README.md index 48fe41b1c..8de6a7f20 100644 --- a/research/cv/squeezenet/README.md +++ b/research/cv/squeezenet/README.md @@ -720,7 +720,7 @@ Inference result is saved in current path, you can find result like this in acc. ### Inference -If you need to use the trained model to perform inference on multiple hardware platforms, such as GPU, Ascend 910 or Ascend 310, you can refer to this [Link](https://www.mindspore.cn/tutorials/experts/en/master/infer/inference.html). Following the steps below, this is a simple example: +If you need to use the trained model to perform inference on multiple hardware platforms, such as GPU, Ascend 910 or Ascend 310, you can refer to this [Link](https://www.mindspore.cn/docs/en/master/model_infer/ms_infer/overview.html). Following the steps below, this is a simple example: - Running on Ascend diff --git a/research/cv/squeezenet1_1/README.md b/research/cv/squeezenet1_1/README.md index 63cfd4d24..ff006ebbf 100644 --- a/research/cv/squeezenet1_1/README.md +++ b/research/cv/squeezenet1_1/README.md @@ -306,7 +306,7 @@ Inference result is saved in current path, you can find result like this in acc. ### Inference -If you need to use the trained model to perform inference on multiple hardware platforms, such as GPU, Ascend 910 or Ascend 310, you can refer to this [Link](https://www.mindspore.cn/tutorials/experts/en/master/infer/inference.html). Following the steps below, this is a simple example: +If you need to use the trained model to perform inference on multiple hardware platforms, such as GPU, Ascend 910 or Ascend 310, you can refer to this [Link](https://www.mindspore.cn/docs/en/master/model_infer/ms_infer/overview.html). 
Following the steps below, this is a simple example: - Running on Ascend diff --git a/research/cv/textfusenet/README.md b/research/cv/textfusenet/README.md index 2be4f567f..4eecda4be 100755 --- a/research/cv/textfusenet/README.md +++ b/research/cv/textfusenet/README.md @@ -319,7 +319,7 @@ Usage: bash run_standalone_train.sh [PRETRAINED_MODEL] ## [Training Process](#contents) -- Set options in `config.py`, including loss_scale, learning rate and network hyperparameters. Click [here](https://www.mindspore.cn/tutorials/experts/en/master/dataset/augment.html) for more information about dataset. +- Set options in `config.py`, including loss_scale, learning rate and network hyperparameters. Click [here](https://www.mindspore.cn/docs/en/master/model_train/dataset/augment.html) for more information about dataset. ### [Training](#content) diff --git a/research/cv/textfusenet/README_CN.md b/research/cv/textfusenet/README_CN.md index c34357189..635953fad 100755 --- a/research/cv/textfusenet/README_CN.md +++ b/research/cv/textfusenet/README_CN.md @@ -328,7 +328,7 @@ Shapely==1.5.9 ## 训练过程 -- 在`config.py`中设置配置项,包括loss_scale、学习率和网络超参。单击[此处](https://www.mindspore.cn/tutorials/experts/zh-CN/master/dataset/augment.html)获取更多数据集相关信息. +- 在`config.py`中设置配置项,包括loss_scale、学习率和网络超参。单击[此处](https://www.mindspore.cn/docs/zh-CN/master/model_train/dataset/augment.html)获取更多数据集相关信息. ### 训练 diff --git a/research/cv/tinydarknet/README_CN.md b/research/cv/tinydarknet/README_CN.md index a2f66fc41..e537488d0 100644 --- a/research/cv/tinydarknet/README_CN.md +++ b/research/cv/tinydarknet/README_CN.md @@ -64,7 +64,7 @@ Tiny-DarkNet是Joseph Chet Redmon等人提出的一个16层的针对于经典的 - + # [环境要求](#目录) diff --git a/research/cv/vnet/README_CN.md b/research/cv/vnet/README_CN.md index 95924d26c..6572a6840 100644 --- a/research/cv/vnet/README_CN.md +++ b/research/cv/vnet/README_CN.md @@ -101,7 +101,7 @@ VNet适用于医学图像分割,使用3D卷积,能够处理3D MR图像数据 - [MindSpore Python API](https://www.mindspore.cn/docs/zh-CN/master/api_python/mindspore.html) - 生成config json文件用于多卡训练。 - [简易教程](https://gitee.com/mindspore/models/tree/master/utils/hccl_tools) - - 详细配置方法请参照[rank table启动](https://www.mindspore.cn/tutorials/experts/zh-CN/master/parallel/rank_table.html)。 + - 详细配置方法请参照[rank table启动](https://www.mindspore.cn/docs/zh-CN/master/model_train/parallel/rank_table.html)。 # 快速入门 diff --git a/research/cv/wideresnet/README.md b/research/cv/wideresnet/README.md index c16dec6d8..ec4fea43a 100644 --- a/research/cv/wideresnet/README.md +++ b/research/cv/wideresnet/README.md @@ -208,7 +208,7 @@ bash run_standalone_train_gpu.sh [DATASET_PATH] [CONFIG_PATH] [EXPERIMENT_LABEL] For distributed training, a hostfile configuration needs to be created in advance. -Please follow the instructions in the link [GPU-Multi-Host](https://www.mindspore.cn/tutorials/experts/en/master/parallel/mpirun.html). +Please follow the instructions in the link [GPU-Multi-Host](https://www.mindspore.cn/docs/en/master/model_train/parallel/mpirun.html). ##### Evaluation while training diff --git a/research/cv/wideresnet/README_CN.md b/research/cv/wideresnet/README_CN.md index b96a63bb3..339c79ff4 100644 --- a/research/cv/wideresnet/README_CN.md +++ b/research/cv/wideresnet/README_CN.md @@ -211,7 +211,7 @@ bash run_standalone_train_gpu.sh [DATASET_PATH] [CONFIG_PATH] [EXPERIMENT_LABEL] 对于分布式培训,需要提前创建主机文件配置。 -请按照链接中的说明操作 [GPU-Multi-Host](https://www.mindspore.cn/tutorials/experts/en/master/parallel/mpirun.html). +请按照链接中的说明操作 [GPU-Multi-Host](https://www.mindspore.cn/docs/en/master/model_train/parallel/mpirun.html). 
## 培训时的评估 diff --git a/research/nlp/mass/README.md b/research/nlp/mass/README.md index 3364fb0dd..cb5cddaac 100644 --- a/research/nlp/mass/README.md +++ b/research/nlp/mass/README.md @@ -501,7 +501,7 @@ subword-nmt rouge ``` - + # Get started @@ -563,7 +563,7 @@ Get the log and output files under the path `./train_mass_*/`, and the model fil ## Inference -If you need to use the trained model to perform inference on multiple hardware platforms, such as GPU, Ascend 910 or Ascend 310, you can refer to this [Link](https://www.mindspore.cn/tutorials/experts/en/master/infer/inference.html). +If you need to use the trained model to perform inference on multiple hardware platforms, such as GPU, Ascend 910 or Ascend 310, you can refer to this [Link](https://www.mindspore.cn/docs/en/master/model_infer/ms_infer/overview.html). For inference, config the options in `default_config.yaml` firstly: - Assign the `default_config.yaml` under `data_path` node to the dataset path. diff --git a/research/nlp/mass/README_CN.md b/research/nlp/mass/README_CN.md index c2203ddc5..f0053f767 100644 --- a/research/nlp/mass/README_CN.md +++ b/research/nlp/mass/README_CN.md @@ -505,7 +505,7 @@ subword-nmt rouge ``` - + # 快速上手 @@ -567,7 +567,7 @@ bash run_gpu.sh -t t -n 1 -i 1 ## 推理 -如果您需要使用此训练模型在GPU、Ascend 910、Ascend 310等多个硬件平台上进行推理,可参考此[链接](https://www.mindspore.cn/tutorials/experts/zh-CN/master/infer/inference.html)。 +如果您需要使用此训练模型在GPU、Ascend 910、Ascend 310等多个硬件平台上进行推理,可参考此[链接](https://www.mindspore.cn/docs/zh-CN/master/model_infer/ms_infer/overview.html)。 推理时,请先配置`config.json`中的选项: - 将`default_config.yaml`节点下的`data_path`配置为数据集路径。 diff --git a/research/nlp/rotate/README_CN.md b/research/nlp/rotate/README_CN.md index cf901cd7b..a87a4910b 100644 --- a/research/nlp/rotate/README_CN.md +++ b/research/nlp/rotate/README_CN.md @@ -86,7 +86,7 @@ bash run_infer_310.sh [MINDIR_HEAD_PATH] [MINDIR_TAIL_PATH] [DATASET_PATH] [NEED 在裸机环境(本地有Ascend 910 AI 处理器)进行分布式训练时,需要配置当前多卡环境的组网信息文件。 请遵循一下链接中的说明创建json文件: - + - GPU处理器环境运行 diff --git a/research/recommend/ncf/README.md b/research/recommend/ncf/README.md index 26e0b0a23..2bb4556b4 100644 --- a/research/recommend/ncf/README.md +++ b/research/recommend/ncf/README.md @@ -356,9 +356,9 @@ Inference result is saved in current path, you can find result like this in acc. ### Inference -If you need to use the trained model to perform inference on multiple hardware platforms, such as Ascend 910 or Ascend 310, you can refer to this [Link](https://www.mindspore.cn/tutorials/experts/en/master/infer/inference.html). Following the steps below, this is a simple example: +If you need to use the trained model to perform inference on multiple hardware platforms, such as Ascend 910 or Ascend 310, you can refer to this [Link](https://www.mindspore.cn/docs/en/master/model_infer/ms_infer/overview.html). Following the steps below, this is a simple example: - + ```python # Load unseen dataset for inference -- Gitee
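
The READMEs touched above all describe the same Ascend launch flow: generate a RANK_TABLE_FILE with the hccl_tools generator, optionally raise the HCCL connection timeout for large models, then call the model's distributed launch script. The sketch below walks through that flow under stated assumptions: the device range, dataset path, rank-table file name, and the argument order of `run_distribute_train.sh` are illustrative placeholders, and each model's own README gives the authoritative invocation.

```bash
# Sketch of the RANK_TABLE_FILE workflow referenced throughout these READMEs.
# Paths, the device range, and the launch-script arguments are placeholders;
# each model's run_distribute_train.sh documents its own argument order.

# 1. Generate a rank table for local Ascend devices 0-7 with the hccl_tools
#    generator linked above; it writes a json file such as
#    hccl_8p_01234567_127.0.0.1.json into the current directory.
python models/utils/hccl_tools/hccl_tools.py --device_num "[0,8)"

# 2. For large models, extend the HCCL connection check from the default
#    120 seconds to 600 seconds so graph compilation does not hit the timeout.
export HCCL_CONNECT_TIMEOUT=600

# 3. Launch distributed training with the dataset and the generated rank table.
bash scripts/run_distribute_train.sh /path/to/dataset ./hccl_8p_01234567_127.0.0.1.json
```

On GPU, the analogous prerequisite is an OpenMPI hostfile passed to `mpirun`, as covered by the mpirun tutorial these links now point to.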