From 9cd7d872695ee726d73dbf60b99a5291d7f02b3b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E8=BD=AC=E8=A1=8C=E7=9A=84=E5=B0=8F=E7=A0=81=E5=86=9C?=
<547278563@qq.com>
Date: Fri, 19 Feb 2021 16:52:42 +0800
Subject: [PATCH 1/2] update
tutorials/training/source_en/advanced_use/use_on_the_cloud.md.
---
.../advanced_use/use_on_the_cloud.md | 207 ++++++++++++++++++
1 file changed, 207 insertions(+)
diff --git a/tutorials/training/source_en/advanced_use/use_on_the_cloud.md b/tutorials/training/source_en/advanced_use/use_on_the_cloud.md
index 0ae46ab51b..8091e5197b 100644
--- a/tutorials/training/source_en/advanced_use/use_on_the_cloud.md
+++ b/tutorials/training/source_en/advanced_use/use_on_the_cloud.md
@@ -3,3 +3,210 @@
No English version available right now, welcome to contribute.
+# Use MindSpore on the Cloud
+
+No English version available right now, welcome to contribute.
+
+
+## Overview
+
+ModelArts is a one-stop AI development platform for developers provided by Huawei Cloud. It integrates an Ascend AI Processor resource pool, on which users can experience MindSpore.
+
+This tutorial uses ResNet-50 as an example to briefly introduce how to complete a training task with MindSpore on ModelArts.
+
+## Preparations
+
+### ModelArts Preparation
+
+Refer to the "Preparations" section of the ModelArts tutorial to register an account, configure ModelArts, and create an OBS bucket.
+
+> **Note:** The ModelArts tutorial is available at https://support.huaweicloud.com/wtsnew-modelarts/index.html. The page provides rich ModelArts tutorials; refer to its "Preparations" section to complete the ModelArts preparation.
+
+### Ascend AI Processor Resources on the Cloud
+
+Make sure that your account has been granted the open beta qualification for the ModelArts Ascend cluster service on Huawei Cloud. You can submit the application on the ModelArts page of Huawei Cloud.
+
+### Data Preparation
+
+ModelArts uses the Object Storage Service (OBS) to store data, so the dataset must be uploaded to OBS before the training task starts. This example uses the CIFAR-10 dataset in binary format.
+
+1. Download the CIFAR-10 dataset and decompress it.
+
+   > **Note:** The CIFAR-10 download page is http://www.cs.toronto.edu/~kriz/cifar.html. The page provides three dataset download links; this example uses the CIFAR-10 binary version.
+
+2. Create an OBS bucket of your own (for example, `ms-dataset`), create a data directory in the bucket (for example, `cifar-10`), and upload the CIFAR-10 data to the data directory using the following structure:
+
+    ```text
+    └─Object Storage/ms-dataset/cifar-10
+        ├─train
+        │   data_batch_1.bin
+        │   data_batch_2.bin
+        │   data_batch_3.bin
+        │   data_batch_4.bin
+        │   data_batch_5.bin
+        │
+        └─eval
+            test_batch.bin
+    ```
+
+### Script Preparation
+
+Create another OBS bucket of your own (for example, `resnet50-train`), create a code directory in the bucket (for example, `resnet50_cifar10_train`), and upload the training scripts (`dataset.py` and `resnet50_train.py`, see the directory structure below) to the code directory.
+
+> **Note:** The version of the scripts must match the MindSpore version selected in the "Creating a Training Job" step. For example, scripts provided for the MindSpore 1.1 tutorial require the MindSpore 1.1 engine to be selected when the training job is created.
+
+To simplify the subsequent creation of training jobs, also create a training output directory and a log output directory in advance. The directory structure used in this example is as follows:
+
+```text
+└─Object Storage/resnet50-train
+    ├─resnet50_cifar10_train
+    │   dataset.py
+    │   resnet50_train.py
+    │
+    ├─output
+    └─log
+```
+
+## Running MindSpore Scripts in ModelArts with Simple Adaptations
+
+The scripts provided in "Script Preparation" can run in ModelArts directly; skip this section if you only want to quickly experience training ResNet-50 on CIFAR-10. If you need to run custom MindSpore scripts or other MindSpore sample code in ModelArts, refer to this section and make the following simple adaptations to the MindSpore code.
+
+### Adapting Script Arguments
+
+1. A script running in ModelArts must define the `data_url` and `train_url` arguments, which correspond to the data storage path (an OBS path) and the training output path (an OBS path), respectively:
+
+    ```python
+    import argparse
+
+    parser = argparse.ArgumentParser(description='ResNet-50 train.')
+    parser.add_argument('--data_url', required=True, default=None, help='Location of data.')
+    parser.add_argument('--train_url', required=True, default=None, help='Location of training outputs.')
+    ```
+
+2. The ModelArts interface also supports passing values to other arguments defined in the script, such as `epoch_size`, as described in "Creating a Training Job" below:
+
+    ```python
+    parser.add_argument('--epoch_size', type=int, default=90, help='Train epoch size.')
+    ```
+
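+Putting the two points together, the argument-parsing part of such a script looks roughly as follows. Every `add_argument` line comes from the snippets above; the trailing prints are only an illustration of how the parsed values can be used, and `parse_known_args` simply ignores any arguments the parser does not recognize:
+
+```python
+import argparse
+
+parser = argparse.ArgumentParser(description='ResNet-50 train.')
+parser.add_argument('--data_url', required=True, default=None, help='Location of data.')
+parser.add_argument('--train_url', required=True, default=None, help='Location of training outputs.')
+parser.add_argument('--epoch_size', type=int, default=90, help='Train epoch size.')
+args_opt, unknown = parser.parse_known_args()
+
+print('data_url:', args_opt.data_url)
+print('train_url:', args_opt.train_url)
+print('epoch_size:', args_opt.epoch_size)
+```
+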
+### Adapting to OBS Data
+
+At the moment, MindSpore does not provide a direct interface for accessing OBS data, so the script needs to interact with OBS through the APIs provided by MoXing. A ModelArts training script is executed in a container, which usually uses the `/cache` directory as its local data path.
+
+> **Note:** Huawei Cloud MoXing provides a rich set of APIs (see https://github.com/huaweicloud/ModelArts-Lab/tree/master/docs/moxing_api_doc); this example only needs the `copy_parallel` interface.
+
+1. Download the data stored in OBS to the execution container:
+
+    ```python
+    import moxing as mox
+    mox.file.copy_parallel(src_url='s3://dataset_url/', dst_url='/cache/data_path')
+    ```
+
+2. Upload the training output from the container to OBS:
+
+    ```python
+    import moxing as mox
+    mox.file.copy_parallel(src_url='/cache/output_path', dst_url='s3://output_url/')
+    ```
+
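+A minimal end-of-job sketch built on the same `copy_parallel` interface is shown below; the local directory name `/cache/train_output` is an illustrative assumption rather than something fixed by ModelArts:
+
+```python
+import os
+
+import moxing as mox
+
+def upload_outputs(train_url, local_output_path='/cache/train_output'):
+    """Copy whatever the job wrote in the container back to the OBS output path."""
+    if os.path.exists(local_output_path):
+        mox.file.copy_parallel(src_url=local_output_path, dst_url=train_url)
+        print('Uploaded training outputs to', train_url)
+
+# e.g. call upload_outputs(args.train_url) at the end of the training function
+```
+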
+### Adapting to an 8-Device Training Job
+
+To run the script in an environment with 8 Ascend devices, you need to adapt the local data path, the dataset creation code, and the distributed strategy configuration. By reading the `DEVICE_ID` and `RANK_SIZE` environment variables, the same training script can be built for both the 1-device and the 8-device flavors.
+
+1. Adapt the local data path:
+
+    ```python
+    import os
+
+    device_num = int(os.getenv('RANK_SIZE'))
+    device_id = int(os.getenv('DEVICE_ID'))
+
+    # define local data path
+    local_data_path = '/cache/data'
+
+    if device_num > 1:
+        # define distributed local data path
+        local_data_path = os.path.join(local_data_path, str(device_id))
+    ```
+
+2. Adapt the dataset creation:
+
+    ```python
+    import os
+    import mindspore.dataset.engine as de
+
+    device_id = int(os.getenv('DEVICE_ID'))
+    device_num = int(os.getenv('RANK_SIZE'))
+    if device_num == 1:
+        # create train data for 1 Ascend situation
+        ds = de.Cifar10Dataset(dataset_path, num_parallel_workers=8, shuffle=True)
+    else:
+        # create train data for 8 Ascend situation: shard the train data across devices
+        ds = de.Cifar10Dataset(dataset_path, num_parallel_workers=8, shuffle=True,
+                               num_shards=device_num, shard_id=device_id)
+    ```
+
+3. Configure the distributed strategy:
+
+    ```python
+    import os
+    from mindspore import context
+    from mindspore.context import ParallelMode
+
+    device_num = int(os.getenv('RANK_SIZE'))
+    if device_num > 1:
+        context.set_auto_parallel_context(device_num=device_num,
+                                          parallel_mode=ParallelMode.DATA_PARALLEL,
+                                          gradients_mean=True)
+    ```
+
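+The snippet above only sets the parallel context. A fuller sketch of the device setup such a script typically performs is shown below; it assumes the MindSpore 1.x API, and `set_context`/`init` (which initializes collective communication) are additions for illustration, not part of the tutorial snippets above:
+
+```python
+import os
+
+from mindspore import context
+from mindspore.communication.management import init
+from mindspore.context import ParallelMode
+
+device_id = int(os.getenv('DEVICE_ID'))
+device_num = int(os.getenv('RANK_SIZE'))
+
+# bind this process to its Ascend device; the DEVICE_ID and RANK_SIZE environment
+# variables are provided by the platform for every process
+context.set_context(mode=context.GRAPH_MODE, device_target='Ascend', device_id=device_id)
+if device_num > 1:
+    init()  # initialize communication across the 8 devices
+    context.set_auto_parallel_context(device_num=device_num,
+                                      parallel_mode=ParallelMode.DATA_PARALLEL,
+                                      gradients_mean=True)
+```
+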
+### Sample Code
+
+Combining the three points above, a MindSpore script can be adapted with only small changes. Take the following pseudocode as an example.
+
+The original MindSpore script:
+
+```python
+import os
+import argparse
+from mindspore import context
+from mindspore.context import ParallelMode
+import mindspore.dataset.engine as de
+
+device_id = int(os.getenv('DEVICE_ID'))
+device_num = int(os.getenv('RANK_SIZE'))
+
+def create_dataset(dataset_path):
+    if device_num == 1:
+        ds = de.Cifar10Dataset(dataset_path, num_parallel_workers=8, shuffle=True)
+    else:
+        ds = de.Cifar10Dataset(dataset_path, num_parallel_workers=8, shuffle=True,
+                               num_shards=device_num, shard_id=device_id)
+    return ds
+
+def resnet50_train(args):
+    if device_num > 1:
+        context.set_auto_parallel_context(device_num=device_num,
+                                          parallel_mode=ParallelMode.DATA_PARALLEL,
+                                          gradients_mean=True)
+    train_dataset = create_dataset(args.local_data_path)
+
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser(description='ResNet-50 train.')
+    parser.add_argument('--local_data_path', required=True, default=None, help='Location of data.')
+    parser.add_argument('--epoch_size', type=int, default=90, help='Train epoch size.')
+
+    args_opt, unknown = parser.parse_known_args()
+
+    resnet50_train(args_opt)
+```
+
+Adapted MindSpore script:
+
+```python
+import os
+import argparse
+from mindspore import context
+from mindspore.context import ParallelMode
+import mindspore.dataset.engine as de
+
+# adapt to cloud: used for downloading data
+import moxing as mox
+
+device_id = int(os.getenv('DEVICE_ID'))
+device_num = int(os.getenv('RANK_SIZE'))
+
+def create_dataset(dataset_path):
+    if device_num == 1:
+        ds = de.Cifar10Dataset(dataset_path, num_parallel_workers=8, shuffle=True)
+    else:
+        ds = de.Cifar10Dataset(dataset_path, num_parallel_workers=8, shuffle=True,
+                               num_shards=device_num, shard_id=device_id)
+    return ds
+
+def resnet50_train(args):
+    # adapt to cloud: define local data path
+    local_data_path = '/cache/data'
+
+    if device_num > 1:
+        context.set_auto_parallel_context(device_num=device_num,
+                                          parallel_mode=ParallelMode.DATA_PARALLEL,
+                                          gradients_mean=True)
+        # adapt to cloud: define distributed local data path
+        local_data_path = os.path.join(local_data_path, str(device_id))
+
+    # adapt to cloud: download data from obs to local location
+    print('Download data.')
+    mox.file.copy_parallel(src_url=args.data_url, dst_url=local_data_path)
+
+    train_dataset = create_dataset(local_data_path)
+
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser(description='ResNet-50 train.')
+    # adapt to cloud: get obs data path
+    parser.add_argument('--data_url', required=True, default=None, help='Location of data.')
+    # adapt to cloud: get obs output path
+    parser.add_argument('--train_url', required=True, default=None, help='Location of training outputs.')
+    parser.add_argument('--epoch_size', type=int, default=90, help='Train epoch size.')
+    args_opt, unknown = parser.parse_known_args()
+
+    resnet50_train(args_opt)
+```
+
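+The pseudocode above stops after building the dataset. The following is only a hedged sketch of how the omitted training step and the output upload might be wired together; the network is passed in as a parameter, and the loss, optimizer, and checkpoint settings are illustrative assumptions rather than the tutorial's exact configuration:
+
+```python
+import moxing as mox
+
+import mindspore.nn as nn
+from mindspore.train.callback import CheckpointConfig, LossMonitor, ModelCheckpoint
+from mindspore.train.model import Model
+
+def train_and_upload(net, train_dataset, epoch_size, train_url, local_output='/cache/train_output'):
+    loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')
+    opt = nn.Momentum(net.trainable_params(), learning_rate=0.01, momentum=0.9)
+    model = Model(net, loss_fn=loss, optimizer=opt, metrics={'acc'})
+    ckpt_cb = ModelCheckpoint(prefix='resnet', directory=local_output,
+                              config=CheckpointConfig(save_checkpoint_steps=1000))
+    model.train(epoch_size, train_dataset, callbacks=[LossMonitor(), ckpt_cb])
+    # adapt to cloud: upload the container outputs back to OBS, as in the OBS adaptation above
+    mox.file.copy_parallel(src_url=local_output, dst_url=train_url)
+```
+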
+## Creating a Training Job
+
+With the data and the execution script ready, you need to create a training job to actually run the MindSpore script. If you are using ModelArts for the first time, you can follow this section to create one.
+
+### Entering the ModelArts Console
+
+Open the Huawei Cloud ModelArts home page at https://www.huaweicloud.com/product/modelarts.html and click "Enter Console" on the page.
+
+### Creating a Training Job Using a Common Framework
+
+The ModelArts tutorial at https://support.huaweicloud.com/engineers-modelarts/modelarts_23_0238.html shows how to create a training job using a common framework.
+
+### Creating a Training Job Using MindSpore as the Common Framework
+
+Taking the training scripts and data used in this tutorial as an example, configure the Create Training Job page as follows:
+
+1. Algorithm Source: select "Common Frameworks", set the AI Engine to Ascend-Powered-Engine, and choose the required MindSpore version (the image used in this example is MindSpore-0.5-python3.7-aarch64; make sure the scripts match the selected version).
+2. Code Directory: select the code directory created in the OBS bucket in advance. Boot File: select the startup script under that code directory.
+3. Data Source: select the data storage location and fill in the OBS location of the CIFAR-10 dataset.
+4. Running Parameters: the data storage location and the training output location are passed to the script as the `data_url` and `train_url` arguments, respectively. By adding running parameters you can pass values to other arguments defined in the script, such as `epoch_size`.
+5. Resource Pool: select "Public resource pools > Ascend".
+6. Flavor: select "Ascend: 1 * Ascend 910 CPU: 24 cores 96GiB" for a 1-device job, or "Ascend: 8 * Ascend 910 CPU: 192 cores 768GiB" for an 8-device job.
+
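+Before the first full run, it can help to verify what ModelArts actually passes to the boot script. The following is only a hedged sanity-check sketch; it prints the arguments and environment variables already used throughout this tutorial:
+
+```python
+import os
+import sys
+
+# print the raw command line (should contain --data_url, --train_url and any added running parameters)
+print('argv:', sys.argv)
+# print the device environment provided by the platform
+print('RANK_SIZE:', os.getenv('RANK_SIZE'), 'DEVICE_ID:', os.getenv('DEVICE_ID'))
+```
+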
+
+## Viewing the Results
+
+1. The running log can be viewed on the training job page.
+2. If a log path was specified when the training job was created, you can also download the log file from OBS and view it.
+
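+If you work in a ModelArts notebook, where the MoXing library is preinstalled, a sketch like the following could fetch the logs for offline inspection; the bucket path below is the example path from this tutorial, and the local target directory is an assumption:
+
+```python
+import moxing as mox
+
+# copy the whole OBS log directory into the notebook workspace
+mox.file.copy_parallel(src_url='s3://resnet50-train/log/', dst_url='/home/ma-user/work/log/')
+```
+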
+
--
Gitee
From bdaf481aec7cd3e2283f460d3b8261f4f0497547 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E8=BD=AC=E8=A1=8C=E7=9A=84=E5=B0=8F=E7=A0=81=E5=86=9C?=
<547278563@qq.com>
Date: Sat, 20 Feb 2021 11:04:58 +0800
Subject: [PATCH 2/2] update
tutorials/training/source_en/advanced_use/use_on_the_cloud.md.
---
.../advanced_use/use_on_the_cloud.md | 110 ++++++++----------
1 file changed, 48 insertions(+), 62 deletions(-)
diff --git a/tutorials/training/source_en/advanced_use/use_on_the_cloud.md b/tutorials/training/source_en/advanced_use/use_on_the_cloud.md
index 8091e5197b..50554537c4 100644
--- a/tutorials/training/source_en/advanced_use/use_on_the_cloud.md
+++ b/tutorials/training/source_en/advanced_use/use_on_the_cloud.md
@@ -4,10 +4,6 @@ No English version available right now, welcome to contribute.
# Use MindSpore on the Cloud
-
-No English version available right now, welcome to contribute.
-
-
## Overview
ModelArts is a one-stop AI development platform for developers provided by Huawei Cloud. It integrates an Ascend AI Processor resource pool, on which users can experience MindSpore.
This tutorial uses ResNet-50 as an example to briefly introduce how to complete a training task with MindSpore on ModelArts.
@@ -52,10 +48,9 @@ The scripts provided in the section "Execute Script Preparation" can be run dire
### Adapting Script Arguments
1. A script running in ModelArts must define the `data_url` and `train_url` arguments, which correspond to the data storage path (an OBS path) and the training output path (an OBS path), respectively:
import argparse
-
- parser = argparse.ArgumentParser(description='ResNet-50 train.')
- parser.add_argument('--data_url', required=True, default=None, help='Location of data.')
- parser.add_argument('--train_url', required=True, default=None, help='Location of training outputs.')
+parser = argparse.ArgumentParser(description='ResNet-50 train.')
+parser.add_argument('--data_url', required=True, default=None, help='Location of data.')
+parser.add_argument('--train_url', required=True, default=None, help='Location of training outputs.')
2.The ModelArts interface supports passing values to other parameters in the script, as described in more detail in the next section, "Creating a training job."
parser.add_argument('--epoch_size', type=int, default=90, help='Train epoch size.')
### Adapting to OBS Data
@@ -101,9 +96,9 @@ from mindspore.context import ParallelMode
device_num = int(os.getenv('RANK_SIZE'))
if device_num > 1:
- context.set_auto_parallel_context(device_num=device_num,
- parallel_mode=ParallelMode.DATA_PARALLEL,
- gradients_mean=True)
+    context.set_auto_parallel_context(device_num=device_num,
+                                      parallel_mode=ParallelMode.DATA_PARALLEL,
+                                      gradients_mean=True)
### Sample Code
Combining the three points above, a MindSpore script can be adapted with only small changes. Take the following pseudocode as an example.
The original MindSpore script:
@@ -117,28 +112,24 @@ device_id = int(os.getenv('DEVICE_ID'))
device_num = int(os.getenv('RANK_SIZE'))
def create_dataset(dataset_path):
- if device_num == 1:
- ds = de.Cifar10Dataset(dataset_path, num_parallel_workers=8, shuffle=True)
- else:
- ds = de.Cifar10Dataset(dataset_path, num_parallel_workers=8, shuffle=True,
- num_shards=device_num, shard_id=device_id)
- return ds
-
+    if device_num == 1:
+        ds = de.Cifar10Dataset(dataset_path, num_parallel_workers=8, shuffle=True)
+    else:
+        ds = de.Cifar10Dataset(dataset_path, num_parallel_workers=8, shuffle=True,
+                               num_shards=device_num, shard_id=device_id)
+    return ds
def resnet50_train(args):
- if device_num > 1:
- context.set_auto_parallel_context(device_num=device_num,
- parallel_mode=ParallelMode.DATA_PARALLEL,
- gradients_mean=True)
- train_dataset = create_dataset(local_data_path)
-
+    if device_num > 1:
+        context.set_auto_parallel_context(device_num=device_num,
+                                          parallel_mode=ParallelMode.DATA_PARALLEL,
+                                          gradients_mean=True)
+    train_dataset = create_dataset(args.local_data_path)
if __name__ == '__main__':
- parser = argparse.ArgumentParser(description='ResNet-50 train.')
- parser.add_argument('--local_data_path', required=True, default=None, help='Location of data.')
- parser.add_argument('--epoch_size', type=int, default=90, help='Train epoch size.')
-
- args_opt, unknown = parser.parse_known_args()
-
- resnet50_train(args_opt)
+    parser = argparse.ArgumentParser(description='ResNet-50 train.')
+    parser.add_argument('--local_data_path', required=True, default=None, help='Location of data.')
+    parser.add_argument('--epoch_size', type=int, default=90, help='Train epoch size.')
+    args_opt, unknown = parser.parse_known_args()
+    resnet50_train(args_opt)
Adapted MindSpore script:
import os
import argparse
@@ -153,40 +144,35 @@ device_id = int(os.getenv('DEVICE_ID'))
device_num = int(os.getenv('RANK_SIZE'))
def create_dataset(dataset_path):
- if device_num == 1:
- ds = de.Cifar10Dataset(dataset_path, num_parallel_workers=8, shuffle=True)
- else:
- ds = de.Cifar10Dataset(dataset_path, num_parallel_workers=8, shuffle=True,
- num_shards=device_num, shard_id=device_id)
- return ds
+    if device_num == 1:
+        ds = de.Cifar10Dataset(dataset_path, num_parallel_workers=8, shuffle=True)
+    else:
+        ds = de.Cifar10Dataset(dataset_path, num_parallel_workers=8, shuffle=True,
+                               num_shards=device_num, shard_id=device_id)
+    return ds
def resnet50_train(args):
- # adapt to cloud: define local data path
- local_data_path = '/cache/data'
-
- if device_num > 1:
- context.set_auto_parallel_context(device_num=device_num,
- parallel_mode=ParallelMode.DATA_PARALLEL,
- gradients_mean=True)
- # adapt to cloud: define distributed local data path
- local_data_path = os.path.join(local_data_path, str(device_id))
-
- # adapt to cloud: download data from obs to local location
- print('Download data.')
- mox.file.copy_parallel(src_url=args.data_url, dst_url=local_data_path)
-
- train_dataset = create_dataset(local_data_path)
-
+    # adapt to cloud: define local data path
+    local_data_path = '/cache/data'
+    if device_num > 1:
+        context.set_auto_parallel_context(device_num=device_num,
+                                          parallel_mode=ParallelMode.DATA_PARALLEL,
+                                          gradients_mean=True)
+        # adapt to cloud: define distributed local data path
+        local_data_path = os.path.join(local_data_path, str(device_id))
+    # adapt to cloud: download data from obs to local location
+    print('Download data.')
+    mox.file.copy_parallel(src_url=args.data_url, dst_url=local_data_path)
+    train_dataset = create_dataset(local_data_path)
if __name__ == '__main__':
- parser = argparse.ArgumentParser(description='ResNet-50 train.')
- # adapt to cloud: get obs data path
- parser.add_argument('--data_url', required=True, default=None, help='Location of data.')
- # adapt to cloud: get obs output path
- parser.add_argument('--train_url', required=True, default=None, help='Location of training outputs.')
- parser.add_argument('--epoch_size', type=int, default=90, help='Train epoch size.')
- args_opt, unknown = parser.parse_known_args()
-
- resnet50_train(args_opt)
+    parser = argparse.ArgumentParser(description='ResNet-50 train.')
+    # adapt to cloud: get obs data path
+    parser.add_argument('--data_url', required=True, default=None, help='Location of data.')
+    # adapt to cloud: get obs output path
+    parser.add_argument('--train_url', required=True, default=None, help='Location of training outputs.')
+    parser.add_argument('--epoch_size', type=int, default=90, help='Train epoch size.')
+    args_opt, unknown = parser.parse_known_args()
+    resnet50_train(args_opt)
## Creating a Training Job
With the data and the execution script ready, you need to create a training job to actually run the MindSpore script. If you are using ModelArts for the first time, you can follow this section to create one.
### Entering the ModelArts Console
--
Gitee