diff --git a/docs/mindspore/faq/source_en/data_processing.md b/docs/mindspore/faq/source_en/data_processing.md
index 8923cccd4294dc74bb3c85b63034b179fcf514d7..9b412bffe953a1db28d3e8e0e1615f35a537aa09 100644
--- a/docs/mindspore/faq/source_en/data_processing.md
+++ b/docs/mindspore/faq/source_en/data_processing.md
@@ -4,7 +4,7 @@
**Q: How do I offload data if I do not use high-level APIs?**
-A: You can refer to the [test_tdt_data_transfer.py](https://gitee.com/mindspore/mindspore/blob/master/tests/st/data_transfer/test_tdt_data_transfer.py) example of the manual offloading mode without using the `model.train` API. Currently, the GPU-based and Ascend-based hardware is supported.
+A: You can implement manual data offloading without using the `model.train` API by referring to the [test_tdt_data_transfer.py](https://gitee.com/mindspore/mindspore/blob/master/tests/st/data_transfer/test_tdt_data_transfer.py) example. Currently, GPU and Ascend hardware are supported.
@@ -12,19 +12,19 @@ A: You can refer to the [test_tdt_data_transfer.py](https://gitee.com/mindspore/
A: You can refer to the following steps to reduce the memory occupation, which may also reduce the efficiency of data processing.
- 1. Before defining the dataset object by `**Dataset`, set the prefetch size of `Dataset` processing by `ds.config.set_prefetch_size(2)`.
+    1. Before defining the `**Dataset` object, set the prefetch size for `Dataset` data processing with `ds.config.set_prefetch_size(2)`.
     2. When defining the `**Dataset` object, set its parameter `num_parallel_workers` to 1.
- 3. If you further use `**Dataset` object `.map(...)` operation, you can set `.map(...)` operation's parameter `num_parallel_workers` is 1.
+    3. If you further use the `.map(...)` operation on the `**Dataset` object, set the `num_parallel_workers` parameter of the `.map(...)` operation to 1.
- 4. If you further use `**Dataset` object `.batch(...)` operation, you can set `.batch(...)` operation's parameter `num_parallel_workers' is 1.
+    4. If you further use the `.batch(...)` operation on the `**Dataset` object, set the `num_parallel_workers` parameter of the `.batch(...)` operation to 1. A combined sketch of these settings follows.
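+
+For illustration, a minimal sketch of the settings above (assuming an `ImageFolderDataset` on a hypothetical `./data` directory; adapt it to the `**Dataset` you actually use):
+
+```python
+import mindspore.dataset as ds
+import mindspore.dataset.vision.c_transforms as c_vision
+
+# 1. Lower the prefetch size before any dataset object is created.
+ds.config.set_prefetch_size(2)
+
+# 2. Use a single worker when defining the dataset object.
+dataset = ds.ImageFolderDataset("./data", num_parallel_workers=1)
+
+# 3. Use a single worker for the map operation.
+dataset = dataset.map(operations=c_vision.Decode(), input_columns=["image"], num_parallel_workers=1)
+
+# 4. Use a single worker for the batch operation.
+dataset = dataset.batch(32, num_parallel_workers=1)
+```
+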
-**Q: Why is there no difference between `shuffle=True` and `shuffle=False` in `GeneratorDataset`?**
+**Q: Why is there no difference between setting the `shuffle` parameter of `GeneratorDataset` to `shuffle=True` and to `shuffle=False` when the task is run?**
-A: If `shuffle` is enabled, the input `Dataset` must support random access (for example, the user-defined `Dataset` has the `getitem` method). If data is returned in `yeild` mode in the user-defined `Dataset`, random access is not supported. For details, see section [Loading Dataset Overview](https://www.mindspore.cn/docs/programming_guide/en/master/dataset_loading.html#id5).
+A: If `shuffle` is enabled, the input `Dataset` must support random access (for example, the user-defined `Dataset` has the `__getitem__` method). If data is returned in `yield` mode in the user-defined `Dataset`, random access is not supported. For details, see section [Loading Dataset Overview](https://www.mindspore.cn/docs/programming_guide/en/master/dataset_loading.html#id5) in the tutorial.
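+
+For reference, a minimal sketch (hypothetical names) of a random-access dataset for which `shuffle=True` takes effect:
+
+```python
+import numpy as np
+import mindspore.dataset as ds
+
+class RandomAccessDataset:
+    """Defines __getitem__ and __len__, so random access (and thus shuffle) is supported."""
+    def __init__(self):
+        self.data = np.arange(10).astype(np.float32)
+
+    def __getitem__(self, index):
+        return (self.data[index],)
+
+    def __len__(self):
+        return len(self.data)
+
+# shuffle takes effect here; a dataset that only yields data is read sequentially instead.
+dataset = ds.GeneratorDataset(RandomAccessDataset(), column_names=["data"], shuffle=True)
+```
+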
@@ -47,7 +47,7 @@ Note: The `shapes`of the two `columns` are different. Therefore, you need to `fl
**Q: Does `GeneratorDataset` support `ds.PKSampler` sampling?**
-A: `GeneratorDataset` does not support `PKSampler` sampling logic. The main reason is that the custom data operation is too flexible. The built-in `PKSampler` cannot be universal. Therefore, a message is displayed at the API layer, indicating that the operation is not supported. However, for `GeneratorDataset`, you can easily define the required `Sampler` logic. That is, you can define specific `sampler` rules in the `__getitem__` function of the `ImageDataset` class and return the required data.
+A: The user-defined dataset `GeneratorDataset` does not support the `PKSampler` sampling logic. The main reason is that custom data operations are too flexible for the built-in `PKSampler` to cover universally. Therefore, a message is displayed at the API layer, indicating that the operation is not supported. However, for `GeneratorDataset`, you can easily define the required `Sampler` logic. That is, you can define specific `sampler` rules in the `__getitem__` function of the `ImageDataset` class and return the required data, as sketched below.
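+
+A rough sketch of the idea (the class layout, the `k` parameter, and the sample dictionary are assumptions for illustration, not a built-in API): map `k` consecutive indices to the same class inside `__getitem__`.
+
+```python
+import numpy as np
+import mindspore.dataset as ds
+
+class ImageDataset:
+    """Emulate PK-style sampling: every k consecutive indices come from the same class."""
+    def __init__(self, samples_per_class, k=2):
+        self.samples_per_class = samples_per_class   # {label: [numpy images]}
+        self.labels = list(samples_per_class.keys())
+        self.k = k
+
+    def __getitem__(self, index):
+        label = self.labels[(index // self.k) % len(self.labels)]
+        candidates = self.samples_per_class[label]
+        image = candidates[index % len(candidates)]
+        return image, np.array(label, dtype=np.int32)
+
+    def __len__(self):
+        return len(self.labels) * self.k
+
+data = {0: [np.ones((2, 2), np.float32)], 1: [np.zeros((2, 2), np.float32)]}
+dataset = ds.GeneratorDataset(ImageDataset(data), column_names=["image", "label"], shuffle=False)
+```
+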
@@ -65,7 +65,7 @@ Principle: The underlying layer of `c_transform` uses `opencv/jpeg-turbo` of the
-**Q: A piece of data contains multiple images which have different widths and heights. I need to perform the `map` operation on the data in mindrecord format for data processing. However, the data I read from `record` is in `np.ndarray` format. My `operations` are for the image format. How can I preprocess the generated data in mindrecord format?**
+**Q: A piece of data contains multiple images which have different widths and heights. I need to perform the `map` operation on the data in mindrecord format. However, the data I read from `record` is in `np.ndarray` format, and my data processing `operations` are designed for the image format. How can I preprocess the generated data in mindrecord format?**
A: You are advised to perform the following operations:
@@ -114,7 +114,7 @@ for item in data_set.create_dict_iterator(output_numpy=True):
-**Q: When a custom image dataset is converted to the mindrecord format, the data is in the `numpy.ndarray` format and `shape` is [4,100,132,3], indicating four three-channel frames, and each value ranges from 0 to 255. However, when I view the data that is converted into the mindrecord format, I find that the `shape` is `[19800]` but that of the original data is `[158400]`. Why?**
+**Q: When a custom image dataset is converted to the mindrecord format, the data is in the `numpy.ndarray` format and `shape` is [4,100,132,3], indicating four three-channel frames, and each value ranges from 0 to 255. However, when I view the data converted to the mindrecord format, I find that the `shape` is `[19800]`, whereas the flattened dimensions of the original data are `[158400]`. Why?**
 A: The value of `dtype` in `ndarray` might be set to `int8`. The difference between `[158400]` and `[19800]` is a factor of eight. You are advised to set `dtype` of `ndarray` to `float64`.
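+
+A small sketch (not part of the original question) showing where the factor of eight comes from:
+
+```python
+import numpy as np
+
+frames = np.zeros([4, 100, 132, 3])     # float64 by default
+print(frames.size)                      # 158400 elements in total
+print(np.dtype(np.float64).itemsize)    # 8 bytes per element
+print(np.dtype(np.int8).itemsize)       # 1 byte per element
+# 158400 / 8 == 19800, which matches the shape difference described above.
+```
+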
@@ -122,15 +122,15 @@ A: The value of `dtype` in `ndarray` might be set to `int8`. The difference betw
**Q: I want to save the generated image, but the image cannot be found in the corresponding directory after the code is executed. Similarly, a dataset is generated in JupyterLab for training. During training, data can be read in the corresponding path, but the image or dataset cannot be found in the path. Why?**
-A: The images or datasets generated by JumperLab are stored in Docker. The data downloaded by `moxing` can be viewed only in Docker during the training process. After the training is complete, the data is released with Docker. You can try to transfer the data that needs to be downloaded to `obs` through `moxing` in the training task, and then download the data to the local host through `obs`.
+A: The images or datasets generated by JupyterLab are stored in Docker. The data downloaded by `moxing` can be viewed only in Docker during the training process. After the training is complete, the data is released with Docker. You can try to transfer the data that needs to be downloaded to `obs` through `moxing` in the training task, and then download the data from `obs` to the local host.
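+
+A rough sketch of this workflow (the `moxing` call and the bucket path are assumptions based on typical ModelArts usage, not part of the original answer):
+
+```python
+import moxing as mox
+
+# Inside the training task: copy the generated files from the container to OBS.
+mox.file.copy_parallel("/cache/output", "obs://your-bucket/output")
+# Afterwards, download the files from OBS to the local host through the OBS console or client tools.
+```
+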
**Q: How do I understand the `dataset_sink_mode` parameter in `model.train` of MindSpore?**
-A: When `dataset_sink_mode` is set to `True`, data processing and network computing are performed in pipeline mode. That is, when data processing is performed step by step, after a `batch` of data is processed, the data is placed in a queue which is used to cache the processed data. Then, network computing obtains data from the queue for training. In this case, data processing and network computing are performed in pipeline mode. The entire training duration is the longest data processing/network computing duration.
+A: When `dataset_sink_mode` is set to `True`, data processing and network computing are performed in `pipeline` mode. That is, data processing is performed step by step: after a `batch` of data is processed, it is placed in a queue that caches the processed data, and network computing then obtains data from the queue for training. In this case, the entire training duration is determined by whichever of data processing and network computing takes longer.
-When `dataset_sink_mode` is set to `False`, data processing and network computing are performed in serial mode. That is, after a `batch` of data is processed, it is transferred to the network for computation. After the computation is complete, the next `batch` of data is processed and transferred to the network for computation. This process repeats until the training is complete. The total time consumed is the time consumed for data processing plus the time consumed for network computing.
+When `dataset_sink_mode` is set to `False`, data processing and network computing are performed in serial mode. That is, after a `batch` of data is processed, it is transferred to the network for computation. After the computation is complete, the next `batch` of data is processed and transferred to the network for computation. This process repeats until the training is complete. The total time consumed for the training is the time consumed for data processing plus the time consumed for network computing.
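+
+For reference, a minimal sketch of switching between the two modes (assuming `model`, `train_dataset`, and the epoch count are already defined as in a typical training script):
+
+```python
+# Pipeline mode: data processing and network computing overlap.
+model.train(10, train_dataset, dataset_sink_mode=True)
+
+# Serial mode: each batch is processed first and then computed, one after another.
+model.train(10, train_dataset, dataset_sink_mode=False)
+```
+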
@@ -142,7 +142,7 @@ A: You can refer to the usage of YOLOv3 which contains the resizing of different
**Q: Must data be converted into MindRecords when MindSpore is used for segmentation training?**
-A: [build_seg_data.py](https://gitee.com/mindspore/models/blob/master/official/cv/deeplabv3/src/data/build_seg_data.py) is used to generate MindRecords based on a dataset. You can directly use or adapt it to your dataset. Alternatively, you can use `GeneratorDataset` if you want to read the dataset by yourself.
+A: [build_seg_data.py](https://gitee.com/mindspore/models/blob/master/official/cv/deeplabv3/src/data/build_seg_data.py) is the script that generates MindRecord files from the dataset. You can directly use or adapt it to your dataset. Alternatively, you can use `GeneratorDataset` to customize dataset loading if you want to implement dataset reading by yourself.
 [GeneratorDataset example](https://www.mindspore.cn/docs/programming_guide/en/master/dataset_loading.html#loading-user-defined-dataset)
@@ -152,7 +152,7 @@ A: [build_seg_data.py](https://gitee.com/mindspore/models/blob/master/official/c
**Q: When MindSpore performs multi-device training on the Ascend hardware platform, how does the user-defined dataset transfer data to different chip?**
-A: When `GeneratorDataset` is used, the `num_shards=num_shards` and `shard_id=device_id` parameters can be used to control which shard of data is read by different devices. `__getitem__` and `__len__` are processed as full datasets.
+A: When `GeneratorDataset` is used, the `num_shards=num_shards` and `shard_id=device_id` parameters can be used together to control which shard of data is read by each device. `__getitem__` and `__len__` are processed as full datasets.
An example is as follows:
@@ -179,7 +179,7 @@ For details, see [Converting Dataset to MindRecord](https://www.mindspore.cn/doc
-**Q: What can I do if an error message `wrong shape of image` is displayed when I use a model trained by MindSpore to perform prediction on a `28 x 28` digital image with white text on a black background?**
+**Q: What can I do if an error message `wrong shape of image` is displayed when I use a model trained by MindSpore to perform prediction on a `28 x 28` digit image that I created myself, with white text on a black background?**
A: The MNIST gray scale image dataset is used for MindSpore training. Therefore, when the model is used, the data must be set to a `28 x 28` gray scale image, that is, a single channel.
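+
+A small sketch of such a conversion (assuming a hypothetical `digit.png`; the normalization and final shape must match what the network was trained with):
+
+```python
+import numpy as np
+from PIL import Image
+
+# Convert the image to a single-channel 28 x 28 grayscale array.
+img = Image.open("digit.png").convert("L").resize((28, 28))
+img_np = np.array(img, dtype=np.float32)    # shape (28, 28)
+img_np = img_np.reshape(1, 1, 28, 28)       # add batch and channel dims as required by the network
+```
+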
@@ -191,9 +191,9 @@ A: MindData provides the heterogeneous hardware acceleration function for data p
-**Q: When error raised during network training, indicating that sending data failed like "TDT Push data into device Failed", how to locate the problem?**
+**Q: When the error message "TDT Push data into device Failed" is displayed during network training, how do I locate the problem?**
-A: Firstly, above error refers failed sending data to the device through the training data transfer channel (TDT). Here are several possible reasons for this error. Therefore, the corresponding checking suggestions are given in the log. In detail:
+A: Firstly, the above error means that data failed to be sent to the device through the training data transfer channel (TDT). There are several possible causes for this error, so the corresponding checking suggestions are given in the log. In detail:
1. Commonly, we will find the first error (the first ERROR level error) or error TraceBack thrown in the log, and try to find information that helps locate the cause of the error.
@@ -201,7 +201,7 @@ A: Firstly, above error refers failed sending data to the device through the tra
    3. **When the error is raised during the training process**, it is usually caused by a mismatch between the amount of data (number of batches) that has been sent and the amount of data (number of steps) required for network training. You can print and check the number of batches in an epoch with the `get_dataset_size` interface. Several possible reasons are as follows:
- - With checking the print times of loss to figure out the trained steps when error raised, when data amount(trained steps) is just an integer multiple of the batches number in an epoch, there may be a problem in the data processing part involving Epoch processing, such as the following case:
+      - Check how many times the loss has been printed to figure out the number of trained steps when the error was raised. If the amount of data (trained steps) is exactly an integer multiple of the number of batches in an epoch, there may be a problem in the part of the data processing that involves epoch handling, such as the following case:
```python
...
@@ -209,19 +209,19 @@ A: Firstly, above error refers failed sending data to the device through the tra
return dataset
```
- - The data processing performance is slow, and cannot keep up with the speed of network training. For this case, you can use the profiler tool and MindInsight to see if there is an obvious iteration gap, or manually iterating the dataset, and print the average single batch time , if longer than the combined forward and backward time of the network, there is a high probability that the performance of the data processing part needs to be optimized.
+      - The data processing performance is slow and cannot keep up with the speed of network training. In this case, you can use the profiler tool and MindInsight to check whether there is an obvious iteration gap, or manually iterate the dataset and print the average time per batch (see the timing sketch after this list). If that time is longer than the combined forward and backward time of the network, there is a high probability that the performance of the data processing part needs to be optimized.
    - During the training process, abnormal data may result in an exception, causing data sending to fail. In this case, there will be other `ERROR` logs that show which part of the data processing is abnormal, together with checking advice. If it is not obvious, you can also try to find the abnormal data by iterating over each data batch in the dataset (for example, turning off shuffle and using binary search).
- 4. **when error raised after training**(this is probably caused by forced release of resources), this error can be ignored.
+    4. **When the error is raised after training is complete** (this is probably caused by a forced release of resources), this error can be ignored.
- 5. If the specific cause cannot be located, please create issue or raise question in huawei clound forum for help.
+    5. If the specific cause cannot be located, please create an issue or raise a question to ask the module developers for help.
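+
+A minimal timing sketch for the performance check mentioned in point 3 (assuming `dataset` is the object returned by your data pipeline):
+
+```python
+import time
+
+start = time.time()
+num_batches = 0
+for _ in dataset.create_dict_iterator(output_numpy=True):
+    num_batches += 1
+elapsed = time.time() - start
+
+# Compare this with the combined forward and backward time of one training step.
+print("average time per batch: %.4f s" % (elapsed / num_batches))
+```
+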
**Q: Can the py_transforms and c_transforms operators be used together? If yes, how should I use them?**
-A: To ensure high performance, you are not advised to use the py_transforms and c_transforms operators together. For details, see [Image Data Processing and Enhancement](https://www.mindspore.cn/docs/programming_guide/en/master/augmentation.html#usage-instructions). However, if the main consideration is to streamline the process, the performance can be compromised more or less. If you cannot use all the c_transforms operators, that is, certain c_transforms operators are not available, the py_transforms operators can be used instead. In this case, the two operators are used together.
+A: To ensure high performance, you are not advised to use the py_transforms and c_transforms operators together. For details, see [Image Data Processing and Enhancement](https://www.mindspore.cn/docs/programming_guide/en/master/augmentation.html#usage-instructions). However, if streamlining the process is the main consideration, performance can be compromised to some extent. If you cannot use c_transforms operators throughout, that is, certain corresponding c_transforms operators are not available, the py_transforms operators can be used instead. In this case, the two kinds of operators are used together.
Note that the c_transforms operator usually outputs numpy array, and the py_transforms operator outputs PIL Image. For details, check the operator description. The common method to use them together is as follows:
- c_transforms operator + ToPIL operator + py_transforms operator + ToTensor operator
@@ -255,7 +255,7 @@ A: The preceding error is usually caused by incorrect script writing. In normal
dataset3 = dataset1.map(***)
```
-`dataset3` is obtained by performing data enhancement on `dataset2` rather than `dataset1`. The correct format is as follows:
+The correct format is as follows, so that `dataset3` is obtained by performing data enhancement on `dataset2` rather than `dataset1`:
```python
dataset2 = dataset1.map(***)
@@ -272,7 +272,7 @@ A: If the dataloader is considered as an API for receiving user-defined datasets
**Q: How do I debug a user-defined dataset when an error occurs?**
-A: Generally, a user-defined dataset is imported to GeneratorDataset. If the user-defined dataset is incorrectly pointed to, you can use some methods for debugging (for example, adding printing information and printing the shape and dtype of the return value). The intermediate processing result of a user-defined dataset is numpy array. You are not advised to use this operator together with the MindSpore network computing operator. In addition, you can directly traverse the user-defined dataset, such as MyDataset shown below, after initialization (to simplify debugging and analyze problems in the original dataset, you do not need to import GeneratorDataset). The debugging complies with common Python syntax rules.
+A: Generally, a user-defined dataset is imported into GeneratorDataset. If an error occurs in the user-defined dataset, you can use some common methods for debugging (for example, adding print statements and printing the shape and dtype of the return value). The intermediate processing results of a user-defined dataset are numpy arrays, and you are not advised to mix them with MindSpore network computing operators. In addition, for a user-defined dataset such as MyDataset shown below, you can iterate over it directly after initialization (to simplify debugging and analyze problems in the original dataset, you do not need to wrap it with GeneratorDataset). The debugging complies with common Python syntax rules.
```python
Dataset = MyDataset()
@@ -284,8 +284,8 @@ for item in Dataset:
**Q: Can the data processing operator and network computing operator be used together?**
-A: Generally, if the data processing operator and network computing operator are used together, the performance deteriorates. If the corresponding data processing operator is unavailable and the user-defined py_transforms operator is inappropriate, you can try to use the data processing operator and network computing operator together. Note that the input of the data processing operator is Numpy array or PIL Image, but the input of the network computing operator must be MindSpore.Tensor.
-To use the two operators together, ensure that the output format of the previous operator is the same as the input format required by the next operator. Data processing operators refer to operators starting with mindspore.dataset in the API document on the official website, for example, mindspore.dataset.vision.c_transforms.CenterCrop. Network computing operators include operators in the mindspore.nn and mindspore.ops directories.
+A: Generally, if the data processing operator and the network computing operator are used together, the performance deteriorates. If the corresponding data processing operator is unavailable and a user-defined py_transforms operator is inappropriate, you can try to use the two together. Note that their required inputs differ: the input of a data processing operator is a numpy array or a PIL Image, but the input of a network computing operator must be a mindspore.Tensor.
+To use the two operators together, ensure that the output format of the previous operator is the same as the input format of the next operator. Data processing operators refer to operators starting with mindspore.dataset in the API document on the official website, for example, mindspore.dataset.vision.c_transforms.CenterCrop. Network computing operators include operators in the mindspore.nn and mindspore.ops directories.
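+
+A minimal sketch of such a combination (the random input is only for illustration; the conversion to Tensor is the point):
+
+```python
+import numpy as np
+import mindspore as ms
+import mindspore.ops as ops
+import mindspore.dataset.vision.c_transforms as c_vision
+
+# Data processing operator: consumes a numpy array (HWC, uint8).
+img_np = np.random.randint(0, 255, (256, 256, 3)).astype(np.uint8)
+img_np = c_vision.CenterCrop(224)(img_np)
+
+# Convert to a Tensor before feeding a network computing operator.
+img_tensor = ms.Tensor(img_np, ms.float32)
+out = ops.ReduceMean()(img_tensor)
+```
+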
@@ -295,11 +295,11 @@ A: The .db file is the index file corresponding to the MindRecord file. If the .
-**Q: How to read image and perform Decode operation in user defined Dataset?**
+**Q: How do I read an image and perform the Decode operation in a user-defined Dataset?**
-A: The user defined Dataset that passed into GeneratorDataset, after reading the image inside the function (such as `__getitem__` function), it can directly return bytes type data, numpy array type array or numpy array that has been decoded, as shown below:
+A: The user-defined Dataset is passed into GeneratorDataset, and after reading an image inside its interface (such as the `__getitem__` function), it can directly return bytes data, a numpy array, or a numpy array that has already been decoded, as shown below:
-- Return bytes of data directly after reading the image
+- Return bytes type data directly after reading the image
```python
class ImageDataset:
@@ -328,7 +328,7 @@ A: The user defined Dataset that passed into GeneratorDataset, after reading the
- Return numpy array after reading the image
```python
- # In the above use case, the __getitem__ function can be modified as follows, and the Decode operation is the same as the above use case
+ # In the above case, the __getitem__ function can be modified as follows, and the Decode operation is the same as the above use case
def __getitem__(self, index):
# use np.fromfile to read image
img_np = np.fromfile(self.data[index])
@@ -337,10 +337,10 @@ A: The user defined Dataset that passed into GeneratorDataset, after reading the
return (img_np, )
```
-- Perform decode operation directly after reading the image
+- Perform Decode operation directly after reading the image
```python
- # According to the above use case, the __getitem__ function can be modified as follows to directly return the data after Decode. After that, there is no need to add Decode operation through the map operator.
+ # According to the above case, the __getitem__ function can be modified as follows to directly return the data after Decode. After that, there is no need to add Decode operation through the map operator.
def __getitem__(self, index):
        # use Image.open to open the file and convert it to RGB
        img_rgb = Image.open(self.data[index]).convert("RGB")
diff --git a/docs/mindspore/faq/source_en/distributed_configure.md b/docs/mindspore/faq/source_en/distributed_configure.md
index d37cfc1d5baed2f3a7770fed9338295606370b8f..6f129d4900dad03270948cb46cb24e19c2b16008 100644
--- a/docs/mindspore/faq/source_en/distributed_configure.md
+++ b/docs/mindspore/faq/source_en/distributed_configure.md
@@ -1,8 +1,8 @@
-# Distributed Configure
+# Distributed Configuration
-**Q: What do I do if the error `Init plugin so failed, ret = 1343225860` occurs during the HCCL distributed training?**
+**Q: What do I do if the error message `Init plugin so failed, ret = 1343225860` is displayed during the HCCL distributed training?**
A: HCCL fails to be initialized. The possible cause is that `rank json` is incorrect. You can use the tool in `mindspore/model_zoo/utils/hccl_tools` to generate one. Alternatively, import the environment variable `export ASCEND_SLOG_PRINT_TO_STDOUT=1` to enable the log printing function of HCCL and check the log information.
@@ -17,7 +17,112 @@ Loading libgpu_collective.so failed. Many reasons could cause this:
3.mpi is not installed or found
```
-A: This message means that MindSpore failed to load library `libgpu_collective.so`. The Possible causes are:
+A: This message means that MindSpore failed to dynamically load the collective communication library. The possible causes are:
-- OpenMPI or NCCL is not installed in this environment.
-- NCCL version is not updated to `v2.7.6`: MindSpore `v1.1.0` supports GPU P2P communication operator which relies on NCCL `v2.7.6`. `libgpu_collective.so` can't be loaded successfully if NCCL is not updated to this version.
+- OpenMPI or NCCL, on which distributed training relies, is not installed in this environment.
+- The NCCL version is not updated to `v2.7.6`: MindSpore `v1.1.0` adds a GPU P2P communication operator which relies on NCCL `v2.7.6`. The loading fails if NCCL is not updated to this version.
+
+
+
+**Q: In the GPU distributed training scenario, if the number of devices set in the environment variable CUDA_VISIBLE_DEVICES is incorrectly smaller than the number of processes started, processes may block. What can I do?**
+
+A: In this scenario, some training processes will prompt the following error:
+
+```text
+[ERROR] DEVICE [mindspore/ccsrc/runtime/device/gpu/cuda_driver.cc:245] SetDevice] SetDevice for id:7 failed, ret[101], invalid device ordinal. Please make sure that the 'device_id' set in context is in the range:[0, total number of GPU). If the environment variable 'CUDA_VISIBLE_DEVICES' is set, the total number of GPU will be the number set in the environment variable 'CUDA_VISIBLE_DEVICES'. For example, if export CUDA_VISIBLE_DEVICES=4,5,6, the 'device_id' can be 0,1,2 at the moment, 'device_id' starts from 0, and 'device_id'=0 means using GPU of number 4.
+[ERROR] DEVICE [mindspore/ccsrc/runtime/device/gpu/gpu_device_manager.cc:27] InitDevice] Op Error: Failed to set current device id | Error Number: 0
+```
+
+The remaining processes may execute normally up to the `NCCL` initialization step because their GPU resources are allocated successfully, and the log is as follows:
+
+```text
+[INFO] DEVICE [mindspore/ccsrc/runtime/hardware/gpu/gpu_device_context.cc:90] Initialize] Start initializing NCCL communicator for device 1
+```
+
+In this step, the `NCCL` interface `ncclCommInitRank` is called, which blocks until all processes reach agreement. So if one process does not call `ncclCommInitRank`, the remaining processes block.
+
+We have reported this issue to the `NCCL` community, and the community developers are designing a solution. It has not been fixed in the latest version yet; see the [issue link](https://github.com/NVIDIA/nccl/issues/593#issuecomment-965939279).
+
+Solution: Manually `kill` the training processes, set the correct device number according to the error log, and then restart the training task.
+
+
+
+**Q: In the GPU distributed training scenario, if a process exits abnormally, it may cause other processes to block. What can we do?**
+
+A: In this scenario, the abnormal process exits due to various problems, while the remaining processes execute normally up to the `NCCL` initialization step because their GPU resources are allocated successfully. The log is as follows:
+
+```text
+[INFO] DEVICE [mindspore/ccsrc/runtime/hardware/gpu/gpu_device_context.cc:90] Initialize] Start initializing NCCL communicator for device 1
+```
+
+In this step, the `NCCL` interface `ncclCommInitRank` is called, which blocks until all processes reach agreement. So if one process does not call `ncclCommInitRank`, the remaining processes block.
+
+We have reported this issue to the `NCCL` community, and the community developers are designing a solution. It has not been fixed in the latest version yet; see the [issue link](https://github.com/NVIDIA/nccl/issues/593#issuecomment-965939279).
+
+Solution: Manually `kill` the training processes, set the correct device number according to the error log, and then restart the training task.
+
+
+
+**Q: When executing a GPU single-machine single-card script and starting the process without mpirun, calling the mindspore.communication.init method may report the following error, resulting in execution failure. How do I deal with it?**
+
+```text
+[CRITICAL] DISTRIBUTED [mindspore/ccsrc/distributed/cluster/cluster_context.cc:130] InitNodeRole] Role name is invalid...
+```
+
+A: If the user does not start the process using `mpirun` but still calls the `init()` method, MindSpore requires the user to configure several environment variables and verify them according to [Training without relying on OpenMPI](https://www.mindspore.cn/docs/programming_guide/zh-CN/master/distributed_training_gpu.html#openmpi). Without this configuration, MindSpore may display the above error message. Therefore, it is suggested that `mindspore.communication.init` be called only when performing distributed training and, when `mpirun` is not used, that the correct environment variables be configured according to the documentation before starting distributed training.
+
+
+
+**Q: What can we do when multi-machine multi-card training via OpenMPI fails with the following MPI_Allgather-related error?**
+
+```text
+pml_ucx.c:175 Error: Failed to receive UCX worker address: Not found (-13)
+pml_ucx.c:452 Error: Failed to resolve UCX endpoint for rank X
+```
+
+A: This problem occurs because `OpenMPI` cannot reach the peer address when communicating on the host side, which is generally caused by different NIC configurations between the machines. It can be solved by manually setting the NIC name or subnet:
+
+```text
+mpirun -n process_num --mca btl tcp --mca btl_tcp_if_include eth0 ./run.sh
+```
+
+The above command starts `process_num` `run.sh` processes and selects `tcp` as the host-side communication mode. The NIC `eth0` is selected so that the NIC used on each machine is the same, which resolves the communication problem.
+
+You can also specify a subnet for matching:
+
+```text
+mpirun -n process_num --mca btl tcp --mca btl_tcp_if_include 192.168.1.0/24 ./run.sh
+```
+
+The subnet range needs to include the IP addresses used by all machines.
+
+
+
+**Q: What can we do when, while performing distributed training via OpenMPI, single-machine multi-card training works normally, but during multi-machine multi-card training some machines report that setting the GPU device id fails?**
+
+```text
+[ERROR] DEVICE [mindspore/ccsrc/runtime/device/gpu/cuda_driver.cc:245] SetDevice] SetDevice for id:7 failed, ret[101], invalid device ordinal. Please make sure that the 'device_id' set in context is in the range:[0, total number of GPU). If the environment variable 'CUDA_VISIBLE_DEVICES' is set, the total number of GPU will be the number set in the environment variable 'CUDA_VISIBLE_DEVICES'. For example, if export CUDA_VISIBLE_DEVICES=4,5,6, the 'device_id' can be 0,1,2 at the moment, 'device_id' starts from 0, and 'device_id'=0 means using GPU of number 4.
+[ERROR] DEVICE [mindspore/ccsrc/runtime/device/gpu/gpu_device_manager.cc:27] InitDevice] Op Error: Failed to set current device id | Error Number: 0
+```
+
+A: In the multi-machine scenario, the card number of each process needs to be calculated after the host side performs `AllGather` on `HOSTNAME`. If the same `HOSTNAME` is used across machines, the process card numbers are calculated incorrectly, causing the card number to go out of range and the setting to fail. This can be resolved by setting the HOSTNAME of each machine to its respective IP address in the execution script:
+
+```text
+export HOSTNAME=node_ip_address
+```
+
+
+
+**Q: What can we do when, while performing multi-machine multi-card training via OpenMPI, the NCCL error message shows that the network is unreachable?**
+
+```text
+include/socket.h:403 NCCL WARN Connect to XXX failed: Network is unreachable
+```
+
+A: This problem occurs because `NCCL` cannot reach the peer address when synchronizing process information or initializing the communication domain on the host side, which is generally caused by different NIC configurations between the machines. The NIC can be selected by setting the `NCCL` environment variable `NCCL_SOCKET_IFNAME`:
+
+```text
+export NCCL_SOCKET_IFNAME=eth
+```
+
+The above command makes `NCCL` select a NIC whose name contains `eth` on the host side for communication.
\ No newline at end of file