From dd53997032e246121eec1c1595acd4cb4aafc702 Mon Sep 17 00:00:00 2001
From: huanxiaoling <3174348550@qq.com>
Date: Mon, 17 Oct 2022 17:20:03 +0800
Subject: [PATCH] update the federated en files3
---
.../docs/source_en/federated_install.md | 20 +-
.../image_classification_application.md | 259 ++++++++----------
docs/federated/docs/source_en/index.rst | 6 +
.../image_classification_application.md | 94 +++----
4 files changed, 186 insertions(+), 193 deletions(-)
diff --git a/docs/federated/docs/source_en/federated_install.md b/docs/federated/docs/source_en/federated_install.md
index 4578d878bb..f0bf90008d 100644
--- a/docs/federated/docs/source_en/federated_install.md
+++ b/docs/federated/docs/source_en/federated_install.md
@@ -2,18 +2,24 @@
-## Installation Overview
+Currently, the [MindSpore Federated](https://gitee.com/mindspore/federated) framework code is built independently and is divided into a device side and a cloud side. The cloud-side capability relies on MindSpore and MindSpore Federated, and uses MindSpore for cluster aggregation training on the cloud and for communication with the device side, so you need to obtain the MindSpore WHL package and the MindSpore Federated WHL package separately. The device-side capability relies on MindSpore Lite and the MindSpore Federated Java package. MindSpore Federated Java is mainly responsible for data pre-processing, for model training and inference by calling MindSpore Lite, and for model-related uploads to and downloads from the cloud side by using privacy protection mechanisms.
-Currently, the MindSpore Federated framework code has been integrated into the MindSpore framework on the cloud and the MindSpore Lite framework on the device. Therefore, you need to obtain the MindSpore WHL package and MindSpore Lite Java installation package separately. The MindSpore WHL package is used for cluster aggregation training on the cloud and communication with Lite. The MindSpore Lite Java package contains two parts. One is the MindSpore Lite training installation package, which is used for bottom-layer model training. The other is the Federated-Client installation package, which is used for model delivery, encryption, and interaction with the MindSpore service on the cloud.
-
-### Obtaining the MindSpore WHL Package
+## Obtaining the MindSpore WHL Package
You can use the source code or download the release version to install MindSpore on hardware platforms such as the x86 CPU and GPU CUDA. For details about the installation process, see [Install](https://www.mindspore.cn/install/en) on the MindSpore website.
-### Obtaining the MindSpore Lite Java Package
+## Obtaining the MindSpore Lite Java Package
+
+You can use the source code or download the release version. Currently, only the Linux and Android platforms are supported, and only the CPU hardware architecture is supported. For details about the installation process, see [Downloading MindSpore Lite](https://www.mindspore.cn/lite/docs/en/master/use/downloads.html) and [Building MindSpore Lite](https://www.mindspore.cn/lite/docs/en/master/use/build.html).
+
+## Obtaining the MindSpore Federated WHL Package
+
+You can use the source code or download the release version. Currently, MindSpore Federated Learning supports the Linux and Android platforms. For details about the installation process, see [Building MindSpore Federated whl](https://www.mindspore.cn/federated/docs/en/master/deploy_federated_server.html).
+
+## Obtaining the MindSpore Federated Java Package
-You can use the source code or download the release version. Currently, only the Linux and Android platforms are supported, and only the CPU hardware architecture is supported. For details about the installation process, see [Downloading MindSpore Lite](https://www.mindspore.cn/lite/docs/en/master/use/downloads.html) and [Building MindSpore Lite](https://www.mindspore.cn/lite/docs/en/master/use/build.html). For details, see "Deploying Federated-Client."
+You can use the source code or download the release version. Currently, MindSpore Federated Learning supports the Linux and Android platforms. For details about the installation process, see [Building MindSpore Federated java](https://www.mindspore.cn/federated/docs/en/master/deploy_federated_client.html).
-### Requirements for Building the Linux Environment
+## Requirements for Building the Linux Environment
Currently, the source code build is supported only in the Linux environment. For details about the environment requirements, see [MindSpore Source Code Build](https://www.mindspore.cn/install/en) and [MindSpore Lite Source Code Build](https://www.mindspore.cn/lite/docs/en/master/use/build.html).
diff --git a/docs/federated/docs/source_en/image_classification_application.md b/docs/federated/docs/source_en/image_classification_application.md
index 094bb9a216..8bcd69e281 100644
--- a/docs/federated/docs/source_en/image_classification_application.md
+++ b/docs/federated/docs/source_en/image_classification_application.md
@@ -2,7 +2,7 @@
-Federated learning can be divided into cross-silo federated learning and cross-device federated learning according to different participating customers. In the cross-silo federation learning scenario, the customers participating in federated learning are different organizations (for example, medical or financial) or geographically distributed data centers, that is, training models on multiple data islands. The clients participating in the cross-device federation learning scenario are a large number of mobiles or IoT devices. This framework will introduce how to use the network LeNet to implement an image classification application on the MindSpore cross-silo federation framework, and provides related tutorials for simulating to start multi-client participation in federated learning in the x86 environment.
+Federated learning can be divided into cross-silo federated learning and cross-device federated learning according to the type of participating clients. In the cross-silo federated learning scenario, the clients participating in federated learning are different organizations (for example, medical or financial) or geographically distributed data centers, that is, models are trained on multiple data islands. In the cross-device federated learning scenario, the participating clients are a large number of mobile or IoT devices. This tutorial describes how to use the LeNet network to implement an image classification application on the MindSpore cross-silo federated framework, and shows how to simulate multi-client participation in federated learning in the x86 environment.
Before you start, check whether MindSpore has been correctly installed. If not, install MindSpore on your computer by referring to [Install](https://www.mindspore.cn/install/en) on the MindSpore website.
@@ -10,51 +10,103 @@ Before you start, check whether MindSpore has been correctly installed. If not,
We provide [Federated Learning Image Classification Dataset FEMNIST](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/datasets/federated/3500_clients_bin.zip) and the [device-side model file](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/models/lenet_train.ms) of the `.ms` format for users to use directly. Users can also refer to the following tutorials to generate the datasets and models based on actual needs.
-### Data Processing
+### Generating a Device-side Model File
-In this example, the federated learning dataset `FEMNIST` in the `leaf` dataset is used. For the specific acquisition method of the dataset, please refer to the document [Device-cloud federation learning image classification dataset processing](https://gitee.com/mindspore/mindspore/blob/master/tests/st/fl/cross_device_lenet/client/image_classfication_dataset_process_en.md#).
-
-Users can also define the dataset by themselves. Note that the dataset must be a `.bin` format file, and the data dimension in the file must be consistent with the input dimension of the network.
-
-### Generating a Device Model File
-
-1. **Define the network and training process**
+1. Define the network and training process.
For the definition of the specific network and training process, please refer to [Beginners Getting Started](https://www.mindspore.cn/tutorials/en/master/beginner/quick_start.html).
- We provide the network definition file [model.py](https://gitee.com/mindspore/mindspore/blob/master/tests/st/fl/mobile/src/model.py) and the training process definition file [run_export_lenet.py](https://gitee.com/mindspore/mindspore/blob/master/tests/st/fl/cross_device_lenet/cloud/run_export_lenet.py) for your reference.
+2. Export a model as a MindIR file.
-2. **Export a model as a MindIR file.**
-
- Run the script `run_export_lenet.py` to obtain the MindIR format model file, the code snippet is as follows:
+ The code snippet is as follows:
```python
+ import argparse
+ import numpy as np
import mindspore as ms
- ...
+ import mindspore.nn as nn
+
+ def conv(in_channels, out_channels, kernel_size, stride=1, padding=0):
+     """weight initial for conv layer"""
+     weight = weight_variable()
+     return nn.Conv2d(
+         in_channels,
+         out_channels,
+         kernel_size=kernel_size,
+         stride=stride,
+         padding=padding,
+         weight_init=weight,
+         has_bias=False,
+         pad_mode="valid",
+     )
+
+ def fc_with_initialize(input_channels, out_channels):
+     """weight initial for fc layer"""
+     weight = weight_variable()
+     bias = weight_variable()
+     return nn.Dense(input_channels, out_channels, weight, bias)
+
+ def weight_variable():
+     """weight initial"""
+     return ms.common.initializer.TruncatedNormal(0.02)
+
+ class LeNet5(nn.Cell):
+     def __init__(self, num_class=10, channel=3):
+         super(LeNet5, self).__init__()
+         self.num_class = num_class
+         self.conv1 = conv(channel, 6, 5)
+         self.conv2 = conv(6, 16, 5)
+         self.fc1 = fc_with_initialize(16 * 5 * 5, 120)
+         self.fc2 = fc_with_initialize(120, 84)
+         self.fc3 = fc_with_initialize(84, self.num_class)
+         self.relu = nn.ReLU()
+         self.max_pool2d = nn.MaxPool2d(kernel_size=2, stride=2)
+         self.flatten = nn.Flatten()
+
+     def construct(self, x):
+         x = self.conv1(x)
+         x = self.relu(x)
+         x = self.max_pool2d(x)
+         x = self.conv2(x)
+         x = self.relu(x)
+         x = self.max_pool2d(x)
+         x = self.flatten(x)
+         x = self.fc1(x)
+         x = self.relu(x)
+         x = self.fc2(x)
+         x = self.relu(x)
+         x = self.fc3(x)
+         return x
parser = argparse.ArgumentParser(description="export mindir for lenet")
parser.add_argument("--device_target", type=str, default="CPU")
- parser.add_argument("--mindir_path", type=str, default="lenet_train.mindir") # The path for the file in MindIR format.
- ...
-
- for _ in range(epoch):
- data = Tensor(np.random.rand(32, 3, 32, 32).astype(np.float32))
- label = Tensor(np.random.randint(0, 61, (32)).astype(np.int32))
- loss = train_network(data, label).asnumpy()
- losses.append(loss)
- ms.export(train_network, data, label, file_name= mindir_path, file_format='MINDIR') # Add the export statement to obtain the model file in MindIR format.
- print(losses)
- ```
+ parser.add_argument("--mindir_path", type=str,
+                     default="lenet_train.mindir")  # Path of the MindIR file to be exported.
- The specific operating instructions are as follows:
+ args, _ = parser.parse_known_args()
+ device_target = args.device_target
+ mindir_path = args.mindir_path
- ```sh
- python run_export_lenet.py --mindir_path="ms/lenet/lenet_train.mindir"
+ ms.set_context(mode=ms.GRAPH_MODE, device_target=device_target)
+
+ if __name__ == "__main__":
+     np.random.seed(0)
+     network = LeNet5(62)
+     criterion = nn.SoftmaxCrossEntropyWithLogits(sparse=False, reduction="mean")
+     net_opt = nn.Momentum(network.trainable_params(), 0.01, 0.9)
+     net_with_criterion = nn.WithLossCell(network, criterion)
+     train_network = nn.TrainOneStepCell(net_with_criterion, net_opt)
+     train_network.set_train()
+
+     data = ms.Tensor(np.random.rand(32, 3, 32, 32).astype(np.float32))
+     label = ms.Tensor(np.random.randint(0, 1, (32, 62)).astype(np.float32))
+     ms.export(train_network, data, label, file_name=mindir_path,
+               file_format='MINDIR')  # Add the export statement to obtain the model file in MindIR format.
```
The parameter `--mindir_path` is used to set the path of the generated file in MindIR format.
-3. **Convert the MindIR file into an .ms file that can be used by the federated learning framework on the device.**
+3. Convert the MindIR file into an .ms file that can be used by the federated learning device-side framework.
For details about model conversion, see [Training Model Conversion Tutorial](https://www.mindspore.cn/lite/docs/en/master/use/converter_train.html).
@@ -72,127 +124,76 @@ Users can also define the dataset by themselves. Note that the dataset must be a
CONVERTER RESULT SUCCESS:0
```
- This indicates that the MindSpore model is successfully converted to the MindSpore device model and the new file `lenet_train.ms` is generated. If the conversion fails, the following information is displayed:
+ This indicates that the MindSpore model is successfully converted to the MindSpore device-side model and the new file `lenet_train.ms` is generated. If the conversion fails, the following information is displayed:
```sh
CONVERT RESULT FAILED:
```
- Save the generated model file in `.ms` format to a path. When the federated learning API is called, FLParameter.trainModelPath can be set to the path of the model file.
+ The generated model file in `.ms` format is the model file required by subsequent clients.
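
The success and failure markers printed by the converter can also be checked from a script. The following is a minimal sketch, assuming the converter output has already been captured as a string. The helper name `conversion_succeeded` is hypothetical and is not part of MindSpore Lite; only the marker text is taken from this tutorial.

```python
# Marker printed on success, as quoted in this tutorial.
SUCCESS_MARKER = "CONVERTER RESULT SUCCESS:0"


def conversion_succeeded(converter_output: str) -> bool:
    """Return True if the captured converter output contains the success marker.

    Hypothetical helper for illustration; it only inspects text and does not
    call the converter itself.
    """
    return SUCCESS_MARKER in converter_output
```

A check like this could gate the later steps, so that clients are only started when `lenet_train.ms` was actually generated.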
## Simulating Multi-client Participation in Federated Learning
-1. **Prepare a model file for the client.**
+1. Prepare a model file for the client.
- In the actual scenario, a client contains a model file in .ms format. In the simulation scenario, you need to copy multiple .ms files and name them in `lenet_train{i}.ms` format. In the format, i indicates the client ID. Due to the script settings in `run.py`, i must be set to a number, such as `0, 1, 2, 3, 4, 5...`. Each client uses an .ms file.
+ This example uses LeNet on the device side to simulate the network used in practice. The [device-side model file](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/models/lenet_train.ms) of LeNet in `.ms` format is provided. In a real scenario, a client contains only one model file in .ms format. In the simulation scenario, the .ms file must be copied several times and the copies named in the `lenet_train{i}.ms` format, where i is the client number. The script `run_client_x86.py` copies the .ms file for each client automatically.
- You can copy and name the original .ms file by referring to the following steps:
-
- ```python
- import shutil
- import os
-
- def copy_file(raw_path,new_path,copy_num):
- # Copy the specified number of files from the raw path to the new path
- for i in range(copy_num):
- file_name = "lenet_train" + str(i) + ".ms"
- new_file_path = os.path.join(new_path, file_name)
- shutil.copy(raw_path ,new_file_path)
- print('====== copying ',i, ' file ======')
- print("the number of copy .ms files: ", len(os.listdir(new_path)))
-
- if __name__ == "__main__":
- raw_path = "lenet_train.ms"
- new_path = "ms/lenet"
- num = 8
- copy_file(raw_path, new_path, num)
- ```
-
- Set `raw_path` to the path of the original .ms file, `new_path` to the path of the .ms file to be copied, and `num` to the number of copies. Generally, you need to simulate the number of started clients.
-
- For example, in the preceding script, the .ms file is generated in the `ms/lenet` directory for eight clients. The directory structure is as follows:
-
- ```sh
- ms/lenet
- ├── lenet_train0.ms # .ms file used by client 0.
- ├── lenet_train1.ms # .ms file used by client 1.
- ├── lenet_train2.ms # .ms file used by client 2.
- ├── lenet_train3.ms # .ms file used by client 3.
- │
- │ ......
- │
- └── lenet_train7.ms # .ms file used by client 7.
- ```
+ See the `copy_ms` function in the [startup script](https://gitee.com/mindspore/federated/tree/master/example/cross_device_lenet_femnist/simulate_x86/run_client_x86.py) for details.
-2. **Start the cloud side service**
+2. Start the cloud-side service.
Users can first refer to [cloud-side deployment tutorial](https://www.mindspore.cn/federated/docs/en/master/deploy_federated_server.html) to deploy the cloud-side environment and start the cloud-side service.
-3. **Start the client**
+3. Start the client.
- Before starting the client, please refer to the section [x86](https://www.mindspore.cn/federated/docs/en/master/deploy_federated_client.html) in the Federated-Client deployment tutorial for deployment of device environment.
-
- Our framework provides three types of federated learning interfaces for users to call. For specific interface introduction, please refer to [API file](https://www.mindspore.cn/federated/docs/en/master/java_api_syncfljob.html) :
-
- - `SyncFLJob.flJobRun()`
-
- Used to start the client to participate in the federated learning training task, and to obtain the final trained aggregation model.
-
- - `SyncFLJob.modelInfer()`
-
- Used to obtain the inference result of a given dataset.
-
- - `SyncFLJob.getModel()`
-
- Used to get the latest model on the cloud side.
-
- After the cloud-side service starts successfully, you can write a Python script to call the federated learning framework jar package `mindspore-lite-java-flclient.jar` and the jar package corresponding to the model script `quick_start_flclient.jar` (refer to [Building a Package](https://www.mindspore.cn/federated/docs/en/master/deploy_federated_client.html) in the Federated-Client deployment tutorial) to simulate multi-client participation in federated learning tasks.
+ Before starting the client, refer to the [Device-side deployment tutorial](https://www.mindspore.cn/federated/docs/en/master/deploy_federated_client.html) to deploy the device-side environment.
We provide a reference script [run_client_x86.py](https://gitee.com/mindspore/mindspore/blob/master/tests/st/fl/cross_device_lenet/client/run_client_x86.py), users can set relevant parameters to start different federated learning interfaces.
+ After the cloud-side service is successfully started, the provided script run_client_x86.py calls the federated learning framework JAR package `mindspore-lite-java-flclient.jar` and the JAR package `quick_start_flclient.jar` corresponding to the model script (for how to obtain them, see [Compiling package Flow in device-side deployment](https://www.mindspore.cn/federated/docs/en/master/deploy_federated_client.html)) to simulate starting multiple clients that participate in the federated learning task.
Taking the LeNet network as an example, some of the input parameters in the `run_client_x86.py` script have the following meanings, and users can set them according to the actual situation:
- - **`--jarPath`**
+ - `--fl_jar_path`
- Specifies the path of the JAR package of the federated learning framework. For details about how to obtain the JAR package in the x86 environment, see [Building a Package](https://www.mindspore.cn/federated/docs/en/master/deploy_federated_client.html) in the Federated-Client deployment tutorial.
+ Specifies the path of the federated learning JAR package. For details about how to obtain the JAR package in the x86 environment, see [Compile package process in device-side deployment](https://www.mindspore.cn/federated/docs/en/master/deploy_federated_client.html).
- Note, please make sure that only the JAR package is included in the path. For example, in the above reference script, `--jarPath` is set to `"libs/jarX86/mindspore-lite-java-flclient.jar"`, you need to make sure that the `jarX86` folder contains only one JAR package `mindspore-lite-java-flclient.jar`.
+ Make sure that the directory in the path contains only this JAR package. For example, if `--fl_jar_path` is set to `"libs/jarX86/mindspore-lite-java-flclient.jar"`, the `jarX86` folder must contain only the JAR package `mindspore-lite-java-flclient.jar`.
- - **`--case_jarPath`**
+ - `--case_jar_path`
- Specifies the path of the JAR package `quick_start_flclient.jar` corresponding to the model script. For details about how to obtain the JAR package in the x86 environment, see [Building a Package](https://gitee.com/mindspore/docs/blob/master/docs/federated/docs/source_en/deploy_federated_client.md#) in the Federated-Client deployment tutorial.
+ Specifies the path of the JAR package `quick_start_flclient.jar` corresponding to the model script. For details about how to obtain the JAR package in the x86 environment, see [Compile package process in device-side deployment](https://www.mindspore.cn/federated/docs/en/master/deploy_federated_client.html).
- Note, please make sure that only the JAR package is included in the path. For example, in the above reference script, `--case_jarPath` is set to `"case_jar/quick_start_flclient.jar"`, you need to make sure that the `case_jar` folder contains only one JAR package `quick_start_flclient.jar`.
+ Make sure that the directory in the path contains only this JAR package. For example, if `--case_jar_path` is set to `"case_jar/quick_start_flclient.jar"`, the `case_jar` folder must contain only the JAR package `quick_start_flclient.jar`.
- - **`--train_dataset`**
+ - `--train_dataset`
- Specifies the root path of the training dataset.The sentiment classification task stores the training data (in .txt format) of each client. The LeNet image classification task stores the training files data.bin and label.bin of each client, for example, `data/femnist/3500_clients_bin/`.
+ Specifies the root path of the training dataset. For the LeNet image classification task, this path stores the training files data.bin and label.bin of each client, for example, `data/femnist/3500_clients_bin/`.
- - **`--flName`**
+ - `--flName`
- Specifies the package path of model script used by federated learning. We provide two types of model scripts for your reference ([Supervised sentiment classification task](https://gitee.com/mindspore/mindspore/tree/master/mindspore/lite/examples/quick_start_flclient/src/main/java/com/mindspore/flclient/demo/albert), [Lenet image classification task](https://gitee.com/mindspore/mindspore/tree/master/mindspore/lite/examples/quick_start_flclient/src/main/java/com/mindspore/flclient/demo/lenet)). For supervised sentiment classification tasks, this parameter can be set to the package path of the provided script file [AlBertClient.java](https://gitee.com/mindspore/mindspore/blob/master/mindspore/lite/examples/quick_start_flclient/src/main/java/com/mindspore/flclient/demo/albert/AlbertClient.java), like as `com.mindspore.flclient.demo.albert.AlbertClient`; for Lenet image classification tasks, this parameter can be set to the package path of the provided script file [LenetClient.java](https://gitee.com/mindspore/mindspore/blob/master/mindspore/lite/examples/quick_start_flclient/src/main/java/com/mindspore/flclient/demo/lenet/LenetClient.java), like as `com.mindspore.flclient.demo.lenet.LenetClient`. At the same time, users can refer to these two types of model scripts, define the model script by themselves, and then set the parameter to the package path of the customized model file ModelClient.java (which needs to inherit from the class [Client.java](https://gitee.com/mindspore/mindspore/blob/master/mindspore/lite/java/java/fl_client/src/main/java/com/mindspore/flclient/model/Client.java)).
+ Specifies the package path of the model script used by federated learning. We provide two types of model scripts for your reference ([Supervised sentiment classification task](https://gitee.com/mindspore/mindspore/tree/master/mindspore/lite/examples/quick_start_flclient/src/main/java/com/mindspore/flclient/demo/albert), [Lenet image classification task](https://gitee.com/mindspore/mindspore/tree/master/mindspore/lite/examples/quick_start_flclient/src/main/java/com/mindspore/flclient/demo/lenet)). For supervised sentiment classification tasks, this parameter can be set to the package path of the provided script file [AlbertClient.java](https://gitee.com/mindspore/mindspore/blob/master/mindspore/lite/examples/quick_start_flclient/src/main/java/com/mindspore/flclient/demo/albert/AlbertClient.java), such as `com.mindspore.flclient.demo.albert.AlbertClient`. For Lenet image classification tasks, this parameter can be set to the package path of the provided script file [LenetClient.java](https://gitee.com/mindspore/mindspore/blob/master/mindspore/lite/examples/quick_start_flclient/src/main/java/com/mindspore/flclient/demo/lenet/LenetClient.java), such as `com.mindspore.flclient.demo.lenet.LenetClient`. Users can also refer to these two types of model scripts to define their own model script, and then set the parameter to the package path of the customized model file ModelClient.java (which needs to inherit from the class [Client.java](https://gitee.com/mindspore/mindspore/blob/master/mindspore/lite/java/java/fl_client/src/main/java/com/mindspore/flclient/model/Client.java)).
- - **`--train_model_path`**
+ - `--train_model_path`
Specifies the training model path used for federated learning. The path is the directory where multiple .ms files copied in the preceding tutorial are stored, for example, `ms/lenet`. The path must be an absolute path.
- - **`--train_ms_name`**
+ - `--domain_name`
- Set the same part of the multi-client training model file name. The model file name must be in the format `{train_ms_name}1.ms`, `{train_ms_name}2.ms`, `{train_ms_name}3.ms`, etc.
+ Specifies the URL for device-cloud communication. Currently, HTTPS and HTTP communication are supported, and the corresponding formats are `https://......` and `http://......`. When `if_use_elb` is set to true, the format must be `https://127.0.0.1:6666` or `http://127.0.0.1:6666`, where `127.0.0.1` is the IP address of the machine providing cloud-side services (corresponding to the cloud-side parameter `--scheduler_ip`), and `6666` corresponds to the cloud-side parameter `--fl_server_port`.
- - **`--domain_name`**
+ Note 1: When this parameter is set to `http://......`, it means that HTTP communication is used, and there may be communication security risks.
- Used to set the url for device-cloud communication. Currently, https and http communication are supported, the corresponding formats are like as: https://......, http://......, and when `if_use_elb` is set to true, the format must be: https://127.0.0.1:6666 or http://127.0.0.1:6666 , where `127.0.0.1` corresponds to the ip of the machine providing cloud-side services (corresponding to the cloud-side parameter `--scheduler_ip`), and `6666` corresponds to the cloud-side parameter `--fl_server_port`.
+ Note 2: When this parameter is set to `https://......`, HTTPS communication is used. In this case, SSL certificate authentication must be performed, and the certificate path must be set by the parameter `--cert_path`.
- - **`--task`**
+ - `--task`
Specifies the type of the task to be started. `train` indicates that a training task is started. `inference` indicates that multiple data inference tasks are started. `getModel` indicates that the task for obtaining the cloud model is started. Other character strings indicate that the inference task of a single data record is started. The default value is `train`. The initial model file (.ms file) is not trained. Therefore, you are advised to start the training task first. After the training is complete, start the inference task. (Note that the values of client_num in the two startups must be the same to ensure that the model file used by `inference` is the same as that used by `train`.)
- - **`--batch_size`**
+ - `--batch_size`
Specifies the number of single-step training samples used in federated learning training and inference, that is, batch size. It needs to be consistent with the batch size of the input data of the model.
- - **`--client_num`**
+ - `--client_num`
Specifies the number of clients. The value must be the same as that of `start_fl_job_cnt` when the server is started. This parameter is not required in actual scenarios.
@@ -204,11 +205,11 @@ Users can also define the dataset by themselves. Note that the dataset must be a
python run_client_x86.py --jarPath="libs/jarX86/mindspore-lite-java-flclient.jar" --case_jarPath="case_jar/quick_start_flclient.jar" --train_dataset="data/femnist/3500_clients_bin/" --test_dataset="null" --vocal_file="null" --ids_file="null" --flName="com.mindspore.flclient.demo.lenet.LenetClient" --train_model_path="ms/lenet/" --infer_model_path="ms/lenet/" --train_ms_name="lenet_train" --infer_ms_name="lenet_train" --domain_name="http://127.0.0.1:6666" --cert_path="certs/https_signature_certificate/client/CARoot.pem" --use_elb="true" --server_num=4 --client_num=8 --thread_num=1 --server_mode="FEDERATED_LEARNING" --batch_size=32 --task="train"
```
- Note that the path-related parameters must give an absolute path.
+ Note that the path-related parameters in the startup command must be absolute paths.
- The above commands indicate that eight clients are started to participate in federated learning. If the startup is successful, log files corresponding to the eight clients are generated in the current folder. You can view the log files to learn the running status of each client.
+ The above commands indicate that eight clients are started to participate in federated learning. If the startup is successful, log files corresponding to the eight clients are generated in the current folder. You can view the log files to learn the running status of each client:
- ```sh
+ ```text
./
├── client_0
│ └── client.log # Log file of client 0.
@@ -258,41 +259,21 @@ Users can also define the dataset by themselves. Note that the dataset must be a
INFO: [getModel] get response from server ok!
```
-4. **Stop the client process.**
+4. Stop the client process.
- For details, see the `finish.py` script. The details are as follows:
+ For details, see the [finish.py](https://gitee.com/mindspore/federated/tree/master/example/cross_device_lenet_femnist/simulate_x86/finish.py) script.
- ```python
- import os
- import argparse
- import subprocess
-
- parser = argparse.ArgumentParser(description="Finish test_mobile_lenet.py case")
- parser.add_argument("--kill_tag", type=str, default="mindspore-lite-java-flclient")
-
- args, _ = parser.parse_known_args()
- kill_tag = args.kill_tag
-
- cmd = "pid=`ps -ef|grep " + kill_tag
- cmd += " |grep -v \"grep\" | grep -v \"finish\" |awk '{print $2}'` && "
- cmd += "for id in $pid; do kill -9 $id && echo \"killed $id\"; done"
-
- subprocess.call(['bash', '-c', cmd])
- ```
-
- Run the following command to shut down the client:
+ Run the following command to stop the client process:
```sh
python finish.py --kill_tag=mindspore-lite-java-flclient
```
- The parameter `--kill_tag` is used to search for the keyword to kill the client process. You only need to set the special keyword in `--jarPath`. The default value is `mindspore-lite-java-flclient`, that is, the name of the federated learning JAR package.
-
- The user can check whether the process still exists through the command `ps -ef |grep "mindspore-lite-java-flclient"`.
+ The parameter `--kill_tag` specifies the keyword used to find the client processes to kill. Set it to a distinctive keyword contained in the path of the federated learning JAR package. The default value is `mindspore-lite-java-flclient`, that is, the name of the federated learning JAR package. You can check whether the process still exists by running the command `ps -ef |grep "mindspore-lite-java-flclient"`.
-5. **Experimental results of 50 clients participating in federated learning and training tasks. **
+5. Experimental results of 50 clients participating in federated learning and training tasks.
- Currently, the **`3500_clients_bin`** folder contains data of 3500 clients. This script can simulate a maximum of 3500 clients to participate in federated learning.
+ Currently, the `3500_clients_bin` folder contains data of 3500 clients. This script can simulate a maximum of 3500 clients to participate in federated learning.
The following figure shows the accuracy of the test dataset for federated learning on 50 clients (set `server_num` to 16).
@@ -302,8 +283,8 @@ Users can also define the dataset by themselves. Note that the dataset must be a
The test accuracy in the figure refers to the accuracy of each client test dataset on the aggregated model on the cloud for each federated learning iteration:
- AVG: average accuracy of 50 client test datasets
+ AVG: average accuracy across the test datasets of the 50 clients in each federated learning iteration.
- TOP5: average accuracy of the five clients with the highest accuracy in the test dataset
+ TOP5: average accuracy of the 5 clients with the highest test accuracy in each federated learning iteration.
- LOW5: average accuracy of the five clients with the lowest accuracy in the test dataset3
\ No newline at end of file
+ LOW5: average accuracy of the 5 clients with the lowest test accuracy in each federated learning iteration.
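The three metrics above can be sketched as follows. This is an illustrative computation under stated assumptions, not the script used to produce the figure: the function name `summarize_accuracy` is hypothetical, and the accuracy values are made-up placeholders rather than experimental results.

```python
def summarize_accuracy(accuracies, k=5):
    """Return AVG, TOP-k and LOW-k mean accuracy for one iteration.

    `accuracies` holds one test accuracy per client, each measured on
    the aggregated model for the same federated learning iteration.
    """
    if len(accuracies) < k:
        raise ValueError("need at least k client accuracies")
    ordered = sorted(accuracies)

    def mean(values):
        return sum(values) / len(values)

    return {
        "AVG": mean(ordered),            # all clients
        f"TOP{k}": mean(ordered[-k:]),   # k highest accuracies
        f"LOW{k}": mean(ordered[:k]),    # k lowest accuracies
    }


# Placeholder accuracies for 50 clients (0.50, 0.51, ..., 0.99).
client_accuracies = [i / 100 for i in range(50, 100)]
print(summarize_accuracy(client_accuracies))
```

Sorting once and slicing both ends keeps the TOP5 and LOW5 definitions symmetric; repeating the call for every iteration gives the three curves.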
diff --git a/docs/federated/docs/source_en/index.rst b/docs/federated/docs/source_en/index.rst
index d37ebcc0df..5fe3f4b7c2 100644
--- a/docs/federated/docs/source_en/index.rst
+++ b/docs/federated/docs/source_en/index.rst
@@ -89,6 +89,12 @@ Common Application Scenarios
local_differential_privacy_training_noise
pairwise_encryption_training
+.. toctree::
+ :maxdepth: 1
+ :caption: Communication Compression
+
+ communication_compression
+
.. toctree::
:maxdepth: 1
:caption: API References
diff --git a/docs/federated/docs/source_zh_cn/image_classification_application.md b/docs/federated/docs/source_zh_cn/image_classification_application.md
index f3b89c71b3..c84603e4f3 100644
--- a/docs/federated/docs/source_zh_cn/image_classification_application.md
+++ b/docs/federated/docs/source_zh_cn/image_classification_application.md
@@ -12,7 +12,7 @@
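### Generating a Device-side Model File
-1. Define the network and training process
+1. Define the network and training process.
For details about how to define the network and training process, see [Quick Start](https://www.mindspore.cn/tutorials/zh-CN/master/beginner/quick_start.html#网络构建).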
@@ -139,7 +139,7 @@
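In this example, LeNet is used on the device side to simulate the actual network, and the LeNet [device-side model file](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/models/lenet_train.ms) is provided in `.ms` format. In a real scenario, each client holds only one `.ms` model file. In the simulated scenario, multiple copies of the `.ms` file are required and are named in the format `lenet_train{i}.ms`, where `i` is the client number. `run_client_x86.py` already copies the `.ms` file for each client automatically.
For details, see the `copy_ms` function in the [startup script](https://gitee.com/mindspore/federated/tree/master/example/cross_device_lenet_femnist/simulate_x86/run_client_x86.py).
-2. Start the cloud-side service
+2. Start the cloud-side service.
Users can first refer to the [cloud-side deployment tutorial](https://www.mindspore.cn/federated/docs/zh-CN/master/deploy_federated_server.html) to deploy the cloud-side environment and start the cloud-side service.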
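The per-client copy can be sketched as below. This is a minimal illustration and not the actual `copy_ms` function from `run_client_x86.py`: the function name `copy_model_per_client` and its parameters are hypothetical, and numbering the clients from 0 is an assumption based on the `client_0` ... `client_7` log directories.

```python
import shutil
from pathlib import Path


def copy_model_per_client(origin_model, target_dir, client_num):
    """Copy one .ms file into target_dir as lenet_train{i}.ms per client."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    copies = []
    for i in range(client_num):  # assumption: client numbers start at 0
        dst = target / f"lenet_train{i}.ms"
        shutil.copyfile(origin_model, dst)
        copies.append(dst)
    return copies
```

For example, `copy_model_per_client("lenet_train.ms", "ms", 8)` would create `ms/lenet_train0.ms` through `ms/lenet_train7.ms`, one model file for each simulated client.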
@@ -194,58 +194,58 @@
- `--client_num`
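- Sets the number of clients, which must be consistent with `start_fl_job_cnt` used when starting the server. This parameter is not required in real scenarios.
+ Sets the number of clients, which must be consistent with `start_fl_job_cnt` used when starting the server. This parameter is not required in real scenarios.
To learn more about the other parameters in the `run_client_x86.py` script, refer to the comments in the script.
- An example of the basic startup command for the federated learning interface is as follows:
+ An example of the basic startup command for the federated learning interface is as follows: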
- ```sh
- rm -rf client_*\
- && rm -rf ms/* \
- && python3 run_client_x86.py \
- --fl_jar_path="federated/mindspore_federated/device_client/build/libs/jarX86/mindspore-lite-java-flclient.jar" \
- --case_jar_path="federated/example/quick_start_flclient/target/case_jar/quick_start_flclient.jar" \
- --train_data_dir="federated/tests/st/simulate_x86/data/3500_clients_bin/" \
- --eval_data_dir="null" \
- --infer_data_dir="null" \
- --vocab_path="null" \
- --ids_path="null" \
- --path_regex="," \
- --fl_name="com.mindspore.flclient.demo.lenet.LenetClient" \
- --origin_train_model_path="federated/tests/st/simulate_x86/ms_files/lenet/lenet_train.ms" \
- --origin_infer_model_path="null" \
- --train_model_dir="ms" \
- --infer_model_dir="ms" \
- --ssl_protocol="TLSv1.2" \
- --deploy_env="x86" \
- --domain_name="http://10.113.216.40:8010" \
- --cert_path="CARoot.pem" --use_elb="false" \
- --server_num=1 \
- --task="train" \
- --thread_num=1 \
- --cpu_bind_mode="NOT_BINDING_CORE" \
- --train_weight_name="null" \
- --infer_weight_name="null" \
- --name_regex="::" \
- --server_mode="FEDERATED_LEARNING" \
- --batch_size=32 \
- --input_shape="null" \
- --client_num=8
- ```
+ ```sh
+ rm -rf client_* \
+ && rm -rf ms/* \
+ && python3 run_client_x86.py \
+ --fl_jar_path="federated/mindspore_federated/device_client/build/libs/jarX86/mindspore-lite-java-flclient.jar" \
+ --case_jar_path="federated/example/quick_start_flclient/target/case_jar/quick_start_flclient.jar" \
+ --train_data_dir="federated/tests/st/simulate_x86/data/3500_clients_bin/" \
+ --eval_data_dir="null" \
+ --infer_data_dir="null" \
+ --vocab_path="null" \
+ --ids_path="null" \
+ --path_regex="," \
+ --fl_name="com.mindspore.flclient.demo.lenet.LenetClient" \
+ --origin_train_model_path="federated/tests/st/simulate_x86/ms_files/lenet/lenet_train.ms" \
+ --origin_infer_model_path="null" \
+ --train_model_dir="ms" \
+ --infer_model_dir="ms" \
+ --ssl_protocol="TLSv1.2" \
+ --deploy_env="x86" \
+ --domain_name="http://10.113.216.40:8010" \
+ --cert_path="CARoot.pem" --use_elb="false" \
+ --server_num=1 \
+ --task="train" \
+ --thread_num=1 \
+ --cpu_bind_mode="NOT_BINDING_CORE" \
+ --train_weight_name="null" \
+ --infer_weight_name="null" \
+ --name_regex="::" \
+ --server_mode="FEDERATED_LEARNING" \
+ --batch_size=32 \
+ --input_shape="null" \
+ --client_num=8
+ ```
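- Note that every path in the startup command must be an absolute path.
+ Note that every path in the startup command must be an absolute path.
- The preceding command starts 8 clients to participate in the federated learning training task. If the startup succeeds, log files for the 8 clients are generated in the current folder. Check the log files to see how each client is running:
+ The preceding command starts 8 clients to participate in the federated learning training task. If the startup succeeds, log files for the 8 clients are generated in the current folder. Check the log files to see how each client is running: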
- ```text
- ./
- ├── client_0
- │ └── client.log # Log file of client 0
- │ ......
- └── client_7
- └── client.log # Log file of client 7
- ```
+ ```text
+ ./
+ ├── client_0
+ │ └── client.log # Log file of client 0
+ │ ......
+ └── client_7
+ └── client.log # Log file of client 7
+ ```
For different interfaces and scenarios, you only need to modify specific parameter values according to their meanings. For example:
--
Gitee