diff --git a/docs/federated/docs/source_en/index.rst b/docs/federated/docs/source_en/index.rst
index 7c5186a6eaf546f98244dd43a92c349cffcf6447..da85f400b58536c3f8f9560d77a3c1e9c76c78d3 100644
--- a/docs/federated/docs/source_en/index.rst
+++ b/docs/federated/docs/source_en/index.rst
@@ -82,6 +82,7 @@ Common Application Scenarios
    image_classification_application
    sentiment_classification_application
    image_classification_application_in_cross_silo
+   object_detection_application_in_cross_silo

 .. toctree::
    :maxdepth: 1
@@ -92,6 +93,7 @@ Common Application Scenarios
    :caption: Security and Privacy

    local_differential_privacy_training_noise
+   local_differential_privacy_training_signds
    pairwise_encryption_training

 .. toctree::
diff --git a/docs/federated/docs/source_en/object_detection_application_in_cross_silo.md b/docs/federated/docs/source_en/object_detection_application_in_cross_silo.md
new file mode 100644
index 0000000000000000000000000000000000000000..09164bfb7464e67409293c079fa241c5684ca75e
--- /dev/null
+++ b/docs/federated/docs/source_en/object_detection_application_in_cross_silo.md
@@ -0,0 +1,250 @@
# Implementing a Cross-Silo Federated Object Detection Application (x86)

Based on the type of participating clients, federated learning can be classified into cross-silo federated learning and cross-device federated learning. In a cross-silo federated learning scenario, the clients are different organizations (e.g., healthcare or finance) or geographically distributed data centers, i.e., the model is trained on multiple data silos. In a cross-device federated learning scenario, the clients are a large number of mobile or IoT devices. This tutorial describes how to implement an object detection application with the Faster R-CNN network on the MindSpore Federated cross-silo framework.

The full script for launching the cross-silo federated object detection application can be found [here](https://gitee.com/mindspore/federated/tree/master/example/cross_silo_femnist).

## Preparation

This tutorial deploys the cross-silo federated object detection task based on the faster_rcnn network provided in the MindSpore model_zoo. Please first follow the official [faster_rcnn tutorial and code](https://gitee.com/mindspore/models/tree/master/official/cv/faster_rcnn) to understand the COCO dataset, the faster_rcnn network structure, and the training and evaluation processes. Since the COCO dataset is open source, please follow its [official website](https://cocodataset.org/#home) guidelines to download the dataset yourself and slice it (for example, with 100 clients, the dataset can be sliced into 100 shards, each representing the data held by one client).

Since the original COCO annotations are json files and the object detection script provided by the cross-silo federated learning framework only supports input data in MindRecord format, convert the json files to MindRecord files as follows.

- Configure the following parameters in the configuration file [default_config.yaml](https://gitee.com/mindspore/federated/tree/master/tests/st/cross_silo_faster_rcnn/default_config.yaml):

    - `mindrecord_dir`

        Used to set the save path of the generated MindRecord files. The folder name must be in the format `mindrecord_{num}`, where the number `num` is the client index: 0, 1, 2, 3, ...

        ```sh
        mindrecord_dir: "./datasets/coco_split/split_100/mindrecord_0"
        ```

    - `instance_set`

        Used to set the path of the original json file.

        ```sh
        instance_set: "./datasets/coco_split/split_100/train_0.json"
        ```

- Run the script [generate_mindrecord.py](https://gitee.com/mindspore/federated/tree/master/tests/st/cross_silo_faster_rcnn/generate_mindrecord.py) to generate the MindRecord files from `train_0.json`; they are saved under the `mindrecord_dir` path.
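The dataset slicing itself is not provided by the example scripts. The following is a minimal sketch of one way to split a COCO-style annotation file into per-client json files before the MindRecord conversion; the input file name, the number of clients, and the `train_{}.json` output naming are assumptions to adapt to your own layout:

```python
import json

def split_coco_annotations(ann_file, num_clients, out_pattern="train_{}.json"):
    """Split a COCO-style annotation json into num_clients shards by image."""
    with open(ann_file) as f:
        coco = json.load(f)

    for i in range(num_clients):
        shard_images = coco["images"][i::num_clients]  # round-robin split over images
        shard_ids = {img["id"] for img in shard_images}
        shard = {
            "images": shard_images,
            "annotations": [a for a in coco["annotations"] if a["image_id"] in shard_ids],
            "categories": coco["categories"],  # every client keeps the full label set
        }
        with open(out_pattern.format(i), "w") as f:
            json.dump(shard, f)

split_coco_annotations("instances_train2017.json", num_clients=100)
```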
## Starting the Cross-Silo Federated Task

### Installing MindSpore and MindSpore Federated

Both installing from source and installing a released version are supported, on CPU and GPU hardware platforms; choose the installation method according to your hardware platform. For the installation steps, refer to [MindSpore installation](https://www.mindspore.cn/install) and [MindSpore Federated installation](https://www.mindspore.cn/federated/docs/en/master/index.html).

Currently, the federated learning framework can only be deployed in Linux environments, and the cross-silo federated learning framework requires MindSpore version >= 1.5.0.

### Starting the Task

Refer to [example](https://gitee.com/mindspore/federated/tree/master/tests/st/cross_silo_faster_rcnn) to start the cluster. The directory structure of the reference example is as follows:

```text
cross_silo_faster_rcnn
├── src
│   ├── FasterRcnn
│   │   ├── __init__.py                  // init file
│   │   ├── anchor_generator.py          // Anchor generator
│   │   ├── bbox_assign_sample.py        // Stage I sampler
│   │   ├── bbox_assign_sample_stage2.py // Stage II sampler
│   │   ├── faster_rcnn_resnet.py        // Faster R-CNN network
│   │   ├── faster_rcnn_resnet50v1.py    // Faster R-CNN network with ResNet-50v1.0 as the backbone
│   │   ├── fpn_neck.py                  // Feature Pyramid Network
│   │   ├── proposal_generator.py        // Proposal generator
│   │   ├── rcnn.py                      // R-CNN network
│   │   ├── resnet.py                    // Backbone network
│   │   ├── resnet50v1.py                // ResNet-50v1.0 backbone network
│   │   ├── roi_align.py                 // ROI Align network
│   │   └── rpn.py                       // Region proposal network
│   ├── dataset.py                       // Create and process datasets
│   ├── lr_schedule.py                   // Learning rate generator
│   ├── network_define.py                // Faster R-CNN network definition
│   ├── util.py                          // Common utilities
│   └── model_utils
│       ├── __init__.py                  // init file
│       ├── config.py                    // Parse .yaml configuration parameters
│       ├── device_adapter.py            // Obtain on-cloud device id
│       ├── local_adapter.py             // Obtain local device id
│       └── moxing_adapter.py            // On-cloud data preparation
├── requirements.txt
├── mindspore_hub_conf.py
├── generate_mindrecord.py               // Convert annotation files in .json format to MindRecord format for reading datasets
├── default_config.yaml                  // Network structure, dataset paths and other configuration required by fl_plan
├── default.yaml                         // Configuration required for federated training
├── config.json                          // Configuration required for disaster recovery
├── run_cross_silo_fasterrcnn_worker.py  // Script for starting the cross-silo federated workers
└── test_fl_fasterrcnn.py                // Training script used on the client side
```

1. Note that you can choose whether to record the loss value of every step by setting the parameter `dataset_sink_mode` in the `test_fl_fasterrcnn.py` file:

    ```python
    # Leaving dataset_sink_mode unset (the default in the code) records only the loss value of the last step in each epoch.
    model.train(config.client_epoch_num, dataset, callbacks=cb)
    # Setting dataset_sink_mode=False records the loss value of every step.
    model.train(config.client_epoch_num, dataset, callbacks=cb, dataset_sink_mode=False)
    ```
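    If you want the per-step losses written to a separate file regardless of the default callbacks, a custom callback is one option. The sketch below is illustrative rather than part of the example scripts: it assumes the standard MindSpore `Callback` interface and a hypothetical log path, and it only sees every step when `dataset_sink_mode=False`.

    ```python
    from mindspore.train.callback import Callback

    class StepLossLogger(Callback):
        """Append the loss of every step to a text file (hypothetical helper)."""

        def __init__(self, log_path="loss_step.log"):
            super().__init__()
            self.log_path = log_path

        def step_end(self, run_context):
            cb_params = run_context.original_args()
            # cur_step_num is cumulative, so recover the step index within the current epoch.
            step_in_epoch = cb_params.cur_step_num - (cb_params.cur_epoch_num - 1) * cb_params.batch_num
            with open(self.log_path, "a") as f:
                f.write("epoch: {} step: {} total_loss: {}\n".format(
                    cb_params.cur_epoch_num, step_in_epoch, cb_params.net_outputs))

    # e.g. model.train(config.client_epoch_num, dataset,
    #                  callbacks=cb + [StepLossLogger()], dataset_sink_mode=False)
    ```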
2. Set the following parameters in the configuration file [default_config.yaml](https://gitee.com/mindspore/federated/tree/master/tests/st/cross_silo_faster_rcnn/default_config.yaml):

    - `pre_trained`

        Used to set the pre-trained model path (.ckpt format).

        The pre-trained model used in this tutorial is a ResNet-50 checkpoint trained on ImageNet 2012. You can train it with the [resnet50](https://gitee.com/mindspore/models/tree/master/official/cv/resnet) script in ModelZoo, and then use src/convert_checkpoint.py to convert the trained resnet50 weight file into a loadable weight file.

3. Start Redis:

    ```sh
    redis-server --port 2345 --save ""
    ```

4. Start the Scheduler:

    `run_sched.py` is the Python script for starting the `Scheduler`, and it supports modifying the configuration via `argparse` arguments. Executing the following command starts the `Scheduler` of this federated learning task; `--yaml_config` sets the yaml file path, and the scheduler management address is `10.113.216.40:18019`.

    ```sh
    python run_sched.py --yaml_config="default.yaml" --scheduler_manage_address="10.113.216.40:18019"
    ```

    For the detailed implementation, see [run_sched.py](https://gitee.com/mindspore/federated/tree/master/example/cross_device_lenet_femnist/run_cross_silo_femnist_server.py).

    A printout like the following indicates a successful start:

    ```sh
    [INFO] FEDERATED(3944,2b280497ed00,python):2022-10-10-17:11:08.154.878 [mindspore_federated/fl_arch/ccsrc/scheduler/scheduler.cc:35] Run] Scheduler started successfully.
    [INFO] FEDERATED(3944,2b28c5ada700,python):2022-10-10-17:11:08.155.056 [mindspore_federated/fl_arch/ccsrc/common/communicator/http_request_handler.cc:90] Run] Start http server!
    ```

5. Start the Servers:

    `run_server.py` is a Python script for starting a number of `Server`s, and it supports modifying the configuration via `argparse` arguments. Executing the following command starts the `Server`s of this federated learning task, with TCP address `10.113.216.40`, federated learning HTTP service starting port `6668`, and `4` `Server`s.

    ```sh
    python run_server.py --yaml_config="default.yaml" --tcp_server_ip="10.113.216.40" --checkpoint_dir="fl_ckpt" --local_server_num=4 --http_server_address="10.113.216.40:6668"
    ```

    The above command is equivalent to starting four `Server` processes whose federated learning service ports are `6668`, `6669`, `6670` and `6671` respectively; for details, see [run_server.py](https://gitee.com/mindspore/federated/tree/master/example/cross_device_lenet_femnist/run_server.py).

    A printout like the following indicates a successful start:

    ```sh
    [INFO] FEDERATED(3944,2b280497ed00,python):2022-10-10-17:11:08.154.645 [mindspore_federated/fl_arch/ccsrc/common/communicator/http_server.cc:122] Start] Start http server!
    [INFO] FEDERATED(3944,2b280497ed00,python):2022-10-10-17:11:08.154.725 [mindspore_federated/fl_arch/ccsrc/common/communicator/http_request_handler.cc:85] Initialize] Ev http register handle of: [/disableFLS, /enableFLS, /state, /queryInstance, /newInstance] success.
    [INFO] FEDERATED(3944,2b280497ed00,python):2022-10-10-17:11:08.154.878 [mindspore_federated/fl_arch/ccsrc/scheduler/scheduler.cc:35] Run] Scheduler started successfully.
    [INFO] FEDERATED(3944,2b28c5ada700,python):2022-10-10-17:11:08.155.056 [mindspore_federated/fl_arch/ccsrc/common/communicator/http_request_handler.cc:90] Run] Start http server!
    ```
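    The log above also lists the registered HTTP handlers (`/state`, `/queryInstance`, and so on), which can be queried over plain HTTP, for example against the scheduler's management address from step 4 (an assumption; adapt the address and the response handling to your deployment):

    ```python
    import urllib.request

    # Query the cluster state via the scheduler management address started in step 4.
    with urllib.request.urlopen("http://10.113.216.40:18019/state") as resp:
        print(resp.read().decode())
    ```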
6. Start the Workers:

    `run_cross_silo_fasterrcnn_worker.py` is a Python script for starting a number of `worker`s, and it supports modifying the configuration via `argparse` arguments. Executing the following command starts the `worker`s of this federated learning task; the number of `worker`s required for the task to proceed properly is `2`.

    ```sh
    python run_cross_silo_fasterrcnn_worker.py --worker_num=2 --dataset_path datasets/coco_split/split_100 --http_server_address=10.113.216.40:6668
    ```

    For the detailed implementation, see [run_cross_silo_femnist_worker.py](https://gitee.com/mindspore/federated/tree/master/tests/st/cross_silo_faster_rcnn/run_cross_silo_femnist_worker.py).

    In the above command, `--worker_num=2` means starting two clients, whose datasets are `datasets/coco_split/split_100/mindrecord_0` and `datasets/coco_split/split_100/mindrecord_1` respectively. Please prepare the datasets required by the corresponding clients according to the preparation steps above.

    After executing the above commands and waiting for a while, go to the `worker_0` folder in the current directory and check the `worker_0` log with the command `grep -rn "epoch:" *`; you will see log messages similar to the following:

    ```sh
    epoch: 1 step: 1 total_loss: 0.6060338
    ```

    This means that the cross-silo federated learning task has started successfully and `worker_0` is training. The other workers can be checked in a similar way.

For the description of the parameter configuration in the above scripts, please refer to the [yaml configuration notes](https://gitee.com/mindspore/federated/blob/master/docs/federated_server_yaml.md#).

### Viewing the Log

After the task is started successfully, the corresponding log files are generated under the current directory `cross_silo_faster_rcnn`. The log file directory structure is as follows:

```text
cross_silo_faster_rcnn
├── scheduler
│   └── scheduler.log            # Logs printed while running the scheduler
├── server_0
│   └── server.log               # Logs printed while running server_0
├── server_1
│   └── server.log               # Logs printed while running server_1
├── server_2
│   └── server.log               # Logs printed while running server_2
├── server_3
│   └── server.log               # Logs printed while running server_3
├── worker_0
│   ├── ckpt                     # Stores the aggregated model ckpt obtained by worker_0 at the end of each federated learning iteration
│   │   └── mindrecord_0
│   │       ├── mindrecord_0-fast-rcnn-0epoch.ckpt
│   │       ├── mindrecord_0-fast-rcnn-1epoch.ckpt
│   │       │   ......
│   │       └── mindrecord_0-fast-rcnn-29epoch.ckpt
│   ├── loss_0.log               # Records the loss value of every step during the training of worker_0
│   └── worker.log               # Records the logs output while worker_0 participates in the federated learning task
└── worker_1
    ├── ckpt                     # Stores the aggregated model ckpt obtained by worker_1 at the end of each federated learning iteration
    │   └── mindrecord_1
    │       ├── mindrecord_1-fast-rcnn-0epoch.ckpt
    │       ├── mindrecord_1-fast-rcnn-1epoch.ckpt
    │       │   ......
    │       └── mindrecord_1-fast-rcnn-29epoch.ckpt
    ├── loss_0.log               # Records the loss value of every step during the training of worker_1
    └── worker.log               # Records the logs output while worker_1 participates in the federated learning task
```
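The per-step records in `loss_0.log` can be aggregated offline, for example to compute the average loss of each epoch (the metric plotted in the Results section below). A minimal sketch, assuming each line has the form `epoch: E step: S total_loss: X` shown above:

```python
import re
from collections import defaultdict

PATTERN = re.compile(r"epoch:\s*(\d+)\s+step:\s*\d+\s+total_loss:\s*([0-9.eE+-]+)")
losses = defaultdict(list)

with open("worker_0/loss_0.log") as f:
    for line in f:
        match = PATTERN.search(line)
        if match:
            losses[int(match.group(1))].append(float(match.group(2)))

for epoch in sorted(losses):
    # Average loss of an epoch: sum of the losses of all steps divided by the number of steps.
    print(f"epoch {epoch}: mean total_loss = {sum(losses[epoch]) / len(losses[epoch]):.4f}")
```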
### Closing the Task

If you want to exit midway, you can use the following command:

```sh
python finish_cloud.py --redis_port=2345
```

For the detailed implementation, see [finish_cloud.py](https://gitee.com/mindspore/federated/tree/master/example/cross_device_lenet_femnist/finish_cloud.py).

Alternatively, when the training task is finished, the cluster exits automatically and does not need to be closed manually.

### Results

- Data used:

    The COCO dataset is split into 100 shards, and the first two shards are used as the datasets of the two workers respectively.

- Number of client-side local training epochs: 1

- Total number of cross-silo federated learning iterations: 30

- Results (recording the loss values during client-side local training):

    Go to the `worker_0` folder in the current directory, and check the `worker_0` log with the command `grep -rn "epoch:" *` to see the loss values output at each step:

    ```sh
    epoch: 1 step: 1 total_loss: 5.249325
    epoch: 1 step: 2 total_loss: 4.0856013
    epoch: 1 step: 3 total_loss: 2.6916502
    epoch: 1 step: 4 total_loss: 1.3917351
    epoch: 1 step: 5 total_loss: 0.8109232
    epoch: 1 step: 6 total_loss: 0.99101084
    epoch: 1 step: 7 total_loss: 1.7741735
    epoch: 1 step: 8 total_loss: 0.9517553
    epoch: 1 step: 9 total_loss: 1.7988946
    epoch: 1 step: 10 total_loss: 1.0213892
    epoch: 1 step: 11 total_loss: 1.1700443
    .
    .
    .
    ```

The histograms of the per-step training loss of the two workers during the 30 training iterations, and the line charts of their average loss per epoch (the sum of the losses of all steps in an epoch divided by the number of steps), are shown below:

![cross-silo_fastrcnn-2workers-loss.png](https://gitee.com/mindspore/docs/blob/master/docs/federated/docs/source_zh_cn/images/cross-silo_fastrcnn-2workers-loss.png)
diff --git a/docs/federated/docs/source_en/pairwise_encryption_training.md b/docs/federated/docs/source_en/pairwise_encryption_training.md
index 40ba719dfa69aa97342fc918e1a4a5266126ca4d..f0cb094f949637c0a08bc966bf2faec0fd3c42b8 100644
--- a/docs/federated/docs/source_en/pairwise_encryption_training.md
+++ b/docs/federated/docs/source_en/pairwise_encryption_training.md
@@ -43,13 +43,13 @@ If you are interested in the specific steps of the algorithm, refer to the paper

 ### Cross device scenario

-Enabling pairwise encryption training is simple. You only need to set `encrypt_type='PW_ENCRYPT'` in `set_fl_context()`.
+Enabling pairwise encryption training is simple. Just set the `encrypt_type` field to `PW_ENCRYPT` in the yaml file when starting the cloud-side service.

 In addition, most of the workers participating in the training are unstable edge computing nodes such as mobile phones, so dropped connections and secret key reconstruction should be considered. The related parameters are `share_secrets_ratio`, `reconstruct_secrets_threshold`, and `cipher_time_window`.

 `share_client_ratio` indicates the client threshold decrease ratio of the public key broadcast round, the secret sharing round and the secret reconstruction round. The value must be less than or equal to 1.

-`reconstruct_secrets_threshold` indicates the number of secret shares required to reconstruct a secret. The value must be less than the number of clients that participate in updateModel, which is start_fl_job_threshold*update_model_ratio (those two parameters can refer to 'set_fl_context' in [this file](https://gitee.com/mindspore/mindspore/blob/master/mindspore/python/mindspore/context.py)).
+`reconstruct_secrets_threshold` indicates the number of secret shares required to reconstruct a secret. The value must be less than the number of clients that participate in updateModel (start_fl_job_threshold*update_model_ratio).

 To ensure system security, the value of `reconstruct_secrets_threshold` must be greater than half of the number of federated learning clients when the server and clients do not collude. When the server and clients may collude, the value of `reconstruct_secrets_threshold` must be greater than two thirds of the number of federated learning clients.
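As a quick sanity check of these constraints (an illustrative sketch, not part of MindSpore Federated), the minimum secure value of `reconstruct_secrets_threshold` for a given number of clients follows directly:

```python
import math

def min_reconstruct_threshold(num_clients: int, collusion: bool) -> int:
    # The threshold must be strictly greater than n/2 without server-client
    # collusion, and strictly greater than 2n/3 with possible collusion.
    bound = 2 * num_clients / 3 if collusion else num_clients / 2
    return math.floor(bound) + 1

print(min_reconstruct_threshold(100, collusion=False))  # 51
print(min_reconstruct_threshold(100, collusion=True))   # 67
```

Remember that the chosen value must still be smaller than the number of clients participating in updateModel.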
@@ -58,7 +58,7 @@ When the server and clients may collude

 ### Cross silo scenario

-In cross silo scenario, you only need to set `encrypt_type='STABLE_PW_ENCRYPT'` in `set_fl_context()` for both server startup script and client startup script.
+In the cross-silo scenario, you only need to set the `encrypt_type` field to `PW_ENCRYPT` in the yaml file of the cloud-side startup script.

 Different from the cross-device scenario, all of the workers in the cross-silo scenario are stable computing nodes. You only need to set the parameter `cipher_time_window`.
diff --git a/docs/federated/docs/source_zh_cn/object_detection_application_in_cross_silo.md b/docs/federated/docs/source_zh_cn/object_detection_application_in_cross_silo.md
index 16a102aca95f62e73e2d08d4b0e108aaef540b65..e9d1040442b46c93475be7a171d088139aeac825 100644
--- a/docs/federated/docs/source_zh_cn/object_detection_application_in_cross_silo.md
+++ b/docs/federated/docs/source_zh_cn/object_detection_application_in_cross_silo.md
@@ -92,7 +92,7 @@ cross_silo_faster_rcnn

 - 参数`pre_trained`

-  用于设置预训练模型路径(.ckpt 格式)
+  用于设置预训练模型路径(.ckpt 格式)。

   本教程中实验的预训练模型是在ImageNet2012上训练的ResNet-50检查点。你可以使用ModelZoo中 [resnet50](https://gitee.com/mindspore/models/tree/master/official/cv/resnet) 脚本来训练,然后使用src/convert_checkpoint.py把训练好的resnet50的权重文件转换为可加载的权重文件。
@@ -138,25 +138,25 @@ cross_silo_faster_rcnn
    [INFO] FEDERATED(3944,2b28c5ada700,python):2022-10-10-17:11:08.155.056 [mindspore_federated/fl_arch/ccsrc/common/communicator/http_request_handler.cc:90] Run] Start http server!
    ```

-5. 启动Worker
+6. 启动Worker

-   `run_cross_silo_femnist_worker.py`是用于启动若干`worker`的Python脚本,并支持通过`argparse`传参修改配置。执行指令如下,代表启动本次联邦学习任务的`worker`,联邦学习任务正常进行需要的`worker`数量为`2`个:
+   `run_cross_silo_femnist_worker.py`是用于启动若干`worker`的Python脚本,并支持通过`argparse`传参修改配置。执行指令如下,代表启动本次联邦学习任务的`worker`,联邦学习任务正常进行需要的`worker`数量为`2`个:

-   ```sh
-   python run_cross_silo_fasterrcnn_worker.py --worker_num=2 --dataset_path datasets/coco_split/split_100 --http_server_address=10.113.216.40:6668
-   ```
+   ```sh
+   python run_cross_silo_fasterrcnn_worker.py --worker_num=2 --dataset_path datasets/coco_split/split_100 --http_server_address=10.113.216.40:6668
+   ```

    具体实现详见[run_cross_silo_femnist_worker.py](https://gitee.com/mindspore/federated/tree/master/example/cross_silo_faster_rcnn/run_cross_silo_femnist_worker.py)。

-   如上指令,`--worker_num=2`代表启动两个客户端,且两个客户端使用的数据集分别为`datasets/coco_split/split_100/mindrecord_0`和`datasets/coco_split/split_100/mindrecord_1`,请根据`任务前准备`教程准备好对应客户端所需数据集。
+   如上指令,`--worker_num=2`代表启动两个客户端,且两个客户端使用的数据集分别为`datasets/coco_split/split_100/mindrecord_0`和`datasets/coco_split/split_100/mindrecord_1`,请根据`任务前准备`教程准备好对应客户端所需数据集。

-当执行以上三个指令之后,等待一段时间之后,进入当前目录下`worker_0`文件夹,通过指令`grep -rn "\epoch:" *`查看`worker_0`日志,可看到类似如下内容的日志信息:
+   当执行以上三个指令之后,等待一段时间之后,进入当前目录下`worker_0`文件夹,通过指令`grep -rn "\epoch:" *`查看`worker_0`日志,可看到类似如下内容的日志信息:

-```sh
-epoch: 1 step: 1 total_loss: 0.6060338
-```
+   ```sh
+   epoch: 1 step: 1 total_loss: 0.6060338
+   ```

-则说明云云联邦启动成功,`worker_0`正在训练,其他worker可通过类似方式查看。
+   则说明云云联邦启动成功,`worker_0`正在训练,其他worker可通过类似方式查看。

 以上脚本中参数配置说明请参考[yaml配置说明](https://gitee.com/mindspore/federated/blob/master/docs/api/api_python/federated_server_yaml.md#)。
@@ -220,7 +220,7 @@ python finish_cloud.py --redis_port=2345

 - 客户端本地训练epoch数:1

-- 云云联邦学习总迭代数: 30
+- 云云联邦学习总迭代数:30

 - 实验结果(记录客户端本地训练过程中的loss值):
diff --git a/tutorials/application/source_zh_cn/index.rst b/tutorials/application/source_zh_cn/index.rst
index 041dfbb30ada2df1c4f4d110b81889328a75ddb1..5dc679ce5aa32a9365f35c7bdccab7dbf7d8ce47 100644
--- a/tutorials/application/source_zh_cn/index.rst
+++ b/tutorials/application/source_zh_cn/index.rst
@@ -12,6 +12,7 @@
    :caption: 计算机视觉

    cv/resnet50
+   cv/transfer_learning
    cv/fgsm
    cv/dcgan
    cv/vit