From 29b616981c368a574492387bf2c14d83a6259ea6 Mon Sep 17 00:00:00 2001 From: huanxiaoling <3174348550@qq.com> Date: Tue, 25 Oct 2022 14:40:25 +0800 Subject: [PATCH] update the en files and correct the links --- docs/federated/docs/source_en/index.rst | 1 + .../source_en/private_set_intersection.md | 2 +- .../split_pangu_alpha_application.md | 324 ++++++++++++++++++ .../split_pangu_alpha_application.md | 42 +-- .../source_en/migration_guide/faq.md | 16 +- .../source_en/migration_guide/sample_code.md | 2 +- 6 files changed, 356 insertions(+), 31 deletions(-) create mode 100644 docs/federated/docs/source_en/split_pangu_alpha_application.md diff --git a/docs/federated/docs/source_en/index.rst b/docs/federated/docs/source_en/index.rst index e7f9e4f78d..fe44296fcf 100644 --- a/docs/federated/docs/source_en/index.rst +++ b/docs/federated/docs/source_en/index.rst @@ -90,6 +90,7 @@ Common Application Scenarios data_join split_wnd_application + split_pangu_alpha_application .. toctree:: :maxdepth: 1 diff --git a/docs/federated/docs/source_en/private_set_intersection.md b/docs/federated/docs/source_en/private_set_intersection.md index 332be22a96..26cc3fa05e 100644 --- a/docs/federated/docs/source_en/private_set_intersection.md +++ b/docs/federated/docs/source_en/private_set_intersection.md @@ -6,7 +6,7 @@ With the rise in demand for digital transformation and the circulation of data elements, as well as the implementation of the Data Security Law, the Personal Information Protection Law and the EU General Data Protection Regulation (GDPR), privacy of data is increasingly becoming a necessary requirement in many scenarios. For example, when the dataset is sensitive information of users (medical diagnosis information, transaction records, identification codes, device unique identifier OAID, etc.) or secret information of the company, cryptography or desensitization must be used to ensure the confidentiality of the data before using it in the open state to achieve the goal of "usable but invisible" of the data in order to prevent information leakage. Considering two participants who jointly train a machine learning model (e.g., vertical federated learning) by using their respective data, the first step of this task is to align the sample sets of both parties, a process known as Entity Resolution. Traditional plaintext intersection inevitably reveals the OAID of the entire database and damages the data privacy of both parties, so the Privacy Set Intersection (PSI) technique is needed to accomplish this task. -PSI is a type of secure multi-party computing (MPC) protocol that takes data collection from two parties as input, after a series of hashing, encryption and data exchange steps, eventually outputs the intersection of the collection to an agreed output party, while ensuring that the participating parties cannot obtain any information about the data outside the intersection. The use of the PSI protocol in vertical federated learning tasks, in compliance with the GDPR requirement of Data Minimisation, i.e. there is no non-essential exposure of data, except for the parts necessary for the training process (intersections). From the data controller's perspective, the service has to share data appropriately, but wants to share only necessary data based on the service and not expose additional data to the public. 
It should be noted that while PSI can directly apply existing MPC protocols to its calculations, this often results in a large computational and communication overhead, which is not conducive to business. In this paper, we present a technique of combining Bloom Filter and Elliptic Curve with point multiplication Inverse Element offset to implement ECDH-PSI (Elliptic Curve Diffie-Hellman key Exchange-PSI) to better support cloud services and carry out privacy preserving set intersection computing services.
+PSI is a type of secure multi-party computing (MPC) protocol that takes data collection from two parties as input, after a series of hashing, encryption and data exchange steps, eventually outputs the intersection of the collection to an agreed output party, while ensuring that the participating parties cannot obtain any information about the data outside the intersection. The use of the PSI protocol in vertical federated learning tasks, in compliance with the GDPR requirement of Data Minimisation, i.e. there is no non-essential exposure of data, except for the parts necessary for the training process (intersections). From the data controller's perspective, the service has to share data appropriately, but wants to share only necessary data based on the service and not expose additional data to the public. It should be noted that while PSI can directly apply existing MPC protocols to its calculations, this often results in a large computational and communication overhead, which is not conducive to business. In this paper, we introduce a technique combining Bloom filter and eliminable inverse scalar multiplication on the elliptic curve to implement ECDH-PSI (Elliptic Curve Diffie-Hellman key Exchange-PSI) to better support cloud services and carry out privacy preserving set intersection computing services.

## Algorithm Process Introduction

diff --git a/docs/federated/docs/source_en/split_pangu_alpha_application.md b/docs/federated/docs/source_en/split_pangu_alpha_application.md
new file mode 100644
index 0000000000..35eca8754e
--- /dev/null
+++ b/docs/federated/docs/source_en/split_pangu_alpha_application.md
@@ -0,0 +1,324 @@

# Vertical Federated Learning Model Training - Pangu Alpha Large Model Cross-Domain Training

## Overview

With the advancement of hardware computing power and the continuous expansion of network data size, pre-training large models has increasingly become an important research direction in fields such as natural language processing and multimodal learning. Taking Pangu Alpha, a large Chinese NLP pre-trained model released in 2021, as an example, the number of model parameters reaches 200 billion, and the training process relies on massive data and advanced computing centers, which limits its practical deployment and technology evolution. A feasible solution is to integrate the computing power and data resources of multiple participants based on vertical federated learning or split learning techniques, so as to achieve cross-domain collaborative training of pre-trained large models while ensuring security and privacy.

MindSpore Federated provides basic functional components for vertical federated learning based on split learning. Taking the Pangu α model as an example, this sample demonstrates federated learning training of a large NLP model.

![Cross-domain training of the Pangu α large model](https://gitee.com/mindspore/docs/blob/master/docs/federated/docs/source_zh_cn/images/splitnn_pangu_alpha.png)

As shown in the figure above, in this case, the Pangu α model is sliced into three sub-networks: Embedding, Backbone and Head.
The front-level subnetwork Embedding and the end-level subnetwork Head are deployed in the network domain of participant A, while the Backbone subnetwork containing the multi-level Transformer modules is deployed in the network domain of participant B. The Embedding subnetwork and the Head subnetwork read the data held by participant A and lead the execution of the training and inference tasks of the Pangu α model.

* In the forward inference stage, participant A uses the Embedding subnetwork to process the original data and transmits the output Embedding Feature tensor and Attention Mask Feature tensor to participant B as the input of participant B's Backbone subnetwork. Then, participant A reads the Hidden State Feature tensor output by the Backbone subnetwork as the input of participant A's Head subnetwork, and finally the Head subnetwork outputs the predicted result or loss value.

* In the backward propagation phase, after completing the gradient calculation and parameter update of the Head subnetwork, participant A transmits the gradient tensor associated with the Hidden State Feature tensor to participant B for the gradient calculation and parameter update of the Backbone subnetwork. After completing the gradient calculation and parameter update of the Backbone subnetwork, participant B then transmits the gradient tensor associated with the Embedding Feature tensor to participant A for the gradient calculation and parameter update of the Embedding subnetwork.

The feature tensors and gradient tensors exchanged between participant A and participant B during the above forward inference and backward propagation are processed with privacy-preserving mechanisms and encryption algorithms, so that the two participants can train the network model collaboratively without transmitting the data held by participant A to participant B. Because the Embedding and Head subnetworks contain only a small number of parameters while the Backbone subnetwork contains a huge number of parameters, this sample application is suitable for collaborative training or deployment of large models between the service side (corresponding to participant A) and the computing center (corresponding to participant B).

For a detailed introduction to the principles of the Pangu α model, please refer to [MindSpore ModelZoo - pangu_alpha](https://gitee.com/mindspore/models/tree/master/official/nlp/pangu_alpha), [Introduction to Pengcheng PanGu-α](https://git.openi.org.cn/PCL-Platform.Intelligence/PanGu-Alpha), and its [research paper](https://arxiv.org/pdf/2104.12369.pdf).

## Preparation

### Environment Preparation

1. Refer to [Obtaining MindSpore Federated](https://mindspore.cn/federated/docs/en/master/federated_install.html) to install MindSpore 1.8.1 or later and MindSpore Federated.

2. Download the MindSpore Federated code and install the Python packages that this sample application depends on:

    ```bash
    git clone https://gitee.com/mindspore/federated.git
    cd federated/example/splitnn_pangu_alpha/
    python -m pip install -r requirements.txt
    ```

### Dataset Preparation

Before running the sample, refer to [MindSpore ModelZoo - pangu_alpha - Dataset Generation](https://gitee.com/mindspore/models/tree/master/official/nlp/pangu_alpha#dataset-generation) and use the preprocess.py script to convert the raw text corpus for training into a dataset that can be used for model training.
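The converted dataset is consumed by the training scripts described in the following sections. As a quick sanity check before training, the converted data can be loaded with `mindspore.dataset`. The following minimal sketch is not part of the official sample code; it assumes the preprocessing output is in MindRecord format, and the file path and the `input_ids` column name are placeholders that need to match the actual output of preprocess.py.

```python
# Minimal sanity-check sketch (assumptions: preprocess.py wrote MindRecord files under
# ./wiki/train/, and the tokenized corpus is stored in a column named "input_ids").
import mindspore.dataset as ds

dataset = ds.MindDataset(dataset_files="./wiki/train/wiki0.mindrecord", columns_list=["input_ids"])
dataset = dataset.batch(4, drop_remainder=True)
for item in dataset.create_dict_iterator(num_epochs=1, output_numpy=True):
    print(item["input_ids"].shape)  # one batch of tokenized samples
    break
```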
## Defining the Vertical Federated Learning Training Process

The MindSpore Federated vertical federated learning framework uses FLModel (see [Vertical Federated Learning Model Training Interface](https://mindspore.cn/federated/docs/en/master/vertical/vertical_federated_FLModel.html)) and yaml files (see [Yaml Configuration file for model training of vertical federated learning](https://mindspore.cn/federated/docs/en/master/vertical/vertical_federated_yaml.html)) to model the vertical federated learning training process.

### Defining the Network Model

1. Call the functional components provided by MindSpore and take nn.Cell (see [mindspore.nn.Cell](https://mindspore.cn/docs/en/master/api_python/nn/mindspore.nn.Cell.html#mindspore-nn-cell)) as the base class to program the training network that this participant contributes to vertical federated learning. Taking the Embedding subnetwork of participant A in this application practice as an example, the [sample code](https://gitee.com/mindspore/federated/blob/master/example/splitnn_pangu_alpha/src/split_pangu_alpha.py) is as follows:

    ```python
    class EmbeddingLossNet(nn.Cell):
        """
        Train net of the embedding party, or the tail sub-network.
        Args:
            net (class): EmbeddingLayer, which is the 1st sub-network.
            config (class): default config info.
        """

        def __init__(self, net: EmbeddingLayer, config):
            super(EmbeddingLossNet, self).__init__(auto_prefix=False)

            self.batch_size = config.batch_size
            self.seq_length = config.seq_length
            dp = config.parallel_config.data_parallel
            self.eod_token = config.eod_token
            self.net = net
            self.slice = P.StridedSlice().shard(((dp, 1),))
            self.not_equal = P.NotEqual().shard(((dp, 1), ()))
            self.batch_size = config.batch_size
            self.len = config.seq_length
            self.slice2 = P.StridedSlice().shard(((dp, 1, 1),))

        def construct(self, input_ids, position_id, attention_mask):
            """forward process of FollowerLossNet"""
            tokens = self.slice(input_ids, (0, 0), (self.batch_size, -1), (1, 1))
            embedding_table, word_table = self.net(tokens, position_id, batch_valid_length=None)
            return embedding_table, word_table, position_id, attention_mask
    ```

2. In the yaml configuration file, describe the name, inputs, outputs and other information corresponding to the training network. Taking the Embedding subnetwork of participant A in this application practice as an example, the [example code](https://gitee.com/mindspore/federated/blob/master/example/splitnn_pangu_alpha/embedding.yaml) is as follows:

    ```yaml
    train_net:
      name: follower_loss_net
      inputs:
        - name: input_ids
          source: local
        - name: position_id
          source: local
        - name: attention_mask
          source: local
      outputs:
        - name: embedding_table
          destination: remote
        - name: word_table
          destination: remote
        - name: position_id
          destination: remote
        - name: attention_mask
          destination: remote
    ```

    The `name` field is the name of the training network and will be used to name the checkpoint files saved during the training process. The `inputs` field is the list of input tensors of the training network, and the `outputs` field is the list of output tensors of the training network.

    The `name` fields under the `inputs` and `outputs` fields are the input/output tensor names. The names and order of the input/output tensors need to correspond strictly to the inputs/outputs of the `construct` method in the Python code of the training network.
    `source` under the `inputs` field identifies the data source of the input tensor: `local` means that the input tensor is loaded from local data, and `remote` means that the input tensor comes from network transmission by other participants.

    `destination` under the `outputs` field identifies the destination of the output tensor: `local` means that the output tensor is used locally only, and `remote` means that the output tensor is transmitted to other participants via the network.

3. Optionally, use a similar approach to model the evaluation network that this participant will use in vertical federated learning.

### Defining the Optimizer

1. Call the functional components provided by MindSpore to program the optimizer that updates the parameters of this participant's training network. Taking the custom optimizer used by participant A for Embedding subnetwork training in this application practice as an example, the [sample code](https://gitee.com/mindspore/federated/blob/master/example/splitnn_pangu_alpha/src/pangu_optim.py) is as follows:

    ```python
    class PanguAlphaAdam(TrainOneStepWithLossScaleCell):
        """
        Customized Adam optimizer for training of pangu_alpha in the splitnn demo system.
        """
        def __init__(self, net, optim_inst, scale_update_cell, config, yaml_data) -> None:
            # Custom optimizer-related operators
            ...

        def __call__(self, *inputs, sens=None):
            # Define the gradient calculation and parameter update process
            ...
    ```

    Developers can customize the input and output of the `__init__` method of the optimizer class, but the input of the `__call__` method must contain only `inputs` and `sens`. `inputs` is of type `list` and corresponds to the input tensor list of the training network; its elements are of type `mindspore.Tensor`. `sens` is of type `dict` and holds the weighting coefficients used to calculate the gradient values of the training network parameters: each key is a gradient weighting coefficient identifier of type `str`, and each value is itself a `dict` whose keys (of type `str`) are output tensor names of the training network and whose values (of type `mindspore.Tensor`) are the weighting coefficients applied to the parameter gradients associated with that output tensor (a concrete sketch of this structure is given after this list).

2. In the yaml configuration file, describe the gradient calculation, parameter update, and other information corresponding to the optimizer. The [sample code](https://gitee.com/mindspore/federated/blob/master/example/splitnn_pangu_alpha/embedding.yaml) is as follows:

    ```yaml
    opts:
      - type: PanguAlphaAdam
        grads:
          - inputs:
              - name: input_ids
              - name: position_id
              - name: attention_mask
            output:
              name: embedding_table
            sens: hidden_states
          - inputs:
              - name: input_ids
              - name: position_id
              - name: attention_mask
            output:
              name: word_table
            sens: word_table
        params:
          - name: word_embedding
          - name: position_embedding
        hyper_parameters:
          learning_rate: 5.e-6
          eps: 1.e-8
          loss_scale: 1024.0
    ```

    The `type` field specifies the optimizer type; here it is the developer-defined optimizer.

    The `grads` field is a list of `GradOperation` configurations associated with the optimizer; the optimizer uses the `GradOperation` operators in this list to compute gradient values and update the training network parameters. The `inputs` and `output` fields are the input and output tensor lists of a `GradOperation` operator; each element is an input/output tensor name. The `sens` field is the gradient weighting coefficient or sensitivity identifier of the `GradOperation` operator (refer to [mindspore.ops.GradOperation](https://mindspore.cn/docs/en/master/api_python/ops/mindspore.ops.GradOperation.html?highlight=gradoperation)).

    The `params` field is a list of the training network parameter names to be updated by the optimizer; each element is the name of one training network parameter. In this example, the custom optimizer updates the network parameters whose names contain the `word_embedding` or `position_embedding` string.

    The `hyper_parameters` field is a list of hyperparameters for the optimizer.
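As a concrete illustration of the `sens` argument described in step 1 above, the following minimal sketch (not part of the sample code) shows the nested-dict structure a custom optimizer receives. The identifiers and tensor names follow the yaml above, while the tensor shapes are dummy placeholders.

```python
# Illustrative structure of the `sens` argument passed to the custom optimizer's __call__.
import numpy as np
import mindspore as ms

sens = {
    "hidden_states": {  # gradient weighting coefficient identifier (matches `sens: hidden_states` above)
        "embedding_table": ms.Tensor(np.ones((2, 4)), ms.float32),  # coefficient for output tensor "embedding_table"
    },
    "word_table": {     # identifier matching `sens: word_table` above
        "word_table": ms.Tensor(np.ones((4, 8)), ms.float32),       # coefficient for output tensor "word_table"
    },
}
# The framework would then invoke the optimizer as: optimizer(*inputs, sens=sens)
```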
### Defining Gradient Weighting Coefficient Calculation

According to the chain rule of gradient calculation, a subnetwork located downstream in the global network needs to calculate the gradient values of its output tensors relative to its input tensors, i.e., the gradient weighting coefficients or sensitivities, and pass them to the upstream subnetwork of the global network so that the latter can update its training parameters.

MindSpore Federated uses the `GradOperation` operator to complete the above gradient weighting coefficient or sensitivity calculation. The developer needs to describe the `GradOperation` operators used to calculate the gradient weighting coefficients in the yaml configuration file. Taking the Head subnetwork of participant A in this application practice as an example, the [sample code](https://gitee.com/mindspore/federated/blob/master/example/splitnn_pangu_alpha/Head.yaml) is as follows:

```yaml
grad_scalers:
  - inputs:
      - name: hidden_states
      - name: input_ids
      - name: word_table
      - name: position_id
      - name: attention_mask
    output:
      name: output
    sens: 1024.0
```

The `inputs` and `output` fields are the lists of input and output tensors of the `GradOperation` operator; each element is an input/output tensor name. The `sens` field is the gradient weighting coefficient or sensitivity of this `GradOperation` operator (refer to [mindspore.ops.GradOperation](https://mindspore.cn/docs/en/master/api_python/ops/mindspore.ops.GradOperation.html?highlight=gradoperation)). If it is a `float` or `int` value, a constant tensor is constructed as the gradient weighting coefficient. If it is a `str` string, the tensor with the corresponding name is parsed from the weighting coefficients transmitted by the other participants via the network and used as the weighting coefficient.
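To make the role of the `sens` value more concrete, the following standalone sketch (illustrative only, and unrelated to the Federated framework or the Pangu α sample; the toy network and numbers are assumptions) shows how `mindspore.ops.GradOperation` applies a sens weighting coefficient to a computed gradient, which is what a constant entry such as `sens: 1024.0` amounts to.

```python
import mindspore as ms
import mindspore.nn as nn
import mindspore.ops as ops

class TinyNet(nn.Cell):
    """Toy single-parameter network: y = w * x."""
    def __init__(self):
        super().__init__()
        self.w = ms.Parameter(ms.Tensor(2.0, ms.float32), name="w")

    def construct(self, x):
        return self.w * x

net = TinyNet()
# sens_param=True makes the returned gradient function accept an extra `sens` tensor
# that scales the back-propagated gradient, analogous to `sens: 1024.0` in the yaml above.
grad_fn = ops.GradOperation(get_all=True, sens_param=True)(net)
x = ms.Tensor(3.0, ms.float32)
sens = ms.Tensor(1024.0, ms.float32)
print(grad_fn(x, sens))  # gradient w.r.t. x: dy/dx * sens = 2.0 * 1024.0 = 2048.0
```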
### Executing the Training

1. After completing the above Python development and yaml configuration files, use the `FLModel` class and the `FLYamlData` class provided by MindSpore Federated to build the vertical federated learning process. Taking the Embedding subnetwork of participant A in this application practice as an example, the [sample code](https://gitee.com/mindspore/federated/blob/master/example/splitnn_pangu_alpha/src/split_pangu_alpha.py) is as follows:

    ```python
    embedding_yaml = FLYamlData('./embedding.yaml')
    embedding_base_net = EmbeddingLayer(config)
    embedding_eval_net = embedding_train_net = EmbeddingLossNet(embedding_base_net, config)
    embedding_with_loss = _VirtualDatasetCell(embedding_eval_net)
    embedding_params = embedding_with_loss.trainable_params()
    embedding_group_params = set_embedding_weight_decay(embedding_params)
    embedding_optim_inst = FP32StateAdamWeightDecay(embedding_group_params, lr, eps=1e-8, beta1=0.9, beta2=0.95)
    embedding_optim = PanguAlphaAdam(embedding_train_net, embedding_optim_inst, update_cell, config, embedding_yaml)

    embedding_fl_model = FLModel(yaml_data=embedding_yaml,
                                 network=embedding_train_net,
                                 eval_network=embedding_eval_net,
                                 optimizers=embedding_optim)
    ```

    The `FLYamlData` class mainly completes the parsing and verification of the yaml configuration file, and the `FLModel` class mainly provides the control interfaces for vertical federated learning training, inference and other processes.

2. Call the interface methods of the `FLModel` class to perform vertical federated learning training. Taking the Embedding subnetwork of participant A in this application practice as an example, the [sample code](https://gitee.com/mindspore/federated/blob/master/example/splitnn_pangu_alpha/src/split_pangu_alpha.py) is as follows:

    ```python
    embedding_fl_model.load_ckpt()
    for epoch in range(50):
        for step, item in enumerate(train_iter, start=1):
            # forward process
            step = epoch * train_size + step
            embedding_out = embedding_fl_model.forward_one_step(item)
            ...
            # backward process
            head_scale = head_fl_model.backward_one_step(item, backbone_out)
            ...
            if step % 10 == 0:
                embedding_fl_model.save_ckpt()
    ```

    The `forward_one_step` method and the `backward_one_step` method perform the forward inference and backward propagation operations of one data batch, respectively. The `load_ckpt` method and the `save_ckpt` method load and save checkpoint files, respectively.

## Running the Example

This example provides two sample programs, each run as a shell script that launches the corresponding Python programs.

1. `run_pangu_train_local.sh`: single-process example program. Participant A and participant B are trained in the same process, which passes the feature tensors and gradient tensors to the other participant directly as in-program variables.

2. `run_pangu_train_leader.sh` and `run_pangu_train_follower.sh`: multi-process example program. Participant A and participant B each run in a separate process, which encapsulates the feature tensors and gradient tensors as protobuf messages and transmits them to the other participant via the HTTPS communication interface. `run_pangu_train_leader.sh` and `run_pangu_train_follower.sh` can be run on two servers separately to achieve cross-domain collaborative training.

### Running a Single-Process Example

Taking `run_pangu_train_local.sh` as an example, run the sample program as follows:

1. Go to the sample program directory:

    ```bash
    cd federated/example/splitnn_pangu_alpha/
    ```

2. Taking the wiki dataset as an example, copy the dataset to the sample program directory:

    ```bash
    cp -r {dataset_dir}/wiki ./
    ```
3. Install the dependent Python packages:

    ```bash
    python -m pip install -r requirements.txt
    ```

4. Modify `src/utils.py` to configure parameters such as the checkpoint file load path, the training dataset path, and the evaluation dataset path. Examples are as follows:

    ```python
    parser.add_argument("--load_ckpt_path", type=str, default='./checkpoints', help="predict file path.")
    parser.add_argument('--data_url', required=False, default='./wiki/train/', help='Location of data.')
    parser.add_argument('--eval_data_url', required=False, default='./wiki/eval/', help='Location of eval data.')
    ```

5. Execute the training script:

    ```bash
    ./run_pangu_train_local.sh
    ```

6. View the training loss information recorded in the training log `splitnn_pangu_local.txt`:

    ```text
    INFO:root:epoch 0 step 10/43391 loss: 10.616087
    INFO:root:epoch 0 step 20/43391 loss: 10.424824
    INFO:root:epoch 0 step 30/43391 loss: 10.209235
    INFO:root:epoch 0 step 40/43391 loss: 9.950026
    INFO:root:epoch 0 step 50/43391 loss: 9.712448
    INFO:root:epoch 0 step 60/43391 loss: 9.557744
    INFO:root:epoch 0 step 70/43391 loss: 9.501564
    INFO:root:epoch 0 step 80/43391 loss: 9.326054
    INFO:root:epoch 0 step 90/43391 loss: 9.387547
    INFO:root:epoch 0 step 100/43391 loss: 8.795234
    ...
    ```

    The corresponding visualization results are shown below, where the horizontal axis is the number of training steps and the vertical axis is the loss value. The red curve is the loss of centralized Pangu α training, and the blue curve is the loss of the split-learning-based Pangu α training in this example. The two loss values show basically the same decreasing trend, and, considering that the initialization of the network parameter values is random, the correctness of the training process can be verified.

    ![Cross-domain training results of the Pangu α large model](https://gitee.com/mindspore/docs/blob/master/docs/federated/docs/source_zh_cn/images/splitnn_pangu_alpha_result.png)

### Running a Multi-Process Example

1. Similar to the single-process example, go to the sample program directory and install the dependent Python packages:

    ```bash
    cd federated/example/splitnn_pangu_alpha/
    python -m pip install -r requirements.txt
    ```

2. Copy the dataset to the sample program directory on Server 1:

    ```bash
    cp -r {dataset_dir}/wiki ./
    ```

3. Start the training script for Participant A on Server 1:

    ```bash
    ./run_pangu_train_leader.sh {ip_address_server1} {ip_address_server2} ./wiki/train ./wiki/train
    ```

    The first parameter of the training script is the IP address and port number of the local server (Server 1), and the second parameter is the IP address and port number of the peer server (Server 2). The third parameter is the training dataset file path, the fourth parameter is the evaluation dataset file path, and the fifth parameter identifies whether to load an existing checkpoint file.

4. Start the training script for Participant B on Server 2:

    ```bash
    ./run_pangu_train_follower.sh {ip_address_server2} {ip_address_server1}
    ```

    The first parameter of the training script is the IP address and port number of the local server (Server 2), and the second parameter is the IP address and port number of the peer server (Server 1). The third parameter identifies whether to load an existing checkpoint file.

5. Check the training loss information recorded in the training log `leader_processs.log` of Server 1.
If the trend of its loss information is consistent with that of the centralized training loss values of Pangaea α, the correctness of the training process can be verified. \ No newline at end of file diff --git a/docs/federated/docs/source_zh_cn/split_pangu_alpha_application.md b/docs/federated/docs/source_zh_cn/split_pangu_alpha_application.md index 4682dbcafe..5d680b94ca 100644 --- a/docs/federated/docs/source_zh_cn/split_pangu_alpha_application.md +++ b/docs/federated/docs/source_zh_cn/split_pangu_alpha_application.md @@ -10,7 +10,7 @@ MindSpore Federated提供基于拆分学习的纵向联邦学习基础功能组 ![实现盘古α大模型跨域训练](./images/splitnn_pangu_alpha.png) -如上图所示,该案例中, 盘古α模型被依次切分为Embedding、Backbone、Head等3个子网络。其中,前级子网络Embedding和末级子网络Head部署在的参与方A网络域内,包含多级Transformer模块的Backbone子网络部署在参与方B网络域内。Embedding子网络和Head子网络读取参与方A所持有的数据,主导执行盘古α模型的训练和推理任务。 +如上图所示,该案例中,盘古α模型被依次切分为Embedding、Backbone、Head等3个子网络。其中,前级子网络Embedding和末级子网络Head部署在的参与方A网络域内,包含多级Transformer模块的Backbone子网络部署在参与方B网络域内。Embedding子网络和Head子网络读取参与方A所持有的数据,主导执行盘古α模型的训练和推理任务。 * 前向推理阶段,参与方A采用Embedding子网络处理原始数据后,将输出的Embedding Feature特征张量和Attention Mask特征张量传输给参与方B,作为参与方B Backbone子网络的输入。然后,参与方A读取Backbone子网络输出的Hide State特征张量,作为参与方A Head子网络的输入,最终由Head子网络输出预测结果或损失值。 @@ -40,11 +40,11 @@ MindSpore Federated提供基于拆分学习的纵向联邦学习基础功能组 ## 定义纵向联邦学习训练过程 -MindSpore Federated纵向联邦学习框架采用FLModel(参见 [纵向联邦学习模型训练接口](https://mindspore.cn/federated/docs/zh-CN/master/vertical/vertical_federated_FLModel.html) )和yaml文件(参见 [纵向联邦学习yaml详细配置项](https://mindspore.cn/federated/docs/zh-CN/master/vertical/vertical_federated_yaml.html) ),建模纵向联邦学习的训练过程。 +MindSpore Federated纵向联邦学习框架采用FLModel(参见[纵向联邦学习模型训练接口](https://mindspore.cn/federated/docs/zh-CN/master/vertical/vertical_federated_FLModel.html))和yaml文件(参见[纵向联邦学习yaml详细配置项](https://mindspore.cn/federated/docs/zh-CN/master/vertical/vertical_federated_yaml.html)),建模纵向联邦学习的训练过程。 ### 定义网络模型 -1. 采用MindSpore提供的功能组件,以nn.Cell(参见 [mindspore.nn.Cell](https://mindspore.cn/docs/zh-CN/master/api_python/nn/mindspore.nn.Cell.html?highlight=cell#mindspore-nn-cell) )为基类,编程开发本参与方待参与纵向联邦学习的训练网络。以本应用实践中参与方A的Embedding子网络为例,[示例代码](https://gitee.com/mindspore/federated/blob/master/example/splitnn_pangu_alpha/src/split_pangu_alpha.py)如下: +1. 采用MindSpore提供的功能组件,以nn.Cell(参见[mindspore.nn.Cell](https://mindspore.cn/docs/zh-CN/master/api_python/nn/mindspore.nn.Cell.html?highlight=cell#mindspore-nn-cell))为基类,编程开发本参与方待参与纵向联邦学习的训练网络。以本应用实践中参与方A的Embedding子网络为例,[示例代码](https://gitee.com/mindspore/federated/blob/master/example/splitnn_pangu_alpha/src/split_pangu_alpha.py)如下: ```python class EmbeddingLossNet(nn.Cell): @@ -99,19 +99,19 @@ MindSpore Federated纵向联邦学习框架采用FLModel(参见 [纵向联邦 destination: remote ``` -其中,`name`字段为训练网络名称,将用于命名训练过程中保存的checkpoints文件。`inputs`字段为训练网络输入张量列表,`outputs`字段为训练网络输入张量列表。 + 其中,`name`字段为训练网络名称,将用于命名训练过程中保存的checkpoints文件。`inputs`字段为训练网络输入张量列表,`outputs`字段为训练网络输出张量列表。 -`inputs`和`outputs`字段下的`name`字段,为输入/输出张量名称。输入/输出张量的名称和顺序,需要与训练网络对应Python代码中`construct`方法的输入/输出严格对应。 + `inputs`和`outputs`字段下的`name`字段,为输入/输出张量名称。输入/输出张量的名称和顺序,需要与训练网络对应Python代码中`construct`方法的输入/输出严格对应。 -`inputs`字段下的`source`字段标识输入张量的数据来源,`local`代表输入张量来源于本地数据加载,`remote`代表输入张量来源于其它参与方网络传输。 + `inputs`字段下的`source`字段标识输入张量的数据来源,`local`代表输入张量来源于本地数据加载,`remote`代表输入张量来源于其它参与方网络传输。 -`outputs`字段下的`destination`字段标识输出张量的数据去向,`local`代表输出张量仅用于本地,`remote`代表输出张量将通过网络传输给其它参与方。 + `outputs`字段下的`destination`字段标识输出张量的数据去向,`local`代表输出张量仅用于本地,`remote`代表输出张量将通过网络传输给其它参与方。 3. 可选的,采用类似方法建模本参与方待参与纵向联邦学习的评估网络。 ### 定义优化器 -1. 
采用MindSpore提供的功能组件,编程开发用于本参与方训练网络参数更新的优化器。以本应用实践中参与方A用于Embedding子网络训练的自定义优化器为例,[示例代码](https://gitee.com/mindspore/federated/blob/master/example/splitnn_pangu_alpha/src/pangu_optim.py)如下: +1. 采用MindSpore提供的功能组件,编程开发用于本参与方训练网络参数更新的优化器。以本应用实践中参与方A用于Embedding子网络训练的自定义优化器为例,[示例代码](https://gitee.com/mindspore/federated/blob/master/example/splitnn_pangu_alpha/src/pangu_optim.py)如下: ```python class PanguAlphaAdam(TrainOneStepWithLossScaleCell): @@ -119,7 +119,7 @@ MindSpore Federated纵向联邦学习框架采用FLModel(参见 [纵向联邦 Customized Adam optimizer for training of pangu_alpha in the splitnn demo system. """ def __init__(self, net, optim_inst, scale_update_cell, config, yaml_data) -> None: - # 定义自定义优化器相关算子 + # 自定义优化器相关算子 ... def __call__(self, *inputs, sens=None): @@ -127,7 +127,7 @@ MindSpore Federated纵向联邦学习框架采用FLModel(参见 [纵向联邦 ... ``` -开发者可自定义优化器类的`__init__`方法的输入输出,但优化器类的`__call__`方法的输入需仅包含`inputs`和`sens`。其中,`inputs`为`list`类型,对应训练网络的输入张量列表,其元素为`mindspore.Tensor`类型。`sens`为`dict`类型,保存用于计算训练网络参数梯度值的加权系数,其key为`str`类型的梯度加权系数标识符;value为`dict`类型,其key为`str`类型,是训练网络输出张量名称,value为`mindspore.Tensor`类型,是该输出张量对应的训练网络参数梯度值的加权系数。 + 开发者可自定义优化器类的`__init__`方法的输入输出,但优化器类的`__call__`方法的输入需仅包含`inputs`和`sens`。其中,`inputs`为`list`类型,对应训练网络的输入张量列表,其元素为`mindspore.Tensor`类型。`sens`为`dict`类型,保存用于计算训练网络参数梯度值的加权系数,其key为`str`类型的梯度加权系数标识符;value为`dict`类型,其key为`str`类型,是训练网络输出张量名称,value为`mindspore.Tensor`类型,是该输出张量对应的训练网络参数梯度值的加权系数。 2. 在yaml配置文件中,描述优化器对应的梯度计算、参数更新等信息。[示例代码](https://gitee.com/mindspore/federated/blob/master/example/splitnn_pangu_alpha/embedding.yaml)如下: @@ -158,13 +158,13 @@ MindSpore Federated纵向联邦学习框架采用FLModel(参见 [纵向联邦 loss_scale: 1024.0 ``` -其中,`type`字段为优化器类型,此处为开发者自定义优化器。 + 其中,`type`字段为优化器类型,此处为开发者自定义优化器。 -`grads`字段为优化器关联的`GradOperation`列表,优化器将使用列表中`GradOperation`算子计算输出的梯度值,更新训练网络参数。`inputs`和`output`字段为`GradOperation`算子的输入和输出张量列表,其元素分别为一个输入/输出张量名称。`sens`字段为`GradOperation`算子的梯度加权系数或灵敏度(参考[mindspore.ops.GradOperation](https://mindspore.cn/docs/zh-CN/master/api_python/ops/mindspore.ops.GradOperation.html?highlight=gradoperation) )的标识符。 + `grads`字段为优化器关联的`GradOperation`列表,优化器将使用列表中`GradOperation`算子计算输出的梯度值,更新训练网络参数。`inputs`和`output`字段为`GradOperation`算子的输入和输出张量列表,其元素分别为一个输入/输出张量名称。`sens`字段为`GradOperation`算子的梯度加权系数或灵敏度(参考[mindspore.ops.GradOperation](https://mindspore.cn/docs/zh-CN/master/api_python/ops/mindspore.ops.GradOperation.html?highlight=gradoperation))的标识符。 -`params`字段为优化器即将更新的训练网络参数名称列表,其元素分别为一个训练网络参数名称。本示例中,自定义优化器将更新名称中包含`word_embedding`字符串和`position_embedding`字符串的网络参数。 + `params`字段为优化器即将更新的训练网络参数名称列表,其元素分别为一个训练网络参数名称。本示例中,自定义优化器将更新名称中包含`word_embedding`字符串和`position_embedding`字符串的网络参数。 -`hyper_parameters`字段为优化器的超参数列表。 + `hyper_parameters`字段为优化器的超参数列表。 ### 定义梯度加权系数计算 @@ -185,7 +185,7 @@ grad_scalers: sens: 1024.0 ``` -其中,`inputs`和`output`字段为`GradOperation`算子的输入和输出张量列表,其元素分别为一个输入/输出张量名称。`sens`字段为该`GradOperation`算子的梯度加权系数或灵敏度(参考[mindspore.ops.GradOperation](https://mindspore.cn/docs/zh-CN/master/api_python/ops/mindspore.ops.GradOperation.html?highlight=gradoperation) ),如果为`float`或`int`型数值,则将构造一个常量张量作为梯度加权系数,如果为`str`型字符串,则将从其它参与方经网络传输的加权系数中,解析名称与其对应的张量作为加权系数。 +其中,`inputs`和`output`字段为`GradOperation`算子的输入和输出张量列表,其元素分别为一个输入/输出张量名称。`sens`字段为该`GradOperation`算子的梯度加权系数或灵敏度(参考[mindspore.ops.GradOperation](https://mindspore.cn/docs/zh-CN/master/api_python/ops/mindspore.ops.GradOperation.html?highlight=gradoperation)),如果为`float`或`int`型数值,则将构造一个常量张量作为梯度加权系数,如果为`str`型字符串,则将从其它参与方经网络传输的加权系数中,解析名称与其对应的张量作为加权系数。 ### 执行训练 @@ -207,7 +207,7 @@ grad_scalers: optimizers=embedding_optim) ``` 
-其中,`FLYamlData`类主要完成yaml配置文件的解析和校验,`FLModel`类主要提供纵向联邦学习训练、推理等流程的控制接口。 + 其中,`FLYamlData`类主要完成yaml配置文件的解析和校验,`FLModel`类主要提供纵向联邦学习训练、推理等流程的控制接口。 2. 调用`FLModel`类的接口方法,执行纵向联邦学习训练。以本应用实践中参与方A的Embedding子网络为例,[示例代码](https://gitee.com/mindspore/federated/blob/master/example/splitnn_pangu_alpha/src/split_pangu_alpha.py)如下: @@ -226,7 +226,7 @@ grad_scalers: embedding_fl_model.save_ckpt() ``` -其中,`forward_one_step`方法和`backward_one_step`方法分别执行一个数据batch的前向推理和反向传播操作。`load_ckpt`方法和`save_ckpt`方法分别执行checkpoints文件的加载和保存操作。 + 其中,`forward_one_step`方法和`backward_one_step`方法分别执行一个数据batch的前向推理和反向传播操作。`load_ckpt`方法和`save_ckpt`方法分别执行checkpoints文件的加载和保存操作。 ## 运行样例 @@ -288,9 +288,9 @@ grad_scalers: ... ``` -对应的可视化结果如下图所示,其中横轴为训练步数,纵轴为loss值,红色曲线为盘古α训练loss值,蓝色曲线为本示例中基于拆分学习的盘古α训练loss值。二者loss值下降的趋势基本一致,考虑到网络参数值初始化具有随机性,可验证训练过程的正确性。 + 对应的可视化结果如下图所示,其中横轴为训练步数,纵轴为loss值,红色曲线为盘古α训练loss值,蓝色曲线为本示例中基于拆分学习的盘古α训练loss值。二者loss值下降的趋势基本一致,考虑到网络参数值初始化具有随机性,可验证训练过程的正确性。 -![盘古α大模型跨域训练结果](./images/splitnn_pangu_alpha_result.png) + ![盘古α大模型跨域训练结果](./images/splitnn_pangu_alpha_result.png) ### 运行多进程样例 @@ -313,7 +313,7 @@ grad_scalers: ./run_pangu_train_leader.sh {ip_address_server1} {ip_address_server2} ./wiki/train ./wiki/train ``` -训练脚本的第1个参数是本地服务器(服务器1)的IP地址和端口号,第2个参数是对端服务器(服务器2)的IP地址和端口号,第3个参数是训练数据集文件路径,第4个参数是评估数据集文件路径,第5个参数标识是否加载已有的checkpoint文件。 + 训练脚本的第1个参数是本地服务器(服务器1)的IP地址和端口号,第2个参数是对端服务器(服务器2)的IP地址和端口号,第3个参数是训练数据集文件路径,第4个参数是评估数据集文件路径,第5个参数标识是否加载已有的checkpoint文件。 4. 在服务器2启动参与方B的训练脚本: @@ -321,6 +321,6 @@ grad_scalers: ./run_pangu_train_follower.sh {ip_address_server2} {ip_address_server1} ``` -训练脚本的第1个参数是本地服务器(服务器2)的IP地址和端口号,第2个参数是对端服务器(服务器2)的IP地址和端口号,第3个参数标识是否加载已有的checkpoint文件。 + 训练脚本的第1个参数是本地服务器(服务器2)的IP地址和端口号,第2个参数是对端服务器(服务器2)的IP地址和端口号,第3个参数标识是否加载已有的checkpoint文件。 5. 查看服务器1的训练日志`leader_processs.log`中记录的训练loss信息。若其loss信息与盘古α集中式训练loss值趋势一致,可验证训练过程的正确性。 \ No newline at end of file diff --git a/docs/mindspore/source_en/migration_guide/faq.md b/docs/mindspore/source_en/migration_guide/faq.md index aeb613374c..65bcd1f311 100644 --- a/docs/mindspore/source_en/migration_guide/faq.md +++ b/docs/mindspore/source_en/migration_guide/faq.md @@ -8,20 +8,20 @@ MindSpore provides a [FAQ](https://mindspore.cn/docs/en/master/faq/installation. 
[Typical Differences from PyTorch](https://www.mindspore.cn/docs/en/master/migration_guide/typical_api_comparision.html) - [API Mapping and Handling Strategy of Missing API](https://www.mindspore.cn/docs/en/master/migration_guide/analysis_and_preparation.html#analyzing-API-Compliance) + [API Mapping and Handling Strategy of Missing API](https://www.mindspore.cn/docs/en/master/migration_guide/analysis_and_preparation.html#analyzing-api-compliance) - [Dynamic Shape Analysis](https://www.mindspore.cn/docs/en/master/migration_guide/analysis_and_preparation.html#dynamic-shape) and [Mitigation Program](https://www.mindspore.cn/docs/zh-CN/master/migration_guide/model_development/model_and_loss.html#dynamic-shape-mitigation-program) + [Dynamic Shape Analysis](https://www.mindspore.cn/docs/en/master/migration_guide/analysis_and_preparation.html#dynamic-shape) and [Mitigation Program](https://www.mindspore.cn/docs/en/master/migration_guide/model_development/model_and_loss.html#dynamic-shape-workarounds) - [Mitigation Program for Sparse Characteristic](https://www.mindspore.cn/docs/en/master/migration_guide/analysis_and_preparation.html#sparse) + [Mitigation Program for Sparse Characteristic](https://www.mindspore.cn/docs/en/master/migration_guide/analysis_and_preparation.html#sparsity) - [Common Syntax Restrictions and Handling Strategies for Static Graphs](https://www.mindspore.cn/docs/zh-CN/master/migration_guide/model_development/model_and_loss.html#common-restrictions) + [Common Syntax Restrictions and Handling Strategies for Static Graphs](https://www.mindspore.cn/docs/en/master/migration_guide/model_development/model_and_loss.html#common-restrictions) - [Notes for MindSpore Web Authoring](https://www.mindspore.cn/docs/zh-CN/master/migration_guide/model_development/model_development.html#mindspore-web- authoring-note) + [Notes for MindSpore Web Authoring](https://www.mindspore.cn/docs/en/master/migration_guide/model_development/model_development.html#considerations-for-mindspore-network-authoring) - Network Debugging - [Function Debugging](https://www.mindspore.cn/docs/zh-CN/master/migration_guide/debug_and_tune.html#function-debugging) + [Function Debugging](https://www.mindspore.cn/docs/en/master/migration_guide/debug_and_tune.html#function-debugging) - [Precision Debugging](https://www.mindspore.cn/docs/en/master/migration_guide/debug_and_tune.html#precision-debugging) + [Precision Debugging](https://www.mindspore.cn/docs/en/master/migration_guide/debug_and_tune.html#accuracy-debugging) - [Performance Debugging](https://www.mindspore.cn/docs/en/master/migration_guide/debug_and_tune.html#performance-debugging) + [Performance Debugging](https://www.mindspore.cn/docs/en/master/migration_guide/debug_and_tune.html#performance-tuning) diff --git a/docs/mindspore/source_en/migration_guide/sample_code.md b/docs/mindspore/source_en/migration_guide/sample_code.md index 74227e67e8..577dcd314f 100644 --- a/docs/mindspore/source_en/migration_guide/sample_code.md +++ b/docs/mindspore/source_en/migration_guide/sample_code.md @@ -808,7 +808,7 @@ MindSpore has three methods to use mixed precision: 1. Use `Cast` to convert the network input `cast` into `float16` and the loss input `cast` into `float32`. 2. Use the `to_float` method of `Cell`. For details, see [Network Entity and Loss Construction](https://www.mindspore.cn/docs/en/master/migration_guide/model_development/model_and_loss.html). -3. Use the `amp_level` interface of the `Model` to perform mixed precision. 
For details, see [Automatic Mixed-Precision](https://www.mindspore.cn/tutorials/zh-CN/master/advanced/mixed_precision.html#automatic-mixed-precision). +3. Use the `amp_level` interface of the `Model` to perform mixed precision. For details, see [Automatic Mixed-Precision](https://www.mindspore.cn/tutorials/en/master/advanced/mixed_precision.html#mixed-precision). Use the third method to set `amp_level` in `Model` to `O3` and check the profiler result. -- Gitee