# Migration Script

## Summary

This document describes how to migrate network scripts from the TensorFlow or PyTorch framework to MindSpore.

## Migrating TensorFlow scripts to MindSpore

Migrate the scripts by reading the TensorBoard graph.

1. Taking [PoseNet](https://arxiv.org/pdf/1505.07427v4.pdf) implemented in TensorFlow as an example, this section shows how to read the graph with TensorBoard, write the corresponding MindSpore code, and migrate the [TensorFlow model](https://github.com/kentsommer/tensorflow-posenet) to MindSpore.

2. Rewrite the code to save the logs that TensorBoard needs via the `tf.summary` interface, then start TensorBoard.

3. The graph displayed in TensorBoard is shown below. The figure is for reference only.

    ![PoseNet TensorBoard](https://images.gitee.com/uploads/images/2021/0601/215702_6692a2fe_9186121.png "pic1.png")

4. Find the three input Placeholder nodes. From the graph and the code, we can see that the second and third inputs are used only to compute the loss.

    ![PoseNet Placeholder](https://images.gitee.com/uploads/images/2021/0601/215849_0f3c7a97_9186121.png "pic3.png")

    ![PoseNet Placeholder_1 Placeholder_2](https://images.gitee.com/uploads/images/2021/0601/215743_ad128e13_9186121.png "pic2.png")

    ![PoseNet script input1 2 3](https://images.gitee.com/uploads/images/2021/0601/215946_a3951ce2_9186121.png "pic4.png")

    Now we can preliminarily construct the network model in three steps:

    First, the backbone takes the first of the three network inputs and computes six outputs;

    Second, the loss subnet computes the loss from the results of the previous step together with the second and third inputs;

    Third, the backward network is constructed through the automatic differentiation of `TrainOneStepCell`, and the parameters are updated with the MindSpore `Adam` optimizer configured with the same hyperparameters as in the original TensorFlow project. The skeleton of the network script can be written as follows:

    ```python
    import mindspore
    from mindspore import nn
    from mindspore.nn import TrainOneStepCell
    from mindspore.nn import Adam

    # combine backbone and loss
    class PoseNetLossCell(nn.Cell):
        def __init__(self, backbone, loss):
            super(PoseNetLossCell, self).__init__()
            self.pose_net = backbone
            self.loss = loss
        def construct(self, input_1, input_2, input_3):
            p1_x, p1_q, p2_x, p2_q, p3_x, p3_q = self.pose_net(input_1)
            loss = self.loss(p1_x, p1_q, p2_x, p2_q, p3_x, p3_q, input_2, input_3)
            return loss

    # define backbone
    class PoseNet(nn.Cell):
        def __init__(self):
            super(PoseNet, self).__init__()
        def construct(self, input_1):
            """do something with input_1, output num 6"""
            return p1_x, p1_q, p2_x, p2_q, p3_x, p3_q

    # define loss
    class PoseNetLoss(nn.Cell):
        def __init__(self):
            super(PoseNetLoss, self).__init__()

        def construct(self, p1_x, p1_q, p2_x, p2_q, p3_x, p3_q, poses_x, poses_q):
            """do something to calc loss"""
            return loss

    # define network
    backbone = PoseNet()
    loss = PoseNetLoss()
    net_with_loss = PoseNetLossCell(backbone, loss)
    opt = Adam(net_with_loss.trainable_params(), learning_rate=0.001, beta1=0.9, beta2=0.999, eps=1e-08, use_locking=False)
    net_with_grad = TrainOneStepCell(net_with_loss, opt)
    ```
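    Once the `PoseNet` and `PoseNetLoss` stubs are filled in, the assembled cells can be smoke-tested by running a single training step on random data. The sketch below is illustrative only; the batch size and shapes (a 224x224 RGB image, a 3-dimensional position label and a 4-dimensional quaternion label) are assumptions, not values taken from the original project.

    ```python
    import numpy as np
    from mindspore import Tensor

    # hypothetical shapes: one RGB image plus position and quaternion labels
    images = Tensor(np.random.randn(1, 3, 224, 224).astype(np.float32))
    poses_x = Tensor(np.random.randn(1, 3).astype(np.float32))
    poses_q = Tensor(np.random.randn(1, 4).astype(np.float32))

    # runs forward, backward and one optimizer update, returning the loss
    loss_value = net_with_grad(images, poses_x, poses_q)
    print(loss_value)
    ```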
5. Next, let's implement the computing logic in the backbone.

    First, the first input goes through a subgraph named conv1. From the graph, its calculation logic is as follows:

    ![PoseNet conv1 subgraph](https://images.gitee.com/uploads/images/2021/0601/220021_06b0149e_9186121.png "pic5.png")

    Input->Conv2D->BiasAdd->ReLU. Although the node after BiasAdd appears to be named conv1 in the graph, what it actually executes is ReLU.

    ![PoseNet Conv1 conv1 relu](https://images.gitee.com/uploads/images/2021/0601/220036_4949b7d7_9186121.png "pic6.png")

    In this way, the first subgraph conv1 can be defined as follows, with the parameters aligned with those of the original project (conv1 uses a 7x7 kernel, stride 2 and 64 output channels on the 3-channel input):

    ```python
    from mindspore import nn
    from mindspore.nn import Conv2d, ReLU

    class Conv1(nn.Cell):
        def __init__(self):
            super(Conv1, self).__init__()
            # conv1 in the original project: 3 input channels, 64 output channels, 7x7 kernel, stride 2
            self.conv = Conv2d(3, 64, 7, 2, has_bias=True)
            self.relu = ReLU()
        def construct(self, x):
            x = self.conv(x)
            x = self.relu(x)
            return x
    ```

    By observing the TensorBoard graph and the code, we can see that the conv-type subnets defined in the original TensorFlow project can be reused as a common MindSpore subnet to reduce duplicate code.

    The conv subnet of the TensorFlow project is defined as:

    ```python
    def conv(self,
             input,
             k_h,
             k_w,
             c_o,
             s_h,
             s_w,
             name,
             relu=True,
             padding=DEFAULT_PADDING,
             group=1,
             biased=True):
        # Verify that the padding is acceptable
        self.validate_padding(padding)
        # Get the number of channels in the input
        c_i = input.get_shape()[-1]
        # Verify that the grouping parameter is valid
        assert c_i % group == 0
        assert c_o % group == 0
        # Convolution for a given input and kernel
        convolve = lambda i, k: tf.nn.conv2d(i, k, [1, s_h, s_w, 1], padding=padding)
        with tf.variable_scope(name) as scope:
            kernel = self.make_var('weights', shape=[k_h, k_w, c_i / group, c_o])
            if group == 1:
                # This is the common-case. Convolve the input without any further complications.
                output = convolve(input, kernel)
            else:
                # Split the input into groups and then convolve each of them independently
                input_groups = tf.split(3, group, input)
                kernel_groups = tf.split(3, group, kernel)
                output_groups = [convolve(i, k) for i, k in zip(input_groups, kernel_groups)]
                # Concatenate the groups
                output = tf.concat(3, output_groups)
            # Add the biases
            if biased:
                biases = self.make_var('biases', [c_o])
                output = tf.nn.bias_add(output, biases)
            if relu:
                # ReLU non-linearity
                output = tf.nn.relu(output, name=scope.name)
            return output
    ```

    The corresponding MindSpore subnet can be defined as follows:

    ```python
    from mindspore import nn
    from mindspore.nn import Conv2d, ReLU

    class ConvReLU(nn.Cell):
        def __init__(self, channel_in, kernel_size, channel_out, strides):
            super(ConvReLU, self).__init__()
            self.conv = Conv2d(channel_in, channel_out, kernel_size, strides, has_bias=True)
            self.relu = ReLU()

        def construct(self, x):
            x = self.conv(x)
            x = self.relu(x)
            return x
    ```
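    As a quick check that the reusable subnet behaves like the conv1 configuration above (a minimal sketch; the 224x224 input size is an assumption for illustration), we can run it on a random input:

    ```python
    import numpy as np
    from mindspore import Tensor

    conv1 = ConvReLU(channel_in=3, kernel_size=7, channel_out=64, strides=2)
    x = Tensor(np.random.randn(1, 3, 224, 224).astype(np.float32))
    print(conv1(x).shape)  # (1, 64, 112, 112) with the default "same" padding
    ```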
    Then, according to the data flow direction and the node attributes shown in TensorBoard, the computing logic of the backbone can be written as follows:

    ```python
    from mindspore.nn import MaxPool2d
    import mindspore.ops as ops


    class LRN(nn.Cell):
        def __init__(self, radius, alpha, beta, bias=1.0):
            super(LRN, self).__init__()
            self.lrn = ops.LRN(radius, bias, alpha, beta)
        def construct(self, x):
            return self.lrn(x)

    class PoseNet(nn.Cell):
        def __init__(self):
            super(PoseNet, self).__init__()
            self.conv1 = ConvReLU(3, 7, 64, 2)
            self.pool1 = MaxPool2d(3, 2, pad_mode="SAME")
            self.norm1 = LRN(2, 2e-05, 0.75)
            self.reduction2 = ConvReLU(64, 1, 64, 1)
            self.conv2 = ConvReLU(64, 3, 192, 1)
            self.norm2 = LRN(2, 2e-05, 0.75)
            self.pool2 = MaxPool2d(3, 2, pad_mode="SAME")
            self.icp1_reduction1 = ConvReLU(192, 1, 96, 1)
            self.icp1_out1 = ConvReLU(96, 3, 128, 1)
            self.icp1_reduction2 = ConvReLU(192, 1, 16, 1)
            self.icp1_out2 = ConvReLU(16, 5, 32, 1)
            self.icp1_pool = MaxPool2d(3, 1, pad_mode="SAME")
            self.icp1_out3 = ConvReLU(192, 5, 32, 1)
            self.icp1_out0 = ConvReLU(192, 1, 64, 1)
            self.concat = ops.Concat(axis=1)
            self.icp2_reduction1 = ConvReLU(256, 1, 128, 1)
            self.icp2_out1 = ConvReLU(128, 3, 192, 1)
            self.icp2_reduction2 = ConvReLU(256, 1, 32, 1)
            self.icp2_out2 = ConvReLU(32, 5, 96, 1)
            self.icp2_pool = MaxPool2d(3, 1, pad_mode="SAME")
            self.icp2_out3 = ConvReLU(256, 1, 64, 1)
            self.icp2_out0 = ConvReLU(256, 1, 128, 1)
            self.icp3_in = MaxPool2d(3, 2, pad_mode="SAME")
            self.icp3_reduction1 = ConvReLU(480, 1, 96, 1)
            self.icp3_out1 = ConvReLU(96, 3, 208, 1)
            self.icp3_reduction2 = ConvReLU(480, 1, 16, 1)
            self.icp3_out2 = ConvReLU(16, 5, 48, 1)
            self.icp3_pool = MaxPool2d(3, 1, pad_mode="SAME")
            self.icp3_out3 = ConvReLU(480, 1, 64, 1)
            self.icp3_out0 = ConvReLU(480, 1, 192, 1)
            """etc"""
            """..."""

        def construct(self, input_1):
            """do something with input_1, output num 6"""
            x = self.conv1(input_1)
            x = self.pool1(x)
            x = self.norm1(x)
            x = self.reduction2(x)
            x = self.conv2(x)
            x = self.norm2(x)
            x = self.pool2(x)
            pool2 = x

            x = self.icp1_reduction1(x)
            x = self.icp1_out1(x)
            icp1_out1 = x

            icp1_reduction2 = self.icp1_reduction2(pool2)
            icp1_out2 = self.icp1_out2(icp1_reduction2)

            icp1_pool = self.icp1_pool(pool2)
            icp1_out3 = self.icp1_out3(icp1_pool)

            icp1_out0 = self.icp1_out0(pool2)

            icp2_in = self.concat((icp1_out0, icp1_out1, icp1_out2, icp1_out3))
            """etc"""
            """..."""

            return p1_x, p1_q, p2_x, p2_q, p3_x, p3_q
    ```
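    One detail worth calling out: the TensorFlow project works with NHWC tensors (see the `[1, s_h, s_w, 1]` strides and `tf.concat(3, ...)` in the conv subnet above), so its inception branches are concatenated along axis 3. MindSpore's `nn.Conv2d` uses NCHW by default, which is why the backbone above uses `ops.Concat(axis=1)`. A minimal sketch with made-up shapes:

    ```python
    import numpy as np
    import mindspore.ops as ops
    from mindspore import Tensor

    concat = ops.Concat(axis=1)  # channel axis in NCHW
    a = Tensor(np.zeros((1, 64, 28, 28), np.float32))
    b = Tensor(np.zeros((1, 128, 28, 28), np.float32))
    print(concat((a, b)).shape)  # (1, 192, 28, 28)
    ```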
    Accordingly, the logic of the loss calculation can be written as follows:

    ```python
    class PoseNetLoss(nn.Cell):
        def __init__(self):
            super(PoseNetLoss, self).__init__()
            self.sub = ops.Sub()
            self.square = ops.Square()
            self.reduce_sum = ops.ReduceSum()
            self.sqrt = ops.Sqrt()

        def construct(self, p1_x, p1_q, p2_x, p2_q, p3_x, p3_q, poses_x, poses_q):
            """do something to calc loss"""
            l1_x = self.sqrt(self.reduce_sum(self.square(self.sub(p1_x, poses_x)))) * 0.3
            l1_q = self.sqrt(self.reduce_sum(self.square(self.sub(p1_q, poses_q)))) * 150
            l2_x = self.sqrt(self.reduce_sum(self.square(self.sub(p2_x, poses_x)))) * 0.3
            l2_q = self.sqrt(self.reduce_sum(self.square(self.sub(p2_q, poses_q)))) * 150
            l3_x = self.sqrt(self.reduce_sum(self.square(self.sub(p3_x, poses_x)))) * 1
            l3_q = self.sqrt(self.reduce_sum(self.square(self.sub(p3_q, poses_q)))) * 500
            return l1_x + l1_q + l2_x + l2_q + l3_x + l3_q
    ```

    Finally, the training script should look like this:

    ```python
    from mindspore import Model

    if __name__ == "__main__":
        backbone = PoseNet()
        loss = PoseNetLoss()
        net_with_loss = PoseNetLossCell(backbone, loss)
        opt = Adam(net_with_loss.trainable_params(), learning_rate=0.001, beta1=0.9, beta2=0.999, eps=1e-08, use_locking=False)
        net_with_grad = TrainOneStepCell(net_with_loss, opt)
        """dataset define"""
        model = Model(net_with_grad)
        model.train(epoch_size, dataset)
    ```

    At this point, the migration of the model script from TensorFlow to MindSpore is basically complete. The next step is to tune the accuracy with the rich set of MindSpore tools and training strategies, which is not covered here.

## Migrating PyTorch scripts to MindSpore

Migrate by reading the PyTorch scripts directly.

1. A PyTorch subnet module usually inherits from `torch.nn.Module`, while a MindSpore subnet usually inherits from `mindspore.nn.Cell`. The forward computation of a PyTorch module is defined by overriding the `forward` method, whereas that of a MindSpore cell is defined by overriding the `construct` method, as sketched below.
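    A minimal side-by-side sketch of this difference (the layer sizes are arbitrary and the class names are illustrative, not taken from any project):

    ```python
    import torch
    from mindspore import nn

    # PyTorch: inherit torch.nn.Module and override forward
    class TorchBlock(torch.nn.Module):
        def __init__(self):
            super(TorchBlock, self).__init__()
            self.fc = torch.nn.Linear(16, 8)

        def forward(self, x):
            return self.fc(x)

    # MindSpore: inherit mindspore.nn.Cell and override construct
    class MindSporeBlock(nn.Cell):
        def __init__(self):
            super(MindSporeBlock, self).__init__()
            self.fc = nn.Dense(16, 8)

        def construct(self, x):
            return self.fc(x)
    ```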
2. Take the migration of a common Bottleneck class to MindSpore as an example.

    The PyTorch project code:

    ```python
    # defined in PyTorch
    class Bottleneck(nn.Module):
        def __init__(self, inplanes, planes, stride=1, mode='NORM', k=1, dilation=1):
            super(Bottleneck, self).__init__()
            self.mode = mode
            self.relu = nn.ReLU(inplace=True)
            self.k = k

            btnk_ch = planes // 4
            self.bn1 = nn.BatchNorm2d(inplanes)
            self.conv1 = nn.Conv2d(inplanes, btnk_ch, kernel_size=1, bias=False)

            self.bn2 = nn.BatchNorm2d(btnk_ch)
            self.conv2 = nn.Conv2d(btnk_ch, btnk_ch, kernel_size=3, stride=stride, padding=dilation,
                                   dilation=dilation, bias=False)

            self.bn3 = nn.BatchNorm2d(btnk_ch)
            self.conv3 = nn.Conv2d(btnk_ch, planes, kernel_size=1, bias=False)

            if mode == 'UP':
                self.shortcut = None
            elif inplanes != planes or stride > 1:
                self.shortcut = nn.Sequential(
                    nn.BatchNorm2d(inplanes),
                    self.relu,
                    nn.Conv2d(inplanes, planes, kernel_size=1, stride=stride, bias=False)
                )
            else:
                self.shortcut = None

        def _pre_act_forward(self, x):
            residual = x

            out = self.bn1(x)
            out = self.relu(out)
            out = self.conv1(out)

            out = self.bn2(out)
            out = self.relu(out)
            out = self.conv2(out)

            out = self.bn3(out)
            out = self.relu(out)
            out = self.conv3(out)

            if self.mode == 'UP':
                residual = self.squeeze_idt(x)
            elif self.shortcut is not None:
                residual = self.shortcut(residual)

            out += residual

            return out

        def squeeze_idt(self, idt):
            n, c, h, w = idt.size()
            return idt.view(n, c // self.k, self.k, h, w).sum(2)

        def forward(self, x):
            out = self._pre_act_forward(x)
            return out
    ```

    Taking into account the different definitions of the convolution parameters in PyTorch and MindSpore, it can be translated as follows:

    ```python
    from mindspore import nn
    import mindspore.ops as ops

    # defined in MindSpore
    class Bottleneck(nn.Cell):
        def __init__(self, inplanes, planes, stride=1, mode='NORM', k=1, dilation=1):
            super(Bottleneck, self).__init__()
            self.mode = mode
            self.relu = nn.ReLU()
            self.k = k

            btnk_ch = planes // 4
            self.bn1 = nn.BatchNorm2d(num_features=inplanes, momentum=0.9)
            self.conv1 = nn.Conv2d(in_channels=inplanes, out_channels=btnk_ch, kernel_size=1, pad_mode='pad', has_bias=False)

            self.bn2 = nn.BatchNorm2d(num_features=btnk_ch, momentum=0.9)
            self.conv2 = nn.Conv2d(in_channels=btnk_ch, out_channels=btnk_ch, kernel_size=3, stride=stride, pad_mode='pad', padding=dilation, dilation=dilation, has_bias=False)

            self.bn3 = nn.BatchNorm2d(num_features=btnk_ch, momentum=0.9)
            self.conv3 = nn.Conv2d(in_channels=btnk_ch, out_channels=planes, kernel_size=1, pad_mode='pad', has_bias=False)

            self.shape = ops.Shape()
            self.reshape = ops.Reshape()
            self.reduce_sum = ops.ReduceSum()

            if mode == 'UP':
                self.shortcut = None
            elif inplanes != planes or stride > 1:
                self.shortcut = nn.SequentialCell([
                    nn.BatchNorm2d(num_features=inplanes, momentum=0.9),
                    nn.ReLU(),
                    nn.Conv2d(in_channels=inplanes, out_channels=planes, kernel_size=1, stride=stride, pad_mode='pad', has_bias=False)
                ])
            else:
                self.shortcut = None

        def _pre_act_forward(self, x):
            residual = x

            out = self.bn1(x)
            out = self.relu(out)
            out = self.conv1(out)

            out = self.bn2(out)
            out = self.relu(out)
            out = self.conv2(out)

            out = self.bn3(out)
            out = self.relu(out)
            out = self.conv3(out)

            # the 'UP' branch (squeeze_idt) of the PyTorch version is omitted in this excerpt
            if self.shortcut is not None:
                residual = self.shortcut(residual)

            out += residual
            return out

        def construct(self, x):
            out = self._pre_act_forward(x)
            return out
    ```
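    As a quick check of the migrated block (a minimal sketch; the channel count and spatial size are assumptions for illustration), feed a random NCHW tensor through it:

    ```python
    import numpy as np
    from mindspore import Tensor

    block = Bottleneck(inplanes=64, planes=64, stride=1)
    x = Tensor(np.random.randn(1, 64, 56, 56).astype(np.float32))
    print(block(x).shape)  # (1, 64, 56, 56): the residual shortcut keeps the shape with these settings
    ```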
3. Back propagation in PyTorch is usually triggered by `loss.backward()`, and parameter updates by `optimizer.step()`. In MindSpore, users do not need to call these explicitly: the network and optimizer are handed to the `TrainOneStepCell` class, which performs back propagation and gradient updates. Finally, the structure of the training script should be as follows:

    ```python
    # define dataset
    dataset = ...

    # define backbone and loss
    backbone = Net()
    loss = NetLoss()

    # combine backbone and loss
    net_with_loss = WithLossCell(backbone, loss)

    # define optimizer
    opt = ...

    # combine forward and backward
    net_with_grad = TrainOneStepCell(net_with_loss, opt)

    # define model and train
    model = Model(net_with_grad)
    model.train(epoch_size, dataset)
    ```

PyTorch and MindSpore are similar in some basic API definitions, for example [mindspore.nn.SequentialCell](https://www.mindspore.cn/doc/api_python/zh-CN/master/mindspore/nn/mindspore.nn.SequentialCell.html#mindspore.nn.SequentialCell) and [torch.nn.Sequential](https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html#torch.nn.Sequential). Other commonly used APIs differ; some of them are listed below. For more information, please refer to [the comparison table between MindSpore and PyTorch](https://www.mindspore.cn/doc/note/zh-CN/master/index.html#operator_api) on the MindSpore official website.

| PyTorch | MindSpore |
| :-------------------------------: | :------------------------------------------------: |
| tensor.view() | mindspore.ops.operations.Reshape()(tensor) |
| tensor.size() | mindspore.ops.operations.Shape()(tensor) |
| tensor.sum(axis) | mindspore.ops.operations.ReduceSum()(tensor, axis) |
| torch.nn.Upsample[mode: nearest] | mindspore.ops.operations.ResizeNearestNeighbor |
| torch.nn.Upsample[mode: bilinear] | mindspore.ops.operations.ResizeBilinear |
| torch.nn.Linear | mindspore.nn.Dense |
| torch.nn.PixelShuffle | mindspore.ops.operations.DepthToSpace |

It is worth noting that, although the interface definitions of `torch.nn.MaxPool2d` and `mindspore.nn.MaxPool2d` are similar, MindSpore actually calls the `MaxPoolWithArgMax` operator during training on Ascend, which has the same function as the TensorFlow operator of the same name. It is therefore normal for the MaxPool output of MindSpore to differ from that of PyTorch; in theory, this does not affect the final training result.
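As an illustration of the first three rows of the table (a minimal sketch with made-up shapes), the PyTorch tensor methods map onto MindSpore operators as follows:

```python
import numpy as np
import mindspore.ops as ops
from mindspore import Tensor

x = Tensor(np.ones((2, 3, 4), np.float32))

shape = ops.Shape()(x)         # PyTorch: x.size()      -> (2, 3, 4)
y = ops.Reshape()(x, (2, 12))  # PyTorch: x.view(2, 12) -> shape (2, 12)
s = ops.ReduceSum()(x, 1)      # PyTorch: x.sum(1)      -> shape (2, 4)
print(shape, y.shape, s.shape)
```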