diff --git a/docs/mindspore/faq/source_en/network_compilation.md b/docs/mindspore/faq/source_en/network_compilation.md index c710570111b2ffe47bef1e6bb1cf97fea32edd55..1710320d10092b0fac26fef9d7622b8629185fc0 100644 --- a/docs/mindspore/faq/source_en/network_compilation.md +++ b/docs/mindspore/faq/source_en/network_compilation.md @@ -16,20 +16,20 @@ A: For the syntax `is` or `is not`, currently `MindSpore` only supports comparis **Q: What can I do if an error "MindSpore does not support comparison with operators more than one now, ops size =2" is reported?** -A: For comparison statements, `MindSpore` supports at most one operator. Please modify your code. For example, you can use `1 < x and x < 3` to take the place of `1 < x < 3`. +A: For comparison statements, `MindSpore` supports at most one operator. For example, you can use `1 < x and x < 3` to take the place of `1 < x < 3`.
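+A minimal sketch of the rewrite in graph mode (the function name and tensor value are illustrative):
+
+```python
+import mindspore as ms
+from mindspore import Tensor, ms_function
+
+@ms_function()
+def check_range(x):
+    # return 1 < x < 3       # chained comparison may trigger the error above
+    return 1 < x and x < 3   # equivalent form with at most one operator per comparison
+
+print(check_range(Tensor(2, ms.float32)))
+```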
**Q: What can I do if an error "TypeError: The function construct need 1 positional argument and 0 default argument, but provided 2" is reported?** A: When you call the instance of a network, the function `construct` will be executed. And the program will check the number of parameters required by the function `construct` and the number of parameters actually given. If they are not equal, the above exception will be thrown. -Please check your code to make sure they are equal. +Please check that the number of parameters passed in when the instance of the network in the script is called matches the number of parameters required by the `construct` function in the defined network.
**Q: What can I do if an error "Type Join Failed" or "Shape Join Failed" is reported?** -A: In the inference stage of front-end compilation, the abstract types of nodes, including `type` and `shape`, will be inferred. Common abstract types include `AbstractScalar`, `AbstractTensor`, `AbstractFunction`, `AbstractTuple`, `AbstractList`, etc. In some scenarios, such as multi-branch scenarios, the abstract types of the return values of different branches will be joined to infer the abstract type of the returned result. If these abstract types do not match, or `type`/`shape` are inconsistent, the above exception will be thrown. +A: In the inference stage of front-end compilation, the abstract types of nodes, including `type` and `shape`, will be inferred. Common abstract types include `AbstractScalar`, `AbstractTensor`, `AbstractFunction`, `AbstractTuple`, `AbstractList`, etc. In some scenarios, such as multi-branch scenarios, the abstract types of the return values of different branches will be `join` to infer the abstract type of the returned result. If these abstract types do not match, or `type`/`shape` are inconsistent, the above exception will be thrown. When an error similar to "Type Join Failed: dtype1 = Float32, dtype2 = Float16" appears, it means that the data types are inconsistent, resulting in an exception when joining abstract. According to the provided data types and code line, the error can be quickly located. In addition, the specific abstract information and node information are provided in the error message. You can view the MindIR information through the `analyze_fail.dat` file to locate and solve the problem. For specific introduction of MindIR, please refer to [MindSpore IR (MindIR)](https://www.mindspore.cn/docs/programming_guide/en/master/design/mindir.html). The code sample is as follows: @@ -47,10 +47,10 @@ class Net(nn.Cell): self.cast = ops.Cast() def construct(self, x, a, b): - if a > b: # The type of the two branches are inconsistent. + if a > b: # The type of the two branches has inconsistent return values. return self.relu(x) # shape: (2, 3, 4, 5), dtype:Float32 else: - return self.cast(self.relu(x), ms.float16) # shape: (2, 3, 4, 5), dtype:Float16 + return self.cast(self.relu(x), ms.float16) # shape:(), dype: Float32 input_x = Tensor(np.random.rand(2, 3, 4, 5).astype(np.float32)) input_a = Tensor(2, ms.float32) @@ -76,52 +76,7 @@ The function call stack (See file 'analyze_fail.dat' for more details): ^ ``` -When an error similar to "Shape Join Failed: shape1 = (2, 3, 4, 5), shape2 = ()" appears, it means that the shapes are inconsistent, resulting in an exception when joining abstract. The code sample is as follows: - -```python -import numpy as np -import mindspore as ms -import mindspore.ops as ops -from mindspore import nn, Tensor, context - -context.set_context(mode=context.GRAPH_MODE) -class Net(nn.Cell): - def __init__(self): - super().__init__() - self.relu = ops.ReLU() - self.reducesum = ops.ReduceSum() - - def construct(self, x, a, b): - if a > b: # The shape of the two branches are inconsistent. - return self.relu(x) # shape: (2, 3, 4, 5), dtype:Float32 - else: - return self.reducesum(x) # shape:(), dype: Float32 - -input_x = Tensor(np.random.rand(2, 3, 4, 5).astype(np.float32)) -input_a = Tensor(2, ms.float32) -input_b = Tensor(6, ms.float32) -net = Net() -out = net(input_x, input_a, input_b) -``` - -The result is as follows: - -```text -ValueError: Cannot join the return values of different branches, perhaps you need to make them equal. 
-Shape Join Failed: shape1 = (2, 3, 4, 5), shape2 = ().
-For more details, please refer to the FAQ at https://www.mindspore.cn
-The abstract type of the return value of the current branch is AbstractTensor(shape: (), element: AbstractScalar(Type: Float32, Value: AnyValue, Shape: NoShape), value_ptr: 0x55658aa9b090, value: AnyValue), and that of the previous branch is AbstractTensor(shape: (2, 3, 4, 5), element: AbstractScalar(Type: Float32, Value: AnyValue, Shape: NoShape), value_ptr: 0x55658aa9b090, value: AnyValue).
-The node is construct.6:[CNode]13{[0]: construct.6:[CNode]12{[0]: ValueNode Switch, [1]: [CNode]11, [2]: ValueNode ✓construct.4, [3]: ValueNode ✗construct.5}}, true branch: ✓construct.4, false branch: ✗construct.5
-The function call stack:
-In file test.py(14)/        if a > b:
-
-The function call stack (See file 'analyze_fail.dat' for more details):
-# 0 In file test.py(14)
-        if a > b:
-        ^
-```
-
-When an error similar to "Type Join Failed: abstract type AbstractTensor can not join with AbstractTuple" appears, it means that the two abstract types are mismatched, resulting in an exception when joining abstract. The code sample is as follows:
+When an error similar to "Type Join Failed: abstract type AbstractTensor can not join with AbstractTuple" appears, it means that the two abstract types (for example, a Tensor and a tuple) are mismatched, resulting in an exception when joining abstract. The code sample is as follows:

```python
import mindspore.ops as ops
@@ -137,9 +92,9 @@ def test_net(a, b):

@ms_function()
def join_fail():
-    sens_i = ops.Fill()(ops.DType()(x), ops.Shape()(x), sens)    # sens_i is a Scalar Tensor with shape: (1), dtype:Float64, value:1.0
+    sens_i = ops.Fill()(ops.DType()(x), ops.Shape()(x), sens)    # sens_i is a Tensor with shape (1), dtype: Float64, value: 1.0
    # sens_i = (sens_i, sens_i)
-    a = grad(test_net)(x, y, sens_i)    # For test_net output with type tuple(Tensor, Tensor), sens_i wih same type are needed to calculate the gradient, but sens_i is a Tensor;Setting sens_i = (sens_i, sens_i) before grad can fix the problem.
+    a = grad(test_net)(x, y, sens_i)    # Computing the gradient of test_net, whose output type is tuple(Tensor, Tensor), requires sens_i to have the same type as the output, but sens_i is a Tensor; setting sens_i = (sens_i, sens_i) before grad can fix the problem.
    return a

join_fail()

@@ -159,9 +114,9 @@ The function call stack (See file 'analyze_fail.dat' for more details):
-**Q: What can I do if an error "The params of function 'bprop' of Primitive or Cell requires the forward inputs as well as the 'out' and 'dout" is reported?** +**Q: What can I do if an error "The params of function 'bprop' of Primitive or Cell requires the forward inputs as well as the 'out' and 'dout" is reported during compilation?** -A: The inputs of user-defined back propagation function `bprop` should contain all the inputs of the forward pass, `out` and `dout`. The example is as follow: +A: The inputs of user-defined back propagation function `bprop` should contain all the inputs of the forward network, `out` and `dout`. The example is as follow: ```python class BpropUserDefinedNet(nn.Cell): @@ -178,8 +133,9 @@ class BpropUserDefinedNet(nn.Cell):
-**Q: What can I do if an error “There isn't any branch that can be evaluated“ is reported?**
-When an error similar to "There isn't any branch that can be evaluated" appears.
+**Q: What can I do if an error "There isn't any branch that can be evaluated" is reported during compilation?**
+
+A: When an error similar to "There isn't any branch that can be evaluated" appears, it usually means that there is an infinite recursion or loop in the code, so none of the branches of the if condition can deduce the correct type and dimension information.

The example is as follow:

@@ -211,27 +167,27 @@ def test_endless():

```

-the f(x)'s each branch of the if condition cannot deduce the correct type and dimension information
+The compilation of f(x) fails because neither branch of its if condition can derive the correct type and dimension information.
-**Q: What can I do if an error "Exceed function call depth limit 1000" is reported?** +**Q: What can I do if an error "Exceed function call depth limit 1000" is reported during compilation?** -This indicates that there is an infinite recursive loop in the code, or the code is too complex, that caused the stack depth exceed. +When Exceed function call depth limit 1000 is displayed, this indicates that there is an infinite recursive loop in the code, or the code is too complex. The type derivation process causes the stack depth to exceed the set maximum depth. At this time, you can set context.set_context(max_call_depth = value) to change the maximum depth of the stack, and consider simplifying the code logic or checking whether there is infinite recursion or loop in the code. -Otherwise, set max_call_depth can change the recursive depth of MindSpore, it may also cause exceed the maximum depth of the system stack and cause segment fault. At this time, you may also need to set the system stack depth. +Otherwise, set max_call_depth can change the recursive depth of MindSpore, and it may also cause exceed the maximum depth of the system stack and cause segment fault. At this time, you may also need to set the system stack depth.
-**Q: Why report an error that 'could not get source code' and 'Mindspore can not compile temporary source code in terminal. Please write source code to a python file and run the file.'?**
+**Q: What can I do if the errors 'could not get source code' and 'Mindspore can not compile temporary source code in terminal. Please write source code to a python file and run the file.' are displayed during compilation?**

-A: When compiling a network, MindSpore use `inspect.getsourcelines(self.fn)` to get the code file. If the network is the temporary code which edited in terminal, MindSpore will report an error as the title. It can be solved if writing the network to a python file.
+A: When compiling a network, MindSpore uses `inspect.getsourcelines(self.fn)` to get the source file in which the network code is located. If the network is temporary code edited in the terminal, MindSpore cannot find the source file and reports the above error. The problem can be solved by writing the network to a Python file.
-**Q: Why report an error that 'Corresponding forward node candidate:' and 'Corresponding code candidate:'?**
+**Q: What can I do when the messages 'Corresponding forward node candidate:' and 'Corresponding code candidate:' appear in an error report?**

A: "Corresponding forward node candidate:" is the code in the associated forward network, indicating that the backpropagation operator corresponds to the forward code. "Corresponding code candidate:" means that the operator is fused by these code, and the separator "-" is used to distinguish different code.

@@ -268,7 +224,7 @@ For example:
    In file /home/workspace/mindspore/build/package/mindspore/train/dataset_helper.py(98)/        return self.network(*outputs)/
    ```

-    The first line is the corresponding source code of the operator. The operator is a bprop operator realized by MindSpore. The second line indicates that the operator has an associated forward node, and points to 'out = self.conv1(x)' on line 149 of the network script file. In summary, the operator Conv2DBackpropFilter is a bprop operator, and the corresponding forward node is a convolution operator.
+    The first line is the corresponding source code of the operator. The operator is a bprop operator realized by MindSpore. The second line indicates that the operator has an associated forward node, and the fourth line points to 'out = self.conv1(x)' on line 149 of the network script file. In summary, the operator Conv2DBackpropFilter is a bprop operator, and the corresponding forward node is a convolution operator.
@@ -276,7 +232,7 @@ For example:

A: JIT Fallback is to realize the unification of static graph mode and dynamic graph mode from the perspective of static graph, so that the static graph mode can support the syntax of the dynamic mode as much as possible. It draws on the fallback idea of traditional JIT compilation. When compiling a static graph, if the syntax is not supported, the relevant sentence will be recorded and an interpret node will be generated. In the subsequent processing, the relevant sentence will be fallbacked to the Python interpreter for interpretation and execution, so that the syntax can be supported. The environment variable switch of JIT Fallback is `DEV_ENV_ENABLE_FALLBACK`, and JIT Fallback is enabled by default.

-When the errors "Should not use Python object in runtime" and "We suppose all nodes generated by JIT Fallback would not return to outside of graph" appear, it means that there is an incorrect syntax in the code. The generated interpret node cannot be executed normally during the compilation phase, resulting in an error. The current JIT Fallback conditionally supports some constant scenes in Graph mode, and it also needs to conform to MindSpore's programming syntax. Please refer to [Static Graph Syntax Support](https://www.mindspore.cn/docs/note/en/master/static_graph_syntax_support.html).
+When the errors "Should not use Python object in runtime" and "We suppose all nodes generated by JIT Fallback would not return to outside of graph" appear, it means that there is an incorrect syntax in the code. The generated interpret node cannot be executed normally during the compilation phase, resulting in an error. The current JIT Fallback conditionally supports some constant scenarios in Graph mode, and it also needs to conform to MindSpore's programming syntax. When you write the code, please refer to [Static Graph Syntax Support](https://www.mindspore.cn/docs/note/en/master/static_graph_syntax_support.html).

For example, when calling the third-party library NumPy, JIT Fallback supports the syntax of `np.add(x, y)` and `Tensor(np.add(x, y))`, but MindSpore does not support returning the NumPy type. Therefore, the program will report an error. The code sample is as follows:

@@ -312,7 +268,7 @@ When there is an error related to JIT Fallback, please review the code syntax and

**Q: What can I do if an error "Operator[AddN] input(kNumberTypeBool,kNumberTypeBool) output(kNumberTypeBool) is not support. This error means the current input type is not supported, please refer to the MindSpore doc for supported types."**

-A: Currently, Tensor with bool data type has weak support by MindSpore, only a few primitives support Tensor (bool). If Tensor(bool) used in forward graph correctly, but get total derivative in the backward graph will using primitive `AddN` that not support Tensor(bool), which will raise exception.
+A: Currently, MindSpore provides only weak support for Tensor with the bool data type [hereafter abbreviated as Tensor(bool)], and only a small number of operators can operate on Tensor(bool) data. If an operator that supports Tensor(bool) is used in the forward graph and the forward graph syntax is correct, the backward graph can still fail: computing the full derivative in the backward graph introduces `AddN`, which does not support Tensor(bool), so running the backward graph throws the above exception.
The example is as follow:

@@ -333,7 +289,7 @@ grad_net = grad(test_logic)
out = grad_net(x, y)
```

-The forward processing of the above code can be expressed as: `r = f(z, x), z = z(x, y)`, the corresponding full derivative formula is: `dr/dx = df/dz * dz/dx + df/dx`, function`f(z,x)` and `z(x,y)` are primitive `and`; Primitive `and` in the forward graph supports Tensor (bool), but primitive `AddN` in the backward graph not supports Tensor(bool). And the error cannot be mapped to a specific forward code line.
+The forward processing of the above code can be expressed as `r = f(z, x), z = z(x, y)`, and the corresponding full derivative formula is `dr/dx = df/dz * dz/dx + df/dx`. Both `f(z, x)` and `z(x, y)` are the primitive `and`. The primitive `and` in the forward graph supports the Tensor(bool) type, but the `AddN` introduced when computing the full derivative in the backward graph does not support the Tensor(bool) type, and the error cannot be mapped to a specific forward code line.

The result is as follows:

@@ -354,14 +310,12 @@ Trace:
In file /usr/local/python3.7/lib/python3.7/site-packages/mindspore/ops/composite/multitype_ops/add_impl.py(287)/        return F.addn((x, y))/

-If you encounter problems like this one, please remove the use of tensor (bool). In this example, replace tensor (bool) with bool can solve the problem.
+If you encounter problems like this one, please remove the use of Tensor(bool). In this example, replacing Tensor(bool) with bool can solve the problem.
**Q: What can I do if encountering an error "The 'setitem' operation does not support the type [List[List[Int642],Int643], Slice[Int64 : Int64 : kMetaTypeNone], Tuple[Int64*3]]"?**

A: The MindSpore static graph mode needs to translate the assign operation as the MindSpore operation.
-This assign is implemented by the [HyperMap](https://www.mindspore.cn/docs/programming_guide/en/master/hypermap.html#multitypefuncgraph) in MindSpore.
-The Type is not registered in the HyperMap.
-Since the type inference is an indispensable part of MindSpore, ME Compiler cannot find this type in HyperMap, and this error will be reported, which shows the current supported type.
-The user can use other operators which supported in MindSpore to replace this operation, or can add the needed type in HyperMap manually.
+This assign is implemented by the [HyperMap](https://www.mindspore.cn/docs/programming_guide/en/master/hypermap.html#multitypefuncgraph) in MindSpore, and this type is not registered in the HyperMap. Since type inference is an indispensable part of MindSpore, when the front-end compiler expands this assignment operation into concrete types, it finds that the type is not registered and reports an error, which usually lists the currently supported types.
+Users can consider replacing the operation with other operators supported by MindSpore, or extending the current HyperMap type [operation overload](https://www.mindspore.cn/docs/programming_guide/en/master/hypermap.html#multitypefuncgraph) in the MindSpore source code to support the missing type.
\ No newline at end of file
diff --git a/docs/mindspore/faq/source_en/operators_compile.md b/docs/mindspore/faq/source_en/operators_compile.md
index 1a2c7deeb1a2c5d263c125cd1eb808cc4d8c7124..4a0175c27eed4510f03f58e3d3adb7f43f83c177
--- a/docs/mindspore/faq/source_en/operators_compile.md
+++ b/docs/mindspore/faq/source_en/operators_compile.md
@@ -2,7 +2,7 @@

-**Q: When the `ops.concat` operator is used, the error message `Error:Input and (output + workspace) num should <=192!` is displayed indicating that the data volume is large. What can I do?**
+**Q: When the `ops.concat` operator is used, the error message `Error:Input and (output + workspace) num should <=192!` is displayed, indicating that the data volume is large. What can I do?**

A: The `shape` of the `ops.concat` operator is too large. You are advised to set the output to `numpy` when creating an iterator for the `dataset` object. The setting is as follows:

```python
gallaryloader.create_dict_iterator(output_numpy=True)
```

-In the post-processing phase (in a non-network calculation process, that is, in a non-construct function), `numpy` can be directly used for computation. For example, `numpy.concatenate` is used to replace the `ops.concat` for computation.
+In the post-processing phase (in a non-network calculation process, that is, in a non-`construct` function), `numpy` can be directly used for computation. For example, `numpy.concatenate` is used to replace the `ops.concat` for computation.
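+A minimal sketch of the post-processing replacement (the `outputs` list is illustrative, e.g. collected from `create_dict_iterator(output_numpy=True)`):
+
+```python
+import numpy as np
+
+outputs = [np.ones((2, 3), np.float32), np.zeros((2, 3), np.float32)]
+result = np.concatenate(outputs, axis=0)   # used here instead of ops.concat
+print(result.shape)                        # (4, 3)
+```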
@@ -32,9 +32,9 @@ A: The number of tensors to be concatenated at a time cannot exceed 192 accordin
-**Q: When `Conv2D` is used to define convolution, the `group` parameter is used. Is it necessary to ensure that the value of `group` can be exactly divided by the input and output dimensions? How is the group parameter transferred?**
+**Q: When `Conv2D` is used to define convolution, the `group` parameter is used. Is it necessary to ensure that the value of `group` can be exactly divided by the input and output dimensions? How is the `group` parameter transferred?**

-A: The `Conv2d` operator has the following constraint: When the value of `group` is greater than 1, the value must be the same as the number of input and output channels. Do not use `ops.Conv2D`. Currently, this operator does not support a value of `group` that is greater than 1. Currently, only the `nn.Conv2d` API of MindSpore supports `group` convolution. However, the number of groups must be the same as the number of input and output channels.
+A: The `Conv2d` operator has the following constraint: when the value of `group` is greater than 1, it must be equal to the number of input and output channels. Do not use `ops.Conv2D`, because it currently does not support a `group` value greater than 1. Only the `nn.Conv2d` API of MindSpore supports group convolution, and the value of `group` must be the same as the number of input and output channels.
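+A minimal sketch of a group convolution that satisfies the constraint (8 input channels, 8 output channels, and group=8 are illustrative values):
+
+```python
+import numpy as np
+from mindspore import nn, Tensor
+
+# group equals the number of input and output channels.
+conv = nn.Conv2d(in_channels=8, out_channels=8, kernel_size=3, group=8)
+x = Tensor(np.ones((1, 8, 16, 16), np.float32))
+print(conv(x).shape)   # (1, 8, 16, 16) with the default pad_mode='same'
+```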
@@ -44,34 +44,34 @@ A: Yes. For details, see [mindspore.ops.Transpose](https://www.mindspore.cn/docs
-**Q: Can MindSpore calculate the variance of any tensor?** +**Q: Can MindSpore calculate the variance of any `tensor`?** -A: Currently, MindSpore does not have APIs or operators similar to variance which can directly calculate the variance of a `tensor`. However, MindSpore has sufficient small operators to support such operations. For details, see [class Moments(Cell)](https://www.mindspore.cn/docs/api/en/master/_modules/mindspore/nn/layer/math.html#Moments). +A: Currently, MindSpore does not have APIs or operators which can directly calculate the variance of a `tensor`. However, MindSpore has sufficient small operators to support such operations. For details, see [class Moments(Cell)](https://www.mindspore.cn/docs/api/en/master/_modules/mindspore/nn/layer/math.html#Moments).
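+A minimal sketch of composing the variance from small operators (the input tensor is illustrative; the linked Moments cell follows the same idea):
+
+```python
+import numpy as np
+import mindspore.ops as ops
+from mindspore import Tensor
+
+x = Tensor(np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], np.float32))
+mean = ops.ReduceMean(keep_dims=True)(x)              # mean over all elements
+variance = ops.ReduceMean()(ops.Square()(x - mean))   # E[(x - mean)^2]
+print(variance)
+```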
-**Q: Compared with PyTorch, the `nn.Embedding` layer lacks the padding operation. Can other operators implement this operation?** +**Q: Compared with PyTorch, the `nn.Embedding` layer lacks the `padding` operation. Can other operators implement this operation?** A: In PyTorch, `padding_idx` is used to set the word vector in the `padding_idx` position in the embedding matrix to 0, and the word vector in the `padding_idx` position is not updated during backward propagation. -In MindSpore, you can manually initialize the weight corresponding to the `padding_idx` position of embedding to 0. In addition, the loss corresponding to `padding_idx` is filtered out through the mask operation during training. +In MindSpore, you can manually initialize the weight corresponding to the `padding_idx` position of embedding to 0. In addition, the `loss` corresponding to `padding_idx` is filtered out through the `mask` operation during training.
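+A minimal sketch of the weight initialization (the vocabulary size, embedding size, and padding_idx value are illustrative):
+
+```python
+import numpy as np
+from mindspore import nn, Tensor
+
+vocab_size, embedding_size, padding_idx = 10, 4, 0
+init_table = np.random.randn(vocab_size, embedding_size).astype(np.float32)
+init_table[padding_idx] = 0          # zero the word vector at the padding_idx position
+embedding = nn.Embedding(vocab_size, embedding_size, embedding_table=Tensor(init_table))
+```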
-**Q: When the `Tile` module in operations executes `__infer__`, the `value` is `None`. Why is the value lost?**
+**Q: When the `Tile` operator in operations executes `__infer__`, the `value` is `None`. Why is the value lost?**

-A: The `multiples input` of the `Tile` operator must be a constant. (The value cannot directly or indirectly come from the input of the graph.) Otherwise, the `None` data will be obtained during graph composition because the graph input is transferred only during graph execution and the input data cannot be obtained during graph composition.
+A: The `multiples` input of the `Tile` operator must be a constant (the value cannot directly or indirectly come from the input of the graph). Otherwise, the `None` data will be obtained during graph composition because the graph input is transferred only during graph execution and the input data cannot be obtained during graph composition. For the detailed information, refer to [Static Graph Syntax Support](https://www.mindspore.cn/docs/note/en/master/static_graph_syntax_support.html).
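+A minimal sketch showing the constraint (the multiples value (2, 3) is illustrative): the `multiples` passed to `Tile` is a constant tuple written in the script rather than a value derived from the graph inputs.
+
+```python
+import numpy as np
+import mindspore.ops as ops
+from mindspore import nn, Tensor, context
+
+context.set_context(mode=context.GRAPH_MODE)
+
+class TileNet(nn.Cell):
+    def __init__(self):
+        super().__init__()
+        self.tile = ops.Tile()
+
+    def construct(self, x):
+        return self.tile(x, (2, 3))   # multiples is a compile-time constant
+
+print(TileNet()(Tensor(np.ones((1, 2), np.float32))).shape)   # (2, 6)
+```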
-**Q: When conv2d is set to (3,10), Tensor[2,2,10,10] and it runs on Ascend on ModelArts, the error message `FM_W+pad_left+pad_right-KW>=strideW` is displayed. However, no error message is displayed when it runs on a CPU. What should I do?** +**Q: When conv2d is set to (3,10), Tensor [2,2,10,10] and it runs on Ascend on ModelArts, the error message `FM_W+pad_left+pad_right-KW>=strideW` is displayed. However, no error message is displayed when it runs on a CPU. What should I do?** -A: TBE (Tensor Boost Engine) operator is Huawei's self-developed Ascend operator development tool, which is extended on the basis of the TVM framework to develop custom operators. The above problem is the limitation of this TBE operator, the width of x must be greater than the width of the kernel. The CPU operator does not have this restriction, so no error is reported. +A: TBE (Tensor Boost Engine) operator is Huawei's self-developed Ascend operator development tool, which is extended on the basis of the TVM framework to develop custom operators. The above problem is the limitation of this TBE operator, and the width of x must be greater than the width of the kernel. The CPU operator does not have this restriction, so no error is reported.
**Q: Has MindSpore implemented the anti-pooling operation similar to `nn.MaxUnpool2d`?** -A: Currently, MindSpore does not provide anti-pooling APIs but you can customize the operator to implement the operation. For details, refer to [Custom Operators](https://www.mindspore.cn/docs/programming_guide/en/master/custom_operator.html). +A: Currently, MindSpore does not provide anti-pooling APIs but you can customize the operator to implement the operation. For details, refer to [Customize Operators](https://www.mindspore.cn/docs/programming_guide/en/master/custom_operator.html).
@@ -87,15 +87,16 @@ output=expand_dims(input_tensor,0)

A: The problem is that the Graph mode is selected but the PyNative mode is used. As a result, an error is reported. MindSpore supports the following running modes which are optimized in terms of debugging or running:

- PyNative mode: dynamic graph mode. In this mode, operators in the neural network are delivered and executed one by one, facilitating the compilation and debugging of the neural network model.
-- Graph mode: static graph mode. In this mode, the neural network model is compiled into an entire graph and then delivered for execution. This mode uses technologies such as graph optimization to improve the running performance and facilitates large-scale deployment and cross-platform running.
+- Graph mode: static graph mode, also called graph mode. In this mode, the neural network model is compiled into an entire graph and then delivered for execution. This mode uses technologies such as graph optimization to improve the running performance and facilitates large-scale deployment and cross-platform running.
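+A minimal sketch of selecting the execution mode explicitly before building the network:
+
+```python
+from mindspore import context
+
+context.set_context(mode=context.GRAPH_MODE)       # static graph mode
+# context.set_context(mode=context.PYNATIVE_MODE)  # dynamic graph mode, convenient for debugging
+```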
-**Q: What can I do if the Kernel Select Failed message:`Can not select a valid kernel info for [xxx] in AI CORE or AI CPU kernel info candidates list` is displayed on Ascend backend?**
+**Q: What can I do if the Kernel Select Failed message `Can not select a valid kernel info for [xxx] in AI CORE or AI CPU kernel info candidates list` is displayed on the Ascend backend, and how do I locate the problem?**

-A: The `Ascend` backend operators can be divided into `AI CORE` operators and `AI CPU` operators. Some operators are supported by `AI CORE`, some operators are supported by `AI CPU`, and some operators are supported by `AI CORE` and `AI CPU` at the same time. According to the error message:
+A: The `Ascend` backend operators can be divided into AI CORE operators and AI CPU operators. Some operators are supported by AI CORE, some operators are supported by AI CPU, and some operators are supported by AI CORE and AI CPU at the same time. According to the error message:

1. If the `AI CORE` operator's candidates list is empty, it may be that all operator information failed to pass the verification in the `check support` stage. You can search the keyword `CheckSupport` in the log to find the reason for the failure. Modify the shape or data type according to the specific information, or ask the developer to further locate the problem.
-2. If the `AI CPU` operator's candidates list is not empty, or the candidates list of `AI CORE` and `AI CPU` are both not empty, it may be that the given input data type was not in the candidate list and was filtered out in the selection stage. Try to modify the input data type of the operator according to the candidate list.
+2. If the `AI CPU` candidate operator information is not empty, or the candidate operator information of `AI CORE` and `AI CPU` are both not empty, it may be that the given input data type was not in the candidate list and was filtered out in the selection stage. Try to modify the input data type of the operator according to the candidate list.
+
+You can select a proper mode and writing method to complete the training by referring to the [official website tutorial](https://www.mindspore.cn/docs/programming_guide/en/master/debug_in_pynative_mode.html).
+
-You can select a proper mode and writing method to complete the training by referring to the official website [tutorial](https://www.mindspore.cn/docs/programming_guide/en/master/debug_in_pynative_mode.html).

diff --git a/docs/mindspore/faq/source_en/performance_tuning.md b/docs/mindspore/faq/source_en/performance_tuning.md
index 737c8b16ff3072264eabefdeff19466db3ea360e..eea46b2b25ae05e1909daff3bc9975d25ac32e63
--- a/docs/mindspore/faq/source_en/performance_tuning.md
+++ b/docs/mindspore/faq/source_en/performance_tuning.md
@@ -4,6 +4,8 @@

**Q: What can I do if the network performance is abnormal and weight initialization takes a long time during training after MindSpore is installed?**

-A: The `SciPy 1.4` series versions may be used in the environment. Run the `pip list | grep scipy` command to view the `SciPy` version and change the `SciPy` version to that required by MindSpore. You can view the third-party library dependency in the `requirement.txt` file.
+A: The `scipy 1.4` series versions may be used in the environment. Run the `pip list | grep scipy` command to view the `scipy` version and change the `scipy` version to that required by MindSpore. You can view the third-party library dependency in the `requirement.txt` file.
+
+ > Replace `version` with the specific version branch of MindSpore.
+
diff --git a/docs/mindspore/faq/source_en/usage_migrate_3rd.md b/docs/mindspore/faq/source_en/usage_migrate_3rd.md
index 3d402e4eb4a67757ae0d8acb41b68ef03be83580..d355cab5c515f204e187121881b8cf037fbac586
--- a/docs/mindspore/faq/source_en/usage_migrate_3rd.md
+++ b/docs/mindspore/faq/source_en/usage_migrate_3rd.md
@@ -31,11 +31,11 @@ def pytorch2mindspore(default_file = 'torch_resnet.pth'):

**Q: How do I convert a PyTorch `dataset` to a MindSpore `dataset`?**

-A: The custom dataset logic of MindSpore is similar to that of PyTorch. You need to define a `dataset` class containing `__init__`, `__getitem__`, and `__len__` to read your dataset, instantiate the class into an object (for example, `dataset/dataset_generator`), and transfer the instantiated object to `GeneratorDataset` (on MindSpore) or `DataLoader` (on PyTorch). Then, you are ready to load the custom dataset. MindSpore provides further `map`->`batch` operations based on `GeneratorDataset`. Users can easily add other custom operations to `map` and start `batch`.
-The custom dataset of MindSpore is loaded as follows:
+A: The customized dataset logic of MindSpore is similar to that of PyTorch. You need to define a `dataset` class containing `__init__`, `__getitem__`, and `__len__` to read your dataset, instantiate the class into an object (for example, `dataset/dataset_generator`), and transfer the instantiated object to `GeneratorDataset` (on MindSpore) or `DataLoader` (on PyTorch). Then, you are ready to load the customized dataset. MindSpore provides further `map`->`batch` operations based on `GeneratorDataset`. Users can easily add other customized operations to `map` and start `batch`.
+The customized dataset of MindSpore is loaded as follows:

```python
-# 1. Perform operations such as data argumentation, shuffle, and sampler.
+# 1. Data augmentation, shuffle, sampler.
class Mydata:
    def __init__(self):
        np.random.seed(58)
@@ -47,9 +47,9 @@ class Mydata:
        return len(self.__data)
dataset_generator = Mydata()
dataset = ds.GeneratorDataset(dataset_generator, ["data", "label"], shuffle=False)
-# 2. Customize data argumentation.
+# 2. Customized data augmentation
dataset = dataset.map(operations=pyFunc, {other_params})
-# 3. batch
+# 3. Batch
dataset = dataset.batch(batch_size, drop_remainder=True)
```

@@ -57,25 +57,25 @@

**Q: How do I migrate scripts or models of other frameworks to MindSpore?**

-A: For details about script or model migration, please visit the [MindSpore official website](https://www.mindspore.cn/docs/migration_guide/en/master/migration_script.html).
+A: For details about script or model migration, please visit the [Migration Script](https://www.mindspore.cn/docs/migration_guide/en/master/migration_script.html) page on the MindSpore official website.
**Q: MindConverter converts TensorFlow script error prompt`terminate called after throwing an instance of 'std::system_error', what(): Resource temporarily unavailable, Aborted (core dumped)`** -A: This problem is caused by TensorFlow. During script conversion, you need to load the TensorFlow model file through the TensorFlow library. At this time, TensorFlow will apply for relevant resources for initialization. If the resource application fails (maybe because the number of system processes exceeds the maximum number of Linux processes), the TensorFlow C/C++ layer will appear Core Dumped problem. For more information, please refer to the official ISSUE of TensorFlow. The following ISSUE is for reference only: [TF ISSUE 14885](https://github.com/tensorflow/tensorflow/issues/14885), [TF ISSUE 37449](https://github.com/tensorflow/tensorflow/issues/37449) +A: This problem is caused by TensorFlow. During script conversion, you need to load the TensorFlow model file through the TensorFlow library. At this time, TensorFlow will apply for relevant resources for initialization. If the resource application fails (maybe because the number of system processes exceeds the maximum number of Linux processes), the TensorFlow C/C++ layer will appear Core Dumped problem. For more information, please refer to the official ISSUE of TensorFlow. The following ISSUE is for reference only: [TF ISSUE 14885](https://github.com/tensorflow/tensorflow/issues/14885), [TF ISSUE 37449](https://github.com/tensorflow/tensorflow/issues/37449).
**Q: Can MindConverter run on ARM platform?** -A: MindConverter supports both x86 and ARM platform. Please ensure all required dependencies and environments have been installed in the ARM platform. +A: MindConverter supports both x86 and ARM platforms. Please ensure all required dependencies and environments have been installed in the ARM platform.
-**Q: Why does the conversion process take a lot of time (more than 10 minutes), but the model is not so large?**
+**Q: Why does the conversion process take a lot of time (more than 10 minutes) when using MindConverter, even though the model is not so large?**

-A: When converting, MindConverter needs to use Protobuf to deserialize the model file. Please make sure that the Protobuf installed in Python environment is implemented by C++ backend. The validation method is as follows. If the output is "python", you need to install Python Protobuf implemented by C++ (download the Protobuf source code, enter the "python" subdirectory in the source code, and use `python setup.py install --cpp_implementation` to install). If the output is "cpp" and the conversion process still takes a long time, please add environment variable `export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp` before conversion.
+A: When converting, MindConverter needs to use Protobuf to deserialize the model file. Please make sure that the Protobuf installed in the Python environment is implemented by the C++ backend. The validation method is as follows. If the output is "python", you need to install Python Protobuf implemented by C++ (download the Protobuf source code, enter the "python" subdirectory in the source code, and use `python setup.py install --cpp_implementation` to install). If the output is "cpp" and the conversion process still takes a long time, please add the environment variable `export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp` before conversion.

```python
from google.protobuf.internal import api_implementation
@@ -86,10 +86,10 @@ print(api_implementation.Type())

**Q: While converting .pb file to MindSpore script, what may be the cause of error code 1000001 with ensuring `model_file`, `shape`, `iput_nodes` and `output_nodes` set right and third party requirements installed correctly?**

-A: Make sure that the TensorFlow version to generate .pb file is no higher than that to convert .pb file, avoiding the conflict which caused by using low version TensorFlow to parse .pb file generated by the high version.
+A: Make sure that the TensorFlow version used to generate the .pb file is no higher than the version used to convert the .pb file, avoiding the conflict caused by using a lower-version TensorFlow to parse a .pb file generated by a higher version.
-**Q: What should I do to deal with Exception `[ERROR] MINDCONVERTER: [BaseConverterError] code: 0000000, msg: {python_home}/lib/libgomp.so.1: cannot allocate memory in static TLS block`?**
+**Q: What should I do to deal with an error `[ERROR] MINDCONVERTER: [BaseConverterError] code: 0000000, msg: {python_home}/lib/libgomp.so.1: cannot allocate memory in static TLS block`?**

A: In most cases, the problem is caused by environment variable exported incorrectly. Please set `export LD_PRELOAD={python_home}/lib/libgomp.so.1.0.0`, then try to run MindConverter again.

diff --git a/docs/mindspore/faq/source_zh_cn/usage_migrate_3rd.md b/docs/mindspore/faq/source_zh_cn/usage_migrate_3rd.md
index 3ea99acd49c6499af0fed442ea795807b8a6dbea..d4c663653910f9dc25aa7797cf39b2c70ce2aa01
--- a/docs/mindspore/faq/source_zh_cn/usage_migrate_3rd.md
+++ b/docs/mindspore/faq/source_zh_cn/usage_migrate_3rd.md
@@ -35,7 +35,7 @@ A: MindSpore和PyTorch的自定义数据集逻辑是比较类似的,需要用户

对应的MindSpore的自定义数据集加载如下:

```python
-#1 Data enhancement,shuffle,sampler.
+# 1. Data augmentation, shuffle, sampler.
class Mydata:
    def __init__(self):
        np.random.seed(58)
@@ -47,9 +47,9 @@ class Mydata:
        return len(self.__data)
dataset_generator = Mydata()
dataset = ds.GeneratorDataset(dataset_generator, ["data", "label"], shuffle=False)
-#2 Custom data enhancement
+# 2. Customized data augmentation
dataset = dataset.map(operations=pyFunc, {other_params})
-#3 batch
+# 3. Batch
dataset = dataset.batch(batch_size, drop_remainder=True)
```