diff --git a/docs/mindinsight/docs/source_en/debugger_offline.md b/docs/mindinsight/docs/source_en/debugger_offline.md
index 19850264902342df2d3a2717fcce8cc8f9f6a0a5..1633df7de256c6d9920a9e1b21af87704960f2ac 100644
--- a/docs/mindinsight/docs/source_en/debugger_offline.md
+++ b/docs/mindinsight/docs/source_en/debugger_offline.md
@@ -86,9 +86,8 @@ The UI of the offline debugger is the same as that of the online debugger. For d
 - Scenarios:
     - The offline debugger does not support the CPU scenario currently.
     - The offline debugger supports the single-node multi-device scenario. To analyze the multi-node multi-device scenario, you need to summarize the data of multiple nodes.
-    - The offline debugger does not support checking the initial weight and operator overflow currently.
+    - The offline debugger does not support checking the initial weight.
     - The offline debugger does not support checking watchpoints in multi-graph scenario.
-    - The offline debugger does not support PyNative mode.
 
 - GPU scenario:
     - Different from the online debugger, the offline debugger does not support node-by-node execution.
diff --git a/docs/mindinsight/docs/source_en/debugger_online.md b/docs/mindinsight/docs/source_en/debugger_online.md
index 218ca27c270446ccf521edaa7a86e67b782ba502..3c84538adb9e4c93728a93d5757d898749952483 100644
--- a/docs/mindinsight/docs/source_en/debugger_online.md
+++ b/docs/mindinsight/docs/source_en/debugger_online.md
@@ -128,7 +128,6 @@ After a watchpoint is created, manually select the node to be checked and click
 The following conditions are supported (abbreviations in parentheses):
 
 - Tensor check
-    - Operator overflow (OO): Check whether overflow occurs during operator computation. Only the Ascend AI Processor is supported.
     - Whether tensor values are all 0 (TZ): Set the threshold to `Percentage of 0 values ≥` to check the percentage of 0 tensor values.
     - Tensor overflow (TO): Check whether a tensor value overflow occurs.
     - Tensor value range (TR): Set a threshold to check the tensor value range. The options are `Percentage of the value in the range >`, `Percentage of the value in the range <`, `MAX-MIN>` and `MAX-MIN<`. If setting the threshold to `Percentage of the value in the range >` or `Percentage of the value in the range <`, you need to set the `Upper limit of the range (inclusive)` or `Lower limit of the range (inclusive)` at the same time.
@@ -270,6 +269,5 @@ Tensors can be downloaded in tensor check view. Users can download the desired t
 
 - When using the debugger, make sure that the version numbers of MindInsight and MindSpore are the same.
 - Recheck only watchpoints that have tensor values.
-- To check overflow during computation, you need to enable the overflow detection function of the asynchronous dump. For details about how to enable the function, see [Asynchronous Dump](https://www.mindspore.cn/docs/programming_guide/en/r1.5/custom_debugging_info.html#asynchronous-dump).
 - The graph displayed by the debugger is the finally optimized execution graph. The called operator may have been integrated with other operators, or the name of the called operator is changed after optimization.
 - Enabling the debugger will turn off memory reuse mode, which may lead to an 'out of memory' error when the training network is too large.
diff --git a/docs/mindinsight/docs/source_zh_cn/accuracy_optimization.md b/docs/mindinsight/docs/source_zh_cn/accuracy_optimization.md
index 9ab6750300c94df8dc0b9968389d01c0fc8da4ea..6094df1fee06970a415eb94cad609dcc8e5403a3 100644
--- a/docs/mindinsight/docs/source_zh_cn/accuracy_optimization.md
+++ b/docs/mindinsight/docs/source_zh_cn/accuracy_optimization.md
@@ -354,14 +354,13 @@ MindInsight can help users check the input data and the data processing pipeline
 6. The activation values are saturated or too weak (for example, the output of Sigmoid is close to 1, or the output of Relu is all 0);
 7. Gradient explosion or gradient vanishing;
 8. Insufficient training epochs;
-9. NAN or INF in operator computation results;
-10. Overflow during operator computation (overflow during computation is not necessarily harmful), and so on.
+9. NAN or INF in operator computation results, and so on.
 
 Some of the preceding problems or phenomena are reflected in the loss, while others are difficult to observe. MindInsight provides targeted functions to observe these phenomena and automatically check for such problems, helping you locate the root cause faster. For example:
 
 - The parameter distribution histogram module of MindInsight shows the change trend of model weights during training;
 - The tensor visualization module of MindInsight shows the specific values of tensors and compares different tensors;
-- The [MindInsight Debugger](https://www.mindspore.cn/mindinsight/docs/zh-CN/r1.5/debugger.html) has rich and powerful built-in checks that cover weight problems (for example, weights not updated, updated too much, or too large/too small), gradient problems (for example, gradient vanishing and gradient explosion), activation value problems (for example, saturated or too weak activation values), tensors that are all 0, NAN/INF, overflow during operator computation, and more.
+- The [MindInsight Debugger](https://www.mindspore.cn/mindinsight/docs/zh-CN/r1.5/debugger.html) has rich and powerful built-in checks that cover weight problems (for example, weights not updated, updated too much, or too large/too small), gradient problems (for example, gradient vanishing and gradient explosion), activation value problems (for example, saturated or too weak activation values), tensors that are all 0, NAN/INF, and more.
 
 ![loss](./images/loss.png)
 
diff --git a/docs/mindinsight/docs/source_zh_cn/debugger_offline.md b/docs/mindinsight/docs/source_zh_cn/debugger_offline.md
index 7f32fd5041314ffbf1070950a9444c9dbd42f3e9..18942bc8a0f1397af794382d9c57cdb9a3be5c33 100644
--- a/docs/mindinsight/docs/source_zh_cn/debugger_offline.md
+++ b/docs/mindinsight/docs/source_zh_cn/debugger_offline.md
@@ -86,9 +86,8 @@ mindinsight start --port {PORT} --summary-base-dir {SUMMARY_BASE_DIR} --offline-
 - Supported scenarios:
     - The offline debugger does not support the CPU scenario currently.
     - The offline debugger supports the single-node multi-device scenario. To analyze a multi-node multi-device scenario, you need to aggregate the data of multiple nodes by yourself.
-    - The offline debugger does not support checking the initial weight and overflow during computation currently.
+    - The offline debugger does not support checking the initial weight currently.
     - The offline debugger does not support checking watchpoints in the multi-graph scenario currently.
-    - The offline debugger does not support PyNative mode currently.
 
 - GPU scenario:
     - Different from the online debugger, the offline debugger does not support node-by-node execution.
diff --git a/docs/mindinsight/docs/source_zh_cn/debugger_online.md b/docs/mindinsight/docs/source_zh_cn/debugger_online.md
index f48980c5d5bf60c16fe5391ea2ca0f04c544a167..61105df96591fa3953ad4131a90c78d84d9c1aa1 100644
--- a/docs/mindinsight/docs/source_zh_cn/debugger_online.md
+++ b/docs/mindinsight/docs/source_zh_cn/debugger_online.md
@@ -122,7 +122,6 @@ mindinsight start --port {PORT} --enable-debugger True --debugger-port {DEBUGGER
 The following conditions are supported (abbreviations in parentheses):
 
 - Tensor check
-    - Operator overflow (OO): Check whether overflow occurs during operator computation. Only the Ascend AI Processor is supported.
     - Whether tensor values are all 0 (TZ): Set a threshold for the condition parameter to check the percentage of 0 values in the tensor. The available parameter is `Percentage of 0 values >=`.
     - Tensor overflow (TO): Check whether a tensor value overflow occurs.
     - Tensor value range (TR): Set a threshold for the condition parameter to check the tensor value range. The available parameters are `Percentage of the value in the range >`, `Percentage of the value in the range <`, `MAX-MIN>` and `MAX-MIN<`. When setting `Percentage of the value in the range >` or `Percentage of the value in the range <`, you need to set the supporting parameters `Upper limit of the range (inclusive)` and `Lower limit of the range (inclusive)` at the same time.
@@ -262,6 +261,5 @@ mindinsight start --port {PORT} --enable-debugger True --debugger-port {DEBUGGER
 
 - When using the debugger, ensure that the version numbers of MindInsight and MindSpore are the same.
 - Recheck only checks watchpoints that currently have tensor values.
-- To check overflow during computation, you need to enable the full overflow detection function of the asynchronous dump. For how to enable it, see [Asynchronous Dump](https://www.mindspore.cn/docs/programming_guide/zh-CN/r1.5/custom_debugging_info.html#id5)
 - The graph displayed by the debugger is the finally optimized execution graph. A called operator may have been fused with other operators, or its name may have changed after optimization.
 - Enabling the debugger turns off memory reuse, which may lead to an 'out of memory' error when the training network is too large.
diff --git a/docs/mindspore/programming_guide/source_en/dump_in_graph_mode.md b/docs/mindspore/programming_guide/source_en/dump_in_graph_mode.md
index 94e8ae73ab82bd789862d0910aca4f3daf55f341..f66522ac661fb877d8929cf9f93f6c825a1b3205 100644
--- a/docs/mindspore/programming_guide/source_en/dump_in_graph_mode.md
+++ b/docs/mindspore/programming_guide/source_en/dump_in_graph_mode.md
@@ -402,7 +402,7 @@ Large networks (such as Bert Large) will cause memory overflow when using synchr
     - `kernels`: List of operator names. Turn on the IR save switch `context.set_context(save_graphs=True)` and execute the network to obtain the operator name from the generated `trace_code_graph_{graph_id}` IR file. `kernels` only supports TBE operator, AiCPU operator and communication operator. The data of communication operation input operator will be dumped if `kernels` is set to the name of communication operator. For details, please refer to [Saving IR](https://www.mindspore.cn/docs/programming_guide/en/r1.5/design/mindir.html#saving-ir).
     - `support_device`: Supported devices, default setting is `[0,1,2,3,4,5,6,7]`. You can specify specific device ids to dump specific device data.
     - `enable`: Enable Asynchronous Dump. If synchronous dump and asynchronous dump are enabled at the same time, only synchronous dump will take effect.
-    - `op_debug_mode`: 0: disable overflow check function; 1: enable AiCore overflow check; 2: enable Atomic overflow check; 3: enable all overflow check functions. If it is not set to 0, only the data of the overflow operator will be dumped.
+    - `op_debug_mode`: Reserved field, set to 0.
 
 2. Set Dump environment.
 
diff --git a/docs/mindspore/programming_guide/source_zh_cn/dump_in_graph_mode.md b/docs/mindspore/programming_guide/source_zh_cn/dump_in_graph_mode.md
index de482e9ef42cf649a3ce004341a753380e5af384..671ed6e447f4869f8d42c33bf7b9b833b3fbf603 100644
--- a/docs/mindspore/programming_guide/source_zh_cn/dump_in_graph_mode.md
+++ b/docs/mindspore/programming_guide/source_zh_cn/dump_in_graph_mode.md
@@ -402,7 +402,7 @@ numpy.load("Conv2D.Conv2D-op107.2.2.1623124369613540.output.0.DefaultFormat.npy"
     - `input_output`: When set to 0, both the input and the output of the operator are dumped; when set to 1, the input of the operator is dumped; when set to 2, the output of the operator is dumped.
     - `kernels`: List of operator names. Turn on the IR save switch `context.set_context(save_graphs=True)` and run the use case, then obtain the operator names from the generated `trace_code_graph_{graph_id}` IR file. `kernels` only supports TBE operators, AiCPU operators and communication operators; if it is set to the name of a communication operator, the data of the input operators of that communication operator will be dumped. For details, see the tutorial [Saving IR](https://www.mindspore.cn/docs/programming_guide/zh-CN/r1.5/read_ir_files.html#id2).
     - `support_device`: Supported devices. The default setting of 0 to 7 is sufficient; in a distributed training scenario where only the data of specific devices needs to be dumped, you can specify only the required device ids in `support_device`.
-    - `op_debug_mode`: This attribute is used for operator overflow debugging. 0: disable overflow detection; 1: enable AiCore overflow detection; 2: enable Atomic overflow detection; 3: enable all overflow detection functions. Set it to 0 when dumping data; if it is set to another value, only the data of overflow operators will be dumped.
+    - `op_debug_mode`: Reserved field, set to 0.
 
 2. Set the environment variables for Dump.
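
As context for the `op_debug_mode` hunks above, a minimal asynchronous dump configuration built only from the fields this patch documents might look as follows. This is an illustrative sketch, not part of the patch: the exact file layout, the `common_dump_settings` placeholder values, and the operator name in `kernels` are assumptions.

```json
{
    "common_dump_settings": {
        "dump_mode": 0,
        "path": "/absolute/path/to/dump_output",
        "net_name": "Net",
        "iteration": "0",
        "input_output": 0,
        "kernels": ["Default/Conv2D-op107"],
        "support_device": [0, 1, 2, 3, 4, 5, 6, 7]
    },
    "async_dump_settings": {
        "enable": true,
        "op_debug_mode": 0
    }
}
```

With this patch applied, `op_debug_mode` is documented as a reserved field and should stay 0.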