diff --git a/tutorials/experts/source_en/others/mixed_precision.md b/tutorials/experts/source_en/others/mixed_precision.md
index f28930672793e87c15e6a855d5f2be451c6533f1..e26bf033281e44513def1c81d977d115d8a789f0 100644
--- a/tutorials/experts/source_en/others/mixed_precision.md
+++ b/tutorials/experts/source_en/others/mixed_precision.md
@@ -54,7 +54,7 @@ Why do we need mixed-precision? Compared with FP32, FP16 has the following advan
 
 However, using FP16 also brings some problems, the most important of which are precision overflow and rounding error.
 
-- Data overflow: Data overflow is easliy to understand. The valid data range of FP16 is $[6.10\times10^{-5}, 65504]$, and that of FP32 is $[1.4\times10^{-45}, 1.7\times10^{38}]$. We can see that the valid range of FP16 is much narrower than that of FP32. When FP16 is used to replace FP32, overflow and underflow occur. In deep learning, a gradient (a first-order derivative) of a weight in a network model needs to be calculated. Therefore, the gradient is smaller than the weight value, and underflow often occurs.
+- Data overflow: Data overflow is easy to understand. The valid data range of FP16 is $[5.9\times10^{-8}, 65504]$, and that of FP32 is $[1.4\times10^{-45}, 1.7\times10^{38}]$. We can see that the valid range of FP16 is much narrower than that of FP32, so overflow and underflow can easily occur when FP16 is used to replace FP32. In deep learning, the gradients (first-order derivatives) of the weights in a network model need to be calculated, and these gradients are usually much smaller than the weight values, so underflow often occurs.
 - Rounding error: Rounding error instruction is when the backward gradient of a network model is small, FP32 is usually used. However, when it is converted to FP16, the interval is smaller than the minimum interval, causing data overflow. For example, 0.00006666666 can be properly represented in FP32, but it will be represented as 0.000067 in FP16. The number that does not meet the minimum interval requirement of FP16 will be forcibly rounded off.
 
 ## Mixed-precision Computing Process
diff --git a/tutorials/experts/source_zh_cn/others/mixed_precision.ipynb b/tutorials/experts/source_zh_cn/others/mixed_precision.ipynb
index 308ee0c8dffeb3c6a8ebc8e4201d5d5793aaabcc..4b979ceef5406d61ba6b7f77ab34e9c779c0d150 100644
--- a/tutorials/experts/source_zh_cn/others/mixed_precision.ipynb
+++ b/tutorials/experts/source_zh_cn/others/mixed_precision.ipynb
@@ -63,7 +63,7 @@ "\n",
     "\n",
    "但是使用FP16同样会带来一些问题，其中最重要的是精度溢出和舍入误差。\n",
    "\n",
-    "- 数据溢出：数据溢出比较好理解，FP16的有效数据表示范围为 $[6.10\\times10^{-5}, 65504]$，FP32的有效数据表示范围为 $[1.4\\times10^{-45}, 1.7\\times10^{38}]$。可见FP16相比FP32的有效范围要窄很多，使用FP16替换FP32会出现上溢（Overflow）和下溢（Underflow）的情况。而在深度学习中，需要计算网络模型中权重的梯度（一阶导数），因此梯度会比权重值更加小，往往容易出现下溢情况。\n",
+    "- 数据溢出：数据溢出比较好理解，FP16的有效数据表示范围为 $[5.9\\times10^{-8}, 65504]$，FP32的有效数据表示范围为 $[1.4\\times10^{-45}, 1.7\\times10^{38}]$。可见FP16相比FP32的有效范围要窄很多，使用FP16替换FP32会出现上溢（Overflow）和下溢（Underflow）的情况。而在深度学习中，需要计算网络模型中权重的梯度（一阶导数），因此梯度会比权重值更加小，往往容易出现下溢情况。\n",
    "- 舍入误差：Rounding Error指示是当网络模型的反向梯度很小，一般FP32能够表示，但是转换到FP16会小于当前区间内的最小间隔，会导致数据溢出。如0.00006666666在FP32中能正常表示，转换到FP16后会表示成为0.000067，不满足FP16最小间隔的数会强制舍入。\n",
    "\n",
    "## 混合精度计算流程\n",
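
Note for reviewers: the sketch below is a minimal sanity check of the numbers this patch touches, not part of the patched tutorials, and it assumes NumPy is available. The pre-patch bound $6.10\times10^{-5}$ is the smallest positive normal FP16 value, while the new bound $5.9\times10^{-8}$ is the smallest positive subnormal value, which is consistent with the subnormal lower bound $1.4\times10^{-45}$ already quoted for FP32.

```python
# Sanity check of the FP16 values referenced in this patch.
# NumPy is used here only for verification; it is not part of the patched tutorials.
import numpy as np

fp16 = np.finfo(np.float16)

# Smallest positive *normal* FP16 value, the bound quoted before this patch.
print(float(fp16.tiny))                                    # 6.103515625e-05

# Smallest positive *subnormal* FP16 value, the bound quoted after this patch.
print(float(np.nextafter(np.float16(0), np.float16(1))))   # 5.960464477539063e-08

# Largest finite FP16 value.
print(float(fp16.max))                                      # 65504.0

# Rounding-error example from the same paragraph: 0.00006666666 is not exactly
# representable in FP16 and is rounded to the nearest representable value.
print(float(np.float16(0.00006666666)))                     # ~6.66e-05 after rounding

# Underflow example: values below the smallest subnormal flush to zero in FP16.
print(float(np.float16(1e-9)))                              # 0.0
```

The last print shows the underflow behaviour the edited paragraph warns about: any value below $5.9\times10^{-8}$, such as a very small gradient, simply becomes zero in FP16.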