diff --git a/docs/mindformers/docs/source_en/advanced_development/images/infer_precision_comparison.png b/docs/mindformers/docs/source_en/advanced_development/images/infer_precision_comparison.png
new file mode 100644
index 0000000000000000000000000000000000000000..6ee7abc6ee3b9234c97586f4bf4fedee8390bb31
Binary files /dev/null and b/docs/mindformers/docs/source_en/advanced_development/images/infer_precision_comparison.png differ
diff --git a/docs/mindformers/docs/source_en/advanced_development/inference_precision_comparison.md b/docs/mindformers/docs/source_en/advanced_development/inference_precision_comparison.md
new file mode 100644
index 0000000000000000000000000000000000000000..81f734ac7a13a3ae3373b786a9068b2a464108aa
--- /dev/null
+++ b/docs/mindformers/docs/source_en/advanced_development/inference_precision_comparison.md
@@ -0,0 +1,107 @@
+# Inference Precision Comparison
+
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/advanced_development/inference_precision_comparison.md)
+
+## Overview
+
+After adapting or developing a new model, users who want to run inference with it need to verify that the inference precision is correct. The acceptance criterion for inference precision is primarily the evaluation score on open-source industry datasets or on closed-source datasets prepared by users themselves. This document describes the overall process for comparing inference precision, along with some ideas and methods for locating precision issues.
+
+## Precision Acceptance Process
+
+### Overall Process
+
+In the current inference development process, precision verification first checks the precision of online inference. Only when online inference precision is normal is the dataset evaluation score verified further. The following flowchart shows the entire precision verification process.
+
+
+

+
+
+### Online Inference Verification
+
+The main objective of online inference verification is to check whether the precision of the inference output for one or more inputs is normal. If all outputs are normal and basically align with the benchmark output from the GPU environment, you can proceed to the next step, dataset evaluation.
+For how to run online inference tasks with the model, refer to the [Inference Guide](https://www.mindspore.cn/mindformers/docs/en/master/guide/inference.html).
+
+### Dataset Evaluation
+
+After online inference verification, the model's output basically matches the benchmark for the same inputs. However, the data volume is small and the questions do not cover domains comprehensively, so the model's precision must ultimately be verified through dataset evaluation. Only when the dataset evaluation score is within a 0.4% error margin of the benchmark data can the model's precision be considered to meet the acceptance criteria.
+For how to evaluate the model on datasets, refer to the [Evaluation Guide](https://www.mindspore.cn/mindformers/docs/en/master/guide/benchmarks.html).
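
The 0.4% acceptance check can be made explicit in a few lines. A minimal sketch (the helper name is hypothetical, and it assumes the 0.4% threshold is an absolute gap in percentage points between the two scores):

```python
def meets_acceptance(score: float, benchmark: float, tolerance: float = 0.4) -> bool:
    """Return True when the dataset evaluation score is within `tolerance`
    percentage points of the benchmark score."""
    return abs(score - benchmark) <= tolerance

print(meets_acceptance(67.1, 67.4))  # True: gap of 0.3 points
print(meets_acceptance(66.2, 67.4))  # False: gap of 1.2 points
```

If the acceptance criterion is instead a relative error, replace the comparison with `abs(score - benchmark) / benchmark <= tolerance / 100`.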
+
+## Locating Precision Issues
+
+- Scenario: The preset model weights are correct, that is, the model's inference precision is normal in the GPU environment, and the GPU output is used as the benchmark.
+- Possible situations: Two situations may arise in the precision comparison process described in this document: the first is a precision problem (garbled or illogical output), and the second is a precision error (a numerical deviation from the benchmark).
+
+### Precision Issue
+
+Precision issues generally refer to cases where the inference output is garbled or completely illogical. The common causes are problems with weight loading or with the code implementation of the network.
+
+#### 1. Weight Loading Issue
+
+The investigation process is as follows:
+
+1. Search for the following keywords in the log of the executed inference task.
+
+ ```text
+ These parameters are not loaded in the network:
+ These parameters are not loaded in the weights:
+ ```
+
+2. Based on the log content, analyze whether the weights are loaded correctly. The KEY values after the colons in the two log lines are, respectively, the KEY values of the weights that the network needs to load but that are missing from the weight files, and the KEY values of the weights in the weight files that are not loaded into the network.
+
+Specific problems that may arise and their solutions:
+
+- Problem 1: There are KEY values after the colon, and some weights have not been loaded into the network.
+    - Cause: The KEY values of the network and the KEY values of the weights do not correspond one-to-one.
+    - Locating method: Analyze the network structure together with the unloaded weights to determine whether it is reasonable for the weight corresponding to each KEY value to be unloaded.
+    - Solution: Re-convert the unreasonable weight KEY values. For details, refer to the [New Model Weight Conversion Adaptation Tutorial](https://www.mindspore.cn/mindformers/docs/en/master/advanced_development/weight_transfer.html).
+
+- Problem 2: There are no KEY values after the colon and all weights are loaded into the network, but incorrect splitting during weight fusion or splitting may still cause the wrong data to be loaded.
+    - Cause: Many open-source weights contain fused weights, which sometimes need to be split and then fused with other weights. Various splits may be involved in this process, which easily leads to problems.
+    - Locating method: First focus on error-prone areas, such as the qkv part of Attention, and analyze, against the implementation in the network structure, whether the operations performed during weight loading are correct. If theoretical analysis fails, print the suspect weights directly and compare them with the weights loaded at the corresponding positions in the benchmark.
+    - Solution: Identify the module with incorrect weight loading through analysis or experiment. For the solution, refer to the [New Model Weight Conversion Adaptation Tutorial](https://www.mindspore.cn/mindformers/docs/en/master/advanced_development/weight_transfer.html).
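
A quick way to reproduce the two log lines above offline is to diff the key sets yourself. A minimal sketch (the helper name and the weight keys are illustrative, not MindSpore Transformers APIs):

```python
def diff_weight_keys(net_keys, ckpt_keys):
    """Return (missing_from_weights, unused_in_network): keys the network
    expects but the weight files lack, and keys the weight files carry
    but the network never loads."""
    net_keys, ckpt_keys = set(net_keys), set(ckpt_keys)
    return sorted(net_keys - ckpt_keys), sorted(ckpt_keys - net_keys)

# Toy example: the network expects a fused qkv weight, while the open-source
# checkpoint still stores separate q/k/v weights, so conversion is needed.
missing, unused = diff_weight_keys(
    ["layers.0.attention.w_qkv.weight"],
    ["layers.0.attention.wq.weight",
     "layers.0.attention.wk.weight",
     "layers.0.attention.wv.weight"],
)
print(missing)  # ['layers.0.attention.w_qkv.weight']
print(unused)   # the three separate q/k/v keys, sorted
```

Two empty lists mean all keys correspond one-to-one; any leftover entries point at the weights to analyze first.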
+
+#### 2. Problems in the Construction of the New Model
+
+The investigation process is as follows:
+
+When adapting a new model with a similar structure, developers generally just replace the configuration file and then load the weights directly to run the inference task. This makes it easy to overlook small differences in details, which need to be checked module by module.
+
+Possible problems and solutions:
+
+- Problem: The inference output stays the same for different questions.
+    - Possible causes: A linear module in the MLP, MoE, or Attention module does not need a bias but one is forced on it; the input or output contains NaN values; and so on.
+    - Locating method: Print the input and output of each module directly and check whether the printed results are normal.
+    - Solution: After confirming that a module has a problem, compare it with the benchmark to determine whether that module needs a bias. If not, set the bias configuration item to False.
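
The module-by-module printing described above can be reduced to a small probe. A minimal sketch (the function is hypothetical; in practice you would feed it the arrays printed out of each module, for example via a forward hook or temporary `print` statements):

```python
import numpy as np

def check_module_output(name, out):
    """Flag the failure signatures described above: NaN/Inf values, or a
    (near-)constant output that would make every answer identical."""
    out = np.asarray(out, dtype=np.float32)
    issues = []
    if np.isnan(out).any():
        issues.append("contains NaN")
    if np.isinf(out).any():
        issues.append("contains Inf")
    if out.std() < 1e-8:
        issues.append("output is (near-)constant")
    print(f"{name}: {'OK' if not issues else ', '.join(issues)}")
    return issues

check_module_output("mlp", np.array([0.1, -0.3, 0.7]))     # prints "mlp: OK"
check_module_output("attention", np.array([1.0, np.nan]))  # prints "attention: contains NaN"
```

Running such a check on each module's output quickly narrows the search to the module where the first abnormal value appears.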
+
+### Precision Error
+
+A precision error generally refers to the situation where the online inference response is logical but does not align with the benchmark response, or where the dataset evaluation score does not meet the acceptance criteria.
+
+#### 1. The answers are logical but do not align with the benchmark answers
+
+The root cause of a logical but inconsistent response in an inference task is that some module has introduced an error. The magnitude of the error determines how early the tokens that diverge from the benchmark appear in the response.
+
+Possible problems and solutions:
+
+- Problem: The first token is consistent, but the precision becomes inconsistent after about 10 generated tokens.
+    - Locating method: Differences in data are generally compared by printing or dumping data. If you cannot tell by eye whether the printed data is within the acceptable range, dump the data and then use a comparison tool to determine whether the module meets the precision standard. For the comparison, you can use the method provided by MindSpore Transformers as follows:
+
+ ```py
+ import numpy as np
+ from tests.utils.precision_utils import PrecisionChecker
+
+ checker = PrecisionChecker()
+ gpu_data = np.load('path/to/gpu.npy')
+ npu_data = np.load('path/to/npu.npy')
+ checker.check_precision(gpu_data, npu_data)
+ ```
+
+ > For information on how to dump data, you can refer to the [Dump Tutorial Document](https://www.mindspore.cn/tutorials/en/master/debug/dump.html) provided on the MindSpore official website.
+    - Possible causes: Precision loss caused by an input whose dtype does not match the benchmark, etc.
+    - Solution: Align the dtype with the benchmark.
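
As a toy illustration of such dtype-induced drift (plain NumPy, not MindSpore Transformers code): accumulating in float16 on one side while the benchmark accumulates in float32 introduces a small rounding error at every step, which is exactly the kind of deviation that surfaces a few tokens into generation:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000).astype(np.float32)

ref = x.astype(np.float64).sum()  # high-precision reference sum

acc32 = np.float32(0.0)
acc16 = np.float16(0.0)
for v in x:
    acc32 += v                                 # benchmark-aligned dtype
    acc16 = np.float16(acc16 + np.float16(v))  # mismatched dtype drifts

err32 = abs(float(acc32) - ref)
err16 = abs(float(acc16) - ref)
print(f"float32 accumulation error: {err32:.6f}")
print(f"float16 accumulation error: {err16:.6f}")  # noticeably larger
```

The float16 error grows with every addition, so small per-step deviations accumulate until the sampled token finally differs from the benchmark.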
+
+#### 2. The evaluation score of the dataset does not meet the acceptance criteria
+
+According to the precision comparison process, the prerequisite for dataset evaluation is that the online inference responses are already logical. If there is still a significant gap between the dataset evaluation score and the benchmark data, the reason is that some responses do not align with the benchmark.
+
+Locating method: Identify the questions whose outputs do not align with the benchmark answers, extract them individually as inputs for online inference, and then locate and solve the problem following the approach in [The answers are logical but do not align with the benchmark answers](#1-the-answers-are-logical-but-do-not-align-with-the-benchmark-answers).
diff --git a/docs/mindformers/docs/source_en/index.rst b/docs/mindformers/docs/source_en/index.rst
index 5aaeedb0c8e79bb8473c13fb853700bbc575477a..a329f6767b798b9dab3005b70ea4de97cfb58f21 100644
--- a/docs/mindformers/docs/source_en/index.rst
+++ b/docs/mindformers/docs/source_en/index.rst
@@ -122,6 +122,7 @@ Advanced developing with MindSpore Transformers
- Accuracy Comparison
- `Compare Training Accuracy with Megatron-LM `_
+   - `Inference Precision Comparison `_
Environment Variables
------------------------------------
@@ -194,6 +195,7 @@ FAQ
advanced_development/performance_optimization
advanced_development/dev_migration
advanced_development/yaml_config_inference
+ advanced_development/inference_precision_comparison
advanced_development/accuracy_comparison
advanced_development/api
diff --git a/docs/mindformers/docs/source_zh_cn/advanced_development/images/infer_precision_comparison.png b/docs/mindformers/docs/source_zh_cn/advanced_development/images/infer_precision_comparison.png
new file mode 100644
index 0000000000000000000000000000000000000000..115104e4660b1606d08c5fd777950f4c534cfa53
Binary files /dev/null and b/docs/mindformers/docs/source_zh_cn/advanced_development/images/infer_precision_comparison.png differ
diff --git a/docs/mindformers/docs/source_zh_cn/advanced_development/inference_precision_comparison.md b/docs/mindformers/docs/source_zh_cn/advanced_development/inference_precision_comparison.md
new file mode 100644
index 0000000000000000000000000000000000000000..fe086cffe75048aee612a394abd5d84cfae811a4
--- /dev/null
+++ b/docs/mindformers/docs/source_zh_cn/advanced_development/inference_precision_comparison.md
@@ -0,0 +1,106 @@
+# Inference Precision Comparison
+
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/advanced_development/inference_precision_comparison.md)
+
+## Overview
+
+After adapting or developing a new model, users who want to run inference with it need to verify that the inference precision is correct. The acceptance criterion for inference precision is primarily the evaluation score on open-source industry datasets or on closed-source datasets prepared by users themselves. This document describes the overall process for comparing inference precision, along with some ideas and methods for locating precision issues.
+
+## Precision Acceptance Process
+
+### Overall Process
+
+In the current inference development process, precision verification first checks the precision of online inference. Only when online inference precision is normal is the dataset evaluation score verified further. The following flowchart shows the entire precision verification process.
+
+
+

+
+
+### Online Inference Verification
+
+The main objective of online inference verification is to check whether the precision of the inference output for one or more inputs is normal. If all outputs are normal and basically align with the benchmark output from the GPU environment, you can proceed to the next step, dataset evaluation.
+For how to run online inference tasks with the model, refer to the [Inference Guide](https://www.mindspore.cn/mindformers/docs/zh-CN/master/guide/inference.html).
+
+### Dataset Evaluation
+
+After online inference verification, the model's output basically matches the benchmark for the same inputs. However, the data volume is small and the questions do not cover domains comprehensively, so the model's precision must ultimately be verified through dataset evaluation. Only when the dataset evaluation score is within a 0.4% error margin of the benchmark data can the model's precision be considered to meet the acceptance criteria.
+For how to evaluate the model on datasets, refer to the [Evaluation Guide](https://www.mindspore.cn/mindformers/docs/zh-CN/master/guide/benchmarks.html).
+
+## Locating Precision Issues
+
+- Scenario: The preset model weights are correct, that is, the model's inference precision is normal in the GPU environment, and the GPU output is used as the benchmark.
+- Possible situations: Two situations may arise in the precision comparison process described in this document: the first is a precision problem (garbled or illogical output), and the second is a precision error (a numerical deviation from the benchmark).
+
+### Precision Issue
+
+Precision issues generally refer to cases where the inference output is garbled or completely illogical. The common causes are problems with weight loading or with the code implementation of the network.
+
+#### 1. Weight Loading Issue
+
+The investigation process is as follows:
+
+1. Search for the following keywords in the log of the executed inference task.
+
+    ```text
+    These parameters are not loaded in the network:
+    These parameters are not loaded in the weights:
+    ```
+
+2. Based on the log content, analyze whether the weights are loaded correctly. The KEY values after the colons in the two log lines are, respectively, the KEY values of the weights that the network needs to load but that are missing from the weight files, and the KEY values of the weights in the weight files that are not loaded into the network.
+
+Specific problems that may arise and their solutions:
+
+- Problem 1: There are KEY values after the colon, and some weights have not been loaded into the network.
+    - Cause: The KEY values of the network and the KEY values of the weights do not correspond one-to-one.
+    - Locating method: Analyze the network structure together with the unloaded weights to determine whether it is reasonable for the weight corresponding to each KEY value to be unloaded.
+    - Solution: Re-convert the unreasonable weight KEY values. For details, refer to the [New Model Weight Conversion Adaptation Tutorial](https://www.mindspore.cn/mindformers/docs/zh-CN/master/advanced_development/weight_transfer.html).
+- Problem 2: There are no KEY values after the colon and all weights are loaded into the network, but incorrect splitting during weight fusion or splitting may still cause the wrong data to be loaded.
+    - Cause: Many open-source weights contain fused weights, which sometimes need to be split and then fused with other weights. Various splits may be involved in this process, which easily leads to problems.
+    - Locating method: First focus on error-prone areas, such as the qkv part of Attention, and analyze, against the implementation in the network structure, whether the operations performed during weight loading are correct. If theoretical analysis fails, print the suspect weights directly and compare them with the weights loaded at the corresponding positions in the benchmark.
+    - Solution: Identify the module with incorrect weight loading through analysis or experiment. For the solution, refer to the [New Model Weight Conversion Adaptation Tutorial](https://www.mindspore.cn/mindformers/docs/zh-CN/master/advanced_development/weight_transfer.html).
+
+#### 2. Problems in the Construction of the New Model
+
+The investigation process is as follows:
+
+When adapting a new model with a similar structure, developers generally just replace the configuration file and then load the weights directly to run the inference task. This makes it easy to overlook small differences in details, which need to be checked module by module.
+
+Possible problems and solutions:
+
+- Problem: The inference output stays the same for different questions.
+    - Possible causes: A linear module in the MLP, MoE, or Attention module does not need a bias but one is forced on it; the input or output contains NaN values; and so on.
+    - Locating method: Print the input and output of each module directly and check whether the printed results are normal.
+    - Solution: After confirming that a module has a problem, compare it with the benchmark to determine whether that module needs a bias. If not, set the bias configuration item to False.
+
+### Precision Error
+
+A precision error generally refers to the situation where the online inference response is logical but does not align with the benchmark response, or where the dataset evaluation score does not meet the acceptance criteria.
+
+#### 1. The Response Is Logical but Does Not Align with the Benchmark
+
+The root cause of a logical but inconsistent response in an inference task is that some module has introduced an error. The magnitude of the error determines how early the tokens that diverge from the benchmark appear in the response.
+
+Possible problems and solutions:
+
+- Problem: The first token is consistent, but the precision becomes inconsistent after about 10 generated tokens.
+    - Locating method: Differences in data are generally compared by printing or dumping data. If you cannot tell by eye whether the printed data is within the acceptable range, dump the data and then use a comparison tool to determine whether the module meets the precision standard. For the comparison, you can use the method provided by MindSpore Transformers as follows:
+
+      ```py
+      import numpy as np
+      from tests.utils.precision_utils import PrecisionChecker
+
+      checker = PrecisionChecker()
+      gpu_data = np.load('path/to/gpu.npy')
+      npu_data = np.load('path/to/npu.npy')
+      checker.check_precision(gpu_data, npu_data)
+      ```
+
+      > For how to dump data, refer to the [Dump Tutorial Document](https://www.mindspore.cn/tutorials/zh-CN/master/debug/dump.html) provided on the MindSpore official website.
+    - Possible causes: Precision loss caused by an input whose dtype does not match the benchmark, etc.
+    - Solution: Align the dtype with the benchmark.
+
+#### 2. The Dataset Evaluation Score Does Not Meet the Acceptance Criteria
+
+According to the precision comparison process, the prerequisite for dataset evaluation is that the online inference responses are already logical. If there is still a significant gap between the dataset evaluation score and the benchmark data, the reason is that some responses do not align with the benchmark.
+
+Locating method: Identify the questions whose outputs do not align with the benchmark answers, extract them individually as inputs for online inference, and then locate and solve the problem following the approach in [The Response Is Logical but Does Not Align with the Benchmark](#1-the-response-is-logical-but-does-not-align-with-the-benchmark).
diff --git a/docs/mindformers/docs/source_zh_cn/index.rst b/docs/mindformers/docs/source_zh_cn/index.rst
index 93e84c2ee862f8a47d43c9395d41b463fbdac9c2..753e2340462b0102d8b025b53a7365261b78d3f1 100644
--- a/docs/mindformers/docs/source_zh_cn/index.rst
+++ b/docs/mindformers/docs/source_zh_cn/index.rst
@@ -149,6 +149,7 @@ MindSpore Transformers功能特性说明
- 精度对比
- `与 Megatron-LM 比对训练精度 `_
+   - `Inference Precision Comparison `_
环境变量
------------------------------------
@@ -221,6 +222,7 @@ FAQ
advanced_development/performance_optimization
advanced_development/dev_migration
advanced_development/yaml_config_inference
+ advanced_development/inference_precision_comparison
advanced_development/accuracy_comparison
advanced_development/api