diff --git a/DPDLDA/README.md b/DPDLDA/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..9e5964f7d7fee26064974c1aada8fd45f65babd5
--- /dev/null
+++ b/DPDLDA/README.md
@@ -0,0 +1,98 @@
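+# Dataset Overview
+
+This project uses version 1.0 of the Chinese Biomedical Language Understanding Evaluation benchmark ([CBLUE](https://github.com/CBLUEbenchmark/CBLUE)), the first multi-task leaderboard in China dedicated to Chinese medical text processing. It comprises 8 subtasks in 5 categories: medical information extraction (named entity recognition and relation extraction), medical term normalization, medical text classification, medical sentence-relation judgment, and medical question answering. The data come from a wide range of sources, including medical textbooks, electronic health records, clinical-trial registrations, and real queries from internet users. Since its release the benchmark has drawn wide attention from academia and industry, and it has gradually become a "gold standard" for evaluating how well AI systems process Chinese medical text.
+
+* CMeEE: Chinese medical named entity recognition
+* CMeIE: Chinese medical entity relation extraction
+* CHIP-CDN: clinical terminology normalization
+* CHIP-CTC: short-text classification of clinical-trial eligibility criteria
+* CHIP-STS: disease question similarity with transfer learning (Ping An Healthcare Technology)
+* KUAKE-QIC: medical search query intent classification
+* KUAKE-QTR: medical search query-title relevance
+* KUAKE-QQR: medical search query-query relevance
+
+For more information, see the CBLUE [GitHub README](https://github.com/CBLUEbenchmark/CBLUE/blob/main/README_ZH.md).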
+
+## Model Overview
+
+The overall architecture is similar to ELECTRA and consists of a generator and a discriminator. Fine-tuning uses only the discriminator, a 12-layer Transformer encoder.
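As a rough illustration of the generator/discriminator split, the sketch below computes the replaced-token-detection targets that an ELECTRA-style discriminator is pre-trained on: a position is labeled 1 if the generator's sample differs from the original token and 0 otherwise. The helper `rtd_labels`, the token IDs and the `-100` padding label are hypothetical; this is not code from this repository, and fine-tuning here only reuses the resulting discriminator.

```python
import numpy as np


def rtd_labels(original_ids, corrupted_ids, attention_mask, ignore_index=-100):
    """Replaced-token-detection targets for an ELECTRA-style discriminator.

    1 = token was replaced by the generator, 0 = token is the original.
    Padding positions get ignore_index (an assumed convention) so that a
    loss function can skip them.
    """
    original_ids = np.asarray(original_ids)
    corrupted_ids = np.asarray(corrupted_ids)
    mask = np.asarray(attention_mask).astype(bool)
    labels = (original_ids != corrupted_ids).astype(np.int64)
    labels[~mask] = ignore_index
    return labels


# Hypothetical IDs: positions 2 and 3 were masked; the generator sampled a
# different token at 2 but reproduced the original at 3, so 3 counts as original.
original = [[101, 2769, 4638, 1962, 102, 0]]
corrupted = [[101, 2769, 3221, 1962, 102, 0]]
mask = [[1, 1, 1, 1, 1, 0]]
print(rtd_labels(original, corrupted, mask).tolist())  # [[0, 0, 1, 0, 0, -100]]
```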
+
+## 快速开始
+
+### 代码结构说明
+
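+The main files in this project are:
+
+```text
+├── train_classification.py   # training and evaluation for text classification tasks
+├── model.py                  # model architecture definitions
+├── utils.py                  # data processing pipeline
+├── export_model.py           # exports dynamic-graph parameters to a static graph
+└── README.md
+```
+
+Inference code for the exported model is in the `deploy/predictor` folder.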
+
+### Installing dependencies
+
+```shell
+pip install xlrd==1.2.0
+```
+
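+### Model training
+
+Separate hyperparameter settings are provided for each of the 8 tasks. Run the command below to train on the training set and evaluate on the **validation set**.
+
+**Training setup and results**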
+
+| Task      | epochs | batch_size | learning_rate | max_seq_length |  metric  | results | results (fp16) |
+| --------- | :----: | :--------: | :-----------: | :------------: | :------: | :-----: | :------------: |
+| CHIP-STS  |    4   |     16     |      3e-5     |       96       | Macro-F1 | 0.88749 |    0.88555     |
+| CHIP-CTC  |    4   |     32     |      6e-5     |      160       | Macro-F1 | 0.84136 |    0.83514     |
+| CHIP-CDN  |   16   |    256     |      3e-5     |       32       |    F1    | 0.76979 |    0.76489     |
+| KUAKE-QQR |    2   |     32     |      6e-5     |       64       | Accuracy | 0.83865 |    0.84053     |
+| KUAKE-QTR |    4   |     32     |      6e-5     |       64       | Accuracy | 0.69722 |    0.69722     |
+| KUAKE-QIC |    4   |     32     |      6e-5     |      128       | Accuracy | 0.81483 |    0.82046     |
+| CMeEE     |    2   |     32     |      6e-5     |      128       | Micro-F1 | 0.66120 |    0.66026     |
+| CMeIE     |  100   |     12     |      6e-5     |      300       | Micro-F1 | 0.61385 |    0.60076     |
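The table mixes three metrics. For single-label classification they can all be derived from per-class true-positive, false-positive and false-negative counts, as in the sketch below. It is an illustrative implementation, not necessarily the one used by `train_classification.py`; the CMeEE and CMeIE Micro-F1 scores are computed over extracted entities or triples rather than per-sample labels, which this sketch does not cover. Note that with exactly one true and one predicted label per sample, micro-F1 equals accuracy.

```python
import numpy as np


def per_class_counts(y_true, y_pred, num_classes):
    """True-positive, false-positive and false-negative counts for each class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    classes = np.arange(num_classes)[:, None]  # shape (num_classes, 1) for broadcasting
    tp = ((y_pred == classes) & (y_true == classes)).sum(axis=1)
    fp = ((y_pred == classes) & (y_true != classes)).sum(axis=1)
    fn = ((y_pred != classes) & (y_true == classes)).sum(axis=1)
    return tp, fp, fn


def f1_from_counts(tp, fp, fn):
    # F1 = 2TP / (2TP + FP + FN), taken as 0 when a class never occurs.
    denom = 2 * tp + fp + fn
    return np.where(denom > 0, 2 * tp / np.maximum(denom, 1), 0.0)


def accuracy(y_true, y_pred):
    return float((np.asarray(y_true) == np.asarray(y_pred)).mean())


def macro_f1(y_true, y_pred, num_classes):
    # Unweighted mean of per-class F1: rare classes count as much as common ones.
    return float(f1_from_counts(*per_class_counts(y_true, y_pred, num_classes)).mean())


def micro_f1(y_true, y_pred, num_classes):
    # Pool the counts over all classes first, then compute a single F1.
    tp, fp, fn = (c.sum() for c in per_class_counts(y_true, y_pred, num_classes))
    return float(f1_from_counts(tp, fp, fn))


y_true, y_pred = [0, 0, 1, 2], [0, 1, 1, 1]
print(accuracy(y_true, y_pred), round(macro_f1(y_true, y_pred, 3), 4), micro_f1(y_true, y_pred, 3))
# 0.5 0.3889 0.5
```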
+
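+Configurable parameters:
+
+* `save_dir`: optional, directory where trained models are saved; defaults to the `checkpoints` folder in the current directory.
+* `max_seq_length`: optional, maximum sequence length used by the ELECTRA model, at most 512. Lower it if you run out of GPU memory; defaults to 128.
+* `batch_size`: optional, batch size. Adjust it to the available GPU memory and lower it if you run out of memory; defaults to 32.
+* `learning_rate`: optional, peak learning rate for fine-tuning; defaults to 6e-5.
+* `weight_decay`: optional, strength of the weight-decay regularization used to prevent overfitting; defaults to 0.01.
+* `epochs`: number of training epochs; defaults to 3.
+* `max_steps`: maximum number of training steps. If `epochs` epochs contain more steps than this, training stops early once `max_steps` is reached.
+* `valid_steps`: evaluation interval in steps; defaults to 100.
+* `save_steps`: checkpoint-saving interval in steps; defaults to 100.
+* `logging_steps`: logging interval in steps; defaults to 10.
+* `warmup_proportion`: optional, fraction of training steps used for learning-rate warmup. For example, with 0.1 the learning rate increases gradually from 0 to `learning_rate` over the first 10% of training steps and then slowly decays; defaults to 0.1.
+* `init_from_ckpt`: optional, path to model parameters from which to resume training; defaults to None.
+* `seed`: optional, random seed; defaults to 1000.
+* `device`: device used for training: cpu, gpu or npu. When training on GPU, select the cards with the `--gpus` argument of `paddle.distributed.launch`.
+* `use_amp`: whether to use mixed-precision training; defaults to False.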
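How `epochs`, `max_steps` and the warmup proportion interact can be sketched as follows. Both helpers are hypothetical, and two details are assumptions rather than documented behavior: the decay after warmup is taken to be linear down to 0 (the text above only says the rate decays slowly), and a missing or non-positive `max_steps` is treated as "no cap".

```python
def total_train_steps(epochs, steps_per_epoch, max_steps=None):
    """Optimizer steps actually run: max_steps (if set and positive) caps the epoch budget."""
    total = epochs * steps_per_epoch
    if max_steps is not None and max_steps > 0:
        total = min(total, max_steps)
    return total


def lr_at_step(step, peak_lr, total_steps, warmup_proportion=0.1):
    """Learning rate at a given step: linear warmup from 0, then (assumed) linear decay to 0."""
    warmup_steps = int(total_steps * warmup_proportion)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    decay_steps = max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / decay_steps)


# 4 epochs of 250 steps with the default settings: 100 warmup steps out of 1000.
total = total_train_steps(epochs=4, steps_per_epoch=250)
for step in (0, 50, 100, 550, 1000):
    print(step, lr_at_step(step, peak_lr=6e-5, total_steps=total))
```

With these numbers the rate reaches its 6e-5 peak at step 100, is back to half the peak at step 550, and reaches 0 at the final step.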
+
+
+#### Medical text classification
+
+```shell
+$ unset CUDA_VISIBLE_DEVICES
+$ python -m paddle.distributed.launch --gpus '0,1,2,3' train_classification.py --dataset CHIP-CDN-2C --batch_size 256 --max_seq_length 32 --learning_rate 3e-5 --epochs 16
+```
+
+Other configurable parameters:
+
+* `dataset`: optional, one of CHIP-CDN-2C, CHIP-CTC, CHIP-STS, KUAKE-QIC, KUAKE-QTR, KUAKE-QQR; defaults to KUAKE-QIC.
+
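+### Exporting a static-graph model
+
+After dynamic-graph training finishes, the dynamic-graph parameters can be exported as a static graph for deployment and inference; see `export_model.py` for the implementation. The static-graph files are saved under the path given by `output_path`, with the file prefix `inference`.
+
+Usage: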
+```shell
+python export_model.py --train_dataset CHIP-CDN-2C --params_path=./checkpoint/model_900/ --output_path=./export
+```
+
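+**NOTICE**: set `train_dataset` to the name of the dataset the classifier was trained on, and `params_path` to the checkpoint directory of the best-performing model.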
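Before moving on to deployment, it can help to confirm that the export produced both files. `export_model.py` saves with the prefix `os.path.join(output_path, "inference")`, and the deployment commands pass that prefix as `--model_path_prefix`. The sketch below assumes the Paddle 2.x `paddle.jit.save` file naming (`.pdmodel` for the program, `.pdiparams` for the weights); `exported_files` and `check_export` are hypothetical helpers.

```python
import os


def exported_files(output_path, prefix_name="inference"):
    """Paths assumed to be written by paddle.jit.save(model, os.path.join(output_path, prefix_name))."""
    prefix = os.path.join(output_path, prefix_name)
    return prefix, prefix + ".pdmodel", prefix + ".pdiparams"


def check_export(output_path):
    """Return the model path prefix if both exported files exist, else raise."""
    prefix, model_file, params_file = exported_files(output_path)
    missing = [p for p in (model_file, params_file) if not os.path.isfile(p)]
    if missing:
        raise FileNotFoundError(f"export incomplete, missing: {missing}")
    return prefix


# Example: check_export("./export") returns "./export/inference" after a successful export.
```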
diff --git a/DPDLDA/deploy/predictor/README.md b/DPDLDA/deploy/predictor/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..a24e5a71711f23a7fc4d6edb30c5f6824e6f0dc5
--- /dev/null
+++ b/DPDLDA/deploy/predictor/README.md
@@ -0,0 +1,100 @@
+# ONNX Runtime Inference Deployment Guide
+
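+This example uses models fine-tuned on CBLUE datasets and provides deployment code for text classification tasks; custom datasets can be handled by following the same implementation.
+Before deployment, the fine-tuned dynamic-graph model must be exported to a static graph; see the static-graph export section of the top-level README (`export_model.py`) for details.
+
+The main files in this directory are: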
+
+```text
+├── infer_classification.py   # argument settings and entry point for model inference
+├── predictor.py              # inference pipeline (ONNX conversion, pre- and post-processing)
+└── README.md
+```
+
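+## Environment setup
+
+ONNX model conversion and inference depend on Paddle2ONNX and ONNX Runtime. Paddle2ONNX converts Paddle static-graph models to the ONNX format. Note that `predictor.py` also imports `paddle2onnx`, `paddlenlp` and `six`, none of which are listed in the requirements files below, so install them separately.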
+
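+#### GPU
+Make sure the NVIDIA driver and base software are installed correctly, with CUDA >= 11.2 and cuDNN >= 8.2, then install the dependencies with:
+```
+python -m pip install -r requirements_gpu.txt
+```
+\* For half-precision (FP16) deployment, make sure the GPU's CUDA Compute Capability is at least 7.0.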
+
+#### CPU
+Install the dependencies with:
+```
+python -m pip install -r requirements_cpu.txt
+```
+## GPU inference example
+
+Run the following command to deploy on GPU. Use `use_fp16` to enable **FP16 inference acceleration** and `device_id` to **select the GPU**.
+
+- Text classification
+
+```
+python infer_classification.py --device gpu --device_id 0 --dataset KUAKE-QIC --model_path_prefix ../../export/inference
+```
+
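+Supported parameters:
+
+* `model_path_prefix`: required, path prefix of the inference model to use.
+* `model_name_or_path`: pretrained model whose tokenizer is loaded; defaults to "ernie-health-chinese".
+* `dataset`: the CBLUE dataset the model was trained on.
+   * Text classification: KUAKE-QIC, KUAKE-QQR, KUAKE-QTR, CHIP-CTC, CHIP-STS, CHIP-CDN-2C; defaults to KUAKE-QIC.
+* `max_seq_length`: maximum sequence length used by the model, at most 512; defaults to 128.
+* `use_fp16`: whether to enable FP16 acceleration; only takes effect when `device=gpu`; disabled by default.
+* `batch_size`: batch size. Adjust it to the available GPU memory and lower it if you run out of memory; defaults to 200.
+* `device`: device to run inference on, cpu or gpu; defaults to gpu.
+* `device_id`: GPU ID; defaults to 0.
+* `data_file`: local file with the data to predict; defaults to None.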
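`batch_size` only controls how many samples are sent to ONNX Runtime per call: the predictor slices the encoded inputs, runs each slice, concatenates the outputs, and then turns logits into a label and a confidence with a softmax. The sketch below mirrors that flow (`infer_batch` and `CLSPredictor.postprocess` in `predictor.py`), with a hypothetical `fake_model` standing in for the real inference session.

```python
import numpy as np


def infer_in_batches(run_model, input_ids, token_type_ids, batch_size):
    """Run a model over fixed-size slices and stitch the logits back together."""
    outputs = []
    for start in range(0, len(input_ids), batch_size):
        end = start + batch_size
        outputs.append(run_model(input_ids[start:end], token_type_ids[start:end]))
    return np.concatenate(outputs, axis=0)


def logits_to_labels(logits):
    """Numerically stable softmax, then the argmax label and its probability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=1, keepdims=True)
    return probs.argmax(axis=-1), probs.max(axis=-1)


# Hypothetical stand-in for the ONNX Runtime session: 2-class logits that
# favour class 1 whenever the first token ID is even.
def fake_model(input_ids, token_type_ids):
    first = input_ids[:, 0]
    return np.stack([np.zeros(len(first)), np.where(first % 2 == 0, 2.0, -2.0)], axis=1)


ids = np.array([[2, 5], [3, 5], [4, 5]], dtype="int64")
types = np.zeros_like(ids)
labels, confidence = logits_to_labels(infer_in_batches(fake_model, ids, types, batch_size=2))
print(labels.tolist())  # [1, 0, 1]
```

Because batching only changes how the work is split, a smaller `batch_size` lowers peak memory without changing the predictions.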
+
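+#### Loading a local dataset
+To use a local dataset, pass the file of samples to predict via `data_file`, one sample per line: single-text inputs take one sentence per line, and text-pair inputs separate the two texts with a `\t` (tab). For example:
+
+**ctc-data.txt**
+```
+在过去的6个月曾服用偏头痛预防性药物或长期服用镇痛药物者，以及有酒精依赖或药物滥用习惯者；
+患有严重的冠心病、脑卒中，以及传染性疾病、精神疾病者；
+活动性乙肝（包括大三阳或小三阳）或血清学指标（HBsAg或/和HBeAg或/和HBcAb）阳性者，丙肝、肺结核、巨细胞病毒、严重真菌感染或HIV感染；
+...
+```
+
+## CPU inference example
+
+Run the following command to deploy on CPU. Use `num_threads` to **adjust the number of inference threads**.
+
+- Text classification
+
+```
+python infer_classification.py --device cpu --dataset KUAKE-QIC --model_path_prefix ../../export/inference
+```
+
+Supported parameters:
+
+* `model_path_prefix`: required, path prefix of the inference model to use.
+* `model_name_or_path`: pretrained model whose tokenizer is loaded; defaults to "ernie-health-chinese".
+* `dataset`: the CBLUE dataset the model was trained on.
+   * Text classification: KUAKE-QIC, KUAKE-QQR, KUAKE-QTR, CHIP-CTC, CHIP-STS, CHIP-CDN-2C; defaults to KUAKE-QIC.
+* `max_seq_length`: maximum sequence length used by the model, at most 512; defaults to 128.
+* `batch_size`: batch size. Adjust it to the available memory and lower it if you run out of memory; defaults to 200.
+* `device`: device to run inference on, cpu or gpu; defaults to gpu.
+* `num_threads`: number of CPU threads; has little effect when `device=gpu`; defaults to the number of physical CPU cores.
+* `data_file`: local file with the data to predict, in the format described in [Loading a local dataset](#loading-a-local-dataset); defaults to None.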
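The `data_file` format above can be parsed as in the sketch below, a hypothetical `read_data_file` helper that follows the same split-on-tab logic as `main()` in `infer_classification.py` and skips blank lines. Keep every line in one file in the same format, because `CLSPredictor.preprocess` decides between single-text and text-pair input from the first sample only.

```python
import io


def read_data_file(fp):
    """Parse data_file lines: a plain line is one text, a tab-separated line is a text pair."""
    samples = []
    for line in fp:
        fields = line.strip().split("\t")
        if fields == [""]:
            continue  # skip blank lines
        samples.append(fields[0] if len(fields) == 1 else fields)
    return samples


# With a real file, open it as UTF-8 (the texts are Chinese):
#     with open(path, encoding="utf-8") as fp: samples = read_data_file(fp)
# An in-memory file keeps this demo self-contained.
pairs = read_data_file(io.StringIO("茴香是发物吗\t茴香怎么吃？\n\n气的胃疼是怎么回事\t气到胃痛是什么原因\n"))
print(pairs)
```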
+
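+## Performance and accuracy
+
+This section reports inference speed and accuracy on the CBLUE datasets for reference.
+The following results were measured on CPU.
+
+| Dataset     | Max sequence length | Metric       | FP32 score | FP32 latency (ms) |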
+| ----------  | ------------ | ------------ | ---------- | ---------------- |
+| KUAKE-QIC   | 128          | Accuracy     | 0.8046     | 37.72            |
+| KUAKE-QTR   | 64           | Accuracy     | 0.6886     | 18.40            |
+| KUAKE-QQR   | 64           | Accuracy     | 0.7755     | 10.34            |
+| CHIP-CTC    | 160          | Macro F1     | 0.8445     | 47.43            |
+| CHIP-STS    | 96           | Macro F1     | 0.8892     | 27.67            |
+| CHIP-CDN-2C | 256          | Micro F1     | 0.8921     | 26.86            |
+| CMeEE       | 128          | Micro F1     | 0.6469     | 37.59            |
+| CMeIE       | 300          | Micro F1     | 0.5902     | 213.04           |
diff --git a/DPDLDA/deploy/predictor/infer_classification.py b/DPDLDA/deploy/predictor/infer_classification.py
new file mode 100644
index 0000000000000000000000000000000000000000..b2ac865b261cf3d23693fb45ed0db1d62bc0b9aa
--- /dev/null
+++ b/DPDLDA/deploy/predictor/infer_classification.py
@@ -0,0 +1,153 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import argparse
+
+import psutil
+from predictor import CLSPredictor
+
+from paddlenlp.utils.log import logger
+
+
+def parse_args():
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        "--model_path_prefix", type=str, required=True, help="The path prefix of inference model to be used."
+    )
+    parser.add_argument(
+        "--model_name_or_path", default="ernie-health-chinese", type=str, help="The directory or name of model."
+    )
+    parser.add_argument("--dataset", default="KUAKE-QIC", type=str, help="Dataset for text classification.")
+    parser.add_argument("--data_file", default=None, type=str, help="The data to predict with one sample per line.")
+    parser.add_argument(
+        "--max_seq_length", default=128, type=int, help="The maximum total input sequence length after tokenization."
+    )
+    parser.add_argument(
+        "--use_fp16",
+        action="store_true",
+        help="Whether to use fp16 inference, only takes effect when deploying on gpu.",
+    )
+    parser.add_argument("--batch_size", default=200, type=int, help="Batch size per GPU/CPU for predicting.")
+    parser.add_argument(
+        "--num_threads", default=psutil.cpu_count(logical=False), type=int, help="num_threads for cpu."
+    )
+    parser.add_argument(
+        "--device",
+        choices=["cpu", "gpu"],
+        default="gpu",
+        help="Select which device to run inference on, defaults to gpu.",
+    )
+    parser.add_argument("--device_id", default=0, type=int, help="Select which gpu device to run inference on.")
+    args = parser.parse_args()
+    return args
+
+
+LABEL_LIST = {
+    "kuake-qic": ["病情诊断", "治疗方案", "病因分析", "指标解读", "就医建议", "疾病表述", "后果表述", "注意事项", "功效作用", "医疗费用", "其他"],
+    "kuake-qtr": ["完全不匹配", "很少匹配，有一些参考价值", "部分匹配", "完全匹配"],
+    "kuake-qqr": ["B为A的语义父集，B指代范围大于A； 或者A与B语义毫无关联。", "B为A的语义子集，B指代范围小于A。", "表示A与B等价，表述完全一致。"],
+    "chip-ctc": [
+        "成瘾行为",
+        "居住情况",
+        "年龄",
+        "酒精使用",
+        "过敏耐受",
+        "睡眠",
+        "献血",
+        "能力",
+        "依存性",
+        "知情同意",
+        "数据可及性",
+        "设备",
+        "诊断",
+        "饮食",
+        "残疾群体",
+        "疾病",
+        "教育情况",
+        "病例来源",
+        "参与其它试验",
+        "伦理审查",
+        "种族",
+        "锻炼",
+        "性别",
+        "健康群体",
+        "实验室检查",
+        "预期寿命",
+        "读写能力",
+        "含有多类别的语句",
+        "肿瘤进展",
+        "疾病分期",
+        "护理",
+        "口腔相关",
+        "器官组织状态",
+        "药物",
+        "怀孕相关",
+        "受体状态",
+        "研究者决定",
+        "风险评估",
+        "性取向",
+        "体征(医生检测）",
+        " 吸烟状况",
+        "特殊病人特征",
+        "症状(患者感受)",
+        "治疗或手术",
+    ],
+    "chip-sts": ["语义不同", "语义相同"],
+    "chip-cdn-2c": ["否", "是"],
+}
+
+TEXT = {
+    "kuake-qic": ["心肌缺血如何治疗与调养呢？", "什么叫痔核脱出？什么叫外痔？"],
+    "kuake-qtr": [["儿童远视眼怎么恢复视力", "近视眼该如何保养才能恢复一些视力"], ["维生素的药有哪些", "抗生素类的药物都有哪些？"],["儿童远视眼怎么恢复视力", "抗生素类的药物都有哪些？"]],
+    "kuake-qqr": [["茴香是发物吗", "茴香怎么吃？"], ["气的胃疼是怎么回事", "气到胃痛是什么原因"]],
+    "chip-ctc": ["(1)前牙结构发育不良：釉质发育不全、氟斑牙、四环素牙等；", "怀疑或确有酒精或药物滥用史；"],
+    "chip-sts": [["糖尿病能吃减肥药吗？能治愈吗？", "糖尿病为什么不能吃减肥药"], ["H型高血压的定义", "WHO对高血压的最新分类定义标准数值"],["糖尿病能吃减肥药吗？能治愈吗？", "WHO对高血压的最新分类定义标准数值"]],
+    "chip-cdn-2c": [["1型糖尿病性植物神经病变", " 1型糖尿病肾病IV期"], ["髂腰肌囊性占位", "髂肌囊肿"],["1型糖尿病性植物神经病变", "髂肌囊肿"]],
+}
+
+METRIC = {
+    "kuake-qic": "acc",
+    "kuake-qtr": "acc",
+    "kuake-qqr": "acc",
+    "chip-ctc": "macro",
+    "chip-sts": "macro",
+    "chip-cdn-2c": "macro",
+}
+
+
+def main():
+    args = parse_args()
+
+    for arg_name, arg_value in vars(args).items():
+        logger.info("{:20}: {}".format(arg_name, arg_value))
+
+    args.dataset = args.dataset.lower()
+    label_list = LABEL_LIST[args.dataset]
+    if args.data_file is not None:
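+        with open(args.data_file, "r", encoding="utf-8") as fp:
+            # One sample per line; a tab separates the two texts of a pair. Blank lines are skipped.
+            input_data = [x.strip().split("\t") for x in fp if x.strip()]
+            input_data = [x[0] if len(x) == 1 else x for x in input_data]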
+    else:
+        input_data = TEXT[args.dataset]
+
+    predictor = CLSPredictor(args, label_list)
+    predictor.predict(input_data)
+
+
+if __name__ == "__main__":
+    main()
diff --git a/DPDLDA/deploy/predictor/predictor.py b/DPDLDA/deploy/predictor/predictor.py
new file mode 100644
index 0000000000000000000000000000000000000000..75c8f778731b028b86d0cdc8af495eaa4b2af8be
--- /dev/null
+++ b/DPDLDA/deploy/predictor/predictor.py
@@ -0,0 +1,366 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import time
+
+import numpy as np
+import onnxruntime as ort
+import paddle2onnx
+import six
+
+from paddlenlp.transformers import (
+    AutoTokenizer,
+    normalize_chars,
+    tokenize_special_chars,
+)
+from paddlenlp.utils.log import logger
+
+
+class InferBackend(object):
+    def __init__(self, model_path_prefix, device="cpu", device_id=0, use_fp16=False, num_threads=10):
+
+        if not isinstance(device, six.string_types):
+            raise TypeError(
+                ">>> [InferBackend] The type of device must be string, but the type you set is: {}".format(type(device))
+            )
+        if device not in ["cpu", "gpu"]:
+            raise ValueError(">>> [InferBackend] The device must be cpu or gpu, but your device is set to: {}".format(device))
+
+        logger.info(">>> [InferBackend] Creating Engine ...")
+
+        onnx_model = paddle2onnx.command.c_paddle_to_onnx(
+            model_file=model_path_prefix + ".pdmodel",
+            params_file=model_path_prefix + ".pdiparams",
+            opset_version=13,
+            enable_onnx_checker=True,
+        )
+
+        # Save the converted model next to the Paddle files (e.g. ../../export/model.onnx).
+        infer_model_dir = os.path.dirname(model_path_prefix)
+        float_onnx_file = os.path.join(infer_model_dir, "model.onnx")
+        with open(float_onnx_file, "wb") as f:
+            f.write(onnx_model)
+
+        if device == "gpu":
+            logger.info(">>> [InferBackend] Use GPU to inference ...")
+            providers = ["CUDAExecutionProvider"]
+            if use_fp16:
+                logger.info(">>> [InferBackend] Use FP16 to inference ...")
+                import onnx
+                from onnxconverter_common import float16
+
+                fp16_model_file = os.path.join(infer_model_dir, "fp16_model.onnx")
+                onnx_model = onnx.load_model(float_onnx_file)
+                trans_model = float16.convert_float_to_float16(onnx_model, keep_io_types=True)
+                onnx.save_model(trans_model, fp16_model_file)
+                onnx_model = fp16_model_file
+        else:
+            logger.info(">>> [InferBackend] Use CPU to inference ...")
+            providers = ["CPUExecutionProvider"]
+            if use_fp16:
+                logger.warning(
+                    ">>> [InferBackend] Ignore use_fp16 as it only " + "takes effect when deploying on gpu..."
+                )
+
+        sess_options = ort.SessionOptions()
+        sess_options.intra_op_num_threads = num_threads
+        self.predictor = ort.InferenceSession(
+            onnx_model, sess_options=sess_options, providers=providers, provider_options=[{"device_id": device_id}]
+        )
+
+        self.input_handles = [
+            self.predictor.get_inputs()[0].name,
+            self.predictor.get_inputs()[1].name,
+        ]
+
+        if device == "gpu":
+            try:
+                assert "CUDAExecutionProvider" in self.predictor.get_providers()
+            except AssertionError:
+                raise AssertionError(
+                    """The environment for GPU inference is not set properly. \nA possible cause is that you had installed both onnxruntime and onnxruntime-gpu. \nPlease run the following commands to reinstall: \n1) pip uninstall -y onnxruntime onnxruntime-gpu  \n2) pip install onnxruntime-gpu"""
+                )
+        logger.info(">>> [InferBackend] Engine Created ...")
+
+    def infer(self, input_dict: dict):
+        input_dict = {k: v for k, v in input_dict.items() if k in self.input_handles}
+        result = self.predictor.run(None, input_dict)
+        return result
+
+
+class EHealthPredictor(object):
+    def __init__(self, args, label_list):
+        self.label_list = label_list
+        self._tokenizer = AutoTokenizer.from_pretrained(args.model_name_or_path, use_fast=True)
+        self._max_seq_length = args.max_seq_length
+        self._batch_size = args.batch_size
+        self.inference_backend = InferBackend(
+            args.model_path_prefix, args.device, args.device_id, args.use_fp16, args.num_threads
+        )
+
+    def predict(self, input_data: list):
+        encoded_inputs = self.preprocess(input_data)
+        infer_result = self.infer_batch(encoded_inputs)
+        result = self.postprocess(infer_result)
+        self.printer(result, input_data)
+        return result
+
+    def _infer(self, input_dict):
+        infer_data = self.inference_backend.infer(input_dict)
+        return infer_data
+
+    def infer_batch(self, encoded_inputs):
+        num_sample = len(encoded_inputs["input_ids"])
+        infer_data = None
+        num_infer_data = None
+        for idx in range(0, num_sample, self._batch_size):
+            l, r = idx, idx + self._batch_size
+            keys = encoded_inputs.keys()
+            input_dict = {k: encoded_inputs[k][l:r] for k in keys}
+            results = self._infer(input_dict)
+            if infer_data is None:
+                infer_data = [[x] for x in results]
+                num_infer_data = len(results)
+            else:
+                for i in range(num_infer_data):
+                    infer_data[i].append(results[i])
+        for i in range(num_infer_data):
+            infer_data[i] = np.concatenate(infer_data[i], axis=0)
+        return infer_data
+
+    def performance(self, encoded_inputs):
+        nums = len(encoded_inputs["input_ids"])
+        start_time = time.time()
+        self.infer_batch(encoded_inputs)
+        total_time = time.time() - start_time
+        logger.info("sample nums: %d, time: %.2f, latency: %.2f ms" % (nums, total_time, 1000 * total_time / nums))
+
+    def get_text_and_label(self, dataset):
+        raise NotImplementedError
+
+    def preprocess(self, input_data: list):
+        raise NotImplementedError
+
+    def postprocess(self, infer_data):
+        raise NotImplementedError
+
+    def printer(self, result, input_data):
+        raise NotImplementedError
+
+
+class CLSPredictor(EHealthPredictor):
+    def preprocess(self, input_data: list):
+        norm_text = lambda x: tokenize_special_chars(normalize_chars(x))
+        # To deal with a pair of input text.
+        if isinstance(input_data[0], list):
+            text = [norm_text(sample[0]) for sample in input_data]
+            text_pair = [norm_text(sample[1]) for sample in input_data]
+        else:
+            text = [norm_text(x) for x in input_data]
+            text_pair = None
+
+        data = self._tokenizer(
+            text=text, text_pair=text_pair, max_length=self._max_seq_length, padding=True, truncation=True
+        )
+
+        encoded_inputs = {
+            "input_ids": np.array(data["input_ids"], dtype="int64"),
+            "token_type_ids": np.array(data["token_type_ids"], dtype="int64"),
+        }
+        return encoded_inputs
+
+    def postprocess(self, infer_data):
+        infer_data = infer_data[0]
+        max_value = np.max(infer_data, axis=1, keepdims=True)
+        exp_data = np.exp(infer_data - max_value)
+        probs = exp_data / np.sum(exp_data, axis=1, keepdims=True)
+        label = probs.argmax(axis=-1)
+        confidence = probs.max(axis=-1)
+        return {"label": label, "confidence": confidence}
+
+    def printer(self, result, input_data):
+        label, confidence = result["label"], result["confidence"]
+        for i in range(len(label)):
+            logger.info("input data: {}".format(input_data[i]))
+            logger.info("labels: {}, confidence: {}".format(self.label_list[label[i]], confidence[i]))
+            logger.info("-----------------------------")
+
+
+class NERPredictor(EHealthPredictor):
+    """The predictor for CMeEE dataset."""
+
+    en_to_cn = {
+        "bod": "身体",
+        "mic": "微生物类",
+        "dis": "疾病",
+        "sym": "临床表现",
+        "pro": "医疗程序",
+        "equ": "医疗设备",
+        "dru": "药物",
+        "dep": "科室",
+        "ite": "医学检验项目",
+    }
+
+    def _extract_chunk(self, tokens):
+        chunks = set()
+        start_idx, cur_idx = 0, 0
+        while cur_idx < len(tokens):
+            if tokens[cur_idx][0] == "B":
+                start_idx = cur_idx
+                cur_idx += 1
+                while cur_idx < len(tokens) and tokens[cur_idx][0] == "I":
+                    if tokens[cur_idx][2:] == tokens[start_idx][2:]:
+                        cur_idx += 1
+                    else:
+                        break
+                if cur_idx < len(tokens) and tokens[cur_idx][0] == "E":
+                    if tokens[cur_idx][2:] == tokens[start_idx][2:]:
+                        chunks.add((tokens[cur_idx][2:], start_idx - 1, cur_idx))
+                        cur_idx += 1
+            elif tokens[cur_idx][0] == "S":
+                chunks.add((tokens[cur_idx][2:], cur_idx - 1, cur_idx))
+                cur_idx += 1
+            else:
+                cur_idx += 1
+        return list(chunks)
+
+    def preprocess(self, infer_data):
+        infer_data = [[x.lower() for x in text] for text in infer_data]
+        data = self._tokenizer(
+            infer_data, max_length=self._max_seq_length, padding=True, is_split_into_words=True, truncation=True
+        )
+
+        encoded_inputs = {
+            "input_ids": np.array(data["input_ids"], dtype="int64"),
+            "token_type_ids": np.array(data["token_type_ids"], dtype="int64"),
+        }
+        return encoded_inputs
+
+    def postprocess(self, infer_data):
+        tokens_oth = np.argmax(infer_data[0], axis=-1)
+        tokens_sym = np.argmax(infer_data[1], axis=-1)
+        entity = []
+        for oth_ids, sym_ids in zip(tokens_oth, tokens_sym):
+            token_oth = [self.label_list[0][x] for x in oth_ids]
+            token_sym = [self.label_list[1][x] for x in sym_ids]
+            chunks = self._extract_chunk(token_oth) + self._extract_chunk(token_sym)
+            sub_entity = []
+            for etype, sid, eid in chunks:
+                sub_entity.append({"type": self.en_to_cn[etype], "start_id": sid, "end_id": eid})
+            entity.append(sub_entity)
+        return {"entity": entity}
+
+    def printer(self, result, input_data):
+        result = result["entity"]
+        for i, preds in enumerate(result):
+            logger.info("input data: {}".format(input_data[i]))
+            logger.info("detected entities:")
+            for item in preds:
+                logger.info(
+                    "* entity: {}, type: {}, position: ({}, {})".format(
+                        input_data[i][item["start_id"] : item["end_id"]],
+                        item["type"],
+                        item["start_id"],
+                        item["end_id"],
+                    )
+                )
+            logger.info("-----------------------------")
+
+
+class SPOPredictor(EHealthPredictor):
+    """The predictor for the CMeIE dataset."""
+
+    def predict(self, input_data: list):
+        encoded_inputs = self.preprocess(input_data)
+        lengths = encoded_inputs["attention_mask"].sum(axis=-1)
+        infer_result = self.infer_batch(encoded_inputs)
+        result = self.postprocess(infer_result, lengths)
+        self.printer(result, input_data)
+        return result
+
+    def preprocess(self, infer_data):
+        infer_data = [[x.lower() for x in text] for text in infer_data]
+        data = self._tokenizer(
+            infer_data,
+            max_length=self._max_seq_length,
+            padding=True,
+            is_split_into_words=True,
+            truncation=True,
+            return_attention_mask=True,
+        )
+        encoded_inputs = {
+            "input_ids": np.array(data["input_ids"], dtype="int64"),
+            "token_type_ids": np.array(data["token_type_ids"], dtype="int64"),
+            "attention_mask": np.array(data["attention_mask"], dtype="float32"),
+        }
+        return encoded_inputs
+
+    def postprocess(self, infer_data, lengths):
+        ent_logits = np.array(infer_data[0])
+        spo_logits = np.array(infer_data[1])
+        ent_pred_list = []
+        ent_idxs_list = []
+        for idx, ent_pred in enumerate(ent_logits):
+            seq_len = lengths[idx] - 2
+            start = np.where(ent_pred[:, 0] > 0.5)[0]
+            end = np.where(ent_pred[:, 1] > 0.5)[0]
+            ent_pred = []
+            ent_idxs = {}
+            for x in start:
+                y = end[end >= x]
+                if (x == 0) or (x > seq_len):
+                    continue
+                if len(y) > 0:
+                    y = y[0]
+                    if y > seq_len:
+                        continue
+                    ent_idxs[x] = (x - 1, y - 1)
+                    ent_pred.append((x - 1, y - 1))
+            ent_pred_list.append(ent_pred)
+            ent_idxs_list.append(ent_idxs)
+
+        spo_preds = spo_logits > 0
+        spo_pred_list = [[] for _ in range(len(spo_preds))]
+        idxs, preds, subs, objs = np.nonzero(spo_preds)
+        for idx, p_id, s_id, o_id in zip(idxs, preds, subs, objs):
+            obj = ent_idxs_list[idx].get(o_id, None)
+            if obj is None:
+                continue
+            sub = ent_idxs_list[idx].get(s_id, None)
+            if sub is None:
+                continue
+            spo_pred_list[idx].append((tuple(sub), p_id, tuple(obj)))
+
+        return {"entity": ent_pred_list, "spo": spo_pred_list}
+
+    def printer(self, result, input_data):
+        ent_pred_list, spo_pred_list = result["entity"], result["spo"]
+        for i, (ent, rel) in enumerate(zip(ent_pred_list, spo_pred_list)):
+            logger.info("input data: {}".format(input_data[i]))
+            logger.info("detected entities and relations:")
+            for sid, eid in ent:
+                logger.info("* entity: {}, position: ({}, {})".format(input_data[i][sid : eid + 1], sid, eid))
+            for s, p, o in rel:
+                logger.info(
+                    "+ spo: ({}, {}, {})".format(
+                        input_data[i][s[0] : s[1] + 1], self.label_list[p], input_data[i][o[0] : o[1] + 1]
+                    )
+                )
+            logger.info("-----------------------------")
diff --git a/DPDLDA/deploy/predictor/requirements_cpu.txt b/DPDLDA/deploy/predictor/requirements_cpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..645682ec79c6c8694ee9ea288af3dc3c416a4dfb
--- /dev/null
+++ b/DPDLDA/deploy/predictor/requirements_cpu.txt
@@ -0,0 +1,2 @@
+onnxruntime==1.10.0
+psutil
diff --git a/DPDLDA/deploy/predictor/requirements_gpu.txt b/DPDLDA/deploy/predictor/requirements_gpu.txt
new file mode 100644
index 0000000000000000000000000000000000000000..2ca8b172eb7993140d6f5e2c3692a200195dd1ee
--- /dev/null
+++ b/DPDLDA/deploy/predictor/requirements_gpu.txt
@@ -0,0 +1,4 @@
+onnxruntime-gpu==1.11.1
+onnx==1.12.0
+onnxconverter-common==1.9.0
+psutil
diff --git a/DPDLDA/export_model.py b/DPDLDA/export_model.py
new file mode 100644
index 0000000000000000000000000000000000000000..2ab246ef2955bcb3271114f4b1941725b2fca34d
--- /dev/null
+++ b/DPDLDA/export_model.py
@@ -0,0 +1,97 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import argparse
+import os
+
+import paddle
+from model import ElectraForBinaryTokenClassification, ElectraForSPO
+
+from paddlenlp.transformers import ElectraForSequenceClassification
+
+NUM_CLASSES = {
+    "CHIP-CDN-2C": 2,
+    "CHIP-STS": 2,
+    "CHIP-CTC": 44,
+    "KUAKE-QQR": 3,
+    "KUAKE-QTR": 4,
+    "KUAKE-QIC": 11,
+    "CMeEE": [33, 5],
+    "CMeIE": 44,
+}
+
+
+def parse_args():
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        "--train_dataset", default="CHIP-STS", type=str, help="The name of dataset used for training."
+    )
+    parser.add_argument(
+        "--params_path",
+        type=str,
+        default="./checkpoint/model_1500/",
+        help="The path to model parameters to be loaded.",
+    )
+    parser.add_argument(
+        "--output_path", type=str, default="./export", help="The path of model parameter in static graph to be saved."
+    )
+    args = parser.parse_args()
+    return args
+
+
+def main():
+    args = parse_args()
+
+    # Load the model parameters.
+    if args.train_dataset not in NUM_CLASSES:
+        raise ValueError(f"Please modify the code to fit {args.train_dataset}")
+
+    if args.train_dataset == "CMeEE":
+        model = ElectraForBinaryTokenClassification.from_pretrained(
+            args.params_path,
+            num_classes_oth=NUM_CLASSES[args.train_dataset][0],
+            num_classes_sym=NUM_CLASSES[args.train_dataset][1],
+        )
+    elif args.train_dataset == "CMeIE":
+        model = ElectraForSPO.from_pretrained(args.params_path, num_labels=NUM_CLASSES[args.train_dataset])
+    else:
+        model = ElectraForSequenceClassification.from_pretrained(
+            args.params_path, num_labels=NUM_CLASSES[args.train_dataset]
+        )
+
+    model.eval()
+
+    # Convert to static graph with specific input description:
+    # input_ids, token_type_ids
+    input_spec = [
+        paddle.static.InputSpec(shape=[None, None], dtype="int64"),
+        paddle.static.InputSpec(shape=[None, None], dtype="int64"),
+    ]
+    model = paddle.jit.to_static(model, input_spec=input_spec)
+
+    # Save in static graph model.
+    save_path = os.path.join(args.output_path, "inference")
+    paddle.jit.save(model, save_path)
+
+
+if __name__ == "__main__":
+    main()
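In `export_model.py`, `main()` picks the task head from `--train_dataset` before tracing the model into a static graph. The pure-Python sketch below mirrors that dispatch so it can be checked without Paddle. `select_head` and the label counts in `EXAMPLE_NUM_CLASSES` are hypothetical placeholders, not the values of the real `NUM_CLASSES` table defined outside this excerpt.

```python
# Placeholder label counts for illustration only; the real NUM_CLASSES table
# used by export_model.py is defined outside this excerpt.
EXAMPLE_NUM_CLASSES = {"CMeEE": [10, 3], "CMeIE": 7, "CHIP-STS": 2}


def select_head(train_dataset, num_classes=EXAMPLE_NUM_CLASSES):
    """Return (head class name, from_pretrained kwargs), mirroring main()."""
    if train_dataset not in num_classes:
        raise ValueError(f"Please modify the code to fit {train_dataset}")
    if train_dataset == "CMeEE":
        # NER uses two token-level heads: "other" entities and symptoms.
        num_oth, num_sym = num_classes[train_dataset]
        return "ElectraForBinaryTokenClassification", {"num_classes_oth": num_oth, "num_classes_sym": num_sym}
    if train_dataset == "CMeIE":
        # Relation extraction uses one span-attention head per predicate.
        return "ElectraForSPO", {"num_labels": num_classes[train_dataset]}
    # Every other dataset in the table is treated as sequence classification.
    return "ElectraForSequenceClassification", {"num_labels": num_classes[train_dataset]}


print(select_head("CHIP-STS"))
```

Whichever head is chosen, the exported graph is traced with only two `InputSpec`s (`input_ids`, `token_type_ids`), so `position_ids` and `attention_mask` are left as `None` in the traced `forward`.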
diff --git a/DPDLDA/model.py b/DPDLDA/model.py
new file mode 100644
index 0000000000000000000000000000000000000000..f93da173e2bbc1becb43cd7f803d200daa49c5b8
--- /dev/null
+++ b/DPDLDA/model.py
@@ -0,0 +1,123 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import paddle
+import paddle.nn as nn
+
+from paddlenlp.transformers import ElectraConfig, ElectraModel, ElectraPretrainedModel
+
+
+class ElectraForBinaryTokenClassification(ElectraPretrainedModel):
+    """
+    Electra Model with two linear layers on top of the hidden-states output layers,
+    designed for token classification tasks with nesting.
+
+    Args:
+        config (:class:`ElectraConfig`):
+            The configuration used to build the underlying `ElectraModel`.
+            Its `hidden_dropout_prob` is used as the dropout probability
+            for the Electra output.
+        num_classes_oth (int):
+            The number of label classes for non-symptom ("other") entities.
+        num_classes_sym (int):
+            The number of label classes for symptom entities.
+    """
+
+    def __init__(self, config: ElectraConfig, num_classes_oth, num_classes_sym):
+        super(ElectraForBinaryTokenClassification, self).__init__(config)
+        self.num_classes_oth = num_classes_oth
+        self.num_classes_sym = num_classes_sym
+        self.electra = ElectraModel(config)
+        self.dropout = nn.Dropout(config.hidden_dropout_prob)
+        self.classifier_oth = nn.Linear(config.hidden_size, self.num_classes_oth)
+        self.classifier_sym = nn.Linear(config.hidden_size, self.num_classes_sym)
+
+    def forward(self, input_ids=None, token_type_ids=None, position_ids=None, attention_mask=None):
+        sequence_output = self.electra(input_ids, token_type_ids, position_ids, attention_mask)
+        sequence_output = self.dropout(sequence_output)
+
+        logits_sym = self.classifier_sym(sequence_output)
+        logits_oth = self.classifier_oth(sequence_output)
+
+        return logits_oth, logits_sym
+
+
+class MultiHeadAttentionForSPO(nn.Layer):
+    """
+    Multi-head attention layer for SPO task.
+    """
+
+    def __init__(self, embed_dim, num_heads, scale_value=768):
+        super(MultiHeadAttentionForSPO, self).__init__()
+        self.embed_dim = embed_dim
+        self.num_heads = num_heads
+        self.scale_value = scale_value**-0.5
+        self.q_proj = nn.Linear(embed_dim, embed_dim * num_heads)
+        self.k_proj = nn.Linear(embed_dim, embed_dim * num_heads)
+
+    def forward(self, query, key):
+        q = self.q_proj(query)
+        k = self.k_proj(key)
+        q = paddle.reshape(q, shape=[0, 0, self.num_heads, self.embed_dim])
+        k = paddle.reshape(k, shape=[0, 0, self.num_heads, self.embed_dim])
+        q = paddle.transpose(q, perm=[0, 2, 1, 3])
+        k = paddle.transpose(k, perm=[0, 2, 1, 3])
+        scores = paddle.matmul(q, k, transpose_y=True)
+        scores = paddle.scale(scores, scale=self.scale_value)
+        return scores
+
+
+class ElectraForSPO(ElectraPretrainedModel):
+    """
+    Electra Model with a linear layer on top of the hidden-states output
+    layers for entity recognition, and a multi-head attention layer for
+    relation classification.
+
+    Args:
+        config (:class:`ElectraConfig`):
+            The configuration used to build the underlying `ElectraModel`.
+            `config.num_labels` is the number of predicate classes (one
+            span-attention head per predicate), and `config.hidden_dropout_prob`
+            is used as the dropout probability for the Electra output.
+    """
+
+    def __init__(self, config: ElectraConfig):
+        super(ElectraForSPO, self).__init__(config)
+        self.num_classes = config.num_labels
+        self.electra = ElectraModel(config)
+        self.dropout = nn.Dropout(config.hidden_dropout_prob)
+        self.classifier = nn.Linear(config.hidden_size, 2)
+        self.span_attention = MultiHeadAttentionForSPO(config.hidden_size, config.num_labels)
+
+    def forward(self, input_ids=None, token_type_ids=None, position_ids=None, attention_mask=None):
+        outputs = self.electra(
+            input_ids, token_type_ids, position_ids, attention_mask, output_hidden_states=True, return_dict=True
+        )
+        sequence_outputs = outputs.last_hidden_state
+        all_hidden_states = outputs.hidden_states
+        sequence_outputs = self.dropout(sequence_outputs)
+        ent_logits = self.classifier(sequence_outputs)
+
+        subject_output = all_hidden_states[-2]
+        cls_output = paddle.unsqueeze(sequence_outputs[:, 0, :], axis=1)
+        subject_output = subject_output + cls_output
+
+        rel_logits = self.span_attention(sequence_outputs, subject_output)
+
+        return ent_logits, rel_logits
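`MultiHeadAttentionForSPO` treats each predicate as one attention head whose width is the full `embed_dim`, so `rel_logits` comes out as `[batch, num_labels, seq_len, seq_len]` token-pair scores. The NumPy sketch below mirrors its reshape, transpose and matmul steps to make the shapes explicit. `spo_scores` is a hypothetical helper, and it assumes Paddle's `nn.Linear` convention `y = x @ W + b` with `W` of shape `[in_features, out_features]`. Because `ElectraForSPO` leaves `scale_value` at its default of 768, scores are divided by `sqrt(768)` whatever the `hidden_size`.

```python
import numpy as np


def spo_scores(query, key, w_q, b_q, w_k, b_k, num_heads, scale_value=768):
    """NumPy mirror of MultiHeadAttentionForSPO.forward (hypothetical helper).

    query, key: [batch, seq_len, embed_dim]
    w_q, w_k:   [embed_dim, embed_dim * num_heads] (assumed y = x @ W + b layout)
    b_q, b_k:   [embed_dim * num_heads]
    Returns scores of shape [batch, num_heads, seq_len, seq_len].
    """
    batch, seq_len, embed_dim = query.shape
    # Project, then split the last axis into one embed_dim-wide slice per head.
    q = (query @ w_q + b_q).reshape(batch, seq_len, num_heads, embed_dim)
    k = (key @ w_k + b_k).reshape(batch, key.shape[1], num_heads, embed_dim)
    # [B, L, H, D] -> [B, H, L, D], like paddle.transpose(perm=[0, 2, 1, 3]).
    q = q.transpose(0, 2, 1, 3)
    k = k.transpose(0, 2, 1, 3)
    # Dot product over D, like paddle.matmul(q, k, transpose_y=True).
    scores = q @ k.transpose(0, 1, 3, 2)
    # Same fixed scaling as paddle.scale(scores, scale=scale_value**-0.5).
    return scores * scale_value ** -0.5


rng = np.random.default_rng(0)
batch, seq_len, hidden, heads = 2, 5, 8, 3
x = rng.standard_normal((batch, seq_len, hidden))
w = rng.standard_normal((hidden, hidden * heads))
b = np.zeros(hidden * heads)
print(spo_scores(x, x, w, b, w, b, num_heads=heads).shape)  # (2, 3, 5, 5)
```

In `ElectraForSPO` the query is the dropped-out last hidden state and the key is the second-to-last hidden layer plus the `[CLS]` vector, so `rel_logits[b, p, i, j]` scores predicate `p` between query position `i` and key position `j`.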
diff --git a/DPDLDA/train_classification.py b/DPDLDA/train_classification.py
new file mode 100644
index 0000000000000000000000000000000000000000..12c259ddbc1401975728928177f5a595e29718b6
--- /dev/null
+++ b/DPDLDA/train_classification.py
@@ -0,0 +1,265 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import argparse
+import distutils.util
+import os
+import random
+import time
+from functools import partial
+
+import numpy as np
+import paddle
+import paddle.nn.functional as F
+from paddle.metric import Accuracy
+from utils import LinearDecayWithWarmup, convert_example, create_dataloader
+
+from paddlenlp.data import Pad, Stack, Tuple
+from paddlenlp.datasets import load_dataset
+from paddlenlp.metrics import AccuracyAndF1, MultiLabelsMetric
+from paddlenlp.transformers import ElectraForSequenceClassification, ElectraTokenizer
+
+METRIC_CLASSES = {
+    "KUAKE-QIC": Accuracy,
+    "KUAKE-QQR": Accuracy,
+    "KUAKE-QTR": Accuracy,
+    "CHIP-CTC": MultiLabelsMetric,
+    "CHIP-STS": MultiLabelsMetric,
+    "CHIP-CDN-2C": AccuracyAndF1,
+}
+
+parser = argparse.ArgumentParser()
+parser.add_argument(
+    "--dataset",
+    choices=["KUAKE-QIC", "KUAKE-QQR", "KUAKE-QTR", "CHIP-STS", "CHIP-CTC", "CHIP-CDN-2C"],
+    default="CHIP-STS",
+    type=str,
+    help="Dataset for sequence classification tasks.",
+)
+parser.add_argument("--seed", default=1000, type=int, help="Random seed for initialization.")
+parser.add_argument(
+    "--device",
+    choices=["cpu", "gpu", "xpu", "npu"],
+    default="cpu",
+    help="Select which device to train the model on, defaults to cpu.",
+)
+parser.add_argument("--epochs", default=3, type=int, help="Total number of training epochs.")
+parser.add_argument(
+    "--max_steps", default=-1, type=int, help="If > 0, the total number of training steps to perform; overrides --epochs."
+)
+parser.add_argument("--batch_size", default=32, type=int, help="Batch size per GPU/CPU for training.")
+parser.add_argument(
+    "--learning_rate", default=6e-5, type=float, help="Learning rate for fine-tuning sequence classification task."
+)
+parser.add_argument("--weight_decay", default=0.01, type=float, help="Weight decay for AdamW (not applied to bias and LayerNorm parameters).")
+parser.add_argument(
+    "--warmup_proportion",
+    default=0.1,
+    type=float,
+    help="Linear warmup proportion of learning rate over the training process.",
+)
+parser.add_argument(
+    "--max_seq_length", default=128, type=int, help="The maximum total input sequence length after tokenization."
+)
+parser.add_argument("--init_from_ckpt", default=None, type=str, help="The path of checkpoint to be loaded.")
+parser.add_argument("--logging_steps", default=10, type=int, help="Number of steps between training log outputs.")
+parser.add_argument(
+    "--save_dir",
+    default="./checkpoint",
+    type=str,
+    help="The output directory where the model checkpoints will be written.",
+)
+parser.add_argument("--save_steps", default=20, type=int, help="Number of steps between checkpoint saves.")
+parser.add_argument("--valid_steps", default=20, type=int, help="Number of steps between evaluations on the dev set.")
+parser.add_argument("--use_amp", default=False, type=distutils.util.strtobool, help="Enable mixed precision training.")
+parser.add_argument("--scale_loss", default=128, type=float, help="The value of scale_loss for fp16.")
+
+args = parser.parse_args()
+
+
+def set_seed(seed):
+    """Set the random seed for Python, NumPy and Paddle."""
+    random.seed(seed)
+    np.random.seed(seed)
+    paddle.seed(seed)
+
+
+@paddle.no_grad()
+def evaluate(model, criterion, metric, data_loader):
+    """
+    Evaluates the model on the given data loader and prints the loss and metric.
+
+    Args:
+        model(obj:`paddle.nn.Layer`): A model to classify texts.
+        criterion(obj:`paddle.nn.Layer`): The loss function.
+        metric(obj:`paddle.metric.Metric`): The evaluation metric.
+        data_loader(obj:`paddle.io.DataLoader`): The dataset loader which generates batches.
+    """
+    model.eval()
+    metric.reset()
+    losses = []
+    for batch in data_loader:
+        input_ids, token_type_ids, position_ids, labels = batch
+        logits = model(input_ids, token_type_ids, position_ids)
+        loss = criterion(logits, labels)
+        losses.append(loss.numpy())
+        correct = metric.compute(logits, labels)
+        metric.update(correct)
+    if isinstance(metric, Accuracy):
+        metric_name = "accuracy"
+        result = metric.accumulate()
+    elif isinstance(metric, MultiLabelsMetric):
+        metric_name = "macro f1"
+        _, _, result = metric.accumulate("macro")
+    else:
+        metric_name = "micro f1"
+        _, _, _, result, _ = metric.accumulate()
+
+    print("eval loss: %.5f, %s: %.5f" % (np.mean(losses), metric_name, result))
+    model.train()
+    metric.reset()
+
+
+def do_train():
+    paddle.set_device(args.device)
+    rank = paddle.distributed.get_rank()
+    if paddle.distributed.get_world_size() > 1:
+        paddle.distributed.init_parallel_env()
+
+    set_seed(args.seed)
+
+    train_ds, dev_ds = load_dataset("cblue", args.dataset, splits=["train", "dev"])
+
+    model = ElectraForSequenceClassification.from_pretrained(
+        "ernie-health-chinese", num_labels=len(train_ds.label_list)
+    )
+    tokenizer = ElectraTokenizer.from_pretrained("ernie-health-chinese")
+
+    trans_func = partial(convert_example, tokenizer=tokenizer, max_seq_length=args.max_seq_length)
+    batchify_fn = lambda samples, fn=Tuple(  # noqa: E731
+        Pad(axis=0, pad_val=tokenizer.pad_token_id, dtype="int64"),  # input
+        Pad(axis=0, pad_val=tokenizer.pad_token_type_id, dtype="int64"),  # segment
+        Pad(axis=0, pad_val=args.max_seq_length - 1, dtype="int64"),  # position
+        Stack(dtype="int64"),
+    ): [data for data in fn(samples)]
+    train_data_loader = create_dataloader(
+        train_ds, mode="train", batch_size=args.batch_size, batchify_fn=batchify_fn, trans_fn=trans_func
+    )
+    dev_data_loader = create_dataloader(
+        dev_ds, mode="dev", batch_size=args.batch_size, batchify_fn=batchify_fn, trans_fn=trans_func
+    )
+
+    if args.init_from_ckpt and os.path.isfile(args.init_from_ckpt):
+        state_dict = paddle.load(args.init_from_ckpt)
+        state_keys = {x: x.replace("discriminator.", "") for x in state_dict.keys() if "discriminator." in x}
+        if len(state_keys) > 0:
+            state_dict = {state_keys[k]: state_dict[k] for k in state_keys.keys()}
+        model.set_dict(state_dict)
+    if paddle.distributed.get_world_size() > 1:
+        model = paddle.DataParallel(model)
+
+    num_training_steps = args.max_steps if args.max_steps > 0 else len(train_data_loader) * args.epochs
+    args.epochs = (num_training_steps - 1) // len(train_data_loader) + 1
+
+    lr_scheduler = LinearDecayWithWarmup(args.learning_rate, num_training_steps, args.warmup_proportion)
+
+    # Generate parameter names needed to perform weight decay.
+    # All bias and LayerNorm parameters are excluded.
+    decay_params = [p.name for n, p in model.named_parameters() if not any(nd in n for nd in ["bias", "norm"])]
+
+    optimizer = paddle.optimizer.AdamW(
+        learning_rate=lr_scheduler,
+        parameters=model.parameters(),
+        weight_decay=args.weight_decay,
+        apply_decay_param_fun=lambda x: x in decay_params,
+    )
+
+    criterion = paddle.nn.loss.CrossEntropyLoss()
+    if METRIC_CLASSES[args.dataset] is Accuracy:
+        metric = METRIC_CLASSES[args.dataset]()
+        metric_name = "accuracy"
+    elif METRIC_CLASSES[args.dataset] is MultiLabelsMetric:
+        metric = METRIC_CLASSES[args.dataset](num_labels=len(train_ds.label_list))
+        metric_name = "macro f1"
+    else:
+        metric = METRIC_CLASSES[args.dataset]()
+        metric_name = "micro f1"
+    if args.use_amp:
+        scaler = paddle.amp.GradScaler(init_loss_scaling=args.scale_loss)
+    global_step = 0
+    tic_train = time.time()
+    total_train_time = 0
+    for epoch in range(1, args.epochs + 1):
+        for step, batch in enumerate(train_data_loader, start=1):
+            input_ids, token_type_ids, position_ids, labels = batch
+            with paddle.amp.auto_cast(
+                args.use_amp,
+                custom_white_list=["layer_norm", "softmax", "gelu", "tanh"],
+            ):
+                logits = model(input_ids, token_type_ids, position_ids)
+                loss = criterion(logits, labels)
+            probs = F.softmax(logits, axis=1)
+            correct = metric.compute(probs, labels)
+            metric.update(correct)
+
+            if isinstance(metric, Accuracy):
+                result = metric.accumulate()
+            elif isinstance(metric, MultiLabelsMetric):
+                _, _, result = metric.accumulate("macro")
+            else:
+                _, _, _, result, _ = metric.accumulate()
+
+            if args.use_amp:
+                scaler.scale(loss).backward()
+                scaler.minimize(optimizer, loss)
+            else:
+                loss.backward()
+                optimizer.step()
+            lr_scheduler.step()
+            optimizer.clear_grad()
+
+            global_step += 1
+            if global_step % args.logging_steps == 0 and rank == 0:
+                time_diff = time.time() - tic_train
+                total_train_time += time_diff
+                print(
+                    "global step %d, epoch: %d, batch: %d, loss: %.5f, %s: %.5f, speed: %.2f step/s"
+                    % (global_step, epoch, step, loss, metric_name, result, args.logging_steps / time_diff)
+                )
+                # Restart the timer only after logging, so that `time_diff` covers all
+                # `logging_steps` steps of the interval instead of just the last one.
+                tic_train = time.time()
+
+            if global_step % args.valid_steps == 0 and rank == 0:
+                evaluate(model, criterion, metric, dev_data_loader)
+
+            if global_step % args.save_steps == 0 and rank == 0:
+                save_dir = os.path.join(args.save_dir, "model_%d" % global_step)
+                if not os.path.exists(save_dir):
+                    os.makedirs(save_dir)
+                if paddle.distributed.get_world_size() > 1:
+                    model._layers.save_pretrained(save_dir)
+                else:
+                    model.save_pretrained(save_dir)
+                tokenizer.save_pretrained(save_dir)
+
+            if global_step >= num_training_steps:
+                return
+
+    if rank == 0 and total_train_time > 0:
+        print("Speed: %.2f steps/s" % (global_step / total_train_time))
+
+
+if __name__ == "__main__":
+    do_train()
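`do_train()` wraps `args.learning_rate` in `LinearDecayWithWarmup` from `utils.py`, passing `--warmup_proportion` (default 0.1) as a float. The pure-Python sketch below copies that class's `lr_lambda`, wrapped in a hypothetical `make_lr_lambda` helper, so the per-step multiplier that Paddle's `LambdaDecay` applies to the base rate can be checked without Paddle. Because the decay branch is measured from step 0 rather than from the end of warmup, the multiplier goes from `(warmup_steps - 1) / warmup_steps` straight to `1 - warmup_steps / total_steps`, and never reaches exactly 1 when `warmup_steps > 0`.

```python
import math


def make_lr_lambda(total_steps, warmup):
    """Copy of the lr_lambda built inside LinearDecayWithWarmup (utils.py)."""
    # An int is a step count; a float is a fraction of total_steps.
    warmup_steps = warmup if isinstance(warmup, int) else int(math.floor(warmup * total_steps))

    def lr_lambda(current_step):
        if current_step < warmup_steps:
            # Linear warmup from 0 towards 1.
            return float(current_step) / float(max(1, warmup_steps))
        # Linear decay measured from step 0, clipped at 0.
        return max(0.0, 1.0 - current_step / total_steps)

    return lr_lambda


schedule = make_lr_lambda(total_steps=100, warmup=0.1)
print([schedule(step) for step in (0, 5, 10, 50, 100)])  # [0.0, 0.5, 0.9, 0.5, 0.0]
```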
diff --git a/DPDLDA/utils.py b/DPDLDA/utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..4c0bda0c63ebbd5eb07a27ef1b1aef45a055671a
--- /dev/null
+++ b/DPDLDA/utils.py
@@ -0,0 +1,503 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import math
+
+import numpy as np
+import paddle
+from paddle.optimizer.lr import LambdaDecay
+
+from paddlenlp.transformers import normalize_chars, tokenize_special_chars
+
+
+def create_dataloader(dataset, mode="train", batch_size=1, batchify_fn=None, trans_fn=None):
+    if trans_fn:
+        dataset = dataset.map(trans_fn)
+
+    shuffle = True if mode == "train" else False
+    if mode == "train":
+        batch_sampler = paddle.io.DistributedBatchSampler(dataset, batch_size=batch_size, shuffle=shuffle)
+    else:
+        batch_sampler = paddle.io.BatchSampler(dataset, batch_size=batch_size, shuffle=shuffle)
+
+    return paddle.io.DataLoader(dataset=dataset, batch_sampler=batch_sampler, collate_fn=batchify_fn, return_list=True)
+
+
+class LinearDecayWithWarmup(LambdaDecay):
+    def __init__(self, learning_rate, total_steps, warmup, last_epoch=-1, verbose=False):
+        """
+        Creates a learning rate scheduler that increases the learning rate
+        linearly from 0 to the given `learning_rate` during warmup. After warmup,
+        the learning rate is `learning_rate * (1 - current_step / total_steps)`,
+        which decays linearly to 0 at `total_steps`.
+
+        Args:
+            learning_rate (float):
+                The base learning rate. It is a python float number.
+            total_steps (int):
+                The number of training steps.
+            warmup (int or float):
+                If int, it means the number of steps for warmup. If float, it means
+                the proportion of warmup in total training steps.
+            last_epoch (int, optional):
+                The index of last epoch. It can be set to restart training. If
+                None, it means initial learning rate.
+                Defaults to -1.
+            verbose (bool, optional):
+                If True, prints a message to stdout for each update.
+                Defaults to False.
+        """
+
+        warmup_steps = warmup if isinstance(warmup, int) else int(math.floor(warmup * total_steps))
+
+        def lr_lambda(current_step):
+            if current_step < warmup_steps:
+                return float(current_step) / float(max(1, warmup_steps))
+            return max(0.0, 1.0 - current_step / total_steps)
+
+        super(LinearDecayWithWarmup, self).__init__(learning_rate, lr_lambda, last_epoch, verbose)
+
+
+def convert_example(example, tokenizer, max_seq_length=512, is_test=False):
+    """
+    Builds model inputs from a sequence or a pair of sequences for sequence
+    classification tasks by concatenating and adding special tokens. And
+    creates a mask from the two sequences for sequence-pair classification
+    tasks.
+
+    The convention in Electra/EHealth is:
+
+    - single sequence:
+        input_ids:      ``[CLS] X [SEP]``
+        token_type_ids: ``  0   0   0``
+        position_ids:   ``  0   1   2``
+
+    - a sequence pair:
+        input_ids:      ``[CLS] X [SEP] Y [SEP]``
+        token_type_ids: ``  0   0   0   1   1``
+        position_ids:   ``  0   1   2   3   4``
+
+    Args:
+        example (obj:`dict`):
+            A dictionary of input data, containing the text and, if present, the label.
+        tokenizer (obj:`PretrainedTokenizer`):
+            A tokenizer that inherits from :class:`paddlenlp.transformers.PretrainedTokenizer`.
+            Users can refer to the superclass for more information.
+        max_seq_length (obj:`int`):
+            The maximum total input sequence length after tokenization.
+            Sequences longer will be truncated, and the shorter will be padded.
+        is_test (obj:`bool`, default to `False`):
+            Whether the example contains label or not.
+
+    Returns:
+        input_ids (obj:`list[int]`):
+            The list of token ids.
+        token_type_ids (obj:`list[int]`):
+            List of sequence pair mask.
+        position_ids (obj:`list[int]`):
+            List of position ids.
+        label(obj:`numpy.array`, data type of int64, optional):
+            The input label if not is_test.
+    """
+    text_a = example["text_a"]
+    text_b = example.get("text_b", None)
+
+    text_a = tokenize_special_chars(normalize_chars(text_a))
+    if text_b is not None:
+        text_b = tokenize_special_chars(normalize_chars(text_b))
+
+    encoded_inputs = tokenizer(text=text_a, text_pair=text_b, max_seq_len=max_seq_length, return_position_ids=True)
+    input_ids = encoded_inputs["input_ids"]
+    token_type_ids = encoded_inputs["token_type_ids"]
+    position_ids = encoded_inputs["position_ids"]
+
+    if is_test:
+        return input_ids, token_type_ids, position_ids
+    label = np.array([example["label"]], dtype="int64")
+    return input_ids, token_type_ids, position_ids, label
+
+
+def convert_example_ner(example, tokenizer, max_seq_length=512, pad_label_id=(-100, -100), is_test=False):
+    """
+    Builds model inputs from a sequence and creates labels for named-
+    entity recognition task CMeEE.
+
+    For example, a sample should be:
+
+    - input_ids:      ``[CLS]  x1   x2 [SEP] [PAD]``
+    - token_type_ids: ``  0    0    0    0     0``
+    - position_ids:   ``  0    1    2    3     0``
+    - attention_mask: ``  1    1    1    1     0``
+    - label_oth:      `` 32    3   32   32    32`` (optional, label ids of others)
+    - label_sym:      ``  4    4    4    4     4`` (optional, label ids of symptom)
+
+    Args:
+        example (obj:`dict`):
+            A dictionary of input data, containing the text and, if present, the labels.
+        tokenizer (obj:`PretrainedTokenizer`):
+            A tokenizer that inherits from :class:`paddlenlp.transformers.PretrainedTokenizer`.
+            Users can refer to the superclass for more information.
+        max_seq_length (obj:`int`):
+            The maximum total input sequence length after tokenization.
+            Sequences longer will be truncated, and the shorter will be padded.
+        is_test (obj:`bool`, default to `False`):
+            Whether the example contains label or not.
+
+    Returns:
+        encoded_output (obj: `dict[str, list|np.array]`):
+            The sample dictionary including `input_ids`, `token_type_ids`,
+            `position_ids`, `attention_mask`, `label_oth` (optional),
+            `label_sym` (optional)
+    """
+
+    encoded_inputs = {}
+    text = example["text"]
+    if len(text) > max_seq_length - 2:
+        text = text[: max_seq_length - 2]
+    text = ["[CLS]"] + [x.lower() for x in text] + ["[SEP]"]
+    input_len = len(text)
+    encoded_inputs["input_ids"] = tokenizer.convert_tokens_to_ids(text)
+    encoded_inputs["token_type_ids"] = np.zeros(input_len)
+    encoded_inputs["position_ids"] = list(range(input_len))
+    encoded_inputs["attention_mask"] = np.ones(input_len)
+
+    if not is_test:
+        labels = example["labels"]
+        if input_len - 2 < len(labels[0]):
+            labels[0] = labels[0][: input_len - 2]
+        if input_len - 2 < len(labels[1]):
+            labels[1] = labels[1][: input_len - 2]
+        encoded_inputs["label_oth"] = [pad_label_id[0]] + labels[0] + [pad_label_id[0]]
+        encoded_inputs["label_sym"] = [pad_label_id[1]] + labels[1] + [pad_label_id[1]]
+
+    return encoded_inputs
+
+
+def convert_example_spo(example, tokenizer, num_classes, max_seq_length=512, is_test=False):
+    """
+    Builds model inputs from a sequence and creates labels for SPO prediction
+    task CMeIE.
+
+    For example, a sample should be:
+
+    - input_ids:      ``[CLS]  x1   x2 [SEP] [PAD]``
+    - token_type_ids: ``  0    0    0    0     0``
+    - position_ids:   ``  0    1    2    3     0``
+    - attention_mask: ``  1    1    1    1     0``
+    - ent_label:      ``[[0    1    0    0     0], [0    0    1    0     0]]``
+                      (row 0 sets start ids to 1, row 1 sets end ids to 1)
+    - spo_label: a tensor of shape [num_classes, max_batch_len, max_batch_len].
+                 Set [predicate_id, subject_start_id, object_start_id] as 1
+                 when (subject, predicate, object) exists.
+
+    Args:
+        example (obj:`dict`):
+            A dictionary of input data, containing the text and, if present, the labels.
+        tokenizer (obj:`PretrainedTokenizer`):
+            A tokenizer that inherits from :class:`paddlenlp.transformers.PretrainedTokenizer`.
+            Users can refer to the superclass for more information.
+        num_classes (obj:`int`):
+            The number of predicates.
+        max_seq_length (obj:`int`):
+            The maximum total input sequence length after tokenization.
+            Sequences longer will be truncated, and the shorter will be padded.
+        is_test (obj:`bool`, default to `False`):
+            Whether the example contains label or not.
+
+    Returns:
+        encoded_output (obj: `dict[str, list|np.array]`):
+            The sample dictionary including `input_ids`, `token_type_ids`,
+            `position_ids`, `attention_mask`, `ent_label` (optional),
+            `spo_label` (optional)
+    """
+    encoded_inputs = {}
+    text = example["text"]
+    if len(text) > max_seq_length - 2:
+        text = text[: max_seq_length - 2]
+    text = ["[CLS]"] + [x.lower() for x in text] + ["[SEP]"]
+    input_len = len(text)
+    encoded_inputs["input_ids"] = tokenizer.convert_tokens_to_ids(text)
+    encoded_inputs["token_type_ids"] = np.zeros(input_len)
+    encoded_inputs["position_ids"] = list(range(input_len))
+    encoded_inputs["attention_mask"] = np.ones(input_len)
+    if not is_test:
+        encoded_inputs["ent_label"] = example["ent_label"]
+        encoded_inputs["spo_label"] = example["spo_label"]
+    return encoded_inputs
+
+
+class NERChunkEvaluator(paddle.metric.Metric):
+    """
+    NERChunkEvaluator computes the precision, recall and F1-score for chunk detection.
+    It is often used in sequence tagging tasks, such as Named Entity Recognition (NER).
+
+    Args:
+        label_list (list):
+            The label list.
+
+    Note:
+        Difference from `paddlenlp.metric.ChunkEvaluator`:
+
+        - `paddlenlp.metric.ChunkEvaluator`
+           All sequences with non-'O' labels are taken as chunks when computing num_infer.
+        - `NERChunkEvaluator`
+           Only complete sequences are taken as chunks, namely `B- I- E-` or `S-`.
+    """
+
+    def __init__(self, label_list):
+        super(NERChunkEvaluator, self).__init__()
+        self.id2label = [dict(enumerate(x)) for x in label_list]
+        self.num_classes = [len(x) for x in label_list]
+        self.num_infer = 0
+        self.num_label = 0
+        self.num_correct = 0
+
+    def compute(self, lengths, predictions, labels):
+        """
+        Computes the precision, recall and F1-score for chunk detection.
+
+        Args:
+            lengths (Tensor):
+                The valid length of every sequence, a tensor with shape `[batch_size]`.
+            predictions (list[Tensor]):
+                The predicted label indexes, one tensor with shape
+                `[batch_size, sequence_length]` per label set in `label_list`.
+            labels (list[Tensor]):
+                The gold label indexes, one tensor with shape
+                `[batch_size, sequence_length]` per label set in `label_list`.
+
+        Returns:
+            tuple: Returns tuple (`num_infer_chunks, num_label_chunks, num_correct_chunks`).
+
+            With the fields:
+
+            - `num_infer_chunks` (int): The number of predicted chunks.
+            - `num_label_chunks` (int): The number of gold chunks.
+            - `num_correct_chunks` (int): The number of correctly predicted chunks.
+        """
+        assert len(predictions) == len(labels)
+        assert len(predictions) == len(self.id2label)
+        preds = [x.numpy() for x in predictions]
+        labels = [x.numpy() for x in labels]
+
+        preds_chunk = set()
+        label_chunk = set()
+        for idx, (pred, label) in enumerate(zip(preds, labels)):
+            for i, case in enumerate(pred):
+                case = [self.id2label[idx][x] for x in case[: lengths[i]]]
+                preds_chunk |= self.extract_chunk(case, i)
+            for i, case in enumerate(label):
+                case = [self.id2label[idx][x] for x in case[: lengths[i]]]
+                label_chunk |= self.extract_chunk(case, i)
+
+        num_infer = len(preds_chunk)
+        num_label = len(label_chunk)
+        num_correct = len(preds_chunk & label_chunk)
+        return num_infer, num_label, num_correct
+
+    def update(self, correct):
+        num_infer, num_label, num_correct = correct
+        self.num_infer += num_infer
+        self.num_label += num_label
+        self.num_correct += num_correct
+
+    def accumulate(self):
+        precision = self.num_correct / (self.num_infer + 1e-6)
+        recall = self.num_correct / (self.num_label + 1e-6)
+        f1 = 2 * precision * recall / (precision + recall + 1e-6)
+        return precision, recall, f1
+
+    def reset(self):
+        self.num_infer = 0
+        self.num_label = 0
+        self.num_correct = 0
+
+    def name(self):
+        return "precision", "recall", "f1"
+
+    def extract_chunk(self, sequence, cid=0):
+        chunks = set()
+
+        start_idx, cur_idx = 0, 0
+        while cur_idx < len(sequence):
+            if sequence[cur_idx][0] == "B":
+                start_idx = cur_idx
+                cur_idx += 1
+                while cur_idx < len(sequence) and sequence[cur_idx][0] == "I":
+                    if sequence[cur_idx][2:] == sequence[start_idx][2:]:
+                        cur_idx += 1
+                    else:
+                        break
+                if cur_idx < len(sequence) and sequence[cur_idx][0] == "E":
+                    if sequence[cur_idx][2:] == sequence[start_idx][2:]:
+                        chunks.add((cid, sequence[cur_idx][2:], start_idx, cur_idx))
+                        cur_idx += 1
+            elif sequence[cur_idx][0] == "S":
+                chunks.add((cid, sequence[cur_idx][2:], cur_idx, cur_idx))
+                cur_idx += 1
+            else:
+                cur_idx += 1
+
+        return chunks
+
+
+class SPOChunkEvaluator(paddle.metric.Metric):
+    """
+    SPOChunkEvaluator computes the precision, recall and F1-score for multiple
+    chunk detections, including Named Entity Recognition (NER) and SPO Prediction.
+
+    Args:
+        num_classes (int):
+            The number of predicates.
+    """
+
+    def __init__(self, num_classes=None):
+        super(SPOChunkEvaluator, self).__init__()
+        self.num_classes = num_classes
+        self.num_infer_ent = 0
+        # The SPO counters start at 1e-10 so that accumulate() never divides by zero.
+        self.num_infer_spo = 1e-10
+        self.num_label_ent = 0
+        self.num_label_spo = 1e-10
+        self.num_correct_ent = 0
+        self.num_correct_spo = 0
+
+    def compute(self, lengths, ent_preds, spo_preds, ent_labels, spo_labels):
+        """
+        Computes the precision, recall and F1-score for NER and SPO prediction.
+
+        Args:
+            lengths (Tensor):
+                The valid length of every sequence, a tensor with shape `[batch_size]`.
+            ent_preds (Tensor):
+                The predictions of entities.
+                A tensor with shape `[batch_size, sequence_length, 2]`.
+                `ent_preds[:, :, 0]` denotes the start indexes of entities.
+                `ent_preds[:, :, 1]` denotes the end indexes of entities.
+            spo_preds (Tensor):
+                The predictions of predicates between all possible entities.
+                A tensor with shape `[batch_size, num_classes, sequence_length, sequence_length]`.
+            ent_labels (list[list|tuple]):
+                The entity labels' indexes. For each sample, a list of pairs `[start_index, end_index]`.
+            spo_labels (list[list|tuple]):
+                The SPO labels' indexes. For each sample, a list of triples `[[subject_start_index, subject_end_index],
+                predicate_id, [object_start_index, object_end_index]]`.
+
+        Returns:
+            tuple:
+                Returns tuple (`num_infer_chunks, num_label_chunks, num_correct_chunks`).
+                Each item is a dict whose `ent` key holds the NER count and whose `spo` key
+                holds the SPO prediction count.
+
+                With the fields:
+
+                - `num_infer_chunks` (dict): The number of predicted chunks.
+                - `num_label_chunks` (dict): The number of gold label chunks.
+                - `num_correct_chunks` (dict): The number of correctly predicted chunks.
+        """
+        ent_preds = ent_preds.numpy()
+        spo_preds = spo_preds.numpy()
+
+        ent_pred_list = []
+        ent_idxs_list = []
+        for idx, ent_pred in enumerate(ent_preds):
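+            # Exclude the [CLS] and [SEP] tokens from the valid length.
+            seq_len = lengths[idx] - 2
+            # Tokens whose start/end score exceeds 0.5 are candidate entity boundaries.
+            start = np.where(ent_pred[:, 0] > 0.5)[0]
+            end = np.where(ent_pred[:, 1] > 0.5)[0]
+            ent_pred = []
+            ent_idxs = {}
+            for x in start:
+                # Pair each start with the nearest end at or after it.
+                y = end[end >= x]
+                # Skip [CLS] (index 0) and positions beyond the valid text.
+                if (x == 0) or (x > seq_len):
+                    continue
+                if len(y) > 0:
+                    y = y[0]
+                    if y > seq_len:
+                        continue
+                    # Shift by one to remove the [CLS] offset. ent_idxs is keyed by the
+                    # start token so that SPO predictions can be mapped back to entities.
+                    ent_idxs[x] = (x - 1, y - 1)
+                    ent_pred.append((x - 1, y - 1))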
+            ent_pred_list.append(ent_pred)
+            ent_idxs_list.append(ent_idxs)
+
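+        # A positive score at [batch, predicate, subject_token, object_token] marks a
+        # predicted relation between the entities starting at those two tokens.
+        spo_preds = spo_preds > 0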
+        spo_pred_list = [[] for _ in range(len(spo_preds))]
+        idxs, preds, subs, objs = np.nonzero(spo_preds)
+        for idx, p_id, s_id, o_id in zip(idxs, preds, subs, objs):
+            obj = ent_idxs_list[idx].get(o_id, None)
+            if obj is None:
+                continue
+            sub = ent_idxs_list[idx].get(s_id, None)
+            if sub is None:
+                continue
+            spo_pred_list[idx].append((sub, p_id, obj))
+
+        correct = {"ent": 0, "spo": 0}
+        infer = {"ent": 0, "spo": 0}
+        label = {"ent": 0, "spo": 0}
+        for ent_pred, ent_true in zip(ent_pred_list, ent_labels):
+            ent_true = [tuple(x) for x in ent_true]
+            infer["ent"] += len(set(ent_pred))
+            label["ent"] += len(set(ent_true))
+            correct["ent"] += len(set(ent_pred) & set(ent_true))
+
+        for spo_pred, spo_true in zip(spo_pred_list, spo_labels):
+            spo_true = [(tuple(s), p, tuple(o)) for s, p, o in spo_true]
+            infer["spo"] += len(set(spo_pred))
+            label["spo"] += len(set(spo_true))
+            correct["spo"] += len(set(spo_pred) & set(spo_true))
+
+        return infer, label, correct
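The entity and SPO decoding inside `SPOChunkEvaluator.compute` can be illustrated for a single sample with plain NumPy. This is a sketch under simplifying assumptions: the helpers `decode_entities` and `decode_spo` are hypothetical, the inputs are already NumPy arrays for one sequence (no batch dimension), and token 0 is assumed to be `[CLS]`, as the `x == 0` check in the method suggests.

```python
import numpy as np


def decode_entities(ent_pred, seq_len, threshold=0.5):
    """Hypothetical single-sample version of the entity decoding in compute().

    ent_pred: array of shape [sequence_length, 2] with start/end scores.
    seq_len: number of text tokens, excluding [CLS] and [SEP].
    Returns {start_token: (start, end)} with the [CLS] offset removed.
    """
    start = np.where(ent_pred[:, 0] > threshold)[0]
    end = np.where(ent_pred[:, 1] > threshold)[0]
    ent_idxs = {}
    for x in start:
        # Skip [CLS] (index 0) and positions beyond the text.
        if x == 0 or x > seq_len:
            continue
        # Nearest end at or after this start.
        y = end[end >= x]
        if len(y) > 0 and y[0] <= seq_len:
            ent_idxs[int(x)] = (int(x) - 1, int(y[0]) - 1)
    return ent_idxs


def decode_spo(spo_pred, ent_idxs):
    """spo_pred: array of shape [num_classes, sequence_length, sequence_length].

    A positive score at [p, s, o] links the entity starting at token s
    (subject) to the entity starting at token o (object) via predicate p.
    """
    triples = []
    for p_id, s_id, o_id in zip(*np.nonzero(spo_pred > 0)):
        sub = ent_idxs.get(int(s_id))
        obj = ent_idxs.get(int(o_id))
        # Drop relations whose subject or object was not decoded as an entity.
        if sub is not None and obj is not None:
            triples.append((sub, int(p_id), obj))
    return triples


# Tokens: [CLS] t1 t2 t3 t4 [SEP], so 4 text tokens.
ent_pred = np.zeros((6, 2))
ent_pred[1, 0] = ent_pred[2, 1] = 0.9  # entity spanning tokens 1..2
ent_pred[4, 0] = ent_pred[4, 1] = 0.8  # single-token entity at token 4
ent_idxs = decode_entities(ent_pred, seq_len=4)
print(ent_idxs)  # {1: (0, 1), 4: (3, 3)}

spo_pred = np.full((3, 6, 6), -1.0)  # 3 hypothetical predicates
spo_pred[2, 1, 4] = 1.0  # predicate 2: subject at token 1, object at token 4
print(decode_spo(spo_pred, ent_idxs))  # [((0, 1), 2, (3, 3))]
```

Keying `ent_idxs` by the start token is what lets a single `[subject_token, object_token]` cell of the relation matrix be turned back into two full entity spans.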
+
+    def update(self, corrects):
+        assert len(corrects) == 3
+        for item in corrects:
+            assert isinstance(item, dict)
+            for value in item.values():
+                if not self._is_number_or_matrix(value):
+                    raise ValueError("Each count must be a number (int or float) or a numpy ndarray.")
+        num_infer, num_label, num_correct = corrects
+        self.num_infer_ent += num_infer["ent"]
+        self.num_infer_spo += num_infer["spo"]
+        self.num_label_ent += num_label["ent"]
+        self.num_label_spo += num_label["spo"]
+        self.num_correct_ent += num_correct["ent"]
+        self.num_correct_spo += num_correct["spo"]
+
+    def accumulate(self):
+        spo_precision = self.num_correct_spo / self.num_infer_spo
+        spo_recall = self.num_correct_spo / self.num_label_spo
+        spo_f1 = 2 * self.num_correct_spo / (self.num_infer_spo + self.num_label_spo)
+        ent_precision = self.num_correct_ent / self.num_infer_ent if self.num_infer_ent > 0 else 0.0
+        ent_recall = self.num_correct_ent / self.num_label_ent if self.num_label_ent > 0 else 0.0
+        ent_f1 = (
+            2 * ent_precision * ent_recall / (ent_precision + ent_recall) if (ent_precision + ent_recall) != 0 else 0.0
+        )
+        return {"entity": (ent_precision, ent_recall, ent_f1), "spo": (spo_precision, spo_recall, spo_f1)}
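`accumulate` computes the SPO F1 as `2 * num_correct_spo / (num_infer_spo + num_label_spo)` rather than from precision and recall. With $c$ correct, $i$ predicted and $l$ gold triples, the two forms agree whenever $c > 0$:

```latex
P = \frac{c}{i}, \qquad R = \frac{c}{l}, \qquad
F_1 = \frac{2PR}{P + R}
    = \frac{2c^2 / (il)}{c\,(l + i) / (il)}
    = \frac{2c}{i + l}
```

When $c = 0$ the right-hand form returns 0 directly instead of the undefined $0/0$, and the 1e-10 initial values of `num_infer_spo` and `num_label_spo` keep every denominator nonzero.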
+
+    def _is_number_or_matrix(self, var):
+        def _is_number_(var):
+            return (
+                isinstance(var, int)
+                or isinstance(var, np.int64)
+                or isinstance(var, float)
+                or (isinstance(var, np.ndarray) and var.shape == (1,))
+            )
+
+        return _is_number_(var) or isinstance(var, np.ndarray)
+
+    def reset(self):
+        self.num_infer_ent = 0
+        self.num_infer_spo = 1e-10
+        self.num_label_ent = 0
+        self.num_label_spo = 1e-10
+        self.num_correct_ent = 0
+        self.num_correct_spo = 0
+
+    def name(self):
+        return {"entity": ("precision", "recall", "f1"), "spo": ("precision", "recall", "f1")}