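diff --git a/ais_bench/benchmark/configs/datasets/omnidocbench/README.md b/ais_bench/benchmark/configs/datasets/omnidocbench/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..3f6d9471d2c371e3e4451888f57b498be6ded51a
--- /dev/null
+++ b/ais_bench/benchmark/configs/datasets/omnidocbench/README.md
@@ -0,0 +1,37 @@
+# OmniDocBench
+中文 | [English](README_en.md)
+## Dataset Introduction
+OmniDocBench is a benchmark for evaluating document parsing across diverse real-world scenarios. Its main characteristics are:
+
+- Diverse document types: the benchmark contains 1355 PDF pages covering 9 document types, 4 layout types, and 3 language types, including academic papers, financial reports, newspapers, textbooks, and handwritten notes.
+- Rich annotations: it provides localization information for 15 block-level element types (text paragraphs, headings, tables, etc.; over 20k in total) and 4 span-level element types (text lines, inline formulas, subscripts, etc.; over 80k in total), together with the recognition result for each element region (text annotations for text, LaTeX for formulas, and both LaTeX and HTML for tables). OmniDocBench also annotates the reading order of document components. In addition, attribute tags are provided at the page and block levels: 5 page attributes, 3 text attributes, and 6 table attributes.
+- High annotation quality: the data goes through manual screening, intelligent annotation, manual annotation, and full-coverage quality checks by both experts and large models.
+- Evaluation code included: end-to-end and single-module evaluation code is provided to keep evaluation fair and accurate.
+
+> 🔗 Dataset homepage: [https://huggingface.co/datasets/opendatalab/OmniDocBench](https://huggingface.co/datasets/opendatalab/OmniDocBench)
+
+## Dataset Deployment
+- We recommend deploying the dataset under `{tool_root_path}/ais_bench/datasets` (the default path configured in the dataset task). Taking Linux as an example, the steps are:
+```bash
+# On the Linux server, starting from the tool root path
+cd ais_bench/datasets
+git clone https://huggingface.co/datasets/opendatalab/OmniDocBench
+```
+- Run `tree OmniDocBench/` under `{tool_root_path}/ais_bench/datasets` to check the directory structure. If it matches the structure below, the dataset has been deployed successfully:
+  ```
+  OmniDocBench
+  ├── images
+  │   ├── PPT_1001115_eng_page_003.png
+  │   └── PPT_1001115_eng_page_005.png
+  │   # ......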
+  │
+  └── OmniDocBench.json
+  ```
+
+## Available Dataset Tasks
+|Task Name|Introduction|Evaluation Metric|Few-Shot|Prompt Format|Source Config File|
+| --- | --- | --- | --- | --- | --- |
+|omnidocbench_gen|Generative task for the OmniDocBench dataset|Edit_dist|0-shot|String format|[omnidocbench_gen.py](omnidocbench_gen.py)|
+
+## Usage Constraints
+- Currently only the Edit_dist metric is supported (used to evaluate the DeepSeek-OCR model); other metrics are not yet supported. The overall score is the mean of the Edit_dist scores across all dimensions.
diff --git a/ais_bench/benchmark/configs/datasets/omnidocbench/README_en.md b/ais_bench/benchmark/configs/datasets/omnidocbench/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..85072c969a5a40ee8cda95f9db04fb552778c63a
--- /dev/null
+++ b/ais_bench/benchmark/configs/datasets/omnidocbench/README_en.md
@@ -0,0 +1,37 @@
+# OmniDocBench
+[中文](README.md) | English
+## Dataset Introduction
+OmniDocBench is a benchmark for evaluating document parsing across diverse real-world scenarios, with the following characteristics:
+
+- Diverse Document Types: This benchmark includes 1355 PDF pages covering 9 document types, 4 layout types, and 3 language types. It spans a wide range of content, including academic papers, financial reports, newspapers, textbooks, and handwritten notes.
+- Rich Annotation Information: It provides localization information for 15 block-level element types (text paragraphs, headings, tables, etc.; over 20k in total) and 4 span-level element types (text lines, inline formulas, subscripts, etc.; over 80k in total), along with recognition results for each element region (text annotations for text, LaTeX annotations for formulas, and both LaTeX and HTML annotations for tables). OmniDocBench also annotates the reading order of document components. Additionally, it includes attribute tags at the page and block levels: 5 page attribute tags, 3 text attribute tags, and 6 table attribute tags.
+- High Annotation Quality: The data goes through manual screening, intelligent annotation, manual annotation, and full-coverage quality checks by both experts and large models.
+- Supporting Evaluation Code: End-to-end and single-module evaluation code is provided to keep assessments fair and accurate.
+
+> 🔗 Dataset homepage: [https://huggingface.co/datasets/opendatalab/OmniDocBench](https://huggingface.co/datasets/opendatalab/OmniDocBench)
+
+## Dataset Deployment
+- We recommend deploying the dataset under `{tool_root_path}/ais_bench/datasets` (the default path configured in the dataset task). Taking Linux as an example, the steps are as follows:
+```bash
+# On the Linux server, starting from the tool root path
+cd ais_bench/datasets
+git clone https://huggingface.co/datasets/opendatalab/OmniDocBench
+```
+- Run `tree OmniDocBench/` in `{tool_root_path}/ais_bench/datasets` to check the directory structure. If it matches the structure shown below, the dataset has been deployed successfully:
+  ```
+  OmniDocBench
+  ├── images
+  │   ├── PPT_1001115_eng_page_003.png
+  │   └── PPT_1001115_eng_page_005.png
+  │   # ......
+  │
+  └── OmniDocBench.json
+  ```
+
+## Available Dataset Tasks
+| Task Name | Introduction | Evaluation Metric | Few-Shot | Prompt Format | Source Config File |
+| --- | --- | --- | --- | --- | --- |
+| omnidocbench_gen | Generative task for the OmniDocBench dataset | Edit_dist | 0-shot | String format | [omnidocbench_gen.py](omnidocbench_gen.py) |
+
+## Usage Constraints
+- Currently, only the Edit_dist metric is supported (used to evaluate the DeepSeek-OCR model); other metrics are not supported yet. The "overall" score is the average of the Edit_dist scores across all dimensions.
\ No newline at end of file
diff --git a/ais_bench/benchmark/configs/datasets/omnidocbench/omnidocbench_gen.py b/ais_bench/benchmark/configs/datasets/omnidocbench/omnidocbench_gen.py
index 6e49fe9976e6f4c8516f2093c5b50a70946c904e..00f98b73fec012dc40c0a6a5bf506aa075f9d5e7 100644
--- a/ais_bench/benchmark/configs/datasets/omnidocbench/omnidocbench_gen.py
+++ b/ais_bench/benchmark/configs/datasets/omnidocbench/omnidocbench_gen.py
@@ -37,7 +37,7 @@ omnidocbench_datasets = [
         abbr='omnidocbench',
         type=OmniDocBenchDataset,
         path='ais_bench/datasets/OmniDocBench/OmniDocBench.json', # Dataset path. Relative paths are resolved against the source root; absolute paths are also supported
-        image_path='ais_bench/datasets/images',
+        image_path='ais_bench/datasets/OmniDocBench/images',
         reader_cfg=omnidocbench_reader_cfg,
         infer_cfg=omnidocbench_infer_cfg,
         eval_cfg=omnidocbench_eval_cfg
diff --git a/doc/users_guide/accuracy_benchmark.md b/doc/users_guide/accuracy_benchmark.md
index 7da3e290add02b381fe1439b82d7016122a20b30..565c3c9db5402f2d2211cb829ed577892e3a72a1 100644
--- a/doc/users_guide/accuracy_benchmark.md
+++ b/doc/users_guide/accuracy_benchmark.md
@@ -23,6 +23,8 @@ ais_bench --models vllm_api_general_chat vllm_api_stream_chat --datasets gsm8k_g
 ```
 The command above specifies 2 model tasks (`vllm_api_general_chat`, `vllm_api_stream_chat`) and 2 dataset tasks (`gsm8k_gen_4_shot_cot_str`, `aime2024_gen_0_shot_chat_prompt`), so the following 4 combined accuracy test tasks will be executed:
+
+
 + [vllm_api_general_chat](../../ais_bench/benchmark/configs/models/vllm_api/vllm_api_general_chat.py) model task + [gsm8k_gen_4_shot_cot_str](../../ais_bench/benchmark/configs/datasets/gsm8k/gsm8k_gen_4_shot_cot_str.py) dataset task
 + [vllm_api_general_chat](../../ais_bench/benchmark/configs/models/vllm_api/vllm_api_general_chat.py) model task + [aime2024_gen_0_shot_chat_prompt](../../ais_bench/benchmark/configs/datasets/aime2024/aime2024_gen_0_shot_chat_prompt.py) dataset task
 + [vllm_api_stream_chat](../../ais_bench/benchmark/configs/models/vllm_api/vllm_api_stream_chat.py) model task + [gsm8k_gen_4_shot_cot_str](../../ais_bench/benchmark/configs/datasets/gsm8k/gsm8k_gen_4_shot_cot_str.py) dataset task