# ROOR-Datasets
**Repository Path**: dlml2/ROOR-Datasets
## Basic Information
- **Project Name**: ROOR-Datasets
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: CC-BY-4.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-07-04
- **Last Updated**: 2025-07-04
## README
# Modeling Layout Reading Order as Ordering Relations for Visually-rich Document Understanding
This is the official repository of these VrDU datasets:
1. EC-FUNSD, a benchmark of semantic entity recognition (SER) and entity linking (EL), focusing on entity-centric robustness evaluation of pre-trained text-and-layout models [[paper]](https://arxiv.org/abs/2402.02379);
2. ROOR, a reading order prediction (ROP) benchmark which annotates layout reading order as ordering relations [[paper]](https://arxiv.org/abs/2409.19672).
Please refer to [ROOR](https://github.com/chongzhangFDU/ROOR) for the relevant code implementation.
## Datasets
The structure of the released datasets is listed below, in which:
* `data.*.txt` denotes the train/val/test split of the dataset. Each row is a JSON file name, such as `0000971160.json`, that identifies one document sample.
* `labels.txt` lists the entity types of the SER task from EC-FUNSD.
* `images` contains the document images of the samples.
* `jsons` contains the annotations of the samples. EC-FUNSD and ROOR share the same layout annotations, so for each document sample the layout annotation, the SER and EL annotations from EC-FUNSD, and the ROP annotation from ROOR are merged into a single JSON file. Within the file, the layout and the three tasks are stored under distinct keys.
```
data
├── images
│   ├── 0000971160.png
│   ├── 0000989556.png
│   ├── ...
│   └── 93455715.png
├── jsons
│   ├── 0000971160.json
│   ├── 0000989556.json
│   ├── ...
│   └── 93455715.json
├── data.train.txt
├── data.val.txt
└── labels.txt
```
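The layout above can be read with a few lines of Python. The following is a minimal sketch, not part of the official code: the helper names `read_split`, `load_samples`, and `image_path` are hypothetical, and it assumes that each non-empty row of a split file is a file name inside `jsons` and that `"fname"` in the `"img"` field is relative to the `data` directory.

```python
import json
from pathlib import Path


def read_split(root, split):
    """Return the JSON file names listed in data.<split>.txt, one per row."""
    split_file = Path(root) / f"data.{split}.txt"
    with split_file.open(encoding="utf-8") as f:
        # Skip blank rows so a trailing newline does not yield an empty name.
        return [line.strip() for line in f if line.strip()]


def load_samples(root, split):
    """Yield one parsed annotation dict per document sample in the split."""
    root = Path(root)
    for fname in read_split(root, split):
        with (root / "jsons" / fname).open(encoding="utf-8") as f:
            yield json.load(f)


def image_path(root, sample):
    """Resolve the image of a sample (assumes "fname" is relative to root)."""
    return Path(root) / sample["img"]["fname"]
```

For example, `for sample in load_samples("data", "train"): ...` would iterate over the training documents under these assumptions.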
One sample annotation is displayed below, in which:
* `"uid"` identifies the data sample.
* `"img"` refers to the corresponding document image, together with its height and width information.
* `"document"` refers to the corresponding layout annotations. Each element refers to one segment.
  * `"id"` identifies the segment. `"box"` refers to the position box of the segment. `"text"` refers to the text the segment contains.
  * `"words"` refers to the words within the segment, where
    * `"id"` identifies the word, globally within the sample, and is used in `"label_entities"`. `"box"` refers to the position box of the word. `"text"` refers to the text of the word.
* `"label_entities"` refers to the corresponding SER annotations. Each element refers to one entity.
  * `"entity_id"` identifies the entity. `"label"` refers to the entity type.
  * `"word_idx"` refers to the word sequence that composes the entity, denoted by a list of word indexes. The indexes are guaranteed to be consecutive.
* `"label_linkings"` refers to the corresponding EL annotations. Each element is a linking pair giving the `entity_id` of the head and tail entities.
* `"ro_linkings"` refers to the corresponding reading order (RO) relation annotations. Each element is a linking pair giving the segment `id` of the head and tail segments.
```json
{
  "uid": "00040534",
  "img": {"fname": "images/00040534.png", "height": 1000, "width": 777},
  "document": [
    {
      "id": 0, "box": [80, 946, 202, 956], "text": "LORILLARO RESEARCH CENTER",
      "words": [
        {"box": [80, 946, 123, 956], "id": 0, "text": "LORILLARO"},
        {"box": [124, 946, 165, 956], "id": 1, "text": "RESEARCH"},
        {"box": [168, 946, 202, 956], "id": 2, "text": "CENTER"}]},
    ...
  ],
  "label_entities": [
    {"entity_id": 0, "label": "question", "word_idx": [16]},
    ...
  ],
  "label_linkings": [[0, 35], [1, 36], ...],
  "ro_linkings": [[1, 29], [2, 34], ...]
}
```
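To illustrate how the keys relate to each other, the sketch below resolves entities to their text and groups reading order relations by head segment. It is only an illustration under stated assumptions: the function names are hypothetical, the entries of `"word_idx"` are assumed to be word `"id"` values, each linking pair is assumed to be ordered as `[head, tail]`, and the `toy` sample is hand-made rather than taken from the datasets.

```python
from collections import defaultdict


def index_words(sample):
    """Map each word id to its word dict across all segments."""
    return {w["id"]: w for seg in sample["document"] for w in seg["words"]}


def entity_texts(sample):
    """Map each entity_id to (label, text), joining the entity's words in order."""
    words = index_words(sample)
    return {
        e["entity_id"]: (e["label"], " ".join(words[i]["text"] for i in e["word_idx"]))
        for e in sample["label_entities"]
    }


def linked_entities(sample):
    """Return (head, tail) pairs of (label, text), assuming [head, tail] order."""
    ents = entity_texts(sample)
    return [(ents[head], ents[tail]) for head, tail in sample["label_linkings"]]


def ro_successors(sample):
    """Group reading order relations: head segment id -> list of tail segment ids."""
    successors = defaultdict(list)
    for head, tail in sample["ro_linkings"]:
        successors[head].append(tail)
    return dict(successors)


# Hand-made toy sample (not real data) that follows the documented keys.
toy = {
    "document": [
        {"id": 0, "box": [0, 0, 50, 10], "text": "Name :", "words": [
            {"id": 0, "box": [0, 0, 30, 10], "text": "Name"},
            {"id": 1, "box": [32, 0, 50, 10], "text": ":"}]},
        {"id": 1, "box": [60, 0, 120, 10], "text": "Jane Doe", "words": [
            {"id": 2, "box": [60, 0, 85, 10], "text": "Jane"},
            {"id": 3, "box": [88, 0, 120, 10], "text": "Doe"}]},
    ],
    "label_entities": [
        {"entity_id": 0, "label": "question", "word_idx": [0, 1]},
        {"entity_id": 1, "label": "answer", "word_idx": [2, 3]},
    ],
    "label_linkings": [[0, 1]],
    "ro_linkings": [[0, 1]],
}
```

Because a segment may appear as the head of more than one pair, `ro_successors` keeps a list of tails per head rather than a single next segment.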
[EC-FUNSD](https://paperswithcode.com/dataset/ec-funsd) and [ROOR](https://paperswithcode.com/dataset/roor) are currently available at Papers With Code.
## Citation
If you find this repository useful, please cite our papers:
```
@article{zhang2024modeling,
title={Modeling Layout Reading Order as Ordering Relations for Visually-rich Document Understanding},
author={Zhang, Chong and Tu, Yi and Zhao, Yixi and Yuan, Chenshu and Chen, Huan and Zhang, Yue and Chai, Mingxu and Guo, Ya and Zhu, Huijia and Zhang, Qi and others},
journal={arXiv preprint arXiv:2409.19672},
year={2024}
}
@article{zhang2024rethinking,
title={Rethinking the Evaluation of Pre-trained Text-and-Layout Models from an Entity-Centric Perspective},
author={Zhang, Chong and Zhao, Yixi and Yuan, Chenshu and Tu, Yi and Guo, Ya and Zhang, Qi},
journal={arXiv preprint arXiv:2402.02379},
year={2024}
}
```
## License
All datasets in this repository are released under the Creative Commons Attribution 4.0 International (CC BY 4.0) license, which can be found [here](https://creativecommons.org/licenses/by/4.0/legalcode).
These datasets are built on the [FUNSD](https://guillaumejaume.github.io/FUNSD/work/) dataset and are also subject to its licensing terms.