# DTLR
**Repository Path**: data_factory/DTLR
## Basic Information
- **Project Name**: DTLR
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-10-09
- **Last Updated**: 2024-10-09
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
## Description
This repository is the official implementation of [General Detection-based Text Line Recognition](https://detection-based-text-line-recognition.github.io/). The paper is available on [arXiv](https://arxiv.org/pdf/2409.17095).
This repository builds on the code for [DINO-DETR](https://github.com/IDEA-Research/DINO), the official implementation of the paper "[DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection](https://arxiv.org/abs/2203.03605)". We present a model that adapts DINO-DETR for text recognition as a detection and recognition task. The model is pretrained on synthetic data using the same loss as DINO-DETR and then fine-tuned on a real dataset with CTC loss.
## Content
- Installation, Datasets, and Weights
- Pretraining
- Finetuning
- Evaluation
- Ngram
- Citation
## Installation, Datasets, and Weights
### 1. Installation
The model was trained with `python=3.11.0`, `pytorch=2.1.0`, `cuda=11.8` and builds on the DETR-variants [DINO](https://arxiv.org/abs/2203.03605)/[DN](https://arxiv.org/abs/2203.01305)/[DAB](https://arxiv.org/abs/2201.12329) and [Deformable-DETR](https://arxiv.org/abs/2010.04159).
1. Clone this repository and create a virtual environment
2. Follow instructions to install a [Pytorch](https://pytorch.org/get-started/locally/) version compatible with your system and CUDA version
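A minimal sketch of steps 1 and 2 (the repository URL, environment name, and exact PyTorch wheel below are placeholders, not taken from this README; check [pytorch.org](https://pytorch.org/get-started/locally/) for the command matching your system and CUDA version):
```bash
# hypothetical clone URL and environment name -- adjust to your setup
git clone https://example.com/DTLR.git && cd DTLR
python -m venv .venv && source .venv/bin/activate
# example PyTorch install for CUDA 11.8 (assumption; verify on pytorch.org)
pip install torch==2.1.0 torchvision==0.16.0 --index-url https://download.pytorch.org/whl/cu118
```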
3. Install other dependencies
```bash
pip install -r requirements.txt
```
4. Compile the CUDA operators
```bash
python models/dino/ops/setup.py build install # if you see 'cuda not available', run: export CUDA_HOME=/usr/local/cuda-
# unit test (all checks should print True); may fail with an out-of-memory error
python models/dino/ops/test.py
```
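Before compiling, it can help to confirm that your environment actually sees CUDA; this quick check is not part of the original instructions:
```bash
# print the PyTorch version, the CUDA version it was built with, and GPU visibility
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
# CUDA_HOME should point at the toolkit used to build the custom ops
echo "$CUDA_HOME"
```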
### 2. Datasets
Datasets should be placed in the appropriate folder specified in **datasets/config.json**. We preprocess the images and annotations for the IAM dataset, while all other datasets are used in their original form.
For each dataset (except IAM), a charset file (.pkl) is required. Charset files can be found in the folder [data](data).
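To sanity-check a charset file before launching training, a quick inspection along these lines may help (the path and the pickle's exact structure are assumptions, not specified in this README):
```bash
# print the type and size of a charset pickle; the path is hypothetical
python -c "import pickle; cs = pickle.load(open('data/charset.pkl', 'rb')); print(type(cs), len(cs))"
```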
**Handwritten**
1. IAM: the official website is [here](http://www.fki.inf.unibe.ch/databases/iam-handwriting-database). We preprocess the images and annotations following the instructions in the [PyLaia examples repository](https://github.com/carmocca/PyLaia-examples/tree/master/iam-htr). The annotations are stored in [data/IAM_new/labels.pkl](data/IAM_new).
2. RIMES: TEKLIA provides the dataset [here](https://teklia.com/research/rimes-database/). After downloading, place the charset file in the same folder as the dataset.
3. READ: the dataset is available [here](https://zenodo.org/records/1297399). After downloading, place the charset file in the same folder as the dataset.
**Chinese**
The official website is [here](https://nlpr.ia.ac.cn/databases/handwriting/Download.html). Images and annotations are provided only in byte format for these datasets.
1. CASIA v1: Download the dataset in byte format from the link above and place the charset in the same folder as the dataset.
2. CASIA v2: We directly provide a version of the dataset with images (PNG) and annotations (TXT). Download it [here](https://drive.google.com/file/d/1ZfrsxBM2uhnqa0vps-8950ZFflYgMHin/view?usp=sharing).
**Ciphers**
The Borg and Copiale ciphers are available [here](https://pages.cvc.uab.es/abaro/datasets.html). The charset files are provided in the folder [data](data).
### 3. Weights
Pretrained checkpoints can be found [here](https://drive.google.com/file/d/1sr-CSCdiVhCuUmZa3danqSvdzIvj8Pdl/view?usp=sharing). The folder includes the weights of the following **pretrained** models:
- **General model**: Trained on random Latin characters. Typically used for finetuning on ciphers.
- **English model**: Trained on English text with random erasing. Typically used for finetuning on IAM.
- **French model**: Trained on French text with random erasing. Typically used for finetuning on RIMES.
- **German model**: Trained on German text with random erasing. Typically used for finetuning on READ.
- **Chinese model**: Trained on random handwritten Chinese characters from HWDB 1. Typically used for finetuning on HWDB 2.
Finetuned checkpoints can be found [here](https://drive.google.com/file/d/11UXYJHBKhgI6DhhkqQ6UpHFRXt3XNFQA/view?usp=sharing).
Checkpoints should be organized as follows:
```bash
logs/
├── IAM/
│   └── checkpoint.pth
├── other_model/
│   └── checkpoint.pth
└── ...
```
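One possible way to fetch and unpack the pretrained checkpoints from the Google Drive link above (gdown is not mentioned in this README, and the archive name and internal layout are assumptions):
```bash
pip install gdown
# file ID taken from the pretrained-checkpoints link above
gdown 1sr-CSCdiVhCuUmZa3danqSvdzIvj8Pdl -O pretrained_checkpoints.zip
# adjust so each model ends up as logs/<model_name>/checkpoint.pth
unzip pretrained_checkpoints.zip -d logs/
```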
## Pretraining
Pretraining scripts are available in **scripts/pretraining**.
### Latin scripts
You need to download the folder [resources](https://drive.google.com/file/d/1XxeizTec4XOsLfyV_Q_dVQ1rMNWzbmoO/view?usp=sharing) (background, fonts, noises, texts) and place it in the folder **dataset**.
To train models with random erasing:
```bash
sh scripts/pretraining/Synthetic_english_w_masking.sh
sh scripts/pretraining/Synthetic_german_w_masking.sh
sh scripts/pretraining/Synthetic_french_w_masking.sh
sh scripts/pretraining/Synthetic_general.sh
```
To train a model without random erasing:
```bash
sh scripts/pretraining/Synthetic_english.sh
```
### Chinese scripts
You need the CASIA v1 dataset (see the Datasets section above).
To train a Chinese model with random erasing:
```bash
bash scripts/pretraining/Synthetic_chinese_w_masking.sh
```
## Finetuning
Finetuning occurs in two stages. The scripts are available in **scripts/finetuning**. For step 1, a pretrained model is expected to be placed in the folder **logs/your_model_name**.
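As an illustration of the two-stage workflow (the script names below are hypothetical; list **scripts/finetuning** to find the actual ones):
```bash
ls scripts/finetuning/               # see which finetuning scripts are provided
# hypothetical names for the two stages on IAM, assuming a pretrained model in logs/IAM/
bash scripts/finetuning/IAM_step1.sh
bash scripts/finetuning/IAM_step2.sh
```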
## Evaluation
Use the scripts in **scripts/evaluating** to evaluate the model on the different datasets.
## Ngram
### Evaluation
We provide our n-gram models for RIMES, READ, and IAM [here](). We strongly advise creating a separate environment for the n-gram model and installing the libraries listed in [ngram/mini_guide.md](ngram/mini_guide.md).
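For example, the separate environment could be created like this (conda is an assumption; the actual dependencies to install are the ones listed in the mini guide):
```bash
# isolated environment for the n-gram tooling; the name is arbitrary
conda create -n dtlr-ngram python=3.11 -y
conda activate dtlr-ngram
# then install the libraries listed in ngram/mini_guide.md
```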
To run an evaluation with the n-gram model:
```bash
python ngram/clean_gen_ngram_preds.py --config_path ngram/IAM.yaml
python ngram/clean_gen_ngram_preds.py --config_path ngram/READ.yaml
python ngram/clean_gen_ngram_preds.py --config_path ngram/RIMES.yaml
```
### Training an n-gram model
To train your own n-gram model, follow the instructions in [ngram/mini_guide.md](ngram/mini_guide.md).
## Citation
If you find this code useful, don't forget to star the repo :star: and cite the papers :point_down:
```
@inproceedings{baena2024DTLR,
  title={General Detection-based Text Line Recognition},
  author={Raphael Baena and Syrine Kalleli and Mathieu Aubry},
  booktitle={NeurIPS},
  year={2024},
  url={https://arxiv.org/abs/2409.17095},
}
```