# MASTER-TF **Repository Path**: OpenOCR/MASTER-TF ## Basic Information - **Project Name**: MASTER-TF - **Description**: No description available - **Primary Language**: Unknown - **License**: MIT - **Default Branch**: master - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 10 - **Forks**: 4 - **Created**: 2020-11-12 - **Last Updated**: 2021-07-21 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # This is a reimplementation of [MASTER](https://arxiv.org/abs/1910.02562). MASTER is a scene text recognition model which is based on self-attention mechanism. Below is the architecture. ![WX20200703-001140.png](https://i.loli.net/2020/07/03/Nj1CPvrT7J2ehWy.png) This repo is a tensorflow implemention which may differ from our pytorch implementaion when we have done in PingAn for paper. This repo is its tensorflow implemention. - [x] Multi-gpu Training - [x] Greedy Decoding - [x] Single image inference - [x] Eval iiit5k - [x] Convert Checkpoint to SavedModel format - [x] refactory codes to be more tensorflow-style and be more consistent to graph mode - [x] support tensorflow serving mode ## Preparation It is highly recommended that install tensorflow-gpu using conda. Python3.7 is preferred. ```bash pip install -r requirements.txt ``` ## Dataset I use Clovaai's MJ training split for training. please check `src/dataset/benchmark_data_generator.py` for details. Eval datasets are some real scene text datasets. You can downloaded directly from [here](https://drive.google.com/drive/folders/1OG4ufr-kj2jFLmM4gyFEI0tMGYZrz8HI). ## Training ```bash # training from scratch python train.py -c [your_config].yaml # resume training from last checkpoint python train.py -c [your_config].yaml -r # finetune with some checkpoint python train.py -c [your_config].yaml -f [checkpoint] ``` ## Eval Currently, you can download checkpoint from [here](https://pan.baidu.com/s/1ijpo8WRZHR-AyDclxQVDiw) with code **o6g9**, or from [Google Driver](https://drive.google.com/file/d/1gpfMvnQWZimogQLFM_teOwiLNz-ZEF02/view?usp=sharing), this checkpoint was trained with MJ and selected for the best performance of iiit5k dataset. Below is the comparision between pytorch version and tensorflow version. | Framework | Dataset | Word Accuracy | Training Details | | --- | --- | --- | --- | | Pytorch | MJ | 85.05% | 3 V100 4 epochs Batch Size: 3*128| | Tensorflow | MJ | 85.53% | 2 2080ti 4 epochs Batch Size: 2 * 50 | Please download the checkpoint and model config from [here](https://pan.baidu.com/s/1ijpo8WRZHR-AyDclxQVDiw) with code **o6g9** and unzip it, and you can get this metric by running: ```bash python eval_iiit5k.py --ckpt [checkpoint file] --cfg [model config] -o [output dir] -i [iiit5k lmdb test dataset] ``` The checkpoint file argument should be `${where you unzip}/backup/512_8_3_3_2048_2048_0.2_0_Adam_mj_my/checkpoints/OCRTransformer-Best` ## Tensorflow Serving For tensorflow serving, you should use savedModel format, I provided test case to show you how to convert a checkpoint to savedModel and how to use it. ```bash pytest -s tests/test_units::test_savedModel #check the test case test_savedModel in tests/test_units pytest -s tests/test_units::test_loadModel # call decode to inference and get predicted transcript and logits out. ``` ## Citations If you find this code useful please cite our [paper](https://arxiv.org/abs/1910.02562): ```bibtex @misc{lu2019master, title={MASTER: Multi-Aspect Non-local Network for Scene Text Recognition}, author={Ning Lu and Wenwen Yu and Xianbiao Qi and Yihao Chen and Ping Gong and Rong Xiao}, year={2019}, eprint={1910.02562}, archivePrefix={arXiv}, primaryClass={cs.CV} } ``` ## License This project is licensed under the MIT License. See LICENSE for more details. ## Acknowledgements Thanks to the authors and their repo: - [SAR_TF](https://github.com/Pay20Y/SAR_TF) - [deep-text-recognition-benchmark](https://github.com/clovaai/deep-text-recognition-benchmark)