# OpenOCR1
**Repository Path**: dlml2/OpenOCR1
## Basic Information
- **Project Name**: OpenOCR1
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-07-04
- **Last Updated**: 2025-07-04
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
______________________________________________________________________
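We are building a unified benchmark for training and evaluating scene text detection and recognition models. On top of this benchmark we present **OpenOCR**, a general OCR system that balances accuracy and efficiency. This repository is also the official codebase of the OCR team at the [FVL Laboratory](https://fvl.fudan.edu.cn), Fudan University.

We warmly welcome researchers to suggest OCR algorithms and to point out potential factual errors or code bugs. We will evaluate every suggestion promptly and reproduce the suggested algorithms rigorously. We look forward to working with you to advance OpenOCR and to keep contributing to the OCR community!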
## Key Features
- 🔥**OpenOCR: A general OCR system with accuracy and efficiency**
  - ⚡\[[Quick Start](#快速开始)\] \[[Model Downloads](https://github.com/Topdu/OpenOCR/releases/tag/develop0.0.1)\] \[[ModelScope Demo](https://modelscope.cn/studios/topdktu/OpenOCR-Demo)\] \[[Hugging Face Demo](https://huggingface.co/spaces/topdu/OpenOCR-Demo)\] \[[Local Demo](#本地Demo)\] \[[PaddleOCR Implementation](https://paddlepaddle.github.io/PaddleOCR/latest/algorithm/text_recognition/algorithm_rec_svtrv2.html)\]
  - [Technical Documentation](./docs/openocr.md)
  - A practical OCR system built on SVTRv2
  - On the [OCR competition leaderboard](https://aistudio.baidu.com/competition/detail/1131/0/leaderboard), it outperforms the [PP-OCRv4](https://paddlepaddle.github.io/PaddleOCR/latest/ppocr/model_list.html) baseline by 4.5% in accuracy while keeping a similar inference speed
    - [x] Supports Chinese and English text detection and recognition
    - [x] Provides server and mobile models
    - [x] Supports fine-tuning on custom datasets: [detection model fine-tuning](./docs/finetune_det.md), [recognition model fine-tuning](./docs/finetune_rec.md)
    - [x] [Supports ONNX model export](#导出onnx模型)
- 🔥**SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition**
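  - \[[Paper](https://arxiv.org/abs/2411.15858)\] \[[Docs](./configs/rec/svtrv2/)\] \[[Models](./configs/rec/svtrv2/readme.md#11-models-and-results)\] \[[Datasets](./docs/svtrv2.md#downloading-datasets)\] \[[Config/Training/Inference](./configs/rec/svtrv2/readme.md#3-model-training--evaluation)\] \[[Benchmarks](./docs/svtrv2.md#results-benchmark--configs--checkpoints)\]
  - [Technical Documentation](./docs/svtrv2.md)
  - A unified training and evaluation benchmark for scene text recognition, built on [Union14M](https://github.com/Mountchicken/Union14M)
  - Supports training 24 scene text recognition methods on the large-scale real-world dataset [Union14M-L-Filter](./docs/svtrv2.md#数据集详情); more cutting-edge methods will be integrated continuously
  - Improves accuracy by 20-30% over models trained on synthetic data
  - Recognizes arbitrary-shaped text and performs language modeling with a single visual model
  - Outperforms attention-based encoder-decoder models in both accuracy and speed
  - [Guide to training a SOTA model from scratch](./docs/svtrv2.md#get-started-with-training-a-sota-scene-text-recognition-model-from-scratch)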
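Because SVTRv2 is a CTC model, its recognizer emits one class distribution per time step (feature frame), and the text is read off by merging repeated labels and then dropping the blank label. The sketch below shows generic best-path (greedy) CTC decoding; `ctc_greedy_decode`, the blank index 0, and the toy character set are illustrative assumptions, not SVTRv2's actual decoder or dictionary.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, charset, blank=0):
    """Best-path CTC decoding of per-frame argmax class indices.

    frame_ids: the argmax class index for each time step.
    charset:   maps class index -> character; index `blank` is the CTC blank
               (blank=0 is an assumption of this sketch).
    """
    chars = []
    for idx, _ in groupby(frame_ids):  # merge consecutive repeated labels
        if idx != blank:               # then drop blank labels
            chars.append(charset[idx])
    return "".join(chars)


# Toy dictionary: index 0 is the blank placeholder.
charset = ["<blank>", "h", "e", "l", "o"]
print(ctc_greedy_decode([1, 1, 0, 2, 3, 3, 0, 3, 4, 4], charset))
```

Merging repeats before removing blanks is what lets CTC spell double letters: the blank frame between the two runs of `l` keeps them as separate characters.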
## Our STR Algorithms
- [**SMTR&FocalSVTR**](./configs/rec/smtr/) (*Yongkun Du, Zhineng Chen\*, Caiyan Jia, Xieping Gao, Yu-Gang Jiang. Out of Length Text Recognition with Sub-String Matching,* AAAI 2025. [Doc](./configs/rec/smtr/), [Paper](https://arxiv.org/abs/2407.12317))
- [**DPTR**](./configs/rec/dptr/) (*Shuai Zhao, Yongkun Du, Zhineng Chen\*, Yu-Gang Jiang. Decoder Pre-Training with only Text for Scene Text Recognition,* ACM MM 2024. [Paper](https://arxiv.org/abs/2408.05706))
- [**IGTR**](./configs/rec/igtr/) (*Yongkun Du, Zhineng Chen\*, Yuchen Su, Caiyan Jia, Yu-Gang Jiang. Instruction-Guided Scene Text Recognition,* TPAMI 2025. [Doc](./configs/rec/igtr), [Paper](https://doi.ieeecomputersociety.org/10.1109/TPAMI.2025.3525526))
- [**SVTRv2**](./configs/rec/svtrv2) (*Yongkun Du, Zhineng Chen\*, Hongtao Xie, Caiyan Jia, Yu-Gang Jiang. SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition,* 2024. [Doc](./configs/rec/svtrv2/), [Paper](https://arxiv.org/abs/2411.15858))
- [**CDistNet**](./configs/rec/cdistnet/) (*Tianlun Zheng, Zhineng Chen\*, Shancheng Fang, Hongtao Xie, Yu-Gang Jiang. CDistNet: Perceiving Multi-Domain Character Distance for Robust Text Recognition,* IJCV 2024. [Paper](https://link.springer.com/article/10.1007/s11263-023-01880-0))
- **MRN** (*Tianlun Zheng, Zhineng Chen\*, Bingchen Huang, Wei Zhang, Yu-Gang Jiang. MRN: Multiplexed Routing Network for Incremental Multilingual Text Recognition,* ICCV 2023. [Paper](https://openaccess.thecvf.com/content/ICCV2023/html/Zheng_MRN_Multiplexed_Routing_Network_for_Incremental_Multilingual_Text_Recognition_ICCV_2023_paper.html), [Code](https://github.com/simplify23/MRN))
- **TPS++** (*Tianlun Zheng, Zhineng Chen\*, Jinfeng Bai, Hongtao Xie, Yu-Gang Jiang. TPS++: Attention-Enhanced Thin-Plate Spline for Scene Text Recognition,* IJCAI 2023. [Paper](https://arxiv.org/abs/2305.05322), [Code](https://github.com/simplify23/TPS_PP))
- [**CPPD**](./configs/rec/cppd/) (*Yongkun Du, Zhineng Chen\*, Caiyan Jia, Xiaoting Yin, Chenxia Li, Yuning Du, Yu-Gang Jiang. Context Perception Parallel Decoder for Scene Text Recognition,* TPAMI (accepted). [PaddleOCR Doc](https://github.com/PaddlePaddle/PaddleOCR/blob/main/docs/algorithm/text_recognition/algorithm_rec_cppd.en.md), [Paper](https://doi.ieeecomputersociety.org/10.1109/TPAMI.2025.3545453))
- [**SVTR**](./configs/rec/svtr/) (*Yongkun Du, Zhineng Chen\*, Caiyan Jia, Xiaoting Yin, Tianlun Zheng, Chenxia Li, Yuning Du, Yu-Gang Jiang. SVTR: Scene Text Recognition with a Single Visual Model,* IJCAI 2022 (Long). [PaddleOCR Doc](https://github.com/Topdu/PaddleOCR/blob/main/doc/doc_ch/algorithm_rec_svtr.md), [Paper](https://www.ijcai.org/proceedings/2022/124))
- [**NRTR**](./configs/rec/nrtr/) (*Fenfen Sheng, Zhineng Chen, Bo Xu. NRTR: A No-Recurrence Sequence-to-Sequence Model For Scene Text Recognition,* ICDAR 2019. [Paper](https://arxiv.org/abs/1806.00926))
## Recent Updates
- **2025.03.24**: 🔥 Released fine-tuning on custom datasets: [detection model fine-tuning](./docs/finetune_det.md), [recognition model fine-tuning](./docs/finetune_rec.md)
- **2025.03.23**: 🔥 Added [ONNX model export](#导出onnx模型)
- **2025.02.22**: The [CPPD](https://doi.ieeecomputersociety.org/10.1109/TPAMI.2025.3545453) paper was accepted by TPAMI; see the [docs](./configs/rec/cppd/) and the [PaddleOCR docs](https://github.com/PaddlePaddle/PaddleOCR/blob/main/docs/algorithm/text_recognition/algorithm_rec_cppd.en.md)
- **2024.12.31**: The [IGTR](https://doi.ieeecomputersociety.org/10.1109/TPAMI.2025.3525526) paper was accepted by TPAMI; see the [docs](./configs/rec/igtr/)
- **2024.12.16**: The [SMTR](https://arxiv.org/abs/2407.12317) paper was accepted by AAAI 2025; see the [docs](./configs/rec/smtr/)
- **2024.12.03**: Merged the [DPTR](https://arxiv.org/abs/2408.05706) pre-training code
- **🔥 2024.11.23 Major Update**:
  - **Released the OpenOCR general OCR system**
    - ⚡\[[Quick Start](#快速开始)\] \[[Model Downloads](https://github.com/Topdu/OpenOCR/releases/tag/develop0.0.1)\] \[[ModelScope Demo](https://modelscope.cn/studios/topdktu/OpenOCR-Demo)\] \[[Hugging Face Demo](https://huggingface.co/spaces/topdu/OpenOCR-Demo)\] \[[Local Demo](#本地Demo)\] \[[PaddleOCR Implementation](https://paddlepaddle.github.io/PaddleOCR/latest/algorithm/text_recognition/algorithm_rec_svtrv2.html)\]
    - [Technical Documentation](./docs/openocr.md)
  - **Released the SVTRv2 paper**
    - \[[Paper](https://arxiv.org/abs/2411.15858)\] \[[Docs](./configs/rec/svtrv2/)\] \[[Models](./configs/rec/svtrv2/readme.md#11-models-and-results)\] \[[Datasets](./docs/svtrv2.md#downloading-datasets)\] \[[Config/Training/Inference](./configs/rec/svtrv2/readme.md#3-model-training--evaluation)\] \[[Benchmarks](./docs/svtrv2.md#results-benchmark--configs--checkpoints)\]
    - [Technical Documentation](./docs/svtrv2.md)
    - [Guide to training a SOTA model from scratch](./docs/svtrv2.md#get-started-with-training-a-sota-scene-text-recognition-model-from-scratch)
## Quick Start <a name="快速开始"></a>
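**Note**: OpenOCR supports inference with either ONNX or PyTorch, and the two environments are independent: ONNX inference does not require PyTorch, and PyTorch inference does not require ONNX Runtime.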
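If one script has to run in either environment, it can check which runtime is importable before creating the engine. A minimal sketch, assuming a hypothetical helper `engine_kwargs` that prefers ONNX when `onnxruntime` is installed: it returns the keyword arguments from this section's ONNX usage example, or an empty dict so that `OpenOCR()` uses its default PyTorch engine.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if the top-level module `name` can be imported."""
    return importlib.util.find_spec(name) is not None


def engine_kwargs(available=module_available) -> dict:
    """Choose OpenOCR constructor arguments from the installed runtime.

    Hypothetical helper: prefers ONNX (no PyTorch needed); otherwise falls
    back to the default PyTorch engine, i.e. OpenOCR() with no arguments.
    """
    if available("onnxruntime"):
        return {"backend": "onnx", "device": "cpu"}
    if available("torch"):
        return {}
    raise RuntimeError("Neither onnxruntime nor torch is installed; see Quick Start.")


# Usage (requires `pip install openocr-python` plus one runtime):
# from openocr import OpenOCR
# engine = OpenOCR(**engine_kwargs())
```

The `available` parameter exists only so the choice can be tested without installing either runtime; preferring ONNX first is a design choice of this sketch, not an OpenOCR default.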
### 1. ONNX Inference
#### Install OpenOCR and its dependencies:
```shell
pip install openocr-python
pip install onnxruntime
```
#### Usage example:
```python
from openocr import OpenOCR

# ONNX backend on CPU; PyTorch is not required
onnx_engine = OpenOCR(backend='onnx', device='cpu')
img_path = '/path/to/img'  # an image file or a directory of images
result, elapse = onnx_engine(img_path)
```
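The engine call accepts either a single image file or a directory. When you want to process images one at a time (for example, to log per-image results), a small helper can expand the path first. A minimal sketch: `list_images` and its extension set are hypothetical and may not match the formats OpenOCR itself accepts.

```python
from pathlib import Path

# Assumed set of image extensions; OpenOCR's own filter may differ.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff", ".webp"}


def list_images(img_path):
    """Return [file] for a single image, or the sorted images inside a directory."""
    path = Path(img_path)
    if path.is_file():
        return [path]
    if path.is_dir():
        return sorted(
            p for p in path.iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_EXTS
        )
    raise FileNotFoundError(f"no such file or directory: {img_path}")


# Usage with the ONNX engine created in the example above:
# for image in list_images('/path/to/images'):
#     result, elapse = onnx_engine(str(image))
```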
### 2. PyTorch Inference
#### Requirements:
- [PyTorch](http://pytorch.org/) >= 1.13.0
- Python >= 3.7
```shell
conda create -n openocr python=3.8
conda activate openocr
# Install the GPU version (CUDA 11.8)
conda install pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 pytorch-cuda=11.8 -c pytorch -c nvidia
# Or install the CPU-only version
conda install pytorch torchvision torchaudio cpuonly -c pytorch
```
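To confirm that an environment meets the requirements listed above (Python >= 3.7, PyTorch >= 1.13.0), a short script can compare version numbers. A sketch, assuming hypothetical helpers `parse_version`, `meets_minimum`, and `check_environment`; it reads only the leading numeric part of a version string, so local suffixes such as `+cu118` are ignored.

```python
import re
import sys


def parse_version(text: str) -> tuple:
    """Parse the leading 'major.minor.patch' numbers; missing parts count as 0."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(g) if g is not None else 0 for g in match.groups())


def meets_minimum(version: str, minimum: str) -> bool:
    """Compare two version strings numerically."""
    return parse_version(version) >= parse_version(minimum)


def check_environment() -> list:
    """Return a list of problems; an empty list means the requirements are met."""
    problems = []
    if sys.version_info < (3, 7):
        problems.append(f"Python >= 3.7 required, found {sys.version.split()[0]}")
    try:
        import torch
    except ImportError:
        problems.append("PyTorch is not installed (not needed for ONNX inference)")
    else:
        if not meets_minimum(torch.__version__, "1.13.0"):
            problems.append(f"PyTorch >= 1.13.0 required, found {torch.__version__}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```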
#### 2.1 Install the Python Package
**Install OpenOCR**:
```shell
pip install openocr-python
```
**Usage example**:
```python
from openocr import OpenOCR

# PyTorch engine with the default settings
engine = OpenOCR()
img_path = '/path/to/img'  # an image file or a directory of images
result, elapse = engine(img_path)
# Server mode
# engine = OpenOCR(mode='server')
```
#### 2.2 Install from Source
```shell
git clone https://github.com/Topdu/OpenOCR.git
cd OpenOCR
pip install -r requirements.txt
# Download the detection and recognition weights
wget https://github.com/Topdu/OpenOCR/releases/download/develop0.0.1/openocr_det_repvit_ch.pth
wget https://github.com/Topdu/OpenOCR/releases/download/develop0.0.1/openocr_repsvtr_ch.pth
# Server recognition model (optional)
# wget https://github.com/Topdu/OpenOCR/releases/download/develop0.0.1/openocr_svtrv2_ch.pth
```
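The `wget` steps above can also be scripted, for example in a setup script that should not re-download weights that are already present. A sketch with a hypothetical `fetch_weights` helper; only the release URL and the file names come from this README.

```python
import urllib.request
from pathlib import Path

# Release URL and file names taken from the wget commands above.
RELEASE_URL = "https://github.com/Topdu/OpenOCR/releases/download/develop0.0.1"
DEFAULT_WEIGHTS = ["openocr_det_repvit_ch.pth", "openocr_repsvtr_ch.pth"]
SERVER_REC_WEIGHTS = ["openocr_svtrv2_ch.pth"]


def fetch_weights(names, dest="."):
    """Download each release asset into `dest`, skipping files that already exist."""
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for name in names:
        target = dest_dir / name
        if not target.exists():
            urllib.request.urlretrieve(f"{RELEASE_URL}/{name}", str(target))
        paths.append(target)
    return paths


if __name__ == "__main__":
    fetch_weights(DEFAULT_WEIGHTS)  # add SERVER_REC_WEIGHTS for the server model
```

Because existence is the only check, a file left behind by an interrupted download is skipped on the next run; delete it before retrying.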
**Usage**:
```shell
# /path/to/img can be an image file or a directory of images
# End-to-end OCR system: detection + recognition
python tools/infer_e2e.py --img_path=/path/to/img
# Detection model only
python tools/infer_det.py --c ./configs/det/dbnet/repvit_db.yml --o Global.infer_img=/path/to/img
# Recognition model only
python tools/infer_rec.py --c ./configs/rec/svtrv2/repsvtr_ch.yml --o Global.infer_img=/path/to/img
```
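In these commands `--c` points at a YAML config and `--o` is followed by one or more `Key=value` overrides. To drive the scripts from Python, for example in a batch job, the command can be built as an argument list and passed to `subprocess`. A sketch with hypothetical helpers `build_infer_cmd` and `run_infer`; it uses `sys.executable` instead of a bare `python` so that the active environment's interpreter runs the script, and it must be run from the repository root.

```python
import shlex
import subprocess
import sys


def build_infer_cmd(script, config, overrides=None):
    """Assemble the argument list for tools/infer_det.py or tools/infer_rec.py.

    All Key=value overrides follow a single '--o', as in the commands above.
    """
    cmd = [sys.executable, script, "--c", config]
    if overrides:
        cmd.append("--o")
        cmd.extend(f"{key}={value}" for key, value in overrides.items())
    return cmd


def run_infer(script, config, overrides=None, repo_root="."):
    """Run one inference script from the OpenOCR repository root."""
    cmd = build_infer_cmd(script, config, overrides)
    print("Running:", " ".join(shlex.quote(part) for part in cmd))
    return subprocess.run(cmd, cwd=repo_root, check=True)


# Usage (equivalent to the detection-only command above):
# run_infer("tools/infer_det.py", "./configs/det/dbnet/repvit_db.yml",
#           {"Global.infer_img": "/path/to/img"})
```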
##### Export ONNX Models <a name="导出onnx模型"></a>
```shell
pip install onnx
# Export the recognition model
python tools/toonnx.py --c configs/rec/svtrv2/repsvtr_ch.yml --o Global.device=cpu
# Export the detection model
python tools/toonnx.py --c configs/det/dbnet/repvit_db.yml --o Global.device=cpu
```
##### ONNX Runtime Inference
```shell
pip install onnxruntime
# End-to-end OCR system (/path/to/img: an image file or a directory of images)
python tools/infer_e2e.py --img_path=/path/to/img --backend=onnx --device=cpu
# Detection model
python tools/infer_det.py --c ./configs/det/dbnet/repvit_db.yml --o Global.backend=onnx Global.device=cpu Global.infer_img=/path/to/img
# Recognition model
python tools/infer_rec.py --c ./configs/rec/svtrv2/repsvtr_ch.yml --o Global.backend=onnx Global.device=cpu Global.infer_img=/path/to/img
```
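Each `Global.xxx=value` token after `--o` names a key path inside the YAML config. Conceptually, an override is merged into the nested config as sketched below. This is an illustration assuming the config is loaded into nested dicts; `apply_overrides` is hypothetical, it keeps every value as a string, and OpenOCR's own argument parser may convert types or behave differently.

```python
def apply_overrides(config: dict, overrides: list) -> dict:
    """Merge 'A.B.C=value' overrides into a nested config dict in place."""
    for item in overrides:
        dotted_key, sep, value = item.partition("=")  # split at the first '='
        if not sep or not dotted_key:
            raise ValueError(f"expected KEY=VALUE, got {item!r}")
        *parents, leaf = dotted_key.split(".")
        node = config
        for name in parents:
            node = node.setdefault(name, {})  # create missing sections
            if not isinstance(node, dict):
                raise TypeError(f"{name!r} in {dotted_key!r} is not a section")
        node[leaf] = value
    return config


cfg = {"Global": {"device": "gpu"}}
apply_overrides(cfg, ["Global.backend=onnx", "Global.device=cpu"])
print(cfg)  # {'Global': {'device': 'cpu', 'backend': 'onnx'}}
```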
#### Local Demo <a name="本地Demo"></a>
```shell
pip install gradio==4.20.0
# Download and extract the demo assets
wget https://github.com/Topdu/OpenOCR/releases/download/develop0.0.1/OCR_e2e_img.tar
tar xf OCR_e2e_img.tar
# Launch the demo
python demo_gradio.py
```
## Algorithm Reproduction Plan
### Scene Text Recognition (STR)
| Method                                        | Venue                                                                                            | Training | Evaluation | Contributor                                 |
| --------------------------------------------- | ------------------------------------------------------------------------------------------------ | -------- | -------- | ------------------------------------------- |
| [CRNN](./configs/rec/svtrs/) | [TPAMI 2016](https://arxiv.org/abs/1507.05717) | ✅ | ✅ | |
| [ASTER](./configs/rec/aster/) | [TPAMI 2019](https://ieeexplore.ieee.org/document/8395027) | ✅ | ✅ | [pretto0](https://github.com/pretto0) |
| [NRTR](./configs/rec/nrtr/) | [ICDAR 2019](https://arxiv.org/abs/1806.00926) | ✅ | ✅ | |
| [SAR](./configs/rec/sar/) | [AAAI 2019](https://aaai.org/papers/08610-show-attend-and-read-a-simple-and-strong-baseline-for-irregular-text-recognition/) | ✅ | ✅ | [pretto0](https://github.com/pretto0) |
| [MORAN](./configs/rec/moran/) | [PR 2019](https://www.sciencedirect.com/science/article/abs/pii/S0031320319300263) | ✅ | ✅ | |
| [DAN](./configs/rec/dan/) | [AAAI 2020](https://arxiv.org/pdf/1912.10205) | ✅ | ✅ | |
| [RobustScanner](./configs/rec/robustscanner/) | [ECCV 2020](https://www.ecva.net/papers/eccv_2020/papers_ECCV/html/3160_ECCV_2020_paper.php) | ✅ | ✅ | [pretto0](https://github.com/pretto0) |
| [AutoSTR](./configs/rec/autostr/) | [ECCV 2020](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123690732.pdf) | ✅ | ✅ | |
| [SRN](./configs/rec/srn/) | [CVPR 2020](https://openaccess.thecvf.com/content_CVPR_2020/html/Yu_Towards_Accurate_Scene_Text_Recognition_With_Semantic_Reasoning_Networks_CVPR_2020_paper.html) | ✅ | ✅ | [pretto0](https://github.com/pretto0) |
| [SEED](./configs/rec/seed/) | [CVPR 2020](https://openaccess.thecvf.com/content_CVPR_2020/html/Qiao_SEED_Semantics_Enhanced_Encoder-Decoder_Framework_for_Scene_Text_Recognition_CVPR_2020_paper.html) | ✅ | ✅ | |
| [ABINet](./configs/rec/abinet/) | [CVPR 2021](https://openaccess.thecvf.com//content/CVPR2021/html/Fang_Read_Like_Humans_Autonomous_Bidirectional_and_Iterative_Language_Modeling_for_CVPR_2021_paper.html) | ✅ | ✅ | [YesianRohn](https://github.com/YesianRohn) |
| [VisionLAN](./configs/rec/visionlan/) | [ICCV 2021](https://openaccess.thecvf.com/content/ICCV2021/html/Wang_From_Two_to_One_A_New_Scene_Text_Recognizer_With_ICCV_2021_paper.html) | ✅ | ✅ | [YesianRohn](https://github.com/YesianRohn) |
| PIMNet | [ACM MM 2021](https://dl.acm.org/doi/10.1145/3474085.3475238) | | | TODO |
| [SVTR](./configs/rec/svtrs/) | [IJCAI 2022](https://www.ijcai.org/proceedings/2022/124) | ✅ | ✅ | |
| [PARSeq](./configs/rec/parseq/) | [ECCV 2022](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136880177.pdf) | ✅ | ✅ | |
| [MATRN](./configs/rec/matrn/) | [ECCV 2022](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136880442.pdf) | ✅ | ✅ | |
| [MGP-STR](./configs/rec/mgpstr/) | [ECCV 2022](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136880336.pdf) | ✅ | ✅ | |
| [LPV](./configs/rec/lpv/) | [IJCAI 2023](https://www.ijcai.org/proceedings/2023/0189.pdf) | ✅ | ✅ | |
| [MAERec](./configs/rec/maerec/)(Union14M) | [ICCV 2023](https://openaccess.thecvf.com/content/ICCV2023/papers/Jiang_Revisiting_Scene_Text_Recognition_A_Data_Perspective_ICCV_2023_paper.pdf) | ✅ | ✅ | |
| [LISTER](./configs/rec/lister/) | [ICCV 2023](https://openaccess.thecvf.com/content/ICCV2023/papers/Cheng_LISTER_Neighbor_Decoding_for_Length-Insensitive_Scene_Text_Recognition_ICCV_2023_paper.pdf) | ✅ | ✅ | |
| [CDistNet](./configs/rec/cdistnet/) | [IJCV 2024](https://link.springer.com/article/10.1007/s11263-023-01880-0) | ✅ | ✅ | [YesianRohn](https://github.com/YesianRohn) |
| [BUSNet](./configs/rec/busnet/) | [AAAI 2024](https://ojs.aaai.org/index.php/AAAI/article/view/28402) | ✅ | ✅ | |
| DCTC | [AAAI 2024](https://ojs.aaai.org/index.php/AAAI/article/view/28575) | | | TODO |
| [CAM](./configs/rec/cam/) | [PR 2024](https://arxiv.org/abs/2402.13643) | ✅ | ✅ | |
| [OTE](./configs/rec/ote/) | [CVPR 2024](https://openaccess.thecvf.com/content/CVPR2024/html/Xu_OTE_Exploring_Accurate_Scene_Text_Recognition_Using_One_Token_CVPR_2024_paper.html) | ✅ | ✅ | |
| CFF | [IJCAI 2024](https://arxiv.org/abs/2407.05562) | | | TODO |
| [DPTR](./configs/rec/dptr/) | [ACM MM 2024](https://arxiv.org/abs/2408.05706) | | | [fd-zs](https://github.com/fd-zs) |
| VIPTR | [ACM CIKM 2024](https://arxiv.org/abs/2401.10110) | | | TODO |
| [IGTR](./configs/rec/igtr/) | [TPAMI 2025](https://doi.ieeecomputersociety.org/10.1109/TPAMI.2025.3525526) | ✅ | ✅ | |
| [SMTR](./configs/rec/smtr/) | [AAAI 2025](https://arxiv.org/abs/2407.12317) | ✅ | ✅ | |
| [CPPD](./configs/rec/cppd/) | [TPAMI Online Access](https://doi.ieeecomputersociety.org/10.1109/TPAMI.2025.3545453) | ✅ | ✅ | |
| [FocalSVTR-CTC](./configs/rec/svtrs/) | [2024](https://arxiv.org/abs/2407.12317) | ✅ | ✅ | |
| [SVTRv2](./configs/rec/svtrv2/) | [2024](https://arxiv.org/abs/2411.15858) | ✅ | ✅ | |
| [ResNet+Trans-CTC](./configs/rec/svtrs/) | | ✅ | ✅ | |
| [ViT-CTC](./configs/rec/svtrs/) | | ✅ | ✅ | |
#### Core Contributors
______________________________________________________________________
Yiming Lei ([pretto0](https://github.com/pretto0)), Xingsong Ye ([YesianRohn](https://github.com/YesianRohn)), and Shuai Zhao ([fd-zs](https://github.com/fd-zs)) of the [FVL Laboratory](https://fvl.fudan.edu.cn), Fudan University, carried out most of the algorithm reproduction under the supervision of Zhineng Chen ([homepage](https://zhinchenfd.github.io/)). We thank them for their contributions.
### Scene Text Detection (STD)
Under development
### End-to-End Text Spotting
Under development
______________________________________________________________________
## Citation
If our work helps your research, please cite:
```bibtex
@article{Du2024SVTRv2,
title={SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition},
author={Yongkun Du and Zhineng Chen and Hongtao Xie and Caiyan Jia and Yu-Gang Jiang},
journal={CoRR},
volume={abs/2411.15858},
eprinttype={arXiv},
year={2024},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2411.15858}
}
```
## Acknowledgements
This codebase is built on [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR), [PytorchOCR](https://github.com/WenmuZhou/PytorchOCR), and [MMOCR](https://github.com/open-mmlab/mmocr). Thanks for their excellent work!