# PaddleOCRModelConvert

**Repository Path**: RapidAI/PaddleOCRModelConvert

## Basic Information

- **Project Name**: PaddleOCRModelConvert
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-01-25
- **Last Updated**: 2024-08-28

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

## 🔄 PaddleOCR Model Convert

 
### Introduction

- This repository converts [inference models from PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.6/doc/doc_ch/models_list.md) into ONNX format.
- **Input**: **URL** or local **tar** path of an inference model
- **Output**: converted **ONNX** model
- For a recognition model, you also need to provide the raw txt path of the corresponding dictionary (**open the txt file on GitHub and click the Raw button in the upper right corner to get a link similar to [this one](https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/release/2.6/ppocr/utils/ppocr_keys_v1.txt)**), which is used to write the dictionary into the ONNX model.
- ☆ The converted models are meant to be used with the inference code in [RapidOCR](https://github.com/RapidAI/RapidOCR).
- If a model fails to convert, you can check step by step which stage goes wrong, following the flow in the figure below.

### Overall framework

```mermaid
flowchart TD
    A([PaddleOCR inference model]) --paddle2onnx--> B([ONNX])
    B --> C([Change Dynamic Input]) --> D([Rec: save the character dict to onnx])
    D --> E([Save])
```

### Installation

```bash
pip install paddleocr_convert
```

### Usage

> [!WARNING]
>
> Only the **inference models** from the download links in [this list](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.6/doc/doc_ch/models_list.md) are supported. A training model must first be manually converted to the inference format.
>
> The **slim quantized models** in PaddleOCR are not supported.

#### Using the command line

- Usage:

    ```bash
    $ paddleocr_convert -h
    usage: paddleocr_convert [-h] [-p MODEL_PATH] [-o SAVE_DIR]
                             [-txt_path TXT_PATH]

    optional arguments:
      -h, --help            show this help message and exit
      -p MODEL_PATH, --model_path MODEL_PATH
                            The inference model url or local path of paddleocr.
                            e.g.
                            https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar
                            or models/ch_PP-OCRv3_det_infer.tar
      -o SAVE_DIR, --save_dir SAVE_DIR
                            The directory of saving the model.
      -txt_path TXT_PATH, --txt_path TXT_PATH
                            The raw txt url or local txt path, if the model is
                            recognition model.
    ```

- Example:

    ```bash
    # online
    $ paddleocr_convert -p https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar \
        -o models

    $ paddleocr_convert -p https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar \
        -o models \
        -txt_path https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/release/2.6/ppocr/utils/ppocr_keys_v1.txt

    # offline
    $ paddleocr_convert -p models/ch_PP-OCRv3_det_infer.tar \
        -o models

    $ paddleocr_convert -p models/ch_PP-OCRv3_rec_infer.tar \
        -o models \
        -txt_path models/ppocr_keys_v1.txt
    ```

#### Script use

- Online mode:

    ```python
    from paddleocr_convert import PaddleOCRModelConvert

    converter = PaddleOCRModelConvert()

    save_dir = 'models'
    url = 'https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar'
    txt_url = 'https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/release/2.6/ppocr/utils/ppocr_keys_v1.txt'

    converter(url, save_dir, txt_path=txt_url)
    ```

- Offline mode:

    ```python
    from paddleocr_convert import PaddleOCRModelConvert

    converter = PaddleOCRModelConvert()

    save_dir = 'models'
    model_path = 'models/ch_PP-OCRv3_rec_infer.tar'
    txt_path = 'models/ppocr_keys_v1.txt'

    converter(model_path, save_dir, txt_path=txt_path)
    ```

### Use the model

Suppose you need to recognize Japanese text and the converted model is located at `local/models/japan.onnx`.

1. Install the `rapidocr_onnxruntime` library:

    ```bash
    pip install rapidocr_onnxruntime
    ```

2. Script use:

    ```python
    from rapidocr_onnxruntime import RapidOCR

    model_path = 'local/models/japan.onnx'
    engine = RapidOCR(rec_model_path=model_path)

    img = '1.jpg'
    result, elapse = engine(img)
    ```
3. CLI use:

    ```bash
    rapidocr_onnxruntime -img 1.jpg --rec_model_path local/models/japan.onnx
    ```

### Changelog
<details>
<summary>Click to expand</summary>

- 2023-09-22 v0.0.17 update:
  - Improve the log output when an error occurs.
- 2023-07-27 v0.0.16 update:
  - Added the online conversion version on ModelScope.
  - Changed the supported Python versions to 3.6 ~ 3.11.
- 2023-04-13 update:
  - Added an online conversion program ([link](https://huggingface.co/spaces/SWHL/PaddleOCRModelConverter)).
- 2023-03-05 v0.0.4~7 update:
  - Support conversion of local models and dictionaries.
  - Optimize internal logic and error feedback.
- 2023-02-28 v0.0.3 update:
  - Added automatic switching to dynamic input for models that are not dynamic input.
- 2023-02-27 v0.0.2 update:
  - Encapsulated the conversion code into a package for self-service model conversion.
- 2022-08-15 v0.0.1 update:
  - Write the dictionary of the recognition model into the meta of the ONNX model for subsequent distribution.

</details>