# TCM

**Repository Path**: dlml2/TCM

## Basic Information

- **Project Name**: TCM
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-12-06
- **Last Updated**: 2024-12-06

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

## Turning a CLIP Model into a Scene Text Detector

This repository is built upon [mmocr 0.4.0](https://github.com/open-mmlab/mmocr/tree/0.x).

### NightTime-ArT Dataset

The NightTime-ArT dataset, collected from ArT, can be downloaded from [here](https://drive.google.com/file/d/1v3CshPqlvhpnK1_MKwqqkWJDikKl_g4Y).

## Usage

### Environment

- cuda 11.1
- torch=1.8.0
- torchvision=0.9.0
- timm=0.4.12
- mmcv-full=1.3.17
- mmseg=0.20.2
- mmdet=2.19.1
- mmocr=0.4.0

The code is based on mmocr. Please first install `mmcv-full` and `mmocr` following the official guidelines ([mmocr](https://github.com/open-mmlab/mmocr)).

### Dataset

- Please follow the official mmocr guidelines to prepare the [datasets](https://mmocr.readthedocs.io/en/v0.4.1/datasets/det.html).
- Configure the dataset path in [`ocrclip/configs/_base_/det_datasets`](ocrclip/configs/_base_/det_datasets).

### Pre-trained CLIP Models

- Download the pre-trained CLIP model (`RN50.pt`) and save it to the `pretrained` folder.
- Configure the path to the pre-trained CLIP model in the config file as follows:

```python
model = dict(
    pretrained='xxx/ocrclip/pretrained/RN50.pt',
)
```

### Pretraining & Training & Evaluation

To pretrain the TCM model on SynthText/Synth150k, configure the corresponding dataset path, then run:

```
bash dist_train.sh configs/textdet/xxnet/xxx.py 8
```

To fine-tune the TCM model from a pretrained model, set `load_from` in the config to the pretrained checkpoint path, then run:

```
bash dist_train.sh configs/textdet/xxnet/xxx.py 8
```

To evaluate a checkpoint, run:

```
bash dist_test.sh configs/textdet/xxnet/xxx.py /path/to/checkpoint 1 --eval hmean-iou
```

### Results

| Method | Data | F-measure | Model |
|--------|------|-----------|-------|
| TCM-DB | TD | 88.8% | [config](ocrclip/configs/textdet/dbnet/clip_db_r50_fpnc_prompt_gen_vis_1200e_ft_td_ranger_post_taiji.py) [weights](https://mega.nz/file/daZWnYQI#XTQbvp86rxf-zIoQKQwVcXeUnGNqj4ADm1OijQKgEMM) |
| TCM-DB | IC15 | 88.8% | [config](ocrclip/configs/textdet/dbnet/clip_db_r50_fpnc_prompt_gen_vis_1200e_ft_gen_ic15_adam_taiji.py) [weights](https://mega.nz/file/cDQ1RASb#k5IOBtv12legGQPFCBW4-7e8SuD9WXcX4uoTE4Z9hpA) |
| TCM-DB | CTW | 85.1% | [config](ocrclip/configs/textdet/dbnet/clip_db_r50_fpnc_prompt_gen_vis_32_1200e_ft_ctw_adamw_taiji.py) |
| TCM-DB | TT | 85.9% | [config](ocrclip/configs/textdet/dbnet/clip_db_r50_fpnc_prompt_gen_vis_32_1200e_ft_tt_adamw_taiji.py) |

## Turning a CLIP Model into a Scene Text Spotter

### TCM for Scene Text Spotter

Please refer to the [`spotter`](spotter) folder for more details.

### TCM for Rotated Object Detection

Please refer to the [`rotated_object_detection`](rotated_object_detection) folder for more details.
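
### Note on the F-measure

The F-measure column in the detection results and the `hmean-iou` option passed to `dist_test.sh` both refer to the harmonic mean of precision and recall. The snippet below is a minimal sketch of that formula, assuming the usual convention that a detection counts as matched when it overlaps a ground-truth region above an IoU threshold. It is not the mmocr implementation; the function name `f_measure` and the counts in the example are hypothetical and chosen only for illustration.

```python
def f_measure(num_matched: int, num_gt: int, num_det: int) -> float:
    """Harmonic mean of precision and recall, computed from match counts.

    num_matched: detections matched to a ground-truth region (hypothetical input)
    num_gt:      total number of ground-truth regions
    num_det:     total number of detections
    """
    # Guard against empty ground truth or empty detections.
    recall = num_matched / num_gt if num_gt else 0.0
    precision = num_matched / num_det if num_det else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts: 90 matches, 100 ground-truth regions, 95 detections.
    print(f"F-measure: {f_measure(90, 100, 95):.3f}")
```

Because the harmonic mean is dominated by the smaller of the two terms, a detector cannot reach a high F-measure by trading away recall for precision or the reverse.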
### TODO

- [x] Add FastTCM
- [ ] Migrate from mmocr 0.4.0 to mmocr 1.0.0
- [ ] Refactor and clean up the code
- [ ] Release the domain adaptation setting

### Citation

If you find this project helpful for your research, please consider citing the papers:

```
@inproceedings{Yu2023TurningAC,
  title={Turning a CLIP Model into a Scene Text Detector},
  author={Wenwen Yu and Yuliang Liu and Wei Hua and Deqiang Jiang and Bo Ren and Xiang Bai},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition},
  year={2023}
}

@article{Yu2024TurningAC,
  title={Turning a CLIP Model into a Scene Text Spotter},
  author={Wenwen Yu and Yuliang Liu and Xingkui Zhu and Haoyu Cao and Xing Sun and Xiang Bai},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2024}
}
```

### License

This project is released under the CC-BY-NC 4.0 license. See `LICENSE` for more details.

### Acknowledgements

This project is partially based on [MMOCR](https://github.com/open-mmlab/mmocr), [CLIP](https://github.com/openai/CLIP), [MMRotate](https://github.com/open-mmlab/mmrotate), [DenseCLIP](https://github.com/raoyongming/DenseCLIP), [AdelaiDet](https://github.com/aim-uofa/AdelaiDet), [Deformable-DETR](https://github.com/fundamentalvision/Deformable-DETR), and [TESTR](https://github.com/mlpc-ucsd/TESTR). Thanks for their great work.