# TagCLIP

**Repository Path**: data_factory/TagCLIP

## Basic Information

- **Project Name**: TagCLIP
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-05-19
- **Last Updated**: 2024-05-19

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/tagclip-a-local-to-global-framework-to/unsupervised-semantic-segmentation-with-11)](https://paperswithcode.com/sota/unsupervised-semantic-segmentation-with-11?p=tagclip-a-local-to-global-framework-to)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/tagclip-a-local-to-global-framework-to/unsupervised-semantic-segmentation-with-10)](https://paperswithcode.com/sota/unsupervised-semantic-segmentation-with-10?p=tagclip-a-local-to-global-framework-to)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/tagclip-a-local-to-global-framework-to/unsupervised-semantic-segmentation-with-1)](https://paperswithcode.com/sota/unsupervised-semantic-segmentation-with-1?p=tagclip-a-local-to-global-framework-to)

# TagCLIP: A Local-to-Global Framework to Enhance Open-Vocabulary Multi-Label Classification of CLIP Without Training (AAAI 2024)

:closed_book: [[arxiv paper]](https://arxiv.org/abs/2312.12828)

![images](framework.png)

## Requirements

```
# create conda env
conda create -n tagclip python=3.9
conda activate tagclip

# install packages
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html
pip install opencv-python ftfy regex tqdm ttach lxml
```

## Preparing Datasets

Download each dataset from its official website ([PASCAL VOC 2007](http://host.robots.ox.ac.uk/pascal/VOC/voc2007/), [PASCAL VOC 2012](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/), [COCO 2014](https://cocodataset.org/#download), [COCO 2017](https://cocodataset.org/#download)) and put them under a local directory such as `/local_root/datasets`. The structure of `/local_root/datasets/` can be organized as follows:

```
---VOC2007/
   --Annotations
   --ImageSets
   --JPEGImages
---VOC2012/ # similar to VOC2007
   --Annotations
   --ImageSets
   --JPEGImages
   --SegmentationClass
---COCO2014/
   --train2014 # optional, not used in TagCLIP
   --val2014
---COCO2017/
   --train2017 # optional, not used in TagCLIP
   --val2017
   --SegmentationClass
---cocostuff/
   --SegmentationClass
```

Note that we use VOC 2007 and COCO 2014 for multi-label classification evaluation, while VOC 2012 and COCO 2017 are adopted for annotation-free semantic segmentation (classify, then segment). The processed `SegmentationClass` directories for [COCO 2017](https://drive.google.com/file/d/1LUEVI62pFHAVJag1MDV5b-Vica2KcUlV/view?usp=drive_link) and [cocostuff](https://drive.google.com/file/d/1nQtOso9JIIdDnqjma34vm9vQo7dPe6DU/view?usp=drive_link) are provided on Google Drive.

### Preparing pre-trained model

Download the CLIP pre-trained [ViT-B/16](https://openaipublic.azureedge.net/clip/models/5806e77cd80f8b59890b7e101eabd078d9fb84e6937f9e85e4ecb61988df416f/ViT-B-16.pt) checkpoint and put it under `/local_root/pretrained_models/clip`.
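To sanity-check the environment and the downloaded weights before running the scripts below, the checkpoint can be loaded directly with the `clip` package. This is a minimal sketch, assuming the package from the official CLIP repository (parts of which this project borrows) is importable and the path follows the layout above:

```python
# Minimal sanity check for the ViT-B/16 checkpoint (illustrative sketch, not part of the repo).
import torch
import clip  # assumption: the official CLIP package, or the copy borrowed in this repo, is on PYTHONPATH

model_path = "/local_root/pretrained_models/clip/ViT-B-16.pt"  # adjust to your local root
device = "cuda" if torch.cuda.is_available() else "cpu"

# clip.load accepts a local checkpoint path in place of a model name.
model, preprocess = clip.load(model_path, device=device)
print("input resolution:", model.visual.input_resolution)  # 224 for ViT-B/16
```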
## Usage

### Multi-Label Classification

```
# For VOC2007
python classify.py --img_root /local_root/datasets/VOC2007/JPEGImages/ --split_file ./imageset/voc2007/test_cls.txt --model_path /local_root/pretrained_models/clip/ViT-B-16.pt --dataset voc2007

# For COCO14
python classify.py --img_root /local_root/datasets/COCO2014/val2014/ --split_file ./imageset/coco2014/val_cls.txt --model_path /local_root/pretrained_models/clip/ViT-B-16.pt --dataset coco2014
```

### Annotation-free Semantic Segmentation

By combining TagCLIP with the weakly supervised semantic segmentation (WSSS) method [CLIP-ES](https://github.com/linyq2117/CLIP-ES), we can perform annotation-free semantic segmentation.

First, generate category labels for each image using TagCLIP; they will be saved in `./output/{args.dataset}_val_tagclip.txt`. We also provide our generated labels in `./output/{args.dataset}_val_tagclip_example.txt` for reference.

```
# For VOC2012
python classify.py --img_root /local_root/datasets/VOC2012/JPEGImages/ --split_file ./imageset/voc2012/val.txt --model_path /local_root/pretrained_models/clip/ViT-B-16.pt --dataset voc2012 --save_file

# For COCO17
python classify.py --img_root /local_root/datasets/COCO2017/val2017/ --split_file ./imageset/coco2017/val.txt --model_path /local_root/pretrained_models/clip/ViT-B-16.pt --dataset coco2017 --save_file

# For cocostuff
python classify.py --img_root /local_root/datasets/COCO2017/val2017/ --split_file ./imageset/cocostuff/val.txt --model_path /local_root/pretrained_models/clip/ViT-B-16.pt --dataset cocostuff --save_file
```

Then use CLIP-ES to generate and evaluate segmentation masks.

```
cd CLIP-ES

# For VOC2012
python generate_cams_voc.py --img_root /local_root/datasets/VOC2012/JPEGImages --split_file ../output/voc2012_val_tagclip.txt --model /local_root/pretrained_models/clip/ViT-B-16.pt --cam_out_dir ./output/voc2012/val/tagclip
python eval_cam.py --cam_out_dir ./output/voc2012/val/tagclip/ --cam_type attn_highres --gt_root /local_root/datasets/VOC2012/SegmentationClass --split_file ../imageset/voc2012/val.txt

# For COCO17
python generate_cams_coco.py --img_root /local_root/datasets/COCO2017/val2017/ --split_file ../output/coco2017_val_tagclip.txt --model /local_root/pretrained_models/clip/ViT-B-16.pt --cam_out_dir ./output/coco2017/val/tagclip
python eval_cam.py --cam_out_dir ./output/coco2017/val/tagclip/ --cam_type attn_highres --gt_root /local_root/datasets/COCO2017/SegmentationClass --split_file ../imageset/coco2017/val.txt

# For cocostuff
python generate_cams_cocostuff.py --img_root /local_root/datasets/COCO2017/val2017/ --split_file ../output/cocostuff_val_tagclip.txt --model /local_root/pretrained_models/clip/ViT-B-16.pt --cam_out_dir ./output/cocostuff/val/tagclip
python eval_cam_cocostuff.py --cam_out_dir ./output/cocostuff/val/tagclip/ --cam_type attn_highres --gt_root /local_root/datasets/cocostuff/SegmentationClass/val --split_file ../imageset/cocostuff/val.txt
```
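For reference, the segmentation evaluation reports mean intersection-over-union (mIoU): the per-class IoU between predicted and ground-truth masks, averaged over classes. The NumPy sketch below illustrates the metric; it is not the repository's evaluation code, and assumes label maps with values in `[0, num_classes)` plus an ignore label (e.g. 255 for VOC).

```python
# Illustrative mIoU computation (sketch only, not the repository's eval code).
import numpy as np

def mean_iou(preds, gts, num_classes, ignore_label=255):
    """preds/gts: iterables of HxW integer label maps of matching shapes."""
    # Accumulate a confusion matrix over all images (rows: GT class, cols: predicted class).
    conf = np.zeros((num_classes, num_classes), dtype=np.int64)
    for pred, gt in zip(preds, gts):
        valid = gt != ignore_label
        idx = gt[valid].astype(np.int64) * num_classes + pred[valid].astype(np.int64)
        conf += np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

    inter = np.diag(conf).astype(np.float64)
    union = conf.sum(axis=0) + conf.sum(axis=1) - inter
    iou = np.full(num_classes, np.nan)
    present = union > 0
    iou[present] = inter[present] / union[present]
    # Classes absent from both predictions and ground truth are excluded from the mean.
    return float(np.nanmean(iou))
```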
### Use CRF to postprocess

```
# install dense CRF
pip install --force-reinstall cython==0.29.36
pip install joblib
pip install --no-build-isolation git+https://github.com/lucasb-eyer/pydensecrf.git

# eval CRF processed pseudo masks
## for VOC12
python eval_cam_with_crf.py --cam_out_dir ./output/voc2012/val/tagclip/ --gt_root /local_root/datasets/VOC2012/SegmentationClass --image_root /local_root/datasets/VOC2012/JPEGImages --split_file ../imageset/voc2012/val.txt --eval_only

## for COCO17
python eval_cam_with_crf.py --cam_out_dir ./output/coco2017/val/tagclip/ --gt_root /local_root/datasets/COCO2017/SegmentationClass --image_root /local_root/datasets/COCO2017/val2017 --split_file ../imageset/coco2017/val.txt --eval_only

## for cocostuff
python eval_cam_with_crf_cocostuff.py --cam_out_dir ./output/cocostuff/val/tagclip/ --gt_root /local_root/datasets/cocostuff/SegmentationClass/val --image_root /local_root/datasets/COCO2017/val2017 --split_file ../imageset/cocostuff/val.txt --eval_only
```

## Results

### Multi-label Classification (mAP)

| Method | VOC2007 | COCO2014 |
| --- | --- | --- |
| TagCLIP (paper) | 92.8 | 68.8 |
| TagCLIP (this repo) | 92.8 | 68.7 |

### Annotation-free Semantic Segmentation (mIoU)

| Method | VOC2012 | COCO2017 | cocostuff |
| --- | --- | --- | --- |
| CLS-SEG (paper) | 64.8 | 34.0 | 30.1 |
| CLS-SEG+CRF (paper) | 68.7 | 35.3 | 31.0 |
| CLS-SEG (this repo) | 64.7 | 34.0 | 30.3 |
| CLS-SEG+CRF (this repo) | 68.6 | 35.2 | 31.1 |

## Acknowledgement

We borrowed parts of the code from [CLIP](https://github.com/openai/CLIP), [pytorch_grad_cam](https://github.com/jacobgil/pytorch-grad-cam/tree/61e9babae8600351b02b6e90864e4807f44f2d4a), [CLIP-ES](https://github.com/linyq2117/CLIP-ES) and [CLIP_Surgery](https://github.com/xmed-lab/CLIP_Surgery). Thanks for their wonderful work.

## Citation

If you find this project helpful for your research, please consider citing the following BibTeX entry.

```
@misc{lin2023tagclip,
      title={TagCLIP: A Local-to-Global Framework to Enhance Open-Vocabulary Multi-Label Classification of CLIP Without Training},
      author={Yuqi Lin and Minghao Chen and Kaipeng Zhang and Hengjia Li and Mingming Li and Zheng Yang and Dongqin Lv and Binbin Lin and Haifeng Liu and Deng Cai},
      year={2023},
      eprint={2312.12828},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```