# FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation

This repository contains the PyTorch code and trained models described in the CVPR 2023 paper "FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation". The algorithm is proposed by the ByteDance Intelligent Creation AutoML Team (字节跳动-智能创作-AutoML团队).

Authors: Jie Qin, Jie Wu, Pengxiang Yan, Ming Li, Yuxi Ren, Xuefeng Xiao, Yitong Wang, Rui Wang, Shilei Wen, Xin Pan, Xingang Wang

## Overview

![overview](imgs/framework.png)

## Installation

### Environment

* python>=3.7
* torch>=1.10.0
* torchvision>=0.11.1
* timm>=0.6.12
* detectron2>=0.6.0, installed following the [Detectron2 installation instructions](https://detectron2.readthedocs.io/en/latest/tutorials/install.html)
* the remaining packages listed in requirement.txt

### Other dependencies

Install the modified CLIP package:

```bash
cd third_party/CLIP
python -m pip install -Ue .
```

Build the CUDA kernel for MSDeformAttn:

```bash
cd mask2former/modeling/heads/ops
bash make.sh
```

## Dataset Preparation

We follow [Mask2Former](https://github.com/facebookresearch/Mask2Former/blob/main/datasets/README.md) to build the datasets used in our experiments. The datasets are assumed to exist in a directory specified by the environment variable `DETECTRON2_DATASETS`. Under this directory, detectron2 will look for datasets in the structure described below, if needed.

```bash
$DETECTRON2_DATASETS/
  ADEChallengeData2016/
  coco/
  VOC2012/
```

Set the location of the builtin datasets with `export DETECTRON2_DATASETS=/path/to/datasets`.

### Expected dataset structure for [COCO](https://cocodataset.org/#download):

```bash
coco/
  annotations/
    instances_{train,val}2017.json
    panoptic_{train,val}2017.json
  {train,val}2017/           # image files that are mentioned in the corresponding json
  panoptic_{train,val}2017/  # png annotations
  stuffthingmaps/
```

Then convert the data to detectron2 format and split it into the Seen (Base) and Unseen (Novel) subsets:

```bash
python datasets/prepare_coco_alldata.py datasets/coco
python datasets/prepare_coco_stuff_164k_sem_seg.py datasets/coco
python tools/mask_cls_collect.py datasets/coco/stuffthingmaps_detectron2/train2017_base datasets/coco/stuffthingmaps_detectron2/train2017_base_label_count.json
python tools/mask_cls_collect.py datasets/coco/stuffthingmaps_detectron2/val2017 datasets/coco/stuffthingmaps_detectron2/val2017_label_count.json
```

### Expected dataset structure for [VOC2012](http://host.robots.ox.ac.uk/pascal/VOC/index.html):

```
VOC2012/
  JPEGImages/
  SegmentationClassAug/
  {train,val}.txt
```

Then convert the data to detectron2 format and split it into the Seen (Base) and Unseen (Novel) subsets:

```bash
python datasets/prepare_voc_sem_seg.py datasets/VOC2012
python tools/mask_cls_collect.py datasets/VOC2012/annotations_detectron2/train_base datasets/VOC2012/annotations_detectron2/train_base_label_count.json
python tools/mask_cls_collect.py datasets/VOC2012/annotations_detectron2/val datasets/VOC2012/annotations_detectron2/val_label_count.json
```

## Getting Started

### Training

To train a model with `train_net.py`, first make sure the preparations above are done. We take training on COCO as an example.
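Before launching any of the runs below, make sure the dataset root prepared above is visible in the current shell; the path here is a placeholder for your own directory.

```bash
# Point detectron2 at the prepared datasets (placeholder path).
export DETECTRON2_DATASETS=/path/to/datasets
```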
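The commands below assume 8 GPUs. With fewer GPUs, detectron2-style configs can usually be overridden from the command line; the following is a minimal sketch, assuming these configs expose the standard `SOLVER.IMS_PER_BATCH` key (the learning rate may need to be rescaled accordingly).

```bash
# Hypothetical 4-GPU prompt-training run; the SOLVER.IMS_PER_BATCH override is an
# assumption based on the usual detectron2 config schema, not a documented option.
python train_net.py \
  --config-file configs/coco-stuff-164k-156/mask2former_learn_prompt_bs32_16k.yaml \
  --num-gpus 4 \
  SOLVER.IMS_PER_BATCH 16
```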
Training prompts:

```bash
python train_net.py --config-file configs/coco-stuff-164k-156/mask2former_learn_prompt_bs32_16k.yaml --num-gpus 8
```

Training the model with the learned prompts:

```bash
python train_net.py --config-file configs/coco-stuff-164k-156/mask2former_R101c_alltask_bs32_60k.yaml --num-gpus 8 MODEL.CLIP_ADAPTER.PROMPT_CHECKPOINT ${TRAINED_PROMPT_MODEL}
```

### Evaluation

```bash
python train_net.py --config-file configs/coco-stuff-164k-156/mask2former_R101c_alltask_bs32_60k.yaml --num-gpus 8 --eval-only MODEL.WEIGHTS ${TRAINED_MODEL}
```

### Testing for Demo

The model weights for the demo can be downloaded from [model](https://drive.google.com/file/d/1X0oWfcpZo5bDkyFw7xiGBk_Yqx5gxhj_/view?usp=drive_link).

## Citation

If you find this work useful in your research, please cite the paper as below:

```bibtex
@inproceedings{qin2023freeseg,
  title={FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation},
  author={Qin, Jie and Wu, Jie and Yan, Pengxiang and Li, Ming and Yuxi, Ren and Xiao, Xuefeng and Wang, Yitong and Wang, Rui and Wen, Shilei and Pan, Xin and others},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={19446--19455},
  year={2023}
}
```