# DCT-Mask

**Repository Path**: aliyun/DCT-Mask

## Basic Information

- **Project Name**: DCT-Mask
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-05-08
- **Last Updated**: 2025-06-26

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation

This project hosts the code for implementing the DCT-Mask algorithm for instance segmentation.

> [**DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation**](https://arxiv.org/abs/2011.09876)
>
> Xing Shen\*, Jirui Yang\*, Chunbo Wei, Bing Deng, Jianqiang Huang, Xiansheng Hua, Xiaoliang Cheng, Kewei Liang
>
> In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2021)
>
> *arXiv preprint ([arXiv:2011.09876](https://arxiv.org/abs/2011.09876))*

## Contributions

- We propose a high-quality and low-complexity mask representation for instance segmentation, which encodes the high-resolution binary mask into a compact vector with the discrete cosine transform (a minimal encoding sketch follows this list).
- With slight modifications, DCT-Mask can be integrated into most pixel-based frameworks and achieves significant, consistent improvements across different datasets, backbones, and training schedules. The gains are larger for more complex backbones and higher-quality annotations.
- DCT-Mask requires no extra pre-processing or pre-training, and it predicts high-resolution masks at a speed similar to low-resolution prediction.
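As a rough illustration of the mask representation described above, the sketch below encodes a binary mask into a short DCT vector and decodes it back. The 128×128 mask resolution, 300-dimensional vector, zig-zag coefficient selection, and the use of SciPy's `dctn`/`idctn` are assumptions for illustration (following the settings reported in the paper); the actual training/inference code lives under `projects/DCT_Mask/` and may differ in detail.

```python
# Illustrative sketch of the DCT mask encoding/decoding idea; not the repo's code.
# Assumed defaults: 128x128 mask, 300-dim vector, SciPy's orthonormal DCT-II.
import numpy as np
from scipy.fft import dctn, idctn


def zigzag_indices(n):
    """Row/column indices of an n x n grid in zig-zag order (low frequencies first)."""
    order = sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda rc: (rc[0] + rc[1], rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )
    rows, cols = zip(*order)
    return np.array(rows), np.array(cols)


def encode_mask(mask, dim=300):
    """Apply a 2D DCT to the binary mask and keep the first `dim` zig-zag coefficients."""
    coeffs = dctn(mask.astype(np.float32), norm="ortho")
    rows, cols = zigzag_indices(mask.shape[0])
    return coeffs[rows[:dim], cols[:dim]]


def decode_mask(vector, size=128, threshold=0.5):
    """Scatter the vector back into the low-frequency corner and invert the DCT."""
    coeffs = np.zeros((size, size), dtype=np.float32)
    rows, cols = zigzag_indices(size)
    coeffs[rows[: len(vector)], cols[: len(vector)]] = vector
    return (idctn(coeffs, norm="ortho") >= threshold).astype(np.uint8)


# Example: a 128x128 circular mask round-trips through a 300-dim DCT vector.
yy, xx = np.mgrid[:128, :128]
mask = ((yy - 64) ** 2 + (xx - 64) ** 2 < 40 ** 2).astype(np.uint8)
vec = encode_mask(mask, dim=300)           # shape (300,)
recon = decode_mask(vec, size=128)         # shape (128, 128), values in {0, 1}
print(vec.shape, (recon == mask).mean())   # fraction of pixels recovered (close to 1 for smooth shapes)
```

In the full model, the mask head regresses this compact vector instead of a low-resolution pixel grid, and the predicted vector is decoded back to a high-resolution mask at inference time.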
## Installation

#### Requirements

- PyTorch ≥ 1.5 and fvcore == 0.1.1.post20200716

This implementation is based on [detectron2](https://github.com/facebookresearch/detectron2). Please refer to [INSTALL.md](INSTALL.md) for installation and dataset preparation.

## Usage

The code of this project lives under `projects/DCT_Mask/`.

### Train with multiple GPUs

```bash
cd ./projects/DCT_Mask/
./train1.sh
```

### Testing

```bash
cd ./projects/DCT_Mask/
./test1.sh
```

## Model ZOO

### Trained models on COCO

Model | Backbone | Schedule | Multi-scale training | Inference time (s/im) | AP (minival) | Link
--- |:---:|:---:|:---:|:---:|:---:|:---:
DCT-Mask R-CNN | R50 | 1x | Yes | 0.0465 | 36.5 | [download (Fetch code: xpdm)](https://pan.baidu.com/s/1p1OK9KU3ojVwM0gj8nqkPw)
DCT-Mask R-CNN | R101 | 3x | Yes | 0.0595 | 39.9 | [download (Fetch code: 7q6x)](https://pan.baidu.com/s/19IYgrUXi4o_gTNl8MzGIOA)
DCT-Mask R-CNN | RX101 | 3x | Yes | 0.1049 | 41.2 | [download (Fetch code: ufw2)](https://pan.baidu.com/s/149NL1S4AfJJSRSki3bVpGw)
Cascade DCT-Mask R-CNN | R50 | 1x | Yes | 0.0630 | 37.5 | [download (Fetch code: yqxp)](https://pan.baidu.com/s/1U9AF8bP5FTWYqBGVrt5HmA)
Cascade DCT-Mask R-CNN | R101 | 3x | Yes | 0.0750 | 40.8 | [download (Fetch code: r8xv)](https://pan.baidu.com/s/11UQ1Zot7M5FqK1DIa-HOHA)
Cascade DCT-Mask R-CNN | RX101 | 3x | Yes | 0.1195 | 42.0 | [download (Fetch code: pdej)](https://pan.baidu.com/s/1xaChv_C-YRxkxY6gjumHOw)

### Trained models on Cityscapes

Model | Data | Backbone | Schedule | Multi-scale training | AP (val) | Link
--- |:---:|:---:|:---:|:---:|:---:|:---:
DCT-Mask R-CNN | Fine-Only | R50 | 1x | Yes | 37.0 | [download (Fetch code: dn7i)](https://pan.baidu.com/s/1vcDVv8NbOm3OV8_2fsf-DQ)
DCT-Mask R-CNN | COCO-Pretrain + Fine | R50 | 1x | Yes | 39.6 | [download (Fetch code: ntqf)](https://pan.baidu.com/s/1dVcSwP2PG_6jZYgVMbWT0w)

#### Notes

- We observe about 0.2 AP noise on COCO.
- Variance is high on Cityscapes when training on the fine annotations only. We report the median AP over 5 runs in the paper (35.6), while this repo reports the best run (37.0).
- Initializing from COCO pre-training reduces the variance on Cityscapes and also increases mask AP.
- Inference time is measured on a single NVIDIA V100 GPU with batch size 1.
- LVIS 0.5 is used for evaluation.

## Contributing to the project

Any pull requests or issues are welcome. If there is any problem with this project, please contact [Xing Shen](mailto:shenxingsx@zju.edu.cn).

## Citations

Please consider citing our paper in your publications if the project helps your research.

## License

- MIT License.