# kmax-deeplab

**Repository Path**: ByteDance/kmax-deeplab

## Basic Information

- **Project Name**: kmax-deeplab
- **Description**: a PyTorch re-implementation of the ECCV 2022 paper based on Detectron2: k-means mask Transformer.
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2023-07-08
- **Last Updated**: 2026-01-25

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# kMaX-DeepLab (ECCV 2022)

This is a *PyTorch re-implementation* of our ECCV 2022 paper based on Detectron2: [k-means mask Transformer](https://arxiv.org/pdf/2207.04044.pdf).

*Disclaimer*: This is a *re-implementation* of kMaX-DeepLab in PyTorch. While we have tried our best to reproduce all the numbers reported in the paper, please refer to the original numbers in the [paper](https://arxiv.org/pdf/2207.04044.pdf) or [TensorFlow repo](https://github.com/google-research/deeplab2/blob/main/g3doc/projects/kmax_deeplab.md) when making performance or speed comparisons.

[kMaX-DeepLab](https://arxiv.org/pdf/2207.04044.pdf) is an end-to-end method for general segmentation tasks. Built upon [MaX-DeepLab](https://arxiv.org/pdf/2012.00759.pdf) and [CMT-DeepLab](https://arxiv.org/pdf/2206.08948.pdf), kMaX-DeepLab proposes a novel view that regards the mask transformer as a process of iteratively performing cluster-assignment and cluster-update steps.
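For reference, the two steps named above are the two halves of a classic k-means iteration. The minimal NumPy sketch below shows one such iteration over pixel features. It illustrates plain k-means only and is not code from this repository; the helper name `kmeans_step` and the `(N, D)` / `(HW, D)` array layout are assumptions made for the example.

```python
import numpy as np


def kmeans_step(centers, pixels):
    """One iteration of classic k-means (hypothetical helper, illustration only).

    centers: (N, D) array of cluster centers.
    pixels:  (HW, D) array of pixel features.
    Returns the updated (N, D) centers and the (HW,) hard assignment.
    """
    centers = np.asarray(centers, dtype=float)
    pixels = np.asarray(pixels, dtype=float)

    # Cluster-assignment: squared Euclidean distance from every center to every
    # pixel, then each pixel picks its closest center (a hard, one-of-N choice).
    dists = ((centers[:, None, :] - pixels[None, :, :]) ** 2).sum(axis=-1)  # (N, HW)
    labels = dists.argmin(axis=0)  # (HW,)

    # Cluster-update: each center moves to the mean of the pixels assigned to it.
    # Centers that received no pixels are left unchanged.
    new_centers = centers.copy()
    for k in range(centers.shape[0]):
        members = pixels[labels == k]
        if len(members) > 0:
            new_centers[k] = members.mean(axis=0)
    return new_centers, labels


if __name__ == "__main__":
    centers = [[0.0, 0.0], [10.0, 10.0]]
    pixels = [[1.0, 0.0], [0.0, 1.0], [9.0, 10.0], [10.0, 11.0]]
    new_centers, labels = kmeans_step(centers, pixels)
    print(labels)  # [0 0 1 1]
```

In this toy input the first two pixels join center 0 and the last two join center 1, so the centers move to `[0.5, 0.5]` and `[9.5, 10.5]`.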

Inspired by the similarity between cross-attention and the k-means clustering algorithm, kMaX-DeepLab proposes k-means cross-attention, which adopts a simple modification: it changes the activation function in cross-attention from spatial-wise softmax to cluster-wise argmax.
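The softmax-to-argmax change can be sketched in a few lines of NumPy. This is a simplified illustration, not this repository's module: it drops the learned query/key/value projections, multi-head splitting, and normalization layers (assumptions made for brevity), and writes the update in the residual form `C + A @ P`, where `A` is the activated cluster-to-pixel affinity. The two functions compute the same `(N, HW)` affinity and differ only in the activation applied to it.

```python
import numpy as np


def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(clusters, pixels):
    """Standard cross-attention update (simplified sketch: no Q/K/V projections).

    clusters: (N, D) cluster centers / object queries.
    pixels:   (HW, D) pixel features.
    """
    logits = clusters @ pixels.T        # (N, HW) cluster-to-pixel affinity
    attn = softmax(logits, axis=1)      # spatial-wise softmax: each row sums to 1
    return clusters + attn @ pixels     # residual cluster update


def kmeans_cross_attention(clusters, pixels):
    """k-means cross-attention update (simplified sketch: no Q/K/V projections)."""
    logits = clusters @ pixels.T        # (N, HW) same affinity as above
    # Cluster-wise argmax: every pixel is hard-assigned to exactly one cluster,
    # so each column of `assign` is one-hot (the k-means assignment step).
    assign = np.zeros_like(logits)
    assign[logits.argmax(axis=0), np.arange(logits.shape[1])] = 1.0
    return clusters + assign @ pixels   # update each cluster from its assigned pixels


if __name__ == "__main__":
    clusters = np.array([[1.0, 0.0], [0.0, 1.0]])
    pixels = np.array([[2.0, 0.0], [0.0, 3.0], [1.0, 0.5]])
    print(kmeans_cross_attention(clusters, pixels))
    print(cross_attention(clusters, pixels))
```

With these inputs, pixels 0 and 2 are assigned to cluster 0 and pixel 1 to cluster 1, so the k-means variant returns `[[4, 0.5], [0, 4]]`. The softmax variant instead mixes a little of every pixel into every cluster. Training details around the hard assignment are not reproduced in this sketch.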

As a result, kMaX-DeepLab not only produces much more plausible attention maps but also achieves much better performance.

## Installation

The code-base is verified with pytorch==1.12.1, torchvision==0.13.1, cudatoolkit==11.3, and detectron2==0.6. Please install the other libraries through `pip3 install -r requirements.txt`.

Please refer to [Mask2Former's script](https://github.com/facebookresearch/Mask2Former/blob/main/datasets/README.md) for data preparation.

## Model Zoo

Note that the models in the model zoo below are *trained from scratch using this PyTorch code-base*. We also offer code for porting and evaluating the [TensorFlow checkpoints](https://github.com/google-research/deeplab2/blob/main/g3doc/projects/kmax_deeplab.md) in the section *Porting TensorFlow Weights*.

### COCO Panoptic Segmentation

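| Backbone | PQ | SQ | RQ | PQ<sup>thing</sup> | PQ<sup>stuff</sup> | ckpt |
|---|---|---|---|---|---|---|
| ResNet-50 | 53.3 | 83.2 | 63.3 | 58.8 | 45.0 | download |
| ConvNeXt-Tiny | 55.5 | 83.3 | 65.9 | 61.4 | 46.7 | download |
| ConvNeXt-Small | 56.7 | 83.4 | 67.2 | 62.7 | 47.7 | download |
| ConvNeXt-Base | 57.2 | 83.4 | 67.9 | 63.4 | 47.9 | download |
| ConvNeXt-Large | 57.9 | 83.5 | 68.5 | 64.3 | 48.4 | download |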

### Cityscapes Panoptic Segmentation

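| Backbone | PQ | SQ | RQ | PQ<sup>thing</sup> | PQ<sup>stuff</sup> | AP | IoU | ckpt |
|---|---|---|---|---|---|---|---|---|
| ResNet-50 | 63.5 | 82.0 | 76.5 | 57.8 | 67.7 | 38.6 | 79.5 | download |
| ConvNeXt-Large | 68.4 | 83.3 | 81.3 | 62.6 | 72.6 | 45.1 | 83.0 | download |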

### ADE20K Panoptic Segmentation

| Backbone | PQ | SQ | RQ | PQ<sup>thing</sup> | PQ<sup>stuff</sup> | ckpt |
|---|---|---|---|---|---|---|
| ResNet-50 | 42.2 | 81.6 | 50.4 | 41.9 | 42.7 | download |
| ConvNeXt-Large | 50.0 | 83.3 | 59.1 | 49.5 | 50.8 | download |
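The PQ, SQ, and RQ columns in the tables above use the standard panoptic quality metric. For a single class, predicted and ground-truth segments matched with IoU > 0.5 are true positives (TP). The metric is then PQ = ΣIoU(TP) / (|TP| + ½|FP| + ½|FN|), which factors into SQ (mean IoU of the matches) × RQ (an F1-style detection score). The sketch below computes these values from already-matched segments and averages them over classes. The matching step itself is omitted, and the helper names `panoptic_quality` and `mean_over_classes` are hypothetical.

```python
def panoptic_quality(tp_ious, num_fp, num_fn):
    """Per-class (pq, sq, rq) from already-matched segments (illustrative sketch).

    tp_ious: IoU of each true-positive (prediction, ground-truth) pair; under the
             standard matching rule every value is > 0.5.
    num_fp:  number of unmatched predicted segments.
    num_fn:  number of unmatched ground-truth segments.
    Returns None if the class appears in neither prediction nor ground truth.
    """
    tp = len(tp_ious)
    denom = tp + 0.5 * num_fp + 0.5 * num_fn
    if denom == 0:
        return None  # class absent everywhere: excluded from the average
    sq = sum(tp_ious) / tp if tp > 0 else 0.0  # segmentation quality
    rq = tp / denom                             # recognition quality
    return sq * rq, sq, rq


def mean_over_classes(per_class):
    """Dataset-level (pq, sq, rq): unweighted mean over the classes that are present."""
    present = [m for m in per_class if m is not None]
    if not present:
        return None
    n = len(present)
    return tuple(sum(m[i] for m in present) / n for i in range(3))


if __name__ == "__main__":
    class_a = panoptic_quality([0.9, 0.8], num_fp=1, num_fn=1)  # pq = 0.85 * 2/3
    class_b = panoptic_quality([0.6], num_fp=0, num_fn=0)       # pq = 0.6
    pq, sq, rq = mean_over_classes([class_a, class_b])
    print(round(pq, 4), round(sq * rq, 4))  # averaged PQ differs from SQ * RQ
```

Because the per-class product is taken *before* averaging, a dataset-level PQ in the tables is generally not equal to the product of the reported SQ and RQ. PQ<sup>thing</sup> and PQ<sup>stuff</sup> are the same average restricted to thing and stuff classes, respectively.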

## Example Commands for Training and Testing

To train kMaX-DeepLab with ResNet-50 backbone:

```
python3 train_net.py --num-gpus 8 --num-machines 4 \
    --machine-rank MACHINE_RANK --dist-url DIST_URL \
    --config-file configs/coco/panoptic_segmentation/kmax_r50.yaml
```

The training takes 53 hours with 32 V100 GPUs (4 machines with 8 GPUs each) on our end.

To test kMaX-DeepLab with ResNet-50 backbone and the provided weights:

```
python3 train_net.py --num-gpus NUM_GPUS \
    --config-file configs/coco/panoptic_segmentation/kmax_r50.yaml \
    --eval-only MODEL.WEIGHTS kmax_r50.pth
```

Integrated into [Hugging Face Spaces 🤗](https://huggingface.co/spaces) using [Gradio](https://github.com/gradio-app/gradio). Try out the Web Demo: [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/fun-research/kMaX-DeepLab)

## Porting TensorFlow Weights

We also provide a [script](./convert-tf-weights-to-d2.py) to convert the official TensorFlow weights into PyTorch format and use them in this code-base.

Example for porting and evaluating kMaX with ConvNeXt-Large on ADE20K from [TensorFlow weights](https://github.com/google-research/deeplab2/blob/main/g3doc/projects/kmax_deeplab.md):

```
pip3 install tensorflow==2.9 keras==2.9
wget https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/kmax_convnext_large_res1281_ade20k_train.tar.gz
tar -xvf kmax_convnext_large_res1281_ade20k_train.tar.gz
python3 convert-tf-weights-to-d2.py ./kmax_convnext_large_res1281_ade20k_train/ckpt-100000 kmax_convnext_large_res1281_ade20k_train.pkl
python3 train_net.py --num-gpus 8 --config-file configs/ade20k/kmax_convnext_large.yaml \
    --eval-only MODEL.WEIGHTS ./kmax_convnext_large_res1281_ade20k_train.pkl
```

This is expected to give PQ = 50.6620. Note that minor performance differences may exist due to numerical differences across deep learning frameworks and implementation details.
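For readers curious about what a TensorFlow-to-Detectron2 conversion involves, here is a minimal sketch of two common ingredients. The first is transposing weight layouts: TensorFlow stores convolution kernels as `(H, W, in, out)` and dense kernels as `(in, out)`, while PyTorch expects `(out, in, H, W)` and `(out, in)`. The second is writing a pickle in the dictionary layout Detectron2's own conversion tools produce: a `"model"` dict of NumPy arrays plus an `"__author__"` string. This is *not* `convert-tf-weights-to-d2.py`. The real script must also map TensorFlow variable names to this code-base's parameter names and handle depthwise convolutions and normalization statistics, none of which is shown here. The helper names `tf_kernel_to_torch` and `save_d2_pkl` and the example parameter name are hypothetical.

```python
import pickle

import numpy as np


def tf_kernel_to_torch(kernel):
    """Convert a TensorFlow kernel array to PyTorch layout (illustrative sketch).

    - 4-D conv kernels:  TF (H, W, in, out) -> PyTorch (out, in, H, W).
    - 2-D dense kernels: TF (in, out)       -> PyTorch nn.Linear (out, in).
    - Anything else (biases, scales) is returned unchanged.
    Depthwise kernels, stored by TF as (H, W, in, multiplier), need different
    handling and are deliberately not covered here.
    """
    kernel = np.asarray(kernel)
    if kernel.ndim == 4:
        return np.ascontiguousarray(kernel.transpose(3, 2, 0, 1))
    if kernel.ndim == 2:
        return np.ascontiguousarray(kernel.T)
    return kernel


def save_d2_pkl(state_dict, path, author="tf-port-sketch"):
    """Write NumPy weights as a pickle with "model" and "__author__" keys."""
    payload = {"model": state_dict, "__author__": author}
    with open(path, "wb") as f:
        pickle.dump(payload, f)


if __name__ == "__main__":
    tf_conv = np.zeros((3, 3, 16, 32), dtype=np.float32)  # (H, W, in, out)
    weights = {"backbone.stem.conv.weight": tf_kernel_to_torch(tf_conv)}  # hypothetical name
    print(weights["backbone.stem.conv.weight"].shape)  # (32, 16, 3, 3)
```

Returning contiguous copies keeps the arrays cheap to turn into PyTorch tensors later. Whether a given layer needs a transpose at all depends on how the PyTorch module is defined, so the actual mapping should be taken from the repository's script.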
## Citing kMaX-DeepLab

If you find this code helpful in your research or wish to refer to the baseline results, please use the following BibTeX entries.

* kMaX-DeepLab:

```
@inproceedings{kmax_deeplab_2022,
  author={Qihang Yu and Huiyu Wang and Siyuan Qiao and Maxwell Collins and Yukun Zhu and Hartwig Adam and Alan Yuille and Liang-Chieh Chen},
  title={{k-means Mask Transformer}},
  booktitle={ECCV},
  year={2022}
}
```

* CMT-DeepLab:

```
@inproceedings{cmt_deeplab_2022,
  author={Qihang Yu and Huiyu Wang and Dahun Kim and Siyuan Qiao and Maxwell Collins and Yukun Zhu and Hartwig Adam and Alan Yuille and Liang-Chieh Chen},
  title={CMT-DeepLab: Clustering Mask Transformers for Panoptic Segmentation},
  booktitle={CVPR},
  year={2022}
}
```

## Acknowledgements

We express gratitude to the following open-source projects on which this code-base is built:

* [DeepLab2](https://github.com/google-research/deeplab2)
* [Mask2Former](https://github.com/facebookresearch/Mask2Former)