# kmax-deeplab

**Repository Path**: ByteDance/kmax-deeplab

## Basic Information

- **Project Name**: kmax-deeplab
- **Description**: A PyTorch re-implementation, based on Detectron2, of the ECCV 2022 paper k-means Mask Transformer.
- **License**: Apache-2.0
- **Default Branch**: main
- **Created**: 2023-07-08
- **Last Updated**: 2026-01-25

## README

# kMaX-DeepLab (ECCV 2022)

This is a *PyTorch re-implementation* of our ECCV 2022 paper based on Detectron2: [k-means mask Transformer](https://arxiv.org/pdf/2207.04044.pdf).

*Disclaimer*: This is a *re-implementation* of kMaX-DeepLab in PyTorch. While we have tried our best to reproduce all the numbers reported in the paper, please refer to the original numbers in the [paper](https://arxiv.org/pdf/2207.04044.pdf) or [TensorFlow repo](https://github.com/google-research/deeplab2/blob/main/g3doc/projects/kmax_deeplab.md) when making performance or speed comparisons.

[kMaX-DeepLab](https://arxiv.org/pdf/2207.04044.pdf) is an end-to-end method for general segmentation tasks. Built upon [MaX-DeepLab](https://arxiv.org/pdf/2012.00759.pdf) and [CMT-DeepLab](https://arxiv.org/pdf/2206.08948.pdf), kMaX-DeepLab proposes a novel view of the mask transformer as a process that iteratively performs cluster-assignment and cluster-update steps.

Inspired by the similarity between cross-attention and the k-means clustering algorithm, kMaX-DeepLab proposes k-means cross-attention, which adopts a simple modification: the activation function in cross-attention is changed from a spatial-wise softmax to a cluster-wise argmax.
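
The difference between the two activations can be illustrated with a toy example. This is a minimal sketch in plain Python, not the code used in this repository: the function names and the toy numbers are made up for illustration, and pixel features are used directly as keys and values, whereas a real model would apply learned projections first. With `N` clusters and `HW` pixels, the spatial-wise softmax normalizes each cluster's row over all pixels, while the cluster-wise argmax assigns each pixel to exactly one cluster, as in a k-means assignment step. The residual update then plays the role of the cluster-update step.

```python
import math


def affinity(queries, keys):
    """Dot-product logits with shape [N clusters][HW pixels]."""
    return [[sum(q * k for q, k in zip(qc, kp)) for kp in keys] for qc in queries]


def spatial_softmax(logits):
    """Standard cross-attention: normalize each cluster row over all pixels."""
    out = []
    for row in logits:
        m = max(row)  # subtract the max for numerical stability
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def cluster_argmax(logits):
    """k-means style assignment: each pixel column is one-hot over clusters."""
    n, hw = len(logits), len(logits[0])
    out = [[0.0] * hw for _ in range(n)]
    for p in range(hw):
        best = max(range(n), key=lambda c: logits[c][p])
        out[best][p] = 1.0
    return out


def update_centers(centers, weights, values):
    """Residual update C + A V, where A is [N][HW] and V is [HW][D]."""
    d = len(values[0])
    return [
        [c[j] + sum(w * v[j] for w, v in zip(w_row, values)) for j in range(d)]
        for c, w_row in zip(centers, weights)
    ]


# Toy data (illustrative only): 2 cluster centers, 4 pixels, feature size 2.
centers = [[1.0, 0.0], [0.0, 1.0]]
pixels = [[2.0, 0.1], [1.5, 0.0], [0.2, 1.0], [0.0, 3.0]]

logits = affinity(centers, pixels)
hard = cluster_argmax(logits)   # [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
soft = spatial_softmax(logits)  # every row sums to 1 over the 4 pixels

print(update_centers(centers, hard, pixels))
print(update_centers(centers, soft, pixels))
```

With the hard assignment, each center is updated only by the pixels assigned to it. With the softmax, every pixel contributes to every center with a small weight.
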

As a result, kMaX-DeepLab not only produces much more plausible attention maps but also achieves much better performance.

## Installation

The code-base is verified with pytorch==1.12.1, torchvision==0.13.1, cudatoolkit==11.3, and detectron2==0.6. Please install the other libraries through *pip3 install -r requirements.txt*.

Please refer to [Mask2Former's script](https://github.com/facebookresearch/Mask2Former/blob/main/datasets/README.md) for data preparation.

## Model Zoo

Note that the models in the model zoo below are *trained from scratch using this PyTorch code-base*. We also offer code for porting and evaluating the [TensorFlow checkpoints](https://github.com/google-research/deeplab2/blob/main/g3doc/projects/kmax_deeplab.md) in the section *Porting TensorFlow Weights*.

### COCO Panoptic Segmentation
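
| Backbone | PQ | SQ | RQ | PQ<sup>thing</sup> | PQ<sup>stuff</sup> | ckpt |
| --- | --- | --- | --- | --- | --- | --- |
| ResNet-50 | 53.3 | 83.2 | 63.3 | 58.8 | 45.0 | download |
| ConvNeXt-Tiny | 55.5 | 83.3 | 65.9 | 61.4 | 46.7 | download |
| ConvNeXt-Small | 56.7 | 83.4 | 67.2 | 62.7 | 47.7 | download |
| ConvNeXt-Base | 57.2 | 83.4 | 67.9 | 63.4 | 47.9 | download |
| ConvNeXt-Large | 57.9 | 83.5 | 68.5 | 64.3 | 48.4 | download |
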
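
For readers unfamiliar with the PQ, SQ, and RQ columns, the following is a minimal sketch of the standard panoptic quality definition for a single class. The function name and the toy numbers are illustrative and are not taken from this repository's evaluation code. Predicted and ground-truth segments count as a match (true positive) when their IoU is above 0.5.

```python
def panoptic_quality(matched_ious, num_false_positives, num_false_negatives):
    """Return (PQ, SQ, RQ) for one class.

    matched_ious: IoU of every matched (true positive) segment pair.
    SQ is the mean IoU over matches, RQ is an F1-style detection score,
    and PQ is their product.
    """
    tp = len(matched_ious)
    denom = tp + 0.5 * num_false_positives + 0.5 * num_false_negatives
    if tp == 0 or denom == 0:
        return 0.0, 0.0, 0.0
    sq = sum(matched_ious) / tp
    rq = tp / denom
    return sq * rq, sq, rq


# Toy example: three matches, one unmatched prediction, one missed segment.
pq, sq, rq = panoptic_quality([0.9, 0.8, 0.7], 1, 1)
print(f"PQ={pq:.2f} SQ={sq:.2f} RQ={rq:.2f}")  # PQ=0.60 SQ=0.80 RQ=0.75
```

Benchmark numbers are averaged over classes, so the PQ column in the tables is generally not exactly the product of the SQ and RQ columns.
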
### Cityscapes Panoptic Segmentation
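
| Backbone | PQ | SQ | RQ | PQ<sup>thing</sup> | PQ<sup>stuff</sup> | AP | IoU | ckpt |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet-50 | 63.5 | 82.0 | 76.5 | 57.8 | 67.7 | 38.6 | 79.5 | download |
| ConvNeXt-Large | 68.4 | 83.3 | 81.3 | 62.6 | 72.6 | 45.1 | 83.0 | download |
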
### ADE20K Panoptic Segmentation
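
| Backbone | PQ | SQ | RQ | PQ<sup>thing</sup> | PQ<sup>stuff</sup> | ckpt |
| --- | --- | --- | --- | --- | --- | --- |
| ResNet-50 | 42.2 | 81.6 | 50.4 | 41.9 | 42.7 | download |
| ConvNeXt-Large | 50.0 | 83.3 | 59.1 | 49.5 | 50.8 | download |
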
## Example Commands for Training and Testing

To train kMaX-DeepLab with ResNet-50 backbone:

```
python3 train_net.py --num-gpus 8 --num-machines 4 \
  --machine-rank MACHINE_RANK --dist-url DIST_URL \
  --config-file configs/coco/panoptic_segmentation/kmax_r50.yaml
```

The training takes 53 hours with 32 V100 GPUs on our end.

To test kMaX-DeepLab with ResNet-50 backbone and the provided weights:

```
python3 train_net.py --num-gpus NUM_GPUS \
  --config-file configs/coco/panoptic_segmentation/kmax_r50.yaml \
  --eval-only MODEL.WEIGHTS kmax_r50.pth
```

Integrated into [Huggingface Spaces 🤗](https://huggingface.co/spaces) using [Gradio](https://github.com/gradio-app/gradio). Try out the Web Demo: [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/fun-research/kMaX-DeepLab)

## Porting TensorFlow Weights

We also provide a [script](./convert-tf-weights-to-d2.py) to convert the official TensorFlow weights into PyTorch format and use them in this code-base.

Example for porting and evaluating kMaX with ConvNeXt-Large on ADE20K from [TensorFlow weights](https://github.com/google-research/deeplab2/blob/main/g3doc/projects/kmax_deeplab.md):

```
pip3 install tensorflow==2.9 keras==2.9
wget https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/kmax_convnext_large_res1281_ade20k_train.tar.gz
tar -xvf kmax_convnext_large_res1281_ade20k_train.tar.gz
python3 convert-tf-weights-to-d2.py ./kmax_convnext_large_res1281_ade20k_train/ckpt-100000 kmax_convnext_large_res1281_ade20k_train.pkl
python3 train_net.py --num-gpus 8 --config-file configs/ade20k/kmax_convnext_large.yaml \
  --eval-only MODEL.WEIGHTS ./kmax_convnext_large_res1281_ade20k_train.pkl
```

This is expected to give PQ = 50.6620. Note that minor performance differences may exist due to numeric differences across deep learning frameworks and implementation details.
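
Before running evaluation, it can be useful to check that the converted `.pkl` file contains what you expect. The helper below is a minimal sketch with a hypothetical name (`summarize_checkpoint`). It assumes, without confirmation from this README, that the converted file is a pickled dictionary whose weights sit under a `model` key, as is common for Detectron2-style converted checkpoints. If the weights are stored as NumPy arrays, NumPy must be installed for unpickling to work.

```python
import pickle


def summarize_checkpoint(path, limit=5):
    """Return (number of entries, first few sorted names) of a pickled checkpoint.

    Assumption: the file holds a dict. If it has a "model" key, the weights
    are read from it; otherwise the dict itself is treated as the weights.
    Only unpickle files you trust, since pickle can execute arbitrary code.
    """
    with open(path, "rb") as f:
        data = pickle.load(f)
    if not isinstance(data, dict):
        raise TypeError(f"expected a dict, got {type(data).__name__}")
    weights = data.get("model", data)
    names = sorted(weights)
    return len(names), names[:limit]
```

For example, `summarize_checkpoint("kmax_convnext_large_res1281_ade20k_train.pkl")` would return the number of stored entries and the first few parameter names, provided the file layout matches the assumption above.
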
## Citing kMaX-DeepLab

If you find this code helpful in your research or wish to refer to the baseline results, please use the following BibTeX entry.

* kMaX-DeepLab:

```
@inproceedings{kmax_deeplab_2022,
  author={Qihang Yu and Huiyu Wang and Siyuan Qiao and Maxwell Collins and Yukun Zhu and Hartwig Adam and Alan Yuille and Liang-Chieh Chen},
  title={{k-means Mask Transformer}},
  booktitle={ECCV},
  year={2022}
}
```

* CMT-DeepLab:

```
@inproceedings{cmt_deeplab_2022,
  author={Qihang Yu and Huiyu Wang and Dahun Kim and Siyuan Qiao and Maxwell Collins and Yukun Zhu and Hartwig Adam and Alan Yuille and Liang-Chieh Chen},
  title={CMT-DeepLab: Clustering Mask Transformers for Panoptic Segmentation},
  booktitle={CVPR},
  year={2022}
}
```

## Acknowledgements

We express gratitude to the following open-source projects which this code-base is based on:

[DeepLab2](https://github.com/google-research/deeplab2)

[Mask2Former](https://github.com/facebookresearch/Mask2Former)