# kmax-deeplab
**Repository Path**: ByteDance/kmax-deeplab
## Basic Information
- **Project Name**: kmax-deeplab
- **Description**: a PyTorch re-implementation of ECCV 2022 paper based on Detectron2: k-means mask Transformer.
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2023-07-08
- **Last Updated**: 2026-01-25
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# kMaX-DeepLab (ECCV 2022)
This is a *PyTorch re-implementation* of our ECCV 2022 paper based on Detectron2: [k-means mask Transformer](https://arxiv.org/pdf/2207.04044.pdf).
*Disclaimer*: This is a *re-implementation* of kMaX-DeepLab in PyTorch. While we have tried our best to reproduce all the numbers reported in the paper, please refer to the original numbers in the [paper](https://arxiv.org/pdf/2207.04044.pdf) or [tensorflow repo](https://github.com/google-research/deeplab2/blob/main/g3doc/projects/kmax_deeplab.md) when making performance or speed comparisons.
[kMaX-DeepLab](https://arxiv.org/pdf/2207.04044.pdf) is an end-to-end method for
general segmentation tasks. Built upon
[MaX-DeepLab](https://arxiv.org/pdf/2012.00759.pdf) and
[CMT-DeepLab](https://arxiv.org/pdf/2206.08948.pdf), kMaX-DeepLab proposes a
novel view to regard the mask transformer as a process of iteratively
performing cluster-assignment and cluster-update steps.
Inspired by the similarity between cross-attention and the k-means clustering
algorithm, kMaX-DeepLab proposes k-means cross-attention, which makes one simple
modification: the activation function in cross-attention is changed from
spatial-wise softmax to cluster-wise argmax.
As a result, kMaX-DeepLab not only produces much more plausible attention maps
but also achieves much better performance.
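
The change of activation described above can be illustrated with a minimal NumPy sketch. This is a simplified illustration and not the code used in this repository: it leaves out the learned projections, normalization layers, and multi-head structure of a real decoder, and the function and argument names (`standard_cross_attention`, `kmeans_cross_attention`, `centers`, `pixel_keys`, `pixel_values`) are hypothetical.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def standard_cross_attention(centers, pixel_keys, pixel_values):
    """Update N cluster centers with a spatial-wise softmax over HW pixels."""
    logits = centers @ pixel_keys.T        # (N, HW) center-to-pixel affinities
    attention = softmax(logits, axis=1)    # each center normalizes over all pixels
    return centers + attention @ pixel_values


def kmeans_cross_attention(centers, pixel_keys, pixel_values):
    """Update N cluster centers with a cluster-wise argmax (hard assignment)."""
    logits = centers @ pixel_keys.T        # (N, HW) center-to-pixel affinities
    winners = logits.argmax(axis=0)        # (HW,) index of the best center per pixel
    assignment = np.zeros_like(logits)     # one-hot assignment matrix, (N, HW)
    assignment[winners, np.arange(logits.shape[1])] = 1.0
    # Each center is updated only by the pixels assigned to it.
    return centers + assignment @ pixel_values, assignment
```

In the argmax variant every pixel contributes to exactly one center, which mirrors the assignment step of k-means; the softmax variant instead spreads each center's attention over all pixels.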
## Installation
The code-base is verified with pytorch==1.12.1, torchvision==0.13.1, cudatoolkit==11.3, and detectron2==0.6.
Please install the other libraries through *pip3 install -r requirements.txt*.
Please refer to [Mask2Former's script](https://github.com/facebookresearch/Mask2Former/blob/main/datasets/README.md) for data preparation.
## Model Zoo
Note that the models in the model zoo below are *trained from scratch using this PyTorch code-base*. We also offer code for porting and evaluating the [TensorFlow checkpoints](https://github.com/google-research/deeplab2/blob/main/g3doc/projects/kmax_deeplab.md) in the section *Porting TensorFlow Weights*.
### COCO Panoptic Segmentation
### Cityscapes Panoptic Segmentation
### ADE20K Panoptic Segmentation
## Example Commands for Training and Testing
To train kMaX-DeepLab with ResNet-50 backbone:
```
python3 train_net.py --num-gpus 8 --num-machines 4 \
--machine-rank MACHINE_RANK --dist-url DIST_URL \
--config-file configs/coco/panoptic_segmentation/kmax_r50.yaml
```
Training takes 53 hours with 32 V100 GPUs on our end.
To test kMaX-DeepLab with ResNet-50 backbone and the provided weights:
```
python3 train_net.py --num-gpus NUM_GPUS \
--config-file configs/coco/panoptic_segmentation/kmax_r50.yaml \
--eval-only MODEL.WEIGHTS kmax_r50.pth
```
Integrated into [Huggingface Spaces 🤗](https://huggingface.co/spaces) using [Gradio](https://github.com/gradio-app/gradio). Try out the [Web Demo](https://huggingface.co/spaces/fun-research/kMaX-DeepLab).
## Porting TensorFlow Weights
We also provide a [script](./convert-tf-weights-to-d2.py) to convert the official TensorFlow weights into PyTorch format and use them in this code-base.
Example for porting and evaluating kMaX with ConvNeXt-Large on ADE20K from [TensorFlow weights](https://github.com/google-research/deeplab2/blob/main/g3doc/projects/kmax_deeplab.md):
```
pip3 install tensorflow==2.9 keras==2.9
wget https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/kmax_convnext_large_res1281_ade20k_train.tar.gz
tar -xvf kmax_convnext_large_res1281_ade20k_train.tar.gz
python3 convert-tf-weights-to-d2.py ./kmax_convnext_large_res1281_ade20k_train/ckpt-100000 kmax_convnext_large_res1281_ade20k_train.pkl
python3 train_net.py --num-gpus 8 --config-file configs/ade20k/kmax_convnext_large.yaml \
--eval-only MODEL.WEIGHTS ./kmax_convnext_large_res1281_ade20k_train.pkl
```
This is expected to give PQ = 50.6620. Note that minor performance differences may exist due to numerical differences across deep learning frameworks and implementation details.
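
If the evaluated PQ is far from this value, a quick look at the converted file can help rule out a failed conversion. The helper below is a hedged sketch, not part of this repository: `summarize_checkpoint` is a hypothetical name, and it assumes the `.pkl` file is an ordinary pickled dict, either a flat name-to-array mapping or one that nests the weights under a `"model"` key. Only unpickle files you trust.

```python
import pickle


def summarize_checkpoint(path, limit=5):
    """Return (number of entries, preview of names and shapes) for a pickled checkpoint.

    Assumption: the file holds a dict, either {"model": {name: array}}
    or a flat {name: array} mapping. Adjust if the actual layout differs.
    """
    with open(path, "rb") as f:
        data = pickle.load(f, encoding="latin1")
    if not isinstance(data, dict):
        raise TypeError(f"expected a pickled dict, got {type(data).__name__}")
    # Unwrap the nested layout when a "model" entry holds the weights.
    weights = data["model"] if isinstance(data.get("model"), dict) else data
    preview = []
    for name in sorted(weights)[:limit]:
        # Arrays expose .shape; other entries (e.g. metadata strings) do not.
        preview.append((name, getattr(weights[name], "shape", None)))
    return len(weights), preview
```

For example, `summarize_checkpoint("./kmax_convnext_large_res1281_ade20k_train.pkl")` would return the number of stored entries and the first few parameter names with their shapes, which can be compared against the parameter names the model config expects.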
## Citing kMaX-DeepLab
If you find this code helpful in your research or wish to refer to the baseline
results, please use the following BibTeX entries.
* kMaX-DeepLab:
```
@inproceedings{kmax_deeplab_2022,
author={Qihang Yu and Huiyu Wang and Siyuan Qiao and Maxwell Collins and Yukun Zhu and Hartwig Adam and Alan Yuille and Liang-Chieh Chen},
title={{k-means Mask Transformer}},
booktitle={ECCV},
year={2022}
}
```
* CMT-DeepLab:
```
@inproceedings{cmt_deeplab_2022,
author={Qihang Yu and Huiyu Wang and Dahun Kim and Siyuan Qiao and Maxwell Collins and Yukun Zhu and Hartwig Adam and Alan Yuille and Liang-Chieh Chen},
title={CMT-DeepLab: Clustering Mask Transformers for Panoptic Segmentation},
booktitle={CVPR},
year={2022}
}
```
## Acknowledgements
We express gratitude to the following open-source projects, on which this code-base is based:
* [DeepLab2](https://github.com/google-research/deeplab2)
* [Mask2Former](https://github.com/facebookresearch/Mask2Former)