# DINO
**Repository Path**: xyz-dev-max/DINO
## Basic Information
- **Project Name**: DINO
- **Description**: https://github.com/IDEA-Research/DINO
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 1
- **Created**: 2023-11-06
- **Last Updated**: 2023-11-06
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# DINO
[PapersWithCode: Object Detection on COCO minival](https://paperswithcode.com/sota/object-detection-on-coco-minival?p=dino-detr-with-improved-denoising-anchor-1)
[PapersWithCode: Object Detection on COCO test-dev](https://paperswithcode.com/sota/object-detection-on-coco?p=dino-detr-with-improved-denoising-anchor-1)
This is the official implementation of the paper "[DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection](https://arxiv.org/abs/2203.03605)".
(DINO pronounced `daɪnoʊ` as in dinosaur)
Authors: [Hao Zhang](https://scholar.google.com/citations?user=B8hPxMQAAAAJ&hl=zh-CN)\*, [Feng Li](https://fengli-ust.github.io/)\*, [Shilong Liu](https://www.lsl.zone/)\*, [Lei Zhang](https://www.leizhang.org/), [Hang Su](https://www.suhangss.me/), [Jun Zhu](https://ml.cs.tsinghua.edu.cn/~jun/index.shtml), [Lionel M. Ni](https://www.cse.ust.hk/~ni/), [Heung-Yeung Shum](https://scholar.google.com.hk/citations?user=9akH-n8AAAAJ&hl=en)
# News
[2023/7/10]: We release [Semantic-SAM](https://github.com/UX-Decoder/Semantic-SAM), a universal image segmentation model that can segment and recognize anything at any desired granularity. **Code** and **checkpoint** are available!
[2023/4/28]: We release a strong open-set object detection and segmentation model [OpenSeeD](https://arxiv.org/pdf/2303.08131.pdf) that achieves the best results on open-set object segmentation tasks. Code and checkpoints are available [here](https://github.com/IDEA-Research/OpenSeeD).
[2023/4/26]: DINO is shining again! We release [Stable-DINO](https://github.com/IDEA-Research/Stable-DINO), built upon DINO with a [FocalNet-Huge](https://github.com/microsoft/FocalNet) backbone, which achieves `64.8 AP` on COCO test-dev.
[2023/4/22]: With better hyper-parameters, our DINO-4scale model achieves `49.8 AP` under the 12-epoch setting; see [detrex: DINO](https://github.com/IDEA-Research/detrex/tree/main/projects/dino) for more details.
[2023/3/13]: We release a strong open-set object detection model [Grounding DINO](https://arxiv.org/abs/2303.05499) that achieves the best results on open-set object detection tasks. It achieves **52.5** **zero-shot** AP on COCO detection, **without any COCO training data!** It achieves **63.0** AP on COCO after fine-tuning. Code and checkpoints will be available [here](https://github.com/IDEA-Research/GroundingDINO).
[2023/1/23]: DINO has been accepted to ICLR 2023!
[2022/12/02]: Code for [Mask DINO](https://github.com/IDEA-Research/MaskDINO) is released (also in [detrex](https://github.com/IDEA-Research/detrex/tree/main/projects/maskdino))! Mask DINO further achieves **51.7** and **59.0** box AP on COCO with ResNet-50 and SwinL backbones, without extra detection data, **outperforming DINO** under the same setting!
[2022/9/22]: We release a toolbox, [**detrex**](https://github.com/IDEA-Research/detrex), that provides state-of-the-art Transformer-based detection algorithms. It includes DINO **with better performance**. You are welcome to try it!
- Now supports: [DETR](https://arxiv.org/abs/2005.12872), [Deformable DETR](https://arxiv.org/abs/2010.04159), [Conditional DETR](https://arxiv.org/abs/2108.06152), [DAB-DETR](https://arxiv.org/abs/2201.12329), [DN-DETR](https://arxiv.org/abs/2203.01305), [DINO](https://arxiv.org/abs/2203.03605).
[2022/9/18]: We organize the **ECCV Workshop** [*Computer Vision in the Wild (CVinW)*](https://computer-vision-in-the-wild.github.io/eccv-2022/), where two challenges are hosted to evaluate the zero-shot, few-shot, and full-shot performance of pre-trained vision models on downstream tasks:
- The [*Image Classification in the Wild (ICinW)*](https://eval.ai/web/challenges/challenge-page/1832/overview) Challenge evaluates on 20 image classification tasks.
- The [*Object Detection in the Wild (ODinW)*](https://eval.ai/web/challenges/challenge-page/1839/overview) Challenge evaluates on 35 object detection tasks.
[Workshop](https://computer-vision-in-the-wild.github.io/eccv-2022/) | [IC Challenge](https://eval.ai/web/challenges/challenge-page/1832/overview) | [OD Challenge](https://eval.ai/web/challenges/challenge-page/1839/overview)
[2022/8/6]: We update Swin-L model results without techniques such as O365 pre-training, large image size, and multi-scale test. We also upload the corresponding checkpoints to [Google Drive.](https://drive.google.com/drive/folders/1qD5m1NmK0kjE5hh-G17XUX751WsEG-h_?usp=sharing) Our 5-scale model without any tricks obtains 58.5 AP on COCO val.
[2022/7/14]: We release the code with Swin-L and ConvNeXt backbones.
[2022/7/10]: We release the code and checkpoints with the ResNet-50 backbone.
[2022/6/7]: We release a unified detection and segmentation model, [Mask DINO](https://arxiv.org/pdf/2206.02777.pdf), that achieves the best results on all three segmentation tasks (**54.7** AP on the [COCO instance leaderboard](https://paperswithcode.com/sota/instance-segmentation-on-coco), **59.5** PQ on the [COCO panoptic leaderboard](https://paperswithcode.com/sota/panoptic-segmentation-on-coco-test-dev), and **60.8** mIoU on the [ADE20K semantic leaderboard](https://paperswithcode.com/sota/semantic-segmentation-on-ade20k))! Code will be available [here](https://github.com/IDEACVR/MaskDINO).
[2022/5/28]: Code for [DN-DETR](https://arxiv.org/pdf/2203.01305.pdf) is available [here](https://github.com/IDEA-opensource/DN-DETR).
[2022/4/10]: Code for [DAB-DETR](https://arxiv.org/abs/2201.12329) is available [here](https://github.com/SlongLiu/DAB-DETR).
[2022/3/9]: We build a repo, [Awesome Detection Transformer](https://github.com/IDEACVR/awesome-detection-transformer), to collect papers on Transformers for detection and segmentation. You are welcome to check it out!
[2022/3/8]: We reach the SOTA on the [MS-COCO leaderboard](https://paperswithcode.com/sota/object-detection-on-coco) with **63.3** AP!

# Introduction
We present **DINO** (**D**ETR with **I**mproved de**N**oising anch**O**r
boxes) with:
1. **State-of-the-art & end-to-end**: DINO achieves **63.2** AP on COCO val and **63.3** AP on COCO test-dev, with a model size and training data size more than ten times smaller than those of previous best models.
2. **Fast-converging**: With the ResNet-50 backbone, DINO with 5 scales achieves **49.4** AP in 12 epochs and **51.3** AP in 24 epochs. Our 4-scale model achieves similar performance while running at 23 FPS (see the sketch below for how the 4- and 5-scale feature pyramids differ).
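
As a rough illustration of what "4-scale" vs. "5-scale" means, the sketch below pulls multi-scale features out of a ResNet-50 with torchvision. This is an illustrative assumption, not the repository's implementation: the tapped layer names, the pooled stride-64 extra level, and the input size are all placeholders.

```python
import torch
from torchvision.models import resnet50
from torchvision.models.feature_extraction import create_feature_extractor

# Illustrative "4-scale" feature pyramid (strides 8, 16, 32, 64).
# The 5-scale variant would additionally tap "layer1" (stride 4).
backbone = resnet50(weights=None)
body = create_feature_extractor(
    backbone, return_nodes={"layer2": "s8", "layer3": "s16", "layer4": "s32"})

x = torch.randn(1, 3, 800, 800)
feats = list(body(x).values())                                 # strides 8/16/32
feats.append(torch.nn.functional.max_pool2d(feats[-1], 1, 2))  # extra stride-64 level
print([tuple(f.shape) for f in feats])
```

Adding the stride-4 level gives the decoder finer-resolution features at extra memory and compute cost, which is why the 4-scale model is the faster option.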
# Methods
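The central training technique, and the source of the name, is Contrastive DeNoising (CDN): ground-truth boxes are jittered to create extra decoder queries, where lightly noised (positive) queries learn to reconstruct their box and heavily noised (negative) queries learn to predict "no object". Below is a minimal PyTorch-style sketch of this query construction, assuming normalized `(cx, cy, w, h)` boxes; the noise scales `lambda1`/`lambda2` and the exact jitter formula are illustrative, not the repository's code.

```python
import torch

def build_cdn_queries(gt_boxes: torch.Tensor, lambda1: float = 0.4, lambda2: float = 1.0):
    """Sketch of Contrastive DeNoising (CDN) query construction.

    gt_boxes: (N, 4) ground-truth boxes as normalized (cx, cy, w, h).
    Positive queries use noise scale below lambda1 and are trained to
    reconstruct their box; negative queries use noise scale in
    [lambda1, lambda2) and are trained to predict "no object".
    """
    def jitter(boxes: torch.Tensor, lo: float, hi: float) -> torch.Tensor:
        cx, cy, w, h = boxes.unbind(-1)
        # Per-coordinate noise magnitude in [lo, hi) with a random sign.
        scale = lo + (hi - lo) * torch.rand_like(boxes)
        sign = torch.randint(0, 2, boxes.shape, device=boxes.device) * 2 - 1
        noise = sign * scale
        # Shift the center by a fraction of the box size; rescale width/height.
        new_cx = cx + noise[..., 0] * w / 2
        new_cy = cy + noise[..., 1] * h / 2
        new_w = w * (1 + noise[..., 2])
        new_h = h * (1 + noise[..., 3])
        return torch.stack([new_cx, new_cy, new_w, new_h], dim=-1).clamp(0.0, 1.0)

    pos_queries = jitter(gt_boxes, 0.0, lambda1)      # reconstruct the GT box
    neg_queries = jitter(gt_boxes, lambda1, lambda2)  # predict "no object"
    return pos_queries, neg_queries
```

During training, several such groups are appended to the learnable queries behind an attention mask, so the denoising queries cannot leak ground-truth information to the matching queries.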

## Model Zoo
We have put our model checkpoints here: [[model zoo in Google Drive]](https://drive.google.com/drive/folders/1qD5m1NmK0kjE5hh-G17XUX751WsEG-h_?usp=sharing) [[model zoo in Baidu Netdisk]](https://pan.baidu.com/s/1St5rvfgfPwpnPuf_Oe6DpQ) (extraction code: DINO), where `checkpoint{x}_{y}scale.pth` denotes the checkpoint of the y-scale model trained for x epochs.
### 12 epoch setting
| # | name | backbone | box AP | Checkpoint | Where in Our Paper |
|---|---|---|---|---|---|
| 1 | DINO-4scale | R50 | 49.0 | Google Drive / BaiDu | Table 1 |
| 2 | DINO-5scale | R50 | 49.4 | Google Drive / BaiDu | Table 1 |
| 3 | DINO-4scale | Swin-L | 56.8 | Google Drive | |
| 4 | DINO-5scale | Swin-L | 57.3 | Google Drive | |
### 24 epoch setting
| # | name | backbone | box AP | Checkpoint | Where in Our Paper |
|---|---|---|---|---|---|
| 1 | DINO-4scale | R50 | 50.4 | Google Drive / BaiDu | Table 2 |
| 2 | DINO-5scale | R50 | 51.3 | Google Drive / BaiDu | Table 2 |
### 36 epoch setting
| # | name | backbone | box AP | Checkpoint | Where in Our Paper |
|---|---|---|---|---|---|
| 1 | DINO-4scale | R50 | 50.9 | Google Drive / BaiDu | Table 2 |
| 2 | DINO-5scale | R50 | 51.2 | Google Drive / BaiDu | Table 2 |
| 3 | DINO-4scale | Swin-L | 58.0 | Google Drive | |
| 4 | DINO-5scale | Swin-L | 58.5 | Google Drive | |
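
The released checkpoints are plain PyTorch files, so they can be inspected before wiring up the full model. Below is a minimal loading sketch; the filename is hypothetical (it follows the `checkpoint{x}_{y}scale.pth` convention above), and the `'model'` key is an assumption based on common DETR-repo practice.

```python
import torch

# Filename follows the convention above:
# checkpoint{x}_{y}scale.pth = the y-scale model trained for x epochs.
ckpt_path = "checkpoint12_4scale.pth"  # hypothetical local filename

# Load on CPU first; Swin-L checkpoints are large.
checkpoint = torch.load(ckpt_path, map_location="cpu")

# Assumption: the weights live under a 'model' key; otherwise fall back
# to treating the whole file as the state dict.
state_dict = checkpoint.get("model", checkpoint)
print(f"{len(state_dict)} parameter tensors loaded")

# `model` must be a DINO instance built from the matching config
# (4-scale vs. 5-scale, backbone) or load_state_dict will fail:
# model.load_state_dict(state_dict)
```

Note that the config used to build the model must match the checkpoint, or `load_state_dict` will fail on mismatched shapes.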
## Links
DINO builds on our earlier works [DN-DETR](https://github.com/IDEA-opensource/DN-DETR) and [DAB-DETR](https://github.com/SlongLiu/DAB-DETR):

**DN-DETR: Accelerate DETR Training by Introducing Query DeNoising.**
Feng Li\*, Hao Zhang\*, Shilong Liu, Jian Guo, Lionel M. Ni, Lei Zhang.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2022.
[paper](https://arxiv.org/abs/2203.01305) / [code](https://github.com/IDEA-opensource/DN-DETR) / [Chinese explanation]

**DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR.**
Shilong Liu, Feng Li, Hao Zhang, Xiao Yang, Xianbiao Qi, Hang Su, Jun Zhu, Lei Zhang.
International Conference on Learning Representations (ICLR) 2022.
[paper](https://arxiv.org/abs/2201.12329) / [code](https://github.com/SlongLiu/DAB-DETR)