# OV-DINO

Official repository: https://github.com/wanghao9610/OV-DINO
## :sparkles: Model Zoo
| Model | Pre-Train Data | AP<sup>mv</sup> | AP<sub>r</sub> | AP<sub>c</sub> | AP<sub>f</sub> | AP<sup>val</sup> | AP<sub>r</sub> | AP<sub>c</sub> | AP<sub>f</sub> | AP<sup>coco</sup> | Weights |
| -------- | --------------- | ---- | ---- | ---- | ---- | ----- | ---- | ---- | ---- | --------- | ------- |
| OV-DINO<sup>1</sup> | O365 | 24.4 | 15.5 | 20.3 | 29.7 | 18.7 | 9.3 | 14.5 | 27.4 | 49.5 / 57.5 | [CKPT](https://huggingface.co/hao9610/OV-DINO/resolve/main/ovdino_swint_o-coco49.5_lvismv24.4_lvis18.7.pth) / [LOG](https://huggingface.co/hao9610/OV-DINO/blob/main/ovdino_swint_o-coco49.5_lvismv24.4_lvis18.7.log) 🤗 |
| OV-DINO<sup>2</sup> | O365,GoldG | 39.4 | 32.0 | 38.7 | 41.3 | 32.2 | 26.2 | 30.1 | 37.3 | 50.6 / 58.4 | [CKPT](https://huggingface.co/hao9610/OV-DINO/resolve/main/ovdino_swint_og-coco50.6_lvismv39.4_lvis32.2.pth) 🤗 |
| OV-DINO<sup>3</sup> | O365,GoldG,CC1M‡ | 40.1 | 34.5 | 39.5 | 41.5 | 32.9 | 29.1 | 30.4 | 37.4 | 50.2 / 58.2 | [CKPT](https://huggingface.co/hao9610/OV-DINO/resolve/main/ovdino_swint_ogc-coco50.2_lvismv40.1_lvis32.9.pth) 🤗 |
**NOTE**: *AP<sup>mv</sup> denotes zero-shot evaluation on LVIS MiniVal, AP<sup>val</sup> denotes zero-shot evaluation on LVIS Val, and AP<sup>coco</sup> reports (zero-shot / fine-tuned) results on COCO. AP<sub>r</sub> / AP<sub>c</sub> / AP<sub>f</sub> denote AP on the LVIS rare / common / frequent categories.*
## :checkered_flag: Getting Started
### 1. Project Structure
```
OV-DINO
├── datas
│   ├── o365
│   │   ├── annotations
│   │   ├── train
│   │   ├── val
│   │   └── test
│   ├── coco
│   │   ├── annotations
│   │   ├── train2017
│   │   └── val2017
│   ├── lvis
│   │   ├── annotations
│   │   ├── train2017
│   │   └── val2017
│   └── custom
│       ├── annotations
│       ├── train
│       └── val
├── docs
├── inits
│   ├── huggingface
│   ├── ovdino
│   ├── sam2
│   └── swin
├── ovdino
│   ├── configs
│   ├── demo
│   ├── detectron2-717ab9
│   ├── detrex
│   ├── projects
│   ├── scripts
│   └── tools
└── wkdrs
    └── ...
```
### 2. Installation
```bash
# clone this project
git clone https://github.com/wanghao9610/OV-DINO.git
cd OV-DINO
export root_dir=$(realpath ./)
cd $root_dir/ovdino
# Optional: set CUDA_HOME for CUDA 11.6.
# OV-DINO uses CUDA 11.6 by default; if your CUDA version is not 11.6, export the CUDA_HOME env variable manually first.
export CUDA_HOME="your_cuda11.6_path"
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
echo -e "CUDA version:\n$(nvcc -V)"
# create conda env for ov-dino
conda create -n ovdino -y
conda activate ovdino
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.6 -c pytorch -c nvidia -y
conda install gcc=9 gxx=9 -c conda-forge -y # Optional: install gcc9
python -m pip install -e detectron2-717ab9
pip install -e ./
# Optional: create a conda env for ov-sam (ov-sam = ov-dino + sam2).
# It may not be compatible with the ov-dino env, so we create a separate one.
conda create -n ovsam -y
conda activate ovsam
conda install pytorch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 pytorch-cuda=12.1 -c pytorch -c nvidia -y
# Install sam2 following the sam2 project:
# https://github.com/facebookresearch/segment-anything-2.git
# Download the sam2 checkpoints and put them in inits/sam2.
python -m pip install -e detectron2-717ab9
pip install -e ./
```
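Before moving on, a quick import test can catch a broken CUDA/PyTorch setup early. This is a minimal sketch, not part of the official scripts:
```bash
# Sanity check (optional): verify that PyTorch sees the GPU and that
# detectron2 imports cleanly in the ovdino env.
conda activate ovdino
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
python -c "import detectron2; print(detectron2.__version__)"
```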
### 3. Data Preparation
#### COCO
* Download [COCO](https://cocodataset.org/#download) from the official website and put it in the datas/coco folder:
```bash
cd $root_dir
mkdir -p datas/coco
wget http://images.cocodataset.org/zips/train2017.zip -O datas/coco/train2017.zip
wget http://images.cocodataset.org/zips/val2017.zip -O datas/coco/val2017.zip
wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip -O datas/coco/annotations_trainval2017.zip
```
* Extract the zipped files, then remove the archives:
```bash
cd $root_dir
unzip datas/coco/train2017.zip -d datas/coco
unzip datas/coco/val2017.zip -d datas/coco
unzip datas/coco/annotations_trainval2017.zip -d datas/coco
rm datas/coco/train2017.zip datas/coco/val2017.zip datas/coco/annotations_trainval2017.zip
```
#### LVIS
* Download LVIS annotation files:
```bash
cd $root_dir
mkdir -p datas/lvis/annotations
wget https://huggingface.co/hao9610/OV-DINO/resolve/main/lvis_v1_minival_inserted_image_name.json -O datas/lvis/annotations/lvis_v1_minival_inserted_image_name.json
wget https://huggingface.co/hao9610/OV-DINO/resolve/main/lvis_v1_val_inserted_image_name.json -O datas/lvis/annotations/lvis_v1_val_inserted_image_name.json
```
* Soft-link the COCO images into the LVIS folder (LVIS reuses COCO images):
```bash
cd $root_dir
ln -s $(realpath datas/coco/train2017) datas/lvis
ln -s $(realpath datas/coco/val2017) datas/lvis
```
#### Objects365
* Refer to [OpenDataLab](https://opendatalab.com/OpenDataLab/Objects365_v1/cli/main) for the Objects365 v1 download; it provides detailed download instructions.
```bash
cd $root_dir
mkdir -p datas/o365/annotations
# Assuming you downloaded the Objects365 raw files into datas/o365/raw, extract the tarred files and reorganize them.
cd datas/o365/raw
tar -xvf Objects365_v1.tar.gz
cd 2019-08-02
for file in *.zip; do unzip -o "$file"; done
mv *.json $root_dir/datas/o365/annotations
mv train val test $root_dir/datas/o365
```
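After reorganizing, it is worth confirming that the layout matches the project structure above; a quick check (the exact annotation file names depend on the release you downloaded):
```bash
# Optional: verify the expected Objects365 layout.
ls datas/o365              # expect: annotations  train  val  test  (raw)
ls datas/o365/annotations  # the *.json annotation files moved above
```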
### 4. Evaluation
Download the pre-trained models from the [Model Zoo](#model-zoo) and put them in the inits/ovdino directory.
```bash
cd $root_dir/ovdino
bash scripts/eval.sh path_to_eval_config_file path_to_pretrained_model output_directory
```
#### Zero-Shot Evaluation on COCO Benchmark
```bash
cd $root_dir/ovdino
# Evaluate mAP on the COCO dataset.
bash scripts/eval.sh \
projects/ovdino/configs/ovdino_swin_tiny224_bert_base_eval_coco.py \
../inits/ovdino/ovdino_swint_og-coco50.6_lvismv39.4_lvis32.2.pth \
../wkdrs/eval_ovdino
```
#### Zero-Shot Evaluation on LVIS Benchmark
```bash
cd $root_dir/ovdino
# Evaluate fixed AP on the LVIS MiniVal dataset.
bash scripts/eval.sh \
projects/ovdino/configs/ovdino_swin_tiny224_bert_base_eval_lvismv.py \
../inits/ovdino/ovdino_swint_ogc-coco50.2_lvismv40.1_lvis32.9.pth \
../wkdrs/eval_ovdino
# Evaluate fixed AP on the LVIS Val dataset.
# This requires about 250 GB of RAM due to the large number of samples in LVIS Val, so make sure your machine has enough memory.
bash scripts/eval.sh \
projects/ovdino/configs/ovdino_swin_tiny224_bert_base_eval_lvis.py \
../inits/ovdino/ovdino_swint_ogc-coco50.2_lvismv40.1_lvis32.9.pth \
../wkdrs/eval_ovdino
```
### 5. Fine-Tuning
#### Fine-Tuning on COCO Dataset
```bash
cd $root_dir/ovdino
bash scripts/finetune.sh \
projects/ovdino/configs/ovdino_swin_tiny224_bert_base_ft_coco_24ep.py \
../inits/ovdino/ovdino_swint_og-coco50.6_lvismv39.4_lvis32.2.pth
```
#### Fine-Tuning on Custom Dataset
* Prepare your custom dataset in the COCO annotation format, following the instructions in [custom_ovd.py](ovdino/configs/common/data/custom_ovd.py#L17); a minimal annotation skeleton is sketched after the command below.
* Then run fine-tuning with the following command:
```bash
cd $root_dir/ovdino
bash scripts/finetune.sh \
projects/ovdino/configs/ovdino_swin_tiny224_bert_base_ft_custom_24ep.py \
../inits/ovdino/ovdino_swint_ogc-coco50.2_lvismv40.1_lvis32.9.pth
```
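For reference, here is a minimal sketch of the COCO annotation format; all file names, sizes, and ids below are placeholders, and the authoritative field list is in custom_ovd.py:
```bash
# Minimal COCO-format skeleton (placeholder values only).
# bbox is [x, y, width, height] in absolute pixels; area = width * height.
mkdir -p datas/custom/annotations
cat > datas/custom/annotations/train.json <<'EOF'
{
  "images": [
    {"id": 0, "file_name": "img0.jpg", "width": 640, "height": 480}
  ],
  "annotations": [
    {"id": 0, "image_id": 0, "category_id": 0,
     "bbox": [100.0, 120.0, 50.0, 80.0], "area": 4000.0, "iscrowd": 0}
  ],
  "categories": [
    {"id": 0, "name": "class0"}
  ]
}
EOF
```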
### 6. Pre-Training
#### Pre-Training on Objects365 dataset
* Download dataset following [Objects365 Data Preparing](#objects365).
* Run pre-training with the following commands.
On the first machine:
```bash
cd $root_dir/ovdino
# Replace $MASTER_PORT and $MASTER_ADDR with your actual machine settings.
NNODES=2 NODE_RANK=0 MASTER_PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR \
bash scripts/pretrain.sh \
projects/ovdino/configs/ovdino_swin_tiny224_bert_base_pretrain_o365_24ep.py
```
On the second machine:
```bash
cd $root_dir/ovdino
# Replace $MASTER_PORT and $MASTER_ADDR with your actual machine settings.
NNODES=2 NODE_RANK=1 MASTER_PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR \
bash scripts/pretrain.sh \
projects/ovdino/configs/ovdino_swin_tiny224_bert_base_pretrain_o365_24ep.py
```
**NOTE**: *The default total batch size for O365 pre-training is 64 in our experiments, running on 2 nodes with 8 A100 GPUs per node. If you encounter an out-of-memory error, reduce the batch size and scale the learning rate and total steps linearly (e.g., halving the batch size to 32 means halving the learning rate and doubling the total steps).*
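If you only have one machine, a single-node launch presumably follows the same pattern; this is a sketch under the assumption that scripts/pretrain.sh honors NNODES=1 with a local master (check the script for the exact variables), and remember to scale the batch size, learning rate, and steps as noted above:
```bash
cd $root_dir/ovdino
# Single-node sketch (assumption): rank 0, local master address/port.
NNODES=1 NODE_RANK=0 MASTER_PORT=29500 MASTER_ADDR=127.0.0.1 \
bash scripts/pretrain.sh \
projects/ovdino/configs/ovdino_swin_tiny224_bert_base_pretrain_o365_24ep.py
```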
#### Pre-Training on [Objects365, GoldG] datasets
Coming soon ...
#### Pre-Training on [Objects365, GoldG, CC1M‡] datasets
Coming soon ...
**NOTE**: *We will release all of the pre-training code after our paper is accepted.*
## :computer: Demo
* Local inference on an image or a folder, given the category names.
```bash
# for ovdino: conda activate ovdino
# for ovsam: conda activate ovsam
cd $root_dir/ovdino
bash scripts/demo.sh demo_config.py pretrained_model category_names input_images_or_directory output_directory
```
Examples:
```bash
cd $root_dir/ovdino
# single-image inference
bash scripts/demo.sh \
projects/ovdino/configs/ovdino_swin_tiny224_bert_base_infer_demo.py \
../inits/ovdino/ovdino_swint_ogc-coco50.2_lvismv40.1_lvis32.9.pth \
"class0 class1 class2 ..." img0.jpg output_dir/img0_vis.jpg
# multi-image inference
bash scripts/demo.sh \
projects/ovdino/configs/ovdino_swin_tiny224_bert_base_infer_demo.py \
../inits/ovdino/ovdino_swint_ogc-coco50.2_lvismv40.1_lvis32.9.pth \
"class0 long_class1 long_class2 ..." "img0.jpg img1.jpg" output_dir
# image folder inference
bash scripts/demo.sh \
projects/ovdino/configs/ovdino_swin_tiny224_bert_base_infer_demo.py \
../inits/ovdino/ovdino_swint_ogc-coco50.2_lvismv40.1_lvis32.9.pth \
"class0 long_class1 long_class2 ..." image_dir output_dir
```
**NOTE**: *Category names are separated by spaces, and the words of a multi-word class name are joined with underscores (_).*
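For example, the following call (the image path is hypothetical) detects three categories, with the multi-word names joined by underscores:
```bash
cd $root_dir/ovdino
bash scripts/demo.sh \
projects/ovdino/configs/ovdino_swin_tiny224_bert_base_infer_demo.py \
../inits/ovdino/ovdino_swint_ogc-coco50.2_lvismv40.1_lvis32.9.pth \
"person traffic_light fire_hydrant" street.jpg output_dir/street_vis.jpg
```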
* Web inference demo.
```bash
cd $root_dir/ovdino
bash scripts/app.sh \
projects/ovdino/configs/ovdino_swin_tiny224_bert_base_infer_demo.py \
../inits/ovdino/ovdino_swint_ogc-coco50.2_lvismv40.1_lvis32.9.pth
```
After the web demo is deployed, you can open the [demo](http://127.0.0.1:7860) in your browser.
**We also provide an [online demo](http://47.115.200.157:7860); click and enjoy.**
## :white_check_mark: TODO
- [x] Release the pre-trained model.
- [x] Release the fine-tuning and evaluation code.
- [x] Support the local inference demo.
- [x] Support the web inference demo.
- [x] Release the pre-training code on O365 dataset.
- [ ] Support ONNX export of the OV-DINO model.
- [ ] Support OV-DINO in 🤗 transformers.
- [ ] Release all of the pre-training code.
## :blush: Acknowledgements
This project references several excellent open-source repos ([Detectron2](https://github.com/facebookresearch/detectron2), [detrex](https://github.com/IDEA-Research/detrex), [GLIP](https://github.com/microsoft/GLIP), [G-DINO](https://github.com/IDEA-Research/GroundingDINO), [YOLO-World](https://github.com/AILab-CVC/YOLO-World)). Thanks for their wonderful work and contributions to the community.
## :pushpin: Citation
If you find OV-DINO helpful for your research or applications, please consider giving us a star 🌟 and citing it with the following BibTeX entry.
```bibtex
@article{wang2024ovdino,
  title={OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion},
  author={Wang, Hao and Ren, Pengzhen and Jie, Zequn and Dong, Xiao and Feng, Chengjian and Qian, Yinlong and Ma, Lin and Jiang, Dongmei and Wang, Yaowei and Lan, Xiangyuan and others},
  journal={arXiv preprint arXiv:2407.07844},
  year={2024}
}
```