# UniVAD

**Repository Path**: monkeycc/UniVAD

## Basic Information

- **Project Name**: UniVAD
- **Description**: https://github.com/FantasticGNU/UniVAD
- **Primary Language**: Python
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-03-11
- **Last Updated**: 2025-07-06

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# UniVAD: A Training-free Unified Model for Few-shot Visual Anomaly Detection

Official implementation of the paper [UniVAD: A Training-free Unified Model for Few-shot Visual Anomaly Detection](https://arxiv.org/abs/2412.03342) (CVPR 2025).

## Introduction

Welcome to the official repository for "UniVAD: A Training-free Unified Model for Few-shot Visual Anomaly Detection." This work presents UniVAD, a novel approach that detects anomalies across industrial, logical, and medical domains with a single unified model and no domain-specific training. UniVAD uses a few normal samples as references at test time to detect anomalies in previously unseen objects. It consists of three key components:

- Contextual Component Clustering ($C^3$): accurately segments the components within an image by combining clustering techniques with vision foundation models.
- Component-Aware Patch Matching (CAPM): detects structural anomalies by matching patch-level features within each component.
- Graph-Enhanced Component Modeling (GECM): identifies logical anomalies by modeling relationships between image components through graph-based feature aggregation.
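To make the patch-matching idea concrete, the snippet below is a minimal, illustrative sketch of CAPM-style scoring under simplifying assumptions: patch features are already extracted by a frozen backbone, component labels come from $C^3$, and each query patch is scored by its distance to the most similar reference patch of the same component. The function name, arguments, and shapes are hypothetical and do not correspond to this repository's actual code; GECM's graph-based logical-anomaly branch is omitted.

```
# Illustrative sketch of component-aware patch matching (CAPM-style scoring).
# All names, arguments, and shapes are assumptions for exposition, not the
# actual API of this repository.
import torch
import torch.nn.functional as F


def capm_patch_scores(query_feats, query_labels, ref_feats, ref_labels):
    """Score each query patch against reference patches of the same component.

    query_feats:  (Nq, D) patch features of the test image
    query_labels: (Nq,)   component id of each query patch (e.g. from C^3)
    ref_feats:    (Nr, D) patch features pooled from the few normal references
    ref_labels:   (Nr,)   component id of each reference patch
    Returns (Nq,) anomaly scores: 1 - max cosine similarity to same-component
    reference patches (higher = more anomalous).
    """
    q = F.normalize(query_feats, dim=-1)
    r = F.normalize(ref_feats, dim=-1)
    scores = torch.ones(q.shape[0], device=q.device)
    for c in query_labels.unique():
        q_mask = query_labels == c
        r_mask = ref_labels == c
        if not r_mask.any():
            continue  # component unseen in references: keep the default score of 1.0
        sim = q[q_mask] @ r[r_mask].T  # cosine similarities, shape (nq_c, nr_c)
        scores[q_mask] = 1.0 - sim.max(dim=1).values
    return scores
```

In the full method, such per-patch scores are reshaped into an anomaly map and combined with the logical-anomaly scores produced by GECM; see the paper for details.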
Our experiments on nine datasets spanning industrial, logical, and medical domains show that UniVAD achieves state-of-the-art few-shot anomaly detection performance, outperforming domain-specific models and establishing a new paradigm for unified visual anomaly detection.

![](figures/intro.jpg)

## Overview of UniVAD

![](figures/arch.jpg)

## Running UniVAD

### Environment Installation

Clone the repository locally:

```
git clone --recurse-submodules https://github.com/FantasticGNU/UniVAD.git
```

Install the required packages:

```
pip install -r requirements.txt
```

Install GroundingDINO:

```
cd models/GroundingDINO; pip install -e .
```

### Prepare pretrained checkpoints

```
cd pretrained_ckpts; wget https://huggingface.co/lkeab/hq-sam/resolve/main/sam_hq_vit_h.pth; wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
```

### Prepare data

#### MVTec AD

- Download and extract [MVTec AD](https://www.mvtec.com/company/research/datasets/mvtec-ad) into `data/mvtec`
- Run `python data/mvtec_solver.py` to obtain `data/mvtec/meta.json`

#### VisA

- Download and extract [VisA](https://amazon-visual-anomaly.s3.us-west-2.amazonaws.com/VisA_20220922.tar)
- Follow the instructions at [https://github.com/amazon-science/spot-diff?tab=readme-ov-file#data-preparation](https://github.com/amazon-science/spot-diff?tab=readme-ov-file#data-preparation) to convert the dataset to the 1-class format and place it in `data/VisA_pytorch/1cls`
- Run `python data/visa_solver.py` to obtain `data/VisA_pytorch/1cls/meta.json`

#### MVTec LOCO AD

- We use the improved MVTec LOCO Caption dataset, which merges the multiple ground-truth masks of the original MVTec LOCO data into a single mask. Please refer to [https://github.com/hujiecpp/MVTec-Caption](https://github.com/hujiecpp/MVTec-Caption) to obtain the MVTec LOCO Caption dataset
- Run `python data/mvtec_loco_solver.py` to obtain `data/mvtec_loco_caption/meta.json`

#### Medical datasets

- The medical data we use are mainly obtained from [BMAD](https://github.com/DorisBao/BMAD) and reorganized into the MVTec format
- Download them from [this OneDrive link](https://1drv.ms/u/s!AopsN_HMhJeckoJT-3yF_pwQMSn9OA?e=nRW1wA) and place them in `data/`

#### Data format

The prepared data should be organized as follows:

```
data
├── mvtec
│   ├── meta.json
│   ├── bottle
│   ├── cable
│   └── ...
├── VisA_pytorch/1cls
│   ├── meta.json
│   ├── candle
│   ├── capsules
│   └── ...
├── mvtec_loco_caption
│   ├── meta.json
│   ├── breakfast_box
│   ├── juice_bottle
│   └── ...
├── BrainMRI
│   ├── meta.json
│   ├── train
│   ├── test
│   └── ground_truth
├── LiverCT
│   ├── meta.json
│   ├── train
│   ├── test
│   └── ground_truth
├── RESC
│   ├── meta.json
│   ├── train
│   ├── test
│   └── ground_truth
├── HIS
│   ├── meta.json
│   ├── train
│   └── test
├── ChestXray
│   ├── meta.json
│   ├── train
│   └── test
└── OCT17
    ├── meta.json
    ├── train
    └── test
```

### Component Segmentation

Run contextual component clustering for all data in advance to simplify subsequent processing:

```
python segment_components.py
```

### Run the test script

```
bash test.sh
```

## Citation

If you find UniVAD useful in your research or applications, please cite it using the following BibTeX:

```
@article{gu2024univad,
  title={UniVAD: A Training-free Unified Model for Few-shot Visual Anomaly Detection},
  author={Gu, Zhaopeng and Zhu, Bingke and Zhu, Guibo and Chen, Yingying and Tang, Ming and Wang, Jinqiao},
  journal={arXiv preprint arXiv:2412.03342},
  year={2024}
}
```