# MMAD

**Repository Path**: chen-liangwei/MMAD

## Basic Information

- **Project Name**: MMAD
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-05-06
- **Last Updated**: 2025-05-06

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# [ICLR 2025] MMAD: The First-Ever Comprehensive Benchmark for Multimodal Large Language Models in Industrial Anomaly Detection

[Awesome Industrial Anomaly Detection](https://github.com/M-3LAB/awesome-industrial-anomaly-detection) | [arXiv](https://arxiv.org/abs/2410.09453) | [Gemini](https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/#sundar-note) | [GPT-4o](https://openai.com/index/hello-gpt-4o/) | [Hugging Face Dataset](https://huggingface.co/datasets/jiang-cc/MMAD) | [Zhihu](https://zhuanlan.zhihu.com/p/23437607183)

[//]: # (## 💡 Highlights)

Our benchmark addresses the following questions:

- How well do current MLLMs perform as industrial quality inspectors?
- Which MLLM performs best in industrial anomaly detection?
- What are the key challenges MLLMs face in industrial anomaly detection?

## 📜 News

- **[2025-01-28]** The MMAD paper was accepted by [ICLR 2025](https://openreview.net/forum?id=JDiER86r8v).
- **[2025-01-08]** We released a human baseline for MMAD along with further analysis. For details, please refer to the latest [paper](https://arxiv.org/abs/2410.09453).
- **[2024-12-16]** We released textual domain knowledge for anomaly detection, which can be used for image-text research on the MMAD dataset.
- **[2024-10-30]** The full dataset, including images and captions, is released on [Hugging Face](https://huggingface.co/datasets/jiang-cc/MMAD). You can now download the whole dataset more easily!
- **[2024-10-16]** The MMAD paper is released on [arXiv](https://arxiv.org/abs/2410.09453).
- **[2024-10-08]** The MMAD dataset and evaluation code are released.

## 👀 Overview

In the field of industrial inspection, Multimodal Large Language Models (MLLMs) hold strong potential to renew practical application paradigms thanks to their robust language capabilities and generalization abilities. However, despite their impressive problem-solving skills in many domains, MLLMs' ability in industrial anomaly detection has not been systematically studied. To bridge this gap, we present MMAD, the first-ever full-spectrum MLLM benchmark for industrial anomaly detection. We defined seven key subtasks of MLLMs in industrial inspection and designed a novel pipeline to generate the MMAD dataset, comprising 39,672 questions for 8,366 industrial images. With MMAD, we conducted a comprehensive, quantitative evaluation of various state-of-the-art MLLMs.
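Since MMAD poses multiple-choice questions grouped into subtasks, quantitative evaluation boils down to per-subtask answer accuracy. Below is a minimal sketch of such an aggregator; the record keys (`subtask`, `answer`, `prediction`) and the subtask names in the toy example are illustrative assumptions, not the dataset's actual schema.

```python
from collections import defaultdict

def accuracy_by_subtask(records):
    """Compute multiple-choice accuracy per subtask.

    Each record is a dict with (hypothetical) keys:
      'subtask'    -- subtask name,
      'answer'     -- ground-truth option letter,
      'prediction' -- the model's chosen option letter.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        total[r["subtask"]] += 1
        # Compare case-insensitively so 'a' and 'A' count as the same choice.
        if r["prediction"].strip().upper() == r["answer"].strip().upper():
            correct[r["subtask"]] += 1
    return {t: correct[t] / total[t] for t in total}

# Toy example with made-up records and subtask names.
records = [
    {"subtask": "Anomaly Discrimination", "answer": "A", "prediction": "a"},
    {"subtask": "Anomaly Discrimination", "answer": "B", "prediction": "C"},
    {"subtask": "Defect Localization", "answer": "C", "prediction": "C"},
]
print(accuracy_by_subtask(records))
```

A per-subtask breakdown like this, rather than a single overall score, is what lets a benchmark pinpoint where MLLMs struggle in industrial inspection.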