# FastVLM-1.5B-int8
**Repository Path**: hf-models/FastVLM-1.5B-int8
## Basic Information
- **Project Name**: FastVLM-1.5B-int8
- **Description**: Mirror of https://huggingface.co/apple/FastVLM-1.5B-int8
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-08-30
- **Last Updated**: 2025-08-30
## README
---
license: apple-amlr
license_name: apple-ascl
license_link: https://github.com/apple/ml-fastvlm/blob/main/LICENSE_MODEL
library_name: ml-fastvlm
---
# FastVLM: Efficient Vision Encoding for Vision Language Models
FastVLM was introduced in **[FastVLM: Efficient Vision Encoding for Vision Language Models](https://www.arxiv.org/abs/2412.13303)** (CVPR 2025).
### Highlights
* We introduce FastViTHD, a novel hybrid vision encoder designed to output fewer tokens and significantly reduce encoding time for high-resolution images.
* Our smallest variant outperforms LLaVA-OneVision-0.5B with an 85x faster Time-to-First-Token (TTFT) and a 3.4x smaller vision encoder.
* Our larger variants, which use the Qwen2-7B LLM, outperform recent works like Cambrian-1-8B while using a single image encoder with a 7.9x faster TTFT.
### Evaluations
| Benchmark | FastVLM-0.5B | FastVLM-1.5B | FastVLM-7B |
|:--------------|:------------:|:------------:|:----------:|
| Ai2D | 68.0 | 77.4 | 83.6 |
| ScienceQA | 85.2 | 94.4 | 96.7 |
| MMMU | 33.9 | 37.8 | 45.4 |
| VQAv2 | 76.3 | 79.1 | 80.8 |
| ChartQA | 76.0 | 80.1 | 85.0 |
| TextVQA | 64.5 | 70.4 | 74.9 |
| InfoVQA | 46.4 | 59.7 | 75.8 |
| DocVQA | 82.5 | 88.3 | 93.2 |
| OCRBench | 63.9 | 70.2 | 73.1 |
| RealWorldQA | 56.1 | 61.2 | 67.2 |
| SeedBench-Img | 71.0 | 74.2 | 75.4 |
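To see where scaling the model helps most, the table can be reduced to per-benchmark score differences between adjacent variants. The following is a minimal sketch: the scores are copied from the table above, while the helper names (`gains`, `largest_gain`) are illustrative and not part of any FastVLM tooling. The benchmarks use different metrics, so the differences are only a rough guide.

```python
# Scores copied from the evaluation table: (FastVLM-0.5B, FastVLM-1.5B, FastVLM-7B).
SCORES = {
    "Ai2D": (68.0, 77.4, 83.6),
    "ScienceQA": (85.2, 94.4, 96.7),
    "MMMU": (33.9, 37.8, 45.4),
    "VQAv2": (76.3, 79.1, 80.8),
    "ChartQA": (76.0, 80.1, 85.0),
    "TextVQA": (64.5, 70.4, 74.9),
    "InfoVQA": (46.4, 59.7, 75.8),
    "DocVQA": (82.5, 88.3, 93.2),
    "OCRBench": (63.9, 70.2, 73.1),
    "RealWorldQA": (56.1, 61.2, 67.2),
    "SeedBench-Img": (71.0, 74.2, 75.4),
}
VARIANTS = ("FastVLM-0.5B", "FastVLM-1.5B", "FastVLM-7B")


def gains(from_idx, to_idx):
    """Per-benchmark score difference between two variants, rounded to one decimal."""
    return {
        name: round(row[to_idx] - row[from_idx], 1)
        for name, row in SCORES.items()
    }


def largest_gain(from_idx, to_idx):
    """Return the (benchmark, difference) pair with the largest improvement."""
    deltas = gains(from_idx, to_idx)
    name = max(deltas, key=deltas.get)
    return name, deltas[name]


if __name__ == "__main__":
    for lo, hi in ((0, 1), (1, 2)):
        name, delta = largest_gain(lo, hi)
        print(f"{VARIANTS[lo]} -> {VARIANTS[hi]}: largest gain on {name} (+{delta})")
```

In both steps the largest improvement is on InfoVQA (+13.3 points from 0.5B to 1.5B, +16.1 points from 1.5B to 7B).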
### Usage Example
The model has been exported to run with MLX. To use it in an iOS or macOS app, follow the instructions in the [official repository](https://github.com/apple/ml-fastvlm).
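Before following those instructions, the exported model files need to be available locally. One possible way to fetch them is the `huggingface_hub` Python package. This is a sketch under assumptions and not part of the official instructions: it assumes `huggingface_hub` is installed and the repository is publicly downloadable, and the `download_model` helper and the default target directory are hypothetical names chosen for illustration.

```python
from pathlib import Path

# Repository ID taken from the mirror description of this model.
REPO_ID = "apple/FastVLM-1.5B-int8"


def download_model(local_dir: str = "FastVLM-1.5B-int8") -> Path:
    """Download a snapshot of the model repository and return its local path.

    Hypothetical helper: the official repository may recommend a different
    way of obtaining the files.
    """
    # Imported lazily so this module can be loaded without the dependency.
    from huggingface_hub import snapshot_download

    path = snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
    return Path(path)


# Example (requires network access):
#   model_path = download_model()
#   print(sorted(p.name for p in model_path.iterdir()))
```

The returned directory can then be passed to whatever loading step the official instructions describe.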
## Citation
If you found this model useful, please cite the following paper:
```
@InProceedings{fastvlm2025,
author = {Pavan Kumar Anasosalu Vasu and Fartash Faghri and Chun-Liang Li and Cem Koc and Nate True and Albert Antony and Gokul Santhanam and James Gabriel and Peter Grasch and Oncel Tuzel and Hadi Pouransari},
title = {FastVLM: Efficient Vision Encoding for Vision Language Models},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2025},
}
```