
# [NeurIPS 2025] StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant

[![arXiv](https://img.shields.io/badge/Arxiv-2505.05467-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2505.05467)

🌟 **StreamBridge** is a simple yet powerful framework that enables offline Video-LLMs to perform effectively in streaming scenarios. It features:

- A **memory buffer** with round-decayed compression for long-context, multi-turn interactions.
- A **decoupled and lightweight activation model** that enables proactive, timely responses without affecting the base model’s reasoning capabilities.
- A newly built dataset, **Stream-IT**, tailored for streaming video understanding with interleaved video-text sequences and diverse instructions.

> [!IMPORTANT]
> For copyright reasons, we can’t release model weights trained on YouTube or other videos that may contain IP-protected content. However, we’re open-sourcing the model implementation and the synthetic data used for training.

---

## 🛠️ Install

1. Clone this repository and navigate to the folder:
   ```bash
   git clone https://github.com/apple/ml-streambridge
   cd ml-streambridge
   ```
2. Install the package:
   ```bash
   conda create -n ml-streambridge python=3.10.14
   conda activate ml-streambridge
   pip install -e .
   pip install flash-attn==2.3.3 --no-build-isolation
   ```

## 🚀 Demo for Quick Start

1. Download checkpoints: TBD due to video copyright reasons.
   - Organize as:
     ```
     ├── /your/path/to/checkpoints
     │   └── llava-onevision-qwen2-0.5b-ov-hf-seperated
     │       └── activation_0.5_ratio_anet_coin_yc2_s2s_fa_mhego_hacs_cha_et_llava-ov_epoch_5.pth
     │   └── LLaVA-OV-7B-du2e2hjxik
     │   └── Oryx-1.5-7B-jfsvkb3hn8
     │   └── Qwen2-VL-7B-jh6p673iyp
     ```
2. Run a demo:
   - Update `your_weight_path` in `demo.py` to match the weight directory above:
     ```bash
     python demo.py  # the activation threshold controls the response frequency
     ```
   - You should see output like:
     ```
     18 seconds: Pour the cooked noodles.
     32 seconds: Cut the lemon.
     44 seconds: Cut the olives in half.
     55 seconds: Chop the parsley.
     68 seconds: Squeeze the lemon juice into the measuring cup.
     78 seconds: Pound the chicken.
     ...
     ```

## 💡 Evaluation on OVO-Bench (multi-turn streaming) and VideoMME (single-turn offline)

1. Download the raw videos for OVO-Bench from [[🤗HF](https://huggingface.co/datasets/JoeLeelyf/OVO-Bench)] and VideoMME from [[🤗HF](https://huggingface.co/datasets/lmms-lab/Video-MME)], and reorganize the folders as follows:
   ```
   ├── /your/path/to/ovo_bench
   │   └── videos
   │   └── ovo_bench.json
   │   └── ...
   ├── /your/path/to/videomme
   │   └── videos
   │   └── videomme.json
   │   └── ...
   ```
   - We provide OVO-Bench's `ovo_bench.json` and VideoMME's `videomme.json` in `./assets`.
2. Run the evaluation script:
   - Set `ANNO_PATH` and `VIDEO_PATH` in `scripts/eval.sh` to the OVO-Bench and VideoMME data you downloaded above, then run:
     ```bash
     bash scripts/eval.sh
     ```
   - Evaluate different models by modifying `MODEL` and `CKPT` in the script.
   - By default, 8 A100-80G GPUs are used; you can adjust `NUM_GPUS` and `MAX_IMG_TOKEN` to reduce memory usage.
3. Report the results:
   ```bash
   python eval/metric_report.py
   ```
   - You should reproduce the results below (see our paper for more details):

| Model Name | OVO-Bench-Real-Time (OCR/ACR/ATR/STU/FPD/OJR/AVG.) | VideoMME (w/o subs) |
|:---------------------------:|:-------------------------------------------:|:------:|
| Qwen2-VL-StreamBridge | 85.24/67.89/75.00/52.25/70.30/72.28/70.49 | 63.0 |
| Oryx-1.5-StreamBridge | 81.21/70.64/70.69/49.44/74.26/68.48/69.12 | 64.2 |
| LLaVA-OV-StreamBridge | 74.50/78.90/72.41/52.81/78.22/68.68/70.89 | 61.0 |

## 🎬 StreamingQA-120K Dataset

- The raw 1.28 million videos of StreamingQA-120K are sourced from [[🤗WebVid](https://huggingface.co/datasets/WHB139426/Grounded-VideoLLM/tree/main/webvid-703k)], [[🤗InternVid](https://huggingface.co/datasets/WHB139426/Grounded-VideoLLM/tree/main/internvid)] and [[🤗Panda](https://huggingface.co/datasets/WHB139426/Grounded-VideoLLM/tree/main/panda70m_2m)].
You can also download them from their official repos: [[WebVid-10M](https://huggingface.co/datasets/TempoFunk/webvid-10M)] [[InternVid-10M](https://huggingface.co/datasets/OpenGVLab/InternVid)] [[Panda-70M](https://github.com/snap-research/Panda-70M)].
- We concatenate videos with high similarity from these three datasets and annotate QA pairs for them. We provide the [similarity-ordered json file](https://ml-site.cdn-apple.com/datasets/streambridge/qa_groups.json). You can dynamically control the grouping size via `GROUP_LEN`:

```python
import json


def load_json(path):
    with open(path) as f:
        return json.load(f)


GROUP_LEN = 10  # number of similar videos concatenated per group
anns = load_json("/your/path/to/qa_groups.json")

# Split the similarity-ordered annotation indices into consecutive groups.
indices = list(range(len(anns)))
groups = [indices[i : i + GROUP_LEN] for i in range(0, len(indices), GROUP_LEN)]

grouped_anns = []
for group in groups:
    if len(group) != GROUP_LEN:  # drop the incomplete trailing group
        continue
    grouped_anns.append(
        {
            "video_ids": [anns[i]["video_id"] for i in group],
            "video_files": [anns[i]["video_file"] for i in group],
            "captions": [anns[i]["caption"] for i in group],
            "questions": [anns[i]["question"] for i in group],
            "answers": [anns[i]["answer"] for i in group],
            "options": [anns[i]["options"] for i in group],
            "types": [anns[i]["type"] for i in group],
        }
    )
print(grouped_anns[0])
```

## 📜 License

This software and accompanying data and models have been released under the following licenses:

- Code: [Apple Sample Code License (ASCL)](./LICENSE)
- Data: [CC-BY-NC-ND](./LICENSE_DATA) [Deed](https://creativecommons.org/licenses/by-nc-nd/4.0/)

## ✏️ Citation

If you find our paper and code useful in your research, please consider giving a star :star: and a citation :pencil:.
```BibTeX
@article{wang2025streambridge,
  title={StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant},
  author={Wang, Haibo and Feng, Bo and Lai, Zhengfeng and Xu, Mingze and Li, Shiyu and Ge, Weifeng and Dehghan, Afshin and Cao, Meng and Huang, Ping},
  journal={arXiv preprint arXiv:2505.05467},
  year={2025}
}
```
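As a self-contained illustration of the `GROUP_LEN` grouping scheme used for StreamingQA-120K above: the snippet below mirrors the grouping logic from the dataset section on toy, made-up annotations (the field values here are hypothetical stand-ins for entries of `qa_groups.json`), showing how consecutive similarity-ordered items are bundled and how an incomplete trailing group is dropped.

```python
# Toy version of the StreamingQA-120K grouping scheme: consecutive,
# similarity-ordered annotations are bundled into fixed-size groups,
# and any trailing group shorter than GROUP_LEN is dropped.
GROUP_LEN = 4

# Hypothetical stand-in annotations (the real ones come from qa_groups.json).
anns = [{"video_id": f"vid_{i}", "question": f"q_{i}"} for i in range(10)]

indices = list(range(len(anns)))
groups = [indices[i : i + GROUP_LEN] for i in range(0, len(indices), GROUP_LEN)]
grouped = [
    {
        "video_ids": [anns[i]["video_id"] for i in g],
        "questions": [anns[i]["question"] for i in g],
    }
    for g in groups
    if len(g) == GROUP_LEN  # drop the incomplete trailing group [8, 9]
]

print(len(grouped))             # 10 items at GROUP_LEN=4 -> 2 complete groups
print(grouped[0]["video_ids"])  # ['vid_0', 'vid_1', 'vid_2', 'vid_3']
```

With 10 toy annotations and `GROUP_LEN = 4`, only two complete groups survive; the last two annotations are discarded, just as the dataset script skips groups shorter than `GROUP_LEN`.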