# ml-streambridge
[arXiv:2505.05467](https://arxiv.org/abs/2505.05467)
**StreamBridge** is a simple yet powerful framework that enables offline Video-LLMs to perform effectively in streaming scenarios. It features:
- A **memory buffer** with round-decayed compression for long-context, multi-turn interactions.
- A **decoupled and lightweight activation model** that enables proactive, timely responses without affecting the base model's reasoning capabilities.
- A newly built dataset, **Stream-IT**, tailored for streaming video understanding with interleaved video-text sequences and diverse instructions.
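To picture the round-decayed compression idea, here is a minimal, illustrative sketch — the function name, the per-round `decay` factor, and the token-level truncation are assumptions for illustration only; the actual buffer operates on model features and follows the schedule described in the paper:

```python
import math


def round_decay_compress(rounds, decay=0.5, min_keep=1):
    """Illustrative sketch of round-decayed compression (not the paper's code):
    the newest round is kept in full, and each older round keeps roughly a
    decay**age fraction of its tokens, so old context shrinks geometrically."""
    newest = len(rounds) - 1
    compressed = []
    for idx, tokens in enumerate(rounds):
        age = newest - idx  # 0 for the most recent round
        keep = max(min_keep, math.ceil(len(tokens) * decay ** age))
        compressed.append(tokens[-keep:])  # favor the tail of each round
    return compressed
```

For example, three 8-token rounds with `decay=0.5` are kept at 2, 4, and 8 tokens, oldest to newest, so the total context stays bounded as rounds accumulate.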
> [!IMPORTANT]
> For copyright reasons, we can't release model weights trained on YouTube or other videos that may contain IP-protected content. However, we're open-sourcing the model implementation and the synthetic data used for training.
---
## Install
1. Clone this repository and navigate to the folder:
```bash
git clone https://github.com/apple/ml-streambridge
cd ml-streambridge
```
2. Install package
```bash
conda create -n ml-streambridge python=3.10.14
conda activate ml-streambridge
pip install -e .
pip install flash-attn==2.3.3 --no-build-isolation
```
## Quick-Start Demo
1. Download checkpoints:
TBD due to video copyright reasons.
- Organize as:
```
├── /your/path/to/checkpoints
│   ├── llava-onevision-qwen2-0.5b-ov-hf-seperated
│   ├── activation_0.5_ratio_anet_coin_yc2_s2s_fa_mhego_hacs_cha_et_llava-ov_epoch_5.pth
│   ├── LLaVA-OV-7B-du2e2hjxik
│   ├── Oryx-1.5-7B-jfsvkb3hn8
│   └── Qwen2-VL-7B-jh6p673iyp
```
2. Run a demo
- Update the `your_weight_path` in `demo.py` to match the weight directory above:
```bash
python demo.py  # the activation threshold controls the response frequency
```
- You should see output like:
```
18 seconds: Pour the cooked noodles.
32 seconds: Cut the lemon.
44 seconds: Cut the olives in half.
55 seconds: Chop the parsley.
68 seconds: Squeeze the lemon juice into the measuring cup.
78 seconds: Pound the chicken.
...
```
## Evaluation on OVO-Bench (multi-turn streaming) and VideoMME (single-turn offline)
1. Download the raw videos for OVO-Bench from [🤗 HF](https://huggingface.co/datasets/JoeLeelyf/OVO-Bench) and VideoMME from [🤗 HF](https://huggingface.co/datasets/lmms-lab/Video-MME), then reorganize the folders as follows:
```
├── /your/path/to/ovo_bench
│   ├── videos
│   ├── ovo_bench.json
│   └── ...
├── /your/path/to/videomme
│   ├── videos
│   ├── videomme.json
│   └── ...
```
- We provide OVO-Bench's `ovo_bench.json` and VideoMME's `videomme.json` in `./assets`.
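Before launching the evaluation, it can be worth sanity-checking that every annotated video is actually on disk. The helper below assumes each JSON entry names its file under a `video` key — that key is an assumption, so adjust it to the actual schema of `ovo_bench.json` / `videomme.json`:

```python
import json
import os


def find_missing_videos(anno_path, video_dir, key="video"):
    """Return (num_annotations, missing_files).
    `key` is an assumed field name; adapt it to the real annotation schema."""
    with open(anno_path) as f:
        anns = json.load(f)
    missing = sorted({a[key] for a in anns
                      if not os.path.exists(os.path.join(video_dir, a[key]))})
    return len(anns), missing
```

An empty `missing` list means the folder layout above is complete for that benchmark.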
2. Run evaluation script
- Set `ANNO_PATH` and `VIDEO_PATH` in `scripts/eval.sh` to the OVO-Bench and VideoMME folders downloaded above, then run:
```bash
bash scripts/eval.sh
```
- Evaluate different models by modifying `MODEL` and `CKPT` in the script.
- By default, 8 A100-80G GPUs are used; you can adjust `NUM_GPUS` and `MAX_IMG_TOKEN` to reduce memory usage.
3. Report the results
```bash
python eval/metric_report.py
```
- You should reproduce the results below (see our paper for more details):
| Model Name | OVO-Bench-Real-Time (OCR/ACR/ATR/STU/FPD/OJR/AVG.) | VideoMME (w/o subs) |
|:---------------------------:|:-------------------------------------------:|:------:|
| Qwen2-VL-StreamBridge | 85.24/67.89/75.00/52.25/70.30/72.28/70.49 | 63.0 |
| Oryx-1.5-StreamBridge | 81.21/70.64/70.69/49.44/74.26/68.48/69.12 | 64.2 |
| LLaVA-OV-StreamBridge | 74.50/78.90/72.41/52.81/78.22/68.68/70.89 | 61.0 |
## StreamingQA-120K Dataset
- The raw 1.28 million videos of StreamingQA-120K are sourced from [🤗 WebVid](https://huggingface.co/datasets/WHB139426/Grounded-VideoLLM/tree/main/webvid-703k), [🤗 InternVid](https://huggingface.co/datasets/WHB139426/Grounded-VideoLLM/tree/main/internvid), and [🤗 Panda](https://huggingface.co/datasets/WHB139426/Grounded-VideoLLM/tree/main/panda70m_2m). You can also download them from their official repos: [WebVid-10M](https://huggingface.co/datasets/TempoFunk/webvid-10M), [InternVid-10M](https://huggingface.co/datasets/OpenGVLab/InternVid), [Panda-70M](https://github.com/snap-research/Panda-70M).
- We concatenate videos with high mutual similarity from these three datasets and annotate QA pairs for them. We provide the [similarity-ordered JSON file](https://ml-site.cdn-apple.com/datasets/streambridge/qa_groups.json), and you can dynamically control the grouping size via `GROUP_LEN`:
```python
import json


def load_json(path):
    with open(path) as f:
        data = json.load(f)
    return data


GROUP_LEN = 10
anns = load_json("/your/path/to/qa_groups.json")
groups = [i for i in range(len(anns))]
groups = [groups[i : i + GROUP_LEN] for i in range(0, len(groups), GROUP_LEN)]
grouped_anns = []
for group in groups:
    if len(group) != GROUP_LEN:
        continue
    grouped_anns.append(
        {
            "video_ids": [anns[i]["video_id"] for i in group],
            "video_files": [anns[i]["video_file"] for i in group],
            "captions": [anns[i]["caption"] for i in group],
            "questions": [anns[i]["question"] for i in group],
            "answers": [anns[i]["answer"] for i in group],
            "options": [anns[i]["options"] for i in group],
            "types": [anns[i]["type"] for i in group],
        }
    )
print(grouped_anns[0])
```
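As a quick check of the grouping logic, you can run the same chunking on synthetic annotations instead of the real `qa_groups.json` (the fields below are placeholders): 25 entries with `GROUP_LEN = 10` yield two full groups, and the trailing 5 entries are dropped because partial groups are skipped.

```python
# Synthetic stand-in for qa_groups.json entries (real entries carry more fields).
anns = [{"video_id": f"vid_{i}", "video_file": f"{i}.mp4"} for i in range(25)]

GROUP_LEN = 10
idx = list(range(len(anns)))
chunks = [idx[i : i + GROUP_LEN] for i in range(0, len(idx), GROUP_LEN)]
full = [c for c in chunks if len(c) == GROUP_LEN]
print(len(full), len(chunks))  # 2 full groups out of 3 chunks
```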
## License
This software and accompanying data and models have been released under the
following licenses:
- Code: [Apple Sample Code License (ASCL)](./LICENSE)
- Data: [CC-BY-NC-ND](./LICENSE_DATA) [Deed](https://creativecommons.org/licenses/by-nc-nd/4.0/)
## Citation
If you find our paper and code useful in your research, please consider giving a star :star: and a citation :pencil:.
```BibTeX
@article{wang2025streambridge,
title={StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant},
author={Wang, Haibo and Feng, Bo and Lai, Zhengfeng and Xu, Mingze and Li, Shiyu and Ge, Weifeng and Dehghan, Afshin and Cao, Meng and Huang, Ping},
journal={arXiv preprint arXiv:2505.05467},
year={2025}
}
```