
# CLIP-TSA: CLIP-Assisted Temporal Self-Attention for Weakly-Supervised Video Anomaly Detection

**IEEE International Conference on Image Processing (ICIP), 2023 (Oral Presentation)**

This repository provides a PyTorch implementation of the paper
"CLIP-TSA: CLIP-Assisted Temporal Self-Attention for Weakly-Supervised Video Anomaly Detection".

## Paper

> [**CLIP-TSA: CLIP-Assisted Temporal Self-Attention for Weakly-Supervised Video Anomaly Detection**](https://arxiv.org/pdf/2212.05136.pdf) (Oral, ICIP 2023)
>
> *[**Kevin Hyekang Joo**](https://hyekang.info/), Khoa Vo, Kashu Yamazaki, Ngan Le*

## Requirements

## CLIP Features

- [ShanghaiTech Campus Dataset](https://drive.google.com/file/d/1FvU8-qiVwiGF5BXAdM00-YhMZ7xt_vvy/view?usp=sharing)
- [UCF-Crime Dataset](https://drive.google.com/file/d/1bsVTixDxWdycDJhcTwqZV75suFrv76LB/view?usp=sharing)
- [XD-Violence Dataset](https://drive.google.com/file/d/1HdN4_RcxvSp5scJ4k1PDgHHSZpEhGoZp/view?usp=sharing)

## FAQ

> **Q1)** I get the following error: `RuntimeError: Expected a 'cuda' device type for generator but found 'cpu'`
>
> **A1)** Go to `venv/lib/python3.8/site-packages/torch/utils/data/sampler.py` and find the `__iter__` function of the `RandomSampler` class, then change the line `generator = torch.Generator()` to `generator = torch.Generator(device="cuda")`.

> **Q2)** I keep getting a CUDA OUT OF MEMORY error.
>
> **A2)** Each dataset requires a different amount of VRAM, and a significant amount is needed when the TSA feature is enabled, so keep this in mind when running on large public datasets such as ShanghaiTech Campus, XD-Violence, and UCF-Crime. If you only want to test the contribution of CLIP within the model, disable TSA by adding `--disable_HA` to the command; this requires less VRAM and should be operable on most GPUs.
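If editing the installed PyTorch package is undesirable, an alternative that may work is to construct the random sampler with a CUDA generator yourself and hand it to the `DataLoader`, so that `RandomSampler.__iter__` never has to create a CPU generator. The sketch below is not taken from this repository: the actual data pipeline in `main.py` may differ, and the `TensorDataset` here is only a hypothetical stand-in for the repository's feature dataset.

```python
import torch
from torch.utils.data import DataLoader, RandomSampler, TensorDataset

# The error above typically appears when the default tensor type has been
# switched to CUDA; this line only reproduces that setting for the sketch.
torch.set_default_tensor_type("torch.cuda.FloatTensor")

# Hypothetical stand-in for the repository's feature Dataset.
dataset = TensorDataset(torch.randn(16, 512))

# Pass a CUDA generator explicitly so RandomSampler uses it instead of
# creating its own CPU generator inside __iter__.
sampler = RandomSampler(dataset, generator=torch.Generator(device="cuda"))
loader = DataLoader(dataset, batch_size=4, sampler=sampler)

for (batch,) in loader:
    print(batch.shape)  # torch.Size([4, 512])
```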

## How to Run

> python main.py

Please first look at main.py and adjust the hyperparameters and parameters accordingly; otherwise, the script runs with its default settings.
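For example, to test only the CLIP component as described in the FAQ, with TSA disabled for lower VRAM usage, the invocation might look like the following (all other hyperparameters stay at the defaults defined in main.py):

```
python main.py --disable_HA
```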

## Citations

```
@inproceedings{joo2023cliptsa,
  title={CLIP-TSA: CLIP-Assisted Temporal Self-Attention for Weakly-Supervised Video Anomaly Detection},
  author={Joo, Hyekang Kevin and Vo, Khoa and Yamazaki, Kashu and Le, Ngan},
  doi={10.1109/ICIP49359.2023.10222289},
  url={https://ieeexplore.ieee.org/document/10222289},
  publisher={IEEE International Conference on Image Processing (ICIP)},
  pages={3230--3234},
  year={2023},
  organization={IEEE}
}
```

## Contacts

*Kevin Hyekang Joo - khjoo@usc.edu or hkjoo@umd.edu*

The code has been adapted in part from Yu Tian's RTFM.