# Step-Video-TI2V

Step-Video-TI2V is an open-source image-to-video model from StepFun (阶跃星辰), trained on top of the 30B-parameter Step-Video-T2V. It generates videos of up to 102 frames (5 seconds) at 540P resolution from a text prompt and a reference image, and is released under the MIT license.

## 🔥🔥🔥 News!!

* Mar 17, 2025: 👋 We release the inference code and model weights of Step-Video-TI2V. [Download](https://huggingface.co/stepfun-ai/stepvideo-ti2v)
* Mar 17, 2025: 👋 We release a new TI2V benchmark, [Step-Video-TI2V-Eval](https://github.com/stepfun-ai/Step-Video-TI2V/tree/main/benchmark/Step-Video-TI2V-Eval).
* Mar 17, 2025: 👋 Step-Video-TI2V has been integrated into [ComfyUI-Stepvideo-ti2v](https://github.com/stepfun-ai/ComfyUI-StepVideo). Enjoy!
* Mar 17, 2025: 🎉 We have made our technical report openly available. [Read](https://arxiv.org/abs/2503.11251)

## Motion Control
Example prompts:

- 战马跳跃 (A warhorse jumps)
- 战马蹲下 (A warhorse crouches)
- 战马向前奔跑,然后转身 (A warhorse gallops forward, then turns around)
## Motion Dynamics Control
The same prompt rendered at three dynamic levels (两名男子在互相拳击,镜头环绕两人拍摄。"Two men are boxing each other while the camera circles around them."):

- 两名男子在互相拳击,镜头环绕两人拍摄。(motion_score: 2)
- 两名男子在互相拳击,镜头环绕两人拍摄。(motion_score: 5)
- 两名男子在互相拳击,镜头环绕两人拍摄。(motion_score: 20)
🎯 Tips: The default `motion_score = 5` is suitable for general use. If you need more stability, set `motion_score = 2`, though it may lack dynamism in certain movements. For greater movement flexibility, use `motion_score = 10` or `motion_score = 20` to enable more intense actions. Feel free to customize `motion_score` based on your creative needs.

## Camera Control
Example prompts:

- 镜头环绕女孩,女孩在跳舞 (The camera circles the girl as she dances)
- 镜头缓慢推进,女孩在跳舞 (The camera slowly pushes in as the girl dances)
- 镜头拉远,女孩在跳舞 (The camera pulls back as the girl dances)
### Supported Camera Movements | 支持的运镜方式

| Camera Movement | 运镜方式 |
|--------------------------------|--------------------|
| **Fixed Camera** | 固定镜头 |
| **Pan Up/Down/Left/Right** | 镜头上/下/左/右移 |
| **Tilt Up/Down/Left/Right** | 镜头上/下/左/右摇 |
| **Zoom In/Out** | 镜头放大/缩小 |
| **Dolly In/Out** | 镜头推进/拉远 |
| **Camera Rotation** | 镜头旋转 |
| **Tracking Shot** | 镜头跟随 |
| **Orbit Shot** | 镜头环绕 |
| **Rack Focus** | 焦点转移 |

🔧 Motion Score Considerations: `motion_score = 5` or `10` offers smoother and more accurate motion than `motion_score = 2`, with `motion_score = 10` providing the best responsiveness and camera tracking. Choosing a suitable setting enhances motion precision and fluidity.

## Anime-Style Generation
Example prompts:

- 女生向前行走,背景是虚化模糊的效果 (A girl walks forward against a blurred, out-of-focus background)
- 女人眨眼,然后对着镜头做飞吻的动作。(A woman blinks, then blows a kiss toward the camera.)
- 狸猫战士双手缓缓上扬,雷电从手中向四周扩散,身后灵兽影像的双眼闪烁强光,张开巨口发出低吼 (A tanuki warrior slowly raises both hands as lightning radiates outward from them; behind him, the eyes of a spirit-beast apparition flash with bright light as it opens its huge jaws and lets out a low growl)
Step-Video-TI2V excels at anime-style generation, enabling you to explore a wide range of anime-style images and create customized videos to match your preferences.

## Table of Contents

1. [Introduction](#1-introduction)
2. [Model Summary](#2-model-summary)
3. [Model Download](#3-model-download)
4. [Model Usage](#4-model-usage)
5. [Comparisons](#5-comparisons)
6. [Online Engine](#6-online-engine)
7. [Citation](#7-citation)

## 1. Introduction

We present Step-Video-TI2V, a state-of-the-art text-driven image-to-video generation model with 30B parameters, capable of generating videos of up to 102 frames from both text and image inputs. We build Step-Video-TI2V-Eval as a new benchmark for the text-driven image-to-video task and use it to compare Step-Video-TI2V with open-source and commercial TI2V engines. Experimental results demonstrate the state-of-the-art performance of Step-Video-TI2V on the image-to-video generation task.

## 2. Model Summary

Step-Video-TI2V is trained on top of Step-Video-T2V. To incorporate the image condition as the first frame of the generated video, we encode it into latent representations using Step-Video-T2V's Video-VAE and concatenate them along the channel dimension of the video latent. Additionally, we introduce a motion-score condition, enabling users to control the dynamic level of the video generated from the image condition.
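To make this concrete, here is a minimal, hypothetical PyTorch sketch of the channel-concatenation scheme described above. All tensor shapes are illustrative assumptions, and random tensors stand in for the real Video-VAE outputs; this is not the repository's actual implementation.

```python
import torch

# Illustrative latent shapes (assumed, not the repo's real configuration):
# [batch, channels, frames, height, width]
B, C, F, H, W = 1, 16, 26, 34, 60

# Noisy video latent that the diffusion backbone denoises.
noisy_video_latent = torch.randn(B, C, F, H, W)

# Latent of the conditioning image (stand-in for the Video-VAE encoding
# of the first frame).
image_latent = torch.randn(B, C, 1, H, W)

# Zero-pad the image latent over the remaining frames so it matches the
# video latent's temporal length (one plausible padding scheme, assumed here).
image_cond = torch.cat([image_latent, torch.zeros(B, C, F - 1, H, W)], dim=2)

# Concatenate along the channel dimension: the backbone input now carries
# both the noisy latent and the first-frame condition (2 * C channels).
model_input = torch.cat([noisy_video_latent, image_cond], dim=1)
print(model_input.shape)  # torch.Size([1, 32, 26, 34, 60])
```

The motion score enters separately as a scalar condition (for example, embedded alongside the diffusion timestep); the sketch above covers only the image pathway.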

## 3. Model Download

| Models | 🤗 Huggingface | 🤖 Modelscope | 🎛️ ComfyUI |
|:------------------:|:--------------:|:-------------:|:-----------------:|
| Step-Video-TI2V | [Download](https://huggingface.co/stepfun-ai/stepvideo-ti2v) | [Download](https://modelscope.cn/models/stepfun-ai/stepvideo-ti2v) | [Link](https://github.com/stepfun-ai/ComfyUI-StepVideo) |

## 4. Model Usage

### 📜 4.1 Dependencies and Installation

```bash
git clone https://github.com/stepfun-ai/Step-Video-TI2V.git
conda create -n stepvideo python=3.10
conda activate stepvideo

cd Step-Video-TI2V
pip install -e .
```

### 🚀 4.2 Inference Scripts

```bash
# Start the caption and VAE servers. We assume you have more than 4 GPUs available.
# This command returns the URL for both the caption API and the VAE API;
# use the returned URL in the command below.
python api/call_remote_server.py --model_dir where_you_download_dir &

parallel=4  # or parallel=8; a single GPU (parallel=1) can also produce results, although it takes longer
url='127.0.0.1'
model_dir=where_you_download_dir

torchrun --nproc_per_node $parallel run_parallel.py \
    --model_dir $model_dir \
    --vae_url $url \
    --caption_url $url \
    --ulysses_degree $parallel \
    --prompt "笑起来" \
    --first_image_path ./assets/demo.png \
    --infer_steps 50 \
    --cfg_scale 9.0 \
    --time_shift 13.0 \
    --motion_score 5.0
```

We list some more useful configurations for easy usage:

| Argument | Default | Description |
|:----------------------:|:---------:|:-----------------------------------------:|
| `--model_dir` | None | The model checkpoint for video generation |
| `--prompt` | "笑起来" | The text prompt for I2V generation |
| `--first_image_path` | ./assets/demo.png | The reference image path for the I2V task |
| `--infer_steps` | 50 | The number of sampling steps |
| `--cfg_scale` | 9.0 | Embedded classifier-free guidance scale |
| `--time_shift` | 7.0 | Shift factor for flow-matching schedulers |
| `--motion_score` | 5.0 | Score controlling the motion level of the video |
| `--seed` | None | Random seed for video generation; if None, a random seed is used |
| `--use-cpu-offload` | False | Use CPU offload when loading the model to save memory; necessary for high-resolution video generation |
| `--save-path` | ./results | Path to save the generated video |
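Because the motion level is a single CLI flag, it is easy to render the same prompt at several dynamic levels and compare the results. Below is a small optional helper script, assuming a single-GPU run (`--nproc_per_node 1`) and that the caption/VAE server started above is already serving at `127.0.0.1`; the model directory and prompt are placeholders.

```python
import subprocess

MODEL_DIR = "where_you_download_dir"  # placeholder, as above
URL = "127.0.0.1"                     # URL returned by api/call_remote_server.py
PROMPT = "笑起来"

# Render the same image/prompt pair at several motion levels for comparison.
for motion_score in (2, 5, 10, 20):
    subprocess.run(
        [
            "torchrun", "--nproc_per_node", "1", "run_parallel.py",
            "--model_dir", MODEL_DIR,
            "--vae_url", URL,
            "--caption_url", URL,
            "--ulysses_degree", "1",
            "--prompt", PROMPT,
            "--first_image_path", "./assets/demo.png",
            "--infer_steps", "50",
            "--cfg_scale", "9.0",
            "--time_shift", "13.0",
            "--motion_score", str(motion_score),
            "--save-path", f"./results/motion_{motion_score}",
        ],
        check=True,
    )
```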
## 5. Comparisons

To evaluate the performance of Step-Video-TI2V, we leverage [VBench-I2V](https://arxiv.org/html/2411.13503v1) to systematically compare Step-Video-TI2V with recently released leading open-source models. The detailed results, presented in the table below, highlight our model's superior performance over these models. We present two results for Step-Video-TI2V, with the motion score set to 5 and 10, respectively. As expected, this mechanism effectively balances the motion dynamics and the stability (or consistency) of the generated videos. Additionally, we submitted our results to the [VBench-I2V leaderboard](https://huggingface.co/spaces/Vchitect/VBench_Leaderboard), where Step-Video-TI2V achieved the top-ranking position.

We also introduce a new benchmark dataset, [Step-Video-TI2V-Eval](https://github.com/stepfun-ai/Step-Video-TI2V/tree/main/benchmark/Step-Video-TI2V-Eval), specifically designed for the TI2V task to support future research and evaluation. The dataset includes 178 real-world and 120 anime-style prompt-image pairs, ensuring broad coverage of diverse user scenarios.

| Scores | Step-Video-TI2V (motion=10) | Step-Video-TI2V (motion=5) | OSTopA | OSTopB |
|:-------|:---------------------------:|:--------------------------:|:------:|:------:|
| Total Score | 87.98 | 87.80 | 87.49 | 86.77 |
| I2V Score | 95.11 | 95.50 | 94.63 | 93.25 |
| Video-Text Camera Motion | 48.15 | 49.22 | 29.58 | 46.45 |
| Video-Image Subject Consistency | 97.44 | 97.85 | 97.73 | 95.88 |
| Video-Image Background Consistency | 98.45 | 98.63 | 98.83 | 96.47 |
| Quality Score | 80.86 | 80.11 | 80.36 | 80.28 |
| Subject Consistency | 95.62 | 96.02 | 94.52 | 96.28 |
| Background Consistency | 96.92 | 97.06 | 96.47 | 97.38 |
| Motion Smoothness | 99.08 | 99.24 | 98.09 | 99.10 |
| Dynamic Degree | 48.78 | 36.58 | 53.41 | 38.13 |
| Aesthetic Quality | 61.74 | 62.29 | 61.04 | 61.82 |
| Imaging Quality | 70.17 | 70.43 | 71.12 | 70.82 |
![figure1](assets/vbench.png "figure1")

## 6. Online Engine

The online version of Step-Video-TI2V is available on [跃问视频](https://yuewen.cn/videos) (Yuewen Video), where you can also explore some impressive examples.

## 7. Citation

```
@misc{huang2025stepvideoti2vtechnicalreport,
      title={Step-Video-TI2V Technical Report: A State-of-the-Art Text-Driven Image-to-Video Generation Model},
      author={Haoyang Huang and Guoqing Ma and Nan Duan and Xing Chen and Changyi Wan and Ranchen Ming and Tianyu Wang and Bo Wang and Zhiying Lu and Aojie Li and Xianfang Zeng and Xinhao Zhang and Gang Yu and Yuhe Yin and Qiling Wu and Wen Sun and Kang An and Xin Han and Deshan Sun and Wei Ji and Bizhu Huang and Brian Li and Chenfei Wu and Guanzhe Huang and Huixin Xiong and Jiaxin He and Jianchang Wu and Jianlong Yuan and Jie Wu and Jiashuai Liu and Junjing Guo and Kaijun Tan and Liangyu Chen and Qiaohui Chen and Ran Sun and Shanshan Yuan and Shengming Yin and Sitong Liu and Wei Chen and Yaqi Dai and Yuchu Luo and Zheng Ge and Zhisheng Guan and Xiaoniu Song and Yu Zhou and Binxing Jiao and Jiansheng Chen and Jing Li and Shuchang Zhou and Xiangyu Zhang and Yi Xiu and Yibo Zhu and Heung-Yeung Shum and Daxin Jiang},
      year={2025},
      eprint={2503.11251},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.11251},
}
```

```
@misc{ma2025stepvideot2vtechnicalreportpractice,
      title={Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model},
      author={Guoqing Ma and Haoyang Huang and Kun Yan and Liangyu Chen and Nan Duan and Shengming Yin and Changyi Wan and Ranchen Ming and Xiaoniu Song and Xing Chen and Yu Zhou and Deshan Sun and Deyu Zhou and Jian Zhou and Kaijun Tan and Kang An and Mei Chen and Wei Ji and Qiling Wu and Wen Sun and Xin Han and Yanan Wei and Zheng Ge and Aojie Li and Bin Wang and Bizhu Huang and Bo Wang and Brian Li and Changxing Miao and Chen Xu and Chenfei Wu and Chenguang Yu and Dapeng Shi and Dingyuan Hu and Enle Liu and Gang Yu and Ge Yang and Guanzhe Huang and Gulin Yan and Haiyang Feng and Hao Nie and Haonan Jia and Hanpeng Hu and Hanqi Chen and Haolong Yan and Heng Wang and Hongcheng Guo and Huilin Xiong and Huixin Xiong and Jiahao Gong and Jianchang Wu and Jiaoren Wu and Jie Wu and Jie Yang and Jiashuai Liu and Jiashuo Li and Jingyang Zhang and Junjing Guo and Junzhe Lin and Kaixiang Li and Lei Liu and Lei Xia and Liang Zhao and Liguo Tan and Liwen Huang and Liying Shi and Ming Li and Mingliang Li and Muhua Cheng and Na Wang and Qiaohui Chen and Qinglin He and Qiuyan Liang and Quan Sun and Ran Sun and Rui Wang and Shaoliang Pang and Shiliang Yang and Sitong Liu and Siqi Liu and Shuli Gao and Tiancheng Cao and Tianyu Wang and Weipeng Ming and Wenqing He and Xu Zhao and Xuelin Zhang and Xianfang Zeng and Xiaojia Liu and Xuan Yang and Yaqi Dai and Yanbo Yu and Yang Li and Yineng Deng and Yingming Wang and Yilei Wang and Yuanwei Lu and Yu Chen and Yu Luo and Yuchu Luo and Yuhe Yin and Yuheng Feng and Yuxiang Yang and Zecheng Tang and Zekai Zhang and Zidong Yang and Binxing Jiao and Jiansheng Chen and Jing Li and Shuchang Zhou and Xiangyu Zhang and Xinhao Zhang and Yibo Zhu and Heung-Yeung Shum and Daxin Jiang},
      year={2025},
      eprint={2502.10248},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2502.10248},
}
```