# lite-avatar

**Repository Path**: sunpengqi11/lite-avatar

## Basic Information

- **Project Name**: lite-avatar
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-04-25
- **Last Updated**: 2025-04-25

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# LiteAvatar
We introduce a audio2face model for realtime 2D chat avatar, which can run in 30fps on only CPU devices without GPU acceleration.
## Pipeline
- An efficient ASR model from [modelsope](https://modelscope.cn/models/iic/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch) for audio feature extraction.
- A mouth parameter prediction model given audio feature inputs for voice synchronized mouth movement generation.
- A lightweight 2D face generator model for mouth movement rendering, which can also be deployed on mobile devices realizing realtime inference.
## Data Preparation
Get sample avatar data located in `./data/sample_data.zip` and extract to you path

🔥More avatars can be found at [LiteAvatarGallery](https://modelscope.cn/models/HumanAIGC-Engineering/LiteAvatarGallery/summary)
## Installation
We recommend a python version = 3.10 and cuda version = 11.8. Then build environment as follows:
```shell
pip install -r requirements.txt
```
## Inference
```
python lite_avatar.py --data_dir /path/to/sample_data --audio_file /path/to/audio.wav --result_dir /path/to/result
```
The mp4 video result will be saved in the result_dir.
## Interactive demo
The realtime interactive video chat demo powered by our LiteAvatar algorithm is available at [OpenAvatarChat](https://github.com/HumanAIGC-Engineering/OpenAvatarChat)
## Acknowledgement
We are grateful for the following open-source projects that we used in this project:
- [Paraformer](https://modelscope.cn/models/iic/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch)
 and [FunASR](https://github.com/modelscope/FunASR) for audio feature extraction.
## Citation
If you find this project useful, please ⭐️ star the repository and cite our related paper:
```
@inproceedings{ZhuangQZZT22,
  author       = {Wenlin Zhuang and Jinwei Qi and Peng Zhang and Bang Zhang and Ping Tan},
  title        = {Text/Speech-Driven Full-Body Animation},
  booktitle    = {Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, {IJCAI}},
  pages        = {5956--5959},
  year         = {2022}
}
```