# ComfyUI_EchoMimic

**Repository Path**: juht/ComfyUI_EchoMimic

## Basic Information

- **Project Name**: ComfyUI_EchoMimic
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: echo_hallo
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-02-19
- **Last Updated**: 2025-02-19

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# ComfyUI_EchoMimic
You can use EchoMimic & EchoMimic V2 in comfyui

[Echomimic](https://github.com/antgroup/echomimic/tree/main)：Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning  
[Echomimic_v2](https://github.com/antgroup/echomimic_v2): Towards Striking, Simplified, and Semi-Body Human Animation


---

## New Updates 2025-01-03:
* 支持新版的ACC模型，在infer_mode里选择pose_acc开启，如果外网通畅会自动下载，你也可以从[这里](https://huggingface.co/BadToBest/EchoMimicV2/tree/main)预下载（denoising_unet_acc.pth和motion_module_acc.pth），并放在ComfyUI\models\echo_mimic\v2里，推荐的步数为6步,cfg为1，尺寸为768*768。ACC模型较大，小显存耗时可能会比较长；  
* Support the new version of ACC model, select 'pose_acc' to enable in 'infer_mode', and if the network is smooth, it will automatically download. You can also pre download from [here](https://huggingface.co/BadToBest/EchoMimicV2/tree/main) and put it in A. The recommended 'steps' are '6' ,'cfg' is '1' and the size is 768 * 768. The ACC model is relatively large, and low video memory consumption may be longer


**Previous updates：**  
* 新增输入图片跟基准图片对齐功能（选择pose_normal_sapiens时自动开启，3种驱动方式都能使用，见下面的示例图），修复之前的蒙版对齐错误。
* Added the function of aligning the input image with the reference image (automatically turned on when selecting pose_normal_sapiens, all three driving methods can be used，See the example diagram below), fixed the previous mask alignment error.

* V2版现在跟V1一样，有三种pose驱动方式，第一种，infer_mode选择audio_drive,pose_dir 选择列表里的几个默认pose，则使用默认的npy pose文件，第二种，infer_mode选择audio_drive,pose_dir 选择已有的npy文件夹（位于...ComfyUI/input/tensorrt_lite目录下），第三种，infer_mode选择pose_normal_dwpose 或pose_normal_sapiens,video_images连接视频入口，确认...ComfyUI/models/echo_mimic 下有yolov8m.pt 和sapiens_1b_goliath_best_goliath_AP_639_torchscript.pt2 模型 （见图示和example里的工作流,下载地址见后附）；
* 因为调用了sapiens的pose方法，所以需要安装yolo的库ultralytics ，安装方法：  pip install ultralytics  
* The V2 version now has three different pose driving methods, just like the V1 version. The first method is to select audio_drive for infer_mode and default poses from the list for pose_dir, using the default npy pose file. The second method is to select audio-drive for infer_mode and an existing npy folder (located in the... ComfyUI/input/tensorrt_lite directory) for pose_dir. The third method is to select 'pose_normal_dwpose' or 'pose_normal_sapiens' for infer_mode, connect to the video portal with video_images, and confirm Under ComfyUI/models/echo_mimic, there are 'YOLOV8m.pt' and 'sapiens_1b_goliath_best_goliath_AP_639_torchscript.pt2' models (see the workflow in the diagram and example,Please see the download link below)
* Because the pose method of ‘Sapiens’ was called, it is necessary to install YOLO's library ultralytics. Installation method： pip install ultralytics  
---

# 1. Installation

In the ./ComfyUI /custom_node directory, run the following:   
```
git clone https://github.com/smthemex/ComfyUI_EchoMimic.git
```

---
  
# 2. Requirements  

```
pip install -r requirements.txt
pip install --no-deps facenet-pytorch
```
Notice
---
* 如果安装facenet-pytorch后comfyUI奔溃，可以先卸载torch，然后再重新安装，以下版本只是示例：
* if comfyUI  broken after pip  install  facenet-pytorch ,try this below: 
```
pip uninstall torchaudio torchvision torch xformers
pip install torch torchvision torchaudio --index-url  https://download.pytorch.org/whl/cu124
pip install xformers
```
* 如果使用的是便携包版本在python_embeded目录下 打开CMD ;   
* If it is a  portable package comfyUI： open CMD in python_embeded dir   
```
python -m pip uninstall torchaudio torchvision torch xformers
python -m pip install torch torchvision torchaudio --index-url  https://download.pytorch.org/whl/cu124
python -m pip install xformers
```

* 如果ffmpeg 报错，if ffmpeg error：  
```
pip uninstall ffmpeg   
pip install ffmpeg-python  
```

* 其他库缺啥装啥。。。  
* If the module is missing, , pip install  missing module.       

## Troubleshooting errors with stable-audio-tools / other audio issues
**If using conda & python >3.12**
> Uninstall all & downgrade python
```
pip uninstall torchaudio torchvision torch xformers ffmpeg

conda uninstall python
conda install python=3.11.9

pip install --upgrade pip wheel
conda install pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 pytorch-cuda=11.8 -c pytorch -c nvidia
or install torch 2.4 
conda install pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia
```
**Should have most of these packages if you install the custom nodes from git urls**
```
pip install flash-attn spandrel opencv-python diffusers jwt diffusers bitsandbytes omegaconf decord carvekit insightface easydict open_clip ffmpeg-python taming onnxruntime
```
---

# 3. Models Required 
----
**3.1 V1 & V2 Shared model v1 和 v2 共用的模型**:   
如果能直连抱脸,点击就会自动下载所需模型,不需要手动下载.  
3.11 unet [link](https://huggingface.co/lambdalabs/sd-image-variations-diffusers)   
3.12 V1 & V2 audio  [link](https://huggingface.co/BadToBest/EchoMimic/tree/main)   
3.13 vae(stabilityai/sd-vae-ft-mse)    [link](https://huggingface.co/stabilityai/sd-vae-ft-mse)        
3.14 optional (可选) hallo upscale [huggingface](https://huggingface.co/fudan-generative-ai/hallo2/tree/main)  # auto downlad

```
├── ComfyUI/models/ echo_mimic
|         ├── unet
|             ├── diffusion_pytorch_model.bin
|             ├── config.json
|         ├── audio_processor
|             ├── whisper_tiny.pt
|         ├── vae
|             ├── diffusion_pytorch_model.safetensors
|             ├── config.json

```

**3.2 V1 models V1使用以下模型**:   
V1 address   [link](https://huggingface.co/BadToBest/EchoMimic/tree/main)    
Audio-Drived Algo Inference 音频驱动        
```
├── ComfyUI/models/echo_mimic
|         ├── denoising_unet.pth
|         ├── face_locator.pth
|         ├── motion_module.pth
|         ├── reference_unet.pth
Audio-Drived Algo Inference  acc  音频驱动加速版
|         ├── denoising_unet_acc.pth
|         ├── motion_module_acc.pth
```

Using Pose-Drived Algo Inference  姿态驱动   
```
├── ComfyUI/models/echo_mimic
|         ├── denoising_unet_pose.pth
|         ├── face_locator_pose.pth
|         ├── motion_module_pose.pth
|         ├── reference_unet_pose.pth
Using Pose-Drived Algo Inference  ACC   姿态驱动加速版
|         ├── denoising_unet_pose_acc.pth
|         ├── motion_module_pose_acc.pth
```

**3.2 v2 version**   
use model below V2, Automatic download, you can manually add it 使用以下模型,使用及自动下载,你可以手动添加:    
模型地址address:[huggingface](https://huggingface.co/BadToBest/EchoMimicV2/tree/main)
```
├── ComfyUI/models/echo_mimic/v2
|         ├── denoising_unet.pth
|         ├── motion_module.pth
|         ├── pose_encoder.pth
|         ├── reference_unet.pth
if use acc 姿态驱动加速版   
|         ├── denoising_unet_acc.pth
|         ├── motion_module_acc.pth
```
YOLOm8 [download link](https://huggingface.co/Ultralytics/YOLOv8/tree/main)   
sapiens pose [download link](https://huggingface.co/facebook/sapiens-pose-1b-torchscript/tree/main)  
sapiens的pose 模型可以量化为fp16的，详细见我的sapiens插件 [地址](https://github.com/smthemex/ComfyUI_Sapiens)   
```
├── ComfyUI/models/echo_mimic
|         ├── yolov8m.pt
|         ├── sapiens_1b_goliath_best_goliath_AP_639_torchscript.pt2  or/或者 sapiens_1b_goliath_best_goliath_AP_639_torchscript_fp16.pt2
```

---

# 4 Example
-----
* 自动对齐输入图片Automatically align input images；  
![](https://github.com/smthemex/ComfyUI_EchoMimic/blob/main/example/alignA.png)
![](https://github.com/smthemex/ComfyUI_EchoMimic/blob/main/example/align.png)

* V2加载自定义视频驱动视频，V2 loads custom video driver videos  
![](https://github.com/smthemex/ComfyUI_EchoMimic/blob/main/example/example.png)

* V2选择自定义pose驱动视频，V2 Choose Custom Pose Driver Video   
![](https://github.com/smthemex/ComfyUI_EchoMimic/blob/main/example/cropC.png)

* Echomimic_v2 use default pose  new version 使用官方默认的pose文件 
![](https://github.com/smthemex/ComfyUI_EchoMimic/blob/main/example/v2.gif)

* motion_sync Extract facial features directly from the video (with the option of voice synchronization), while generating a PKL model for the reference video ，The old version 
直接从从视频中提取面部特征(可以选择声音同步),同时生成参考视频的pkl模型  旧版   
 ![](https://github.com/smthemex/ComfyUI_EchoMimic/blob/main/example/video2video.gif)

* mormal Audio-Drived Algo Inference   The old  version  workflow  音频驱动视频常规示例 旧版  
![](https://github.com/smthemex/ComfyUI_EchoMimic/blob/main/example/audio2video.png)

* mormal Audio-Drived Algo Inference   The old version  workflow  音频驱动视频常规示例  2倍放大 1024*1024  旧版本    
![](https://github.com/smthemex/ComfyUI_EchoMimic/blob/main/example/echonew.png)

* pose from pkl，The old  version, 基于预生成的pkl模型生成视频.  旧版      
 ![](https://github.com/smthemex/ComfyUI_EchoMimic/blob/main/example/new.png)

* 示例的 VH node ComfyUI-VideoHelperSuite node: [ComfyUI-VideoHelperSuite](https://github.com/Kosinkadink/ComfyUI-VideoHelperSuite)


---

# 5 Function Description
---
* infer_mode：音频驱动视频生成，“audio_drived” 和"audio_drived_acc"；      
* infer_mode：参考pkl模型文件视频pose生成 "pose_normal", "pose_acc"；   
    ----motion_sync：如果打开且video_file有视频文件时，生成pkl文件，并生成参考视频的视频；pkl文件在input\tensorrt_lite 目录下，再次使用需要重启comfyUI。   
    ----motion_sync：如果关闭且pose_mode不为none的时候，读取选定的pose_mode目录名的pkl文件，生成pose视频；如果pose_mode为空的时候，生成基于默认assets\test_pose_demo_pose的视频   
 
**特别的选项**：  
  * save_video：如果不想使用VH节点时，可以开启，默认关闭；     
  * draw_mouse：你可以试试；    
  * length：帧数，时长等于length/fps；     
  * acc模型 ，6步就可以，但是质量略有下降；   
  * lowvram :低显存用户可以开启 lowvram users can enable it  
  * 内置内置图片等比例裁切。   
**特别注意的地方**：   
  * cfg数值设置为1，仅在turbo模式有效，其他会报错。    

**Infir_mode**: Audio driven video generation, "audio-d rived" and "audio-d rived_acc";   
**Infer_rode**: Refer to the PKL model file to generate "pose_normal" and "pose_acc" for the video pose;   
**Motion_Sync**: If opened and there is a video file in videoFILE, generate a pkl file and generate a reference video for the video; The pkl file is located in the input \ sensorrt_lite directory. To use it again, you need to restart ComfyUI.    
**Motion_Sync**: If turned off and pose_mode is not 'none', read the pkl file of the selected pose_mode directory name and generate a pose video; If pose_mode is empty, generate a video based on the default assets \ test_pose_demo_pose    

 
**Special options:**   
--**Save_video**: If you do not want to use VH nodes, it can be turned on and turned off by default;   
--**Draw_mause**: You can try it out;   
--**Length**: frame rate, duration equal to length/fps;   
--The ACC model only requires 6 steps, but the quality has slightly decreased;   
--Built in image proportional cropping.   
Special attention should be paid to:   
--The cfg value is set to 1, which is only valid in turbo mode, otherwise an error will be reported.   

---

**既往更新：**  

* 增加detection_Resnet50_Final.pth 和RealESRGAN_x2plus.pth自动下载的代码，首次使用，保持realesrgan和face_detection_model菜单为‘none’（无）时就会自动下载，如果菜单里已有模型，请选择模型。    
* 新增hallo2的2倍放大节点，输入视频的尺寸必须是512 * 512方形，输出为1024 * 1024
* 当你用torch 2.2.0+cuda 成功安装最新的facenet-pytorch库后，可以卸载掉基于 2.2.0版本的torch torchvision torchaudio xformers 然后重新安装更高版本的torch torchvision torchaudio xformers，以下是卸载和安装的示例（假设安装torch2.4）：   
* 添加lowvram模式，方便6G或者8G显存用户使用，注意，开启之后会很慢，而且占用内存较大，请谨慎尝试。     
* 修改vae模型的加载方式，移至ComfyUI/models/echo_mimic/vae路径（详细见下方模型存放地址指示图），降低hf加载模型的优先级，适用于无梯子用户。     


**Previous updates：**   
* ﻿The magnification factor of 'facecrop-ratio' is '1/facecrop-ratio'. If set to 0.5, the face will be magnified twice. It is recommended to adjust facecrop-ratio to a smaller value only when the proportion of faces in the reference image or driving video is very small,Do not cut when it is 1 or 0;     
* facecrop_ratio的放大系数为1/facecrop_ratio，如果设置为0.5，面部会得到2倍的放大，建议只在参考图片或者驱动视频中的人脸占比很小的时候，才将facecrop_ratio调整为较小的值.为1 或者0 时不裁切  
* Add upscale model and Resnet model auto download codes（if had ，they in comfyUI/models/upscale_models/RealESRGAN_x2plus.pth and comfyUI/models/Hallo/facelib/detection_Resnet50_Final.pth）， first use ，keep “realesrgan” and “face_detection_model” ‘none’ will auto download.. 
* After successfully installing the latest ‘facenet-pytorch’ library using torch 2.2.0+CUDA, you can uninstall torch torch vision torch audio xformers based on version 2.2.0 and then reinstall a higher version of torch、 torch vision、 torch audio xformers. Here is an example of uninstallation and installation (installing torch 2.4):  
* Add lowvram mode for convenient use by 6G or 8G video memory users. Please note that it will be slow and consume a large amount of memory when turned on. Please try carefully  

  
---

6 Citation
------
EchoMimici
``` python  
@misc{chen2024echomimic,
  title={EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning},
  author={Zhiyuan Chen, Jiajiong Cao, Zhiquan Chen, Yuming Li, Chenguang Ma},
  year={2024},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```

EchoMimici-V2
``` python  
@misc{meng2024echomimic,
  title={EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation},
  author={Rang Meng, Xingyu Zhang, Yuming Li, Chenguang Ma},
  year={2024},
  eprint={2411.10061},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```

hallo2
``` python  
@misc{cui2024hallo2,
	title={Hallo2: Long-Duration and High-Resolution Audio-driven Portrait Image Animation},
	author={Jiahao Cui and Hui Li and Yao Yao and Hao Zhu and Hanlin Shang and Kaihui Cheng and Hang Zhou and Siyu Zhu and️ Jingdong Wang},
	year={2024},
	eprint={2410.07718},
	archivePrefix={arXiv},
	primaryClass={cs.CV}
}
```
sapiens
```
@article{khirodkar2024sapiens,
  title={Sapiens: Foundation for Human Vision Models},
  author={Khirodkar, Rawal and Bagautdinov, Timur and Martinez, Julieta and Zhaoen, Su and James, Austin and Selednik, Peter and Anderson, Stuart and Saito, Shunsuke},
  journal={arXiv preprint arXiv:2408.12569},
  year={2024}
}
```