# AnyText

**Repository Path**: pythoncr/AnyText

## Basic Information

- **Project Name**: AnyText
- **Description**: No description available
- **Primary Language**: Python
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-03-12
- **Last Updated**: 2024-03-12

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# AnyText: Multilingual Visual Text Generation and Editing

<a href='https://arxiv.org/abs/2311.03054'><img src='https://img.shields.io/badge/Paper-Arxiv-red'></a> <a href='https://github.com/tyxsspa/AnyText'><img src='https://img.shields.io/badge/Code-Github-green'></a> <a href='https://modelscope.cn/studios/damo/studio_anytext'><img src='https://img.shields.io/badge/Demo-ModelScope-lightblue'></a> <a href='https://huggingface.co/spaces/modelscope/AnyText'><img src='https://img.shields.io/badge/Demo-HuggingFace-yellow'></a>

![sample](docs/sample.jpg "sample")

## 📌News
[2024.02.21] - The evaluation code and dataset (**AnyText-benchmark**) have been released.  
[2024.02.06] - Happy Lunar New Year, everyone! We have launched a fun app (表情包大师/MeMeMaster) on [ModelScope](https://modelscope.cn/studios/iic/MemeMaster/summary) and [HuggingFace](https://huggingface.co/spaces/martinxm/MemeMaster) for creating cute meme stickers. Come and have fun with it!  
[2024.01.17] - 🎉AnyText has been accepted by ICLR 2024 (**Spotlight**)!  
[2024.01.04] - FP16 inference is available and 3x faster! The demo can now be deployed on GPUs with more than 8GB of memory. Enjoy!  
[2024.01.04] - The HuggingFace online demo is available [here](https://huggingface.co/spaces/modelscope/AnyText)!  
[2023.12.28] - The ModelScope online demo is available [here](https://modelscope.cn/studios/damo/studio_anytext/summary)!  
[2023.12.27] - 🧨 We released the latest checkpoint (v1.1) and inference code; see [ModelScope](https://modelscope.cn/models/damo/cv_anytext_text_generation_editing/summary) for information in Chinese.  
[2023.12.05] - The paper is available [here](https://arxiv.org/abs/2311.03054).

For more AIGC-related work from our group, please visit [here](https://github.com/AIGCDesignGroup). We are also seeking collaborators and research interns ([Email us](mailto:cangyu.gyf@alibaba-inc.com)).

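## ⏰TODO
- [x] Release the model and inference code
- [x] Provide a publicly accessible demo link
- [ ] Provide a free font file(🤔)
- [ ] Release tools for merging weights from community models or LoRAs
- [ ] Support AnyText in stable-diffusion-webui(🤔)
- [x] Release the AnyText-benchmark dataset and evaluation code
- [ ] Release the AnyWord-3M dataset and training code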
 

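## 💡Methodology
AnyText comprises a diffusion pipeline with two primary elements: an auxiliary latent module and a text embedding module. The former uses inputs such as text glyphs, positions, and masked images to generate latent features for text generation or editing. The latter employs an OCR model to encode stroke data as embeddings, which are combined with the image caption embeddings from the tokenizer to generate text that blends seamlessly with the background. We trained with a text-control diffusion loss and a text perceptual loss to further improve writing accuracy.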
![framework](docs/framework.jpg "framework")

## 🛠Installation
```bash
# Install git (skip if already done)
conda install -c anaconda git
# Clone anytext code
git clone https://github.com/tyxsspa/AnyText.git
cd AnyText
# Prepare a font file; Arial Unicode MS is recommended, **you need to download it on your own**
mv your/path/to/arialuni.ttf ./font/Arial_Unicode.ttf
# Create a new environment and install packages as follows:
conda env create -f environment.yaml
conda activate anytext
```

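## 🔮Inference
**[Recommended]**: We have released demos on [ModelScope](https://modelscope.cn/studios/damo/studio_anytext/summary) and [HuggingFace](https://huggingface.co/spaces/modelscope/AnyText)!

AnyText has two modes: text generation and text editing. Run the simple code below to perform inference in both modes and verify that the environment is correctly installed.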
```bash
python inference.py
```
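If you have a capable GPU (at least 8GB of memory), we recommend deploying our demo, which includes usage instructions, a user interface, and abundant examples.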
```bash
export CUDA_VISIBLE_DEVICES=0 && python demo.py
```
    
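By default, FP16 inference is used and a Chinese-to-English translation model is loaded so that prompts can be entered directly in Chinese (this occupies about 4GB of GPU memory). The following command changes the default behavior by enabling FP32 inference and disabling the translation model: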
```bash
export CUDA_VISIBLE_DEVICES=0 && python demo.py --use_fp32 --no_translator
```
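With FP16 and without the translation model (or with the translation model loaded on the CPU, see [here](https://github.com/tyxsspa/AnyText/issues/33)), generating a single 512x512 image occupies about 7.5GB of GPU memory.
In addition, other font files can be used with the following command (although the results may not be optimal):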
```bash
export CUDA_VISIBLE_DEVICES=0 && python demo.py --font_path your/path/to/font/file.ttf
```
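The three documented switches (`--use_fp32`, `--no_translator`, `--font_path`) can be combined in one launch. The helper below is only an illustrative sketch: `build_demo_command` is a hypothetical name that does not exist in the repository, and it assumes the flags can be mixed freely.

```python
import os


def build_demo_command(gpu="0", use_fp32=False, no_translator=False, font_path=None):
    """Return (argv, env) for launching demo.py with the documented flags."""
    argv = ["python", "demo.py"]
    if use_fp32:
        argv.append("--use_fp32")  # FP32 instead of the default FP16
    if no_translator:
        argv.append("--no_translator")  # skip the Chinese-to-English translator
    if font_path is not None:
        argv += ["--font_path", str(font_path)]
    # Copy the current environment and pin the visible GPU.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
    return argv, env
```

From the repository root, the result could be passed to `subprocess.run(argv, env=env, check=True)`.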
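![demo](docs/demo.jpg "demo")
**Please note** that when inference is run for the first time, the model files are downloaded to `~/.cache/modelscope/hub`. To change the download directory, set the `MODELSCOPE_CACHE` environment variable manually.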
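If you start inference from your own Python script, the cache location can also be set in-process. This is a minimal sketch, assuming the variable is read when the first download starts, so it should be set before any model is loaded. `use_model_cache` is a hypothetical helper, and the exact subdirectory layout under the chosen folder may differ between ModelScope versions.

```python
import os
from pathlib import Path


def use_model_cache(cache_dir):
    """Create `cache_dir` and export it as MODELSCOPE_CACHE for this process."""
    path = Path(cache_dir).expanduser().resolve()
    path.mkdir(parents=True, exist_ok=True)
    os.environ["MODELSCOPE_CACHE"] = str(path)
    return path
```

Setting the variable in the shell before running `inference.py` or `demo.py` has the same effect.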


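## 📈Evaluation
### 1. Data Preparation
Download the AnyText-benchmark dataset from [ModelScope](https://modelscope.cn/datasets/iic/AnyText-benchmark/summary) or [GoogleDrive](https://drive.google.com/drive/folders/1Eesj6HTqT1kCi6QLyL5j0mL_ELYRp3GV)
and unzip the files. In the *benchmark* folder, *laion_word* and *wukong_word* are the datasets for English and Chinese evaluation, respectively. Open each *test1k.json* and set `data_root` to the path of your own *imgs* folder. The *FID* directory contains the images used for calculating the FID (Fréchet Inception Distance) score.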
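Editing `data_root` by hand works fine; for repeated setups it could also be scripted. The sketch below assumes that `data_root` is a top-level key of each *test1k.json*, which is not verified here, and `set_data_root` is a hypothetical helper name.

```python
import json
from pathlib import Path


def set_data_root(json_path, imgs_dir):
    """Point the `data_root` field of a benchmark JSON file at `imgs_dir`."""
    json_path = Path(json_path)
    with json_path.open("r", encoding="utf-8") as f:
        content = json.load(f)
    # Assumption: `data_root` is a top-level key of the JSON object.
    content["data_root"] = str(imgs_dir)
    with json_path.open("w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-ASCII text (e.g. Chinese) readable.
        json.dump(content, f, ensure_ascii=False, indent=2)
    return content["data_root"]
```

A call might look like `set_data_root("benchmark/laion_word/test1k.json", "/your/path/to/imgs")`, where both paths are placeholders for your local layout.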

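### 2. Generate Images
Before evaluation, images must be generated for each method from the evaluation set. We also provide [pre-generated images](https://drive.google.com/file/d/1pGN35myilYY04ChFtgAosYr0oqeBy4NU/view?usp=drive_link) for all methods. Follow the instructions below to generate the images on your own device. Note that you need to modify the paths and other parameters in the bash scripts accordingly.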

- AnyText
```bash
bash ./eval/gen_imgs_anytext.sh
```
(If you encounter an error because huggingface is blocked, uncomment line 98 of ./models_yaml/anytext_sd15.yaml and replace the path of the *clip-vit-large-patch14* folder with a local one.)

- ControlNet, Textdiffuser, GlyphControl  
We use glyph images rendered from the AnyText-benchmark dataset as the condition input for these methods:
```bash
bash eval/gen_glyph.sh
```
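Next, clone the official repositories of **ControlNet**, **Textdiffuser**, and **GlyphControl**, and follow their documentation to set up the environment, download the respective checkpoints, and confirm that inference runs normally. Then copy the three files `<method>_singleGPU.py`, `<method>_multiGPUs.py`, and `gen_imgs_<method>.sh` from the *./eval* folder to the root directory of the corresponding codebase, and run: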

```bash
bash gen_imgs_<method>.sh
```
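The copy step described above could be scripted along these lines. This is a sketch only: `eval_files_for` and `copy_eval_files` are hypothetical helpers, and the value substituted for `<method>` must match the file names actually present in *./eval*.

```python
import shutil
from pathlib import Path


def eval_files_for(method):
    """File names to copy for one baseline, following the <method> pattern."""
    return [
        f"{method}_singleGPU.py",
        f"{method}_multiGPUs.py",
        f"gen_imgs_{method}.sh",
    ]


def copy_eval_files(method, eval_dir, repo_root):
    """Copy the three helper files from `eval_dir` into `repo_root`."""
    eval_dir, repo_root = Path(eval_dir), Path(repo_root)
    copied = []
    for name in eval_files_for(method):
        src = eval_dir / name
        if not src.is_file():
            raise FileNotFoundError(f"missing evaluation file: {src}")
        copied.append(Path(shutil.copy2(src, repo_root / name)))
    return copied
```

Failing early on a missing file avoids running a baseline with only part of its helper scripts in place.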

### 3. Evaluate

We use Sentence Accuracy (Sen. ACC) and Normalized Edit Distance (NED) to evaluate the accuracy of the generated text. Please run:
```bash
bash eval/eval_ocr.sh
```
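To make the two text metrics concrete, here is a generic sketch, not the repository's evaluation code. It assumes NED is reported as a similarity, 1 minus the Levenshtein distance divided by the longer string length, so higher is better. It operates on strings that an OCR step has already recognized, and any normalization the real script applies (case, whitespace) is not reproduced.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def ned_similarity(pred, gt):
    """1 - distance / max length; 1.0 means an exact match (assumed convention)."""
    longest = max(len(pred), len(gt))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(pred, gt) / longest


def evaluate(pairs):
    """Return (sentence accuracy, mean NED similarity) over (pred, gt) pairs."""
    if not pairs:
        raise ValueError("pairs must not be empty")
    acc = sum(pred == gt for pred, gt in pairs) / len(pairs)
    ned = sum(ned_similarity(pred, gt) for pred, gt in pairs) / len(pairs)
    return acc, ned
```

Sentence accuracy only rewards exact matches, while the NED score gives partial credit for near misses, which is why the two are reported together.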
We use the FID metric to assess the quality of the generated images. Please run:
```bash
bash eval/eval_fid.sh
```
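For reference, FID is commonly defined as the Fréchet distance between two Gaussians fitted to Inception features of real and generated images. The notation below is chosen here for illustration and is not taken from the repository: $\mu_r, \Sigma_r$ are the mean and covariance of the real-image features, and $\mu_g, \Sigma_g$ those of the generated images.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```

Lower values mean the generated feature distribution is closer to the real one.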

Compared with existing methods, AnyText has a significant advantage in both English and Chinese text generation.

![eval](docs/eval.jpg "eval")

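Please note that we have reorganized the code and further adjusted the configuration of each evaluated method. As a result, there may be minor differences from the numbers reported in the original paper.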

## 🌄Gallery
![gallery](docs/gallery.png "gallery")

## Citation
```
@article{tuo2023anytext,
      title={AnyText: Multilingual Visual Text Generation And Editing}, 
      author={Yuxiang Tuo and Wangmeng Xiang and Jun-Yan He and Yifeng Geng and Xuansong Xie},
      year={2023},
      eprint={2311.03054},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```