# From Elements to Design: A Layered Approach for Automatic Graphic Design Composition

[[Paper](https://arxiv.org/abs/2412.19712)] [[Video](https://www.youtube.com/watch?v=omXtLEiwEPU)] [[Hugging Face](https://huggingface.co/microsoft/elem2design)] [[Demo](app/app.py)]

In this work, we investigate automatic design composition from multimodal graphic elements. We propose [LaDeCo](https://arxiv.org/pdf/2412.19712), which introduces the layered design principle to accomplish this challenging task through two steps: layer planning and layered design composition.

![](teaser.png)

# Requirements

1. Clone this repository

   ```bash
   git clone https://github.com/microsoft/elem2design.git
   cd elem2design
   ```

2. Install

   ```bash
   conda create -n e2d python=3.10 -y
   conda activate e2d
   pip install --upgrade pip
   pip install -e .
   pip install -e thirdparty/opencole
   pip install -e dataset/src
   ```

3. Install additional packages for training

   ```bash
   pip install -e ".[train]"
   pip install flash-attn --no-build-isolation
   ```

# How to Use

## Layer Planning

Please check this [folder](dataset/labeling) for the layer planning code, or directly use the predicted labels [here](dataset/dataset/role/crello_role.pkl).

Index-to-layer mapping:

```python
{
    0: "Background",
    1: "Underlay",
    2: "Logo/Image",
    3: "Text",
    4: "Embellishment"
}
```

## Rendering

We render an image for each element, which is useful during inference. We also render the intermediate designs (denoted as `layer_{index}.png`) and use them for end-to-end training. Please run the following script to generate these assets:

```bash
python dataset/src/crello/render.py
```

## Dataset Preparation

After rendering, the next step is to construct the dataset according to the layered design principle. Each sample contains 5 rounds of dialogue, in which the model progressively predicts element attributes from the background layer to the embellishment layer.

```bash
python dataset/src/crello/create_dataset.py --tag ours
```

## Model

We have a model available on [Hugging Face](https://huggingface.co/microsoft/elem2design). Please download it for inference. This model is trained on the public [Crello dataset](https://huggingface.co/datasets/cyberagent/crello). We find that using a dataset approximately five times larger leads to significantly improved performance; for a detailed evaluation, please refer to Table 2 in our [paper](https://arxiv.org/pdf/2412.19712). Unfortunately, we are unable to release that model, as it was trained on a private dataset.
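If you prefer to fetch the released checkpoint programmatically, a minimal sketch using the `huggingface_hub` client is shown below; the repo id comes from the link above, while the local directory is only an example path and is not prescribed by this repository.

```python
# Minimal sketch (not part of this repository) for downloading the released model
# from the Hugging Face Hub. The repo id is taken from the link above; the local
# directory name is only an example.
from huggingface_hub import snapshot_download

checkpoint_dir = snapshot_download(
    repo_id="microsoft/elem2design",
    local_dir="checkpoints/elem2design",
)
print(f"Model files downloaded to {checkpoint_dir}")
```

Pass the resulting directory (or the specific checkpoint folder inside it) as `--model_name_or_path` in the inference command below.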
## Inference

Now it is time to run inference with the prepared data and model:

```bash
python llava/infer/infer.py \
    --model_name_or_path /path/to/model/checkpoint-xxxx \
    --data_path /path/to/data/test.json \
    --image_folder /path/to/crello_images \
    --output_dir /path/to/output_dir \
    --start_layer_index 0 \
    --end_layer_index 4
```
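The `--start_layer_index` and `--end_layer_index` values follow the index-to-layer mapping from the Layer Planning section. If you find layer names easier to work with than raw indices, a small wrapper along these lines might help; it is purely illustrative, not part of the repository, and the paths are placeholders.

```python
# Illustrative wrapper (not part of the repository): build the inference command
# from layer names, using the index-to-layer mapping from the Layer Planning
# section and only the flags shown in the command above. Paths are placeholders.
import subprocess

LAYER_INDEX = {"Background": 0, "Underlay": 1, "Logo/Image": 2, "Text": 3, "Embellishment": 4}

def run_inference(start_layer: str, end_layer: str) -> None:
    subprocess.run(
        [
            "python", "llava/infer/infer.py",
            "--model_name_or_path", "/path/to/model/checkpoint-xxxx",
            "--data_path", "/path/to/data/test.json",
            "--image_folder", "/path/to/crello_images",
            "--output_dir", "/path/to/output_dir",
            "--start_layer_index", str(LAYER_INDEX[start_layer]),
            "--end_layer_index", str(LAYER_INDEX[end_layer]),
        ],
        check=True,
    )

# Compose all five layers, from the background up to the embellishments.
run_inference("Background", "Embellishment")
```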
# Demo

Besides command-line inference, we also provide a demo interface that allows users to interact with the model via a web-based UI. This interface is more user-friendly and better suited for running inference on custom datasets. To launch the web UI, run the following command:

```bash
python app/app.py --model_name_or_path /path/to/model/checkpoint-xxxx
```

# Evaluation

We compute the LVM scores and geometry-related metrics for the generated designs:

- [LVM scores](llava/metrics/llava_ov.py)

  ```bash
  python llava/metrics/llava_ov.py -i /path/to/output_dir
  ```

- [Geometry-related metrics](llava/metrics/layout.py)

  ```bash
  python llava/metrics/layout.py --pred /path/to/output_dir/pred.jsonl
  ```

# Training

We fine-tune LLMs on the `crello` training set for layered design composition. To use your own dataset, please prepare it in the same format and run:

```bash
bash scripts/finetune_lora.sh \
    1 \
    meta-llama/Llama-3.1-8B \
    /path/to/dataset/train.json \
    /path/to/image/base \
    /path/to/output_dir \
    50 \
    2 \
    16 \
    250 \
    2e-4 \
    2e-4 \
    cls_pooling \
    Llama-3.1-8B_lora_ours \
    32 \
    64 \
    4
```

For example, the specific script in our setting is:

```bash
bash scripts/finetune_lora.sh \
    1 \
    meta-llama/Llama-3.1-8B \
    dataset/dataset/json/ours/train.json \
    dataset/dataset/crello_images \
    output/Llama-3.1-8B_lora_ours \
    50 \
    2 \
    16 \
    250 \
    2e-4 \
    2e-4 \
    cls_pooling \
    Llama-3.1-8B_lora_ours \
    32 \
    64 \
    4
```

Remember to log in to [Hugging Face](https://huggingface.co/meta-llama/Llama-3.1-8B) using your Llama access token:

```bash
huggingface-cli login --token $TOKEN
```

The following is a list of supported LLMs:

- [liuhaotian/llava-v1.5-7b](https://huggingface.co/liuhaotian/llava-v1.5-7b)
- [liuhaotian/llava-v1.6-vicuna-7b](https://huggingface.co/liuhaotian/llava-v1.6-vicuna-7b)
- [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B)
- [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)
- [meta-llama/Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B)
- [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)
- [mistralai/Mistral-7B-v0.3](https://huggingface.co/mistralai/Mistral-7B-v0.3)

# BibTeX

```
@InProceedings{lin2024elements,
  title={From Elements to Design: A Layered Approach for Automatic Graphic Design Composition},
  author={Lin, Jiawei and Sun, Shizhao and Huang, Danqing and Liu, Ting and Li, Ji and Bian, Jiang},
  booktitle={CVPR},
  year={2025}
}
```

## Acknowledgments

We would like to express our gratitude to [CanvasVAE](https://huggingface.co/datasets/cyberagent/crello) for providing the dataset, [OpenCole](https://github.com/CyberAgentAILab/OpenCOLE) for the rendering code, and [LLaVA](https://github.com/haotian-liu/LLaVA) for the codebase. We deeply appreciate all the incredible work that made this project possible.

## Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/). For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.

## Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow [Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general). Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.