# DecompOpt

**Repository Path**: ByteDance/DecompOpt

## Basic Information

- **Project Name**: DecompOpt
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-08-01
- **Last Updated**: 2026-01-25

## README

# DecompOpt: Controllable and Decomposed Diffusion Models for Structure-based Molecular Optimization

This repository is the official implementation of _DecompOpt: Controllable and Decomposed Diffusion Models for Structure-based Molecular Optimization._


## Dependencies
### Install via Conda and Pip
```bash
conda create -n decompdiff python=3.8
conda activate decompdiff
conda install numpy==1.22.3
conda install pytorch==1.13.0 torchvision==0.14.0 torchaudio==0.13.0 pytorch-cuda=11.6 -c pytorch -c nvidia
conda install pyg -c pyg
conda install rdkit openbabel tensorboard pyyaml easydict python-lmdb -c conda-forge

# For decomposition
conda install -c conda-forge mdtraj
pip install alphaspace2

# For Vina Docking
pip install meeko==0.3.0 scipy pdb2pqr vina==1.2.2 
python -m pip install git+https://github.com/Valdes-Tresanco-MS/AutoDockTools_py3
```
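After installation, a quick import check can confirm that the environment is complete. The following is a minimal sketch rather than part of the repository. The import names are assumptions: several differ from the package names used above (for example `pyg` is assumed to import as `torch_geometric`, `pyyaml` as `yaml`, `python-lmdb` as `lmdb`, and AutoDockTools_py3 as `AutoDockTools`).

```python
import importlib.util

# Assumed import names for the packages installed above (not taken from the repository).
EXPECTED_MODULES = [
    "numpy", "torch", "torch_geometric", "rdkit", "openbabel",
    "tensorboard", "yaml", "easydict", "lmdb",
    "mdtraj", "alphaspace2",
    "meeko", "scipy", "pdb2pqr", "vina", "AutoDockTools",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(EXPECTED_MODULES)
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

Run it inside the activated `decompdiff` environment. A module reported as missing may only mean that the assumed import name is wrong, so check the package's own documentation before reinstalling.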

## Preprocess 
We decomposed the molecules in the CrossDocked2020 training set into arms and stored the processed data in `arm_info_2.pt`, which can be downloaded [here](https://huggingface.co/datasets/Annie37/DecompOpt/blob/main/arm_info_2.pt). We then docked the arms to the target protein with Vina Minimize and used the resulting docked arm conformations as conditions for training.
```bash
python scripts/data_preparation/dock_training_arms.py
```
We follow the preprocessing procedure of [DecompDiff](https://github.com/bytedance/DecompDiff). The processed dataset is available [here](https://huggingface.co/datasets/Annie37/DecompOpt/tree/main).
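The `arm_info_2.pt` link above points to a Hugging Face file page (`/blob/`), which serves HTML rather than the file itself. Hugging Face serves the raw file from the matching `/resolve/` URL. The sketch below shows one way to fetch it with the standard library only. The helper names and the destination directory are hypothetical, and where `dock_training_arms.py` expects to find the file is not stated here.

```python
import urllib.request
from pathlib import Path

# File-page URL copied from the Preprocess section.
BLOB_URL = "https://huggingface.co/datasets/Annie37/DecompOpt/blob/main/arm_info_2.pt"


def to_download_url(blob_url):
    """Turn a Hugging Face file-page URL (/blob/) into a direct-download URL (/resolve/)."""
    return blob_url.replace("/blob/", "/resolve/", 1)


def download(blob_url, dest_dir="."):
    """Download the file behind blob_url into dest_dir and return the local path."""
    dest = Path(dest_dir) / blob_url.rsplit("/", 1)[-1]
    dest.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(to_download_url(blob_url), str(dest))
    return dest


# Example (requires network access; the destination directory is a placeholder):
# download(BLOB_URL, ".")
```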

## Training
To train the model from scratch, you need to download the `*.lmdb`, `*_name2id.pt` and `split_by_name.pt` files and put them in the `./data` directory. Then, you can run the following command:
```bash
python scripts/train_diffusion_decompopt.py configs/training.yml
```
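Before launching a long training run, it can help to confirm that the files listed above are in place. This is a minimal sketch, not part of the repository: it only checks that each pattern from the paragraph above matches at least one file in `./data`, and the function name is hypothetical.

```python
from pathlib import Path

# File patterns listed in the Training section.
REQUIRED_PATTERNS = ["*.lmdb", "*_name2id.pt", "split_by_name.pt"]


def missing_patterns(data_dir, patterns=REQUIRED_PATTERNS):
    """Return the patterns that match no file directly inside data_dir."""
    root = Path(data_dir)
    return [p for p in patterns if not any(root.glob(p))]


if __name__ == "__main__":
    missing = missing_patterns("./data")
    if missing:
        print("Missing from ./data:", ", ".join(missing))
    else:
        print("All training inputs found in ./data")
```

The check does not validate file contents, and it cannot tell whether every expected `.lmdb` file is present, only that at least one is.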

## Sampling and Evaluation
To sample molecules for the protein pockets in the test set, you need to download `test_index.pkl` and the `*_eval.tar.gz` files, extract the archives, and put everything in the `./data` directory. To sample molecules with beta priors, you also need to download `beta_priors.zip` and `natom_models.pkl` and put them in the `./pregen_info` directory. Then, you can run the following command:
```bash
bash scripts/run/sample_compose.sh ${data_id} ${outdir}
```
By default, this script samples with the opt prior. The trained model checkpoints are available [here](https://huggingface.co/datasets/Annie37/DecompOpt/tree/main); you need to download both `decompdiff.pt` and `decompopt.pt`.
After sampling, evaluate the generated molecules with Vina Dock and select the best results:
```bash
bash scripts/run/eval_vina_full.sh ${data_id} ${outdir}
python scripts/select_best_arm.py ${outdir}
```
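The sampling, evaluation, and selection commands in this section can also be chained from Python so that a failure in one step stops the rest. The sketch below is a convenience wrapper, not part of the repository. It assumes it is run from the repository root, and the `data_id` and output directory in the example are placeholders.

```python
import subprocess


def pipeline_commands(data_id, outdir):
    """Return the three commands from this section as argument lists, in order."""
    return [
        ["bash", "scripts/run/sample_compose.sh", str(data_id), str(outdir)],
        ["bash", "scripts/run/eval_vina_full.sh", str(data_id), str(outdir)],
        ["python", "scripts/select_best_arm.py", str(outdir)],
    ]


def run_pipeline(data_id, outdir):
    """Run the commands one after another; check=True raises on the first failure."""
    for cmd in pipeline_commands(data_id, outdir):
        subprocess.run(cmd, check=True)


# Example (placeholder arguments, run from the repository root):
# run_pipeline(0, "outputs/example")
```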

## BibTeX
```
@inproceedings{
    zhou2024decompopt,
    title={DecompOpt: Controllable and Decomposed Diffusion Models for Structure-based Molecular Optimization},
    author={Xiangxin Zhou and Xiwei Cheng and Yuwei Yang and Yu Bao and Liang Wang and Quanquan Gu},
    booktitle={The Twelfth International Conference on Learning Representations},
    year={2024},
    url={https://openreview.net/forum?id=Y3BbxvAQS9}
}
```