# Helixfold

**Repository Path**: scnet-lib/HelixFold

## Basic Information

- **Project Name**: Helixfold
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2023-10-24
- **Last Updated**: 2023-10-24

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# HelixFold-Single Inference

AI-based protein structure prediction pipelines, such as AlphaFold2, have achieved near-experimental accuracy. These advanced pipelines mainly rely on Multiple Sequence Alignments (MSAs) and templates as inputs to learn the co-evolution information from the homologous sequences. Nonetheless, searching MSAs and templates from protein databases is time-consuming, usually taking dozens of minutes. Consequently, we attempt to explore the limits of fast protein structure prediction by using only primary sequences of proteins. **HelixFold-Single** is proposed to combine a large-scale protein language model with the superior geometric learning capability of AlphaFold2. Our proposed method, HelixFold-Single, first pre-trains a large-scale protein language model (PLM) with thousands of millions of primary sequences utilizing the self-supervised learning paradigm, which will be used as an alternative to MSAs and templates for learning the co-evolution information. Then, by combining the pre-trained PLM and the essential components of AlphaFold2, we obtain an end-to-end differentiable model to predict the 3D coordinates of atoms from only the primary sequence. 

## Online Service
For those who want to try out our model without any installation, we also provide an online interface [PaddleHelix HelixFold-Single Forecast](https://paddlehelix.baidu.com/app/drug/protein-single/forecast) through web service.

## Installation
Except those listed in the `requirements.txt`, PaddlePaddle `dev` package is required to run HelixFold.
Visit [here](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/develop/install/pip/linux-pip.html) to 
install PaddlePaddle `dev`. Also, we provide a package here if your machine environment is Nvidia A100 with
cuda=11.2.

```bash
python -m pip install -r requirements.txt
wget https://baidu-nlp.bj.bcebos.com/PaddleHelix/HelixFold/paddlepaddle_gpu-0.0.0-cp37-cp37m-linux_x86_64.whl
python -m pip install paddlepaddle_gpu-0.0.0-cp37-cp37m-linux_x86_64.whl
```

## Download the Trained Model
Here we provide the trained model that can be used to reproduce the results of our paper.

```bash
wget https://baidu-nlp.bj.bcebos.com/PaddleHelix/HelixFold-Single/helixfold-single.pdparams
```

## Usage
To run the inference, what you need is a fasta file and the pre-downloaded trained model:

```bash
python helixfold_single_inference.py \
        --init_model=./helixfold-single.pdparams \
        --fasta_file=data/7O9F_B.fasta \
        --output_dir="./output" 
```

- `init_model`: the trained model.
- `fasta_file`: the fasta_file file which contains the protein sequence to be predicted.

The output is organized as：

    ./output
        unrelaxed.pdb

where `unrelaxed.pdb` is the predicted pdb file.

## Citing this work

If you use the code or data in this repos, please cite:

```bibtex
@article{fang2022helixfold_single,
  title={HelixFold-Single: MSA-free Protein Structure Prediction by Using Protein Language Model as an Alternative},
  author={Fang, Xiaomin and Wang, Fan and Liu, Lihang and He, Jingzhou and Lin, Dayong and Xiang, Yingfei and Zhang, Xiaonan and Wu, Hua and Li, Hui and Song, Le},
  journal={arXiv preprint arXiv:2207.13921},
  year={2022}
}
```