diff --git a/tutorials/training/source_en/advanced_use/hpc_sponge.md b/tutorials/training/source_en/advanced_use/hpc_sponge.md
index c2cf80b7076e1776e3f044e32d2ca3f353dea61d..fb991c42b70c55c22d1715fb1642dd739b8c2cf1 100644
--- a/tutorials/training/source_en/advanced_use/hpc_sponge.md
+++ b/tutorials/training/source_en/advanced_use/hpc_sponge.md
@@ -1,5 +1,199 @@
 # SPONGE Molecular Simulation Practice
 
-No English version right now, welcome to contribute.
+`Linux` `GPU` `Model Development` `Senior`
 
-   
+
+
+- [SPONGE Molecular Simulation Practice](#sponge-molecular-simulation-practice)
+    - [Overview](#overview)
+    - [The Overall Execution](#the-overall-execution)
+    - [Ready to Link](#ready-to-link)
+    - [Example of Simulated Polypeptide Aqueous Solution System](#example-of-simulated-polypeptide-aqueous-solution-system)
+        - [Prepare the Input File](#prepare-the-input-file)
+        - [Load the Data](#load-the-data)
+        - [Building the Simulation Process](#building-the-simulation-process)
+        - [Run the Script](#run-the-script)
+        - [The Results](#the-results)
+
+
+   
+
+## Overview
+
+Molecular simulation uses atomic-level molecular models to simulate the structure and behavior of molecules, and from these the physical and chemical properties of a molecular system. Grounded in experiment and built on basic principles, it constructs a set of models and algorithms to compute reasonable molecular structures and molecular behavior.
+
+In recent years, molecular simulation technology has developed rapidly and has been widely used in many fields. In drug design, it can be used to study the mechanism of action of viruses and drugs. In biological science, it can be used to characterize the multi-level structure and properties of proteins. In materials science, it can be used to study the structure and mechanical properties of materials and to optimize material design.
+In chemistry, it can be used to study surface catalysis and its mechanisms. In the petrochemical field, it can be used for the structure characterization, synthesis design, adsorption and diffusion of molecular sieve catalysts. It can also construct and characterize the structure of polymer chains and of crystalline or amorphous bulk polymers, and predict important properties including blending behavior, mechanical properties, diffusion and cohesion.
+
+Sponge is a molecular simulation library jointly developed by the Gao Yiqin research group of Peking University and Shenzhen Bay Laboratory and Huawei's MindSpore team. Sponge is high-performance and modular. Based on MindSpore's automatic parallelism and graph-computing fusion, Sponge can efficiently complete traditional molecular simulation processes. Sponge also uses MindSpore's automatic differentiation to combine AI methods such as neural networks with traditional molecular simulations.
+
+This tutorial focuses on how to use Sponge, which is built into MindSpore, to perform high-performance molecular simulation on the GPU.
+
+> You can download the full sample code here:
+
+## The Overall Execution
+
+1. Prepare the molecular simulation input files, load the data, and determine the molecular system to be computed.
+2. Define and initialize the Sponge module to determine the calculation process.
+3. Run the training script, output the thermodynamic information file of the simulation, and view the results.
+
+## Ready to Link
+
+Before starting, make sure MindSpore is installed correctly. If it is not, install it by following the [MindSpore download website](https://www.mindspore.cn/install).
+
+## Example of Simulated Polypeptide Aqueous Solution System
+
+Sponge is high-performance and easy to use. This tutorial uses Sponge to simulate a peptide in aqueous solution. The simulated system is an aqueous solution of alanine tripeptide.
+
+### Prepare the Input File
+
+The simulation system in this tutorial loads three input files:
+
+- A property file (suffix `.in`), which declares the basic conditions of the simulation and controls the parameters of the whole simulation process.
+- A topology file (suffix `.parm7`), which describes the topological relationships and parameters of the molecules inside the system.
+- A coordinate file (suffix `.rst7`), which describes the coordinates of each atom in the system at the initial moment.
+
+
+The topology and coordinate files can be built with the tleap tool included in AmberTools (download address: <http://ambermd.org/GetAmber.php>, subject to the GPL). The modeling process is as follows:
+
+- Open tleap
+
+    ```bash
+    tleap
+    ```
+
+- Load the ff14SB force field that ships with tleap
+
+    ```bash
+    > source leaprc.protein.ff14SB
+    ```
+
+- Build the alanine tripeptide model
+
+    ```bash
+    > ala = sequence {ALA ALA ALA}
+    ```
+
+- Load the TIP3P force field that ships with tleap
+
+    ```bash
+    > source leaprc.water.tip3p
+    ```
+
+- Use the `solvatebox` command in tleap to solvate the alanine tripeptide chain and complete the construction of the system. `10.0` is the buffer distance: the boundary of the water box is at least 10.0 Å away from the solvated molecule
+
+    ```bash
+    > solvatebox ala TIP3PBOX 10.0
+    ```
+
+- Save the constructed system to `parm7` and `rst7` files
+
+    ```bash
+    > saveamberparm ala ala.parm7 ala_350_cool_290.rst7
+    ```
+
+
+After the topology file (`WATER_ALA.parm7` in this tutorial) and the coordinate file (`WATER_ALA_350_cool_290.rst7` in this tutorial) have been built with tleap, the basic conditions of the simulation are declared in the property file, which controls the parameters of the whole simulation process.
+Take the property file `NVT_290_10ns.in` of this tutorial as an example. Its contents are as follows:
+
+```text
+NVT 290k
+   mode = 1,                              # Simulation mode ; mode=1 for NVT ensemble
+   dt= 0.001,                             # Time step in picoseconds (ps). The time length of each MD step
+   step_limit = 1,                        # Total step limit, number of MD steps run
+   thermostat=1,                          # Thermostat for temperature ; thermostat=0 for Langevin thermostat
+   langevin_gamma=1.0,                    # Gamma_ln for Langevin thermostat represents coupling strength between thermostat and system
+   target_temperature=290,                # Target temperature
+   write_information_interval=1000,       # Output frequency
+   amber_irest=1,                         # Input style ; amber_irest=1 for using amber style input & rst7 file contains velocity
+   cut=10.0,                              # Nonbonded cutoff distance in Angstroms
+```
+
+- `mode` is the molecular dynamics (MD) mode; `1` means the `NVT` ensemble is used in the simulation.
+- `dt` is the time step of the simulation, in picoseconds.
+- `step_limit` is the total number of simulation steps.
+- `thermostat` is the temperature control method; `1` means the `Liujian-Langevin` method is used.
+- `langevin_gamma` is the `Gamma_ln` parameter of the thermostat.
+- `target_temperature` is the target temperature.
+- `write_information_interval` is the output frequency.
+- `amber_irest` is the input type; `1` means AMBER-style input is used and the `rst7` file contains the `velocity` attribute.
+- `cut` is the cutoff distance of the non-bonded interactions, in angstroms.
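
The property file format described above can be read with a few lines of Python. The following is a minimal sketch, not Sponge's own loader: `parse_md_input` is a hypothetical helper, and it assumes that the first line is a free-form title and that every setting sits on its own line as `key = value,` with an optional `#` comment, as in the listing above.

```python
SAMPLE_INPUT = """NVT 290k
   mode = 1,                         # Simulation mode ; mode=1 for NVT ensemble
   dt= 0.001,                        # Time step in picoseconds (ps)
   step_limit = 1,                   # Total step limit
   thermostat=1,                     # Thermostat for temperature
   langevin_gamma=1.0,               # Gamma_ln for Langevin thermostat
   target_temperature=290,           # Target temperature
   write_information_interval=1000,  # Output frequency
   amber_irest=1,                    # Input style
   cut=10.0,                         # Nonbonded cutoff distance in Angstroms
"""


def parse_md_input(text):
    """Hypothetical helper: return (title, settings) from a property file."""
    lines = text.strip().splitlines()
    title = lines[0].strip()              # first line is the title
    settings = {}
    for line in lines[1:]:
        body = line.split("#", 1)[0]      # drop the trailing comment
        body = body.strip().rstrip(",")   # drop the trailing comma
        if "=" not in body:
            continue                      # skip blank or malformed lines
        key, value = (part.strip() for part in body.split("=", 1))
        settings[key] = value             # values are kept as strings
    return title, settings


title, settings = parse_md_input(SAMPLE_INPUT)
# Total simulated time in picoseconds is the time step times the step count.
simulated_ps = float(settings["dt"]) * int(settings["step_limit"])
print(title, settings["mode"], simulated_ps)
```

Values are kept as strings so that the caller decides which ones are integers (`step_limit`) and which are floats (`dt`, `cut`).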
+
+### Load the Data
+
+After the input files have been built, store them in the `sponge_in` directory of the local workspace. The directory structure is as follows:
+
+```text
+└─sponge
+    ├─sponge_in
+    │      NVT_290_10ns.in                 # specific MD simulation setting
+    │      WATER_ALA.parm7                 # topology file include atom & residue & bond & nonbond information
+    │      WATER_ALA_350_cool_290.rst7     # restart file record atom coordinate & velocity and box information
+```
+
+The parameters required by the simulation system are read from the three input files and used in the MindSpore calculation. The loading code is as follows:
+
+```python
+import argparse
+from mindspore import context
+
+# Command-line arguments that locate the input and output files
+parser = argparse.ArgumentParser(description='Sponge Controller')
+parser.add_argument('--i', type=str, default=None, help='input file')
+parser.add_argument('--amber_parm', type=str, default=None, help='parameter file in AMBER type')
+parser.add_argument('--c', type=str, default=None, help='initial coordinates file')
+parser.add_argument('--r', type=str, default="restrt", help='')
+parser.add_argument('--x', type=str, default="mdcrd", help='')
+parser.add_argument('--o', type=str, default="mdout", help="")
+parser.add_argument('--box', type=str, default="mdbox", help='')
+parser.add_argument('--device_id', type=int, default=0, help='')
+args_opt = parser.parse_args()
+
+# Run in graph mode on the GPU selected by --device_id
+context.set_context(mode=context.GRAPH_MODE, device_target="GPU", device_id=args_opt.device_id, save_graphs=False)
+```
+
+### Building the Simulation Process
+
+Using the force calculation and energy calculation modules defined in Sponge, the molecular dynamics process is evolved over multiple iterations until the system reaches the required equilibrium state. The energy and other data obtained at each simulation step are recorded.
+For convenience, the number of iterations in this tutorial is set to `1`. The code that builds the simulation process is as follows:
+
+```python
+from src.simulation_initial import Simulation
+from mindspore import Tensor
+
+if __name__ == "__main__":
+    # Build the simulation from the parsed command-line arguments
+    simulation = Simulation(args_opt)
+    save_path = args_opt.o
+    for steps in range(simulation.md_info.step_limit):
+        # print_step is 0 every `ntwx` steps and on the final step
+        print_step = steps % simulation.ntwx
+        if steps == simulation.md_info.step_limit - 1:
+            print_step = 0
+        temperature, total_potential_energy, sigma_of_bond_ene, sigma_of_angle_ene, sigma_of_dihedral_ene, \
+        nb14_lj_energy_sum, nb14_cf_energy_sum, LJ_energy_sum, ee_ene, _ = simulation(Tensor(steps), Tensor(print_step))
+        # compute energy and temperature
+```
+
+### Run the Script
+
+Execute the following command to run the training script `main.py`:
+
+```text
+python main.py --i /path/NVT_290_10ns.in \
+               --amber_parm /path/WATER_ALA.parm7 \
+               --c /path/WATER_ALA_350_cool_290.rst7 \
+               --o /path/ala_NVT_290_10ns.out
+```
+
+- `--i` is the property file of the MD simulation, which controls the simulation process.
+- `--amber_parm` is the topology file of the MD simulation system.
+- `--c` is the initial coordinate file.
+- `--o` is the output record file of the simulation, which records the energy and other information output at each step.
+- `path` is the directory that contains the files, which in this tutorial is `sponge_in`.
+
+During training, the property file (suffix `.in`), the topology file (suffix `.parm7`) and the coordinate file (suffix `.rst7`) are used to calculate forces and energies at the specified temperature and to evolve the molecular dynamics process.
+
+### The Results
+
+After training, the output file `ala_NVT_290_10ns.out` is obtained. It records the energy changes of the system, so that the thermodynamic information of the simulated system can be viewed.
+Open `ala_NVT_290_10ns.out` to see content like the following:
+
+```text
+_steps_ _TEMP_ _TOT_POT_ENE_ _BOND_ENE_ _ANGLE_ENE_ _DIHEDRAL_ENE_ _14LJ_ENE_ _14CF_ENE_ _LJ_ENE_ _CF_PME_ENE_
+      1 293.105 -6117.709 1204.406 7.096 4.491 3.456 44.018 1372.488 -8753.664
+   ...
+```
+
+The file records the quantities output during the simulation: the iteration number (`_steps_`), the temperature (`_TEMP_`), the total potential energy (`_TOT_POT_ENE_`), the bond energy (`_BOND_ENE_`), the bond angle energy (`_ANGLE_ENE_`) and the dihedral angle interaction (`_DIHEDRAL_ENE_`). The remaining columns (`_14LJ_ENE_`, `_14CF_ENE_`, `_LJ_ENE_`, `_CF_PME_ENE_`) record the non-bonded interactions, which include electrostatic and Lennard-Jones interactions.
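
The output table above can be loaded for further analysis. The following is a minimal sketch that assumes the layout shown in the sample: one header line of column names followed by whitespace-separated numeric rows. `read_mdout` is a hypothetical helper and is not part of Sponge. For the sample row, it also checks that the seven component energies add up to `_TOT_POT_ENE_`.

```python
SAMPLE_OUTPUT = """_steps_ _TEMP_ _TOT_POT_ENE_ _BOND_ENE_ _ANGLE_ENE_ _DIHEDRAL_ENE_ _14LJ_ENE_ _14CF_ENE_ _LJ_ENE_ _CF_PME_ENE_
      1 293.105 -6117.709 1204.406 7.096 4.491 3.456 44.018 1372.488 -8753.664
   ...
"""

# Columns that hold the individual energy terms in the sample header.
COMPONENTS = ["_BOND_ENE_", "_ANGLE_ENE_", "_DIHEDRAL_ENE_",
              "_14LJ_ENE_", "_14CF_ENE_", "_LJ_ENE_", "_CF_PME_ENE_"]


def read_mdout(text):
    """Hypothetical helper: return one dict of floats per data row."""
    lines = [line for line in text.splitlines() if line.strip()]
    names = lines[0].split()              # header line gives the column names
    rows = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) != len(names):
            continue                      # skip lines such as "..."
        rows.append(dict(zip(names, map(float, fields))))
    return rows


rows = read_mdout(SAMPLE_OUTPUT)
first = rows[0]
component_sum = sum(first[name] for name in COMPONENTS)
print(first["_TEMP_"], first["_TOT_POT_ENE_"], round(component_sum, 3))
```

To read a real run, pass the text of the file given to `--o` (for example `open(path).read()`) instead of `SAMPLE_OUTPUT`.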