diff --git a/.jenkins/test/config/chemistry_config/dependent_packages.yaml b/.jenkins/test/config/chemistry_config/dependent_packages.yaml
index 14e9969f1471573384288cb69f1b0b2e086a1fa4..9e70df642d476be60aeb9313c8938893f69eee28 100644
--- a/.jenkins/test/config/chemistry_config/dependent_packages.yaml
+++ b/.jenkins/test/config/chemistry_config/dependent_packages.yaml
@@ -1,2 +1,2 @@
mindspore:
- '/mindspore/mindspore/version/202411/20241129/r2.4.1_20241129194610_0fd8a04edb85b498cc12d9c41216e3e78cbc8564_newest/'
\ No newline at end of file
+ '/mindspore/mindspore/version/202503/20250326/master_20250326010019_b91eca2945e61641319f9887aa76a1ccb38604d3_newest/'
\ No newline at end of file
diff --git a/MindChemistry/README.md b/MindChemistry/README.md
index 93cdf9d70dabc3db6cb781c6d4c0acf54578f869..6c02d93fd390bbe6f9fc5d396a758babb7eb31bc 100644
--- a/MindChemistry/README.md
+++ b/MindChemistry/README.md
@@ -1,17 +1,24 @@
- ENGLISH | [简体中文](README_CN.md)
+# MindSpore Chemistry
+
+[查看中文](README_CN.md)
[](https://badge.fury.io/py/mindspore)
[](https://github.com/mindspore-ai/mindspore/blob/master/LICENSE)
[](https://gitee.com/mindspore/mindscience/pulls)
-# MindSpore Chemistry
+---
+
+## Contents
- [MindSpore Chemistry](#mindspore-chemistry)
+ - [Contents](#contents)
- [Introduction](#introduction)
- [Latest News](#latest-news)
- - [Features](#features)
- - [Applications](#applications)
- - [Modules](#modules)
+ - [Applications](#applications)
+ - [Force Prediction](#force-prediction)
+ - [DFT Prediction](#dft-prediction)
+ - [Property Prediction](#property-prediction)
+ - [Structure Generation](#structure-generation)
- [Installation](#installation)
- [Version Dependency](#version-dependency)
- [Dependency](#dependency)
@@ -23,6 +30,8 @@
- [License](#license)
- [References](#references)
+---
+
## Introduction
Conventional chemistry studies have long been confronted with numerous challenges. The process of experimental design, synthesis, characterization, and analysis can be time-consuming, costly, and highly dependent on experts’ experiences.
@@ -30,56 +39,44 @@ The synergy between AI and chemistry offers unprecedented opportunities to overc
**MindSpore Chemistry** (MindChemistry) is a toolkit built on MindSpore endeavoring to integrate AI with conventional chemistry research. It supports multi-scale tasks including molecular generation, property prediction and synthesis optimization on multiple chemistry systems such as organic, inorganic and composite chemistry systems. MindChemistry is dedicated to enabling joint research of AI and chemistry with high efficiency, and seeks to facilitate an innovative paradigm of joint research between AI and chemistry, providing experts with novel perspectives and efficient tools.
-

+
## Latest News
-- 🔥`2024.07.30` MindChemistry 0.1.0 is released.
-
-## Features
+- `2025.03.30` MindChemistry 0.2.0 has been released, featuring several powerful applications, including NequIP, Allegro, DeephE3nn, Matformer, and DiffCSP.
+- `2024.07.30` MindChemistry 0.1.0 has been released.
-### Applications
+## Applications
-- Material Generation
- - **Scenario**:Inorganic chemistry
- - **Dataset**:High-entropy alloy dataset. The high-entropy alloy dataset includes the chemical composition of known high-entropy alloys and thermodynamic properties of the alloys. It provides chemical composition information such as the metal element types and corresponding percentages as well as thermodynamic properties such as magnetostrictive effects and Curie temperatures.
- - **Task**:High-entropy alloy composition design. We integrate Machine learning-enabled high-entropy alloy discovery[1] approach for designing novel high-entropy alloys with low thermal expansion coefficients(TEC) in active learning fashion. In the active learning circle, candidates of high-enropy alloys are firstly generated based on the AI model and the candidate components are filtered based on the prediction model and the predicted thermal expansion coefficient calculated by the thermodynamics. Finally, the researchers need to determine the final high-entropy alloy components based on experimental verification.
+### Force Prediction
-
-
-- **Property Prediction**:
- - **Scenario**:Organic chemistry
- - **Dataset**: Revised Molecular Dynamics 17(rMD17). rMD17 dataset includes molecular dynamics simulations of multiple organic chemical moleculars. It provides chemical desciptive information such as the atomic numbers and positions as well as molecular property information such as energies and forces.
- - **Task**:Molecular energy prediction. We integrate the NequIP model [2] and Allegro model [3], according to the position of each atom in the molecular system and structure description of the atomic number information construction diagram, and calculate the energy of the molecular system based on the equivariant calculation and graph neural network.
+- **Scenario**: Organic chemistry
+- **Dataset**: Revised Molecular Dynamics 17 (rMD17). The rMD17 dataset includes molecular dynamics simulations of multiple organic molecules. It provides chemical descriptive information such as atomic numbers and positions, as well as molecular property information such as energies and forces.
+- **Task**: Molecular energy prediction. We integrate the NequIP model [1] and the Allegro model [2], which build a graph-structured description from the atomic positions and atomic numbers of the molecular system, and compute the molecular energy using equivariant computation and graph neural networks.

-- **Electronic Structure Prediction**:
- - **Scenario**: Materials
- - **Dataset**: Bilayer graphene dataset. The dataset contains descriptive information such as atomic positions and atomic numbers, as well as property information such as Hamiltonian.
- - **Task**: Density Functional Theory Hamiltonian Prediction. We integrate the DeephE3nn model [4], an equivariant neural network based on E3, to predict a Hamiltonian by using the structure of atoms.
+### DFT Prediction
-- **Prediction of crystalline material properties**:
- - **Scenario**: Materials
- - **Dataset**: JARVIS-DFT 3D dataset. The dataset contains descriptive information such as atomic position and atomic number of crystal materials, as well as property information such as energy and force field.
- - **Task**: Prediction of crystalline material properties. We integrate the Matformer model [5] based on graph neural networks and Transformer architectures, for predicting various properties of crystalline materials.
+- **Scenario**: Materials Chemistry
+- **Dataset**: Bilayer graphene dataset. The dataset contains descriptive information such as atomic positions and atomic numbers, as well as property information such as Hamiltonian.
+- **Task**: Density Functional Theory (DFT) Hamiltonian prediction. We integrate the DeephE3nn model [3], an E(3)-equivariant neural network, to predict the Hamiltonian from the atomic structure.
-- **Crystal Material Structure Prediction**:
- - **Scenario**: Materials Chemistry
- - **Dataset**:
- - Perov-5: A perovskite dataset in which each unit cell contains five fixed atoms, and the structures are relatively similar.
- - Carbon-24: A carbon crystal dataset, where each crystal contains between 6 and 24 carbon atoms, with various different material structures.
- - MP-20: A dataset collected from the MP database, featuring experimental structures with up to 20 atoms per unit cell. The materials and structures are highly diverse.
- - MPTS-52: An advanced version of MP-20, expanding the number of atoms per unit cell to 52. The materials and structures are highly diverse.
- - **Task**: Crystal material structure prediction. We integrated the DiffCSP model[6], which is based on a graph neural network and diffusion model architecture, to predict the crystal material structures given their composition.
+### Property Prediction
-### Modules
+- **Scenario**: Materials Chemistry
+- **Dataset**: JARVIS-DFT 3D dataset. The dataset contains descriptive information such as atomic positions and atomic numbers of crystal materials, as well as property information such as energies and force fields.
+- **Task**: Prediction of crystalline material properties. We integrate the Matformer model [4], based on graph neural networks and the Transformer architecture, to predict various properties of crystalline materials.
-- **Equivariant Computing**
- - **Introduction**:Symmetry is an essential property in science domain. Equivarient neural network adopts intuitive representation as input and computing equivariently with respect to spatial rotation,shift and inversion. Adopting equivariant neural network for modeling scientific scenarios results in higher representation effectiveness for data and high efficiency for model training.
- - **Functions**:E(3) computing modules integrates basic modules such as Irreps, Spherical Harmonics and Tensor Products. Based on the basic modules, equivariant neural network layers such as equivariant Activation, Linear and Convolution layers are provided for constructing user customed equivariant neural networks.
+### Structure Generation
-
+- **Scenario**: Materials Chemistry
+- **Dataset**:
+ - Perov-5: A perovskite dataset in which each unit cell contains five fixed atoms, and the structures are relatively similar.
+ - Carbon-24: A carbon crystal dataset, where each crystal contains between 6 and 24 carbon atoms, with various different material structures.
+ - MP-20: A dataset collected from the MP database, featuring experimental structures with up to 20 atoms per unit cell. The materials and structures are highly diverse.
+ - MPTS-52: An advanced version of MP-20, expanding the number of atoms per unit cell to 52. The materials and structures are highly diverse.
+- **Task**: Crystal structure prediction. We integrate the DiffCSP model [5], based on graph neural networks and a diffusion model architecture, to predict crystal structures given their compositions.
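As a toy illustration of the diffusion idea behind this task (not DiffCSP's actual scheme), a forward noising step on fractional coordinates must respect lattice periodicity by wrapping back into [0, 1):

```python
import numpy as np

def noise_frac_coords(frac, sigma, rng):
    """One schematic forward-diffusion step on fractional coordinates.

    Gaussian noise is added and the result is wrapped back into [0, 1),
    reflecting the periodicity of the crystal lattice. This is only a
    sketch of the idea; DiffCSP's actual noising process is more involved
    and also handles the lattice itself.
    """
    frac = np.asarray(frac, dtype=float)
    noised = frac + sigma * rng.standard_normal(frac.shape)
    return noised % 1.0  # periodic wrap keeps atoms inside the unit cell

rng = np.random.default_rng(0)
frac = np.array([[0.0, 0.5, 0.99]])           # one atom's fractional coordinates
noised = noise_frac_coords(frac, sigma=0.05, rng=rng)
```

The `% 1.0` wrap is the key difference from noising Cartesian coordinates: an atom pushed past a cell boundary reappears on the opposite face.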
## Installation
@@ -89,8 +86,9 @@ Because MindChemistry is dependent on MindSpore, please click [MindSpore Downloa
| MindChemistry | Branch | MindSpore | Python |
|:-------- | :------ | :-------- | :------|
-| master | master | >=2.3 | >=3.8 |
-| 0.1.0 | r0.6 | >=2.2.12 | >=3.8 |
+| [master](https://gitee.com/mindspore/mindscience/tree/master/MindChemistry) | master | >=2.3 | >=3.8 |
+| [0.2.0]() | r0.7 | >=2.5.0 | >=3.11 |
+| [0.1.0](https://gitee.com/mindspore/mindscience/tree/r0.6/MindChemistry) | r0.6 | >=2.2.12 | >=3.8 |
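The `>=` constraints in the table above can be checked before installing; the helper below is illustrative only (not part of MindChemistry) and does a purely numeric comparison of dotted version strings:

```python
def version_at_least(installed, required):
    """Return True if dotted version string `installed` satisfies `required`.

    Numeric comparison only, e.g. "2.5.0" >= "2.3"; pre-release tags are
    not handled (use `packaging.version` for full PEP 440 semantics).
    """
    def to_tuple(v):
        return tuple(int(part) for part in v.split("."))
    return to_tuple(installed) >= to_tuple(required)

# e.g. against the 0.2.0 row, which requires MindSpore >= 2.5.0 and Python >= 3.11
checks = [version_at_least("2.5.0", "2.3"),   # MindSpore new enough
          version_at_least("3.9.7", "3.11")]  # Python too old
```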
### Dependency
@@ -134,9 +132,9 @@ pip install -r requirements.txt
### Core Contributor
-Thanks goes to these wonderful people 🧑🤝🧑:
+Thanks goes to these wonderful people:
-yufan, wangzidong, liuhongsheng, gongyue, gengchenhua, linghejing, yanchaojie, suyun, wujian, caowenbin, Lin Peijia
+wujian, wangyuheng, Lin Peijia, gengchenhua, caowenbin, Siyu Yang
## Contribution Guide
@@ -148,14 +146,12 @@ yufan, wangzidong, liuhongsheng, gongyue, gengchenhua, linghejing, yanchaojie, s
## References
-[1] Rao Z, Tung P Y, Xie R, et al. Machine learning-enabled high-entropy alloy discovery[J]. Science, 2022, 378(6615): 78-85.
-
-[2] Batzner S, Musaelian A, Sun L, et al. E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials[J]. Nature communications, 2022, 13(1): 2453.
+[1] Batzner S, Musaelian A, Sun L, et al. E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials[J]. Nature communications, 2022, 13(1): 2453.
-[3] Musaelian A, Batzner S, Johansson A, et al. Learning local equivariant representations for large-scale atomistic dynamics[J]. Nature communications, 2023, 14(1): 579.
+[2] Musaelian A, Batzner S, Johansson A, et al. Learning local equivariant representations for large-scale atomistic dynamics[J]. Nature communications, 2023, 14(1): 579.
-[4] Xiaoxun Gong, He Li, Nianlong Zou, et al. General framework for E(3)-equivariant neural network representation of density functional theory Hamiltonian[J]. Nature communications, 2023, 14: 2848.
+[3] Xiaoxun Gong, He Li, Nianlong Zou, et al. General framework for E(3)-equivariant neural network representation of density functional theory Hamiltonian[J]. Nature communications, 2023, 14: 2848.
-[5] Keqiang Yan, Yi Liu, Yuchao Lin, Shuiwang ji, et al. Periodic Graph Transformers for Crystal Material Property Prediction[J]. arXiv:2209.11807v1 [cs.LG] 23 sep 2022.
+[4] Keqiang Yan, Yi Liu, Yuchao Lin, Shuiwang ji, et al. Periodic Graph Transformers for Crystal Material Property Prediction[J]. arXiv:2209.11807v1 [cs.LG] 23 sep 2022.
-[6] Jiao Rui and Huang Wenbing and Lin Peijia, et al. Crystal structure prediction by joint equivariant diffusion[J]. Advances in Neural Information Processing Systems, 2024, 36.
\ No newline at end of file
+[5] Jiao Rui and Huang Wenbing and Lin Peijia, et al. Crystal structure prediction by joint equivariant diffusion[J]. Advances in Neural Information Processing Systems, 2024, 36.
\ No newline at end of file
diff --git a/MindChemistry/README_CN.md b/MindChemistry/README_CN.md
index 7ae514662d10f53737b4185df7cc412a3160b213..d1d1546caca911f9c492b2c2e4f6a187b8597dfa 100644
--- a/MindChemistry/README_CN.md
+++ b/MindChemistry/README_CN.md
@@ -1,17 +1,24 @@
-[ENGLISH](README.md) | 简体中文
+# MindSpore Chemistry
+
+[View English](README.md)
[](https://badge.fury.io/py/mindspore)
[](https://github.com/mindspore-ai/mindspore/blob/master/LICENSE)
[](https://gitee.com/mindspore/mindscience/pulls)
-# MindSpore Chemistry
+---
+
+## 目录
- [MindSpore Chemistry](#mindspore-chemistry)
+ - [目录](#目录)
- [介绍](#介绍)
- [最新消息](#最新消息)
- - [特性](#特性)
- - [应用案例](#应用案例)
- - [功能模块](#功能模块)
+ - [应用案例](#应用案例)
+ - [力场模拟](#力场模拟)
+ - [DFT模拟](#dft模拟)
+ - [性质预测](#性质预测)
+ - [结构生成](#结构生成)
- [安装教程](#安装教程)
- [版本依赖关系](#版本依赖关系)
- [依赖安装](#依赖安装)
@@ -23,58 +30,56 @@
- [许可证](#许可证)
- [引用](#引用)
+---
+
## 介绍
传统化学研究长期以来面临着众多挑战,实验设计、合成、表征和分析的过程往往耗时、昂贵,并且高度依赖专家经验。AI与化学的协同可以克服传统方法的局限性、开拓全新的研究范式,结合AI模型与化学知识,可以高效处理大量数据、挖掘隐藏的关联信息,构建仿真模型,从而加快化学反应的设计和优化,实现材料的性质预测,并辅助设计新材料。
**MindSpore Chemistry**(MindChemistry)是基于MindSpore构建的化学领域套件,支持多体系(有机/无机/复合材料化学)、多尺度任务(微观分子生成/预测、宏观反应优化)的AI+化学仿真,致力于高效使能AI与化学的融合研究,践行和牵引AI与化学联合多研究范式跃迁,为化学领域专家的研究提供全新视角与高效的工具。
-
+
-## 最新消息
+---
-- `2024.07.30` 2024年7月30日 MindChemistry 0.1.0版本发布。
+## 最新消息
-## 特性
+- `2025.03.30` MindChemistry 0.2.0版本发布,包括多个应用案例,支持NequIP、Allegro、DeephE3nn、Matformer以及DiffCSP模型。
+- `2024.07.30` MindChemistry 0.1.0版本发布。
-### 应用案例
+---
-- **分子生成**:
- - **体系**:无机化学
- - **数据**:高熵合金数据集。高熵合金数据集中包含了已知高熵合金的组分以及热动力学性质等信息,提供金属组分类型及组分比例,以及居里温度、磁致伸缩等热动力学性质信息。
- - **任务**:高熵合金组分设计。我们集成了基于主动学习进行高熵合金设计的方法[1],设计热膨胀系数极低的高熵合金组分。在主动学习流程中,首先基于AI模型生成候选的高熵合金组分,并基于预测模型和热动力学计算预测热膨胀系数对候选组分进行筛选,最终需要研究者基于实验验证确定最终的高熵合金组分。
+## 应用案例
-
+### 力场模拟
-- **分子预测**:
- - **体系**:有机化学
- - **数据**:Revised Molecular Dynamics 17(rMD17)数据集。rMD17数据集包含了多种有机化合物的分子动力学性质,提供化合物的原子位置、原子数等描述信息以及能量、力场等性质信息。
- - **任务**:分子能量预测。我们集成了NequIP模型[2]、Allegro模型[3],根据分子体系中各原子的位置与原子数信息构建图结构描述,基于等变计算与图神经网络,计算出分子体系能量。
+- **体系**:有机化学
+- **数据**:Revised Molecular Dynamics 17 (rMD17) 数据集。该数据集包含了多种有机化合物的分子动力学性质,提供化合物的原子位置、原子数等描述信息,以及能量、力场等性质信息。
+- **任务**:分子能量预测。集成了 **NequIP** 模型[1] 和 **Allegro** 模型[2],根据分子体系中各原子的位置与原子数信息构建图结构描述,基于等变计算与图神经网络,计算出分子体系能量。
-
+
-- **电子结构预测**:
- - **体系**:材料化学
- - **数据**:双层石墨烯数据集。该数据集包含了原子位置、原子数等描述信息以及哈密顿量等性质信息。
- - **任务**:密度泛函理论哈密顿量预测。我们集成了DeephE3nn模型[4],基于E3的等变神经网络,利用原子的结构去预测其的哈密顿量。
+### DFT模拟
-- **晶体材料性质预测**:
- - **体系**:材料化学
- - **数据**:JARVIS-DFT 3D数据集。该数据集包含了晶体材料的原子位置、原子数等描述信息以及能量、力场等性质信息。
- - **任务**:晶体材料性质预测。我们集成了Matformer模型[5],基于图神经网络和Transformer架构的模型,用于预测晶体材料的各种性质。
+- **体系**:材料化学
+- **数据**:双层石墨烯数据集。该数据集包含了原子位置、原子数等描述信息,以及哈密顿量等性质信息。
+- **任务**:密度泛函理论哈密顿量预测。集成了 **DeephE3nn** 模型[3],基于E3的等变神经网络,利用原子的结构预测哈密顿量。
-- **晶体材料结构预测**:
- - **体系**:材料化学
- - **数据**:Perov-5是钙钛矿数据集,其中每个晶胞中都固定有五个原子,且结构上比较接近。Carbon-24是碳晶体数据集,每个晶体中含有6到24个碳原子,不同材料结构各异。MP-20是从MP数据集中收集到胞内不超过20个原子的实验结构,MPTS-52是它的进阶版本,将胞内原子数扩展到了52个,这两个数据集中的材料组分和结构都比较多样。
- - **任务**:晶体材料结构预测。我们集成了DiffCSP模型[6],基于图神经网络和扩散模型架构的模型,用于给定组分,预测晶体材料的结构。
+### 性质预测
-### 功能模块
+- **体系**:材料化学
+- **数据**:JARVIS-DFT 3D数据集,包含晶体材料的原子位置、原子数等描述信息以及能量、力场等性质信息。
+- **任务**:晶体材料性质预测。集成了 **Matformer** 模型[4],基于图神经网络和Transformer架构,预测晶体材料的各种性质。
-- **等变计算库**
- - **简介**:对称性是科学领域的重要性质。等变神经网络以具有物理意义表征刻画化合物体系输入,并使得输入与输出在空间平移、旋转和反演等变换中具有等变性。使用等变神经网络来对科学场景建模可以提高数据的表征效率和模型的训练效率。
- - **核心模块**:等变计算库中集成了不可约表示、球谐函数以及张量积等基础模块,实现底层逻辑与运算过程,并基于基础模块构建了等变激活层、等变线性层和等变卷积层等神经网络层,可以更方便地调用从而构建等变神经网络。
+### 结构生成
-
+- **体系**:材料化学
+- **数据**:
+ - **Perov-5**:钙钛矿数据集,每个晶胞中固定5个原子,结构接近。
+ - **Carbon-24**:碳晶体数据集,包含6到24个碳原子的不同结构。
+ - **MP-20**:MP数据集中的实验数据,胞内不超过20个原子。
+ - **MPTS-52**:MP-20的进阶版,胞内最多52个原子。
+- **任务**:晶体材料结构预测。集成了 **DiffCSP** 模型[5],基于图神经网络和扩散模型,预测晶体材料的结构。
## 安装教程
@@ -82,13 +87,16 @@
由于MindChemistry与MindSpore有依赖关系,请根据下表中所指示的对应关系,在[MindSpore下载页面](https://www.mindspore.cn/versions)下载并安装对应的whl包。
-| MindChemistry | 分支 | MindSpore | Python |
-| :------------ | :----- |:----------| :----- |
-| master | master | >=2.3 | >=3.8 |
-| 0.1.0 | r0.6 | >=2.2.12 | >=3.8 |
+| **MindChemistry** | **分支** | **MindSpore** | **Python** |
+|:------------------|:--------|:--------------|:----------|
+| [master](https://gitee.com/mindspore/mindscience/tree/master/MindChemistry) | master | >=2.3 | >=3.8 |
+| [0.2.0]() | r0.7 | >=2.5.0 | >=3.11 |
+| [0.1.0](https://gitee.com/mindspore/mindscience/tree/r0.6/MindChemistry) | r0.6 | >=2.2.12 | >=3.8 |
### 依赖安装
+使用以下命令安装所需的依赖包:
+
```bash
pip install -r requirements.txt
```
@@ -131,7 +139,7 @@ pip install -r requirements.txt
感谢以下开发者做出的贡献:
-yufan, wangzidong, liuhongsheng, gongyue, gengchenhua, linghejing, yanchaojie, suyun, wujian, caowenbin, Lin Peijia
+wujian, wangyuheng, Lin Peijia, gengchenhua, caowenbin, Siyu Yang
## 贡献指南
@@ -143,14 +151,12 @@ yufan, wangzidong, liuhongsheng, gongyue, gengchenhua, linghejing, yanchaojie, s
## 引用
-[1] Rao Z, Tung P Y, Xie R, et al. Machine learning-enabled high-entropy alloy discovery[J]. Science, 2022, 378(6615): 78-85.
-
-[2] Batzner S, Musaelian A, Sun L, et al. E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials[J]. Nature communications, 2022, 13(1): 2453.
+[1] Batzner S, Musaelian A, Sun L, et al. E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials[J]. Nature communications, 2022, 13(1): 2453.
-[3] Musaelian A, Batzner S, Johansson A, et al. Learning local equivariant representations for large-scale atomistic dynamics[J]. Nature communications, 2023, 14(1): 579.
+[2] Musaelian A, Batzner S, Johansson A, et al. Learning local equivariant representations for large-scale atomistic dynamics[J]. Nature communications, 2023, 14(1): 579.
-[4] Xiaoxun Gong, He Li, Nianlong Zou, et al. General framework for E(3)-equivariant neural network representation of density functional theory Hamiltonian[J]. Nature communications, 2023, 14: 2848.
+[3] Xiaoxun Gong, He Li, Nianlong Zou, et al. General framework for E(3)-equivariant neural network representation of density functional theory Hamiltonian[J]. Nature communications, 2023, 14: 2848.
-[5] Keqiang Yan, Yi Liu, Yuchao Lin, Shuiwang ji, et al. Periodic Graph Transformers for Crystal Material Property Prediction[J]. arXiv:2209.11807v1 [cs.LG] 23 sep 2022.
+[4] Keqiang Yan, Yi Liu, Yuchao Lin, Shuiwang ji, et al. Periodic Graph Transformers for Crystal Material Property Prediction[J]. arXiv:2209.11807v1 [cs.LG] 23 sep 2022.
-[6] Jiao Rui and Huang Wenbing and Lin Peijia, et al. Crystal structure prediction by joint equivariant diffusion[J]. Advances in Neural Information Processing Systems, 2024, 36.
+[5] Jiao Rui and Huang Wenbing and Lin Peijia, et al. Crystal structure prediction by joint equivariant diffusion[J]. Advances in Neural Information Processing Systems, 2024, 36.
diff --git a/MindChemistry/RELEASE.md b/MindChemistry/RELEASE.md
index 1369ca8c6c3964aa292e344fa1cbae64991c865a..1af49595ba5f4138fc2495fd1915b5cebe4b08ea 100644
--- a/MindChemistry/RELEASE.md
+++ b/MindChemistry/RELEASE.md
@@ -4,6 +4,35 @@
MindSpore Chemistry is a toolkit built on MindSpore endeavoring to enable the joint research of AI and chemistry with high efficiency and to facilitate an innovative paradigm of joint research between AI and chemistry.
+## MindSpore Chemistry 0.2.0 Release Notes
+
+### Major Features and Enhancements
+
+#### Force prediction
+
+* [STABLE] [NequIP](https://gitee.com/mindspore/mindscience/tree/master/MindChemistry/applications/nequip): Leveraging the equivariant computing library, the model is trained efficiently and achieves highly accurate inference of molecular energy based on atomic information.
+* [STABLE] [Allegro](https://gitee.com/mindspore/mindscience/tree/master/MindChemistry/applications/allegro): Leveraging the equivariant computing library, the model is trained efficiently, scales to large material systems, and achieves highly accurate inference of molecular energy based on atomic information.
+
+#### DFT Prediction
+
+* [STABLE] [DeephE3nn](https://gitee.com/mindspore/mindscience/tree/master/MindChemistry/applications/deephe3nn): An equivariant neural network based on the E(3) group, designed to predict Hamiltonians using atomic structures.
+
+#### Property Prediction
+
+* [STABLE] [Matformer](https://gitee.com/mindspore/mindscience/tree/master/MindChemistry/applications/matformer): Leveraging graph neural networks and Transformer architectures to predict diverse properties of crystalline materials.
+
+#### Structure Generation
+
+* [STABLE] [DiffCSP](https://gitee.com/mindspore/mindscience/tree/master/MindChemistry/applications/diffcsp): New feature. A crystal structure prediction method based on diffusion models, designed to learn structural distributions from stable crystal data. It predicts crystal structures by jointly generating the lattice and atomic coordinates, and leverages a periodic E(3)-equivariant denoising model to better capture the geometric properties of crystals. It is far more cost-effective computationally than traditional methods based on Density Functional Theory (DFT), and performs remarkably well on crystal structure prediction tasks.
+
+### Contributors
+
+Thanks goes to these wonderful people:
+
+wujian, wangyuheng, Lin Peijia, gengchenhua, caowenbin, Siyu Yang
+
+------------------------------------------------
+
## MindSpore Chemistry 0.1.0 Release Notes
### Major Features
@@ -21,4 +50,4 @@ Thanks goes to these wonderful people:
yufan, wangzidong, liuhongsheng, gongyue, gengchenhua, linghejing, yanchaojie, suyun, wujian, caowenbin
-Contributions of any kind are welcome!
+Contributions of any kind are welcome!
\ No newline at end of file
diff --git a/MindChemistry/RELEASE_CN.md b/MindChemistry/RELEASE_CN.md
index 7a18b99134a2910eabcaf693d4c93a00528a9e5e..e85f4c2c8b887212bec5853142cff98b7a56c565 100644
--- a/MindChemistry/RELEASE_CN.md
+++ b/MindChemistry/RELEASE_CN.md
@@ -4,6 +4,35 @@
MindSpore Chemistry是一个基于MindSpore构建的化学套件,致力于高效使能AI与化学的联合创新,践行AI与化学结合的全新科学研究范式。
+## MindSpore Chemistry 0.2.0 Release Notes
+
+### 主要特性和增强
+
+#### 力场模拟
+
+* [STABLE] [NequIP](https://gitee.com/mindspore/mindscience/tree/master/MindChemistry/applications/nequip): 基于等变图神经网络构建的SOTA模型,用于预测分子势能与力。
+* [STABLE] [Allegro](https://gitee.com/mindspore/mindscience/tree/master/MindChemistry/applications/allegro): 基于等变图神经网络构建的SOTA模型,可以在大规模材料体系中进行高精度预测,用于预测分子势能与力。
+
+#### DFT模拟
+
+* [STABLE] [DeephE3nn](https://gitee.com/mindspore/mindscience/tree/master/MindChemistry/applications/deephe3nn): 基于E3的等变神经网络,利用晶体中的原子结构去预测体系的电子哈密顿量。
+
+#### 性质预测
+
+* [STABLE] [Matformer](https://gitee.com/mindspore/mindscience/tree/master/MindChemistry/applications/matformer): 基于图神经网络和Transformer架构的深度学习模型,用于预测晶体材料的各种性质。
+
+#### 结构生成
+
+* [STABLE] [DiffCSP](https://gitee.com/mindspore/mindscience/tree/master/MindChemistry/applications/diffcsp): 新增。是一种基于扩散模型的晶体结构预测方法,专门用于从稳定晶体数据中学习结构分布。它通过联合生成晶格和原子坐标来预测晶体结构,并利用周期性 E(3) 等变去噪模型来更好地模拟晶体的几何特性。它在计算成本上远低于传统的基于密度泛函理论的方法,并且在晶体结构预测任务中表现出色。
+
+### 贡献者
+
+感谢以下开发者做出的贡献:
+
+wujian, wangyuheng, Lin Peijia, gengchenhua, caowenbin, Siyu Yang
+
+------------------------------------------------
+
## MindSpore Chemistry 0.1.0 Release Notes
### 主要特性
diff --git a/MindChemistry/applications/cdvae/README.md b/MindChemistry/applications/cdvae/README.md
deleted file mode 100644
index 1672422847cc861893eaa0b280cdf1113a939b6b..0000000000000000000000000000000000000000
--- a/MindChemistry/applications/cdvae/README.md
+++ /dev/null
@@ -1,131 +0,0 @@
-# 模型名称
-
-> CDVAE
-
-## 介绍
-
-> Crystal Diffusion Variational AutoEncoder (CDVAE)是用来生成材料的周期性结构的SOTA模型,相关论文已发表在ICLR上。模型主要有两个部分组成,首先是encoder部分,将输入得信息转化成隐变量z,部分简单得特性,如原子数量和晶格常数等,直接使用MLP进行decode得到输出,其他部分如原子种类和原子在晶格中得位置等,则通过扩散模型得到。具体模型结构如下图所示:
-
-
-

-
-
-## 数据集
-
-> 提供了三个数据集:
-
-1. Perov_5 (Castelli et al., 2012): 包含接近19000个钙钛矿晶体结构,结构相似,但是组成不同,下载地址:[Perov_5](https://figshare.com/articles/dataset/Perov5/22705189)。
-2. Carbon_24 (Pickard, 2020): 包含10000个仅包含碳原子的晶体结构,因此其具有相同的组成,但是结构不同,下载地址:[Carbon_24](https://figshare.com/articles/dataset/Carbon24/22705192)。
-3. MP_20(Jain et al., 2013): 包含有45000个无机材料结构,包含绝大多数小于单胞小于20个原子的实验已知材料,下载地址:[mp_20](https://figshare.com/articles/dataset/mp_20/25563693)。
-
-前两个数据集下载后直接放在./data目录下即可。MP_20数据集下载后运行`python ./cdvae/dataloader/mp_20_process.py --init_path ./data/mp_20.json --data_path ./data/mp_20`, 其中 init_path是下载得到的json格式数据集的位置,而data_path是dataset存放的位置。
-
-## 环境要求
-
-> 1. 安装`pip install -r requirements.txt`
-
-## 脚本说明
-
-### 代码目录结构
-
-```txt
-└─cdvae
- │ README.md README文件
- │ train.py 训练启动脚本
- │ evaluation.py 推理启动脚本
- │ compute_metrics.py 评估结果脚本
- │ create_dataset.py 生成数据集
- │
- └─src
- │ evaluate_utils.py 推理结果生成
- │ metrics_utils.py 评估结果计算
- │ dataloader.py 将数据集加载到网络
- | mp_20_process.py 对mp_20数据集预处理
- │
- └─conf 参数配置
- │ config.yaml 网络参数
- └─data 数据集参数
-```
-
-## 训练
-
-## 快速开始
-
-> 训练命令: `python train.py --dataset 'perov_5'`
-
-### 命令行参数
-
-```txt
-dataset: 使用得数据集,perov_5, carbon_24, mp_20
-create_dataset: 是否重新对数据集进行处理
-num_sample_train: 如重新处理数据集,训练集得大小,-1为使用全部原始数据
-num_samples_val:如重新处理数据集,验证集得大小,-1为使用全部原始数据
-num_samples_test:如重新处理数据集,测试集得大小,-1为使用全部原始数据
-name_ckpt:保存权重的路径和名称
-load_ckpt:是否读取权重
-device_target:MindSpore使用的后端
-device_id:如MindSpore使用昇腾后端,使用的NPU卡号
-epoch_num:训练的epoch数
-```
-
-## 推理评估过程
-
-### 推理过程
-
-```txt
-1.将权重checkpoint文件保存至 `/loss/`目录下(默认读取目录)
-2.执行推理脚本:reconstruction任务:
- python evaluation.py --dataset perov_5 --tasks 'recon' (指定dataset为perov_5)
- generation任务:
- python evaluation.py --dataset perov_5 --tasks 'gen'
- optimization任务(如需使用optimization,在训练时请在configs.yaml中将predict_property设置为True):
- python evaluation.py --dataset perov_5 --tasks 'opt'
-```
-
-### 命令行参数
-
-```txt
-device_target:MindSpore使用的后端
-device_id:如MindSpore使用昇腾后端,使用的NPU卡号
-model_path: 权重保存路径
-dataset: 使用得数据集,perov_5, carbon_24, mp_20
-tasks:推理执行的任务,可选:recon,gen,opt
-n_step_each:执行的denoising的步数
-step_lr:opt任务中设置的lr
-min_sigma:生成随机噪声的最小值
-save_traj:是否保存traj
-disable_bar:是否展示进度条
-num_evals:gen任务中产生的结果数量
-start_from:随机或从头开始读取数据集,可选:randon, data
-batch_size: batch_size大小
-force_num_atoms:是否限制原子数不变
-force_atom_types:是否限制原子种类不变
-label:推理结果保存时的名称
-```
-
-推理结果
-
-```txt
-可以在`/eval_result/`路径下找到推理的输出文件。
-reconstruction的输出文件为eval_recon.npy和gt_recon.npy,分别包含了reconstruction后的晶体结构信息以及作为ground truth的晶体结构信息;
-generation的输出文件为eval_gen.npy,包含了随机生成结果的晶体结构信息;
-optimization的输出文件为eval_opt.npy,包含了基于特定性质优化的晶体结构信息。
-```
-
-### 结果评估
-
-```txt
-运行 python comput_metrics.py --eval_path './eval_result' --dataset 'perov_5' --task recon, 结果会保存在./eval_path文件夹下的eval_metrics.json文件中(目前支持recon和generation两种模式)
-```
-
-## 引用
-
-[1] Xie T, Fu X, Ganea O E, et al. Crystal diffusion variational autoencoder for periodic material generation[J]. arXiv preprint arXiv:2110.06197, 2021.
-
-[2] Castelli I E, Landis D D, Thygesen K S, et al. New cubic perovskites for one-and two-photon water splitting using the computational materials repository[J]. Energy & Environmental Science, 2012, 5(10): 9034-9043.
-
-[3] Castelli I E, Olsen T, Datta S, et al. Computational screening of perovskite metal oxides for optimal solar light capture[J]. Energy & Environmental Science, 2012, 5(2): 5814-5819.
-
-[4] Pickard C J. AIRSS data for carbon at 10GPa and the C+ N+ H+ O system at 1GPa[J]. (No Title), 2020.
-
-[5] Jain A, Ong S P, Hautier G, et al. Commentary: The Materials Project: A materials genome approach to accelerating materials innovation[J]. APL materials, 2013, 1(1).
\ No newline at end of file
diff --git a/MindChemistry/applications/cdvae/compute_metrics.py b/MindChemistry/applications/cdvae/compute_metrics.py
deleted file mode 100644
index e51be922e3bbfd45212fd6f6f1ef5852f738c072..0000000000000000000000000000000000000000
--- a/MindChemistry/applications/cdvae/compute_metrics.py
+++ /dev/null
@@ -1,321 +0,0 @@
-# Copyright 2025 Huawei Technologies Co., Ltd
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ============================================================================
-"""Compute metrics
-"""
-from collections import Counter
-import logging
-import argparse
-import os
-import json
-
-import numpy as np
-from tqdm import tqdm
-from p_tqdm import p_map
-from scipy.stats import wasserstein_distance
-from pymatgen.core.structure import Structure
-from pymatgen.core.composition import Composition
-from pymatgen.core.lattice import Lattice
-from pymatgen.analysis.structure_matcher import StructureMatcher
-from matminer.featurizers.site.fingerprint import CrystalNNFingerprint
-from matminer.featurizers.composition.composite import ElementProperty
-from mindchemistry.cell.gemnet.data_utils import StandardScaler
-from src.metrics_utils import (
- smact_validity, structure_validity, get_fp_pdist,
- get_crystals_list, compute_cov)
-
-CRYSTALNNFP = CrystalNNFingerprint.from_preset("ops")
-COMPFP = ElementProperty.from_preset("magpie")
-
-COV_CUTOFFS = {
- "mp_20": {"struct": 0.4, "comp": 10.},
- "carbon_24": {"struct": 0.2, "comp": 4.},
- "perov_5": {"struct": 0.2, "comp": 4},
-}
-# threshold for coverage metrics, olny struct distance and comp distance
-# smaller than the threshold will be counted as covered.
-
-
-class Crystal():
- """get crystal structures"""
-
- def __init__(self, crys_array_dict):
- self.frac_coords = crys_array_dict["frac_coords"]
- self.atom_types = crys_array_dict["atom_types"]
- self.lengths = crys_array_dict["lengths"]
- self.angles = crys_array_dict["angles"]
- self.dict = crys_array_dict
-
- self.get_structure()
- self.get_composition()
- self.get_validity()
- self.get_fingerprints()
-
- def get_structure(self):
- """get structure"""
- if min(self.lengths.tolist()) < 0:
- self.constructed = False
- self.invalid_reason = "non_positive_lattice"
- else:
- try:
- self.structure = Structure(
- lattice=Lattice.from_parameters(
- *(self.lengths.tolist() + self.angles.tolist())),
- species=self.atom_types, coords=self.frac_coords, coords_are_cartesian=False)
- self.constructed = True
- except (ValueError, AttributeError, TypeError):
- self.constructed = False
- self.invalid_reason = "construction_raises_exception"
- if self.structure.volume < 0.1:
- self.constructed = False
- self.invalid_reason = "unrealistically_small_lattice"
-
- def get_composition(self):
- elem_counter = Counter(self.atom_types)
- composition = [(elem, elem_counter[elem])
- for elem in sorted(elem_counter.keys())]
- elems, counts = list(zip(*composition))
- counts = np.array(counts)
- counts = counts / np.gcd.reduce(counts)
- self.elems = elems
- self.comps = tuple(counts.astype("int").tolist())
-
- def get_validity(self):
- self.comp_valid = smact_validity(self.elems, self.comps)
- if self.constructed:
- self.struct_valid = structure_validity(self.structure)
- else:
- self.struct_valid = False
- self.valid = self.comp_valid and self.struct_valid
-
- def get_fingerprints(self):
- """get fingerprints"""
- elem_counter = Counter(self.atom_types)
- comp = Composition(elem_counter)
- self.comp_fp = COMPFP.featurize(comp)
- try:
- site_fps = [CRYSTALNNFP.featurize(
- self.structure, i) for i in range(len(self.structure))]
- except (ValueError, AttributeError, TypeError):
-            # count the crystal as invalid if the fingerprint cannot be constructed.
- self.valid = False
- self.comp_fp = None
- self.struct_fp = None
- return
- self.struct_fp = np.array(site_fps).mean(axis=0)
-
-
-class RecEval():
- """reconstruction evaluation result"""
-
- def __init__(self, pred_crys, gt_crys, stol=0.5, angle_tol=10, ltol=0.3):
- assert len(pred_crys) == len(gt_crys)
- self.matcher = StructureMatcher(
- stol=stol, angle_tol=angle_tol, ltol=ltol)
- self.preds = pred_crys
- self.gts = gt_crys
-
- def get_match_rate_and_rms(self):
- """get match rate and rms, match rate shows how much rate of the prediction has
- the same structure as the ground truth."""
- def process_one(pred, gt, is_valid):
- if not is_valid:
- return None
- try:
- rms_dist = self.matcher.get_rms_dist(
- pred.structure, gt.structure)
- rms_dist = None if rms_dist is None else rms_dist[0]
- return rms_dist
- except (ValueError, AttributeError, TypeError):
- return None
- validity = [c.valid for c in self.preds]
-
- rms_dists = []
- for i in tqdm(range(len(self.preds))):
- rms_dists.append(process_one(
- self.preds[i], self.gts[i], validity[i]))
- rms_dists = np.array(rms_dists)
- match_rate = sum(x is not None for x in rms_dists) / len(self.preds)
- mean_rms_dist = np.array(
- [x for x in rms_dists if x is not None]).mean()
- return {"match_rate": match_rate,
- "rms_dist": mean_rms_dist}
-
- def get_metrics(self):
- return self.get_match_rate_and_rms()
-
-
-class GenEval():
- """Generation Evaluation result"""
-
- def __init__(self, pred_crys, gt_crys, comp_scaler, n_samples=10, eval_model_name=None):
- self.crys = pred_crys
- self.gt_crys = gt_crys
- self.n_samples = n_samples
- self.eval_model_name = eval_model_name
- self.comp_scaler = comp_scaler
-
- valid_crys = [c for c in pred_crys if c.valid]
- if len(valid_crys) >= n_samples:
- sampled_indices = np.random.choice(
- len(valid_crys), n_samples, replace=False)
- self.valid_samples = [valid_crys[i] for i in sampled_indices]
- else:
-            raise ValueError(
-                f"not enough valid crystals in the predicted set: {len(valid_crys)}/{n_samples}")
-
- def get_validity(self):
- """
- Compute Validity, which means whether the structure is reasonable and phyically stable
- in both composition and structure.
- """
- comp_valid = np.array([c.comp_valid for c in self.crys]).mean()
- struct_valid = np.array([c.struct_valid for c in self.crys]).mean()
- valid = np.array([c.valid for c in self.crys]).mean()
- return {"comp_valid": comp_valid,
- "struct_valid": struct_valid,
- "valid": valid}
-
- def get_comp_diversity(self):
- """the earth mover’s distance (EMD) between the property distribution of
- generated materials and test materials.
- """
- comp_fps = [c.comp_fp for c in self.valid_samples]
- comp_fps = self.comp_scaler.transform(comp_fps)
- comp_div = get_fp_pdist(comp_fps)
- return {"comp_div": comp_div}
-
- def get_struct_diversity(self):
- return {"struct_div": get_fp_pdist([c.struct_fp for c in self.valid_samples])}
-
- def get_density_wdist(self):
- pred_densities = [c.structure.density for c in self.valid_samples]
- gt_densities = [c.structure.density for c in self.gt_crys]
- wdist_density = wasserstein_distance(pred_densities, gt_densities)
- return {"wdist_density": wdist_density}
-
- def get_num_elem_wdist(self):
- pred_nelems = [len(set(c.structure.species))
- for c in self.valid_samples]
- gt_nelems = [len(set(c.structure.species)) for c in self.gt_crys]
- wdist_num_elems = wasserstein_distance(pred_nelems, gt_nelems)
- return {"wdist_num_elems": wdist_num_elems}
-
- def get_coverage(self):
- """measure the similarity between ensembles of generated materials
- and ground truth materials. COV-R measures the percentage of
- ground truth materials being correctly predicted.
- """
- cutoff_dict = COV_CUTOFFS[self.eval_model_name]
- (cov_metrics_dict, _) = compute_cov(
- self.crys, self.gt_crys, self.comp_scaler,
- struc_cutoff=cutoff_dict["struct"],
- comp_cutoff=cutoff_dict["comp"])
- return cov_metrics_dict
-
- def get_metrics(self):
- metrics = {}
- metrics.update(self.get_validity())
- metrics.update(self.get_comp_diversity())
- metrics.update(self.get_struct_diversity())
- metrics.update(self.get_density_wdist())
- metrics.update(self.get_num_elem_wdist())
-        logging.info("evaluation metrics: %s", metrics)
- metrics.update(self.get_coverage())
- return metrics
-
-
-def get_crystal_array_list(data, gt_data=None, ground_truth=False):
- """get crystal array list"""
- crys_array_list = get_crystals_list(
- np.concatenate(data["frac_coords"], axis=1).squeeze(0),
- np.concatenate(data["atom_types"], axis=1).squeeze(0),
- np.concatenate(data["lengths"], axis=1).squeeze(0),
- np.concatenate(data["angles"], axis=1).squeeze(0),
- np.concatenate(data["num_atoms"], axis=1).squeeze(0))
-
- if ground_truth:
- true_crystal_array_list = get_crystals_list(
- np.concatenate(gt_data["frac_coords"], axis=0).squeeze(),
- np.concatenate(gt_data["atom_types"], axis=0).squeeze(),
- np.concatenate(gt_data["lengths"],
- axis=0).squeeze().reshape(-1, 3),
- np.concatenate(gt_data["angles"], axis=0).squeeze().reshape(-1, 3),
- np.concatenate(gt_data["num_atoms"], axis=0).squeeze())
- else:
- true_crystal_array_list = None
-
- return crys_array_list, true_crystal_array_list
-
-
-def main(args):
- all_metrics = {}
- eval_model_name = args.dataset
-
- if "recon" in args.tasks:
- out_data = np.load(args.eval_path+"/eval_recon.npy",
- allow_pickle=True).item()
- gt_data = np.load(args.eval_path+"/gt_recon.npy",
- allow_pickle=True).item()
- crys_array_list, true_crystal_array_list = get_crystal_array_list(
- out_data, gt_data, ground_truth=True)
- pred_crys = p_map(Crystal, crys_array_list)
- gt_crys = p_map(Crystal, true_crystal_array_list)
-
- rec_evaluator = RecEval(pred_crys, gt_crys)
- recon_metrics = rec_evaluator.get_metrics()
- all_metrics.update(recon_metrics)
-
- if "gen" in args.tasks:
- out_data = np.load(args.eval_path+"/eval_gen.npy",
- allow_pickle=True).item()
- gt_data = np.load(args.eval_path+"/gt_recon.npy",
- allow_pickle=True).item()
- crys_array_list, true_crystal_array_list = get_crystal_array_list(
- out_data, gt_data, ground_truth=True)
-
- gen_crys = p_map(Crystal, crys_array_list)
- gt_crys = p_map(Crystal, true_crystal_array_list)
- gt_comp_fps = [c.comp_fp for c in gt_crys]
- gt_fp_np = np.array(gt_comp_fps)
- comp_scaler = StandardScaler(replace_nan_token=0.)
- comp_scaler.fit(gt_fp_np)
-
- gen_evaluator = GenEval(
- gen_crys, gt_crys, comp_scaler, eval_model_name=eval_model_name)
- gen_metrics = gen_evaluator.get_metrics()
- all_metrics.update(gen_metrics)
-
- logging.info(all_metrics)
-
- if args.label == "":
- metrics_out_file = "eval_metrics.json"
- else:
- metrics_out_file = f"eval_metrics_{args.label}.json"
- metrics_out_file = os.path.join(args.eval_path, metrics_out_file)
-
- with open(metrics_out_file, "w") as f:
- json.dump(all_metrics, f)
-
-
-if __name__ == "__main__":
- parser = argparse.ArgumentParser()
- parser.add_argument("--dataset", default="perov_5")
- parser.add_argument("--eval_path", default="./eval_result")
- parser.add_argument("--label", default="")
- parser.add_argument("--tasks", nargs="+", default=["recon"])
- main_args = parser.parse_args()
- logging.basicConfig(format="%(levelname)s:%(message)s", level=logging.INFO)
- main(main_args)
diff --git a/MindChemistry/applications/cdvae/conf/configs.yaml b/MindChemistry/applications/cdvae/conf/configs.yaml
deleted file mode 100644
index a75a20a86e390b7428eac0cccd961ccb450656ea..0000000000000000000000000000000000000000
--- a/MindChemistry/applications/cdvae/conf/configs.yaml
+++ /dev/null
@@ -1,65 +0,0 @@
-hidden_dim: 256
-latent_dim: 256
-fc_num_layers: 0
-max_atoms: 20
-cost_natom: 1.
-cost_coord: 10.
-cost_type: 1.
-cost_lattice: 10.
-cost_composition: 1.
-cost_edge: 10.
-cost_property: 1.
-beta: 0.01
-max_neighbors: 20
-radius: 7.
-sigma_begin: 10.
-sigma_end: 0.01
-type_sigma_begin: 5.
-type_sigma_end: 0.01
-num_noise_level: 50
-teacher_forcing_lattice: True
-predict_property: True
-
-Encoder:
- hidden_channels: 128
- num_blocks: 4
- int_emb_size: 64
- basis_emb_size: 8
- out_emb_channels: 256
- num_spherical: 7
- num_radial: 6
- cutoff: 7.0
- max_num_neighbors: 20
- envelope_exponent: 5
- num_before_skip: 1
- num_after_skip: 2
- num_output_layers: 3
-
-Decoder:
- hidden_dim: 128
-
-Optimizer:
- learning_rate: 0.001
- factor: 0.6
- patience: 30
- cooldown: 10
- min_lr: 0.0001
-
-Scaler:
- TripInteraction_1_had_rbf: 18.873615264892578
- TripInteraction_1_sum_cbf: 7.996850490570068
- AtomUpdate_1_sum: 1.220463752746582
- TripInteraction_2_had_rbf: 16.10817527770996
- TripInteraction_2_sum_cbf: 7.614634037017822
- AtomUpdate_2_sum: 0.9690994620323181
- TripInteraction_3_had_rbf: 15.01930046081543
- TripInteraction_3_sum_cbf: 7.025179862976074
- AtomUpdate_3_sum: 0.8903237581253052
- OutBlock_0_sum: 1.6437848806381226
- OutBlock_0_had: 16.161039352416992
- OutBlock_1_sum: 1.1077653169631958
- OutBlock_1_had: 13.54678726196289
- OutBlock_2_sum: 0.9477927684783936
- OutBlock_2_had: 12.754337310791016
- OutBlock_3_sum: 0.9059251546859741
- OutBlock_3_had: 13.484951972961426
diff --git a/MindChemistry/applications/cdvae/conf/data/carbon_24.yaml b/MindChemistry/applications/cdvae/conf/data/carbon_24.yaml
deleted file mode 100644
index 8a7c7093b586db54bbee2afc09bb05da2b5e31fa..0000000000000000000000000000000000000000
--- a/MindChemistry/applications/cdvae/conf/data/carbon_24.yaml
+++ /dev/null
@@ -1,12 +0,0 @@
-prop: energy_per_atom
-num_targets: 1
-niggli: true
-primitive: false
-graph_method: crystalnn
-lattice_scale_method: scale_length
-preprocess_workers: 30
-readout: mean
-max_atoms: 24
-otf_graph: false
-eval_model_name: carbon
-batch_size: 50
diff --git a/MindChemistry/applications/cdvae/conf/data/mp_20.yaml b/MindChemistry/applications/cdvae/conf/data/mp_20.yaml
deleted file mode 100644
index f43fb448daecec40d3ce5b1f1237add9ecd5c743..0000000000000000000000000000000000000000
--- a/MindChemistry/applications/cdvae/conf/data/mp_20.yaml
+++ /dev/null
@@ -1,12 +0,0 @@
-prop: formation_energy_per_atom
-num_targets: 1
-niggli: true
-primitive: False
-graph_method: crystalnn
-lattice_scale_method: scale_length
-preprocess_workers: 30
-readout: mean
-max_atoms: 20
-otf_graph: false
-eval_model_name: mp20
-batch_size: 50
diff --git a/MindChemistry/applications/cdvae/conf/data/perov_5.yaml b/MindChemistry/applications/cdvae/conf/data/perov_5.yaml
deleted file mode 100644
index f25a93abd529484492d46d02156591f150b5d656..0000000000000000000000000000000000000000
--- a/MindChemistry/applications/cdvae/conf/data/perov_5.yaml
+++ /dev/null
@@ -1,12 +0,0 @@
-prop: heat_ref
-num_targets: 1
-niggli: true
-primitive: false
-graph_method: crystalnn
-lattice_scale_method: scale_length
-preprocess_workers: 24
-readout: mean
-max_atoms: 20
-otf_graph: false
-eval_model_name: perovskite
-batch_size: 128
diff --git a/MindChemistry/applications/cdvae/create_dataset.py b/MindChemistry/applications/cdvae/create_dataset.py
deleted file mode 100644
index 827a1f2bab210863bfa31e0aba3ee7e599c78e95..0000000000000000000000000000000000000000
--- a/MindChemistry/applications/cdvae/create_dataset.py
+++ /dev/null
@@ -1,341 +0,0 @@
-# Copyright 2025 Huawei Technologies Co., Ltd
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ============================================================================
-"""create_dataset"""
-
-import os
-import logging
-import argparse
-import numpy as np
-import pandas as pd
-from p_tqdm import p_umap
-from pymatgen.core.structure import Structure
-from pymatgen.core.lattice import Lattice
-from pymatgen.analysis.graphs import StructureGraph
-from pymatgen.analysis import local_env
-
-from mindchemistry.utils.load_config import load_yaml_config_from_path
-from mindchemistry.cell.gemnet.data_utils import get_scaler_from_data_list
-from mindchemistry.cell.gemnet.data_utils import lattice_params_to_matrix
-from mindchemistry.cell.dimenet.preprocess import PreProcess
-logging.basicConfig(format="%(levelname)s:%(message)s", level=logging.INFO)
-
-
-class CreateDataset:
- """Create Dataset for crystal structures
-
- Args:
- name (str): Name of the dataset
- path (str): Path to the dataset
- prop (str): Property to predict
- niggli (bool): Whether to convert to Niggli reduced cell
- primitive (bool): Whether to convert to primitive cell
- graph_method (str): Method to create graph
- preprocess_workers (int): Number of workers for preprocessing
- lattice_scale_method (str): Method to scale lattice
- num_samples (int): Number of samples to use, if None use all
- """
-
- def __init__(self, name, path,
- prop, niggli, primitive,
- graph_method, preprocess_workers,
- lattice_scale_method, config_path,
- num_samples=None):
- super().__init__()
- self.path = path
- self.name = name
- self.num_samples = num_samples
- self.prop = prop
- self.niggli = niggli
- self.primitive = primitive
- self.graph_method = graph_method
- self.lattice_scale_method = lattice_scale_method
- self.config = load_yaml_config_from_path(config_path).get("Encoder")
- self.preprocess = PreProcess(
- num_spherical=self.config.get("num_spherical"),
- num_radial=self.config.get("num_radial"),
- envelope_exponent=self.config.get("envelope_exponent"),
- otf_graph=False,
- cutoff=self.config.get("cutoff"),
- max_num_neighbors=self.config.get("max_num_neighbors"),)
-
- self.cached_data = data_preprocess(
- self.path,
- preprocess_workers,
- niggli=self.niggli,
- primitive=self.primitive,
- graph_method=self.graph_method,
- prop_list=[prop],
- num_samples=self.num_samples
- )[:self.num_samples]
- add_scaled_lattice_prop(self.cached_data, lattice_scale_method)
- self.lattice_scaler = None
- self.scaler = None
-
- def __len__(self):
- return len(self.cached_data)
-
- def __getitem__(self, index):
- data = self.cached_data[index]
-
- # scaler is set in DataModule set stage
- prop = self.scaler.transform(data[self.prop])
- (frac_coords, atom_types, lengths, angles, edge_indices,
- to_jimages, num_atoms) = data["graph_arrays"]
- data_res = self.preprocess.data_process(angles.reshape(1, -1), lengths.reshape(1, -1),
- np.array([num_atoms]), edge_indices.T, frac_coords,
- edge_indices.shape[0], to_jimages, atom_types, prop)
- return data_res
-
- def __repr__(self):
-        return f"CreateDataset({self.name}, {self.path})"
-
- def get_dataset_size(self):
- return len(self.cached_data)
-
-
-# map atomic numbers to their chemical symbols
-chemical_symbols = [
- # 0
- "X",
- # 1
- "H", "He",
- # 2
- "Li", "Be", "B", "C", "N", "O", "F", "Ne",
- # 3
- "Na", "Mg", "Al", "Si", "P", "S", "Cl", "Ar",
- # 4
- "K", "Ca", "Sc", "Ti", "V", "Cr", "Mn", "Fe", "Co", "Ni", "Cu", "Zn",
- "Ga", "Ge", "As", "Se", "Br", "Kr",
- # 5
- "Rb", "Sr", "Y", "Zr", "Nb", "Mo", "Tc", "Ru", "Rh", "Pd", "Ag", "Cd",
- "In", "Sn", "Sb", "Te", "I", "Xe",
- # 6
- "Cs", "Ba", "La", "Ce", "Pr", "Nd", "Pm", "Sm", "Eu", "Gd", "Tb", "Dy",
- "Ho", "Er", "Tm", "Yb", "Lu",
- "Hf", "Ta", "W", "Re", "Os", "Ir", "Pt", "Au", "Hg", "Tl", "Pb", "Bi",
- "Po", "At", "Rn",
- # 7
- "Fr", "Ra", "Ac", "Th", "Pa", "U", "Np", "Pu", "Am", "Cm", "Bk",
- "Cf", "Es", "Fm", "Md", "No", "Lr",
- "Rf", "Db", "Sg", "Bh", "Hs", "Mt", "Ds", "Rg", "Cn", "Nh", "Fl", "Mc",
- "Lv", "Ts", "Og"
-]
-
-# used for crystal matching
-CRYSTALNN = local_env.CrystalNN(
- distance_cutoffs=None, x_diff_weight=-1, porous_adjustment=False)
-
-
-def build_crystal(crystal_str, niggli=True, primitive=False):
- """Build crystal from cif string."""
- crystal = Structure.from_str(crystal_str, fmt="cif")
-
- if primitive:
- crystal = crystal.get_primitive_structure()
-
- if niggli:
- crystal = crystal.get_reduced_structure()
-
- canonical_crystal = Structure(
- lattice=Lattice.from_parameters(*crystal.lattice.parameters),
- species=crystal.species,
- coords=crystal.frac_coords,
- coords_are_cartesian=False,
- )
-    # match is guaranteed because cif only uses lattice params & frac_coords
- assert canonical_crystal.matches(crystal)
- return canonical_crystal
-
-
-def build_crystal_graph(crystal, graph_method="crystalnn"):
- """build crystal graph"""
-
- if graph_method == "crystalnn":
- crystal_graph = StructureGraph.with_local_env_strategy(
- crystal, CRYSTALNN)
- elif graph_method == "none":
- pass
- else:
- raise NotImplementedError
-
- frac_coords = crystal.frac_coords
- atom_types = crystal.atomic_numbers
- lattice_parameters = crystal.lattice.parameters
- lengths = lattice_parameters[:3]
- angles = lattice_parameters[3:]
-
- assert np.allclose(crystal.lattice.matrix,
- lattice_params_to_matrix(*lengths, *angles))
-
- edge_indices, to_jimages = [], []
- if graph_method != "none":
- for i, j, to_jimage in crystal_graph.graph.edges(data="to_jimage"):
- edge_indices.append([j, i])
- to_jimages.append(to_jimage)
- edge_indices.append([i, j])
- to_jimages.append(tuple(-tj for tj in to_jimage))
-
- atom_types = np.array(atom_types)
- lengths, angles = np.array(lengths), np.array(angles)
- edge_indices = np.array(edge_indices)
- to_jimages = np.array(to_jimages)
- num_atoms = atom_types.shape[0]
-
- return frac_coords, atom_types, lengths, angles, edge_indices, to_jimages, num_atoms
-
-
-def save_data(dataset, is_train, dataset_name):
- """save created dataset to npy"""
- processed_data = dict()
- data_parameters = ["atom_types", "dist", "angle", "idx_kj", "idx_ji",
- "edge_j", "edge_i", "pos", "batch", "lengths",
- "num_atoms", "angles", "frac_coords",
- "num_bonds", "num_triplets", "sbf", "y"]
- for j, name in enumerate(data_parameters):
- if j == 16:
-            # Here, y is a mindspore.Tensor while the others are numpy arrays, so convert its type first.
- processed_data[name] = [i[j].astype(np.float32) for i in dataset]
- elif j == 14:
-            # Here, we need the sum of num_triplets, so sum it before saving.
- processed_data[name] = [i[j].sum() for i in dataset]
- else:
- processed_data[name] = [i[j] for i in dataset]
-
- if not os.path.exists(f"./data/{dataset_name}/{is_train}"):
- os.makedirs(f"./data/{dataset_name}/{is_train}")
- logging.info("%s has been created",
- f"./data/{dataset_name}/{is_train}")
- if is_train == "train":
- np.savetxt(f"./data/{dataset_name}/{is_train}/scaler_mean.csv",
- dataset.scaler.means.reshape(-1))
- np.savetxt(f"./data/{dataset_name}/{is_train}/scaler_std.csv",
- dataset.scaler.stds.reshape(-1))
- np.savetxt(
- f"./data/{dataset_name}/{is_train}/lattice_scaler_mean.csv", dataset.lattice_scaler.means)
- np.savetxt(
- f"./data/{dataset_name}/{is_train}/lattice_scaler_std.csv", dataset.lattice_scaler.stds)
- np.save(
- f"./data/{dataset_name}/{is_train}/processed_data.npy", processed_data)
-
-
-def process_one(row, niggli, primitive, graph_method, prop_list):
- """process one one sample"""
- crystal_str = row["cif"]
- crystal = build_crystal(
- crystal_str, niggli=niggli, primitive=primitive)
- graph_arrays = build_crystal_graph(crystal, graph_method)
- properties = {k: row[k] for k in prop_list if k in row.keys()}
- result_dict = {
- "mp_id": row["material_id"],
- "cif": crystal_str,
- "graph_arrays": graph_arrays,
- }
- result_dict.update(properties)
- return result_dict
-
-
-def data_preprocess(input_file, num_workers, niggli, primitive, graph_method, prop_list, num_samples):
- """process data"""
- df = pd.read_csv(input_file)[:num_samples]
-
- unordered_results = p_umap(
- process_one,
- [df.iloc[idx] for idx in range(len(df))],
- [niggli] * len(df),
- [primitive] * len(df),
- [graph_method] * len(df),
- [prop_list] * len(df),
- num_cpus=num_workers)
-
- mpid_to_results = {result["mp_id"]: result for result in unordered_results}
- ordered_results = [mpid_to_results[df.iloc[idx]["material_id"]]
- for idx in range(len(df))]
-
- return ordered_results
-
-
-def add_scaled_lattice_prop(data_list, lattice_scale_method):
- """add scaled lattice prop to dataset"""
- for data in data_list:
- graph_arrays = data["graph_arrays"]
- # the indexes are brittle if more objects are returned
- lengths = graph_arrays[2]
- angles = graph_arrays[3]
- num_atoms = graph_arrays[-1]
- assert lengths.shape[0] == angles.shape[0] == 3
- assert isinstance(num_atoms, int)
-
- if lattice_scale_method == "scale_length":
- lengths = lengths / float(num_atoms)**(1 / 3)
-
- data["scaled_lattice"] = np.concatenate([lengths, angles])
-
-
-def create_dataset(args):
- """create dataset"""
- config_data_path = f"./conf/data/{args.dataset}.yaml"
-    config_path = "./conf/configs.yaml"
- config_data = load_yaml_config_from_path(config_data_path)
- prop = config_data.get("prop")
- niggli = config_data.get("niggli")
- primitive = config_data.get("primitive")
- graph_method = config_data.get("graph_method")
- lattice_scale_method = config_data.get("lattice_scale_method")
- preprocess_workers = config_data.get("preprocess_workers")
- path_train = f"./data/{args.dataset}/train.csv"
- train_dataset = CreateDataset("Formation energy train", path_train, prop,
- niggli, primitive, graph_method,
- preprocess_workers, lattice_scale_method,
- config_path, args.num_samples_train)
- lattice_scaler = get_scaler_from_data_list(
- train_dataset.cached_data,
- key="scaled_lattice")
- scaler = get_scaler_from_data_list(
- train_dataset.cached_data,
- key=train_dataset.prop)
- train_dataset.lattice_scaler = lattice_scaler
- train_dataset.scaler = scaler
- save_data(train_dataset, "train", args.dataset)
-
- path_val = f"./data/{args.dataset}/val.csv"
-    val_dataset = CreateDataset("Formation energy val", path_val, prop,
-                                niggli, primitive, graph_method,
-                                preprocess_workers, lattice_scale_method,
-                                config_path, args.num_samples_val)
- val_dataset.lattice_scaler = lattice_scaler
- val_dataset.scaler = scaler
- save_data(val_dataset, "val", args.dataset)
-
- path_test = f"./data/{args.dataset}/test.csv"
-    test_dataset = CreateDataset("Formation energy test", path_test, prop,
-                                 niggli, primitive, graph_method,
-                                 preprocess_workers, lattice_scale_method,
-                                 config_path, args.num_samples_test)
- test_dataset.lattice_scaler = lattice_scaler
- test_dataset.scaler = scaler
- save_data(test_dataset, "test", args.dataset)
-
-
-def main(args):
- create_dataset(args)
-
-
-if __name__ == "__main__":
- parser = argparse.ArgumentParser()
- parser.add_argument("--dataset", default="perov_5")
- parser.add_argument("--num_samples_train", default=300, type=int)
- parser.add_argument("--num_samples_val", default=300, type=int)
- parser.add_argument("--num_samples_test", default=300, type=int)
- main_args = parser.parse_args()
- main(main_args)
diff --git a/MindChemistry/applications/cdvae/evaluation.py b/MindChemistry/applications/cdvae/evaluation.py
deleted file mode 100644
index 4361fb5e4f593f3f170c6af33e5f2d2022a15910..0000000000000000000000000000000000000000
--- a/MindChemistry/applications/cdvae/evaluation.py
+++ /dev/null
@@ -1,192 +0,0 @@
-# Copyright 2025 Huawei Technologies Co., Ltd
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ============================================================================
-"""Evaluation
-"""
-
-import os
-import time
-import logging
-from types import SimpleNamespace
-import argparse
-import mindspore as ms
-import numpy as np
-
-from mindchemistry.cell.cdvae import CDVAE
-from src.dataloader import DataLoaderBaseCDVAE
-from src.evaluate_utils import (get_reconstructon_res, get_generation_res,
- get_optimization_res)
-from train import get_scaler
-
-
-def task_reconstruction(model, ld_kwargs, graph_dataset, recon_args):
- """Evaluate model on the reconstruction task."""
- logging.info("Evaluate model on the reconstruction task.")
- (frac_coords, num_atoms, atom_types, lengths, angles,
- gt_frac_coords, gt_num_atoms, gt_atom_types,
- gt_lengths, gt_angles) = get_reconstructon_res(
- graph_dataset, model, ld_kwargs, recon_args.num_evals,
- recon_args.force_num_atoms, recon_args.force_atom_types)
-
- if recon_args.label == "":
- recon_out_name = "eval_recon.npy"
- else:
- recon_out_name = f"eval_recon_{recon_args.label}.npy"
-
- result = {
- "eval_setting": recon_args,
- "frac_coords": frac_coords,
- "num_atoms": num_atoms,
- "atom_types": atom_types,
- "lengths": lengths,
- "angles": angles,
- }
- # save result as numpy
- np.save("./eval_result/" + recon_out_name, result)
- groundtruth = {
- "frac_coords": gt_frac_coords,
- "num_atoms": gt_num_atoms,
- "atom_types": gt_atom_types,
- "lengths": gt_lengths,
- "angles": gt_angles,
- }
- # save ground truth as numpy
- np.save("./eval_result/gt_recon.npy", groundtruth)
-
-
-def task_generation(model, ld_kwargs, gen_args):
- """Evaluate model on the generation task."""
- logging.info("Evaluate model on the generation task.")
-
- (frac_coords, num_atoms, atom_types, lengths, angles,
- all_frac_coords_stack, all_atom_types_stack) = get_generation_res(
- model, ld_kwargs, gen_args.num_batches_to_samples, gen_args.num_evals,
- gen_args.batch_size, gen_args.down_sample_traj_step)
-
- if gen_args.label == "":
- gen_out_name = "eval_gen.npy"
- else:
- gen_out_name = f"eval_gen_{gen_args.label}.npy"
-
- result = {
- "eval_setting": gen_args,
- "frac_coords": frac_coords,
- "num_atoms": num_atoms,
- "atom_types": atom_types,
- "lengths": lengths,
- "angles": angles,
- "all_frac_coords_stack": all_frac_coords_stack,
- "all_atom_types_stack": all_atom_types_stack,
- }
- # save result as numpy
- np.save("./eval_result/" + gen_out_name, result)
-
-
-def task_optimization(model, ld_kwargs, graph_dataset, opt_args):
- """Evaluate model on the property optimization task."""
- logging.info("Evaluate model on the property optimization task.")
- if opt_args.start_from == "data":
- loader = graph_dataset
- else:
- loader = None
- optimized_crystals = get_optimization_res(model, ld_kwargs, loader)
- if opt_args.label == "":
- gen_out_name = "eval_opt.npy"
- else:
- gen_out_name = f"eval_opt_{opt_args.label}.npy"
- # save result as numpy
- np.save("./eval_result/" + gen_out_name, optimized_crystals)
-
-
-def main(args):
- # check whether path exists, if not exists create the direction
- folder_path = os.path.dirname(args.model_path)
- if not os.path.exists(folder_path):
- os.makedirs(folder_path)
- logging.info("%s has been created", folder_path)
- result_path = "./eval_result/"
- if not os.path.exists(result_path):
- os.makedirs(result_path)
- logging.info("%s has been created", result_path)
- config_path = "./conf/configs.yaml"
- data_config_path = f"./conf/data/{args.dataset}.yaml"
- # load model
- model = CDVAE(config_path, data_config_path)
- # load mindspore check point
- param_dict = ms.load_checkpoint(args.model_path)
- param_not_load, _ = ms.load_param_into_net(model, param_dict)
-    logging.info("parameters not loaded: %s.", param_not_load)
- model.set_train(False)
-
- ld_kwargs = SimpleNamespace(n_step_each=args.n_step_each,
- step_lr=args.step_lr,
- min_sigma=args.min_sigma,
- save_traj=args.save_traj,
- disable_bar=args.disable_bar)
- # load dataset
- graph_dataset = DataLoaderBaseCDVAE(
- args.batch_size, args.dataset, shuffle_dataset=False, mode="test")
- # load scaler
- lattice_scaler, scaler = get_scaler(args)
- model.lattice_scaler = lattice_scaler
- model.scaler = scaler
-
- start_time_eval = time.time()
- if "recon" in args.tasks:
- task_reconstruction(model, ld_kwargs, graph_dataset, args)
- if "gen" in args.tasks:
- task_generation(model, ld_kwargs, args)
- if "opt" in args.tasks:
- task_optimization(model, ld_kwargs, graph_dataset, args)
- logging.info("end evaluation, time: %f s.", time.time() - start_time_eval)
-
-def get_args():
- """args used for evaluation"""
- parser = argparse.ArgumentParser()
- parser.add_argument("--device_target", default="Ascend", help="device target")
- parser.add_argument("--device_id", default=7, type=int, help="device id")
- parser.add_argument("--model_path", default="./loss/loss.ckpt",
- help="path to checkpoint")
- parser.add_argument("--dataset", default="perov_5", help="name of dataset")
- parser.add_argument("--tasks", nargs="+", default=["gen"],
- help="tasks to evaluate, choose from 'recon, gen, opt'")
- parser.add_argument("--n_step_each", default=1, type=int,
- help="number of steps in diffusion")
- parser.add_argument("--step_lr", default=1e-3, type=float, help="learning rate")
- parser.add_argument("--min_sigma", default=0, type=float, help="minimum sigma")
-    parser.add_argument("--save_traj", action="store_true",
-                        help="whether to save trajectory")
-    parser.add_argument("--disable_bar", action="store_true",
-                        help="disable progress bar")
- parser.add_argument("--num_evals", default=1, type=int,
- help="number of evaluations returned for each task")
- parser.add_argument("--num_batches_to_samples", default=1, type=int,
- help="number of batches to sample")
- parser.add_argument("--start_from", default="data", type=str,
- help="start from data or random")
- parser.add_argument("--batch_size", default=128, type=int, help="batch size")
- parser.add_argument("--force_num_atoms", action="store_true",
- help="fixed num atoms or not")
- parser.add_argument("--force_atom_types", action="store_true",
- help="fixed atom types or not")
- parser.add_argument("--down_sample_traj_step", default=10, type=int, help="down sample")
- parser.add_argument("--label", default="", help="label for output file")
- return parser.parse_args()
-
-if __name__ == "__main__":
- main_args = get_args()
- logging.basicConfig(format="%(levelname)s:%(message)s", level=logging.INFO)
-    ms.context.set_context(device_target=main_args.device_target,
-                           device_id=main_args.device_id, mode=ms.PYNATIVE_MODE)
- main(main_args)
diff --git a/MindChemistry/applications/cdvae/images/illustrative.png b/MindChemistry/applications/cdvae/images/illustrative.png
deleted file mode 100644
index a70858f7f67a881ba63606c03ec3eba13fb7ef1a..0000000000000000000000000000000000000000
Binary files a/MindChemistry/applications/cdvae/images/illustrative.png and /dev/null differ
diff --git a/MindChemistry/applications/cdvae/mp_20_process.py b/MindChemistry/applications/cdvae/mp_20_process.py
deleted file mode 100644
index 99dbef741c0458037ce62868ccc263b9f6c9e03d..0000000000000000000000000000000000000000
--- a/MindChemistry/applications/cdvae/mp_20_process.py
+++ /dev/null
@@ -1,72 +0,0 @@
-# Copyright 2025 Huawei Technologies Co., Ltd
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ============================================================================
-""" script used for generate mp_20 dataset from raw data"""
-import os
-import logging
-import argparse
-import pandas as pd
-from pymatgen.core.structure import Structure
-from pymatgen.core.lattice import Lattice
-from pymatgen.io.cif import CifWriter
-
-
-def mp_20_process():
- """process the mp_20 dataset"""
- if not os.path.exists(args.data_path):
- os.makedirs(args.data_path)
- logging.info("%s has been created", args.data_path)
-
-    # read the json file and convert it to a pandas DataFrame
- df = pd.read_json(args.init_path)
- df = df[["id", "formation_energy_per_atom", "band_gap", "pretty_formula",
- "e_above_hull", "elements", "atoms", "spacegroup_number"]]
- struct_list = []
- element_list = []
-    # generate a pymatgen Structure from df["atoms"] for each sample
- for struct in df["atoms"]:
- lattice = Lattice(struct["lattice_mat"], (False, False, False))
- pos = struct["coords"]
- species = struct["elements"]
- structure = Structure(lattice, species, pos)
- # save cif from Structure
- cif = CifWriter(structure)
- struct_list.append(cif.__str__())
- element_list.append(struct["elements"])
-
- # add cif to df
- df.insert(7, "cif", struct_list)
- df = df.drop("atoms", axis=1)
- df["elements"] = element_list
-
- # save to csv file
-    # split the dataset into train:val:test = 6:2:2
- train_df = df.iloc[:int(0.6 * len(df))]
- val_df = df.iloc[int(0.6 * len(df)):int(0.8 * len(df))]
- test_df = df.iloc[int(0.8 * len(df)):]
- train_df.to_csv(args.data_path+"/train.csv", index=False)
- val_df.to_csv(args.data_path+"/val.csv", index=False)
- test_df.to_csv(args.data_path+"/test.csv", index=False)
- logging.info("Finished!")
-
-
-if __name__ == "__main__":
- logging.basicConfig(format="%(levelname)s:%(message)s", level=logging.INFO)
- parser = argparse.ArgumentParser()
- parser.add_argument("--init_path", default="./data/mp_20.json",
- help="path to the initial dataset file")
- parser.add_argument("--data_path", default="./data/mp_20",
- help="path to save the processed dataset")
- args = parser.parse_args()
- mp_20_process()
diff --git a/MindChemistry/applications/cdvae/requirements.txt b/MindChemistry/applications/cdvae/requirements.txt
deleted file mode 100644
index 8cac7af853e1281c606c9a5aa325e6453394feee..0000000000000000000000000000000000000000
--- a/MindChemistry/applications/cdvae/requirements.txt
+++ /dev/null
@@ -1,12 +0,0 @@
-matminer==0.7.3
-mindchemistry_ascend==0.1.0
-mindspore==2.3.0.20240411
-numpy==1.26.4
-p_tqdm==1.4.0
-pandas==2.2.2
-pymatgen==2023.8.10
-sciai==0.1.0
-scipy==1.13.1
-SMACT==2.2.1
-sympy==1.12
-tqdm==4.66.2
diff --git a/MindChemistry/applications/cdvae/src/__init__.py b/MindChemistry/applications/cdvae/src/__init__.py
deleted file mode 100644
index 0be6d3fcd8c0be618bfa801a8585477fd5326443..0000000000000000000000000000000000000000
--- a/MindChemistry/applications/cdvae/src/__init__.py
+++ /dev/null
@@ -1,15 +0,0 @@
-# Copyright 2025 Huawei Technologies Co., Ltd
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ============================================================================
-"""src"""
diff --git a/MindChemistry/applications/cdvae/src/dataloader.py b/MindChemistry/applications/cdvae/src/dataloader.py
deleted file mode 100644
index f02b257767810b8088405f449d7996ca92f5ba0a..0000000000000000000000000000000000000000
--- a/MindChemistry/applications/cdvae/src/dataloader.py
+++ /dev/null
@@ -1,233 +0,0 @@
-# Copyright 2025 Huawei Technologies Co., Ltd
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ============================================================================
-"""dataloader
-"""
-import random
-import numpy as np
-from mindspore import Tensor
-import mindspore as ms
-
-
-class DataLoaderBaseCDVAE:
- r"""
- DataLoader for CDVAE
- """
-
- def __init__(self,
- batch_size,
- dataset,
- shuffle_dataset=True,
- mode="train"):
- dataset = np.load(
- f"./data/{dataset}/{mode}/processed_data.npy", allow_pickle=True).item()
- self.atom_types = dataset["atom_types"]
- self.dist = dataset["dist"]
- self.angle = dataset["angle"]
- self.idx_kj = dataset["idx_kj"]
- self.idx_ji = dataset["idx_ji"]
- self.edge_j = dataset["edge_j"]
- self.edge_i = dataset["edge_i"]
- self.pos = dataset["pos"]
- self.batch = dataset["batch"]
- self.lengths = dataset["lengths"]
- self.num_atoms = dataset["num_atoms"]
- self.angles = dataset["angles"]
- self.frac_coords = dataset["frac_coords"]
- self.y = dataset["y"]
- self.num_bonds = dataset["num_bonds"]
- self.num_triplets = dataset["num_triplets"]
- self.sbf = dataset["sbf"]
- self.edge_attr = self.edge_j
- self.batch_size = batch_size
- self.index = 0
- self.step = 0
- self.shuffle_dataset = shuffle_dataset
- self.feature = [self.atom_types, self.dist, self.angle, self.idx_kj, self.idx_ji,
- self.edge_j, self.edge_i, self.pos, self.batch, self.lengths,
- self.num_atoms, self.angles, self.frac_coords, self.y,
- self.num_bonds, self.num_triplets, self.sbf]
-
-        # can be customized for a specific dataset
- self.label = self.num_atoms
- self.node_attr = self.atom_types
- self.sample_num = len(self.node_attr)
-
- self.max_start_sample = self.sample_num - self.batch_size + 1
-
- def get_dataset_size(self):
- return self.sample_num
-
- def __iter__(self):
- if self.shuffle_dataset:
- self.shuffle()
- else:
- self.restart()
- while self.index < self.max_start_sample:
-            # can be customized to generate different attributes or labels for a specific dataset
- num_bonds_step = self.gen_global_attr(
- self.num_bonds, self.batch_size).astype(np.int32)
- num_atoms_step = self.gen_global_attr(
- self.num_atoms, self.batch_size).squeeze().astype(np.int32)
- num_triplets_step = self.gen_global_attr(
- self.num_triplets, self.batch_size).astype(np.int32)
- atom_types_step = self.gen_node_attr(
- self.atom_types, self.batch_size).astype(np.int32)
- dist_step = self.gen_edge_attr(
- self.dist, self.batch_size).astype(np.float32)
- angle_step = self.gen_triplet_attr(
- self.angle, self.batch_size).astype(np.float32)
- idx_kj_step = self.gen_triplet_attr(self.idx_kj, self.batch_size)
- idx_kj_step = self.add_index_offset(
- idx_kj_step, num_bonds_step, num_triplets_step).astype(np.int32)
- idx_ji_step = self.gen_triplet_attr(self.idx_ji, self.batch_size)
- idx_ji_step = self.add_index_offset(
- idx_ji_step, num_bonds_step, num_triplets_step).astype(np.int32)
- edge_j_step = self.gen_edge_attr(self.edge_j, self.batch_size)
- edge_j_step = self.add_index_offset(
- edge_j_step, num_atoms_step, num_bonds_step).astype(np.int32)
-            edge_i_step = self.gen_edge_attr(self.edge_i, self.batch_size)
- edge_i_step = self.add_index_offset(
- edge_i_step, num_atoms_step, num_bonds_step).astype(np.int32)
- batch_step = np.repeat(
- np.arange(num_atoms_step.shape[0],), num_atoms_step, axis=0).astype(np.int32)
- lengths_step = self.gen_crystal_attr(
- self.lengths, self.batch_size).astype(np.float32)
- angles_step = self.gen_crystal_attr(
- self.angles, self.batch_size).astype(np.float32)
- frac_coords_step = self.gen_node_attr(
- self.frac_coords, self.batch_size).astype(np.float32)
- y_step = self.gen_global_attr(
- self.y, self.batch_size).astype(np.float32)
- sbf_step = self.gen_triplet_attr(
- self.sbf, self.batch_size).astype(np.float32)
- total_atoms = num_atoms_step.sum().item()
- self.add_step_index(self.batch_size)
-
- ############## change to mindspore Tensor #############
- yield self.np2tensor(atom_types_step, dist_step, angle_step, idx_kj_step,
- idx_ji_step, edge_j_step, edge_i_step, batch_step,
- lengths_step, num_atoms_step, angles_step, frac_coords_step,
- y_step, self.batch_size, sbf_step, total_atoms)
-
- def np2tensor(self, atom_types_step, dist_step, angle_step, idx_kj_step,
- idx_ji_step, edge_j_step, edge_i_step, batch_step,
- lengths_step, num_atoms_step, angles_step, frac_coords_step,
- y_step, batch_size, sbf_step, total_atoms):
- """np2tensor"""
- atom_types_step = Tensor(atom_types_step, ms.int32)
- dist_step = Tensor(dist_step, ms.float32)
- angle_step = Tensor(angle_step, ms.float32)
- idx_kj_step = Tensor(idx_kj_step, ms.int32)
- idx_ji_step = Tensor(idx_ji_step, ms.int32)
- edge_j_step = Tensor(edge_j_step, ms.int32)
- edge_i_step = Tensor(edge_i_step, ms.int32)
- batch_step = Tensor(batch_step, ms.int32)
- lengths_step = Tensor(lengths_step, ms.float32)
- num_atoms_step = Tensor(num_atoms_step, ms.int32)
- angles_step = Tensor(angles_step, ms.float32)
- frac_coords_step = Tensor(frac_coords_step, ms.float32)
- y_step = Tensor(y_step, ms.float32)
- sbf_step = Tensor(sbf_step, ms.float32)
- return (atom_types_step, dist_step, angle_step, idx_kj_step,
- idx_ji_step, edge_j_step, edge_i_step, batch_step,
- lengths_step, num_atoms_step, angles_step, frac_coords_step,
- y_step, batch_size, sbf_step, total_atoms)
-
- def add_index_offset(self, edge_index, num_atoms, num_bonds):
- index_offset = (
- np.cumsum(num_atoms, axis=0) - num_atoms
- )
-
- index_offset_expand = np.repeat(
- index_offset, num_bonds
- )
- edge_index += index_offset_expand
- return edge_index
-
- def shuffle_index(self):
- """shuffle_index"""
- indices = list(range(self.sample_num))
- random.shuffle(indices)
- return indices
-
- def shuffle(self):
- """shuffle"""
- self.shuffle_action()
- self.step = 0
- self.index = 0
-
- def shuffle_action(self):
- """shuffle_action"""
- indices = self.shuffle_index()
- self.atom_types = [self.atom_types[i] for i in indices]
- self.dist = [self.dist[i] for i in indices]
- self.angle = [self.angle[i] for i in indices]
- self.idx_kj = [self.idx_kj[i] for i in indices]
- self.idx_ji = [self.idx_ji[i] for i in indices]
- self.edge_j = [self.edge_j[i] for i in indices]
- self.edge_i = [self.edge_i[i] for i in indices]
- self.pos = [self.pos[i] for i in indices]
- self.batch = [self.batch[i] for i in indices]
- self.lengths = [self.lengths[i] for i in indices]
- self.num_atoms = [self.num_atoms[i] for i in indices]
- self.angles = [self.angles[i] for i in indices]
- self.frac_coords = [self.frac_coords[i] for i in indices]
- self.y = [self.y[i] for i in indices]
- self.num_bonds = [self.num_bonds[i] for i in indices]
- self.num_triplets = [self.num_triplets[i] for i in indices]
- self.sbf = [self.sbf[i] for i in indices]
-
- def restart(self):
- """restart"""
- self.step = 0
- self.index = 0
-
- def gen_node_attr(self, node_attr, batch_size):
- """gen_node_attr"""
- node_attr_step = np.concatenate(
- node_attr[self.index:self.index + batch_size], 0)
- return node_attr_step
-
- def gen_edge_attr(self, edge_attr, batch_size):
- """gen_edge_attr"""
- edge_attr_step = np.concatenate(
- edge_attr[self.index:self.index + batch_size], 0)
-
- return edge_attr_step
-
- def gen_global_attr(self, global_attr, batch_size):
- """gen_global_attr"""
- global_attr_step = np.stack(
- global_attr[self.index:self.index + batch_size], 0)
-
- return global_attr_step
-
- def gen_crystal_attr(self, global_attr, batch_size):
- """gen_global_attr"""
- global_attr_step = np.stack(
- global_attr[self.index:self.index + batch_size], 0).squeeze()
- return global_attr_step
-
- def gen_triplet_attr(self, triplet_attr, batch_size):
- """gen_triplet_attr"""
- global_attr_step = np.concatenate(
- triplet_attr[self.index:self.index + batch_size], 0)
-
- return global_attr_step
-
- def add_step_index(self, batch_size):
- """add_step_index"""
- self.index = self.index + batch_size
- self.step += 1
diff --git a/MindChemistry/applications/cdvae/src/evaluate_utils.py b/MindChemistry/applications/cdvae/src/evaluate_utils.py
deleted file mode 100644
index 236ececa4c6eb8f3f5c0d28bfef7de6b5d9b3328..0000000000000000000000000000000000000000
--- a/MindChemistry/applications/cdvae/src/evaluate_utils.py
+++ /dev/null
@@ -1,191 +0,0 @@
-# Copyright 2025 Huawei Technologies Co., Ltd
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ============================================================================
-"""evaluate_utils"""
-import logging
-import mindspore as ms
-import mindspore.mint as mint
-from mindspore.nn import Adam
-from tqdm import tqdm
-
-logging.basicConfig(format="%(levelname)s:%(message)s", level=logging.INFO)
-
-
-def get_reconstructon_res(loader, model, ld_kwargs, num_evals,
- force_num_atoms=False, force_atom_types=False):
- """
-    reconstruct the crystals in the given data loader.
- """
- result_frac_coords = []
- result_num_atoms = []
- result_atom_types = []
- result_lengths = []
- result_angles = []
- gt_frac_coords = []
- groundtruth_num_atoms = []
- groundtruth_atom_types = []
- gt_lengths = []
- gt_angles = []
- for idx, data in enumerate(loader):
- logging.info("Reconstructing %d", int(idx * data[-3]))
- batch_frac_coords, batch_num_atoms, batch_atom_types = [], [], []
- batch_lengths, batch_angles = [], []
-
-        # only sample one z; multiple evals capture the stochasticity of Langevin dynamics
- (atom_types, dist, _, idx_kj, idx_ji,
- edge_j, edge_i, batch, lengths, num_atoms,
- angles, frac_coords, _, batch_size, sbf,
- total_atoms) = data
- gt_frac_coords.append(frac_coords.asnumpy())
- gt_angles.append(angles.asnumpy())
- gt_lengths.append(lengths.asnumpy())
- groundtruth_atom_types.append(atom_types.asnumpy())
- groundtruth_num_atoms.append(num_atoms.asnumpy())
- _, _, z = model.encode(atom_types, dist,
- idx_kj, idx_ji, edge_j, edge_i,
- batch, total_atoms, batch_size, sbf)
- for _ in range(num_evals):
- gt_num_atoms = num_atoms if force_num_atoms else None
- gt_atom_types = atom_types if force_atom_types else None
- outputs = model.langevin_dynamics(
- z, ld_kwargs, batch_size, total_atoms, gt_num_atoms, gt_atom_types)
- # collect sampled crystals in this batch.
- batch_frac_coords.append(outputs["frac_coords"].asnumpy())
- batch_num_atoms.append(outputs["num_atoms"].asnumpy())
- batch_atom_types.append(outputs["atom_types"].asnumpy())
- batch_lengths.append(outputs["lengths"].asnumpy())
- batch_angles.append(outputs["angles"].asnumpy())
- # collect sampled crystals for this z.
- result_frac_coords.append(batch_frac_coords)
- result_num_atoms.append(batch_num_atoms)
- result_atom_types.append(batch_atom_types)
- result_lengths.append(batch_lengths)
- result_angles.append(batch_angles)
-
- return (
- result_frac_coords, result_num_atoms, result_atom_types,
- result_lengths, result_angles,
- gt_frac_coords, groundtruth_num_atoms, groundtruth_atom_types,
- gt_lengths, gt_angles)
-
-
-def get_generation_res(model, ld_kwargs, num_batches_to_sample, num_samples_per_z,
- batch_size=512, down_sample_traj_step=1):
- """
- generate new crystals based on randomly sampled z.
- """
- all_frac_coords_stack = []
- all_atom_types_stack = []
- result_frac_coords = []
- result_num_atoms = []
- result_atom_types = []
- result_lengths = []
- result_angles = []
-
- for _ in range(num_batches_to_sample):
- batch_all_frac_coords = []
- batch_all_atom_types = []
- batch_frac_coords, batch_num_atoms, batch_atom_types = [], [], []
- batch_lengths, batch_angles = [], []
-
- z = ms.ops.randn(batch_size, model.hidden_dim)
-
- for _ in range(num_samples_per_z):
- samples = model.langevin_dynamics(z, ld_kwargs, batch_size)
-
- # collect sampled crystals in this batch.
- batch_frac_coords.append(samples["frac_coords"].asnumpy())
- batch_num_atoms.append(samples["num_atoms"].asnumpy())
- batch_atom_types.append(samples["atom_types"].asnumpy())
- batch_lengths.append(samples["lengths"].asnumpy())
- batch_angles.append(samples["angles"].asnumpy())
- if ld_kwargs.save_traj:
- batch_all_frac_coords.append(
- samples["all_frac_coords"][::down_sample_traj_step].asnumpy())
- batch_all_atom_types.append(
- samples["all_atom_types"][::down_sample_traj_step].asnumpy())
-
- # collect sampled crystals for this z.
- result_frac_coords.append(batch_frac_coords)
- result_num_atoms.append(batch_num_atoms)
- result_atom_types.append(batch_atom_types)
- result_lengths.append(batch_lengths)
- result_angles.append(batch_angles)
- if ld_kwargs.save_traj:
- all_frac_coords_stack.append(
- batch_all_frac_coords)
- all_atom_types_stack.append(
- batch_all_atom_types)
-
- return (result_frac_coords, result_num_atoms, result_atom_types,
- result_lengths, result_angles,
- all_frac_coords_stack, all_atom_types_stack)
-
-
-def get_optimization_res(model, ld_kwargs, data_loader,
- num_starting_points=128, num_gradient_steps=5000,
- lr=1e-3, num_saved_crys=10):
- """
-    optimize the structure with respect to a specific property.
- """
- model.set_train(True)
- if data_loader is not None:
- data = next(iter(data_loader))
- (atom_types, dist, _, idx_kj, idx_ji,
- edge_j, edge_i, batch, _, num_atoms,
- _, _, _, batch_size, sbf,
- total_atoms) = data
- _, _, z = model.encode(atom_types, dist,
- idx_kj, idx_ji, edge_j, edge_i,
- batch, total_atoms, batch_size, sbf)
- z = mint.narrow(z, 0, 0, num_starting_points)
- z = ms.Parameter(z, requires_grad=True)
- else:
- z = mint.randn(num_starting_points, model.hparams.hidden_dim)
- z = ms.Parameter(z, requires_grad=True)
-
- opt = Adam([z], learning_rate=lr)
- freeze_model(model)
-
- loss_fn = model.fc_property
-
- def forward_fn(data):
- loss = loss_fn(data)
- return loss
- grad_fn = ms.value_and_grad(forward_fn, None, opt.parameters)
-
- def train_step(data):
- loss, grads = grad_fn(data)
- opt(grads)
- return loss
-
- all_crystals = []
- total_atoms = mint.sum(mint.narrow(
- num_atoms, 0, 0, num_starting_points)).item()
- interval = num_gradient_steps // (num_saved_crys - 1)
- for i in tqdm(range(num_gradient_steps)):
- loss = mint.mean(train_step(z))
- logging.info("Task opt step: %d, loss: %f", i, loss)
- if i % interval == 0 or i == (num_gradient_steps - 1):
- crystals = model.langevin_dynamics(
- z, ld_kwargs, batch_size, total_atoms)
- all_crystals.append(crystals)
- return {k: mint.cat([d[k] for d in all_crystals]).unsqueeze(0).asnumpy() for k in
- ["frac_coords", "atom_types", "num_atoms", "lengths", "angles"]}
-
-
-def freeze_model(model):
- """ The model is fixed, only optimize z"""
- for param in model.get_parameters():
- param.requires_grad = False
diff --git a/MindChemistry/applications/cdvae/src/metrics_utils.py b/MindChemistry/applications/cdvae/src/metrics_utils.py
deleted file mode 100644
index cf179ec1053b7e2776e2ff55b7d9247f4ed30cf9..0000000000000000000000000000000000000000
--- a/MindChemistry/applications/cdvae/src/metrics_utils.py
+++ /dev/null
@@ -1,191 +0,0 @@
-# Copyright 2025 Huawei Technologies Co., Ltd
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ============================================================================
-"""utils for compute metrics"""
-import itertools
-import numpy as np
-
-from scipy.spatial.distance import pdist
-from scipy.spatial.distance import cdist
-
-import smact
-from smact.screening import pauling_test
-
-from create_dataset import chemical_symbols
-
-
-def get_crystals_list(
- frac_coords, atom_types, lengths, angles, num_atoms):
- """
- args:
- frac_coords: (num_atoms, 3)
- atom_types: (num_atoms)
- lengths: (num_crystals)
- angles: (num_crystals)
- num_atoms: (num_crystals)
- """
- assert frac_coords.shape[0] == atom_types.shape[0] == num_atoms.sum()
- assert lengths.shape[0] == angles.shape[0] == num_atoms.shape[0]
-
- start_idx = 0
- crystal_array_list = []
- for batch_idx, num_atom in enumerate(num_atoms.tolist()):
- cur_frac_coords = frac_coords[start_idx:start_idx+num_atom]
- cur_atom_types = atom_types[start_idx:start_idx+num_atom]
- cur_lengths = lengths[batch_idx]
- cur_angles = angles[batch_idx]
-
- crystal_array_list.append({
- "frac_coords": cur_frac_coords,
- "atom_types": cur_atom_types,
- "lengths": cur_lengths,
- "angles": cur_angles,
- })
- start_idx = start_idx + num_atom
- return crystal_array_list
-
-
-def smact_validity(comp, count,
- use_pauling_test=True,
- include_alloys=True):
- """compute smact validity"""
- elem_symbols = tuple([chemical_symbols[elem] for elem in comp])
- space = smact.element_dictionary(elem_symbols)
- smact_elems = [e[1] for e in space.items()]
- electronegs = [e.pauling_eneg for e in smact_elems]
- ox_combos = [e.oxidation_states for e in smact_elems]
- if len(set(elem_symbols)) == 1:
- return True
- if include_alloys:
- is_metal_list = [elem_s in smact.metals for elem_s in elem_symbols]
- if all(is_metal_list):
- return True
-
- threshold = np.max(count)
- compositions = []
- for ox_states in itertools.product(*ox_combos):
- stoichs = [(c,) for c in count]
- # Test for charge balance
- cn_e, cn_r = smact.neutral_ratios(
- ox_states, stoichs=stoichs, threshold=threshold)
- # Electronegativity test
- if cn_e:
- if use_pauling_test:
- try:
- electroneg_pass = pauling_test(ox_states, electronegs)
- except TypeError:
- # if no electronegativity data, assume it is okay
- electroneg_pass = True
- else:
- electroneg_pass = True
- if electroneg_pass:
- for ratio in cn_r:
- compositions.append(
- tuple([elem_symbols, ox_states, ratio]))
- compositions = [(i[0], i[2]) for i in compositions]
- compositions = list(set(compositions))
- res = bool(compositions)
- return res
-
-
-def structure_validity(crystal, cutoff=0.5):
- """compute structure validity"""
- dist_mat = crystal.distance_matrix
- # Pad diagonal with a large number
- dist_mat = dist_mat + np.diag(
- np.ones(dist_mat.shape[0]) * (cutoff + 10.))
- res = None
- if dist_mat.min() < cutoff or crystal.volume < 0.1:
- res = False
- else:
- res = True
- return res
-
-
-def get_fp_pdist(fp_array):
- if isinstance(fp_array, list):
- fp_array = np.array(fp_array)
- fp_pdists = pdist(fp_array)
- return fp_pdists.mean()
-
-
-def filter_fps(struc_fps, comp_fps):
- assert len(struc_fps) == len(comp_fps)
-
- filtered_struc_fps, filtered_comp_fps = [], []
-
- for struc_fp, comp_fp in zip(struc_fps, comp_fps):
- if struc_fp is not None and comp_fp is not None:
- filtered_struc_fps.append(struc_fp)
- filtered_comp_fps.append(comp_fp)
- return filtered_struc_fps, filtered_comp_fps
-
-
-def compute_cov(crys, gt_crys, comp_scaler,
- struc_cutoff, comp_cutoff, num_gen_crystals=None):
- """compute COV"""
- struc_fps = [c.struct_fp for c in crys]
- comp_fps = [c.comp_fp for c in crys]
- gt_struc_fps = [c.struct_fp for c in gt_crys]
- gt_comp_fps = [c.comp_fp for c in gt_crys]
-
- assert len(struc_fps) == len(comp_fps)
- assert len(gt_struc_fps) == len(gt_comp_fps)
-
-    # Use the number of crystals before filtering to compute COV
- if num_gen_crystals is None:
- num_gen_crystals = len(struc_fps)
-
- struc_fps, comp_fps = filter_fps(struc_fps, comp_fps)
-
- comp_fps = comp_scaler.transform(comp_fps)
- gt_comp_fps = comp_scaler.transform(gt_comp_fps)
-
- struc_fps = np.array(struc_fps)
- gt_struc_fps = np.array(gt_struc_fps)
- comp_fps = np.array(comp_fps)
- gt_comp_fps = np.array(gt_comp_fps)
-
- struc_pdist = cdist(struc_fps, gt_struc_fps)
- comp_pdist = cdist(comp_fps, gt_comp_fps)
-
- struc_recall_dist = struc_pdist.min(axis=0)
- struc_precision_dist = struc_pdist.min(axis=1)
- comp_recall_dist = comp_pdist.min(axis=0)
- comp_precision_dist = comp_pdist.min(axis=1)
-
- cov_recall = np.mean(np.logical_and(
- struc_recall_dist <= struc_cutoff,
- comp_recall_dist <= comp_cutoff))
- cov_precision = np.sum(np.logical_and(
- struc_precision_dist <= struc_cutoff,
- comp_precision_dist <= comp_cutoff)) / num_gen_crystals
-
- metrics_dict = {
- "cov_recall": cov_recall,
- "cov_precision": cov_precision,
- "amsd_recall": np.mean(struc_recall_dist),
- "amsd_precision": np.mean(struc_precision_dist),
- "amcd_recall": np.mean(comp_recall_dist),
- "amcd_precision": np.mean(comp_precision_dist),
- }
-
- combined_dist_dict = {
- "struc_recall_dist": struc_recall_dist.tolist(),
- "struc_precision_dist": struc_precision_dist.tolist(),
- "comp_recall_dist": comp_recall_dist.tolist(),
- "comp_precision_dist": comp_precision_dist.tolist(),
- }
-
- return metrics_dict, combined_dist_dict
diff --git a/MindChemistry/applications/cdvae/train.py b/MindChemistry/applications/cdvae/train.py
deleted file mode 100644
index 1b73927f594f602817db32fa1986c68afebce833..0000000000000000000000000000000000000000
--- a/MindChemistry/applications/cdvae/train.py
+++ /dev/null
@@ -1,185 +0,0 @@
-# Copyright 2025 Huawei Technologies Co., Ltd
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ============================================================================
-"""Train
-"""
-
-import os
-import logging
-import argparse
-import time
-import numpy as np
-import mindspore as ms
-from mindspore.experimental import optim
-from mindchemistry.utils.load_config import load_yaml_config_from_path
-from mindchemistry.cell.cdvae import CDVAE
-from mindchemistry.cell.gemnet.data_utils import StandardScalerMindspore
-from create_dataset import create_dataset
-from src.dataloader import DataLoaderBaseCDVAE
-
-
-def train_epoch(epoch, model, optimizer, scheduler, train_dataset):
- """Train the model for one epoch"""
- model.set_train()
- # Define forward function
-
- def forward_fn(data):
- (atom_types, dist, _, idx_kj, idx_ji,
- edge_j, edge_i, batch, lengths, num_atoms,
- angles, frac_coords, y, batch_size, sbf, total_atoms) = data
- loss = model(atom_types, dist, idx_kj, idx_ji, edge_j, edge_i,
- batch, lengths, num_atoms, angles, frac_coords,
- y, batch_size, sbf, total_atoms, True, True)
- return loss
- # Get gradient function
- grad_fn = ms.value_and_grad(
- forward_fn, None, optimizer.parameters, has_aux=False)
-
- # Define function of one-step training
- def train_step(data):
- loss, grads = grad_fn(data)
- scheduler.step(loss)
- optimizer(grads)
- return loss
-
- start_time_step = time.time()
- for batch, data in enumerate(train_dataset):
- loss = train_step(data)
- time_step = time.time() - start_time_step
- start_time_step = time.time()
- if batch % 10 == 0:
- logging.info("Train Epoch: %d [%d]\tLoss: %4f,\t time_step: %4f",
- epoch, batch, loss, time_step)
-
-
-def test_epoch(model, val_dataset):
- """test for one epoch"""
- model.set_train(False)
- test_loss = 0
- i = 1
- for i, data in enumerate(val_dataset):
- (atom_types, dist, _, idx_kj, idx_ji,
- edge_j, edge_i, batch, lengths, num_atoms,
- angles, frac_coords, y, batch_size, sbf, total_atoms) = data
- output = model(atom_types, dist,
- idx_kj, idx_ji, edge_j, edge_i,
- batch, lengths, num_atoms,
- angles, frac_coords, y, batch_size,
- sbf, total_atoms, False, True)
- test_loss += float(output)
- test_loss /= (i+1)
- logging.info("Val Loss: %4f", test_loss)
- return test_loss
-
-def get_scaler(args):
- """get scaler"""
- lattice_scaler_mean = ms.Tensor(np.loadtxt(
- f"./data/{args.dataset}/train/lattice_scaler_mean.csv"), ms.float32)
- lattice_scaler_std = ms.Tensor(np.loadtxt(
- f"./data/{args.dataset}/train/lattice_scaler_std.csv"), ms.float32)
- scaler_std = ms.Tensor(np.loadtxt(
- f"./data/{args.dataset}/train/scaler_std.csv"), ms.float32)
- scaler_mean = ms.Tensor(np.loadtxt(
- f"./data/{args.dataset}/train/scaler_mean.csv"), ms.float32)
- lattice_scaler = StandardScalerMindspore(
- lattice_scaler_mean, lattice_scaler_std)
- scaler = StandardScalerMindspore(scaler_mean, scaler_std)
- return lattice_scaler, scaler
-
-def train_net(args):
- """training process"""
- folder_path = os.path.dirname(args.name_ckpt)
- if not os.path.exists(folder_path):
- os.makedirs(folder_path)
- logging.info("%s has been created", folder_path)
- config_path = "./conf/configs.yaml"
- data_config_path = f"./conf/data/{args.dataset}.yaml"
-
- model = CDVAE(config_path, data_config_path)
-
- # load checkpoint
- if args.load_ckpt:
- model_path = args.name_ckpt
- param_dict = ms.load_checkpoint(model_path)
- param_not_load, _ = ms.load_param_into_net(model, param_dict)
- logging.info("%s have not been loaded", param_not_load)
-
-    # create the dataset on the first run or when it does not exist yet
- if args.create_dataset or not os.path.exists(f"./data/{args.dataset}/train/processed_data.npy"):
- logging.info("Creating dataset......")
-        create_dataset(args)  # the created dataset is saved as .npy under the directory named by args.dataset
-
- # read dataset from processed_data
- batch_size = load_yaml_config_from_path(data_config_path).get("batch_size")
- train_dataset = DataLoaderBaseCDVAE(
- batch_size, args.dataset, shuffle_dataset=True, mode="train")
- val_dataset = DataLoaderBaseCDVAE(
- batch_size, args.dataset, shuffle_dataset=False, mode="val")
- lattice_scaler, scaler = get_scaler(args)
- model.lattice_scaler = lattice_scaler
- model.scaler = scaler
-
- config_opt = load_yaml_config_from_path(config_path).get("Optimizer")
- learning_rate = config_opt.get("learning_rate")
- min_lr = config_opt.get("min_lr")
- factor = config_opt.get("factor")
- patience = config_opt.get("patience")
-
- optimizer = optim.Adam(model.trainable_params(), learning_rate)
- scheduler = optim.lr_scheduler.ReduceLROnPlateau(
- optimizer, 'min', factor=factor, patience=patience, min_lr=min_lr)
-
- min_test_loss = float("inf")
- for epoch in range(args.epoch_num):
- train_epoch(epoch, model, optimizer, scheduler, train_dataset)
- if epoch % 10 == 0:
- test_loss = test_epoch(model, val_dataset)
- if test_loss < min_test_loss:
- min_test_loss = test_loss
- ms.save_checkpoint(model, args.name_ckpt)
- logging.info("Updata best acc: %f", test_loss)
-
-    logging.info("Finished Training")
-
-def get_args():
- """get args"""
- parser = argparse.ArgumentParser()
- parser.add_argument("--dataset", default="perov_5", help="dataset name")
- parser.add_argument("--create_dataset", action="store_true",
- help="whether to create the dataset again")
- parser.add_argument("--num_samples_train", default=500, type=int,
- help="number of samples for training,\
- only valid when create_dataset is True")
- parser.add_argument("--num_samples_val", default=300, type=int,
- help="number of samples for validation,\
- only valid when create_dataset is True")
- parser.add_argument("--num_samples_test", default=300, type=int,
- help="number of samples for test,\
- only valid when create_dataset is True")
- parser.add_argument("--name_ckpt", default="./loss/loss.ckpt",
- help="the path to save checkpoint")
- parser.add_argument("--load_ckpt", action="store_true",
- help="whether to load a checkpoint")
- parser.add_argument("--device_target", default="Ascend", help="device target")
- parser.add_argument("--device_id", default=3, type=int, help="device id")
- parser.add_argument("--epoch_num", default=100, type=int, help="number of epoch")
- return parser.parse_args()
-
-if __name__ == "__main__":
- main_args = get_args()
- logging.basicConfig(format="%(levelname)s:%(message)s", level=logging.INFO)
- ms.context.set_context(device_target=main_args.device_target,
- device_id=main_args.device_id,
- mode=ms.PYNATIVE_MODE)
- train_net(main_args)
diff --git a/MindChemistry/docs/high-alloy.png b/MindChemistry/docs/high-alloy.png
deleted file mode 100644
index ce2e0c7e1bac9ff4a34bc0475f797a6792eac6a1..0000000000000000000000000000000000000000
Binary files a/MindChemistry/docs/high-alloy.png and /dev/null differ
diff --git a/MindChemistry/docs/high-alloy_cn.png b/MindChemistry/docs/high-alloy_cn.png
deleted file mode 100644
index e32404974ee5f242b5c64e506aae655c351f2929..0000000000000000000000000000000000000000
Binary files a/MindChemistry/docs/high-alloy_cn.png and /dev/null differ
diff --git a/MindChemistry/docs/mindchemistry_arch.png b/MindChemistry/docs/mindchemistry_arch.png
deleted file mode 100644
index fae4dfcc250fcdaf33b44a283779889c382cc45d..0000000000000000000000000000000000000000
Binary files a/MindChemistry/docs/mindchemistry_arch.png and /dev/null differ
diff --git a/MindChemistry/docs/mindchemistry_archi.png b/MindChemistry/docs/mindchemistry_archi.png
new file mode 100644
index 0000000000000000000000000000000000000000..4cff59cf72de047b1e83c1ad5ca774b7e1eea0ff
Binary files /dev/null and b/MindChemistry/docs/mindchemistry_archi.png differ
diff --git a/MindChemistry/docs/mindchemistry_archi_cn.png b/MindChemistry/docs/mindchemistry_archi_cn.png
index 52b48ff4b6f99a59285fb8addc6ea99a8afd21cc..400d47732af1d585680669cdb019182774fd83c4 100644
Binary files a/MindChemistry/docs/mindchemistry_archi_cn.png and b/MindChemistry/docs/mindchemistry_archi_cn.png differ
diff --git a/MindChemistry/mindchemistry/cell/__init__.py b/MindChemistry/mindchemistry/cell/__init__.py
index 005d0869515d77c92ab29b69b8e693912a4ae9ac..f92153c675690f6e66162a99535340533a424760 100644
--- a/MindChemistry/mindchemistry/cell/__init__.py
+++ b/MindChemistry/mindchemistry/cell/__init__.py
@@ -19,7 +19,6 @@ from .cspnet import CSPNet
from .basic_block import AutoEncoder, FCNet, MLPNet
from .deephe3nn import *
from .matformer import *
-from .cdvae import *
from .dimenet import *
from .gemnet import *
@@ -29,6 +28,5 @@ __all__ = [
__all__.extend(deephe3nn.__all__)
__all__.extend(matformer.__all__)
__all__.extend(allegro.__all__)
-__all__.extend(cdvae.__all__)
__all__.extend(dimenet.__all__)
__all__.extend(gemnet.__all__)
diff --git a/MindChemistry/mindchemistry/cell/cdvae/__init__.py b/MindChemistry/mindchemistry/cell/cdvae/__init__.py
deleted file mode 100644
index aa4a87f4c7eef2d8399270ad740ac00b41694fd1..0000000000000000000000000000000000000000
--- a/MindChemistry/mindchemistry/cell/cdvae/__init__.py
+++ /dev/null
@@ -1,19 +0,0 @@
-# Copyright 2024 Huawei Technologies Co., Ltd
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ============================================================================
-"""modules"""
-
-from .cdvae import CDVAE
-
-__all__ = ["CDVAE"]
diff --git a/MindChemistry/mindchemistry/cell/cdvae/cdvae.py b/MindChemistry/mindchemistry/cell/cdvae/cdvae.py
deleted file mode 100644
index 7aac68bddb598bb59431ab2159c4eae0e1246091..0000000000000000000000000000000000000000
--- a/MindChemistry/mindchemistry/cell/cdvae/cdvae.py
+++ /dev/null
@@ -1,777 +0,0 @@
-# Copyright 2024 Huawei Technologies Co., Ltd
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ============================================================================
-"""CDVAE model
-"""
-
-import numpy as np
-import mindspore as ms
-import mindspore.mint as mint
-from tqdm import tqdm
-from ...utils.load_config import load_yaml_config_from_path
-from ...graph.graph import AggregateNodeToGlobal, LiftGlobalToNode
-from ..dimenet import DimeNetPlusPlus
-from ..gemnet.data_utils import (
- cart_to_frac_coords, frac_to_cart_coords, min_distance_sqr_pbc,
- cart_to_frac_coords_numpy, frac_to_cart_coords_numpy)
-from ..gemnet.layers.embedding_block import MAX_ATOMIC_NUM
-from ..gemnet.layers.base_layers import MLP
-from ..gemnet.preprocess import GemNetPreprocess
-from .decoder import GemNetTDecoder
-
-
-class CDVAE(ms.nn.Cell):
- r"""
- CDVAE Model
-
- Args:
- config_path (str): Path to the config file.
- data_config_path (str): Path to the data config file.
-
- Inputs:
- - **atom_types** (Tensor) - The shape of tensor is :math:`(total\_atoms,)`.
- - **dist** (Tensor) - The shape of tensor is :math:`(total\_edges,)`.
- - **idx_kj** (Tensor) - The index of the first edge in the triplets.
- The shape of tensor is :math:`(total\_triplets,)`.
- - **idx_ji** (Tensor) - The index of the second edge in the triplets.
- The shape of tensor is :math:`(total\_triplets,)`.
- - **edge_j** (Tensor) - The index of the first atom of the edges.
- The shape of tensor is :math:`(total\_edges,)`.
- - **edge_i** (Tensor) - The index of the second atom of the edges.
- The shape of tensor is :math:`(total\_edges,)`.
- - **batch** (Tensor) - The shape of Tensor is :math:`(total\_atoms,)`.
- - **length** (Tensor) - The lattice constant of each crystal. The shape of Tensor is :math:`(batch\_size, 3)`.
- - **angles** (Tensor) - The lattice angle of each crystal. The shape of Tensor is :math:`(batch\_size, 3)`.
- - **num_atoms** (Tensor) - Number of atoms in each crystal. The shape of Tensor is :math:`(batch\_size,)`.
- - **frac_coords** (Tensor) - Fractional coordinates of each atom. The shape of Tensor is :math:`(total\_atoms,3)`.
- - **y** (Tensor) - Target property of each crystal. The shape of Tensor is :math:`(batch\_size,)`.
- - **batch_size** (int) - Batch size.
- - **sbf** (Tensor) - The shape of Tensor is :math:`(total\_triplets, num\_spherical * num\_radial)`.
- - **total_atoms** (int) - Total number of atoms.
- - **teacher_forcing** (bool) - Whether to use teacher forcing.
- - **training** (bool) - Whether the model is in training mode.
-
- Outputs:
- - **loss** (Tensor) - Scalar loss.
-
- Raises:
- TypeError: If predict_property is not bool.
- TypeError: If teacher_forcing_lattice is not bool.
- ValueError: If lattice_scale_method is not 'scale_length'.
-
- Supported Platforms:
- ``Ascend``
-
- Examples:
- >>> import numpy as np
- >>> import mindspore as ms
- >>> from mindspore import context, Tensor
- >>> from mindchemistry.cell import CDVAE
- >>> from mindchemistry.cell.cdvae.data_utils import StandardScalerMindspore
- >>> os.environ["MS_JIT_MODULES"] = "mindchemistry"
- >>> context.set_context(mode=context.PYNATIVE_MODE)
- >>> config_path = "./configs.yaml"
- >>> data_config_path = "./perov_5.yaml"
- >>> cdvae_model = CDVAE(config_path, data_config_path)
- >>> # input data
- >>> batch_size = 2
- >>> atom_types = Tensor([6, 7, 6, 8], ms.int32)
- >>> dist = Tensor([1.4, 1.7, 1.8, 1.9, 2.0, 2.1, 1.8, 1.6], ms.float32)
- >>> idx_kj = Tensor([0, 1, 2, 3, 4, 5, 5, 4, 3, 2, 1, 0, 7, 6, 6, 7], ms.int32)
- >>> idx_ji = Tensor([1, 0, 3, 2, 5, 4, 4, 5, 2, 3, 0, 1, 6, 7, 7, 6], ms.int32)
- >>> edge_j = Tensor([0, 1, 1, 0, 2, 3, 3, 2], ms.int32)
- >>> edge_i = Tensor([1, 0, 0, 1, 3, 2, 2, 3], ms.int32)
- >>> batch = Tensor([0, 0, 1, 1], ms.int32)
- >>> lengths = Tensor([[2.5, 2.5, 2.5],
- ... [2.5, 2.5, 2.5]], ms.float32)
- >>> angles = Tensor([[90.0, 90.0, 90.0],
- ... [90.0, 90.0, 90.0]], ms.float32)
- >>> frac_coords = Tensor([[0.0, 0.0, 0.0],
- ... [0.5, 0.5, 0.5],
- ... [0.7, 0.7, 0.7],
- ... [0.5, 0.5, 0.5]], ms.float32)
- >>> num_atoms = Tensor([2, 2], ms.int32)
- >>> y = Tensor([0.08428, 0.01353], ms.float32)
- >>> total_atoms = 4
- >>> sbf = Tensor(np.random.randn(16, 42), ms.float32)
- >>> cdvae_model.lattice_scaler = StandardScalerMindspore(
- ... Tensor([2.5, 2.5, 2.5, 90.0, 90.0, 90.0], ms.float32),
- ... Tensor([1.0, 1.0, 1.0, 1.0, 1.0, 1.0], ms.float32))
- >>> cdvae_model.scaler = StandardScalerMindspore(
- ... Tensor([2.62], ms.float32),
- ... Tensor([1.0], ms.float32))
- >>> out = cdvae_model(atom_types, dist,
- ... idx_kj, idx_ji, edge_j, edge_i,
- ... batch, lengths, num_atoms,
- ... angles, frac_coords, y, batch_size,
- ... sbf, total_atoms, False, True)
- >>> print("out:", out)
- out: 27.780727
- """
-
- def __init__(self, config_path, data_config_path):
- super().__init__()
- self.configs = load_yaml_config_from_path(config_path)
- decoder_configs = self.configs.get("Decoder")
- encoder_configs = self.configs.get("Encoder")
- data_configs = load_yaml_config_from_path(data_config_path)
- self.latent_dim = self.configs.get("latent_dim")
- self.hidden_dim = self.configs.get("hidden_dim")
- self.fc_num_layers = self.configs.get("fc_num_layers")
- if isinstance(self.configs.get("predict_property"), bool):
- self.set_predict_property = self.configs.get("predict_property")
- else:
- raise TypeError("predict_property should be bool.")
- self.sigma_begin = self.configs.get("sigma_begin")
- self.sigma_end = self.configs.get("sigma_end")
- self.num_noise_level = self.configs.get("num_noise_level")
- self.type_sigma_begin = self.configs.get("type_sigma_begin")
- self.type_sigma_end = self.configs.get("type_sigma_end")
- if isinstance(self.configs.get("teacher_forcing_lattice"), bool):
- self.teacher_forcing_lattice = self.configs.get(
- "teacher_forcing_lattice")
- else:
- raise TypeError("teacher_forcing_lattice should be bool.")
- if data_configs.get("lattice_scale_method") in ["scale_length"]:
- self.lattice_scale_method = data_configs.get(
- "lattice_scale_method")
- else:
- raise ValueError(
- "For lattice scale method, supported methods: 'scale_length'.")
- self.max_atoms = data_configs.get("max_atoms")
-
- self.encoder = DimeNetPlusPlus(
- num_targets=self.latent_dim,
- hidden_channels=encoder_configs.get("hidden_channels"),
- num_blocks=encoder_configs.get("num_blocks"),
- int_emb_size=encoder_configs.get("int_emb_size"),
- basis_emb_size=encoder_configs.get("basis_emb_size"),
- out_emb_channels=encoder_configs.get("out_emb_channels"),
- num_spherical=encoder_configs.get("num_spherical"),
- num_radial=encoder_configs.get("num_radial"),
- cutoff=encoder_configs.get("cutoff"),
- envelope_exponent=encoder_configs.get("envelope_exponent"),
- num_before_skip=encoder_configs.get("num_before_skip"),
- num_after_skip=encoder_configs.get("num_after_skip"),
- num_output_layers=encoder_configs.get("num_output_layers"),
- readout=data_configs.get("readout"))
-
- self.decoder = GemNetTDecoder(
- hidden_dim=decoder_configs.get("hidden_dim"),
- latent_dim=self.configs.get("latent_dim"),
- max_neighbors=self.configs.get("max_neighbors"),
- radius=self.configs.get("radius"),
- config_path=config_path
- )
- self.fc_mu = mint.nn.Linear(self.latent_dim, self.latent_dim)
- self.fc_var = mint.nn.Linear(self.latent_dim, self.latent_dim)
- self.fc_num_atoms = MLP(self.latent_dim, self.hidden_dim,
- self.fc_num_layers, self.max_atoms + 1,
- activation='ReLU')
- self.fc_lattice = MLP(self.latent_dim, self.hidden_dim,
- self.fc_num_layers, 6,
- activation='ReLU')
- self.max_atomic_num = MAX_ATOMIC_NUM
- self.fc_composition = MLP(self.latent_dim, self.hidden_dim,
- self.fc_num_layers, self.max_atomic_num,
- activation='ReLU')
- # for property prediction.
- if self.set_predict_property:
- self.fc_property = MLP(self.latent_dim, self.hidden_dim,
- self.fc_num_layers, 1,
- activation='ReLU')
- self.sigmas = np.exp(np.linspace(
- np.log(self.sigma_begin),
- np.log(self.sigma_end),
- self.num_noise_level))
-
- self.type_sigmas = np.exp(np.linspace(
- np.log(self.type_sigma_begin),
- np.log(self.type_sigma_end),
- self.num_noise_level))
-
- # obtain from datamodule.
- self.lattice_scaler = None
- self.scaler = None
- self.decoder_preprocess = GemNetPreprocess(otf_graph=True)
- self.aggregate_mean = AggregateNodeToGlobal(mode="mean")
- self.lift_global = LiftGlobalToNode()
-
- def reparameterize(self, mu, logvar):
- r"""
- Reparameterization trick to sample from N(mu, var) using N(0, 1).
-
- Args:
- mu (Tensor): Mean of the latent Gaussian.
- The shape of tensor is :math:`(batch\_size, latent\_dim)`.
- logvar (Tensor): Log variance of the latent Gaussian.
- The shape of tensor is :math:`(batch\_size, latent\_dim)`.
-
- Returns:
- (Tensor) Randomly generated latent parameter.
- The shape of tensor is :math:`(batch\_size, latent\_dim)`.
- """
-
- std = mint.exp(mint.mul(0.5, logvar))
- eps = ms.Tensor(np.random.randn(
- std.shape[0], std.shape[1]), ms.float32)
- return eps * std + mu
-
- def encode(self, atom_types, dist, idx_kj, idx_ji, edge_j, edge_i, batch, total_atoms, batch_size, sbf):
- r"""
- encode crystal structures to latents.
-
- Args:
- atom_types (Tensor): Atom types of each atom.
- The shape of tensor is :math:`(total\_atoms,)`.
- dist (Tensor): Distance between atoms.
- The shape of tensor is :math:`(total\_edges,)`.
- idx_kj (Tensor): The index of the first edge in the triplets.
- The shape of tensor is :math:`(total\_triplets,)`.
- idx_ji (Tensor): The index of the second edge in the triplets.
- The shape of tensor is :math:`(total\_triplets,)`.
- edge_j (Tensor): The index of the first atom of the edges.
- The shape of tensor is :math:`(total\_edges,)`.
- edge_i (Tensor): The index of the second atom of the edges.
- The shape of tensor is :math:`(total\_edges,)`.
- batch (Tensor): The shape of tensor is :math:`(total\_atoms,)`.
- total_atoms (int): Total atoms.
- batch_size (int): Batch size.
- sbf (Tensor): The shape of tensor is :math:`(total\_triplets, num\_spherical * num\_radial)`.
-
- Returns:
- mu (Tensor): Mean of the latent Gaussian.
- The shape of tensor is :math:`(batch\_size, latent\_dim)`.
- log_var (Tensor): Log variance of the latent Gaussian.
- The shape of tensor is :math:`(batch\_size, latent\_dim)`.
- z (Tensor): Randomly generated latent parameter.
- The shape of tensor is :math:`(batch\_size, latent\_dim)`.
- """
- hidden = self.encoder(atom_types, dist, idx_kj, idx_ji, edge_j, edge_i,
- batch, total_atoms, batch_size, sbf)
- mu = self.fc_mu(hidden)
- log_var = self.fc_var(hidden)
- z = self.reparameterize(mu, log_var)
- return mu, log_var, z
-
- def decode_stats(self, z, batch=None, gt_num_atoms=None, gt_lengths=None, gt_angles=None,
- teacher_forcing=False):
- r"""
- Decode key statistics from latent embeddings.
-
- This method decodes key statistics from the given latent embeddings.
-
- Args:
- z (Tensor): Randomly generated latent parameter.
- The shape of the tensor is :math:`(batch\_size, latent\_dim)`.
- batch (Tensor, optional): The shape of the tensor is :math:`(total\_atoms,)`.
- gt_num_atoms (Tensor, optional): Ground truth number of atoms.
- gt_lengths (Tensor, optional): Ground truth lattice constant.
- gt_angles (Tensor, optional): Ground truth lattice angle.
- teacher_forcing (bool): If `True`, teacher forcing is used during training;
- otherwise, `False`. Default is `False`.
-
- Returns:
- tuple of Tensor including: num_atoms, lengths_and_angles, lengths, angles,
- composition_per_atom, z_per_atom and batch.
- num_atoms (Tensor): The predicted number of atoms. The shape of the tensor is :math:`(batch\_size,)`.
- lengths_and_angles (Tensor): The predicted lattice constants and angles.
- The shape of the tensor is :math:`(batch\_size, 6)`.
- lengths (Tensor): The predicted lattice constants. The shape of the tensor is :math:`(batch\_size, 3)`.
- angles (Tensor): The predicted lattice angles. The shape of the tensor is :math:`(batch\_size, 3)`.
- composition_per_atom (Tensor): The predicted composition per atom.
- The shape of the tensor is :math:`(total\_atoms, max\_atomic\_num)`.
- z_per_atom (Tensor): The lifted global latent embeddings per atom.
- The shape of the tensor is :math:`(total\_atoms, latent\_dim)`.
- batch (Tensor): The batch tensor indexing which sample each node belongs to.
- The shape of the tensor is :math:`(total\_atoms,)`.
- """
- if gt_num_atoms is not None:
- num_atoms = self.predict_num_atoms(z)
- lengths_and_angles, lengths, angles = (
- self.predict_lattice(z, gt_num_atoms))
- assert batch is not None
- z_per_atom = self.lift_global(z, batch)
- composition_per_atom = self.predict_composition(z_per_atom)
- if self.teacher_forcing_lattice and teacher_forcing:
- lengths = gt_lengths
- angles = gt_angles
- else:
- num_atoms = mint.argmax(
- self.predict_num_atoms(z), dim=-1).astype(ms.int32)
- lengths_and_angles, lengths, angles = (
- self.predict_lattice(z, num_atoms))
- batch = ms.ops.repeat_interleave(mint.arange(
- num_atoms.shape[0], dtype=ms.int32), num_atoms)
- z_per_atom = self.lift_global(z, batch)
- composition_per_atom = self.predict_composition(z_per_atom)
- # set the max and min values for lengths and angles.
- angles = mint.clamp(angles, -180, 180)
- lengths = mint.clamp(lengths, 0.5, 20)
- return num_atoms, lengths_and_angles, lengths, angles, composition_per_atom, z_per_atom, batch
-
- def langevin_dynamics(self, z, ld_kwargs, batch_size, total_atoms=None, gt_num_atoms=None, gt_atom_types=None):
- r"""
- Decode crystal structures from latent embeddings.
-
- Args:
- ld_kwargs: arguments for annealed Langevin dynamics sampling:
- n_step_each (int): number of steps at each sigma level.
- step_lr (float): step size parameter.
- min_sigma (float): minimum sigma to use in annealed Langevin dynamics.
- save_traj (bool): if True, save the entire LD trajectory.
- disable_bar (bool): disable the Langevin dynamics progress bar.
- gt_num_atoms (Tensor, optional): if not None, use the ground truth number of atoms.
- gt_atom_types (Tensor, optional): if not None, use the ground truth atom types.
-
- Returns:
- (dict): Including num_atoms, lengths, angles, frac_coords, atom_types, is_traj.
- num_atoms (Tensor): Number of atoms in each crystal.
- The shape of tensor is :math:`(batch\_size,)`.
- lengths (Tensor): Lattice constant of each crystal.
- The shape of tensor is :math:`(batch\_size, 3)`.
- angles (Tensor): Lattice angle of each crystal.
- The shape of tensor is :math:`(batch\_size, 3)`.
- frac_coords (Tensor): Fractional coordinates of each atom.
- The shape of tensor is :math:`(total\_atoms, 3)`.
- atom_types (Tensor): Atom types of each atom.
- The shape of tensor is :math:`(total\_atoms,)`.
- is_traj (bool): If True, save the entire LD trajectory.
- """
-
- if ld_kwargs.save_traj:
- all_frac_coords = []
- all_pred_cart_coord_diff = []
- all_noise_cart = []
- all_atom_types = []
- # obtain key stats.
- num_atoms, _, lengths, angles, composition_per_atom, z_per_atom, batch = self.decode_stats(
- z, gt_num_atoms)
- if gt_num_atoms is not None:
- num_atoms = gt_num_atoms
- else:
- total_atoms = num_atoms.sum().item()
- # obtain atom types.
- composition_per_atom = mint.softmax(composition_per_atom, dim=-1)
- if gt_atom_types is None:
- cur_atom_types = self.sample_composition(
- composition_per_atom, num_atoms, batch, batch_size, total_atoms)
- else:
- cur_atom_types = gt_atom_types
- # init coords.
- cur_frac_coords = np.random.rand(total_atoms, 3)
-
- # annealed langevin dynamics.
- for sigma in tqdm(self.sigmas, total=self.sigmas.shape[0], disable=ld_kwargs.disable_bar, position=0):
- if sigma < ld_kwargs.min_sigma:
- break
- step_size = ld_kwargs.step_lr * (sigma / self.sigmas[-1]) ** 2
- step_size_ms = ms.Tensor(step_size, ms.float32)
-
- for _ in range(ld_kwargs.n_step_each):
- noise_cart = np.random.randn(cur_frac_coords.shape[0],
- cur_frac_coords.shape[1]) * np.sqrt(step_size * 2)
- noise_cart = ms.Tensor(noise_cart, ms.float32)
- (_, idx_s, idx_t, id3_ca, id3_ba,
- id3_ragged_idx, id3_ragged_idx_max, _, d_st, v_st,
- id_swap, y_l_m) = self.decoder_preprocess.graph_generation(
- cur_frac_coords, num_atoms.asnumpy(),
- lengths.asnumpy(), angles.asnumpy(),
- edge_index=None,
- to_jimages=None,
- num_bonds=None)
- cur_frac_coords = ms.Tensor(cur_frac_coords, ms.float32)
- batch = ms.ops.repeat_interleave(
- mint.arange(num_atoms.shape[0]), num_atoms, 0)
-
- #### decoder ####
- pred_cart_coord_diff, pred_atom_types = self.decoder(
- cur_atom_types, idx_s, idx_t, id3_ca, id3_ba, id3_ragged_idx, id3_ragged_idx_max,
- y_l_m, d_st, v_st, id_swap, batch, z_per_atom, total_atoms, batch_size)
-
- cur_cart_coords = frac_to_cart_coords(
- cur_frac_coords, lengths, angles, batch, self.lift_global)
- pred_cart_coord_diff = mint.div(
- pred_cart_coord_diff, ms.Tensor(sigma, ms.float32))
- cur_cart_coords = cur_cart_coords + \
- mint.mul(step_size_ms, pred_cart_coord_diff) + noise_cart
- cur_frac_coords = cart_to_frac_coords(
- cur_cart_coords, lengths, angles, batch, self.lift_global)
-
- if gt_atom_types is None:
- cur_atom_types = mint.argmax(pred_atom_types, dim=1) + 1
- if ld_kwargs.save_traj:
- all_frac_coords.append(cur_frac_coords)
- all_pred_cart_coord_diff.append(
- step_size * pred_cart_coord_diff)
- all_noise_cart.append(noise_cart)
- all_atom_types.append(cur_atom_types)
- cur_frac_coords = cur_frac_coords.asnumpy()
-
- output_dict = {
- "num_atoms": num_atoms, "lengths": lengths, "angles": angles,
- "frac_coords": ms.Tensor(cur_frac_coords), "atom_types": cur_atom_types,
- "is_traj": False
- }
- if ld_kwargs.save_traj:
- output_dict.update(dict(
- all_frac_coords=mint.stack(all_frac_coords, dim=0),
- all_atom_types=mint.stack(all_atom_types, dim=0),
- all_pred_cart_coord_diff=mint.stack(
- all_pred_cart_coord_diff, dim=0),
- all_noise_cart=mint.stack(all_noise_cart, dim=0),
- is_traj=True))
- return output_dict
-
- def construct(self, atom_types, dist, idx_kj, idx_ji, edge_j, edge_i,
- batch, lengths, num_atoms, angles, frac_coords, y, batch_size, sbf, total_atoms,
- teacher_forcing=True, training=True):
- """CDVAE construct"""
- ########### encoder ############
- mu, log_var, z = self.encode(atom_types, dist, idx_kj, idx_ji, edge_j, edge_i,
- batch, total_atoms, batch_size, sbf)
- ########### decode stats ############
- (pred_num_atoms, pred_lengths_and_angles, pred_lengths, pred_angles,
- pred_composition_per_atom, z_per_atom, batch) = self.decode_stats(
- z, batch, num_atoms, lengths, angles, teacher_forcing)
-
- out = self.add_noise(atom_types, num_atoms, pred_lengths, pred_angles, frac_coords,
- pred_composition_per_atom)
- (num_atoms_numpy, pred_lengths_numpy, pred_angles_numpy, noisy_frac_coords,
- used_sigmas_per_atom, used_type_sigmas_per_atom, rand_atom_types) = out
-
- (_, idx_s, idx_t, id3_ca, id3_ba,
- id3_ragged_idx, id3_ragged_idx_max, _, d_st, v_st,
- id_swap, y_l_m) = self.decoder_preprocess.graph_generation(
- noisy_frac_coords, num_atoms_numpy, pred_lengths_numpy, pred_angles_numpy,
- edge_index=None, to_jimages=None, num_bonds=None,)
- # switch to ms.Tensor
- noisy_frac_coords = ms.Tensor(noisy_frac_coords, ms.float32)
- used_sigmas_per_atom = ms.Tensor(used_sigmas_per_atom, ms.float32)
- used_type_sigmas_per_atom = ms.Tensor(
- used_type_sigmas_per_atom, ms.float32)
- rand_atom_types = ms.Tensor(rand_atom_types, ms.int32)
-
- ################ decoder ############
- pred_cart_coord_diff, pred_atom_types = self.decoder(
- rand_atom_types, idx_s, idx_t, id3_ca, id3_ba, id3_ragged_idx, id3_ragged_idx_max,
- y_l_m, d_st, v_st, id_swap, batch, z_per_atom, total_atoms, batch_size)
-
- ################ compute loss ############
- num_atom_loss = self.num_atom_loss(pred_num_atoms, num_atoms)
- lattice_loss = self.lattice_loss(
- pred_lengths_and_angles, lengths, num_atoms, angles)
- composition_loss = self.composition_loss(
- pred_composition_per_atom, atom_types, batch, batch_size)
- coord_loss = self.coord_loss(
- pred_cart_coord_diff, noisy_frac_coords, used_sigmas_per_atom,
- batch, lengths, angles, frac_coords, batch_size)
- type_loss = self.type_loss(pred_atom_types, atom_types,
- used_type_sigmas_per_atom, batch, batch_size)
- kld_loss = self.kld_loss(mu, log_var)
-
- if self.set_predict_property:
- property_loss = self.property_loss(z, y)
- else:
- property_loss = ms.Tensor([0], ms.float32)
- outputs = {
- "num_atom_loss": num_atom_loss,
- "lattice_loss": lattice_loss,
- "composition_loss": composition_loss,
- "coord_loss": coord_loss,
- "type_loss": type_loss,
- "kld_loss": kld_loss,
- "property_loss": property_loss,
- "pred_num_atoms": pred_num_atoms,
- "pred_lengths_and_angles": pred_lengths_and_angles,
- "pred_lengths": pred_lengths,
- "pred_angles": pred_angles,
- "pred_cart_coord_diff": pred_cart_coord_diff,
- "pred_atom_types": pred_atom_types,
- "pred_composition_per_atom": pred_composition_per_atom,
- "target_frac_coords": frac_coords,
- "target_atom_types": atom_types,
- "rand_frac_coords": noisy_frac_coords,
- "rand_atom_types": rand_atom_types,
- "z": z,
- }
- loss = self.compute_stats(batch, outputs, batch_size, training)
- return loss
-
- def add_noise(self, atom_types, num_atoms, pred_lengths, pred_angles, frac_coords,
- pred_composition_per_atom):
- r"""
- Adds noise to the given input parameters and returns the modified values.
-
- Args:
- atom_types (Tensor): Array of atom types. The shape of tensor is :math:`(total\_atoms,)`.
- num_atoms (Tensor): Array of number of atoms. The shape of tensor is :math:`(batch\_size,)`.
- pred_lengths (Tensor): Array of predicted lengths. The shape of tensor is :math:`(batch\_size, 3)`.
- pred_angles (Tensor): Array of predicted angles. The shape of tensor is :math:`(batch\_size, 3)`.
- frac_coords (Tensor): Array of fractional coordinates. The shape of tensor is :math:`(total\_atoms, 3)`.
- pred_composition_per_atom (Tensor): Array of predicted composition probabilities per atom.
- The shape of tensor is :math:`(total\_atoms, max\_atomic\_num)`.
-
- Returns:
- Tuple of ndarray, including: num_atoms_numpy, pred_lengths_numpy, pred_angles_numpy, noisy_frac_coords,
- used_sigmas_per_atom, used_type_sigmas_per_atom and rand_atom_types.
- """
- one_hot_res = mint.nn.functional.one_hot(
- mint.sub(atom_types, 1), self.max_atomic_num)
- one_hot_res = one_hot_res.asnumpy()
- pred_composition_probs = mint.softmax(
- pred_composition_per_atom, dim=-1)
- pred_composition_probs = pred_composition_probs.asnumpy()
- num_atoms_numpy = num_atoms.asnumpy()
- pred_lengths_numpy = pred_lengths.asnumpy()
- pred_angles_numpy = pred_angles.asnumpy()
- frac_coords_numpy = frac_coords.asnumpy()
- # sample noise levels.
- noise_level = np.random.randint(
- 0, self.sigmas.shape[0], (1, num_atoms.shape[0]))
- used_sigmas_per_atom = np.repeat(
- self.sigmas[noise_level], num_atoms_numpy)
- type_noise_level = np.random.randint(
- 0, self.type_sigmas.shape[0], (1, num_atoms.shape[0]))
- # test num_atoms
- used_type_sigmas_per_atom = np.repeat(self.type_sigmas[type_noise_level],
- num_atoms_numpy)
- # add noise to atom types and sample atom types.
- atom_type_probs = (one_hot_res + pred_composition_probs *
- used_type_sigmas_per_atom[:, None])
-
- rand_atom_types = np.zeros(atom_types.shape[0])
- for i in range(atom_types.shape[0]):
- rand_atom_types[i] = np.random.choice(
- self.max_atomic_num, 1, p=atom_type_probs[i] / atom_type_probs[i].sum()) + 1
-
- coord_rand = np.random.rand(frac_coords.shape[0], frac_coords.shape[1])
- cart_noises_per_atom = (
- coord_rand * used_sigmas_per_atom[:, None])
- cart_coords = frac_to_cart_coords_numpy(
- frac_coords_numpy, pred_lengths_numpy, pred_angles_numpy, num_atoms_numpy)
- cart_coords = cart_coords + cart_noises_per_atom
- noisy_frac_coords = cart_to_frac_coords_numpy(
- cart_coords, pred_lengths_numpy, pred_angles_numpy, num_atoms_numpy)
- return (num_atoms_numpy, pred_lengths_numpy, pred_angles_numpy, noisy_frac_coords,
- used_sigmas_per_atom, used_type_sigmas_per_atom, rand_atom_types)
-
- def sample_composition(self, composition_prob, num_atoms, batch, batch_size, total_atoms):
- r"""
- Samples composition such that it exactly satisfies composition_prob
-
- Args:
- composition_prob (Tensor): The shape of tensor is :math:`(total\_atoms, max\_atomic\_num)`.
- num_atoms (Tensor): The shape of tensor is :math:`(batch\_size,)`.
- batch (Tensor): The shape of tensor is :math:`(total\_atoms,)`.
- batch_size (int): Batch size.
- total_atoms (int): Total atoms.
-
- Returns:
- (Tensor): Sampled composition.
- """
- assert composition_prob.shape[0] == total_atoms == batch.shape[0]
- out = mint.zeros((batch_size, composition_prob.shape[1]))
- composition_prob = self.aggregate_mean(composition_prob, batch, out)
- all_sampled_comp = []
- for comp_prob, num_atom in zip(list(composition_prob), list(num_atoms)):
- comp_num = ms.ops.round(comp_prob * num_atom).astype(ms.int32)
- if mint.max(comp_num) != 0:
- atom_type = mint.nonzero(comp_num)[:, 0] + 1
- else:
- atom_type = (mint.argmax(
- mint.mul(comp_prob, num_atom)) + 1).view(1, -1)
- comp_num[atom_type - 1] = 1
- atom_num = comp_num[atom_type - 1].view(-1)
-
- sampled_comp = ms.ops.repeat_interleave(
- atom_type, atom_num).astype(ms.int32)
-
- # if the rounded composition gives less atoms, sample the rest
- if sampled_comp.shape[0] < num_atom:
- left_atom_num = num_atom - sampled_comp.shape[0]
- left_comp_prob = mint.div(
- comp_prob - comp_num.float(), num_atom)
- # left_comp_prob[left_comp_prob < 0.] = 0.
- left_comp_prob = mint.where(
- left_comp_prob < 0., 0., left_comp_prob)
- left_comp = ms.ops.multinomial(
- left_comp_prob, num_samples=left_atom_num, replacement=True)
- # convert to atomic number
- left_comp = left_comp + 1
- sampled_comp = mint.cat((sampled_comp, left_comp), dim=0)
- # keep only the first num_atom entries, i.e. sampled_comp[:num_atom]
- sampled_comp = ms.ops.shuffle(mint.narrow(
- sampled_comp, 0, 0, num_atom.item()))
- all_sampled_comp.append(sampled_comp)
- all_sampled_comp = mint.cat(all_sampled_comp, dim=0)
- assert all_sampled_comp.shape[0] == num_atoms.sum()
- return all_sampled_comp
-
- def predict_num_atoms(self, z):
- return self.fc_num_atoms(z)
-
- def predict_lattice(self, z, num_atoms):
- """predict lattice constants and angles"""
- pred_lengths_and_angles = self.fc_lattice(z)
- scaled_preds = self.lattice_scaler.inverse_transform(
- pred_lengths_and_angles)
- pred_lengths, pred_angles = mint.split(scaled_preds, 3, 1)
- if self.lattice_scale_method == "scale_length":
- pred_lengths = mint.mul(pred_lengths, mint.pow(
- num_atoms.view(-1, 1), (1 / 3)))
- return pred_lengths_and_angles, pred_lengths, pred_angles
-
- def predict_composition(self, z_per_atom):
- pred_composition_per_atom = self.fc_composition(z_per_atom)
- return pred_composition_per_atom
-
- def num_atom_loss(self, pred_num_atoms, num_atoms):
- """compute num atom loss"""
- return ms.ops.cross_entropy(pred_num_atoms, num_atoms)
-
- def property_loss(self, z, y):
- """compute property loss"""
- return ms.ops.mse_loss(self.fc_property(z), y)
-
- def lattice_loss(self, pred_lengths_and_angles, lengths, num_atoms, angles):
- """compute lattice loss"""
- assert self.lattice_scale_method == "scale_length"
- target_lengths = lengths / \
- mint.pow(num_atoms.view(-1, 1), (1 / 3))
- target_lengths_and_angles = mint.cat(
- (target_lengths, angles), dim=-1)
- target_lengths_and_angles = self.lattice_scaler.transform(
- target_lengths_and_angles)
- return ms.ops.mse_loss(pred_lengths_and_angles, target_lengths_and_angles)
-
- def composition_loss(self, pred_composition_per_atom, target_atom_types, batch, batch_size):
- """compute composition loss"""
- target_atom_types = target_atom_types - 1
- loss = ms.ops.cross_entropy(pred_composition_per_atom,
- target_atom_types, reduction="none")
- out = mint.zeros(batch_size)
- return mint.mean(self.aggregate_mean(loss, batch, out))
-
- def coord_loss(self, pred_cart_coord_diff, noisy_frac_coords,
- used_sigmas_per_atom, batch, lengths, angles,
- frac_coords, batch_size):
- r"""
- compute coord loss
-
- Args:
- pred_cart_coord_diff (Tensor): The shape of tensor is :math:`(total\_atoms, 3)`.
- noisy_frac_coords (Tensor): The shape of tensor is :math:`(total\_atoms, 3)`.
- used_sigmas_per_atom (Tensor): The shape of tensor is :math:`(total\_atoms,)`.
- batch (Tensor): The shape of tensor is :math:`(total\_atoms,)`.
- lengths (Tensor): The shape of tensor is :math:`(batch\_size, 3)`.
- angles (Tensor): The shape of tensor is :math:`(batch\_size, 3)`.
- frac_coords (Tensor): The shape of tensor is :math:`(total\_atoms, 3)`.
- batch_size (int): Batch size.
-
- Returns:
- (Tensor): Loss.
- """
- noisy_cart_coords = frac_to_cart_coords(
- noisy_frac_coords, lengths, angles, batch, self.lift_global)
- target_cart_coords = frac_to_cart_coords(
- frac_coords, lengths, angles, batch, self.lift_global)
- _, target_cart_coord_diff = min_distance_sqr_pbc(
- target_cart_coords, noisy_cart_coords, lengths, angles,
- batch, batch_size, self.lift_global, return_vector=True)
-
- target_cart_coord_diff = target_cart_coord_diff / \
- ms.ops.pow(used_sigmas_per_atom.view(-1, 1), 2)
- pred_cart_coord_diff = pred_cart_coord_diff / \
- used_sigmas_per_atom.view(-1, 1)
-
- loss_per_atom = mint.sum(
- ms.ops.pow((target_cart_coord_diff - pred_cart_coord_diff), 2), dim=1)
-
- loss_per_atom = 0.5 * loss_per_atom * \
- ms.ops.pow(used_sigmas_per_atom, 2)
- out = mint.zeros(batch_size)
- return mint.mean(self.aggregate_mean(loss_per_atom, batch, out))
-
- def type_loss(self, pred_atom_types, target_atom_types,
- used_type_sigmas_per_atom, batch, batch_size):
- """compute type loss"""
- target_atom_types = target_atom_types - 1
- loss = ms.ops.cross_entropy(
- pred_atom_types, target_atom_types, reduction="none")
- # rescale loss according to noise
- loss = mint.div(loss, used_type_sigmas_per_atom)
- out = mint.zeros(batch_size)
- return mint.mean(self.aggregate_mean(loss, batch, out))
-
- def kld_loss(self, mu, log_var):
- """compute kld loss"""
- kld_loss = mint.mean(
- -0.5 * mint.sum(mint.sub(mint.sub(mint.add(1, log_var), ms.ops.pow(mu, 2)),
- mint.exp(log_var)), dim=1), dim=0)
- return kld_loss
-
- def compute_stats(self, batch, outputs, batch_size, prefix):
- r"""compute stats
-
- Args:
- batch (Tensor): The shape of tensor is :math:`(total\_atoms,)`.
- outputs (dict): The output dict.
- batch_size (int): Batch size.
- prefix (bool): If True, return the training loss;
- otherwise, return the validation loss.
- Returns:
- (Tensor): Loss.
- """
- num_atom_loss = outputs["num_atom_loss"]
- lattice_loss = outputs["lattice_loss"]
- coord_loss = outputs["coord_loss"]
- type_loss = outputs["type_loss"]
- kld_loss = outputs["kld_loss"]
- composition_loss = outputs["composition_loss"]
- property_loss = outputs["property_loss"]
-
- cost_natom = self.configs.get("cost_natom")
- cost_coord = self.configs.get("cost_coord")
- cost_type = self.configs.get("cost_type")
- cost_lattice = self.configs.get("cost_lattice")
- cost_composition = self.configs.get("cost_composition")
- cost_property = self.configs.get("cost_property")
- beta = self.configs.get("beta")
-
- loss = mint.sum(mint.stack((
- mint.mul(cost_natom, num_atom_loss),
- mint.mul(cost_lattice, lattice_loss),
- mint.mul(cost_coord, coord_loss),
- mint.mul(cost_type, type_loss),
- mint.mul(beta, kld_loss),
- mint.mul(cost_composition, composition_loss),
- mint.mul(cost_property, property_loss))))
-
- if prefix is False:
- # validation/test loss only has coord and type
- loss = (
- cost_coord * coord_loss +
- cost_type * type_loss)
-
- # evaluate atom type prediction.
- pred_atom_types = outputs["pred_atom_types"]
- target_atom_types = outputs["target_atom_types"]
- type_accuracy = pred_atom_types.argmax(
- axis=-1) == (target_atom_types - 1)
- type_accuracy = type_accuracy.astype(ms.float32)
- out = mint.zeros_like(type_accuracy[:batch_size])
- type_accuracy = mint.mean(
- self.aggregate_mean(type_accuracy, batch, out))
-
- return loss
diff --git a/MindChemistry/mindchemistry/cell/cdvae/decoder.py b/MindChemistry/mindchemistry/cell/cdvae/decoder.py
deleted file mode 100644
index 5ac61268401c70a16b44a9fd162ec3cf048bdfc1..0000000000000000000000000000000000000000
--- a/MindChemistry/mindchemistry/cell/cdvae/decoder.py
+++ /dev/null
@@ -1,88 +0,0 @@
-# Copyright 2024 Huawei Technologies Co., Ltd
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ============================================================================
-"""decoder
-"""
-import mindspore.nn as nn
-import mindspore.mint as mint
-from ..gemnet.layers.embedding_block import MAX_ATOMIC_NUM
-from ..gemnet.gemnet import GemNetT
-
-
-class GemNetTDecoder(nn.Cell):
- r"""
- Decoder with GemNetT.
-
- Args:
- config_path (str): Path to the config file.
- hidden_dim (int): Hidden feature dimension. Default: ``128``.
- latent_dim (int): Dimension of the latent variable 'z'. Default: ``256``.
- max_neighbors (int): Maximum number of neighbors per atom. Default: ``20``.
- radius (float): Cutoff radius for neighbor search. Default: ``6.``.
-
- Inputs:
- - **pred_atom_types** (Tensor) - The shape of tensor is :math:`(total\_atoms,)`.
- - **idx_s** (Tensor) - The shape of tensor is :math:`(total\_edges,)`.
- - **idx_t** (Tensor) - The shape of tensor is :math:`(total\_edges,)`.
- - **id3_ca** (Tensor) - The shape of tensor is :math:`(total\_triplets,)`.
- - **id3_ba** (Tensor) - The shape of tensor is :math:`(total\_triplets,)`.
- - **id3_ragged_idx** (Tensor) - The shape of tensor is :math:`(total\_triplets,)`.
- - **id3_ragged_idx_max** (int) - The maximum of id3_ragged_idx.
- - **y_l_m** (Tensor) - The shape of tensor is :math:`(num\_spherical, total\_triplets)`.
- - **d_st** (Tensor) - The shape of tensor is :math:`(total\_edges,)`.
- - **v_st** (Tensor) - The shape of tensor is :math:`(total\_edges, 3)`.
- - **id_swap** (Tensor) - The shape of tensor is :math:`(total\_triplets,)`.
- - **batch** (Tensor) - The shape of tensor is :math:`(total\_atoms,)`.
- - **z_per_atom** (Tensor) - The shape of tensor is :math:`(total\_atoms, latent\_dim)`.
- - **total_atoms** (int) - Total number of atoms.
- - **batch_size** (int) - Batch size.
-
- Outputs:
- - **atom_frac_coords** (Tensor) - The shape of tensor is :math:`(total\_atoms, 3)`.
- - **atom_types** (Tensor) - The shape of tensor is :math:`(total\_atoms, MAX\_ATOMIC\_NUM)`.
- """
-
- def __init__(
- self,
- config_path,
- hidden_dim=128,
- latent_dim=256,
- max_neighbors=20,
- radius=6.,
- ):
- super().__init__()
- self.cutoff = radius
- self.max_num_neighbors = max_neighbors
-
- self.gemnet = GemNetT(
- num_targets=1,
- latent_dim=latent_dim,
- emb_size_atom=hidden_dim,
- emb_size_edge=hidden_dim,
- regress_forces=True,
- cutoff=self.cutoff,
- max_neighbors=self.max_num_neighbors,
- config_path=config_path
- )
- self.fc_atom = mint.nn.Linear(hidden_dim, MAX_ATOMIC_NUM)
-
- def construct(self, pred_atom_types, idx_s, idx_t, id3_ca, id3_ba,
- id3_ragged_idx, id3_ragged_idx_max, y_l_m, d_st, v_st, id_swap, batch, z_per_atom,
- total_atoms, batch_size):
- """construct"""
- h, pred_cart_coord_diff = self.gemnet(
- pred_atom_types, idx_s, idx_t, id3_ca, id3_ba, id3_ragged_idx,
- id3_ragged_idx_max, y_l_m, d_st, v_st, id_swap, batch, z_per_atom, total_atoms, batch_size)
- pred_atom_types = self.fc_atom(h)
- return pred_cart_coord_diff, pred_atom_types
diff --git a/MindChemistry/mindchemistry/e3/o3/rotation.py b/MindChemistry/mindchemistry/e3/o3/rotation.py
index 6a241a5c45d0006291abff5a195b0ced2fe66c73..96bbe21cc4be755124cf97043f55733d6158a11f 100644
--- a/MindChemistry/mindchemistry/e3/o3/rotation.py
+++ b/MindChemistry/mindchemistry/e3/o3/rotation.py
@@ -288,7 +288,7 @@ def matrix_to_angles(r_param):
Conversion from matrix to angles.
Args:
- R (Tensor): The rotation matrices. Matrices of shape :math:`(..., 3, 3)`.
+ r_param (Tensor): The rotation matrices. Matrices of shape :math:`(..., 3, 3)`.
Returns:
- alpha (Tensor), The alpha Euler angles. The shape of Tensor is :math:`(...)`.
diff --git a/MindChemistry/version.txt b/MindChemistry/version.txt
index 6c6aa7cb0918dc7a1cfa3635fb7f8792ac4cb218..74a1f8ab2e15452392df5be4c43c1e95779534e0 100644
--- a/MindChemistry/version.txt
+++ b/MindChemistry/version.txt
@@ -1 +1,2 @@
+0.2.0
0.1.0
\ No newline at end of file
diff --git a/docs/api_python/mindchemistry/e3/o3/mindchemistry.e3.o3.matrix_to_angles.rst b/docs/api_python/mindchemistry/e3/o3/mindchemistry.e3.o3.matrix_to_angles.rst
index 54ed92aa5970ff550a8aed1f6e59ecc608423421..9863e912afb5da8e8b77410dff94ee43050d196c 100644
--- a/docs/api_python/mindchemistry/e3/o3/mindchemistry.e3.o3.matrix_to_angles.rst
+++ b/docs/api_python/mindchemistry/e3/o3/mindchemistry.e3.o3.matrix_to_angles.rst
@@ -6,7 +6,7 @@ mindchemistry.e3.o3.matrix_to_angles
从矩阵到角度的转换。
参数:
- - **R** (Tensor) - 旋转矩阵。形状为 :math:`(..., 3, 3)` 的矩阵。
+ - **r_param** (Tensor) - 旋转矩阵。形状为 :math:`(..., 3, 3)` 的矩阵。
返回:
- **alpha** (Tensor) - Alpha 欧拉角。形状为 :math:`(...)` 的张量。