# DI-engine
**Repository Path**: wayne297/DI-engine
## Basic Information
- **Project Name**: DI-engine
- **Description**: OpenDILab Decision Intelligence Engine https://github.com/opendilab/DI-engine
- **Primary Language**: Python
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 1
- **Forks**: 25
- **Created**: 2021-11-30
- **Last Updated**: 2021-12-08
## Categories & Tags
**Categories**: Uncategorized
**Tags**: AI
## README
---
Updated on 2021.12.03: DI-engine v0.2.2 (beta)
## Introduction to DI-engine (beta)
DI-engine is a generalized Decision Intelligence engine. It supports most basic deep reinforcement learning (DRL) algorithms, such as DQN, PPO, SAC, and domain-specific algorithms like QMIX in multi-agent RL, GAIL in inverse RL, and RND in exploration problems. Various training pipelines and customized decision AI applications are also supported. Have fun with exploration and exploitation.
### Application
- [DI-star](https://github.com/opendilab/DI-star)
- [DI-drive](https://github.com/opendilab/DI-drive)
### Environment
- [GoBigger](https://github.com/opendilab/GoBigger)
### System Optimization and Design
- [DI-orchestrator](https://github.com/opendilab/DI-orchestrator)
- [DI-hpc](https://github.com/opendilab/DI-hpc)
- [DI-store](https://github.com/opendilab/DI-store)
### Other
- [DI-engine-docs](https://github.com/opendilab/DI-engine-docs)
- [treevalue](https://github.com/opendilab/treevalue)
- [DI-treetensor](https://github.com/opendilab/DI-treetensor)
## Installation
You can simply install DI-engine from PyPI with the following command:
```bash
pip install DI-engine
```
If you use Anaconda or Miniconda, you can install DI-engine from the `opendilab` conda channel with the following command:
```bash
conda install -c opendilab di-engine
```
For more information about installation, you can refer to [installation](https://opendilab.github.io/DI-engine/installation/index.html).
Our Docker Hub repository can be found [here](https://hub.docker.com/repository/docker/opendilab/ding); we provide a `base` image and several `env` images with common RL environments preinstalled:
- base: opendilab/ding:nightly
- atari: opendilab/ding:nightly-atari
- mujoco: opendilab/ding:nightly-mujoco
- smac: opendilab/ding:nightly-smac
## Documentation
The detailed documentation is hosted on [doc](https://opendilab.github.io/DI-engine/) ([Chinese documentation](https://di-engine-docs.readthedocs.io/en/main-zh/)).
## Quick Start
- [3 Minutes Kickoff](https://opendilab.github.io/DI-engine/quick_start/index.html)
- [3 Minutes Kickoff (Colab)](https://colab.research.google.com/drive/1J29voOD2v9_FXjW-EyTVfRxY_Op_ygef#scrollTo=MIaKQqaZCpGz)
- [3 Minutes Kickoff (Kaggle, Chinese)](https://www.kaggle.com/shenzhenperson/di-engine)
**Bonus: train an RL agent with one line of code:**
```bash
ding -m serial -e cartpole -p dqn -s 0
```
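In this command, `-m` selects the training mode (`serial` here), `-e` the environment, `-p` the policy, and `-s` the random seed. Conceptually, serial mode alternates between collecting transitions with the current policy and updating the policy on the collected data; a schematic pure-Python sketch of that loop (stub functions only, not DI-engine's actual API):

```python
def serial_train(collect_fn, learn_fn, max_iterations=3):
    """Schematic serial RL pipeline: alternate collection and learning.

    collect_fn(iteration) -> list of transitions gathered with the current policy
    learn_fn(buffer)      -> updates the policy from the accumulated buffer
    """
    buffer = []
    for iteration in range(max_iterations):
        buffer.extend(collect_fn(iteration))  # gather fresh experience
        learn_fn(buffer)                      # train on everything collected so far
    return buffer

# Toy run: each "collection" returns one dummy transition, learning is a no-op.
transitions = serial_train(lambda it: [it], lambda buf: None)
```

The real pipeline adds evaluation, checkpointing, and stopping conditions on top of this skeleton.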
## Feature
### Algorithm Versatility
| No | Algorithm | Label | Doc and Implementation | Runnable Demo |
| :--: | :----------------------------------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: |
| 1 | [DQN](https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf) |  | [DQN doc (Chinese)](https://di-engine-docs.readthedocs.io/en/main-zh/hands_on/dqn_zh.html)<br>[policy/dqn](https://github.com/opendilab/DI-engine/blob/main/ding/policy/dqn.py) | python3 -u cartpole_dqn_main.py / ding -m serial -c cartpole_dqn_config.py -s 0 |
| 2 | [C51](https://arxiv.org/pdf/1707.06887.pdf) |  | [policy/c51](https://github.com/opendilab/DI-engine/blob/main/ding/policy/c51.py) | ding -m serial -c cartpole_c51_config.py -s 0 |
| 3 | [QRDQN](https://arxiv.org/pdf/1710.10044.pdf) |  | [policy/qrdqn](https://github.com/opendilab/DI-engine/blob/main/ding/policy/qrdqn.py) | ding -m serial -c cartpole_qrdqn_config.py -s 0 |
| 4 | [IQN](https://arxiv.org/pdf/1806.06923.pdf) |  | [policy/iqn](https://github.com/opendilab/DI-engine/blob/main/ding/policy/iqn.py) | ding -m serial -c cartpole_iqn_config.py -s 0 |
| 5 | [Rainbow](https://arxiv.org/pdf/1710.02298.pdf) |  | [policy/rainbow](https://github.com/opendilab/DI-engine/blob/main/ding/policy/rainbow.py) | ding -m serial -c cartpole_rainbow_config.py -s 0 |
| 6 | [SQL](https://arxiv.org/pdf/1702.08165.pdf) |  | [policy/sql](https://github.com/opendilab/DI-engine/blob/main/ding/policy/sql.py) | ding -m serial -c cartpole_sql_config.py -s 0 |
| 7 | [R2D2](https://openreview.net/forum?id=r1lyTjAqYX) |  | [policy/r2d2](https://github.com/opendilab/DI-engine/blob/main/ding/policy/r2d2.py) | ding -m serial -c cartpole_r2d2_config.py -s 0 |
| 8 | [A2C](https://arxiv.org/pdf/1602.01783.pdf) |  | [policy/a2c](https://github.com/opendilab/DI-engine/blob/main/ding/policy/a2c.py) | ding -m serial -c cartpole_a2c_config.py -s 0 |
| 9 | [PPO](https://arxiv.org/abs/1707.06347)/[MAPPO](https://arxiv.org/pdf/2103.01955.pdf) |  | [policy/ppo](https://github.com/opendilab/DI-engine/blob/main/ding/policy/ppo.py) | python3 -u cartpole_ppo_main.py / ding -m serial_onpolicy -c cartpole_ppo_config.py -s 0 |
| 10 | [PPG](https://arxiv.org/pdf/2009.04416.pdf) |  | [policy/ppg](https://github.com/opendilab/DI-engine/blob/main/ding/policy/ppg.py) | python3 -u cartpole_ppg_main.py |
| 11 | [ACER](https://arxiv.org/pdf/1611.01224.pdf) |  | [policy/acer](https://github.com/opendilab/DI-engine/blob/main/ding/policy/acer.py) | ding -m serial -c cartpole_acer_config.py -s 0 |
| 12 | [IMPALA](https://arxiv.org/abs/1802.01561) |  | [policy/impala](https://github.com/opendilab/DI-engine/blob/main/ding/policy/impala.py) | ding -m serial -c cartpole_impala_config.py -s 0 |
| 13 | [DDPG](https://arxiv.org/pdf/1509.02971.pdf)/[PADDPG](https://arxiv.org/pdf/1511.04143.pdf) |  | [policy/ddpg](https://github.com/opendilab/DI-engine/blob/main/ding/policy/ddpg.py) | ding -m serial -c pendulum_ddpg_config.py -s 0 |
| 14 | [TD3](https://arxiv.org/pdf/1802.09477.pdf) |  | [policy/td3](https://github.com/opendilab/DI-engine/blob/main/ding/policy/td3.py) | python3 -u pendulum_td3_main.py / ding -m serial -c pendulum_td3_config.py -s 0 |
| 15 | [D4PG](https://arxiv.org/pdf/1804.08617.pdf) |  | [policy/d4pg](https://github.com/opendilab/DI-engine/blob/main/ding/policy/d4pg.py) | python3 -u pendulum_d4pg_config.py |
| 16 | [SAC](https://arxiv.org/abs/1801.01290) |  | [policy/sac](https://github.com/opendilab/DI-engine/blob/main/ding/policy/sac.py) | ding -m serial -c pendulum_sac_config.py -s 0 |
| 17 | [PDQN](https://arxiv.org/pdf/1810.06394.pdf) |  | [policy/pdqn](https://github.com/opendilab/DI-engine/blob/main/ding/policy/pdqn.py) | ding -m serial -c gym_hybrid_pdqn_config.py -s 0 |
| 18 | [MPDQN](https://arxiv.org/pdf/1905.04388.pdf) |  | [policy/pdqn](https://github.com/opendilab/DI-engine/blob/main/ding/policy/pdqn.py) | ding -m serial -c gym_hybrid_mpdqn_config.py -s 0 |
| 19 | [QMIX](https://arxiv.org/pdf/1803.11485.pdf) |  | [policy/qmix](https://github.com/opendilab/DI-engine/blob/main/ding/policy/qmix.py) | ding -m serial -c smac_3s5z_qmix_config.py -s 0 |
| 20 | [COMA](https://arxiv.org/pdf/1705.08926.pdf) |  | [policy/coma](https://github.com/opendilab/DI-engine/blob/main/ding/policy/coma.py) | ding -m serial -c smac_3s5z_coma_config.py -s 0 |
| 21 | [QTran](https://arxiv.org/abs/1905.05408) |  | [policy/qtran](https://github.com/opendilab/DI-engine/blob/main/ding/policy/qtran.py) | ding -m serial -c smac_3s5z_qtran_config.py -s 0 |
| 22 | [WQMIX](https://arxiv.org/abs/2006.10800) |  | [policy/wqmix](https://github.com/opendilab/DI-engine/blob/main/ding/policy/wqmix.py) | ding -m serial -c smac_3s5z_wqmix_config.py -s 0 |
| 23 | [CollaQ](https://arxiv.org/pdf/2010.08531.pdf) |  | [policy/collaq](https://github.com/opendilab/DI-engine/blob/main/ding/policy/collaq.py) | ding -m serial -c smac_3s5z_collaq_config.py -s 0 |
| 24 | [GAIL](https://arxiv.org/pdf/1606.03476.pdf) |  | [reward_model/gail](https://github.com/opendilab/DI-engine/blob/main/ding/reward_model/gail_irl_model.py) | ding -m serial_gail -c cartpole_dqn_gail_config.py -s 0 |
| 25 | [SQIL](https://arxiv.org/pdf/1905.11108.pdf) |  | [entry/sqil](https://github.com/opendilab/DI-engine/blob/main/ding/entry/serial_entry_sqil.py) | ding -m serial_sqil -c cartpole_sqil_config.py -s 0 |
| 26 | [DQFD](https://arxiv.org/pdf/1704.03732.pdf) |  | [policy/dqfd](https://github.com/opendilab/DI-engine/blob/main/ding/policy/dqfd.py) | ding -m serial_dqfd -c cartpole_dqfd_config.py -s 0 |
| 27 | [R2D3](https://arxiv.org/pdf/1909.01387.pdf) |  | [policy/r2d3](https://github.com/opendilab/DI-engine/blob/main/ding/policy/r2d3.py) | python3 -u pong_r2d3_r2d2expert_config.py |
| 28 | [GCL](https://arxiv.org/pdf/1603.00448.pdf) |  | [reward_model/guided_cost](https://github.com/opendilab/DI-engine/blob/main/ding/reward_model/guided_cost_reward_model.py) | python3 lunarlander_gcl_config.py |
| 29 | [HER](https://arxiv.org/pdf/1707.01495.pdf) |  | [reward_model/her](https://github.com/opendilab/DI-engine/blob/main/ding/reward_model/her_reward_model.py) | python3 -u bitflip_her_dqn.py |
| 30 | [RND](https://arxiv.org/abs/1810.12894) |  | [reward_model/rnd](https://github.com/opendilab/DI-engine/blob/main/ding/reward_model/rnd_reward_model.py) | python3 -u cartpole_ppo_rnd_main.py |
| 31 | [ICM](https://arxiv.org/pdf/1705.05363.pdf) |  | [reward_model/icm](https://github.com/opendilab/DI-engine/blob/main/ding/reward_model/icm_reward_model.py) | python3 -u cartpole_ppo_icm_config.py |
| 32 | [CQL](https://arxiv.org/pdf/2006.04779.pdf) |  | [policy/cql](https://github.com/opendilab/DI-engine/blob/main/ding/policy/cql.py) | python3 -u d4rl_cql_main.py |
| 33 | [TD3BC](https://arxiv.org/pdf/2106.06860.pdf) |  | [policy/td3_bc](https://github.com/opendilab/DI-engine/blob/main/ding/policy/td3_bc.py) | python3 -u mujoco_td3_bc_main.py |
| 34 | [MBPO](https://arxiv.org/pdf/1906.08253.pdf) |  | [model/template/model_based/mbpo](https://github.com/opendilab/DI-engine/blob/main/ding/model/template/model_based/mbpo.py) | python3 -u sac_halfcheetah_mopo_default_config.py |
| 35 | [PER](https://arxiv.org/pdf/1511.05952.pdf) |  | [worker/replay_buffer](https://github.com/opendilab/DI-engine/blob/main/ding/worker/replay_buffer/advanced_buffer.py) | `rainbow demo` |
| 36 | [GAE](https://arxiv.org/pdf/1506.02438.pdf) |  | [rl_utils/gae](https://github.com/opendilab/DI-engine/blob/main/ding/rl_utils/gae.py) | `ppo demo` |
The labels in the table above denote:
- discrete action space (discrete/continuous/hybrid are the only action-space labels used for the classic DRL algorithms, No. 1-16)
- continuous action space (No. 1-16)
- hybrid (discrete + continuous) action space (No. 1-16)
- distributed training (collector-learner parallel) RL algorithm
- multi-agent RL algorithm
- RL algorithm addressing exploration and sparse reward
- imitation learning, including behaviour cloning, inverse RL, and adversarial structured IL
- offline RL algorithm
- model-based RL algorithm
- other sub-direction algorithm, usually used as a plug-in in the whole pipeline
P.S.: the `.py` files in the `Runnable Demo` column can be found under `dizoo`.
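As a concrete example of a utility from the table, GAE (row 36) computes advantages by the standard generalized advantage estimation recursion, `delta_t = r_t + gamma * V(s_{t+1}) * (1 - done_t) - V(s_t)` and `A_t = delta_t + gamma * lam * (1 - done_t) * A_{t+1}`. A minimal pure-Python sketch of that recursion follows (this mirrors the textbook formula, not the exact signature of DI-engine's `rl_utils/gae`):

```python
def gae_advantages(rewards, values, next_value, dones, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one trajectory segment.

    rewards, values, dones: per-step lists of equal length
    next_value: value estimate for the state after the last step
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    # Sweep backwards so each step can reuse the advantage of the next step.
    for t in reversed(range(len(rewards))):
        v_next = next_value if t == len(rewards) - 1 else values[t + 1]
        nonterminal = 1.0 - dones[t]
        delta = rewards[t] + gamma * v_next * nonterminal - values[t]
        gae = delta + gamma * lam * nonterminal * gae
        advantages[t] = gae
    return advantages
```

With `lam=0` the result collapses to one-step TD errors; with `lam=1` it becomes the full Monte-Carlo advantage, which is the usual bias-variance trade-off GAE interpolates.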
### Environment Versatility
| No | Environment | Label | Visualization | Code and Doc Links |
| :--: | :--------------------------------------: | :---------------------------------: | :--------------------------------:|:---------------------------------------------------------: |
| 1 | [atari](https://github.com/openai/gym/tree/master/gym/envs/atari) |  |  | [code link](https://github.com/opendilab/DI-engine/tree/main/dizoo/atari/envs)<br>[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/env_tutorial/atari.html)<br>[env guide (Chinese)](https://di-engine-docs.readthedocs.io/en/main-zh/env_tutorial/atari_zh.html) |
| 2 | [box2d/bipedalwalker](https://github.com/openai/gym/tree/master/gym/envs/box2d) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/box2d/bipedalwalker/envs) |
| 3 | [box2d/lunarlander](https://github.com/openai/gym/tree/master/gym/envs/box2d) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/box2d/lunarlander/envs) |
| 4 | [classic_control/cartpole](https://github.com/openai/gym/tree/master/gym/envs/classic_control) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/classic_control/cartpole/envs) |
| 5 | [classic_control/pendulum](https://github.com/openai/gym/tree/master/gym/envs/classic_control) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/classic_control/pendulum/envs) |
| 6 | [competitive_rl](https://github.com/cuhkrlcourse/competitive-rl) |   |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/competitive_rl) |
| 7 | [gfootball](https://github.com/google-research/football) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/gfootball/envs) |
| 8 | [minigrid](https://github.com/maximecb/gym-minigrid) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/minigrid/envs) |
| 9 | [mujoco](https://github.com/openai/gym/tree/master/gym/envs/mujoco) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/mujoco/envs) |
| 10 | [multiagent_particle](https://github.com/openai/multiagent-particle-envs) |   |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/multiagent_particle/envs) |
| 11 | [overcooked](https://github.com/HumanCompatibleAI/overcooked-demo) |   |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/overcooked/envs) |
| 12 | [procgen](https://github.com/openai/procgen) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/procgen) |
| 13 | [pybullet](https://github.com/benelot/pybullet-gym) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/pybullet/envs) |
| 14 | [smac](https://github.com/oxwhirl/smac) |   |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/smac/envs) |
| 15 | [d4rl](https://github.com/rail-berkeley/d4rl) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/d4rl) |
| 16 | league_demo |   |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/league_demo/envs) |
| 17 | pomdp atari |  | | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/pomdp/envs) |
| 18 | [bsuite](https://github.com/deepmind/bsuite) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/bsuite/envs) |
| 19 | [ImageNet](https://www.image-net.org/) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/image_classification) |
| 20 | [slime_volleyball](https://github.com/hardmaru/slimevolleygym) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/slime_volley) |
| 21 | [gym_hybrid](https://github.com/thomashirtz/gym-hybrid) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/gym_hybrid) |
| 22 | [GoBigger](https://github.com/opendilab/GoBigger) |  |  | [opendilab link](https://github.com/opendilab/GoBigger-Challenge-2021/tree/main/di_baseline) |
| 23 | [gym_soccer](https://github.com/openai/gym-soccer) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/gym_soccer) |
The labels in the table above denote:
- discrete action space
- continuous action space
- hybrid (discrete + continuous) action space
- multi-agent RL environment
- environment addressing exploration and sparse reward
- offline RL environment
- imitation learning or supervised learning dataset
- environment that allows agent-vs-agent battle
P.S.: some Atari environments, such as **MontezumaRevenge**, are also of the sparse-reward type.
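All of the environments above are wrapped behind a common gym-style `reset`/`step` interface, so one interaction loop serves every entry in the table. The loop can be sketched with a stub environment standing in for any real one (the stub and its dynamics are purely illustrative):

```python
class StubEnv:
    """Minimal stand-in for a wrapped RL environment."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0  # initial observation

    def step(self, action):
        self.t += 1
        obs, reward = float(self.t), 1.0
        done = self.t >= self.horizon
        return obs, reward, done, {}  # gym-style (obs, reward, done, info)


def evaluate_episode(env, policy):
    """Roll out one episode and return the total reward."""
    obs, total, done = env.reset(), 0.0, False
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
    return total


episode_return = evaluate_episode(StubEnv(horizon=5), lambda obs: 0)
```

Because every wrapped environment exposes this same interface, collectors and evaluators in the pipeline do not need environment-specific code.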
## Contribution
We appreciate all contributions that improve DI-engine, in both algorithms and system design. Please refer to CONTRIBUTING.md for detailed guidance; our roadmap and future milestones are tracked on [GitHub Projects](https://github.com/opendilab/DI-engine/projects).
You can also join our [slack communication channel](https://join.slack.com/t/opendilab/shared_invite/zt-v9tmv4fp-nUBAQEH1_Kuyu_q4plBssQ) or our [forum](https://github.com/opendilab/DI-engine/discussions) for more detailed discussion.
## Citation
```latex
@misc{ding,
    title = {{DI-engine: OpenDILab} Decision Intelligence Engine},
    author = {DI-engine Contributors},
    publisher = {GitHub},
    howpublished = {\url{https://github.com/opendilab/DI-engine}},
    year = {2021},
}
```
## License
DI-engine is released under the Apache 2.0 license.