# PPOxFamily
**Repository Path**: opendilab_admin/PPOxFamily
## Basic Information
- **Project Name**: PPOxFamily
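- **Description**: PPO x Family DRL Tutorial Course (an introductory open course on decision intelligence: 8 lessons that help you sort out the algorithm theory, untangle the code logic, and put decision AI into practice)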
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 5
- **Forks**: 1
- **Created**: 2022-12-05
- **Last Updated**: 2024-06-05
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
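# PPO x Family: An Introductory Open Course on Decision Intelligence
Welcome to **PPO x Family**, an introductory open course series on decision intelligence. The series builds an in-depth understanding of the deep reinforcement learning algorithm PPO and shows how to apply **a single PPO algorithm** to almost **all common decision intelligence applications**. It aims to help anyone curious about deep reinforcement learning build application prototypes easily and efficiently, and to get to know and learn the most powerful and easiest-to-use PPO Family.
P.S. If you are passing by, please leave a star. The course has been updated continuously since December 2022.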
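For readers who have not met PPO before, the core of the algorithm is its clipped surrogate objective. The snippet below is a minimal pure-Python sketch of that objective and is not taken from the course code; the function name `ppo_clip_loss`, its argument names, and the default `clip_ratio=0.2` are assumptions made for illustration.

```python
import math


def ppo_clip_loss(logp_new, logp_old, advantages, clip_ratio=0.2):
    """Return the mean PPO clipped surrogate loss (a value to minimize).

    All three inputs are equal-length lists of floats, one entry per sample.
    Names and defaults are illustrative assumptions, not the course's API.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio between the new and the old policy.
        ratio = math.exp(lp_new - lp_old)
        # Keep the ratio inside [1 - clip_ratio, 1 + clip_ratio].
        clipped = max(1.0 - clip_ratio, min(ratio, 1.0 + clip_ratio))
        # Pessimistic (lower) bound of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    # Negate because optimizers minimize while PPO maximizes the surrogate.
    return -total / len(advantages)
```

When the new and old policies agree, the ratio is 1 and the loss reduces to the negative mean advantage. The clipping only takes effect once the policy has moved far enough from the one that collected the data.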
# News
- 2023.05.31: PPO x Family Chapter 7 (Uncovering Advanced Tricks) goes live at 7 p.m. on June 1
- 2023.04.06: [bilibili] [PPO x Family Chapter 6 (Coordinating Multiple Agents) is now live](https://www.bilibili.com/video/BV1dg4y1g7BC)
- 2023.03.09: [bilibili] [PPO x Family Chapter 5 (Exploring Temporal Modeling) is now live](https://www.bilibili.com/video/BV1Uj411u7GA)
- 2023.02.23: [bilibili] [PPO x Family Chapter 4 (Decoding Sparse Reward Spaces) is now live](https://www.bilibili.com/video/BV15j411F7ni)
- 2023.01.16: [bilibili] [PPO x Family Chapter 3 (Representing Multimodal Observation Spaces) is now live](https://www.bilibili.com/video/BV1rK411r7Kg)
- 2022.12.23: [bilibili] [PPO x Family Chapter 2 (Deconstructing Complex Action Spaces) is now live](https://www.bilibili.com/video/BV1wv4y167w2)
- 2022.12.23: The PPO x Family "algorithm-code" annotated documentation website is now live: [link](https://opendilab.github.io/PPOxFamily/)
- 2022.12.08: [bilibili] [PPO x Family Chapter 1 (Starting the Journey of Decision AI Exploration) is now live](https://www.bilibili.com/video/BV1cG4y137dJ)
- 2022.12.06: [bilibili] [PPO x Family Chapter 1 micro-lecture video: a 4-minute quick start with the master key of reinforcement learning](https://www.bilibili.com/video/BV1e841157Um/)
- 2022.12.05: [PaperWeekly] [One PPO × Family course to hold up the entire decision AI universe](https://mp.weixin.qq.com/s/KCKfH1VnQGnNWW6svVxdXw)
- 2022.12.01: [bilibili] [PPO x Family course promotional video](https://www.bilibili.com/video/BV1sK411R7JP/?spm_id_from=333.337.search-card.all.click)
- 2022.11.30: [机器之心] [Focus on one point, evolve without limit: the PPO × Family introductory open course on decision intelligence starts today](https://mp.weixin.qq.com/s/l_JB3-BgLE2pEBJ2zgRkGQ)
- 2022.11.30: [China Computer Federation (CCF)] [[CCF Science Popularization Stars Program] The introductory open course on decision intelligence has started](https://mp.weixin.qq.com/s/NkHi7eeUgQkp31R5Qgsbvw)
# Course Outline
# Content Navigation
| Chapter (video lecture) | Theory materials | Supplementary materials | Exercises | Code examples | Application examples |
|------|-----|----------|-------|-----|---|
| [Chapter 1: Starting the Journey of Decision AI Exploration](https://www.bilibili.com/video/BV1cG4y137dJ) | [Lecture slides](https://github.com/opendilab/PPOxFamily/blob/main/chapter1_overview/chapter1_lecture.pdf)<br>[Lecture manuscript](https://github.com/opendilab/PPOxFamily/blob/main/chapter1_overview/chapter1_manuscript.pdf) | [Micro-lecture video](https://www.bilibili.com/video/BV1e841157Um)<br>[Policy gradient](https://github.com/opendilab/PPOxFamily/blob/main/chapter1_overview/chapter1_supp_pg.pdf)<br>[A2C](https://github.com/opendilab/PPOxFamily/blob/main/chapter1_overview/chapter1_supp_a2c.pdf)<br>[TRPO](https://github.com/opendilab/PPOxFamily/blob/main/chapter1_overview/chapter1_supp_trpo.pdf)<br>[Notation table](https://github.com/opendilab/PPOxFamily/blob/main/common/notation.pdf)<br>[Q&A summary](https://github.com/opendilab/PPOxFamily/blob/main/chapter1_overview/chapter1_qa.pdf) | [Exercises](https://github.com/opendilab/PPOxFamily/blob/main/chapter1_overview/chapter1_homework.pdf)<br>[Solutions](https://github.com/opendilab/PPOxFamily/blob/main/chapter1_overview/chapter1_hw_solution.pdf) | [PG example](https://github.com/opendilab/PPOxFamily/blob/main/chapter1_overview/pg_zh.py)<br>[A2C example](https://github.com/opendilab/PPOxFamily/blob/main/chapter1_overview/a2c_zh.py)<br>[PPO example](https://github.com/opendilab/PPOxFamily/blob/main/chapter1_overview/ppo_zh.py) | [Application montage](https://www.bilibili.com/video/BV1vW4y1M7cH/?spm_id_from=333.337.search-card.all.click) |
| [Chapter 2: Deconstructing Complex Action Spaces](https://www.bilibili.com/video/BV1wv4y167w2) | [Lecture slides](https://github.com/opendilab/PPOxFamily/blob/main/chapter2_action/chapter2_lecture.pdf)<br>[Lecture manuscript](https://github.com/opendilab/PPOxFamily/blob/main/chapter2_action/chapter2_manuscript.pdf) | [Reparameterization](https://github.com/opendilab/PPOxFamily/blob/main/chapter2_action/chapter2_supp_reparameterization.pdf)<br>[PPO&DDPG](https://github.com/opendilab/PPOxFamily/blob/main/chapter2_action/chapter2_supp_ppovsddpg.pdf)<br>[HyAR](https://github.com/opendilab/PPOxFamily/blob/main/chapter2_action/chapter2_supp_hyar.pdf)<br>[Q&A summary](https://github.com/opendilab/PPOxFamily/blob/main/chapter2_action/chapter2_qa.pdf) | [Exercises](https://github.com/opendilab/PPOxFamily/blob/main/chapter2_action/chapter2_homework.pdf)<br>[Solutions](https://github.com/opendilab/PPOxFamily/blob/main/chapter2_action/chapter2_hw_solution.pdf) | [Discrete action example](https://github.com/opendilab/PPOxFamily/blob/main/chapter2_action/discrete_tutorial_zh.py)<br>[Continuous action example](https://github.com/opendilab/PPOxFamily/blob/main/chapter2_action/continuous_tutorial_zh.py)<br>[Hybrid action example](https://github.com/opendilab/PPOxFamily/blob/main/chapter2_action/hybrid_tutorial_zh.py)<br>[Application training code](https://github.com/opendilab/PPOxFamily/blob/main/chapter2_action/chapter2_application_demo.py) | [Rocket recovery and more](https://github.com/opendilab/PPOxFamily/issues/4) |
| [Chapter 3: Representing Multimodal Observation Spaces](https://www.bilibili.com/video/BV1rK411r7Kg) | [Lecture slides](https://github.com/opendilab/PPOxFamily/blob/main/chapter3_obs/chapter3_lecture.pdf)<br>[Lecture manuscript](https://github.com/opendilab/PPOxFamily/blob/main/chapter3_obs/chapter3_manuscript.pdf) | [Representation learning](https://github.com/opendilab/PPOxFamily/blob/main/chapter3_obs/chapter3_supp_representation.pdf)<br>[PPG](https://github.com/opendilab/PPOxFamily/blob/main/chapter3_obs/chapter3_supp_ppg.pdf)<br>[Invariance](https://github.com/opendilab/PPOxFamily/blob/main/chapter3_obs/chapter3_supp_invariance.pdf)<br>[Q&A summary](https://github.com/opendilab/PPOxFamily/blob/main/chapter3_obs/chapter3_qa.pdf) | [Exercises](https://github.com/opendilab/PPOxFamily/blob/main/chapter3_obs/chapter3_homework.pdf)<br>[Solutions](https://github.com/opendilab/PPOxFamily/blob/main/chapter3_obs/chapter3_hw_solution.pdf) | [Encoding methods example](https://github.com/opendilab/PPOxFamily/blob/main/chapter3_obs/encoding.py)<br>[Wrapper example](https://github.com/opendilab/PPOxFamily/blob/main/chapter3_obs/mario_wrapper.py)<br>[Computation graph example](https://github.com/opendilab/PPOxFamily/blob/main/chapter3_obs/gradient.py)<br>[Application training code](https://github.com/opendilab/PPOxFamily/blob/main/chapter3_obs/chapter3_application_demo.py) | [Soft robots and more](https://github.com/opendilab/PPOxFamily/issues/8) |
| [Chapter 4: Decoding Sparse Reward Spaces](https://www.bilibili.com/video/BV15j411F7ni) | [Lecture slides](https://github.com/opendilab/PPOxFamily/blob/main/chapter4_reward/chapter4_lecture.pdf) | [Inverse reinforcement learning](https://github.com/opendilab/PPOxFamily/blob/main/chapter4_reward/chapter4_supp_irl.pdf)<br>[Behavior cloning (BC)](https://github.com/opendilab/PPOxFamily/blob/main/chapter4_reward/chapter4_supp_bc.pdf)<br>[Q&A summary](https://github.com/opendilab/PPOxFamily/blob/main/chapter4_reward/chapter4_qa.pdf) | [Exercises](https://github.com/opendilab/PPOxFamily/blob/main/chapter4_reward/chapter4_homework.pdf)<br>[Solutions](https://github.com/opendilab/PPOxFamily/blob/main/chapter4_reward/chapter4_hw_solution.pdf) | [RND example](https://github.com/opendilab/PPOxFamily/blob/main/chapter4_reward/chapter4_rnd.py)<br>[Pop-Art example](https://github.com/opendilab/PPOxFamily/blob/main/chapter4_reward/chapter4_popart.py)<br>[Application training code](https://github.com/opendilab/PPOxFamily/blob/main/chapter4_reward/chapter4_application_demo.py) | [Autonomous driving and more](https://github.com/opendilab/PPOxFamily/issues/44) |
| [Chapter 5: Exploring Temporal Modeling](https://www.bilibili.com/video/BV1Uj411u7GA) | [Lecture slides](https://github.com/opendilab/PPOxFamily/blob/main/chapter5_time/chapter5_lecture.pdf) | [Stochastic policies](https://github.com/opendilab/PPOxFamily/blob/main/chapter5_time/chapter5_supp_sto_det.pdf)<br>[RWKV](https://github.com/opendilab/PPOxFamily/blob/main/chapter5_time/chapter5_supp_rwkv.pdf)<br>[Belief MDP](https://github.com/opendilab/PPOxFamily/blob/main/chapter5_time/chapter5_supp_belief.pdf)<br>[Q&A summary](https://github.com/opendilab/PPOxFamily/blob/main/chapter5_time/chapter5_qa.pdf) | [Exercises](https://github.com/opendilab/PPOxFamily/blob/main/chapter5_time/chapter5_homework.pdf)<br>[Solutions](https://github.com/opendilab/PPOxFamily/blob/main/chapter5_time/chapter5_hw_solution.pdf) | [LSTM example](https://github.com/opendilab/PPOxFamily/blob/main/chapter5_time/lstm.py)<br>[GTrXL example](https://github.com/opendilab/PPOxFamily/blob/main/chapter5_time/gtrxl.py)<br>[Application training code](https://github.com/opendilab/PPOxFamily/blob/main/chapter5_time/chapter5_application_demo.py) | [Memory-based decision making](https://github.com/opendilab/PPOxFamily/issues/48) |
| [Chapter 6: Coordinating Multiple Agents](https://www.bilibili.com/video/BV1dg4y1g7BC) | [Lecture slides](https://github.com/opendilab/PPOxFamily/blob/main/chapter6_marl/chapter6_lecture.pdf) | [HAPPO](https://github.com/opendilab/PPOxFamily/tree/main/chapter6_marl/chapter6_supp_happo.pdf)<br>[ACE](https://github.com/opendilab/PPOxFamily/blob/main/chapter6_marl/chapter6_supp_ace.pdf)<br>[Value decomposition](https://github.com/opendilab/PPOxFamily/tree/main/chapter6_marl/chapter6_supp_value_dec.pdf)<br>[Q&A summary](https://github.com/opendilab/PPOxFamily/blob/main/chapter6_marl/chapter6_qa.pdf) | [Exercises](https://github.com/opendilab/PPOxFamily/blob/main/chapter6_marl/chapter6_homework.pdf)<br>[Solutions](https://github.com/opendilab/PPOxFamily/blob/main/chapter6_marl/chapter6_hw_solution.pdf) | [Application training code](https://github.com/opendilab/PPOxFamily/blob/main/chapter6_marl/chapter6_application_demo.py) | [Multi-agent cooperation](https://github.com/opendilab/PPOxFamily/issues/62) |
| [Chapter 7: Uncovering Advanced Tricks] | [Lecture slides](https://github.com/opendilab/PPOxFamily/blob/main/chapter7_tricks/chapter7_lecture.pdf) | [Advantage estimation](https://github.com/opendilab/PPOxFamily/blob/main/chapter7_tricks/chapter7_supp_adv.pdf)<br>[Off-policy PPO](https://github.com/opendilab/PPOxFamily/blob/main/chapter7_tricks/chapter7_supp_ppo_offpolicy.pdf)<br>[Entropy] | [Exercises](https://github.com/opendilab/PPOxFamily/blob/main/chapter7_tricks/chapter7_homework.pdf) | [GAE]<br>[Recompute]<br>[Gradient clipping]<br>[Orthogonal initialization]<br>[Dual Clip]<br>[Application training code](https://github.com/opendilab/PPOxFamily/blob/main/chapter7_tricks/chapter7_application_demo.py) | [Academic benchmark environments](https://github.com/opendilab/PPOxFamily/issues/79) |
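
The Chapter 7 code column above lists GAE (generalized advantage estimation) without a linked file. As a rough orientation only, the following pure-Python sketch shows the standard backward recursion that GAE uses. It is not the course implementation, and the function name `compute_gae`, its arguments, and the default `gamma` and `lam` values are assumptions chosen for illustration.

```python
def compute_gae(rewards, values, last_value, dones, gamma=0.99, lam=0.95):
    """Compute GAE advantages for one trajectory segment.

    rewards, values, dones: equal-length lists, one entry per time step.
    last_value: value estimate of the state that follows the final step.
    Names and defaults are illustrative assumptions, not the course's API.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    # Walk backwards so each step can reuse the advantage of the next step.
    for t in reversed(range(len(rewards))):
        # Stop bootstrapping across an episode boundary.
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Exponentially weighted sum of the future TD errors.
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    return advantages
```

Setting `lam=0` reduces the estimate to the one-step TD error, while `lam=1` gives the Monte Carlo return minus the value baseline. Values in between trade bias against variance.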
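# Course Features
## One algorithm for countless applications [video link](https://www.bilibili.com/video/BV1vW4y1M7cH/?spm_id_from=333.337.search-card.all.click)
## One-to-one correspondence between algorithm theory and code implementation [website link](https://opendilab.github.io/PPOxFamily/)
# Project Structure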
```text
.
├── LICENSE
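├── assets --> Related image assets (please credit the source when reposting)
├── chapter2_action --> Materials for Chapter 2
└── chapter1_overview --> Materials for Chapter 1
    ├── chapter1_manuscript.pdf --> Chapter 1 manuscript (supplementary notes to the slides)
    ├── chapter1_lecture.pdf --> Chapter 1 lecture slides
    ├── chapter1_qa.pdf --> Chapter 1 Q&A notes
    ├── chapter1_homework.pdf --> Chapter 1 exercises
    ├── chapter1_hw_solution.pdf --> Chapter 1 exercise solutions
    ├── chapter1_supp_trpo.pdf --> Chapter 1 supplementary materials (theoretical derivations, etc.)
    └── chapter1_demo_code.py --> Chapter 1 code implementations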
```
# Q&A and Feedback
- FAQ: [link](https://github.com/opendilab/PPOxFamily/tree/main/common/faq.pdf)
- WeChat assistant ID: OpenDILab
- Slack: [OpenDILab](https://join.slack.com/t/opendilab/shared_invite/zt-v9tmv4fp-nUBAQEH1_Kuyu_q4plBssQ)
- GitHub Issues: [link](https://github.com/opendilab/PPOxFamily/issues)
- Bilibili account: [OpenDILab](https://space.bilibili.com/1112854351?spm_id_from=333.337.0.0)
- Zhihu account: [OpenDILab浦策](https://www.zhihu.com/people/opendilab)
- Email: opendilab@pjlab.org.cn
# License
PPOxFamily is released under the Apache 2.0 license.