# rl_learn

**Repository Path**: lg21c/rl_learn

## Basic Information

- **Project Name**: rl_learn
- **Description**: My reinforcement learning notes and study materials :book: still updating ...
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 1
- **Forks**: 0
- **Created**: 2021-05-07
- **Last Updated**: 2021-09-27

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# [WIP] Reinforcement Learning Study Repository

This repository collects the classic study materials, notes, and code I gathered while learning **reinforcement learning**, shared for everyone.

To render the formulas in the Markdown files directly on GitHub, install the Chrome extension [MathJax Plugin for Github](https://chrome.google.com/webstore/detail/mathjax-plugin-for-github/ioemnmodlmafdkllaclgeombjnmnbima).

## Getting Started

- [Getting-started guide](learning_route.md)

## Course Notes

- [Notes for David Silver's Reinforcement Learning course](class_note.ipynb)
- [All slides for the course](slides)
- Notes for Sutton's *Reinforcement Learning: An Introduction*
  - [1. Introduction](notes/intro_note_01.md)
  - [2. Multi-armed Bandits](notes/intro_note_02.md)
  - [3. Finite Markov Decision Processes](notes/intro_note_03.md)
  - [4. Dynamic Programming](notes/intro_note_04.md)
  - [5. Monte Carlo Methods](notes/intro_note_05.md)
  - [6. Temporal-Difference Learning](notes/intro_note_06.md)
  - [7. n-step Bootstrapping](notes/intro_note_07.md)
  - [8. Planning and Learning with Tabular Methods](notes/intro_note_08.md)
  - [9. On-policy Prediction with Approximation](notes/intro_note_09.md)
  - [10. On-policy Control with Approximation](notes/intro_note_10.md)
  - [11. Off-policy Methods with Approximation](notes/intro_note_11.md)
  - [12. Eligibility Traces](notes/intro_note_12.md)
  - [13. Policy Gradient Methods](notes/intro_note_13.md)
  - [14. Psychology](notes/intro_note_14.md)
  - [15. Neuroscience](notes/intro_note_15.md)
  - [16. Applications and Case Studies](notes/intro_note_16.md)
  - [17. Frontiers](notes/intro_note_17.md)
- [PDF drafts of the book](book)
  - [June 2017 draft](book/bookdraft2017june19.pdf)
  - [2018 second edition](book/bookdraft2018.pdf)

## Experiments

All experiment source code lives in the `lib` directory and comes from [dennybritz](https://github.com/dennybritz/reinforcement-learning). On top of the original code, each notebook adds a concrete introduction to the experiment's background and maps the code to the corresponding formulas.

- [Gridworld](exp/1_gridworld.ipynb): **Dynamic Programming** for **MDPs**
- [Blackjack](exp/2_blackjack.ipynb): **model-free** **Monte Carlo** prediction and control
- [Windy Gridworld](exp/3_windy_gridworld.ipynb): **model-free** **temporal-difference** **on-policy control** with **SARSA**
- [Cliff Walking](exp/4_cliff_walking.ipynb): **model-free** **temporal-difference** **off-policy control** with **Q-learning** (a minimal sketch of the update appears at the end of this README)
- [Mountain Car](exp/5_mountain_car.ipynb): **Q-learning with linear function approximation**, for when the Q-table is too large to store (continuous state space)
- [Atari](exp/6_atari.ipynb): **Deep Q-Learning**

## Other Important Learning Resources

- [WildML's blog post](http://www.wildml.com/2016/10/learning-reinforcement-learning/)
- [David Silver's Reinforcement Learning Course](http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html)
- [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/the-book-2nd.html)
- [Python implementations of the book's algorithms](https://github.com/ShangtongZhang/reinforcement-learning-an-introduction)
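
## Appendix: Tabular Q-Learning Sketch

As a quick reference for the TD-control experiments above, here is a minimal, self-contained sketch of the tabular Q-learning update (the off-policy TD control used in the Cliff Walking notebook). The tiny chain environment, hyperparameters, and episode count below are made up for illustration only and are not taken from the `lib` code.

```python
import numpy as np

# Hypothetical toy chain: states 0..4, actions 0 (left) / 1 (right);
# reaching the right end (state 4) gives reward +1 and ends the episode.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))   # Q-table initialised to zero
alpha, gamma, epsilon = 0.5, 0.99, 0.1

def step(s, a):
    """Toy deterministic dynamics for the illustrative chain environment."""
    s_next = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
    done = s_next == n_states - 1
    return s_next, (1.0 if done else 0.0), done

rng = np.random.default_rng(0)
for _ in range(200):                  # episodes
    s, done = 0, False
    while not done:
        # epsilon-greedy behaviour policy
        a = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(Q[s]))
        s_next, r, done = step(s, a)
        # off-policy TD target: bootstrap from the greedy action max_a' Q(s', a')
        td_target = r + gamma * np.max(Q[s_next]) * (not done)
        Q[s, a] += alpha * (td_target - Q[s, a])
        s = s_next

print(Q)  # action values grow toward the rewarding right end of the chain
```

Replacing the `np.max(Q[s_next])` bootstrap with the value of the action actually selected by the epsilon-greedy policy in the next state turns this into SARSA, the on-policy control method used in the Windy Gridworld notebook.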