# **Deep Reinforcement Learning** with
# **pytorch** & **visdom**

*******

* Sample testings of trained agents (DQN on Breakout, A3C on Pong, Double DQN on CartPole, continuous A3C on InvertedPendulum (MuJoCo)):
* Sample on-line plotting while training an A3C agent on Pong (with 16 learner processes):

![a3c_pong_plot](/assets/a3c_pong.png)

* Sample loggings while training a DQN agent on CartPole (we currently use ```WARNING``` as the logging level to suppress the ```INFO``` printouts from visdom):
```bash
[WARNING ] (MainProcess) <===================================>
[WARNING ] (MainProcess) bash$: python -m visdom.server
[WARNING ] (MainProcess) http://localhost:8097/env/daim_17040900
[WARNING ] (MainProcess) <===================================> DQN
[WARNING ] (MainProcess) <-----------------------------------> Env
[WARNING ] (MainProcess) Creating {gym | CartPole-v0} w/ Seed: 123
[INFO ] (MainProcess) Making new env: CartPole-v0
[WARNING ] (MainProcess) Action Space: [0, 1]
[WARNING ] (MainProcess) State Space: 4
[WARNING ] (MainProcess) <-----------------------------------> Model
[WARNING ] (MainProcess) MlpModel (
  (fc1): Linear (4 -> 16)
  (rl1): ReLU ()
  (fc2): Linear (16 -> 16)
  (rl2): ReLU ()
  (fc3): Linear (16 -> 16)
  (rl3): ReLU ()
  (fc4): Linear (16 -> 2)
)
[WARNING ] (MainProcess) No Pretrained Model. Will Train From Scratch.
[WARNING ] (MainProcess) <===================================> Training ...
[WARNING ] (MainProcess) Validation Data @ Step: 501
[WARNING ] (MainProcess) Start Training @ Step: 501
[WARNING ] (MainProcess) Reporting @ Step: 2500 | Elapsed Time: 5.32397913933
[WARNING ] (MainProcess) Training Stats: epsilon: 0.972
[WARNING ] (MainProcess) Training Stats: total_reward: 2500.0
[WARNING ] (MainProcess) Training Stats: avg_reward: 21.7391304348
[WARNING ] (MainProcess) Training Stats: nepisodes: 115
[WARNING ] (MainProcess) Training Stats: nepisodes_solved: 114
[WARNING ] (MainProcess) Training Stats: repisodes_solved: 0.991304347826
[WARNING ] (MainProcess) Evaluating @ Step: 2500
[WARNING ] (MainProcess) Iteration: 2500; v_avg: 1.73136949539
[WARNING ] (MainProcess) Iteration: 2500; tderr_avg: 0.0964358523488
[WARNING ] (MainProcess) Iteration: 2500; steps_avg: 9.34579439252
[WARNING ] (MainProcess) Iteration: 2500; steps_std: 0.798395631184
[WARNING ] (MainProcess) Iteration: 2500; reward_avg: 9.34579439252
[WARNING ] (MainProcess) Iteration: 2500; reward_std: 0.798395631184
[WARNING ] (MainProcess) Iteration: 2500; nepisodes: 107
[WARNING ] (MainProcess) Iteration: 2500; nepisodes_solved: 106
[WARNING ] (MainProcess) Iteration: 2500; repisodes_solved: 0.990654205607
[WARNING ] (MainProcess) Saving Model @ Step: 2500: /home/zhang/ws/17_ws/pytorch-rl/models/daim_17040900.pth ...
[WARNING ] (MainProcess) Saved Model @ Step: 2500: /home/zhang/ws/17_ws/pytorch-rl/models/daim_17040900.pth.
[WARNING ] (MainProcess) Resume Training @ Step: 2500 ...
```

*******

## What is included?
This repo currently contains the following agents:

- Deep Q Learning (DQN) [[1]](http://arxiv.org/abs/1312.5602), [[2]](http://home.uchicago.edu/~arij/journalclub/papers/2015_Mnih_et_al.pdf)
- Double DQN [[3]](http://arxiv.org/abs/1509.06461)
- Dueling network DQN (Dueling DQN) [[4]](https://arxiv.org/abs/1511.06581)
- Asynchronous Advantage Actor-Critic (A3C) (w/ both discrete/continuous action space support) [[5]](https://arxiv.org/abs/1602.01783), [[6]](https://arxiv.org/abs/1506.02438)
- Sample Efficient Actor-Critic with Experience Replay (ACER) (currently w/ discrete action space support (Truncated Importance Sampling, 1st Order TRPO)) [[7]](https://arxiv.org/abs/1611.01224), [[8]](https://arxiv.org/abs/1606.02647)

Work in progress:
- Testing ACER

Future Plans:
- Deep Deterministic Policy Gradient (DDPG) [[9]](http://arxiv.org/abs/1509.02971), [[10]](http://proceedings.mlr.press/v32/silver14.pdf)
- Continuous DQN (CDQN or NAF) [[11]](http://arxiv.org/abs/1603.00748)
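As a concrete taste of one of the agents above: the core change from DQN to Double DQN [[3]](http://arxiv.org/abs/1509.06461) fits in a few lines, since the online network selects the greedy next action while the target network evaluates it. Below is a minimal illustrative sketch, not this repo's code: it is written against current PyTorch rather than the v0.2 ```Variable``` API used here, and ```online_net```, ```target_net```, and the batch tensors are placeholder names.

```python
import torch

def double_dqn_target(online_net, target_net, reward, next_state, done, gamma=0.99):
    """Illustrative Double DQN target; all argument names are placeholders."""
    with torch.no_grad():
        # The online network picks the greedy next action ...
        next_action = online_net(next_state).argmax(dim=1, keepdim=True)
        # ... but the target network evaluates it, which reduces the
        # overestimation bias of the vanilla DQN max-operator target.
        next_q = target_net(next_state).gather(1, next_action).squeeze(1)
    return reward + gamma * next_q * (1.0 - done)
```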
## Code structure & Naming conventions:
NOTE: we follow the exact code structure of [pytorch-dnc](https://github.com/jingweiz/pytorch-dnc) so as to make the code easily transplantable.
* ```./utils/factory.py```
> We suggest users start from ```./utils/factory.py```, where all the integrated ```Env```, ```Model```, ```Memory```, and ```Agent``` classes are collected into ```Dict```'s; all four core classes are implemented in ```./core/```. This factory pattern keeps the code clean: no matter which type of ```Agent``` you want to train, or which type of ```Env``` you want to train it on, you only need to modify some parameters in ```./utils/options.py```, and ```./main.py``` will do the rest (NOTE: ```./main.py``` never needs to be modified; see the sketch right after this list).
* namings
> To keep the code clean and readable, we name variables using the following pattern (mainly in the inherited ```Agent```'s):
> * ```*_vb```: ```torch.autograd.Variable```'s or lists of such objects
> * ```*_ts```: ```torch.Tensor```'s or lists of such objects
> * otherwise: normal python datatypes
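To make the factory pattern above concrete, here is a minimal sketch of its shape; the class names and ```Dict``` keys below are placeholders, not the actual entries in ```./utils/factory.py```.

```python
# Illustrative sketch of the factory pattern (placeholder names, not the
# actual contents of ./utils/factory.py): each core type gets a Dict that
# maps a type string to its class, so ./main.py can stay fully generic.

class GymEnv: pass            # stand-in for an Env class from ./core/
class MlpModel: pass          # stand-in for a Model class
class SequentialMemory: pass  # stand-in for a Memory class
class DQNAgent: pass          # stand-in for an Agent class

EnvDict    = {"gym":        GymEnv}
ModelDict  = {"mlp":        MlpModel}
MemoryDict = {"sequential": SequentialMemory}
AgentDict  = {"dqn":        DQNAgent}

# main.py then only needs the type strings chosen in ./utils/options.py,
# e.g. (hypothetical lookup):
agent_type, env_type = "dqn", "gym"
AgentClass, EnvClass = AgentDict[agent_type], EnvDict[env_type]
```

Registering a new ```Agent``` or ```Env``` then amounts to implementing the class in ```./core/``` and adding one entry to the corresponding ```Dict```.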
## Dependencies
- Python 2.7
- [PyTorch >=v0.2.0](http://pytorch.org/)
- [Visdom](https://github.com/facebookresearch/visdom)
- [OpenAI Gym >=v0.9.0](https://github.com/openai/gym) (for lower versions, just change to the available games, e.g. change PongDeterministic-v4 to PongDeterministic-v3)
- [mujoco-py](https://github.com/openai/mujoco-py) (Optional: for training the continuous version of A3C)

*******

## How to run:
You only need to modify some parameters in ```./utils/options.py``` to train a new configuration.

* Configure your training in ```./utils/options.py```:
> * ```line 14```: add an entry into ```CONFIGS``` to define your training (```agent_type```, ```env_type```, ```game```, ```model_type```, ```memory_type```)
> * ```line 33```: choose the entry you just added
> * ```line 29-30```: fill in your machine/cluster ID (```MACHINE```) and timestamp (```TIMESTAMP```) to define your training signature (```MACHINE_TIMESTAMP```); the model file and the log file of this training will be saved under this signature (```./models/MACHINE_TIMESTAMP.pth``` & ```./logs/MACHINE_TIMESTAMP.log``` respectively). The visdom visualization will also be displayed under this signature (first start the visdom server by typing in bash: ```python -m visdom.server &```, then open this address in your browser: ```http://localhost:8097/env/MACHINE_TIMESTAMP```)
> * ```line 32```: to train a model, set ```mode=1``` (training visualization will be under ```http://localhost:8097/env/MACHINE_TIMESTAMP```); to test the model of this current training, simply set ```mode=2``` (testing visualization will be under ```http://localhost:8097/env/MACHINE_TIMESTAMP_test```)
* Run:
> ```python main.py```

*******

## Bonus Scripts :)
We also provide 2 additional scripts for quickly evaluating your results after training. (Dependencies: [lmj-plot](https://github.com/lmjohns3/py-plot))

* ```plot.sh``` (e.g., plot from log file: ```logs/machine1_17080801.log```)
> * ```./plot.sh machine1 17080801```
> * the generated figures will be saved into ```figs/machine1_17080801/```
* ```plot_compare.sh``` (e.g., compare log files: ```logs/machine1_17080801.log```, ```logs/machine2_17080802.log```)
> * ```./plot_compare.sh 00 machine1 17080801 machine2 17080802```
> * the generated figures will be saved into ```figs/compare_00/```
> * the color coding will be in the order of: ```red green blue magenta yellow cyan```

*******

## Repos we referred to during the development of this repo:
* [matthiasplappert/keras-rl](https://github.com/matthiasplappert/keras-rl)
* [transedward/pytorch-dqn](https://github.com/transedward/pytorch-dqn)
* [ikostrikov/pytorch-a3c](https://github.com/ikostrikov/pytorch-a3c)
* [onlytailei/A3C-PyTorch](https://github.com/onlytailei/A3C-PyTorch)
* [Kaixhin/ACER](https://github.com/Kaixhin/ACER)
* And a private implementation of A3C from [@stokasto](https://github.com/stokasto)

*******

## Citation
If you find this library useful and would like to cite it, the following would be appropriate:
```
@misc{pytorch-rl,
  author = {Zhang, Jingwei and Tai, Lei},
  title = {jingweiz/pytorch-rl},
  url = {https://github.com/jingweiz/pytorch-rl},
  year = {2017}
}
```