# pytorch-A3C **Repository Path**: nidao/pytorch-A3C ## Basic Information - **Project Name**: pytorch-A3C - **Description**: Simple A3C implementation with pytorch + multiprocessing - **Primary Language**: Python - **License**: MIT - **Default Branch**: master - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2019-07-02 - **Last Updated**: 2020-12-18 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # Simple implementation of Reinforcement Learning (A3C) using Pytorch This is a toy example of using multiprocessing in Python to asynchronously train a neural network to play discrete action [CartPole](https://gym.openai.com/envs/CartPole-v0/) and continuous action [Pendulum](https://gym.openai.com/envs/Pendulum-v0/) games. The asynchronous algorithm I used is called [Asynchronous Advantage Actor-Critic](https://arxiv.org/pdf/1602.01783.pdf) or A3C. I believe it would be the simplest toy implementation you can find at the moment (2018-01). ## What are the main focuses in this implementation? * Pytorch + multiprocessing (NOT threading) for parallel training * Both discrete and continuous action environments * To be simple and easy to dig into the code (less than 200 lines) ## Reason of using [Pytorch](http://pytorch.org/) instead of [Tensorflow](https://www.tensorflow.org/) Both of them are great for building your customized neural network. But to work with multiprocessing, Tensorflow is not that great due to its low compatibility with multiprocessing. I have an implementation of [Tensorflow A3C build on threading](https://github.com/MorvanZhou/Reinforcement-learning-with-tensorflow/tree/master/contents/10_A3C). I even tried to implement [distributed Tensorflow](https://github.com/MorvanZhou/Reinforcement-learning-with-tensorflow/blob/master/contents/10_A3C/A3C_distributed_tf.py). However, the distributed version is for cluster computing which I don't have. When using only one machine, it is slower than threading version I wrote. Fortunately, Pytorch gets the [multiprocessing compatibility](http://pytorch.org/docs/master/notes/multiprocessing.html). I went through many Pytorch A3C examples ([there](https://github.com/ikostrikov/pytorch-a3c), [there](https://github.com/jingweiz/pytorch-rl) and [there](https://github.com/ShangtongZhang/DeepRL)). They are great but too complicated to dig into the code. Therefore, this is my motivation to write my simple example codes. BTW, if you are interested to learn Pytorch, [there](https://github.com/MorvanZhou/PyTorch-Tutorial) is my simple tutorial code with many visualizations. I also made the tensorflow tutorial (same as pytorch) available in [here](https://github.com/MorvanZhou/Tensorflow-Tutorial). ## Codes & Results * [shared_adam.py](/shared_adam.py): optimizer that shares its parameters in parallel * [utils.py](/utils.py): useful function that can be used more than once * [discrete_A3C.py](/discrete_A3C.py): CartPole, neural net and training for discrete action space * [continuous_A3C.py](/continuous_A3C.py): Pendulum, neural net and training for continuous action space CartPole result ![cartpole](/results/cartpole.png) Pendulum result ![pendulum](/results/pendulum.png) ## Dependencies * pytorch >= 0.4.0 * numpy * gym