# Deep_RL_with_pytorch

**Repository Path**: fucgg/Deep_RL_with_pytorch

## Basic Information

- **Project Name**: Deep_RL_with_pytorch
- **Default Branch**: master
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 1
- **Created**: 2021-07-05
- **Last Updated**: 2025-07-30

## README

# Deep Reinforcement Learning with PyTorch

- [x] 1. Dynamic Programming (Update : 13. 2. 2019)

     1. Conditional GAN

     ![cde](./1_Dynamic_Programming/cde_with_gan.gif)

     2. Policy Iteration & Value Iteration
- [x] 2. Value Based Methods (Update : 17. 2. 2019)

     1. [Vanilla DQN](https://www.nature.com/articles/nature14236)

     2. [PDD DQN (Prioritized, Dueling, Double DQN)](https://blog.openai.com/openai-baselines-dqn/)

     ![pong_dqn](./2_Value_Based_Methods/pong_result.gif)
- [x] 3. Policy Based Methods (Update : 23. 2. 2019)

     1. [A2C](https://blog.openai.com/baselines-acktr-a2c/)

     2. [PPO](https://blog.openai.com/openai-baselines-ppo/)

     ![pong_ppo](./3_Policy_Based_Methods/ppo_pong_result.gif)

- [x] 4. Off-policy Policy Based Methods (Update : 10. 3. 2019)

     1. [SAC](https://ai.googleblog.com/2019/01/soft-actor-critic-deep-reinforcement.html)

     2. [SIL](https://arxiv.org/abs/1806.05635) (applied to SAC rather than A2C or PPO)

     ![breakout_sil](./4_Off-policy_Policy_Based_Methods/ssac_breakout_result.gif)

- [x] 5. Exploration Techniques (Update : 16. 3. 2019)

     1. [Thompson sampling with MC Dropout (MCDO)](http://mlg.eng.cam.ac.uk/yarin/blog_3d801aa532c1ce.html)
     2. [RND](https://openai.com/blog/reinforcement-learning-with-prediction-based-rewards/)

    Breakout trained with only intrinsic rewards:

    ![breakout_only_intrinsic](./5_Exploration_Techinques/breakout_only_intrinsic.png)

- [x] 6. Uncertainty in RL (Update : 24. 3. 2019)

     1. [Categorical DQN (C51)](https://flyyufelix.github.io/2017/10/24/distributional-bellman.html)
     2. [QR-DQN](https://arxiv.org/pdf/1710.10044)
     3. [Implicit Quantile Networks](https://arxiv.org/pdf/1806.06923)

     ![breakout_iqn](./6_Uncertainty_in_RL/iqn_breakout_result.gif)

- [x] 7. Imitation Learning (Update : 30. 3. 2019)
     1. [GAIL](https://arxiv.org/abs/1606.03476)

- [x] 8. Multi-Agent RL (Update : 4. 4. 2019)
     1. [Upper Confidence Bounds for Trees (UCT)](http://mcts.ai/pubs/mcts-survey-master.pdf)
     2. [Counterfactual](https://arxiv.org/pdf/1710.11424.pdf) [Hedge](https://arxiv.org/pdf/1411.5007.pdf)