# deep_exploration_with_E_network

**Repository Path**: goforfar/deep_exploration_with_E_network

## Basic Information

- **Project Name**: deep_exploration_with_E_network
- **Description**: for eecs 598 class final project
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-06-30
- **Last Updated**: 2020-12-19

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

#+TITLE: README
#+DATE: <2017-11-11 Sat>
#+AUTHOR: Jiaxuan Wang
#+EMAIL: jiaxuan@umich
#+OPTIONS: ':nil *:t -:t ::t <:t H:3 \n:nil ^:t arch:headline author:t c:nil
#+OPTIONS: creator:comment d:(not "LOGBOOK") date:t e:t email:nil f:t inline:t
#+OPTIONS: num:t p:nil pri:nil stat:t tags:t tasks:t tex:t timestamp:t toc:nil
#+OPTIONS: todo:t |:t
#+CREATOR: Emacs 25.1.1 (Org mode 8.2.10)
#+DESCRIPTION:
#+EXCLUDE_TAGS: noexport
#+KEYWORDS:
#+LANGUAGE: en
#+SELECT_TAGS: export

* Purpose

Reproduce results in ICLR submission DORA

* Running the code

To reproduce the results in our reproduction workshop paper, please read the
following setup

** Function approximation

The code has multiple arguments that you can pass in (e.g., to use Dora or DQN,
to use epsilon greedy or softmax, to render the environment or not, choose the 
environment to run). You can read about the options by running

#+BEGIN_SRC bash
python main.py -h
#+END_SRC

An example run using mountain car environment with the paper's setting is

#+BEGIN_SRC bash
python main.py -m dora -a softmax -g mountain_car
#+END_SRC

To repeat code in parralel using the same setting, run

#+BEGIN_SRC bash
python run_parallel.py -m dora -a softmax -g mountain_car
#+END_SRC

by default, this repeat 10 runs of the same experiment

** Tabular setting

Please refer to env/readme.txt

* register the bridge environment

get your gym file location, call it gym/

#+BEGIN_SRC python :results output
import gym
import os
print(os.path.dirname(gym.__file__))
#+END_SRC

then add the following line to ~gym/envs/__init__.py~

#+BEGIN_SRC python
register(
    id="BridgeEnv-v0",
    entry_point="gym.envs.bridge.bridge:BridgeEnv",
)
register(
    id="BridgeLargeEnv-v0",
    entry_point="gym.envs.bridge.bridge:BridgeLargeEnv",
)
#+END_SRC

then make a directory in ~gym/envs/bridge~ and put ~env/bridge.py~ in that directory

To run the code

#+BEGIN_SRC python
import gym
env_small = gym.make("BridgeEnv-v0")
env_large = gym.make("BridgeLargeEnv-v0")
#+END_SRC

#+RESULTS:
: None


* Conclusion of our replication

We verified that DORA is working in the tabular setting. However, DORA's
experiments using function approximation put DQN into a disadvantageous position
(not a fair comparison). We are able to adjust the setting to get much better
result using DQN.

For replication of our setting, switch to branch openai (named by referencing
[[https://github.com/openai/baselines/blob/master/baselines/deepq/experiments/train_mountaincar.py][openai setting]]), and execute

#+BEGIN_SRC bash
python main.py -m dqn -g mountain_car.py -l logs
#+END_SRC

Then ~logs/dqn_default.pkl~ caches the rewards of this run. You should be able
to verify it worked by executing code in ~plot.ipynb~

* Authors Response

Authours response and the whole openreview process can be found here:
https://openreview.net/forum?id=ry1arUgCW

In short, the first experiment needed crucial fixes but basically the problem was changed so no agent converged at all (not that E values were worse, just with no change). Fixing the problem allowed replicatin.
On the different problem it was still the case that E values were better on average, even though an example could be found in which a single DQN run was better than a single E values based run. 
Additionally, freeway experiments were added to the paper replicating the results on another problem (not found in the replication due to the review process).