<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
<title>REINFORCEjs: Gridworld with Dynamic Programming</title>
<meta name="description" content="">
<meta name="author" content="">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<!-- jquery and jqueryui -->
<script src="https://code.jquery.com/jquery-2.1.3.min.js"></script>
<!-- bootstrap -->
<script src="http://maxcdn.bootstrapcdn.com/bootstrap/3.3.4/js/bootstrap.min.js"></script>
<link href="http://maxcdn.bootstrapcdn.com/bootstrap/3.3.4/css/bootstrap.min.css" rel="stylesheet">
<!-- markdown -->
<script type="text/javascript" src="external/marked.js"></script>
<script type="text/javascript" src="external/highlight.pack.js"></script>
<link rel="stylesheet" href="external/highlight_default.css">
<script>hljs.initHighlightingOnLoad();</script>
<!-- GA -->
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-3698471-24', 'auto');
ga('send', 'pageview');
</script>
<style>
#wrap {
width:800px;
margin-left: auto;
margin-right: auto;
}
</style>
<script>
function start() {
$(".md").each(function(){
$(this).html(marked($(this).html()));
});
}
</script>
</head>
<body onload="start();">
<a href="https://github.com/karpathy/reinforcejs"><img style="position: absolute; top: 0; right: 0; border: 0;" src="https://camo.githubusercontent.com/38ef81f8aca64bb9a64448d0d70f1308ef5341ab/68747470733a2f2f73332e616d617a6f6e6177732e636f6d2f6769746875622f726962626f6e732f666f726b6d655f72696768745f6461726b626c75655f3132313632312e706e67" alt="Fork me on GitHub" data-canonical-src="https://s3.amazonaws.com/github/ribbons/forkme_right_darkblue_121621.png"></a>
<div id="wrap">
<div id="mynav" style="border-bottom:1px solid #999; padding-bottom: 10px; margin-bottom:50px;">
<div>
<img src="loop.svg" style="width:50px;height:50px;float:left;">
<h1 style="font-size:50px;">REINFORCE<span style="color:#058;">js</span></h1>
</div>
<ul class="nav nav-pills">
<li role="presentation" class="active"><a href="index.html">About</a></li>
<li role="presentation"><a href="gridworld_dp.html">GridWorld: DP</a></li>
<li role="presentation"><a href="gridworld_td.html">GridWorld: TD</a></li>
<li role="presentation"><a href="puckworld.html">PuckWorld: DQN</a></li>
<li role="presentation"><a href="waterworld.html">WaterWorld: DQN</a></li>
</ul>
</div>
<div id="exp" class="md">
# About
**REINFORCEjs** is a Reinforcement Learning library that implements several common RL algorithms, each supported by a fun web demo, and is currently maintained by [@karpathy](https://twitter.com/karpathy). In particular, the library currently includes:
### Dynamic Programming
For solving finite (and not too large), deterministic MDPs. The solver uses standard tabular methods with no bells and whistles, and the environment must provide the dynamics; a sketch of the required interface is shown below.
<div style="text-align:justify; margin: 20px; float:right; max-width:300px;">
<img src="img/dpsolved.jpeg" style="max-width:300px;"><br>
Right: A simple Gridworld solved with Dynamic Programming. Very exciting. Head over to the <a href="gridworld_dp.html">GridWorld: DP demo</a> to play with the GridWorld environment and policy iteration.
</div>
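As a minimal sketch of what "the environment must provide the dynamics" means in practice, here is a tiny deterministic 5-state chain. The method names follow the tabular interface used in the GridWorld: DP demo; treat the details as assumptions and check that demo page for the authoritative interface.
```javascript
// minimal sketch: a deterministic 5-state chain with a goal at the right end
var env = {};
env.getNumStates = function() { return 5; };
env.getMaxNumActions = function() { return 2; };      // 0 = left, 1 = right
env.allowedActions = function(s) { return [0, 1]; };  // both actions in every state
env.nextStateDistribution = function(s, a) {          // deterministic successor state
  return a === 0 ? Math.max(0, s - 1) : Math.min(4, s + 1);
};
env.reward = function(s, a, ns) { return ns === 4 ? 1.0 : 0.0; }; // reward on reaching the goal

var agent = new RL.DPAgent(env, { gamma: 0.9 });      // gamma: discount factor
for (var k = 0; k < 100; k++) { agent.learn(); }      // repeated sweeps of policy iteration
var a = agent.act(2);                                 // greedy action from the middle state
```
Because the DP agent sweeps over every state on each `learn()` call, this approach only scales to small, enumerable state spaces.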
### Tabular Temporal Difference Learning
Both SARSA and Q-Learning are included. The agent still maintains tabular value functions but does not require an environment model and learns from experience. Support for many bells and whistles is also included, such as eligibility traces and planning (with priority sweeps); see the sketch below.
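A sketch of setting up a tabular TD agent. The spec field names follow the options described on the GridWorld: TD demo page; `env`, `s`, and `r` stand in for your own environment (same tabular interface as the DP sketch above), integer state id, and observed reward.
```javascript
// sketch: a tabular Q-learning agent with eligibility traces and planning
var spec = {
  update: 'qlearn',        // 'qlearn' or 'sarsa'
  gamma: 0.9,              // discount factor
  epsilon: 0.2,            // epsilon-greedy exploration rate
  alpha: 0.1,              // TD learning rate
  lambda: 0.7,             // eligibility trace decay (0 disables traces)
  replacing_traces: true,  // replacing rather than accumulating traces
  planN: 50                // planning updates per step via priority sweeps (0 = off)
};
var agent = new RL.TDAgent(env, spec);
var a = agent.act(s);      // s is an integer state id
// ... execute a in the environment and observe reward r ...
agent.learn(r);            // updates Q(s,a), the traces, and the learned model
```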
### Deep Q Learning
A reimplementation of the [Mnih et al.](http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html) Atari game-playing model. The approach models the action value function Q(s,a) with a neural network and hence allows continuous input spaces, though the number of discrete actions must be fixed. The implementation includes most of the bells and whistles (e.g. experience replay, TD error clamping for robustness).
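Those bells and whistles are switched on through the agent spec. The sketch below uses option names as they appear in the DQN demos, with illustrative values; consult the demo pages for the authoritative list and defaults.
```javascript
// sketch: DQN spec fields controlling experience replay and TD error clamping
var spec = {
  gamma: 0.9,                       // discount factor
  epsilon: 0.2,                     // epsilon-greedy exploration rate
  alpha: 0.005,                     // learning rate for the Q network
  experience_add_every: 5,          // record a replay tuple every N steps
  experience_size: 10000,           // replay memory capacity
  learning_steps_per_iteration: 5,  // replayed gradient steps per learn() call
  tderror_clamp: 1.0,               // clamp the TD error for robustness
  num_hidden_units: 100             // hidden layer size of the Q network
};
var agent = new RL.DQNAgent(env, spec);
```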
### Policy Gradients
The implementation includes a stochastic policy gradient agent that uses REINFORCE and LSTMs to learn both the actor policy and the value function baseline, as well as an implementation of the recent Deterministic Policy Gradients by [Silver et al](http://www0.cs.ucl.ac.uk/staff/d.silver/web/Publications_files/deterministic-policy-gradients.pdf). To make a lot of this happen (the LSTMs in particular), the library includes a fork of my previous project [recurrentjs](https://github.com/karpathy/recurrentjs), which allows one to set up graphs of computations over matrices and perform automatic backprop.
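For the curious, the graph API looks roughly like this. This is a sketch assuming the recurrentjs conventions, with the recurrent library exposed as a global `R` by `lib/rl.js`; verify against the source before relying on it.
```javascript
// sketch of the bundled recurrentjs-style graph API
var W = new R.RandMat(10, 4, 0, 0.01); // 10x4 weight matrix, gaussian init with std 0.01
var x = new R.RandMat(4, 1, 0, 1.0);   // a random column-vector input
var G = new R.Graph(true);             // true: record the tape for backprop
var h = G.tanh(G.mul(W, x));           // forward pass: h = tanh(W * x)
h.dw[0] = 1.0;                         // seed a gradient at one output element
G.backward();                          // automatic backprop fills W.dw and x.dw
```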
I do not include a demo for the policy gradient methods because the current implementations are unfortunately finicky and unstable (both the stochastic and the deterministic versions). I still include the code in the library in case someone wants to poke around. I suspect that either there are bugs (it's proving difficult to know for sure), or I'm missing some tips/tricks needed to get them to work reliably.
# Example Library Usage
Including the library (currently there is no nodejs support out of the box):
```javascript
<script type="text/javascript" src="lib/rl.js"></script>
```
For most applications (e.g. simple games), the DQN algorithm is a safe bet to use. If your project has a finite state space that is not too large, the DP or tabular TD methods are more appropriate. As an example, the DQN Agent satisfies a very simple API:
<pre><code class="js">
// create an environment object
var env = {};
env.getNumStates = function() { return 8; };
env.getMaxNumActions = function() { return 4; };

// create the DQN agent
var spec = { alpha: 0.01 }; // see full options on the DQN page
var agent = new RL.DQNAgent(env, spec);

setInterval(function() { // start the learning loop
  var action = agent.act(s); // s is an array of length 8 (the state features)
  // ... execute action in the environment and observe the reward ...
  agent.learn(reward); // the agent improves its Q function, policy, and model; reward is a float
}, 0);
</code></pre>
In other words, you pass the agent some vector and it gives you an action. Then you reward or punish its behavior with the `reward` signal. The agent will over time tune its parameters to maximize the rewards it obtains.
The full source code is on <a href="https://github.com/karpathy/reinforcejs">Github</a> under the MIT license.
<br><br><br><br><br><br><br>
</div>
</div>
</body>
</html>