-
-.. toctree::
- :glob:
- :maxdepth: 1
- :caption: Installation
-
- install
-
-.. toctree::
- :glob:
- :maxdepth: 1
- :caption: User Guide
-
- offline_learning
- online_learning
-
-.. toctree::
- :maxdepth: 1
- :caption: API Reference
-
- recommender
-
-.. toctree::
- :glob:
- :maxdepth: 1
- :caption: RELEASE NOTES
-
- RELEASE
diff --git a/docs/recommender/docs/source_zh_cn/install.md b/docs/recommender/docs/source_zh_cn/install.md
deleted file mode 100644
index 09c6c3261ee2f12b340e29686cf8c188420fafda..0000000000000000000000000000000000000000
--- a/docs/recommender/docs/source_zh_cn/install.md
+++ /dev/null
@@ -1,36 +0,0 @@
-# Installing MindSpore Recommender
-
-[](https://gitee.com/mindspore/docs/blob/master/docs/recommender/docs/source_zh_cn/install.md)
-
-MindSpore Recommender depends on the MindSpore training framework. Install [MindSpore](https://gitee.com/mindspore/mindspore#安装) first, and then install MindSpore Recommender. You can install it either with pip or by building from source.
-
-## Installing with pip
-
-To install with the pip command, download and install the whl package from the [MindSpore Recommender download page](https://www.mindspore.cn/versions).
-
-```shell
-pip install https://ms-release.obs.cn-north-4.myhuaweicloud.com/{ms_version}/Recommender/any/mindspore_rec-{mr_version}-py3-none-any.whl --trusted-host ms-release.obs.cn-north-4.myhuaweicloud.com -i https://pypi.tuna.tsinghua.edu.cn/simple
-```
-
-- When the machine is connected to the Internet, the dependencies of MindSpore Recommender are downloaded automatically while the whl package is installed (see requirement.txt for details); otherwise, you need to install them manually.
-- `{ms_version}` is the MindSpore version number that matches MindSpore Recommender.
-- `{mr_version}` is the MindSpore Recommender version number. For example, when downloading MindSpore Recommender 0.2.0, set `{mr_version}` to 0.2.0.
-
-## Installing from Source
-
-Download the [source code](https://github.com/mindspore-lab/mindrec) and change into the `mindrec` directory.
-
-```shell
-bash build.sh
-pip install output/mindspore_rec-0.2.0-py3-none-any.whl
-```
-
-Here, `build.sh` is the build script in the `recommender` directory.
-
-## Verifying the Installation
-
-Run the following command to verify the installation. If importing the Python module raises no error, the installation succeeded:
-
-```python
-import mindspore_rec
-```
diff --git a/docs/recommender/docs/source_zh_cn/offline_learning.md b/docs/recommender/docs/source_zh_cn/offline_learning.md
deleted file mode 100644
index e63e4a6e1372586492e4afec85d28ffa021cf2e0..0000000000000000000000000000000000000000
--- a/docs/recommender/docs/source_zh_cn/offline_learning.md
+++ /dev/null
@@ -1,17 +0,0 @@
-# Offline Training
-
-[](https://gitee.com/mindspore/docs/blob/master/docs/recommender/docs/source_zh_cn/offline_learning.md)
-
-## Overview
-
-One of the main challenges in training recommendation models is the storage and training of large-scale feature vectors. MindSpore Recommender provides a complete solution for training large-scale feature vectors in offline scenarios.
-
-## Overall Architecture
-
-The training architecture for large-scale feature vectors in recommendation models is shown in the figure below. At its core it uses a distributed multi-level embedding cache, combined with model-parallel, multi-node multi-device distributed parallelism, to enable large-scale, low-cost training of large recommendation models.
-
-
-
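-To make the idea concrete, the sketch below shows how an embedding cache can be configured at the layer level. It is a minimal illustration, assuming MindSpore's `nn.EmbeddingLookup` API with a `vocab_cache_size` option (hot embeddings cached on the device, full table kept on the host); all sizes are placeholders and should be checked against your installed version.
-
-```python
-import mindspore.nn as nn
-
-# Minimal sketch, assuming EmbeddingLookup exposes a device-side cache via
-# `vocab_cache_size` (hot IDs on the device, full table on the host).
-# All sizes below are illustrative placeholders.
-embedding = nn.EmbeddingLookup(vocab_size=200000000,    # full feature vocabulary
-                               embedding_size=80,       # embedding dimension
-                               target='DEVICE',         # run the lookup on the accelerator
-                               sparse=True,             # sparse gradient updates for the large table
-                               vocab_cache_size=4000000)  # hot subset cached in device memory
-```
-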
-## Example
-
-[Wide&Deep distributed training](https://github.com/mindspore-lab/mindrec/tree/master/models/wide_deep)
diff --git a/docs/recommender/docs/source_zh_cn/online_learning.md b/docs/recommender/docs/source_zh_cn/online_learning.md
deleted file mode 100644
index 234279665ec84186aa5e16f9307ced3ac010448e..0000000000000000000000000000000000000000
--- a/docs/recommender/docs/source_zh_cn/online_learning.md
+++ /dev/null
@@ -1,213 +0,0 @@
-# Online Learning
-
-[](https://gitee.com/mindspore/docs/blob/master/docs/recommender/docs/source_zh_cn/online_learning.md)
-
-## Overview
-
-How quickly a recommendation model is updated is one of its key technical metrics, and online learning effectively improves that update timeliness.
-
-The main differences between online learning and offline training are:
-
-1. The online learning dataset is streaming data with no fixed dataset size or number of epochs, whereas an offline training dataset has a fixed size and number of epochs.
-2. Online learning runs as a resident service, whereas an offline training job exits once training finishes.
-3. Online learning needs to collect and store training data; the training flow is triggered after a fixed amount of data has been collected or a fixed time window has elapsed.
-
-## Overall Architecture
-
-The user's streaming training data is pushed to Kafka. MindSpore Pandas reads the data from Kafka, applies feature-engineering transformations, and writes the result to a feature storage engine. MindData then reads the data from the storage engine as training data, and MindSpore runs as a resident service that continuously receives data and performs training. The overall flow is shown in the figure below:
-
-
-
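-As a flavour of the first hop in this pipeline, the snippet below pushes one record into Kafka with kafka-python. It is a minimal sketch: the topic name, broker address, and payload are placeholders, not the values used by the shipped `producer.py`.
-
-```python
-from kafka import KafkaProducer
-
-# Minimal sketch: push one raw sample line into Kafka.
-# Topic name, broker address, and payload are placeholders.
-producer = KafkaProducer(bootstrap_servers="localhost:9092")
-producer.send("criteo_stream", "raw criteo sample line".encode("utf-8"))
-producer.flush()
-```
-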
-## Constraints
-
-- Python 3.8 or later is required.
-- Currently only GPU training on Linux is supported.
-
-## Python Package Dependencies
-
-- mindpandas v0.1.0
-- mindspore_rec v0.2.0
-- kafka-python v2.0.2
-
-## Example
-
-The following uses the Criteo dataset and the Wide&Deep model to walk through the online learning flow. The sample code is located at [online learning](https://github.com/mindspore-lab/mindrec/tree/master/examples/online_learning).
-
-MindSpore Recommender provides a dedicated model class, `RecModel`, for online learning. Combined with MindSpore Pandas, which reads from the real-time data source Kafka and performs feature processing, it implements a simple online learning pipeline.
-First, define a dataset for real-time data processing. The constructor parameter `receiver` is a `DataReceiver` from MindSpore Pandas used to receive real-time data, and `__getitem__` reads one sample at a time.
-
-```python
-import numpy as np
-
-class StreamingDataset:
- def __init__(self, receiver):
- self.data_ = []
- self.receiver_ = receiver
-
- def __getitem__(self, item):
- while not self.data_:
- data = self.receiver_.recv()
- if data is not None:
- self.data_ = data.tolist()
-
- last_row = self.data_.pop()
- return np.array(last_row[0], dtype=np.int32), np.array(last_row[1], dtype=np.float32), np.array(last_row[2], dtype=np.float32)
-```
-
-Next, wrap the custom dataset above into the online dataset required by `RecModel`.
-
-```python
-import mindspore.dataset as ds
-from mindpandas.channel import DataReceiver
-from mindspore_rec import RecModel as Model
-
-receiver = DataReceiver(address=config.address, namespace=config.namespace,
- dataset_name=config.dataset_name, shard_id=0)
-stream_dataset = StreamingDataset(receiver)
-
-dataset = ds.GeneratorDataset(stream_dataset, column_names=["id", "weight", "label"])
-dataset = dataset.batch(config.batch_size)
-
-train_net, _ = GetWideDeepNet(config)
-train_net.set_train()
-
-model = Model(train_net)
-```
-
-After configuring the model checkpoint export strategy, start the online training process.
-
-```python
-from mindspore.train.callback import CheckpointConfig, ModelCheckpoint, TimeMonitor
-
-ckptconfig = CheckpointConfig(save_checkpoint_steps=100, keep_checkpoint_max=5)
-ckpoint_cb = ModelCheckpoint(prefix='widedeep_train', directory="./ckpt", config=ckptconfig)
-
-model.online_train(dataset, callbacks=[TimeMonitor(1), callback, ckpoint_cb], dataset_sink_mode=True)
-```
-
-The following describes how to start each module involved in the online learning flow:
-
-### Download Kafka
-
-```bash
-wget https://archive.apache.org/dist/kafka/3.2.0/kafka_2.13-3.2.0.tgz
-tar -xzf kafka_2.13-3.2.0.tgz
-cd kafka_2.13-3.2.0
-```
-
-To install another version, refer to the Apache Kafka download archive used in the command above.
-
-### Start kafka-zookeeper
-
-```bash
-bin/zookeeper-server-start.sh config/zookeeper.properties
-```
-
-### Start kafka-server
-
-Open another terminal and start the Kafka service.
-
-```bash
-bin/kafka-server-start.sh config/server.properties
-```
-
-### Start kafka_client
-
-Enter the online learning example directory of the recommender repository and start kafka_client. kafka_client only needs to be started once and uses Kafka to set the number of partitions for the corresponding topic; conceptually it works like the sketch shown after the commands below.
-
-```bash
-cd recommender/examples/online_learning
-python kafka_client.py
-```
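-
-The sketch below illustrates what such a client conceptually does with kafka-python: create the topic with a chosen partition count. The topic name, broker address, and counts are placeholders, not the values used by the real `kafka_client.py`.
-
-```python
-from kafka.admin import KafkaAdminClient, NewTopic
-
-# Hedged sketch: create the streaming topic with a given partition count.
-# Name, address, and counts are placeholders.
-admin = KafkaAdminClient(bootstrap_servers="localhost:9092")
-admin.create_topics([NewTopic(name="criteo_stream", num_partitions=8, replication_factor=1)])
-```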
-
-### Start the Distributed Compute Engine
-
-```bash
-yrctl start --master --address $MASTER_HOST_IP
-
-# Parameter description
-# --master: marks the current host as the master node; non-master nodes do not pass '--master'
-# --address: IP address of the master node
-```
-
-### Start the Data Producer
-
-The producer simulates the online learning scenario by writing the local Criteo dataset into Kafka for the consumer to use. The current example uses multiple processes to read two files and write their data into Kafka.
-
-```bash
-python producer.py --file1=$CRITEO_DATASET_FILE_PATH --file2=$CRITEO_DATASET_FILE_PATH
-
-# Parameter description
-# --file1: local path of the Criteo dataset
-# --file2: local path of the Criteo dataset
-# Both files are raw Criteo dataset text files. file1 and file2 are processed concurrently; they may be the same file or different files. If they are the same, each sample in the file is effectively used twice.
-```
-
-### Start the Data Consumer
-
-```bash
-python consumer.py --num_shards=$DEVICE_NUM --address=$LOCAL_HOST_IP --dataset_name=$DATASET_NAME
- --max_dict=$PATH_TO_VAL_MAX_DICT --min_dict=$PATH_TO_VAL_MIN_DICT --map_dict=$PATH_TO_CAT_TO_ID_DICT
-
-# Parameter description
-# --num_shards: number of devices on the training side; set to 1 for single-device training and 8 for 8-device training
-# --address: address of the current sender
-# --dataset_name: dataset name
-# --namespace: channel name
-# --max_dict: dictionary of maximum values for the dense feature columns
-# --min_dict: dictionary of minimum values for the dense feature columns
-# --map_dict: dictionary for the sparse feature columns
-```
-
-The consumer needs three dataset-related files to perform feature engineering on the Criteo dataset: `all_val_max_dict.pkl`, `all_val_min_dict.pkl`, and `cat2id_dict.pkl`. `$PATH_TO_VAL_MAX_DICT`, `$PATH_TO_VAL_MIN_DICT`, and `$PATH_TO_CAT_TO_ID_DICT` are the absolute paths of these files on the environment. To generate the three .pkl files, refer to [process_data.py](https://github.com/mindspore-lab/mindrec/blob/master/datasets/criteo_1tb/process_data.py), which converts the raw Criteo dataset into the corresponding .pkl files.
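-
-To sanity-check these dictionaries before starting the consumer, the standard `pickle` module is enough. This is a minimal sketch; the file names are the ones listed above and are assumed to live in the current directory.
-
-```python
-import pickle
-
-# Minimal sketch: inspect the feature-engineering dictionaries used by the consumer.
-for name in ("all_val_max_dict.pkl", "all_val_min_dict.pkl", "cat2id_dict.pkl"):
-    with open(name, "rb") as f:
-        d = pickle.load(f)
-    print(name, "->", len(d), "entries")
-```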
-
-### Start Online Training
-
-The configuration uses yaml; see [default_config.yaml](https://github.com/mindspore-lab/mindrec/blob/master/examples/online_learning/default_config.yaml).
-
-Single-device training:
-
-```bash
-python online_train.py --address=$LOCAL_HOST_IP --dataset_name=criteo
-
-# Parameter description:
-# --address: host IP of this machine; required for receiving training data from MindSpore Pandas
-# --dataset_name: dataset name, which must match the consumer module
-```
-
-Multi-device training launched with MPI:
-
-```bash
-bash mpirun_dist_online_train.sh [$RANK_SIZE] [$LOCAL_HOST_IP]
-
-# Parameter description:
-# RANK_SIZE: number of devices used for multi-device training
-# LOCAL_HOST_IP: host IP of this machine, used by MindSpore Pandas to deliver training data
-```
-
-Multi-device training launched with dynamic cluster networking:
-
-```bash
-bash run_dist_online_train.sh [$WORKER_NUM] [$SHED_HOST] [$SCHED_PORT] [$LOCAL_HOST_IP]
-
-# Parameter description:
-# WORKER_NUM: number of devices used for multi-device training
-# SHED_HOST: IP of the Scheduler role required by MindSpore dynamic cluster networking
-# SCHED_PORT: port of the Scheduler role required by MindSpore dynamic cluster networking
-# LOCAL_HOST_IP: host IP of this machine; required for receiving training data from MindSpore Pandas
-```
-
-After training starts successfully, logs like the following are printed:
-
-Here, epoch and step are the epoch and step numbers of the current training step, and wide_loss and deep_loss are the training loss values of the Wide&Deep network.
-
-```text
-epoch: 1, step: 1, wide_loss: 0.66100323, deep_loss: 0.72502613
-epoch: 1, step: 2, wide_loss: 0.46781272, deep_loss: 0.5293098
-epoch: 1, step: 3, wide_loss: 0.363207, deep_loss: 0.42204413
-epoch: 1, step: 4, wide_loss: 0.3051032, deep_loss: 0.36126155
-epoch: 1, step: 5, wide_loss: 0.24045062, deep_loss: 0.29395688
-epoch: 1, step: 6, wide_loss: 0.24296054, deep_loss: 0.29386574
-epoch: 1, step: 7, wide_loss: 0.20943595, deep_loss: 0.25780612
-epoch: 1, step: 8, wide_loss: 0.19562452, deep_loss: 0.24153553
-epoch: 1, step: 9, wide_loss: 0.16500896, deep_loss: 0.20854339
-epoch: 1, step: 10, wide_loss: 0.2188702, deep_loss: 0.26011512
-epoch: 1, step: 11, wide_loss: 0.14963374, deep_loss: 0.18867904
-```
diff --git a/docs/reinforcement/docs/Makefile b/docs/reinforcement/docs/Makefile
deleted file mode 100644
index 1eff8952707bdfa503c8d60c1e9a903053170ba2..0000000000000000000000000000000000000000
--- a/docs/reinforcement/docs/Makefile
+++ /dev/null
@@ -1,20 +0,0 @@
-# Minimal makefile for Sphinx documentation
-#
-
-# You can set these variables from the command line, and also
-# from the environment for the first two.
-SPHINXOPTS ?=
-SPHINXBUILD ?= sphinx-build
-SOURCEDIR = source_zh_cn
-BUILDDIR = build_zh_cn
-
-# Put it first so that "make" without argument is like "make help".
-help:
- @$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
-
-.PHONY: help Makefile
-
-# Catch-all target: route all unknown targets to Sphinx using the new
-# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
-%: Makefile
- @$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
diff --git a/docs/reinforcement/docs/_ext/customdocumenter.txt b/docs/reinforcement/docs/_ext/customdocumenter.txt
deleted file mode 100644
index 2d37ae41f6772a21da2a7dc5c7bff75128e68330..0000000000000000000000000000000000000000
--- a/docs/reinforcement/docs/_ext/customdocumenter.txt
+++ /dev/null
@@ -1,245 +0,0 @@
-import re
-import os
-from sphinx.ext.autodoc import Documenter
-
-
-class CustomDocumenter(Documenter):
-
- def document_members(self, all_members: bool = False) -> None:
- """Generate reST for member documentation.
-
- If *all_members* is True, do all members, else those given by
- *self.options.members*.
- """
- # set current namespace for finding members
- self.env.temp_data['autodoc:module'] = self.modname
- if self.objpath:
- self.env.temp_data['autodoc:class'] = self.objpath[0]
-
- want_all = all_members or self.options.inherited_members or \
- self.options.members is ALL
- # find out which members are documentable
- members_check_module, members = self.get_object_members(want_all)
-
- # **** Exclude APIs whose Chinese descriptions have already been written ****
- file_path = os.path.join(self.env.app.srcdir, self.env.docname+'.rst')
- exclude_re = re.compile(r'(.. py:class::|.. py:function::)\s+(.*?)(\(|\n)')
- includerst_re = re.compile(r'.. include::\s+(.*?)\n')
- with open(file_path, 'r', encoding='utf-8') as f:
- content = f.read()
- excluded_members = exclude_re.findall(content)
- if excluded_members:
- excluded_members = [i[1].split('.')[-1] for i in excluded_members]
- rst_included = includerst_re.findall(content)
- if rst_included:
- for i in rst_included:
- include_path = os.path.join(os.path.dirname(file_path), i)
- if os.path.exists(include_path):
- with open(include_path, 'r', encoding='utf8') as g:
- content_ = g.read()
- excluded_member_ = exclude_re.findall(content_)
- if excluded_member_:
- excluded_member_ = [j[1].split('.')[-1] for j in excluded_member_]
- excluded_members.extend(excluded_member_)
-
- if excluded_members:
- if self.options.exclude_members:
- self.options.exclude_members |= set(excluded_members)
- else:
- self.options.exclude_members = excluded_members
-
- # remove members given by exclude-members
- if self.options.exclude_members:
- members = [
- (membername, member) for (membername, member) in members
- if (
- self.options.exclude_members is ALL or
- membername not in self.options.exclude_members
- )
- ]
-
- # document non-skipped members
- memberdocumenters = [] # type: List[Tuple[Documenter, bool]]
- for (mname, member, isattr) in self.filter_members(members, want_all):
- classes = [cls for cls in self.documenters.values()
- if cls.can_document_member(member, mname, isattr, self)]
- if not classes:
- # don't know how to document this member
- continue
- # prefer the documenter with the highest priority
- classes.sort(key=lambda cls: cls.priority)
- # give explicitly separated module name, so that members
- # of inner classes can be documented
- full_mname = self.modname + '::' + \
- '.'.join(self.objpath + [mname])
- documenter = classes[-1](self.directive, full_mname, self.indent)
- memberdocumenters.append((documenter, isattr))
- member_order = self.options.member_order or \
- self.env.config.autodoc_member_order
- if member_order == 'groupwise':
- # sort by group; relies on stable sort to keep items in the
- # same group sorted alphabetically
- memberdocumenters.sort(key=lambda e: e[0].member_order)
- elif member_order == 'bysource' and self.analyzer:
- # sort by source order, by virtue of the module analyzer
- tagorder = self.analyzer.tagorder
-
- def keyfunc(entry: Tuple[Documenter, bool]) -> int:
- fullname = entry[0].name.split('::')[1]
- return tagorder.get(fullname, len(tagorder))
- memberdocumenters.sort(key=keyfunc)
-
- for documenter, isattr in memberdocumenters:
- documenter.generate(
- all_members=True, real_modname=self.real_modname,
- check_module=members_check_module and not isattr)
-
- # reset current objects
- self.env.temp_data['autodoc:module'] = None
- self.env.temp_data['autodoc:class'] = None
-
- def generate(self, more_content: Any = None, real_modname: str = None,
- check_module: bool = False, all_members: bool = False) -> None:
- """Generate reST for the object given by *self.name*, and possibly for
- its members.
-
- If *more_content* is given, include that content. If *real_modname* is
- given, use that module name to find attribute docs. If *check_module* is
- True, only generate if the object is defined in the module name it is
- imported from. If *all_members* is True, document all members.
- """
- if not self.parse_name():
- # need a module to import
- logger.warning(
- __('don\'t know which module to import for autodocumenting '
- '%r (try placing a "module" or "currentmodule" directive '
- 'in the document, or giving an explicit module name)') %
- self.name, type='autodoc')
- return
-
- # now, import the module and get object to document
- if not self.import_object():
- return
-
- # If there is no real module defined, figure out which to use.
- # The real module is used in the module analyzer to look up the module
- # where the attribute documentation would actually be found in.
- # This is used for situations where you have a module that collects the
- # functions and classes of internal submodules.
- self.real_modname = real_modname or self.get_real_modname() # type: str
-
- # try to also get a source code analyzer for attribute docs
- try:
- self.analyzer = ModuleAnalyzer.for_module(self.real_modname)
- # parse right now, to get PycodeErrors on parsing (results will
- # be cached anyway)
- self.analyzer.find_attr_docs()
- except PycodeError as err:
- logger.debug('[autodoc] module analyzer failed: %s', err)
- # no source file -- e.g. for builtin and C modules
- self.analyzer = None
- # at least add the module.__file__ as a dependency
- if hasattr(self.module, '__file__') and self.module.__file__:
- self.directive.filename_set.add(self.module.__file__)
- else:
- self.directive.filename_set.add(self.analyzer.srcname)
-
- # check __module__ of object (for members not given explicitly)
- if check_module:
- if not self.check_module():
- return
-
- # document members, if possible
- self.document_members(all_members)
-
-
-class ModuleDocumenter(CustomDocumenter):
- """
- Specialized Documenter subclass for modules.
- """
- objtype = 'module'
- content_indent = ''
- titles_allowed = True
-
- option_spec = {
- 'members': members_option, 'undoc-members': bool_option,
- 'noindex': bool_option, 'inherited-members': bool_option,
- 'show-inheritance': bool_option, 'synopsis': identity,
- 'platform': identity, 'deprecated': bool_option,
- 'member-order': identity, 'exclude-members': members_set_option,
- 'private-members': bool_option, 'special-members': members_option,
- 'imported-members': bool_option, 'ignore-module-all': bool_option
- } # type: Dict[str, Callable]
-
- def __init__(self, *args: Any) -> None:
- super().__init__(*args)
- merge_members_option(self.options)
-
- @classmethod
- def can_document_member(cls, member: Any, membername: str, isattr: bool, parent: Any
- ) -> bool:
- # don't document submodules automatically
- return False
-
- def resolve_name(self, modname: str, parents: Any, path: str, base: Any
- ) -> Tuple[str, List[str]]:
- if modname is not None:
- logger.warning(__('"::" in automodule name doesn\'t make sense'),
- type='autodoc')
- return (path or '') + base, []
-
- def parse_name(self) -> bool:
- ret = super().parse_name()
- if self.args or self.retann:
- logger.warning(__('signature arguments or return annotation '
- 'given for automodule %s') % self.fullname,
- type='autodoc')
- return ret
-
- def add_directive_header(self, sig: str) -> None:
- Documenter.add_directive_header(self, sig)
-
- sourcename = self.get_sourcename()
-
- # add some module-specific options
- if self.options.synopsis:
- self.add_line(' :synopsis: ' + self.options.synopsis, sourcename)
- if self.options.platform:
- self.add_line(' :platform: ' + self.options.platform, sourcename)
- if self.options.deprecated:
- self.add_line(' :deprecated:', sourcename)
-
- def get_object_members(self, want_all: bool) -> Tuple[bool, List[Tuple[str, object]]]:
- if want_all:
- if (self.options.ignore_module_all or not
- hasattr(self.object, '__all__')):
- # for implicit module members, check __module__ to avoid
- # documenting imported objects
- return True, get_module_members(self.object)
- else:
- memberlist = self.object.__all__
- # Sometimes __all__ is broken...
- if not isinstance(memberlist, (list, tuple)) or not \
- all(isinstance(entry, str) for entry in memberlist):
- logger.warning(
- __('__all__ should be a list of strings, not %r '
- '(in module %s) -- ignoring __all__') %
- (memberlist, self.fullname),
- type='autodoc'
- )
- # fall back to all members
- return True, get_module_members(self.object)
- else:
- memberlist = self.options.members or []
- ret = []
- for mname in memberlist:
- try:
- ret.append((mname, safe_getattr(self.object, mname)))
- except AttributeError:
- logger.warning(
- __('missing attribute mentioned in :members: or __all__: '
- 'module %s, attribute %s') %
- (safe_getattr(self.object, '__name__', '???'), mname),
- type='autodoc'
- )
- return False, ret
diff --git a/docs/reinforcement/docs/_ext/overwriteobjectiondirective.txt b/docs/reinforcement/docs/_ext/overwriteobjectiondirective.txt
deleted file mode 100644
index e7ffdfe09a737771ead4a9c2ce1d0b945bb49947..0000000000000000000000000000000000000000
--- a/docs/reinforcement/docs/_ext/overwriteobjectiondirective.txt
+++ /dev/null
@@ -1,374 +0,0 @@
-"""
- sphinx.directives
- ~~~~~~~~~~~~~~~~~
-
- Handlers for additional ReST directives.
-
- :copyright: Copyright 2007-2022 by the Sphinx team, see AUTHORS.
- :license: BSD, see LICENSE for details.
-"""
-
-import re
-import inspect
-import importlib
-from functools import reduce
-from typing import TYPE_CHECKING, Any, Dict, Generic, List, Tuple, TypeVar, cast
-
-from docutils import nodes
-from docutils.nodes import Node
-from docutils.parsers.rst import directives, roles
-
-from sphinx import addnodes
-from sphinx.addnodes import desc_signature
-from sphinx.deprecation import RemovedInSphinx50Warning, deprecated_alias
-from sphinx.util import docutils, logging
-from sphinx.util.docfields import DocFieldTransformer, Field, TypedField
-from sphinx.util.docutils import SphinxDirective
-from sphinx.util.typing import OptionSpec
-
-if TYPE_CHECKING:
- from sphinx.application import Sphinx
-
-
-# RE to strip backslash escapes
-nl_escape_re = re.compile(r'\\\n')
-strip_backslash_re = re.compile(r'\\(.)')
-
-T = TypeVar('T')
-logger = logging.getLogger(__name__)
-
-def optional_int(argument: str) -> int:
- """
- Check for an integer argument or None value; raise ``ValueError`` if not.
- """
- if argument is None:
- return None
- else:
- value = int(argument)
- if value < 0:
- raise ValueError('negative value; must be positive or zero')
- return value
-
-def get_api(fullname):
- """
- Get the API object.
-
- :param fullname: full name of the API
- :return: the attribute object, or None if it does not exist
- """
- main_module = fullname.split('.')[0]
- main_import = importlib.import_module(main_module)
-
- try:
- return reduce(getattr, fullname.split('.')[1:], main_import)
- except AttributeError:
- return None
-
-def get_example(name: str):
- try:
- api_doc = inspect.getdoc(get_api(name))
- example_str = re.findall(r'Examples:\n([\w\W]*?)(\n\n|$)', api_doc)
- if not example_str:
- return []
- example_str = re.sub(r'\n\s+', r'\n', example_str[0][0])
- example_str = example_str.strip()
- example_list = example_str.split('\n')
- return ["", "**样例:**", ""] + example_list + [""]
- except:
- return []
-
-def get_platforms(name: str):
- try:
- api_doc = inspect.getdoc(get_api(name))
- example_str = re.findall(r'Supported Platforms:\n\s+(.*?)\n\n', api_doc)
- if not example_str:
- example_str_leak = re.findall(r'Supported Platforms:\n\s+(.*)', api_doc)
- if example_str_leak:
- example_str = example_str_leak[0].strip()
- example_list = example_str.split('\n')
- example_list = [' ' + example_list[0]]
- return ["", "支持平台:"] + example_list + [""]
- return []
- example_str = example_str[0].strip()
- example_list = example_str.split('\n')
- example_list = [' ' + example_list[0]]
- return ["", "支持平台:"] + example_list + [""]
- except:
- return []
-
-class ObjectDescription(SphinxDirective, Generic[T]):
- """
- Directive to describe a class, function or similar object. Not used
- directly, but subclassed (in domain-specific directives) to add custom
- behavior.
- """
-
- has_content = True
- required_arguments = 1
- optional_arguments = 0
- final_argument_whitespace = True
- option_spec: OptionSpec = {
- 'noindex': directives.flag,
- } # type: Dict[str, DirectiveOption]
-
- # types of doc fields that this directive handles, see sphinx.util.docfields
- doc_field_types: List[Field] = []
- domain: str = None
- objtype: str = None
- indexnode: addnodes.index = None
-
- # Warning: this might be removed in future version. Don't touch this from extensions.
- _doc_field_type_map = {} # type: Dict[str, Tuple[Field, bool]]
-
- def get_field_type_map(self) -> Dict[str, Tuple[Field, bool]]:
- if self._doc_field_type_map == {}:
- self._doc_field_type_map = {}
- for field in self.doc_field_types:
- for name in field.names:
- self._doc_field_type_map[name] = (field, False)
-
- if field.is_typed:
- typed_field = cast(TypedField, field)
- for name in typed_field.typenames:
- self._doc_field_type_map[name] = (field, True)
-
- return self._doc_field_type_map
-
- def get_signatures(self) -> List[str]:
- """
- Retrieve the signatures to document from the directive arguments. By
- default, signatures are given as arguments, one per line.
-
- Backslash-escaping of newlines is supported.
- """
- lines = nl_escape_re.sub('', self.arguments[0]).split('\n')
- if self.config.strip_signature_backslash:
- # remove backslashes to support (dummy) escapes; helps Vim highlighting
- return [strip_backslash_re.sub(r'\1', line.strip()) for line in lines]
- else:
- return [line.strip() for line in lines]
-
- def handle_signature(self, sig: str, signode: desc_signature) -> Any:
- """
- Parse the signature *sig* into individual nodes and append them to
- *signode*. If ValueError is raised, parsing is aborted and the whole
- *sig* is put into a single desc_name node.
-
- The return value should be a value that identifies the object. It is
- passed to :meth:`add_target_and_index()` unchanged, and otherwise only
- used to skip duplicates.
- """
- raise ValueError
-
- def add_target_and_index(self, name: Any, sig: str, signode: desc_signature) -> None:
- """
- Add cross-reference IDs and entries to self.indexnode, if applicable.
-
- *name* is whatever :meth:`handle_signature()` returned.
- """
- return # do nothing by default
-
- def before_content(self) -> None:
- """
- Called before parsing content. Used to set information about the current
- directive context on the build environment.
- """
- pass
-
- def transform_content(self, contentnode: addnodes.desc_content) -> None:
- """
- Called after creating the content through nested parsing,
- but before the ``object-description-transform`` event is emitted,
- and before the info-fields are transformed.
- Can be used to manipulate the content.
- """
- pass
-
- def after_content(self) -> None:
- """
- Called after parsing content. Used to reset information about the
- current directive context on the build environment.
- """
- pass
-
- def check_class_end(self, content):
- for i in content:
- if not i.startswith('.. include::') and i != "\n" and i != "":
- return False
- return True
-
- def extend_items(self, rst_file, start_num, num):
- ls = []
- for i in range(1, num+1):
- ls.append((rst_file, start_num+i))
- return ls
-
- def run(self) -> List[Node]:
- """
- Main directive entry function, called by docutils upon encountering the
- directive.
-
- This directive is meant to be quite easily subclassable, so it delegates
- to several additional methods. What it does:
-
- * find out if called as a domain-specific directive, set self.domain
- * create a `desc` node to fit all description inside
- * parse standard options, currently `noindex`
- * create an index node if needed as self.indexnode
- * parse all given signatures (as returned by self.get_signatures())
- using self.handle_signature(), which should either return a name
- or raise ValueError
- * add index entries using self.add_target_and_index()
- * parse the content and handle doc fields in it
- """
- if ':' in self.name:
- self.domain, self.objtype = self.name.split(':', 1)
- else:
- self.domain, self.objtype = '', self.name
- self.indexnode = addnodes.index(entries=[])
-
- node = addnodes.desc()
- node.document = self.state.document
- node['domain'] = self.domain
- # 'desctype' is a backwards compatible attribute
- node['objtype'] = node['desctype'] = self.objtype
- node['noindex'] = noindex = ('noindex' in self.options)
- if self.domain:
- node['classes'].append(self.domain)
- node['classes'].append(node['objtype'])
-
- self.names: List[T] = []
- signatures = self.get_signatures()
- for sig in signatures:
- # add a signature node for each signature in the current unit
- # and add a reference target for it
- signode = addnodes.desc_signature(sig, '')
- self.set_source_info(signode)
- node.append(signode)
- try:
- # name can also be a tuple, e.g. (classname, objname);
- # this is strictly domain-specific (i.e. no assumptions may
- # be made in this base class)
- name = self.handle_signature(sig, signode)
- except ValueError:
- # signature parsing failed
- signode.clear()
- signode += addnodes.desc_name(sig, sig)
- continue # we don't want an index entry here
- if name not in self.names:
- self.names.append(name)
- if not noindex:
- # only add target and index entry if this is the first
- # description of the object with this name in this desc block
- self.add_target_and_index(name, sig, signode)
-
- contentnode = addnodes.desc_content()
- node.append(contentnode)
- if self.names:
- # needed for association of version{added,changed} directives
- self.env.temp_data['object'] = self.names[0]
- self.before_content()
- try:
- example = get_example(self.names[0][0])
- platforms = get_platforms(self.names[0][0])
- except Exception as e:
- example = ''
- platforms = ''
- logger.warning(f'Error API names in {self.arguments[0]}.')
- logger.warning(f'{e}')
- extra = platforms + example
- if extra:
- if self.objtype == "method":
- self.content.data.extend(extra)
- else:
- index_num = 0
- for num, i in enumerate(self.content.data):
- if i.startswith('.. py:method::') or self.check_class_end(self.content.data[num:]):
- index_num = num
- break
- if index_num:
- count = len(self.content.data)
- for i in extra:
- self.content.data.insert(index_num-count, i)
- else:
- self.content.data.extend(extra)
- try:
- self.content.items.extend(self.extend_items(self.content.items[0][0], self.content.items[-1][1], len(extra)))
- except Exception as e:
- logger.warning(f'{e}')
- self.state.nested_parse(self.content, self.content_offset, contentnode)
- self.transform_content(contentnode)
- self.env.app.emit('object-description-transform',
- self.domain, self.objtype, contentnode)
- DocFieldTransformer(self).transform_all(contentnode)
- self.env.temp_data['object'] = None
- self.after_content()
- return [self.indexnode, node]
-
-
-class DefaultRole(SphinxDirective):
- """
- Set the default interpreted text role. Overridden from docutils.
- """
-
- optional_arguments = 1
- final_argument_whitespace = False
-
- def run(self) -> List[Node]:
- if not self.arguments:
- docutils.unregister_role('')
- return []
- role_name = self.arguments[0]
- role, messages = roles.role(role_name, self.state_machine.language,
- self.lineno, self.state.reporter)
- if role:
- docutils.register_role('', role)
- self.env.temp_data['default_role'] = role_name
- else:
- literal_block = nodes.literal_block(self.block_text, self.block_text)
- reporter = self.state.reporter
- error = reporter.error('Unknown interpreted text role "%s".' % role_name,
- literal_block, line=self.lineno)
- messages += [error]
-
- return cast(List[nodes.Node], messages)
-
-
-class DefaultDomain(SphinxDirective):
- """
- Directive to (re-)set the default domain for this source file.
- """
-
- has_content = False
- required_arguments = 1
- optional_arguments = 0
- final_argument_whitespace = False
- option_spec = {} # type: Dict
-
- def run(self) -> List[Node]:
- domain_name = self.arguments[0].lower()
- # if domain_name not in env.domains:
- # # try searching by label
- # for domain in env.domains.values():
- # if domain.label.lower() == domain_name:
- # domain_name = domain.name
- # break
- self.env.temp_data['default_domain'] = self.env.domains.get(domain_name)
- return []
-
-def setup(app: "Sphinx") -> Dict[str, Any]:
- app.add_config_value("strip_signature_backslash", False, 'env')
- directives.register_directive('default-role', DefaultRole)
- directives.register_directive('default-domain', DefaultDomain)
- directives.register_directive('describe', ObjectDescription)
- # new, more consistent, name
- directives.register_directive('object', ObjectDescription)
-
- app.add_event('object-description-transform')
-
- return {
- 'version': 'builtin',
- 'parallel_read_safe': True,
- 'parallel_write_safe': True,
- }
-
diff --git a/docs/reinforcement/docs/_ext/overwriteviewcode.txt b/docs/reinforcement/docs/_ext/overwriteviewcode.txt
deleted file mode 100644
index 172780ec56b3ed90e7b0add617257a618cf38ee0..0000000000000000000000000000000000000000
--- a/docs/reinforcement/docs/_ext/overwriteviewcode.txt
+++ /dev/null
@@ -1,378 +0,0 @@
-"""
- sphinx.ext.viewcode
- ~~~~~~~~~~~~~~~~~~~
-
- Add links to module code in Python object descriptions.
-
- :copyright: Copyright 2007-2022 by the Sphinx team, see AUTHORS.
- :license: BSD, see LICENSE for details.
-"""
-
-import posixpath
-import traceback
-import warnings
-from os import path
-from typing import Any, Dict, Generator, Iterable, Optional, Set, Tuple, cast
-
-from docutils import nodes
-from docutils.nodes import Element, Node
-
-import sphinx
-from sphinx import addnodes
-from sphinx.application import Sphinx
-from sphinx.builders import Builder
-from sphinx.builders.html import StandaloneHTMLBuilder
-from sphinx.deprecation import RemovedInSphinx50Warning
-from sphinx.environment import BuildEnvironment
-from sphinx.locale import _, __
-from sphinx.pycode import ModuleAnalyzer
-from sphinx.transforms.post_transforms import SphinxPostTransform
-from sphinx.util import get_full_modname, logging, status_iterator
-from sphinx.util.nodes import make_refnode
-
-
-logger = logging.getLogger(__name__)
-
-
-OUTPUT_DIRNAME = '_modules'
-
-
-class viewcode_anchor(Element):
- """Node for viewcode anchors.
-
- This node will be processed in the resolving phase.
- For viewcode supported builders, they will be all converted to the anchors.
- For not supported builders, they will be removed.
- """
-
-
-def _get_full_modname(app: Sphinx, modname: str, attribute: str) -> Optional[str]:
- try:
- return get_full_modname(modname, attribute)
- except AttributeError:
- # sphinx.ext.viewcode can't follow class instance attribute
- # then AttributeError logging output only verbose mode.
- logger.verbose('Didn\'t find %s in %s', attribute, modname)
- return None
- except Exception as e:
- # sphinx.ext.viewcode follow python domain directives.
- # because of that, if there are no real modules exists that specified
- # by py:function or other directives, viewcode emits a lot of warnings.
- # It should be displayed only verbose mode.
- logger.verbose(traceback.format_exc().rstrip())
- logger.verbose('viewcode can\'t import %s, failed with error "%s"', modname, e)
- return None
-
-
-def is_supported_builder(builder: Builder) -> bool:
- if builder.format != 'html':
- return False
- elif builder.name == 'singlehtml':
- return False
- elif builder.name.startswith('epub') and not builder.config.viewcode_enable_epub:
- return False
- else:
- return True
-
-
-def doctree_read(app: Sphinx, doctree: Node) -> None:
- env = app.builder.env
- if not hasattr(env, '_viewcode_modules'):
- env._viewcode_modules = {} # type: ignore
-
- def has_tag(modname: str, fullname: str, docname: str, refname: str) -> bool:
- entry = env._viewcode_modules.get(modname, None) # type: ignore
- if entry is False:
- return False
-
- code_tags = app.emit_firstresult('viewcode-find-source', modname)
- if code_tags is None:
- try:
- analyzer = ModuleAnalyzer.for_module(modname)
- analyzer.find_tags()
- except Exception:
- env._viewcode_modules[modname] = False # type: ignore
- return False
-
- code = analyzer.code
- tags = analyzer.tags
- else:
- code, tags = code_tags
-
- if entry is None or entry[0] != code:
- entry = code, tags, {}, refname
- env._viewcode_modules[modname] = entry # type: ignore
- _, tags, used, _ = entry
- if fullname in tags:
- used[fullname] = docname
- return True
-
- return False
-
- for objnode in list(doctree.findall(addnodes.desc)):
- if objnode.get('domain') != 'py':
- continue
- names: Set[str] = set()
- for signode in objnode:
- if not isinstance(signode, addnodes.desc_signature):
- continue
- modname = signode.get('module')
- fullname = signode.get('fullname')
- try:
- if fullname and modname==None:
- if fullname.split('.')[-1].lower() == fullname.split('.')[-1] and fullname.split('.')[-2].lower() != fullname.split('.')[-2]:
- modname = '.'.join(fullname.split('.')[:-2])
- fullname = '.'.join(fullname.split('.')[-2:])
- else:
- modname = '.'.join(fullname.split('.')[:-1])
- fullname = fullname.split('.')[-1]
- fullname_new = fullname
- except Exception:
- logger.warning(f'error_modename:{modname}')
- logger.warning(f'error_fullname:{fullname}')
- refname = modname
- if env.config.viewcode_follow_imported_members:
- new_modname = app.emit_firstresult(
- 'viewcode-follow-imported', modname, fullname,
- )
- if not new_modname:
- new_modname = _get_full_modname(app, modname, fullname)
- modname = new_modname
- # logger.warning(f'new_modename:{modname}')
- if not modname:
- continue
- # fullname = signode.get('fullname')
- # if fullname and modname==None:
- fullname = fullname_new
- if not has_tag(modname, fullname, env.docname, refname):
- continue
- if fullname in names:
- # only one link per name, please
- continue
- names.add(fullname)
- pagename = posixpath.join(OUTPUT_DIRNAME, modname.replace('.', '/'))
- signode += viewcode_anchor(reftarget=pagename, refid=fullname, refdoc=env.docname)
-
-
-def env_merge_info(app: Sphinx, env: BuildEnvironment, docnames: Iterable[str],
- other: BuildEnvironment) -> None:
- if not hasattr(other, '_viewcode_modules'):
- return
- # create a _viewcode_modules dict on the main environment
- if not hasattr(env, '_viewcode_modules'):
- env._viewcode_modules = {} # type: ignore
- # now merge in the information from the subprocess
- for modname, entry in other._viewcode_modules.items(): # type: ignore
- if modname not in env._viewcode_modules: # type: ignore
- env._viewcode_modules[modname] = entry # type: ignore
- else:
- if env._viewcode_modules[modname]: # type: ignore
- used = env._viewcode_modules[modname][2] # type: ignore
- for fullname, docname in entry[2].items():
- if fullname not in used:
- used[fullname] = docname
-
-
-def env_purge_doc(app: Sphinx, env: BuildEnvironment, docname: str) -> None:
- modules = getattr(env, '_viewcode_modules', {})
-
- for modname, entry in list(modules.items()):
- if entry is False:
- continue
-
- code, tags, used, refname = entry
- for fullname in list(used):
- if used[fullname] == docname:
- used.pop(fullname)
-
- if len(used) == 0:
- modules.pop(modname)
-
-
-class ViewcodeAnchorTransform(SphinxPostTransform):
- """Convert or remove viewcode_anchor nodes depends on builder."""
- default_priority = 100
-
- def run(self, **kwargs: Any) -> None:
- if is_supported_builder(self.app.builder):
- self.convert_viewcode_anchors()
- else:
- self.remove_viewcode_anchors()
-
- def convert_viewcode_anchors(self) -> None:
- for node in self.document.findall(viewcode_anchor):
- anchor = nodes.inline('', _('[源代码]'), classes=['viewcode-link'])
- refnode = make_refnode(self.app.builder, node['refdoc'], node['reftarget'],
- node['refid'], anchor)
- node.replace_self(refnode)
-
- def remove_viewcode_anchors(self) -> None:
- for node in list(self.document.findall(viewcode_anchor)):
- node.parent.remove(node)
-
-
-def missing_reference(app: Sphinx, env: BuildEnvironment, node: Element, contnode: Node
- ) -> Optional[Node]:
- # resolve our "viewcode" reference nodes -- they need special treatment
- if node['reftype'] == 'viewcode':
- warnings.warn('viewcode extension is no longer use pending_xref node. '
- 'Please update your extension.', RemovedInSphinx50Warning)
- return make_refnode(app.builder, node['refdoc'], node['reftarget'],
- node['refid'], contnode)
-
- return None
-
-
-def get_module_filename(app: Sphinx, modname: str) -> Optional[str]:
- """Get module filename for *modname*."""
- source_info = app.emit_firstresult('viewcode-find-source', modname)
- if source_info:
- return None
- else:
- try:
- filename, source = ModuleAnalyzer.get_module_source(modname)
- return filename
- except Exception:
- return None
-
-
-def should_generate_module_page(app: Sphinx, modname: str) -> bool:
- """Check generation of module page is needed."""
- module_filename = get_module_filename(app, modname)
- if module_filename is None:
- # Always (re-)generate module page when module filename is not found.
- return True
-
- builder = cast(StandaloneHTMLBuilder, app.builder)
- basename = modname.replace('.', '/') + builder.out_suffix
- page_filename = path.join(app.outdir, '_modules/', basename)
-
- try:
- if path.getmtime(module_filename) <= path.getmtime(page_filename):
- # generation is not needed if the HTML page is newer than module file.
- return False
- except IOError:
- pass
-
- return True
-
-
-def collect_pages(app: Sphinx) -> Generator[Tuple[str, Dict[str, Any], str], None, None]:
- env = app.builder.env
- if not hasattr(env, '_viewcode_modules'):
- return
- if not is_supported_builder(app.builder):
- return
- highlighter = app.builder.highlighter # type: ignore
- urito = app.builder.get_relative_uri
-
- modnames = set(env._viewcode_modules) # type: ignore
-
- for modname, entry in status_iterator(
- sorted(env._viewcode_modules.items()), # type: ignore
- __('highlighting module code... '), "blue",
- len(env._viewcode_modules), # type: ignore
- app.verbosity, lambda x: x[0]):
- if not entry:
- continue
- if not should_generate_module_page(app, modname):
- continue
-
- code, tags, used, refname = entry
- # construct a page name for the highlighted source
- pagename = posixpath.join(OUTPUT_DIRNAME, modname.replace('.', '/'))
- # highlight the source using the builder's highlighter
- if env.config.highlight_language in ('python3', 'default', 'none'):
- lexer = env.config.highlight_language
- else:
- lexer = 'python'
- highlighted = highlighter.highlight_block(code, lexer, linenos=False)
- # split the code into lines
- lines = highlighted.splitlines()
- # split off wrap markup from the first line of the actual code
- before, after = lines[0].split('<pre>')
- lines[0:1] = [before + '<pre>', after]
- # nothing to do for the last line; it always starts with </pre> anyway
- # now that we have code lines (starting at index 1), insert anchors for
- # the collected tags (HACK: this only works if the tag boundaries are
- # properly nested!)
- maxindex = len(lines) - 1
- for name, docname in used.items():
- type, start, end = tags[name]
- backlink = urito(pagename, docname) + '#' + refname + '.' + name
- lines[start] = ('<div class="viewcode-block" id="%s"><a class="viewcode-back" href="%s">%s</a>' % (name, backlink, _('[文档]')) +
- lines[start])
- lines[min(end, maxindex)] += '</div>'
- # try to find parents (for submodules)
- parents = []
- parent = modname
- while '.' in parent:
- parent = parent.rsplit('.', 1)[0]
- if parent in modnames:
- parents.append({
- 'link': urito(pagename,
- posixpath.join(OUTPUT_DIRNAME, parent.replace('.', '/'))),
- 'title': parent})
- parents.append({'link': urito(pagename, posixpath.join(OUTPUT_DIRNAME, 'index')),
- 'title': _('Module code')})
- parents.reverse()
- # putting it all together
- context = {
- 'parents': parents,
- 'title': modname,
- 'body': (_('<h1>Source code for %s</h1>') % modname +
- '\n'.join(lines)),
- }
- yield (pagename, context, 'page.html')
-
- if not modnames:
- return
-
- html = ['\n']
- # the stack logic is needed for using nested lists for submodules
- stack = ['']
- for modname in sorted(modnames):
- if modname.startswith(stack[-1]):
- stack.append(modname + '.')
- html.append('<ul>')
- else:
- stack.pop()
- while not modname.startswith(stack[-1]):
- stack.pop()
- html.append('</ul>')
- stack.append(modname + '.')
- html.append('<li><a href="%s">%s</a></li>\n' % (
- urito(posixpath.join(OUTPUT_DIRNAME, 'index'),
- posixpath.join(OUTPUT_DIRNAME, modname.replace('.', '/'))),
- modname))
- html.append('</ul>' * (len(stack) - 1))
- context = {
- 'title': _('Overview: module code'),
- 'body': (_('<h1>All modules for which code is available</h1>') +
- ''.join(html)),
- }
-
- yield (posixpath.join(OUTPUT_DIRNAME, 'index'), context, 'page.html')
-
-
-def setup(app: Sphinx) -> Dict[str, Any]:
- app.add_config_value('viewcode_import', None, False)
- app.add_config_value('viewcode_enable_epub', False, False)
- app.add_config_value('viewcode_follow_imported_members', True, False)
- app.connect('doctree-read', doctree_read)
- app.connect('env-merge-info', env_merge_info)
- app.connect('env-purge-doc', env_purge_doc)
- app.connect('html-collect-pages', collect_pages)
- app.connect('missing-reference', missing_reference)
- # app.add_config_value('viewcode_include_modules', [], 'env')
- # app.add_config_value('viewcode_exclude_modules', [], 'env')
- app.add_event('viewcode-find-source')
- app.add_event('viewcode-follow-imported')
- app.add_post_transform(ViewcodeAnchorTransform)
- return {
- 'version': sphinx.__display_version__,
- 'env_version': 1,
- 'parallel_read_safe': True
- }
diff --git a/docs/reinforcement/docs/_ext/rename_include.py b/docs/reinforcement/docs/_ext/rename_include.py
deleted file mode 100644
index bf7dea25f3ee7fd371659e80a3551439fbddee5a..0000000000000000000000000000000000000000
--- a/docs/reinforcement/docs/_ext/rename_include.py
+++ /dev/null
@@ -1,60 +0,0 @@
-"""Rename .rst file to .txt file for include directive."""
-import os
-import re
-import glob
-import logging
-
-logging.basicConfig(level=logging.WARNING, format='%(message)s')
-logger = logging.getLogger(__name__)
-
-origin = "rst"
-replace = "txt"
-
-include_re = re.compile(r'\.\. include::\s+(.*?)(\.rst|\.txt)')
-include_re_sub = re.compile(rf'(\.\. include::\s+(.*?))\.{origin}')
-
-# Specified file_name lists excluded from rename procedure.
-whitepaper = ['operations.rst']
-
-def repl(matchobj):
- """Replace functions for matched."""
- if matchobj.group(2).split('/')[-1] + f'.{origin}' in whitepaper:
- return matchobj.group(0)
- return rf'{matchobj.group(1)}.{replace}'
-
-def rename_include(api_dir):
- """
- Rename .rst file to .txt file for include directive.
-
- api_dir - api path relative.
- """
- tar = []
- for root, _, files in os.walk(api_dir):
- for file in files:
- if not file.endswith('.rst'):
- continue
- try:
- with open(os.path.join(root, file), 'r+', encoding='utf-8') as f:
- content = f.read()
- tar_ = include_re.findall(content)
- if tar_:
- tar_ = [i[0].split('/')[-1]+f'.{origin}' for i in tar_]
- tar.extend(tar_)
- sub = include_re_sub.findall(content)
- if sub:
- content_ = include_re_sub.sub(repl, content)
- f.seek(0)
- f.truncate()
- f.write(content_)
- except UnicodeDecodeError:
- # pylint: disable=logging-fstring-interpolation
- logger.warning(f"UnicodeDecodeError for: {file}")
-
- all_rst = glob.glob(f'{api_dir}/**/*.{origin}', recursive=True)
-
- for i in all_rst:
- if os.path.dirname(i).endswith("api_python") or os.path.basename(i) in whitepaper:
- continue
- name = os.path.basename(i)
- if name in tar:
- os.rename(i, i.replace(f'.{origin}', f'.{replace}'))
diff --git a/docs/reinforcement/docs/requirements.txt b/docs/reinforcement/docs/requirements.txt
deleted file mode 100644
index a1b6a69f6dbd9c6f78710f56889e14f0e85b27f4..0000000000000000000000000000000000000000
--- a/docs/reinforcement/docs/requirements.txt
+++ /dev/null
@@ -1,7 +0,0 @@
-sphinx == 4.4.0
-docutils == 0.17.1
-myst-parser == 0.18.1
-sphinx_rtd_theme == 1.0.0
-numpy
-IPython
-jieba
diff --git a/docs/reinforcement/docs/source_en/conf.py b/docs/reinforcement/docs/source_en/conf.py
deleted file mode 100644
index f5dd393cf298d0ae60f59aef88c3106a93b9d1cf..0000000000000000000000000000000000000000
--- a/docs/reinforcement/docs/source_en/conf.py
+++ /dev/null
@@ -1,169 +0,0 @@
-# Configuration file for the Sphinx documentation builder.
-#
-# This file only contains a selection of the most common options. For a full
-# list see the documentation:
-# https://www.sphinx-doc.org/en/master/usage/configuration.html
-
-# -- Path setup --------------------------------------------------------------
-
-# If extensions (or modules to document with autodoc) are in another directory,
-# add these directories to sys.path here. If the directory is relative to the
-# documentation root, use os.path.abspath to make it absolute, like shown here.
-#
-import os
-import shutil
-import sys
-import IPython
-import re
-sys.path.append(os.path.abspath('../_ext'))
-from sphinx.ext import autodoc as sphinx_autodoc
-
-import mindspore_rl
-
-# -- Project information -----------------------------------------------------
-
-project = 'MindSpore Reinforcement'
-copyright = 'MindSpore'
-author = 'MindSpore'
-
-# The full version, including alpha/beta/rc tags
-release = 'master'
-
-
-# -- General configuration ---------------------------------------------------
-
-# Add any Sphinx extension module names here, as strings. They can be
-# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
-# ones.
-myst_enable_extensions = ["dollarmath", "amsmath"]
-
-
-myst_heading_anchors = 5
-extensions = [
- 'sphinx.ext.autodoc',
- 'sphinx.ext.doctest',
- 'sphinx.ext.intersphinx',
- 'sphinx.ext.todo',
- 'sphinx.ext.coverage',
- 'sphinx.ext.napoleon',
- 'sphinx.ext.viewcode',
- 'myst_parser',
- 'sphinx.ext.mathjax',
- 'IPython.sphinxext.ipython_console_highlighting'
-]
-
-source_suffix = {
- '.rst': 'restructuredtext',
- '.md': 'markdown',
-}
-
-# List of patterns, relative to source directory, that match files and
-# directories to ignore when looking for source files.
-# This pattern also affects html_static_path and html_extra_path.
-mathjax_path = 'https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/mathjax/MathJax-3.2.2/es5/tex-mml-chtml.js'
-
-mathjax_options = {
- 'async':'async'
-}
-
-smartquotes_action = 'De'
-
-exclude_patterns = []
-
-pygments_style = 'sphinx'
-
-# -- Options for HTML output -------------------------------------------------
-
-# The theme to use for HTML and HTML Help pages. See the documentation for
-# a list of builtin themes.
-#
-html_theme = 'sphinx_rtd_theme'
-
-import sphinx_rtd_theme
-layout_target = os.path.join(os.path.dirname(sphinx_rtd_theme.__file__), 'layout.html')
-layout_src = '../../../../resource/_static/layout.html'
-if os.path.exists(layout_target):
- os.remove(layout_target)
-shutil.copy(layout_src, layout_target)
-
-html_search_language = 'en'
-
-# Example configuration for intersphinx: refer to the Python standard library.
-intersphinx_mapping = {
- 'python': ('https://docs.python.org/3', '../../../../resource/python_objects.inv'),
-}
-
-# Modify default signatures for autodoc.
-autodoc_source_path = os.path.abspath(sphinx_autodoc.__file__)
-autodoc_source_re = re.compile(r'stringify_signature\(.*?\)')
-get_param_func_str = r"""\
-import re
-import inspect as inspect_
-
-def get_param_func(func):
- try:
- source_code = inspect_.getsource(func)
- if func.__doc__:
- source_code = source_code.replace(func.__doc__, '')
- all_params_str = re.findall(r"def [\w_\d\-]+\(([\S\s]*?)(\):|\) ->.*?:)", source_code)
- all_params = re.sub("(self|cls)(,|, )?", '', all_params_str[0][0].replace("\n", "").replace("'", "\""))
- return all_params
- except:
- return ''
-
-def get_obj(obj):
- if isinstance(obj, type):
- return obj.__init__
-
- return obj
-"""
-
-with open(autodoc_source_path, "r+", encoding="utf8") as f:
- code_str = f.read()
- code_str = autodoc_source_re.sub('"(" + get_param_func(get_obj(self.object)) + ")"', code_str, count=0)
- exec(get_param_func_str, sphinx_autodoc.__dict__)
- exec(code_str, sphinx_autodoc.__dict__)
-
-# Copy English Python API source files from the repository specified by RM_PATH.
-from sphinx.util import logging
-logger = logging.getLogger(__name__)
-
-src_dir_rm = os.path.join(os.getenv("RM_PATH"), 'docs/api/api_python_en')
-
-present_path = os.path.dirname(__file__)
-
-for i in os.listdir(src_dir_rm):
- if os.path.isfile(os.path.join(src_dir_rm,i)):
- if os.path.exists('./'+i):
- os.remove('./'+i)
- shutil.copy(os.path.join(src_dir_rm,i),'./'+i)
- else:
- if os.path.exists('./'+i):
- shutil.rmtree('./'+i)
- shutil.copytree(os.path.join(src_dir_rm,i),'./'+i)
-
-sys.path.append(os.path.abspath('../../../../resource/sphinx_ext'))
-# import anchor_mod
-import nbsphinx_mod
-
-sys.path.append(os.path.abspath('../../../../resource/search'))
-import search_code
-
-sys.path.append(os.path.abspath('../../../../resource/custom_directives'))
-from custom_directives import IncludeCodeDirective
-
-def setup(app):
- app.add_directive('includecode', IncludeCodeDirective)
-
-src_release = os.path.join(os.getenv("RM_PATH"), 'RELEASE.md')
-des_release = "./RELEASE.md"
-with open(src_release, "r", encoding="utf-8") as f:
- data = f.read()
-if len(re.findall("\n## (.*?)\n",data)) > 1:
- content = re.findall("(## [\s\S\n]*?)\n## ", data)
-else:
- content = re.findall("(## [\s\S\n]*)", data)
-#result = content[0].replace('# MindSpore', '#', 1)
-with open(des_release, "w", encoding="utf-8") as p:
- p.write("# Release Notes"+"\n\n")
- p.write(content[0])
\ No newline at end of file
diff --git a/docs/reinforcement/docs/source_en/custom_config_info.md b/docs/reinforcement/docs/source_en/custom_config_info.md
deleted file mode 100644
index 8fb76f38c10d930c98949a9edbfea443e0ac7be5..0000000000000000000000000000000000000000
--- a/docs/reinforcement/docs/source_en/custom_config_info.md
+++ /dev/null
@@ -1,190 +0,0 @@
-# MindSpore RL Configuration Instruction
-
-[](https://gitee.com/mindspore/docs/blob/master/docs/reinforcement/docs/source_en/custom_config_info.md)
-
-
-## Overview
-
-In recent years, deep reinforcement learning has been developing by leaps and bounds, with new algorithms coming out every year. To offer a highly scalable and reusable reinforcement learning framework, MindSpore RL separates an algorithm into several parts, such as the Actor, Learner, Policy, Environment, and ReplayBuffer. Moreover, because deep reinforcement learning algorithms are complex, their performance is largely influenced by hyper-parameters. MindSpore RL provides a central configuration API, which decouples the algorithm from deployment and execution considerations and helps users adjust the model and algorithm conveniently.
-
-This instruction uses the DQN algorithm as an example to introduce how to use the configuration API and to help users customize their own algorithms.
-
-You can obtain the code of the DQN algorithm from [https://github.com/mindspore-lab/mindrl/tree/master/example/dqn](https://github.com/mindspore-lab/mindrl/tree/master/example/dqn).
-
-## Configuration Details
-
-MindSpore RL uses `algorithm_config` to define each algorithm component and the corresponding hyper-parameters. `algorithm_config` is a Python dictionary that describes the actor, learner, policy, collect_environment, eval_environment, and replay buffer. The framework arranges execution and deployment based on it, which means the user only needs to focus on the algorithm design.
-
-The following code defines a set of algorithm configurations and uses `algorithm_config` to create a `Session`. `Session` is responsible for allocating resources and for compiling and executing the computational graph.
-
-```python
-from mindspore_rl.mindspore_rl import Session
-algorithm_config = {
- 'actor': {...},
- 'learner': {...},
- 'policy_and_network': {...},
- 'collect_environment': {...},
- 'eval_environment': {...},
- 'replay_buffer': {...}
-}
-
-session = Session(algorithm_config)
-session.run(...)
-```
-
-Each parameter in `algorithm_config` and its usage are described below.
-
-### Policy Configuration
-
-Policy is usually used to determine the behaviour (or action) that the agent will execute in the next step. It takes `type` and `params` as sub-items.
-
-- `type`: specifies the name of the Policy class. The Actor determines the action through the Policy. In deep reinforcement learning, a policy usually uses a deep neural network to extract features from the environment and outputs the action for the next step.
-- `params`: specifies the parameters used when creating the Policy instance. Note that `type` and `params` must match.
-
-```python
-from dqn.src.dqn import DQNPolicy
-
-policy_params = {
- 'epsi_high': 0.1, # epsi_high/epsi_low/decay control the proportion of exploitation and exploration
- 'epsi_low': 0.1, # epsi_high: the highest probability of exploration, epsi_low: the lowest probability of exploration
- 'decay': 200, # decay: the step decay
- 'state_space_dim': 0, # the dimension of state space, 0 means that it will read from the environment automatically
- 'action_space_dim': 0, # the dimension of action space, 0 means that it will read from the environment automatically
- 'hidden_size': 100, # the dimension of hidden layer
-}
-
-algorithm_config = {
- ...
- 'policy_and_network': {
- 'type': DQNPolicy,
- 'params': policy_params,
- },
- ...
-}
-```
-
-| key | Type | Range | Description |
-| :--------------: | :--------: | :-------------------------------------: | :----------------------------------------------------------: |
-| type | Class | The user-defined policy class | The class name of the user-defined policy class |
-| params(optional) | Dictionary | Any value in key-value format or None | Customized parameters; the user can input any values in key-value format |
-
-### Environment Configuration
-
-`collect_environment` and `eval_environment` are used to collect experience during interaction with the environment and to evaluate the model after training, respectively. `number`, `type`, and `params` need to be provided to create their instances.
-
-- `number`: number of environments used in the algorithm.
-- `type`: specifies the environment class, which can be either an environment provided by MindSpore RL, such as `GymEnvironment`, or a user-defined environment.
-- `params`: specifies the parameters used when creating the environment instance. Note that `type` and `params` must match.
-
-The following example defines the environment configuration. The framework will create a `CartPole-v0` environment like `GymEnvironment(name='CartPole-v0')`. The configurations of `collect_environment` and `eval_environment` are the same.
-
-```python
-from mindspore_rl.environment import GymEnvironment
-collect_env_params = {'name': 'CartPole-v0'}
-eval_env_params = {'name': 'CartPole-v0'}
-algorithm_config = {
- ...
- 'collect_environment': {
- 'number': 1,
- 'type': GymEnvironment, # the class name of environment
- 'params': collect_env_params # parameter of environment
- },
- 'eval_environment': {
- 'number': 1,
- 'type': GymEnvironment, # the class name of environment
- 'params': eval_env_params # parameter of environment
- },
- ...
-}
-```
-
-| key | Type | Range | Description |
-| :--------------------: | :--------: | :----------------------------------------------------------: | :----------------------------------------------------------: |
-| number (optional) | Integer | [1, +∞) | If `number` is provided, it must be at least 1. If it is not provided, the framework creates the environment instance directly without wrapping it in `MultiEnvironmentWrapper` |
-| num_parallel(optional) | Integer | [1, number] | If not provided, the environments run in parallel by default. Set num_parallel: 1 to turn off parallel execution, or provide a custom parallel configuration |
-| type | Class | The user-defined and implemented subclass of environment | The class name of the environment |
-| params | Dictionary | Any value in key-value format or None | Customized parameters; the user can pass in any values in key-value format |
-
-### Actor Configuration
-
-`Actor` is in charge of interacting with the environment. Generally, `Actor` interacts with `Env` through `Policy`, and some algorithms also store the experience obtained during the interaction in a `ReplayBuffer`. Therefore, `Actor` holds the `Policy` and `Environment` and creates a `ReplayBuffer` as needed. In the Actor configuration, `policies` and `networks` specify the names of member variables defined in `Policy`.
-
-The following code defines the configuration of `DQNActor`. The framework will create the Actor instance in the form `DQNActor(algorithm_config['actor'])`.
-
-```python
-algorithm_config = {
- ...
- 'actor': {
- 'number': 1, # the number of Actor
- 'type': DQNActor, # the class name of Actor
- 'policies': ['init_policy', 'collect_policy', 'eval_policy'], # Take the policies that called init_policy, collect_policy and eval_policy in Policy class as input to create the instance of actor
- 'share_env': True # Whether the environment is shared by each actor
- }
- ...
-}
-```
-
-| key | Type | Range | Description |
-| :-----------------: | :------------: | :--------------------------------------------------------: | :----------------------------------------------------------: |
-| number | Integer | [1, +∞) | Number of actors; currently only 1 is supported |
-| type | Class | The user-defined and implemented subclass of actor | The same name as the user-defined and implemented subclass of actor |
-| params(optional) | Dictionary | Any value in key-value format or None | Customized parameters; the user can pass in any values in key-value format |
-| policies | List of String | Same variable names as the user-defined policies | Every string in the list must correspond one-to-one with the names of the policies initialized in the user-defined policy class |
-| networks(optional) | List of String | Same variable names as the user-defined networks | Every string in the list must correspond one-to-one with the names of the networks initialized in the user-defined policy class |
-| share_env(optional) | Boolean | True or False | Default: True, meaning all actors share one `collect_environment`. Otherwise, a separate `collect_environment` instance is created for each actor. |
-
-### ReplayBuffer Configuration
-
-In some algorithms, a `ReplayBuffer` is used to store the experience obtained from the interaction between the actor and the environment. The experience is then used to train the network.
-
-```python
-from mindspore_rl.core.replay_buffer import ReplayBuffer
-algorithm_config = {
- ...
- 'replay_buffer': {'number': 1,
- 'type': ReplayBuffer,
- 'capacity': 100000, # the capacity of ReplayBuffer
- 'sample_size': 64, # sample Batch Size
- 'data_shape': [(4,), (1,), (1,), (4,)], # the dimension info of ReplayBuffer
- 'data_type': [ms.float32, ms.int32, ms.float32, ms.float32]}, # the data type of ReplayBuffer
-}
-```
-
-| key | Type | Range | Description |
-| :-------------------: | :-------------------------: | :-----------------------------------------: | :----------------------------------------------------------: |
-| number | Integer | [1, +∞) | Number of replay buffers created |
-| type | Class | User-defined or provided ReplayBuffer class | The same name as the user-defined or provided ReplayBuffer class |
-| capacity | Integer | [0, +∞) | The capacity of the ReplayBuffer |
-| data_shape | List of Integer Tuple | [0, +∞) | The first number of each tuple must equal the number of environments |
-| data_type | List of mindspore data type | Belongs to MindSpore data types | The length of this list must equal the length of data_shape |
-| sample_size(optional) | Integer | [0, capacity] | The maximum value is the capacity of the replay buffer. Default: 1 |
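-
-For intuition, the `data_shape` and `data_type` entries above describe, column by column, one experience of the form `[state, action, reward, next_state]` for the CartPole example. A minimal sketch of a single experience that matches this configuration is shown below (the tensor values are illustrative only):
-
-```python
-import mindspore as ms
-
-# One experience matching data_shape [(4,), (1,), (1,), (4,)] and
-# data_type [ms.float32, ms.int32, ms.float32, ms.float32].
-state = ms.Tensor([0.02, -0.01, 0.03, 0.04], ms.float32)      # shape (4,)
-action = ms.Tensor([1], ms.int32)                             # shape (1,)
-reward = ms.Tensor([1.0], ms.float32)                         # shape (1,)
-new_state = ms.Tensor([0.03, 0.18, 0.02, -0.25], ms.float32)  # shape (4,)
-
-experience = [state, action, reward, new_state]               # what gets inserted into the buffer
-```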
-
-### Learner Configuration
-
-`Learner` is used to update the weights of the neural network according to experience. `Learner` holds the DNNs defined in `Policy` (the member-variable names in `Policy` must match the names listed in `networks`), and uses them to calculate the loss and update the network weights.
-
-The following code defines the configuration of `DQNLearner`. The framework will create the Learner instance in the form `DQNLearner(algorithm_config['learner'])`.
-
-```python
-from dqn.src.dqn import DQNLearner
-learner_params = {'gamma': 0.99, # discount factor
- 'lr': 0.001, # learning rate
- }
-algorithm_config = {
- ...
- 'learner': {
- 'number': 1, # the number of Learner
- 'type': DQNLearner, # the class name of Learner
- 'params': learner_params, # the parameters of the Learner (discount factor and learning rate)
- 'networks': ['policy_network', 'target_network'] # Learner takes the policy_network and target_network from DQNPolicy as input argument to update the network
- },
- ...
-}
-```
-
-| key | Type | Range | Description |
-| :------: | :------------: | :------------------------------------------------: | :----------------------------------------------------------: |
-| number | Integer | [1, +∞) | Number of learners; currently only 1 is supported |
-| type | Class | The user-defined and implemented subclass of learner | The same name as the user-defined and implemented subclass of learner |
-| params(optional) | Dictionary | Any value in key-value format or None | Customized parameters; the user can pass in any values in key-value format |
-| networks | List of String | Same variable names as the user-defined networks | Every string in the list must match the names of the networks initialized in the user-defined policy class |
diff --git a/docs/reinforcement/docs/source_en/dqn.md b/docs/reinforcement/docs/source_en/dqn.md
deleted file mode 100644
index 96b571353b205e6532aa090c7bff58fe5f56e902..0000000000000000000000000000000000000000
--- a/docs/reinforcement/docs/source_en/dqn.md
+++ /dev/null
@@ -1,410 +0,0 @@
-# Deep Q Learning (DQN) with MindSpore Reinforcement
-
-[](https://gitee.com/mindspore/docs/blob/master/docs/reinforcement/docs/source_en/dqn.md)
-
-## Summary
-
-To implement a reinforcement learning algorithm with MindSpore Reinforcement, a user needs to:
-
-- provide an algorithm configuration, which separates the implementation of the algorithm from its deployment details;
-- implement the algorithm based on an actor-learner-environment abstraction;
-- create a session object that executes the implemented algorithm.
-
-This tutorial shows the use of the MindSpore Reinforcement API to implement the Deep Q Learning (DQN) algorithm. Note that, for clarity and readability, only API-related code sections are presented, and irrelevant code is omitted. The source code of the full DQN implementation for MindSpore Reinforcement can be found [here](https://github.com/mindspore-lab/mindrl/tree/master/example/dqn).
-
-## Specifying the Actor-Learner-Environment Abstraction for DQN
-
-The DQN algorithm requires two deep neural networks, a *policy network* for approximating the action-value function (Q function) and a *target network* for stabilising the training. The policy network is the strategy on how to act on the environment, and the goal of the DQN algorithm is to train the policy network for maximum reward. In addition, the DQN algorithm uses an *experience replay* technique to maintain previous observations for off-policy learning, where an actor uses different behavioural policies to act on the environment.
-
-MindSpore Reinforcement uses an *algorithm configuration* to specify the logical components (Actor, Learner, Policy and Network, Collect Environment, Eval Environment, ReplayBuffer) required by the DQN algorithm and the associated hyperparameters. It can execute the algorithm with different strategies based on the provided configuration, which allows the user to focus on the algorithm design.
-
-The algorithm configuration is a Python dictionary that specifies how to construct different components of the DQN algorithm. The hyper-parameters of each component are configured in separate Python dictionaries. The DQN algorithm configuration can be defined as follows:
-
-```python
-algorithm_config = {
- 'actor': {
- 'number': 1, # Number of Actor
- 'type': DQNActor, # The Actor class
- 'policies': ['init_policy', 'collect_policy', 'evaluate_policy'], # The policy used to choose action
- },
- 'learner': {
- 'number': 1, # Number of Learner
- 'type': DQNLearner, # The Learner class
- 'params': learner_params, # The parameters of Learner
- 'networks': ['policy_network', 'target_network'] # The networks used by the Learner
- },
- 'policy_and_network': {
- 'type': DQNPolicy, # The Policy class
- 'params': policy_params # The parameters of Policy
- },
- 'collect_environment': {
- 'number': 1, # Number of Collect Environment
- 'type': GymEnvironment, # The Collect Environment class
- 'params': collect_env_params # The parameters of Collect Environment
- },
- 'eval_environment': {
- 'number': 1, # Same as Collect Environment
- 'type': GymEnvironment,
- 'params': eval_env_params
- },
- 'replay_buffer': {'number': 1, # Number of ReplayBuffer
- 'type': ReplayBuffer, # The ReplayBuffer class
- 'capacity': 100000, # The capacity of ReplayBuffer
- 'data_shape': [(4,), (1,), (1,), (4,)], # Data shape of ReplayBuffer
- 'data_type': [ms.float32, ms.int32, ms.float32, ms.float32], # Data type of ReplayBuffer
- 'sample_size': 64}, # Sample size of ReplayBuffer
-}
-```
-
-The configuration defines six top-level entries, each corresponding to an algorithmic component: *actor, learner, policy*, *replaybuffer* and two *environment*s. Each entry corresponds to a class, which must be defined by the user to implement the DQN algorithm’s logic.
-
-A top-level entry has sub-entries that describe the component. The *number* entry defines the number of instances of the component used by the algorithm. The *type* entry refers to the name of the Python class that must be defined to implement the component. The *params* entry provides the necessary hyper-parameters for the component. The *policies* entry defines the policies used by the component. The *networks* entry in *learner* lists all neural networks used by that component. In the DQN example, only actors interact with the environment. The *replay_buffer* entry defines the *capacity, shape, sample size and data type* of the replay buffer.
-
-For the DQN algorithm, we configure one actor `'number': 1`, its Python class `'type': DQNActor`, and three behaviour policies `'policies': ['init_policy', 'collect_policy', 'evaluate_policy']`.
-
-Other components are defined in a similar way -- please refer to the [complete DQN code example](https://github.com/mindspore-lab/mindrl/tree/master/example/dqn) and the [MindSpore Reinforcement API documentation](https://www.mindspore.cn/reinforcement/docs/en/master/reinforcement.html) for more details.
-
-Note that MindSpore Reinforcement uses a single *policy* class to define all policies and neural networks used by the algorithm. In this way, it hides the complexity of data sharing and communication between policies and neural networks.
-
-In train.py, MindSpore Reinforcement executes the algorithm in the context of a *session*. A *Session* allocates resources (on one or more cluster machines) and executes the compiled computational graph. A user passes the algorithm configuration to instantiate a Session class:
-
-```python
-from mindspore_rl.core import Session
-dqn_session = Session(dqn_algorithm_config)
-```
-
-Invoke the `run` method and pass the corresponding parameters to execute the DQN algorithm. *class_type* is the user-defined Trainer class, which is described later; *episode* is the number of iterations of the algorithm; *params* are the parameters used by the trainer class, which are written in the configuration file (for more detail, see the *config.py* file in the code example); and *callbacks* define some metric methods, described in more detail in the Callbacks part of the API documentation.
-
-```python
-from src.dqn_trainer import DQNTrainer
-from mindspore_rl.utils.callback import CheckpointCallback, LossCallback, EvaluateCallback
-loss_cb = LossCallback()
-ckpt_cb = CheckpointCallback(50, config.trainer_params['ckpt_path'])
-eval_cb = EvaluateCallback(10)
-cbs = [loss_cb, ckpt_cb, eval_cb]
-dqn_session.run(class_type=DQNTrainer, episode=episode, params=config.trainer_params, callbacks=cbs)
-```
-
-To leverage MindSpore's computational graph feature, users set the execution mode to `GRAPH_MODE`.
-
-```python
-import mindspore as ms
-ms.set_context(mode=ms.GRAPH_MODE)
-```
-
-Methods that are annotated with `@jit` will be compiled into the MindSpore computational graph for auto-parallelisation and acceleration. In this tutorial, we use this feature to implement an efficient `DQNTrainer` class.
-
-### Defining the DQNTrainer Class
-
-The `DQNTrainer` class expresses how the algorithm runs: it iteratively collects experience by interacting with the environment, inserts it into the *ReplayBuffer*, and then obtains data from the *ReplayBuffer* to train the target models. It must inherit from the `Trainer` class, which is part of the MindSpore Reinforcement API.
-
-The `Trainer` base class contains an `MSRL` (MindSpore Reinforcement) object, which allows the algorithm implementation to interact with MindSpore Reinforcement to implement the training logic. The `MSRL` class instantiates the RL algorithm components based on the previously defined algorithm configuration. It provides the function handlers that transparently bind to methods of actors, learners, or the replay buffer object, as defined by users. As a result, the `MSRL` class enables users to focus on the algorithm logic, while it transparently handles object creation, data sharing and communication between different algorithmic components on one or more workers. Users instantiate the `MSRL` object by creating the previously mentioned `Session` object with the algorithm configuration.
-
-The `DQNTrainer` must overload `train_one_episode` for training, `evaluate` for evaluation and `trainable_variables` for saving checkpoints. In this tutorial, it is defined as follows:
-
-```python
-class DQNTrainer(Trainer):
- def __init__(self, msrl, params):
- ...
- super(DQNTrainer, self).__init__(msrl)
-
- def trainable_variables(self):
- """Trainable variables for saving."""
- trainable_variables = {"policy_net": self.msrl.learner.policy_network}
- return trainable_variables
-
- @ms.jit
- def init_training(self):
- """Initialize training"""
- state = self.msrl.collect_environment.reset()
- done = self.false
- i = self.zero_value
- while self.less(i, self.fill_value):
- done, _, new_state, action, my_reward = self.msrl.agent_act(
- trainer.INIT, state)
- self.msrl.replay_buffer_insert(
- [state, action, my_reward, new_state])
- state = new_state
- if done:
- state = self.msrl.collect_environment.reset()
- done = self.false
- i += 1
- return done
-
- @ms.jit
- def evaluate(self):
- """Policy evaluate"""
- total_reward = self.zero_value
- eval_iter = self.zero_value
- while self.less(eval_iter, self.num_evaluate_episode):
- episode_reward = self.zero_value
- state = self.msrl.eval_environment.reset()
- done = self.false
- while not done:
- done, r, state = self.msrl.agent_act(trainer.EVAL, state)
- r = self.squeeze(r)
- episode_reward += r
- total_reward += episode_reward
- eval_iter += 1
- avg_reward = total_reward / self.num_evaluate_episode
- return avg_reward
-```
-
-The user calls the `train` method of the base class. It trains the models for the specified number of episodes (iterations), with each episode calling the user-defined `train_one_episode` method. Finally, the `train` method evaluates the policy to obtain a reward value by calling the `evaluate` method.
-
-In each iteration of the training loop, the `train_one_episode` method is invoked to train an episode:
-
-```python
-@ms.jit
-def train_one_episode(self):
- """Train one episode"""
- if not self.inited:
- self.init_training()
- self.inited = self.true
- state = self.msrl.collect_environment.reset()
- done = self.false
- total_reward = self.zero
- steps = self.zero
- loss = self.zero
- while not done:
- done, r, new_state, action, my_reward = self.msrl.agent_act(
- trainer.COLLECT, state)
- self.msrl.replay_buffer_insert(
- [state, action, my_reward, new_state])
- state = new_state
- r = self.squeeze(r)
- loss = self.msrl.agent_learn(self.msrl.replay_buffer_sample())
- total_reward += r
- steps += 1
- if not self.mod(steps, self.update_period):
- self.msrl.learner.update()
- return loss, total_reward, steps
-```
-
-The `@jit` annotation states that this method will be compiled into a MindSpore computational graph for acceleration. To support this, all scalar values must be defined as tensor types, e.g. `self.zero_value = Tensor(0, mindspore.float32)`.
-
-The `train_one_episode` method first calls the environment's `reset` method, `self.msrl.collect_environment.reset()`, to reset the environment. It then collects the experience from the environment with the `msrl.agent_act` function handler and inserts the experience data into the replay buffer using the `self.msrl.replay_buffer_insert` function. Afterwards, it invokes the `self.msrl.agent_learn` function to train the target model. The input of `self.msrl.agent_learn` is a set of sampled results returned by `self.msrl.replay_buffer_sample`.
-
-The replay buffer class, `ReplayBuffer`, is provided by MindSpore Reinforcement. It defines `insert` and `sample` methods to store and sample the experience data in a replay buffer, respectively. Please refer to the [complete DQN code example](https://github.com/mindspore-lab/mindrl/tree/master/example/dqn) for details.
-
-### Defining the DQNPolicy Class
-
-To implement the neural networks and define the policies, a user defines the `DQNPolicy` class:
-
-```python
-class DQNPolicy():
- def __init__(self, params):
- self.policy_network = FullyConnectedNet(
- params['state_space_dim'],
- params['hidden_size'],
- params['action_space_dim'],
- params['compute_type'])
- self.target_network = FullyConnectedNet(
- params['state_space_dim'],
- params['hidden_size'],
- params['action_space_dim'],
- params['compute_type'])
-```
-
-The constructor takes as input `policy_params`, the previously defined hyper-parameter dictionary from config.py.
-
-Before defining the policy network and the target network, users must define the structure of the neural networks using MindSpore operators. For example, they may be objects of the `FullyConnectedNet` class, which is defined as follows:
-
-```python
-class FullyConnectedNet(mindspore.nn.Cell):
- def __init__(self, input_size, hidden_size, output_size, compute_type=mstype.float32):
- super(FullyConnectedNet, self).__init__()
- self.linear1 = nn.Dense(
- input_size,
- hidden_size,
- weight_init="XavierUniform").to_float(compute_type)
- self.linear2 = nn.Dense(
- hidden_size,
- output_size,
- weight_init="XavierUniform").to_float(compute_type)
- self.relu = nn.ReLU()
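-
- def construct(self, x):
- # Forward pass (omitted in the excerpt above): two dense layers with a ReLU in between
- x = self.relu(self.linear1(x))
- x = self.linear2(x)
- return x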
-```
-
-The DQN algorithm uses a loss function to optimize the weights of the neural networks. At this point, a user must define a neural network used to compute the loss function. This network is specified as a nested class of `DQNLearner`. In addition, an optimizer is required to train the network. The optimizer and the loss function are defined as follows:
-
-```python
-class DQNLearner(Learner):
- """DQN Learner"""
-
- class PolicyNetWithLossCell(nn.Cell):
- """DQN policy network with loss cell"""
-
- def __init__(self, backbone, loss_fn):
- super(DQNLearner.PolicyNetWithLossCell,
- self).__init__(auto_prefix=False)
- self._backbone = backbone
- self._loss_fn = loss_fn
- self.gather = P.GatherD()
-
- def construct(self, x, a0, label):
- """constructor for Loss Cell"""
- out = self._backbone(x)
- out = self.gather(out, 1, a0)
- loss = self._loss_fn(out, label)
- return loss
-
- def __init__(self, params=None):
- super(DQNLearner, self).__init__()
- ...
- optimizer = nn.Adam(
- self.policy_network.trainable_params(),
- learning_rate=params['lr'])
- loss_fn = nn.MSELoss()
- loss_q_net = self.PolicyNetWithLossCell(self.policy_network, loss_fn)
- self.policy_network_train = nn.TrainOneStepCell(loss_q_net, optimizer)
- self.policy_network_train.set_train(mode=True)
- ...
-```
-
-The DQN algorithm is an *off-policy* algorithm that learns using an epsilon-greedy policy. It uses different behavioural policies for acting on the environment and collecting data. In this example, we use the `RandomPolicy` to initialize the training, the `EpsilonGreedyPolicy` to collect the experience during the training, and the `GreedyPolicy` to evaluate:
-
-```python
-class DQNPolicy():
- def __init__(self, params):
- ...
- self.init_policy = RandomPolicy(params['action_space_dim'])
- self.collect_policy = EpsilonGreedyPolicy(self.policy_network, (1, 1), params['epsi_high'],
- params['epsi_low'], params['decay'], params['action_space_dim'])
- self.evaluate_policy = GreedyPolicy(self.policy_network)
-```
-
-Since the above three behavioural policies are common for a range of RL algorithms, MindSpore Reinforcement provides them as reusable building blocks. Users may also define their own algorithm-specific behavioural policies.
-
-Note that the names of the methods and the keys of the parameter dictionary must be consistent with the algorithm configuration defined earlier.
-
-### Defining the DQNActor Class
-
-To implement the `DQNActor`, a user defines a new actor component that inherits from the `Actor` class provided by MindSpore Reinforcement. They must then overload the methods of the `Actor` class:
-
-```python
-class DQNActor(Actor):
- ...
- def act(self, phase, params):
- if phase == 1:
- # Fill the replay buffer
- action = self.init_policy()
- new_state, reward, done = self._environment.step(action)
- action = self.reshape(action, (1,))
- my_reward = self.select(done, self.penalty, self.reward)
- return done, reward, new_state, action, my_reward
- if phase == 2:
- # Experience collection
- self.step += 1
- ts0 = self.expand_dims(params, 0)
- step_tensor = self.ones((1, 1), ms.float32) * self.step
-
- action = self.collect_policy(ts0, step_tensor)
- new_state, reward, done = self._environment.step(action)
- action = self.reshape(action, (1,))
- my_reward = self.select(done, self.penalty, self.reward)
- return done, reward, new_state, action, my_reward
- if phase == 3:
- # Evaluate the trained policy
- ts0 = self.expand_dims(params, 0)
- action = self.evaluate_policy(ts0)
- new_state, reward, done = self._eval_env.step(action)
- return done, reward, new_state
- self.print("Phase is incorrect")
- return 0
-```
-
-The three branches of the `act` method act on the specified environment with different policies, which map states to actions. They take as input a tensor-typed value and return the trajectory from the environment.
-
-To interact with the environment, the Actor uses the `step(action)` method defined in the `Environment` class. For an action applied to the specified environment, this method returns a tuple of three elements: the new state after applying the action, the reward obtained as a floating-point value, and a boolean flag indicating whether the episode has terminated and the environment should be reset.
-
-`ReplayBuffer` defines an `insert` method, which is called by the `DQNActor` object to store the experience data in the replay buffer.
-
-The `Environment` and `ReplayBuffer` classes are provided by the MindSpore Reinforcement API.
-
-The constructor of the `DQNActor` class defines the environment, the replay buffer, the policies, and the networks. It takes as input the dictionary-typed parameters, which were defined in the algorithm configuration. Below, we only show the initialisation of the environment; the other attributes are assigned in a similar way:
-
-```python
-class DQNActor(Actor):
- def __init__(self, params):
- self._environment = params['collect_environment']
- self._eval_env = params['eval_environment']
- ...
-```
-
-### Defining the DQNLearner Class
-
-To implement the `DQNLearner`, a class must inherit from the `Learner` class in the MindSpore Reinforcement API and overload the `learn` method:
-
-```python
-class DQNLearner(Learner):
- ...
- def learn(self, experience):
- """Model update"""
- s0, a0, r1, s1 = experience
- next_state_values = self.target_network(s1)
- next_state_values = next_state_values.max(axis=1)
- r1 = self.reshape(r1, (-1,))
-
- y_true = r1 + self.gamma * next_state_values
-
- # Modify last step reward
- one = self.ones_like(r1)
- y_true = self.select(r1 == -one, one, y_true)
- y_true = self.expand_dims(y_true, 1)
-
- success = self.policy_network_train(s0, a0, y_true)
- return success
-```
-
-Here, the `learn` method takes as input the trajectory (sampled from a replay buffer) to train the policy network. The constructor assigns the network, the policy, and the discount rate to the DQNLearner, by receiving a dictionary-typed configuration from the algorithm configuration:
-
-```python
-class DQNLearner(Learner):
- def __init__(self, params=None):
- super(DQNLearner, self).__init__()
- self.policy_network = params['policy_network']
- self.target_network = params['target_network']
-```
-
-## Executing and Viewing Results
-
-Execute script `train.py` to start DQN model training.
-
-```shell
-cd example/dqn/
-python train.py
-```
-
-The execution results are shown below:
-
-```text
------------------------------------------
-Evaluation result in episode 0 is 95.300
------------------------------------------
-Episode 0, steps: 33.0, reward: 33.000
-Episode 1, steps: 45.0, reward: 12.000
-Episode 2, steps: 54.0, reward: 9.000
-Episode 3, steps: 64.0, reward: 10.000
-Episode 4, steps: 73.0, reward: 9.000
-Episode 5, steps: 82.0, reward: 9.000
-Episode 6, steps: 91.0, reward: 9.000
-Episode 7, steps: 100.0, reward: 9.000
-Episode 8, steps: 109.0, reward: 9.000
-Episode 9, steps: 118.0, reward: 9.000
-...
-...
-Episode 200, steps: 25540.0, reward: 200.000
-Episode 201, steps: 25740.0, reward: 200.000
-Episode 202, steps: 25940.0, reward: 200.000
-Episode 203, steps: 26140.0, reward: 200.000
-Episode 204, steps: 26340.0, reward: 200.000
-Episode 205, steps: 26518.0, reward: 178.000
-Episode 206, steps: 26718.0, reward: 200.000
-Episode 207, steps: 26890.0, reward: 172.000
-Episode 208, steps: 27090.0, reward: 200.000
-Episode 209, steps: 27290.0, reward: 200.000
------------------------------------------
-Evaluation result in episode 210 is 200.000
------------------------------------------
-```
-
-
diff --git a/docs/reinforcement/docs/source_en/environment.md b/docs/reinforcement/docs/source_en/environment.md
deleted file mode 100644
index 6985999b8c27383c8ebb83dfb5cb6fa4c28360f8..0000000000000000000000000000000000000000
--- a/docs/reinforcement/docs/source_en/environment.md
+++ /dev/null
@@ -1,80 +0,0 @@
-# Reinforcement Learning Environment Access
-
-[](https://gitee.com/mindspore/docs/blob/master/docs/reinforcement/docs/source_en/environment.md)
-
-## Overview
-
-In the field of reinforcement learning, an agent learns a policy that maximizes the numerical reward signal obtained while interacting with its environment. The "environment", as the problem to be solved, is an important element in reinforcement learning.
-
-A wide variety of environments are currently used for reinforcement learning: [Mujoco](https://github.com/deepmind/mujoco), [MPE](https://github.com/openai/multiagent-particle-envs), [Atari](https://github.com/gsurma/atari), [PySC2](https://www.github.com/deepmind/pysc2), [SMAC](https://github.com/oxwhirl/smac), [TORCS](https://github.com/ugo-nama-kun/gym_torcs), [Isaac](https://github.com/NVIDIA-Omniverse/IsaacGymEnvs), etc. Currently MindSpore Reinforcement is connected to two environments, Gym and SMAC, and will gradually connect to more environments as the algorithms are enriched. This article introduces how to access a third-party environment under MindSpore Reinforcement.
-
-## Encapsulating Environmental Python Functions as Operators
-
-Before that, here is a brief introduction to the static and dynamic graph modes.
-
-- In dynamic graph mode, the program is executed line by line in the order in which the code is written, and the compiler sends down the individual operators in the neural network to the device one by one for computation operations, making it easy for the user to write and debug the neural network model.
-
-- In static graph mode, the program compiles the developer-defined algorithm into a computation graph before executing it. In the process, the compiler can reduce resource overhead by using graph optimization techniques to obtain better execution performance.
-
-The syntax supported by static graph mode is a subset of the Python language, while commonly-used environments generally interact through a Python interface, so the syntax differences between the two often result in graph compilation errors. For this problem, developers can use the `PyFunc` operator to encapsulate a Python function as an operator in a MindSpore computation graph.
-
-Next, using gym as an example, encapsulate `env.reset()` as an operator in a MindSpore computation graph.
-
-The following code creates a `CartPole-v0` environment and executes the `env.reset()` method. You can see that the type of `state` is `numpy.ndarray`, and the data type and dimension are `np.float64` and `(4,)` respectively.
-
-```python
-import gym
-
-env = gym.make('CartPole-v0')
-state = env.reset()
-print('type: {}, shape: {}, dtype: {}'.format(type(state), state.shape, state.dtype))
-
-# Result:
-# type: <class 'numpy.ndarray'>, shape: (4,), dtype: float64
-```
-
-`env.reset()` is encapsulated into a MindSpore operator by using the `PyFunc` operator, as sketched after the parameter list below.
-
-- `fn` specifies the name of the Python function to be encapsulated, either as a normal function or as a member function.
-- `in_types` and `in_shapes` specify the input data types and dimensions. `env.reset` has no input, so it fills in an empty list.
-- `out_types`, `out_shapes` specify the data types and dimensions of the returned values. From the previous execution, it can be seen that `env.reset()` returns a numpy array with data type and dimension `np.float64` and `(4,)` respectively, so `[ms.float64,]` and `[(4,),]` are filled in.
-- `PyFunc` returns tuple(Tensor).
-- For more detailed instructions, refer to the [reference](https://gitee.com/mindspore/mindspore/blob/master/mindspore/python/mindspore/ops/operations/other_ops.py).
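-
-Putting the above together, a minimal sketch of wrapping `env.reset()` with `PyFunc` might look as follows. It is based only on the parameter names described above; the exact import path of `PyFunc` (defined in `ops/operations/other_ops.py`, linked above) may differ between MindSpore versions.
-
-```python
-import gym
-import mindspore as ms
-from mindspore.ops.operations import PyFunc   # adjust the import to your MindSpore version
-
-env = gym.make('CartPole-v0')
-
-# env.reset() takes no input and returns an np.float64 array of shape (4,),
-# so the input lists are empty and the output lists describe that array.
-reset_op = PyFunc(fn=env.reset,
-                  in_types=[], in_shapes=[],
-                  out_types=[ms.float64], out_shapes=[(4,)])
-
-state = reset_op()   # tuple(Tensor), usable inside a computation graph
-```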
-
-## Decoupling Environment and Algorithms
-
-Reinforcement learning algorithms should usually generalize well; for example, an algorithm that solves `HalfCheetah` should also be able to solve `Pendulum`. To achieve this generalization, the environment must be decoupled from the rest of the algorithm, so that the rest of the script needs as few modifications as possible when the environment is changed. It is recommended that developers encapsulate the environment with reference to the `Environment` class below.
-
-```python
-import mindspore.nn as nn
-from mindspore_rl.environment import Space
-
-
-class Environment(nn.Cell):
- def __init__(self):
- super(Environment, self).__init__(auto_prefix=False)
-
- def reset(self):
- pass
-
- def step(self, action):
- pass
-
- @property
- def action_space(self) -> Space:
- pass
-
- @property
- def observation_space(self) -> Space:
- pass
-
- @property
- def reward_space(self) -> Space:
- pass
-
- @property
- def done_space(self) -> Space:
- pass
-```
-
-In addition to the interfaces for interacting with the environment, such as `reset` and `step`, `Environment` needs to provide properties such as `action_space` and `observation_space`, which return a [Space](https://mindspore.cn/reinforcement/docs/en/master/reinforcement.html#mindspore_rl.environment.Space) type. The algorithm can perform the following operations based on the `Space` information (an illustrative sketch follows the list):
-
-- obtain the dimensions of the state space and the action space of the environment, which are used to construct the neural network;
-- read the range of legal actions, and scale and clip the actions given by the policy network;
-- identify whether the action space of the environment is discrete or continuous, and choose whether to explore the environment using a continuous or discrete distribution.
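-
-For illustration, the sketch below shows how an algorithm might consume this information when building its networks. The attribute names used here (`shape`, `is_discrete`, `num_values`) are assumptions for illustration and should be checked against the `Space` API linked above.
-
-```python
-# Illustrative only; the Space attribute names below are assumptions.
-def infer_network_dims(env):
-    obs_dim = env.observation_space.shape[-1]      # state dimension for the input layer
-    if env.action_space.is_discrete:
-        act_dim = env.action_space.num_values      # number of discrete actions (output layer)
-    else:
-        act_dim = env.action_space.shape[-1]       # dimension of a continuous action
-    return obs_dim, act_dim
-```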
diff --git a/docs/reinforcement/docs/source_en/index.rst b/docs/reinforcement/docs/source_en/index.rst
deleted file mode 100644
index f6bc99f58375b1f8f8360580c62b1266421beb33..0000000000000000000000000000000000000000
--- a/docs/reinforcement/docs/source_en/index.rst
+++ /dev/null
@@ -1,69 +0,0 @@
-MindSpore Reinforcement Documents
-===================================
-
-MindSpore Reinforcement is a reinforcement learning suite, which supports distributed training of agents by using reinforcement learning algorithms.
-
-MindSpore Reinforcement offers a clean API abstraction for writing reinforcement learning algorithms, which decouples the algorithm from deployment and execution considerations, including the use of accelerators, the level of parallelism and the distribution of computation across a cluster of workers. MindSpore Reinforcement translates the reinforcement learning algorithm into a series of compiled computational graphs, which are then run efficiently by the MindSpore framework on CPUs, GPUs, and Ascend AI processors.
-
-
-Code repository address: <https://github.com/mindspore-lab/mindrl>
-
-Unique Design Features
------------------------
-
-1. Offers an algorithmic-centric API for writing reinforcement learning algorithms
-
- In MindSpore Reinforcement, users describe reinforcement algorithms in Python in terms of intuitive algorithmic concepts, such as agents, actors, environments, and learners. Agents contain actors that interact with an environment and collect rewards. Based on the rewards, learners update policies that govern the behaviour of actors. Users can focus on the implementation of their algorithm without the framework getting in their way.
-
-2. Decouples reinforcement learning algorithms from their execution strategy
-
- The API exposed by MindSpore Reinforcement for implementing algorithms makes no assumptions about how the algorithm will be executed. MindSpore Reinforcement can therefore execute the same algorithm on a single laptop with one GPU and on a cluster of machines with many GPUs. Users provide a separate execution configuration, which describes the resources that MindSpore Reinforcement can use for training.
-
-3. Accelerates reinforcement learning algorithms efficiently
-
- MindSpore Reinforcement is designed to speed up the training of reinforcement learning algorithms by executing the computation on hardware accelerators, such as GPUs or Ascend AI processors. It not only accelerates the neural network computation, but it also translates the logic of actors and learners to computational graphs with parallelizable operators. These computational graphs are executed by the MindSpore framework, taking advantage of its compilation and auto-parallelisation features.
-
-Future Roadmap
----------------
-
-- This initial release of MindSpore Reinforcement contains a stable API for implementing reinforcement learning algorithms and executing computation using MindSpore’s computational graphs. It now supports semi-automatic distributed execution of algorithms and multi-agent scenarios, but does not yet support fully automatic distributed capabilities. These features will be included in subsequent versions of MindSpore Reinforcement.
-
-Typical MindSpore Reinforcement Application Scenarios
-------------------------------------------------------
-
-- `Train a deep Q network `_
-
- The DQN algorithm uses an experience replay technique to maintain previous observations for off-policy learning.
-
-.. toctree::
- :glob:
- :maxdepth: 1
- :caption: Installation
-
- reinforcement_install
-
-.. toctree::
- :glob:
- :maxdepth: 1
- :caption: Guide
-
- custom_config_info
- dqn
- replaybuffer
- environment
-
-.. toctree::
- :maxdepth: 1
- :caption: API References
-
- reinforcement
-
-.. toctree::
- :glob:
- :maxdepth: 1
- :caption: RELEASE NOTES
-
- RELEASE
\ No newline at end of file
diff --git a/docs/reinforcement/docs/source_en/reinforcement_install.md b/docs/reinforcement/docs/source_en/reinforcement_install.md
deleted file mode 100644
index f25867d5da5782544c249528191df8e19647d0a1..0000000000000000000000000000000000000000
--- a/docs/reinforcement/docs/source_en/reinforcement_install.md
+++ /dev/null
@@ -1,37 +0,0 @@
-# MindSpore Reinforcement Installation
-
-[](https://gitee.com/mindspore/docs/blob/master/docs/reinforcement/docs/source_en/reinforcement_install.md)
-
-MindSpore Reinforcement depends on the MindSpore training and inference framework. Therefore, install [MindSpore](https://gitee.com/mindspore/mindspore#安装) and then MindSpore Reinforcement. You can install MindSpore Reinforcement either by pip or by source code.
-
-## Installation by pip
-
-If you use the pip command, download the .whl package from the [MindSpore Reinforcement page](https://www.mindspore.cn/versions/en) and install it.
-
-```shell
-pip install https://ms-release.obs.cn-north-4.myhuaweicloud.com/{ms_version}/Reinforcement/any/mindspore_rl-{mr_version}-py3-none-any.whl --trusted-host ms-release.obs.cn-north-4.myhuaweicloud.com -i https://pypi.tuna.tsinghua.edu.cn/simple
-```
-
-> - When the network is connected, dependency items are automatically downloaded during .whl package installation. (For details about other dependency items, see requirements.txt). In other cases, you need to manually install dependency items.
-> - `{ms_version}` refers to the MindSpore version that matches with MindSpore Reinforcement. For example, if you want to install MindSpore Reinforcement 0.1.0, then `{ms_version}` should be 1.5.0.
-> - `{mr_version}` refers to the version of MindSpore Reinforcement. For example, when you are downloading MindSpore Reinforcement 0.1.0, `{mr_version}` should be 0.1.0.
-
-## Installation by Source Code
-
-Download the [source code](https://github.com/mindspore-lab/mindrl), and then enter the `reinforcement` directory.
-
-```shell
-bash build.sh
-pip install output/mindspore_rl-0.1.0-py3-none-any.whl
-```
-
-The `build.sh` is the compile script under the `reinforcement` directory.
-
-## Installation Verification
-
-Execute the following command. If the Python module is imported without errors, the installation is successful:
-
-```python
-import mindspore_rl
-```
-
diff --git a/docs/reinforcement/docs/source_en/replaybuffer.md b/docs/reinforcement/docs/source_en/replaybuffer.md
deleted file mode 100644
index 6572bc8747ec48dc11cdc4887a1f472aee5ca058..0000000000000000000000000000000000000000
--- a/docs/reinforcement/docs/source_en/replaybuffer.md
+++ /dev/null
@@ -1,138 +0,0 @@
-# ReplayBuffer Usage Introduction
-
-[](https://gitee.com/mindspore/docs/blob/master/docs/reinforcement/docs/source_en/replaybuffer.md)
-
-## Brief Introduction of ReplayBuffer
-
-In reinforcement learning, a ReplayBuffer is a commonly used basic data storage structure. Its function is to store the data obtained from the interaction of an agent with its environment.
-
-Using a ReplayBuffer solves the following problems:
-
-1. Stored historical data can be extracted by sampling, which breaks the correlation of the training data so that the sampled data are approximately independent and identically distributed.
-2. It provides temporary storage of data and improves data utilization.
-
-## ReplayBuffer Implementation of MindSpore Reinforcement Learning
-
-Typically, algorithm developers use native Python or NumPy data structures to construct a ReplayBuffer, or general reinforcement learning frameworks provide a standard API encapsulation. The difference is that MindSpore implements the ReplayBuffer structure on the device side. On the one hand, this reduces the frequent copying of data between host and device when using GPU hardware; on the other hand, expressing the ReplayBuffer in the form of MindSpore operators allows a complete IR graph to be built, enabling the various graph optimizations of MindSpore GRAPH_MODE and improving overall performance.
-
-In MindSpore, two kinds of ReplayBuffer are provided, UniformReplayBuffer and PriorityReplayBuffer, which are used for common FIFO storage and storage with priority, respectively. The following is an example of UniformReplayBuffer implementation and usage.
-
-The ReplayBuffer is represented as a list of Tensors, and each Tensor represents a set of data stored by column (e.g., a set of [state, action, reward]). Data newly put into the UniformReplayBuffer is updated with a FIFO mechanism, and the buffer provides insert, search, and sample functions.
-
-### Parameter Explanation
-
-Create a UniformReplayBuffer with the initialization parameters batch_size, capacity, shapes, and types.
-
-* batch_size: the number of records returned by one call to sample (an integer).
-* capacity: the total capacity of the created UniformReplayBuffer (an integer).
-* shapes: the shape of each set of data in the buffer (a list).
-* types: the data type corresponding to each set of data in the buffer (a list).
-
-### Functions Introduction
-
-#### 1 Insert
-
-The insert method takes a set of data as input; the shape and type of the data must match the parameters used when the UniformReplayBuffer was created. There is no output.
-To simulate the FIFO characteristics of a circular queue, we use two cursors, head and count, to track the head of the queue and its effective length. The following figure shows the process of several insertion operations.
-
-1. The total size of the buffer is 6. In the initial state, the cursor head and count are both 0.
-2. After inserting a batch_size of 2, the current head is unchanged and count is added by 2.
-3. After continuing to insert a batch_size of 4, the queue is full and the count is 6.
-4. After another batch_size of 2 is inserted, the oldest data is overwritten and head is increased by 2. A small host-side sketch of this cursor bookkeeping is given below.
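-
-The cursor bookkeeping can be illustrated with a few lines of plain Python. This is only a host-side illustration of the behaviour described above, not the device-side operator implementation:
-
-```python
-# Host-side illustration of the head/count cursors for a buffer of capacity 6.
-capacity, head, count = 6, 0, 0
-
-def insert(batch_size):
-    """Advance the cursors as if batch_size records were inserted."""
-    global head, count
-    if count + batch_size <= capacity:
-        count += batch_size                   # queue not yet full: only count grows
-    else:
-        overflow = count + batch_size - capacity
-        head = (head + overflow) % capacity   # oldest data is overwritten
-        count = capacity
-
-insert(2)   # head=0, count=2
-insert(4)   # head=0, count=6  (the queue is now full)
-insert(2)   # head=2, count=6  (the two oldest records were overwritten)
-```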
-
-
-
-#### 2 Search
-
-The search method accepts an index as an input, indicating the specific location of the data to be found. The output is a set of Tensor, as shown in the following figure:
-
-1. If the UniformReplayBuffer is just full or not full, the corresponding data is found directly according to the index.
-2. For data that has been overwritten, the index is remapped by the cursors, as in the sketch below.
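-
-A host-side illustration of this remapping (again, not the operator implementation) is:
-
-```python
-# Map a logical index onto the physical position once old data may have been overwritten.
-def real_position(index, head, count, capacity):
-    if count == capacity and head != 0:   # the buffer has wrapped around
-        return (head + index) % capacity
-    return index                          # just full or not full: use the index directly
-```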
-
-
-
-#### 3 Sample
-
-The sampling method has no input and the output is a set of Tensor with the size of the batch_size when the UniformReplayBuffer is created. This is shown in the following figure:
-Assuming that batch_size is 3, a random set of indexes will be generated in the operator, and this random set of indexes has two cases:
-
-1. Order preserving: each index means the real data position, which needs to be remapped by cursor operation.
-2. No order preserving: each index does not represent the real position and is obtained directly.
-
-Both approaches have a slight impact on randomness, and the default is to use no order preserving to get the best performance.
-
-
-
-## UniformReplayBuffer Introduction of MindSpore Reinforcement Learning
-
-### Creation of UniformReplayBuffer
-
-MindSpore Reinforcement Learning provides a standard ReplayBuffer API. The user can use a ReplayBuffer created by the framework by means of a configuration file, such as the configuration file of [dqn](https://github.com/mindspore-lab/mindrl/tree/master/mindspore_rl/algorithm/dqn/config.py):
-
-```python
-'replay_buffer':
- {'number': 1,
- 'type': UniformReplayBuffer,
- 'capacity': 100000,
- 'data_shape': [(4,), (1,), (1,), (4,)],
- 'data_type': [ms.float32, ms.int32, ms.float32, ms.float32],
- 'sample_size': 64}
-```
-
-Alternatively, users can use the interfaces directly to create the required data structures:
-
-```python
-from mindspore_rl.core.uniform_replay_buffer import UniformReplayBuffer
-import mindspore as ms
-sample_size = 2
-capacity = 100000
-shapes = [(4,), (1,), (1,), (4,)]
-types = [ms.float32, ms.int32, ms.float32, ms.float32]
-replaybuffer = UniformReplayBuffer(sample_size, capacity, shapes, types)
-```
-
-### Using the Created UniformReplayBuffer
-
-Take [UniformReplayBuffer](https://github.com/mindspore-lab/mindrl/tree/master/mindspore_rl/core/uniform_replay_buffer.py) created in the form of an API to perform data manipulation as an example:
-
-* Insert operation
-
-```python
-state = ms.Tensor([0.1, 0.2, 0.3, 0.4], ms.float32)
-action = ms.Tensor([1], ms.int32)
-reward = ms.Tensor([1], ms.float32)
-new_state = ms.Tensor([0.4, 0.3, 0.2, 0.1], ms.float32)
-replaybuffer.insert([state, action, reward, new_state])
-replaybuffer.insert([state, action, reward, new_state])
-```
-
-* Search operation
-
-```python
-exp = replaybuffer.get_item(0)
-```
-
-* Sample operation
-
-```python
-samples = replaybuffer.sample()
-```
-
-* Reset operation
-
-```python
-replaybuffer.reset()
-```
-
-* The size of the current buffer used
-
-```python
-size = replaybuffer.size()
-```
-
-* Determine if the current buffer is full
-
-```python
-if replaybuffer.full():
- print("Full use of this buffer.")
-```
diff --git a/docs/reinforcement/docs/source_zh_cn/conf.py b/docs/reinforcement/docs/source_zh_cn/conf.py
deleted file mode 100644
index a4b34476bddd12f5ad92aca67a5e56b87a0c781e..0000000000000000000000000000000000000000
--- a/docs/reinforcement/docs/source_zh_cn/conf.py
+++ /dev/null
@@ -1,243 +0,0 @@
-# Configuration file for the Sphinx documentation builder.
-#
-# This file only contains a selection of the most common options. For a full
-# list see the documentation:
-# https://www.sphinx-doc.org/en/master/usage/configuration.html
-
-# -- Path setup --------------------------------------------------------------
-
-# If extensions (or modules to document with autodoc) are in another directory,
-# add these directories to sys.path here. If the directory is relative to the
-# documentation root, use os.path.abspath to make it absolute, like shown here.
-#
-import os
-import sys
-import IPython
-import re
-sys.path.append(os.path.abspath('../_ext'))
-from sphinx.ext import autodoc as sphinx_autodoc
-
-
-# -- Project information -----------------------------------------------------
-
-project = 'MindSpore Reinforcement'
-copyright = 'MindSpore'
-author = 'MindSpore'
-
-# The full version, including alpha/beta/rc tags
-release = 'master'
-
-
-# -- General configuration ---------------------------------------------------
-
-# Add any Sphinx extension module names here, as strings. They can be
-# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
-# ones.
-myst_enable_extensions = ["dollarmath", "amsmath"]
-
-
-myst_heading_anchors = 5
-extensions = [
- 'sphinx.ext.autodoc',
- 'sphinx.ext.doctest',
- 'sphinx.ext.intersphinx',
- 'sphinx.ext.todo',
- 'sphinx.ext.coverage',
- 'sphinx.ext.napoleon',
- 'sphinx.ext.viewcode',
- 'myst_parser',
- 'sphinx.ext.mathjax',
- 'IPython.sphinxext.ipython_console_highlighting'
-]
-
-source_suffix = {
- '.rst': 'restructuredtext',
- '.md': 'markdown',
-}
-
-# List of patterns, relative to source directory, that match files and
-# directories to ignore when looking for source files.
-# This pattern also affects html_static_path and html_extra_path.
-mathjax_path = 'https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/mathjax/MathJax-3.2.2/es5/tex-mml-chtml.js'
-
-mathjax_options = {
- 'async':'async'
-}
-
-smartquotes_action = 'De'
-
-exclude_patterns = []
-
-pygments_style = 'sphinx'
-
-# -- Options for HTML output -------------------------------------------------
-
-# Reconstruction of sphinx auto generated document translation.
-language = 'zh_CN'
-locale_dirs = ['../../../../resource/locale/']
-gettext_compact = False
-
-# The theme to use for HTML and HTML Help pages. See the documentation for
-# a list of builtin themes.
-#
-html_theme = 'sphinx_rtd_theme'
-
-html_search_language = 'zh'
-
-html_search_options = {'dict': '../../../resource/jieba.txt'}
-
-# Example configuration for intersphinx: refer to the Python standard library.
-intersphinx_mapping = {
- 'python': ('https://docs.python.org/3', '../../../../resource/python_objects.inv'),
-}
-
-from sphinx import directives
-with open('../_ext/overwriteobjectiondirective.txt', 'r', encoding="utf8") as f:
- exec(f.read(), directives.__dict__)
-
-from sphinx.ext import viewcode
-with open('../_ext/overwriteviewcode.txt', 'r', encoding="utf8") as f:
- exec(f.read(), viewcode.__dict__)
-
-# Modify default signatures for autodoc.
-autodoc_source_path = os.path.abspath(sphinx_autodoc.__file__)
-autodoc_source_re = re.compile(r'stringify_signature\(.*?\)')
-get_param_func_str = r"""\
-import re
-import inspect as inspect_
-
-def get_param_func(func):
- try:
- source_code = inspect_.getsource(func)
- if func.__doc__:
- source_code = source_code.replace(func.__doc__, '')
- all_params_str = re.findall(r"def [\w_\d\-]+\(([\S\s]*?)(\):|\) ->.*?:)", source_code)
- all_params = re.sub("(self|cls)(,|, )?", '', all_params_str[0][0].replace("\n", "").replace("'", "\""))
- return all_params
- except:
- return ''
-
-def get_obj(obj):
- if isinstance(obj, type):
- return obj.__init__
-
- return obj
-"""
-
-with open(autodoc_source_path, "r+", encoding="utf8") as f:
- code_str = f.read()
- code_str = autodoc_source_re.sub('"(" + get_param_func(get_obj(self.object)) + ")"', code_str, count=0)
- exec(get_param_func_str, sphinx_autodoc.__dict__)
- exec(code_str, sphinx_autodoc.__dict__)
-
-with open("../_ext/customdocumenter.txt", "r", encoding="utf8") as f:
- code_str = f.read()
- exec(code_str, sphinx_autodoc.__dict__)
-
-# Copy source files of chinese python api from reinforcement repository.
-from sphinx.util import logging
-import shutil
-logger = logging.getLogger(__name__)
-
-copy_path = 'docs/api/api_python'
-src_dir = os.path.join(os.getenv("RM_PATH"), copy_path)
-
-copy_list = []
-
-present_path = os.path.dirname(__file__)
-
-for i in os.listdir(src_dir):
- if os.path.isfile(os.path.join(src_dir,i)):
- if os.path.exists('./'+i):
- os.remove('./'+i)
- shutil.copy(os.path.join(src_dir,i),'./'+i)
- copy_list.append(os.path.join(present_path,i))
- else:
- if os.path.exists('./'+i):
- shutil.rmtree('./'+i)
- shutil.copytree(os.path.join(src_dir,i),'./'+i)
- copy_list.append(os.path.join(present_path,i))
-
-# Rename .rst file to .txt file for include directive.
-from rename_include import rename_include
-
-rename_include(present_path)
-
-# add view
-import json
-
-if os.path.exists('../../../../tools/generate_html/version.json'):
- with open('../../../../tools/generate_html/version.json', 'r+', encoding='utf-8') as f:
- version_inf = json.load(f)
-elif os.path.exists('../../../../tools/generate_html/daily_dev.json'):
- with open('../../../../tools/generate_html/daily_dev.json', 'r+', encoding='utf-8') as f:
- version_inf = json.load(f)
-elif os.path.exists('../../../../tools/generate_html/daily.json'):
- with open('../../../../tools/generate_html/daily.json', 'r+', encoding='utf-8') as f:
- version_inf = json.load(f)
-
-if os.getenv("RM_PATH").split('/')[-1]:
- copy_repo = os.getenv("RM_PATH").split('/')[-1]
-else:
- copy_repo = os.getenv("RM_PATH").split('/')[-2]
-
-branch = [version_inf[i]['branch'] for i in range(len(version_inf))
- if version_inf[i]['name'] == copy_repo.replace('mindrl', 'reinforcement')][0]
-docs_branch = [version_inf[i]['branch'] for i in range(len(version_inf)) if version_inf[i]['name'] == 'tutorials'][0]
-
-re_view = f"\n.. image:: https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/{docs_branch}/" + \
- f"resource/_static/logo_github_source.svg\n :target: https://github.com/mindspore-lab/{copy_repo}/blob/{branch}/"
-
-for cur, _, files in os.walk(present_path):
- for i in files:
- flag_copy = 0
- if i.endswith('.rst'):
- for j in copy_list:
- if j in cur:
- flag_copy = 1
- break
- if os.path.join(cur, i) in copy_list or flag_copy:
- try:
- with open(os.path.join(cur, i), 'r+', encoding='utf-8') as f:
- content = f.read()
- new_content = content
- if '.. include::' in content and '.. automodule::' in content:
- continue
- if 'autosummary::' not in content and "\n=====" in content:
- re_view_ = re_view + copy_path + cur.split(present_path)[-1] + '/' + i + \
- '\n :alt: 查看源文件\n\n'
- new_content = re.sub('([=]{5,})\n', r'\1\n' + re_view_, content, 1)
- if new_content != content:
- f.seek(0)
- f.truncate()
- f.write(new_content)
- except Exception:
- print(f'打开{i}文件失败')
-
-import mindspore_rl
-
-sys.path.append(os.path.abspath('../../../../resource/sphinx_ext'))
-# import anchor_mod
-import nbsphinx_mod
-
-sys.path.append(os.path.abspath('../../../../resource/search'))
-import search_code
-
-sys.path.append(os.path.abspath('../../../../resource/custom_directives'))
-from custom_directives import IncludeCodeDirective
-
-def setup(app):
- app.add_directive('includecode', IncludeCodeDirective)
-
-src_release = os.path.join(os.getenv("RM_PATH"), 'RELEASE_CN.md')
-des_release = "./RELEASE.md"
-with open(src_release, "r", encoding="utf-8") as f:
- data = f.read()
-if len(re.findall("\n## (.*?)\n",data)) > 1:
- content = re.findall("(## [\s\S\n]*?)\n## ", data)
-else:
- content = re.findall("(## [\s\S\n]*)", data)
-#result = content[0].replace('# MindSpore', '#', 1)
-with open(des_release, "w", encoding="utf-8") as p:
- p.write("# Release Notes"+"\n\n")
- p.write(content[0])
\ No newline at end of file
diff --git a/docs/reinforcement/docs/source_zh_cn/custom_config_info.md b/docs/reinforcement/docs/source_zh_cn/custom_config_info.md
deleted file mode 100644
index df27d9b33fb09cedd65c7a19ecb14708ee02f6cf..0000000000000000000000000000000000000000
--- a/docs/reinforcement/docs/source_zh_cn/custom_config_info.md
+++ /dev/null
@@ -1,194 +0,0 @@
-# Reinforcement Learning Configuration Instructions
-
-[](https://gitee.com/mindspore/docs/blob/master/docs/reinforcement/docs/source_zh_cn/custom_config_info.md)
-
-
-## Overview
-
-Deep reinforcement learning is one of the fastest-developing fields at present, and new algorithms emerge one after another. MindSpore Reinforcement models reinforcement learning algorithms as objects such as Actor, Learner, Policy, Environment and ReplayBuffer, thereby providing an easily extensible and highly reusable reinforcement learning framework. At the same time, deep reinforcement learning algorithms are relatively complex and the training result of the network is affected by many parameters, so MindSpore Reinforcement provides a centralized parameter configuration interface, which separates the algorithm implementation from the deployment details and makes it convenient for users to quickly adjust the model and algorithm.
-
-This document takes the DQN algorithm as an example to introduce how to use the MindSpore Reinforcement algorithm and training parameter configuration interface, helping users quickly customize and tune reinforcement learning algorithms.
-
-You can obtain the DQN algorithm code from [https://github.com/mindspore-lab/mindrl/tree/master/example/dqn](https://github.com/mindspore-lab/mindrl/tree/master/example/dqn).
-
-## Algorithm-Related Parameter Configuration
-
-MindSpore RL uses `algorithm_config` to define the logical components and the corresponding hyper-parameter configuration. `algorithm_config` is a Python dictionary that describes the actor, learner, policy_and_network, collect_environment, eval_environment and replay buffer respectively. The framework can execute the algorithm based on the configuration, so users only need to focus on the algorithm design.
-
-The following code defines a set of algorithm configurations and uses `algorithm_config` to create a `Session`. The `Session` is responsible for allocating resources and for compiling and executing the computational graph.
-
-```python
-from mindspore_rl.core import Session
-algorithm_config = {
- 'actor': {...},
- 'learner': {...},
- 'policy_and_network': {...},
- 'collect_environment': {...},
- 'eval_environment': {...},
- 'replay_buffer': {...}
-}
-
-session = Session(algorithm_config)
-session.run(...)
-```
-
-The meaning and usage of each parameter in `algorithm_config` are described in detail below.
-
-### Policy Configuration Parameters
-
-A Policy is usually used by the agent to decide the action to be executed in the next step. The algorithm requires the policy type name `type` and the parameters `params`:
-
-- `type`: specifies the type of the Policy. The Actor decides which action to take through the Policy. In deep reinforcement learning, a Policy usually uses a deep neural network to extract features of the environment and outputs the action to take in the next step.
-- `params`: specifies the parameters used to instantiate the corresponding Policy. Note that `params` and `type` must match.
-
-The following example defines the policy and parameter configuration. The Policy is the user-defined `DQNPolicy`, and parameters such as the epsilon-greedy exploration settings and the hidden-layer size of the network model are specified. The framework will create the Policy object in the form `DQNPolicy(policy_params)`.
-
-```python
-from dqn.src.dqn import DQNPolicy
-
-policy_params = {
- 'epsi_high': 0.1, # epsi_high/epsi_low/decay jointly control the exploration-exploitation ratio
- 'epsi_low': 0.1, # epsi_high: maximum exploration ratio, epsi_low: minimum exploration ratio, decay: decay step
- 'decay': 200,
- 'state_space_dim': 0, # dimension of the state space; 0 means it is read from the external environment
- 'action_space_dim': 0, # dimension of the action space; 0 means it is obtained from the external environment
- 'hidden_size': 100, # dimension of the hidden layer
-}
-
-algorithm_config = {
- ...
- 'policy_and_network': {
- 'type': DQNPolicy,
- 'params': policy_params,
- },
- ...
-}
-```
-
-| Key | Type | Range | Description |
-| :----: | :----: | :----: | :----: |
-| type | Class | A user-defined policy class | Must match the name of the user-defined policy class |
-| params (optional) | Dictionary | Any key-value pairs or None | Custom parameters; users can pass in any values as key-value pairs |
-
-### Environment Configuration
-
-`collect_environment` and `eval_environment` are the environment used to collect data during training and the environment used to evaluate the model, respectively. The algorithm needs to specify the instance count `number`, the type name `type`, and the parameters `params`:
-
-- `number`: the number of environment instances required by the algorithm.
-
-- `type`: the type name of the environment. It can be an environment built into MindSpore Reinforcement, such as `GymEnvironment`, or a user-defined environment class.
-
-- `params`: the parameters used to instantiate the external environment. Note that `params` must match `type`.
-
-The following example defines the external environment configuration. The framework creates the `CartPole-v0` environment as `GymEnvironment(name='CartPole-v0')`. `collect_environment` and `eval_environment` use the same configuration parameters.
-
-```python
-from mindspore_rl.environment import GymEnvironment
-collect_env_params = {'name': 'CartPole-v0'}
-eval_env_params = {'name': 'CartPole-v0'}
-algorithm_config = {
-    ...
-    'collect_environment': {
-        'number': 1,
-        'type': GymEnvironment,       # class name of the external environment
-        'params': collect_env_params  # environment parameters
-    },
-    'eval_environment': {
-        'number': 1,
-        'type': GymEnvironment,       # class name of the external environment
-        'params': eval_env_params     # environment parameters
-    },
-    ...
-}
-```
-
-| Key | Type | Range | Description |
-| :----: | :----: | :----: | :----: |
-| number (optional) | Integer | [1, +∞) | If number is provided, at least one environment must be requested. If it is omitted, the framework creates the environment instance directly without wrapping it in the `MultiEnvironmentWrapper` class |
-| num_parallel (optional) | Integer | [1, number] | Environment parallelism is enabled by default when omitted. Set num_parallel: 1 to disable it, or configure the desired degree of parallelism |
-| type | Class | A subclass of the Environment class | Class name of the external environment |
-| params (optional) | Dictionary | Any key-value pairs or None | Custom parameters; users can pass in any values as key-value pairs |
-
-### Actor Configuration
-
-An `Actor` is responsible for interacting with the external environment. It usually interacts with the `Environment` through the `Policy`, and in some algorithms it also stores the collected experience in a `ReplayBuffer`. Therefore the `Actor` holds the `Policy` and the `Environment`, and creates a `ReplayBuffer` when needed. In the Actor configuration, `policies`/`networks` specify the names of member objects defined in the `Policy`.
-
-The following code defines the `DQNActor` configuration. The framework creates the Actor as `DQNActor(algorithm_config['actor'])`.
-
-```python
-algorithm_config = {
-    ...
-    'actor': {
-        'number': 1,                                                   # number of Actors
-        'type': DQNActor,                                              # Actor class name
-        'policies': ['init_policy', 'collect_policy', 'eval_policy'],  # member objects named init_policy/collect_policy/eval_policy are taken from the Policy to build the Actor
-        'share_env': True                                              # whether the actors share one environment
-    },
-    ...
-}
-```
-
-| Key | Type | Range | Description |
-| :----: | :----: | :----: | :----: |
-| number | Integer | [1, +∞) | Currently, values other than 1 are not supported for the number of actors |
-| type | Class | A user-defined class that inherits Actor and implements its virtual methods | Must match the name of the user-defined class that inherits Actor and implements its virtual methods |
-| params (optional) | Dictionary | Any key-value pairs or None | Custom parameters; users can pass in any values as key-value pairs |
-| policies | List of String | Same as the user-defined policy member names | Every string in the list must correspond to a policy member initialized in the user-defined policy class |
-| networks (optional) | List of String | Same as the user-defined network member names | Every string in the list must correspond to a network member initialized in the user-defined policy class |
-| share_env (optional) | Boolean | True or False | Defaults to True, i.e. all actors share one environment. If False, a separate collect environment instance is created for each actor |
-
-### ReplayBuffer Configuration
-
-In some algorithms, a `ReplayBuffer` stores the experience collected from the interaction between the Actor and the environment. Data is later sampled from the `ReplayBuffer` to train the networks.
-
-```python
-from mindspore_rl.core.replay_buffer import ReplayBuffer
-algorithm_config = {
-    ...
-    'replay_buffer': {'number': 1,
-                      'type': ReplayBuffer,
-                      'capacity': 100000,                                           # ReplayBuffer capacity
-                      'sample_size': 64,                                            # sampling batch size
-                      'data_shape': [(4,), (1,), (1,), (4,)],                       # shapes of the data in the ReplayBuffer
-                      'data_type': [ms.float32, ms.int32, ms.float32, ms.float32]}, # data types of the ReplayBuffer
-}
-```
-
-| Key | Type | Range | Description |
-| :----: | :----: | :----: | :----: |
-| number | Integer | [1, +∞) | Number of buffers required |
-| type | Class | A user-defined or framework-provided ReplayBuffer class | Must match the name of the user-defined or framework-provided ReplayBuffer class |
-| capacity | Integer | [0, +∞) | ReplayBuffer capacity |
-| data_shape | List of Integer Tuple | [0, +∞) | The first value of each tuple must equal the number of environments; omit it for a single environment |
-| data_type | List of mindspore data type | Must be MindSpore data types | data_type must have the same length as data_shape |
-| sample_size (optional) | Integer | [0, capacity] | Must be no larger than capacity. Defaults to 1 when omitted |
-
-### Learner Configuration
-
-A `Learner` updates the network weights based on historical experience. The `Learner` holds the deep neural networks defined in the `Policy` (the member object names are specified by `networks`) and uses them for loss computation and weight updates.
-
-The following code defines the `DQNLearner` configuration. The framework creates the Learner as `DQNLearner(algorithm_config['learner'])`.
-
-```python
-from dqn.src.dqn import DQNLearner
-learner_params = {'gamma': 0.99,
-                  'lr': 0.001,  # learning rate
-                  }
-algorithm_config = {
-    ...
-    'learner': {
-        'number': 1,                                      # number of Learners
-        'type': DQNLearner,                               # Learner class name
-        'params': learner_params,                         # parameters required by the Learner
-        'networks': ['policy_network', 'target_network']  # the Learner fetches the member objects named policy_network/target_network from the Policy in order to update them
-    },
-    ...
-}
-```
-
-| Key | Type | Range | Description |
-| :----: | :----: | :----: | :----: |
-| number | Integer | [1, +∞) | Currently, values other than 1 are not supported for the number of learners |
-| type | Class | A user-defined class that inherits Learner and implements its virtual methods | Must match the name of the user-defined class that inherits Learner and implements its virtual methods |
-| params (optional) | Dictionary | Any key-value pairs or None | Custom parameters; users can pass in any values as key-value pairs |
-| networks | List of String | Same as the defined network member names | Every string in the list must correspond to a network member initialized in the user-defined policy class |
diff --git a/docs/reinforcement/docs/source_zh_cn/dqn.md b/docs/reinforcement/docs/source_zh_cn/dqn.md
deleted file mode 100644
index f6597a300f3f18864459ab4b4312e18637946e96..0000000000000000000000000000000000000000
--- a/docs/reinforcement/docs/source_zh_cn/dqn.md
+++ /dev/null
@@ -1,411 +0,0 @@
-# Implementing Deep Q-Learning (DQN) with MindSpore Reinforcement
-
-[](https://gitee.com/mindspore/docs/blob/master/docs/reinforcement/docs/source_zh_cn/dqn.md)
-
-
-## Summary
-
-To implement a reinforcement learning algorithm with MindSpore Reinforcement, a user needs to:
-
-- provide an algorithm configuration that separates the algorithm's implementation from its deployment details;
-- implement the algorithm based on the Actor-Learner-Environment abstraction;
-- create a session object that executes the implemented algorithm.
-
-This tutorial shows how to implement the deep Q-learning (DQN) algorithm with the MindSpore Reinforcement API. Note: for clarity and readability, only API-related code is shown and unrelated code is omitted. Click [here](https://github.com/mindspore-lab/mindrl/tree/master/example/dqn) for the complete DQN source code implemented with MindSpore Reinforcement.
-
-## Specifying the Actor-Learner-Environment Abstraction for DQN
-
-The DQN algorithm requires two deep neural networks: a *policy network* that approximates the action-value function (Q function), and a *target network* that stabilizes training. The policy network determines how to act on the environment, and the goal of DQN is to train the policy network to maximize the reward. In addition, DQN uses *experience replay* to maintain previous observations for off-policy learning, with the Actor using different behavior policies to act on the environment.
-
-MindSpore Reinforcement uses an *algorithm configuration* to specify the logical components required by DQN (Actor, Learner, Policy and Network, Collect Environment, Eval Environment, ReplayBuffer) and the associated hyperparameters. It can execute the algorithm with different strategies based on the provided configuration, so users can focus on algorithm design.
-
-The algorithm configuration is a Python dictionary that specifies how to construct the different components of the DQN algorithm. The hyperparameters of each component are configured in separate Python dictionaries. The DQN configuration is defined as follows:
-
-```python
-algorithm_config = {
-    'actor': {
-        'number': 1,                                                       # number of Actor instances
-        'type': DQNActor,                                                  # Actor class to create
-        'policies': ['init_policy', 'collect_policy', 'evaluate_policy'],  # policies the Actor uses to select actions
-    },
-    'learner': {
-        'number': 1,                                      # number of Learner instances
-        'type': DQNLearner,                               # Learner class to create
-        'params': learner_params,                         # parameters required by the Learner
-        'networks': ['policy_network', 'target_network']  # networks used by the Learner
-    },
-    'policy_and_network': {
-        'type': DQNPolicy,       # Policy class to create
-        'params': policy_params  # parameters required by the Policy
-    },
-    'collect_environment': {
-        'number': 1,                   # number of Collect Environment instances
-        'type': GymEnvironment,        # Collect Environment class to create
-        'params': collect_env_params   # parameters required by the Collect Environment
-    },
-    'eval_environment': {
-        'number': 1,                   # same as the Collect Environment
-        'type': GymEnvironment,
-        'params': eval_env_params
-    },
-    'replay_buffer': {'number': 1,                                                  # number of ReplayBuffer instances
-                      'type': ReplayBuffer,                                         # ReplayBuffer class to create
-                      'capacity': 100000,                                           # ReplayBuffer size
-                      'data_shape': [(4,), (1,), (1,), (4,)],                       # shapes of the data in the ReplayBuffer
-                      'data_type': [ms.float32, ms.int32, ms.float32, ms.float32],  # data types in the ReplayBuffer
-                      'sample_size': 64},                                           # number of entries drawn per sampling call
-}
-```
-
-The configuration above defines six top-level entries, each corresponding to an algorithm component: *actor*, *learner*, *policy*, *replay buffer*, and two *environments*. Each entry maps to a class that is either user-defined or provided by MindSpore Reinforcement, and implements part of the DQN logic.
-
-Each top-level entry has sub-entries that describe the component. *number* defines how many instances of the component the algorithm uses. *type* names the Python class that implements the component. *params* provides the necessary hyperparameters. *policies* in *actor* defines the policies the component uses, and *networks* in *learner* lists all the neural networks it uses. In the DQN example, only the Actor interacts with the environment. *replay_buffer* defines the replay buffer's *capacity, shapes, sample size, and data types*.
-
-For DQN, we configure one Actor (`'number': 1`), its Python class (`'type': DQNActor`), and three behavior policies (`'policies': ['init_policy', 'collect_policy', 'evaluate_policy']`).
-
-The other components are defined in a similar way. For more details, see the [complete code example](https://github.com/mindspore-lab/mindrl/tree/master/example/dqn) and the [API](https://www.mindspore.cn/reinforcement/docs/zh-CN/master/reinforcement.html).
-
-Note that MindSpore Reinforcement uses a single *policy* class to define all the policies and neural networks used by the algorithm. This hides the complexity of data sharing and communication between policies and neural networks.
-
-In train.py, the algorithm is executed through a MindSpore Reinforcement *session*. A *Session* allocates resources on one or more cluster machines and executes the compiled computational graphs. The user passes the algorithm configuration to instantiate the Session class:
-
-```python
-from mindspore_rl.core import Session
-dqn_session = Session(dqn_algorithm_config)
-```
-
-Call the `run` method on the Session object with the corresponding arguments to execute the DQN algorithm. Here, *class_type* is the Trainer class we define, in this case DQNTrainer (how to implement a Trainer class is described later); *episode* is the number of iterations to run; *params* holds the parameters the trainer needs, defined in the config file (see *config.py* in the complete code); and *callbacks* defines the statistics and other callbacks to use (see the Callback-related content in the API for details).
-
-```python
-from src.dqn_trainer import DQNTrainer
-from mindspore_rl.utils.callback import CheckpointCallback, LossCallback, EvaluateCallback
-loss_cb = LossCallback()
-ckpt_cb = CheckpointCallback(50, config.trainer_params['ckpt_path'])
-eval_cb = EvaluateCallback(10)
-cbs = [loss_cb, ckpt_cb, eval_cb]
-dqn_session.run(class_type=DQNTrainer, episode=episode, params=config.trainer_params, callbacks=cbs)
-```
-
-To use MindSpore's computational graph capability, set the execution mode to `GRAPH_MODE`.
-
-```python
-import mindspore as ms
-ms.set_context(mode=ms.GRAPH_MODE)
-```
-
-Functions and methods decorated with `@jit` are compiled into MindSpore computational graphs for automatic parallelization and acceleration. In this tutorial, we use this feature to implement an efficient `DQNTrainer` class; a minimal illustration of `@jit` follows.
-
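-The sketch below is only a minimal illustration of graph compilation with `@ms.jit` and is not part of the DQN code; the function name and values are made up for demonstration:
-
-```python
-import mindspore as ms
-from mindspore import Tensor
-
-@ms.jit
-def scaled_sum(x, y):
-    # Compiled into a MindSpore computational graph on first call.
-    return 2 * x + y
-
-print(scaled_sum(Tensor(1.0, ms.float32), Tensor(3.0, ms.float32)))  # 5.0
-```
-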
-### Defining the DQNTrainer Class
-
-The `DQNTrainer` class orchestrates the algorithm's workflow: it iteratively interacts with the environment, stores the experience in the *ReplayBuffer*, then fetches experience from the *ReplayBuffer* and trains the target model. It must inherit from the `Trainer` class, which is part of the MindSpore Reinforcement API.
-
-The `Trainer` base class contains an `MSRL` (MindSpore Reinforcement) object, which lets the algorithm implementation interact with MindSpore Reinforcement to realize the training logic. The `MSRL` class instantiates the RL algorithm components based on the previously defined algorithm configuration. It provides function handlers that transparently bind to the methods of the user-defined Actor, Learner, or ReplayBuffer. As a result, `MSRL` lets users focus on the algorithm logic while it transparently handles object creation, data sharing, and communication between the algorithm components on one or more workers. Users instantiate the `MSRL` object by creating the `Session` object mentioned above with the algorithm configuration.
-
-`DQNTrainer` must override `train_one_episode` for training, `evaluate` for evaluation, and `trainable_variables` for checkpointing. In this tutorial, it is defined as follows:
-
-```python
-class DQNTrainer(Trainer):
-    def __init__(self, msrl, params):
-        ...
-        super(DQNTrainer, self).__init__(msrl)
-
-    def trainable_variables(self):
-        """Trainable variables for saving."""
-        trainable_variables = {"policy_net": self.msrl.learner.policy_network}
-        return trainable_variables
-
-    @ms.jit
-    def init_training(self):
-        """Initialize training"""
-        state = self.msrl.collect_environment.reset()
-        done = self.false
-        i = self.zero_value
-        while self.less(i, self.fill_value):
-            done, _, new_state, action, my_reward = self.msrl.agent_act(
-                trainer.INIT, state)
-            self.msrl.replay_buffer_insert(
-                [state, action, my_reward, new_state])
-            state = new_state
-            if done:
-                state = self.msrl.collect_environment.reset()
-                done = self.false
-            i += 1
-        return done
-
-    @ms.jit
-    def evaluate(self):
-        """Policy evaluate"""
-        total_reward = self.zero_value
-        eval_iter = self.zero_value
-        while self.less(eval_iter, self.num_evaluate_episode):
-            episode_reward = self.zero_value
-            state = self.msrl.eval_environment.reset()
-            done = self.false
-            while not done:
-                done, r, state = self.msrl.agent_act(trainer.EVAL, state)
-                r = self.squeeze(r)
-                episode_reward += r
-            total_reward += episode_reward
-            eval_iter += 1
-        avg_reward = total_reward / self.num_evaluate_episode
-        return avg_reward
-```
-
-When the user calls the `train` method, it invokes `train` of the `Trainer` base class, which trains the model for the specified number of episodes (iterations), calling the user-defined `train_one_episode` method for each episode. Finally, the `train` method evaluates the policy by calling `evaluate` to obtain the reward values.
-
-In each iteration of the training loop, the `train_one_episode` method is called to train one episode:
-
-```python
-@ms.jit
-def train_one_episode(self):
-    """Train one episode"""
-    if not self.inited:
-        self.init_training()
-        self.inited = self.true
-    state = self.msrl.collect_environment.reset()
-    done = self.false
-    total_reward = self.zero
-    steps = self.zero
-    loss = self.zero
-    while not done:
-        done, r, new_state, action, my_reward = self.msrl.agent_act(
-            trainer.COLLECT, state)
-        self.msrl.replay_buffer_insert(
-            [state, action, my_reward, new_state])
-        state = new_state
-        r = self.squeeze(r)
-        loss = self.msrl.agent_learn(self.msrl.replay_buffer_sample())
-        total_reward += r
-        steps += 1
-        if not self.mod(steps, self.update_period):
-            self.msrl.learner.update()
-    return loss, total_reward, steps
-```
-
-The `@jit` annotation indicates that this method is compiled into a MindSpore computational graph for acceleration. All scalar values must be defined as tensors, for example `self.zero_value = Tensor(0, mindspore.float32)`.
-
-The `train_one_episode` method first calls `self.msrl.collect_environment.reset()` to reset the environment. It then uses the `self.msrl.agent_act` function handler to collect experience from the environment and stores it in the replay buffer through `self.msrl.replay_buffer_insert`. After collecting experience, it trains the target model with `self.msrl.agent_learn`, whose input is the sample returned by `self.msrl.replay_buffer_sample`.
-
-The `ReplayBuffer` is provided by MindSpore Reinforcement. It defines `insert` and `sample` methods for storing and sampling experience data, respectively. For details, see the [complete DQN code example](https://github.com/mindspore-lab/mindrl/tree/master/example/dqn).
-
-### Defining the DQNPolicy Class
-
-Define the `DQNPolicy` class to implement the neural networks and define the policies.
-
-```python
-class DQNPolicy():
-    def __init__(self, params):
-        self.policy_network = FullyConnectedNet(
-            params['state_space_dim'],
-            params['hidden_size'],
-            params['action_space_dim'],
-            params['compute_type'])
-        self.target_network = FullyConnectedNet(
-            params['state_space_dim'],
-            params['hidden_size'],
-            params['action_space_dim'],
-            params['compute_type'])
-```
-
-The constructor takes as input the Python dictionary of hyperparameters, `policy_params`, defined earlier in config.py.
-
-Before defining the policy network and the target network, users must define the structure of the neural networks with MindSpore operators. For example, both networks can be objects of the `FullyConnectedNet` class, which is defined as follows:
-
-```python
-class FullyConnectedNet(nn.Cell):
-    def __init__(self, input_size, hidden_size, output_size, compute_type=mstype.float32):
-        super(FullyConnectedNet, self).__init__()
-        self.linear1 = nn.Dense(
-            input_size,
-            hidden_size,
-            weight_init="XavierUniform").to_float(compute_type)
-        self.linear2 = nn.Dense(
-            hidden_size,
-            output_size,
-            weight_init="XavierUniform").to_float(compute_type)
-        self.relu = nn.ReLU()
-
-    def construct(self, x):
-        # Forward pass: two dense layers with a ReLU activation in between.
-        x = self.relu(self.linear1(x))
-        return self.linear2(x)
-```
-
-The DQN algorithm uses a loss function to optimize the neural network weights, so users must define a network that computes this loss. This network is specified as a nested class of `DQNLearner`. An optimizer is also needed to train the network. The optimizer and loss function are defined as follows:
-
-```python
-class DQNLearner(Learner):
-    """DQN Learner"""
-
-    class PolicyNetWithLossCell(nn.Cell):
-        """DQN policy network with loss cell"""
-
-        def __init__(self, backbone, loss_fn):
-            super(DQNLearner.PolicyNetWithLossCell,
-                  self).__init__(auto_prefix=False)
-            self._backbone = backbone
-            self._loss_fn = loss_fn
-            self.gather = P.GatherD()
-
-        def construct(self, x, a0, label):
-            """Compute the loss for the policy network."""
-            out = self._backbone(x)
-            out = self.gather(out, 1, a0)
-            loss = self._loss_fn(out, label)
-            return loss
-
-    def __init__(self, params=None):
-        super(DQNLearner, self).__init__()
-        ...
-        optimizer = nn.Adam(
-            self.policy_network.trainable_params(),
-            learning_rate=params['lr'])
-        loss_fn = nn.MSELoss()
-        loss_q_net = self.PolicyNetWithLossCell(self.policy_network, loss_fn)
-        self.policy_network_train = nn.TrainOneStepCell(loss_q_net, optimizer)
-        self.policy_network_train.set_train(mode=True)
-        ...
-```
-
-DQN is an *off-policy* algorithm that learns with an epsilon-greedy policy and uses different behavior policies for acting on the environment and collecting data. In this example, we use `RandomPolicy` to initialize training, `EpsilonGreedyPolicy` to collect experience during training, and `GreedyPolicy` for evaluation:
-
-```python
-class DQNPolicy():
-    def __init__(self, params):
-        ...
-        self.init_policy = RandomPolicy(params['action_space_dim'])
-        self.collect_policy = EpsilonGreedyPolicy(self.policy_network, (1, 1), params['epsi_high'],
-                                                  params['epsi_low'], params['decay'], params['action_space_dim'])
-        self.evaluate_policy = GreedyPolicy(self.policy_network)
-```
-
-Because the three behavior policies above are very common across RL algorithms, MindSpore Reinforcement provides them as reusable building blocks. Users can also define algorithm-specific behavior policies.
-
-Note that the method names and the keys of the parameter dictionary must be consistent with the algorithm configuration defined earlier.
-
-### Defining the DQNActor Class
-
-Define a new Actor component, `DQNActor`, which inherits from the `Actor` class provided by MindSpore Reinforcement. The Actor's methods must then be overridden:
-
-```python
-class DQNActor(Actor):
-    ...
-    def act(self, phase, params):
-        if phase == 1:
-            # Fill the replay buffer
-            action = self.init_policy()
-            new_state, reward, done = self._environment.step(action)
-            action = self.reshape(action, (1,))
-            my_reward = self.select(done, self.penalty, self.reward)
-            return done, reward, new_state, action, my_reward
-        if phase == 2:
-            # Experience collection
-            self.step += 1
-            ts0 = self.expand_dims(params, 0)
-            step_tensor = self.ones((1, 1), ms.float32) * self.step
-
-            action = self.collect_policy(ts0, step_tensor)
-            new_state, reward, done = self._environment.step(action)
-            action = self.reshape(action, (1,))
-            my_reward = self.select(done, self.penalty, self.reward)
-            return done, reward, new_state, action, my_reward
-        if phase == 3:
-            # Evaluate the trained policy
-            ts0 = self.expand_dims(params, 0)
-            action = self.evaluate_policy(ts0)
-            new_state, reward, done = self._eval_env.step(action)
-            return done, reward, new_state
-        self.print("Phase is incorrect")
-        return 0
-```
-
-The three branches act on the specified environment with different policies, which map states to actions. They take tensor-typed values as input and return the trajectory from the environment.
-
-To interact with the environment, the Actor uses the `step(action)` method defined in the `Environment` class. Given an action applied to the environment, this method reacts and returns a triple: the new state after applying the action, the reward obtained as a floating-point value, and a boolean flag used to terminate the episode and reset the environment.
-
-The replay buffer class `ReplayBuffer` defines an `insert` method, which the `DQNActor` object calls to store experience data in the replay buffer.
-
-The `Environment` and `ReplayBuffer` classes are provided by the MindSpore Reinforcement API.
-
-The constructor of the `DQNActor` class defines the environments, the replay buffer, the policies, and the networks. It takes as input the dictionary-typed parameters defined in the algorithm configuration. Below, only the initialization of the environments is shown; the other attributes are assigned in a similar way:
-
-```python
-class DQNActor(Actor):
-    def __init__(self, params):
-        self._environment = params['collect_environment']
-        self._eval_env = params['eval_environment']
-        ...
-```
-
-### Defining the DQNLearner Class
-
-To implement `DQNLearner`, the class must inherit from the `Learner` class in the MindSpore Reinforcement API and override the `learn` method:
-
-```python
-class DQNLearner(Learner):
-    ...
-    def learn(self, experience):
-        """Model update"""
-        s0, a0, r1, s1 = experience
-        next_state_values = self.target_network(s1)
-        next_state_values = next_state_values.max(axis=1)
-        r1 = self.reshape(r1, (-1,))
-
-        y_true = r1 + self.gamma * next_state_values
-
-        # Modify last step reward
-        one = self.ones_like(r1)
-        y_true = self.select(r1 == -one, one, y_true)
-        y_true = self.expand_dims(y_true, 1)
-
-        success = self.policy_network_train(s0, a0, y_true)
-        return success
-```
-
-Here, the `learn` method takes a trajectory (sampled from the replay buffer) as input to train the policy network. The constructor receives the dictionary-typed configuration from the algorithm configuration and assigns the networks, policies, and discount rate to the DQNLearner:
-
-```python
-class DQNLearner(Learner):
-    def __init__(self, params=None):
-        super(DQNLearner, self).__init__()
-        self.policy_network = params['policy_network']
-        self.target_network = params['target_network']
-```
-
-## Running and Viewing Results
-
-Run the script `train.py` to start DQN training.
-
-```shell
-cd example/dqn/
-python train.py
-```
-
-The output is as follows:
-
-```text
------------------------------------------
-Evaluation result in episode 0 is 95.300
------------------------------------------
-Episode 0, steps: 33.0, reward: 33.000
-Episode 1, steps: 45.0, reward: 12.000
-Episode 2, steps: 54.0, reward: 9.000
-Episode 3, steps: 64.0, reward: 10.000
-Episode 4, steps: 73.0, reward: 9.000
-Episode 5, steps: 82.0, reward: 9.000
-Episode 6, steps: 91.0, reward: 9.000
-Episode 7, steps: 100.0, reward: 9.000
-Episode 8, steps: 109.0, reward: 9.000
-Episode 9, steps: 118.0, reward: 9.000
-...
-...
-Episode 200, steps: 25540.0, reward: 200.000
-Episode 201, steps: 25740.0, reward: 200.000
-Episode 202, steps: 25940.0, reward: 200.000
-Episode 203, steps: 26140.0, reward: 200.000
-Episode 204, steps: 26340.0, reward: 200.000
-Episode 205, steps: 26518.0, reward: 178.000
-Episode 206, steps: 26718.0, reward: 200.000
-Episode 207, steps: 26890.0, reward: 172.000
-Episode 208, steps: 27090.0, reward: 200.000
-Episode 209, steps: 27290.0, reward: 200.000
------------------------------------------
-Evaluation result in episode 210 is 200.000
------------------------------------------
-```
-
-
diff --git a/docs/reinforcement/docs/source_zh_cn/environment.md b/docs/reinforcement/docs/source_zh_cn/environment.md
deleted file mode 100644
index 6e2242f8122ba10fc2a3f9ac021af6a42bfa8bf4..0000000000000000000000000000000000000000
--- a/docs/reinforcement/docs/source_zh_cn/environment.md
+++ /dev/null
@@ -1,80 +0,0 @@
-# Reinforcement Learning Environment Integration
-
-[](https://gitee.com/mindspore/docs/blob/master/docs/reinforcement/docs/source_zh_cn/environment.md)
-
-## Overview
-
-In reinforcement learning, an agent interacts with an environment and learns a policy that maximizes a numerical reward signal. The environment, as the problem to be solved, is a key element of reinforcement learning.
-
-A wide variety of environments are in use today: [Mujoco](https://github.com/deepmind/mujoco), [MPE](https://github.com/openai/multiagent-particle-envs), [Atari](https://github.com/gsurma/atari), [PySC2](https://www.github.com/deepmind/pysc2), [SMAC](https://github.com/oxwhirl/smac), [TORCS](https://github.com/ugo-nama-kun/gym_torcs), [Isaac](https://github.com/NVIDIA-Omniverse/IsaacGymEnvs), and more. MindSpore Reinforcement currently integrates the Gym and SMAC environments, and more will be added as the algorithm suite grows. This document describes how to integrate a third-party environment into MindSpore Reinforcement.
-
-## Wrapping Environment Python Functions as Operators
-
-Before that, a brief introduction to static graph and dynamic graph modes.
-
-- In dynamic graph mode, the program executes line by line in the order the code is written, and the framework dispatches each operator of the neural network to the device one by one, which makes it easy to write and debug models.
-
-- In static graph mode, the algorithm defined by the developer is compiled into a computational graph at compile time. During this process, the compiler can apply graph optimization techniques to reduce resource overhead and achieve better execution performance.
-
-Because static graph mode supports only a subset of Python syntax, while commonly used environments usually expose plain Python interfaces, the syntax gap often causes graph compilation errors. To solve this, developers can use the `PyFunc` operator to wrap a Python function as an operator in a MindSpore computational graph.
-
-Taking gym as an example, the following wraps `env.reset()` as an operator in a MindSpore computational graph.
-
-The code below creates a `CartPole-v0` environment and calls `env.reset()`. The returned `state` is a `numpy.ndarray` with data type `np.float64` and shape `(4,)`.
-
-```python
-import gym
-
-env = gym.make('CartPole-v0')
-state = env.reset()
-print('type: {}, shape: {}, dtype: {}'.format(type(state), state.shape, state.dtype))
-
-# Result:
-# type: <class 'numpy.ndarray'>, shape: (4,), dtype: float64
-```
-
-Next, use the `PyFunc` operator to wrap `env.reset()` as a MindSpore operator (a sketch of the wrapping follows the list below):
-
-- `fn` specifies the Python function to wrap; it can be a plain function or a member function.
-- `in_types` and `in_shapes` specify the data types and shapes of the inputs. `env.reset` takes no arguments, so empty lists are used.
-- `out_types` and `out_shapes` specify the data types and shapes of the return values. As shown by the previous result, `env.reset()` returns a numpy array with data type `np.float64` and shape `(4,)`, so `[ms.float64,]` and `[(4,),]` are used.
-- `PyFunc` returns a tuple of Tensors.
-- For more detailed usage, see the [reference](https://gitee.com/mindspore/mindspore/blob/master/mindspore/python/mindspore/ops/operations/other_ops.py).
-
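-A minimal sketch of the wrapping described above is shown here. It assumes the `PyFunc` primitive from `mindspore.ops.operations` with the argument order `(fn, in_types, in_shapes, out_types, out_shapes)`; check the reference linked above for the exact signature in your MindSpore version:
-
-```python
-import gym
-import mindspore as ms
-from mindspore.ops import operations as P
-
-env = gym.make('CartPole-v0')
-
-# Wrap env.reset: no inputs, one float64 output of shape (4,).
-reset_op = P.PyFunc(env.reset, [], [], [ms.float64], [(4,)])
-
-# The wrapped operator returns a tuple of Tensors.
-state = reset_op()[0]
-print(state.shape, state.dtype)
-```
-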
-## Decoupling the Environment from the Algorithm
-
-A reinforcement learning algorithm should generalize well: for example, an algorithm that solves `HalfCheetah` should also be able to solve `Pendulum`. To meet this requirement, the environment must be decoupled from the rest of the algorithm, so that switching environments requires as few changes to the rest of the script as possible. Developers are advised to wrap their environments following the `Environment` class.
-
-```python
-import mindspore.nn as nn
-from mindspore_rl.environment import Space
-
-
-class Environment(nn.Cell):
-    def __init__(self):
-        super(Environment, self).__init__(auto_prefix=False)
-
-    def reset(self):
-        pass
-
-    def step(self, action):
-        pass
-
-    @property
-    def action_space(self) -> Space:
-        pass
-
-    @property
-    def observation_space(self) -> Space:
-        pass
-
-    @property
-    def reward_space(self) -> Space:
-        pass
-
-    @property
-    def done_space(self) -> Space:
-        pass
-```
-
-Besides interaction interfaces such as `reset` and `step`, `Environment` must also provide methods such as `action_space` and `observation_space`, which return the [Space](https://mindspore.cn/reinforcement/docs/zh-CN/master/reinforcement.html#mindspore_rl.environment.Space) type. Based on the `Space` information, the algorithm can (a sketch follows this list):
-
-- obtain the dimensions of the environment's state space and action space to build the neural networks;
-- read the legal action range to scale and clip the actions produced by the policy network;
-- determine whether the environment's action space is discrete or continuous, and choose a discrete or continuous distribution for exploration accordingly.
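-
-As an illustration of how an algorithm might consume this information, the hedged sketch below sizes a network head from the spaces. It assumes the environment object is called `env` and that `Space` exposes `shape`, `is_discrete`, and `num_values` attributes; verify these names against the Space API reference linked above:
-
-```python
-import mindspore.nn as nn
-
-obs_space = env.observation_space
-act_space = env.action_space
-
-state_dim = obs_space.shape[-1]
-# For a discrete action space, the policy head outputs one logit per action.
-action_dim = act_space.num_values if act_space.is_discrete else act_space.shape[-1]
-
-policy_head = nn.Dense(state_dim, action_dim)
-```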
diff --git a/docs/reinforcement/docs/source_zh_cn/images/cartpole.gif b/docs/reinforcement/docs/source_zh_cn/images/cartpole.gif
deleted file mode 100644
index 48bad3f540b81c56c8ebe07881421aaeb803d19f..0000000000000000000000000000000000000000
Binary files a/docs/reinforcement/docs/source_zh_cn/images/cartpole.gif and /dev/null differ
diff --git a/docs/reinforcement/docs/source_zh_cn/images/get.png b/docs/reinforcement/docs/source_zh_cn/images/get.png
deleted file mode 100644
index 29ff4f29177460cde5b818a8cf1ad13ab379c152..0000000000000000000000000000000000000000
Binary files a/docs/reinforcement/docs/source_zh_cn/images/get.png and /dev/null differ
diff --git a/docs/reinforcement/docs/source_zh_cn/images/insert.png b/docs/reinforcement/docs/source_zh_cn/images/insert.png
deleted file mode 100644
index ad602bba69acca0b60a8f9e7bf1472d137593d62..0000000000000000000000000000000000000000
Binary files a/docs/reinforcement/docs/source_zh_cn/images/insert.png and /dev/null differ
diff --git a/docs/reinforcement/docs/source_zh_cn/images/mindspore_rl_architecture.png b/docs/reinforcement/docs/source_zh_cn/images/mindspore_rl_architecture.png
deleted file mode 100644
index eb50b331d16c95bd031f0714e742db0f5b1f9e26..0000000000000000000000000000000000000000
Binary files a/docs/reinforcement/docs/source_zh_cn/images/mindspore_rl_architecture.png and /dev/null differ
diff --git a/docs/reinforcement/docs/source_zh_cn/images/sample.png b/docs/reinforcement/docs/source_zh_cn/images/sample.png
deleted file mode 100644
index de7799346464fae8e85f15e7241fade0da4f0ac9..0000000000000000000000000000000000000000
Binary files a/docs/reinforcement/docs/source_zh_cn/images/sample.png and /dev/null differ
diff --git a/docs/reinforcement/docs/source_zh_cn/index.rst b/docs/reinforcement/docs/source_zh_cn/index.rst
deleted file mode 100644
index 46057bdf1d63b7604dd9bfa47b0ab8d5aa2c3c4c..0000000000000000000000000000000000000000
--- a/docs/reinforcement/docs/source_zh_cn/index.rst
+++ /dev/null
@@ -1,69 +0,0 @@
-MindSpore Reinforcement Documentation
-=======================================
-
-MindSpore Reinforcement is a reinforcement learning suite that supports distributed training of agents with reinforcement learning algorithms.
-
-MindSpore Reinforcement provides a clean API abstraction for writing reinforcement learning algorithms. It decouples the algorithm from the specific deployment and execution process, including the use of accelerators, the degree of parallelism, and the scheduling of computation across nodes. MindSpore Reinforcement translates a reinforcement learning algorithm into a series of compiled computational graphs, which the MindSpore framework then runs efficiently on CPUs, GPUs, or Ascend AI processors.
-
-Code repository: <https://github.com/mindspore-lab/mindrl>
-
-Design Features
-----------------------
-
-1. Algorithm-centric APIs for writing reinforcement learning algorithms
-
-   In MindSpore Reinforcement, users describe reinforcement learning algorithms in Python using intuitive algorithmic concepts such as agent, actor, environment, and learner. An agent contains actors that interact with the environment and collect rewards. Based on the rewards, the learner updates the policy that governs the actors' behavior. Users can focus on implementing their algorithm without attending to the framework's computational details.
-
-2. Decoupling reinforcement learning algorithms from their execution strategy
-
-   The APIs that MindSpore Reinforcement provides for implementing algorithms make no assumptions about how the algorithms will be executed. MindSpore Reinforcement can therefore execute the same algorithm on a laptop with a single GPU or on a cluster of multi-GPU machines. Users provide a separate execution configuration that describes the resources MindSpore Reinforcement may use for training.
-
-3. Efficient acceleration of reinforcement learning algorithms
-
-   MindSpore Reinforcement is designed to accelerate the training of reinforcement learning algorithms by executing the computation on hardware accelerators such as GPUs or Ascend AI processors. It not only accelerates the neural network computation but also translates the actor and learner logic into computational graphs with parallel operators, which MindSpore executes by leveraging its strengths in compilation and automatic parallelism.
-
-Roadmap
------------
-
-- The initial release of MindSpore Reinforcement contains a stable API for implementing reinforcement learning algorithms and executing the computation with MindSpore computational graphs. It now supports algorithm-level parallelism, semi-automatic distributed execution, and multi-agent scenarios, but does not yet support fully automatic distribution. These capabilities will be included in future releases of MindSpore Reinforcement.
-
-Typical MindSpore Reinforcement Use Cases
----------------------------------------------
-
-- `Training the Deep Q-Network `_
-
-  The DQN algorithm uses experience replay to maintain previous observations for off-policy learning.
-
-.. toctree::
- :glob:
- :maxdepth: 1
- :caption: Installation
-
- reinforcement_install
-
-.. toctree::
- :glob:
- :maxdepth: 1
- :caption: User Guide
-
- custom_config_info
- dqn
- replaybuffer
- environment
-
-.. toctree::
- :maxdepth: 1
- :caption: API Reference
-
- reinforcement
-
-.. toctree::
- :glob:
- :maxdepth: 1
- :caption: RELEASE NOTES
-
- RELEASE
diff --git a/docs/reinforcement/docs/source_zh_cn/reinforcement_install.md b/docs/reinforcement/docs/source_zh_cn/reinforcement_install.md
deleted file mode 100644
index f13d8f9f5721c3a167953ada8fd5120f1a99bdef..0000000000000000000000000000000000000000
--- a/docs/reinforcement/docs/source_zh_cn/reinforcement_install.md
+++ /dev/null
@@ -1,36 +0,0 @@
-# Installing MindSpore Reinforcement
-
-[](https://gitee.com/mindspore/docs/blob/master/docs/reinforcement/docs/source_zh_cn/reinforcement_install.md)
-
-MindSpore Reinforcement depends on the MindSpore training and inference framework. Install [MindSpore](https://gitee.com/mindspore/mindspore#安装) first, then install MindSpore Reinforcement. It can be installed either with pip or by building from source.
-
-## Installing with pip
-
-To install with pip, download and install the whl package from the [MindSpore Reinforcement download page](https://www.mindspore.cn/versions).
-
-```shell
-pip install https://ms-release.obs.cn-north-4.myhuaweicloud.com/{ms_version}/Reinforcement/any/mindspore_rl-{mr_version}-py3-none-any.whl --trusted-host ms-release.obs.cn-north-4.myhuaweicloud.com -i https://pypi.tuna.tsinghua.edu.cn/simple
-```
-
-> - When connected to the network, the dependencies of the MindSpore Reinforcement package are downloaded automatically while installing the whl package (see requirement.txt for details); otherwise, install the dependencies manually.
-> - `{ms_version}` is the MindSpore version number that matches MindSpore Reinforcement. For example, when downloading MindSpore Reinforcement 0.1.0, set `{ms_version}` to 1.5.0.
-> - `{mr_version}` is the MindSpore Reinforcement version number. For example, when downloading MindSpore Reinforcement 0.1.0, set `{mr_version}` to 0.1.0.
-
-## Building from Source
-
-Download the [source code](https://github.com/mindspore-lab/mindrl), then enter the `reinforcement` directory.
-
-```shell
-bash build.sh
-pip install output/mindspore_rl-0.1.0-py3-none-any.whl
-```
-
-Here, `build.sh` is the build script in the `reinforcement` directory.
-
-## Verifying the Installation
-
-Run the following command to verify the installation. If importing the Python module raises no error, the installation succeeded:
-
-```python
-import mindspore_rl
-```
diff --git a/docs/reinforcement/docs/source_zh_cn/replaybuffer.md b/docs/reinforcement/docs/source_zh_cn/replaybuffer.md
deleted file mode 100644
index 3399972f68e31a4cad163e5432fdfd752569a7d3..0000000000000000000000000000000000000000
--- a/docs/reinforcement/docs/source_zh_cn/replaybuffer.md
+++ /dev/null
@@ -1,136 +0,0 @@
-# ReplayBuffer Usage
-
-[](https://gitee.com/mindspore/docs/blob/master/docs/reinforcement/docs/source_zh_cn/replaybuffer.md)
-
-## Introduction to ReplayBuffer
-
-In reinforcement learning, a ReplayBuffer is a commonly used basic data store. Its purpose is to hold the data obtained from the interaction between the agent and the environment.
-Using a ReplayBuffer addresses the following problems:
-
-1. Stored historical experience can be drawn by sampling, which breaks the correlation in the training data and makes the sampled data approximately independent and identically distributed.
-2. It provides temporary storage of data and improves data utilization.
-
-## The ReplayBuffer Implementation in MindSpore Reinforcement Learning
-
-Algorithm developers usually build a ReplayBuffer from native Python or NumPy data structures, or rely on the standard API wrappers offered by general reinforcement learning frameworks. MindSpore differs in that it implements the ReplayBuffer on the device side: on GPU hardware this reduces frequent data copies between host and device, and expressing the ReplayBuffer as MindSpore operators allows a complete IR graph to be built, enabling the graph optimizations of MindSpore GRAPH_MODE and improving overall performance.
-
-MindSpore provides two ReplayBuffers, UniformReplayBuffer and PriorityReplayBuffer, for the common FIFO storage and for prioritized storage, respectively. The following takes UniformReplayBuffer as an example to describe the implementation and usage.
-It is represented as a list of Tensors, each of which holds one column of data (for example, one column of [state, action, reward]). Data newly placed into the UniformReplayBuffer is updated with a FIFO mechanism, and the buffer supports insertion, lookup, and sampling.
-
-### Parameters
-
-A UniformReplayBuffer is created with the initialization parameters batch_size, capacity, shapes, and types.
-
-* batch_size: the number of entries returned by one sample call, an integer.
-* capacity: the total capacity of the UniformReplayBuffer, an integer.
-* shapes: the shape of each column of data in the buffer, given as a list.
-* types: the data type of each column of data in the buffer, given as a list.
-
-### Functionality
-
-#### 1 Insertion -- insert
-
-The insert method takes a batch of data as input; the shapes and types of the data must match the parameters used to create the UniformReplayBuffer. It produces no output.
-To emulate the FIFO behavior of a circular queue, two cursors are used: head for the head of the queue and count for the number of valid entries. The steps below walk through several insert operations (a small sketch of this bookkeeping follows the list):
-
-1. The total buffer size is 6. Initially, both head and count are 0.
-2. After inserting a batch of size 2, head is unchanged and count increases by 2.
-3. After inserting another batch of size 4, the queue is full and count is 6.
-4. After inserting another batch of size 2, the old data is overwritten and head increases by 2.
-
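-The hedged sketch below mimics this head/count bookkeeping in plain Python; it only illustrates the cursor logic and is not the MindSpore operator itself:
-
-```python
-capacity = 6
-head, count = 0, 0
-
-def insert(batch_size):
-    """Update the cursors for one insertion of batch_size entries."""
-    global head, count
-    if count + batch_size <= capacity:
-        count += batch_size  # buffer not yet full
-    else:
-        head = (head + count + batch_size - capacity) % capacity  # overwrite the oldest entries
-        count = capacity
-
-insert(2)   # head=0, count=2
-insert(4)   # head=0, count=6 (full)
-insert(2)   # head=2, count=6 (the two oldest entries are overwritten)
-```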
-
-
-#### 2 Lookup -- get_item
-
-The get_item method takes an index as input, indicating the exact position of the data to look up. The output is a group of Tensors:
-
-1. When the UniformReplayBuffer is not yet full (or has just become full), the data is found directly by index.
-2. When data has already been overwritten, the index is remapped through the cursors (see the sketch below).
-
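-The remapping mentioned in step 2 can be pictured with the plain-Python sketch below (illustrative only; the operator performs this internally):
-
-```python
-def remap(index, head, capacity):
-    # Map a logical position to the physical slot once old data has been overwritten.
-    return (head + index) % capacity
-
-print(remap(0, 2, 6))  # logical position 0 now lives in physical slot 2
-```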
-
-
-#### 3 Sampling -- sample
-
-The sample method takes no input; the output is a group of Tensors whose size is the batch_size given when the UniformReplayBuffer was created.
-Assume batch_size is 3: the operator randomly generates a set of indexes, and these random indexes come in two flavors:
-
-1. Order-preserving: each index represents the real data position and must be remapped through the cursors.
-2. Non-order-preserving: each index does not represent the real position and is fetched directly.
-
-The two approaches have a slight effect on randomness; the non-order-preserving approach is the default, for the best performance.
-
-
-
-## Using the UniformReplayBuffer in MindSpore Reinforcement Learning
-
-### Creating a UniformReplayBuffer
-
-MindSpore Reinforcement Learning provides a standard ReplayBuffer API. Users can let the framework create the ReplayBuffer through a configuration file, such as the [dqn](https://github.com/mindspore-lab/mindrl/tree/master/mindspore_rl/algorithm/dqn/config.py) configuration file:
-
-```python
-'replay_buffer':
-    {'number': 1,
-     'type': UniformReplayBuffer,
-     'capacity': 100000,
-     'data_shape': [(4,), (1,), (1,), (4,)],
-     'data_type': [ms.float32, ms.int32, ms.float32, ms.float32],
-     'sample_size': 64}
-```
-
-Alternatively, users can create the required data structure directly through the API:
-
-```python
-from mindspore_rl.core.uniform_replay_buffer import UniformReplayBuffer
-import mindspore as ms
-sample_size = 2
-capacity = 100000
-shapes = [(4,), (1,), (1,), (4,)]
-types = [ms.float32, ms.int32, ms.float32, ms.float32]
-replaybuffer = UniformReplayBuffer(sample_size, capacity, shapes, types)
-```
-
-### Using the Created UniformReplayBuffer
-
-Take data operations on a [UniformReplayBuffer](https://github.com/mindspore-lab/mindrl/tree/master/mindspore_rl/core/uniform_replay_buffer.py) created through the API as an example:
-
-* Insertion
-
-```python
-state = ms.Tensor([0.1, 0.2, 0.3, 0.4], ms.float32)
-action = ms.Tensor([1], ms.int32)
-reward = ms.Tensor([1], ms.float32)
-new_state = ms.Tensor([0.4, 0.3, 0.2, 0.1], ms.float32)
-replaybuffer.insert([state, action, reward, new_state])
-replaybuffer.insert([state, action, reward, new_state])
-```
-
-* Lookup
-
-```python
-exp = replaybuffer.get_item(0)
-```
-
-* Sampling
-
-```python
-samples = replaybuffer.sample()
-```
-
-* Reset
-
-```python
-replaybuffer.reset()
-```
-
-* Current used size of the buffer
-
-```python
-size = replaybuffer.size()
-```
-
-* Check whether the buffer is full
-
-```python
-if replaybuffer.full():
-    print("The buffer is full.")
-```