# 2025 Spring Deep-learning Project

**Repository Path**: mojoisme/2025-spring-deep-learning-class-project

## Basic Information

- **Project Name**: 2025 Spring Deep-learning Project
- **Description**: Suzhou jzw class (Deep Reinforcement Learning)
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-04-21
- **Last Updated**: 2025-09-14

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

## Environment Setup

- python=3.10
- torch=2.6.0+cu124

Create a virtual environment:

```bash
conda create -n DpPro python=3.10
conda activate DpPro
```

Install the Conda dependencies listed in environment.yaml (the `vc` and `vs2015_runtime` packages are Windows-only):

```bash
conda install -c defaults bzip2 ca-certificates libffi openssl pip python=3.10.16 setuptools sqlite tk tzdata vc vs2015_runtime wheel xz zlib
```

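Check the CUDA version on your machine (the `CUDA Version` shown by nvidia-smi is the newest CUDA version the installed driver supports):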

```bash
nvidia-smi
```
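To read the CUDA version from a script instead, a minimal Python sketch can parse the `CUDA Version: X.Y` field of the nvidia-smi header. The helper names are hypothetical and not part of this repository:

```python
import re
import subprocess
from typing import Optional, Tuple

# Matches the "CUDA Version: 12.4" field printed in the nvidia-smi header.
_CUDA_RE = re.compile(r"CUDA Version:\s*(\d+)\.(\d+)")


def parse_driver_cuda_version(smi_output: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from nvidia-smi text, or None if the field is absent."""
    match = _CUDA_RE.search(smi_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def query_driver_cuda_version() -> Optional[Tuple[int, int]]:
    """Run nvidia-smi and parse its header; None if the tool is missing or fails."""
    try:
        result = subprocess.run(
            ["nvidia-smi"], capture_output=True, text=True, check=True, timeout=30
        )
    except (OSError, subprocess.SubprocessError):
        return None
    return parse_driver_cuda_version(result.stdout)
```

As a rule of thumb, choose a `pytorch-cuda` version no newer than the value `query_driver_cuda_version()` returns.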

Choose a torch build that matches your CUDA version (preferably torch 2.4.1 or later). For example, for CUDA 11.8, run:

```bash
conda install pytorch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 pytorch-cuda=11.8 -c pytorch -c nvidia
```
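You can check the installed torch version against the recommended minimum from a script. The sketch below assumes pip wheels carry a local tag such as `+cu124`, while conda builds may report a plain version like `2.4.1`. The helper names and `MINIMUM_TORCH` are hypothetical:

```python
import re
from typing import Optional, Tuple

# Hypothetical constant taken from the advice above ("2.4.1 or later").
MINIMUM_TORCH = (2, 4, 1)


def parse_torch_version(version: str) -> Tuple[Tuple[int, ...], Optional[str]]:
    """Split e.g. '2.6.0+cu124' into ((2, 6, 0), 'cu124'); the tag is None if absent."""
    release, _, local = version.partition("+")
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized torch version: {version!r}")
    numbers = tuple(int(part) for part in match.groups())
    return numbers, (local or None)


def cuda_tag_to_version(tag: Optional[str]) -> Optional[str]:
    """Turn a 'cuXYZ' tag into 'XY.Z' (e.g. 'cu124' -> '12.4'); None for other tags."""
    if tag is None or not tag.startswith("cu") or len(tag) < 4 or not tag[2:].isdigit():
        return None
    digits = tag[2:]
    return f"{digits[:-1]}.{digits[-1]}"


def torch_meets_minimum(version: str, minimum: Tuple[int, ...] = MINIMUM_TORCH) -> bool:
    """Compare the numeric release part of a torch version with the minimum."""
    numbers, _ = parse_torch_version(version)
    return numbers >= minimum
```

After `import torch`, call `torch_meets_minimum(torch.__version__)`. In addition, `torch.version.cuda` reports the CUDA version the build targets, and `torch.cuda.is_available()` reports whether a GPU is usable.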

Install the pip dependencies listed in environment.yaml:

```bash
pip install box2d-py==2.3.5 cloudpickle==3.1.1 filelock==3.18.0 fsspec==2025.3.2 gym==0.26.2 gym-notices==0.0.8 Jinja2==3.1.6 MarkupSafe==3.0.2 mpmath==1.3.0 networkx==3.4.2 numpy==1.23.5 pillow==11.2.1 pygame==2.1.0 swig==4.3.1 sympy==1.13.1 typing_extensions==4.13.2
```

If you see the following error (SWIG fails while building box2d-py for gym[box2d]):

```bash
running build_ext
      building 'Box2D._Box2D' extension
      swigging Box2D\Box2D.i to Box2D\Box2D_wrap.cpp
      swig.exe -python -c++ -IBox2D -small -O -includeall -ignoremissing -w201 -globals b2Globals -outdir library\Box2D -keyword -w511 -D_SWIG_Kcpp Box2D\Box2cpp Box2D\Box2D.i
      error: command 'swig.exe' failed: None
      [end of output]
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for box2d-py
  Running setup.py clean for box2d-py
Failed to build box2d-py
ERROR: Failed to build installable wheels for some pyproject.toml based projects (box2d-py)
```

It can be fixed as shown in the figure below:
![Fixing the SWIG build error](img/image.png)
[SWIG installation and configuration](https://blog.csdn.net/qq_24586395/article/details/108244056)

That is, run:

```bash
conda install -c anaconda swig
pip install box2d-py
```
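The build error above means pip could not run `swig.exe` while compiling box2d-py. Before retrying the install, you can check that SWIG is visible on PATH. The sketch below assumes it runs inside the activated DpPro environment, and its function names are illustrative:

```python
import re
import shutil
import subprocess
from typing import Optional

# `swig -version` prints a line such as "SWIG Version 4.3.1".
_SWIG_RE = re.compile(r"SWIG Version\s+(\S+)")


def parse_swig_version(text: str) -> Optional[str]:
    """Extract the version number from `swig -version` output."""
    match = _SWIG_RE.search(text)
    return match.group(1) if match else None


def installed_swig_version() -> Optional[str]:
    """Return the SWIG version found on PATH, or None if swig is not visible."""
    swig = shutil.which("swig")  # on Windows this also finds swig.exe via PATHEXT
    if swig is None:
        return None
    try:
        result = subprocess.run(
            [swig, "-version"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.SubprocessError):
        return None
    return parse_swig_version(result.stdout + result.stderr)
```

If `installed_swig_version()` returns `None`, the current environment cannot see SWIG, and `pip install box2d-py` will likely fail the same way.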

A successful box2d-py installation prints console output like this:

```bash
pip install gym[box2d]
...
  Building wheel for box2d-py (setup.py) ... done
  Created wheel for box2d-py: filename=box2d_py-2.3.5-cp310-cp310-win_amd64.whl size=439378 sha256=b89149cff3502bd37e0b5f7b3ccb521f4be0f1dc1beaf961aa9f962c2bc510f4
  Stored in directory: c:\users\23778\appdata\local\pip\cache\wheels\75\52\2e\ce2d8c82b75eb9c0634783b51274eb70100634d6f995971982
Successfully built box2d-py
Installing collected packages: swig, box2d-py, pygame
Successfully installed box2d-py-2.3.5 pygame-2.1.0 swig-4.3.1
```

Note that environment.yaml pins numpy below 2.0 (numpy==1.23.5). If a newer numpy was installed, downgrade it:

```bash
PS: > pip install numpy==1.23.5
Looking in indexes: https://mirror.nju.edu.cn/pypi/web/simple
Collecting numpy==1.23.5
  Downloading https://mirror.nju.edu.cn/pypi/web/packages/6a/03/ae6c3c307f9c5c7516de3df3e764ebb1de33e54e197f0370992138433ef4/numpy-1.23.5-cp310-cp310-win_amd64.whl (14.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.6/14.6 MB 34.1 MB/s eta 0:00:00
Installing collected packages: numpy
  Attempting uninstall: numpy
    Found existing installation: numpy 2.2.5
    Uninstalling numpy-2.2.5:
      Successfully uninstalled numpy-2.2.5
Successfully installed numpy-1.23.5
```

Then rerun the pip command:

```bash
pip install box2d-py==2.3.5 cloudpickle==3.1.1 filelock==3.18.0 fsspec==2025.3.2 gym==0.26.2 gym-notices==0.0.8 Jinja2==3.1.6 MarkupSafe==3.0.2 mpmath==1.3.0 networkx==3.4.2 numpy==1.23.5 pillow==11.2.1 pygame==2.1.0 swig==4.3.1 sympy==1.13.1 typing_extensions==4.13.2
```
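A later install can silently replace a pinned package, as happened with numpy 2.2.5 above. A short stdlib-only check can compare installed versions against the pins. The sketch assumes Python 3.8+ (`importlib.metadata`). `PINS` repeats only a subset of the versions from the command above, and `check_pins` is a hypothetical helper, not part of the repository:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, List

# A subset of the pins from the pip command above; extend as needed.
PINS: Dict[str, str] = {
    "gym": "0.26.2",
    "box2d-py": "2.3.5",
    "numpy": "1.23.5",
    "pygame": "2.1.0",
    "swig": "4.3.1",
}


def check_pins(pins: Dict[str, str], lookup: Callable[[str], str] = version) -> List[str]:
    """Return human-readable mismatches; an empty list means every pin matches."""
    problems = []
    for name, wanted in pins.items():
        try:
            found = lookup(name)
        except PackageNotFoundError:
            problems.append(f"{name}: not installed (want {wanted})")
            continue
        if found != wanted:
            problems.append(f"{name}: found {found}, want {wanted}")
    return problems


if __name__ == "__main__":
    for line in check_pins(PINS) or ["all pins match"]:
        print(line)
```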

Run a simple Python script to check that the LunarLander-v2 environment works:

```bash
python tmp.py
```
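The contents of tmp.py are not shown here. The following is a minimal stand-in that assumes the gym 0.26.2 API pinned above, where `reset()` returns `(obs, info)` and `step()` returns five values. Both function names are hypothetical:

```python
from typing import Any, Tuple


def run_random_episode(env: Any, max_steps: int = 1000, seed: int = 0) -> Tuple[float, int]:
    """Play one episode with random actions; return (total reward, steps taken).

    Assumes the gym 0.26 API: reset() -> (obs, info) and
    step() -> (obs, reward, terminated, truncated, info).
    """
    obs, _info = env.reset(seed=seed)
    if len(obs) != 8:
        raise RuntimeError(f"expected an 8-element LunarLander state, got {len(obs)}")
    total_reward, steps = 0.0, 0
    for steps in range(1, max_steps + 1):
        action = env.action_space.sample()  # uniformly random discrete action
        obs, reward, terminated, truncated, _info = env.step(action)
        total_reward += float(reward)
        if terminated or truncated:
            break
    env.close()
    return total_reward, steps


def make_lunar_lander() -> Any:
    """Create the environment lazily so this file imports even without gym."""
    import gym  # gym==0.26.2 with box2d-py, as pinned above

    return gym.make("LunarLander-v2")
```

In a script such as tmp.py, `print(run_random_episode(make_lunar_lander()))` should finish without errors and print a `(reward, steps)` pair. The reward differs from run to run.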

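## Running

There are two modes: training and testing. From the repository root, run `python main.py` to train and `python test.py` to test. Use `--help` to list the arguments.

Specific parameter settings can be selected from the files in the config folder.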

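## Folders and Files

```
├── 📂 baseline/            # Decision network models
├── 📂 config/              # Runtime parameter configuration files (editable)
├── 📂 main/                # Policy network training
├── 📂 models/              # Training artifacts
├── 📂 _training_products/  # Training artifacts
├── 📂 saved_models/        # Selected model parameters
├── 📂 strategy/            # Strategies built on the decision networks (extensible: one strategy could use more than one decision network, though this project does not)
├── 📂 test/                # Strategy testing
├── 📜 logger_creator.py    # Logger factory
├── 📜 main.py              # Main training script
├── 📜 test.py              # Main test script
└── 📜 utils.py             # Utility functions
```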

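## About the Game

References: [official documentation](https://www.gymlibrary.dev/environments/box2d/lunar_lander/), [a Chinese explanation (written for a slightly different version)](https://zhuanlan.zhihu.com/p/589119622)

In the LunarLander environment, actions are discrete and the action space is {0, 1, 2, 3}: 0 does nothing, 1 fires the left orientation engine, 2 fires the main engine, and 3 fires the right orientation engine. The state is a vector of 8 elements: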

```bash
s[0] is the horizontal coordinate
s[1] is the vertical coordinate
s[2] is the horizontal speed
s[3] is the vertical speed
s[4] is the angle
s[5] is the angular speed
s[6] 1 if first leg has contact, else 0
s[7] 1 if second leg has contact, else 0
```

```py
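import gym
env = gym.make("LunarLander-v2")
state, info = env.reset(seed=0)  # gym 0.26: reset() returns (observation, info); step() also returns the observation
x, y, v_x, v_y, theta, omega, is_left_contact, is_right_contact = state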
```
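Naming the observation entries helps avoid indexing mistakes. This is a minimal sketch: `LanderState`, `decode_state` and `ACTION_NAMES` are hypothetical helpers, not part of gym or this repository:

```python
from typing import NamedTuple, Sequence

# Discrete actions as described above.
ACTION_NAMES = {
    0: "do nothing",
    1: "fire left orientation engine",
    2: "fire main engine",
    3: "fire right orientation engine",
}


class LanderState(NamedTuple):
    x: float
    y: float
    v_x: float
    v_y: float
    theta: float
    omega: float
    left_contact: bool
    right_contact: bool


def decode_state(obs: Sequence[float]) -> LanderState:
    """Name the 8 observation entries s[0]..s[7] in the order listed above."""
    if len(obs) != 8:
        raise ValueError(f"expected 8 values, got {len(obs)}")
    x, y, v_x, v_y, theta, omega, left, right = (float(v) for v in obs)
    # Leg-contact flags arrive as 0.0 or 1.0; convert them to bool.
    return LanderState(x, y, v_x, v_y, theta, omega, left > 0.5, right > 0.5)
```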

## Adding a Strategy

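Add strategies for discrete environments under `strategy/discrete`. If a strategy uses a model, add the model under `baseline/<your model name>`.

The `main` and `test` folders contain the training and testing code for each strategy, respectively.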
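This README does not document the base class or method names used under `strategy/discrete`. As an illustration only, a discrete strategy often maps a state to an action by epsilon-greedy selection over a model's Q-values. The class and method names below are hypothetical and do not reflect the repository's actual interface:

```python
import random
from typing import Callable, Optional, Sequence


class EpsilonGreedyStrategy:
    """Hypothetical discrete strategy; NOT the repository's real interface.

    Wraps any function that maps a state to one Q-value per action
    (for example, a network stored under baseline/<your model name>).
    """

    def __init__(
        self,
        q_values: Callable[[Sequence[float]], Sequence[float]],
        n_actions: int = 4,
        epsilon: float = 0.1,
        seed: Optional[int] = None,
    ) -> None:
        if not 0.0 <= epsilon <= 1.0:
            raise ValueError("epsilon must be in [0, 1]")
        self.q_values = q_values
        self.n_actions = n_actions
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def select_action(self, state: Sequence[float]) -> int:
        # Explore with probability epsilon; otherwise take the greedy action.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        q = self.q_values(state)
        return max(range(self.n_actions), key=lambda a: q[a])
```

Keeping the model behind a plain callable matches the idea in the folder description that a strategy could wrap one or more decision networks.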