diff --git a/README.md b/README.md
index 62179922d2c9134386652f590d02abbaa0331eca..3808eb5efc98ef85c9be9f2fa53ef506d34b62a3 100644
--- a/README.md
+++ b/README.md
@@ -84,14 +84,14 @@ out, argmax = scatter_max(updates, indices, out)
 ```
 .
 ├── kernels                    # Operator implementations
-│   ├── op_host
-│   ├── op_kernel
+│   ├── op_host
+│   ├── op_kernel
 │   └── CMakeLists.txt
 ├── onnx_plugin                # ONNX framework adaptation layer
 ├── mx_driving
 │   ├── __init__.py
 │   ├── csrc                   # Acceleration library API adaptation layer
-│   └── ...
+│   └── ...
 ├── model_examples             # Autonomous-driving model examples
 │   └── BEVFormer              # BEVFormer model example
 ├── ci                         # CI scripts
@@ -162,7 +162,7 @@ out, argmax = scatter_max(updates, indices, out)
 | HiVT | https://gitee.com/ascend/DrivingSDK/tree/master/model_examples/HiVT |N|
 | MagicDriveDiT | https://gitee.com/ascend/DrivingSDK/tree/master/model_examples/MagicDriveDiT |N|
 | SparseDrive | https://gitee.com/ascend/DrivingSDK/tree/master/model_examples/SparseDrive |N|
-
+| Diffusion-Planner | https://gitee.com/ascend/DrivingSDK/tree/master/model_examples/Diffusion-Planner |N|
 # Supported Products
 
 - Atlas A2 training series products
@@ -232,7 +232,8 @@ The public network addresses contained in the Driving SDK code are listed in the following table:
 | Open-source import | https://gitee.com/it-monkey/protocolbuffers.git | ci/docker/X86/build_protobuf.sh | https://gitee.com/it-monkey/protocolbuffers.git | Used to build protobuf |
 | Open-source import | https://storage.googleapis.com/coco-dataset/external/PASCAL_VOC.zip | model_examples/CenterNet/CenterNet.patch | https://storage.googleapis.com/coco-dataset/external/PASCAL_VOC.zip | Replacement for a broken data download link in the source model |
 | Open-source import | https://s3.amazonaws.com/images.cocodataset.org/external/external_PASCAL_VOC.zip | model_examples/CenterNet/CenterNet.patch | https://s3.amazonaws.com/images.cocodataset.org/external/external_PASCAL_VOC.zip | Download link for data required by the model |
-
+| Open-source import | https://download.pytorch.org/whl/torch_stable.html | model_examples/Diffusion-Planner/diffusionPlanner.patch | https://download.pytorch.org/whl/torch_stable.html | Model dependency package download |
+| Open-source import | https://pypi.tuna.tsinghua.edu.cn/simple | model_examples/Diffusion-Planner/diffusionPlanner.patch | https://pypi.tuna.tsinghua.edu.cn/simple | Model dependency package download |
 ## Public API Statement
 
 Refer to the [API list](./docs/api/README.md) for the custom public interfaces provided by Driving SDK. A function is a public interface if it appears in the documentation. Otherwise, before relying on it, ask in the community whether it is genuinely public or was exposed by accident, because such unexposed interfaces may be modified or removed in the future.
diff --git a/model_examples/Diffusion-Planner/README.md b/model_examples/Diffusion-Planner/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..0829f3cc35c048c149e37ab0a34ae29c6341e03e
--- /dev/null
+++ b/model_examples/Diffusion-Planner/README.md
@@ -0,0 +1,179 @@
+# Diffusion-Planner
+
+## Table of Contents
+
+- [Introduction](#introduction)
+  - [Model Overview](#model-overview)
+  - [Supported Tasks](#supported-tasks)
+  - [Code Implementation](#code-implementation)
+- [Diffusion-Planner](#diffusion-planner-in-development)
+  - [Preparing the Training Environment](#preparing-the-training-environment)
+  - [Quick Start](#quick-start)
+- [Changelog](#changelog)
+
+# Introduction
+
+## Model Overview
+
+The paper proposes Diffusion Planner, a Transformer-based planner for autonomous driving in open, complex environments. It uses diffusion-based generation to model multimodal driving behavior and ensures trajectory quality without relying on rule-based post-processing. Its main contributions are: 1) a unified architecture that jointly models prediction and planning, promoting coordination between vehicles; 2) a classifier-guidance mechanism that learns the gradient of a trajectory score, enabling safe and adaptive planning. Experiments on the large-scale nuPlan benchmark and on 200 hours of real-world delivery-vehicle data show state-of-the-art closed-loop performance and driving-style transferability, significantly outperforming conventional imitation-learning methods. This repository adapts Diffusion-Planner to Ascend NPUs and provides an adaptation patch so that users can train the model on NPUs.
+
+## Supported Tasks
+
+This repository supports the following model tasks:
+
+| Model | Task | Supported |
+| :---------: | :------: | :------: |
+| Diffusion-Planner | Training | ✔ |
+
+## Code Implementation
+
+- Reference implementation:
+  ```
+  url=https://github.com/ZhengYinan-AIR/Diffusion-Planner
+  commit 62196099b6e969f532be0ac0b20d1a236ebbfd19
+  ```
+
+- Implementation adapted for Ascend AI processors:
+  ```
+  url=https://gitee.com/ascend/DrivingSDK.git
+  code_path=model_examples/Diffusion-Planner
+  ```
+
+# Diffusion-Planner (in development)
+
+## Preparing the Training Environment
+
+### Installing the Ascend Environment
+
+Set up the Ascend environment by following the Ascend community guide [PyTorch Framework Training Environment Preparation](https://www.hiascend.com/document/detail/zh/ModelZoo/pytorchframework/ptes). This repository supports the software versions listed in Table 1.
+
+**Table 1** Supported Ascend software versions
+
+| Software | Supported version |
+| :---------------: | :------: |
+| FrameworkPTAdapter | 7.1.0 |
+| CANN | 8.2.RC1 |
+
+
+### Installing the Model Environment
+
+**Table 2** Supported third-party library versions
+
+| Library | Supported version |
+| :-----: | :------: |
+| PyTorch | 2.1.0 |
+
+
+0. Activate the CANN environment
+   Denote the CANN package directory as cann_root_dir and run the following command to activate the environment:
+   ```
+   source {cann_root_dir}/set_env.sh
+   ```
+
+1. Create a conda environment
+   ```
+   conda create -n diffusion_planner python=3.9
+   conda activate diffusion_planner
+   ```
+
+2. Install nuplan-devkit
+   ```
+   git clone https://github.com/motional/nuplan-devkit.git && cd nuplan-devkit
+   pip install -e .
+   pip install -r requirements.txt
+   ```
+
+3. Install diffusion_planner
+   ```
+   cd ..
+   git clone https://github.com/ZhengYinan-AIR/Diffusion-Planner.git && cd Diffusion-Planner
+   cp -f ../diffusionPlanner.patch .
+   cp -rf ../test .
+   git checkout 62196099b6e969f532be0ac0b20d1a236ebbfd19
+   git apply --reject --whitespace=fix diffusionPlanner.patch
+   pip install -e .
+   pip install -r requirements_torch.txt
+   ```
+
+4. Install the Driving SDK acceleration library
+   ```
+   git clone https://gitee.com/ascend/DrivingSDK.git -b master
+   cd DrivingSDK
+   bash ci/build.sh --python=3.9
+   cd dist
+   pip3 install mx_driving-1.0.0+git{commit_id}-cp{python_version}-linux_{arch}.whl
+   ```
+
+
+### Preparing the Dataset
+
+1. Download the [nuPlan dataset](https://www.nuscenes.org/nuplan#download) and arrange it in the following layout:
+   ```
+   ~/nuplan
+   └── dataset
+       ├── maps
+       │   ├── nuplan-maps-v1.0.json
+       │   ├── sg-one-north
+       │   │   └── 9.17.1964
+       │   │       └── map.gpkg
+       │   ├── us-ma-boston
+       │   │   └── 9.12.1817
+       │   │       └── map.gpkg
+       │   ├── us-nv-las-vegas-strip
+       │   │   └── 9.15.1915
+       │   │       └── map.gpkg
+       │   └── us-pa-pittsburgh-hazelwood
+       │       └── 9.17.1937
+       │           └── map.gpkg
+       └── nuplan-v1.1
+           └── splits
+               ├── mini
+               │   ├── 2021.05.12.22.00.38_veh-35_01008_01518.db
+               │   ├── 2021.06.09.17.23.18_veh-38_00773_01140.db
+               │   ├── ...
+               │   └── 2021.10.11.08.31.07_veh-50_01750_01948.db
+               └── trainval
+                   ├── 2021.05.12.22.00.38_veh-35_01008_01518.db
+                   ├── 2021.06.09.17.23.18_veh-38_00773_01140.db
+                   ├── ...
+                   └── 2021.10.11.08.31.07_veh-50_01750_01948.db
+
+   ```
+2. Preprocess the data
+   Replace the data paths in data_process.sh, then run it:
+   ```
+   chmod +x data_process.sh
+   ./data_process.sh
+   ```
+
+## Quick Start
+Training scripts are provided for **a single node with 8 NPUs**. Before training, update the path settings in torch_run.sh (for example `TRAIN_SET_PATH` and `TRAIN_SET_LIST_PATH`).
+### Training
+
+- Single node, 8 NPUs: performance
+
+  ```
+  bash test/train_8p_performance.sh
+  ```
+
+- Single node, 8 NPUs: accuracy
+
+  ```
+  bash test/train_8p.sh
+  ```
+
+### Training Results
+- Single node, 8 NPUs
+
+| NAME | Precision | Epoch | global_batch_size | loss | FPS |
+|-------------|-------------------|-----------------|---------------|--------------|--------------|
+| 8p-Competitor A | FP32 | 30 | 2048 | 0.1631 | 5304.32 |
+| 8p-Atlas 800T A2 | FP32 | 30 | 2048 | 0.1619 | 4935.68 |
+
+*These results were obtained by training on the train_boston subset only, not on the full dataset.
+
+# Changelog
+
+2025.06.12: Initial release.
+
+
diff --git a/model_examples/Diffusion-Planner/diffusionPlanner.patch b/model_examples/Diffusion-Planner/diffusionPlanner.patch
new file mode 100644
index 0000000000000000000000000000000000000000..a292e4f69113768b64c7599ea6754e3359403885
--- /dev/null
+++ b/model_examples/Diffusion-Planner/diffusionPlanner.patch
@@ -0,0 +1,70 @@
+diff --git a/diffusion_planner/model/module/decoder.py b/diffusion_planner/model/module/decoder.py
+index c17e793..adc2e7a 100644
+--- a/diffusion_planner/model/module/decoder.py
++++ b/diffusion_planner/model/module/decoder.py
+@@ -34,7 +34,7 @@ class Decoder(nn.Module):
+         self._state_normalizer: StateNormalizer = config.state_normalizer
+         self._observation_normalizer: ObservationNormalizer = config.observation_normalizer
+ 
+-        self._guidance_fn = config.guidance_fn
++        self._guidance_fn = None
+ 
+     @property
+     def sde(self):
+diff --git a/requirements_torch.txt b/requirements_torch.txt
+index d34f3fc..b99b032 100644
+--- a/requirements_torch.txt
++++ b/requirements_torch.txt
+@@ -1,8 +1,7 @@
+ --find-links https://download.pytorch.org/whl/torch_stable.html
+ --index-url https://pypi.tuna.tsinghua.edu.cn/simple
+-torch==2.0.0+cu118
+-torchvision==0.15.1+cu118
++torchvision==0.15.1
+ pytorch_lightning==2.0.1
+ tensorboard==2.11.2
+ timm==1.0.10
+-mmengine
+\ No newline at end of file
++mmengine
+diff --git a/train_predictor.py b/train_predictor.py
+index b5d399c..9ea0772 100644
+--- a/train_predictor.py
++++ b/train_predictor.py
+@@ -1,6 +1,8 @@
+ import os
+ import torch
+ import argparse
++import torch_npu
++from torch_npu.contrib import transfer_to_npu
+ from torch import optim
+ from timm.utils import ModelEma
+ from torch.utils.data import DataLoader, DistributedSampler
+@@ -160,7 +162,7 @@ def model_training(args):
+     diffusion_planner = diffusion_planner.to(rank if args.device == 'cuda' else args.device)
+ 
+     if args.ddp:
+-        diffusion_planner = DDP(diffusion_planner, device_ids=[rank])
++        diffusion_planner = DDP(diffusion_planner, device_ids=[rank], find_unused_parameters=True)
+ 
+     if args.use_ema:
+         model_ema = ModelEma(
+diff --git a/torch_run.sh b/torch_run.sh
+old mode 100755
+new mode 100644
+index d65800b..7c1b446
+--- a/torch_run.sh
++++ b/torch_run.sh
+@@ -10,7 +10,10 @@ TRAIN_SET_PATH="REPLACE_WITH_TRAIN_SET_PATH" # preprocess data using data_proces
+ TRAIN_SET_LIST_PATH="REPLACE_WITH_TRAIN_SET_LIST_PATH"
+ ###################################
+ 
+-sudo -E $RUN_PYTHON_PATH -m torch.distributed.run --nnodes 1 --nproc-per-node 8 --standalone train_predictor.py \
++epochs=$1
++
++$RUN_PYTHON_PATH -m torch.distributed.run --nnodes 1 --nproc-per-node 8 --standalone train_predictor.py \
+ --train_set $TRAIN_SET_PATH \
+ --train_set_list $TRAIN_SET_LIST_PATH \
+-
++--batch_size 2048 \
++--train_epochs $epochs
diff --git a/model_examples/Diffusion-Planner/test/env_npu.sh b/model_examples/Diffusion-Planner/test/env_npu.sh
new file mode 100644
index 0000000000000000000000000000000000000000..8b6382e032a8217c814a159a18feda03c579cf2d
--- /dev/null
+++ b/model_examples/Diffusion-Planner/test/env_npu.sh
@@ -0,0 +1,28 @@
+#!/bin/bash
+
+# Print host-side logs to stdout: 0 = off, 1 = on
+export ASCEND_SLOG_PRINT_TO_STDOUT=0
+# Default log level: 0 = debug, 1 = info, 2 = warning, 3 = error
+export ASCEND_GLOBAL_LOG_LEVEL=3
+# Enable event logging: 0 = off, 1 = on
+export ASCEND_GLOBAL_EVENT_ENABLE=0
+# Task queue: 0 = off, 1 = on, 2 = pipeline optimization
+export TASK_QUEUE_ENABLE=2
+# CPU core binding: 0 = off, 1 = coarse-grained, 2 = fine-grained
+export CPU_AFFINITY_CONF=1
+# Enable the combined flag: 0 = off, 1 = on
+export COMBINED_ENABLE=1
+
+
+# Set the device-side log level to error
+msnpureport -g error -d 0
+msnpureport -g error -d 1
+msnpureport -g error -d 2
+msnpureport -g error -d 3
+msnpureport -g error -d 4
+msnpureport -g error -d 5
+msnpureport -g error -d 6
+msnpureport -g error -d 7
+# Disable device-side event logs
+msnpureport -e disable
+
diff --git a/model_examples/Diffusion-Planner/test/train_8p.sh b/model_examples/Diffusion-Planner/test/train_8p.sh
new file mode 100644
index 0000000000000000000000000000000000000000..c9e30f205af7a4cf0054920c03b89996ede97e43
--- /dev/null
+++ b/model_examples/Diffusion-Planner/test/train_8p.sh
@@ -0,0 +1,76 @@
+# Network name (same as the directory name); review and adjust per model
+Network="DiffusionPlanner"
+batch_size=2048
+world_size=8
+epoch=30
+
+# Run from the directory that contains test/ for compatibility; test_path_dir is the path to the test directory
+cur_path=$(pwd)
+cur_path_last_dirname=${cur_path##*/}
+if [ x"${cur_path_last_dirname}" == x"test" ]; then
+    test_path_dir=${cur_path}
+    cd ..
+    cur_path=$(pwd)
+else
+    test_path_dir=${cur_path}/test
+fi
+
+source ${test_path_dir}/env_npu.sh
+
+# Create the output directory; no change needed
+output_path=${cur_path}/test/output/
+
+mkdir -p ${output_path}
+
+
+# Training start time; no change needed
+start_time=$(date +%s)
+bash torch_run.sh ${epoch} > ${test_path_dir}/output/train_full_8p_base_fp32.log 2>&1 &
+
+wait
+# Training end time; no change needed
+end_time=$(date +%s)
+e2e_time=$(($end_time - $start_time))
+
+cd ..
+
+# Print results; no change needed
+echo "------------------ Final result ------------------"
+
+# Collect performance data; no change needed
+# Average training speed (batches/s) over the last 16 progress-bar readings; no change needed
+TrainingTime=$(grep -o "[0-9.]*batch/s" ${test_path_dir}/output/train_full_8p_base_fp32.log | tail -n 16 | grep -o "[0-9.]*" | awk '{sum += $1} END {print sum/NR}')
+
+
+# Throughput (samples/s) = batch_size x batches/s
+ActualFPS=$(awk BEGIN'{print ('$batch_size') * '$TrainingTime'}')
+
+# Print; no change needed
+echo "Final Performance images/sec : $ActualFPS"
+
+# Final loss value; no change needed
+ActualLoss=$(grep -o "epoch train loss: [0-9.]*" ${test_path_dir}/output/train_full_8p_base_fp32.log | awk 'END {print $NF}')
+
+# Print; no change needed
+echo "Final Train Loss : ${ActualLoss}"
+echo "E2E Training Duration sec : $e2e_time"
+
+# Summary for performance monitoring
+# Training case information; no change needed
+BatchSize=${batch_size}
+WORLD_SIZE=${world_size}
+DeviceType=$(uname -m)
+CaseName=${Network}_bs${BatchSize}_${WORLD_SIZE}'p'_'acc'
+
+# Write key information to ${CaseName}.log; no change needed
+echo "Network = ${Network}" >${test_path_dir}/output/${CaseName}.log
+echo "RankSize = ${WORLD_SIZE}" >>${test_path_dir}/output/${CaseName}.log
+echo "BatchSize = ${BatchSize}" >>${test_path_dir}/output/${CaseName}.log
+echo "DeviceType = ${DeviceType}" >>${test_path_dir}/output/${CaseName}.log
+echo "CaseName = ${CaseName}" >>${test_path_dir}/output/${CaseName}.log
+echo "ActualFPS = ${ActualFPS}" >>${test_path_dir}/output/${CaseName}.log
+echo "TrainingTime = ${TrainingTime}" >>${test_path_dir}/output/${CaseName}.log
+echo "ActualLoss = ${ActualLoss}" >>${test_path_dir}/output/${CaseName}.log
+echo "NDS = ${NDS}" >>${test_path_dir}/output/${CaseName}.log
+echo "mAP = ${mAP}" >>${test_path_dir}/output/${CaseName}.log
+echo "E2ETrainingTime = ${e2e_time}" >>${test_path_dir}/output/${CaseName}.log
diff --git a/model_examples/Diffusion-Planner/test/train_8p_performance.sh b/model_examples/Diffusion-Planner/test/train_8p_performance.sh
new file mode 100644
index 0000000000000000000000000000000000000000..dd01a657c9bd3add0845a97f9a1ca0170bd58c4f
--- /dev/null
+++ b/model_examples/Diffusion-Planner/test/train_8p_performance.sh
@@ -0,0 +1,77 @@
+# Network name (same as the directory name); review and adjust per model
+Network="DiffusionPlanner"
+batch_size=2048
+world_size=8
+epoch=20
+
+# Run from the directory that contains test/ for compatibility; test_path_dir is the path to the test directory
+cur_path=$(pwd)
+cur_path_last_dirname=${cur_path##*/}
+if [ x"${cur_path_last_dirname}" == x"test" ]; then
+    test_path_dir=${cur_path}
+    cd ..
+    cur_path=$(pwd)
+else
+    test_path_dir=${cur_path}/test
+fi
+
+source ${test_path_dir}/env_npu.sh
+
+# Create the output directory; no change needed
+output_path=${cur_path}/test/output/
+
+mkdir -p ${output_path}
+
+
+# Training start time; no change needed
+start_time=$(date +%s)
+bash torch_run.sh ${epoch} > ${test_path_dir}/output/train_full_8p_base_fp32.log 2>&1 &
+
+wait
+# Training end time; no change needed
+end_time=$(date +%s)
+e2e_time=$(($end_time - $start_time))
+
+cd ..
+
+# Print results; no change needed
+echo "------------------ Final result ------------------"
+
+# Collect performance data; no change needed
+# Average training speed (batches/s) over the last 16 progress-bar readings; no change needed
+TrainingTime=$(grep -o "[0-9.]*batch/s" ${test_path_dir}/output/train_full_8p_base_fp32.log | tail -n 16 | grep -o "[0-9.]*" | awk '{sum += $1} END {print sum/NR}')
+
+
+# Throughput (samples/s) = batch_size x batches/s
+ActualFPS=$(awk BEGIN'{print ('$batch_size') * '$TrainingTime'}')
+
+# Print; no change needed
+echo "Final Performance images/sec : $ActualFPS"
+
+# Final loss value; no change needed
+ActualLoss=$(grep -o "epoch train loss: [0-9.]*" ${test_path_dir}/output/train_full_8p_base_fp32.log | awk 'END {print $NF}')
+
+# Print; no change needed
+echo "Final Train Loss : ${ActualLoss}"
+echo "E2E Training Duration sec : $e2e_time"
+
+# Summary for performance monitoring
+# Training case information; no change needed
+BatchSize=${batch_size}
+WORLD_SIZE=${world_size}
+DeviceType=$(uname -m)
+CaseName=${Network}_bs${BatchSize}_${WORLD_SIZE}'p'_'perf'
+
+# Write key information to ${CaseName}.log; no change needed
+echo "Network = ${Network}" >${test_path_dir}/output/${CaseName}.log
+echo "RankSize = ${WORLD_SIZE}" >>${test_path_dir}/output/${CaseName}.log
+echo "BatchSize = ${BatchSize}" >>${test_path_dir}/output/${CaseName}.log
+echo "DeviceType = ${DeviceType}" >>${test_path_dir}/output/${CaseName}.log
+echo "CaseName = ${CaseName}" >>${test_path_dir}/output/${CaseName}.log
+echo "ActualFPS = ${ActualFPS}" >>${test_path_dir}/output/${CaseName}.log
+echo "TrainingTime = ${TrainingTime}" >>${test_path_dir}/output/${CaseName}.log
+echo "ActualLoss = ${ActualLoss}" >>${test_path_dir}/output/${CaseName}.log
+echo "NDS = ${NDS}" >>${test_path_dir}/output/${CaseName}.log
+echo "mAP = ${mAP}" >>${test_path_dir}/output/${CaseName}.log
+echo "E2ETrainingTime = ${e2e_time}" >>${test_path_dir}/output/${CaseName}.log
+
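The test scripts derive their reported numbers from the training log with a grep/awk pipeline: they average the last 16 `batch/s` readings, multiply by the batch size (2048) to get samples per second, and take the last `epoch train loss` value. The Python sketch below reproduces that arithmetic so a saved log can be checked offline. It assumes the log contains progress-bar readings such as `2.41batch/s` and lines such as `epoch train loss: 0.1619`, which is what the grep patterns expect; the function name `summarize_log` is hypothetical and not part of the repository.

```python
import re
from typing import List, Optional, Tuple


# Hypothetical helper: reproduces the grep/awk arithmetic of test/train_8p*.sh in Python.
def summarize_log(log_text: str, batch_size: int = 2048,
                  window: int = 16) -> Tuple[float, float, Optional[float]]:
    # grep -o "[0-9.]*batch/s": collect every progress-bar speed reading (batches/s)
    speeds: List[float] = [float(s) for s in re.findall(r"(\d+(?:\.\d+)?)batch/s", log_text)]
    # tail -n 16 | awk '{sum += $1} END {print sum/NR}': mean of the last `window` readings
    recent = speeds[-window:]
    # The awk version fails with a division by zero when nothing matches; return 0.0 instead
    batches_per_sec = sum(recent) / len(recent) if recent else 0.0
    # ActualFPS = batch_size * batches/s, i.e. samples per second
    samples_per_sec = batch_size * batches_per_sec
    # grep -o "epoch train loss: [0-9.]*" | awk 'END {print $NF}': last reported epoch loss
    losses = re.findall(r"epoch train loss: (\d+(?:\.\d+)?)", log_text)
    final_loss = float(losses[-1]) if losses else None
    return batches_per_sec, samples_per_sec, final_loss


if __name__ == "__main__":
    sample = (
        "1.50batch/s 2.50batch/s\n"
        "epoch train loss: 0.2000\n"
        "3.00batch/s\n"
        "epoch train loss: 0.1619\n"
    )
    print(summarize_log(sample, window=2))  # prints (2.75, 5632.0, 0.1619)
```

Running it on a saved `train_full_8p_base_fp32.log` should reproduce the `ActualFPS` and `ActualLoss` values the scripts print, provided the log format matches the assumption above.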