Training a lerobot model fails with "MaxPoolGradWithArgmaxV1 operator not supported"

1. Problem description (error log with context):
[W416 07:01:14.684393345 compiler_depend.ts:87] Warning: [Check][offset] Check input storage_offset[%ld] = 0 failed, result is untrustworthy262144 (function operator())
.........[W416 07:02:43.752708400 compiler_depend.ts:342] Warning: EZ3003: [PID: 3698324] 2025-04-16-07:02:43.997.733 No supported Ops kernel and engine are found for [MaxPoolGradWithArgmaxV1309], optype [MaxPoolGradWithArgmaxV1].
Possible Cause: The operator is not supported by the system. Therefore, no hit is found in any operator information library.
Solution: 1. Check that the OPP component is installed properly. 2. Submit an issue to request for the support of this operator type.
TraceBack (most recent call last):
Assert ((SelectEngine(node_ptr, exclude_engines, is_check_support_success, op_info)) == ge::SUCCESS) failed[FUNC:operator()][FILE:engine_place.cc][LINE:148]
RunAllSubgraphs failed, graph=online.[FUNC:RunAllSubgraphs][FILE:engine_place.cc][LINE:122]
build graph failed, graph id:308, ret:4294967295[FUNC:BuildModelWithGraphId][FILE:ge_generator.cc][LINE:1618]
[Build][SingleOpModel]call ge interface generator.BuildSingleOpModel failed. ge result = 4294967295[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
[Build][Op]Fail to build op model[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145]
build op model failed, result = 500002[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145]
(function ExecFunc)
Traceback (most recent call last):
File "/home/brainco/cmr/lerobot/lerobot/scripts/train.py", line 290, in <module>
train()
File "/home/brainco/cmr/lerobot/lerobot/configs/parser.py", line 227, in wrapper_inner
response = fn(cfg, *args, **kwargs)
File "/home/brainco/cmr/lerobot/lerobot/scripts/train.py", line 214, in train
train_tracker, output_dict = update_policy(
File "/home/brainco/cmr/lerobot/lerobot/scripts/train.py", line 103, in update_policy
train_metrics.loss = loss.item()
RuntimeError: The Inner error is reported as above. The process exits for this inner error, and the current working operator name is MaxPoolGradWithArgmaxV1.
Since the operator is called asynchronously, the stacktrace may be inaccurate. If you want to get the accurate stacktrace, pleace set the environment variable ASCEND_LAUNCH_BLOCKING=1.
[ERROR] 2025-04-16-07:02:44 (PID:3698324, Device:0, RankID:-1) ERR00100 PTA call acl api failed
wandb:
wandb: 🚀 View run npu_act_koch_task at: https://wandb.ai/zhouchengbang-aksuru/lerobot/runs/co1zycm9
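
When filing a request for operator support, it can help to pull the exact operator type out of a long log instead of reading it by eye. The snippet below is only a sketch: the helper name `find_unsupported_ops` is hypothetical, and the regular expression assumes the EZ3003 message keeps the wording quoted in the log above.

```python
import re

# Assumption: the warning text matches the EZ3003 line quoted in this report.
UNSUPPORTED_OP = re.compile(
    r"No supported Ops kernel and engine are found for \[(?P<node>[^\]]+)\], "
    r"optype \[(?P<optype>[^\]]+)\]"
)


def find_unsupported_ops(log_text):
    """Return the sorted, de-duplicated operator types flagged as unsupported."""
    return sorted({m.group("optype") for m in UNSUPPORTED_OP.finditer(log_text)})


# Sample line copied from the log in this report.
sample = (
    "Warning: EZ3003: [PID: 3698324] 2025-04-16-07:02:43.997.733 "
    "No supported Ops kernel and engine are found for "
    "[MaxPoolGradWithArgmaxV1309], optype [MaxPoolGradWithArgmaxV1]."
)
print(find_unsupported_ops(sample))  # ['MaxPoolGradWithArgmaxV1']
```

As the error message itself notes, the Python stack trace may be inaccurate because operators are launched asynchronously; rerunning with the environment variable ASCEND_LAUNCH_BLOCKING=1 should give a more precise trace.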

2. Software versions:
-- CANN version: 8.0.0
-- PyTorch version: 2.4.0
-- Python version: 3.10.16
-- OS version: Ubuntu 22.04.5 LTS

3. Steps to reproduce:
Follow the official documentation at https://github.com/huggingface/lerobot
Run the training script:
python lerobot/scripts/train.py   --dataset.repo_id=test/koch_task   --policy.type=act   --output_dir=outputs/train/npu_act_koch_task   --job_name=npu_act_koch_task   --policy.device=cuda   --wandb.enable=true

4. Log information:
