Ascend / ModelZoo-PyTorch
CosyVoice2 inference error: tensor shape mismatch
TODO
#ICZNMC
Requirement
hbhdlxx08
Created 2025-09-25 11:09
I. Problem description:

Running CosyVoice2 inference fails with a tensor shape mismatch. The failure occurs in the attention implementation: during the multi-head attention computation, the head count (14) of the key tensor does not match the head count (2) of the kv tensor.

II. Software versions:
- OS: openEuler 20.03 LTS SP3
- Chip: Ascend 310P (Ascend310P3)
- Python: 3.11.6
- CANN: 8.2.RC1

III. Steps to reproduce:

1. Clone the ModelZoo repository (install git via `yum install git` if it is missing)
```bash
git clone https://gitee.com/ascend/ModelZoo-PyTorch.git
cd ModelZoo-PyTorch/ACL_PyTorch/built-in/audio/CosyVoice2
```

2. Fetch the CosyVoice source and its dependencies
```bash
# Clone the CosyVoice source
git clone https://github.com/FunAudioLLM/CosyVoice
cd CosyVoice
# Reset to the pinned commit (for code compatibility)
git reset --hard fd45708
# Pull submodules (Matcha-TTS etc.)
git submodule update --init --recursive
# Apply the platform patch (platform is 300I or 800I, depending on hardware)
export platform=300I
git apply ../${platform}/diff_CosyVoice_${platform}.patch
# Copy the inference script
cp ../infer.py ./
# Clone transformers and check out the pinned version
git clone https://github.com/huggingface/transformers.git
cd transformers
git checkout v4.37.0
cd ..
# Replace the qwen2 model file
mv ../${platform}/modeling_qwen2.py ./transformers/src/transformers/models/qwen2
```

3. Install base dependencies
```bash
# System tools
yum install -y gcc-c++ libstdc++
dnf install -y sox git wget cmake   # openEuler/CentOS
# Ubuntu/Debian: apt-get install -y sox git wget gcc g++ make cmake
# Python dependencies
pip3 install -r ../requirements.txt
# Note: change the torchaudio version in requirements.txt to match torch (2.3.1),
# and install the torchvision version matching torch (0.18.1). If it pins
# tokenizers==0.19.1, relax it to tokenizers>=0.14,<=0.19 (or pin ==0.15.1).
```

4. Special dependencies: WeTextProcessing and pynini (which depends on OpenFST)
```bash
# Build OpenFST from source (required by pynini)
wget https://www.openfst.org/twiki/pub/FST/FstDownload/openfst-1.8.3.tar.gz
tar -zxvf openfst-1.8.3.tar.gz
cd openfst-1.8.3
# Enable the extensions pynini needs
./configure --enable-far --enable-mpdt --enable-pdt
make -j4
make install
# Confirm the shared library exists:
ls /usr/local/lib/libfstmpdtscript.so.26
# Configure the library path
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
# Install pynini (pinned to avoid compatibility issues)
pip3 install pynini==2.1.5 --no-cache-dir
# Install WeTextProcessing
pip3 install WeTextProcessing==1.0.4.1
```

5. Install the msit tools
```bash
# Clone the msit repository
git clone https://gitee.com/ascend/msit.git
cd msit/msit
pip install .
# Install the benchmark and surgeon components
msit install benchmark
msit install surgeon
cd ..
cd ..
```

6. Clone the weights repository and check out the pinned revision
```bash
# Clone the CosyVoice2-0.5B weights
git clone https://www.modelscope.cn/iic/CosyVoice2-0.5B.git
cd CosyVoice2-0.5B
# Check out the pinned commit (pre-April-2025 weights)
git checkout 9bd5b08fc085bd93d3f8edb16b67295606290350
```

7. Pull the large weight files (requires git-lfs)
```bash
# Install git-lfs
yum install -y git-lfs   # openEuler/CentOS
# Ubuntu/Debian: apt-get install -y git-lfs
# Initialize
git lfs install
# If the epel-release repo is unavailable, download and install manually
curl -sLO https://github.com/git-lfs/git-lfs/releases/download/v3.5.1/git-lfs-linux-arm64-v3.5.1.tar.gz
tar -zxf git-lfs-linux-arm64-v3.5.1.tar.gz
cd git-lfs-3.5.1
./install.sh
# Pull the weights
git lfs pull
```

8. Download the extra spk weights
```bash
wget https://www.modelscope.cn/models/iic/CosyVoice-300M-SFT/resolve/master/spk2info.pt
```

9. Modify the ONNX model structure
```bash
# Return to the CosyVoice2 directory
cd ..
cd ..
# Run the modification script (the argument is the weights directory)
python3 modify_onnx.py ./CosyVoice/CosyVoice2-0.5B/
# Produces the modified model: ./CosyVoice2-0.5B/speech_token_md.onnx
```

10. Configure the Ascend environment variables
```bash
# Load the Ascend toolkit environment
source /usr/local/Ascend/ascend-toolkit/set_env.sh
# Confirm the chip model
export soc_version=Ascend310P3
```

11. Convert the models with the ATC tool
```bash
# Convert the speech_token model
atc --framework=5 \
    --soc_version=$soc_version \
    --model ./CosyVoice2-0.5B/speech_token_md.onnx \
    --output ./CosyVoice2-0.5B/speech \
    --input_shape="feats:1,128,-1;feats_length:1" \
    --precision_mode allow_fp32_to_fp16
# Convert the flow.decoder model (dynamic shape)
atc --framework=5 \
    --soc_version=$soc_version \
    --model ./CosyVoice2-0.5B/flow.decoder.estimator.fp32.onnx \
    --output ./CosyVoice2-0.5B/flow \
    --input_shape="x:2,80,-1;mask:2,1,-1;mu:2,80,-1;t:2;spks:2,80;cond:2,80,-1"
# Convert the binned-shape ("gear") model used for streaming output
atc --framework=5 \
    --soc_version=$soc_version \
    --model ./CosyVoice2-0.5B/flow.decoder.estimator.fp32.onnx \
    --output ./CosyVoice2-0.5B/flow_static \
    --input_shape="x:2,80,-1;mask:2,1,-1;mu:2,80,-1;t:2;spks:2,80;cond:2,80,-1" \
    --dynamic_dims="100,100,100,100;200,200,200,200;300,300,300,300;400,400,400,400;500,500,500,500;600,600,600,600;700,700,700,700" \
    --input_format=ND
```

12. Configure the inference environment variables
```bash
# Select the NPU device
export ASCEND_RT_VISIBLE_DEVICES=1
# Add dependency library paths
export PYTHONPATH=third_party/Matcha-TTS:$PYTHONPATH
export PYTHONPATH=transformers/src:$PYTHONPATH
```

13. Run inference
```bash
# Streaming inference (result saved as sft_i.wav)
python3 infer.py --model_path=./CosyVoice2-0.5B --stream_out
```

IV. Log output:

```
.....Exception in thread Thread-3 (llm_job):
Traceback (most recent call last):
  File "/usr/local/lib64/python3.11/site-packages/torch_npu/dynamo/torchair/_utils/error_code.py", line 43, in wapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib64/python3.11/site-packages/torch_npu/dynamo/torchair/core/_backend.py", line 123, in compile
    return super(TorchNpuGraph, self).compile()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: EZ9999: Inner Error!
EZ9999: [PID: 144022] 2025-08-15-16:36:34.010.925 numHeads:14 of key must be equal to numHeads:2 of kv when 310P.[FUNC:CheckInputFormatAndLimits][FILE:incre_flash_attention_tiling_check.cc][LINE:199]
        TraceBack (most recent call last):
        ConvertNodeToTilingContext or RunTiling fail[FUNC:CalculateWorkspace][FILE:fused_infer_attention_score_tiling_host.cpp][LINE:1116]
        FIA get workspace fail[FUNC:GenExtCalcParam][FILE:fused_infer_attention_score_tiling_host.cpp][LINE:1241]
        [GenTask][CalcExtOpRunningParam] Op[IncreFlashAttention][IncreFlashAttention] failed calc exe running param.[FUNC:CalcExtOpRunningParam][FILE:aicore_ops_kernel_builder.cc][LINE:156]
        [GenTask][CalcOpRunningParam] CalcExtOpRunningParam failed.[FUNC:CalcOpRunningParam][FILE:aicore_ops_kernel_builder.cc][LINE:135]
        Call Calculate op:IncreFlashAttention(IncreFlashAttention) running param failed[FUNC:CalcOpParam][FILE:graph_builder.cc][LINE:210]
        [Call][PreRun] Failed, graph_id:1, session_id:1.[FUNC:CompileGraph][FILE:graph_manager.cc][LINE:4512]
        [Compile][Graph]Compile graph failed, error code:1343225857, session_id:1, graph_id:1.[FUNC:CompileGraph][FILE:ge_api.cc][LINE:1239]
```
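The log pins the failure to the `IncreFlashAttention` tiling check on 310P, which rejects inputs whose query head count differs from the KV head count; the model's grouped-query attention (14 query heads, 2 KV heads) trips exactly this check. A common way to sidestep such a restriction is to replicate each KV head up to the query head count before the fused attention call, as in the `repeat_kv` pattern used by transformers-style Qwen2 code. The numpy sketch below only illustrates that expansion (shapes and names are illustrative, not taken from the repository's patch):

```python
import numpy as np

def repeat_kv(kv: np.ndarray, n_rep: int) -> np.ndarray:
    """Expand (batch, num_kv_heads, seq, head_dim) to
    (batch, num_kv_heads * n_rep, seq, head_dim) by repeating each KV head."""
    if n_rep == 1:
        return kv
    batch, num_kv_heads, seq, head_dim = kv.shape
    kv = kv[:, :, None, :, :]  # (b, kv_h, 1, s, d)
    kv = np.broadcast_to(kv, (batch, num_kv_heads, n_rep, seq, head_dim))
    return kv.reshape(batch, num_kv_heads * n_rep, seq, head_dim)

# Qwen2-0.5B-style layout: 14 query heads, 2 KV heads -> repeat factor 7
num_heads, num_kv_heads, head_dim = 14, 2, 64
key = np.zeros((1, num_kv_heads, 32, head_dim), dtype=np.float16)
key_expanded = repeat_kv(key, num_heads // num_kv_heads)
print(key_expanded.shape)  # (1, 14, 32, 64)
```

Note that this trades the memory savings of grouped-query attention for kernel compatibility, so whether it is acceptable depends on the device's memory budget.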
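Qwen2-style checkpoints normally record their attention layout in `config.json` via `num_attention_heads` and `num_key_value_heads`, so the mismatch can be confirmed before graph compilation ever runs. A small sanity-check sketch (the config path for the LLM inside the CosyVoice2-0.5B weights is an assumption; point it at whichever config.json the checkpoint actually ships):

```python
import json

def check_gqa_compat(config_path: str) -> bool:
    """Return True when query and KV head counts match (plain MHA),
    which is what the 310P IncreFlashAttention check expects."""
    with open(config_path) as f:
        cfg = json.load(f)
    n_q = cfg["num_attention_heads"]
    n_kv = cfg.get("num_key_value_heads", n_q)  # absent key implies MHA
    if n_q != n_kv:
        print(f"GQA detected: {n_q} query heads vs {n_kv} KV heads "
              f"(repeat factor {n_q // n_kv})")
    return n_q == n_kv
```

Running this against the checkpoint's config should print the 14-vs-2 mismatch reported in the log.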
Comments (1)