Ascend / ModelZoo-PyTorch
CosyVoice2 inference error: tensor shape mismatch
TODO
#ICZNMC
Requirement
hbhdlxx08
Created 2025-09-25 11:09
I. Problem description:

Running CosyVoice2 inference fails with a tensor shape mismatch. The failure occurs in the attention implementation: during the multi-head attention computation, the head count (14) of the key tensor does not match the head count (2) of the kv tensor.

II. Software versions:
- OS: openEuler 20.03 LTS SP3
- Chip: Ascend 310P (Ascend310P3)
- Python: 3.11.6
- CANN: 8.2.RC1

III. Steps to reproduce:

1. Clone the ModelZoo repository (install git via `yum install git` if it is missing)
```bash
git clone https://gitee.com/ascend/ModelZoo-PyTorch.git
cd ModelZoo-PyTorch/ACL_PyTorch/built-in/audio/CosyVoice2
```

2. Fetch the CosyVoice source and its dependencies
```bash
# Clone the CosyVoice source
git clone https://github.com/FunAudioLLM/CosyVoice
cd CosyVoice
# Reset to the pinned commit (for code compatibility)
git reset --hard fd45708
# Pull submodules (Matcha-TTS etc.)
git submodule update --init --recursive
# Apply the platform patch (platform is 300I or 800I, depending on hardware)
export platform=300I
git apply ../${platform}/diff_CosyVoice_${platform}.patch
# Copy the inference script
cp ../infer.py ./
# Clone transformers and check out the pinned version
git clone https://github.com/huggingface/transformers.git
cd transformers
git checkout v4.37.0
cd ..
# Replace the qwen2 model file
mv ../${platform}/modeling_qwen2.py ./transformers/src/transformers/models/qwen2
```

3. Install base dependencies
```bash
# System tools
yum install -y gcc-c++ libstdc++
dnf install -y sox git wget cmake   # openEuler/CentOS
# Ubuntu/Debian: apt-get install -y sox git wget gcc g++ make cmake
# Python dependencies
pip3 install -r ../requirements.txt
# Note: change the torchaudio version in requirements.txt to match torch (2.3.1),
# and install the torchvision version matching torch (0.18.1). If it pins
# tokenizers==0.19.1, relax it to tokenizers>=0.14,<=0.19 (or pin ==0.15.1).
```

4. Special dependencies: WeTextProcessing and pynini (which depends on OpenFST)
```bash
# Build OpenFST from source (required by pynini)
wget https://www.openfst.org/twiki/pub/FST/FstDownload/openfst-1.8.3.tar.gz
tar -zxvf openfst-1.8.3.tar.gz
cd openfst-1.8.3
# Enable the extensions pynini needs
./configure --enable-far --enable-mpdt --enable-pdt
make -j4
make install
# Confirm the shared library exists:
ls /usr/local/lib/libfstmpdtscript.so.26
# Configure the library path
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
# Install pynini (pinned to avoid compatibility issues)
pip3 install pynini==2.1.5 --no-cache-dir
# Install WeTextProcessing
pip3 install WeTextProcessing==1.0.4.1
```

5. Install the msit tools
```bash
# Clone the msit repository
git clone https://gitee.com/ascend/msit.git
cd msit/msit
pip install .
# Install the benchmark and surgeon components
msit install benchmark
msit install surgeon
cd ..
cd ..
```

6. Clone the weights repository and check out the pinned revision
```bash
# Clone the CosyVoice2-0.5B weights
git clone https://www.modelscope.cn/iic/CosyVoice2-0.5B.git
cd CosyVoice2-0.5B
# Check out the pinned commit (pre-April-2025 weights)
git checkout 9bd5b08fc085bd93d3f8edb16b67295606290350
```

7. Pull the large weight files (requires git-lfs)
```bash
# Install git-lfs
yum install -y git-lfs   # openEuler/CentOS
# Ubuntu/Debian: apt-get install -y git-lfs
# Initialize
git lfs install
# If the epel-release repo is unavailable, download and install manually
curl -sLO https://github.com/git-lfs/git-lfs/releases/download/v3.5.1/git-lfs-linux-arm64-v3.5.1.tar.gz
tar -zxf git-lfs-linux-arm64-v3.5.1.tar.gz
cd git-lfs-3.5.1
./install.sh
# Pull the weights
git lfs pull
```

8. Download the extra spk weights
```bash
wget https://www.modelscope.cn/models/iic/CosyVoice-300M-SFT/resolve/master/spk2info.pt
```

9. Modify the ONNX model structure
```bash
# Return to the CosyVoice2 directory
cd ..
cd ..
# Run the modification script (the argument is the weights directory)
python3 modify_onnx.py ./CosyVoice/CosyVoice2-0.5B/
# Produces the modified model: ./CosyVoice2-0.5B/speech_token_md.onnx
```

10. Configure the Ascend environment variables
```bash
# Load the Ascend toolkit environment
source /usr/local/Ascend/ascend-toolkit/set_env.sh
# Confirm the chip model
export soc_version=Ascend310P3
```

11. Convert the models with the ATC tool
```bash
# Convert the speech_token model
atc --framework=5 \
    --soc_version=$soc_version \
    --model ./CosyVoice2-0.5B/speech_token_md.onnx \
    --output ./CosyVoice2-0.5B/speech \
    --input_shape="feats:1,128,-1;feats_length:1" \
    --precision_mode allow_fp32_to_fp16
# Convert the flow.decoder model (dynamic shape)
atc --framework=5 \
    --soc_version=$soc_version \
    --model ./CosyVoice2-0.5B/flow.decoder.estimator.fp32.onnx \
    --output ./CosyVoice2-0.5B/flow \
    --input_shape="x:2,80,-1;mask:2,1,-1;mu:2,80,-1;t:2;spks:2,80;cond:2,80,-1"
# Convert the binned-shape ("gear") model used for streaming output
atc --framework=5 \
    --soc_version=$soc_version \
    --model ./CosyVoice2-0.5B/flow.decoder.estimator.fp32.onnx \
    --output ./CosyVoice2-0.5B/flow_static \
    --input_shape="x:2,80,-1;mask:2,1,-1;mu:2,80,-1;t:2;spks:2,80;cond:2,80,-1" \
    --dynamic_dims="100,100,100,100;200,200,200,200;300,300,300,300;400,400,400,400;500,500,500,500;600,600,600,600;700,700,700,700" \
    --input_format=ND
```

12. Configure the inference environment variables
```bash
# Select the NPU device
export ASCEND_RT_VISIBLE_DEVICES=1
# Add dependency library paths
export PYTHONPATH=third_party/Matcha-TTS:$PYTHONPATH
export PYTHONPATH=transformers/src:$PYTHONPATH
```

13. Run inference
```bash
# Streaming inference (result saved as sft_i.wav)
python3 infer.py --model_path=./CosyVoice2-0.5B --stream_out
```

IV. Log output:

```
.....Exception in thread Thread-3 (llm_job):
Traceback (most recent call last):
  File "/usr/local/lib64/python3.11/site-packages/torch_npu/dynamo/torchair/_utils/error_code.py", line 43, in wapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib64/python3.11/site-packages/torch_npu/dynamo/torchair/core/_backend.py", line 123, in compile
    return super(TorchNpuGraph, self).compile()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: EZ9999: Inner Error!
EZ9999: [PID: 144022] 2025-08-15-16:36:34.010.925 numHeads:14 of key must be equal to numHeads:2 of kv when 310P.[FUNC:CheckInputFormatAndLimits][FILE:incre_flash_attention_tiling_check.cc][LINE:199]
        TraceBack (most recent call last):
        ConvertNodeToTilingContext or RunTiling fail[FUNC:CalculateWorkspace][FILE:fused_infer_attention_score_tiling_host.cpp][LINE:1116]
        FIA get workspace fail[FUNC:GenExtCalcParam][FILE:fused_infer_attention_score_tiling_host.cpp][LINE:1241]
        [GenTask][CalcExtOpRunningParam] Op[IncreFlashAttention][IncreFlashAttention] failed calc exe running param.[FUNC:CalcExtOpRunningParam][FILE:aicore_ops_kernel_builder.cc][LINE:156]
        [GenTask][CalcOpRunningParam] CalcExtOpRunningParam failed.[FUNC:CalcOpRunningParam][FILE:aicore_ops_kernel_builder.cc][LINE:135]
        Call Calculate op:IncreFlashAttention(IncreFlashAttention) running param failed[FUNC:CalcOpParam][FILE:graph_builder.cc][LINE:210]
        [Call][PreRun] Failed, graph_id:1, session_id:1.[FUNC:CompileGraph][FILE:graph_manager.cc][LINE:4512]
        [Compile][Graph]Compile graph failed, error code:1343225857, session_id:1, graph_id:1.[FUNC:CompileGraph][FILE:ge_api.cc][LINE:1239]
```
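The log pins the failure to the `IncreFlashAttention` tiling check on 310P, which rejects inputs whose query head count differs from the KV head count; the model's grouped-query attention (14 query heads, 2 KV heads) trips exactly this check. A common way to sidestep such a restriction is to replicate each KV head up to the query head count before the fused attention call, as in the `repeat_kv` pattern used by transformers-style Qwen2 code. The numpy sketch below only illustrates that expansion (shapes and names are illustrative, not taken from the repository's patch):

```python
import numpy as np

def repeat_kv(kv: np.ndarray, n_rep: int) -> np.ndarray:
    """Expand (batch, num_kv_heads, seq, head_dim) to
    (batch, num_kv_heads * n_rep, seq, head_dim) by repeating each KV head."""
    if n_rep == 1:
        return kv
    batch, num_kv_heads, seq, head_dim = kv.shape
    kv = kv[:, :, None, :, :]  # (b, kv_h, 1, s, d)
    kv = np.broadcast_to(kv, (batch, num_kv_heads, n_rep, seq, head_dim))
    return kv.reshape(batch, num_kv_heads * n_rep, seq, head_dim)

# Qwen2-0.5B-style layout: 14 query heads, 2 KV heads -> repeat factor 7
num_heads, num_kv_heads, head_dim = 14, 2, 64
key = np.zeros((1, num_kv_heads, 32, head_dim), dtype=np.float16)
key_expanded = repeat_kv(key, num_heads // num_kv_heads)
print(key_expanded.shape)  # (1, 14, 32, 64)
```

Note that this trades the memory savings of grouped-query attention for kernel compatibility, so whether it is acceptable depends on the device's memory budget.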
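Qwen2-style checkpoints normally record their attention layout in `config.json` via `num_attention_heads` and `num_key_value_heads`, so the mismatch can be confirmed before graph compilation ever runs. A small sanity-check sketch (the config path for the LLM inside the CosyVoice2-0.5B weights is an assumption; point it at whichever config.json the checkpoint actually ships):

```python
import json

def check_gqa_compat(config_path: str) -> bool:
    """Return True when query and KV head counts match (plain MHA),
    which is what the 310P IncreFlashAttention check expects."""
    with open(config_path) as f:
        cfg = json.load(f)
    n_q = cfg["num_attention_heads"]
    n_kv = cfg.get("num_key_value_heads", n_q)  # absent key implies MHA
    if n_q != n_kv:
        print(f"GQA detected: {n_q} query heads vs {n_kv} KV heads "
              f"(repeat factor {n_q // n_kv})")
    return n_q == n_kv
```

Running this against the checkpoint's config should print the 14-vs-2 mismatch reported in the log.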
Comments (1)