# MindIEAscendDeploy
**Repository Path**: Biz-Spring_0/mindie-ascend-deploy
## Basic Information
- **Project Name**: MindIEAscendDeploy
- **Description**: Deploying MindIE on Ascend NPUs
- **Primary Language**: Python
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 4
- **Created**: 2025-04-23
- **Last Updated**: 2025-04-23
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# DeepSeek Deployment via ATB
## Prerequisites
```bash
# Clone the repository
git clone https://gitee.com/kingTLE/mindie-ascend-deploy.git
```
- **Hardware requirements**
  - Deploying BF16 requires at least 4 Atlas 800I A2 (8*64G) servers; with W8A8 quantization, at least 2 Atlas 800I A2 (8*64G) servers are needed.
  - Minimum driver/firmware version: 23.07 or later; a newer version is recommended.
  - Recommended OS kernel: 5.10
- **BF16 original weights download**

  | Source | R1 | V3 |
  | :---------: | :----------------------------------------------------------: | :----------------------------------------------------------: |
  | huggingface | [unsloth/DeepSeek-R1-BF16 · HF Mirror](https://hf-mirror.com/unsloth/DeepSeek-R1-BF16)<br>[unsloth/DeepSeek-R1-BF16 · Hugging Face](https://huggingface.co/unsloth/DeepSeek-R1-BF16) | [unsloth/DeepSeek-V3-bf16 · HF Mirror](https://hf-mirror.com/unsloth/DeepSeek-V3-bf16)<br>[unsloth/DeepSeek-V3-0324-BF16 · HF Mirror](https://hf-mirror.com/unsloth/DeepSeek-V3-0324-BF16)<br>[unsloth/DeepSeek-V3-bf16 · Hugging Face](https://huggingface.co/unsloth/DeepSeek-V3-bf16) |
  | modelscope | [DeepSeek-R1-BF16 · ModelScope](https://modelscope.cn/models/unsloth/deepseek-R1-bf16/) | [DeepSeek-V3-bf16 · ModelScope](https://modelscope.cn/models/unsloth/deepseek-V3-bf16/)<br>[DeepSeek-V3-0324-BF16 · ModelScope](https://modelscope.cn/models/unsloth/DeepSeek-V3-0324-BF16) |
- **Configuration file preparation; see [all_config.yaml](./src/all_config.yaml) for reference**
```yaml
# Per-machine communication settings; adjust according to your network plan (no change needed if already configured)
# Starting IP
start_ip:
# Device subnet
network_prefix:
netmask:
# Gateway
gateway:
# Detection IP
netdetect:
# MindIE settings, identical on every machine
httpsEnabled:
port:
worldSize:
modelWeightPath:
modelName:
multiNodesInferEnabled:
interNodeTLSEnabled:
start_device_id:
# Path accessible from inside the container
rank_table_file:
# all_ip[:][0] is the server IP address, all_ip[:][1] is the container IP address; keeping them identical is recommended
# all_ip[0][:] is the master node
all_ip:
  - ["", ""]
  - ["", ""]
  - ["", ""]
  - ["", ""]
# Log directory, default "./deepseek_logs"
logsdir:
```
- **Check the network status of each machine; communication setup script: [set_hccl_ip.sh](./src/set_hccl_ip.sh)**
```bash
# Check physical links
for i in {0..7}; do hccn_tool -i $i -lldp -g | grep Ifname; done
# Check link status
for i in {0..7}; do hccn_tool -i $i -link -g ; done
# Check network health
for i in {0..7}; do hccn_tool -i $i -net_health -g ; done
# Verify the detection IP configuration
for i in {0..7}; do hccn_tool -i $i -netdetect -g ; done
# Verify the gateway configuration
for i in {0..7}; do hccn_tool -i $i -gateway -g ; done
# Check that the NPU low-level TLS verification behavior is consistent; all 0 is recommended
for i in {0..7}; do hccn_tool -i $i -tls -g ; done | grep switch
# Set the NPU low-level TLS verification behavior to 0
for i in {0..7}; do hccn_tool -i $i -tls -s enable 0; done
# Get the IP address of each card
for i in {0..7}; do hccn_tool -i $i -ip -g; done
# Verify card-to-card connectivity
for i in {0..7}; do for j in {0..7}; do hccn_tool -i $i -ping -g address 192.168.205.${j}; done; done
```
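The checks above can also be scripted. A minimal Python sketch that flags cards whose link is not reported UP; this is a hypothetical helper, not part of this repository, and the substring match on the `hccn_tool -link -g` output is an assumption (the exact format varies by driver version):

```python
import subprocess

def link_is_up(output: str) -> bool:
    # Treat any output line mentioning "UP" (case-insensitive) as a healthy link.
    return any("up" in line.lower() for line in output.splitlines())

def check_links(num_cards=8):
    """Run `hccn_tool -i <i> -link -g` for every card and return the
    indices of cards whose output does not report an UP link."""
    down = []
    for i in range(num_cards):
        out = subprocess.run(
            ["hccn_tool", "-i", str(i), "-link", "-g"],
            capture_output=True, text=True,
        ).stdout
        if not link_is_up(out):
            down.append(i)
    return down
```

Running `check_links()` on a healthy node should return an empty list; any indices it returns are cards worth inspecting with the commands above.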
#### Generating rank_table_file: see the [documentation](./create_rank_table.md)
##### Usage examples
```bash
# Without shared storage: run on the master node; requires the root password of each machine
python3 ./src/ssh_rank_table.py --json_file /path/to/your/new_rank_table_file.json --all_config_file ./src/all_config.yaml
# With shared storage: run on each of the 4 machines separately
python ./src/mk_rank_table.py --server_id xx.xx.xx.1 --server_index 1 --server_count 4 --json_file new_rank_table_file.json
python ./src/mk_rank_table.py --server_id xx.xx.xx.2 --server_index 2 --server_count 4 --json_file new_rank_table_file.json
python ./src/mk_rank_table.py --server_id xx.xx.xx.3 --server_index 3 --server_count 4 --json_file new_rank_table_file.json
python ./src/mk_rank_table.py --server_id xx.xx.xx.4 --server_index 4 --server_count 4 --json_file new_rank_table_file.json
```
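Internally, each invocation presumably maps local device IDs to global ranks. A hypothetical sketch of that mapping (not the repository's `mk_rank_table.py`), assuming 8 cards per server and `server_index` counted from 1 as in the usage above:

```python
def build_server_entry(server_id, server_index, cards_per_server=8, device_ips=None):
    """Sketch of one rank-table server entry: card d on server s (counted
    from 1) gets global rank (s - 1) * cards_per_server + d."""
    device_ips = device_ips or [""] * cards_per_server
    return {
        "device": [
            {
                "device_id": str(d),
                "device_ip": device_ips[d],  # obtain via hccn_tool in practice
                "rank_id": str((server_index - 1) * cards_per_server + d),
            }
            for d in range(cards_per_server)
        ],
        "server_id": server_id,
        "container_ip": server_id,  # defaults to the server IP
    }
```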
##### Output file format
```json
{
    "server_count": "...", # total number of nodes
    # the first server in server_list is the master node
    "server_list": [
        {
            "device": [
                {
                    "device_id": "...", # local index of this card on its machine, in [0, number of local cards)
                    "device_ip": "...", # IP address of this card, obtainable via hccn_tool
                    "rank_id": "..." # global index of this card, in [0, total number of cards)
                },
                ...
            ],
            "server_id": "...", # IP address of this node
            "container_ip": "..." # container IP address (needed for service deployment); same as server_id unless specially configured
        },
        ...
    ],
    "status": "completed",
    "version": "1.0"
}
```
## One-Click Startup
> [!CAUTION]
>
> With the **no-shared-storage approach** (i.e. generating rank_table_file via ssh_rank_table.py) you can use the **one-click startup [script](./ssh_start_mindie.sh); there is no need to start the image and service manually**
>
> ```bash
> vim ./mindie-ascend-deploy/ssh_start_mindie.sh
> # Edit the configuration
> USER="root"
> IMAGE="swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie:2.0.T3-800I-A2-py311-openeuler24.03-lts"
> MOUNT_DIR="/data01/deepseek"
> CONTAINER_DIR="/deepseek"
> CONTAINER_NAME="deepseek_atb"
> # Tool path, i.e. CONTAINER_DIR/MINDIE_ASCEND_DEPLOY_DIR
> MINDIE_ASCEND_DEPLOY_DIR="mindie-ascend-deploy"
> # Run on the master node
> bash ./mindie-ascend-deploy/ssh_start_mindie.sh
> ```
## Start the Container Image
```bash
# On each of the 4 machines, set this to the matching container IP (all_ip[:][1]) from all_config.yaml.
# `hostname -i` is usable when the node IP equals the container IP and exactly one IPv4 address is printed; otherwise set it manually.
MIES_CONTAINER_IP=$(hostname -i)
# Keep the following identical on all 4 machines
docker run -itd -u 0 -e MIES_CONTAINER_IP=$MIES_CONTAINER_IP --ipc=host --network host \
--name deepseek_atb \
--privileged \
--device=/dev/davinci0 \
--device=/dev/davinci1 \
--device=/dev/davinci2 \
--device=/dev/davinci3 \
--device=/dev/davinci4 \
--device=/dev/davinci5 \
--device=/dev/davinci6 \
--device=/dev/davinci7 \
--device=/dev/davinci_manager \
--device=/dev/devmm_svm \
--device=/dev/hisi_hdc \
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
-v /usr/local/sbin:/usr/local/sbin \
-v /data01/deepseek:/deepseek \
swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie:2.0.T3-800I-A2-py311-openeuler24.03-lts \
/bin/bash
# Enter the container: docker exec -it <container name or id> bash
docker exec -it deepseek_atb bash
# Check the environment variable
env | grep MIES_CONTAINER_IP
```
### Start the Service
#### **Dependency Preparation**
```bash
# cd into the mounted mindie-ascend-deploy directory
cd /deepseek/mindie-ascend-deploy
# Python dependencies
pip install -r ./src/requirements.txt
# jq official site: https://jqlang.org/
# Option 1
yum install jq
# or
apt-get install jq
# Option 2
mv ./jq-linux-arm64 ./jq
cp ./jq /usr/bin/
chmod +x /usr/bin/jq
```
#### Custom Tuning Configuration (optional)
```bash
cd /deepseek/mindie-ascend-deploy
vim ./src/set_env_mindie_all.sh
# Tune the following as needed; leaving a value empty keeps the original configuration:
npuMemSize:
maxSeqLen:
maxPrefillBatchSize:
maxPrefillTokens:
maxIterTimes:
supportSelectBatch:
maxBatchSize:
maxInputTokenLen:
tokenTimeout:
e2eTimeout:
# Parallelism strategy; older images may not support these
tp:
dp:
moe_ep:
moe_tp:
```
#### Automatically Set Environment Variables and Start
**The script loads [all_config.yaml](./src/all_config.yaml) by default**
```bash
# cd into the mounted mindie-ascend-deploy directory
cd /deepseek/mindie-ascend-deploy
# Start the service on each of the 4 machines
bash start_mindie.sh
# Stop the service
pkill -9 -f 'mindie|python'
```
### Test Requests
- **V3 request**
```bash
curl -w "\ntime_total=%{time_total}\n" -H "Accept: application/json" -H "Content-type: application/json" -X POST -d '{"inputs": "<|begin▁of▁sentence|><|User|>生抽与老抽的区别?<|Assistant|>", "parameters": {"do_sample": false, "max_new_tokens": 512}, "stream": false}' http://xxx.xxx.xxx.xxx:1025/generate &
```
- **R1 request with deep thinking**
```bash
curl -w "\ntime_total=%{time_total}\n" -H "Accept: application/json" -H "Content-type: application/json" -X POST -d '{"inputs": "<|begin▁of▁sentence|><|User|>生抽与老抽的区别?\nPlease reason step by step, and put your final answer within \boxed{}.<|Assistant|>", "parameters": {"do_sample": false, "max_new_tokens": 512}, "stream": false}' http://xxx.xxx.xxx.xxx:1025/generate &
```
- **R1 OpenAI-style API request; [batch stress-test script](./src/batch_request.sh)**
```bash
#Streaming output
curl -w "\ntime_total=%{time_total}\n" -H "Accept: application/json" -H "Content-type:application/json" -X POST -d '{"model":"deepseekr1","messages":[{"role":"user","content":"Every morning Aya goes for a $9$-kilometer-long walk and stops at a coffee shop afterwards. When she walks at a constant speed of $s$ kilometers per hour, the walk takes her $4$ hours, including $t$ minutes spent in the coffee shop. When she walks $s+2$ kilometers per hour, the walk takes her $2$ hours and $24$ minutes, including $t$ minutes spent in the coffee shop. Suppose Aya walks at $s+\\frac{1}{2}$ kilometers per hour. Find the number of minutes the walk takes her, including the $t$ minutes spent in the coffee shop. \nPlease reason step by step, and put your final answer within \boxed{}."}], "stream": true,"max_tokens": 512 }' http://xxx.xxx.xxx.xxx:1025/v1/chat/completions
#Health check; scheduling a request every hour is recommended to keep the service from going idle
curl -w "\ntime_total=%{time_total}\n" -H "Accept: application/json" -H "Content-type: application/json" -X GET http://xxx.xxx.xxx.xxx:1026/health
```
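The OpenAI-style request above can equally be sent from Python. A stdlib-only sketch mirroring the curl example (non-streaming only; the base URL and model name are placeholders from the examples above):

```python
import json
import urllib.request

def build_chat_payload(model, content, stream, max_tokens):
    """Build the request body for the /v1/chat/completions endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": content}],
        "stream": stream,
        "max_tokens": max_tokens,
    }

def chat_request(base_url, model, content, max_tokens=512):
    """POST a non-streaming chat request, e.g.
    chat_request("http://xxx.xxx.xxx.xxx:1025", "deepseekr1", "hello")."""
    payload = build_chat_payload(model, content, stream=False, max_tokens=max_tokens)
    req = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=600) as resp:
        return json.loads(resp.read())
```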
- **For stopping requests, see the [batch stop-inference documentation](./stop_reasoning.md); script: [stop_inference.py](./src/stop_inference.py)**
```bash
#Send a stop-inference request
curl -w "\ntime_total=%{time_total}\n" -H "Accept: application/json" -H "Content-type: application/json" -X POST -d '{"id":"endpoint_common_3087"}' http://xxx.xxx.xxx.xxx:xxxx/v2/models/deepseekr1/stopInfer
```
- **Benchmark testing**
```bash
#Locate the benchmark installation path
pip show benchmark mindiebenchmark
#Adjust permissions
chmod 640 /usr/local/lib/python3.11/site-packages/mindiebenchmark/config/*
chmod 640 /usr/local/lib/python3.11/site-packages/mindieclient/python/config/config.json
#Edit /usr/local/lib/python3.11/site-packages/mindiebenchmark/config/synthetic_config.json as follows:
: <<'JSON'
{
    "Input": {
        "Method": "uniform",
        "Params": {"MinValue": 128, "MaxValue": 128}
    },
    "Output": {
        "Method": "gaussian",
        "Params": {"Mean": 2048, "Var": 2048, "MinValue": 2048, "MaxValue": 2048}
    },
    "RequestCount": 512
}
JSON
#Launch the performance test
#Mode can be vllm_client or openai
TestType=vllm_client
export MINDIE_LOG_TO_STDOUT="benchmark:1; client:1"
benchmark \
--DatasetType "synthetic" \
--ModelName deepseekr1 \
--ModelPath "/deepseek/DeepSeeK-R1-bf16" \
--TestType ${TestType} \
--Http http://xxx.xxx.xxx.xxx:1025 \
--ManagementHttp http://xxx.xxx.xxx.xxx:1026 \
--Concurrency 512 \
--TaskKind stream \
--Tokenizer True \
--SyntheticConfigPath /usr/local/lib/python3.11/site-packages/mindiebenchmark/config/synthetic_config.json
#Manually download the MMLU example
cd /usr/local/Ascend/atb-models/tests/modeltest
wget -P temp_data/mmlu/ https://people.eecs.berkeley.edu/~hendrycks/data.tar
python3 scripts/data_prepare.py --dataset_name mmlu
#Or submit an access request
git clone https://modelers.cn/MindIE/data.git
#Launch the accuracy test
TestType=vllm_client
benchmark \
--DatasetType "mmlu" \
--DatasetPath "/usr/local/Ascend/atb-models/tests/modeltest/data/mmlu" \
--ModelName deepseekr1 \
--ModelPath "/deepseek/DeepSeeK-R1-bf16" \
--TestType ${TestType} \
--Http http://xxx.xxx.xxx.xxx:1025 \
--ManagementHttp http://xxx.xxx.xxx.xxx:1026 \
--Concurrency 512 \
--MaxOutputLen 8192 \
--TaskKind stream \
--Tokenizer True \
--TestAccuracy True
```
## 问题定位经验
- **使用共享存储盘启动服务化全流程在40分钟内正常 ,期间监控cpu和npu是否正常**
- **cpu内存空余不足**
```bash
#After confirming no active process is using the memory, release it manually; 3 means dropping page cache, dentries, and inodes simultaneously
sudo sync; echo 3 | sudo tee /proc/sys/vm/drop_caches
```
- **Compute slicing (vNPU) must be disabled on the host**
```bash
#Check whether vNPU is enabled
ll /dev | grep davinci
#Manually destroy any device listed with a v prefix
npu-smi set -t destroy-vnpu -i <sliced device id> -c <chip id> -v
#Example: destroy vNPU device 103 on chip 0 of device 0
npu-smi set -t destroy-vnpu -i 0 -c 0 -v 103
```
- **Residual memory on a card, but no process attached**
```bash
#Reset the NPU card
npu-smi set -t reset -i 0 -c 0
#Query NPU health alarms
npu-smi info -t health -i 0 -c 0
```
- **RPC errors across multiple machines**
```bash
#Check firewall status
sudo systemctl status firewalld
#Temporarily stop the firewall
sudo systemctl stop firewalld
#or
ufw disable
#Stop iptables
service iptables stop
#Open the multiNodesInferPort port, e.g. in the service config:
multiNodesInferPort: 1120
```
- **Permission issues**
```bash
#When running without root privileges, change the model directory's owner/group and permissions
chown -R HwHiAiUser:HwHiAiUser /deepseek/DeepSeeK-R1-bf16
chmod -R 640 /deepseek/DeepSeeK-R1-bf16
```
- **Model issues**
```bash
#Verify model file integrity against the download repository
sha256sum /deepseek/DeepSeeK-R1-bf16/*
#or
md5sum /deepseek/DeepSeeK-R1-bf16/*
#Check the transformers version
#DeepSeek-R1: 4.46.3, DeepSeek-V3: 4.33.1
pip show transformers
```
- **Log extraction**
```bash
#Extract the lines between two timestamps (write to a different file; redirecting back to the input truncates it first)
awk '/start timestamp/,/end timestamp/ {print;}' xxx.log > extracted.log
#Extract from a timestamp to the end of the file
sed -n '/start timestamp/,$p' xxx.log > extracted.log
```
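The same range extraction can be done in Python when logs are processed further. A small sketch mirroring the awk/sed range patterns above (markers are plain substrings, not regexes, in this simplified version):

```python
def slice_log(lines, start_marker, end_marker=None):
    """Return the lines from the first occurrence of start_marker up to
    and including the first subsequent occurrence of end_marker; with no
    end_marker, keep everything through the end, like `sed -n '/x/,$p'`."""
    out, capturing = [], False
    for line in lines:
        if not capturing:
            if start_marker in line:
                capturing = True
                out.append(line)
                if end_marker is not None and end_marker in line:
                    break
        else:
            out.append(line)
            if end_marker is not None and end_marker in line:
                break
    return out
```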
- **Workaround for the MindIE 'ascii' decoding error: export PYTHONIOENCODING=utf-8**
- **In the current version, a mismatch between maxSeqLen and maxPrefillTokens may cause the service to hang**
- **In the current version, HCCL_OP_EXPANSION_MODE="AIV" is unstable; enabling it is not recommended**
- **On errors, first check that the environment and service configuration match expectations; a pure-model run can be used to validate the multi-machine setup**
- **Logs to look for**: console output, /root/mindie/log/debug/, /usr/local/Ascend/mindie/latest/mindie-llm/logs, /root/atb/log, /root/ascend/log/debug/plog
- **Enable logging:**
  ```bash
  export ASDOPS_LOG_LEVEL=INFO
  export ASDOPS_LOG_TO_FILE=1
  export ATB_LOG_TO_FILE=1
  export ATB_LOG_LEVEL=INFO
  export MINDIE_LLM_LOG_TO_FILE=1
  export MINDIE_LOG_TO_STDOUT=1
  export MINDIE_LOG_TO_FILE=1
  ```
- **Enable full logging (affects performance):**
  ```bash
  export ATB_LOG_LEVEL=ERROR
  export ATB_LOG_TO_FILE=1
  export ATB_LOG_TO_STDOUT=1
  export ASDOPS_LOG_LEVEL=ERROR
  export ASDOPS_LOG_TO_FILE=1
  export ASDOPS_LOG_TO_STDOUT=1
  export MINDIE_LOG_TO_STDOUT=1
  export MINDIE_LLM_LOG_TO_FILE=1
  export MINDIE_LOG_TO_FILE=1
  export ASCEND_GLOBAL_LOG_LEVEL=0
  export ASCEND_SLOG_PRINT_TO_STDOUT=1
  ```
- **Related reference documents; note their FAQ sections**
[MindIE/DeepSeek-R1 | 魔乐社区](https://modelers.cn/models/MindIE/DeepSeek-R1)
[MindIE/LLM/DeepSeek/DeepSeek-R1/README.md · Ascend/ModelZoo-PyTorch - Gitee.com](https://gitee.com/ascend/ModelZoo-PyTorch/blob/master/MindIE/LLM/DeepSeek/DeepSeek-R1/README.md)
[MindIE/LLM/DeepSeek/DeepSeek-V3/README.md · Ascend/ModelZoo-PyTorch - Gitee.com](https://gitee.com/ascend/ModelZoo-PyTorch/blob/master/MindIE/LLM/DeepSeek/DeepSeek-V3/README.md)
[配置参数说明-安装与配置-MindIE LLM开发指南-大模型开发-MindIE1.0.0开发文档-昇腾社区](https://www.hiascend.com/document/detail/zh/mindie/100/mindiellm/llmdev/mindie_llm0004.html)
[推理接口-兼容OpenAI接口-EndPoint业务面RESTful接口-服务化接口-MindIE Service开发指南-服务化集成部署-MindIE1.0.0开发文档-昇腾社区](https://www.hiascend.com/document/detail/zh/mindie/100/mindieservice/servicedev/mindie_service0076.html)