# MindIEAscendDeploy
**Repository Path**: Biz-Spring_0/mindie-ascend-deploy
## Basic Information
- **Project Name**: MindIEAscendDeploy
- **Description**: Deploying MindIE on Ascend NPUs
- **Primary Language**: Python
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 4
- **Created**: 2025-04-23
- **Last Updated**: 2025-04-23
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# DeepSeek Deployment via ATB
## Prerequisites
```bash
# Clone the repository
git clone https://gitee.com/kingTLE/mindie-ascend-deploy.git
```
- **Hardware requirements**
  - Deploying BF16 requires at least 4 Atlas 800I A2 (8*64G) servers; with W8A8 quantization, at least 2 Atlas 800I A2 (8*64G) servers are needed.
  - Minimum driver/firmware version: 23.07 or later; a newer version is recommended.
  - Recommended OS kernel: 5.10
- **BF16 original weights download**

  | Source | R1 | V3 |
  | :---------: | :----------------------------------------------------------: | :----------------------------------------------------------: |
  | huggingface | [unsloth/DeepSeek-R1-BF16 · HF Mirror](https://hf-mirror.com/unsloth/DeepSeek-R1-BF16)<br>[unsloth/DeepSeek-R1-BF16 · Hugging Face](https://huggingface.co/unsloth/DeepSeek-R1-BF16) | [unsloth/DeepSeek-V3-bf16 · HF Mirror](https://hf-mirror.com/unsloth/DeepSeek-V3-bf16)<br>[unsloth/DeepSeek-V3-0324-BF16 · HF Mirror](https://hf-mirror.com/unsloth/DeepSeek-V3-0324-BF16)<br>[unsloth/DeepSeek-V3-bf16 · Hugging Face](https://huggingface.co/unsloth/DeepSeek-V3-bf16) |
  | modelscope | [DeepSeek-R1-BF16 · ModelScope](https://modelscope.cn/models/unsloth/deepseek-R1-bf16/) | [DeepSeek-V3-bf16 · ModelScope](https://modelscope.cn/models/unsloth/deepseek-V3-bf16/)<br>[DeepSeek-V3-0324-BF16 · ModelScope](https://modelscope.cn/models/unsloth/DeepSeek-V3-0324-BF16) |
- **Configuration file preparation; see [all_config.yaml](./src/all_config.yaml) for reference**
```yaml
# Per-machine communication settings; adjust according to your network plan (no change needed if already configured)
# Starting IP
start_ip:
# Device subnet
network_prefix:
netmask:
# Gateway
gateway:
# Detection IP
netdetect:
# MindIE settings, identical on every machine
httpsEnabled:
port:
worldSize:
modelWeightPath:
modelName:
multiNodesInferEnabled:
interNodeTLSEnabled:
start_device_id:
# Path accessible from inside the container
rank_table_file:
# all_ip[:][0] is the server IP address, all_ip[:][1] is the container IP address; keeping them identical is recommended
# all_ip[0][:] is the master node
all_ip:
  - ["", ""]
  - ["", ""]
  - ["", ""]
  - ["", ""]
# Log directory, default "./deepseek_logs"
logsdir:
```
- **Check the network status of each machine; communication setup script: [set_hccl_ip.sh](./src/set_hccl_ip.sh)**
```bash
# Check physical links
for i in {0..7}; do hccn_tool -i $i -lldp -g | grep Ifname; done
# Check link status
for i in {0..7}; do hccn_tool -i $i -link -g ; done
# Check network health
for i in {0..7}; do hccn_tool -i $i -net_health -g ; done
# Verify the detection IP configuration
for i in {0..7}; do hccn_tool -i $i -netdetect -g ; done
# Verify the gateway configuration
for i in {0..7}; do hccn_tool -i $i -gateway -g ; done
# Check that the NPU low-level TLS verification behavior is consistent; all 0 is recommended
for i in {0..7}; do hccn_tool -i $i -tls -g ; done | grep switch
# Set the NPU low-level TLS verification behavior to 0
for i in {0..7}; do hccn_tool -i $i -tls -s enable 0; done
# Get the IP address of each card
for i in {0..7}; do hccn_tool -i $i -ip -g; done
# Verify card-to-card connectivity
for i in {0..7}; do for j in {0..7}; do hccn_tool -i $i -ping -g address 192.168.205.${j}; done; done
```
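The checks above can also be scripted. A minimal Python sketch that flags cards whose link is not reported UP; this is a hypothetical helper, not part of this repository, and the substring match on the `hccn_tool -link -g` output is an assumption (the exact format varies by driver version):

```python
import subprocess

def link_is_up(output: str) -> bool:
    # Treat any output line mentioning "UP" (case-insensitive) as a healthy link.
    return any("up" in line.lower() for line in output.splitlines())

def check_links(num_cards=8):
    """Run `hccn_tool -i <i> -link -g` for every card and return the
    indices of cards whose output does not report an UP link."""
    down = []
    for i in range(num_cards):
        out = subprocess.run(
            ["hccn_tool", "-i", str(i), "-link", "-g"],
            capture_output=True, text=True,
        ).stdout
        if not link_is_up(out):
            down.append(i)
    return down
```

Running `check_links()` on a healthy node should return an empty list; any indices it returns are cards worth inspecting with the commands above.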
#### Generating rank_table_file: see the [documentation](./create_rank_table.md)
##### Usage examples
```bash
# Without shared storage: run on the master node; requires the root password of each machine
python3 ./src/ssh_rank_table.py --json_file /path/to/your/new_rank_table_file.json --all_config_file ./src/all_config.yaml
# With shared storage: run on each of the 4 machines separately
python ./src/mk_rank_table.py --server_id xx.xx.xx.1 --server_index 1 --server_count 4 --json_file new_rank_table_file.json
python ./src/mk_rank_table.py --server_id xx.xx.xx.2 --server_index 2 --server_count 4 --json_file new_rank_table_file.json
python ./src/mk_rank_table.py --server_id xx.xx.xx.3 --server_index 3 --server_count 4 --json_file new_rank_table_file.json
python ./src/mk_rank_table.py --server_id xx.xx.xx.4 --server_index 4 --server_count 4 --json_file new_rank_table_file.json
```
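Internally, each invocation presumably maps local device IDs to global ranks. A hypothetical sketch of that mapping (not the repository's `mk_rank_table.py`), assuming 8 cards per server and `server_index` counted from 1 as in the usage above:

```python
def build_server_entry(server_id, server_index, cards_per_server=8, device_ips=None):
    """Sketch of one rank-table server entry: card d on server s (counted
    from 1) gets global rank (s - 1) * cards_per_server + d."""
    device_ips = device_ips or [""] * cards_per_server
    return {
        "device": [
            {
                "device_id": str(d),
                "device_ip": device_ips[d],  # obtain via hccn_tool in practice
                "rank_id": str((server_index - 1) * cards_per_server + d),
            }
            for d in range(cards_per_server)
        ],
        "server_id": server_id,
        "container_ip": server_id,  # defaults to the server IP
    }
```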
##### Output file format
```json
{
    "server_count": "...", # total number of nodes
    # the first server in server_list is the master node
    "server_list": [
        {
            "device": [
                {
                    "device_id": "...", # local index of this card on its machine, in [0, number of local cards)
                    "device_ip": "...", # IP address of this card, obtainable via hccn_tool
                    "rank_id": "..." # global index of this card, in [0, total number of cards)
                },
                ...
            ],
            "server_id": "...", # IP address of this node
            "container_ip": "..." # container IP address (needed for service deployment); same as server_id unless specially configured
        },
        ...
    ],
    "status": "completed",
    "version": "1.0"
}
```
## One-Click Startup
> [!CAUTION]
>
> With the **no-shared-storage approach** (i.e. generating rank_table_file via ssh_rank_table.py) you can use the **one-click startup [script](./ssh_start_mindie.sh); there is no need to start the image and service manually**
>
> ```bash
> vim ./mindie-ascend-deploy/ssh_start_mindie.sh
> # Edit the configuration
> USER="root"
> IMAGE="swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie:2.0.T3-800I-A2-py311-openeuler24.03-lts"
> MOUNT_DIR="/data01/deepseek"
> CONTAINER_DIR="/deepseek"
> CONTAINER_NAME="deepseek_atb"
> # Tool path, i.e. CONTAINER_DIR/MINDIE_ASCEND_DEPLOY_DIR
> MINDIE_ASCEND_DEPLOY_DIR="mindie-ascend-deploy"
> # Run on the master node
> bash ./mindie-ascend-deploy/ssh_start_mindie.sh
> ```
## Start the Container Image
```bash
# On each of the 4 machines, set this to the matching container IP (all_ip[:][1]) from all_config.yaml.
# `hostname -i` is usable when the node IP equals the container IP and exactly one IPv4 address is printed; otherwise set it manually.
MIES_CONTAINER_IP=$(hostname -i)
# Keep the following identical on all 4 machines
docker run -itd -u 0 -e MIES_CONTAINER_IP=$MIES_CONTAINER_IP --ipc=host --network host \
--name deepseek_atb \
--privileged \
--device=/dev/davinci0 \
--device=/dev/davinci1 \
--device=/dev/davinci2 \
--device=/dev/davinci3 \
--device=/dev/davinci4 \
--device=/dev/davinci5 \
--device=/dev/davinci6 \
--device=/dev/davinci7 \
--device=/dev/davinci_manager \
--device=/dev/devmm_svm \
--device=/dev/hisi_hdc \
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
-v /usr/local/sbin:/usr/local/sbin \
-v /data01/deepseek:/deepseek \
swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie:2.0.T3-800I-A2-py311-openeuler24.03-lts \
/bin/bash
# Enter the container: docker exec -it <container name or id> bash
docker exec -it deepseek_atb bash
# Check the environment variable
env | grep MIES_CONTAINER_IP
```
### Start the Service
#### **Dependency Preparation**
```bash
# cd into the mounted mindie-ascend-deploy directory
cd /deepseek/mindie-ascend-deploy
# Python dependencies
pip install -r ./src/requirements.txt
# jq official site: https://jqlang.org/
# Option 1
yum install jq
# or
apt-get install jq
# Option 2
mv ./jq-linux-arm64 ./jq
cp ./jq /usr/bin/
chmod +x /usr/bin/jq
```
#### Custom Tuning Configuration (optional)
```bash
cd /deepseek/mindie-ascend-deploy
vim ./src/set_env_mindie_all.sh
# Tune the following as needed; leaving a value empty keeps the original configuration:
npuMemSize:
maxSeqLen:
maxPrefillBatchSize:
maxPrefillTokens:
maxIterTimes:
supportSelectBatch:
maxBatchSize:
maxInputTokenLen:
tokenTimeout:
e2eTimeout:
# Parallelism strategy; older images may not support these
tp:
dp:
moe_ep:
moe_tp:
```
#### Automatically Set Environment Variables and Start
**The script loads [all_config.yaml](./src/all_config.yaml) by default**
```bash
# cd into the mounted mindie-ascend-deploy directory
cd /deepseek/mindie-ascend-deploy
# Start the service on each of the 4 machines
bash start_mindie.sh
# Stop the service
pkill -9 -f 'mindie|python'
```
### Test Requests
- **V3 request**
```bash
curl -w "\ntime_total=%{time_total}\n" -H "Accept: application/json" -H "Content-type: application/json" -X POST -d '{"inputs": "<|begin▁of▁sentence|><|User|>生抽与老抽的区别?<|Assistant|>", "parameters": {"do_sample": false, "max_new_tokens": 512}, "stream": false}' http://xxx.xxx.xxx.xxx:1025/generate &
```
- **R1 request with deep thinking**
```bash
curl -w "\ntime_total=%{time_total}\n" -H "Accept: application/json" -H "Content-type: application/json" -X POST -d '{"inputs": "<|begin▁of▁sentence|><|User|>生抽与老抽的区别?\nPlease reason step by step, and put your final answer within \boxed{}.<|Assistant|>", "parameters": {"do_sample": false, "max_new_tokens": 512}, "stream": false}' http://xxx.xxx.xxx.xxx:1025/generate &
```
- **R1 OpenAI-style API request; [batch stress-test script](./src/batch_request.sh)**
```bash
#Streaming output
curl -w "\ntime_total=%{time_total}\n" -H "Accept: application/json" -H "Content-type:application/json" -X POST -d '{"model":"deepseekr1","messages":[{"role":"user","content":"Every morning Aya goes for a $9$-kilometer-long walk and stops at a coffee shop afterwards. When she walks at a constant speed of $s$ kilometers per hour, the walk takes her $4$ hours, including $t$ minutes spent in the coffee shop. When she walks $s+2$ kilometers per hour, the walk takes her $2$ hours and $24$ minutes, including $t$ minutes spent in the coffee shop. Suppose Aya walks at $s+\\frac{1}{2}$ kilometers per hour. Find the number of minutes the walk takes her, including the $t$ minutes spent in the coffee shop. \nPlease reason step by step, and put your final answer within \boxed{}."}], "stream": true,"max_tokens": 512 }' http://xxx.xxx.xxx.xxx:1025/v1/chat/completions
#Health check; scheduling a request every hour is recommended to keep the service from going idle
curl -w "\ntime_total=%{time_total}\n" -H "Accept: application/json" -H "Content-type: application/json" -X GET http://xxx.xxx.xxx.xxx:1026/health
```
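The OpenAI-style request above can equally be sent from Python. A stdlib-only sketch mirroring the curl example (non-streaming only; the base URL and model name are placeholders from the examples above):

```python
import json
import urllib.request

def build_chat_payload(model, content, stream, max_tokens):
    """Build the request body for the /v1/chat/completions endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": content}],
        "stream": stream,
        "max_tokens": max_tokens,
    }

def chat_request(base_url, model, content, max_tokens=512):
    """POST a non-streaming chat request, e.g.
    chat_request("http://xxx.xxx.xxx.xxx:1025", "deepseekr1", "hello")."""
    payload = build_chat_payload(model, content, stream=False, max_tokens=max_tokens)
    req = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=600) as resp:
        return json.loads(resp.read())
```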
- **For stopping requests, see the [batch stop-inference documentation](./stop_reasoning.md); script: [stop_inference.py](./src/stop_inference.py)**
```bash
#Send a stop-inference request
curl -w "\ntime_total=%{time_total}\n" -H "Accept: application/json" -H "Content-type: application/json" -X POST -d '{"id":"endpoint_common_3087"}' http://xxx.xxx.xxx.xxx:xxxx/v2/models/deepseekr1/stopInfer
```
- **Benchmark testing**
```bash
#Locate the benchmark installation path
pip show benchmark mindiebenchmark
#Adjust permissions
chmod 640 /usr/local/lib/python3.11/site-packages/mindiebenchmark/config/*
chmod 640 /usr/local/lib/python3.11/site-packages/mindieclient/python/config/config.json
#Edit /usr/local/lib/python3.11/site-packages/mindiebenchmark/config/synthetic_config.json as follows:
: <<'JSON'
{
    "Input": {
        "Method": "uniform",
        "Params": {"MinValue": 128, "MaxValue": 128}
    },
    "Output": {
        "Method": "gaussian",
        "Params": {"Mean": 2048, "Var": 2048, "MinValue": 2048, "MaxValue": 2048}
    },
    "RequestCount": 512
}
JSON
#Launch the performance test
#Mode can be vllm_client or openai
TestType=vllm_client
export MINDIE_LOG_TO_STDOUT="benchmark:1; client:1"
benchmark \
--DatasetType "synthetic" \
--ModelName deepseekr1 \
--ModelPath "/deepseek/DeepSeeK-R1-bf16" \
--TestType ${TestType} \
--Http http://xxx.xxx.xxx.xxx:1025 \
--ManagementHttp http://xxx.xxx.xxx.xxx:1026 \
--Concurrency 512 \
--TaskKind stream \
--Tokenizer True \
--SyntheticConfigPath /usr/local/lib/python3.11/site-packages/mindiebenchmark/config/synthetic_config.json
#Manually download the MMLU example
cd /usr/local/Ascend/atb-models/tests/modeltest
wget -P temp_data/mmlu/ https://people.eecs.berkeley.edu/~hendrycks/data.tar
python3 scripts/data_prepare.py --dataset_name mmlu
#Or submit an access request
git clone https://modelers.cn/MindIE/data.git
#Launch the accuracy test
TestType=vllm_client
benchmark \
--DatasetType "mmlu" \
--DatasetPath "/usr/local/Ascend/atb-models/tests/modeltest/data/mmlu" \
--ModelName deepseekr1 \
--ModelPath "/deepseek/DeepSeeK-R1-bf16" \
--TestType ${TestType} \
--Http http://xxx.xxx.xxx.xxx:1025 \
--ManagementHttp http://xxx.xxx.xxx.xxx:1026 \
--Concurrency 512 \
--MaxOutputLen 8192 \
--TaskKind stream \
--Tokenizer True \
--TestAccuracy True
```
## 问题定位经验
- **使用共享存储盘启动服务化全流程在40分钟内正常 ,期间监控cpu和npu是否正常**
- **cpu内存空余不足**
```bash
#After confirming no active process is using the memory, release it manually; 3 means dropping page cache, dentries, and inodes simultaneously
sudo sync; echo 3 | sudo tee /proc/sys/vm/drop_caches
```
- **Compute slicing (vNPU) must be disabled on the host**
```bash
#Check whether vNPU is enabled
ll /dev | grep davinci
#Manually destroy any device listed with a v prefix
npu-smi set -t destroy-vnpu -i <sliced device id> -c <chip id> -v
#Example: destroy vNPU device 103 on chip 0 of device 0
npu-smi set -t destroy-vnpu -i 0 -c 0 -v 103
```
- **Residual memory on a card, but no process attached**
```bash
#Reset the NPU card
npu-smi set -t reset -i 0 -c 0
#Query NPU health alarms
npu-smi info -t health -i 0 -c 0
```
- **RPC errors across multiple machines**
```bash
#Check firewall status
sudo systemctl status firewalld
#Temporarily stop the firewall
sudo systemctl stop firewalld
#or
ufw disable
#Stop iptables
service iptables stop
#Open the multiNodesInferPort port, e.g. in the service config:
multiNodesInferPort: 1120
```
- **Permission issues**
```bash
#When running without root privileges, change the model directory's owner/group and permissions
chown -R HwHiAiUser:HwHiAiUser /deepseek/DeepSeeK-R1-bf16
chmod -R 640 /deepseek/DeepSeeK-R1-bf16
```
- **Model issues**
```bash
#Verify model file integrity against the download repository
sha256sum /deepseek/DeepSeeK-R1-bf16/*
#or
md5sum /deepseek/DeepSeeK-R1-bf16/*
#Check the transformers version
#DeepSeek-R1: 4.46.3, DeepSeek-V3: 4.33.1
pip show transformers
```
- **Log extraction**
```bash
#Extract the lines between two timestamps (write to a different file; redirecting back to the input truncates it first)
awk '/start timestamp/,/end timestamp/ {print;}' xxx.log > extracted.log
#Extract from a timestamp to the end of the file
sed -n '/start timestamp/,$p' xxx.log > extracted.log
```
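The same range extraction can be done in Python when logs are processed further. A small sketch mirroring the awk/sed range patterns above (markers are plain substrings, not regexes, in this simplified version):

```python
def slice_log(lines, start_marker, end_marker=None):
    """Return the lines from the first occurrence of start_marker up to
    and including the first subsequent occurrence of end_marker; with no
    end_marker, keep everything through the end, like `sed -n '/x/,$p'`."""
    out, capturing = [], False
    for line in lines:
        if not capturing:
            if start_marker in line:
                capturing = True
                out.append(line)
                if end_marker is not None and end_marker in line:
                    break
        else:
            out.append(line)
            if end_marker is not None and end_marker in line:
                break
    return out
```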
- **Workaround for the MindIE 'ascii' decoding error: export PYTHONIOENCODING=utf-8**
- **In the current version, a mismatch between maxSeqLen and maxPrefillTokens may cause the service to hang**
- **In the current version, HCCL_OP_EXPANSION_MODE="AIV" is unstable; enabling it is not recommended**
- **On errors, first check that the environment and service configuration match expectations; a pure-model run can be used to validate the multi-machine setup**
- **Logs to look for**: console output, /root/mindie/log/debug/, /usr/local/Ascend/mindie/latest/mindie-llm/logs, /root/atb/log, /root/ascend/log/debug/plog
- **Enable logging:**
  ```bash
  export ASDOPS_LOG_LEVEL=INFO
  export ASDOPS_LOG_TO_FILE=1
  export ATB_LOG_TO_FILE=1
  export ATB_LOG_LEVEL=INFO
  export MINDIE_LLM_LOG_TO_FILE=1
  export MINDIE_LOG_TO_STDOUT=1
  export MINDIE_LOG_TO_FILE=1
  ```
- **Enable full logging (affects performance):**
  ```bash
  export ATB_LOG_LEVEL=ERROR
  export ATB_LOG_TO_FILE=1
  export ATB_LOG_TO_STDOUT=1
  export ASDOPS_LOG_LEVEL=ERROR
  export ASDOPS_LOG_TO_FILE=1
  export ASDOPS_LOG_TO_STDOUT=1
  export MINDIE_LOG_TO_STDOUT=1
  export MINDIE_LLM_LOG_TO_FILE=1
  export MINDIE_LOG_TO_FILE=1
  export ASCEND_GLOBAL_LOG_LEVEL=0
  export ASCEND_SLOG_PRINT_TO_STDOUT=1
  ```
- **Related reference documents; note their FAQ sections**
[MindIE/DeepSeek-R1 | 魔乐社区](https://modelers.cn/models/MindIE/DeepSeek-R1)
[MindIE/LLM/DeepSeek/DeepSeek-R1/README.md · Ascend/ModelZoo-PyTorch - Gitee.com](https://gitee.com/ascend/ModelZoo-PyTorch/blob/master/MindIE/LLM/DeepSeek/DeepSeek-R1/README.md)
[MindIE/LLM/DeepSeek/DeepSeek-V3/README.md · Ascend/ModelZoo-PyTorch - Gitee.com](https://gitee.com/ascend/ModelZoo-PyTorch/blob/master/MindIE/LLM/DeepSeek/DeepSeek-V3/README.md)
[配置参数说明-安装与配置-MindIE LLM开发指南-大模型开发-MindIE1.0.0开发文档-昇腾社区](https://www.hiascend.com/document/detail/zh/mindie/100/mindiellm/llmdev/mindie_llm0004.html)
[推理接口-兼容OpenAI接口-EndPoint业务面RESTful接口-服务化接口-MindIE Service开发指南-服务化集成部署-MindIE1.0.0开发文档-昇腾社区](https://www.hiascend.com/document/detail/zh/mindie/100/mindieservice/servicedev/mindie_service0076.html)