# MindIEAscendDeploy

**Repository Path**: Biz-Spring_0/mindie-ascend-deploy

## Basic Information

- **Project Name**: MindIEAscendDeploy
- **Description**: For deploying MindIE on Ascend NPUs
- **Primary Language**: Python
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 4
- **Created**: 2025-04-23
- **Last Updated**: 2025-04-23

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# DeepSeek deployment with the ATB backend

## Preparation

```bash
# Clone the repository
git clone https://gitee.com/kingTLE/mindie-ascend-deploy.git
```

- **Hardware requirements**
  - Deploying the BF16 weights requires at least 4 Atlas 800I A2 (8*64G) servers; with W8A8 quantization, at least 2 Atlas 800I A2 (8*64G) servers are needed.
  - Minimum driver/firmware version: 23.07; a recent version is recommended.
  - Recommended OS kernel: 5.10
- **BF16 original weight downloads**

|    Source    |                              R1                              |                              V3                              |
| :---------: | :----------------------------------------------------------: | :----------------------------------------------------------: |
| huggingface | [unsloth/DeepSeek-R1-BF16 · HF Mirror](https://hf-mirror.com/unsloth/DeepSeek-R1-BF16)<br>[unsloth/DeepSeek-R1-BF16 · Hugging Face](https://huggingface.co/unsloth/DeepSeek-R1-BF16) | [unsloth/DeepSeek-V3-bf16 · HF Mirror](https://hf-mirror.com/unsloth/DeepSeek-V3-bf16)<br>[unsloth/DeepSeek-V3-0324-BF16 · HF Mirror](https://hf-mirror.com/unsloth/DeepSeek-V3-0324-BF16)<br>[unsloth/DeepSeek-V3-bf16 · Hugging Face](https://huggingface.co/unsloth/DeepSeek-V3-bf16) |
| modelscope | [DeepSeek-R1-BF16 · ModelScope](https://modelscope.cn/models/unsloth/deepseek-R1-bf16/) | [DeepSeek-V3-bf16 · ModelScope](https://modelscope.cn/models/unsloth/deepseek-V3-bf16/)<br>[DeepSeek-V3-0324-BF16 · ModelScope](https://modelscope.cn/models/unsloth/DeepSeek-V3-0324-BF16) |

- **Configuration file preparation; see [all_config.yaml](./src/all_config.yaml)**

```yaml
# Communication settings for each machine; adjust according to your network plan (skip if already configured)
# Starting IP
start_ip:
# Device subnet
network_prefix:
netmask:
# Gateway
gateway:
# Detection IP
netdetect:
# MindIE settings; keep identical on every machine
httpsEnabled:
port:
worldSize:
modelWeightPath:
modelName:
multiNodesInferEnabled:
interNodeTLSEnabled:
start_device_id:
# Path reachable from inside the container
rank_table_file:
# all_ip[:][0] is the server IP, all_ip[:][1] the container IP; keeping server IP and container IP identical is recommended
# all_ip[0][:] is the master node
all_ip:
  - ["", ""]
  - ["", ""]
  - ["", ""]
  - ["", ""]
# Log directory, default "./deepseek_logs"
logsdir:
```

- **Check the machines' network status; communication setup script: [set_hccl_ip.sh](./src/set_hccl_ip.sh)**

```bash
# Check the physical links
for i in {0..7}; do hccn_tool -i $i -lldp -g | grep Ifname; done
# Check link status
for i in {0..7}; do hccn_tool -i $i -link -g; done
# Check network health
for i in {0..7}; do hccn_tool -i $i -net_health -g; done
# Verify the detection IP configuration
for i in {0..7}; do hccn_tool -i $i -netdetect -g; done
# Verify the gateway configuration
for i in {0..7}; do hccn_tool -i $i -gateway -g; done
# Check that the NPUs' low-level TLS verification settings are consistent; all 0 is recommended
for i in {0..7}; do hccn_tool -i $i -tls -g; done | grep switch
# Set the NPUs' low-level TLS verification to 0
for i in {0..7}; do hccn_tool -i $i -tls -s enable 0; done
# Get each card's IP address
for i in {0..7}; do hccn_tool -i $i -ip -g; done
# Verify card-to-card connectivity
for i in {0..7}; do for j in {0..7}; do hccn_tool -i $i -ping -g address 192.168.205.${j}; done; done
```

#### See the [documentation](./create_rank_table.md) for generating the rank_table_file

##### Usage examples

```bash
# Without shared storage: run on the master node; requires each machine's root password
python3 ./src/ssh_rank_table.py --json_file /path/to/your/new_rank_table_file.json --all_config_file ./src/all_config.yaml
# With shared storage: run on each of the 4 machines separately
python ./src/mk_rank_table.py --server_id xx.xx.xx.1 --server_index 1 --server_count 4 --json_file new_rank_table_file.json
python ./src/mk_rank_table.py --server_id xx.xx.xx.2 --server_index 2 --server_count 4 --json_file new_rank_table_file.json
python ./src/mk_rank_table.py --server_id xx.xx.xx.3 --server_index 3 --server_count 4 --json_file new_rank_table_file.json
python ./src/mk_rank_table.py \
  --server_id xx.xx.xx.4 --server_index 4 --server_count 4 --json_file new_rank_table_file.json
```

##### Output file format

```json
{
    "server_count": "...",  # total number of nodes
    # the first server in server_list is the master node
    "server_list": [
        {
            "device": [
                {
                    "device_id": "...",  # local index of this card, in [0, cards per machine)
                    "device_ip": "...",  # IP address of this card, obtainable via hccn_tool
                    "rank_id": "..."     # global index of this card, in [0, total cards)
                },
                ...
            ],
            "server_id": "...",    # IP address of this node
            "container_ip": "..."  # container IP (needed for service deployment); same as server_id unless specially configured
        },
        ...
    ],
    "status": "completed",
    "version": "1.0"
}
```

## One-click startup

> [!CAUTION]
>
> With the **no-shared-storage approach** (generating the rank_table_file via ssh_rank_table.py), you can use the **one-click startup [script](./ssh_start_mindie.sh); there is no need to start the image and service manually**
>
> ```bash
> vim ./mindie-ascend-deploy/ssh_start_mindie.sh
> # Edit the configuration
> USER="root"
> IMAGE="swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie:2.0.T3-800I-A2-py311-openeuler24.03-lts"
> MOUNT_DIR="/data01/deepseek"
> CONTAINER_DIR="/deepseek"
> CONTAINER_NAME="deepseek_atb"
> # Tool path, i.e. CONTAINER_DIR/MINDIE_ASCEND_DEPLOY_DIR
> MINDIE_ASCEND_DEPLOY_DIR="mindie-ascend-deploy"
> # Run on the master node
> bash ./mindie-ascend-deploy/ssh_start_mindie.sh
> ```

## Starting the image

```bash
# On each of the 4 machines, set the container IP to the matching all_ip[:][1] entry in all_config.yaml.
# hostname -i can be used when the node IP equals the container IP and the output is a single IPv4 address; otherwise set it manually.
MIES_CONTAINER_IP=$(hostname -i)
# Keep identical across all 4 machines
docker run -itd -u 0 -e MIES_CONTAINER_IP=$MIES_CONTAINER_IP --ipc=host --network host \
--name deepseek_atb \
--privileged \
--device=/dev/davinci0 \
--device=/dev/davinci1 \
--device=/dev/davinci2 \
--device=/dev/davinci3 \
--device=/dev/davinci4 \
--device=/dev/davinci5 \
--device=/dev/davinci6 \
--device=/dev/davinci7 \
--device=/dev/davinci_manager \
--device=/dev/devmm_svm \
--device=/dev/hisi_hdc \
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
-v /usr/local/sbin:/usr/local/sbin \
-v /data01/deepseek:/deepseek \
swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie:2.0.T3-800I-A2-py311-openeuler24.03-lts \
/bin/bash
# Enter the container (by name or id)
docker exec -it deepseek_atb bash
# Check the environment variable
env | grep MIES_CONTAINER_IP
```

### Starting the service

#### **Dependency preparation**

```bash
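# Optional pre-check (not part of the original steps): see whether jq is
# already installed before choosing one of the install options below.
command -v jq >/dev/null 2>&1 && echo "jq present" || echo "jq missing"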
# Enter the directory where mindie-ascend-deploy is mounted
cd /deepseek/mindie-ascend-deploy
# Python dependencies
pip install -r ./src/requirements.txt
# jq official site: https://jqlang.org/
# Option 1
yum install jq
# or
apt-get install jq
# Option 2
mv ./jq-linux-arm64 ./jq
cp ./jq /usr/bin/
chmod +x /usr/bin/jq
```

#### Custom tuning configuration (optional)

```bash
cd /deepseek/mindie-ascend-deploy
vim ./src/set_env_mindie_all.sh
# Tune the following as needed; entries left empty keep the original configuration:
#   npuMemSize:
#   maxSeqLen:
#   maxPrefillBatchSize:
#   maxPrefillTokens:
#   maxIterTimes:
#   supportSelectBatch:
#   maxBatchSize:
#   maxInputTokenLen:
#   tokenTimeout:
#   e2eTimeout:
# Parallelism strategy (may be unsupported in older images):
#   tp:
#   dp:
#   moe_ep:
#   moe_tp:
```

#### Set environment variables and start automatically

**The script loads [all_config.yaml](./src/all_config.yaml) by default**

```bash
# Enter the directory where mindie-ascend-deploy is mounted
cd /deepseek/mindie-ascend-deploy
# Start the service on each of the 4 machines
bash start_mindie.sh
# Stop the service
pkill -9 -f 'mindie|python'
```

### Test requests

- **V3 request**

```bash
curl -w "\ntime_total=%{time_total}\n" -H "Accept: application/json" -H "Content-type: application/json" -X POST -d '{"inputs": "<|begin▁of▁sentence|><|User|>What is the difference between light soy sauce and dark soy sauce?<|Assistant|>", "parameters": {"do_sample": false, "max_new_tokens": 512}, "stream": false}' http://xxx.xxx.xxx.xxx:1025/generate &
```

- **R1 request with deep thinking**

```bash
curl -w "\ntime_total=%{time_total}\n" -H "Accept: application/json" -H "Content-type: application/json" -X POST -d '{"inputs": "<|begin▁of▁sentence|><|User|>What is the difference between light soy sauce and dark soy sauce?\nPlease reason step by step, and put your final answer within \\boxed{}.<|Assistant|>", "parameters": {"do_sample": false, "max_new_tokens": 512}, "stream": false}' http://xxx.xxx.xxx.xxx:1025/generate &
```

- **R1 OpenAI-style API request; [batch load-test script](./src/batch_request.sh)**

```bash
# Streaming output
curl -w "\ntime_total=%{time_total}\n" -H "Accept: application/json" -H "Content-type: application/json" -X POST -d '{"model":"deepseekr1","messages":[{"role":"user","content":"Every morning Aya goes for a $9$-kilometer-long walk and stops at a coffee shop afterwards. When she walks at a constant speed of $s$ kilometers per hour, the walk takes her $4$ hours, including $t$ minutes spent in the coffee shop.
When she walks $s+2$ kilometers per hour, the walk takes her $2$ hours and $24$ minutes, including $t$ minutes spent in the coffee shop. Suppose Aya walks at $s+\\frac{1}{2}$ kilometers per hour. Find the number of minutes the walk takes her, including the $t$ minutes spent in the coffee shop. \nPlease reason step by step, and put your final answer within \\boxed{}."}], "stream": true,"max_tokens": 512 }' http://xxx.xxx.xxx.xxx:1025/v1/chat/completions
# Health check; scheduling a request every hour is recommended to keep the service from going idle
curl -w "\ntime_total=%{time_total}\n" -H "Accept: application/json" -H "Content-type: application/json" -X GET http://xxx.xxx.xxx.xxx:1026/health
```

- **To stop requests, see the [batch stop-inference documentation](./stop_reasoning.md) and the script [stop_inference.py](./src/stop_inference.py)**

```bash
# Send a stop-inference request
curl -w "\ntime_total=%{time_total}\n" -H "Accept: application/json" -H "Content-type: application/json" -X POST -d '{"id":"endpoint_common_3087"}' http://xxx.xxx.xxx.xxx:xxxx/v2/models/deepseekr1/stopInfer
```

- **Benchmark testing**

```bash
# Find the benchmark paths
pip show benchmark mindiebenchmark
# Adjust permissions
chmod 640 /usr/local/lib/python3.11/site-packages/mindiebenchmark/config/*
chmod 640 /usr/local/lib/python3.11/site-packages/mindieclient/python/config/config.json
# Edit /usr/local/lib/python3.11/site-packages/mindiebenchmark/config/synthetic_config.json
'''
{
    "Input": {
        "Method": "uniform",
        "Params": {"MinValue": 128, "MaxValue": 128}
    },
    "Output": {
        "Method": "gaussian",
        "Params": {"Mean": 2048, "Var": 2048, "MinValue": 2048, "MaxValue": 2048}
    },
    "RequestCount": 512
}
'''
# Start the performance test
# TestType can be vllm_client or openai
TestType=vllm_client
export MINDIE_LOG_TO_STDOUT="benchmark:1; client:1"
benchmark \
--DatasetType "synthetic" \
--ModelName deepseekr1 \
--ModelPath "/deepseek/DeepSeeK-R1-bf16" \
--TestType ${TestType} \
--Http http://xxx.xxx.xxx.xxx:1025 \
--ManagementHttp http://xxx.xxx.xxx.xxx:1026 \
--Concurrency 512 \
--TaskKind stream \
--Tokenizer True \
--SyntheticConfigPath /usr/local/lib/python3.11/site-packages/mindiebenchmark/config/synthetic_config.json
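# Optional check (not in the original steps): after hand-editing
# synthetic_config.json, confirm it is still valid JSON before launching
# a long benchmark run. Path matches the config file edited above.
CFG=/usr/local/lib/python3.11/site-packages/mindiebenchmark/config/synthetic_config.json
python3 -m json.tool "$CFG" >/dev/null 2>&1 && echo "config OK" || echo "config INVALID"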
# Manually download the mmlu dataset, for example
cd /usr/local/Ascend/atb-models/tests/modeltest
wget -P temp_data/mmlu/ https://people.eecs.berkeley.edu/~hendrycks/data.tar
python3 scripts/data_prepare.py --dataset_name mmlu
# Or apply for access
git clone https://modelers.cn/MindIE/data.git
# Start the accuracy test
TestType=vllm_client
benchmark \
--DatasetType "mmlu" \
--DatasetPath "/usr/local/Ascend/atb-models/tests/modeltest/data/mmlu" \
--ModelName deepseekr1 \
--ModelPath "/deepseek/DeepSeeK-R1-bf16" \
--TestType ${TestType} \
--Http http://xxx.xxx.xxx.xxx:1025 \
--ManagementHttp http://xxx.xxx.xxx.xxx:1026 \
--Concurrency 512 \
--MaxOutputLen 8192 \
--TaskKind stream \
--Tokenizer True \
--TestAccuracy True
```

## Troubleshooting notes

- **When starting from a shared storage volume, the full service startup normally completes within 40 minutes; monitor CPU and NPU status during this time**
- **Insufficient free CPU memory**

```bash
# After confirming no legitimate process is using the memory, release it manually;
# 3 drops the page cache, dentries, and inode caches at once
sudo sync; echo 3 | sudo tee /proc/sys/vm/drop_caches
```

- **Compute-power slicing must be disabled on the host**

```bash
# Check whether vNPU is enabled
ll /dev | grep davinci
# Manually destroy any device whose name starts with "v"
npu-smi set -t destroy-vnpu -i <sliced device id> -c <chip id> -v <vNPU id>
# Example: destroy vNPU 103 on chip 0 of device 0
npu-smi set -t destroy-vnpu -i 0 -c 0 -v 103
```

- **Residual memory on a card with no process attached**

```bash
# Reset the NPU card
npu-smi set -t reset -i 0 -c 0
# Query NPU alarms
npu-smi info -t health -i 0 -c 0
```

- **RPC errors across machines**

```bash
# Check the firewall
sudo systemctl status firewalld
# Temporarily stop the firewall
sudo systemctl stop firewalld
# or
ufw disable
# Stop iptables
service iptables stop
# Open the multiNodesInferPort port from the service config, e.g. multiNodesInferPort: 1120
```

- **Permission problems**

```bash
# Without root access, change the model directory's owner group and permissions
chown -R HwHiAiUser:HwHiAiUser /deepseek/DeepSeeK-R1-bf16
chmod -R 640 /deepseek/DeepSeeK-R1-bf16
```

- **Model problems**

```bash
# Verify that the model files are complete by comparing checksums against the download repository
sha256sum /deepseek/DeepSeeK-R1-bf16/*
# or
md5sum /deepseek/DeepSeeK-R1-bf16/*
# Check the Transformers version
# DeepSeek-R1: 4.46.3, DeepSeek-V3: 4.33.1
pip show transformers
```

- **Log capture**

```bash
# Extract lines between two timestamps; write to a different file than the input,
# otherwise the redirection truncates the log before awk reads it
awk '/start timestamp/,/end timestamp/ {print;}' xxx.log > out.log
sed -n '/start timestamp/,$p' xxx.log > out.log
```

- **To work around MindIE 'ascii' decode errors: export PYTHONIOENCODING=utf-8**
- **In the current version, inconsistent maxSeqLen and maxPrefillTokens may cause the service to hang**
- **In the current version, enabling HCCL_OP_EXPANSION_MODE="AIV" is unstable and not recommended**
- 
**If errors occur, first check that the environment and service configuration match expectations; a plain-model run can be used to validate the multi-node setup**
- **Logs to look at**: console output, /root/mindie/log/debug/, /usr/local/Ascend/mindie/latest/mindie-llm/logs, /root/atb/log, /root/ascend/log/debug/plog
- **Enabling logs:**

```bash
export ASDOPS_LOG_LEVEL=INFO
export ASDOPS_LOG_TO_FILE=1
export ATB_LOG_TO_FILE=1
export ATB_LOG_LEVEL=INFO
export MINDIE_LLM_LOG_TO_FILE=1
export MINDIE_LOG_TO_STDOUT=1
export MINDIE_LOG_TO_FILE=1
```

- **Enabling full logging (affects performance):**

```bash
export ATB_LOG_LEVEL=ERROR
export ATB_LOG_TO_FILE=1
export ATB_LOG_TO_STDOUT=1
export ASDOPS_LOG_LEVEL=ERROR
export ASDOPS_LOG_TO_FILE=1
export ASDOPS_LOG_TO_STDOUT=1
export MINDIE_LOG_TO_STDOUT=1
export MINDIE_LLM_LOG_TO_FILE=1
export MINDIE_LOG_TO_FILE=1
export ASCEND_GLOBAL_LOG_LEVEL=0
export ASCEND_SLOG_PRINT_TO_STDOUT=1
```

- **Related documentation worth consulting; pay attention to the FAQ sections**

  [MindIE/DeepSeek-R1 | Modelers community](https://modelers.cn/models/MindIE/DeepSeek-R1)

  [MindIE/LLM/DeepSeek/DeepSeek-R1/README.md · Ascend/ModelZoo-PyTorch - Gitee.com](https://gitee.com/ascend/ModelZoo-PyTorch/blob/master/MindIE/LLM/DeepSeek/DeepSeek-R1/README.md)

  [MindIE/LLM/DeepSeek/DeepSeek-V3/README.md · Ascend/ModelZoo-PyTorch - Gitee.com](https://gitee.com/ascend/ModelZoo-PyTorch/blob/master/MindIE/LLM/DeepSeek/DeepSeek-V3/README.md)

  [Configuration parameters - MindIE LLM Developer Guide - MindIE 1.0.0 docs - Ascend Community](https://www.hiascend.com/document/detail/zh/mindie/100/mindiellm/llmdev/mindie_llm0004.html)

  [Inference API (OpenAI-compatible) - MindIE Service Developer Guide - MindIE 1.0.0 docs - Ascend Community](https://www.hiascend.com/document/detail/zh/mindie/100/mindieservice/servicedev/mindie_service0076.html)
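The awk/sed one-liners in the log-capture notes above redirect output back onto a file; a small wrapper can make that safer by refusing to overwrite its own input. This is an illustrative sketch (the function name `extract_log` and the marker/file names are hypothetical, not part of the repository):

```bash
# Illustrative helper around the awk one-liner from the log-capture notes:
# extracts lines from the first line containing the start marker through the
# first line containing the end marker, and refuses to write onto the input.
extract_log() {
  local src="$1" start="$2" end="$3" dst="$4"
  if [ "$src" = "$dst" ]; then
    echo "refusing to overwrite input file" >&2
    return 1
  fi
  # index() matches markers as fixed strings, avoiding regex surprises in timestamps
  awk -v s="$start" -v e="$end" 'index($0, s) {on=1} on {print} on && index($0, e) {exit}' "$src" > "$dst"
}
```

Usage mirrors the one-liners above, e.g. `extract_log service.log "10:05:00" "10:06:00" out.log`, with the difference that passing the same file for input and output fails fast instead of truncating the log.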