diff --git a/docs/vllm_mindspore/docs/source_en/faqs/faqs.md b/docs/vllm_mindspore/docs/source_en/faqs/faqs.md
index f6a038ba90bf5b9ab57639527ed77aa857eac220..c7812014083e9d2a80af4d33c4aa6b87b3cdccaa 100644
--- a/docs/vllm_mindspore/docs/source_en/faqs/faqs.md
+++ b/docs/vllm_mindspore/docs/source_en/faqs/faqs.md
@@ -50,7 +50,7 @@ load_ckpt_format: "safetensors"
 ```

-### `aclnnNonzeroV2` Related Error When Starting Online Service
+### `aclnnNonzeroV2` Related Error When Starting Online Inference

 - Key error message:

diff --git a/docs/vllm_mindspore/docs/source_en/getting_started/installation/installation.md b/docs/vllm_mindspore/docs/source_en/getting_started/installation/installation.md
index d09dd30010dc036732270214dd2cb3fe46397ef2..08f21d571859fcd30ee819916809b98483eb9cea 100644
--- a/docs/vllm_mindspore/docs/source_en/getting_started/installation/installation.md
+++ b/docs/vllm_mindspore/docs/source_en/getting_started/installation/installation.md
@@ -32,27 +32,36 @@ This section introduces three installation methods: [Docker Installation](#docke

 We recommend using Docker for quick deployment of the vLLM MindSpore environment. Below are the steps:

-#### Pulling the Image
+#### Building the Image

-Execute the following command to pull the vLLM MindSpore Docker image:
+Users can execute the following commands to clone the vLLM MindSpore code repository and build the image:

 ```bash
-docker pull hub.oepkgs.net/oedeploy/openeuler/aarch64/mindspore:latest
+git clone https://gitee.com/mindspore/vllm-mindspore.git
+cd vllm-mindspore
+bash build_image.sh
 ```

-During the pull process, user will see the progress of each layer. After successful completion, check the image by executing the following command:
+After a successful build, users will get the following output:
+
+```text
+Successfully built e40bcbeae9fc
+Successfully tagged vllm_ms_20250726:latest
+```
+
+Here, `e40bcbeae9fc` is the image ID, and `vllm_ms_20250726:latest` is the image name and tag. Users can run the following command to confirm that the Docker image has been created successfully:

 ```bash
 docker images
-``` 
+```

 #### Creating a Container

-After [pulling the image](#pulling-the-image), set `DOCKER_NAME` and `IMAGE_NAME` as the container and image names, then execute the following command to create the container:
+After [building the image](#building-the-image), set `DOCKER_NAME` and `IMAGE_NAME` as the container and image names, then execute the following command to create the container:

 ```bash
 export DOCKER_NAME=vllm-mindspore-container # your container name
-export IMAGE_NAME=hub.oepkgs.net/oedeploy/openeuler/aarch64/mindspore:latest # your image name
+export IMAGE_NAME=vllm_ms_20250726:latest # your image name

 docker run -itd --name=${DOCKER_NAME} --ipc=host --network=host --privileged=true \
         --device=/dev/davinci0 \
@@ -189,4 +198,4 @@ Prompt: 'Today is'. Generated text: ' the 100th day of school. To celebrate, the
 Prompt: 'Llama is'. Generated text: ' a 100% natural, biodegradable, and compostable alternative'
 ```

-Alternatively, refer to the [Quick Start](../quick_start/quick_start.md) guide for [online serving](../quick_start/quick_start.md#online-serving) verification.
+Alternatively, refer to the [Quick Start](../quick_start/quick_start.md) guide for [online inference](../quick_start/quick_start.md#online-inference) verification.
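Once the container from the `docker run` command above is running, it can help to confirm that the Ascend devices are actually visible inside it. This is a minimal sanity-check sketch, assuming the `DOCKER_NAME` exported above and the `npu-smi` binary mounted into the container by the `docker run` command shown in the diff:

```bash
# Enter the running container (name as exported above).
docker exec -it ${DOCKER_NAME} bash
# Inside the container: list the NPUs; the mapped davinci devices should appear.
npu-smi info
```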
diff --git a/docs/vllm_mindspore/docs/source_en/getting_started/quick_start/quick_start.md b/docs/vllm_mindspore/docs/source_en/getting_started/quick_start/quick_start.md
index 1c9dd24786578555bcc30ba76ab78818a6749e51..c0eaf16c347299abc75285ffc1d57c50403d5fee 100644
--- a/docs/vllm_mindspore/docs/source_en/getting_started/quick_start/quick_start.md
+++ b/docs/vllm_mindspore/docs/source_en/getting_started/quick_start/quick_start.md
@@ -8,27 +8,36 @@ This document provides a quick guide to deploy vLLM MindSpore by [docker](https:

 In this section, we recommend to use docker to deploy the vLLM MindSpore environment. The following sections are the steps for deployment:

-### Pulling the Image
+### Building the Image

-Pull the vLLM MindSpore docker image by executing the following command:
+Users can execute the following commands to clone the vLLM MindSpore code repository and build the image:

 ```bash
-docker pull hub.oepkgs.net/oedeploy/openeuler/aarch64/mindspore:latest
+git clone https://gitee.com/mindspore/vllm-mindspore.git
+cd vllm-mindspore
+bash build_image.sh
 ```

-During the pull process, user will see the progress of each layer of the docker image. User can verify the image by executing the following command:
+After a successful build, users will get the following output:
+
+```text
+Successfully built e40bcbeae9fc
+Successfully tagged vllm_ms_20250726:latest
+```
+
+Here, `e40bcbeae9fc` is the image ID, and `vllm_ms_20250726:latest` is the image name and tag. Users can run the following command to confirm that the Docker image has been created successfully:

 ```bash
 docker images
-``` 
+```

 ### Creating a Container

-After [pulling the image](#pulling-the-image), set `DOCKER_NAME` and `IMAGE_NAME` as the container and image names, and create the container by running:
+After [building the image](#building-the-image), set `DOCKER_NAME` and `IMAGE_NAME` as the container and image names, and create the container by running:

 ```bash
 export DOCKER_NAME=vllm-mindspore-container # your container name
-export IMAGE_NAME=hub.oepkgs.net/oedeploy/openeuler/aarch64/mindspore:latest # your image name
+export IMAGE_NAME=vllm_ms_20250726:latest # your image name

 docker run -itd --name=${DOCKER_NAME} --ipc=host --network=host --privileged=true \
         --device=/dev/davinci0 \
@@ -74,7 +83,7 @@ docker exec -it $DOCKER_NAME bash

 ## Using the Service

-After deploying the environment, user need to prepare the model files before running the model. Refer to the [Download Model](#downloading-model) section for guidance. After [setting environment variables](#setting-environment-variables), user can experience the model bt [offline inference](#offline-inference) or [online serving](#online-serving).
+After deploying the environment, users need to prepare the model files before running the model. Refer to the [Downloading Model](#downloading-model) section for guidance. After [setting environment variables](#setting-environment-variables), users can experience the model by [offline inference](#offline-inference) or [online inference](#online-inference).

 ### Downloading Model

@@ -184,7 +193,7 @@ Prompt: 'Llama is'. Generated text: ' a 100% natural, biodegradable, and compost

 ### Online Inference

-vLLM MindSpore supports online serving deployment with the OpenAI API protocol. The following section would introduce how to [starting the service](#starting-the-service) and [send requests](#sending-requests) to obtain inference results, using [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) as an example.
+vLLM MindSpore supports online inference deployment with the OpenAI API protocol. The following sections introduce how to [start the service](#starting-the-service) and [send requests](#sending-requests) to obtain inference results, using [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) as an example.

 #### Starting the Service
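Because the service speaks the OpenAI API protocol, a quick way to exercise it after startup is a plain `curl` call. This is a sketch only: the address `127.0.0.1:8000` and the served model name are assumptions that must match the actual launch parameters.

```bash
# Hypothetical completion request against the OpenAI-compatible endpoint.
curl http://127.0.0.1:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Qwen/Qwen2.5-7B-Instruct",
        "prompt": "I am",
        "max_tokens": 32,
        "temperature": 0
    }'
```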
diff --git a/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md b/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md
index a84c39d3bb920db4d868ad5d2dca386af337414d..7a3a9fb4a83667a8d74de50b205d55a4a8a7a928 100644
--- a/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md
+++ b/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md
@@ -6,51 +6,38 @@ vLLM MindSpore supports hybrid parallel inference with configurations of tensor

 This document uses the DeepSeek R1 671B W8A8 model as an example to introduce the inference workflows for [tensor parallelism (TP16)](#tp16-tensor-parallel-inference) and [hybrid parallelism](#hybrid-parallel-inference). The DeepSeek R1 671B W8A8 model requires multiple nodes to run inference. To ensure consistent execution configurations (including model configuration file paths, Python environments, etc.) across all nodes, it is recommended to use Docker containers to eliminate execution differences.

-Users can configure the environment by following the [Creating a Container](#creating-a-container) section below or referring to the [Installation Guide](../../installation/installation.md#installation-guide).
+Users can configure the environment by following the [Docker Installation](#docker-installation) section below.

-## Creating a Container
+## Docker Installation
+
+In this section, we recommend using Docker to deploy the vLLM MindSpore environment. The following are the steps for deployment:
+
+### Building the Image
+
+Users can execute the following commands to clone the vLLM MindSpore code repository and build the image:

 ```bash
-docker pull hub.oepkgs.net/oedeploy/openeuler/aarch64/mindspore:latest
-
-# Create Docker containers on the master and worker nodes respectively
-docker run -itd --name=mindspore_vllm --ipc=host --network=host --privileged=true \
-        --device=/dev/davinci0 \
-        --device=/dev/davinci1 \
-        --device=/dev/davinci2 \
-        --device=/dev/davinci3 \
-        --device=/dev/davinci4 \
-        --device=/dev/davinci5 \
-        --device=/dev/davinci6 \
-        --device=/dev/davinci7 \
-        --device=/dev/davinci_manager \
-        --device=/dev/devmm_svm \
-        --device=/dev/hisi_hdc \
-        -v /usr/local/sbin/:/usr/local/sbin/ \
-        -v /var/log/npu/slog/:/var/log/npu/slog \
-        -v /var/log/npu/profiling/:/var/log/npu/profiling \
-        -v /var/log/npu/dump/:/var/log/npu/dump \
-        -v /var/log/npu/:/usr/slog \
-        -v /etc/hccn.conf:/etc/hccn.conf \
-        -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-        -v /usr/local/dcmi:/usr/local/dcmi \
-        -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
-        -v /etc/ascend_install.info:/etc/ascend_install.info \
-        -v /etc/vnpu.cfg:/etc/vnpu.cfg \
-        --shm-size="250g" \
-        hub.oepkgs.net/oedeploy/openeuler/aarch64/mindspore:latest \
-        bash
+git clone https://gitee.com/mindspore/vllm-mindspore.git
+cd vllm-mindspore
+bash build_image.sh
 ```

-After successfully creating the container, the container ID will be returned. Users can execute the following command to verify whether the container was created successfully:
+After a successful build, users will get the following output:
+
+```text
+Successfully built e40bcbeae9fc
+Successfully tagged vllm_ms_20250726:latest
+```
+
+Here, `e40bcbeae9fc` is the image ID, and `vllm_ms_20250726:latest` is the image name and tag. Users can run the following command to confirm that the Docker image has been created successfully:

 ```bash
-docker ps
-```
+docker images
+```

 ### Entering the Container

-After completing the [Creating a Container](#creating-a-container) step, use the predefined environment variable `DOCKER_NAME` to start and enter the container:
+After completing the [building the image](#building-the-image) step, use the predefined environment variable `DOCKER_NAME` to start and enter the container:

 ```bash
 docker exec -it $DOCKER_NAME bash
@@ -236,7 +223,7 @@ Before managing a multi-node cluster, ensure that the hostnames of all nodes are

 #### Starting the Service

-vLLM MindSpore can deploy online services using the OpenAI API protocol. Below is the workflow for launching the service.
+vLLM MindSpore can deploy an online inference service using the OpenAI API protocol. Below is the workflow for launching the service.

 ```bash
 # Service launch parameter explanation
@@ -315,7 +302,7 @@ parallel_config:

 ### Online Inference

-`vllm-mindspore` can deploy online services using the OpenAI API protocol. Below is the workflow for launching the service:
+`vllm-mindspore` can deploy an online inference service using the OpenAI API protocol. Below is the workflow for launching the service:

 ```bash
 # Parameter explanations for service launch
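Because every node of the multi-node deployment must run the same image, a built image can be copied to the worker nodes instead of being rebuilt on each one. A sketch under assumptions: the tag matches the build output above, and `WORKER_IP` is a placeholder for a reachable worker node:

```bash
# Export the locally built image, copy it to a worker node, and load it there.
docker save vllm_ms_20250726:latest -o vllm_ms.tar
scp vllm_ms.tar root@WORKER_IP:/tmp/
ssh root@WORKER_IP "docker load -i /tmp/vllm_ms.tar"
```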
diff --git a/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md b/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md
index 71e7bee046022dd10fcf91552ae3cdf71e377a7b..31f33affcfb5eb6258fd82262c4f06bdc7defe0e 100644
--- a/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md
+++ b/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md
@@ -8,23 +8,32 @@ This document introduces single-node multi-card inference process by vLLM MindSp

 In this section, we recommend using Docker for quick deployment of the vLLM MindSpore environment. Below are the steps for Docker deployment:

-### Pulling the Image
+### Building the Image

-Pull the vLLM MindSpore Docker image by executing the following command:
+Users can execute the following commands to clone the vLLM MindSpore code repository and build the image:

 ```bash
-docker pull hub.oepkgs.net/oedeploy/openeuler/aarch64/mindspore:latest
+git clone https://gitee.com/mindspore/vllm-mindspore.git
+cd vllm-mindspore
+bash build_image.sh
 ```

-During the pull process, user will see the progress of each layer. After successful completion, use can also check the image by running:
+After a successful build, users will get the following output:
+
+```text
+Successfully built e40bcbeae9fc
+Successfully tagged vllm_ms_20250726:latest
+```
+
+Here, `e40bcbeae9fc` is the image ID, and `vllm_ms_20250726:latest` is the image name and tag. Users can run the following command to confirm that the Docker image has been created successfully:

 ```bash
 docker images
-``` 
+```

 ### Creating a Container

-After [pulling the image](#pulling-the-image), set `DOCKER_NAME` and `IMAGE_NAME` as the container and image names, then create the container:
+After [building the image](#building-the-image), set `DOCKER_NAME` and `IMAGE_NAME` as the container and image names, then create the container:

 ```bash
 export DOCKER_NAME=vllm-mindspore-container # your container name
@@ -140,7 +149,7 @@ export ASCEND_RT_VISIBLE_DEVICES=4,5,6,7

 ## Online Inference

-vLLM MindSpore supports online serving deployment with the OpenAI API protocol. The following section would introduce how to [starting the service](#starting-the-service) and [send requests](#sending-requests) to obtain inference results, using [Qwen2.5-32B](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) as an example.
+vLLM MindSpore supports online inference deployment with the OpenAI API protocol. The following sections introduce how to [start the service](#starting-the-service) and [send requests](#sending-requests) to obtain inference results, using [Qwen2.5-32B](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) as an example.

 ### Starting the Service

diff --git a/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md b/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md
index f17c8e589a49628713b16cffb4a7bf06d1877cfe..a6b4b924dc914f9ccf038f039f3e97002f8e904c 100644
--- a/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md
+++ b/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md
@@ -8,23 +8,32 @@ This document introduces single NPU inference process by vLLM MindSpore. Taking

 In this section, we recommend using Docker for quick deployment of the vLLM MindSpore environment. Below are the steps for Docker deployment:

-### Pulling the Image
+### Building the Image

-Pull the vLLM MindSpore Docker image by executing the following command:
+Users can execute the following commands to clone the vLLM MindSpore code repository and build the image:

 ```bash
-docker pull hub.oepkgs.net/oedeploy/openeuler/aarch64/mindspore:latest
+git clone https://gitee.com/mindspore/vllm-mindspore.git
+cd vllm-mindspore
+bash build_image.sh
 ```

-During the pull process, user will see the progress of each layer. After successful completion, use can also check the image by running:
+After a successful build, users will get the following output:
+
+```text
+Successfully built e40bcbeae9fc
+Successfully tagged vllm_ms_20250726:latest
+```
+
+Here, `e40bcbeae9fc` is the image ID, and `vllm_ms_20250726:latest` is the image name and tag. Users can run the following command to confirm that the Docker image has been created successfully:

 ```bash
 docker images
-``` 
+```

 ### Creating a Container

-After [pulling the image](#pulling-the-image), set `DOCKER_NAME` and `IMAGE_NAME` as the container and image names, then create the container:
+After [building the image](#building-the-image), set `DOCKER_NAME` and `IMAGE_NAME` as the container and image names, then create the container:

 ```bash
 export DOCKER_NAME=vllm-mindspore-container # your container name
-export IMAGE_NAME=hub.oepkgs.net/oedeploy/openeuler/aarch64/mindspore:latest # your image name
+export IMAGE_NAME=vllm_ms_20250726:latest # your image name

 docker run -itd --name=${DOCKER_NAME} --ipc=host --network=host --privileged=true \
         --device=/dev/davinci0 \
@@ -178,7 +187,7 @@ Prompt: 'Llama is'. Generated text: ' a 100% natural, biodegradable, and compost

 ## Online Inference

-vLLM MindSpore supports online serving deployment with the OpenAI API protocol. The following section would introduce how to [starting the service](#starting-the-service) and [send requests](#sending-requests) to obtain inference results, using [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) as an example.
+vLLM MindSpore supports online inference deployment with the OpenAI API protocol. The following sections introduce how to [start the service](#starting-the-service) and [send requests](#sending-requests) to obtain inference results, using [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) as an example.

 ### Starting the Service

diff --git a/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/benchmark/benchmark.md b/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/benchmark/benchmark.md
index 78a253a5eac2128b457ebdb78599cf51e0d57793..c486016bad58e965b06b9fe3cd4950322b72ae8a 100644
--- a/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/benchmark/benchmark.md
+++ b/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/benchmark/benchmark.md
@@ -15,13 +15,13 @@ export vLLM_MODEL_MEMORY_USE_GB=32 # Memory reserved for model execution. Set ac
 export MINDFORMERS_MODEL_CONFIG=$YAML_PATH # Set the corresponding MindSpore Transformers model's YAML file.
 ```

-then start the online service with the following command:
+then start online inference with the following command:

 ```bash
 vllm-mindspore serve Qwen/Qwen2.5-7B-Instruct --device auto --disable-log-requests
 ```

-For multi-card inference, we take [Qwen2.5-32B](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) as an example. You can prepare the environment by following the guide [Multi-Card Inference (Qwen2.5-32B)](../../../getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md#online-inference), then start the online service with the following command:
+For multi-card inference, we take [Qwen2.5-32B](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) as an example. You can prepare the environment by following the guide [Multi-Card Inference (Qwen2.5-32B)](../../../getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md#online-inference), then start online inference with the following command:

 ```bash
 export TENSOR_PARALLEL_SIZE=4
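After the service is up, throughput and latency can be measured with vLLM's serving benchmark script. The sketch below is an assumption-laden example: the script ships in the vLLM source tree and its flags vary between vLLM versions, so adjust the path and options to the installed release.

```bash
# Sketch: drive the OpenAI-compatible endpoint with vLLM's benchmark_serving.py.
# Host, port, and served model name must match the actual launch parameters.
python3 benchmarks/benchmark_serving.py \
    --backend openai \
    --host 127.0.0.1 --port 8000 \
    --model Qwen/Qwen2.5-7B-Instruct \
    --dataset-name random \
    --num-prompts 200
```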
diff --git a/docs/vllm_mindspore/docs/source_zh_cn/faqs/faqs.md b/docs/vllm_mindspore/docs/source_zh_cn/faqs/faqs.md
index 592302dc9f0593f34fab896906c965a95793c6fc..c68420f33006ab2f8282b630cf94c5fa779fa439 100644
--- a/docs/vllm_mindspore/docs/source_zh_cn/faqs/faqs.md
+++ b/docs/vllm_mindspore/docs/source_zh_cn/faqs/faqs.md
@@ -50,7 +50,7 @@ load_ckpt_format: "safetensors"
 ```

-### 拉起在线服务时,报`aclnnNonzeroV2`相关错误
+### 拉起在线推理时,报`aclnnNonzeroV2`相关错误

 - 错误关键信息:

diff --git a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/installation/installation.md b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/installation/installation.md
index f21233e8bf797d029ee923025dc326fb9c328b03..cc164aecc15040173411f72cd0f826ae819b0eb7 100644
--- a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/installation/installation.md
+++ b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/installation/installation.md
@@ -5,7 +5,6 @@

-本文档将介绍安装vLLM MindSpore环境的操作步骤。分为三种安装方式:
+本文档将介绍安装vLLM MindSpore环境的操作步骤。分为两种安装方式:

 - [docker安装](#docker安装):适合用户快速使用的场景;
-- [pip安装](#pip安装):适合用户需要指定安装版本的场景;
 - [源码安装](#源码安装):适合用户有增量开发vLLM MindSpore的场景。

 ## 版本配套

@@ -32,15 +31,24 @@

 在本章节中,我们推荐用docker创建的方式,以快速部署vLLM MindSpore环境,以下是部署docker的步骤介绍:

-#### 拉取镜像
+#### 构建镜像

-用户可执行以下命令,拉取vLLM MindSpore的docker镜像:
+用户可执行以下命令,拉取vLLM MindSpore代码仓库,并构建镜像:

 ```bash
-docker pull hub.oepkgs.net/oedeploy/openeuler/aarch64/mindspore:latest
+git clone https://gitee.com/mindspore/vllm-mindspore.git
+cd vllm-mindspore
+bash build_image.sh
 ```

-拉取过程中,用户将看到docker镜像各layer的拉取进度。拉取成功后,用户可执行以下命令,确认docker镜像拉取成功:
+构建成功后,用户可以得到以下信息:
+
+```text
+Successfully built e40bcbeae9fc
+Successfully tagged vllm_ms_20250726:latest
+```
+
+其中,`e40bcbeae9fc`为镜像ID,`vllm_ms_20250726:latest`为镜像名与tag。用户可执行以下命令,确认docker镜像创建成功:

 ```bash
 docker images
@@ -48,11 +56,11 @@ docker images

 #### 新建容器

-用户在完成[拉取镜像](#拉取镜像)后,设置`DOCKER_NAME`与`IMAGE_NAME`以设置容器名与镜像名,并执行以下命令,以新建容器:
+用户在完成[构建镜像](#构建镜像)后,设置`DOCKER_NAME`与`IMAGE_NAME`以设置容器名与镜像名,并执行以下命令,以新建容器:

 ```bash
 export DOCKER_NAME=vllm-mindspore-container # your container name
-export IMAGE_NAME=hub.oepkgs.net/oedeploy/openeuler/aarch64/mindspore:latest # your image name
+export IMAGE_NAME=vllm_ms_20250726:latest # your image name

 docker run -itd --name=${DOCKER_NAME} --ipc=host --network=host --privileged=true \
         --device=/dev/davinci0 \
@@ -189,4 +197,4 @@ Prompt: 'Today is'. Generated text: ' the 100th day of school. To celebrate, the
 Prompt: 'Llama is'. Generated text: ' a 100% natural, biodegradable, and compostable alternative'
 ```

-用户也可以参考[快速开始](../quick_start/quick_start.md)章节,使用[在线服务](../quick_start/quick_start.md#在线服务)的方式进行验证。
+用户也可以参考[快速开始](../quick_start/quick_start.md)章节,使用[在线推理](../quick_start/quick_start.md#在线推理)的方式进行验证。

diff --git a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/quick_start/quick_start.md b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/quick_start/quick_start.md
index bd5ae6b0db054e1ce599932373ee6290143764f5..2bf629908a752b2c386f752e33355339b9ec6069 100644
--- a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/quick_start/quick_start.md
+++ b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/quick_start/quick_start.md
@@ -2,21 +2,30 @@

 [![查看源文件](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source.svg)](https://gitee.com/mindspore/docs/blob/master/docs/vllm_mindspore/docs/source_zh_cn/getting_started/quick_start/quick_start.md)

-本文档将为用户提供快速指引,以[Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)模型为例,使用[docker](https://www.docker.com/)的安装方式部署vLLM MindSpore,并以[离线推理](#离线推理)与[在线服务](#在线推理)两种方式,快速体验vLLM MindSpore的服务化与推理能力。如用户需要了解更多的安装方式,请参考[安装指南](../installation/installation.md)。
+本文档将为用户提供快速指引,以[Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)模型为例,使用[docker](https://www.docker.com/)的安装方式部署vLLM MindSpore,并以[离线推理](#离线推理)与[在线推理](#在线推理)两种方式,快速体验vLLM MindSpore的服务化与推理能力。如用户需要了解更多的安装方式,请参考[安装指南](../installation/installation.md)。

 ## docker安装

 在本章节中,我们推荐用docker创建的方式,以快速部署vLLM MindSpore环境,以下是部署docker的步骤介绍:

-### 拉取镜像
+### 构建镜像

-拉取vLLM MindSpore的docker镜像。执行以下命令进行拉取:
+用户可执行以下命令,拉取vLLM MindSpore代码仓库,并构建镜像:

 ```bash
-docker pull hub.oepkgs.net/oedeploy/openeuler/aarch64/mindspore:latest
+git clone https://gitee.com/mindspore/vllm-mindspore.git
+cd vllm-mindspore
+bash build_image.sh
 ```

-拉取过程中,用户将看到docker镜像各layer的拉取进度。拉取成功后,用户可执行以下命令,确认docker镜像拉取成功:
+构建成功后,用户可以得到以下信息:
+
+```text
+Successfully built e40bcbeae9fc
+Successfully tagged vllm_ms_20250726:latest
+```
+
+其中,`e40bcbeae9fc`为镜像ID,`vllm_ms_20250726:latest`为镜像名与tag。用户可执行以下命令,确认docker镜像创建成功:

 ```bash
 docker images
@@ -24,11 +33,11 @@ docker images

 ### 新建容器

-用户在完成[拉取镜像](#拉取镜像)后,设置`DOCKER_NAME`与`IMAGE_NAME`为容器名与镜像名,并执行以下命令新建容器:
+用户在完成[构建镜像](#构建镜像)后,设置`DOCKER_NAME`与`IMAGE_NAME`为容器名与镜像名,并执行以下命令新建容器:

 ```bash
 export DOCKER_NAME=vllm-mindspore-container # your container name
-export IMAGE_NAME=hub.oepkgs.net/oedeploy/openeuler/aarch64/mindspore:latest # your image name
+export IMAGE_NAME=vllm_ms_20250726:latest # your image name

 docker run -itd --name=${DOCKER_NAME} --ipc=host --network=host --privileged=true \
         --device=/dev/davinci0 \
@@ -74,7 +83,7 @@ docker exec -it $DOCKER_NAME bash

 ## 使用服务

-用户在环境部署完毕后,在运行模型前,需要准备模型文件,用户可通过[下载模型](#下载模型)章节的指引作模型准备,在[设置环境变量](#设置环境变量)后,可采用[离线推理](#离线推理)或[在线服务](#在线服务)的方式,进行模型体验。
+用户在环境部署完毕后,在运行模型前,需要准备模型文件。用户可通过[下载模型](#下载模型)章节的指引准备模型,在[设置环境变量](#设置环境变量)后,可采用[离线推理](#离线推理)或[在线推理](#在线推理)的方式,进行模型体验。

 ### 下载模型

@@ -184,7 +193,7 @@ Prompt: 'Llama is'. Generated text: ' a 100% natural, biodegradable, and compost

 ### 在线推理

-vLLM MindSpore可使用OpenAI的API协议,进行在线服务部署。以下是以[Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) 为例,介绍模型的[启动服务](#启动服务),并[发送请求](#发送请求),得到在线服务的推理结果。
+vLLM MindSpore可使用OpenAI的API协议,进行在线推理部署。以下以[Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)为例,介绍模型的[启动服务](#启动服务),并[发送请求](#发送请求),得到推理结果。

 #### 启动服务

diff --git a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md
index c5b2ca986f00aced1576388fbd9e96b10fecd1c6..047e7b4aad2f9239abc18c111f8f576867c3a8be 100644
--- a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md
+++ b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md
@@ -6,15 +6,44 @@ vLLM MindSpore支持张量并行(TP)、数据并行(DP)、专家并行

 本文档将以DeepSeek R1 671B W8A8为例介绍[张量并行](#tp16-张量并行推理)及[混合并行](#混合并行推理)推理流程。DeepSeek R1 671B W8A8模型需使用多个节点资源运行推理模型。为确保各个节点的执行配置(包括模型配置文件路径、Python环境等)一致,推荐通过 docker 镜像创建容器的方式避免执行差异。

-用户可通过以下[新建容器](#新建容器)章节或参考[安装指南](../../installation/installation.md#安装指南)进行环境配置。
+用户可通过以下[docker安装](#docker安装)章节进行环境配置。

-## 新建容器
+## docker安装
+
+在本章节中,我们推荐用docker创建的方式,以快速部署vLLM MindSpore环境。以下是部署docker的步骤介绍:
+
+### 构建镜像
+
+用户可执行以下命令,拉取vLLM MindSpore代码仓库,并构建镜像:
+
+```bash
+git clone https://gitee.com/mindspore/vllm-mindspore.git
+cd vllm-mindspore
+bash build_image.sh
+```
+
+构建成功后,用户可以得到以下信息:
+
+```text
+Successfully built e40bcbeae9fc
+Successfully tagged vllm_ms_20250726:latest
+```
+
+其中,`e40bcbeae9fc`为镜像ID,`vllm_ms_20250726:latest`为镜像名与tag。用户可执行以下命令,确认docker镜像创建成功:
+
+```bash
+docker images
+```
+
+### 新建容器
+
+用户在完成[构建镜像](#构建镜像)后,设置`DOCKER_NAME`与`IMAGE_NAME`为容器名与镜像名,并执行以下命令新建容器:

 ```bash
-docker pull hub.oepkgs.net/oedeploy/openeuler/aarch64/mindspore:latest
+export DOCKER_NAME=vllm-mindspore-container # your container name
+export IMAGE_NAME=vllm_ms_20250726:latest # your image name

-# 分别在主从节点新建docker容器
-docker run -itd --name=mindspore_vllm --ipc=host --network=host --privileged=true \
+docker run -itd --name=${DOCKER_NAME} --ipc=host --network=host --privileged=true \
         --device=/dev/davinci0 \
         --device=/dev/davinci1 \
         --device=/dev/davinci2 \
@@ -38,7 +67,7 @@ docker run -itd --name=mindspore_vllm --ipc=host --network=host --privileged=tru
         -v /etc/ascend_install.info:/etc/ascend_install.info \
         -v /etc/vnpu.cfg:/etc/vnpu.cfg \
         --shm-size="250g" \
-        hub.oepkgs.net/oedeploy/openeuler/aarch64/mindspore:latest \
+        ${IMAGE_NAME} \
         bash
 ```

@@ -236,7 +265,7 @@ chmod -R 777 ./Ascend-pyACL_8.0.RC1_linux-aarch64.run

 #### 启动服务

-vLLM MindSpore可使用OpenAI的API协议,部署为在线服务。以下是在线服务的拉起流程。
+vLLM MindSpore可使用OpenAI的API协议,部署为在线推理服务。以下是在线推理的拉起流程。

 ```bash
 # 启动配置参数说明
@@ -316,7 +345,7 @@ parallel_config:

 ### 在线推理

-`vllm-mindspore`可使用OpenAI的API协议部署在线服务。以下是在线服务的拉起流程:
+`vllm-mindspore`可使用OpenAI的API协议部署在线推理服务。以下是在线推理的拉起流程:

 ```bash
 # 启动配置参数说明
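For a large multi-node model such as DeepSeek R1 671B, weight loading can take a long time, so it is worth waiting until the OpenAI-compatible endpoint answers before sending inference requests. A sketch, assuming the default `127.0.0.1:8000` listen address (adjust to the actual launch parameters):

```bash
# Poll the model-list endpoint until the service is ready to accept requests.
until curl -sf http://127.0.0.1:8000/v1/models > /dev/null; do
    sleep 10
done
curl http://127.0.0.1:8000/v1/models
```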
diff --git a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md
index c39ff38372f1a03326331388b5a07e7f08bb3480..66e73eb2ed739fe655605b3b3469ef8ceeb5b3ce 100644
--- a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md
+++ b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md
@@ -8,15 +8,24 @@

 在本章节中,我们推荐用docker创建的方式,以快速部署vLLM MindSpore环境,以下是部署docker的步骤介绍:

-### 拉取镜像
+### 构建镜像

-拉取vLLM MindSpore的docker镜像。执行以下命令进行拉取:
+用户可执行以下命令,拉取vLLM MindSpore代码仓库,并构建镜像:

 ```bash
-docker pull hub.oepkgs.net/oedeploy/openeuler/aarch64/mindspore:latest
+git clone https://gitee.com/mindspore/vllm-mindspore.git
+cd vllm-mindspore
+bash build_image.sh
 ```

-拉取过程中,用户将看到docker镜像各layer的拉取进度。拉取成功后,用户可执行以下命令,确认docker镜像拉取成功:
+构建成功后,用户可以得到以下信息:
+
+```text
+Successfully built e40bcbeae9fc
+Successfully tagged vllm_ms_20250726:latest
+```
+
+其中,`e40bcbeae9fc`为镜像ID,`vllm_ms_20250726:latest`为镜像名与tag。用户可执行以下命令,确认docker镜像创建成功:

 ```bash
 docker images
@@ -24,11 +33,11 @@ docker images

 ### 新建容器

-用户在完成[拉取镜像](#拉取镜像)后,设置`DOCKER_NAME`与`IMAGE_NAME`为容器名与镜像名,并执行以下命令新建容器:
+用户在完成[构建镜像](#构建镜像)后,设置`DOCKER_NAME`与`IMAGE_NAME`为容器名与镜像名,并执行以下命令新建容器:

 ```bash
 export DOCKER_NAME=vllm-mindspore-container # your container name
-export IMAGE_NAME=hub.oepkgs.net/oedeploy/openeuler/aarch64/mindspore:latest # your image name
+export IMAGE_NAME=vllm_ms_20250726:latest # your image name

 docker run -itd --name=${DOCKER_NAME} --ipc=host --network=host --privileged=true \
         --device=/dev/davinci0 \
@@ -141,7 +150,7 @@ export ASCEND_RT_VISIBLE_DEVICES=4,5,6,7

 ## 在线推理

-vLLM MindSpore可使用OpenAI的API协议,部署为在线服务。以下是在线服务的拉起流程。以下是以[Qwen2.5-32B](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) 为例,介绍模型的[启动服务](#启动服务),并[发送请求](#发送请求),得到在线服务的推理结果。
+vLLM MindSpore可使用OpenAI的API协议,部署为在线推理服务。以下以[Qwen2.5-32B](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct)为例,介绍模型的[启动服务](#启动服务),并[发送请求](#发送请求),得到推理结果。

 ### 启动服务

diff --git a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md
index 0c9449f718593157247f490256bc44936964fb12..5eb590016ee253e1ad50a534163bf6e4d42006e2 100644
--- a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md
+++ b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md
@@ -8,15 +8,24 @@

 在本章节中,我们推荐用docker创建的方式,以快速部署vLLM MindSpore环境。以下是部署docker的步骤介绍:

-### 拉取镜像
+### 构建镜像

-拉取vLLM MindSpore的docker镜像。执行以下命令进行拉取:
+用户可执行以下命令,拉取vLLM MindSpore代码仓库,并构建镜像:

 ```bash
-docker pull hub.oepkgs.net/oedeploy/openeuler/aarch64/mindspore:latest
+git clone https://gitee.com/mindspore/vllm-mindspore.git
+cd vllm-mindspore
+bash build_image.sh
 ```

-拉取过程中,用户将看到docker镜像各layer的拉取进度。拉取成功后,用户可执行以下命令,确认docker镜像拉取成功:
+构建成功后,用户可以得到以下信息:
+
+```text
+Successfully built e40bcbeae9fc
+Successfully tagged vllm_ms_20250726:latest
+```
+
+其中,`e40bcbeae9fc`为镜像ID,`vllm_ms_20250726:latest`为镜像名与tag。用户可执行以下命令,确认docker镜像创建成功:

 ```bash
 docker images
@@ -24,11 +33,11 @@ docker images

 ### 新建容器

-用户在完成[拉取镜像](#拉取镜像)后,设置`DOCKER_NAME`与`IMAGE_NAME`为容器名与镜像名,并执行以下命令,以新建容器:
+用户在完成[构建镜像](#构建镜像)后,设置`DOCKER_NAME`与`IMAGE_NAME`为容器名与镜像名,并执行以下命令新建容器:

 ```bash
 export DOCKER_NAME=vllm-mindspore-container # your container name
-export IMAGE_NAME=hub.oepkgs.net/oedeploy/openeuler/aarch64/mindspore:latest # your image name
+export IMAGE_NAME=vllm_ms_20250726:latest # your image name

 docker run -itd --name=${DOCKER_NAME} --ipc=host --network=host --privileged=true \
         --device=/dev/davinci0 \
@@ -180,7 +189,7 @@ Prompt: 'Llama is'. Generated text: ' a 100% natural, biodegradable, and compost

 ## 在线推理

-vLLM MindSpore可使用OpenAI的API协议,部署为在线服务。以下是以[Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) 为例,介绍模型的[启动服务](#启动服务),并[发送请求](#发送请求),得到在线服务的推理结果。
+vLLM MindSpore可使用OpenAI的API协议,部署为在线推理服务。以下以[Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)为例,介绍模型的[启动服务](#启动服务),并[发送请求](#发送请求),得到推理结果。

 ### 启动服务

diff --git a/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_features/benchmark/benchmark.md b/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_features/benchmark/benchmark.md
index 57f15ffdbb2a6d6e42dd8332a0fa2a2d4922ee13..661dba5878f40992338580d93896393c5d418383 100644
--- a/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_features/benchmark/benchmark.md
+++ b/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_features/benchmark/benchmark.md
@@ -15,13 +15,13 @@ export vLLM_MODEL_MEMORY_USE_GB=32 # Memory reserved for model execution. Set ac
 export MINDFORMERS_MODEL_CONFIG=$YAML_PATH # Set the corresponding MindSpore Transformers model's YAML file.
 ```

-并以下命令启动在线服务:
+并用以下命令启动在线推理:

 ```bash
 vllm-mindspore serve Qwen/Qwen2.5-7B-Instruct --device auto --disable-log-requests
 ```

-若使用多卡推理,以[Qwen2.5-32B](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) 为例,可按照文档[多卡推理(Qwen2.5-32B)](../../../getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md#在线推理)进行环境准备,则可用以下命令启动在线服务:
+若使用多卡推理,以[Qwen2.5-32B](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) 为例,可按照文档[多卡推理(Qwen2.5-32B)](../../../getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md#在线推理)进行环境准备,则可用以下命令启动在线推理:

 ```bash
 export TENSOR_PARALLEL_SIZE=4