From c53ef463e5aea7710b292d79ca03cecf9855d99a Mon Sep 17 00:00:00 2001 From: horcam Date: Thu, 18 Sep 2025 16:51:28 +0800 Subject: [PATCH] update release note && fix model list --- .../installation/installation.md | 6 +-- .../source_en/release_notes/release_notes.md | 38 +++++++++++++------ .../features_list/features_list.md | 2 +- .../models_list/models_list.md | 1 - .../installation/installation.md | 6 +-- .../release_notes/release_notes.md | 38 +++++++++++++------ .../features_list/features_list.md | 2 +- .../models_list/models_list.md | 1 - 8 files changed, 62 insertions(+), 32 deletions(-) diff --git a/docs/vllm_mindspore/docs/source_en/getting_started/installation/installation.md b/docs/vllm_mindspore/docs/source_en/getting_started/installation/installation.md index 5495b5a710..7bc4dd987f 100644 --- a/docs/vllm_mindspore/docs/source_en/getting_started/installation/installation.md +++ b/docs/vllm_mindspore/docs/source_en/getting_started/installation/installation.md @@ -17,12 +17,12 @@ This document will introduce the [Version Matching](#version-compatibility) of v | ----- | ----- | |CANN | [8.1.RC1](https://www.hiascend.com/document/detail/zh/canncommercial/81RC1/softwareinst/instg/instg_0000.html?Mode=PmIns&InstallType=local&OS=Debian&Software=cannToolKit) | |MindSpore | [2.7.0](https://www.mindspore.cn/versions#2.7.0) | - |MSAdapter | [0.2.0](https://repo.mindspore.cn/mindspore/msadapter/version/202508/20250807/r0.2.0_20250807013007_e7636d61563c4beafac4b877891172464fdcf321_newest/any/) | + |MSAdapter | [0.5.0](https://ms-release.obs.cn-north-4.myhuaweicloud.com/2.7.0/MSAdapter/any/msadapter-0.5.0-py3-none-any.whl) | |MindSpore Transformers| [1.6.0](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/installation.html) | |Golden Stick| [1.2.0](https://www.mindspore.cn/golden_stick/docs/zh-CN/r1.2.0/install.html) | - |vLLM | [0.8.3](https://repo.mindspore.cn/mirrors/vllm/version/202505/20250514/v0.8.4.dev0_newest/any/) | + |vLLM | [0.8.3](https://repo.mindspore.cn/mirrors/vllm/version/202505/20250514/v0.8.4.dev0_newest/any/vllm-0.8.4.dev0%2Bg296c657.d20250514.empty-py3-none-any.whl) | -Note: [vLLM Package](https://repo.mindspore.cn/mirrors/vllm/version/202505/20250514/v0.8.4.dev0_newest/any/) uses vLLM 0.8.3 branch,and add data parallel. +Note: [vLLM Package](https://repo.mindspore.cn/mirrors/vllm/version/202505/20250514/v0.8.4.dev0_newest/any/vllm-0.8.4.dev0%2Bg296c657.d20250514.empty-py3-none-any.whl) uses the vLLM 0.8.3 branch and adds data parallelism. ## Docker Installation diff --git a/docs/vllm_mindspore/docs/source_en/release_notes/release_notes.md b/docs/vllm_mindspore/docs/source_en/release_notes/release_notes.md index 32dff9e509..2514c59a3b 100644 --- a/docs/vllm_mindspore/docs/source_en/release_notes/release_notes.md +++ b/docs/vllm_mindspore/docs/source_en/release_notes/release_notes.md @@ -4,20 +4,36 @@ ## vLLM-MindSpore Plugin 0.3.0 Release Notes -The following are the key new features and models supported in the vLLM-MindSpore Plugin version 0.3.0. +vLLM-MindSpore Plugin 0.3.0 is compatible with vLLM 0.8.3. Below are the new features and models supported in this release. ### New Features -- 0.8.3 V1 Architecture Basic Features, including chunked prefill and automatic prefix caching; -- V0 Multi-step Scheduling; -- V0 Chunked Prefill; -- V0 Automatic Prefix Caching; -- V0 DeepSeek MTP (Multi-Task Processing); -- GPTQ Quantization; -- SmoothQuant Quantization. +- **Architecture Adaptation**: Supports both vLLM V0 and V1 architectures. Users can switch between them via the `VLLM_USE_V1` environment variable. +- **Service Features**: Supports Chunked Prefill, Automatic Prefix Caching, Async Output, and Reasoning Outputs. The V0 architecture also supports Multi-Step Scheduler and DeepSeek MTP features. For detailed descriptions, refer to the [Feature Support List](../user_guide/supported_features/features_list/features_list.md).
+- **Quantization Support**: Supports GPTQ quantization and SmoothQuant quantization. For detailed descriptions, refer to [Quantization Methods](../user_guide/supported_features/quantization/quantization.md). +- **Parallel Strategies**: In the V1 architecture, Tensor Parallel, Data Parallel, and Expert Parallel are supported. For detailed descriptions, refer to [Multi-Machine Parallel Inference](../getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md). +- **Debugging Tools**: Adapted vLLM's profiling tool for performance data collection and model IR graph saving via the MindSpore backend, facilitating model debugging and optimization. Adapted vLLM's benchmark tool for performance testing. For detailed descriptions, refer to [Debugging Methods](../user_guide/supported_features/profiling/profiling.md) and [Performance Testing](../user_guide/supported_features/benchmark/benchmark.md). ### New Models -- DeepSeek-V3/R1 -- Qwen2.5-0.5B/1.5/7B/14B/32B/72B -- Qwen3-0.6B/1.7B/4B/8B/14B/32B +- DeepSeek Series Models: + - [Supported] DeepSeek-V3, DeepSeek-R1, DeepSeek-R1 W8A8 quantized models. +- Qwen2.5 Series Models: + - [Supported] Qwen2.5: 0.5B, 1.5B, 3B, 7B, 14B, 32B, 72B. + - [Testing] Qwen2.5-VL: 3B, 7B, 32B, 72B. +- Qwen3 Series Models: + - [Supported] Qwen3: 32B; Qwen3-MOE: 235B-A22B. + - [Testing] Qwen3: 0.6B, 1.7B, 4B, 8B, 14B; Qwen3-MOE: Qwen3-30B-A3B. +- QwQ Series Models: + - [Testing] QwQ: 32B. +- Llama Series Models: + - [Testing] Llama3.1: 8B, 70B, 405B. + - [Testing] Llama3.2: 1B, 3B.
+ +### Contributors + +Thanks to the following contributors for their efforts: + +alien_0119, candyhong, can-gaa-hou, ccsszz, cs123abc, dayschan, Erpim, fary86, hangangqiang, horcam, huandong, huzhikun, i-robot, jiahaochen666, JingweiHuang, lijiakun, liu lili, lvhaoyu, lvhaoyu1, moran, nashturing, one_east, panshaowu, pengjingyou, r1chardf1d0, tongl, TrHan, tronzhang, TronZhang, twc, uh, w00521005, wangpingan2, WanYidong, WeiCheng Tan, wusimin, yangminghai, yyyyrf, zhaizhiqiang, zhangxuetong, zhang_xu_hao1230, zhanzhan1, zichun_ye, zlq2020 + +Contributions to the project in any form are welcome! diff --git a/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/features_list/features_list.md b/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/features_list/features_list.md index f5d11b1702..c9411ebd71 100644 --- a/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/features_list/features_list.md +++ b/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/features_list/features_list.md @@ -10,7 +10,7 @@ The following is the features supported in vLLM-MindSpore Plugin. 
|-----------------------------------|--------------------|--------------------| | Chunked Prefill | √ | √ | | Automatic Prefix Caching | √ | √ | -| Multi step scheduler | √ | × | +| Multi-step scheduler | √ | × | | DeepSeek MTP | √ | WIP | | Async output | √ | √ | | Quantization | √ | √ | diff --git a/docs/vllm_mindspore/docs/source_en/user_guide/supported_models/models_list/models_list.md b/docs/vllm_mindspore/docs/source_en/user_guide/supported_models/models_list/models_list.md index 94de7c4bc9..56b1ca4839 100644 --- a/docs/vllm_mindspore/docs/source_en/user_guide/supported_models/models_list/models_list.md +++ b/docs/vllm_mindspore/docs/source_en/user_guide/supported_models/models_list/models_list.md @@ -15,6 +15,5 @@ | QwQ-32B | Testing | [QwQ-32B](https://huggingface.co/Qwen/QwQ-32B) | | Llama3.1 | Testing | [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct), [Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct), [Llama-3.1-405B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-405B-Instruct) | | Llama3.2 | Testing | [Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct), [Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) | -| DeepSeek-V2 | Testing | [DeepSeek-V2](https://huggingface.co/deepseek-ai/DeepSeek-V2) | Note: refer to [Environment Variable List](../../environment_variables/environment_variables.md), and set the model backend by environment variable `vLLM_MODEL_BACKEND`. 
diff --git a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/installation/installation.md b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/installation/installation.md index 0908d8cb0e..73074e7c0e 100644 --- a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/installation/installation.md +++ b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/installation/installation.md @@ -17,12 +17,12 @@ | ----- | ----- | |CANN | [8.1.RC1](https://www.hiascend.com/document/detail/zh/canncommercial/81RC1/softwareinst/instg/instg_0000.html?Mode=PmIns&InstallType=local&OS=Debian&Software=cannToolKit) | |MindSpore | [2.7.0](https://www.mindspore.cn/versions#2.7.0) | - |MSAdapter | [0.2.0](https://repo.mindspore.cn/mindspore/msadapter/version/202508/20250807/r0.2.0_20250807013007_e7636d61563c4beafac4b877891172464fdcf321_newest/any/) | + |MSAdapter | [0.5.0](https://ms-release.obs.cn-north-4.myhuaweicloud.com/2.7.0/MSAdapter/any/msadapter-0.5.0-py3-none-any.whl) | |MindSpore Transformers| [1.6.0](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/installation.html) | |Golden Stick| [1.2.0](https://www.mindspore.cn/golden_stick/docs/zh-CN/r1.2.0/install.html) | - |vLLM | [0.8.3](https://repo.mindspore.cn/mirrors/vllm/version/202505/20250514/v0.8.4.dev0_newest/any/) | + |vLLM | [0.8.3](https://repo.mindspore.cn/mirrors/vllm/version/202505/20250514/v0.8.4.dev0_newest/any/vllm-0.8.4.dev0%2Bg296c657.d20250514.empty-py3-none-any.whl) | -注：[vLLM软件包](https://repo.mindspore.cn/mirrors/vllm/version/202505/20250514/v0.8.4.dev0_newest/any/)使用vLLM 0.8.3分支，并加入数据并行功能。 +注：[vLLM软件包](https://repo.mindspore.cn/mirrors/vllm/version/202505/20250514/v0.8.4.dev0_newest/any/vllm-0.8.4.dev0%2Bg296c657.d20250514.empty-py3-none-any.whl)使用vLLM 0.8.3分支，并加入数据并行功能。 ## docker安装 diff --git a/docs/vllm_mindspore/docs/source_zh_cn/release_notes/release_notes.md b/docs/vllm_mindspore/docs/source_zh_cn/release_notes/release_notes.md index b4d4cc74e8..cf9825dc18 100644 --- a/docs/vllm_mindspore/docs/source_zh_cn/release_notes/release_notes.md +++ b/docs/vllm_mindspore/docs/source_zh_cn/release_notes/release_notes.md @@ -4,20 +4,36 @@ ## vLLM-MindSpore插件 0.3.0 Release Notes -以下为vLLM-MindSpore插件0.3.0版本支持的关键新功能和模型。 +vLLM MindSpore插件0.3.0版本，配套vLLM 0.8.3版本。以下为此版本支持的关键新功能和模型。 ### 新特性 -- 0.8.3 V1架构基础功能, 包含分块预填充和自动前缀缓存功能; -- V0 多步调度功能; -- V0 分块预填充功能; -- V0 自动前缀缓存功; -- V0 DeepSeek MTP功能; -- GPTQ量化; -- SmoothQuant量化。 +- **架构适配**：架构适配vLLM V0与V1架构，用户可通过`VLLM_USE_V1`进行架构切换; +- **服务特性**：支持Chunked Prefill、Automatic Prefix Caching、Async output、Reasoning Outputs等特性;其中V0架构中也支持Multi-step scheduler、DeepSeek MTP特性。详细描述请参考[特性支持列表](../user_guide/supported_features/features_list/features_list.md); +- **量化支持**：支持GPTQ量化与SmoothQuant量化功能;详细描述请参考[量化方法](../user_guide/supported_features/quantization/quantization.md); +- **并行策略**：V1架构中，支持张量并行(Tensor Parallel)、数据并行(Data Parallel)、专家并行(Expert Parallel);详细描述请参考[多机并行推理](../getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md); +- **调试工具**：适配使用vLLM的profile工具，通过MindSpore后端进行性能数据采集、模型IR图保存，便于用户进行模型的调试与调优;适配使用vLLM的benchmark工具进行性能测试。详细描述请参考[调试方法](../user_guide/supported_features/profiling/profiling.md)与[性能测试](../user_guide/supported_features/benchmark/benchmark.md); ### 新模型 -- DeepSeek-V3/R1 -- Qwen2.5-0.5B/1.5/7B/14B/32B/72B -- Qwen3-0.6B/1.7B/4B/8B/14B/32B +- DeepSeek 系列模型: + - [已支持] DeepSeek-V3、DeepSeek-R1、DeepSeek-R1 W8A8量化模型; +- Qwen2.5 系列模型: + - [已支持] Qwen2.5：0.5B、1.5B、3B、7B、14B、32B、72B; + - [测试中] Qwen2.5-VL：3B、7B、32B、72B; +- Qwen3 系列模型: + - [已支持] Qwen3：32B;Qwen3-MOE：235B-A22B; + - [测试中] Qwen3：0.6B、1.7B、4B、8B、14B;Qwen3-MOE：Qwen3-30B-A3B +- QwQ 系列模型: + - [测试中] QwQ：32B +- Llama 系列模型: + - [测试中] Llama3.1：8B、70B、405B + - [测试中] Llama3.2：1B、3B + +### 贡献者 + +感谢以下人员做出的贡献: + +alien_0119、candyhong、can-gaa-hou、ccsszz、cs123abc、dayschan、Erpim、fary86、hangangqiang、horcham_zhq、huandong、huzhikun、i-robot、jiahaochen666、JingweiHuang、lijiakun、liu lili、lvhaoyu、lvhaoyu1、moran、nashturing、one_east、panshaowu、pengjingyou、r1chardf1d0、tongl、TrHan、tronzhang、TronZhang、twc、uh、w00521005、wangpingan2、WanYidong、WeiCheng Tan、wusimin、yangminghai、yyyyrf、zhaizhiqiang、zhangxuetong、zhang_xu_hao1230、zhanzhan1、zichun_ye、zlq2020 + +欢迎以任何形式对项目提供贡献！ diff --git a/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_features/features_list/features_list.md b/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_features/features_list/features_list.md index 966d3da543..e1d963b9fc 100644 --- a/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_features/features_list/features_list.md +++ b/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_features/features_list/features_list.md @@ -10,7 +10,7 @@ vLLM-MindSpore插件支持的特性功能与vLLM社区版本保持一致，特 |-----------------------------------|--------------------|--------------------| | Chunked Prefill | √ | √ | | Automatic Prefix Caching | √ | √ | -| Multi step scheduler | √ | × | +| Multi-step scheduler | √ | × | | DeepSeek MTP | √ | WIP | | Async output | √ | √ | | Quantization | √ | √ | diff --git a/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_models/models_list/models_list.md b/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_models/models_list/models_list.md index 5286aec97d..0da05a5b85 100644 --- a/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_models/models_list/models_list.md +++ b/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_models/models_list/models_list.md @@ -15,6 +15,5 @@ | QwQ-32B | 测试中 | [QwQ-32B](https://huggingface.co/Qwen/QwQ-32B) | | Llama3.1 | 测试中 | [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)、[Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct)、[Llama-3.1-405B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-405B-Instruct) | | Llama3.2 | 测试中 | [Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct)、[Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) | -| DeepSeek-V2 | 测试中 | [DeepSeek-V2](https://huggingface.co/deepseek-ai/DeepSeek-V2) | 注：用户可参考[环境变量章节](../../environment_variables/environment_variables.md)，通过环境变量`vLLM_MODEL_BACKEND`，指定模型后端。 -- Gitee
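The release notes in this patch mention switching between the vLLM V0 and V1 architectures via `VLLM_USE_V1`, and the model-list note mentions selecting the model backend via `vLLM_MODEL_BACKEND`. A minimal launch sketch follows; the backend value `MindFormers`, the `vllm serve` invocation, and the model name are illustrative assumptions, not taken from this patch:

```shell
# Select the vLLM V1 architecture (unset or set to 0 for the V0 architecture).
export VLLM_USE_V1=1

# Choose the model backend; "MindFormers" is an assumed example value --
# consult the environment-variable docs referenced above for valid settings.
export vLLM_MODEL_BACKEND=MindFormers

# Illustrative server launch (commented out; model and flags are assumptions):
# vllm serve Qwen/Qwen2.5-7B-Instruct --tensor-parallel-size 2

# Show what the launched process would inherit.
echo "VLLM_USE_V1=${VLLM_USE_V1} vLLM_MODEL_BACKEND=${vLLM_MODEL_BACKEND}"
```

Both variables are read at process start, so they must be exported in the shell (or container) that launches the server, not set afterwards.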