diff --git a/docs/mindquantum/docs/source_en/images/mindquantum_en.png b/docs/mindquantum/docs/source_en/images/mindquantum_en.png deleted file mode 100644 index 24663f4e70cd2fb568ea03b04ce3d889722fb806..0000000000000000000000000000000000000000 Binary files a/docs/mindquantum/docs/source_en/images/mindquantum_en.png and /dev/null differ diff --git a/docs/mindquantum/docs/source_zh_cn/images/mindquantum_cn.png b/docs/mindquantum/docs/source_zh_cn/images/mindquantum_cn.png deleted file mode 100644 index e6a57eb9cb71bbbbf1a87f2928d8acf099fbcc31..0000000000000000000000000000000000000000 Binary files a/docs/mindquantum/docs/source_zh_cn/images/mindquantum_cn.png and /dev/null differ diff --git a/docs/vllm_mindspore/docs/source_en/developer_guide/contributing.md b/docs/vllm_mindspore/docs/source_en/developer_guide/contributing.md index bd4e479bd76671d41d7641cec13537a5b3187f4b..59248ddfcd6f375b0c12cd7804fb5cf1eb9d564b 100644 --- a/docs/vllm_mindspore/docs/source_en/developer_guide/contributing.md +++ b/docs/vllm_mindspore/docs/source_en/developer_guide/contributing.md @@ -18,7 +18,7 @@ To support a new model for vLLM MindSpore code repository, please note the follo - **Follow file format and location specifications.** Model code files should be placed under the `vllm_mindspore/model_executor` directory, organized in corresponding subfolders by model type. - **Implement models using MindSpore interfaces with jit static graph support.** Model definitions in vLLM MindSpore must be implemented using MindSpore interfaces. Since MindSpore's static graph mode offers performance advantages, models should support execution via @jit static graphs. For reference, see the [Qwen2.5](https://gitee.com/mindspore/vllm-mindspore/blob/master/vllm_mindspore/model_executor/models/qwen2.py) implementation. 
- **Register new models in vLLM MindSpore.** After implementing the model structure, register it in vLLM MindSpore by adding it to `_NATIVE_MODELS` in `vllm_mindspore/model_executor/models/registry.py`. -- **Write unit tests.** New models must include corresponding unit tests. Refer to the [Qwen2.5 testcases](https://gitee.com/mindspore/vllm-mindspore/blob/master/tests/st/python/test_vllm_qwen_7b.py) for examples. +- **Write unit tests.** New models must include corresponding unit tests. Refer to the [Qwen2.5 testcases](https://gitee.com/mindspore/vllm-mindspore/blob/master/tests/st/python/cases_parallel/vllm_qwen_7b.py) for examples. ## Contribution Process diff --git a/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md b/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md index 17257f7638b3b84b68e8003a3d4202521d3fc1be..9a31ef60524727b1f5a43293581c0d7c09e173ed 100644 --- a/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md +++ b/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md @@ -1,6 +1,6 @@ # Parallel Inference (DeepSeek R1) -[![View Source](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md) +[![View Source](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md) vLLM MindSpore supports hybrid parallel inference with configurations of tensor parallelism (TP), data parallelism 
(DP), expert parallelism (EP), and their combinations. For the applicable scenarios of different parallel strategies, refer to the [vLLM official documentation](https://docs.vllm.ai/en/latest/configuration/optimization.html#parallelism-strategies). @@ -284,7 +284,7 @@ Environment variable descriptions: - `MS_ALLOC_CONF`: Set the memory policy. Refer to the [MindSpore documentation](https://www.mindspore.cn/docs/en/master/api_python/env_var_list.html). - `ASCEND_RT_VISIBLE_DEVICES`: Configure the available device IDs for each node. Use the `npu-smi info` command to check. - `vLLM_MODEL_BACKEND`: The backend of the model to run. Currently supported models and backends for vLLM MindSpore can be found in the [Model Support List](../../../user_guide/supported_models/models_list/models_list.md). -- `MINDFORMERS_MODEL_CONFIG`: Model configuration file. Users can find the corresponding YAML file in the [MindSpore Transformers repository](https://gitee.com/mindspore/mindformers/tree/dev/research/deepseek3/deepseek_r1_671b), such as [predict_deepseek_r1_671b_w8a8_ep4t4.yaml](https://gitee.com/mindspore/mindformers/blob/dev/research/deepseek3/deepseek_r1_671b/predict_deepseek_r1_671b_w8a8_ep4t4.yaml). +- `MINDFORMERS_MODEL_CONFIG`: Model configuration file. Users can find the corresponding YAML file in the [MindSpore Transformers repository](https://gitee.com/mindspore/mindformers/tree/dev/research/deepseek3/deepseek_r1_671b), such as [predict_deepseek_r1_671b_w8a8_ep4t4.yaml](https://gitee.com/mindspore/mindformers/blob/dev/research/deepseek3/deepseek_r1_671b/predict_deepseek_r1_671b_w8a8_ep4tp4.yaml). The model parallel strategy is specified in the `parallel_config` of the configuration file. 
For example, the DP4TP4EP4 hybrid parallel configuration is as follows: diff --git a/docs/vllm_mindspore/docs/source_zh_cn/developer_guide/contributing.md b/docs/vllm_mindspore/docs/source_zh_cn/developer_guide/contributing.md index e0a1aaa2c1104a4ec6320fce7511dc5cf06659f0..f44b6d798e0b9e41bb0d05a87b3a9748ee77daa8 100644 --- a/docs/vllm_mindspore/docs/source_zh_cn/developer_guide/contributing.md +++ b/docs/vllm_mindspore/docs/source_zh_cn/developer_guide/contributing.md @@ -19,7 +19,7 @@ - **文件格式及位置要遵循规范。** 模型代码文件统一放置于`vllm_mindspore/model_executor`文件夹下,请根据不同模型将代码文件放置于对应的文件夹下。 - **模型基于MindSpore接口实现,支持jit静态图方式执行。** vLLM MindSpore中的模型定义实现需基于MindSpore接口实现。由于MindSpore静态图模式执行性能有优势,因此模型需支持@jit静态图方式执行。详细可参考[Qwen2.5](https://gitee.com/mindspore/vllm-mindspore/blob/master/vllm_mindspore/model_executor/models/qwen2.py)模型定义实现。 - **将新模型在vLLM MindSpore代码中进行注册。** 模型结构定义实现后,需要将该模型注册到vLLM MindSpore中,注册文件位于'vllm_mindspore/model_executor/models/registry.py'中,请将模型注册到`_NATIVE_MODELS`。 -- **编写单元测试。** 新增的模型需同步提交单元测试用例,用例编写请参考[Qwen2.5模型用例](https://gitee.com/mindspore/vllm-mindspore/blob/master/tests/st/python/test_vllm_qwen_7b.py)。 +- **编写单元测试。** 新增的模型需同步提交单元测试用例,用例编写请参考[Qwen2.5模型用例](https://gitee.com/mindspore/vllm-mindspore/blob/master/tests/st/python/cases_parallel/vllm_qwen_7b.py)。 ## 贡献流程 diff --git a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md index add87115e00ddfd1d7712928039f4db8f1b7a8a5..02ebe3d1afe298aa4bcaf66b8444dcdb87cf3492 100644 --- a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md +++ b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md @@ -285,7 +285,7 @@ export MINDFORMERS_MODEL_CONFIG=/path/to/research/deepseek3/deepseek_r1_671b/pre - 
`MS_ALLOC_CONF`: 设置内存策略。可参考[MindSpore官网文档](https://www.mindspore.cn/docs/zh-CN/r2.6.0/api_python/env_var_list.html)。 - `ASCEND_RT_VISIBLE_DEVICES`: 配置每个节点可用device id。用户可使用`npu-smi info`命令进行查询。 - `vLLM_MODEL_BACKEND`:所运行的模型后端。目前vLLM MindSpore所支持的模型与模型后端,可在[模型支持列表](../../../user_guide/supported_models/models_list/models_list.md)中进行查询。 -- `MINDFORMERS_MODEL_CONFIG`:模型配置文件。用户可以在[MindSpore Transformers工程](https://gitee.com/mindspore/mindformers/tree/dev/research/deepseek3/deepseek_r1_671b)中,找到对应模型的yaml文件[predict_deepseek_r1_671b_w8a8.yaml](https://gitee.com/mindspore/mindformers/blob/dev/research/deepseek3/deepseek_r1_671b/predict_deepseek_r1_671b_w8a8_ep4t4.yaml) 。 +- `MINDFORMERS_MODEL_CONFIG`:模型配置文件。用户可以在[MindSpore Transformers工程](https://gitee.com/mindspore/mindformers/tree/dev/research/deepseek3/deepseek_r1_671b)中,找到对应模型的yaml文件[predict_deepseek_r1_671b_w8a8.yaml](https://gitee.com/mindspore/mindformers/blob/dev/research/deepseek3/deepseek_r1_671b/predict_deepseek_r1_671b_w8a8_ep4tp4.yaml)。 模型并行策略通过配置文件中的`parallel_config`指定,例如DP4TP4EP4 混合并行配置如下所示: diff --git a/tutorials/source_en/custom_program/operation/op_customopbuilder.md b/tutorials/source_en/custom_program/operation/op_customopbuilder.md index 2f92966c49679a6cb20c5efe9cbc9670adbc4cd8..8f75e242f56ce48f0f937063fa69a141cf9115bb 100644 --- a/tutorials/source_en/custom_program/operation/op_customopbuilder.md +++ b/tutorials/source_en/custom_program/operation/op_customopbuilder.md @@ -27,7 +27,7 @@ As shown in the figure, the operator execution process in MindSpore's dynamic gr ## Custom Operators Support Multi-Stage Pipeline through PyboostRunner -The dynamic graph multi-stage pipeline involves a complex invocation process with many interfaces and data structures. To simplify the integration of custom operators into dynamic graphs, MindSpore encapsulates the [PyboostRunner class](https://www.mindspore.cn/tutorials/en/master/custom_program/operation/cpp_api_for_custom_ops.html#class-PyboostRunner). 
+The dynamic graph multi-stage pipeline involves a complex invocation process with many interfaces and data structures. To simplify the integration of custom operators into dynamic graphs, MindSpore encapsulates the [PyboostRunner class](https://www.mindspore.cn/tutorials/en/master/custom_program/operation/cpp_api_for_custom_ops.html#class-pyboostrunner). Below is an example demonstrating the integration process of custom operators into a dynamic graph: diff --git a/tutorials/source_en/custom_program/operation/op_customopbuilder_atb.md b/tutorials/source_en/custom_program/operation/op_customopbuilder_atb.md index ef1b86cd8e44c9ad6379c81cb3587ac4fd5a7e55..5347fbf93aa74b2c0bfc8aefb4efdea0d8bbc44e 100644 --- a/tutorials/source_en/custom_program/operation/op_customopbuilder_atb.md +++ b/tutorials/source_en/custom_program/operation/op_customopbuilder_atb.md @@ -12,7 +12,7 @@ In [Custom Operators Based on CustomOpBuilder](https://www.mindspore.cn/tutorial In the complete [ATB operator workflow](https://www.hiascend.com/document/detail/zh/canncommercial/81RC1/developmentguide/acce/ascendtb/ascendtb_0037.html), users need to execute steps such as constructing `Param`, creating `Operation` and `Context`, setting `variantPack` (operator input-output tensors), calling `Setup`, calling `Execute`, and destroying `Context` and `Operation`. However, for a single operator, its `Operation` only depends on operator attributes (`Param`), and its `Context` only depends on the stream, both of which can be reused. Therefore, MindSpore provides a cache to store these data structures, avoiding unnecessary time consumption caused by repeated creation and destruction. 
-When integrating ATB operators using the [AtbOpRunner class](https://www.mindspore.cn/tutorials/en/master/custom_program/operation/cpp_api_for_custom_ops.html#class-AtbOpRunner), users only need to provide a corresponding hash function for `Param` (used as the key for caching `Operation`) and call the `Init` interface for initialization (constructing `Operation`), followed by the `Run` interface to execute the ATB operator. Additionally, users can directly call the [RunAtbOp](https://www.mindspore.cn/tutorials/en/master/custom_program/operation/cpp_api_for_custom_ops.html#function-runatbop) function for one-click execution (the function internally includes calls to both `Init` and `Run` interfaces). +When integrating ATB operators using the [AtbOpRunner class](https://www.mindspore.cn/tutorials/en/master/custom_program/operation/cpp_api_for_custom_ops.html#class-atboprunner), users only need to provide a corresponding hash function for `Param` (used as the key for caching `Operation`) and call the `Init` interface for initialization (constructing `Operation`), followed by the `Run` interface to execute the ATB operator. Additionally, users can directly call the [RunAtbOp](https://www.mindspore.cn/tutorials/en/master/custom_program/operation/cpp_api_for_custom_ops.html#function-runatbop) function for one-click execution (the function internally includes calls to both `Init` and `Run` interfaces). This guide uses `SwiGLU` as an example to demonstrate the ATB operator integration process. The complete code can be found in the [code repository](https://gitee.com/mindspore/mindspore/blob/master/tests/st/graph_kernel/custom/jit_test_files/atb_swiglu.cpp). 
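The Operation/Context reuse described in the section above (one `Operation` per distinct `Param`, cached instead of recreated per call) can be sketched generically. This is a minimal Python illustration of the caching idea only, not the actual MindSpore C++ implementation; `SwigluParam` and `OpCache` are hypothetical names:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SwigluParam:
    # Illustrative stand-in for an ATB operator's Param struct.
    # frozen=True makes it hashable, so it can serve as a cache key,
    # playing the role of the Param hash function mentioned above.
    dim: int = -1


class OpCache:
    """Caches 'Operation' objects keyed by their Param, mirroring the
    reuse described above: one Operation per distinct Param, created once."""

    def __init__(self):
        self._ops = {}
        self.created = 0  # counts real constructions, to show reuse

    def get(self, param):
        if param not in self._ops:
            self.created += 1
            self._ops[param] = ("operation-for", param)  # placeholder object
        return self._ops[param]


cache = OpCache()
op1 = cache.get(SwigluParam(dim=-1))
op2 = cache.get(SwigluParam(dim=-1))  # same Param -> cached Operation reused
assert op1 is op2
assert cache.created == 1
```

The same pattern applies to the stream-keyed `Context` cache: the expensive object is built on first use and looked up thereafter.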
diff --git a/tutorials/source_zh_cn/custom_program/operation/op_customopbuilder.md b/tutorials/source_zh_cn/custom_program/operation/op_customopbuilder.md index 90f3959c190b245014c4b8873ceb14b6de0ba009..63c837bc097a5271b474761af3bc9dd7eb12980f 100644 --- a/tutorials/source_zh_cn/custom_program/operation/op_customopbuilder.md +++ b/tutorials/source_zh_cn/custom_program/operation/op_customopbuilder.md @@ -27,7 +27,7 @@ MindSpore以Python作为前端,用C++实现后端,每个算子执行时需 ## 自定义算子通过PyboostRunner支持多级流水 -动态图多级流水的调用流程较复杂,涉及的接口和数据结构较多,为了方便用户在动态图接入自定义算子,MindSpore封装了[PyboostRunner类](https://www.mindspore.cn/tutorials/zh-CN/master/custom_program/operation/cpp_api_for_custom_ops.html#class-PyboostRunner)。 +动态图多级流水的调用流程较复杂,涉及的接口和数据结构较多,为了方便用户在动态图接入自定义算子,MindSpore封装了[PyboostRunner类](https://www.mindspore.cn/tutorials/zh-CN/master/custom_program/operation/cpp_api_for_custom_ops.html#class-pyboostrunner)。 下面以一个例子演示动态图自定义算子的接入流程: diff --git a/tutorials/source_zh_cn/custom_program/operation/op_customopbuilder_atb.md b/tutorials/source_zh_cn/custom_program/operation/op_customopbuilder_atb.md index cc110d200be1b9890985e106e6c13b81830a0951..a2dfc37f0dc304f312267485bc8fbe6d1f92330b 100644 --- a/tutorials/source_zh_cn/custom_program/operation/op_customopbuilder_atb.md +++ b/tutorials/source_zh_cn/custom_program/operation/op_customopbuilder_atb.md @@ -12,7 +12,7 @@ 在完整的[ATB算子的调用流程](https://www.hiascend.com/document/detail/zh/canncommercial/81RC1/developmentguide/acce/ascendtb/ascendtb_0037.html)中,用户需要执行 构造`Param`、创建`Operation`和`Context`、设置`variantPack`(算子输入输出张量)、调用`Setup`、调用`Execute`、销毁`Context`和`Operation` 等流程。但是对于一个算子来说,其`Operation`仅依赖于算子属性(`Param`),其`Context`仅依赖于流(stream),且都是可以复用的,因此MindSpore提供了一个缓存,将这些数据结构放在缓存中,避免多次创建和销毁带来不必要的时间消耗。 -用户基于 [AtbOpRunner类](https://www.mindspore.cn/tutorials/zh-CN/master/custom_program/operation/cpp_api_for_custom_ops.html#class-AtbOpRunner) 对接ATB算子时,仅需要提供相应`Param`的哈希函数(作为缓存`Operation`的键值),并调用`Init`接口初始化(即构造`Operation`),再调用`Run`接口即可执行ATB算子。还可以直接调用 
[RunAtbOp](https://www.mindspore.cn/tutorials/zh-CN/master/custom_program/operation/cpp_api_for_custom_ops.html#function-runatbop)函数一键执行(函数内包含了`Init`和`Run`接口的调用)。 +用户基于 [AtbOpRunner类](https://www.mindspore.cn/tutorials/zh-CN/master/custom_program/operation/cpp_api_for_custom_ops.html#class-atboprunner) 对接ATB算子时,仅需要提供相应`Param`的哈希函数(作为缓存`Operation`的键值),并调用`Init`接口初始化(即构造`Operation`),再调用`Run`接口即可执行ATB算子。还可以直接调用 [RunAtbOp](https://www.mindspore.cn/tutorials/zh-CN/master/custom_program/operation/cpp_api_for_custom_ops.html#function-runatbop)函数一键执行(函数内包含了`Init`和`Run`接口的调用)。 本指南以一个`SwiGLU`为例,展示ATB算子的接入流程。完整代码请参阅[代码仓库](https://gitee.com/mindspore/mindspore/blob/master/tests/st/graph_kernel/custom/jit_test_files/atb_swiglu.cpp)。
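The `Init`/`Run` flow and the one-click `RunAtbOp` convenience described in both language versions of the ATB tutorial can be sketched as the following call pattern. This is a hedged Python illustration only, not the real C++ API; `AtbRunner` and `run_atb_op` are hypothetical names standing in for `AtbOpRunner` and `RunAtbOp`:

```python
class AtbRunner:
    """Illustrative stand-in for the AtbOpRunner call pattern:
    init() builds the Operation (cached by Param hash in the real flow),
    run() executes it on the given inputs."""

    def __init__(self, op_name):
        self.op_name = op_name
        self.operation = None

    def init(self, param):
        # Real flow: construct (or fetch a cached) Operation from Param.
        self.operation = (self.op_name, param)

    def run(self, inputs, outputs):
        assert self.operation is not None, "call init() before run()"
        # Real flow: set variantPack tensors, then Setup and Execute
        # on the stream. Here we just produce a placeholder result.
        outputs.append(sum(inputs))


def run_atb_op(op_name, param, inputs, outputs):
    """One-click helper mirroring RunAtbOp: wraps init() + run()."""
    runner = AtbRunner(op_name)
    runner.init(param)
    runner.run(inputs, outputs)


outs = []
run_atb_op("SwiGLU", {"dim": -1}, [1, 2, 3], outs)
assert outs == [6]
```

The one-click helper exists purely for convenience; when an operator is executed repeatedly, calling `init` once and `run` many times avoids redundant setup, which is exactly what the Operation cache enables.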