diff --git a/docs/mindformers/docs/source_en/advanced_development/precision_optimization.md b/docs/mindformers/docs/source_en/advanced_development/precision_optimization.md
index dab574a711c8e5c7351e237e9dc77949f05f0150..c87c84a74eeb434e5a8f069004afe3e347f42411 100644
--- a/docs/mindformers/docs/source_en/advanced_development/precision_optimization.md
+++ b/docs/mindformers/docs/source_en/advanced_development/precision_optimization.md
@@ -254,7 +254,7 @@ By comparing the loss and local norm of the first step (step1) and the second st
 
 #### Comparison of Step1 Losses
 
-After fixing the weights, dataset, and randomness, the difference in the loss value of the first step of training is compared. The loss value of the first step is obtained from the forward computation of the network. If the difference with the benchmark loss is large, it can be determined that there is an precision difference in the forward computation, which may be due to the model structure is not aligned, and the precision of the operator is abnormal. The tensor values of each layer of MindSpore and PyTorch can be obtained by printing or Dump tool. Currently, the tool does not have automatic comparison function, users need to manually identify the correspondence for comparison. For the introduction of MindSpore Dump tool, please refer to [Introduction of Precision Debugging Tools](#introduction-to-precision-debugging-tools), and for the use of PyTorch Dump tool, please refer to [Function Explanation of Precision Tools](https://gitee.com/ascend/mstt/tree/master/debug/accuracy_tools/msprobe/docs/05.data_dump_PyTorch.md).
+After fixing the weights, dataset, and randomness, compare the difference in the loss value of the first training step. The loss value of the first step is obtained from the forward computation of the network. If it differs significantly from the benchmark loss, it can be determined that there is a precision difference in the forward computation, which may be caused by a misaligned model structure or abnormal operator precision. The tensor values of each layer of MindSpore and PyTorch can be obtained by printing or with the Dump tool. Currently, the tool does not provide an automatic comparison function, so users need to manually identify the correspondence for comparison. For an introduction to the MindSpore Dump tool, please refer to [Introduction of Precision Debugging Tools](#introduction-to-precision-debugging-tools); for the usage of the PyTorch Dump tool, please refer to [Function Explanation of Precision Tools](https://gitee.com/ascend/mstt/blob/master/debug/accuracy_tools/msprobe/docs/05.data_dump_PyTorch.md).
 
 Find the correspondence of layers through PyTorch api_stack_dump.pkl file, and MindSpore statistic.csv file, and initially determine the degree of difference between input and output through max, min, and L2Norm. If you need further comparison, you can load the corresponding npy data for detailed comparison.
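Once the layer correspondence has been identified manually, the npy-level comparison described in this hunk can be scripted. The following is a minimal sketch, assuming the paired dumps have already been exported as `.npy` files (the file names are hypothetical, and the actual paths depend on the dump tool configuration); it reproduces the max/min/L2Norm statistics mentioned above and adds an element-wise error check.

```python
import numpy as np

def compare_dump(ms_npy_path: str, pt_npy_path: str, rtol: float = 1e-3) -> bool:
    """Compare one MindSpore tensor dump against its PyTorch counterpart."""
    ms = np.load(ms_npy_path).astype(np.float32)
    pt = np.load(pt_npy_path).astype(np.float32)
    assert ms.shape == pt.shape, f"shape mismatch: {ms.shape} vs {pt.shape}"
    # Coarse statistics, mirroring the max/min/L2Norm columns in statistic.csv
    for name, arr in (("mindspore", ms), ("pytorch", pt)):
        print(f"{name}: max={arr.max():.6f} min={arr.min():.6f} "
              f"l2norm={np.linalg.norm(arr):.6f}")
    # Element-wise view of the difference for a finer diagnosis
    abs_err = np.abs(ms - pt)
    rel_err = abs_err / (np.abs(pt) + 1e-8)
    print(f"max abs err={abs_err.max():.6e}, max rel err={rel_err.max():.6e}")
    return bool(rel_err.max() <= rtol)

# Hypothetical file names; substitute the dump paths produced by the tools.
compare_dump("ms_layer0_output.npy", "pt_layer0_output.npy")
```

The tolerance should be chosen per dtype: bf16 runs typically tolerate a larger relative error than fp32 runs.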
diff --git a/docs/mindformers/docs/source_en/feature/safetensors.md b/docs/mindformers/docs/source_en/feature/safetensors.md
index bc28fe03d4d9505a425d27010e0e354a574c9b40..50d0463165c809772c2d35cd754578b1ccdff5bb 100644
--- a/docs/mindformers/docs/source_en/feature/safetensors.md
+++ b/docs/mindformers/docs/source_en/feature/safetensors.md
@@ -227,7 +227,7 @@ parallel_config:                             # Configure the target distr
   pipeline_stage: 1
 ```
 
-In large cluster scale scenarios, to avoid the online merging process taking too long and occupying training resources, it is recommended to pass in the original distributed weights file after [merge complete weights](#weights_merging) offline, when there is no need to pass in the path of the source cut-partitioning strategy file.
+In large-scale cluster scenarios, to prevent the online merging process from taking too long and occupying training resources, it is recommended to [merge the complete weights](#weight-merging) offline first and then pass in the merged weight file; in this case, the path of the source splitting strategy file does not need to be passed in.
 
 ### Special Scenarios
 
diff --git a/docs/mindformers/docs/source_en/introduction/models.md b/docs/mindformers/docs/source_en/introduction/models.md
index adad824a6d0b3919b007164c024214747bbeb6f4..704f994b1f2fe0acf0e56ec133a067e81ee2aadf 100644
--- a/docs/mindformers/docs/source_en/introduction/models.md
+++ b/docs/mindformers/docs/source_en/introduction/models.md
@@ -6,32 +6,32 @@ The following table lists models supported by MindFormers.
 
 | Model | Specifications | Model Type | Latest Version |
 |:--------------------------------------------------------------------------------------------------------|:------------------------------|:----------------:|:----------------------:|
-| [DeepSeek-V3](https://gitee.com/mindspore/mindformers/blob/dev/research/deepseek3) | 671B | Sparse LLM | In-development version, 1.5.0 |
+| [DeepSeek-V3](https://gitee.com/mindspore/mindformers/tree/dev/research/deepseek3) | 671B | Sparse LLM | In-development version, 1.5.0 |
 | [GLM4](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/glm4.md) | 9B | Dense LLM | In-development version, 1.5.0 |
-| [Llama3.1](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1) | 8B/70B | Dense LLM | In-development version, 1.5.0 |
-| [Qwen2.5](https://gitee.com/mindspore/mindformers/blob/dev/research/qwen2_5) | 0.5B/1.5B/7B/14B/32B/72B | Dense LLM | In-development version, 1.5.0 |
-| [TeleChat2](https://gitee.com/mindspore/mindformers/blob/dev/research/telechat2) | 7B/35B/115B | Dense LLM | In-development version, 1.5.0 |
+| [Llama3.1](https://gitee.com/mindspore/mindformers/tree/dev/research/llama3_1) | 8B/70B | Dense LLM | In-development version, 1.5.0 |
+| [Qwen2.5](https://gitee.com/mindspore/mindformers/tree/dev/research/qwen2_5) | 0.5B/1.5B/7B/14B/32B/72B | Dense LLM | In-development version, 1.5.0 |
+| [TeleChat2](https://gitee.com/mindspore/mindformers/tree/dev/research/telechat2) | 7B/35B/115B | Dense LLM | In-development version, 1.5.0 |
 | [CodeLlama](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/codellama.md) | 34B | Dense LLM | 1.5.0 |
 | [CogVLM2-Image](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/cogvlm2_image.md) | 19B | MM | 1.5.0 |
 | [CogVLM2-Video](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/cogvlm2_video.md) | 13B | MM | 1.5.0 |
-| [DeepSeek-V2](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/deepseek2) | 236B | Sparse LLM | 1.5.0 |
-| [DeepSeek-Coder-V1.5](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/deepseek1_5) | 7B | Dense LLM | 1.5.0 |
-| [DeepSeek-Coder](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/deepseek) | 33B | Dense LLM | 1.5.0 |
-| [GLM3-32K](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/glm32k) | 6B | Dense LLM | 1.5.0 |
+| [DeepSeek-V2](https://gitee.com/mindspore/mindformers/tree/r1.5.0/research/deepseek2) | 236B | Sparse LLM | 1.5.0 |
+| [DeepSeek-Coder-V1.5](https://gitee.com/mindspore/mindformers/tree/r1.5.0/research/deepseek1_5) | 7B | Dense LLM | 1.5.0 |
+| [DeepSeek-Coder](https://gitee.com/mindspore/mindformers/tree/r1.5.0/research/deepseek) | 33B | Dense LLM | 1.5.0 |
+| [GLM3-32K](https://gitee.com/mindspore/mindformers/tree/r1.5.0/research/glm32k) | 6B | Dense LLM | 1.5.0 |
 | [GLM3](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/glm3.md) | 6B | Dense LLM | 1.5.0 |
-| [InternLM2](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/internlm2) | 7B/20B | Dense LLM | 1.5.0 |
+| [InternLM2](https://gitee.com/mindspore/mindformers/tree/r1.5.0/research/internlm2) | 7B/20B | Dense LLM | 1.5.0 |
 | [Llama3.2](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/llama3_2.md) | 3B | Dense LLM | 1.5.0 |
 | [Llama3.2-Vision](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/mllama.md) | 11B | MM | 1.5.0 |
-| [Llama3](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/llama3) | 8B/70B | Dense LLM | 1.5.0 |
+| [Llama3](https://gitee.com/mindspore/mindformers/tree/r1.5.0/research/llama3) | 8B/70B | Dense LLM | 1.5.0 |
 | [Llama2](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/llama2.md) | 7B/13B/70B | Dense LLM | 1.5.0 |
-| [Mixtral](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/mixtral) | 8x7B | Sparse LLM | 1.5.0 |
-| [Qwen2](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/qwen2) | 0.5B/1.5B/7B/57B/57B-A14B/72B | Dense /Sparse LLM | 1.5.0 |
-| [Qwen1.5](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/qwen1_5) | 7B/14B/72B | Dense LLM | 1.5.0 |
-| [Qwen-VL](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/qwenvl) | 9.6B | MM | 1.5.0 |
-| [TeleChat](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/telechat) | 7B/12B/52B | Dense LLM | 1.5.0 |
+| [Mixtral](https://gitee.com/mindspore/mindformers/tree/r1.5.0/research/mixtral) | 8x7B | Sparse LLM | 1.5.0 |
+| [Qwen2](https://gitee.com/mindspore/mindformers/tree/r1.5.0/research/qwen2) | 0.5B/1.5B/7B/57B/57B-A14B/72B | Dense/Sparse LLM | 1.5.0 |
+| [Qwen1.5](https://gitee.com/mindspore/mindformers/tree/r1.5.0/research/qwen1_5) | 7B/14B/72B | Dense LLM | 1.5.0 |
+| [Qwen-VL](https://gitee.com/mindspore/mindformers/tree/r1.5.0/research/qwenvl) | 9.6B | MM | 1.5.0 |
+| [TeleChat](https://gitee.com/mindspore/mindformers/tree/r1.5.0/research/telechat) | 7B/12B/52B | Dense LLM | 1.5.0 |
 | [Whisper](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/whisper.md) | 1.5B | MM | 1.5.0 |
-| [Yi](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/yi) | 6B/34B | Dense LLM | 1.5.0 |
-| [YiZhao](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/yizhao) | 12B | Dense LLM | 1.5.0 |
+| [Yi](https://gitee.com/mindspore/mindformers/tree/r1.5.0/research/yi) | 6B/34B | Dense LLM | 1.5.0 |
+| [YiZhao](https://gitee.com/mindspore/mindformers/tree/r1.5.0/research/yizhao) | 12B | Dense LLM | 1.5.0 |
 | [Baichuan2](https://gitee.com/mindspore/mindformers/blob/r1.3.0/research/baichuan2/baichuan2.md) | 7B/13B | Dense LLM | 1.3.2 |
 | [GLM2](https://gitee.com/mindspore/mindformers/blob/r1.3.0/docs/model_cards/glm2.md) | 6B | Dense LLM | 1.3.2 |
 | [GPT2](https://gitee.com/mindspore/mindformers/blob/r1.3.0/docs/model_cards/gpt2.md) | 124M/13B | Dense LLM | 1.3.2 |
diff --git a/docs/mindformers/docs/source_zh_cn/introduction/models.md b/docs/mindformers/docs/source_zh_cn/introduction/models.md
index d0fb9be38ca532190f799b23faaf8ef2f2214330..8771f0b5470508e019c942fcacffc2807e9ca1bf 100644
--- a/docs/mindformers/docs/source_zh_cn/introduction/models.md
+++ b/docs/mindformers/docs/source_zh_cn/introduction/models.md
@@ -6,32 +6,32 @@
 
 | 模型名 | 支持规格 | 模型类型 | 最新支持版本 |
 |:--------------------------------------------------------------------------------------------------------|:------------------------------|:--------:|:----------:|
-| [DeepSeek-V3](https://gitee.com/mindspore/mindformers/blob/dev/research/deepseek3) | 671B | 稀疏LLM | 在研版本、1.5.0 |
+| [DeepSeek-V3](https://gitee.com/mindspore/mindformers/tree/dev/research/deepseek3) | 671B | 稀疏LLM | 在研版本、1.5.0 |
 | [GLM4](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/glm4.md) | 9B | 稠密LLM | 在研版本、1.5.0 |
-| [Llama3.1](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1) | 8B/70B | 稠密LLM | 在研版本、1.5.0 |
-| [Qwen2.5](https://gitee.com/mindspore/mindformers/blob/dev/research/qwen2_5) | 0.5B/1.5B/7B/14B/32B/72B | 稠密LLM | 在研版本、1.5.0 |
-| [TeleChat2](https://gitee.com/mindspore/mindformers/blob/dev/research/telechat2) | 7B/35B/115B | 稠密LLM | 在研版本、1.5.0 |
+| [Llama3.1](https://gitee.com/mindspore/mindformers/tree/dev/research/llama3_1) | 8B/70B | 稠密LLM | 在研版本、1.5.0 |
+| [Qwen2.5](https://gitee.com/mindspore/mindformers/tree/dev/research/qwen2_5) | 0.5B/1.5B/7B/14B/32B/72B | 稠密LLM | 在研版本、1.5.0 |
+| [TeleChat2](https://gitee.com/mindspore/mindformers/tree/dev/research/telechat2) | 7B/35B/115B | 稠密LLM | 在研版本、1.5.0 |
 | [CodeLlama](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/codellama.md) | 34B | 稠密LLM | 1.5.0 |
 | [CogVLM2-Image](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/cogvlm2_image.md) | 19B | MM | 1.5.0 |
 | [CogVLM2-Video](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/cogvlm2_video.md) | 13B | MM | 1.5.0 |
-| [DeepSeek-V2](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/deepseek2) | 236B | 稀疏LLM | 1.5.0 |
-| [DeepSeek-Coder-V1.5](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/deepseek1_5) | 7B | 稠密LLM | 1.5.0 |
-| [DeepSeek-Coder](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/deepseek) | 33B | 稠密LLM | 1.5.0 |
-| [GLM3-32K](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/glm32k) | 6B | 稠密LLM | 1.5.0 |
+| [DeepSeek-V2](https://gitee.com/mindspore/mindformers/tree/r1.5.0/research/deepseek2) | 236B | 稀疏LLM | 1.5.0 |
+| [DeepSeek-Coder-V1.5](https://gitee.com/mindspore/mindformers/tree/r1.5.0/research/deepseek1_5) | 7B | 稠密LLM | 1.5.0 |
+| [DeepSeek-Coder](https://gitee.com/mindspore/mindformers/tree/r1.5.0/research/deepseek) | 33B | 稠密LLM | 1.5.0 |
+| [GLM3-32K](https://gitee.com/mindspore/mindformers/tree/r1.5.0/research/glm32k) | 6B | 稠密LLM | 1.5.0 |
 | [GLM3](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/glm3.md) | 6B | 稠密LLM | 1.5.0 |
-| [InternLM2](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/internlm2) | 7B/20B | 稠密LLM | 1.5.0 |
+| [InternLM2](https://gitee.com/mindspore/mindformers/tree/r1.5.0/research/internlm2) | 7B/20B | 稠密LLM | 1.5.0 |
 | [Llama3.2](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/llama3_2.md) | 3B | 稠密LLM | 1.5.0 |
 | [Llama3.2-Vision](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/mllama.md) | 11B | MM | 1.5.0 |
-| [Llama3](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/llama3) | 8B/70B | 稠密LLM | 1.5.0 |
+| [Llama3](https://gitee.com/mindspore/mindformers/tree/r1.5.0/research/llama3) | 8B/70B | 稠密LLM | 1.5.0 |
 | [Llama2](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/llama2.md) | 7B/13B/70B | 稠密LLM | 1.5.0 |
-| [Mixtral](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/mixtral) | 8x7B | 稀疏LLM | 1.5.0 |
-| [Qwen2](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/qwen2) | 0.5B/1.5B/7B/57B/57B-A14B/72B | 稠密/稀疏LLM | 1.5.0 |
-| [Qwen1.5](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/qwen1_5) | 7B/14B/72B | 稠密LLM | 1.5.0 |
-| [Qwen-VL](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/qwenvl) | 9.6B | MM | 1.5.0 |
-| [TeleChat](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/telechat) | 7B/12B/52B | 稠密LLM | 1.5.0 |
+| [Mixtral](https://gitee.com/mindspore/mindformers/tree/r1.5.0/research/mixtral) | 8x7B | 稀疏LLM | 1.5.0 |
+| [Qwen2](https://gitee.com/mindspore/mindformers/tree/r1.5.0/research/qwen2) | 0.5B/1.5B/7B/57B/57B-A14B/72B | 稠密/稀疏LLM | 1.5.0 |
+| [Qwen1.5](https://gitee.com/mindspore/mindformers/tree/r1.5.0/research/qwen1_5) | 7B/14B/72B | 稠密LLM | 1.5.0 |
+| [Qwen-VL](https://gitee.com/mindspore/mindformers/tree/r1.5.0/research/qwenvl) | 9.6B | MM | 1.5.0 |
+| [TeleChat](https://gitee.com/mindspore/mindformers/tree/r1.5.0/research/telechat) | 7B/12B/52B | 稠密LLM | 1.5.0 |
 | [Whisper](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/whisper.md) | 1.5B | MM | 1.5.0 |
-| [Yi](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/yi) | 6B/34B | 稠密LLM | 1.5.0 |
-| [YiZhao](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/yizhao) | 12B | 稠密LLM | 1.5.0 |
+| [Yi](https://gitee.com/mindspore/mindformers/tree/r1.5.0/research/yi) | 6B/34B | 稠密LLM | 1.5.0 |
+| [YiZhao](https://gitee.com/mindspore/mindformers/tree/r1.5.0/research/yizhao) | 12B | 稠密LLM | 1.5.0 |
 | [Baichuan2](https://gitee.com/mindspore/mindformers/blob/r1.3.0/research/baichuan2/baichuan2.md) | 7B/13B | 稠密LLM | 1.3.2 |
 | [GLM2](https://gitee.com/mindspore/mindformers/blob/r1.3.0/docs/model_cards/glm2.md) | 6B | 稠密LLM | 1.3.2 |
 | [GPT2](https://gitee.com/mindspore/mindformers/blob/r1.3.0/docs/model_cards/gpt2.md) | 124M/13B | 稠密LLM | 1.3.2 |
diff --git a/docs/vllm_mindspore/docs/source_en/getting_started/installation/installation.md b/docs/vllm_mindspore/docs/source_en/getting_started/installation/installation.md
index f088336aa7a953454ca0bf95d4963a7d81a83cf8..7a4c8c9165f52362b0c9c926d16277d9d5d82c18 100644
--- a/docs/vllm_mindspore/docs/source_en/getting_started/installation/installation.md
+++ b/docs/vllm_mindspore/docs/source_en/getting_started/installation/installation.md
@@ -8,7 +8,7 @@ This document describes the steps to install the vLLM MindSpore environment. Thr
 - [Pip Installation](#pip-installation): Suitable for scenarios requiring specific versions.
 - [Source Code Installation](#source-code-installation): Suitable for incremental development of vLLM MindSpore.
 
-## Version Compatibility
+## Version Compatibility
 
 - OS: Linux-aarch64
 - Python: 3.9 / 3.10 / 3.11
@@ -24,15 +24,15 @@ This document describes the steps to install the vLLM MindSpore environment. Thr
 | [vLLM](https://github.com/vllm-project/vllm) | 0.8.3 | v0.8.3 |
 | [vLLM MindSpore](https://gitee.com/mindspore/vllm-mindspore) | 0.2 | master |
 
-## Environment Setup
+## Environment Setup
 
 This section introduces three installation methods: [Docker Installation](#docker-installation), [Pip Installation](#pip-installation), [Source Code Installation](#source-code-installation), and [Quick Verification](#quick-verification) example to check the installation.
 
-### Docker Installation
+### Docker Installation
 
 We recommend using Docker for quick deployment of the vLLM MindSpore environment. Below are the steps:
 
-#### Pulling the Image
+#### Pulling the Image
 
 Execute the following command to pull the vLLM MindSpore Docker image:
 
@@ -46,7 +46,7 @@ During the pull process, user will see the progress of each layer. After success
 docker images
 ```
 
-#### Creating a Container
+#### Creating a Container
 
 After [pulling the image](#pulling-the-image), set `DOCKER_NAME` and `IMAGE_NAME` as the container and image names, then execute the following command to create the container:
 
@@ -88,7 +88,7 @@ The container ID will be returned if docker is created successfully. User can al
 docker ps
 ```
 
-#### Entering the Container
+#### Entering the Container
 
 After [creating the container](#creating-a-container), user can start and enter the container, using the environment variable `DOCKER_NAME`:
 
@@ -96,7 +96,7 @@ After [creating the container](#creating-a-container), user can start and enter
 docker exec -it $DOCKER_NAME bash
 ```
 
-### Pip Installation
+### Pip Installation
 
 Use pip to install vLLM MindSpore, by executing the following command:
 
@@ -104,7 +104,7 @@ Use pip to install vLLM MindSpore, by executing the following command:
 pip install vllm_mindspore
 ```
 
-### Source Code Installation
+### Source Code Installation
 
 - **CANN Installation**
 
   For CANN installation methods and environment configuration, please refer to [CANN Community Edition Installation Guide](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/82RC1alpha002/softwareinst/instg/instg_0001.html?Mode=PmIns&OS=openEuler&Software=cannToolKit). If you encounter any issues during CANN installation, please consult the [Ascend FAQ](https://www.hiascend.com/document/detail/zh/AscendFAQ/ProduTech/CANNFAQ/cannfaq_000.html) for troubleshooting.
@@ -153,7 +153,7 @@ pip install vllm_mindspore
 export PYTHONPATH=$MF_PATH:$PYTHONPATH
 ```
 
-### Quick Verification
+### Quick Verification
 
 To verify the installation, run a simple offline inference test with [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct):
 
diff --git a/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md b/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md
index 007bdab99ef7f181c9102280ec54feca98e1a3b3..9df4c2668ec20a31fd298fac908e1fb3dd3add9f 100644
--- a/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md
+++ b/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md
@@ -91,7 +91,7 @@ snapshot_download(
 
 `local_dir` is the user-specified path to save the model. Ensure sufficient disk space is available.
 
-### Downloading with git-lfs Tool
+### Downloading with git-lfs Tool
 
 Run the following command to verify if [git-lfs](https://git-lfs.com) is available:
 
diff --git a/tutorials/source_en/model_infer/ms_infer/llm_inference_overview.md b/tutorials/source_en/model_infer/ms_infer/llm_inference_overview.md
index 49baacc1c51290eca66f8999c7109cc82b5fccb5..bf99af845597c129b727640198653ec1ede90bd3 100644
--- a/tutorials/source_en/model_infer/ms_infer/llm_inference_overview.md
+++ b/tutorials/source_en/model_infer/ms_infer/llm_inference_overview.md
@@ -103,7 +103,7 @@ pip install mindspore
 pip install mindformers
 ```
 
-You can also install the Python package that adapts to your environment by referring to the official installation document. For details, see [MindSpore Installation](https://www.mindspore.cn/install/en) and [MindFormers Installation](https://www.mindspore.cn/mindformers/docs/en/dev/quick_start/install.html).
+You can also install the Python package that adapts to your environment by referring to the official installation document. For details, see [MindSpore Installation](https://www.mindspore.cn/install/en) and [MindFormers Installation](https://www.mindspore.cn/mindformers/docs/en/dev/installation.html).
 
 If you wish to use model quantization to enhance inference performance, you need to install the mindspore_gs package. For details, see [Installing MindSpore Golden Stick](https://www.mindspore.cn/golden_stick/docs/en/master/install.html).
 
@@ -209,7 +209,7 @@ In addition to utilizing the capabilities provided by the MindFormers model suit
 
 For large language models with many model parameters, such as Llama2-70B and Qwen2-72B, the parameter scale usually exceeds the memory capacity of a GPU or NPU. Therefore, multi-device parallel inference is required. MindSpore large language model inference can shard the original large language model into N parallel models so that they can be executed on multiple devices in parallel. This not only enables inference for super-large models but also enhances performance by leveraging more resources from the multiple devices. The model scripts provided by the MindFormers model suite can be used to shard a model into multi-device models for execution. You can perform the following steps to deploy the model on multiple devices.
 
-- **Weight sharding**: Because the original weight files are too large, when executing on multiple devices, the overall weight needs to be sharded into multiple weights for each device and passed to the model process corresponding to each device. You can use the script in the MindFormers model suite to perform weight sharding. For details, see [Weight Conversion](https://www.mindspore.cn/mindformers/docs/en/dev/function/ckpt.html).
+- **Weight sharding**: Because the original weight files are too large, when executing on multiple devices, the overall weight needs to be sharded into multiple weights for each device and passed to the model process corresponding to each device. You can use the script in the MindFormers model suite to perform weight sharding. For details, see [Weight Conversion](https://www.mindspore.cn/mindformers/docs/en/dev/feature/ckpt.html).
 
 Here is an example of how to shard the Llama2-7B model for parallel execution on two devices.
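As a rough illustration of what the weight sharding step in this hunk does, the sketch below splits a single linear-layer weight column-wise into two per-rank shards. It is illustrative only, not the MindFormers conversion script referenced above, and the shapes and file names are hypothetical.

```python
import numpy as np

def shard_linear_weight(weight: np.ndarray, world_size: int = 2) -> list:
    """Split a [out_features, in_features] weight column-wise, one shard per rank."""
    assert weight.shape[1] % world_size == 0, "in_features must divide evenly"
    return np.split(weight, world_size, axis=1)

# Hypothetical full weight; a real checkpoint would be loaded from disk.
full_weight = np.random.randn(4096, 4096).astype(np.float16)
shards = shard_linear_weight(full_weight, world_size=2)
for rank, shard in enumerate(shards):
    # Each device process loads only its own shard at startup.
    np.save(f"rank{rank}_weight.npy", shard)
    print(f"rank {rank}: shard shape {shard.shape}")
```

The per-rank matmul results are then combined across devices at runtime, which is why each process only ever needs its own fraction of the full weight in memory.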
diff --git a/tutorials/source_zh_cn/model_infer/ms_infer/llm_inference_overview.md b/tutorials/source_zh_cn/model_infer/ms_infer/llm_inference_overview.md
index 309783672e5847ba80a9a06016f9a3aeef746c53..b5aff2f20428186c4d1fe746d83d92c98059aded 100644
--- a/tutorials/source_zh_cn/model_infer/ms_infer/llm_inference_overview.md
+++ b/tutorials/source_zh_cn/model_infer/ms_infer/llm_inference_overview.md
@@ -103,7 +103,7 @@ pip install mindspore
 pip install mindformers
 ```
 
-同时,用户也可以参考官方安装文档来安装自己环境适配的Python包,具体见[MindSpore安装](https://www.mindspore.cn/install)和[MindFormers安装](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/quick_start/install.html)。
+同时,用户也可以参考官方安装文档来安装自己环境适配的Python包,具体见[MindSpore安装](https://www.mindspore.cn/install)和[MindFormers安装](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/installation.html)。
 
 如果用户需要使用模型量化能力提升模型推理性能,还需要安装mindspore_gs包,具体可以参考[MindSpore GoldenStick安装](https://www.mindspore.cn/golden_stick/docs/zh-CN/master/install.html)。
 
@@ -209,7 +209,7 @@ model = AutoModel.from_config(config)
 
 对于模型参数比较多的大语言模型,如Llama2-70B、Qwen2-72B,由于其参数规模通常会超过一张GPU或者NPU的内存容量,因此需要采用多卡并行推理,MindSpore大语言模型推理支持将原始大语言模型切分成N份可并行的子模型,使其能够分别在多卡上并行执行,在实现超大模型推理同时,也利用多卡中更多的资源提升性能。MindFormers模型套件提供的模型脚本天然支持将模型切分成多卡模型执行,用户可以通过以下步骤在多卡上部署模型。
 
-- **权重切分**:由于原来的权重文件太大,多卡执行时,需要将整体权重切分成每张卡上的多份权重,分别传给每张卡对应的模型进程。用户可以使用MindFormers模型套件中的脚本来进行权重切分。具体可以参考[权重转换](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/function/ckpt.html)。
+- **权重切分**:由于原来的权重文件太大,多卡执行时,需要将整体权重切分成每张卡上的多份权重,分别传给每张卡对应的模型进程。用户可以使用MindFormers模型套件中的脚本来进行权重切分。具体可以参考[权重转换](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/ckpt.html)。
 
 下面以Llama2-7B大语言模型为例,简单描述一下将模型切分为2卡并行的操作: