diff --git a/docs/mindformers/docs/source_en/feature/configuration.md b/docs/mindformers/docs/source_en/feature/configuration.md
index 6774effc48bf8261abaf731e93d9803d1f9a75f1..9b46c7089a39b3b4bff9e48953f9d6aa735ded63 100644
--- a/docs/mindformers/docs/source_en/feature/configuration.md
+++ b/docs/mindformers/docs/source_en/feature/configuration.md
@@ -186,7 +186,7 @@ In order to improve the performance of the model, it is usually necessary to con
 | recompute_config.select_recompute_exclude | Disable recomputation for the specified operator, valid only for the Primitive operators. | bool/list |
 | recompute_config.select_comm_recompute_exclude | Disable communication recomputation for the specified operator, valid only for the Primitive operators. | bool/list |
 
-2. MindSpore Transformers provides fine-grained activations SWAP-related configurations to reduce the memory footprint of the model during training, see [Fine-Grained Activations SWAP](https://www.mindspore.cn/mindformers/docs/en/dev/feature/fine_grained_activations_swap.html) for details.
+2. MindSpore Transformers provides fine-grained activations SWAP-related configurations to reduce the memory footprint of the model during training, see [Fine-Grained Activations SWAP](https://www.mindspore.cn/mindformers/docs/en/dev/feature/memory_optimization.html#fine-grained-activations-swap) for details.
 
 | Parameters | Descriptions | Types |
 |----------------------------------------------------|---------------------------------------------------------------------------------------------------------|-----------------|
diff --git a/docs/mindformers/docs/source_en/introduction/models.md b/docs/mindformers/docs/source_en/introduction/models.md
index 48468134cf399c8a63b7d51177dc149fdb62af90..adad824a6d0b3919b007164c024214747bbeb6f4 100644
--- a/docs/mindformers/docs/source_en/introduction/models.md
+++ b/docs/mindformers/docs/source_en/introduction/models.md
@@ -6,50 +6,56 @@ The following table lists models supported by MindFormers.
 
 | Model | Specifications | Model Type | Latest Version |
 |:--------------------------------------------------------------------------------------------------------|:------------------------------|:----------------:|:----------------------:|
-| [CodeLlama](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/codellama.md) | 34B | Dense LLM | In-development version |
-| [CogVLM2-Image](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/cogvlm2_image.md) | 19B | MM | In-development version |
-| [CogVLM2-Video](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/cogvlm2_video.md) | 13B | MM | In-development version |
-| [DeepSeek-V3](https://gitee.com/mindspore/mindformers/tree/dev/research/deepseek3) | 671B | Sparse LLM | In-development version |
-| [DeepSeek-V2](https://gitee.com/mindspore/mindformers/tree/dev/research/deepseek2) | 236B | Sparse LLM | In-development version |
-| [DeepSeek-Coder-V1.5](https://gitee.com/mindspore/mindformers/tree/dev/research/deepseek1_5) | 7B | Dense LLM | In-development version |
-| [DeepSeek-Coder](https://gitee.com/mindspore/mindformers/tree/dev/research/deepseek) | 33B | Dense LLM | In-development version |
-| [GLM4](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/glm4.md) | 9B | Dense LLM | In-development version |
-| [GLM3-32K](https://gitee.com/mindspore/mindformers/tree/dev/research/glm32k) | 6B | Dense LLM | In-development version |
-| [GLM3](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/glm3.md) | 6B | Dense LLM | In-development version |
-| [InternLM2](https://gitee.com/mindspore/mindformers/tree/dev/research/internlm2) | 7B/20B | Dense LLM | In-development version |
-| [Llama3.1](https://gitee.com/mindspore/mindformers/tree/dev/research/llama3_1) | 8B/70B | Dense LLM | In-development version |
-| [Llama3](https://gitee.com/mindspore/mindformers/tree/dev/research/llama3) | 8B/70B | Dense LLM | In-development version |
-| [Llama2](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/llama2.md) | 7B/13B/70B | Dense LLM | In-development version |
-| [Mixtral](https://gitee.com/mindspore/mindformers/tree/dev/research/mixtral) | 8x7B | Sparse LLM | In-development version |
-| [Qwen2](https://gitee.com/mindspore/mindformers/tree/dev/research/qwen2) | 0.5B/1.5B/7B/57B/57B-A14B/72B | Dense/Sparse LLM | In-development version |
-| [Qwen1.5](https://gitee.com/mindspore/mindformers/tree/dev/research/qwen1_5) | 7B/14B/72B | Dense LLM | In-development version |
-| [Qwen-VL](https://gitee.com/mindspore/mindformers/tree/dev/research/qwenvl) | 9.6B | MM | In-development version |
-| [Whisper](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/whisper.md) | 1.5B | MM | In-development version |
-| [Yi](https://gitee.com/mindspore/mindformers/tree/dev/research/yi) | 6B/34B | Dense LLM | In-development version |
-| [Baichuan2](https://gitee.com/mindspore/mindformers/blob/r1.3.0/research/baichuan2/baichuan2.md) | 7B/13B | Dense LLM | 1.3.2 |
-| [GLM2](https://gitee.com/mindspore/mindformers/blob/r1.3.0/docs/model_cards/glm2.md) | 6B | Dense LLM | 1.3.2 |
-| [GPT2](https://gitee.com/mindspore/mindformers/blob/r1.3.0/docs/model_cards/gpt2.md) | 124M/13B | Dense LLM | 1.3.2 |
-| [InternLM](https://gitee.com/mindspore/mindformers/blob/r1.3.0/research/internlm/internlm.md) | 7B/20B | Dense LLM | 1.3.2 |
-| [Qwen](https://gitee.com/mindspore/mindformers/blob/r1.3.0/research/qwen/qwen.md) | 7B/14B | Dense LLM | 1.3.2 |
-| [CodeGeex2](https://gitee.com/mindspore/mindformers/blob/r1.1.0/docs/model_cards/codegeex2.md) | 6B | Dense LLM | 1.1.0 |
-| [WizardCoder](https://gitee.com/mindspore/mindformers/blob/r1.1.0/research/wizardcoder/wizardcoder.md) | 15B | Dense LLM | 1.1.0 |
-| [Baichuan](https://gitee.com/mindspore/mindformers/blob/r1.0/research/baichuan/baichuan.md) | 7B/13B | Dense LLM | 1.0 |
-| [Blip2](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/blip2.md) | 8.1B | MM | 1.0 |
-| [Bloom](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/bloom.md) | 560M/7.1B/65B/176B | Dense LLM | 1.0 |
-| [Clip](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/clip.md) | 149M/428M | MM | 1.0 |
-| [CodeGeex](https://gitee.com/mindspore/mindformers/blob/r1.0/research/codegeex/codegeex.md) | 13B | Dense LLM | 1.0 |
-| [GLM](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/glm.md) | 6B | Dense LLM | 1.0 |
-| [iFlytekSpark](https://gitee.com/mindspore/mindformers/blob/r1.0/research/iflytekspark/iflytekspark.md) | 13B | Dense LLM | 1.0 |
-| [Llama](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/llama.md) | 7B/13B | Dense LLM | 1.0 |
-| [MAE](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/mae.md) | 86M | MM | 1.0 |
-| [Mengzi3](https://gitee.com/mindspore/mindformers/blob/r1.0/research/mengzi3/mengzi3.md) | 13B | Dense LLM | 1.0 |
-| [PanguAlpha](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/pangualpha.md) | 2.6B/13B | Dense LLM | 1.0 |
-| [SAM](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/sam.md) | 91M/308M/636M | MM | 1.0 |
-| [Skywork](https://gitee.com/mindspore/mindformers/blob/r1.0/research/skywork/skywork.md) | 13B | Dense LLM | 1.0 |
-| [Swin](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/swin.md) | 88M | MM | 1.0 |
-| [T5](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/t5.md) | 14M/60M | Dense LLM | 1.0 |
-| [VisualGLM](https://gitee.com/mindspore/mindformers/blob/r1.0/research/visualglm/visualglm.md) | 6B | MM | 1.0 |
-| [Ziya](https://gitee.com/mindspore/mindformers/blob/r1.0/research/ziya/ziya.md) | 13B | Dense LLM | 1.0 |
-| [Bert](https://gitee.com/mindspore/mindformers/blob/r0.8/docs/model_cards/bert.md) | 4M/110M | Dense LLM | 0.8 |
+| [DeepSeek-V3](https://gitee.com/mindspore/mindformers/blob/dev/research/deepseek3) | 671B | Sparse LLM | In-development version, 1.5.0 |
+| [GLM4](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/glm4.md) | 9B | Dense LLM | In-development version, 1.5.0 |
+| [Llama3.1](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1) | 8B/70B | Dense LLM | In-development version, 1.5.0 |
+| [Qwen2.5](https://gitee.com/mindspore/mindformers/blob/dev/research/qwen2_5) | 0.5B/1.5B/7B/14B/32B/72B | Dense LLM | In-development version, 1.5.0 |
+| [TeleChat2](https://gitee.com/mindspore/mindformers/blob/dev/research/telechat2) | 7B/35B/115B | Dense LLM | In-development version, 1.5.0 |
+| [CodeLlama](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/codellama.md) | 34B | Dense LLM | 1.5.0 |
+| [CogVLM2-Image](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/cogvlm2_image.md) | 19B | MM | 1.5.0 |
+| [CogVLM2-Video](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/cogvlm2_video.md) | 13B | MM | 1.5.0 |
+| [DeepSeek-V2](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/deepseek2) | 236B | Sparse LLM | 1.5.0 |
+| [DeepSeek-Coder-V1.5](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/deepseek1_5) | 7B | Dense LLM | 1.5.0 |
+| [DeepSeek-Coder](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/deepseek) | 33B | Dense LLM | 1.5.0 |
+| [GLM3-32K](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/glm32k) | 6B | Dense LLM | 1.5.0 |
+| [GLM3](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/glm3.md) | 6B | Dense LLM | 1.5.0 |
+| [InternLM2](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/internlm2) | 7B/20B | Dense LLM | 1.5.0 |
+| [Llama3.2](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/llama3_2.md) | 3B | Dense LLM | 1.5.0 |
+| [Llama3.2-Vision](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/mllama.md) | 11B | MM | 1.5.0 |
+| [Llama3](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/llama3) | 8B/70B | Dense LLM | 1.5.0 |
+| [Llama2](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/llama2.md) | 7B/13B/70B | Dense LLM | 1.5.0 |
+| [Mixtral](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/mixtral) | 8x7B | Sparse LLM | 1.5.0 |
+| [Qwen2](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/qwen2) | 0.5B/1.5B/7B/57B/57B-A14B/72B | Dense/Sparse LLM | 1.5.0 |
+| [Qwen1.5](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/qwen1_5) | 7B/14B/72B | Dense LLM | 1.5.0 |
+| [Qwen-VL](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/qwenvl) | 9.6B | MM | 1.5.0 |
+| [TeleChat](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/telechat) | 7B/12B/52B | Dense LLM | 1.5.0 |
+| [Whisper](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/whisper.md) | 1.5B | MM | 1.5.0 |
+| [Yi](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/yi) | 6B/34B | Dense LLM | 1.5.0 |
+| [YiZhao](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/yizhao) | 12B | Dense LLM | 1.5.0 |
+| [Baichuan2](https://gitee.com/mindspore/mindformers/blob/r1.3.0/research/baichuan2/baichuan2.md) | 7B/13B | Dense LLM | 1.3.2 |
+| [GLM2](https://gitee.com/mindspore/mindformers/blob/r1.3.0/docs/model_cards/glm2.md) | 6B | Dense LLM | 1.3.2 |
+| [GPT2](https://gitee.com/mindspore/mindformers/blob/r1.3.0/docs/model_cards/gpt2.md) | 124M/13B | Dense LLM | 1.3.2 |
+| [InternLM](https://gitee.com/mindspore/mindformers/blob/r1.3.0/research/internlm/internlm.md) | 7B/20B | Dense LLM | 1.3.2 |
+| [Qwen](https://gitee.com/mindspore/mindformers/blob/r1.3.0/research/qwen/qwen.md) | 7B/14B | Dense LLM | 1.3.2 |
+| [CodeGeex2](https://gitee.com/mindspore/mindformers/blob/r1.1.0/docs/model_cards/codegeex2.md) | 6B | Dense LLM | 1.1.0 |
+| [WizardCoder](https://gitee.com/mindspore/mindformers/blob/r1.1.0/research/wizardcoder/wizardcoder.md) | 15B | Dense LLM | 1.1.0 |
+| [Baichuan](https://gitee.com/mindspore/mindformers/blob/r1.0/research/baichuan/baichuan.md) | 7B/13B | Dense LLM | 1.0 |
+| [Blip2](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/blip2.md) | 8.1B | MM | 1.0 |
+| [Bloom](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/bloom.md) | 560M/7.1B/65B/176B | Dense LLM | 1.0 |
+| [Clip](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/clip.md) | 149M/428M | MM | 1.0 |
+| [CodeGeex](https://gitee.com/mindspore/mindformers/blob/r1.0/research/codegeex/codegeex.md) | 13B | Dense LLM | 1.0 |
+| [GLM](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/glm.md) | 6B | Dense LLM | 1.0 |
+| [iFlytekSpark](https://gitee.com/mindspore/mindformers/blob/r1.0/research/iflytekspark/iflytekspark.md) | 13B | Dense LLM | 1.0 |
+| [Llama](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/llama.md) | 7B/13B | Dense LLM | 1.0 |
+| [MAE](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/mae.md) | 86M | MM | 1.0 |
+| [Mengzi3](https://gitee.com/mindspore/mindformers/blob/r1.0/research/mengzi3/mengzi3.md) | 13B | Dense LLM | 1.0 |
+| [PanguAlpha](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/pangualpha.md) | 2.6B/13B | Dense LLM | 1.0 |
+| [SAM](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/sam.md) | 91M/308M/636M | MM | 1.0 |
+| [Skywork](https://gitee.com/mindspore/mindformers/blob/r1.0/research/skywork/skywork.md) | 13B | Dense LLM | 1.0 |
+| [Swin](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/swin.md) | 88M | MM | 1.0 |
+| [T5](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/t5.md) | 14M/60M | Dense LLM | 1.0 |
+| [VisualGLM](https://gitee.com/mindspore/mindformers/blob/r1.0/research/visualglm/visualglm.md) | 6B | MM | 1.0 |
+| [Ziya](https://gitee.com/mindspore/mindformers/blob/r1.0/research/ziya/ziya.md) | 13B | Dense LLM | 1.0 |
+| [Bert](https://gitee.com/mindspore/mindformers/blob/r0.8/docs/model_cards/bert.md) | 4M/110M | Dense LLM | 0.8 |
 
 * ***LLM:*** *Large Language Model;* ***MM:*** *Multi-Modal*
\ No newline at end of file
diff --git a/docs/mindformers/docs/source_en/introduction/overview.md b/docs/mindformers/docs/source_en/introduction/overview.md
index 30e98a2cc29c48591b22f3c7caccbe74248565a0..322b841f5d97a392216e580cc5ad6f5cb24c22d5 100644
--- a/docs/mindformers/docs/source_en/introduction/overview.md
+++ b/docs/mindformers/docs/source_en/introduction/overview.md
@@ -9,5 +9,5 @@ The overall architecture formed by MindSpore Transformers and the end-to-end AI
 3. The basic functionality features currently supported by MindSpore Transformers are listed below:
  1. Supports tasks such as running training and inference for large models [distributed parallelism](https://www.mindspore.cn/mindformers/docs/en/dev/feature/parallel_training.html), with parallel capabilities including data parallelism, model parallelism, ultra-long sequence parallelism;
  2. Supports [model weight conversion](https://www.mindspore.cn/mindformers/docs/en/dev/feature/weight_conversion.html), [distributed weight splitting and combination](https://www.mindspore.cn/mindformers/docs/en/dev/feature/transform_weight.html), and different format of [dataset loading](https://www.mindspore.cn/mindformers/docs/en/dev/feature/dataset.html) and [resumable training after breakpoint](https://www.mindspore.cn/mindformers/docs/en/dev/feature/resume_training.html);
- 3. Support 25+ large models [pretraining](https://www.mindspore.cn/mindformers/docs/en/dev/guide/pre_training.html), [fine-tuning](https://www.mindspore.cn/mindformers/docs/en/dev/guide/supervised_fine_tuning.html), [inference](https://www.mindspore.cn/mindformers/docs/en/dev/guide/inference.html) and [evaluation] (https://www.mindspore.cn/mindformers/docs/en/dev/guide/evaluation.html). Meanwhile, it also supports [quantization](https://www.mindspore.cn/mindformers/docs/en/dev/feature/quantization.html), and the list of supported models can be found in [Model Library](https://www.mindspore.cn/mindformers/docs/en/dev/introduction/models.html);
-4. MindSpore Transformers supports users to carry out model service deployment function through [MindIE](https://www.mindspore.cn/mindformers/docs/en/dev/guide/mindie_deployment.html), and also supports the use of [MindX]( https://www.hiascend.com/software/mindx-dl) to realize large-scale cluster scheduling; more third-party platforms will be supported in the future, please look forward to it.
+ 3. Supports [pretraining](https://www.mindspore.cn/mindformers/docs/en/dev/guide/pre_training.html), [fine-tuning](https://www.mindspore.cn/mindformers/docs/en/dev/guide/supervised_fine_tuning.html), [inference](https://www.mindspore.cn/mindformers/docs/en/dev/guide/inference.html) and [evaluation](https://www.mindspore.cn/mindformers/docs/en/dev/feature/evaluation.html) for 25+ large models. It also supports [quantization](https://www.mindspore.cn/mindformers/docs/en/dev/feature/quantization.html); the list of supported models can be found in the [Model Library](https://www.mindspore.cn/mindformers/docs/en/dev/introduction/models.html);
+4. MindSpore Transformers supports model service deployment through [MindIE](https://www.mindspore.cn/mindformers/docs/en/dev/guide/deployment.html) and large-scale cluster scheduling through [MindX](https://www.hiascend.com/software/mindx-dl); support for more third-party platforms will be added in the future.
diff --git a/docs/mindformers/docs/source_zh_cn/feature/configuration.md b/docs/mindformers/docs/source_zh_cn/feature/configuration.md
index fe6338f5cbd49fff900019f5978e7ee0e0c69d0f..b027cb38b36bb2952a529cf59f0ae923f03e9670 100644
--- a/docs/mindformers/docs/source_zh_cn/feature/configuration.md
+++ b/docs/mindformers/docs/source_zh_cn/feature/configuration.md
@@ -186,7 +186,7 @@ Context配置主要用于指定[mindspore.set_context](https://www.mindspore.cn/
 | recompute_config.select_recompute_exclude | 关闭指定算子的重计算,只对Primitive算子有效。 | bool/list |
 | recompute_config.select_comm_recompute_exclude | 关闭指定算子的通讯重计算,只对Primitive算子有效。 | bool/list |
 
-2. MindSpore Transformers提供细粒度激活值SWAP相关配置,以降低模型在训练时的内存占用,详情可参考[细粒度激活值SWAP](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/fine_grained_activations_swap.html)。
+2. MindSpore Transformers提供细粒度激活值SWAP相关配置,以降低模型在训练时的内存占用,详情可参考[细粒度激活值SWAP](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/memory_optimization.html#%E7%BB%86%E7%B2%92%E5%BA%A6%E6%BF%80%E6%B4%BB%E5%80%BCswap)。
 
 | 参数 | 说明 | 类型 |
 |------|-----|-----|
diff --git a/docs/mindformers/docs/source_zh_cn/introduction/models.md b/docs/mindformers/docs/source_zh_cn/introduction/models.md
index 72dd909cf3bc2bc4fa35ca6a6ed2f370dbde0ce4..d0fb9be38ca532190f799b23faaf8ef2f2214330 100644
--- a/docs/mindformers/docs/source_zh_cn/introduction/models.md
+++ b/docs/mindformers/docs/source_zh_cn/introduction/models.md
@@ -4,52 +4,58 @@
 
 当前MindSpore Transformers全量的模型列表如下:
 
-| 模型名 | 支持规格 | 模型类型 | 最新支持版本 |
-|:--------------------------------------------------------------------------------------------------------|:------------------------------|:------------:|:------:|
-| [CodeLlama](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/codellama.md) | 34B | 稠密LLM | 在研版本 |
-| [CogVLM2-Image](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/cogvlm2_image.md) | 19B | MM | 在研版本 |
-| [CogVLM2-Video](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/cogvlm2_video.md) | 13B | MM | 在研版本 |
-| [DeepSeek-V3](https://gitee.com/mindspore/mindformers/tree/dev/research/deepseek3) | 671B | 稀疏LLM | 在研版本 |
-| [DeepSeek-V2](https://gitee.com/mindspore/mindformers/tree/dev/research/deepseek2) | 236B | 稀疏LLM | 在研版本 |
-| [DeepSeek-Coder-V1.5](https://gitee.com/mindspore/mindformers/tree/dev/research/deepseek1_5) | 7B | 稠密LLM | 在研版本 |
-| [DeepSeek-Coder](https://gitee.com/mindspore/mindformers/tree/dev/research/deepseek) | 33B | 稠密LLM | 在研版本 |
-| [GLM4](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/glm4.md) | 9B | 稠密LLM | 在研版本 |
-| [GLM3-32K](https://gitee.com/mindspore/mindformers/tree/dev/research/glm32k) | 6B | 稠密LLM | 在研版本 |
-| [GLM3](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/glm3.md) | 6B | 稠密LLM | 在研版本 |
-| [InternLM2](https://gitee.com/mindspore/mindformers/tree/dev/research/internlm2) | 7B/20B | 稠密LLM | 在研版本 |
-| [Llama3.1](https://gitee.com/mindspore/mindformers/tree/dev/research/llama3_1) | 8B/70B | 稠密LLM | 在研版本 |
-| [Llama3](https://gitee.com/mindspore/mindformers/tree/dev/research/llama3) | 8B/70B | 稠密LLM | 在研版本 |
-| [Llama2](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/llama2.md) | 7B/13B/70B | 稠密LLM | 在研版本 |
-| [Mixtral](https://gitee.com/mindspore/mindformers/tree/dev/research/mixtral) | 8x7B | 稀疏LLM | 在研版本 |
-| [Qwen2](https://gitee.com/mindspore/mindformers/tree/dev/research/qwen2) | 0.5B/1.5B/7B/57B/57B-A14B/72B | 稠密/稀疏LLM | 在研版本 |
-| [Qwen1.5](https://gitee.com/mindspore/mindformers/tree/dev/research/qwen1_5) | 7B/14B/72B | 稠密LLM | 在研版本 |
-| [Qwen-VL](https://gitee.com/mindspore/mindformers/tree/dev/research/qwenvl) | 9.6B | MM | 在研版本 |
-| [Whisper](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/whisper.md) | 1.5B | MM | 在研版本 |
-| [Yi](https://gitee.com/mindspore/mindformers/tree/dev/research/yi) | 6B/34B | 稠密LLM | 在研版本 |
-| [Baichuan2](https://gitee.com/mindspore/mindformers/blob/r1.3.0/research/baichuan2/baichuan2.md) | 7B/13B | 稠密LLM | 1.3.2 |
-| [GLM2](https://gitee.com/mindspore/mindformers/blob/r1.3.0/docs/model_cards/glm2.md) | 6B | 稠密LLM | 1.3.2 |
-| [GPT2](https://gitee.com/mindspore/mindformers/blob/r1.3.0/docs/model_cards/gpt2.md) | 124M/13B | 稠密LLM | 1.3.2 |
-| [InternLM](https://gitee.com/mindspore/mindformers/blob/r1.3.0/research/internlm/internlm.md) | 7B/20B | 稠密LLM | 1.3.2 |
-| [Qwen](https://gitee.com/mindspore/mindformers/blob/r1.3.0/research/qwen/qwen.md) | 7B/14B | 稠密LLM | 1.3.2 |
-| [CodeGeex2](https://gitee.com/mindspore/mindformers/blob/r1.1.0/docs/model_cards/codegeex2.md) | 6B | 稠密LLM | 1.1.0 |
-| [WizardCoder](https://gitee.com/mindspore/mindformers/blob/r1.1.0/research/wizardcoder/wizardcoder.md) | 15B | 稠密LLM | 1.1.0 |
-| [Baichuan](https://gitee.com/mindspore/mindformers/blob/r1.0/research/baichuan/baichuan.md) | 7B/13B | 稠密LLM | 1.0 |
-| [Blip2](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/blip2.md) | 8.1B | MM | 1.0 |
-| [Bloom](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/bloom.md) | 560M/7.1B/65B/176B | 稠密LLM | 1.0 |
-| [Clip](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/clip.md) | 149M/428M | MM | 1.0 |
-| [CodeGeex](https://gitee.com/mindspore/mindformers/blob/r1.0/research/codegeex/codegeex.md) | 13B | 稠密LLM | 1.0 |
-| [GLM](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/glm.md) | 6B | 稠密LLM | 1.0 |
-| [iFlytekSpark](https://gitee.com/mindspore/mindformers/blob/r1.0/research/iflytekspark/iflytekspark.md) | 13B | 稠密LLM | 1.0 |
-| [Llama](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/llama.md) | 7B/13B | 稠密LLM | 1.0 |
-| [MAE](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/mae.md) | 86M | MM | 1.0 |
-| [Mengzi3](https://gitee.com/mindspore/mindformers/blob/r1.0/research/mengzi3/mengzi3.md) | 13B | 稠密LLM | 1.0 |
-| [PanguAlpha](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/pangualpha.md) | 2.6B/13B | 稠密LLM | 1.0 |
-| [SAM](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/sam.md) | 91M/308M/636M | MM | 1.0 |
-| [Skywork](https://gitee.com/mindspore/mindformers/blob/r1.0/research/skywork/skywork.md) | 13B | 稠密LLM | 1.0 |
-| [Swin](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/swin.md) | 88M | MM | 1.0 |
-| [T5](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/t5.md) | 14M/60M | 稠密LLM | 1.0 |
-| [VisualGLM](https://gitee.com/mindspore/mindformers/blob/r1.0/research/visualglm/visualglm.md) | 6B | MM | 1.0 |
-| [Ziya](https://gitee.com/mindspore/mindformers/blob/r1.0/research/ziya/ziya.md) | 13B | 稠密LLM | 1.0 |
-| [Bert](https://gitee.com/mindspore/mindformers/blob/r0.8/docs/model_cards/bert.md) | 4M/110M | 稠密LLM | 0.8 |
+| 模型名 | 支持规格 | 模型类型 | 最新支持版本 |
+|:--------------------------------------------------------------------------------------------------------|:------------------------------|:--------:|:----------:|
+| [DeepSeek-V3](https://gitee.com/mindspore/mindformers/blob/dev/research/deepseek3) | 671B | 稀疏LLM | 在研版本、1.5.0 |
+| [GLM4](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/glm4.md) | 9B | 稠密LLM | 在研版本、1.5.0 |
+| [Llama3.1](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1) | 8B/70B | 稠密LLM | 在研版本、1.5.0 |
+| [Qwen2.5](https://gitee.com/mindspore/mindformers/blob/dev/research/qwen2_5) | 0.5B/1.5B/7B/14B/32B/72B | 稠密LLM | 在研版本、1.5.0 |
+| [TeleChat2](https://gitee.com/mindspore/mindformers/blob/dev/research/telechat2) | 7B/35B/115B | 稠密LLM | 在研版本、1.5.0 |
+| [CodeLlama](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/codellama.md) | 34B | 稠密LLM | 1.5.0 |
+| [CogVLM2-Image](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/cogvlm2_image.md) | 19B | MM | 1.5.0 |
+| [CogVLM2-Video](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/cogvlm2_video.md) | 13B | MM | 1.5.0 |
+| [DeepSeek-V2](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/deepseek2) | 236B | 稀疏LLM | 1.5.0 |
+| [DeepSeek-Coder-V1.5](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/deepseek1_5) | 7B | 稠密LLM | 1.5.0 |
+| [DeepSeek-Coder](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/deepseek) | 33B | 稠密LLM | 1.5.0 |
+| [GLM3-32K](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/glm32k) | 6B | 稠密LLM | 1.5.0 |
+| [GLM3](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/glm3.md) | 6B | 稠密LLM | 1.5.0 |
+| [InternLM2](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/internlm2) | 7B/20B | 稠密LLM | 1.5.0 |
+| [Llama3.2](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/llama3_2.md) | 3B | 稠密LLM | 1.5.0 |
+| [Llama3.2-Vision](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/mllama.md) | 11B | MM | 1.5.0 |
+| [Llama3](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/llama3) | 8B/70B | 稠密LLM | 1.5.0 |
+| [Llama2](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/llama2.md) | 7B/13B/70B | 稠密LLM | 1.5.0 |
+| [Mixtral](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/mixtral) | 8x7B | 稀疏LLM | 1.5.0 |
+| [Qwen2](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/qwen2) | 0.5B/1.5B/7B/57B/57B-A14B/72B | 稠密/稀疏LLM | 1.5.0 |
+| [Qwen1.5](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/qwen1_5) | 7B/14B/72B | 稠密LLM | 1.5.0 |
+| [Qwen-VL](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/qwenvl) | 9.6B | MM | 1.5.0 |
+| [TeleChat](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/telechat) | 7B/12B/52B | 稠密LLM | 1.5.0 |
+| [Whisper](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/whisper.md) | 1.5B | MM | 1.5.0 |
+| [Yi](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/yi) | 6B/34B | 稠密LLM | 1.5.0 |
+| [YiZhao](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/yizhao) | 12B | 稠密LLM | 1.5.0 |
+| [Baichuan2](https://gitee.com/mindspore/mindformers/blob/r1.3.0/research/baichuan2/baichuan2.md) | 7B/13B | 稠密LLM | 1.3.2 |
+| [GLM2](https://gitee.com/mindspore/mindformers/blob/r1.3.0/docs/model_cards/glm2.md) | 6B | 稠密LLM | 1.3.2 |
+| [GPT2](https://gitee.com/mindspore/mindformers/blob/r1.3.0/docs/model_cards/gpt2.md) | 124M/13B | 稠密LLM | 1.3.2 |
+| [InternLM](https://gitee.com/mindspore/mindformers/blob/r1.3.0/research/internlm/internlm.md) | 7B/20B | 稠密LLM | 1.3.2 |
+| [Qwen](https://gitee.com/mindspore/mindformers/blob/r1.3.0/research/qwen/qwen.md) | 7B/14B | 稠密LLM | 1.3.2 |
+| [CodeGeex2](https://gitee.com/mindspore/mindformers/blob/r1.1.0/docs/model_cards/codegeex2.md) | 6B | 稠密LLM | 1.1.0 |
+| [WizardCoder](https://gitee.com/mindspore/mindformers/blob/r1.1.0/research/wizardcoder/wizardcoder.md) | 15B | 稠密LLM | 1.1.0 |
+| [Baichuan](https://gitee.com/mindspore/mindformers/blob/r1.0/research/baichuan/baichuan.md) | 7B/13B | 稠密LLM | 1.0 |
+| [Blip2](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/blip2.md) | 8.1B | MM | 1.0 |
+| [Bloom](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/bloom.md) | 560M/7.1B/65B/176B | 稠密LLM | 1.0 |
+| [Clip](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/clip.md) | 149M/428M | MM | 1.0 |
+| [CodeGeex](https://gitee.com/mindspore/mindformers/blob/r1.0/research/codegeex/codegeex.md) | 13B | 稠密LLM | 1.0 |
+| [GLM](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/glm.md) | 6B | 稠密LLM | 1.0 |
+| [iFlytekSpark](https://gitee.com/mindspore/mindformers/blob/r1.0/research/iflytekspark/iflytekspark.md) | 13B | 稠密LLM | 1.0 |
+| [Llama](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/llama.md) | 7B/13B | 稠密LLM | 1.0 |
+| [MAE](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/mae.md) | 86M | MM | 1.0 |
+| [Mengzi3](https://gitee.com/mindspore/mindformers/blob/r1.0/research/mengzi3/mengzi3.md) | 13B | 稠密LLM | 1.0 |
+| [PanguAlpha](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/pangualpha.md) | 2.6B/13B | 稠密LLM | 1.0 |
+| [SAM](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/sam.md) | 91M/308M/636M | MM | 1.0 |
+| [Skywork](https://gitee.com/mindspore/mindformers/blob/r1.0/research/skywork/skywork.md) | 13B | 稠密LLM | 1.0 |
+| [Swin](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/swin.md) | 88M | MM | 1.0 |
+| [T5](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/t5.md) | 14M/60M | 稠密LLM | 1.0 |
+| [VisualGLM](https://gitee.com/mindspore/mindformers/blob/r1.0/research/visualglm/visualglm.md) | 6B | MM | 1.0 |
+| [Ziya](https://gitee.com/mindspore/mindformers/blob/r1.0/research/ziya/ziya.md) | 13B | 稠密LLM | 1.0 |
+| [Bert](https://gitee.com/mindspore/mindformers/blob/r0.8/docs/model_cards/bert.md) | 4M/110M | 稠密LLM | 0.8 |
 
 * ***LLM:*** *大语言模型(Large Language Model);* ***MM:*** *多模态(Multi-Modal)*
\ No newline at end of file