diff --git a/MindIE/MultiModal/CogVideoX/README.md b/MindIE/MultiModal/CogVideoX/README.md index d87e92f0b84e241b7d354f7500bcc85e853e2067..fe95f0bf41f6e80169ae922b822c26fe1beceb93 100644 --- a/MindIE/MultiModal/CogVideoX/README.md +++ b/MindIE/MultiModal/CogVideoX/README.md @@ -19,8 +19,9 @@ hardwares: ### 1.1 获取CANN&MindIE安装包&环境准备 - 设备支持: Atlas 800I A2 (64G) / Atlas 800T A2设备:CogVideoX-5b支持1、2、4、8卡推理,CogVideoX-2b支持1、2、4卡推理 -- [Atlas 800I A2/Atlas 800T A2](https://www.hiascend.com/developer/download/community/result?module=pt+ie+cann&product=4&model=32) -- [环境准备指导](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/80RC2alpha002/softwareinst/instg/instg_0001.html) +- [Atlas 800I A2](https://www.hiascend.com/developer/download/community/result?module=pt+ie+cann&product=4&model=32) +- [Atlas 800T A2](https://www.hiascend.com/developer/download/community/result?module=pt+cann&product=4&model=26) +- [环境准备指导](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/81RC1alpha001/softwareinst/instg/instg_0003.html) ### 1.2 CANN安装 ```shell @@ -50,7 +51,7 @@ chmod +x ./Ascend-mindie_${version}_linux-${arch}.run cd /usr/local/Ascend/mindie && source set_env.sh # 方式二:指定路径安装 -./Ascend-mindie_${version}_linux-${arch}.run --install --install-path=${AieInstallPath} +./Ascend-mindie_${version}_linux-${arch}.run --install --install-path=${AieInstallPath} # 设置环境变量 cd ${AieInstallPath}/mindie && source set_env.sh ``` diff --git a/MindIE/MultiModal/CogView3-Plus-3B/README.md b/MindIE/MultiModal/CogView3-Plus-3B/README.md index 913fde80786592c473930a193531b7acb8a13232..0d669616112dc1bb32e64c0a0802a6dabab258b9 100644 --- a/MindIE/MultiModal/CogView3-Plus-3B/README.md +++ b/MindIE/MultiModal/CogView3-Plus-3B/README.md @@ -4,14 +4,15 @@ | 配套 | 版本 | 环境准备指导 | | ----- | ----- |-----| - | Python | 3.10.12 | - | + | Python | 3.10 / 3.11 | - | | torch | 2.1.0 | - | ### 1.1 获取CANN&MindIE安装包&环境准备 - 设备支持 Atlas 800I A2/Atlas 800T A2设备:支持的卡数为1 -- [Atlas 800I A2/Atlas 800T A2](https://www.hiascend.com/developer/download/community/result?module=pt+ie+cann&product=4&model=32) -- [环境准备指导](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/80RC2alpha002/softwareinst/instg/instg_0001.html) +- [Atlas 800I A2](https://www.hiascend.com/developer/download/community/result?module=pt+ie+cann&product=4&model=32) +- [Atlas 800T A2](https://www.hiascend.com/developer/download/community/result?module=pt+cann&product=4&model=26) +- [环境准备指导](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/81RC1alpha001/softwareinst/instg/instg_0003.html) ### 1.2 CANN安装 ```shell @@ -41,7 +42,7 @@ chmod +x ./Ascend-mindie_${version}_linux-${arch}.run cd /usr/local/Ascend/mindie && source set_env.sh # 方式二:指定路径安装 -./Ascend-mindie_${version}_linux-${arch}.run --install-path=${AieInstallPath} +./Ascend-mindie_${version}_linux-${arch}.run --install --install-path=${AieInstallPath} # 设置环境变量 cd ${AieInstallPath}/mindie && source set_env.sh ``` @@ -81,62 +82,6 @@ pip install -r requirements.txt https://huggingface.co/THUDM/CogView3-Plus-3B/tree/main ``` -- 在main路径下,修改model_index.json文件,更改结果如下: -```shell -{ - "_class_name": "CogView3PlusPipeline", - "_diffusers_version": "0.31.0.dev0", - "scheduler": [ - "cogview3plus", - "CogVideoXDDIMScheduler" - ], - "text_encoder": [ - "transformers", - "T5EncoderModel" - ], - "tokenizer": [ - "transformers", - "T5Tokenizer" - ], - "transformer": [ - "cogview3plus", - "CogView3PlusTransformer2DModel" - ], - "vae": [ - "diffusers", - "AutoencoderKL" - ] -} -``` - -- 
在main/transformer路径下,修改config.json文件,更改结果如下: -```shell -{ - "_class_name": "CogView3PlusTransformer2DModel", - "_diffusers_version": "0.31.0.dev0", - "attention_head_dim": 40, - "condition_dim": 256, - "in_channels": 16, - "num_attention_heads": 64, - "num_layers": 30, - "out_channels": 16, - "patch_size": 2, - "pooled_projection_dim": 1536, - "pos_embed_max_size": 128, - "sample_size": 128, - "text_embed_dim": 4096, - "time_embed_dim": 512, - "use_cache": true, - "cache_interval": 2, - "cache_start": 1, - "num_cache_layer": 11, - "cache_start_steps": 10, - "useagb": false, - "pab": 2, - "total_step": 50 -} -``` - #### 2. 各模型的配置文件、权重文件的层级样例如下所示: ```commandline |----main @@ -162,19 +107,7 @@ cd cogview3 path="/data/CogView3B" ``` -#### 3. 有损加速算法选择 - -修改权重文件CogView3B/transformer/config.json中的`use_cache`和`useagb`参数,对应算法关系如下: - -| 算法类型 | use_cache | useagb | -| :------: |:----:|:----:| -| 不使用加速算法 | false | false | -| DiT Cache | true | false | -| AGB Cache | false | true | - -**注意**:在32G的服务器上,可开启DiT Cache算法,开启AGB Cache算法可能会报显存不足的错误,因为AGB算法对显存要求更高。在64G机器上,两种Cache算法皆可开启。 - -#### 4. 执行命令,进行推理: +#### 3. 执行命令,进行推理: ```shell python inference_cogview3plus.py \ --model_path ${path} \ @@ -183,7 +116,8 @@ python inference_cogview3plus.py \ --height 1024 \ --num_inference_steps 50 \ --dtype bf16 \ - --device_id 0 + --device_id 0 \ + --cache_algorithm attention ``` 参数说明: - model_path:权重路径,包含scheduler、text_encoder、tokenizer、transformer、vae,5个模型的配置文件及权重。 @@ -193,7 +127,9 @@ python inference_cogview3plus.py \ - num_inference_steps:推理迭代步数。 - dtype: 数据类型。目前只支持bf16。 - device_id:推理设备ID。 +- cache_algorithm:默认为None,可选择attention,即使用AGBCache算法,注意是有损的加速算法。 +**注意**:在32G的服务器上,开启cache算法可能会报显存不足的错误;在64G机器上,可正常开启cache算法。 **注意**:本仓库模型,是对开源模型进行优化。用户在使用时,应对开源代码函数的变量范围,类型进行校验,避免出现变量超出范围、除零等操作。 @@ -208,19 +144,7 @@ cd cogview3 path="/data/CogView3B" ``` -#### 3. 有损加速算法选择 - -修改权重文件CogView3B/transformer/config.json中的`use_cache`和`useagb`参数,对应算法关系如下: - -| 算法类型 | use_cache | useagb | -| :------: |:----:|:----:| -| 不使用加速算法 | false | false | -| DiT Cache | true | false | -| AGB Cache | false | true | - -**注意**:在32G的服务器上,batch_size需要等于1,否则会报显存不足的错误;在64G机器上,batch_size可为2,可开启Cache算法。 - -#### 4. 执行命令,进行推理: +#### 3. 
执行命令,进行推理: ```shell python inference_cogview3plus.py \ --model_path ${path} \ @@ -230,7 +154,8 @@ python inference_cogview3plus.py \ --num_inference_steps 50 \ --dtype bf16 \ --batch_size 2 \ - --device_id 0 + --device_id 0 \ + --cache_algorithm attention ``` 参数说明: - model_path:权重路径,包含scheduler、text_encoder、tokenizer、transformer、vae,5个模型的配置文件及权重。 @@ -241,6 +166,9 @@ python inference_cogview3plus.py \ - dtype: 数据类型。目前只支持bf16。 - batch_size: 推理时的batch_size。 - device_id:推理设备ID。 +- cache_algorithm:默认为None,可选择attention,即使用AGBCache算法,注意是有损的加速算法。 + +**注意**:在32G的服务器上,batch_size需要等于1,否则会报显存不足的错误;在64G机器上,batch_size可为2,可开启cache算法。 ### 3.4 精度验证 @@ -277,7 +205,8 @@ python3 inference_cogview3plus.py \ --width 1024 \ --batch_size 1 \ --seed 42 \ - --device_id 0 + --device_id 0 \ + --cache_algorithm attention ``` 参数说明: - model_path:权重路径,包含scheduler、text_encoder、tokenizer、transformer、vae,5个模型的配置文件及权重。 @@ -291,6 +220,7 @@ python3 inference_cogview3plus.py \ - batch_size:模型batch size。 - seed:随机种子。 - device_id:推理设备ID。 +- cache_algorithm:默认为None,可选择attention,即使用AGBCache算法,注意是有损的加速算法。 执行完成后在`./results_PartiPrompts`目录下生成推理图片,在当前目录生成一个`image_info_PartiPrompts.json`文件,记录着图片和prompt的对应关系,并在终端显示推理时间。 @@ -307,7 +237,8 @@ python3 inference_cogview3plus.py \ --width 1024 \ --batch_size 1 \ --seed 42 \ - --device_id 0 + --device_id 0 \ + --cache_algorithm attention ``` 参数说明: - model_path:权重路径,包含scheduler、text_encoder、tokenizer、transformer、vae,5个模型的配置文件及权重。 @@ -321,6 +252,7 @@ python3 inference_cogview3plus.py \ - batch_size:模型batch size。 - seed:随机种子。 - device_id:推理设备ID。 +- cache_algorithm:默认为None,可选择attention,即使用AGBCache算法,注意是有损的加速算法。 执行完成后在`./results_hpsv2`目录下生成推理图片,在当前目录生成一个`image_info_hpsv2.json`文件,记录着图片和prompt的对应关系,并在终端显示推理时间。 @@ -377,8 +309,7 @@ python3 hpsv2_score.py \ | 硬件形态 | 迭代次数 | 加速算法 | 平均耗时 | CLIP_score | HPSV2_score | | :------: |:----:|:----:|:----:|:----:|:----:| | Atlas 800T A2 (8*64G) 单卡 | 50 | 无 | 27.588s | 0.367 | 0.2879729 | -| Atlas 800T A2 (8*64G) 单卡 | 50 | DiT Cache | 23.639s | 0.367 | 0.2878573 | -| Atlas 800T A2 (8*64G) 单卡 | 50 | AGB | 17.219s | 0.367 | 0.2879835 | +| Atlas 800T A2 (8*64G) 单卡 | 50 | AGBCache | 17.219s | 0.367 | 0.2879835 | ## 四、优化指南 diff --git a/MindIE/MultiModal/CogView3-Plus-3B/cogview3plus/layers/__init__.py b/MindIE/MultiModal/CogView3-Plus-3B/cogview3plus/layers/__init__.py index 602ad432a01cefe3824b16f4ee90ce56f18c3aab..4d25f1e889c0d9b1c4ff83c21c30acf8bcaacce8 100644 --- a/MindIE/MultiModal/CogView3-Plus-3B/cogview3plus/layers/__init__.py +++ b/MindIE/MultiModal/CogView3-Plus-3B/cogview3plus/layers/__init__.py @@ -1,3 +1,2 @@ from .normalization import CogView3PlusAdaLayerNormZeroTextImage, AdaLayerNormContinuous -from .embeddings import CogView3CombinedTimestepSizeEmbeddings, CogView3PlusPatchEmbed -from .linear import QKVLinear \ No newline at end of file +from .embeddings import CogView3CombinedTimestepSizeEmbeddings, CogView3PlusPatchEmbed \ No newline at end of file diff --git a/MindIE/MultiModal/CogView3-Plus-3B/cogview3plus/layers/linear.py b/MindIE/MultiModal/CogView3-Plus-3B/cogview3plus/layers/linear.py deleted file mode 100644 index d242d17c2e83b7d27d86f1132a736951963b71bf..0000000000000000000000000000000000000000 --- a/MindIE/MultiModal/CogView3-Plus-3B/cogview3plus/layers/linear.py +++ /dev/null @@ -1,48 +0,0 @@ -#!/usr/bin/env python -# coding=utf-8 -# Copyright 2024 Huawei Technologies Co., Ltd -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# https://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - - -import torch -import torch.nn as nn - - -class QKVLinear(nn.Module): - def __init__(self, attention_dim, hidden_size, qkv_bias=True, device=None, dtype=None): - super(QKVLinear, self).__init__() - self.attention_dim = attention_dim - self.hidden_size = hidden_size - self.qkv_bias = qkv_bias - - factory_kwargs = {"device": device, "dtype": dtype} - - self.weight = nn.Parameter(torch.empty([self.attention_dim, 3 * self.hidden_size], **factory_kwargs)) - if self.qkv_bias: - self.bias = nn.Parameter(torch.empty([3 * self.hidden_size], **factory_kwargs)) - - def forward(self, hidden_states): - - if not self.qkv_bias: - qkv = torch.matmul(hidden_states, self.weight) - else: - qkv = torch.addmm( - self.bias, - hidden_states.view(hidden_states.size(0) * hidden_states.size(1), hidden_states.size(2)), - self.weight, - beta=1, - alpha=1 - ) - - return qkv \ No newline at end of file diff --git a/MindIE/MultiModal/CogView3-Plus-3B/cogview3plus/models/attention_processor.py b/MindIE/MultiModal/CogView3-Plus-3B/cogview3plus/models/attention_processor.py index eb29215831e757fe0a6eddc7040449f45148a514..dbef9116287b1a41e4c22bacf9697f99a52b2d7a 100644 --- a/MindIE/MultiModal/CogView3-Plus-3B/cogview3plus/models/attention_processor.py +++ b/MindIE/MultiModal/CogView3-Plus-3B/cogview3plus/models/attention_processor.py @@ -23,7 +23,7 @@ import torch_npu from diffusers.utils import logging from diffusers.utils.torch_utils import maybe_allow_in_graph -from ..layers import QKVLinear +from mindiesd.layers.linear import QKVLinear logger = logging.get_logger(__name__) # pylint: disable=invalid-name @@ -304,11 +304,12 @@ class CogVideoXAttnProcessor2_0: attention_mask = attention_mask.view(batch_size, attn.heads, -1, attention_mask.shape[-1]) B, S, _ = hidden_states.shape - qkv = attn.to_qkv(hidden_states) - inner_dim = qkv.shape[-1] // 3 + query, key, value = attn.to_qkv(hidden_states) + inner_dim = key.shape[-1] head_dim = inner_dim // attn.heads - qkv_shape = (B, S, 3, attn.heads, head_dim) - query, key, value = qkv.view(qkv_shape).permute(2, 0, 3, 1, 4).contiguous().unbind(0) + query = query.view(batch_size, -1, attn.heads, head_dim).transpose(1, 2) + key = key.view(batch_size, -1, attn.heads, head_dim).transpose(1, 2) + value = value.view(batch_size, -1, attn.heads, head_dim).transpose(1, 2) if attn.norm_q is not None: query = attn.norm_q(query) diff --git a/MindIE/MultiModal/CogView3-Plus-3B/cogview3plus/models/transformer_cogview3plus.py b/MindIE/MultiModal/CogView3-Plus-3B/cogview3plus/models/transformer_cogview3plus.py index e61259874834a0664ab52c17e2f1826e3355a8ee..f1abc4981c01299ab14567068105da142c690e18 100644 --- a/MindIE/MultiModal/CogView3-Plus-3B/cogview3plus/models/transformer_cogview3plus.py +++ b/MindIE/MultiModal/CogView3-Plus-3B/cogview3plus/models/transformer_cogview3plus.py @@ -39,25 +39,9 @@ class CogView3PlusTransformerBlock(nn.Module): dim: int = 2560, num_attention_heads: int = 64, attention_head_dim: int = 40, - time_embed_dim: int = 512, - useagb: bool = True, - pab: int = 2, - total_step: int = 50 + time_embed_dim: int = 512 ): super().__init__() - self.useagb = useagb - 
self.pab = pab - self.total_step = total_step - - self.attn_count = 0 - self.last_attn_x_image = None - self.last_attn_x_prompt = None - self.attn_alpha_image = 0 - self.attn_alpha_prompt = 0 - self.last_attn_image = None - self.last_attn_prompt = None - self.last_ff_image = None - self.last_ff_prompt = None self.norm1 = CogView3PlusAdaLayerNormZeroTextImage(embedding_dim=time_embed_dim, dim=dim) @@ -77,6 +61,7 @@ class CogView3PlusTransformerBlock(nn.Module): self.norm2_context = nn.LayerNorm(dim, elementwise_affine=False, eps=1e-5) self.ff = FeedForward(dim=dim, dim_out=dim, activation_fn="gelu-approximate") + self.cache = None def forward( self, @@ -85,128 +70,46 @@ class CogView3PlusTransformerBlock(nn.Module): emb: torch.Tensor, ) -> torch.Tensor: text_seq_length = encoder_hidden_states.size(1) - - if self.useagb: - if self.attn_count > 0: - diff_x_image = hidden_states - self.last_attn_x_image - diff_x_prompt = encoder_hidden_states - self.last_attn_x_prompt - - self.last_attn_x_image = hidden_states - self.last_attn_x_prompt = encoder_hidden_states - - lower_bound = int(self.total_step / 5) - 0.5 - upper_bound = self.total_step - 1.5 - if (self.attn_count % self.pab != 0) and (lower_bound < self.attn_count < upper_bound): - broadcast_attn = 1 - else: - broadcast_attn = 0 - - if broadcast_attn == 1: - attn_hidden_states = self.last_attn_image + self.attn_alpha_image * diff_x_image - attn_encoder_hidden_states = self.last_attn_prompt + self.attn_alpha_prompt * diff_x_prompt - else: - # norm & modulate - norm_hidden_states, chunk_params = self.norm1(hidden_states, encoder_hidden_states, emb) - - gate_msa = chunk_params.gate_msa - shift_mlp = chunk_params.shift_mlp - scale_mlp = chunk_params.scale_mlp - gate_mlp = chunk_params.gate_mlp - norm_encoder_hidden_states = chunk_params.context - c_gate_msa = chunk_params.c_gate_msa - c_shift_mlp = chunk_params.c_shift_mlp - c_scale_mlp = chunk_params.c_scale_mlp - c_gate_mlp = chunk_params.c_gate_mlp - - # attention - attn_hidden_states, attn_encoder_hidden_states = self.attn1( - hidden_states=norm_hidden_states, encoder_hidden_states=norm_encoder_hidden_states - ) - - attn_hidden_states = gate_msa.unsqueeze(1) * attn_hidden_states - attn_encoder_hidden_states = c_gate_msa.unsqueeze(1) * attn_encoder_hidden_states - - # calculate alpha - if lower_bound < self.attn_count < upper_bound: - diff_image = attn_hidden_states - self.last_attn_image - diff_prompt = attn_encoder_hidden_states - self.last_attn_prompt - - self.attn_alpha_image = ((diff_x_image / 100) * (diff_image / 100)).sum() / \ - ((diff_x_image / 100) ** 2).sum() - self.attn_alpha_prompt = ((diff_x_prompt / 100) * (diff_prompt / 100)).sum() / \ - ((diff_x_prompt / 100) ** 2).sum() - else: - self.attn_alpha_image = 0 - self.attn_alpha_prompt = 0 - - self.last_attn_image = attn_hidden_states - self.last_attn_prompt = attn_encoder_hidden_states - - hidden_states = hidden_states + attn_hidden_states - encoder_hidden_states = encoder_hidden_states + attn_encoder_hidden_states - - if broadcast_attn == 1: - hidden_states = hidden_states + self.last_ff_image - encoder_hidden_states = encoder_hidden_states + self.last_ff_prompt - else: - # norm & modulate - norm_hidden_states = self.norm2(hidden_states) - norm_hidden_states = norm_hidden_states * (1 + scale_mlp[:, None]) + shift_mlp[:, None] - - norm_encoder_hidden_states = self.norm2_context(encoder_hidden_states) - norm_encoder_hidden_states = norm_encoder_hidden_states * (1 + c_scale_mlp[:, None]) + \ - c_shift_mlp[:, None] - - # 
feed-forward - norm_hidden_states = torch.cat([norm_encoder_hidden_states, norm_hidden_states], dim=1) - ff_output = self.ff(norm_hidden_states) - - ff_image = gate_mlp.unsqueeze(1) * ff_output[:, text_seq_length:] - ff_prompt = c_gate_mlp.unsqueeze(1) * ff_output[:, :text_seq_length] - - hidden_states = hidden_states + ff_image - encoder_hidden_states = encoder_hidden_states + ff_prompt - self.last_ff_image = ff_image - self.last_ff_prompt = ff_prompt - - # 更新self.attn_count - self.attn_count = (self.attn_count + 1) % self.total_step - else: - # norm & modulate - norm_hidden_states, chunk_params = self.norm1(hidden_states, encoder_hidden_states, emb) - - gate_msa = chunk_params.gate_msa - shift_mlp = chunk_params.shift_mlp - scale_mlp = chunk_params.scale_mlp - gate_mlp = chunk_params.gate_mlp - norm_encoder_hidden_states = chunk_params.context - c_gate_msa = chunk_params.c_gate_msa - c_shift_mlp = chunk_params.c_shift_mlp - c_scale_mlp = chunk_params.c_scale_mlp - c_gate_mlp = chunk_params.c_gate_mlp - - # attention + # norm & modulate + norm_hidden_states, chunk_params = self.norm1(hidden_states, encoder_hidden_states, emb) + + gate_msa = chunk_params.gate_msa + shift_mlp = chunk_params.shift_mlp + scale_mlp = chunk_params.scale_mlp + gate_mlp = chunk_params.gate_mlp + norm_encoder_hidden_states = chunk_params.context + c_gate_msa = chunk_params.c_gate_msa + c_shift_mlp = chunk_params.c_shift_mlp + c_scale_mlp = chunk_params.c_scale_mlp + c_gate_mlp = chunk_params.c_gate_mlp + + # attention + if self.cache is None: attn_hidden_states, attn_encoder_hidden_states = self.attn1( hidden_states=norm_hidden_states, encoder_hidden_states=norm_encoder_hidden_states ) + else: + attn_hidden_states, attn_encoder_hidden_states = self.cache.apply(self.attn1.forward, + hidden_states=norm_hidden_states, encoder_hidden_states=norm_encoder_hidden_states + ) - hidden_states = hidden_states + gate_msa.unsqueeze(1) * attn_hidden_states - encoder_hidden_states = encoder_hidden_states + c_gate_msa.unsqueeze(1) * attn_encoder_hidden_states + hidden_states = hidden_states + gate_msa.unsqueeze(1) * attn_hidden_states + encoder_hidden_states = encoder_hidden_states + c_gate_msa.unsqueeze(1) * attn_encoder_hidden_states - # norm & modulate - norm_hidden_states = self.norm2(hidden_states) - norm_hidden_states = norm_hidden_states * (1 + scale_mlp[:, None]) + shift_mlp[:, None] + # norm & modulate + norm_hidden_states = self.norm2(hidden_states) + norm_hidden_states = norm_hidden_states * (1 + scale_mlp[:, None]) + shift_mlp[:, None] - norm_encoder_hidden_states = self.norm2_context(encoder_hidden_states) - norm_encoder_hidden_states = norm_encoder_hidden_states * (1 + c_scale_mlp[:, None]) + c_shift_mlp[:, None] + norm_encoder_hidden_states = self.norm2_context(encoder_hidden_states) + norm_encoder_hidden_states = norm_encoder_hidden_states * (1 + c_scale_mlp[:, None]) + c_shift_mlp[:, None] - # feed-forward - norm_hidden_states = torch.cat([norm_encoder_hidden_states, norm_hidden_states], dim=1) - ff_output = self.ff(norm_hidden_states) + # feed-forward + norm_hidden_states = torch.cat([norm_encoder_hidden_states, norm_hidden_states], dim=1) + ff_output = self.ff(norm_hidden_states) - hidden_states = hidden_states + gate_mlp.unsqueeze(1) * ff_output[:, text_seq_length:] - encoder_hidden_states = encoder_hidden_states + c_gate_mlp.unsqueeze(1) * ff_output[:, :text_seq_length] + hidden_states = hidden_states + gate_mlp.unsqueeze(1) * ff_output[:, text_seq_length:] + encoder_hidden_states = encoder_hidden_states + 
c_gate_mlp.unsqueeze(1) * ff_output[:, :text_seq_length] if hidden_states.dtype == torch.float16: hidden_states = hidden_states.clip(-65504, 65504) @@ -232,14 +135,7 @@ class CogView3PlusTransformer2DModel(ModelMixin, ConfigMixin): time_embed_dim: int = 512, condition_dim: int = 256, pos_embed_max_size: int = 128, - use_cache: bool = False, - cache_interval: int = 2, - cache_start: int = 1, - num_cache_layer: int = 11, - cache_start_steps: int = 10, - useagb: bool = True, - pab: int = 2, - total_step: int = 50, + sample_size: int = 128, ): super().__init__() self.out_channels = out_channels @@ -272,9 +168,6 @@ class CogView3PlusTransformer2DModel(ModelMixin, ConfigMixin): num_attention_heads=num_attention_heads, attention_head_dim=attention_head_dim, time_embed_dim=time_embed_dim, - useagb=useagb, - pab=pab, - total_step=total_step ) for _ in range(num_layers) ] @@ -297,14 +190,6 @@ class CogView3PlusTransformer2DModel(ModelMixin, ConfigMixin): self.v_weight_cache = None self.v_bias_cache = None - self.use_cache = use_cache - self.cache_interval = cache_interval - self.cache_start = cache_start - self.num_cache_layer = num_cache_layer - self.cache_start_steps = cache_start_steps - - self.delta_cache = None - self.delta_encoder_cache = None @property def attn_processors(self) -> Dict[str, AttentionProcessor]: @@ -358,14 +243,30 @@ class CogView3PlusTransformer2DModel(ModelMixin, ConfigMixin): def forward( self, - states, + hidden_states: torch.Tensor, + encoder_hidden_states: torch.Tensor, timestep: torch.LongTensor, original_size: torch.Tensor, target_size: torch.Tensor, crop_coords: torch.Tensor, + return_dict: bool = True, ) -> Union[torch.Tensor, Transformer2DModelOutput]: - hidden_states = states[0] - encoder_hidden_states = states[1] + """ + The [`CogView3PlusTransformer2DModel`] forward method. + + Args: + hidden_states (`torch.Tensor`): + Input `hidden_states` of shape `(batch size, channel, height, width)`. + encoder_hidden_states (`torch.Tensor`): + Conditional embeddings (embeddings computed from the input conditions such as prompts) of shape + `(batch_size, sequence_len, text_embed_dim)` + timestep (`torch.LongTensor`): + Used to indicate denoising step. + + Returns: + `torch.Tensor` or [`~models.transformer_2d.Transformer2DModelOutput`]: + The denoised latents using provided inputs as conditioning. 
+ """ height, width = hidden_states.shape[-2:] text_seq_length = encoder_hidden_states.shape[1] @@ -377,7 +278,28 @@ class CogView3PlusTransformer2DModel(ModelMixin, ConfigMixin): encoder_hidden_states = hidden_states[:, :text_seq_length] hidden_states = hidden_states[:, text_seq_length:] - hidden_states, encoder_hidden_states = self._forward_blocks(hidden_states, encoder_hidden_states, emb, states[2]) + for _, block in enumerate(self.transformer_blocks): + if self.training and self.gradient_checkpointing: + def create_custom_forward(module): + def custom_forward(*inputs): + return module(*inputs) + + return custom_forward + + ckpt_kwargs: Dict[str, Any] = {"use_reentrant": False} if is_torch_version(">=", "1.11.0") else {} + hidden_states, encoder_hidden_states = torch.utils.checkpoint.checkpoint( + create_custom_forward(block), + hidden_states, + encoder_hidden_states, + emb, + **ckpt_kwargs, + ) + else: + hidden_states, encoder_hidden_states = block( + hidden_states=hidden_states, + encoder_hidden_states=encoder_hidden_states, + emb=emb, + ) hidden_states = self.norm_out(hidden_states, emb) hidden_states = self.proj_out(hidden_states) # (batch_size, height*width, patch_size*patch_size*out_channels) @@ -395,66 +317,10 @@ class CogView3PlusTransformer2DModel(ModelMixin, ConfigMixin): shape=(hidden_states.shape[0], self.out_channels, height * patch_size, width * patch_size) ) - return Transformer2DModelOutput(sample=output) - - # forward blocks in range [start_idx, end_idx), then return input and output - def _forward_blocks_range(self, hidden_states, encoder_hidden_states, emb, start_idx, end_idx, **kwargs): - for _, block in enumerate(self.transformer_blocks[start_idx: end_idx]): - hidden_states, encoder_hidden_states = block( - hidden_states=hidden_states, - encoder_hidden_states=encoder_hidden_states, - emb=emb, - ) - - return hidden_states, encoder_hidden_states - - def _forward_blocks(self, hidden_states, encoder_hidden_states, emb, t_idx): - num_blocks = len(self.transformer_blocks) - - if not self.use_cache or (t_idx < self.cache_start_steps): - hidden_states, encoder_hidden_states = self._forward_blocks_range( - hidden_states, - encoder_hidden_states, - emb, - 0, - num_blocks - ) - else: - # infer [0, cache_start) - hidden_states, encoder_hidden_states = self._forward_blocks_range( - hidden_states, - encoder_hidden_states, - emb, - 0, - self.cache_start - ) - # infer [cache_start, cache_end) - cache_end = np.minimum(self.cache_start + self.num_cache_layer, num_blocks) - hidden_states_before_cache = hidden_states.clone() - encoder_hidden_states_before_cache = encoder_hidden_states.clone() - if t_idx % self.cache_interval == (self.cache_start_steps % self.cache_interval): - hidden_states, encoder_hidden_states = self._forward_blocks_range( - hidden_states, - encoder_hidden_states, - emb, - self.cache_start, - cache_end - ) - self.delta_cache = hidden_states - hidden_states_before_cache - self.delta_encoder_cache = encoder_hidden_states - encoder_hidden_states_before_cache - else: - hidden_states = hidden_states_before_cache + self.delta_cache - encoder_hidden_states = encoder_hidden_states_before_cache + self.delta_encoder_cache - # infer [cache_end, num_blocks) - hidden_states, encoder_hidden_states = self._forward_blocks_range( - hidden_states, - encoder_hidden_states, - emb, - cache_end, - num_blocks - ) + if not return_dict: + return (output,) - return hidden_states, encoder_hidden_states + return Transformer2DModelOutput(sample=output) def load_weights(self, state_dict, 
shard=False): with torch.no_grad(): diff --git a/MindIE/MultiModal/CogView3-Plus-3B/cogview3plus/pipeline/pipeline_cogview3plus.py b/MindIE/MultiModal/CogView3-Plus-3B/cogview3plus/pipeline/pipeline_cogview3plus.py index 3ea10a212a898b8b8cb9638defa0518cefa955ed..01877a9e6eb8189e659a22120a7c003fa8bc3ec2 100644 --- a/MindIE/MultiModal/CogView3-Plus-3B/cogview3plus/pipeline/pipeline_cogview3plus.py +++ b/MindIE/MultiModal/CogView3-Plus-3B/cogview3plus/pipeline/pipeline_cogview3plus.py @@ -309,11 +309,13 @@ class CogView3PlusPipeline(DiffusionPipeline): # predict noise model_output noise_pred = self.transformer( - states=(latent_model_input, prompt_embeds, i), + hidden_states=latent_model_input, + encoder_hidden_states=prompt_embeds, timestep=timestep, original_size=original_size, target_size=target_size, crop_coords=crops_coords_top_left, + return_dict=False, )[0] noise_pred = noise_pred.float() diff --git a/MindIE/MultiModal/CogView3-Plus-3B/inference_cogview3plus.py b/MindIE/MultiModal/CogView3-Plus-3B/inference_cogview3plus.py index 0052f511f5e3694ca422b96d3fdd76209fd2808b..a9912983cbbb0187f44008188d747a5b2edaad15 100644 --- a/MindIE/MultiModal/CogView3-Plus-3B/inference_cogview3plus.py +++ b/MindIE/MultiModal/CogView3-Plus-3B/inference_cogview3plus.py @@ -23,8 +23,9 @@ import json import torch -from cogview3plus import CogView3PlusPipeline, set_random_seed +from cogview3plus import CogView3PlusPipeline, set_random_seed, CogView3PlusTransformer2DModel from cogview3plus.utils.file_utils import standardize_path +from mindiesd import CacheAgent, CacheConfig logging.basicConfig(level=logging.INFO) logger = logging.getLogger(__name__) @@ -190,6 +191,7 @@ def parse_arguments(): parser.add_argument("--dtype", type=str, default="bf16", help="bf16 or fp16") parser.add_argument("--seed", type=int, default=None, help="Random seed") parser.add_argument("--device_id", type=int, default=0, help="NPU device id") + parser.add_argument('--cache_algorithm', type=str, default="None", help="The type of optimization algorithm") return parser.parse_args() @@ -206,7 +208,27 @@ def infer(args): # Load the pre-trained model with the specified precision args.model_path = standardize_path(args.model_path) - pipe = CogView3PlusPipeline.from_pretrained(args.model_path, torch_dtype=dtype).to("npu") + pipe = CogView3PlusPipeline.from_pretrained(args.model_path, torch_dtype=dtype) + transformer = CogView3PlusTransformer2DModel.from_pretrained(os.path.join(args.model_path, 'transformer'), torch_dtype=dtype) + pipe.transformer = transformer + pipe = pipe.to("npu") + + # attention cache + if args.cache_algorithm == "attention": + steps_count = args.num_inference_steps + blocks_count = pipe.transformer.config.num_layers + config = CacheConfig( + method="attention_cache", + blocks_count=blocks_count, + steps_count=steps_count, + step_start=15, + step_end=47, + step_interval=5 + ) + agent = CacheAgent(config) + pipe.transformer.use_cache = True + for block in pipe.transformer.transformer_blocks: + block.cache = agent use_time = 0 prompt_loader = PromptLoader(args.prompt_file, diff --git a/MindIE/MultiModal/Flux.1-DEV/README.md b/MindIE/MultiModal/Flux.1-DEV/README.md index 5569bab5cdc0e6a4bd5cd3a4818795d0d275d682..a182483f1fb598605bf20269bef83f43a6337345 100644 --- a/MindIE/MultiModal/Flux.1-DEV/README.md +++ b/MindIE/MultiModal/Flux.1-DEV/README.md @@ -11,7 +11,7 @@ - 设备支持: Atlas 800I A2推理设备:支持的卡数为1或2 - [Atlas 800I A2](https://www.hiascend.com/developer/download/community/result?module=pt+ie+cann&product=4&model=32) -- 
[环境准备指导](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/80RC2alpha002/softwareinst/instg/instg_0001.html) +- [环境准备指导](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/81RC1alpha001/softwareinst/instg/instg_0003.html) ### 1.2 CANN安装 ```shell @@ -46,7 +46,7 @@ chmod +x ./Ascend-mindie_${version}_linux-${arch}.run cd /usr/local/Ascend/mindie && source set_env.sh # 方式二:指定路径安装 -./Ascend-mindie_${version}_linux-${arch}.run --install-path=${AieInstallPath} +./Ascend-mindie_${version}_linux-${arch}.run --install --install-path=${AieInstallPath} # 设置环境变量 cd ${AieInstallPath}/mindie && source set_env.sh ``` diff --git a/MindIE/MultiModal/HunyuanDiT/README.md b/MindIE/MultiModal/HunyuanDiT/README.md index b99301a884712ed01bbc359c1e5db29db6cf3ec3..d91a96e37a38976ac7889b6103708b31aee5f713 100644 --- a/MindIE/MultiModal/HunyuanDiT/README.md +++ b/MindIE/MultiModal/HunyuanDiT/README.md @@ -11,7 +11,7 @@ - 设备支持: Atlas 800I A2推理设备:支持的卡数为1 - [Atlas 800I A2](https://www.hiascend.com/developer/download/community/result?module=pt+ie+cann&product=4&model=32) -- [环境准备指导](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/80RC2alpha002/softwareinst/instg/instg_0001.html) +- [环境准备指导](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/81RC1alpha001/softwareinst/instg/instg_0003.html) ### 1.2 CANN安装 ```shell @@ -41,7 +41,7 @@ chmod +x ./Ascend-mindie_${version}_linux-${arch}.run cd /usr/local/Ascend/mindie && source set_env.sh # 方式二:指定路径安装 -./Ascend-mindie_${version}_linux-${arch}.run --install-path=${AieInstallPath} +./Ascend-mindie_${version}_linux-${arch}.run --install --install-path=${AieInstallPath} # 设置环境变量 cd ${AieInstallPath}/mindie && source set_env.sh ``` diff --git a/MindIE/MultiModal/HunyuanVideo/README.md b/MindIE/MultiModal/HunyuanVideo/README.md index 0e8f13a159586da2c5c7b02dc28aa069d11718c3..29f998ebcdcfc0ba172bf5e72c534fb327f21152 100644 --- a/MindIE/MultiModal/HunyuanVideo/README.md +++ b/MindIE/MultiModal/HunyuanVideo/README.md @@ -9,9 +9,8 @@ ### 1.1 获取CANN&MindIE安装包&环境准备 - 设备支持 -Atlas 800I A2(8\*64G)推理设备:当前支持的卡数:1、2、3、4、6、8。 -Atlas 800I A3(16\*64G)推理设备:当前支持的卡数:1、2、3、4、6、8、16。 -- [Atlas 800I A2(8*64G)环境准备指导](https://www.hiascend.com/developer/download/community/result?module=pt+ie+cann&product=4&model=32) +- [Atlas 800I A2(8*64G)](https://www.hiascend.com/developer/download/community/result?module=pt+ie+cann&product=4&model=32)推理设备:当前支持的卡数:1、2、3、4、6、8、16 +- [环境准备指导](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/81RC1alpha001/softwareinst/instg/instg_0003.html) ### 1.2 CANN安装 ```shell @@ -249,7 +248,7 @@ torchrun --nproc_per_node=8 sample_video.py \ #### 3.5.2 算法优化 -一、使用attentioncache +使用attentioncache 执行命令: ```shell export PYTORCH_NPU_ALLOC_CONF="expandable_segments:True" @@ -299,52 +298,6 @@ torchrun --nproc_per_node=8 sample_video.py \ - ulysses-degree:ulysses并行使用的卡数 - ring-degree: ring并行使用的卡数 -### 3.6 16卡性能测试 -仅支持Atlas 800I A3 -执行命令: -```shell -export PYTORCH_NPU_ALLOC_CONF="expandable_segments:True" -export TASK_QUEUE_ENABLE=2 -export CPU_AFFINITY_CONF=1 -export TOKENIZERS_PARALLELISM=false -export ALGO=0 -torchrun --nproc_per_node=16 sample_video.py \ - --model-base HunyuanVideo \ - --dit-weight HunyuanVideo/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt \ - --vae-path HunyuanVideo/hunyuan-video-t2v-720p/vae \ - --text-encoder-path HunyuanVideo/text_encoder \ - --text-encoder-2-path HunyuanVideo/clip-vit-large-patch14 \ - --model-resolution "720p" \ - --video-size 720 1280 \ - --video-length 129 
\ - --infer-steps 50 \ - --prompt "A cat walks on the grass, realistic style." \ - --seed 42 \ - --flow-reverse \ - --ulysses-degree 8 \ - --ring-degree 2 \ - --vae-parallel \ - --save-path ./results -``` -参数说明: -- ALGO: 为0表示默认FA算子;设置为1表示使用高性能FA算子 -- nproc_per_node: 并行推理的总卡数。 -- model-base: 权重路径,包含vae、text_encoder、Tokenizer、Transformer和Scheduler五个模型的配置文件及权重。 -- dit-weight: dit的权重路径 -- vae-path: VAE的权重路径 -- text-encoder-path: text_encoder的权重路径 -- text-encoder-2-path: text_encoder_2的权重路径 -- model-resolution: 分辨率 -- video-size: 生成视频的高和宽 -- video-length: 总帧数 -- infer-steps: 推理步数 -- prompt: 文本提示词 -- seed: 随机种子 -- vae-parallel: vae部分使能并行,目前只支持8卡、16卡并行时使用 -- save-path: 生成的视频的保存路径 -- flow-reverse:是否进行反向采样 -- ulysses-degree:ulysses并行使用的卡数 -- ring-degree: ring并行使用的卡数 ## 精度指标 我们使用prompts.txt测试了seed42-46五组种子的视频,并测试了vbench并取平均值,6个指标如下: diff --git a/MindIE/MultiModal/Janus-Pro/README.md b/MindIE/MultiModal/Janus-Pro/README.md index 52d242d018e3b103081ec8684925a42739a90217..aa36610f05b2e705e96030345b705c816e9134d0 100644 --- a/MindIE/MultiModal/Janus-Pro/README.md +++ b/MindIE/MultiModal/Janus-Pro/README.md @@ -25,7 +25,7 @@ Atlas 800I A2推理设备:支持的卡数最小为1 Atlas 300I Duo推理卡:支持的卡数最小为1 Atlas 300 V:支持的卡数最小为1 - [Atlas 800I A2/Atlas 300I Duo/Atlas 300 V](https://www.hiascend.com/developer/download/community/result?module=pt+ie+cann) -- [环境准备指导](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/80RC2alpha002/softwareinst/instg/instg_0001.html) +- [环境准备指导](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/81RC1alpha001/softwareinst/instg/instg_0003.html) ### 1.2 CANN安装 ```shell @@ -55,7 +55,7 @@ chmod +x ./Ascend-mindie_${version}_linux-${arch}.run cd /usr/local/Ascend/mindie && source set_env.sh # 方式二:指定路径安装 -./Ascend-mindie_${version}_linux-${arch}.run --install-path=${AieInstallPath} +./Ascend-mindie_${version}_linux-${arch}.run --install --install-path=${AieInstallPath} # 设置环境变量 cd ${AieInstallPath}/mindie && source set_env.sh ``` diff --git a/MindIE/MultiModal/OpenSora-1.2/README.md b/MindIE/MultiModal/OpenSora-1.2/README.md index a439a60331bb934f297f893274c505e04ec47fe2..7b28a836379c1d4837ec5f02aaf4d65ca4476ee8 100644 --- a/MindIE/MultiModal/OpenSora-1.2/README.md +++ b/MindIE/MultiModal/OpenSora-1.2/README.md @@ -11,7 +11,7 @@ - 设备支持: Atlas 800I A2推理设备:支持的卡数最小为1 - [Atlas 800I A2](https://www.hiascend.com/developer/download/community/result?module=pt+ie+cann&product=4&model=32) -- [环境准备指导](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/80RC2alpha002/softwareinst/instg/instg_0001.html) +- [环境准备指导](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/81RC1alpha001/softwareinst/instg/instg_0003.html) ### 1.2 CANN安装 ```shell @@ -50,7 +50,7 @@ chmod +x ./Ascend-mindie_${version}_linux-${arch}.run cd /usr/local/Ascend/mindie && source set_env.sh # 方式二:指定路径安装 -./Ascend-mindie_${version}_linux-${arch}.run --install-path=${AieInstallPath} +./Ascend-mindie_${version}_linux-${arch}.run --install --install-path=${AieInstallPath} # 设置环境变量 cd ${AieInstallPath}/mindie && source set_env.sh ```
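
---

For reference, the `--cache_algorithm attention` path that this patch introduces for CogView3-Plus-3B reduces to wiring one shared `mindiesd` `CacheAgent` into every transformer block, which the block then consumes through `self.cache.apply(self.attn1.forward, ...)`. The sketch below mirrors the hunks in `inference_cogview3plus.py` above; the weight path, prompt, output filename, and the final diffusers-style `pipe(...)` call are illustrative assumptions and not part of the patch itself.

```python
# Minimal sketch of the new AGBCache (attention cache) wiring, assuming the
# CogView3B weights are laid out as in the README and mindiesd is installed.
import torch
from cogview3plus import CogView3PlusPipeline, CogView3PlusTransformer2DModel
from mindiesd import CacheAgent, CacheConfig

model_path = "/data/CogView3B"  # assumption: local weight directory from the README example

pipe = CogView3PlusPipeline.from_pretrained(model_path, torch_dtype=torch.bfloat16)
pipe.transformer = CogView3PlusTransformer2DModel.from_pretrained(
    f"{model_path}/transformer", torch_dtype=torch.bfloat16
)
pipe = pipe.to("npu")

# --cache_algorithm attention: one CacheAgent shared by all blocks; the values
# below (step_start/step_end/step_interval) are the ones set in the patch for a
# 50-step schedule.
config = CacheConfig(
    method="attention_cache",
    blocks_count=pipe.transformer.config.num_layers,
    steps_count=50,  # num_inference_steps
    step_start=15,
    step_end=47,
    step_interval=5,
)
agent = CacheAgent(config)
pipe.transformer.use_cache = True
for block in pipe.transformer.transformer_blocks:
    block.cache = agent  # consumed as self.cache.apply(self.attn1.forward, ...)

# Illustrative generation call (argument names assume the diffusers-style
# __call__ of CogView3PlusPipeline; see inference_cogview3plus.py for the
# exact invocation used in the repo).
image = pipe(
    prompt="A photo of a red panda reading a book",
    height=1024,
    width=1024,
    num_inference_steps=50,
).images[0]
image.save("cogview3_agbcache.png")
```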