diff --git "a/app/zh/blogs/chenchao245/\346\225\205\351\232\234\346\263\250\345\205\245\346\241\206\346\236\266\344\275\277\347\224\250\346\214\207\345\215\227.md" "b/app/zh/blogs/chenchao245/\346\225\205\351\232\234\346\263\250\345\205\245\346\241\206\346\236\266\344\275\277\347\224\250\346\214\207\345\215\227.md"
new file mode 100644
index 0000000000000000000000000000000000000000..ff0833cb89580c828490614414727b44498963bd
--- /dev/null
+++ "b/app/zh/blogs/chenchao245/\346\225\205\351\232\234\346\263\250\345\205\245\346\241\206\346\236\266\344\275\277\347\224\250\346\214\207\345\215\227.md"
@@ -0,0 +1,177 @@
+---
+title: 'Fault Injection Framework'
+
+date: '2024-5-29'
+
+category: 'blog'
+tags: 'Fault Injection Framework'
+
+archives: '2024-5-29'
+
+author: 'chenchao'
+
+summary: 'Fault Injection Framework'
+
+times: '15:30'
+---
+
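+## 1 Overview
+
+This article introduces the fault injection framework. It lets developers inject a chosen fault at a chosen point to simulate intermittent problems caused by external failures, so that issues can be reproduced and fixes verified.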
+
+
+## 2 Usage
+
+### 2.1 Identify the fault type
+
+The following five fault types are currently available:
+| Fault name                | Meaning             |
+| ------------------------- | ------------------- |
+| DMS_FI_TYPE_PACKET_LOSS   | Network packet loss |
+| DMS_FI_TYPE_NET_LATENCY   | Network latency     |
+| DMS_FI_TYPE_CPU_LATENCY   | CPU latency         |
+| DMS_FI_TYPE_PROCESS_FAULT | Process exit        |
+| DMS_FI_TYPE_CUSTOM_FAULT  | Custom fault        |
+
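+Each fault type is described along two dimensions: a fault point sequence and a fault value.
+|   Fault type    |    Fault point sequence     |       Fault value        | Meaning of the fault value |
+| :-------------: | :-------------------------: | :----------------------: | :------------------------: |
+|   CPU latency   |  SS_FI_CPU_LATENCY_ENTRIES  |   SS_FI_CPU_LATENCY_MS   |   Delay in milliseconds    |
+| Network latency |  SS_FI_NET_LATENCY_ENTRIES  |   SS_FI_NET_LATENCY_MS   |   Delay in milliseconds    |
+|  Process exit   | SS_FI_PROCESS_FAULT_ENTRIES | SS_FI_PROCESS_FAULT_PROB |      Exit probability      |
+|   Packet loss   |  SS_FI_PACKET_LOSS_ENTRIES  |  SS_FI_PACKET_LOSS_PROB  |  Packet loss probability   |
+|     Custom      | SS_FI_CUSTOM_FAULT_ENTRIES  | SS_FI_CUSTOM_FAULT_PARAM |        User-defined        |
+
+A fault point sequence is a string such as 1,2,3,4, which enables the given fault at points 1, 2, 3, and 4;
+each fault also has a fault value, whose meaning is listed in the table.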
+
+Start by identifying which kind of fault needs to be injected, then choose the matching fault type for the steps that follow.
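+
+As a concrete illustration of how a point sequence pairs with a fault value, the sketch below turns on the CPU latency fault. It assumes the parameters are set through the lower-case forms of the identifiers in the table, which is the form used in section 2.4; the point numbers 1 and 2 and the value of 50 ms are arbitrary illustrative choices.
+
+```sql
+-- Fault point sequence: enable the CPU latency fault at points 1 and 2 (illustrative points)
+alter system set ss_fi_cpu_latency_entries='1,2';
+-- Fault value: delay in milliseconds applied when the fault fires (illustrative value)
+alter system set ss_fi_cpu_latency_ms=50;
+```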
+
+### 2.2 Identify the fault point
+#### Existing fault points
+DMS side:
+
+| Request message                             | Response message                |
+| ------------------------------------------- | ------------------------------- |
+| DMS_FI_REQ_ASK_MASTER_FOR_PAGE = 1 | DMS_FI_ACK_CHECK_VISIBLE = 35 |
+| DMS_FI_REQ_ASK_OWNER_FOR_PAGE | DMS_FI_ACK_PAGE_OWNER_ID |
+| DMS_FI_REQ_INVALIDATE_SHARE_COPY | DMS_FI_ACK_BROADCAST |
+| DMS_FI_CLAIM_OWNER | DMS_FI_ACK_BROADCAST_WITH_MSG |
+| DMS_FI_REQ_CR_PAGE | DMS_FI_ACK_PAGE_READY |
+| DMS_FI_REQ_ASK_MASTER_FOR_CR_PAGE | DMS_FI_ACK_GRANT_OWNER |
+| DMS_FI_REQ_ASK_OWNER_FOR_CR_PAGE | DMS_FI_ACK_ALREADY_OWNER |
+| DMS_FI_REQ_CHECK_VISIBLE | DMS_FI_ACK_CR_PAGE |
+| DMS_FI_REQ_TRY_ASK_MASTER_FOR_PAGE_OWNER_ID | DMS_FI_ACK_TXN_WAIT |
+| DMS_FI_REQ_BROADCAST | DMS_FI_ACK_LOCK |
+| DMS_FI_REQ_TXN_INFO | DMS_FI_ACK_TXN_INFO |
+| DMS_FI_REQ_TXN_SNAPSHOT | DMS_FI_ACK_TXN_SNAPSHOT |
+| DMS_FI_REQ_WAIT_TXN | DMS_FI_ACK_WAIT_TXN |
+| DMS_FI_REQ_AWAKE_TXN | DMS_FI_ACK_AWAKE_TXN |
+| DMS_FI_REQ_MASTER_CKPT_EDP | DMS_FI_ACK_MASTER_CKPT_EDP |
+| DMS_FI_REQ_OWNER_CKPT_EDP | DMS_FI_ACK_OWNER_CKPT_EDP |
+| DMS_FI_REQ_MASTER_CLEAN_EDP | DMS_FI_ACK_MASTER_CLEAN_EDP |
+| DMS_FI_REQ_OWNER_CLEAN_EDP | DMS_FI_ACK_OWNER_CLEAN_EDP |
+| DMS_FI_REQ_MGRT_MASTER_DATA | DMS_FI_ACK_ERROR |
+| DMS_FI_REQ_RELEASE_OWNER | DMS_FI_ACK_RELEASE_PAGE_OWNER |
+| DMS_FI_REQ_BOC | DMS_FI_ACK_INVLDT_SHARE_COPY |
+| DMS_FI_REQ_CONFIRM_CVT | DMS_FI_ACK_BOC |
+| DMS_FI_REQ_DDL_SYNC | DMS_FI_ACK_EDP_LOCAL |
+| DMS_FI_REQ_INVALID_OWNER | DMS_FI_ACK_EDP_READY |
+| DMS_FI_REQ_ASK_RES_OWNER_ID | DMS_FI_ACK_INVLD_OWNER |
+| DMS_FI_REQ_PROTOCOL_MAINTAIN_VERSION | DMS_FI_ACK_ASK_RES_OWNER_ID |
+| DMS_FI_REQ_CREATE_GLOBAL_XA_RES | DMS_FI_ACK_CREATE_GLOBAL_XA_RES |
+| DMS_FI_REQ_DELETE_GLOBAL_XA_RES | DMS_FI_ACK_DELETE_GLOBAL_XA_RES |
+| DMS_FI_REQ_ASK_XA_OWNER_ID | DMS_FI_ACK_ASK_XA_OWNER_ID |
+| DMS_FI_REQ_END_XA | DMS_FI_ACK_END_XA |
+| DMS_FI_REQ_ASK_XA_IN_USE | DMS_FI_ACK_XA_IN_USE = 65 |
+| DMS_FI_REQ_MERGE_XA_OWNERS | |
+| DMS_FI_REQ_XA_REBUILD | |
+| DMS_FI_REQ_XA_OWNERS | |
+| DMS_FI_REQ_RECYCLE = 34 | |
+
+openGauss side:
+| DB_FI_CHANGE_BUFFERTAG_BLOCKNUM = **10000** |
+| ------------------------------------------- |
+
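+Inject the fault at whichever point suits the scenario: either one of the existing points listed above or a custom point that you add.
+Note: to add a point, add an enumerator to the dms_fi_point_name_e enum in DMS or to the db_fi_point_name enum in openGauss, then call DMS_FAULT_INJECTION_CALL or SS_FAULT_INJECTION_CALL at the corresponding code location, passing the new enum value as the argument.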
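+
+A point is referred to by the numeric value of its enumerator. The table above gives explicit values for DMS_FI_REQ_ASK_MASTER_FOR_PAGE (1) and DMS_FI_ACK_CHECK_VISIBLE (35), so network latency could be enabled on just those two points roughly as sketched below. The 20 ms value is an illustrative choice, and whether a delay at these particular points is useful depends on the scenario under test.
+
+```sql
+-- 1  = DMS_FI_REQ_ASK_MASTER_FOR_PAGE
+-- 35 = DMS_FI_ACK_CHECK_VISIBLE
+alter system set ss_fi_net_latency_entries='1,35';
+-- Delay in milliseconds (illustrative value)
+alter system set ss_fi_net_latency_ms=20;
+```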
+
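+### 2.3 Injecting faults at a new point
+Macros on the DMS side:
+
+ DMS_FAULT_INJECTION_CALL(point,...): activates the fault; CPU latency, network latency, and process exit faults are executed immediately
+
+ FAULT_INJECTION_ACTION_TRIGGER(action): executes the packet loss fault
+
+ FAULT_INJECTION_ACTION_TRIGGER_CUSTOM(action): executes the custom fault
+
+Macros on the openGauss side:
+
+ SS_FAULT_INJECTION_CALL(point,...): activates the fault; CPU latency, network latency, and process exit faults are executed immediately
+
+ FAULT_INJECTION_ACTION_TRIGGER_CUSTOM(action): executes the custom fault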
+
+|                | CPU latency<br>Network latency<br>Process exit | Packet loss | Custom fault |
+| -------------- | ---------------------------------------------- | ----------- | ------------ |
+| DMS side       | DMS_FAULT_INJECTION_CALL | DMS_FAULT_INJECTION_CALL<br>FAULT_INJECTION_ACTION_TRIGGER | DMS_FAULT_INJECTION_CALL<br>FAULT_INJECTION_ACTION_TRIGGER_CUSTOM |
+| openGauss side | SS_FAULT_INJECTION_CALL  |             | SS_FAULT_INJECTION_CALL<br>FAULT_INJECTION_ACTION_TRIGGER_CUSTOM |
+
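+A custom fault requires a new custom fault function, which is passed as an argument to DMS_FAULT_INJECTION_CALL or SS_FAULT_INJECTION_CALL.
+***Note***: when CALL and TRIGGER are used together, CALL performs the preparatory step and TRIGGER actually fires the fault.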
+
+
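+### 2.4 Enabling faults
+1. Latency: set the fault points for network latency and CPU latency, set the delay, and check the log to confirm that the delay takes effect;
+
+Example: alter system set ss_fi_net_latency_entries='1,2,3,4'; -- enable network latency at points 1, 2, 3, and 4
+
+ alter system set ss_fi_net_latency_ms=30; -- ss_fi_net_latency_ms ranges from 0 to 4294967295
+
+ Log line when the fault takes effect: [DMS_FI]entry:%d triggers network latency
+
+ alter system set ss_fi_cpu_latency_entries='1,2,3,4'; -- enable CPU latency at points 1, 2, 3, and 4
+
+ alter system set ss_fi_cpu_latency_ms=30; -- ss_fi_cpu_latency_ms ranges from 0 to 4294967295
+
+ Log line when the fault takes effect: [DMS_FI]entry:%d triggers cpu latency
+
+2. Process exit: set the fault points for process exit, set the exit probability, and check the log to confirm that the process exits
+
+Example: alter system set ss_fi_process_fault_entries='1,2,3,4'; -- enable process exit at points 1, 2, 3, and 4
+
+ alter system set ss_fi_process_fault_prob=30; -- ss_fi_process_fault_prob ranges from 0 to 100
+
+ Log line when the fault takes effect: [DMS_FI]entry:%d triggers proc fault exit, %d in %d
+
+3. Packet loss: set the fault points for packet loss, set the loss probability, and check the log to confirm that request or response messages fail to be sent
+
+Example: alter system set ss_fi_packet_loss_entries='1,2,3,4'; -- enable packet loss at points 1, 2, 3, and 4
+
+ alter system set ss_fi_packet_loss_prob=30; -- ss_fi_packet_loss_prob ranges from 0 to 100
+
+ Log line when the fault takes effect: [DMS_FI]triggers packloss cmd:%u, %d in %d
+
+4. Custom: we picked an intermittent problem from a recent issue ticket, rolled the code back to the version in which it occurred, and simulated the ticket's fault with a custom fault at the corresponding point. After importing data with tpcc, we got the same core as in the ticket. After updating to the latest code and repeating the same steps, no core was produced.
+
+Example: alter system set ss_fi_custom_fault_entries='10000'; -- set the custom fault point
+
+ alter system set ss_fi_custom_fault_param=30; -- ss_fi_custom_fault_param ranges from 0 to 4294967295
+
+ Log line when the fault takes effect: [DMS_FI]entry:%d triggers cust fault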
+
+### 3 Summary
+#### 3.1 Injecting a fault at an existing point
+Use alter system set ss_fi_XXX_entries='x,x,x,x' to enable fault XXX at the four points x,x,x,x,
+then use alter system set ss_fi_xxxxx=XX to set the fault value to inject.
+
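+#### 3.2 Injecting a fault at a new point
+To add a point, add an enumerator to the dms_fi_point_name_e enum in DMS or to the db_fi_point_name enum in openGauss, then call DMS_FAULT_INJECTION_CALL or SS_FAULT_INJECTION_CALL at the corresponding code location, passing the new enum value as the first argument.
+Use alter system set ss_fi_XXX_entries='x', where x is the enum value of the new point, to enable fault XXX at point x,
+then use alter system set ss_fi_xxxxx=XX to set the fault value to inject.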
+
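+#### 3.3 Injecting a custom fault
+To add a custom fault, first add the new point as described in 3.2, then implement a custom fault function that meets your needs.
+At the corresponding code location, call DMS_FAULT_INJECTION_CALL or SS_FAULT_INJECTION_CALL and pass the custom fault function as the second argument (it may be an empty implementation).
+Finally, call FAULT_INJECTION_ACTION_TRIGGER_CUSTOM and pass in the actual fault action.
+Use alter system set ss_fi_custom_fault_entries='x', where x is the enum value of the new point, to enable the custom fault at point x,
+then use alter system set ss_fi_custom_fault_param=XX to set the fault value to inject (optional).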
\ No newline at end of file