diff --git a/web-ui/docs/zh/blog/luoyuzhe/RCU/1606730405603.png b/web-ui/docs/zh/blog/luoyuzhe/RCU/1606730405603.png new file mode 100644 index 0000000000000000000000000000000000000000..66b3a8acff79dd4b223031289c7beeac081d16eb Binary files /dev/null and b/web-ui/docs/zh/blog/luoyuzhe/RCU/1606730405603.png differ diff --git a/web-ui/docs/zh/blog/luoyuzhe/RCU/1607262150081.png b/web-ui/docs/zh/blog/luoyuzhe/RCU/1607262150081.png new file mode 100644 index 0000000000000000000000000000000000000000..e39b0f099ba935c6c9ca2e782eb9b0b88c35d5d0 Binary files /dev/null and b/web-ui/docs/zh/blog/luoyuzhe/RCU/1607262150081.png differ diff --git a/web-ui/docs/zh/blog/luoyuzhe/RCU/Read-Copy Update.md b/web-ui/docs/zh/blog/luoyuzhe/RCU/Read-Copy Update.md new file mode 100644 index 0000000000000000000000000000000000000000..a45ee54868f65c657c71b37ddcd1fd435f649cd3 --- /dev/null +++ b/web-ui/docs/zh/blog/luoyuzhe/RCU/Read-Copy Update.md @@ -0,0 +1,331 @@ + + +# Read-Copy Update机制 + + title: openEuler中的RCU机制 + date: 2020-12-13 + + tags: + + -openEuler操作系统 + + -RCU机制 + + archives: 2020-12 + + author: luoyuzhe + + summary: 阅读前置系列文章可访问https://www.zhihu.com/column/openEuler-serial. 
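+ +As the instruction execution speed of CPU cores improves faster than latency in multi-core architectures falls, the cost of global synchronization keeps growing. This trend has driven the development of more efficient locking mechanisms, and Read-Copy Update (RCU) was born against this background. RCU is an asymmetric reader-writer synchronization mechanism designed for data structures that are read far more often than they are written. Its basic idea is to lower the cost of reads at the price of more expensive writes, which reduces the overall cost of read-mostly workloads. RCU imposes no synchronization operations or locks on readers, so updates to the data structure never block reads. This means a reader may see an old value, and RCU explicitly allows such stale reads. + +To understand RCU better, we first need the following basic concepts: + +- Guarded data structure: a data structure protected by a synchronization mechanism such as a lock; without that protection it cannot be accessed safely; + +- Quiescent state: a thread in a quiescent state holds no prior state of any guarded data structure; + +- Quiescent period: a time interval during which every thread passes through at least one quiescent state; + + + +![image-20201122221131008](./image-20201122221131008.png) + +The figure above shows the relationship between quiescent states and a quiescent period: during the quiescent period, threads 1-4 each pass through a quiescent state in turn. It follows that any interval containing a quiescent period is itself a quiescent period. **A quiescent period guarantees that every operation on a guarded data structure performed before the quiescent period is observable after it.** + +In RCU, readers access the data structure without taking any lock. When a writer needs to modify the data structure while readers may be reading it, the writer makes a copy of the original, performs the update on the copy, and then waits for a suitable moment to replace the old data structure with the new one. That moment is when all readers have exited; once the new value is in place, subsequent readers see it. How do we find the moment when all readers have exited? This is what the quiescent period is for: once a thread has been through a quiescent period, it retains none of the earlier state of the data structure, so we can be sure its read operations have finished. In an operating system, typical quiescent states are the CPU idle state, running user-mode code, halt, and context switches; in these states a thread holds no references to any kernel data structure. + +Let us look at an example of RCU. Suppose thread 0 is updating element B of a linked list while thread 1 is traversing the list without a lock: + +![image-20201122222732114](./image-20201122222732114.png) + +If thread 0 cannot update element B atomically, it cannot modify B in place while thread 1 may be reading it. Following the RCU rules, it first copies B to B Prime: + +![image-20201122222937227](./image-20201122222937227.png) + +After thread 0 updates the copy, it makes B Prime the successor of element A. Thread 1 is unaffected, because the old element B still exists and still points to C. Now thread 0 only has to wait until thread 1 no longer accesses B before freeing it: + +![image-20201122223303691](./image-20201122223303691.png) + +Thread 0 waits for a quiescent period. After that, thread 1 can no longer reach element B, so it is time to free B's memory: + +![image-20201122223428013](./image-20201122223428013.png) + +Once element B is freed, the old value of B in the list has been fully replaced by the new one: + +![image-20201122223518123](./image-20201122223518123.png) + +RCU needs to know when a quiescent period has ended, so it needs an algorithm for detecting quiescent periods: + +1. Any entity that needs to wait for a quiescent period registers the work to be done afterwards as a callback on its CPU's callback list; +2. Some time later, that CPU notifies the other CPUs that a quiescent period has begun; +3. Each CPU that receives the notification saves a snapshot of a quiescent-state counter (as I understand it, this counter holds the number of CPUs that have passed through a quiescent state so far: a CPU atomically increments it after passing through a quiescent state, and each CPU may modify it only once per quiescent period); +4. Once every CPU finds that its snapshot differs from the current value of the quiescent-state counter, a quiescent period is known to have elapsed, and the last CPU to pass through a quiescent state records that fact; +5. 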
When each CPU learns that the quiescent period has ended, it invokes the callbacks registered on its callback list; + +In real systems an operating system often has to support a large number of CPUs, which requires RCU to scale well; tree RCU was developed to meet this need. In modern operating systems, **the quiescent period of the original RCU design is usually called a grace period**. The core problem in RCU is determining when a quiescent period ends, so that callbacks can be invoked at the right time. Tracking the quiescent states of CPUs becomes more complicated on large multiprocessor systems, and tree RCU handles this case better. Tree RCU has two key data structures: rcu_node and rcu_data. rcu_data is a per-CPU structure that records each processor's quiescent state, and it forms the leaves of the tree built from rcu_node structures. rcu_node passes the quiescent-state information of its children up towards the root, and passes quiescent-period information from the root down to the CPUs. Once all children of an rcu_node have reported a quiescent state, that node reports a quiescent state to its parent. When the root has received all quiescent-state reports, it notifies the leaves that the quiescent period has ended, so each processor can run the callbacks registered during or before that quiescent period, for example callbacks that destroy objects. + +rcu_node is defined in kernel/rcu/tree.h: + +``` +/* + * Definition for node within the RCU grace-period-detection hierarchy. + */ +struct rcu_node { + raw_spinlock_t __private lock; /* Root rcu_node's lock protects */ + /* some rcu_state fields as well as */ + /* following. */ + unsigned long gp_seq; /* Track rsp->rcu_gp_seq. */ + unsigned long gp_seq_needed; /* Track rsp->rcu_gp_seq_needed. */ + unsigned long completedqs; /* All QSes done for this node. */ + unsigned long qsmask; /* CPUs or groups that need to switch in */ + /* order for current grace period to proceed.*/ + /* In leaf rcu_node, each bit corresponds to */ + /* an rcu_data structure, otherwise, each */ + /* bit corresponds to a child rcu_node */ + /* structure. */ + unsigned long rcu_gp_init_mask; /* Mask of offline CPUs at GP init. */ + unsigned long qsmaskinit; + …… + /* Only one bit will be set in this mask. */ + int grplo; /* lowest-numbered CPU or group here. */ + int grphi; /* highest-numbered CPU or group here. */ + u8 grpnum; /* CPU/group number for next level up. */ + u8 level; /* root is at level 0. */ + bool wait_blkd_tasks;/* Necessary to wait for blocked tasks to */ + /* exit RCU read-side critical sections */ + /* before propagating offline up the */ + /* rcu_node tree? 
*/ + struct rcu_node *parent; + struct list_head blkd_tasks; + /* Tasks blocked in RCU read-side critical */ + /* section. Tasks are placed at the head */ + /* of this list and age towards the tail. */ + struct list_head *gp_tasks; + /* Pointer to the first task blocking the */ + /* current grace period, or NULL if there */ + /* is no such task. */ + + + + …… +} ____cacheline_internodealigned_in_smp; +``` + +Here gp_seq is the number of the current quiescent period as seen by this node. qsmask records the quiescent states of this node's children: a bit set to 1 means the corresponding child has not yet reported a quiescent state. level is the depth of this rcu_node in the tree, with the root at level 0. parent points to this node's parent. blkd_tasks is the list of blocked tasks, holding tasks that were preempted inside a read-side critical section. gp_tasks points to the first task blocking the current quiescent period. + +The source of the rcu_data structure is in the same file: + +``` +/* Per-CPU data for read-copy update. */ +struct rcu_data { + /* 1) quiescent-state and grace-period handling : */ + unsigned long gp_seq; /* Track rsp->rcu_gp_seq counter. */ + unsigned long gp_seq_needed; /* Track rsp->rcu_gp_seq_needed ctr. */ + unsigned long rcu_qs_ctr_snap;/* Snapshot of rcu_qs_ctr to check */ + /* for rcu_all_qs() invocations. */ + union rcu_noqs cpu_no_qs; /* No QSes yet for this CPU. */ + bool core_needs_qs; /* Core waits for quiesc state. */ + bool beenonline; /* CPU online at least once. */ + bool gpwrap; /* Possible ->gp_seq wrap. */ + struct rcu_node *mynode; /* This CPU's leaf of hierarchy */ + unsigned long grpmask; /* Mask to apply to leaf qsmask. */ + /* 2) batch handling */ + struct rcu_segcblist cblist; /* Segmented callback list, with */ + /* different callbacks waiting for */ + /* different grace periods. 
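*/ + …… + + int cpu; + struct rcu_state *rsp; +}; +``` + +Here gp_seq is the highest quiescent-period number this processor has seen; cpu_no_qs records whether this processor has passed through a quiescent state; core_needs_qs indicates that RCU needs this processor to report a quiescent state; cblist is the list of callbacks registered on this processor; cpu is this processor's number; and rsp points to rcu_state, the structure that maintains RCU's global state. + +The basic flow of RCU can be described with three APIs: rcu_read_lock(), rcu_read_unlock() and synchronize_rcu(). rcu_read_lock() enters a read-side critical section, rcu_read_unlock() leaves it, and synchronize_rcu() waits for a quiescent period to end. + +Consider, for example, the following code: + +![1606730405603](./1606730405603.png) + +Suppose CPU0 runs rcu_reader() and CPU1 runs rcu_update(). The RCU tree then contains two rcu_data structures (one each for CPU0 and CPU1) whose parent is a single rcu_node: + +![image-rcu_example](./image-rcu_example.png) + +In the ideal case, CPU1 sets x to 1 and calls synchronize_rcu() to wait for the quiescent period to end. synchronize_rcu() blocks the task on CPU1 and causes a context switch, so CPU1 passes through a quiescent state, which its rcu_data records. CPU0 enters the read-side critical section, reads x and y, and leaves the critical section; a later context switch makes CPU0's rcu_data record a quiescent state as well. The quiescent period then ends, and rcu_update() on CPU1 can set y to 1. + +rcu_read_lock() eventually calls **__rcu_read_lock**(), which increments the nesting depth of the read-side critical section. This design allows an RCU reader to be preempted inside a read-side critical section and then migrate to another processor before leaving it. The code of **__rcu_read_lock**() is in kernel/rcu/tree_plugin.h: + +![1607262150081](./1607262150081.png) + +Whereas entering a read-side critical section only increments the nesting depth, leaving one is more involved. Its code is also in tree_plugin.h: + +``` +/* + * Preemptible RCU implementation for rcu_read_unlock(). + * Decrement ->rcu_read_lock_nesting. If the result is zero (outermost + * rcu_read_unlock()) and ->rcu_read_unlock_special is non-zero, then + * invoke rcu_read_unlock_special() to clean up after a context switch + * in an RCU read-side critical section and other special cases. + */ +void __rcu_read_unlock(void) +{ + struct task_struct *t = current; + + if (t->rcu_read_lock_nesting != 1) { + --t->rcu_read_lock_nesting; + } else { + barrier(); /* critical section before exit code. 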
*/ + t->rcu_read_lock_nesting = INT_MIN; + barrier(); /* assign before ->rcu_read_unlock_special load */ + if (unlikely(READ_ONCE(t->rcu_read_unlock_special.s))) + rcu_read_unlock_special(t); + barrier(); /* ->rcu_read_unlock_special load before assign */ + t->rcu_read_lock_nesting = 0; + } +#ifdef CONFIG_PROVE_LOCKING + { + int rrln = READ_ONCE(t->rcu_read_lock_nesting); + + WARN_ON_ONCE(rrln < 0 && rrln > INT_MIN / 2); + } +#endif /* #ifdef CONFIG_PROVE_LOCKING */ +} +EXPORT_SYMBOL_GPL(__rcu_read_unlock); +``` + +If the nesting depth is not 1, this is not the outermost rcu_read_unlock(), and the function only decrements the depth. Otherwise it sets the depth to a negative value (INT_MIN), so that if the task is preempted at this point, other code can tell that it is in the middle of leaving the outermost read-side critical section, and the task then carries on with the rest of the exit path. If ->rcu_read_unlock_special is non-zero, for example because the task was preempted inside the critical section, it calls rcu_read_unlock_special(), which removes the task from the rcu_node's list of blocked tasks. Finally, the task leaving the outermost critical section sets the nesting depth to 0. + +The code of synchronize_rcu() is also in tree_plugin.h: + +``` +/** + * synchronize_rcu - wait until a grace period has elapsed. + * + * Control will return to the caller some time after a full grace + * period has elapsed, in other words after all currently executing RCU + * read-side critical sections have completed. Note, however, that + * upon return from synchronize_rcu(), the caller might well be executing + * concurrently with new RCU read-side critical sections that began while + * synchronize_rcu() was waiting. RCU read-side critical sections are + * delimited by rcu_read_lock() and rcu_read_unlock(), and may be nested. + * + * See the description of synchronize_sched() for more detailed + * information on memory-ordering guarantees. However, please note + * that -only- the memory-ordering guarantees apply. For example, + * synchronize_rcu() is -not- guaranteed to wait on things like code + * protected by preempt_disable(), instead, synchronize_rcu() is -only- + * guaranteed to wait on RCU read-side critical sections, that is, sections + * of code protected by rcu_read_lock(). 
+ */ +void synchronize_rcu(void) +{ + RCU_LOCKDEP_WARN(lock_is_held(&rcu_bh_lock_map) || + lock_is_held(&rcu_lock_map) || + lock_is_held(&rcu_sched_lock_map), + "Illegal synchronize_rcu() in RCU read-side critical section"); + if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE) + return; + if (rcu_gp_is_expedited()) + synchronize_rcu_expedited(); + else + wait_rcu_gp(call_rcu); +} +EXPORT_SYMBOL_GPL(synchronize_rcu); +``` + +Its basic job is to wait for a quiescent period to end. It calls wait_rcu_gp(), which registers wakeme_after_rcu() as a callback and waits for all readers to leave their critical sections. Once they have, wakeme_after_rcu() wakes up wait_rcu_gp(), and synchronize_rcu() can return to its caller. + +So how is the end of a quiescent period detected? In tree RCU, after a processor passes through a quiescent state it must also report that state to the level above. Only when every member of the root node has reported a quiescent state can the current quiescent period end. The reporting process starts in the scheduling-clock interrupt handler, which calls rcu_check_callbacks(); its code is in kernel/rcu/tree.c: + +``` +/* + * Check to see if this CPU is in a non-context-switch quiescent state + * (user mode or idle loop for rcu, non-softirq execution for rcu_bh). + * Also schedule RCU core processing. + * + * This function must be called from hardirq context. It is normally + * invoked from the scheduling-clock interrupt. + */ +void rcu_check_callbacks(int user) +{ + trace_rcu_utilization(TPS("Start scheduler-tick")); + increment_cpu_stall_ticks(); + if (user || rcu_is_cpu_rrupt_from_idle()) { + + /* + * Get here if this CPU took its interrupt from user + * mode or from the idle loop, and if this is not a + * nested interrupt. In this case, the CPU is in + * a quiescent state, so note it. + * + * No memory barrier is required here because both + * rcu_sched_qs() and rcu_bh_qs() reference only CPU-local + * variables that other CPUs neither access nor modify, + * at least not while the corresponding CPU is online. + */ + + rcu_sched_qs(); + rcu_bh_qs(); + rcu_note_voluntary_context_switch(current); + + } else if (!in_softirq()) { + + /* + * Get here if this CPU did not take its interrupt from + * softirq, in other words, if it is not interrupting + * a rcu_bh read-side critical section. This is an _bh + * critical section, so note it. 
+ */ + + rcu_bh_qs(); + } + rcu_preempt_check_callbacks(); + /* The load-acquire pairs with the store-release setting to true. */ + if (smp_load_acquire(this_cpu_ptr(&rcu_dynticks.rcu_urgent_qs))) { + /* Idle and userspace execution already are quiescent states. */ + if (!rcu_is_cpu_rrupt_from_idle() && !user) { + set_tsk_need_resched(current); + set_preempt_need_resched(); + } + __this_cpu_write(rcu_dynticks.rcu_urgent_qs, false); + } + if (rcu_pending()) + invoke_rcu_core(); + + trace_rcu_utilization(TPS("End scheduler-tick")); +} + + +``` + +The function first checks whether the CPU is in a quiescent state that does not involve a context switch, such as user mode or the idle loop. If so, it calls rcu_sched_qs() to record the quiescent state in the CPU's rcu_data. Otherwise it checks whether the interrupt arrived outside softirq context, and if so calls rcu_bh_qs() to record a quiescent state. It then calls rcu_preempt_check_callbacks(), which in turn calls rcu_preempt_qs() to record a quiescent state. Finally it calls rcu_pending() to check whether there is any RCU work to do, and if there is, it calls invoke_rcu_core(), whose code is in the same file: + +``` +static void invoke_rcu_core(void) +{ + if (cpu_online(smp_processor_id())) + raise_softirq(RCU_SOFTIRQ); +} +``` + +It raises an RCU softirq whose handler is rcu_process_callbacks(). Once interrupts, preemption and bottom halves are enabled on the CPU, rcu_process_callbacks() runs, which starts callback processing for the current quiescent period. + + + + + +------ + +References + +[1] McKenney, Paul & Slingwine, John. (1998). Read-copy update: Using execution history to solve concurrency problems. Parallel and Distributed Computing and Systems. + +[2] Liang L, McKenney P E, Kroening D, et al. Verification of tree-based hierarchical read-copy update in the Linux kernel[C]// Design, Automation and Test in Europe Conference and Exhibition. 0. 
+ +[3]《Linux内核深度解析》,余华兵著,2019 \ No newline at end of file diff --git a/web-ui/docs/zh/blog/luoyuzhe/RCU/image-20201122221131008.png b/web-ui/docs/zh/blog/luoyuzhe/RCU/image-20201122221131008.png new file mode 100644 index 0000000000000000000000000000000000000000..3f6f10fed4a0c4920892d83be3f1c26d57d897f3 Binary files /dev/null and b/web-ui/docs/zh/blog/luoyuzhe/RCU/image-20201122221131008.png differ diff --git a/web-ui/docs/zh/blog/luoyuzhe/RCU/image-20201122222732114.png b/web-ui/docs/zh/blog/luoyuzhe/RCU/image-20201122222732114.png new file mode 100644 index 0000000000000000000000000000000000000000..af46827192cc9170793379a5f50f6490d093292e Binary files /dev/null and b/web-ui/docs/zh/blog/luoyuzhe/RCU/image-20201122222732114.png differ diff --git a/web-ui/docs/zh/blog/luoyuzhe/RCU/image-20201122222937227.png b/web-ui/docs/zh/blog/luoyuzhe/RCU/image-20201122222937227.png new file mode 100644 index 0000000000000000000000000000000000000000..285e2d9a24080490997471c0cb482cf2b825e7d3 Binary files /dev/null and b/web-ui/docs/zh/blog/luoyuzhe/RCU/image-20201122222937227.png differ diff --git a/web-ui/docs/zh/blog/luoyuzhe/RCU/image-20201122223303691.png b/web-ui/docs/zh/blog/luoyuzhe/RCU/image-20201122223303691.png new file mode 100644 index 0000000000000000000000000000000000000000..2f7099e9f216ad0165c424f77e7c124d986b7eea Binary files /dev/null and b/web-ui/docs/zh/blog/luoyuzhe/RCU/image-20201122223303691.png differ diff --git a/web-ui/docs/zh/blog/luoyuzhe/RCU/image-20201122223428013.png b/web-ui/docs/zh/blog/luoyuzhe/RCU/image-20201122223428013.png new file mode 100644 index 0000000000000000000000000000000000000000..33e59353d660e466d561c610a03392cf7254d061 Binary files /dev/null and b/web-ui/docs/zh/blog/luoyuzhe/RCU/image-20201122223428013.png differ diff --git a/web-ui/docs/zh/blog/luoyuzhe/RCU/image-20201122223518123.png b/web-ui/docs/zh/blog/luoyuzhe/RCU/image-20201122223518123.png new file mode 100644 index 
0000000000000000000000000000000000000000..ce370871e489f0db3db4e61e92aa0246351993b6 Binary files /dev/null and b/web-ui/docs/zh/blog/luoyuzhe/RCU/image-20201122223518123.png differ diff --git a/web-ui/docs/zh/blog/luoyuzhe/RCU/image-rcu_example.png b/web-ui/docs/zh/blog/luoyuzhe/RCU/image-rcu_example.png new file mode 100644 index 0000000000000000000000000000000000000000..415067efc3389209efcc90019f6f5b1254070527 Binary files /dev/null and b/web-ui/docs/zh/blog/luoyuzhe/RCU/image-rcu_example.png differ diff --git a/web-ui/docs/zh/blog/luoyuzhe/The_Communication_Of_Threads_In_openEuler/1.png b/web-ui/docs/zh/blog/luoyuzhe/The_Communication_Of_Threads_In_openEuler/1.png new file mode 100644 index 0000000000000000000000000000000000000000..523d165e7c8ec25103755ad389f337b0c5ee7552 Binary files /dev/null and b/web-ui/docs/zh/blog/luoyuzhe/The_Communication_Of_Threads_In_openEuler/1.png differ diff --git a/web-ui/docs/zh/blog/luoyuzhe/The_Communication_Of_Threads_In_openEuler/10.png b/web-ui/docs/zh/blog/luoyuzhe/The_Communication_Of_Threads_In_openEuler/10.png new file mode 100644 index 0000000000000000000000000000000000000000..d0e9181f21edc9e07f69489873af5b9c1bc7a34d Binary files /dev/null and b/web-ui/docs/zh/blog/luoyuzhe/The_Communication_Of_Threads_In_openEuler/10.png differ diff --git a/web-ui/docs/zh/blog/luoyuzhe/The_Communication_Of_Threads_In_openEuler/11.png b/web-ui/docs/zh/blog/luoyuzhe/The_Communication_Of_Threads_In_openEuler/11.png new file mode 100644 index 0000000000000000000000000000000000000000..b63014caee4769a79675ac18221d050d6b9f4b74 Binary files /dev/null and b/web-ui/docs/zh/blog/luoyuzhe/The_Communication_Of_Threads_In_openEuler/11.png differ diff --git a/web-ui/docs/zh/blog/luoyuzhe/The_Communication_Of_Threads_In_openEuler/12.png b/web-ui/docs/zh/blog/luoyuzhe/The_Communication_Of_Threads_In_openEuler/12.png new file mode 100644 index 0000000000000000000000000000000000000000..0898e823638a0958d347302ba3a2cc40c05ed21c Binary files /dev/null and 
b/web-ui/docs/zh/blog/luoyuzhe/The_Communication_Of_Threads_In_openEuler/12.png differ diff --git a/web-ui/docs/zh/blog/luoyuzhe/The_Communication_Of_Threads_In_openEuler/2.png b/web-ui/docs/zh/blog/luoyuzhe/The_Communication_Of_Threads_In_openEuler/2.png new file mode 100644 index 0000000000000000000000000000000000000000..3cb9164a8080872803439c524d0faedaa2bb88b0 Binary files /dev/null and b/web-ui/docs/zh/blog/luoyuzhe/The_Communication_Of_Threads_In_openEuler/2.png differ diff --git a/web-ui/docs/zh/blog/luoyuzhe/The_Communication_Of_Threads_In_openEuler/3.png b/web-ui/docs/zh/blog/luoyuzhe/The_Communication_Of_Threads_In_openEuler/3.png new file mode 100644 index 0000000000000000000000000000000000000000..b8b58506b507d9abf2f01a8dd505a1a90bce8f6b Binary files /dev/null and b/web-ui/docs/zh/blog/luoyuzhe/The_Communication_Of_Threads_In_openEuler/3.png differ diff --git a/web-ui/docs/zh/blog/luoyuzhe/The_Communication_Of_Threads_In_openEuler/4.png b/web-ui/docs/zh/blog/luoyuzhe/The_Communication_Of_Threads_In_openEuler/4.png new file mode 100644 index 0000000000000000000000000000000000000000..d47013571029c99ace3673e19b823b9393cfcb46 Binary files /dev/null and b/web-ui/docs/zh/blog/luoyuzhe/The_Communication_Of_Threads_In_openEuler/4.png differ diff --git a/web-ui/docs/zh/blog/luoyuzhe/The_Communication_Of_Threads_In_openEuler/5.png b/web-ui/docs/zh/blog/luoyuzhe/The_Communication_Of_Threads_In_openEuler/5.png new file mode 100644 index 0000000000000000000000000000000000000000..9f76d548019b8028e20afc0512b40739b104a98e Binary files /dev/null and b/web-ui/docs/zh/blog/luoyuzhe/The_Communication_Of_Threads_In_openEuler/5.png differ diff --git a/web-ui/docs/zh/blog/luoyuzhe/The_Communication_Of_Threads_In_openEuler/6.png b/web-ui/docs/zh/blog/luoyuzhe/The_Communication_Of_Threads_In_openEuler/6.png new file mode 100644 index 0000000000000000000000000000000000000000..38b6c6a8787d217847e76ded4921591afc1dd868 Binary files /dev/null and 
b/web-ui/docs/zh/blog/luoyuzhe/The_Communication_Of_Threads_In_openEuler/6.png differ diff --git a/web-ui/docs/zh/blog/luoyuzhe/The_Communication_Of_Threads_In_openEuler/7.png b/web-ui/docs/zh/blog/luoyuzhe/The_Communication_Of_Threads_In_openEuler/7.png new file mode 100644 index 0000000000000000000000000000000000000000..e19f07a54ef2740ac7ee2faacec709f17248cc6c Binary files /dev/null and b/web-ui/docs/zh/blog/luoyuzhe/The_Communication_Of_Threads_In_openEuler/7.png differ diff --git a/web-ui/docs/zh/blog/luoyuzhe/The_Communication_Of_Threads_In_openEuler/8.png b/web-ui/docs/zh/blog/luoyuzhe/The_Communication_Of_Threads_In_openEuler/8.png new file mode 100644 index 0000000000000000000000000000000000000000..909e2a0e62075a1d4bd681e0610675dad9b37fc2 Binary files /dev/null and b/web-ui/docs/zh/blog/luoyuzhe/The_Communication_Of_Threads_In_openEuler/8.png differ diff --git a/web-ui/docs/zh/blog/luoyuzhe/The_Communication_Of_Threads_In_openEuler/9.png b/web-ui/docs/zh/blog/luoyuzhe/The_Communication_Of_Threads_In_openEuler/9.png new file mode 100644 index 0000000000000000000000000000000000000000..35049d9b44bb12d5d9dc0922dcc913a4b5db5c27 Binary files /dev/null and b/web-ui/docs/zh/blog/luoyuzhe/The_Communication_Of_Threads_In_openEuler/9.png differ diff --git a/web-ui/docs/zh/blog/luoyuzhe/The_Communication_Of_Threads_In_openEuler/The_Communication_Of_Threads_In_openEuler.md b/web-ui/docs/zh/blog/luoyuzhe/The_Communication_Of_Threads_In_openEuler/The_Communication_Of_Threads_In_openEuler.md new file mode 100644 index 0000000000000000000000000000000000000000..a4ad3d48b1b8fdc78a2ecfd7d72a6a13412a636c --- /dev/null +++ b/web-ui/docs/zh/blog/luoyuzhe/The_Communication_Of_Threads_In_openEuler/The_Communication_Of_Threads_In_openEuler.md @@ -0,0 +1,191 @@ +--- + +title: openEuler中的线程间通信 + +date: 2020-11-09 + +tags: + +​-openEuler操作系统 + +​-线程 + +​-通信 + +​archives: 2020-11 + +​author: luoyuzhe + +​summary: 阅读前置系列文章可访问https://www.zhihu.com/column/openEuler-serial. 
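+ +--- + +The previous installments covered the address space of a process; this one covers communication between threads in openEuler. In the two installments on the process address space, we saw that threads in the same thread group share the mm_struct and vm_area_struct data structures, and thereby share memory. However, when two threads need to access the same shared memory at the same time, a race arises: the order in which the two threads access the memory is not determined, so the program's output is not determined either. To make threads access shared variables in the order the programmer intends, we need mechanisms for communication between threads; the common ones are locks and semaphores. Locks and semaphores can ensure that only one thread at a time enters a piece of code that accesses a shared resource; such a piece of code is called a critical section. The locks in openEuler include spinlocks and mutexes. This installment covers spinlocks, including queued spinlocks and queued read-write locks. + +A spinlock protects a short critical section, and sleeping is not allowed inside it. The definition of the spinlock is in include/linux/spinlock_types.h: + +![1604842309144](./1.png) + +This code wraps another structure, raw_spinlock, which is defined in the same file: + +![1604842650827](./2.png) + +The type arch_spinlock_t is defined differently for uniprocessor and SMP builds: for uniprocessor it is defined in spinlock_types_up.h, and for SMP in spinlock_types.h: + +![1604843071888](./3.png) + +For the uniprocessor case, arch_spinlock_t is defined as empty: + +![1605016728313](./4.png) + +For the SMP case, arch/arm64/include/asm/spinlock_types.h shows that the definition is pulled in from two files: + +![1604843758966](./5.png) + +The definition of arch_spinlock_t is in qspinlock_types.h: + +``` +typedef struct qspinlock { + union { + atomic_t val; + + /* + * By using the whole 2nd least significant byte for the + * pending bit, we can allow better optimization of the lock + * acquisition for the pending bit holder. 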
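+ */ + +#ifdef __LITTLE_ENDIAN + struct { + u8 locked; + u8 pending; + }; + struct { + u16 locked_pending; + u16 tail; + }; +#else + struct { + u16 tail; + u16 locked_pending; + }; + struct { + u8 reserved[2]; + u8 pending; + u8 locked; + + }; + +#endif + }; +} arch_spinlock_t; +``` + +Here locked gives the state of the lock: 1 means the lock is unavailable and 0 means it can be acquired. pending is the pending bit: 1 means at least one thread is waiting for the lock, and 0 means either that the lock is held without contention or that a wait queue already exists [2]. + +These two files define the queued spinlock and the queued read-write lock respectively. The read-write lock refines the spinlock: a spinlock admits only one thread into the critical section at a time, whereas a read-write lock lets multiple readers access the critical section concurrently but admits only one writer at a time (writers exclude other writers and readers, while readers may run concurrently with each other). "Queued" means that a queue algorithm manages the threads that want to acquire the lock; openEuler manages the waiting threads in FIFO order. Queued spinlocks are covered in detail in reference [2], so this article focuses on queued read-write locks. + +The definition of the queued read-write lock is in include/asm-generic/qrwlock_types.h: + +![1604844850419](./6.png) + +The low 9 bits of the atomic variable cnts indicate whether a writer holds the queued read-write lock, and the value shifted right by 9 bits is the number of readers holding it. The lock also contains a queued spinlock, wait_lock. To understand this code, we first look at atomic variables and their operations in openEuler. Atomic variables provide mutually exclusive access to an integer, and an atomic operation on one is indivisible. Atomic variables have type atomic_t, and the common operations on them are [1]: + +- atomic_read(v): read the value of the atomic variable; +- atomic_add_return(i,v): add i to the atomic variable v and return the new value; +- atomic_sub_return(i,v): subtract i from the atomic variable v and return the new value; +- atomic_cmpxchg(v,old,new): if the atomic variable v equals old, set v to new; return the old value; + +In openEuler, include/linux/atomic.h defines a family of atomic operations, such as atomic_add_return_acquire: + +![1605003305448](./7.png) + +and atomic_cmpxchg_acquire: + +![1605003249589](./8.png) + +These atomic operations are used to manipulate cnts in the queued read-write lock. For example, queued_read_trylock() in include/asm-generic/qrwlock.h first reads cnts with atomic_read(). If no writer holds the lock, it increments the number of readers holding the lock and returns 1; otherwise it leaves the reader count unchanged and returns 0: + +![1605004268622](./9.png) + +The constants involved are defined as follows: + +![1605004339762](./10.png) + +So cnts & _QW_WMASK checks whether a writer holds the lock, while atomically adding _QR_BIAS to cnts counts readers in the high bits of cnts. Now consider queued_write_trylock(): + +![1605011384999](./11.png) + +This function tests whether cnts is 0 to decide whether a reader or writer already holds the lock. If one does, the attempt fails; if not, it sets the low 8 bits of cnts to 0xff to mark the lock as held by a writer. + +Next, look at queued_read_lock(). If its fast path finds that a writer holds or is waiting for the lock, it calls queued_read_lock_slowpath(): + +![1605011841027](./12.png) + +The code of that function is in ./kernel/locking/qrwlock.c: + +``` +/** + + * queued_read_lock_slowpath - acquire read lock of a queue rwlock + + * @lock: Pointer to 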
queue rwlock structure + */ + void queued_read_lock_slowpath(struct qrwlock *lock) + { + /* + * Readers come here when they cannot get the lock without waiting + */ + if (unlikely(in_interrupt())) { + /* + * Readers in interrupt context will get the lock immediately + * if the writer is just waiting (not holding the lock yet), + * so spin with ACQUIRE semantics until the lock is available + * without waiting in the queue. + */ + atomic_cond_read_acquire(&lock->cnts, !(VAL & _QW_LOCKED)); + return; + } + atomic_sub(_QR_BIAS, &lock->cnts); + + /* + * Put the reader into the wait queue + */ + arch_spin_lock(&lock->wait_lock); + atomic_add(_QR_BIAS, &lock->cnts); + + /* + * The ACQUIRE semantics of the following spinning code ensure + * that accesses can't leak upwards out of our subsequent critical + * section in the case that the lock is currently held for write. + */ + atomic_cond_read_acquire(&lock->cnts, !(VAL & _QW_LOCKED)); + + /* + * Signal the next one in queue to become queue head + */ + arch_spin_unlock(&lock->wait_lock); + } + EXPORT_SYMBOL(queued_read_lock_slowpath); + + +``` + +atomic_cond_read_acquire(&lock->cnts, !(VAL & _QW_LOCKED)) waits for the writer to release the lock: as long as a writer holds the lock, it keeps waiting. arch_spin_lock() uses the queued-spinlock mechanism to put the reader into the queue, and arch_spin_unlock() lets the next reader in the queue become the queue head. In the queued spinlock, if the lock is available, a thread acquires it directly. If the lock is held by another thread but nobody else is waiting, the thread spins, repeatedly trying to acquire the lock. If more threads are waiting, they form a queue: the thread at the head spins and keeps trying to acquire the lock, while the remaining threads wait behind it in the queue [2]. For more on this, see chapter 6 of reference [2] or reference [3]. Writers queue for the lock in a similar way; the difference is that a reader only has to wait for the writer to release the lock, whereas a writer has to wait for all other readers and writers to release it. + +The next installment covers communication between processes in openEuler. + + + +--- + + + +### References + +[1]《Linux内核深度解析》,余华兵著,2019 + +[2]《openEuler操作系统》,任炬、张尧学、彭许红编著,2020 + +[3]https://zhuanlan.zhihu.com/p/100546935 + diff --git "a/web-ui/docs/zh/blog/luoyuzhe/openEuler\344\270\255\347\232\204\347\272\277\347\250\213\351\227\264\351\200\232\344\277\241/1.png" "b/web-ui/docs/zh/blog/luoyuzhe/openEuler\344\270\255\347\232\204\347\272\277\347\250\213\351\227\264\351\200\232\344\277\241/1.png" new file mode 100644 index 
0000000000000000000000000000000000000000..523d165e7c8ec25103755ad389f337b0c5ee7552 Binary files /dev/null and "b/web-ui/docs/zh/blog/luoyuzhe/openEuler\344\270\255\347\232\204\347\272\277\347\250\213\351\227\264\351\200\232\344\277\241/1.png" differ diff --git "a/web-ui/docs/zh/blog/luoyuzhe/openEuler\344\270\255\347\232\204\347\272\277\347\250\213\351\227\264\351\200\232\344\277\241/10.png" "b/web-ui/docs/zh/blog/luoyuzhe/openEuler\344\270\255\347\232\204\347\272\277\347\250\213\351\227\264\351\200\232\344\277\241/10.png" new file mode 100644 index 0000000000000000000000000000000000000000..d0e9181f21edc9e07f69489873af5b9c1bc7a34d Binary files /dev/null and "b/web-ui/docs/zh/blog/luoyuzhe/openEuler\344\270\255\347\232\204\347\272\277\347\250\213\351\227\264\351\200\232\344\277\241/10.png" differ diff --git "a/web-ui/docs/zh/blog/luoyuzhe/openEuler\344\270\255\347\232\204\347\272\277\347\250\213\351\227\264\351\200\232\344\277\241/11.png" "b/web-ui/docs/zh/blog/luoyuzhe/openEuler\344\270\255\347\232\204\347\272\277\347\250\213\351\227\264\351\200\232\344\277\241/11.png" new file mode 100644 index 0000000000000000000000000000000000000000..b63014caee4769a79675ac18221d050d6b9f4b74 Binary files /dev/null and "b/web-ui/docs/zh/blog/luoyuzhe/openEuler\344\270\255\347\232\204\347\272\277\347\250\213\351\227\264\351\200\232\344\277\241/11.png" differ diff --git "a/web-ui/docs/zh/blog/luoyuzhe/openEuler\344\270\255\347\232\204\347\272\277\347\250\213\351\227\264\351\200\232\344\277\241/12.png" "b/web-ui/docs/zh/blog/luoyuzhe/openEuler\344\270\255\347\232\204\347\272\277\347\250\213\351\227\264\351\200\232\344\277\241/12.png" new file mode 100644 index 0000000000000000000000000000000000000000..0898e823638a0958d347302ba3a2cc40c05ed21c Binary files /dev/null and "b/web-ui/docs/zh/blog/luoyuzhe/openEuler\344\270\255\347\232\204\347\272\277\347\250\213\351\227\264\351\200\232\344\277\241/12.png" differ diff --git 
"a/web-ui/docs/zh/blog/luoyuzhe/openEuler\344\270\255\347\232\204\347\272\277\347\250\213\351\227\264\351\200\232\344\277\241/2.png" "b/web-ui/docs/zh/blog/luoyuzhe/openEuler\344\270\255\347\232\204\347\272\277\347\250\213\351\227\264\351\200\232\344\277\241/2.png" new file mode 100644 index 0000000000000000000000000000000000000000..3cb9164a8080872803439c524d0faedaa2bb88b0 Binary files /dev/null and "b/web-ui/docs/zh/blog/luoyuzhe/openEuler\344\270\255\347\232\204\347\272\277\347\250\213\351\227\264\351\200\232\344\277\241/2.png" differ diff --git "a/web-ui/docs/zh/blog/luoyuzhe/openEuler\344\270\255\347\232\204\347\272\277\347\250\213\351\227\264\351\200\232\344\277\241/3.png" "b/web-ui/docs/zh/blog/luoyuzhe/openEuler\344\270\255\347\232\204\347\272\277\347\250\213\351\227\264\351\200\232\344\277\241/3.png" new file mode 100644 index 0000000000000000000000000000000000000000..b8b58506b507d9abf2f01a8dd505a1a90bce8f6b Binary files /dev/null and "b/web-ui/docs/zh/blog/luoyuzhe/openEuler\344\270\255\347\232\204\347\272\277\347\250\213\351\227\264\351\200\232\344\277\241/3.png" differ diff --git "a/web-ui/docs/zh/blog/luoyuzhe/openEuler\344\270\255\347\232\204\347\272\277\347\250\213\351\227\264\351\200\232\344\277\241/4.png" "b/web-ui/docs/zh/blog/luoyuzhe/openEuler\344\270\255\347\232\204\347\272\277\347\250\213\351\227\264\351\200\232\344\277\241/4.png" new file mode 100644 index 0000000000000000000000000000000000000000..d47013571029c99ace3673e19b823b9393cfcb46 Binary files /dev/null and "b/web-ui/docs/zh/blog/luoyuzhe/openEuler\344\270\255\347\232\204\347\272\277\347\250\213\351\227\264\351\200\232\344\277\241/4.png" differ diff --git "a/web-ui/docs/zh/blog/luoyuzhe/openEuler\344\270\255\347\232\204\347\272\277\347\250\213\351\227\264\351\200\232\344\277\241/5.png" "b/web-ui/docs/zh/blog/luoyuzhe/openEuler\344\270\255\347\232\204\347\272\277\347\250\213\351\227\264\351\200\232\344\277\241/5.png" new file mode 100644 index 
0000000000000000000000000000000000000000..9f76d548019b8028e20afc0512b40739b104a98e
Binary files /dev/null and "b/web-ui/docs/zh/blog/luoyuzhe/openEuler\344\270\255\347\232\204\347\272\277\347\250\213\351\227\264\351\200\232\344\277\241/5.png" differ
diff --git "a/web-ui/docs/zh/blog/luoyuzhe/openEuler\344\270\255\347\232\204\347\272\277\347\250\213\351\227\264\351\200\232\344\277\241/6.png" "b/web-ui/docs/zh/blog/luoyuzhe/openEuler\344\270\255\347\232\204\347\272\277\347\250\213\351\227\264\351\200\232\344\277\241/6.png"
new file mode 100644
index 0000000000000000000000000000000000000000..38b6c6a8787d217847e76ded4921591afc1dd868
Binary files /dev/null and "b/web-ui/docs/zh/blog/luoyuzhe/openEuler\344\270\255\347\232\204\347\272\277\347\250\213\351\227\264\351\200\232\344\277\241/6.png" differ
diff --git "a/web-ui/docs/zh/blog/luoyuzhe/openEuler\344\270\255\347\232\204\347\272\277\347\250\213\351\227\264\351\200\232\344\277\241/7.png" "b/web-ui/docs/zh/blog/luoyuzhe/openEuler\344\270\255\347\232\204\347\272\277\347\250\213\351\227\264\351\200\232\344\277\241/7.png"
new file mode 100644
index 0000000000000000000000000000000000000000..e19f07a54ef2740ac7ee2faacec709f17248cc6c
Binary files /dev/null and "b/web-ui/docs/zh/blog/luoyuzhe/openEuler\344\270\255\347\232\204\347\272\277\347\250\213\351\227\264\351\200\232\344\277\241/7.png" differ
diff --git "a/web-ui/docs/zh/blog/luoyuzhe/openEuler\344\270\255\347\232\204\347\272\277\347\250\213\351\227\264\351\200\232\344\277\241/8.png" "b/web-ui/docs/zh/blog/luoyuzhe/openEuler\344\270\255\347\232\204\347\272\277\347\250\213\351\227\264\351\200\232\344\277\241/8.png"
new file mode 100644
index 0000000000000000000000000000000000000000..909e2a0e62075a1d4bd681e0610675dad9b37fc2
Binary files /dev/null and "b/web-ui/docs/zh/blog/luoyuzhe/openEuler\344\270\255\347\232\204\347\272\277\347\250\213\351\227\264\351\200\232\344\277\241/8.png" differ
diff --git "a/web-ui/docs/zh/blog/luoyuzhe/openEuler\344\270\255\347\232\204\347\272\277\347\250\213\351\227\264\351\200\232\344\277\241/9.png" "b/web-ui/docs/zh/blog/luoyuzhe/openEuler\344\270\255\347\232\204\347\272\277\347\250\213\351\227\264\351\200\232\344\277\241/9.png"
new file mode 100644
index 0000000000000000000000000000000000000000..35049d9b44bb12d5d9dc0922dcc913a4b5db5c27
Binary files /dev/null and "b/web-ui/docs/zh/blog/luoyuzhe/openEuler\344\270\255\347\232\204\347\272\277\347\250\213\351\227\264\351\200\232\344\277\241/9.png" differ
diff --git "a/web-ui/docs/zh/blog/luoyuzhe/openEuler\344\270\255\347\232\204\347\272\277\347\250\213\351\227\264\351\200\232\344\277\241/\347\272\277\347\250\213\351\227\264\351\200\232\344\277\241.md" "b/web-ui/docs/zh/blog/luoyuzhe/openEuler\344\270\255\347\232\204\347\272\277\347\250\213\351\227\264\351\200\232\344\277\241/\347\272\277\347\250\213\351\227\264\351\200\232\344\277\241.md"
new file mode 100644
index 0000000000000000000000000000000000000000..5266e1240147d5574f532f050db8ee3fe3cdb6cb
--- /dev/null
+++ "b/web-ui/docs/zh/blog/luoyuzhe/openEuler\344\270\255\347\232\204\347\272\277\347\250\213\351\227\264\351\200\232\344\277\241/\347\272\277\347\250\213\351\227\264\351\200\232\344\277\241.md"
@@ -0,0 +1,191 @@
+---
+
+title: Inter-Thread Communication in openEuler
+
+date: 2020-11-09
+
+tags:
+
+  - openEuler operating system
+
+  - Threads
+
+  - Communication
+
+archives: 2020-11
+
+author: luoyuzhe
+
+summary: Earlier articles in this series are available at https://www.zhihu.com/column/openEuler-serial.
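+
+---
+
+The previous installment covered the process address space; this one covers inter-thread communication in openEuler. In the two-part "Process Address Space" articles we saw that threads in the same thread group share the mm_struct and vm_area_struct data structures, which is how they share memory. However, when two threads need to access the same shared memory at the same time, a race arises: the order in which the two threads access that memory is not deterministic, so the program's output is not deterministic either. To guarantee that threads access shared variables in the order the programmer intends, we need an inter-thread communication mechanism; the common ones are locks and semaphores. Locks and semaphores can ensure that, at any moment, only one thread is executing a piece of code that accesses a given shared resource. Such a piece of code is called a critical section. openEuler provides spinlocks and mutexes. This installment covers spinlocks, including queued spinlocks and queued read-write locks.
+
+A spinlock protects a short critical section, and sleeping is not allowed inside that critical section. The spinlock definition can be found in include/linux/spinlock_types.h:
+
+![1604842309144](./1.png)
+
+This code wraps another structure, raw_spinlock, whose definition is in the same file:
+
+![1604842650827](2.png)
+
+The arch_spinlock_t type is defined differently for the uniprocessor and SMP cases: for uniprocessor builds its definition is in spinlock_types_up.h, and for SMP builds it is in spinlock_types.h:
+
+![1604843071888](3.png)
+
+In the uniprocessor case, arch_spinlock_t is defined as an empty structure:
+
+![1605016728313](4.png)
+
+For SMP, arch/arm64/include/asm/spinlock_types.h shows that the definition is pulled in from two other files:
+
+![1604843758966](5.png)
+
+The definition of arch_spinlock_t is in qspinlock_types.h:
+
+```
+typedef struct qspinlock {
+	union {
+		atomic_t val;
+
+		/*
+		 * By using the whole 2nd least significant byte for the
+		 * pending bit, we can allow better optimization of the lock
+		 * acquisition for the pending bit holder.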
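+		 */
+
+#ifdef __LITTLE_ENDIAN
+		struct {
+			u8	locked;
+			u8	pending;
+		};
+		struct {
+			u16	locked_pending;
+			u16	tail;
+		};
+#else
+		struct {
+			u16	tail;
+			u16	locked_pending;
+		};
+		struct {
+			u8	reserved[2];
+			u8	pending;
+			u8	locked;
+		};
+#endif
+	};
+} arch_spinlock_t;
+```
+
+Here the locked field represents the state of the lock: 1 means the lock cannot be acquired, 0 means it can. pending is the pending bit: 1 means at least one thread is waiting for the lock; 0 means either that the lock is held without contention or that a wait queue already exists [2].
+
+These two files define the queued spinlock and the queued read-write lock, respectively. The read-write lock refines the spinlock: a spinlock admits only one thread into the critical section at a time, whereas a read-write lock lets multiple readers access the critical section concurrently but only one writer at a time (that is, a writer excludes other writers and all readers, while readers may run concurrently with one another). "Queued" means that a queue algorithm manages the threads that want to hold the lock; openEuler manages the waiting threads in FIFO order. Queued spinlocks are covered in detail in reference [2], so this article focuses on queued read-write locks.
+
+The definition of the queued read-write lock can be found in include/asm-generic/qrwlock_types.h:
+
+![1604844850419](6.png)
+
+The low 9 bits of the atomic variable cnts record the writer state of the queued read-write lock (whether a writer holds it or is waiting for it), and shifting cnts right by 9 bits gives the number of readers holding the lock. The read-write lock also contains a queued spinlock, wait_lock. To understand this code, we first look at atomic variables and their operations in openEuler. An atomic variable provides mutually exclusive access to an integer, and atomic operations on it are indivisible. Atomic variables have type atomic_t, and the commonly used operations are [1]:
+
+- atomic_read(v): read the value of the atomic variable;
+- atomic_add_return(i,v): add i to the atomic variable v and return the new value;
+- atomic_sub_return(i,v): subtract i from the atomic variable v and return the new value;
+- atomic_cmpxchg(v,old,new): if the atomic variable v equals old, set v to new; in either case return the old value;
+
+In openEuler, include/linux/atomic.h defines a family of atomic operations, such as atomic_add_return_acquire:
+
+![1605003305448](7.png)
+
+atomic_cmpxchg_acquire:
+
+![1605003249589](8.png)
+
+These atomic operations are used to manipulate the cnts variable of the queued read-write lock. For example, queued_read_trylock() in include/asm-generic/qrwlock.h first reads cnts with atomic_read(). If no writer holds the lock, it increments the number of readers holding the lock and returns 1; otherwise it leaves the reader count unchanged and returns 0:
+
+![1605004268622](9.png)
+
+The relevant constants are defined as follows:
+
+![1605004339762](10.png)
+
+So cnts & _QW_WMASK checks whether a writer owns the lock, and atomically adding _QR_BIAS to cnts counts readers in the high bits of cnts. Next, consider queued_write_trylock():
+
+![1605011384999](11.png)
+
+This function checks whether cnts is 0 to decide whether any reader or writer already owns the read-write lock. If one does, the attempt fails; if the lock is owned by neither a reader nor a writer, the function sets the low eight bits of cnts to 0xff, marking the lock as held by a writer.
+
+Now look at queued_read_lock(). When its fast path cannot take the lock immediately, that is, when a writer holds or is waiting for the lock, it calls queued_read_lock_slowpath():
+
+![1605011841027](12.png)
+
+The code of that function is in ./kernel/locking/qrwlock.c:
+
+```
+/**
+ * queued_read_lock_slowpath - acquire read lock of a queue rwlock
+ * @lock: Pointer to queue rwlock structure
+ */
+void queued_read_lock_slowpath(struct qrwlock *lock)
+{
+	/*
+	 * Readers come here when they cannot get the lock without waiting
+	 */
+	if (unlikely(in_interrupt())) {
+		/*
+		 * Readers in interrupt context will get the lock immediately
+		 * if the writer is just waiting (not holding the lock yet),
+		 * so spin with ACQUIRE semantics until the lock is available
+		 * without waiting in the queue.
+		 */
+		atomic_cond_read_acquire(&lock->cnts, !(VAL & _QW_LOCKED));
+		return;
+	}
+	atomic_sub(_QR_BIAS, &lock->cnts);
+
+	/*
+	 * Put the reader into the wait queue
+	 */
+	arch_spin_lock(&lock->wait_lock);
+	atomic_add(_QR_BIAS, &lock->cnts);
+
+	/*
+	 * The ACQUIRE semantics of the following spinning code ensure
+	 * that accesses can't leak upwards out of our subsequent critical
+	 * section in the case that the lock is currently held for write.
+	 */
+	atomic_cond_read_acquire(&lock->cnts, !(VAL & _QW_LOCKED));
+
+	/*
+	 * Signal the next one in queue to become queue head
+	 */
+	arch_spin_unlock(&lock->wait_lock);
+}
+EXPORT_SYMBOL(queued_read_lock_slowpath);
+```
+
+atomic_cond_read_acquire(&lock->cnts, !(VAL & _QW_LOCKED)) waits for the writer to release the lock: as long as a writer holds the lock, it keeps waiting. arch_spin_lock() uses the queued spinlock mechanism to put the reader into the queue, and arch_spin_unlock() lets the next waiter in the queue become the queue head. In the queued spinlock mechanism, if the lock is available, the thread takes it directly; if the lock is held by another thread but no other thread is waiting, the thread spins and keeps trying to acquire it; if more threads are waiting, they form a queue, the thread at the head of the queue spins and keeps trying to acquire the lock, and the remaining threads wait in the queue (in the Linux qspinlock implementation they spin on their own queue nodes rather than sleep) [2]. For more on this topic, see Chapter 6 of reference [2] or reference [3]. The queueing mechanism for writers is similar; the difference is that a reader only has to wait for the writer to release the lock, whereas a writer has to wait until all other readers and writers have released it.
+
+The next installment will cover inter-process communication in openEuler.
+
+
+
+---
+
+
+
+### References
+
+[1] Yu Huabing, 《Linux内核深度解析》 (Linux Kernel In-Depth Analysis), 2019
+
+[2] Ren Ju, Zhang Yaoxue, Peng Xuhong (eds.), 《openEuler操作系统》 (The openEuler Operating System), 2020
+
+[3] https://zhuanlan.zhihu.com/p/100546935
+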
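The `cnts` encoding and the two trylock paths described in the article can be modeled in user space. The following is a minimal sketch, not the kernel implementation. It uses C11 `<stdatomic.h>` in place of the kernel's `atomic_t` API, and all `model_*`, `QW_*` and `QR_*` names are hypothetical. The constant values are taken from the description in the article (low eight bits set to 0xff for a writer, low 9 bits as the writer mask, reader count starting at bit 9). The re-check after the atomic add in `model_read_trylock()` and both unlock helpers are assumptions about how a racing writer and lock release would be handled.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical mirrors of the constants discussed in the article. */
#define QW_LOCKED 0x0ffu            /* low 8 bits: a writer holds the lock  */
#define QW_WMASK  0x1ffu            /* low 9 bits: writer state             */
#define QR_SHIFT  9                 /* reader count starts at bit 9         */
#define QR_BIAS   (1u << QR_SHIFT)  /* value added to cnts per reader       */

/* Hypothetical user-space stand-in for the lock word. */
struct model_rwlock {
	atomic_uint cnts;
};

/* Number of readers currently counted in the high bits of cnts. */
static inline unsigned model_reader_count(struct model_rwlock *l)
{
	return atomic_load_explicit(&l->cnts, memory_order_relaxed) >> QR_SHIFT;
}

/* Returns 1 and counts one more reader if no writer bits are set, else 0. */
static inline int model_read_trylock(struct model_rwlock *l)
{
	unsigned cnts = atomic_load_explicit(&l->cnts, memory_order_relaxed);

	if (!(cnts & QW_WMASK)) {
		/* fetch_add returns the old value, so add the bias to get the new one. */
		cnts = atomic_fetch_add_explicit(&l->cnts, QR_BIAS,
						 memory_order_acquire) + QR_BIAS;
		if (!(cnts & QW_WMASK))
			return 1;
		/* Assumption: a writer raced in, so undo the reader count. */
		atomic_fetch_sub_explicit(&l->cnts, QR_BIAS, memory_order_relaxed);
	}
	return 0;
}

/* Succeeds only when cnts is 0, i.e. no reader and no writer owns the lock. */
static inline int model_write_trylock(struct model_rwlock *l)
{
	unsigned expected = 0;

	return atomic_compare_exchange_strong_explicit(&l->cnts, &expected,
						       QW_LOCKED,
						       memory_order_acquire,
						       memory_order_relaxed);
}

/* Assumed release paths: drop one reader, or clear the writer byte. */
static inline void model_read_unlock(struct model_rwlock *l)
{
	atomic_fetch_sub_explicit(&l->cnts, QR_BIAS, memory_order_release);
}

static inline void model_write_unlock(struct model_rwlock *l)
{
	atomic_fetch_and_explicit(&l->cnts, ~QW_LOCKED, memory_order_release);
}
```

Starting from `cnts == 0`, two successful `model_read_trylock()` calls leave `model_reader_count()` at 2 and make `model_write_trylock()` fail. Once both readers unlock, the writer attempt succeeds, and further read attempts fail until `model_write_unlock()` is called.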