diff --git a/news/README.md b/news/README.md
index d1c695fd74bf56ad628eada0d6d400dd71c046a0..1890e7f620824e95042e43a4dbdc956370f5ac0b 100644
--- a/news/README.md
+++ b/news/README.md
@@ -5,6 +5,824 @@
 * [2022](2022.md)
 * [2023 - first half](2023-1st-half.md)
 
+## 20231112: Issue 68
+
+### Kernel Updates
+
+#### RISC-V Architecture Support
+
+**[GIT PULL: RISC-V Patches for the 6.7 Merge Window, Part 2](http://lore.kernel.org/linux-riscv/mhng-9011bd86-9f72-4223-b78a-b4c69c1c927f@palmer-ri-x1c9/)**
+
+> The following changes since commit e1c05b3bf80f829ced464bdca90f1dfa96e8d251:
+>
+> RISC-V: hwprobe: Fix vDSO SIGSEGV (2023-11-02 14:05:30 -0700)
+>
+> are available in the Git repository at:
+>
+> git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git tags/riscv-for-linus-6.7-mw2
+>
+
+**[v1: riscv: Enable percpu page first chunk allocator](http://lore.kernel.org/linux-riscv/20231110140721.114235-1-alexghiti@rivosinc.com/)**
+
+> While working with pcpu variables, I noticed that riscv did not support
+> first chunk allocation in the vmalloc area which may be needed as a fallback
+> in case of a sparse NUMA configuration.
+>
+
+**[v7: StarFive's Pulse Width Modulation driver support](http://lore.kernel.org/linux-riscv/20231110062039.103339-1-william.qiu@starfivetech.com/)**
+
+> This patchset adds initial rudimentary support for the StarFive
+> Pulse Width Modulation controller driver. And this driver will
+> be used in StarFive's VisionFive 2 board.The first patch add
+> Documentations for the device and Patch 2 adds device probe for
+> the module.
+>
+
+**[v5: RISC-V: Add MMC support for TH1520 boards](http://lore.kernel.org/linux-riscv/20231109-th1520-mmc-v5-0-018bd039cf17@baylibre.com/)**
+
+> This series adds support for the MMC controller in the T-Head TH1520
+> SoC, and it enables the eMMC and microSD slot on both the BeagleV
+> Ahead and the Sipeed LicheePi 4A.
+>
+
+**[v2: riscv: Support RANDOMIZE_KSTACK_OFFSET](http://lore.kernel.org/linux-riscv/20231109133751.212079-1-songshuaishuai@tinylab.org/)**
+
+> Inspired from arm64's implement -- commit 70918779aec9
+> ("arm64: entry: Enable random_kstack_offset support")
+>
+> Add support of kernel stack offset randomization while handling syscall,
+> the offset is defaultly limited by KSTACK_OFFSET_MAX() (i.e. 10 bits).
+>
+
+**[v1: drivers: perf: Check find_first_bit() return value](http://lore.kernel.org/linux-riscv/20231109082128.40777-1-alexghiti@rivosinc.com/)**
+
+> We must check the return value of find_first_bit() before using the
+> return value as an index array since it happens to overflow the array
+> and then panic
+>
+
+**[v3: perf vendor events riscv: add StarFive Dubhe-90 JSON file](http://lore.kernel.org/linux-riscv/20231109074900.1971266-1-jisheng.teoh@starfivetech.com/)**
+
+> StarFive's Dubhe-90 supports raw event id 0x00 - 0x22.
+> The raw events are enabled through PMU node of DT binding.
+> Besides raw event, add standard RISC-V firmware events to
+> support monitoring of firmware event.
+>
+
+**[v1: RISC-V: Don't rely on positional structure initialization](http://lore.kernel.org/linux-riscv/20231107155529.8368-1-palmer@rivosinc.com/)**
+
+> Without this I get a bunch of warnings along the lines of
+>
+> arch/riscv/kernel/module.c:535:26: error: positional initialization of field in 'struct' declared with 'designated_init' attribute [-Werror=designated-init]
+> 535 | [R_RISCV_32] = { apply_r_riscv_32_rela },
+>
+
+**[v3: riscv: report more ISA extensions through hwprobe](http://lore.kernel.org/linux-riscv/20231107105556.517187-1-cleger@rivosinc.com/)**
+
+> In order to be able to gather more information about the supported ISA
+> extensions from userspace using the hwprobe syscall, add more ISA extensions
+> report.
+>
+
+**[v6: RISC-V: Show accurate per-hart isa in /proc/cpuinfo](http://lore.kernel.org/linux-riscv/20231106232439.3176268-1-evan@rivosinc.com/)**
+
+> In /proc/cpuinfo, most of the information we show for each processor is
+> specific to that hart: marchid, mvendorid, mimpid, processor, hart,
+> compatible, and the mmu size. But the ISA string gets filtered through a
+> lowest common denominator mask, so that if one CPU is missing an ISA
+> extension, no CPUs will show it.
+>
+
+**[v3: RISC-V: Probe misaligned access speed in parallel](http://lore.kernel.org/linux-riscv/20231106225855.3121724-1-evan@rivosinc.com/)**
+
+> Probing for misaligned access speed takes about 0.06 seconds. On a
+> system with 64 cores, doing this in smp_callin() means it's done
+> serially, extending boot time by 3.8 seconds. That's a lot of boot time.
+>
+
+**[v1: riscv: lib: Implement optimized memchr function](http://lore.kernel.org/linux-riscv/20231106170025.778193-1-ivan.orlov@codethink.co.uk/)**
+
+> At the moment we don't have an architecture-specific memchr
+> implementation for riscv in the Kernel. The generic version of this
+> function iterates the memory area bytewise looking for the target value,
+> which is not the optimal approach.
+>
+
+**[GIT PULL: RISC-V Patches for the 6.7 Merge Window, Part 1](http://lore.kernel.org/linux-riscv/mhng-ff3fd3dc-81e2-440e-9e77-b97a64e23520@palmer-ri-x1c9/)**
+
+> I'll almost certainly have more for later this week, there's already some on
+> for-next that I wanted to give a little more time for testing.
+>
+
+**[v1: dt-bindings: riscv: cpus: Add AMD MicroBlaze V compatible](http://lore.kernel.org/linux-riscv/d442d916204d26f82c1c3a924a4cdfb117960e1b.1699270661.git.michal.simek@amd.com/)**
+
+> MicroBlaze V is new AMD/Xilinx soft-core 32bit RISC-V processor IP.
+> It is hardware compatible with classic MicroBlaze processor.
+>
+
+**[v14: KVM: guest_memfd() and per-page attributes](http://lore.kernel.org/linux-riscv/20231105163040.14904-1-pbonzini@redhat.com/)**
+
+> [If the introduction below is not enough, go read
+> https://lwn.net/SubscriberLink/949277/118520c1248ace63/ and subscribe to LWN]
+>
+> Introduce several new KVM uAPIs to ultimately create a guest-first memory
+> subsystem within KVM, a.k.a. guest_memfd. Guest-first memory allows KVM
+> to provide features, enhancements, and optimizations that are kludgly
+> or outright impossible to implement in a generic memory subsystem.
+>
+
+#### Process Scheduling
+
+**[v1: sched/fair: Use all little CPUs for CPU-bound workload](http://lore.kernel.org/lkml/20231110125902.2152380-1-pierre.gondois@arm.com/)**
+
+> Running n CPU-bound tasks on an n CPUs platform with asymmetric CPU
+> capacity might result in a task placement where two tasks run on a
+> big CPU and none on a little CPU. This placement could be more optimal
+> by using all CPUs.
+>
+
+**[v6: drm-misc-next: drm/sched: implement dynamic job-flow control](http://lore.kernel.org/lkml/20231110001638.71750-1-dakr@redhat.com/)**
+
+> Currently, job flow control is implemented simply by limiting the number
+> of jobs in flight. Therefore, a scheduler is initialized with a credit
+> limit that corresponds to the number of jobs which can be sent to the
+> hardware.
+>
+
+#### Memory Management
+
+**[v2: mm:hugetlb_cgroup: Optimize variable initialization](http://lore.kernel.org/linux-mm/20231107211307.14279-1-zhaimingbing@cmss.chinamobile.com/)**
+
+> Initialize 'max' with 'unsigned long' instead of 'long'.
+>
+
+**[v2: samples: introduce cgroup events listeners](http://lore.kernel.org/linux-mm/20231110082045.19407-1-ddrokosov@salutedevices.com/)**
+
+> To begin with, this patch series relocates the cgroup example code to
+> the samples/cgroup directory, which is the appropriate location for such
+> code snippets.
+>
+
+**[v1: mm/damon/core.c: avoid unintentional filtering out of schemes](http://lore.kernel.org/linux-mm/1699594629-3816-1-git-send-email-hyeongtak.ji@gmail.com/)**
+
+> The function '__damos_filter_out()' causes DAMON to always filter out
+> schemes whose filter type is anon or memcg if its matching value is set
+> to false.
+>
+> This commit addresses the issue by ensuring that '__damos_filter_out()'
+> no longer applies to filters whose type is 'anon' or 'memcg'.
+>
+
+**[v4: memcg weighted interleave mempolicy control](http://lore.kernel.org/linux-mm/20231109002517.106829-1-gregory.price@memverge.com/)**
+
+> This patchset implements weighted interleave and adds a new cgroup
+> sysfs entry: cgroup/memory.interleave_weights (excluded from root).
+>
+> The il_weight of a node is used by mempolicy to implement weighted
+> interleave when `numactl --interleave=...` is invoked. By default
+> il_weight for a node is always 1, which preserves the default round
+> robin interleave behavior.
+>
+
+**[v2: mm/page_alloc: Dedupe some memcg uncharging logic](http://lore.kernel.org/linux-mm/20231108164920.3401565-1-jackmanb@google.com/)**
+
+> The duplication makes it seem like some work is required before
+> uncharging in the !PageHWPoison case. But it isn't, so we can simplify
+> the code a little.
+>
+
+**[v1: Introduce unbalance proactive reclaim](http://lore.kernel.org/linux-mm/20231108065818.19932-1-link@vivo.com/)**
+
+> In some cases, we need to selectively reclaim file pages or anonymous
+> pages in an unbalanced manner.
+>
+> For example, when an application is pushed to the background and frozen,
+> it may not be opened for a long time, and we can safely reclaim the
+> application's anonymous pages, but we do not want to touch the file pages.
+>
+
+**[v1: mm:ALLOC_HIGHATOMIC flag allocation issuse](http://lore.kernel.org/linux-mm/20231108065408.1861-1-justinjiang@vivo.com/)**
+
+> In case that alloc_flags contains ALLOC_HIGHATOMIC and alloc order
+> is order1/2/3/10 in rmqueue(), if pages are alloced successfully
+> from pcplist, a free pageblock will be also moved from the alloced
+> migratetype freelist to MIGRATE_HIGHATOMIC freelist, rather than
+> alloc from MIGRATE_HIGHATOMIC freelist firstly, so this will result
+> in an increasing number of pages on the MIGRATE_HIGHATOMIC freelist,
+> pages in other migratetype freelist are reduced and more likely to
+> allocation failure.
+>
+
+**[v1: Make the kernel preemptible](http://lore.kernel.org/linux-mm/20231107215742.363031-1-ankur.a.arora@oracle.com/)**
+
+> We have two models of preemption: voluntary and full (and RT which is
+> a fuller form of full preemption.) In this series -- which is based
+> on Thomas' PoC (see [1]), we try to unify the two by letting the
+> scheduler enforce policy for the voluntary preemption models as well.
+>
+
+**[v10: mm: use memmap_on_memory semantics for dax/kmem](http://lore.kernel.org/linux-mm/20231107-vv-kmem_memmap-v10-0-1253ec050ed0@intel.com/)**
+
+> The dax/kmem driver can potentially hot-add large amounts of memory
+> originating from CXL memory expanders, or NVDIMMs, or other 'device
+> memories'. There is a chance there isn't enough regular system memory
+> available to fit the memmap for this new memory. It's therefore
+> desirable, if all other conditions are met, for the kmem managed memory
+> to place its memmap on the newly added memory itself.
+>
+
+**[v1: mm/filemap: increase usage of folio_next_index() helper](http://lore.kernel.org/linux-mm/20231107024635.4512-1-duminjie@vivo.com/)**
+
+> Simplify code pattern of 'folio->index + folio_nr_pages(folio)' by using
+> the existing helper folio_next_index() in filemap_get_folios_contig().
+>
+
+**[v4: zswap: memcontrol: implement zswap writeback disabling](http://lore.kernel.org/linux-mm/20231106231158.380730-1-nphamcs@gmail.com/)**
+
+> During our experiment with zswap, we sometimes observe swap IOs due to
+> occasional zswap store failures and writebacks-to-swap. These swapping
+> IOs prevent many users who cannot tolerate swapping from adopting zswap
+> to save memory and improve performance where possible.
+>
+
+**[v1: kasan: save mempool stack traces](http://lore.kernel.org/linux-mm/cover.1699297309.git.andreyknvl@google.com/)**
+
+> This series updates KASAN to save alloc and free stack traces for
+> secondary-level allocators that cache and reuse allocations internally
+> instead of giving them back to the underlying allocator (e.g. mempool).
+>
+
+#### File Systems
+
+**[v1: [overlayfs] do we still need d_instantiate_anon() and export of d_alloc_anon()?](http://lore.kernel.org/linux-fsdevel/20231111080400.GO1957730@ZenIV/)**
+
+> AFAICS, the main reason for exposing those used to be the need
+> to store ovl_entry in allocated dentry; we needed to do that before it
+> gets attached to inode, so the guts of d_obtain_alias() had to be
+> exposed.
+>
+
+**[v1: fs: Consider capabilities relative to namespace for linkat permission check](http://lore.kernel.org/linux-fsdevel/20231110170615.2168372-1-cmirabil@redhat.com/)**
+
+> This is a one line change that makes `linkat` aware of namespaces when
+> checking for capabilities.
+>
+> As far as I can tell, the call to `capable` in this code dates back to
+> before the `ns_capable` function existed, so I don't think the author
+> specifically intended to prefer regular `capable` over `ns_capable`,
+> and no one has noticed or cared to change it yet... until now!
+>
+
+**[v3: next: proc: support file->f_pos checking in mem_lseek](http://lore.kernel.org/linux-fsdevel/20231110151928.2667204-1-wozizhi@huawei.com/)**
+
+> In mem_lseek, file->f_pos may overflow. And it's not a problem that
+> mem_open set file mode with FMODE_UNSIGNED_OFFSET(memory_lseek). However,
+> another file use mem_lseek do lseek can have not FMODE_UNSIGNED_OFFSET
+> (kpageflags_proc_ops/proc_pagemap_operations...), so in order to prevent
+> file->f_pos updated to an abnormal number, fix it by checking overflow and
+> FMODE_UNSIGNED_OFFSET.
+>
+
+**[v10: bpf-next: BPF token and BPF FS-based delegation](http://lore.kernel.org/linux-fsdevel/20231110034838.1295764-1-andrii@kernel.org/)**
+
+> This patch set introduces an ability to delegate a subset of BPF subsystem
+> functionality from privileged system-wide daemon (e.g., systemd or any other
+> container manager) through special mount options for userns-bound BPF FS to
+> a *trusted* unprivileged application. Trust is the key here. This
+> functionality is not about allowing unconditional unprivileged BPF usage.
+> Establishing trust, though, is completely up to the discretion of respective
+> privileged application that would create and mount a BPF FS instance with
+> delegation enabled, as different production setups can and do achieve it
+> through a combination of different means (signing, LSM, code reviews, etc),
+> and it's undesirable and infeasible for kernel to enforce any particular way
+> of validating trustworthiness of particular process.
+>
+
+**[v1: loop/010: Add test for mode 0 fallocate() on loop devices](http://lore.kernel.org/linux-fsdevel/20231110010139.3901150-5-sarthakkukreti@chromium.org/)**
+
+> A recent patch series[1] adds support for calling fallocate() in mode 0
+> on block devices. This test adds a basic sanity test for loopback devices
+> setup on a sparse file and validates that writes to the loopback device
+> succeed, even when the underlying filesystem runs out of space.
+>
+
+**[v9: v9: Introduce provisioning primitives](http://lore.kernel.org/linux-fsdevel/20231110010139.3901150-1-sarthakkukreti@chromium.org/)**
+
+> This patch series is version 9 of the patch series to introduce
+> block-level provisioning mechanism (original [1]), which is useful for
+> provisioning space across thinly provisioned storage architectures (loop
+> devices backed by sparse files, dm-thin devices, virtio-blk). This
+> series has minimal changes over v8[2], with a couple of patches dropped
+> (suggested by Dave).
+>
+
+**[v1: Fuse submount_lookup needs to be initialized](http://lore.kernel.org/linux-fsdevel/cover.1699564053.git.kjlx@templeofstupid.com/)**
+
+> I got a couple of bug reports[1][2] this morning from teams that are
+> tracking regresssions in linux-next. My patch 513dfacefd71 ("fuse:
+> share lookup state between submount and its parent") is causing panics
+> in the fuse unmount path. The reports came from users with SLUB_DEBUG
+> enabled, and the additional debug sanitization catches the fact that the
+> submount_lookup field isn't getting initialized which could lead to a
+> subsequently bogus attempt to access the submount_lookup structure and
+> adjust its refcount.
+>
+
+**[v2: simplifying fast_dput(), dentry_kill() et.al.](http://lore.kernel.org/linux-fsdevel/20231109061932.GA3181489@ZenIV/)**
+
+> The series below is the fallout of trying to document the dentry
+> refcounting and life cycle - basically, getting rid of the bits that
+> had been too subtle and ugly to write them up.
+>
+
+**[v1: mm, page_owner: make the owner in page owner clearer](http://lore.kernel.org/linux-fsdevel/20231109032521.392217-1-jeff.xie@linux.dev/)**
+
+> The current page_owner function can display the allocate and free process of
+> a page, but there is no accurate information corresponding to the page.
+>
+> This patchset can make the slab page to know which kmem cache the page is
+> requested from.
+>
+
+**[v2: -next: proc: support file->f_pos checking in mem_lseek](http://lore.kernel.org/linux-fsdevel/20231109102658.2075547-1-wozizhi@huawei.com/)**
+
+> In mem_lseek, file->f_pos may overflow. And it's not a problem that
+> mem_open set file mode with FMODE_UNSIGNED_OFFSET(memory_lseek). However,
+> another file use mem_lseek do lseek can have not FMODE_UNSIGNED_OFFSET
+> (kpageflags_proc_ops/proc_pagemap_operations...), so in order to prevent
+> file->f_pos updated to an abnormal number, fix it by checking overflow and
+> FMODE_UNSIGNED_OFFSET.
+>
+
+**[v2: virtiofs: Export filesystem tags through sysfs](http://lore.kernel.org/linux-fsdevel/20231108213333.132599-1-vgoyal@redhat.com/)**
+
+> virtiofs filesystem is mounted using a "tag" which is exported by the
+> virtiofs device. virtiofs driver knows about all the available tags but
+> these are not exported to user space.
+>
+
+**[GIT PULL: xfs: new code for 6.7](http://lore.kernel.org/linux-fsdevel/87fs1g1rac.fsf@debian-BULLSEYE-live-builder-AMD64/)**
+
+> The remaining changes are limited to bug fixes and code cleanups.
+>
+> There was a delay in me pushing the changes to XFS' for-next branch and hence
+> a delay in the code changes reaching the linux-next tree. The XFS changes
+> reached linux-next on 31st of October. The delay was due to me having to drop
+> a patch from the XFS tree and having to initiate execution of the test suite
+> once again on October 26th. The complete test run requires around 4 days to
+> complete.
+>
+
+**[v1: Add folio_zero_tail() and folio_fill_tail()](http://lore.kernel.org/linux-fsdevel/20231107212643.3490372-1-willy@infradead.org/)**
+
+> I'm trying to make it easier for filesystems with tailpacking /
+> stuffing / inline data to use folios. The primary function here is
+> folio_fill_tail(). You give it a pointer to memory where the data
+> currently is, and it takes care of copying it into the folio at that
+> offset. That works for gfs2 & iomap. Then There's Ext4. Rather than
+> gin up some kind of specialist "Here's a two pointers to two blocks
+> of memory" routine, just let it do its current thing, and let it call
+> folio_zero_tail(), which is also called by folio_fill_tail().
+>
+
+#### Network Devices
+
+**[v1: net-next: net: do not send a MOVE event when netdev changes netns](http://lore.kernel.org/netdev/20231111021324.1702591-1-kuba@kernel.org/)**
+
+> To rename the underlying struct device we have to call
+> device_rename(). The rename()'s MOVE event, however, doesn't
+> "belong" to either the old or the new namespace.
+> If there are conflicts on both sides it's actually impossible
+> to issue a real MOVE (old name -> new name) without confusing
+> user space. And Daniel reports that such confusions do in fact
+> happen for systemd, in real life.
+>
+
+**[[PATCH net-next RFC v5 0/4] net/sched: Introduce tc block ports tracking and use](http://lore.kernel.org/netdev/20231110214618.1883611-1-victor@mojatatu.com/)**
+
+> __context__
+> The "tc block" is a collection of netdevs/ports which allow qdiscs to share
+> match-action block instances (as opposed to the traditional tc filter per
+> netdev/port)[1].
+>
+
+**[v1: net: memory leak in nr_rx_frame](http://lore.kernel.org/netdev/20231110173632.2511-1-bragathemanick0908@gmail.com/)**
+
+> The condition (make = nr_make_new(sk)) == NULL suggests
+> that nr_make_new allocates memory and returns a pointer.
+> If this allocation fails (returns NULL), it indicates a
+> potential memory leak.
+>
+
+**[v4: net: ti: icssg-prueth: Add missing icss_iep_put to error path](http://lore.kernel.org/netdev/7a4e5c5b-e397-479b-b1cb-4b50da248f21@siemens.com/)**
+
+> Analogously to prueth_remove, just also taking care for NULL'ing the
+> iep pointers.
+>
+
+**[v1: net: gso_test: support CONFIG_MAX_SKB_FRAGS up to 45](http://lore.kernel.org/netdev/20231110153630.161171-1-willemdebruijn.kernel@gmail.com/)**
+
+> The test allocs a single page to hold all the frag_list skbs. This
+> is insufficient on kernels with CONFIG_MAX_SKB_FRAGS=45, due to the
+> increased skb_shared_info frags[] array length.
+>
+
+**[[PATCH net-next RFC] net: sfp: rework the RollBall PHY waiting code](http://lore.kernel.org/netdev/20231110131223.32201-1-kabel@kernel.org/)**
+
+> RollBall SFP modules allow the access to PHY registers only after a
+> certain time has passed. Until then, the registers read 0xffff.
+>
+> Currently we have quirks for modules where we need to wait either 25
+> seconds or 4 seconds, but recently I got hands on another module where
+> the wait is even shorter.
+>
+
+**[v1: ipsec-next: Add IP-TFS mode to xfrm](http://lore.kernel.org/netdev/20231110113719.3055788-1-chopps@chopps.org/)**
+
+> This patchset adds a new xfrm mode implementing on-demand IP-TFS. IP-TFS
+> (AggFrag encapsulation) has been standardized in RFC9347.
+>
+
+**[v1: bpf-next: bpf, sockmap: Bundle psock->sk_redir and redir_ingress into a tagged pointer](http://lore.kernel.org/netdev/1699609302-8605-1-git-send-email-yangpc@wangsu.com/)**
+
+> Like skb->_sk_redir, we bundle the sock redirect pointer and
+> the ingress bit to manage them together.
+>
+
+**[v5: stmmac: Add Loongson platform support](http://lore.kernel.org/netdev/cover.1699533745.git.siyanteng@loongson.cn/)**
+
+> * Remove an ugly and useless patch (fix channel number).
+> * Remove the non-standard dma64 driver code, and also remove
+> the HWIF entries, since the associated custom callbacks no
+> longer exist.
+> * Refer to Serge's suggestion: Update the dwmac1000_dma.c to
+> support the multi-DMA-channels controller setup.
+>
+
+**[v1: Add support for 10G Ethernet SerDes on MT7988](http://lore.kernel.org/netdev/cover.1699565880.git.daniel@makrotopia.org/)**
+
+> This series aims to add support for GMAC2 and GMAC3 of the MediaTek MT7988 SoC.
+> While the vendor SDK stuffs all this into their Ethernet driver, I've tried to
+> seperate things into a PHY driver, a PCS driver as well as changes to the
+> existing Ethernet and LynxI PCS driver.
+>
+
+**[GIT PULL: Networking for v6.7-rc1](http://lore.kernel.org/netdev/20231109210013.1276858-1-kuba@kernel.org/)**
+
+> The following changes since commit ff269e2cd5adce4ae14f883fc9c8803bc43ee1e9:
+>
+> Merge tag 'net-next-6.7-followup' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next (2023-11-01 16:33:20 -1000)
+>
+> are available in the Git repository at:
+>
+> git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git net-6.7-rc1
+>
+
+**[v2: net: ncsi: Revert NCSI link loss/gain commit](http://lore.kernel.org/netdev/20231109205137.819392-1-johnathanx.mantey@intel.com/)**
+
+> The NCSI commit
+> ncsi: Propagate carrier gain/loss events to the NCSI controller
+> introduced unwanted behavior.
+>
+> The intent for the commit was to be able to detect carrier loss/gain
+> for just the NIC connected to the BMC. The unwanted effect is a
+> carrier loss for auxiliary paths also causes the BMC to lose
+> carrier. The BMC never regains carrier despite the secondary NIC
+> regaining a link.
+>
+
+**[v1: net: bonding: stop the device in bond_setup_by_slave()](http://lore.kernel.org/netdev/20231109180102.4085183-1-edumazet@google.com/)**
+
+> Commit 9eed321cde22 ("net: lapbether: only support ethernet devices")
+> has been able to keep syzbot away from net/lapb, until today.
+>
+> In the following splat [1], the issue is that a lapbether device has
+> been created on a bonding device without members. Then adding a non
+> ARPHRD_ETHER member forced the bonding master to change its type.
+>
+
+**[v1: net: ptp: annotate data-race around q->head and q->tail](http://lore.kernel.org/netdev/20231109174859.3995880-1-edumazet@google.com/)**
+
+> As I was working on a syzbot report, I found that KCSAN would
+> probably complain that reading q->head or q->tail without
+> barriers could lead to invalid results.
+>
+
+**[v1: net: ipvlan: add ipvlan_route_v6_outbound() helper](http://lore.kernel.org/netdev/20231109152241.3754521-1-edumazet@google.com/)**
+
+> Inspired by syzbot reports using a stack of multiple ipvlan devices.
+>
+> Reduce stack size needed in ipvlan_process_v6_outbound() by moving
+> the flowi6 struct used for the route lookup in an non inlined
+> helper. ipvlan_route_v6_outbound() needs 120 bytes on the stack,
+> immediately reclaimed.
+>
+
+**[v1: net-next: net: don't dump stack on queue timeout](http://lore.kernel.org/netdev/20231109000901.949152-1-kuba@kernel.org/)**
+
+> The top syzbot report for networking (#14 for the entire kernel)
+> is the queue timeout splat. We kept it around for a long time,
+> because in real life it provides pretty strong signal that
+> something is wrong with the driver or the device.
+>
+
+**[v4: net-next: hv_netvsc: Mark VF as slave before exposing it to user-mode](http://lore.kernel.org/netdev/1699484212-24079-1-git-send-email-longli@linuxonhyperv.com/)**
+
+> When a VF is being exposed form the kernel, it should be marked as "slave"
+> before exposing to the user-mode. The VF is not usable without netvsc running
+> as master. The user-mode should never see a VF without the "slave" flag.
+>
+
+**[v1: [ncsi] Revert NCSI link loss/gain commit](http://lore.kernel.org/netdev/20231108222943.4098887-1-johnathanx.mantey@intel.com/)**
+
+> The NCSI commit
+> ncsi: Propagate carrier gain/loss events to the NCSI controller
+> introduced unwanted behavior.
+>
+
+**[v3: net: hv_netvsc: Mark VF as slave before exposing it to user-mode](http://lore.kernel.org/netdev/1699481180-17511-1-git-send-email-longli@linuxonhyperv.com/)**
+
+> When a VF is being exposed form the kernel, it should be marked as "slave"
+> before exposing to the user-mode. The VF is not usable without netvsc running
+> as master. The user-mode should never see a VF without the "slave" flag.
+>
+
+**[v2: net: set SOCK_RCU_FREE before inserting socket into hashtable](http://lore.kernel.org/netdev/20231108211325.18938-1-sdf@google.com/)**
+
+> Commit 6acc9b432e67 ("bpf: Add helper to retrieve socket in BPF")
+> started checking SOCK_RCU_FREE _after_ the lookup to infer whether
+> the refcount has been taken care of.
+>
+
+**[v2: add qca8084 ethernet phy driver](http://lore.kernel.org/netdev/20231108113445.24825-1-quic_luoj@quicinc.com/)**
+
+> QCA8084 is four-port PHY with maximum link capability 2.5G,
+> which supports the interface mode qusgmii and sgmii mode,
+> there are two PCSs available to connected with ethernet port.
+>
+
+**[v1: net-next: net/smc: Introduce IPPROTO_SMC for smc](http://lore.kernel.org/netdev/1699442703-25015-1-git-send-email-alibuda@linux.alibaba.com/)**
+
+> This commit allows users to modify dynamically the protocol before socket
+> created through eBPF programs, which provides a more flexible approach
+> than smc_run (LP_PRELOAD). It does not require the process restart
+> and allows for controlling replacement at the connection level, whereas
+> smc_run operates at the process level.
+>
+
+**[v1: net: kcm: fill in MODULE_DESCRIPTION()](http://lore.kernel.org/netdev/20231108020305.537293-1-kuba@kernel.org/)**
+
+> W=1 builds now warn if module is built without a MODULE_DESCRIPTION().
+>
+
+#### Security Enhancements
+
+#### Asynchronous IO
+
+**[v2: io_uring: Statistics of the true utilization of sq threads.](http://lore.kernel.org/io-uring/20231108080732.15587-1-xiaobing.li@samsung.com/)**
+
+> Since the sq thread has a while(1) structure, during this process, there
+> may be a lot of time that is not processing IO but does not exceed the
+> timeout period, therefore, the sqpoll thread will keep running and will
+> keep occupying the CPU. Obviously, the CPU is wasted at this time;Our
+> goal is to count the part of the time that the sqpoll thread actually
+> processes IO, so as to reflect the part of the CPU it uses to process
+> IO, which can be used to help improve the actual utilization of the CPU
+> in the future.
+>
+
+**[v2: Zero copy Rx using io_uring](http://lore.kernel.org/io-uring/20231107214045.2172393-1-dw@davidwei.uk/)**
+
+> * Added copy fallback support if userspace memory allocated for ZC Rx
+> runs out, or if header splitting or flow steering fails.
+> * Added veth support for ZC Rx, for testing and demonstration. We will
+> need to figure out what driver would be best for such testing
+> functionality in the future. Perhaps netdevsim?
+> * Added socket registration API to io_uring to associate specific
+> sockets with ifqs/Rx queues for ZC.
+> * Added multi-socket support, such that multiple connections can be
+> steered into the same hardware Rx queue.
+> * Added Netbench server/client support.
+>
+
+**[v1: IO_URING: Statistics of the true utilization of sq threads.](http://lore.kernel.org/io-uring/20231106074055.1248629-1-xiaobing.li@samsung.com/)**
+
+> Since the sq thread has a while(1) structure, during this process, there
+> may be a lot of time that is not processing IO but does not exceed the
+> timeout period, therefore, the sqpoll thread will keep running and will
+> keep occupying the CPU. Obviously, the CPU is wasted at this time;Our
+> goal is to count the part of the time that the sqpoll thread actually
+> processes IO, so as to reflect the part of the CPU it uses to process
+> IO, which can be used to help improve the actual utilization of the CPU
+> in the future.
+>
+
+#### Rust For Linux
+
+**[v1: Support MODVERSIONS by disabling Rust modules](http://lore.kernel.org/rust-for-linux/20231108022651.645950-2-mmaurer@google.com/)**
+
+> The goal of this patch series is to allow MODVERSIONS to be turned on
+> alongside RUST. Gary Guo previously proposed to adjust the API to
+> support Rust symbols' greater length:
+> https://lore.kernel.org/lkml/CANiq72n4MbR+AE7eHfVQZOu26FeSttQnEEMT3Jpft+CcGwk9jw@mail.gmail.com/T/
+>
+
+**[v1: Kunit to check the longest symbol length](http://lore.kernel.org/rust-for-linux/20231105184010.49194-1-sergio.collado@gmail.com/)**
+
+> The longest length of a symbol (KSYM_NAME_LEN) was increased to 512
+> in the reference [1]. This patch adds a kunit test to check the longest
+> symbol length.
+>
+> [1] https://lore.kernel.org/lkml/20220802015052.10452-6-ojeda@kernel.org/
+>
+
+**[v3: rust: crates in other kernel directories](http://lore.kernel.org/rust-for-linux/20231104211213.225891-1-yakoyoku@gmail.com/)**
+
+> This RFC provides makes possible to have bindings for kernel subsystems
+> that are compiled as modules.
+>
+
+#### BPF
+
+**[v1: LSM: Officially support appending LSM hooks after boot.](http://lore.kernel.org/bpf/38b318a5-0a16-4cc2-878e-4efa632236f3@I-love.SAKURA.ne.jp/)**
+
+> This functionality will be used by TOMOYO security module.
+>
+> In order to officially use an LSM module, that LSM module has to be
+> built into vmlinux. This limitation has been a big barrier for allowing
+> distribution kernel users to use LSM modules which the organization who
+> builds that distribution kernel cannot afford supporting [1]. Therefore,
+> I've been asking for ability to append LSM hooks from LKM-based LSMs so
+> that distribution kernel users can use LSMs which the organization who
+> builds that distribution kernel cannot afford supporting.
+>
+
+**[v4: bpf-next: bpf: Add support for cgroup1, BPF part](http://lore.kernel.org/bpf/20231111090034.4248-1-laoar.shao@gmail.com/)**
+
+> This is the BPF part of the series "bpf, cgroup: Add BPF support for
+> cgroup1 hierarchy" with adjustment in the last two patches compared
+> to the previous one.
+>
+> 1.8.3.1
+>
+
+**[v1: bpf: Add missed allocation hint for bpf_mem_cache_alloc_flags()](http://lore.kernel.org/bpf/20231111043821.2258513-1-houtao@huaweicloud.com/)**
+
+> bpf_mem_cache_alloc_flags() may call __alloc() directly when there is no
+> free object in free list, but it doesn't initialize the allocation hint
+> for the returned pointer. It may lead to bad memory dereference when
+> freeing the pointer, so fix it by initializing the allocation hint.
+>
+
+**[v3: bpf-next: bpf: Do not allocate percpu memory at init stage](http://lore.kernel.org/bpf/20231111013928.948838-1-yonghong.song@linux.dev/)**
+
+> Kirill Shutemov reported significant percpu memory consumption increase after
+> booting in 288-cpu VM ([1]) due to commit 41a5db8d8161 ("bpf: Add support for
+> non-fix-size percpu mem allocation"). The percpu memory consumption is
+> increased from 111MB to 969MB. The number is from /proc/meminfo.
+>
+
+**[v1: perf: perf: get_perf_callchain return NULL for crosstask](http://lore.kernel.org/bpf/20231110235021.192796-1-linux@jordanrome.com/)**
+
+> Return NULL instead of returning 1 incorrect frame, which
+> currently happens when trying to walk the user stack for
+> any task that isn't current. Returning NULL is a better
+> indicator that this behavior is not supported.
+>
+
+**[v8: Reduce overhead of LSMs with static calls](http://lore.kernel.org/bpf/20231110222038.1450156-1-kpsingh@kernel.org/)**
+
+> LSM hooks (callbacks) are currently invoked as indirect function calls. These
+> callbacks are registered into a linked list at boot time as the order of the
+> LSMs can be configured on the kernel command line with the "lsm=" command line
+> parameter.
+>
+
+**[v1: bpf-next: BPF verifier log improvements](http://lore.kernel.org/bpf/20231110161057.1943534-1-andrii@kernel.org/)**
+
+> This patch set moves a big chunk of verifier log related code from gigantic
+> verifier.c file into more focused kernel/bpf/log.c. This is not essential to
+> the rest of functionality in this patch set, so I can undo it, but it felt
+> like it's good to start chipping away from 20K+ verifier.c whenever we can.
+>
+
+**[v1: pahole: add support for kind layout, CRC encoding BTF features](http://lore.kernel.org/bpf/20231110111533.64608-1-alan.maguire@oracle.com/)**
+
+> add "kind_layout" and "crc" features for adding BTF kind layout
+> section and adding CRCs to BTF headers respectively; this support
+> depends on the libbpf support added in [1].
+>
+
+**[v3: bpf-next: Add kind layout, CRCs to BTF](http://lore.kernel.org/bpf/20231110110304.63910-1-alan.maguire@oracle.com/)**
+
+> Update struct btf_header to add a new "kind_layout" section containing
+> a description of how to parse the BTF kinds known about at BTF
+> encoding time. This provides the opportunity for tools that might
+> not know all of these kinds - as is the case when older tools run
+> on more newly-generated BTF - to still parse the BTF provided,
+> even if it cannot all be used.
+>
+
+**[v1: bpf: Do not allocate percpu memory at init stage](http://lore.kernel.org/bpf/20231110061734.2958678-1-yonghong.song@linux.dev/)**
+
+> Kirill Shutemov reported significant percpu memory increase after booting
+> in 288-cpu VM ([1]) due to commit 41a5db8d8161 ("bpf: Add support for
+> non-fix-size percpu mem allocation"). The percpu memory is increased
+> from 111MB to 969MB. The number is from /proc/meminfo.
+>
+
+**[v1: bpf-next: bpf: replace register_is_const() with is_reg_const()](http://lore.kernel.org/bpf/20231108140043.12282-1-shung-hsi.yu@suse.com/)**
+
+> The addition of is_reg_const() in commit 171de12646d2 ("bpf: generalize
+> is_branch_taken to handle all conditional jumps in one place") has made the
+> register_is_const() redundant. Give the former has more feature, plus the
+> fact the latter is only used in one place, replace register_is_const() with
+> is_reg_const(), and remove the definition of register_is_const.
+>
+
+**[v2: bpf-next: bpf: add crosstask check to __bpf_get_stack](http://lore.kernel.org/bpf/20231108112334.3433136-1-jordalgo@meta.com/)**
+
+> Currently get_perf_callchain only supports user stack walking for
+> the current task. Passing the correct *crosstask* param will return
+> 0 frames if the task passed to __bpf_get_stack isn't the current
+>
+
+**[v0: bpf-next: Unifying signed and unsigned min/max tracking](http://lore.kernel.org/bpf/20231108054611.19531-1-shung-hsi.yu@suse.com/)**
+
+> This patchset is a proof of concept for previously discussed concept[1]
+> of unifying smin/smax with umin/umax for both the 32-bit and 64-bit
+> range[2] within bpf_reg_state. This will allow the verifier to track
+> both signed and unsigned ranges using just two u32 (for 32-bit range) or
+> u64 (for 64-bit range), instead four as we currently do.
+>
+
+**[v2: bpf: Let BPF verifier consider {task,cgroup} is trusted in bpf_iter_reg](http://lore.kernel.org/bpf/20231107132204.912120-1-zhouchuyi@bytedance.com/)**
+
+> The patchset aims to let the BPF verifier consider
+> bpf_iter__cgroup->cgroup and bpf_iter__task->task is trusted suggested by
+> Alexei[1].
+>
+
+### Related Technology News
+
+#### Qemu
+
+**[v1: for-8.2: hw/riscv/virt.c: do create_fdt() earlier, add finalize_fdt()](http://lore.kernel.org/qemu-devel/20231110172559.73209-1-dbarboza@ventanamicro.com/)**
+
+> Commit 49554856f0 fixed a problem, where TPM devices were not appearing
+> in the FDT, by delaying the FDT creation up until virt_machine_done().
+> This create a side effect (see gitlab #1925) - devices that need access
+> to the '/chosen' FDT node during realize() stopped working because, at
+> that point, we don't have a FDT.
+>
+
+**[v1: riscv-to-apply queue](http://lore.kernel.org/qemu-devel/20231107022946.1055027-1-alistair.francis@wdc.com/)**
+
+> The following changes since commit 3e01f1147a16ca566694b97eafc941d62fa1e8d8:
+>
+>   Merge tag 'pull-sp-20231105' of https://gitlab.com/rth7680/qemu into staging (2023-11-06 09:34:22 +0800)
+>
+
+**[v1: target/riscv: don't enable Zfa by default](http://lore.kernel.org/qemu-devel/20231106111440.59995-1-jerry.zhangjian@sifive.com/)**
+
+> - Zfa requires F, we should not assume all CPUs have F extension
+> support.
+>
+
+#### Buildroot
+
+**[package/wavemon: bump to version 0.9.5](http://lore.kernel.org/buildroot/20231106194025.2844586971@busybox.osuosl.org/)**
+
+> commit: https://git.buildroot.net/buildroot/commit/?id=2edabebbb417414d429161e0123fb7730bdc0489
+> branch: https://git.buildroot.net/buildroot/commit/?id=refs/heads/master
+>
+> Drop patch (already in version)
+>
+> https://github.com/uoaerg/wavemon/releases/tag/v0.9.5
+>
+
+#### U-Boot
+
+**[v2: riscv: Add support for AMD/Xilinx MicroBlaze V](http://lore.kernel.org/u-boot/d488b7016e0d1b1324c64d8a8b2f033851aab6c6.1699271804.git.michal.simek@amd.com/)**
+
+> MicroBlaze V is new AMD/Xilinx soft-core 32bit RISC-V processor IP.
+> It is hardware compatible with classic MicroBlaze processor.
+>
+> The patch contains initial wiring and configuration for initial HW design
+> with memory, cpu, interrupt controller, timers and uartlite console
+> (interrupt controller is listed but U-Boot is not using it).
+>
+
+**[v1: Support StarFive Watchdog driver](http://lore.kernel.org/u-boot/20231105231318.992857-1-chanho61.park@samsung.com/)**
+
+> This patchset adds to support StarFive Watchdog driver which is based on
+> Linux kernel's starfive-wdt driver. Actually, the original driver supports
+> both JH7100 and JH7110 with variant coding but this removes the JH7100
+> part of codes because JH7100 isn't supported in u-boot yet.
+> However, this patch tries to keep the variant coding style for future
+> work of JH7100 and have a consistency with the Linux driver.
+>
+
 ## 20231105:第 67 期
 
 ### 内核动态