diff --git a/news/README.md b/news/README.md index d20b959c4ce6e23818d0fdec38a86dba63818ebe..bb762c708b857fe6605bf68135ff0e7a73408065 100644 --- a/news/README.md +++ b/news/README.md @@ -6,6 +6,963 @@ * [2023 - First Half](2023-1st-half.md) * [2023 - Second Half](2023-2nd-half.md) +## 20240324: Issue 84 + +### Kernel News + +#### RISC-V Architecture Support + +**[v2: arch/riscv: Enable kprobes when CONFIG_MODULES=n](http://lore.kernel.org/linux-riscv/20240323232908.13261-1-jarkko@kernel.org/)** + +> Tracing with kprobes while running a monolithic kernel is currently +> impossible due to the kernel module allocator dependency. +> + +**[v6: riscv: add initial support for Canaan Kendryte K230](http://lore.kernel.org/linux-riscv/tencent_F76EB8D731C521C18D5D7C4F8229DAA58E08@qq.com/)** + +> K230 is an ideal chip for RISC-V Vector 1.0 evaluation now. Add initial +> support for it to allow more people to participate in building drivers +> to mainline for it. +> + +**[v1: riscv: merge two if-blocks for KBUILD_IMAGE](http://lore.kernel.org/linux-riscv/20240323113500.1249272-1-masahiroy@kernel.org/)** + +> In arch/riscv/Makefile, KBUILD_IMAGE is assigned in two separate +> if-blocks. +> + +**[v2: bpf: verifier: reject addr_space_cast insn without arena](http://lore.kernel.org/linux-riscv/20240322153518.11555-1-puranjay12@gmail.com/)** + +> The verifier allows using the addr_space_cast instruction in a program +> that doesn't have an associated arena. This was caught in the form of an +> invalid memory access in do_misc_fixups() when, while converting +> addr_space_cast to a normal 32-bit mov, env->prog->aux->arena was +> dereferenced to check for BPF_F_NO_USER_CONV flag. 
+> + +**[GIT PULL: RISC-V Patches for the 6.9 Merge Window](http://lore.kernel.org/linux-riscv/mhng-105d6a21-7483-4a20-a9e7-8e72770737d8@palmer-ri-x1c9/)** + +> The following changes since commit e0fe5ab4192c171c111976dbe90bbd37d3976be0: +> +> riscv: Fix pte_leaf_size() for NAPOT (2024-02-29 10:21:23 -0800) +> + +**[v1: RISC-V: selftests: cbo: Ensure asm operands match constraints, take 2](http://lore.kernel.org/linux-riscv/20240322134728.151255-2-ajones@ventanamicro.com/)** + +> Commit 0de65288d75f ("RISC-V: selftests: cbo: Ensure asm operands +> match constraints") attempted to ensure MK_CBO() would always +> provide to a compile-time constant when given a constant, but +> cpu_to_le32() isn't necessarily going to do that. Switch to manually +> shifting the bytes, when needed, to finally get this right. +> + +**[v1: i2c: reword i2c_algorithm according to newest specification](http://lore.kernel.org/linux-riscv/20240322132619.6389-1-wsa+renesas@sang-engineering.com/)** + +> Start changing the wording of the I2C main header wrt. the newest I2C +> v7, SMBus 3.2, I3C specifications and replace "master/slave" with more +> appropriate terms. This first step renames the members of struct +> i2c_algorithm. Once all in-tree users are converted, the anonymous union +> will go away again. All this work will also pave the way for finally +> separating the monolithic header into more fine-grained headers like +> "i2c/clients.h" etc. +> + +**[v1: riscv: Improve sbi_ecall() code generation by reordering arguments](http://lore.kernel.org/linux-riscv/20240322112629.68170-1-alexghiti@rivosinc.com/)** + +> The sbi_ecall() function arguments are not in the same order as the +> ecall arguments, so we end up re-ordering the registers before the +> ecall which is useless and costly. 
+> + +**[v2: riscv: Add tracepoints for SBI calls and returns](http://lore.kernel.org/linux-riscv/20240321230131.1838105-1-samuel.holland@sifive.com/)** + +> These are useful for measuring the latency of SBI calls. The SBI HSM +> extension is excluded because those functions are called from contexts +> such as cpuidle where instrumentation is not allowed. +> + +**[v1: scripts/package: buildtar: Output as vmlinuz for riscv](http://lore.kernel.org/linux-riscv/4edd1c5e-aacb-4513-97ae-e6b2130476fc@imgtec.com/)** + +> This matches the behavior for arm64 [1] and prevents clobbering of +> vmlinux-${KERNELRELEASE}. +> + +**[v1: RISC-V: selftests: cbo: Use exported __cpu_to_le32() with uapi header](http://lore.kernel.org/linux-riscv/20240321115250.801731-1-woodrow.shen@sifive.com/)** + +> cpu_to_le32 is not defined in uapi headers, and it could cause an error +> of impossible constraint in 'asm' during compilation. However, +> the reason is due to undefined reference to cpu_to_le32. +> __cpu_to_le32() defined from byteorder.h should be used instead. +> + +**[v2: clk: starfive: jh7100: Use clk_hw for external input clocks](http://lore.kernel.org/linux-riscv/beb746c7538a4ff720a25fd8f309da20d8d854ef.1710933713.git.geert@linux-m68k.org/)** + +> The Starfive JH7100 clock driver does not use the DT "clocks" property +> to find the external main input clock, but instead relies on the name of +> the actual clock provider ("osc_sys"). This is fragile, and caused +> breakage when sanitizing clock node names in DTS. 
+> + +**[v1: riscv: Userspace pointer masking and tagged address ABI](http://lore.kernel.org/linux-riscv/20240319215915.832127-1-samuel.holland@sifive.com/)** + +> RISC-V defines three extensions for pointer masking[1]: +> - Smmpm: configured in M-mode, affects M-mode +> - Smnpm: configured in M-mode, affects the next lower mode (S or U-mode) +> - Ssnpm: configured in S-mode, affects the next lower mode (U-mode) +> + +**[v2: riscv: use KERN_INFO in do_trap](http://lore.kernel.org/linux-riscv/mvmy1aegrhm.fsf@suse.de/)** + +> Print the instruction dump with info instead of emergency level. The +> unhandled signal message is only for informational purposes. +> + +**[v3: Support Zve32[xf] and Zve64[xfd] Vector subextensions](http://lore.kernel.org/linux-riscv/20240318-zve-detection-v3-0-e12d42107fa8@sifive.com/)** + +> The series is composed of two parts. The first part provides a quick fix for +> the issue on a recent thread[1]. The issue happens when a platform has +> non-uniform vector register lengths across multiple cores. Specifically, +> patch 1 adds a comment at a callsite of riscv_setup_vsize to clarify how +> vlenb is observed by the system. Patch 2 fixes the issue by failing the +> boot process of a secondary core if vlenb mismatches. +> + +**[v4: riscv: sophgo: add dmamux support for Sophgo CV1800/SG2000 SoCs](http://lore.kernel.org/linux-riscv/IA1PR20MB49536DED242092A49A69CEB6BB2D2@IA1PR20MB4953.namprd20.prod.outlook.com/)** + +> Add dma multiplexer support for the Sophgo CV1800/SG2000 SoCs. +> +> The patch includes the following patch: +> http://lore.kernel.org/linux-riscv/PH7PR20MB4962F822A64CB127911978AABB4E2@PH7PR20MB4962.namprd20.prod.outlook.com/ +> + +**[v9: Add timer driver for StarFive JH7110 RISC-V SoC](http://lore.kernel.org/linux-riscv/20240318030649.10413-1-ziv.xu@starfivetech.com/)** + +> This patch series adds a timer driver for the StarFive JH7110 +> RISC-V SoC. The first patch adds documentation to describe device +> tree bindings. 
The subsequent patch adds the timer driver and support for the +> JH7110 SoC. The last patch adds the timer device node to the JH7110 +> dts. +> + +**[v2: riscv: dmi: Add SMBIOS/DMI support](http://lore.kernel.org/linux-riscv/20240318020916.1299190-1-haibo1.xu@intel.com/)** + +> Enable the dmi driver for riscv which would allow access to the +> SMBIOS info through userspace files (/sys/firmware/dmi/*). +> + +#### Process Scheduling + +**[v8: sched: Don't trigger misfit if affinity is restricted](http://lore.kernel.org/lkml/20240324004552.999936-1-qyousef@layalina.io/)** + +> There was a discussion on handling a hotplug operation that removes a capacity level +> and leads unnecessary misfit lb to trigger again. I opted not to handle it +> now, but a working patch is available in [1]. I don't feel strongly about it +> and would leave it up to the maintainers to push which direction they prefer. +> Patch 4 will make sure that balance interval and nr_failed won't grow +> unnecessarily due to bad unnecessary misfit lb. It will lead to some +> sub-optimality, but no incorrect behavior. +> + +**[v1: sched: Improve the accuracy of sched_stat_wait statistics for rt and dl](http://lore.kernel.org/lkml/20240322081521.2687856-1-zhangqiao22@huawei.com/)** + +> Where commit b9c88f752268 ("sched/fair: Improve the accuracy of +> sched_stat_wait statistics") fixed a wrong scenario for cfs schedstat. +> + +**[[RESEND]v2: sched: Add trace_sched_waking() tracepoint to sched_ttwu_pending()](http://lore.kernel.org/lkml/20240318192846.75299-1-jstultz@google.com/)** + +> Zimuzo reported seeing occasional cases in perfetto traces where +> tasks went from sleeping directly to trace_sched_wakeup() +> without always seeing a trace_sched_waking(). 
+> + +#### Memory Management + +**[v1: mm: get_mm_counter() get the total memory usage of the process](http://lore.kernel.org/linux-mm/20240322151139.7417-1-chentt10@chinatelecom.cn/)** + +> Currently, the get_mm_counter() function returns only the value of +> the process memory counter percpu_counter->count record, ignoring +> the memory usage count maintained by each CPU in the +> percpu_counter->counters array, which leads to an error in obtaining +> the memory usage count of a process, especially when there are many +> CPU cores. +> + +**[v1: mm/filemap: set folio->mapping to NULL before xas_store()](http://lore.kernel.org/linux-mm/20240322210455.3738-1-soma.nakata01@gmail.com/)** + +> Functions such as __filemap_get_folio() check the truncation of +> folios based on the mapping field. Therefore setting this field to NULL +> earlier prevents unnecessary operations on already removed folios. +> + +**[v5: mm/migrate: split source folio if it is on deferred split list](http://lore.kernel.org/linux-mm/20240322193304.522496-1-zi.yan@sent.com/)** + +> If the source folio is on deferred split list, it is likely some subpages +> are not used. Split it before migration to avoid migrating unused subpages. +> + +**[v1: mm: add folio in swapcache if swapin from zswap](http://lore.kernel.org/linux-mm/20240322163939.17846-1-chengming.zhou@linux.dev/)** + +> There is a report of data corruption caused by double swapin, which is +> only possible in the skip swapcache path on SWP_SYNCHRONOUS_IO backends. +> + +**[v1: exec: Don't disable perf events for setuid root executables](http://lore.kernel.org/linux-mm/20240322162759.714141-1-leo.yan@arm.com/)** + +> Al Grant reported that the 'perf record' command terminates abnormally +> after setting the setuid bit for the executable. To reproduce this +> issue, an additional condition is that the binary file is owned by the root +> user but is running under a non-privileged user. 
+> + +**[v1: A Summary of VMA scanning improvements explored](http://lore.kernel.org/linux-mm/cover.1710829750.git.raghavendra.kt@amd.com/)** + +> I am posting the summary of numa balancing improvements tried out. +> +> (Intention is RFC and revisiting these in future when some one +> sees potential benefits with PATCH1 and PATCH2). +> + +**[v1: selftests/mm: Parse VMA range in one go](http://lore.kernel.org/linux-mm/20240322120551.818764-1-dev.jain@arm.com/)** + +> Use sscanf() to directly parse the VMA range. No functional change is intended. +> + +**[v1: THP_SWAP support for ARM64 SoC with MTE](http://lore.kernel.org/linux-mm/20240322114136.61386-1-21cnbao@gmail.com/)** + +> The patch has been extracted from the larger folios swap-in series [1], +> incorporating some new modifications. +> + +**[v2: transfer page to folio in KSM](http://lore.kernel.org/linux-mm/20240322083703.232364-1-alexs@kernel.org/)** + +> This is the first part of page to folio transfer on KSM. Since only +> single page could be stored in KSM, we could safely transfer stable tree +> pages to folios. +> + +**[v4: Improved Memory Tier Creation for CPUless NUMA Nodes](http://lore.kernel.org/linux-mm/20240322070356.315922-1-horenchuang@bytedance.com/)** + +> When a memory device, such as CXL1.1 type3 memory, is emulated as +> normal memory (E820_TYPE_RAM), the memory device is indistinguishable +> from normal DRAM in terms of memory tiering with the current implementation. +> The current memory tiering assigns all detected normal memory nodes +> to the same DRAM tier. This results in normal memory devices with +> + +**[v2: binfmt: replace deprecated strncpy](http://lore.kernel.org/linux-mm/20240321-strncpy-fs-binfmt_elf_fdpic-c-v2-1-0b6daec6cc56@google.com/)** + +> strncpy() is deprecated for use on NUL-terminated destination strings +> [1] and as such we should prefer more robust and less ambiguous string +> interfaces. 
+> + +**[v1: Various significant MM patches](http://lore.kernel.org/linux-mm/20240321142448.1645400-1-willy@infradead.org/)** + +> These patches all interact in annoying ways which make it tricky to +> send them out in any way other than a big batch, even though there's +> not really an overarching theme to connect them. +> + +**[v1: selftests/mm: Confirm VA exhaustion without reliance on correctness of mmap()](http://lore.kernel.org/linux-mm/20240321103522.516097-1-dev.jain@arm.com/)** + +> Currently, VA exhaustion is being checked by passing a hint to mmap() and +> expecting it to fail. This patch makes a stricter test by successful write() +> calls from /proc/self/maps to a dump file, confirming that a free chunk is +> indeed not available. +> + +**[v2: mm/slub: mark racy accesses on slab->slabs](http://lore.kernel.org/linux-mm/tencent_909E215498A54E4E100E456A92A7F13DAD06@qq.com/)** + +> The reads of slab->slabs are racy because it may be changed by +> put_cpu_partial concurrently. In slabs_cpu_partial_show() and +> show_slab_objects(), slab->slabs is only used for showing information. +> + +**[v1: mm: migrate: support poison recover from migrate folio](http://lore.kernel.org/linux-mm/20240321032747.87694-1-wangkefeng.wang@huawei.com/)** + +> The folio migration is widely used in kernel, memory compaction, memory +> hotplug, soft offline page, numa balance, memory demote/promotion, etc, +> but if a poisoned source folio is accessed when migrating, the kernel will +> panic. +> + +**[v2: mm/page-flags: make __PageMovable return bool](http://lore.kernel.org/linux-mm/20240321032256.82063-1-gehao@kylinos.cn/)** + +> make __PageMovable return bool like __folio_test_movable +> + +**[v1: Improve visibility of writeback](http://lore.kernel.org/linux-mm/20240320110222.6564-1-shikemeng@huaweicloud.com/)** + +> This series tries to improve visibility of writeback. 
Patch 1 makes +> /sys/kernel/debug/bdi/xxx/stats show writeback info of the whole bdi +> instead of only writeback info in the root cgroup. Patch 2 adds a new +> debug file /sys/kernel/debug/bdi/xxx/wb_stats to show per wb writeback +> info. Patch 4 adds wb_monitor. +> + +#### File Systems + +**[v2: sysctl: move sysctl type to ctl_table_header](http://lore.kernel.org/linux-fsdevel/20240322-sysctl-empty-dir-v2-0-e559cf8ec7c0@weissschuh.net/)** + +> Preparation series to enable constification of struct ctl_table further +> down the line. +> No functional changes are intended. +> + +**[v11: Landlock: IOCTL support](http://lore.kernel.org/linux-fsdevel/20240322151002.3653639-1-gnoack@google.com/)** + +> Introduce the LANDLOCK_ACCESS_FS_IOCTL_DEV right, which restricts the +> use of ioctl(2) on block and character devices. +> +> We attach this access right to opened file descriptors, as we +> already do for LANDLOCK_ACCESS_FS_TRUNCATE. +> + +**[v1: hfsplus: refactor copy_name to not use strncpy](http://lore.kernel.org/linux-fsdevel/20240321-strncpy-fs-hfsplus-xattr-c-v1-1-0c6385a10251@google.com/)** + +> strncpy() is deprecated with NUL-terminated destination strings [1]. +> +> The copy_name() method does a lot of manual buffer manipulation to +> eventually arrive with its desired string. If we don't know the +> namespace this attr has or belongs to we want to prepend "osx." to our +> final string. Following this, we're copying xattr_name and doing a +> bizarre manual NUL-byte assignment with a memset where n=1. +> + +**[v2: fs: aio: more folio conversion](http://lore.kernel.org/linux-fsdevel/20240321131640.948634-1-wangkefeng.wang@huawei.com/)** + +> Convert to use folio throughout aio. 
+> + +**[v2: RFC: eventpoll: try to reuse eppoll_entry allocations](http://lore.kernel.org/linux-fsdevel/20240319184940.112678-1-i.trofimow@yandex.ru/)** + +> Instead of unconditionally allocating and deallocating pwq objects, +> try to reuse them by storing the entry in the eventpoll struct +> at deallocation request, and consuming that entry at allocation request. +> This way every EPOLL_CTL_ADD operation immediately following an +> EPOLL_CTL_DEL operation effectively cancels out its pwq allocation +> with the preceding deallocation. +> + +**[v1: fuse: require FUSE drivers to opt-in for local file leases](http://lore.kernel.org/linux-fsdevel/20240319-setlease-v1-1-4ce5a220e85d@kernel.org/)** + +> Traditionally, we've allowed people to set leases on FUSE inodes. Some +> FUSE drivers are effectively local filesystems and should be fine with +> kernel-internal lease support. But others are backed by a network server +> that may have multiple clients, or may be backed by something entirely +> non-file-like. +> + +**[v1: Further reduce overhead of fsnotify permission hooks](http://lore.kernel.org/linux-fsdevel/20240317184154.1200192-1-amir73il@gmail.com/)** + +> The main motivation for this work was to avoid the overhead that was +> reported by kernel test robot on the patch that adds the upcoming +> per-content event hooks (i.e. FS_PRE_ACCESS/FS_PRE_MODIFY). +> + +#### Network Devices + +**[v5: net-next: net/smc: SMC intra-OS shortcut with loopback-ism](http://lore.kernel.org/netdev/20240324135522.108564-1-guwen@linux.alibaba.com/)** + +> This patch set acts as the second part of the new version of [1] (the first +> part can be found in [2]); the updates in this version are listed +> at the end. 
+> + +**[v1: dns_resolver: correct sysfs path name in dns resolver documentation](http://lore.kernel.org/netdev/20240323081140.41558-1-bharathsm@microsoft.com/)** + +> Fix an incorrect sysfs path in dns resolver documentation +> + +**[v1: bpf-next: BPF: support mark in bpf_fib_lookup](http://lore.kernel.org/netdev/20240322140244.50971-1-aspsk@isovalent.com/)** + +> This patch series adds policy routing support in bpf_fib_lookup. +> This is a useful functionality which was missing for a long time, +> as without it some networking setups can't be implemented in BPF. +> One example can be found here [1]. +> + +**[v1: net: tcp: properly terminate timers for kernel sockets](http://lore.kernel.org/netdev/20240322135732.1535772-1-edumazet@google.com/)** + +> We had various syzbot reports about tcp timers firing after +> the corresponding netns has been dismantled. +> + +**[v1: ipv6: fib: hide unused 'pn' variable](http://lore.kernel.org/netdev/20240322131746.904943-1-arnd@kernel.org/)** + +> When CONFIG_IPV6_SUBTREES is disabled, the only user is hidden, causing +> a 'make W=1' warning: +> +> net/ipv6/ip6_fib.c: In function 'fib6_add': +> net/ipv6/ip6_fib.c:1388:32: error: variable 'pn' set but not used [-Werror=unused-but-set-variable] +> + +**[v2: iproute2-next: bridge: vlan: add compressvlans manpage](http://lore.kernel.org/netdev/MAZP287MB05039AA2ECF8022DD501D4BCE4312@MAZP287MB0503.INDP287.PROD.OUTLOOK.COM/)** + +> I followed Nikolay and Jiri's comment and updated the patch to v2. +> Please check it. +> + +**[v4: Documentation: tpm_tis](http://lore.kernel.org/netdev/20240322123542.24158-1-jarkko@kernel.org/)** + +> Based on recent discussions on LKML, provide preliminary bits of tpm_tis_core +> dependent drivers. Includes only bare essentials but can be extended later +> on case by case. This way some people may even want to read it later on. 
+> + +**[v1: net: bpf: Don't redirect too small packets](http://lore.kernel.org/netdev/20240322122407.1329861-1-edumazet@google.com/)** + +> Some drivers ndo_start_xmit() expect a minimal size, as shown +> by various syzbot reports [1]. +> + +**[v1: net: dpll: indent DPLL option type by a tab](http://lore.kernel.org/netdev/20240322114819.1801795-1-ppandit@redhat.com/)** + +> Indent config option type by a tab. It helps Kconfig parsers +> to read file without error. +> + +**[v2: r8169: skip DASH fw status checks when DASH is disabled](http://lore.kernel.org/netdev/20240322082628.46272-1-atlas.yu@canonical.com/)** + +> On devices that support DASH, the current code in the "rtl_loop_wait" function +> raises false alarms when DASH is disabled. This occurs because the function +> attempts to wait for the DASH firmware to be ready, even though it's not +> relevant in this case. +> + +**[v1: net: lan743x: Add set RFE read fifo threshold for PCI1x1x chips](http://lore.kernel.org/netdev/20240322064650.275174-1-Raju.Lakkaraju@microchip.com/)** + +> PCI11x1x Rev B0 devices might drop packets when receiving back to back frames +> at 2.5G link speed. Change the B0 Rev device's Receive filtering Engine FIFO +> threshold parameter from its hardware default of 4 to 3 dwords to prevent the +> problem. Rev C0 and later hardware already defaults to 3 dwords. +> + +**[v1: net-next: devlink: use kvzalloc() to allocate devlink instance resources](http://lore.kernel.org/netdev/20240322015814.425050-1-wenjian1@xiaomi.com/)** + +> During live migration of a virtual machine, the SR-IOV VF need to be +> re-registered. It may fail when the memory is badly fragmented. +> + +**[v1: net: gve: Add counter adminq_get_ptype_map_cnt to stats report](http://lore.kernel.org/netdev/20240321222020.31032-1-jfraker@google.com/)** + +> This counter counts the number of times get_ptype_map is executed on the +> admin queue, and was previously missing from the stats report. 
+> + +**[GIT PULL: Networking for v6.9-rc1](http://lore.kernel.org/netdev/20240321173325.3227312-1-kuba@kernel.org/)** + +> I'd like to highlight Florian W stepping down as a netfilter +> maintainer due to constant stream of bug reports. +> Not sure what we can do but IIUC this is not the first such case. +> + +**[v1: virtio_net: Do not send RSS key if it is not supported](http://lore.kernel.org/netdev/20240321165431.3517868-1-leitao@debian.org/)** + +> There is a bug when setting the RSS options in virtio_net that can break +> the whole machine, getting the kernel into an infinite loop. +> + +**[v1: net/netlink: how to deal with the problem of exceeding the maximum reach of nlattr's nla_len](http://lore.kernel.org/netdev/20240321141400.38639-1-renmingshuai@huawei.com/)** + +> RTM_GETLINK for greater than about 220 VFs truncates IFLA_VFINFO_LIST +> due to the maximum reach of nlattr's nla_len being exceeded. As a result, +> the value of nla_len overflows in nla_nest_end(). According to [1], +> changing the type of nla_len is not possible, but how can we deal with this +> overflow problem? The nla_len is constantly set to the +> maximum value when it overflows? Or some better ways? +> + +**[v2: bpf-next: Selftests/xsk: Test with maximum and minimum HW ring size configurations](http://lore.kernel.org/netdev/20240321134911.120091-1-tushar.vyavahare@intel.com/)** + +> Please find enclosed a patch set that introduces enhancements and new test +> cases to the selftests/xsk framework. These test the robustness and +> reliability of AF_XDP across both minimal and maximal ring size +> configurations. +> + +**[v1: net: stmmac: Do not enable/disable runtime PM for PCI devices](http://lore.kernel.org/netdev/20240321-stmmac-fix-v1-1-3aef470494c6@gmail.com/)** + +> Common function stmmac_dvr_probe is called for both PCI and non-PCI +> device. For PCI devices pm_runtime_enable/disable are called by framework +> and should not be called by the driver. 
+> + +**[v4: bpf: verifier: prevent userspace memory access](http://lore.kernel.org/netdev/20240321124640.8870-1-puranjay12@gmail.com/)** + +> With BPF_PROBE_MEM, BPF allows de-referencing an untrusted pointer. To +> thwart invalid memory accesses, the JITs add an exception table entry +> for all such accesses. But in case the src_reg + offset overflows and +> turns into a userspace address, the BPF program might read that memory if +> the user has mapped it. +> + +**[v1: net: devlink: use kvzalloc() to allocate devlink instance resources](http://lore.kernel.org/netdev/20240321123611.380158-1-wenjian1@xiaomi.com/)** + +> During live migration of a virtual machine, the SR-IOV VF need to be +> re-registered. It may fail when the memory is badly fragmented. +> + +**[v2: flow_dissector: prevent NULL pointer dereference in __skb_flow_dissect](http://lore.kernel.org/netdev/20240321123446.7012-1-abelova@astralinux.ru/)** + +> skb is an optional parameter, so it may be NULL. +> Add a check before the dereference in eth_hdr. +> +> Found by Linux Verification Center (linuxtesting.org) with SVACE. +> + +**[v3: net: s390/qeth: handle deferred cc1](http://lore.kernel.org/netdev/20240321115337.3564694-1-wintera@linux.ibm.com/)** + +> The IO subsystem expects a driver to retry a ccw_device_start, when the +> subsequent interrupt response block (irb) contains a deferred +> condition code 1. +> + +**[v2: net: mark racy access on sk->sk_rcvbuf](http://lore.kernel.org/netdev/tencent_5A50BC27A519EBD14E1B0A8685E89405850A@qq.com/)** + +> sk->sk_rcvbuf in __sock_queue_rcv_skb() and __sk_receive_skb() can be +> changed by other threads. Mark this as benign using READ_ONCE(). +> + +**[v6: ice: Add get/set hw address for VFs using devlink commands](http://lore.kernel.org/netdev/20240321081625.28671-2-ksundara@redhat.com/)** + +> Changing the MAC address of the VFs is currently unsupported via devlink. +> Add the function handlers to set and get the HW address for the VFs. 
+> + +**[v3: resend: net/ipv4: add tracepoint for icmp_send](http://lore.kernel.org/netdev/202403211109183894466@zte.com.cn/)** + +> Introduce a tracepoint for icmp_send, which can help users to get more +> detail information conveniently when icmp abnormal events happen. +> + +**[v1: net-next: net: Rename mono_delivery_time to tstamp_type for scalibilty](http://lore.kernel.org/netdev/20240320211839.1214034-1-quic_abchauha@quicinc.com/)** + +> mono_delivery_time was added to check if skb->tstamp has delivery +> time in mono clock base (i.e. EDT) otherwise skb->tstamp has +> timestamp in ingress and delivery_time at egress. +> + +**[v1: net: mlxbf_gige: stop PHY during open() error paths](http://lore.kernel.org/netdev/20240320193117.3232-1-davthompson@nvidia.com/)** + +> The mlxbf_gige_open() routine starts the PHY as part of normal +> initialization. The mlxbf_gige_open() routine must stop the +> PHY during its error paths. +> + +**[v1: ipv6: delay procfs initialization after the ipv6 structs are ready](http://lore.kernel.org/netdev/20240320171858.2671-1-nicolas.cavallari@green-communications.fr/)** + +> procfs files are created before the structure they reference are +> initialized. For example, if6_proc_init() creates procfs files that +> access structures initialized by addrconf_init(). +> + +**[v5: net-next: Support ICSSG-based Ethernet on AM65x SR1.0 devices](http://lore.kernel.org/netdev/20240320144234.313672-1-diogo.ivo@siemens.com/)** + +> This series extends the current ICSSG-based Ethernet driver to support +> AM65x Silicon Revision 1.0 devices. +> +> Notable differences between the Silicon Revisions are that there is +> no TX core in SR1.0 with this being handled by the firmware, requiring +> extra DMA channels to manage communication with the firmware (with the +> firmware being different as well) and in the packet classifier. 
+> + +**[v2: net: ll_temac: platform_get_resource replaced by wrong function](http://lore.kernel.org/netdev/f512ff25a2cd484791757c18facb526c@terma.com/)** + +> Hope I am resubmitting this correctly, I've fixed the issues in +> the original submission. +> + +**[v1: net: can: kvaser_pciefd: Add additional Xilinx interrupts](http://lore.kernel.org/netdev/20240320112144.582741-2-mkl@pengutronix.de/)** + +> Since Xilinx-based adapters now support up to eight CAN channels, the +> TX interrupt mask array must have eight elements. +> + +**[v1: net: pull-request: can 2024-03-20](http://lore.kernel.org/netdev/20240320112144.582741-1-mkl@pengutronix.de/)** + +> Martin Jocić contributes a fix for the kvaser_pciefd driver, so that +> up to 8 channels on the Xilinx-based adapters can be used. This issue +> has been introduced in net-next for v6.9. +> + +**[v3: vhost/vdpa: Add MSI translation tables to iommu for software-managed MSI](http://lore.kernel.org/netdev/20240320101912.28210-1-w_angrong@163.com/)** + +> Once enable iommu domain for one device, the MSI +> translation tables have to be there for software-managed MSI. +> Otherwise, platform with software-managed MSI without an +> irq bypass function, can not get a correct memory write event +> from pcie, will not get irqs. +> + +**[v1: net: asix: Add check for usbnet_get_endpoints](http://lore.kernel.org/netdev/20240320073715.2002973-1-nichen@iscas.ac.cn/)** + +> Add check for usbnet_get_endpoints() and return the error if it fails +> in order to transfer the error. +> + +**[v5: net: Report RCU QS for busy network kthreads](http://lore.kernel.org/netdev/cover.1710877680.git.yan@cloudflare.com/)** + +> We observed this being a problem in production, since it can block RCU +> tasks from making progress under heavy load. Investigation indicates +> that just calling cond_resched() is insufficient for RCU tasks to reach +> quiescent states. 
This also has the side effect of frequently clearing +> the TIF_NEED_RESCHED flag on voluntary preempt kernels. +> + +**[v1: iwl-net: i40e: Report MFS in decimal base instead of hex](http://lore.kernel.org/netdev/20240319141657.2783609-1-e.velu@criteo.com/)** + +> If the MFS is set below the default (0x2600), a warning message is +> reported like the following : +> +> MFS for port 1 has been set below the default: 600 +> + +**[v5: Add support for Intel PPS Generator](http://lore.kernel.org/netdev/20240319130547.4195-1-lakshmi.sowjanya.d@intel.com/)** + +> The goal of the PPS(Pulse Per Second) hardware/software is to generate a +> signal from the system on a wire so that some third-party hardware can +> observe that signal and judge how close the system's time is to another +> system or piece of hardware. +> + +**[v2: net/ipv4: add tracepoint for icmp_send](http://lore.kernel.org/netdev/202403192013525995034@zte.com.cn/)** + +> Introduce a tracepoint for icmp_send, which can help users to get more +> detail information conveniently when icmp abnormal events happen. +> + +**[v1: net: inet: inet_defrag: prevent sk release while still in use](http://lore.kernel.org/netdev/20240319122310.27474-1-fw@strlen.de/)** + +> ip_local_out() and other functions can pass skb->sk as function argument. +> +> If the skb is a fragment and reassembly happens before such function call +> returns, the sk must not be released. +> + +**[v1: pull request (net): ipsec 2024-03-19](http://lore.kernel.org/netdev/20240319110151.409825-1-steffen.klassert@secunet.com/)** + +> 1) Fix possible page_pool leak triggered by esp_output. +> From Dragos Tatulea. +> +> 2) Fix UDP encapsulation in software GSO path. +> From Leon Romanovsky. +> +> Please pull or let me know if there are problems. 
+> + +**[v1: dt-bindings: net: rfkill-gpio: add reset-gpio property](http://lore.kernel.org/netdev/20240319-rfkill-reset-gpio-binding-v1-1-a0e3f1767c87@solid-run.com/)** + +> rfkill-gpio driver supports management of two gpios: reset, shutdown. +> Reset seems to have been missed when bindings were added. +> + +#### Security Hardening + +**[v1: video: fbdev: au1200fb: replace deprecated strncpy with strscpy](http://lore.kernel.org/linux-hardening/20240318-strncpy-drivers-video-fbdev-au1200fb-c-v1-1-680802a9f10a@google.com/)** + +> strncpy() is deprecated for use on NUL-terminated destination strings +> [1] and as such we should prefer more robust and less ambiguous string +> interfaces. +> + +**[v2: soc: qcom: cmd-db: replace deprecated strncpy with strtomem](http://lore.kernel.org/linux-hardening/20240318-strncpy-drivers-soc-qcom-cmd-db-c-v2-1-8f6ebf1bd891@google.com/)** + +> strncpy() is deprecated for use on NUL-terminated destination strings +> [1] and as such we should prefer more robust and less ambiguous string +> interfaces. +> + +**[v1: kspp-next: compiler_types: add Endianness-dependent __counted_by_{le,be}](http://lore.kernel.org/linux-hardening/20240318130354.2713265-1-aleksander.lobakin@intel.com/)** + +> Some structures contain flexible arrays at the end and the counter for +> them, but the counter has explicit Endianness and thus __counted_by() +> can't be used directly. +> + +**[v1: perf/x86/rapl: Prefer struct_size over open coded arithmetic](http://lore.kernel.org/linux-hardening/20240317164442.6729-1-erick.archer@gmx.com/)** + +> This is an effort to get rid of all multiplications from allocation +> functions in order to prevent integer overflows [1][2]. 
+> + +**[v1: x86, relocs: Ignore relocations in .notes section on walk_relocs](http://lore.kernel.org/linux-hardening/20240317150547.24910-1-weiguixiong@bytedance.com/)** + +> Commit aaa8736370db ("x86, relocs: Ignore relocations in +> .notes section") only ignores the .notes section in print_absolute_relocs, +> but the same check also needs to be added to walk_relocs to avoid relocations in the .notes +> section. +> + +#### Asynchronous IO + +**[v1: Read/Write with meta buffer](http://lore.kernel.org/io-uring/20240322185023.131697-1-joshi.k@samsung.com/)** + +> This patchset is aimed at getting feedback on a new io_uring +> interface that userspace can use to exchange a meta buffer along with +> read/write. +> + +**[v1: io_uring/alloc_cache: shrink default max entries from 512 to 128](http://lore.kernel.org/io-uring/72a3ccac-b97c-4e62-acd7-dc4f306eba50@kernel.dk/)** + +> In practice, we just need to recycle a few elements for (by far) most +> use cases. Shrink the total size down from 512 to 128, which should be +> more than plenty. +> + +#### Rust For Linux + +**[v1: WIP: Rust bindings for KMS + RVKMS](http://lore.kernel.org/rust-for-linux/20240322221305.1403600-1-lyude@redhat.com/)** + +> porting vkms over to rust so that we could come up with a set of rust +> KMS bindings for the nova driver to be able to have a modesetting driver +> written in rust. This driver currently doesn't really do much, but it +> does load and register a modesetting device! +> + +**[v2: rust: time: add Ktime](http://lore.kernel.org/rust-for-linux/20240322-rust-ktime_ms_delta-v2-1-d98de1f7c282@google.com/)** + +> Introduce a wrapper around `ktime_t` with a few different useful +> methods. +> +> Rust Binder will use these bindings to compute how many milliseconds a +> transaction has been active for when dumping the current state of the +> Binder driver. This replicates the logic in C Binder [1].
+> + +#### BPF + +**[v1: ftrace: make extra rcu_is_watching() validation check optional](http://lore.kernel.org/bpf/20240322160323.2463329-1-andrii@kernel.org/)** + +> Introduce CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING config option to +> control whether ftrace low-level code performs additional +> rcu_is_watching()-based validation logic in an attempt to catch noinstr +> violations. +> + +**[v5: bpf-next: sleepable bpf_timer (was: allow HID-BPF to do device IOs)](http://lore.kernel.org/bpf/20240322-hid-bpf-sleepable-v5-0-179c7b59eaaa@kernel.org/)** + +> New version of the sleepable bpf_timer code, without the HID changes, as +> they can now go through the HID tree independently. +> + +**[v1: leds: trigger: Add led trigger for bpf](http://lore.kernel.org/bpf/cover.1711113657.git.hodges.daniel.scott@gmail.com/)** + +> This patch set adds a new led trigger that uses the bpf subsystem for +> triggering leds. It is designed to be used in conjunction with bpf +> program(s) that can modify led state through the use of bpf kfuncs. This +> is useful for providing a physical indication that some event has +> occurred. In the context of bpf this could range from handling a packet +> to hitting a tracepoint. +> + +**[v1: bpf-next: bpf: support resilient split BTF](http://lore.kernel.org/bpf/20240322102455.98558-1-alan.maguire@oracle.com/)** + +> Split BPF Type Format (BTF) provides huge advantages in that kernel +> modules only have to provide type information for types that they do not +> share with the core kernel; for core kernel types, split BTF refers to +> core kernel BTF type ids.
+> + +**[v1: dwarves: btf_encoder: add base_ref BTF feature to generate split BTF with base refs](http://lore.kernel.org/bpf/20240322102455.98558-2-alan.maguire@oracle.com/)** + +> Adding "base_ref" to --btf_features when generating split BTF will generate +> split and base reference BTF - the latter allows us to map references from +> split BTF to base BTF, even if that base BTF has changed. It does this +> by providing just enough information about the base types in the +> .BTF.base_ref section. +> + +**[v1: bpf-next: selftests/bench: use syscall(SYS_gettid) as libc support for gettid() is sometimes absent](http://lore.kernel.org/bpf/20240322095728.95671-1-alan.maguire@oracle.com/)** + +> It appears support for the gettid() wrapper is variable across glibc +> versions, so it may be safer to use syscall(SYS_gettid) instead. +> + +**[v1: kbuild: disable pahole multithreading for reproducible builds](http://lore.kernel.org/bpf/20240322-pahole-reprodicible-v1-1-3eaafb1842da@weissschuh.net/)** + +> A BTF type_id is a numeric identifier allocated by pahole through +> libbpf. Ids are incremented for each allocation. +> Running pahole multithreaded makes the sequence of allocations +> non-deterministic which also makes the type_id itself non-deterministic. +> As the type_id ends up in the binary, this breaks reproducibility. +> + +**[v3: bpf-next: bpftool: Mount bpffs on provided dir instead of parent dir](http://lore.kernel.org/bpf/20240321191955.24992-1-icegambit91@gmail.com/)** + +> When pinning programs/objects under PATH (eg: during "bpftool prog +> loadall") the bpffs is mounted on the parent dir of PATH in the +> following situations: +> - the given dir exists but it is not bpffs. +> - the given dir doesn't exist and the parent dir is not bpffs.
+> + +**[v1: bpf-next: Inline two LBR-related helpers](http://lore.kernel.org/bpf/20240321180501.734779-1-andrii@kernel.org/)** + +> Implement inlining of bpf_get_branch_snapshot() BPF helper using generic BPF +> assembly approach. +> + +**[v1: libbpf: add specific btf name info when do core](http://lore.kernel.org/bpf/20240321170444.388225-1-chen.dylane@gmail.com/)** + +> No logic change; just add the specific BTF name when printing CO-RE +> info, which should make the output easier to understand. +> + +**[v1: uprobes: reduce contention on uprobes_tree access](http://lore.kernel.org/bpf/20240321145736.2373846-1-jonathan.haslam@gmail.com/)** + +> Active uprobes are stored in an RB tree and accesses to this tree are +> dominated by read operations. Currently these accesses are serialized by +> a spinlock but this leads to enormous contention when large numbers of +> threads are executing active probes. +> + +**[v1: bpf-next: Avoid goto in regs_refine_cond_op()](http://lore.kernel.org/bpf/20240321002955.808604-1-harishankar.vishwanathan@gmail.com/)** + +> In case of GE/GT/SGE/SGT instructions, regs_refine_cond_op() +> reuses the logic that does analysis of LE/LT/SLE/SLT instructions. +> This commit avoids the use of a goto to perform the reuse. +> + +**[v1: bpf-next: bpf: mark kprobe_multi_link_prog_run as always inlined function](http://lore.kernel.org/bpf/20240320200610.2556049-1-andrii@kernel.org/)** + +> kprobe_multi_link_prog_run() is called both for multi-kprobe and +> multi-kretprobe BPF programs from kprobe_multi_link_handler() and +> kprobe_multi_link_exit_handler(), respectively.
+> + +**[v1: bpf-next: bpftool: Enable libbpf logs when loading pid_iter in debug mode](http://lore.kernel.org/bpf/20240320012241.42991-1-qmo@kernel.org/)** + +> When trying to load the pid_iter BPF program used to iterate over the +> PIDs of the processes holding file descriptors to BPF links, we would +> unconditionally silence libbpf in order to keep the output clean if the +> kernel does not support iterators and loading fails. +> + +**[v3: bpf-next: BPF raw tracepoint support for BPF cookie](http://lore.kernel.org/bpf/20240319233852.1977493-1-andrii@kernel.org/)** + +> Add ability to specify and retrieve BPF cookie for raw tracepoint programs. +> Both BTF-aware (SEC("tp_btf")) and non-BTF-aware (SEC("raw_tp")) are +> supported, as they are exactly the same at runtime. +> + +**[v1: bpf-next: perf, amd: support capturing LBR from software events](http://lore.kernel.org/bpf/20240319224206.1612000-1-andrii@kernel.org/)** + +> [0] added ability to capture LBR (Last Branch Records) on Intel CPUs +> from inside BPF program at pretty much any arbitrary point. This is +> an extremely useful capability that makes it possible to figure out otherwise +> hard-to-debug problems, because LBR is now available based on some +> application-defined conditions, not just hardware-supported events. +> + +**[v1: bpf-next: bpf: avoid get_kernel_nofault() to fetch kprobe entry IP](http://lore.kernel.org/bpf/20240319212013.1046779-1-andrii@kernel.org/)** + +> get_kernel_nofault() (or, rather, underlying copy_from_kernel_nofault()) +> is not free and it does pop up in performance profiles when +> kprobes are heavily utilized with CONFIG_X86_KERNEL_IBT=y config. +> + +**[v2: uprobes: two common case speed ups](http://lore.kernel.org/bpf/20240318181728.2795838-1-andrii@kernel.org/)** + +> This patch set implements two speed ups for uprobe/uretprobe runtime execution +> path for some common scenarios: BPF-only uprobes (patches #1 and #2) and +> system-wide (non-PID-specific) uprobes (patch #3).
Please see individual +> patches for details. +> + +**[v1: bpf-next: xsk: Don't assume metadata is always requested in TX completion](http://lore.kernel.org/bpf/20240318165427.1403313-1-sdf@google.com/)** + +> `compl->tx_timestamp != NULL` means that the user has explicitly +> requested the metadata via XDP_TX_METADATA+XDP_TX_METADATA_TIMESTAMP. +> + +**[v1: bpf-next: bpf: check bpf_map/bpf_program fd validity](http://lore.kernel.org/bpf/20240318131808.95959-1-yatsenko@meta.com/)** + +> libbpf creates bpf_program/bpf_map structs for each program/map that +> the user defines, but it allows disabling the creation/loading of those objects in +> the kernel; in that case they won't have an associated file descriptor +> (fd < 0). Such functionality is used for backward compatibility +> with some older kernels. +> + +**[v1: bpf-next: uprobe: uretprobe speed up](http://lore.kernel.org/bpf/20240318093139.293497-1-jolsa@kernel.org/)** + +> The speed up depends on the type of instruction the uprobe is installed on +> and on the specific HW type; please check patch 1 for details. +> + +### Related Technology Updates + +#### Qemu + +**[v1: riscv-to-apply queue](http://lore.kernel.org/qemu-devel/20240322085319.1758843-1-alistair.francis@wdc.com/)** + +> The following changes since commit fea445e8fe9acea4f775a832815ee22bdf2b0222: +> +> Merge tag 'pull-maintainer-final-for-real-this-time-200324-1' of https://gitlab.com/stsquad/qemu into staging (2024-03-21 10:31:56 +0000) +> + +**[v1: for-9.0: target/riscv/debug: set tval=pc in breakpoint exceptions](http://lore.kernel.org/qemu-devel/20240320093221.220854-1-dbarboza@ventanamicro.com/)** + +> We're not setting (s/m)tval when triggering breakpoints of type 2 +> (mcontrol) and 6 (mcontrol6). According to the debug spec section +> 5.7.12, "Match Control Type 6": +> +> "The Privileged Spec says that breakpoint exceptions that occur on +> instruction fetches, loads, or stores update the tval CSR with either +> zero or the faulting virtual address.
The faulting virtual address for +> an mcontrol6 trigger with action = 0 is the address being accessed and +> which caused that trigger to fire." +> + +**[v1: target/riscv: rvv: Check single width operator for vector fp widen instructions](http://lore.kernel.org/qemu-devel/20240320072709.1043227-3-max.chou@sifive.com/)** + +> The require_scale_rvf function only checks the double width operator for +> the vector floating point widen instructions, so most of the widen +> checking functions need to add require_rvf for single width operator. +> + +**[v1: target/riscv: rvv: Check single width operator for vfncvt.rod.f.f.w](http://lore.kernel.org/qemu-devel/20240320072709.1043227-4-max.chou@sifive.com/)** + +> The opfv_narrow_check needs to check the single width float operator by +> require_rvf. +> + +#### U-Boot + +**[v2: riscv: add support for Milk-V Mars board](http://lore.kernel.org/u-boot/20240321181149.177356-1-heinrich.schuchardt@canonical.com/)** + +> The Milk-V Mars board is technically very close to the StarFive +> VisionFive 2 board. +> +> With this patch series the VisionFive 2 U-Boot SPL will detect that it +> is running on a Milk-V board and patch the device-tree accordingly. +> This is the same approach that has been taken to handle the differences +> between the VisionFive 2 1.2B and 1.3A revisions. +> + +**[v1: cmd: bootm: add ELF file support](http://lore.kernel.org/u-boot/20240319161022.41451-1-Maxim.Moskalets@kaspersky.com/)** + +> Some operating systems (e.g. seL4) and embedded applications are ELF +> images. It is convenient to use FIT-images to implement trusted boot. +> Added "elf" image type for booting using bootm command.
+> + +**[v1: Support new RISC-V ISA extension properties](http://lore.kernel.org/u-boot/20240318151604.865025-2-conor@kernel.org/)** + +> This would have just been a single patch (the second one), but as I +> reported a while back there's a problem with extension detection when +> the ISA string exceeds 32 characters: +> https://lore.kernel.org/u-boot/20240221-daycare-reliably-8ec86f95fe71@spud/ +> The first patch here fixes what I see as a bit of a misuse of +> cpu_get_desc() in supports_extension() as a preparatory patch for adding +> the new properties. Or more accurately, new property, as U-Boot barely +> makes use of extension detection as-is in s-mode and only one of the two +> new properties is even needed. +> + ## 20240317: Issue 83 ### Kernel Updates