From 1119fa2ab303397c879ca5d05211845920b509e3 Mon Sep 17 00:00:00 2001 From: Rory Li <736729045@qq.com> Date: Sun, 26 May 2024 07:47:28 +0000 Subject: [PATCH] update news/README.md. Signed-off-by: Rory Li <736729045@qq.com> --- news/README.md | 819 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 819 insertions(+) diff --git a/news/README.md b/news/README.md index defac61..c42be19 100644 --- a/news/README.md +++ b/news/README.md @@ -6,6 +6,825 @@ * [2023 年 - 上半年](2023-1st-half.md) * [2023 年 - 下半年](2023-2nd-half.md) + +## 20240526:第 93 期 + +### 内核动态 + +#### RISC-V 架构支持 + +**[v6: Linux RISC-V IOMMU Support](http://lore.kernel.org/linux-riscv/cover.1716578450.git.tjeznach@rivosinc.com/)** + +> This patch series introduces support for RISC-V IOMMU architected +> hardware into the Linux kernel. + +**[v1: Documentation: RISC-V: uabi: Only scalar misaligned loads are supported](http://lore.kernel.org/linux-riscv/20240524185600.5919-1-palmer@rivosinc.com/)** + +> We're stuck supporting scalar misaligned loads in userspace because they +> were part of the ISA at the time we froze the uABI. That wasn't the +> case for vector misaligned accesses, so depending on them +> unconditionally is a userspace bug. All extant vector hardware traps on +> these misaligned accesses. + +**[GIT PULL: RISC-V Patches for the 6.10 Merge Window, Part 2](http://lore.kernel.org/linux-riscv/mhng-3c77af8f-340f-41fd-86db-a2add39fdac2@palmer-ri-x1c9/)** + +> merged tag 'riscv-for-linus-6.10-mw1' +> The following changes since commit 0bfbc914d9433d8ac2763a9ce99ce7721ee5c8e0: + +**[v4: Add Svadu Extension Support](http://lore.kernel.org/linux-riscv/20240524103307.2684-1-yongxuan.wang@sifive.com/)** + +> Svadu is a RISC-V extension for hardware updating of PTE A/D bits. This +> patch set adds support to enable Svadu extension for both host and guest +> OS. 
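上面的 Svadu 补丁涉及由硬件自动置位 PTE 的 Accessed/Dirty 位。下面用 Python 写一个极简模型帮助理解(仅为示意:位定义取自 RISC-V 特权规范,函数与流程均为笔者假设,并非内核或硬件实现):

```python
# 简化的 RISC-V PTE A/D 位模型(示意,非内核/硬件实现)
PTE_A = 1 << 6   # Accessed 位
PTE_D = 1 << 7   # Dirty 位

def access_page(pte, is_store, svadu_enabled):
    """返回 (新 pte, 是否触发异常)。
    无 Svadu:所需的 A(写访问还需 D)位未置位时触发异常,由软件置位;
    有 Svadu:由"硬件"直接置位,不触发异常。"""
    need = PTE_A | (PTE_D if is_store else 0)
    if pte & need == need:
        return pte, False          # 相关位已置位,正常访问
    if svadu_enabled:
        return pte | need, False   # 硬件自动更新 A/D
    return pte, True               # 触发缺页异常,走软件路径

pte, fault = access_page(0, is_store=True, svadu_enabled=True)
```

可以看出启用 Svadu 后省掉了首次访问时的异常开销,这正是该扩展的意义所在。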
+ +**[v4: bpf-next: riscv, bpf: Introduce Zba optimization](http://lore.kernel.org/linux-riscv/20240524075543.4050464-1-xiao.w.wang@intel.com/)** + +> The riscv Zba extension provides instructions to accelerate the generation +> of addresses that index into arrays of basic data types, bpf JIT generated +> insn counts could be reduced by leveraging Zba for address calculation. + +**[v1: riscv: hweight: relax assembly constraints](http://lore.kernel.org/linux-riscv/20240523094325.3514-1-dqfext@gmail.com/)** + +> rd and rs don't have to be the same. + +**[v2: riscv: prevent pt_regs corruption for secondary idle threads](http://lore.kernel.org/linux-riscv/20240523084327.2013211-1-geomatsi@gmail.com/)** + +> Top of the kernel thread stack should be reserved for pt_regs. However +> this is not the case for the idle threads of the secondary boot harts. +> Their stacks overlap with their pt_regs, so both may get corrupted. + +**[v2: riscv, bpf: Use STACK_ALIGN macro for size rounding up](http://lore.kernel.org/linux-riscv/20240523031835.3977713-1-xiao.w.wang@intel.com/)** + +> Use the macro STACK_ALIGN that is defined in asm/processor.h for stack size +> rounding up, just like bpf_jit_comp32.c does. + +**[GIT PULL: RISC-V Patches for the 6.10 Merge Window, Part 1](http://lore.kernel.org/linux-riscv/mhng-e73e59bf-92fc-4122-9f9e-a329d20eba55@palmer-ri-x1c9/)** + +> There's a pair of driver build fixes that are already on the lists and +> a report of a ftrace failure that might be triggered by the ftrace/AIA fix, but +> seems like we're better off with these than without. +> This first one isn't showing up in a in-flight merge \`git diff\`, but it looks +> pretty straight-forward + +**[v2: dt-bindings: interrupt-controller: riscv,cpu-intc](http://lore.kernel.org/linux-riscv/20240522153835.22712-1-kanakshilledar111@protonmail.com/)** + +> This series of patches converts the RISC-V CPU interrupt controller to +> the newer dt-schema binding. 
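上面 riscv bpf JIT 补丁用 STACK_ALIGN 宏做栈大小向上取整。内核 round_up 这类宏的等价逻辑如下(示意,STACK_ALIGN 的取值为笔者假设):

```python
def round_up(x, align):
    # 等价于内核 round_up():要求 align 为 2 的幂
    assert align & (align - 1) == 0
    return (x + align - 1) & ~(align - 1)

STACK_ALIGN = 16          # RV64 ABI 栈 16 字节对齐(示意取值)
stack_size = round_up(40, STACK_ALIGN)
```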
+ +**[v2: KVM: Fold kvm_arch_sched_in() into kvm_arch_vcpu_load()](http://lore.kernel.org/linux-riscv/20240522014013.1672962-1-seanjc@google.com/)** + +> The other motivation for this is to avoid yet another arch hook, and more +> arbitrary ordering, if there's a future need to hook kvm_sched_out() (we've +> come close on the x86 side several times). E.g. kvm_arch_vcpu_put() can +> simply check kvm_vcpu.scheduled_out if it needs to something specific for +> the vCPU being scheduled out. + +**[v5: Define _GNU_SOURCE for sources using](http://lore.kernel.org/linux-riscv/20240522005913.3540131-1-edliaw@google.com/)** + +> Centralizes the definition of _GNU_SOURCE into KHDR_INCLUDES and removes +> redefinitions of _GNU_SOURCE from source code. + +**[v3: riscv: Memory Hot(Un)Plug support](http://lore.kernel.org/linux-riscv/20240521114830.841660-1-bjorn@kernel.org/)** + + +> Memory Hot(Un)Plug support (and ZONE_DEVICE) for the RISC-V port + + +**[v9: Add support for Allwinner PWM on D1/T113s/R329 SoCs](http://lore.kernel.org/linux-riscv/20240520184227.120956-1-privatesub2@gmail.com/)** + +**[v1: riscv, bpf: Introduce shift add helper with Zba optimization](http://lore.kernel.org/linux-riscv/20240520071631.2980798-1-xiao.w.wang@intel.com/)** + +> Zba extension is very useful for generating addresses that index into array +> of basic data types. This patch introduces sh2add and sh3add helpers for +> RV32 and RV64 respectively, to accelerate pointer array addressing. + +**[v1: riscv, bpf: try RVC for reg move within BPF_CMPXCHG JIT](http://lore.kernel.org/linux-riscv/20240519050507.2217791-1-xiao.w.wang@intel.com/)** + +> We could try to emit compressed insn for reg move operation during CMPXCHG +> JIT, the instruction compression has no impact on the jump offsets of +> following forward and backward jump instructions. 
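本期几个 Zba 补丁的核心是 shNadd 指令:rd = (rs1 << N) + rs2,一条指令即可完成"基址 + 下标 × 元素大小"的寻址。其语义可以这样示意(Python,按 64 位截断;函数名为笔者虚构):

```python
MASK64 = (1 << 64) - 1

def shnadd(n, rs1, rs2):
    """Zba shNadd 语义示意:rd = (rs1 << n) + rs2,n 取 1/2/3。"""
    return ((rs1 << n) + rs2) & MASK64

# 8 字节指针数组寻址:&arr[i] == arr + i*8,对应 sh3add(idx, base)
base, idx = 0x1000, 5
addr = shnadd(3, idx, base)
```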
+ +**[v2: bpf-next: Use bpf_prog_pack for RV64 bpf trampoline](http://lore.kernel.org/linux-riscv/20240518092917.2771703-1-pulehui@huaweicloud.com/)** + +> We used bpf_prog_pack to aggregate bpf programs into huge page to +> relieve the iTLB pressure on the system. We can apply it to bpf +> trampoline, as Song had been implemented it in core and x86 . This +> patch is going to use bpf_prog_pack to RV64 bpf trampoline. + +#### 进程调度 + +**[v1: net: net/sched: Add xmit_recursion level in sch_direct_xmit()](http://lore.kernel.org/lkml/20240524085108.1430317-1-yuehaibing@huawei.com/)** + +> packet from PF_PACKET socket ontop of an IPv6-backed ipvlan device will hit +> WARN_ON_ONCE() in sk_mc_loop() through sch_direct_xmit() path while ipvlan +> device has qdisc queue. + +**[v1: sched/numa: Correct NUMA imbalance calculation](http://lore.kernel.org/lkml/20240524035438.2701479-1-zhangqiao22@huawei.com/)** + +> When perform load balance, a NUMA imbalance is allowed +> if busy CPUs is less than the maximum threshold, it +> remains a pair of communication tasks on the current +> node when the source doamin is lightly loaded. In many +> cases, this prevents communicating tasks being pulled apart. + +#### 内存管理 + +**[v1: kmsan: introduce test_unpoison_memory()](http://lore.kernel.org/linux-mm/20240524232804.1984355-1-bjohannesmeyer@gmail.com/)** + +> Add a regression test to ensure that kmsan_unpoison_memory() works the same +> as an unpoisoning operation added by the instrumentation. (Of course, +> please correct me if I'm misunderstanding how these should work). + +**[v4: Enhance soft hwpoison handling and injection](http://lore.kernel.org/linux-mm/20240524215306.2705454-1-jane.chu@oracle.com/)** + +**[v1: Huge remap_pfn_range for vfio-pci](http://lore.kernel.org/linux-mm/CAJHvVcge-1JhHd4HQKXye9F-WrrMQZ66oU6XF2ie3PCKXaihKw@mail.gmail.com/)** + +> Changing remap_pfn_range to install PUDs or PMDs is straightforward. 
+> The hairy part is the fault / follow side of things:
+
+**[v1: mm: swap: mTHP swap allocator base on swap cluster order](http://lore.kernel.org/linux-mm/20240524-swap-allocator-v1-0-47861b423b26@kernel.org/)**
+
+> This is the short term solutions "swap cluster order" listed
+> in my "Swap Abstraction" discussion slice 8 in the recent
+> LSF/MM conference.
+
+**[v2: ioctl()-based API to query VMAs from /proc/\<pid\>/maps](http://lore.kernel.org/linux-mm/20240524041032.1048094-1-andrii@kernel.org/)**
+
+> The new PROCMAP_QUERY ioctl() API added in this patch set was motivated by the
+> former pattern of usage. Patch #9 adds a tool that faithfully reproduces an
+> efficient VMA matching pass of a symbolizer, collecting a subset of covering
+> VMAs for a given set of addresses as efficiently as possible. This tool is
+> serving both as a testing ground, as well as a benchmarking tool.
+> It implements everything both for the currently existing text-based
+> /proc/\<pid\>/maps interface, as well as for the newly-added PROCMAP_QUERY ioctl().
+
+**[v1: Modified XArray entry bit flags as macro constants](http://lore.kernel.org/linux-mm/20240524024945.9309-1-rgbi3307@naver.com/)**
+
+> It would be better to modify the operation on the last two bits of the entry
+> with a macro constant name rather than using a numeric constant.
+
+**[v1: mm: let kswapd work again for node that used to be hopeless but may not now](http://lore.kernel.org/linux-mm/20240523051406.81053-1-byungchul@sk.com/)**
+
+> From now on, the system should run without reclaim service in background
+> served by kswapd until direct reclaim will do for that. Even worse,
+> the tiering mechanism is no longer able to work because kswapd, which
+> the mechanism relies on, has stopped.
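上面 XArray 补丁建议把 entry 低两位的标记改用宏常量表述。XArray 正是用指针低位区分普通指针与整数 value entry 的,其编码方式可示意如下(对应 xa_mk_value/xa_to_value 的公开语义):

```python
def xa_mk_value(v):
    """整数 value entry:左移一位并把最低位置 1。"""
    return (v << 1) | 1

def xa_is_value(entry):
    return entry & 1 == 1

def xa_to_value(entry):
    return entry >> 1

entry = xa_mk_value(42)
```

由于内核对象指针至少按 4 字节对齐,低两位天然为 0,才能腾出来作标记位。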
+ +**[v1: memcg: rearrage fields of mem_cgroup_per_node](http://lore.kernel.org/linux-mm/20240523034824.1255719-1-shakeel.butt@linux.dev/)** + +> Kernel test robot reported performance regression for will-it-scale +> test suite's page_fault2 test case for the commit 70a64b7919cb ("memcg: +> dynamically allocate lruvec_stats"). After inspection it seems like the +> commit has unintentionally introduced false cache sharing. + +**[v1: 9p: Enable multipage folios](http://lore.kernel.org/linux-mm/463766.1716412334@warthog.procyon.org.uk/)** + +> Enable support for multipage folios on the 9P filesystem. This is all +> handled through netfslib and is already enabled on AFS and CIFS also. + +**[v1: mm: page_type, zsmalloc and page_mapcount_reset()](http://lore.kernel.org/linux-mm/20240522210341.1030552-1-david@redhat.com/)** + +> Wanting to remove the remaining abuser of _mapcount/page_type along with +> page_mapcount_reset(), I stumbled over zsmalloc, which is yet to be +> converted away from "struct page" . + +**[v2: mm/memory: cleanly support zeropage in vm_insert_page*(), vm_map_pages*() and vmf_insert_mixed()](http://lore.kernel.org/linux-mm/20240522125713.775114-1-david@redhat.com/)** + +> There is interest in mapping zeropages via vm_insert_pages() into +> MAP_SHARED mappings. + +**[v1: mm, slab: don't wrap internal functions with alloc_hooks()](http://lore.kernel.org/linux-mm/20240522095037.13958-1-vbabka@suse.cz/)** + +> The functions __kmalloc_noprof(), kmalloc_large_noprof(), +> kmalloc_trace_noprof() and their _node variants are all internal to the +> implementations of kmalloc_noprof() and kmalloc_node_noprof() and are +> only declared in the "public" slab.h and exported so that those +> implementations can be static inline and distinguish the build-time +> constant size variants. 
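本节 memcg 字段重排补丁提到的 false cache sharing,本质是频繁写的字段与热点只读字段落入同一条 64 字节 cacheline。按字段偏移即可粗略判断(布局与数值均为笔者假设,仅作示意):

```python
CACHELINE = 64

def cacheline_of(offset):
    return offset // CACHELINE

# 假设的字段布局:只读热点字段与频繁写计数器相邻 => 共享同一 cacheline
layout = {"hot_readonly": 0, "busy_counter": 8, "cold_flags": 130}
shared = cacheline_of(layout["hot_readonly"]) == cacheline_of(layout["busy_counter"])
```

修复这类回归通常就是把频繁写的字段挪到另一条 cacheline(必要时加 padding)。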
+ +**[v1: Improve dmesg output for swapfile+hibernation](http://lore.kernel.org/linux-mm/20240522074658.2420468-1-Sukrit.Bhatnagar@sony.com/)** + +> While trying to use a swapfile for hibernation, I noticed that the suspend +> process was failing when it tried to search for the swap to use for snapshot. +> I had created the swapfile on ext4 and got the starting physical block offset +> using the filefrag command. + +**[v1: Improve dump_page() output for slab pages](http://lore.kernel.org/linux-mm/20240522074629.2420423-1-Sukrit.Bhatnagar@sony.com/)** + +> While using dump_page() on a range of pages, I noticed that there were some +> PG_slab pages that were also showing as PG_anon pages, according to the +> function output. + +**[v1: Restructure va_high_addr_switch](http://lore.kernel.org/linux-mm/20240522070435.773918-1-dev.jain@arm.com/)** + +> The va_high_addr_switch memory selftest tests out some corner cases +> related to allocation and page/hugepage faulting around the switch +> boundary. + +**[v2: mm: batch unlink_file_vma calls in free_pgd_range](http://lore.kernel.org/linux-mm/20240521234321.359501-1-mjguzik@gmail.com/)** + +> Execs of dynamically linked binaries at 20-ish cores are bottlenecked on +> the i_mmap_rwsem semaphore, while the biggest singular contributor is +> free_pgd_range inducing the lock acquire back-to-back for all +> consecutive mappings of a given file. + +**[v3: percpu_counter: add a cmpxchg-based _add_batch variant](http://lore.kernel.org/linux-mm/20240521233100.358002-1-mjguzik@gmail.com/)** + +> Interrupt disable/enable trips are quite expensive on x86-64 compared to +> a mere cmpxchg (note: no lock prefix!) and percpu counters are used +> quite often. 
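上面 percpu_counter 补丁的思路是:本地计数加上增量后仍在 batch 范围内时,用一次 cmpxchg 写回即可,失败则重试,从而避免昂贵的关/开中断。用 Python 模拟如下(cmpxchg 与慢路径均为示意,非内核实现):

```python
def cmpxchg(cell, old, new):
    """模拟 cmpxchg:cell 为单元素列表,旧值相符才写入,返回实际旧值。"""
    cur = cell[0]
    if cur == old:
        cell[0] = new
    return cur

def percpu_add_batch(cell, amount, batch):
    while True:
        old = cell[0]
        new = old + amount
        if abs(new) >= batch:
            return False        # 超出 batch,需走慢路径刷入全局计数(此处省略)
        if cmpxchg(cell, old, new) == old:
            return True         # 快路径:一次 cmpxchg 完成

cell = [10]
ok = percpu_add_batch(cell, 5, 32)
```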
+ +**[v2: mm: refactor folio_undo_large_rmappable()](http://lore.kernel.org/linux-mm/20240521130315.46072-1-wangkefeng.wang@huawei.com/)** + +> There is a +> repeated check for small folio (order 0) during each call of the +> folio_undo_large_rmappable(), so only keep folio_order() check +> inside the function. + +**[v1: support large folio swap-out and swap-in for shmem](http://lore.kernel.org/linux-mm/cover.1716285099.git.baolin.wang@linux.alibaba.com/)** + +> Shmem will support large folio allocation to get a better performance, +> however, the memory reclaim still splits the precious large folios when trying +> to swap-out shmem, which may lead to the memory fragmentation issue and can not +> take advantage of the large folio for shmeme. +> + +**[v1: mm/hugetlb: Move vmf_anon_prepare upfront in hugetlb_wp](http://lore.kernel.org/linux-mm/20240521073446.23185-1-osalvador@suse.de/)** + +> hugetlb_wp calls vmf_anon_prepare() after having allocated a page, which +> means that we might need to call restore_reserve_on_error() upon error. +> vmf_anon_prepare() releases the vma lock before returning, but +> restore_reserve_on_error() expects the vma lock to be held by the caller. + +**[v6: Reclaim lazyfree THP without splitting](http://lore.kernel.org/linux-mm/20240521040244.48760-1-ioworker0@gmail.com/)** + +> This series adds support for reclaiming PMD-mapped THP marked as lazyfree +> without needing to first split the large folio via split_huge_pmd_address(). + +**[v1: memblock: introduce memsize showing reserved memory](http://lore.kernel.org/linux-mm/20240521023957.2587005-1-jaewon31.kim@samsung.com/)** + +> This patch introduce a debugfs node, memblock/memsize, to see reserved +> memory easily. + +**[v1: selftests: mm: check return values](http://lore.kernel.org/linux-mm/20240520185248.1801945-1-usama.anjum@collabora.com/)** + +> Check return value and return error/skip the tests. 
+ +**[v1: Convert __unmap_hugepage_range() to folios](http://lore.kernel.org/linux-mm/20240520194749.334896-1-vishal.moola@gmail.com/)** + +> Replaces 4 calls to compound_head() with one. Also converts +> unmap_hugepage_range() and unmap_ref_private() to take in folios. + +**[v2: Add NUMA-aware DAMOS watermarks](http://lore.kernel.org/linux-mm/20240520143038.189061-1-tome01@ajou.ac.kr/)** + +> These patches allow for DAMON to select monitoring target either total +> memory or a specific NUMA memory node. + +#### 文件系统 + +**[v1: [CFT][experimental] net/socket.c: use straight fdget/fdput](http://lore.kernel.org/linux-fsdevel/20240526034032.GY2118490@ZenIV/)** + +> Checking the theory that the important part in sockfd_lookup_light() is +> avoiding needless file refcount operations, not the marginal reduction +> of the register pressure from not keeping a struct file pointer in the +> caller. + +**[v1: netfs: if extracting pages from user iterator fails return 0](http://lore.kernel.org/linux-fsdevel/20240524053557.2702600-1-lizhi.xu@windriver.com/)** + +> When extracting the pages from a user iterator fails, netfs_extract_user_iter() +> will return 0, this situation will result in an abnormal and oversized return +> value for netfs_unbuffered_writer_locked() (for example, 9223372036854775807). +> Therefore, when the number of extracted pages is 0, set ret to 0 and jump to out. + +**[v2: fhandle: expose u64 mount id to name_to_handle_at(2)](http://lore.kernel.org/linux-fsdevel/20240523-exportfs-u64-mount-id-v2-1-f9f959f17eb1@cyphar.com/)** + +> Now that we provide a unique 64-bit mount ID interface in statx, we can +> now provide a race-free way for name_to_handle_at(2) to provide a file +> handle and corresponding mount without needing to worry about racing +> with /proc/mountinfo parsing. 
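上面 fhandle 补丁的意义在于:name_to_handle_at(2) 直接返回唯一的 u64 mount ID,之后按 ID 精确定位挂载点,而不必解析 /proc/self/mountinfo(后者与并发 mount/umount 存在竞态)。流程可用一个极简模型示意(表、数值与函数均为笔者虚构,仅表达"按唯一 ID 查找"这一点):

```python
# 假设的 u64 mount ID 表:真实场景由 statx() 等接口获得
mounts = {0x100000001: "/", 0x100000002: "/home"}

def name_to_handle(path):
    """示意 name_to_handle_at():返回 (不透明 handle, 所在挂载点的 u64 mount ID)。"""
    mnt_id = 0x100000002 if path.startswith("/home") else 0x100000001
    handle = hash(path) & 0xffffffff
    return handle, mnt_id

handle, mnt_id = name_to_handle("/home/user/file")
mountpoint = mounts[mnt_id]    # 按唯一 ID 直接定位,避免解析竞态
```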
+ +**[v3: zonefs: move super block reading from page to folio](http://lore.kernel.org/linux-fsdevel/20240523150253.3288-1-jth@kernel.org/)** + +> Move reading of the on-disk superblock from page to folios. + +**[v1: fs/adfs: add MODULE_DESCRIPTION](http://lore.kernel.org/linux-fsdevel/20240523-md-adfs-v1-1-364268e38370@quicinc.com/)** + +> Fix the 'make W=1' issue: +> WARNING: modpost: missing MODULE_DESCRIPTION() in fs/adfs/adfs.o + +**[v1: exfat: handle idmapped mounts](http://lore.kernel.org/linux-fsdevel/20240523011007.40649-1-mjeanson@efficios.com/)** + +> Pass the idmapped mount information to the different helper +> functions. Adapt the uid/gid checks in exfat_setattr to use the +> vfsuid/vfsgid helpers. + +**[v2: fs: fsconfig: intercept non-new mount API in advance for FSCONFIG_CMD_CREATE_EXCL command](http://lore.kernel.org/linux-fsdevel/20240522030422.315892-1-lihongbo22@huawei.com/)** + +> fsconfig with FSCONFIG_CMD_CREATE_EXCL command requires the new mount api, +> here we should return -EOPNOTSUPP in advance to avoid extra procedure. + +**[git pull: vfs.git misc stuff](http://lore.kernel.org/linux-fsdevel/20240521184359.GR2118490@ZenIV/)** + +> Stuff that should've been pushed in the last merge window, +> but had fallen through the cracks ;-/ + +**[git pull: vfs.git last bdev series](http://lore.kernel.org/linux-fsdevel/20240521183524.GQ2118490@ZenIV/)** + +> We can easily have up to 24 flags with sane +> atomicity, _without_ pushing anything out +> of the first cacheline of struct block_device. + +**[git pull: vfs bdev pile 2](http://lore.kernel.org/linux-fsdevel/20240521163852.GP2118490@ZenIV/)** + +> Next block device series - replacing ->bd_inode (me and Yu Kuai). +> Two trivial conflicts (block/ioctl.c and fs/btrfs/disk-io.c); proposed +> resolution in #merge-candidate (or in linux-next, for that matter). 
+
+**[git pull: vfs.git set_blocksize() (bdev pile 1)](http://lore.kernel.org/linux-fsdevel/20240521043731.GO2118490@ZenIV/)**
+
+> First bdev-related pile - set_blocksize() stuff:
+> getting rid of bogus set_blocksize() uses, switching it
+> to struct file * and verifying that the caller has the device
+> opened exclusively.
+
+**[v1: udf: Correct lock ordering in udf_setsize()](http://lore.kernel.org/linux-fsdevel/20240520134853.21305-1-jack@suse.cz/)**
+
+> Syzbot has reported a lockdep warning in udf_setsize(). After some analysis
+> this is actually harmless but this series fixes the lockdep false positive
+> error. I plan to merge these patches through my tree.
+
+**[v2: jbd2: speed up jbd2_transaction_committed()](http://lore.kernel.org/linux-fsdevel/20240520131831.2910790-1-yi.zhang@huaweicloud.com/)**
+
+> We have already stored the sequence number of the most recently
+> committed transaction in journal->j_commit_sequence, so we could do this
+> check by comparing it with the given tid instead. If the given tid isn't
+> larger than j_commit_sequence, we can ensure that the given transaction
+> has been committed. That way we could drop the expensive lock and
+> achieve about 10%~20% performance gains in concurrent DIOs on my
+> virtual machine with a 100G ramdisk.
+
+**[v20: Implement copy offload support](http://lore.kernel.org/linux-fsdevel/20240520102033.9361-1-nj.shetty@samsung.com/)**
+
+> The patch series covers the points discussed in the past and most recently
+> in LSFMM'24.
+> We have covered the initial agreed requirements in this patch set and
+> further additional features suggested by the community.
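上面 jbd2 补丁把"事务是否已提交"简化为序号比较。tid 是 32 位序号,比较必须处理回绕,内核惯用 (int)(a - b) >= 0 的技巧,Python 等价写法如下(示意):

```python
def tid_geq(a, b):
    """32 位序号的回绕安全比较,等价于 C 里的 (int)(a - b) >= 0。"""
    return ((a - b) & 0xffffffff) < 0x80000000

def transaction_committed(j_commit_sequence, tid):
    # 已提交序号不小于给定 tid,即该事务必然已提交
    return tid_geq(j_commit_sequence, tid)
```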
+ +**[GIT PULL: isofs, udf, quota, ext2, reiserfs changes for 6.10-rc1](http://lore.kernel.org/linux-fsdevel/20240520113428.ckwzn5kh75mmjxo3@quack3/)** + +**[GIT PULL: fsnotify changes for 6.10-rc1](http://lore.kernel.org/linux-fsdevel/20240520112239.pz35myprqju2gzo6@quack3/)** + +**[v3: iomap: avoid redundant fault_in_iov_iter_readable() judgement when use larger chunks](http://lore.kernel.org/linux-fsdevel/20240520105525.2176322-1-xu.yang_2@nxp.com/)** + +> Since commit (5d8edfb900d5 "iomap: Copy larger chunks from userspace"), +> iomap will try to copy in larger chunks than PAGE_SIZE. However, if the +> mapping doesn't support large folio, only one page of maximum 4KB will +> be created and 4KB data will be writen to pagecache each time. Then, +> next 4KB will be handled in next iteration. This will cause potential +> write performance problem. +> With this change, the write speed will be stable. Tested on ARM64 device. +> + +**[v1: exec: Add KUnit test for bprm_stack_limits()](http://lore.kernel.org/linux-fsdevel/20240520021337.work.198-kees@kernel.org/)** + +> This adds a first KUnit test to the core exec code. With the ability to +> manipulate userspace memory from KUnit coming[1], I wanted to at least +> get the KUnit framework in place in exec.c. Most of the coming tests +> will likely be to binfmt_elf.c, but still, this serves as a reasonable +> first step. +> + +**[v1: eventfd: introduce ratelimited wakeup for non-semaphore eventfd](http://lore.kernel.org/linux-fsdevel/20240519144124.4429-1-wen.yang@linux.dev/)** + +> For the NON-SEMAPHORE eventfd, a write (2) call adds the 8-byte integer +> value provided in its buffer to the counter, while a read (2) returns the +> 8-byte value containing the value and resetting the counter value to 0. +> Therefore, the accumulated counter values of multiple eventfd_write can be +> read out by a single eventfd_read. Therefore, the accumulated value of +> multiple writes can be retrieved by a single read. 
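上面 eventfd 补丁涉及非 SEMAPHORE 模式的计数语义:write(2) 把 8 字节整数累加到计数器,read(2) 一次取回累计值并清零。可用一个不含阻塞/唤醒逻辑的极简模型示意:

```python
class EventFD:
    """非 SEMAPHORE eventfd 计数语义的简化模型(示意,省略阻塞与唤醒)。"""
    def __init__(self):
        self.count = 0

    def write(self, value):
        self.count += value                  # 多次 write 累加

    def read(self):
        value, self.count = self.count, 0    # 一次 read 取回累计值并清零
        return value

efd = EventFD()
efd.write(3)
efd.write(4)
total = efd.read()
```

正因为多次 write 可以被一次 read 合并取走,补丁才提出对非 SEMAPHORE 场景的唤醒做限流。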
+ +**[v1: blk: optimization for classic polling](http://lore.kernel.org/linux-fsdevel/3578876466-3733-1-git-send-email-nj.shetty@samsung.com/)** + +> This removes the dependency on interrupts to wake up task. Set task +> state as TASK_RUNNING, if need_resched() returns true, +> while polling for IO completion. + +#### 网络设备 + +**[v1: net: ipvlan: Dont Use skb->sk in ipvlan_process_v{4,6}_outbound](http://lore.kernel.org/netdev/20240525034231.2498827-1-yuehaibing@huawei.com/)** + +> Raw packet from PF_PACKET socket ontop of an IPv6-backed ipvlan device will +> hit WARN_ON_ONCE() in sk_mc_loop() through sch_direct_xmit() path. + +**[v1: net: selftests: mptcp: mark unstable subtests as flaky](http://lore.kernel.org/netdev/20240524-upstream-net-20240524-selftests-mptcp-flaky-v1-0-a352362f3f8e@kernel.org/)** + +> Some subtests can be unstable, failing once every X runs. Fixing them +> can take time: there could be an issue in the kernel or in the subtest, +> and it is then important to do a proper analysis, not to hide real bugs. + +**[v6: iwl-next: ixgbe: Add support for Intel(R) E610 device](http://lore.kernel.org/netdev/20240524151311.78939-1-piotr.kwapulinski@intel.com/)** + +> This patch series adds low level support for the following features and +> enables link management. + +**[v2: net: sock_map: avoid race between sock_map_close and sk_psock_put](http://lore.kernel.org/netdev/20240524144702.1178377-1-cascardo@igalia.com/)** + +> This can be reproduced with a thread deleting an element from the sock map, +> while the second one creates a socket, adds it to the map and closes it. + +**[v1: net: phy: microchip_t1s: lan865x rev.b1 support](http://lore.kernel.org/netdev/20240524140706.359537-1-ramon.nordin.rodriguez@ferroamp.se/)** + +> This has been tested with a lan8650 rev.b1 chip on one end and a lan8670 +> usb eval board on the other end. 
Performance is rather lacking, the +> rev.b0 reaches close to the 10Mbit/s limit, but b.1 only gets about +> +> 4Mbit/s, with the same results when PLCA enabled or disabled. + +**[v3: iwl-next: ice:Support to dump PHY config, FEC](http://lore.kernel.org/netdev/20240524135255.3607422-1-anil.samal@intel.com/)** + +> Implementation to dump PHY configuration and FEC statistics to +> facilitate link level debugging of customer issues. + + +**[v1: net-next: net: stmmac: Add 2500BASEX support for integrated PCS](http://lore.kernel.org/netdev/20240524130653.30666-1-quic_snehshah@quicinc.com/)** + +> Qcom mac supports both SGMII and 2500BASEX with integrated PCS. +> Add changes to enable 2500BASEX along woth SGMII. + +**[v2: Socket type control for Landlock](http://lore.kernel.org/netdev/20240524093015.2402952-1-ivanov.mikhail1@huawei-partners.com/)** + + +> It is based on the landlock's mic-next branch on top of v6.9 kernel +> version. + +**[[net v3 PATCH] net:fec: Add fec_enet_deinit()](http://lore.kernel.org/netdev/20240524050528.4115581-1-xiaolei.wang@windriver.com/)** + +> When fec_probe() fails or fec_drv_remove() needs to release the +> fec queue and remove a NAPI context, therefore add a function +> corresponding to fec_enet_init() and call fec_enet_deinit() which +> does the opposite to release memory and remove a NAPI context. + +**[v3: Bluetooth: Add vendor-specific packet classification for ISO data](http://lore.kernel.org/netdev/20240524045045.3310641-1-yinghsu@chromium.org/)** + +> To avoid additional lookups, this patch introduces vendor-specific +> packet classification for Intel BT controllers to distinguish +> ISO data packets from ACL data packets. + +**[v10: VMware hypercalls enhancements](http://lore.kernel.org/netdev/20240523191446.54695-1-alexey.makhalov@broadcom.com/)** + +> VMware hypercalls invocations were all spread out across the kernel +> implementing same ABI as in-place asm-inline. 
With encrypted memory +> and confidential computing it became harder to maintain every changes +> in these hypercall implementations. + +**[v1: net-next: net: mana: Allow variable size indirection table](http://lore.kernel.org/netdev/1716483314-19028-1-git-send-email-shradhagupta@linux.microsoft.com/)** + +> Allow variable size indirection table allocation in MANA instead +> of using a constant value MANA_INDIRECT_TABLE_SIZE. +> The size is now derived from the MANA_QUERY_VPORT_CONFIG and the +> indirection table is allocated dynamically. + +**[v1: ipvs: Avoid unnecessary calls to skb_is_gso_sctp](http://lore.kernel.org/netdev/20240523165445.24016-1-iluceno@suse.de/)** + +> In the context of the SCTP SNAT/DNAT handler, these calls can only +> return true. + +**[GIT PULL: Networking for v6.10-rc1](http://lore.kernel.org/netdev/20240523161524.82511-1-pabeni@redhat.com/)** + +> Quite smaller than usual. Notably it includes the fix for the unix +> regression you have been notified of in the past weeks. +> The TCP window fix will require some follow-up, already queued. + + +**[v1: iproute-next: Add support for xfrm state direction attribute](http://lore.kernel.org/netdev/20240523151707.972161-1-chopps@labn.net/)** + + +> This patchset adds support for setting the new xfrm state direction +> attribute. + +**[v1: net: gro: initialize network_offset in network layer](http://lore.kernel.org/netdev/20240523141434.1752483-1-willemdebruijn.kernel@gmail.com/)** + +> Syzkaller was able to trigger + +**[v1: net: tcp: reduce accepted window in NEW_SYN_RECV state](http://lore.kernel.org/netdev/20240523130528.60376-1-edumazet@google.com/)** + +> Jason commit made checks against ACK sequence less strict +> and can be exploited by attackers to establish spoofed flows +> with less probes. 
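本节 mana 补丁把 RSS indirection table 改为可变大小。该表的作用是把报文 hash 映射到接收队列:通常按 round-robin 填表,查表时取 hash 低位作下标(下面为通用示意,并非 mana 驱动实现):

```python
def fill_indirection_table(size, num_queues):
    # round-robin 填充可变大小的 indirection table(size 为 2 的幂)
    return [i % num_queues for i in range(size)]

def pick_queue(table, rss_hash):
    return table[rss_hash & (len(table) - 1)]

table = fill_indirection_table(128, 6)
queue = pick_queue(table, 0x85)
```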
+ +**[v3: bpf-next: netfilter: Add the capability to offload flowtable in XDP layer](http://lore.kernel.org/netdev/cover.1716465377.git.lorenzo@kernel.org/)** + + +> This series has been tested running the xdp_flowtable_offload eBPF program +> on an ixgbe 10Gbps NIC (eno2) in order to XDP_REDIRECT the TCP traffic to +> a veth pair (veth0-veth1) based on the content of the nf_flowtable as soon +> as the TCP connection is in the established state: + + +**[v1: can: m_can: Add am62 wakeup support](http://lore.kernel.org/netdev/20240523075347.1282395-1-msp@baylibre.com/)** + + +> To support mcu_mcan0 and mcu_mcan1 wakeup for the mentioned SoCs, the +> series introduces a notion of wake-on-lan for m_can. If the user decides +> to enable wake-on-lan for a m_can device, the device is set to wakeup +> enabled. A 'wakeup' pinctrl state is selected to enable wakeup flags for +> the relevant pins. If wake-on-lan is disabled the default pinctrl is +> selected. + +**[v1: net: filter: use DEV_STAT_INC()](http://lore.kernel.org/netdev/20240523033520.4029314-1-jiangyunshui@kylinos.cn/)** + +> syzbot/KCSAN reported that races happen when multiple cpus +> updating dev->stats.tx_error concurrently. +> +> Adopt SMP safe DEV_STATS_INC() to update dev->stats fields. + +**[v3: iproute2: color: default to dark background](http://lore.kernel.org/netdev/E1s9tE4-00000006L4I-46tH@ws2.gedalya.net/)** + +> Since the COLORFGBG environment variable isn't always there, and +> anyway it seems that terminals and consoles more commonly default +> to dark backgrounds, make that assumption here. + + +**[v1: [resend] color: default to dark background](http://lore.kernel.org/netdev/E1s9rpA-00000006Jy7-18Q5@ws2.gedalya.net/)** + +**[v2: net: af_unix: Read sk->sk_hash under bindlock during bind().](http://lore.kernel.org/netdev/20240522154218.78088-1-kuniyu@amazon.com/)** + + +> There could be a chance that sk->sk_hash changes after the lockless +> read. 
However, in such a case, non-NULL unix_sk(sk)->addr is visible +> under unix_sk(sk)->bindlock, and bind() returns -EINVAL without using +> the prefetched value. + +**[v2: net: af_unix: Annotate data-race around unix_sk(sk)->addr.](http://lore.kernel.org/netdev/20240522154002.77857-1-kuniyu@amazon.com/)** + +> In other functions, we still read unix_sk(sk)->addr locklessly to check +> if the socket is bound, and KCSAN complains about it. + +**[v3: RESEND: can: mcp251xfd: add gpio functionality](http://lore.kernel.org/netdev/20240522-mcp251xfd-gpio-feature-v3-0-8829970269c5@ew.tq-group.com/)** + +> The mcp251xfd allows two pins to be configured as GPIOs. This series +> adds support for this feature. +> The GPIO functionality is controlled with the IOCON register which has +> an erratum. + +**[v1: net: usb: smsc95xx: configure external LEDs function for EVB-LAN8670-USB](http://lore.kernel.org/netdev/20240522140817.409936-1-Parthiban.Veerasooran@microchip.com/)** + +> By default, LAN9500A configures the external LEDs to the below function. +> But, EVB-LAN8670-USB uses the below external LEDs function which can be +> enabled by writing 1 to the LED Select (LED_SEL) bit in the LAN9500A. + +**[v3: rds: rdma: Add ability to force GFP_NOIO](http://lore.kernel.org/netdev/20240522135444.1685642-1-haakon.bugge@oracle.com/)** + +> This series enables RDS and the RDMA stack to be used as a block I/O +> device. This to support a filesystem on top of a raw block device +> which uses RDS and the RDMA stack as the network transport layer. + +**[v1: bpf-next: selftests/bpf: test_sockmap, use section names understood by libbpf](http://lore.kernel.org/netdev/20240522080936.2475833-1-jakub@cloudflare.com/)** + +> libbpf can deduce program type and attach type from the ELF section name. 
+> We don't need to pass it out-of-band if we switch to the libbpf convention.
+
+**[v2: net: enic: Validate length of nl attributes in enic_set_vf_port](http://lore.kernel.org/netdev/20240522073044.33519-1-rzats@paloaltonetworks.com/)**
+
+> These attributes are validated (in the function do_setlink in rtnetlink.c)
+> using the nla_policy ifla_port_policy. The policy defines IFLA_PORT_PROFILE
+> as NLA_STRING, IFLA_PORT_INSTANCE_UUID as NLA_BINARY and
+> IFLA_PORT_HOST_UUID as NLA_STRING. That means that the length validation
+> using the policy is for the max size of the attributes and not for the exact
+> size, so the length of these attributes might be less than the sizes that
+> enic_set_vf_port expects. This might cause an out-of-bounds
+> read access in the memcpys of the data of these
+> attributes in enic_set_vf_port.
+
+**[v1: net-next: ila: avoid genlmsg_reply when not ila_map found](http://lore.kernel.org/netdev/20240522031537.51741-1-linma@zju.edu.cn/)**
+
+> The current ila_xlat_nl_cmd_get_mapping will call genlmsg_reply even if
+> no ila_map is found with the user-provided parameters. Then an empty netlink
+> message will be sent and cause a WARNING like below.
+
+**[[net PATCH] net: fec: free fec queue when fec_enet_mii_init() fails](http://lore.kernel.org/netdev/20240522021317.1113689-1-xiaolei.wang@windriver.com/)**
+
+> Since commit 63e3cc2b87c2 ("arm64: dts: imx93-11x11-evk: add
+> reset gpios for ethernet PHYs") the reset-gpios attribute
+> is added, but this pcal6524 is loaded later, which causes
+> the fec driver to defer; the following memory leak occurs.
+
+#### 安全增强
+
+**[v1: Input: keyboard - use sizeof(*pointer) instead of sizeof(type)](http://lore.kernel.org/linux-hardening/AS8PR02MB7237277464F23CA168BFB3B18BF52@AS8PR02MB7237.eurprd02.prod.outlook.com/)**
+
+> It is preferred to use sizeof(*pointer) instead of sizeof(type)
+> due to the type of the variable can change and one needs not
+> change the former (unlike the latter).
This patch has no effect +> on runtime behavior. + +**[v1: Bluetooth: Use sizeof(*pointer) instead of sizeof(type)](http://lore.kernel.org/linux-hardening/AS8PR02MB72373F23330301EA5B897D6E8BF52@AS8PR02MB7237.eurprd02.prod.outlook.com/)** + +> It is preferred to use sizeof(*pointer) instead of sizeof(type) +> due to the type of the variable can change and one needs not +> change the former (unlike the latter). This patch has no effect +> on runtime behavior. + +**[v1: ext4: Use memtostr_pad() for s_volume_name](http://lore.kernel.org/linux-hardening/20240523225408.work.904-kees@kernel.org/)** + +> As with the other strings in struct ext4_super_block, s_volume_name is +> not NUL terminated. The other strings were marked in commit 072ebb3bffe6 +> ("ext4: add nonstring annotations to ext4.h"). Using strscpy() isn't +> the right replacement for strncpy(); it should use memtostr_pad() +> instead. + +**[v1: clocksource/drivers/sprd: Enable register for timer counter from 32 bit to 64 bit](http://lore.kernel.org/linux-hardening/TYSPR04MB708410A11D944F4553DB55A08AEB2@TYSPR04MB7084.apcprd04.prod.outlook.com/)** + +> Using 32 bit for suspend compensation, the max compensation time is 36 +> hours(working clock is 32k).In some IOT devices, the suspend time may +> be long, even exceeding 36 hours. Therefore, a 64 bit timer counter +> is needed for counting. + +**[v3: Introduce STM32 DMA3 support](http://lore.kernel.org/linux-hardening/20240520154948.690697-1-amelie.delaunay@foss.st.com/)** + +> STM32 DMA3 is a direct memory access controller with different features +> depending on its hardware configuration. It is either called LPDMA (Low +> Power), GPDMA (General Purpose) or HPDMA (High Performance), and it can +> be found in new STM32 MCUs and MPUs. 
+
+**[v1: usercopy: Convert test_user_copy to KUnit test](http://lore.kernel.org/linux-hardening/20240519190422.work.715-kees@kernel.org/)**
+
+> This builds on the proposal[1] from Mark and lets me convert the
+> existing usercopy selftest to KUnit. Besides adding this basic test to
+> the KUnit collection, it also opens the door for execve testing (which
+> depends on having a functional current->mm), and should provide the
+> basic infrastructure for adding Mark's much more complete usercopy tests.
+
+**[v1: efi: pstore: Return proper errors on UEFI failures](http://lore.kernel.org/linux-hardening/20240519163428.1148724-1-gpiccoli@igalia.com/)**
+
+> Right now efi-pstore either returns 0 (success) or -EIO; but we
+> do have a function to convert UEFI errors into different standard
+> error codes, helping to narrow down potential issues more accurately.
+
+**[v1: RDMA/irdma: Annotate flexible array with __counted_by() in struct irdma_qvlist_info](http://lore.kernel.org/linux-hardening/2ca8b14adf79c4795d7aa95bbfc79253a6bfed82.1716102112.git.christophe.jaillet@wanadoo.fr/)**
+
+> So annotate it with __counted_by() to make it explicit and enable some
+> additional checks.
+>
+> This allocation is done in irdma_save_msix_info().
+
+**[v1: dma-buf/fence-array: Add flex array to struct dma_fence_array](http://lore.kernel.org/linux-hardening/d3204a5b4776553455c2cfb1def72f1dae0dba25.1716054403.git.christophe.jaillet@wanadoo.fr/)**
+
+> This is an effort to get rid of all multiplications from allocation
+> functions in order to prevent integer overflows.
+
+**[v2: Bluetooth: hci_core: Refactor hci_get_dev_list() function](http://lore.kernel.org/linux-hardening/AS8PR02MB72371852645FF17D07AE0CA98BEF2@AS8PR02MB7237.eurprd02.prod.outlook.com/)**
+
+> This is an effort to get rid of all multiplications from allocation
+> functions in order to prevent integer overflows.
+
+#### Async IO
+
+**[v2: io_uring/sqpoll: ensure that normal task_work is also run timely](http://lore.kernel.org/io-uring/45f46362-7dc2-4ab5-ab49-0f3cac1d58fb@kernel.dk/)**
+
+> With the move to private task_work, SQPOLL neglected to also run the
+> normal task_work, if any is pending. This will eventually get run, but
+> we should run it with the private task_work to ensure that things like
+> a final fput() are processed in a timely fashion.
+
+#### Rust For Linux
+
+**[v2: rust: kernel: make impl_has_work compatible with more generics](http://lore.kernel.org/rust-for-linux/SY8P282MB48866A68C05C6444F0340A00CCEB2@SY8P282MB4886.AUSP282.PROD.OUTLOOK.COM/)**
+
+**[Re: v1: rust: kernel: make impl_has_work compatible with more complex generics](http://lore.kernel.org/rust-for-linux/ME0P282MB4890C545F827111FCE2D2AA0CCEB2@ME0P282MB4890.AUSP282.PROD.OUTLOOK.COM/)**
+
+**[v2: Rust block device driver API and null block driver](http://lore.kernel.org/rust-for-linux/20240521140323.2960069-1-nmi@metaspace.dk/)**
+
+> The kernel robot found a few issues with the first iteration of this patch [1]. I
+> also rebased the patch on the Rust PR for 6.10 [2], because we have some changes
+> to allocation going in, and this patch needs updates for those changes.
+>
+> This is a resend to correct those issues.
+
+**[v1: Device / Driver and PCI Rust abstractions](http://lore.kernel.org/rust-for-linux/20240520172554.182094-1-dakr@redhat.com/)**
+
+> This patch series implements basic generic device / driver Rust abstractions,
+> as well as some basic PCI abstractions.
+
+**[v1: DRM Rust abstractions and Nova](http://lore.kernel.org/rust-for-linux/20240520172059.181256-1-dakr@redhat.com/)**
+
+> This patch series implements some basic DRM Rust abstractions and a stub
+> implementation of the Nova GPU driver.
+
+#### BPF
+
+**[v5: bpf-next: use network helpers, part 5](http://lore.kernel.org/bpf/cover.1716638248.git.tanggeliang@kylinos.cn/)**
+
+> This patchset uses the post_socket_cb and post_connect_cb callbacks of struct
+> network_helper_opts to refactor do_test() in bpf_tcp_ca.c, moving dctcp-specific
+> test code out of do_test() into test_dctcp().
+
+**[v1: function_graph: Allow multiple users for function graph tracing](http://lore.kernel.org/bpf/20240525023652.903909489@goodmis.org/)**
+
+> This is a continuation of the function graph multi-user code.
+> I wrote a proof of concept of this code back in 2019[1] and
+> Masami started cleaning it up.
+
+**[v1: bpf-next: libbpf: configure log verbosity with env variable](http://lore.kernel.org/bpf/20240524131840.114289-1-yatsenko@meta.com/)**
+
+> Configure logging verbosity by setting the LIBBPF_LOG_LEVEL environment
+> variable, which is applied only to the default logger. Once users set their
+> custom logging callback, it is up to them to handle filtering.
+
+**[v5: bpf-next: Notify user space when a struct_ops object is detached/unregistered](http://lore.kernel.org/bpf/20240523230848.2022072-1-thinker.li@gmail.com/)**
+
+> This patch set enables the detach feature for struct_ops links and
+> sends an event to epoll when a link is detached. Subsystems could call
+> link->ops->detach() to detach a link and notify user space programs
+> through epoll.
+
+**[v7: bpf-next: Enable BPF programs to declare arrays of kptr, bpf_rb_root, and bpf_list_head.](http://lore.kernel.org/bpf/20240523174202.461236-1-thinker.li@gmail.com/)**
+
+> The patch set aims to enable the use of these specific types in arrays
+> and struct fields, providing flexibility. It examines the types of
+> global variables or the value types of maps, such as arrays and struct
+> types, recursively to identify these special types and generate field
+> information for them.
+
+**[v1: bpf-next: guard against access_size overflow?](http://lore.kernel.org/bpf/20240523131904.29770-1-shung-hsi.yu@suse.com/)**
+
+> Looking at commit ecc6a2101840 ("bpf: Protect against int overflow for
+> stack access size") and the associated syzbot report (linked below), it
+> seems that the underlying issue is that the access_size argument of
+> check_helper_mem_access() can be overflowed.
+
+**[v2: bpf-next: bpf: Relax precision marking in open coded iters and may_goto loop.](http://lore.kernel.org/bpf/20240523064219.42465-1-alexei.starovoitov@gmail.com/)**
+
+> Skipping the precision mark at if (i > 1000) keeps 'i' imprecise,
+> but arr[i] will mark 'i' as precise anyway, because 'arr' is a map.
+> On the next iteration of the loop the patch does copy_precision(),
+> which copies precision markings for the top of the loop into the next
+> state of the loop. So on the next iteration 'i' will be seen as precise.
+
+**[v1: perf record: Use pinned BPF program for filter (v1)](http://lore.kernel.org/bpf/20240522215616.762195-1-namhyung@kernel.org/)**
+
+> This is to support the unprivileged BPF filter for profiling per-task events.
+> Until now only root (or any user with CAP_BPF) could use the filter, and we
+> cannot add new unprivileged BPF program types. After talking with the BPF
+> folks at LSF/MM/BPF 2024, I was told that this is the way to go. Finally I
+> managed to make it work with pinned BPF objects. :)
+
+**[v1: bpf-next: selftests/bpf: Use prog_attach_type to attach in test_sockmap](http://lore.kernel.org/bpf/e27d7d0c1e0e79b0acd22ac6ad5d8f9f00225303.1716372485.git.tanggeliang@kylinos.cn/)**
+
+> Since the prog_attach_type[] array is defined, it makes sense to use it paired
+> with the prog_fd[] array for bpf_prog_attach() and bpf_prog_detach2() instead
+> of open-coding.
+
+### Related Technology Updates
+
+#### Qemu
+
+**[v2: Add support for RISC-V ACPI tests](http://lore.kernel.org/qemu-devel/20240524061411.341599-1-sunilvl@ventanamicro.com/)**
+
+> Currently, bios-table-test doesn't support RISC-V. This series enables
+> the required framework changes and basic testing. Things like NUMA-related
+> test cases will be added later.
+
+**[v3: riscv: QEMU RISC-V IOMMU Support](http://lore.kernel.org/qemu-devel/20240523173955.1940072-1-dbarboza@ventanamicro.com/)**
+
+> This series was tested using an emulated QEMU RISC-V host booting a QEMU
+> KVM guest, passing through an emulated e1000 network card from the host
+> to the guest. I can provide more details (e.g. QEMU command lines) if
+> required, just let me know. For now this cover letter is too much of an
+> essay as is.
+
+**[v1: target/riscv: Support Zabha extension](http://lore.kernel.org/qemu-devel/20240523124045.1964-1-zhiwei_liu@linux.alibaba.com/)**
+
+> Zabha adds support for AMO operations on bytes and halfwords. If Zacas has
+> been implemented, Zabha also adds support for amocas.b and amocas.h.
+
+**[v1: target/riscv: Implement May-Be-Operations(zimop) extension](http://lore.kernel.org/qemu-devel/20240522062905.1799-1-zhiwei_liu@linux.alibaba.com/)**
+
+> A may-be-operation has an initial behavior which can be redefined
+> by later extensions to perform some other action.
+
+**[v2: RISC-V virt MHP support](http://lore.kernel.org/qemu-devel/20240521105635.795211-1-bjorn@kernel.org/)**
+
+> The RISC-V "virt" machine is currently missing memory hotplugging
+> support (MHP). This series adds the missing virtio-md and PC-DIMM
+> support.
+ +#### U-Boot + +**[v1: LoongArch initial support](http://lore.kernel.org/u-boot/20240522-loongarch-v1-0-1407e0b69678@flygoat.com/)** + +> So far this series has implemented general support for initializing CPU, +> exceptions, kernel booting, CPU and timer drivers, QEMU LoongArch virt machine +> support and UEFI standard compliant EFI booting support. + + ## 20240519:第 92 期 ### 内核动态 -- Gitee