From 8edeb247e11432f9c690a3394ecdcf926ddc7db0 Mon Sep 17 00:00:00 2001 From: Peter Oberparleiter Date: Thu, 20 Jun 2024 14:20:27 +0200 Subject: [PATCH 1/9] s390/sclp: Prevent release of buffer in I/O stable inclusion from stable-5.10.224 commit a3e52a4c22c846858a6875e1c280030a3849e148 category: bugfix issue: #IAREPT CVE: CVE-2024-44969 Signed-off-by: wangxin --------------------------------------- [ Upstream commit bf365071ea92b9579d5a272679b74052a5643e35 ] When a task waiting for completion of a Store Data operation is interrupted, an attempt is made to halt this operation. If this attempt fails due to a hardware or firmware problem, there is a chance that the SCLP facility might store data into buffers referenced by the original operation at a later time. Handle this situation by not releasing the referenced data buffers if the halt attempt fails. For current use cases, this might result in a leak of few pages of memory in case of a rare hardware/firmware malfunction. Reviewed-by: Heiko Carstens Signed-off-by: Peter Oberparleiter Signed-off-by: Alexander Gordeev Signed-off-by: Sasha Levin Signed-off-by: wangxin --- drivers/s390/char/sclp_sd.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/s390/char/sclp_sd.c b/drivers/s390/char/sclp_sd.c index 1e244f78f192..64581433c334 100644 --- a/drivers/s390/char/sclp_sd.c +++ b/drivers/s390/char/sclp_sd.c @@ -319,8 +319,14 @@ static int sclp_sd_store_data(struct sclp_sd_data *result, u8 di) &esize); if (rc) { /* Cancel running request if interrupted */ - if (rc == -ERESTARTSYS) - sclp_sd_sync(page, SD_EQ_HALT, di, 0, 0, NULL, NULL); + if (rc == -ERESTARTSYS) { + if (sclp_sd_sync(page, SD_EQ_HALT, di, 0, 0, NULL, NULL)) { + pr_warn("Could not stop Store Data request - leaking at least %zu bytes\n", + (size_t)dsize * PAGE_SIZE); + data = NULL; + asce = 0; + } + } vfree(data); goto out; } -- Gitee From 54b7325891473d4007a2f2de496b28d300abd378 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Thomas=20Wei=C3=9Fschuh?= Date: Tue, 2 Apr 2024 23:10:34 +0200 Subject: [PATCH 2/9] sysctl: always initialize i_uid/i_gid MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit stable inclusion from stable-5.10.224 commit b2591c89a6e2858796111138c38fcb6851aa1955 category: bugfix issue: NA CVE: CVE-2024-42312 Signed-off-by: wangxin --------------------------------------- [ Upstream commit 98ca62ba9e2be5863c7d069f84f7166b45a5b2f4 ] Always initialize i_uid/i_gid inside the sysfs core so set_ownership() can safely skip setting them. Commit 5ec27ec735ba ("fs/proc/proc_sysctl.c: fix the default values of i_uid/i_gid on /proc/sys inodes.") added defaults for i_uid/i_gid when set_ownership() was not implemented. It also missed adjusting net_ctl_set_ownership() to use the same default values in case the computation of a better value failed. Fixes: 5ec27ec735ba ("fs/proc/proc_sysctl.c: fix the default values of i_uid/i_gid on /proc/sys inodes.") Cc: stable@vger.kernel.org Signed-off-by: Thomas Weißschuh Signed-off-by: Joel Granados Signed-off-by: Sasha Levin Signed-off-by: wangxin --- fs/proc/proc_sysctl.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c index b6c9ea7f96b6..6e07ba586396 100644 --- a/fs/proc/proc_sysctl.c +++ b/fs/proc/proc_sysctl.c @@ -471,12 +471,10 @@ static struct inode *proc_sys_make_inode(struct super_block *sb, make_empty_dir_inode(inode); } + inode->i_uid = GLOBAL_ROOT_UID; + inode->i_gid = GLOBAL_ROOT_GID; if (root->set_ownership) root->set_ownership(head, table, &inode->i_uid, &inode->i_gid); - else { - inode->i_uid = GLOBAL_ROOT_UID; - inode->i_gid = GLOBAL_ROOT_GID; - } return inode; } -- Gitee From b19d3e9233defa00a570b3d949359b04b13b8973 Mon Sep 17 00:00:00 2001 From: Taehee Yoo Date: Fri, 12 Jul 2024 09:51:16 +0000 Subject: [PATCH 3/9] xdp: fix invalid wait context of page_pool_destroy() stable inclusion from stable-5.10.224 commit be9d08ff102df3ac4f66e826ea935cf3af63a4bd category: bugfix issue: NA CVE: CVE-2024-43834 Signed-off-by: wangxin --------------------------------------- [ Upstream commit 59a931c5b732ca5fc2ca727f5a72aeabaafa85ec ] If the driver uses a page pool, it creates a page pool with page_pool_create(). The reference count of page pool is 1 as default. A page pool will be destroyed only when a reference count reaches 0. page_pool_destroy() is used to destroy page pool, it decreases a reference count. When a page pool is destroyed, ->disconnect() is called, which is mem_allocator_disconnect(). This function internally acquires mutex_lock(). If the driver uses XDP, it registers a memory model with xdp_rxq_info_reg_mem_model(). The xdp_rxq_info_reg_mem_model() internally increases a page pool reference count if a memory model is a page pool. Now the reference count is 2. To destroy a page pool, the driver should call both page_pool_destroy() and xdp_unreg_mem_model(). The xdp_unreg_mem_model() internally calls page_pool_destroy(). Only page_pool_destroy() decreases a reference count. If a driver calls page_pool_destroy() then xdp_unreg_mem_model(), we will face an invalid wait context warning. Because xdp_unreg_mem_model() calls page_pool_destroy() with rcu_read_lock(). The page_pool_destroy() internally acquires mutex_lock(). Splat looks like: ============================= [ BUG: Invalid wait context ] 6.10.0-rc6+ #4 Tainted: G W ----------------------------- ethtool/1806 is trying to lock: ffffffff90387b90 (mem_id_lock){+.+.}-{4:4}, at: mem_allocator_disconnect+0x73/0x150 other info that might help us debug this: context-{5:5} 3 locks held by ethtool/1806: stack backtrace: CPU: 0 PID: 1806 Comm: ethtool Tainted: G W 6.10.0-rc6+ #4 f916f41f172891c800f2fed Hardware name: ASUS System Product Name/PRIME Z690-P D4, BIOS 0603 11/01/2021 Call Trace: dump_stack_lvl+0x7e/0xc0 __lock_acquire+0x1681/0x4de0 ? _printk+0x64/0xe0 ? __pfx_mark_lock.part.0+0x10/0x10 ? __pfx___lock_acquire+0x10/0x10 lock_acquire+0x1b3/0x580 ? mem_allocator_disconnect+0x73/0x150 ? __wake_up_klogd.part.0+0x16/0xc0 ? __pfx_lock_acquire+0x10/0x10 ? dump_stack_lvl+0x91/0xc0 __mutex_lock+0x15c/0x1690 ? mem_allocator_disconnect+0x73/0x150 ? __pfx_prb_read_valid+0x10/0x10 ? mem_allocator_disconnect+0x73/0x150 ? __pfx_llist_add_batch+0x10/0x10 ? console_unlock+0x193/0x1b0 ? lockdep_hardirqs_on+0xbe/0x140 ? __pfx___mutex_lock+0x10/0x10 ? tick_nohz_tick_stopped+0x16/0x90 ? __irq_work_queue_local+0x1e5/0x330 ? irq_work_queue+0x39/0x50 ? __wake_up_klogd.part.0+0x79/0xc0 ? mem_allocator_disconnect+0x73/0x150 mem_allocator_disconnect+0x73/0x150 ? __pfx_mem_allocator_disconnect+0x10/0x10 ? mark_held_locks+0xa5/0xf0 ? rcu_is_watching+0x11/0xb0 page_pool_release+0x36e/0x6d0 page_pool_destroy+0xd7/0x440 xdp_unreg_mem_model+0x1a7/0x2a0 ? __pfx_xdp_unreg_mem_model+0x10/0x10 ? kfree+0x125/0x370 ? bnxt_free_ring.isra.0+0x2eb/0x500 ? bnxt_free_mem+0x5ac/0x2500 xdp_rxq_info_unreg+0x4a/0xd0 bnxt_free_mem+0x1356/0x2500 bnxt_close_nic+0xf0/0x3b0 ? __pfx_bnxt_close_nic+0x10/0x10 ? ethnl_parse_bit+0x2c6/0x6d0 ? __pfx___nla_validate_parse+0x10/0x10 ? __pfx_ethnl_parse_bit+0x10/0x10 bnxt_set_features+0x2a8/0x3e0 __netdev_update_features+0x4dc/0x1370 ? ethnl_parse_bitset+0x4ff/0x750 ? __pfx_ethnl_parse_bitset+0x10/0x10 ? __pfx___netdev_update_features+0x10/0x10 ? mark_held_locks+0xa5/0xf0 ? _raw_spin_unlock_irqrestore+0x42/0x70 ? __pm_runtime_resume+0x7d/0x110 ethnl_set_features+0x32d/0xa20 To fix this problem, it uses rhashtable_lookup_fast() instead of rhashtable_lookup() with rcu_read_lock(). Using xa without rcu_read_lock() here is safe. xa is freed by __xdp_mem_allocator_rcu_free() and this is called by call_rcu() of mem_xa_remove(). The mem_xa_remove() is called by page_pool_destroy() if a reference count reaches 0. The xa is already protected by the reference count mechanism well in the control plane. So removing rcu_read_lock() for page_pool_destroy() is safe. Fixes: c3f812cea0d7 ("page_pool: do not release pool until inflight == 0.") Signed-off-by: Taehee Yoo Reviewed-by: Jakub Kicinski Link: https://patch.msgid.link/20240712095116.3801586-1-ap420073@gmail.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin Signed-off-by: wangxin --- net/core/xdp.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/net/core/xdp.c b/net/core/xdp.c index fd98d6059007..b2ad644df21f 100644 --- a/net/core/xdp.c +++ b/net/core/xdp.c @@ -124,10 +124,8 @@ void xdp_unreg_mem_model(struct xdp_mem_info *mem) return; if (type == MEM_TYPE_PAGE_POOL) { - rcu_read_lock(); - xa = rhashtable_lookup(mem_id_ht, &id, mem_id_rht_params); + xa = rhashtable_lookup_fast(mem_id_ht, &id, mem_id_rht_params); page_pool_destroy(xa->page_pool); - rcu_read_unlock(); } } EXPORT_SYMBOL_GPL(xdp_unreg_mem_model); -- Gitee From 17aad647f413caa0010d96e5b0fa6d0c598eefcd Mon Sep 17 00:00:00 2001 From: Hans de Goede Date: Sat, 4 May 2024 18:25:33 +0200 Subject: [PATCH 4/9] leds: trigger: Unregister sysfs attributes before calling deactivate() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit stable inclusion from stable-5.10.224 commit 09c1583f0e10c918855d6e7540a79461a353e5d6 category: bugfix issue: NA CVE: CVE-2024-43830 Signed-off-by: wangxin --------------------------------------- [ Upstream commit c0dc9adf9474ecb7106e60e5472577375aedaed3 ] Triggers which have trigger specific sysfs attributes typically store related data in trigger-data allocated by the activate() callback and freed by the deactivate() callback. Calling device_remove_groups() after calling deactivate() leaves a window where the sysfs attributes show/store functions could be called after deactivation and then operate on the just freed trigger-data. Move the device_remove_groups() call to before deactivate() to close this race window. This also makes the deactivation path properly do things in reverse order of the activation path which calls the activate() callback before calling device_add_groups(). Fixes: a7e7a3156300 ("leds: triggers: add device attribute support") Cc: Uwe Kleine-König Signed-off-by: Hans de Goede Acked-by: Uwe Kleine-König Link: https://lore.kernel.org/r/20240504162533.76780-1-hdegoede@redhat.com Signed-off-by: Lee Jones Signed-off-by: Sasha Levin Signed-off-by: wangxin --- drivers/leds/led-triggers.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/leds/led-triggers.c b/drivers/leds/led-triggers.c index 4e7b78a84149..cbe70f38cb57 100644 --- a/drivers/leds/led-triggers.c +++ b/drivers/leds/led-triggers.c @@ -177,9 +177,9 @@ int led_trigger_set(struct led_classdev *led_cdev, struct led_trigger *trig) flags); cancel_work_sync(&led_cdev->set_brightness_work); led_stop_software_blink(led_cdev); + device_remove_groups(led_cdev->dev, led_cdev->trigger->groups); if (led_cdev->trigger->deactivate) led_cdev->trigger->deactivate(led_cdev); - device_remove_groups(led_cdev->dev, led_cdev->trigger->groups); led_cdev->trigger = NULL; led_cdev->trigger_data = NULL; led_cdev->activated = false; -- Gitee From 91b3f5dcaea7a423a8a4ba76375e59c89bb55f50 Mon Sep 17 00:00:00 2001 From: Zijun Hu Date: Thu, 30 May 2024 21:14:37 +0800 Subject: [PATCH 5/9] kobject_uevent: Fix OOB access within zap_modalias_env() stable inclusion from stable-5.10.224 commit 648d5490460d38436640da0812bf7f6351c150d2 category: bugfix issue: NA CVE: CVE-2024-42292 Signed-off-by: wangxin --------------------------------------- commit dd6e9894b451e7c85cceb8e9dc5432679a70e7dc upstream. zap_modalias_env() wrongly calculates size of memory block to move, so will cause OOB memory access issue if variable MODALIAS is not the last one within its @env parameter, fixed by correcting size to memmove. Fixes: 9b3fa47d4a76 ("kobject: fix suppressing modalias in uevents delivered over netlink") Cc: stable@vger.kernel.org Signed-off-by: Zijun Hu Reviewed-by: Lk Sii Link: https://lore.kernel.org/r/1717074877-11352-1-git-send-email-quic_zijuhu@quicinc.com Signed-off-by: Greg Kroah-Hartman Signed-off-by: wangxin --- lib/kobject_uevent.c | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c index c87d5b6a8a55..38716b2eb671 100644 --- a/lib/kobject_uevent.c +++ b/lib/kobject_uevent.c @@ -432,8 +432,23 @@ static void zap_modalias_env(struct kobj_uevent_env *env) len = strlen(env->envp[i]) + 1; if (i != env->envp_idx - 1) { + /* @env->envp[] contains pointers to @env->buf[] + * with @env->buflen chars, and we are removing + * variable MODALIAS here pointed by @env->envp[i] + * with length @len as shown below: + * + * 0 @env->buf[] @env->buflen + * --------------------------------------------- + * ^ ^ ^ ^ + * | |-> @len <-| target block | + * @env->envp[0] @env->envp[i] @env->envp[i + 1] + * + * so the "target block" indicated above is moved + * backward by @len, and its right size is + * @env->buflen - (@env->envp[i + 1] - @env->envp[0]). + */ memmove(env->envp[i], env->envp[i + 1], - env->buflen - len); + env->buflen - (env->envp[i + 1] - env->envp[0])); for (j = i; j < env->envp_idx - 1; j++) env->envp[j] = env->envp[j + 1] - len; -- Gitee From 5b052a3ffaf8474205885b283c9be15c7e243f7e Mon Sep 17 00:00:00 2001 From: Baochen Qiang Date: Thu, 6 Jun 2024 10:06:53 +0800 Subject: [PATCH 6/9] wifi: cfg80211: handle 2x996 RU allocation in cfg80211_calculate_bitrate_he() stable inclusion from stable-5.10.224 commit 2e201b3d162c6c49417c438ffb30b58c9f85769f category: bugfix issue: NA CVE: CVE-2024-43879 Signed-off-by: wangxin --------------------------------------- [ Upstream commit bcbd771cd5d68c0c52567556097d75f9fc4e7cd6 ] Currently NL80211_RATE_INFO_HE_RU_ALLOC_2x996 is not handled in cfg80211_calculate_bitrate_he(), leading to below warning: kernel: invalid HE MCS: bw:6, ru:6 kernel: WARNING: CPU: 0 PID: 2312 at net/wireless/util.c:1501 cfg80211_calculate_bitrate_he+0x22b/0x270 [cfg80211] Fix it by handling 2x996 RU allocation in the same way as 160 MHz bandwidth. Fixes: c4cbaf7973a7 ("cfg80211: Add support for HE") Signed-off-by: Baochen Qiang Link: https://msgid.link/20240606020653.33205-3-quic_bqiang@quicinc.com Signed-off-by: Johannes Berg Signed-off-by: Sasha Levin Signed-off-by: wangxin --- net/wireless/util.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/net/wireless/util.c b/net/wireless/util.c index 4b32e85c2d9a..94096e3338f2 100644 --- a/net/wireless/util.c +++ b/net/wireless/util.c @@ -1354,7 +1354,9 @@ static u32 cfg80211_calculate_bitrate_he(struct rate_info *rate) if (WARN_ON_ONCE(rate->nss < 1 || rate->nss > 8)) return 0; - if (rate->bw == RATE_INFO_BW_160) + if (rate->bw == RATE_INFO_BW_160 || + (rate->bw == RATE_INFO_BW_HE_RU && + rate->he_ru_alloc == NL80211_RATE_INFO_HE_RU_ALLOC_2x996)) result = rates_160M[rate->he_gi]; else if (rate->bw == RATE_INFO_BW_80 || (rate->bw == RATE_INFO_BW_HE_RU && -- Gitee From 0845ced9814da5a153306cf384098e4a6e7418f9 Mon Sep 17 00:00:00 2001 From: Baokun Li Date: Tue, 2 Jul 2024 21:23:49 +0800 Subject: [PATCH 7/9] ext4: make sure the first directory block is not a hole stable inclusion from stable-5.10.224 commit de2a011a13a46468a6e8259db58b1b62071fe136 category: bugfix issue: NA CVE: CVE-2024-42304 Signed-off-by: wangxin --------------------------------------- commit f9ca51596bbfd0f9c386dd1c613c394c78d9e5e6 upstream. The syzbot constructs a directory that has no dirblock but is non-inline, i.e. the first directory block is a hole. And no errors are reported when creating files in this directory in the following flow. ext4_mknod ... ext4_add_entry // Read block 0 ext4_read_dirblock(dir, block, DIRENT) bh = ext4_bread(NULL, inode, block, 0) if (!bh && (type == INDEX || type == DIRENT_HTREE)) // The first directory block is a hole // But type == DIRENT, so no error is reported. After that, we get a directory block without '.' and '..' but with a valid dentry. This may cause some code that relies on dot or dotdot (such as make_indexed_dir()) to crash. Therefore when ext4_read_dirblock() finds that the first directory block is a hole report that the filesystem is corrupted and return an error to avoid loading corrupted data from disk causing something bad. Reported-by: syzbot+ae688d469e36fb5138d0@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=ae688d469e36fb5138d0 Fixes: 4e19d6b65fb4 ("ext4: allow directory holes") Cc: stable@kernel.org Signed-off-by: Baokun Li Reviewed-by: Jan Kara Link: https://patch.msgid.link/20240702132349.2600605-3-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o Signed-off-by: Greg Kroah-Hartman Signed-off-by: wangxin --- fs/ext4/namei.c | 17 ++++++----------- 1 file changed, 6 insertions(+), 11 deletions(-) diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c index 765bafda2df4..756a3a176ba6 100644 --- a/fs/ext4/namei.c +++ b/fs/ext4/namei.c @@ -145,10 +145,11 @@ static struct buffer_head *__ext4_read_dirblock(struct inode *inode, return bh; } - if (!bh && (type == INDEX || type == DIRENT_HTREE)) { + /* The first directory block must not be a hole. */ + if (!bh && (type == INDEX || type == DIRENT_HTREE || block == 0)) { ext4_error_inode(inode, func, line, block, - "Directory hole found for htree %s block", - (type == INDEX) ? "index" : "leaf"); + "Directory hole found for htree %s block %u", + (type == INDEX) ? "index" : "leaf", block); return ERR_PTR(-EFSCORRUPTED); } if (!bh) @@ -2997,10 +2998,7 @@ bool ext4_empty_dir(struct inode *inode) EXT4_ERROR_INODE(inode, "invalid size"); return false; } - /* The first directory block must not be a hole, - * so treat it as DIRENT_HTREE - */ - bh = ext4_read_dirblock(inode, 0, DIRENT_HTREE); + bh = ext4_read_dirblock(inode, 0, EITHER); if (IS_ERR(bh)) return false; @@ -3637,10 +3635,7 @@ static struct buffer_head *ext4_get_first_dir_block(handle_t *handle, struct ext4_dir_entry_2 *de; unsigned int offset; - /* The first directory block must not be a hole, so - * treat it as DIRENT_HTREE - */ - bh = ext4_read_dirblock(inode, 0, DIRENT_HTREE); + bh = ext4_read_dirblock(inode, 0, EITHER); if (IS_ERR(bh)) { *retval = PTR_ERR(bh); return NULL; -- Gitee From 9794ce6917cfdac51ca72e4e5ca17b1d09721aae Mon Sep 17 00:00:00 2001 From: Ido Schimmel Date: Thu, 6 Jun 2024 16:49:42 +0200 Subject: [PATCH 8/9] mlxsw: spectrum_acl_erp: Fix object nesting warning stable inclusion from stable-5.10.224 commit 36a9996e020dd5aa325e0ecc55eb2328288ea6bb category: bugfix issue: NA CVE: CVE-2024-43880 Signed-off-by: wangxin --------------------------------------- [ Upstream commit 97d833ceb27dc19f8777d63f90be4a27b5daeedf ] ACLs in Spectrum-2 and newer ASICs can reside in the algorithmic TCAM (A-TCAM) or in the ordinary circuit TCAM (C-TCAM). The former can contain more ACLs (i.e., tc filters), but the number of masks in each region (i.e., tc chain) is limited. In order to mitigate the effects of the above limitation, the device allows filters to share a single mask if their masks only differ in up to 8 consecutive bits. For example, dst_ip/25 can be represented using dst_ip/24 with a delta of 1 bit. The C-TCAM does not have a limit on the number of masks being used (and therefore does not support mask aggregation), but can contain a limited number of filters. The driver uses the "objagg" library to perform the mask aggregation by passing it objects that consist of the filter's mask and whether the filter is to be inserted into the A-TCAM or the C-TCAM since filters in different TCAMs cannot share a mask. The set of created objects is dependent on the insertion order of the filters and is not necessarily optimal. Therefore, the driver will periodically ask the library to compute a more optimal set ("hints") by looking at all the existing objects. When the library asks the driver whether two objects can be aggregated the driver only compares the provided masks and ignores the A-TCAM / C-TCAM indication. This is the right thing to do since the goal is to move as many filters as possible to the A-TCAM. The driver also forbids two identical masks from being aggregated since this can only happen if one was intentionally put in the C-TCAM to avoid a conflict in the A-TCAM. The above can result in the following set of hints: H1: {mask X, A-TCAM} -> H2: {mask Y, A-TCAM} // X is Y + delta H3: {mask Y, C-TCAM} -> H4: {mask Z, A-TCAM} // Y is Z + delta After getting the hints from the library the driver will start migrating filters from one region to another while consulting the computed hints and instructing the device to perform a lookup in both regions during the transition. Assuming a filter with mask X is being migrated into the A-TCAM in the new region, the hints lookup will return H1. Since H2 is the parent of H1, the library will try to find the object associated with it and create it if necessary in which case another hints lookup (recursive) will be performed. This hints lookup for {mask Y, A-TCAM} will either return H2 or H3 since the driver passes the library an object comparison function that ignores the A-TCAM / C-TCAM indication. This can eventually lead to nested objects which are not supported by the library [1]. Fix by removing the object comparison function from both the driver and the library as the driver was the only user. That way the lookup will only return exact matches. I do not have a reliable reproducer that can reproduce the issue in a timely manner, but before the fix the issue would reproduce in several minutes and with the fix it does not reproduce in over an hour. Note that the current usefulness of the hints is limited because they include the C-TCAM indication and represent aggregation that cannot actually happen. This will be addressed in net-next. [1] WARNING: CPU: 0 PID: 153 at lib/objagg.c:170 objagg_obj_parent_assign+0xb5/0xd0 Modules linked in: CPU: 0 PID: 153 Comm: kworker/0:18 Not tainted 6.9.0-rc6-custom-g70fbc2c1c38b #42 Hardware name: Mellanox Technologies Ltd. MSN3700C/VMOD0008, BIOS 5.11 10/10/2018 Workqueue: mlxsw_core mlxsw_sp_acl_tcam_vregion_rehash_work RIP: 0010:objagg_obj_parent_assign+0xb5/0xd0 [...] Call Trace: __objagg_obj_get+0x2bb/0x580 objagg_obj_get+0xe/0x80 mlxsw_sp_acl_erp_mask_get+0xb5/0xf0 mlxsw_sp_acl_atcam_entry_add+0xe8/0x3c0 mlxsw_sp_acl_tcam_entry_create+0x5e/0xa0 mlxsw_sp_acl_tcam_vchunk_migrate_one+0x16b/0x270 mlxsw_sp_acl_tcam_vregion_rehash_work+0xbe/0x510 process_one_work+0x151/0x370 Fixes: 9069a3817d82 ("lib: objagg: implement optimization hints assembly and use hints for object creation") Signed-off-by: Ido Schimmel Reviewed-by: Amit Cohen Tested-by: Alexander Zubkov Signed-off-by: Petr Machata Reviewed-by: Simon Horman Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: wangxin --- .../ethernet/mellanox/mlxsw/spectrum_acl_erp.c | 13 ------------- include/linux/objagg.h | 1 - lib/objagg.c | 15 --------------- 3 files changed, 29 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_erp.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_erp.c index d231f4d2888b..9eee229303cc 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_erp.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_erp.c @@ -1217,18 +1217,6 @@ static bool mlxsw_sp_acl_erp_delta_check(void *priv, const void *parent_obj, return err ? false : true; } -static int mlxsw_sp_acl_erp_hints_obj_cmp(const void *obj1, const void *obj2) -{ - const struct mlxsw_sp_acl_erp_key *key1 = obj1; - const struct mlxsw_sp_acl_erp_key *key2 = obj2; - - /* For hints purposes, two objects are considered equal - * in case the masks are the same. Does not matter what - * the "ctcam" value is. - */ - return memcmp(key1->mask, key2->mask, sizeof(key1->mask)); -} - static void *mlxsw_sp_acl_erp_delta_create(void *priv, void *parent_obj, void *obj) { @@ -1308,7 +1296,6 @@ static void mlxsw_sp_acl_erp_root_destroy(void *priv, void *root_priv) static const struct objagg_ops mlxsw_sp_acl_erp_objagg_ops = { .obj_size = sizeof(struct mlxsw_sp_acl_erp_key), .delta_check = mlxsw_sp_acl_erp_delta_check, - .hints_obj_cmp = mlxsw_sp_acl_erp_hints_obj_cmp, .delta_create = mlxsw_sp_acl_erp_delta_create, .delta_destroy = mlxsw_sp_acl_erp_delta_destroy, .root_create = mlxsw_sp_acl_erp_root_create, diff --git a/include/linux/objagg.h b/include/linux/objagg.h index 78021777df46..6df5b887dc54 100644 --- a/include/linux/objagg.h +++ b/include/linux/objagg.h @@ -8,7 +8,6 @@ struct objagg_ops { size_t obj_size; bool (*delta_check)(void *priv, const void *parent_obj, const void *obj); - int (*hints_obj_cmp)(const void *obj1, const void *obj2); void * (*delta_create)(void *priv, void *parent_obj, void *obj); void (*delta_destroy)(void *priv, void *delta_priv); void * (*root_create)(void *priv, void *obj, unsigned int root_id); diff --git a/lib/objagg.c b/lib/objagg.c index 5e1676ccdadd..6917d5974345 100644 --- a/lib/objagg.c +++ b/lib/objagg.c @@ -906,20 +906,6 @@ static const struct objagg_opt_algo *objagg_opt_algos[] = { [OBJAGG_OPT_ALGO_SIMPLE_GREEDY] = &objagg_opt_simple_greedy, }; -static int objagg_hints_obj_cmp(struct rhashtable_compare_arg *arg, - const void *obj) -{ - struct rhashtable *ht = arg->ht; - struct objagg_hints *objagg_hints = - container_of(ht, struct objagg_hints, node_ht); - const struct objagg_ops *ops = objagg_hints->ops; - const char *ptr = obj; - - ptr += ht->p.key_offset; - return ops->hints_obj_cmp ? ops->hints_obj_cmp(ptr, arg->key) : - memcmp(ptr, arg->key, ht->p.key_len); -} - /** * objagg_hints_get - obtains hints instance * @objagg: objagg instance @@ -958,7 +944,6 @@ struct objagg_hints *objagg_hints_get(struct objagg *objagg, offsetof(struct objagg_hints_node, obj); objagg_hints->ht_params.head_offset = offsetof(struct objagg_hints_node, ht_node); - objagg_hints->ht_params.obj_cmpfn = objagg_hints_obj_cmp; err = rhashtable_init(&objagg_hints->node_ht, &objagg_hints->ht_params); if (err) -- Gitee From 3ee7e8efec9a1f2428597244cc5e5c8e72a6fe63 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Wed, 31 Jul 2024 18:31:05 +0200 Subject: [PATCH 9/9] x86/mm: Fix pti_clone_pgtable() alignment assumption stable inclusion from stable-5.10.224 commit d00c9b4bbc442d99e1dafbdfdab848bc1ead73f6 category: bugfix issue: NA CVE: CVE-2024-44965 Signed-off-by: wangxin --------------------------------------- [ Upstream commit 41e71dbb0e0a0fe214545fe64af031303a08524c ] Guenter reported dodgy crashes on an i386-nosmp build using GCC-11 that had the form of endless traps until entry stack exhaust and then It turned out that pti_clone_pgtable() had alignment assumptions on the start address, notably it hard assumes start is PMD aligned. This is true on x86_64, but very much not true on i386. These assumptions can cause the end condition to malfunction, leading to a 'short' clone. Guess what happens when the user mapping has a short copy of the entry text? Use the correct increment form for addr to avoid alignment assumptions. Fixes: 16a3fe634f6a ("x86/mm/pti: Clone kernel-image on PTE level for 32 bit") Reported-by: Guenter Roeck Tested-by: Guenter Roeck Suggested-by: Thomas Gleixner Signed-off-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20240731163105.GG33588@noisy.programming.kicks-ass.net Signed-off-by: Sasha Levin Signed-off-by: wangxin --- arch/x86/mm/pti.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c index 1aab92930569..59c13fdb8da0 100644 --- a/arch/x86/mm/pti.c +++ b/arch/x86/mm/pti.c @@ -374,14 +374,14 @@ pti_clone_pgtable(unsigned long start, unsigned long end, */ *target_pmd = *pmd; - addr += PMD_SIZE; + addr = round_up(addr + 1, PMD_SIZE); } else if (level == PTI_CLONE_PTE) { /* Walk the page-table down to the pte level */ pte = pte_offset_kernel(pmd, addr); if (pte_none(*pte)) { - addr += PAGE_SIZE; + addr = round_up(addr + 1, PAGE_SIZE); continue; } @@ -401,7 +401,7 @@ pti_clone_pgtable(unsigned long start, unsigned long end, /* Clone the PTE */ *target_pte = *pte; - addr += PAGE_SIZE; + addr = round_up(addr + 1, PAGE_SIZE); } else { BUG(); -- Gitee