From 12aa2a30b4bb6688353a67976791413f7fe3dfb0 Mon Sep 17 00:00:00 2001 From: Jann Horn Date: Fri, 4 Oct 2024 19:36:43 +0000 Subject: [PATCH 01/13] f2fs: Require FMODE_WRITE for atomic write ioctls stable inclusion from stable-v5.10.227 commit 000bab8753ae29a259feb339b99ee759795a48ac category: bugfix issue: #IB422D CVE: CVE-2024-47740 Signed-off-by: May27May --------------------------------------- commit 4f5a100f87f32cb65d4bb1ad282a08c92f6f591e upstream. The F2FS ioctls for starting and committing atomic writes check for inode_owner_or_capable(), but this does not give LSMs like SELinux or Landlock an opportunity to deny the write access - if the caller's FSUID matches the inode's UID, inode_owner_or_capable() immediately returns true. There are scenarios where LSMs want to deny a process the ability to write particular files, even files that the FSUID of the process owns; but this can currently partially be bypassed using atomic write ioctls in two ways: - F2FS_IOC_START_ATOMIC_REPLACE + F2FS_IOC_COMMIT_ATOMIC_WRITE can truncate an inode to size 0 - F2FS_IOC_START_ATOMIC_WRITE + F2FS_IOC_ABORT_ATOMIC_WRITE can revert changes another process concurrently made to a file Fix it by requiring FMODE_WRITE for these operations, just like for F2FS_IOC_MOVE_RANGE. Since any legitimate caller should only be using these ioctls when intending to write into the file, that seems unlikely to break anything. Fixes: 88b88a667971 ("f2fs: support atomic writes") Cc: stable@vger.kernel.org Signed-off-by: Jann Horn Reviewed-by: Chao Yu Reviewed-by: Eric Biggers Signed-off-by: Jaegeuk Kim Signed-off-by: Eric Biggers Signed-off-by: Sasha Levin Signed-off-by: May27May --- fs/f2fs/file.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index b479cdd85d8b..baf39c7ee331 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -2025,6 +2025,9 @@ static int f2fs_ioc_start_atomic_write(struct file *filp) struct f2fs_sb_info *sbi = F2FS_I_SB(inode); int ret; + if (!(filp->f_mode & FMODE_WRITE)) + return -EBADF; + if (!inode_owner_or_capable(inode)) return -EACCES; @@ -2095,6 +2098,9 @@ static int f2fs_ioc_commit_atomic_write(struct file *filp) struct inode *inode = file_inode(filp); int ret; + if (!(filp->f_mode & FMODE_WRITE)) + return -EBADF; + if (!inode_owner_or_capable(inode)) return -EACCES; @@ -2137,6 +2143,9 @@ static int f2fs_ioc_start_volatile_write(struct file *filp) struct inode *inode = file_inode(filp); int ret; + if (!(filp->f_mode & FMODE_WRITE)) + return -EBADF; + if (!inode_owner_or_capable(inode)) return -EACCES; @@ -2172,6 +2181,9 @@ static int f2fs_ioc_release_volatile_write(struct file *filp) struct inode *inode = file_inode(filp); int ret; + if (!(filp->f_mode & FMODE_WRITE)) + return -EBADF; + if (!inode_owner_or_capable(inode)) return -EACCES; @@ -2201,6 +2213,9 @@ static int f2fs_ioc_abort_volatile_write(struct file *filp) struct inode *inode = file_inode(filp); int ret; + if (!(filp->f_mode & FMODE_WRITE)) + return -EBADF; + if (!inode_owner_or_capable(inode)) return -EACCES; -- Gitee From 840f27aaff4bc081fac790201a97fe0573539875 Mon Sep 17 00:00:00 2001 From: VanGiang Nguyen Date: Fri, 9 Aug 2024 06:21:42 +0000 Subject: [PATCH 02/13] padata: use integer wrap around to prevent deadlock on seq_nr overflow mainline inclusion from mainline-v6.12-rc1 commit 9a22b2812393d93d84358a760c347c21939029a6 category: bugfix issue: #IB422D CVE: CVE-2024-47739 Signed-off-by: May27May --------------------------------------- When submitting more than 2^32 padata objects to padata_do_serial, the current sorting implementation incorrectly sorts padata objects with overflowed seq_nr, causing them to be placed before existing objects in the reorder list. This leads to a deadlock in the serialization process as padata_find_next cannot match padata->seq_nr and pd->processed because the padata instance with overflowed seq_nr will be selected next. To fix this, we use an unsigned integer wrap around to correctly sort padata objects in scenarios with integer overflow. Fixes: bfde23ce200e ("padata: unbind parallel jobs from specific CPUs") Cc: Co-developed-by: Christian Gafert Signed-off-by: Christian Gafert Co-developed-by: Max Ferger Signed-off-by: Max Ferger Signed-off-by: Van Giang Nguyen Acked-by: Daniel Jordan Signed-off-by: Herbert Xu Signed-off-by: May27May --- kernel/padata.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/kernel/padata.c b/kernel/padata.c index e199f6bb5a85..4dfc8a616023 100644 --- a/kernel/padata.c +++ b/kernel/padata.c @@ -409,7 +409,8 @@ void padata_do_serial(struct padata_priv *padata) /* Sort in ascending order of sequence number. */ list_for_each_prev(pos, &reorder->list) { cur = list_entry(pos, struct padata_priv, list); - if (cur->seq_nr < padata->seq_nr) + /* Compare by difference to consider integer wrap around */ + if ((signed int)(cur->seq_nr - padata->seq_nr) < 0) break; } list_add(&padata->list, pos); -- Gitee From a947d24f83b5ea6ee00547108611a235a15b9adc Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 6 Sep 2024 15:44:49 +0000 Subject: [PATCH 03/13] sock_map: Add a cond_resched() in sock_hash_free() mainline inclusion from mainline-v6.12-rc1 commit b1339be951ad31947ae19bc25cb08769bf255100 category: bugfix issue: #IB422D CVE: CVE-2024-47710 Signed-off-by: May27May --------------------------------------- Several syzbot soft lockup reports all have in common sock_hash_free() If a map with a large number of buckets is destroyed, we need to yield the cpu when needed. Fixes: 75e68e5bf2c7 ("bpf, sockhash: Synchronize delete from bucket list on map free") Reported-by: syzbot Signed-off-by: Eric Dumazet Signed-off-by: Daniel Borkmann Acked-by: Martin KaFai Lau Acked-by: John Fastabend Link: https://lore.kernel.org/bpf/20240906154449.3742932-1-edumazet@google.com Signed-off-by: May27May --- net/core/sock_map.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/core/sock_map.c b/net/core/sock_map.c index f8c287788bea..deeb1a94c23e 100644 --- a/net/core/sock_map.c +++ b/net/core/sock_map.c @@ -1215,6 +1215,7 @@ static void sock_hash_free(struct bpf_map *map) sock_put(elem->sk); sock_hash_free_elem(htab, elem); } + cond_resched(); } /* wait for psock readers accessing its map link */ -- Gitee From e1b86ede505f34e1accc97680c16de69cc5c4776 Mon Sep 17 00:00:00 2001 From: Dmitry Antipov Date: Fri, 6 Sep 2024 15:31:51 +0300 Subject: [PATCH 04/13] wifi: mac80211: use two-phase skb reclamation in ieee80211_do_stop() mainline inclusion from mainline-v6.12-rc1 commit 9d301de12da6e1bb069a9835c38359b8e8135121 category: bugfix issue: #IB422D CVE: CVE-2024-47713 Signed-off-by: May27May --------------------------------------- Since '__dev_queue_xmit()' should be called with interrupts enabled, the following backtrace: ieee80211_do_stop() ... spin_lock_irqsave(&local->queue_stop_reason_lock, flags) ... ieee80211_free_txskb() ieee80211_report_used_skb() ieee80211_report_ack_skb() cfg80211_mgmt_tx_status_ext() nl80211_frame_tx_status() genlmsg_multicast_netns() genlmsg_multicast_netns_filtered() nlmsg_multicast_filtered() netlink_broadcast_filtered() do_one_broadcast() netlink_broadcast_deliver() __netlink_sendskb() netlink_deliver_tap() __netlink_deliver_tap_skb() dev_queue_xmit() __dev_queue_xmit() ; with IRQS disabled ... spin_unlock_irqrestore(&local->queue_stop_reason_lock, flags) issues the warning (as reported by syzbot reproducer): WARNING: CPU: 2 PID: 5128 at kernel/softirq.c:362 __local_bh_enable_ip+0xc3/0x120 Fix this by implementing a two-phase skb reclamation in 'ieee80211_do_stop()', where actual work is performed outside of a section with interrupts disabled. Fixes: 5061b0c2b906 ("mac80211: cooperate more with network namespaces") Reported-by: syzbot+1a3986bbd3169c307819@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=1a3986bbd3169c307819 Signed-off-by: Dmitry Antipov Link: https://patch.msgid.link/20240906123151.351647-1-dmantipov@yandex.ru Signed-off-by: Johannes Berg Signed-off-by: May27May --- net/mac80211/iface.c | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/net/mac80211/iface.c b/net/mac80211/iface.c index 3a15ef8dd322..7b3387c4ba5d 100644 --- a/net/mac80211/iface.c +++ b/net/mac80211/iface.c @@ -369,6 +369,7 @@ static void ieee80211_do_stop(struct ieee80211_sub_if_data *sdata, { struct ieee80211_local *local = sdata->local; unsigned long flags; + struct sk_buff_head freeq; struct sk_buff *skb, *tmp; u32 hw_reconf_flags = 0; int i, flushed; @@ -564,18 +565,32 @@ static void ieee80211_do_stop(struct ieee80211_sub_if_data *sdata, skb_queue_purge(&sdata->skb_queue); } + /* + * Since ieee80211_free_txskb() may issue __dev_queue_xmit() + * which should be called with interrupts enabled, reclamation + * is done in two phases: + */ + __skb_queue_head_init(&freeq); + + /* unlink from local queues... */ spin_lock_irqsave(&local->queue_stop_reason_lock, flags); for (i = 0; i < IEEE80211_MAX_QUEUES; i++) { skb_queue_walk_safe(&local->pending[i], skb, tmp) { struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb); if (info->control.vif == &sdata->vif) { __skb_unlink(skb, &local->pending[i]); - ieee80211_free_txskb(&local->hw, skb); + __skb_queue_tail(&freeq, skb); } } } spin_unlock_irqrestore(&local->queue_stop_reason_lock, flags); + /* ... and perform actual reclamation with interrupts enabled. */ + skb_queue_walk_safe(&freeq, skb, tmp) { + __skb_unlink(skb, &freeq); + ieee80211_free_txskb(&local->hw, skb); + } + if (sdata->vif.type == NL80211_IFTYPE_AP_VLAN) ieee80211_txq_remove_vlan(local, sdata); -- Gitee From c681bfe70dfc702798edb615cb5982ba5b8612a7 Mon Sep 17 00:00:00 2001 From: Yanjun Zhang Date: Tue, 1 Oct 2024 16:39:30 +0800 Subject: [PATCH 05/13] NFSv4: Prevent NULL-pointer dereference in nfs42_complete_copies() mainline inclusion from mainline-v6.12-rc3 commit a848c29e3486189aaabd5663bc11aea50c5bd144 category: bugfix issue: #IB422D CVE: CVE-2024-50046 Signed-off-by: May27May --------------------------------------- On the node of an NFS client, some files saved in the mountpoint of the NFS server were copied to another location of the same NFS server. Accidentally, the nfs42_complete_copies() got a NULL-pointer dereference crash with the following syslog: [232064.838881] NFSv4: state recovery failed for open file nfs/pvc-12b5200d-cd0f-46a3-b9f0-af8f4fe0ef64.qcow2, error = -116 [232064.839360] NFSv4: state recovery failed for open file nfs/pvc-12b5200d-cd0f-46a3-b9f0-af8f4fe0ef64.qcow2, error = -116 [232066.588183] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000058 [232066.588586] Mem abort info: [232066.588701] ESR = 0x0000000096000007 [232066.588862] EC = 0x25: DABT (current EL), IL = 32 bits [232066.589084] SET = 0, FnV = 0 [232066.589216] EA = 0, S1PTW = 0 [232066.589340] FSC = 0x07: level 3 translation fault [232066.589559] Data abort info: [232066.589683] ISV = 0, ISS = 0x00000007 [232066.589842] CM = 0, WnR = 0 [232066.589967] user pgtable: 64k pages, 48-bit VAs, pgdp=00002000956ff400 [232066.590231] [0000000000000058] pgd=08001100ae100003, p4d=08001100ae100003, pud=08001100ae100003, pmd=08001100b3c00003, pte=0000000000000000 [232066.590757] Internal error: Oops: 96000007 [#1] SMP [232066.590958] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm vhost_net vhost vhost_iotlb tap tun ipt_rpfilter xt_multiport ip_set_hash_ip ip_set_hash_net xfrm_interface xfrm6_tunnel tunnel4 tunnel6 esp4 ah4 wireguard libcurve25519_generic veth xt_addrtype xt_set nf_conntrack_netlink ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_bitmap_port ip_set_hash_ipport dummy ip_set ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs iptable_filter sch_ingress nfnetlink_cttimeout vport_gre ip_gre ip_tunnel gre vport_geneve geneve vport_vxlan vxlan ip6_udp_tunnel udp_tunnel openvswitch nf_conncount dm_round_robin dm_service_time dm_multipath xt_nat xt_MASQUERADE nft_chain_nat nf_nat xt_mark xt_conntrack xt_comment nft_compat nft_counter nf_tables nfnetlink ocfs2 ocfs2_nodemanager ocfs2_stackglue iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipmi_ssif nbd overlay 8021q garp mrp bonding tls rfkill sunrpc ext4 mbcache jbd2 [232066.591052] vfat fat cas_cache cas_disk ses enclosure scsi_transport_sas sg acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler ip_tables vfio_pci vfio_pci_core vfio_virqfd vfio_iommu_type1 vfio dm_mirror dm_region_hash dm_log dm_mod nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc fuse xfs libcrc32c ast drm_vram_helper qla2xxx drm_kms_helper syscopyarea crct10dif_ce sysfillrect ghash_ce sysimgblt sha2_ce fb_sys_fops cec sha256_arm64 sha1_ce drm_ttm_helper ttm nvme_fc igb sbsa_gwdt nvme_fabrics drm nvme_core i2c_algo_bit i40e scsi_transport_fc megaraid_sas aes_neon_bs [232066.596953] CPU: 6 PID: 4124696 Comm: 10.253.166.125- Kdump: loaded Not tainted 5.15.131-9.cl9_ocfs2.aarch64 #1 [232066.597356] Hardware name: Great Wall .\x93\x8e...RF6260 V5/GWMSSE2GL1T, BIOS T656FBE_V3.0.18 2024-01-06 [232066.597721] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [232066.598034] pc : nfs4_reclaim_open_state+0x220/0x800 [nfsv4] [232066.598327] lr : nfs4_reclaim_open_state+0x12c/0x800 [nfsv4] [232066.598595] sp : ffff8000f568fc70 [232066.598731] x29: ffff8000f568fc70 x28: 0000000000001000 x27: ffff21003db33000 [232066.599030] x26: ffff800005521ae0 x25: ffff0100f98fa3f0 x24: 0000000000000001 [232066.599319] x23: ffff800009920008 x22: ffff21003db33040 x21: ffff21003db33050 [232066.599628] x20: ffff410172fe9e40 x19: ffff410172fe9e00 x18: 0000000000000000 [232066.599914] x17: 0000000000000000 x16: 0000000000000004 x15: 0000000000000000 [232066.600195] x14: 0000000000000000 x13: ffff800008e685a8 x12: 00000000eac0c6e6 [232066.600498] x11: 0000000000000000 x10: 0000000000000008 x9 : ffff8000054e5828 [232066.600784] x8 : 00000000ffffffbf x7 : 0000000000000001 x6 : 000000000a9eb14a [232066.601062] x5 : 0000000000000000 x4 : ffff70ff8a14a800 x3 : 0000000000000058 [232066.601348] x2 : 0000000000000001 x1 : 54dce46366daa6c6 x0 : 0000000000000000 [232066.601636] Call trace: [232066.601749] nfs4_reclaim_open_state+0x220/0x800 [nfsv4] [232066.601998] nfs4_do_reclaim+0x1b8/0x28c [nfsv4] [232066.602218] nfs4_state_manager+0x928/0x10f0 [nfsv4] [232066.602455] nfs4_run_state_manager+0x78/0x1b0 [nfsv4] [232066.602690] kthread+0x110/0x114 [232066.602830] ret_from_fork+0x10/0x20 [232066.602985] Code: 1400000d f9403f20 f9402e61 91016003 (f9402c00) [232066.603284] SMP: stopping secondary CPUs [232066.606936] Starting crashdump kernel... [232066.607146] Bye! Analysing the vmcore, we know that nfs4_copy_state listed by destination nfs_server->ss_copies was added by the field copies in handle_async_copy(), and we found a waiting copy process with the stack as: PID: 3511963 TASK: ffff710028b47e00 CPU: 0 COMMAND: "cp" #0 [ffff8001116ef740] __switch_to at ffff8000081b92f4 #1 [ffff8001116ef760] __schedule at ffff800008dd0650 #2 [ffff8001116ef7c0] schedule at ffff800008dd0a00 #3 [ffff8001116ef7e0] schedule_timeout at ffff800008dd6aa0 #4 [ffff8001116ef860] __wait_for_common at ffff800008dd166c #5 [ffff8001116ef8e0] wait_for_completion_interruptible at ffff800008dd1898 #6 [ffff8001116ef8f0] handle_async_copy at ffff8000055142f4 [nfsv4] #7 [ffff8001116ef970] _nfs42_proc_copy at ffff8000055147c8 [nfsv4] #8 [ffff8001116efa80] nfs42_proc_copy at ffff800005514cf0 [nfsv4] #9 [ffff8001116efc50] __nfs4_copy_file_range.constprop.0 at ffff8000054ed694 [nfsv4] The NULL-pointer dereference was due to nfs42_complete_copies() listed the nfs_server->ss_copies by the field ss_copies of nfs4_copy_state. So the nfs4_copy_state address ffff0100f98fa3f0 was offset by 0x10 and the data accessed through this pointer was also incorrect. Generally, the ordered list nfs4_state_owner->so_states indicate open(O_RDWR) or open(O_WRITE) states are reclaimed firstly by nfs4_reclaim_open_state(). When destination state reclaim is failed with NFS_STATE_RECOVERY_FAILED and copies are not deleted in nfs_server->ss_copies, the source state may be passed to the nfs42_complete_copies() process earlier, resulting in this crash scene finally. To solve this issue, we add a list_head nfs_server->ss_src_copies for a server-to-server copy specially. Fixes: 0e65a32c8a56 ("NFS: handle source server reboot") Signed-off-by: Yanjun Zhang Reviewed-by: Trond Myklebust Signed-off-by: Anna Schumaker Signed-off-by: May27May --- fs/nfs/client.c | 1 + fs/nfs/nfs42proc.c | 2 +- fs/nfs/nfs4state.c | 2 +- include/linux/nfs_fs_sb.h | 1 + 4 files changed, 4 insertions(+), 2 deletions(-) diff --git a/fs/nfs/client.c b/fs/nfs/client.c index 70b4793bee1a..9f17be528dcd 100644 --- a/fs/nfs/client.c +++ b/fs/nfs/client.c @@ -927,6 +927,7 @@ struct nfs_server *nfs_alloc_server(void) INIT_LIST_HEAD(&server->layouts); INIT_LIST_HEAD(&server->state_owners_lru); INIT_LIST_HEAD(&server->ss_copies); + INIT_LIST_HEAD(&server->ss_src_copies); atomic_set(&server->active, 0); diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c index dad32b171e67..307f5c08ccad 100644 --- a/fs/nfs/nfs42proc.c +++ b/fs/nfs/nfs42proc.c @@ -210,7 +210,7 @@ static int handle_async_copy(struct nfs42_copy_res *res, if (dst_server != src_server) { spin_lock(&src_server->nfs_client->cl_lock); - list_add_tail(©->src_copies, &src_server->ss_copies); + list_add_tail(©->src_copies, &src_server->ss_src_copies); spin_unlock(&src_server->nfs_client->cl_lock); } diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c index ff6ca05a9d44..693fe352c343 100644 --- a/fs/nfs/nfs4state.c +++ b/fs/nfs/nfs4state.c @@ -1589,7 +1589,7 @@ static void nfs42_complete_copies(struct nfs4_state_owner *sp, struct nfs4_state complete(©->completion); } } - list_for_each_entry(copy, &sp->so_server->ss_copies, src_copies) { + list_for_each_entry(copy, &sp->so_server->ss_src_copies, src_copies) { if ((test_bit(NFS_CLNT_SRC_SSC_COPY_STATE, &state->flags) && !nfs4_stateid_match_other(&state->stateid, ©->parent_src_state->stateid))) diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h index 38e60ec742df..cb2cb4d9c013 100644 --- a/include/linux/nfs_fs_sb.h +++ b/include/linux/nfs_fs_sb.h @@ -230,6 +230,7 @@ struct nfs_server { struct list_head layouts; struct list_head delegations; struct list_head ss_copies; + struct list_head ss_src_copies; unsigned long mig_gen; unsigned long mig_status; -- Gitee From df20ad78ac4a64231b90cf25729667cd1942833e Mon Sep 17 00:00:00 2001 From: Luiz Augusto von Dentz Date: Mon, 30 Sep 2024 13:26:21 -0400 Subject: [PATCH 06/13] Bluetooth: RFCOMM: FIX possible deadlock in rfcomm_sk_state_change mainline inclusion from mainline-v6.12-rc3 commit 08d1914293dae38350b8088980e59fbc699a72fe category: bugfix issue: #IB422D CVE: CVE-2024-50044 Signed-off-by: May27May --------------------------------------- rfcomm_sk_state_change attempts to use sock_lock so it must never be called with it locked but rfcomm_sock_ioctl always attempt to lock it causing the following trace: ====================================================== WARNING: possible circular locking dependency detected 6.8.0-syzkaller-08951-gfe46a7dd189e #0 Not tainted ------------------------------------------------------ syz-executor386/5093 is trying to acquire lock: ffff88807c396258 (sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1671 [inline] ffff88807c396258 (sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+.}-{0:0}, at: rfcomm_sk_state_change+0x5b/0x310 net/bluetooth/rfcomm/sock.c:73 but task is already holding lock: ffff88807badfd28 (&d->lock){+.+.}-{3:3}, at: __rfcomm_dlc_close+0x226/0x6a0 net/bluetooth/rfcomm/core.c:491 Reported-by: syzbot+d7ce59b06b3eb14fd218@syzkaller.appspotmail.com Tested-by: syzbot+d7ce59b06b3eb14fd218@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=d7ce59b06b3eb14fd218 Fixes: 3241ad820dbb ("[Bluetooth] Add timestamp support to L2CAP, RFCOMM and SCO") Signed-off-by: Luiz Augusto von Dentz Signed-off-by: May27May --- net/bluetooth/rfcomm/sock.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/net/bluetooth/rfcomm/sock.c b/net/bluetooth/rfcomm/sock.c index 82f404d3eba2..82fb8db7f92e 100644 --- a/net/bluetooth/rfcomm/sock.c +++ b/net/bluetooth/rfcomm/sock.c @@ -867,9 +867,7 @@ static int rfcomm_sock_ioctl(struct socket *sock, unsigned int cmd, unsigned lon if (err == -ENOIOCTLCMD) { #ifdef CONFIG_BT_RFCOMM_TTY - lock_sock(sk); err = rfcomm_dev_ioctl(sk, cmd, (void __user *) arg); - release_sock(sk); #else err = -EOPNOTSUPP; #endif -- Gitee From 4d79ac6de190ec7ca00c2fac669a30779c9db012 Mon Sep 17 00:00:00 2001 From: Jonathan McDowell Date: Fri, 16 Aug 2024 12:55:46 +0100 Subject: [PATCH 07/13] tpm: Clean up TPM space after command failure mainline inclusion from mainline-v6.12-rc1 commit e3aaebcbb7c6b403416f442d1de70d437ce313a7 category: bugfix issue: #IB422D CVE: CVE-2024-49851 Signed-off-by: May27May --------------------------------------- tpm_dev_transmit prepares the TPM space before attempting command transmission. However if the command fails no rollback of this preparation is done. This can result in transient handles being leaked if the device is subsequently closed with no further commands performed. Fix this by flushing the space in the event of command transmission failure. Fixes: 745b361e989a ("tpm: infrastructure for TPM spaces") Signed-off-by: Jonathan McDowell Reviewed-by: Jarkko Sakkinen Signed-off-by: Jarkko Sakkinen Signed-off-by: May27May --- drivers/char/tpm/tpm-dev-common.c | 2 ++ drivers/char/tpm/tpm2-space.c | 3 +++ 2 files changed, 5 insertions(+) diff --git a/drivers/char/tpm/tpm-dev-common.c b/drivers/char/tpm/tpm-dev-common.c index b99e1941c52c..fde81ecbd6a3 100644 --- a/drivers/char/tpm/tpm-dev-common.c +++ b/drivers/char/tpm/tpm-dev-common.c @@ -48,6 +48,8 @@ static ssize_t tpm_dev_transmit(struct tpm_chip *chip, struct tpm_space *space, if (!ret) ret = tpm2_commit_space(chip, space, buf, &len); + else + tpm2_flush_space(chip); out_rc: return ret ? ret : len; diff --git a/drivers/char/tpm/tpm2-space.c b/drivers/char/tpm/tpm2-space.c index ffb35f0154c1..c57404c6b98c 100644 --- a/drivers/char/tpm/tpm2-space.c +++ b/drivers/char/tpm/tpm2-space.c @@ -166,6 +166,9 @@ void tpm2_flush_space(struct tpm_chip *chip) struct tpm_space *space = &chip->work_space; int i; + if (!space) + return; + for (i = 0; i < ARRAY_SIZE(space->context_tbl); i++) if (space->context_tbl[i] && ~space->context_tbl[i]) tpm2_flush_context(chip, space->context_tbl[i]); -- Gitee From 52939f2c2363b402b26bffecb1f94fccee0f0195 Mon Sep 17 00:00:00 2001 From: Baokun Li Date: Thu, 22 Aug 2024 10:35:24 +0800 Subject: [PATCH 08/13] ext4: avoid use-after-free in ext4_ext_show_leaf() mainline inclusion from mainline-v6.12-rc1 commit 4e2524ba2ca5f54bdbb9e5153bea00421ef653f5 category: bugfix issue: #IB422D CVE: CVE-2024-49889 Signed-off-by: May27May --------------------------------------- In ext4_find_extent(), path may be freed by error or be reallocated, so using a previously saved *ppath may have been freed and thus may trigger use-after-free, as follows: ext4_split_extent path = *ppath; ext4_split_extent_at(ppath) path = ext4_find_extent(ppath) ext4_split_extent_at(ppath) // ext4_find_extent fails to free path // but zeroout succeeds ext4_ext_show_leaf(inode, path) eh = path[depth].p_hdr // path use-after-free !!! Similar to ext4_split_extent_at(), we use *ppath directly as an input to ext4_ext_show_leaf(). Fix a spelling error by the way. Same problem in ext4_ext_handle_unwritten_extents(). Since 'path' is only used in ext4_ext_show_leaf(), remove 'path' and use *ppath directly. This issue is triggered only when EXT_DEBUG is defined and therefore does not affect functionality. Signed-off-by: Baokun Li Reviewed-by: Jan Kara Reviewed-by: Ojaswin Mujoo Tested-by: Ojaswin Mujoo Link: https://patch.msgid.link/20240822023545.1994557-5-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o Signed-off-by: May27May --- fs/ext4/extents.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index 729c27cba12b..0b35e8a89065 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -3317,7 +3317,7 @@ static int ext4_split_extent_at(handle_t *handle, } /* - * ext4_split_extents() splits an extent and mark extent which is covered + * ext4_split_extent() splits an extent and mark extent which is covered * by @map as split_flags indicates * * It may result in splitting the extent into multiple extents (up to three) @@ -3394,7 +3394,7 @@ static int ext4_split_extent(handle_t *handle, goto out; } - ext4_ext_show_leaf(inode, path); + ext4_ext_show_leaf(inode, *ppath); out: return err ? err : allocated; } @@ -3859,14 +3859,13 @@ ext4_ext_handle_unwritten_extents(handle_t *handle, struct inode *inode, struct ext4_ext_path **ppath, int flags, unsigned int allocated, ext4_fsblk_t newblock) { - struct ext4_ext_path __maybe_unused *path = *ppath; int ret = 0; int err = 0; ext_debug(inode, "logical block %llu, max_blocks %u, flags 0x%x, allocated %u\n", (unsigned long long)map->m_lblk, map->m_len, flags, allocated); - ext4_ext_show_leaf(inode, path); + ext4_ext_show_leaf(inode, *ppath); /* * When writing into unwritten space, we should not fail to @@ -3963,7 +3962,7 @@ ext4_ext_handle_unwritten_extents(handle_t *handle, struct inode *inode, if (allocated > map->m_len) allocated = map->m_len; map->m_len = allocated; - ext4_ext_show_leaf(inode, path); + ext4_ext_show_leaf(inode, *ppath); out2: return err ? err : allocated; } -- Gitee From d63da57fe3c6883831f242f104bb862294d72416 Mon Sep 17 00:00:00 2001 From: Baokun Li Date: Thu, 22 Aug 2024 10:35:23 +0800 Subject: [PATCH 09/13] ext4: fix slab-use-after-free in ext4_split_extent_at() mainline inclusion from mainline-v6.12-rc1 commit c26ab35702f8cd0cdc78f96aa5856bfb77be798f category: bugfix issue: #IB422D CVE: CVE-2024-49884 Signed-off-by: May27May --------------------------------------- We hit the following use-after-free: ================================================================== BUG: KASAN: slab-use-after-free in ext4_split_extent_at+0xba8/0xcc0 Read of size 2 at addr ffff88810548ed08 by task kworker/u20:0/40 CPU: 0 PID: 40 Comm: kworker/u20:0 Not tainted 6.9.0-dirty #724 Call Trace: kasan_report+0x93/0xc0 ext4_split_extent_at+0xba8/0xcc0 ext4_split_extent.isra.0+0x18f/0x500 ext4_split_convert_extents+0x275/0x750 ext4_ext_handle_unwritten_extents+0x73e/0x1580 ext4_ext_map_blocks+0xe20/0x2dc0 ext4_map_blocks+0x724/0x1700 ext4_do_writepages+0x12d6/0x2a70 [...] Allocated by task 40: __kmalloc_noprof+0x1ac/0x480 ext4_find_extent+0xf3b/0x1e70 ext4_ext_map_blocks+0x188/0x2dc0 ext4_map_blocks+0x724/0x1700 ext4_do_writepages+0x12d6/0x2a70 [...] Freed by task 40: kfree+0xf1/0x2b0 ext4_find_extent+0xa71/0x1e70 ext4_ext_insert_extent+0xa22/0x3260 ext4_split_extent_at+0x3ef/0xcc0 ext4_split_extent.isra.0+0x18f/0x500 ext4_split_convert_extents+0x275/0x750 ext4_ext_handle_unwritten_extents+0x73e/0x1580 ext4_ext_map_blocks+0xe20/0x2dc0 ext4_map_blocks+0x724/0x1700 ext4_do_writepages+0x12d6/0x2a70 [...] ================================================================== The flow of issue triggering is as follows: ext4_split_extent_at path = *ppath ext4_ext_insert_extent(ppath) ext4_ext_create_new_leaf(ppath) ext4_find_extent(orig_path) path = *orig_path read_extent_tree_block // return -ENOMEM or -EIO ext4_free_ext_path(path) kfree(path) *orig_path = NULL a. If err is -ENOMEM: ext4_ext_dirty(path + path->p_depth) // path use-after-free !!! b. If err is -EIO and we have EXT_DEBUG defined: ext4_ext_show_leaf(path) eh = path[depth].p_hdr // path also use-after-free !!! So when trying to zeroout or fix the extent length, call ext4_find_extent() to update the path. In addition we use *ppath directly as an ext4_ext_show_leaf() input to avoid possible use-after-free when EXT_DEBUG is defined, and to avoid unnecessary path updates. Fixes: dfe5080939ea ("ext4: drop EXT4_EX_NOFREE_ON_ERR from rest of extents handling code") Cc: stable@kernel.org Signed-off-by: Baokun Li Reviewed-by: Jan Kara Reviewed-by: Ojaswin Mujoo Tested-by: Ojaswin Mujoo Link: https://patch.msgid.link/20240822023545.1994557-4-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o Signed-off-by: May27May --- fs/ext4/extents.c | 21 ++++++++++++++++++++- 1 file changed, 20 insertions(+), 1 deletion(-) diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index 0b35e8a89065..a46b1e7abf6d 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -3260,6 +3260,25 @@ static int ext4_split_extent_at(handle_t *handle, if (err != -ENOSPC && err != -EDQUOT) goto out; + /* + * Update path is required because previous ext4_ext_insert_extent() + * may have freed or reallocated the path. Using EXT4_EX_NOFAIL + * guarantees that ext4_find_extent() will not return -ENOMEM, + * otherwise -ENOMEM will cause a retry in do_writepages(), and a + * WARN_ON may be triggered in ext4_da_update_reserve_space() due to + * an incorrect ee_len causing the i_reserved_data_blocks exception. + */ + path = ext4_find_extent(inode, ee_block, ppath, + flags | EXT4_EX_NOFAIL); + if (IS_ERR(path)) { + EXT4_ERROR_INODE(inode, "Failed split extent on %u, err %ld", + split, PTR_ERR(path)); + return PTR_ERR(path); + } + depth = ext_depth(inode); + ex = path[depth].p_ext; + *ppath = path; + if (EXT4_EXT_MAY_ZEROOUT & split_flag) { if (split_flag & (EXT4_EXT_DATA_VALID1|EXT4_EXT_DATA_VALID2)) { if (split_flag & EXT4_EXT_DATA_VALID1) { @@ -3312,7 +3331,7 @@ static int ext4_split_extent_at(handle_t *handle, ext4_ext_dirty(handle, inode, path + path->p_depth); return err; out: - ext4_ext_show_leaf(inode, path); + ext4_ext_show_leaf(inode, *ppath); return err; } -- Gitee From f54b52dd2dce029517dbe8d8fb139e3e370b83d5 Mon Sep 17 00:00:00 2001 From: Baokun Li Date: Thu, 22 Aug 2024 10:35:26 +0800 Subject: [PATCH 10/13] ext4: aovid use-after-free in ext4_ext_insert_extent() mainline inclusion from mainline-v6.12-rc1 commit a164f3a432aae62ca23d03e6d926b122ee5b860d category: bugfix issue: #IB422D CVE: CVE-2024-49883 Signed-off-by: May27May --------------------------------------- As Ojaswin mentioned in Link, in ext4_ext_insert_extent(), if the path is reallocated in ext4_ext_create_new_leaf(), we'll use the stale path and cause UAF. Below is a sample trace with dummy values: ext4_ext_insert_extent path = *ppath = 2000 ext4_ext_create_new_leaf(ppath) ext4_find_extent(ppath) path = *ppath = 2000 if (depth > path[0].p_maxdepth) kfree(path = 2000); *ppath = path = NULL; path = kcalloc() = 3000 *ppath = 3000; return path; /* here path is still 2000, UAF! */ eh = path[depth].p_hdr ================================================================== BUG: KASAN: slab-use-after-free in ext4_ext_insert_extent+0x26d4/0x3330 Read of size 8 at addr ffff8881027bf7d0 by task kworker/u36:1/179 CPU: 3 UID: 0 PID: 179 Comm: kworker/u6:1 Not tainted 6.11.0-rc2-dirty #866 Call Trace: ext4_ext_insert_extent+0x26d4/0x3330 ext4_ext_map_blocks+0xe22/0x2d40 ext4_map_blocks+0x71e/0x1700 ext4_do_writepages+0x1290/0x2800 [...] Allocated by task 179: ext4_find_extent+0x81c/0x1f70 ext4_ext_map_blocks+0x146/0x2d40 ext4_map_blocks+0x71e/0x1700 ext4_do_writepages+0x1290/0x2800 ext4_writepages+0x26d/0x4e0 do_writepages+0x175/0x700 [...] Freed by task 179: kfree+0xcb/0x240 ext4_find_extent+0x7c0/0x1f70 ext4_ext_insert_extent+0xa26/0x3330 ext4_ext_map_blocks+0xe22/0x2d40 ext4_map_blocks+0x71e/0x1700 ext4_do_writepages+0x1290/0x2800 ext4_writepages+0x26d/0x4e0 do_writepages+0x175/0x700 [...] ================================================================== So use *ppath to update the path to avoid the above problem. Reported-by: Ojaswin Mujoo Closes: https://lore.kernel.org/r/ZqyL6rmtwl6N4MWR@li-bb2b2a4c-3307-11b2-a85c-8fa5c3a69313.ibm.com Fixes: 10809df84a4d ("ext4: teach ext4_ext_find_extent() to realloc path if necessary") Cc: stable@kernel.org Signed-off-by: Baokun Li Reviewed-by: Jan Kara Link: https://patch.msgid.link/20240822023545.1994557-7-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o Signed-off-by: May27May --- fs/ext4/extents.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index a46b1e7abf6d..fab33df857ec 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -2106,6 +2106,7 @@ int ext4_ext_insert_extent(handle_t *handle, struct inode *inode, ppath, newext); if (err) goto cleanup; + path = *ppath; depth = ext_depth(inode); eh = path[depth].p_hdr; -- Gitee From 3623253eed58ae6200102abf3cfe87c89ad932d9 Mon Sep 17 00:00:00 2001 From: Baokun Li Date: Thu, 22 Aug 2024 10:35:28 +0800 Subject: [PATCH 11/13] ext4: fix double brelse() the buffer of the extents path mainline inclusion from mainline-v6.12-rc1 commit dcaa6c31134c0f515600111c38ed7750003e1b9c category: bugfix issue: #IB422D CVE: CVE-2024-49882 Signed-off-by: May27May --------------------------------------- In ext4_ext_try_to_merge_up(), set path[1].p_bh to NULL after it has been released, otherwise it may be released twice. An example of what triggers this is as follows: split2 map split1 |--------|-------|--------| ext4_ext_map_blocks ext4_ext_handle_unwritten_extents ext4_split_convert_extents // path->p_depth == 0 ext4_split_extent // 1. do split1 ext4_split_extent_at |ext4_ext_insert_extent | ext4_ext_create_new_leaf | ext4_ext_grow_indepth | le16_add_cpu(&neh->eh_depth, 1) | ext4_find_extent | // return -ENOMEM |// get error and try zeroout |path = ext4_find_extent | path->p_depth = 1 |ext4_ext_try_to_merge | ext4_ext_try_to_merge_up | path->p_depth = 0 | brelse(path[1].p_bh) ---> not set to NULL here |// zeroout success // 2. update path ext4_find_extent // 3. do split2 ext4_split_extent_at ext4_ext_insert_extent ext4_ext_create_new_leaf ext4_ext_grow_indepth le16_add_cpu(&neh->eh_depth, 1) ext4_find_extent path[0].p_bh = NULL; path->p_depth = 1 read_extent_tree_block ---> return err // path[1].p_bh is still the old value ext4_free_ext_path ext4_ext_drop_refs // path->p_depth == 1 brelse(path[1].p_bh) ---> brelse a buffer twice Finally got the following WARRNING when removing the buffer from lru: ============================================ VFS: brelse: Trying to free free buffer WARNING: CPU: 2 PID: 72 at fs/buffer.c:1241 __brelse+0x58/0x90 CPU: 2 PID: 72 Comm: kworker/u19:1 Not tainted 6.9.0-dirty #716 RIP: 0010:__brelse+0x58/0x90 Call Trace: __find_get_block+0x6e7/0x810 bdev_getblk+0x2b/0x480 __ext4_get_inode_loc+0x48a/0x1240 ext4_get_inode_loc+0xb2/0x150 ext4_reserve_inode_write+0xb7/0x230 __ext4_mark_inode_dirty+0x144/0x6a0 ext4_ext_insert_extent+0x9c8/0x3230 ext4_ext_map_blocks+0xf45/0x2dc0 ext4_map_blocks+0x724/0x1700 ext4_do_writepages+0x12d6/0x2a70 [...] ============================================ Fixes: ecb94f5fdf4b ("ext4: collapse a single extent tree block into the inode if possible") Cc: stable@kernel.org Signed-off-by: Baokun Li Reviewed-by: Jan Kara Reviewed-by: Ojaswin Mujoo Tested-by: Ojaswin Mujoo Link: https://patch.msgid.link/20240822023545.1994557-9-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o Signed-off-by: May27May --- fs/ext4/extents.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index fab33df857ec..f8f969c03c46 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -1878,6 +1878,7 @@ static void ext4_ext_try_to_merge_up(handle_t *handle, path[0].p_hdr->eh_max = cpu_to_le16(max_root); brelse(path[1].p_bh); + path[1].p_bh = NULL; ext4_free_blocks(handle, inode, NULL, blk, 1, EXT4_FREE_BLOCKS_METADATA | EXT4_FREE_BLOCKS_FORGET); } -- Gitee From 046c38967f305d8627d7071161ad54ef7b966597 Mon Sep 17 00:00:00 2001 From: Baokun Li Date: Thu, 22 Aug 2024 10:35:25 +0800 Subject: [PATCH 12/13] ext4: update orig_path in ext4_find_extent() mainline inclusion from mainline-v6.12-rc1 commit 5b4b2dcace35f618fe361a87bae6f0d13af31bc1 category: bugfix issue: #IB422D CVE: CVE-2024-49881 Signed-off-by: May27May --------------------------------------- In ext4_find_extent(), if the path is not big enough, we free it and set *orig_path to NULL. But after reallocating and successfully initializing the path, we don't update *orig_path, in which case the caller gets a valid path but a NULL ppath, and this may cause a NULL pointer dereference or a path memory leak. For example: ext4_split_extent path = *ppath = 2000 ext4_find_extent if (depth > path[0].p_maxdepth) kfree(path = 2000); *orig_path = path = NULL; path = kcalloc() = 3000 ext4_split_extent_at(*ppath = NULL) path = *ppath; ex = path[depth].p_ext; // NULL pointer dereference! ================================================================== BUG: kernel NULL pointer dereference, address: 0000000000000010 CPU: 6 UID: 0 PID: 576 Comm: fsstress Not tainted 6.11.0-rc2-dirty #847 RIP: 0010:ext4_split_extent_at+0x6d/0x560 Call Trace: ext4_split_extent.isra.0+0xcb/0x1b0 ext4_ext_convert_to_initialized+0x168/0x6c0 ext4_ext_handle_unwritten_extents+0x325/0x4d0 ext4_ext_map_blocks+0x520/0xdb0 ext4_map_blocks+0x2b0/0x690 ext4_iomap_begin+0x20e/0x2c0 [...] ================================================================== Therefore, *orig_path is updated when the extent lookup succeeds, so that the caller can safely use path or *ppath. Fixes: 10809df84a4d ("ext4: teach ext4_ext_find_extent() to realloc path if necessary") Cc: stable@kernel.org Signed-off-by: Baokun Li Reviewed-by: Jan Kara Link: https://patch.msgid.link/20240822023545.1994557-6-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o Signed-off-by: May27May --- fs/ext4/extents.c | 3 ++- fs/ext4/move_extent.c | 1 - 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index f8f969c03c46..ae2623a2b205 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -960,6 +960,8 @@ ext4_find_extent(struct inode *inode, ext4_lblk_t block, ext4_ext_show_path(inode, path); + if (orig_path) + *orig_path = path; return path; err: @@ -3279,7 +3281,6 @@ static int ext4_split_extent_at(handle_t *handle, } depth = ext_depth(inode); ex = path[depth].p_ext; - *ppath = path; if (EXT4_EXT_MAY_ZEROOUT & split_flag) { if (split_flag & (EXT4_EXT_DATA_VALID1|EXT4_EXT_DATA_VALID2)) { diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c index f8dd5d972c33..661a8544d781 100644 --- a/fs/ext4/move_extent.c +++ b/fs/ext4/move_extent.c @@ -36,7 +36,6 @@ get_ext_path(struct inode *inode, ext4_lblk_t lblock, *ppath = NULL; return -ENODATA; } - *ppath = path; return 0; } -- Gitee From fe4d2f76b8ff477484604673198bc50adba3ebe6 Mon Sep 17 00:00:00 2001 From: Omar Sandoval Date: Tue, 15 Oct 2024 10:59:46 -0700 Subject: [PATCH 13/13] blk-rq-qos: fix crash on rq_qos_wait vs. rq_qos_wake_function race mainline inclusion from mainline-v6.12-rc4 commit e972b08b91ef48488bae9789f03cfedb148667fb category: bugfix issue: #IB422D CVE: CVE-2024-50082 Signed-off-by: May27May --------------------------------------- We're seeing crashes from rq_qos_wake_function that look like this: BUG: unable to handle page fault for address: ffffafe180a40084 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 100000067 P4D 100000067 PUD 10027c067 PMD 10115d067 PTE 0 Oops: Oops: 0002 [#1] PREEMPT SMP PTI CPU: 17 UID: 0 PID: 0 Comm: swapper/17 Not tainted 6.12.0-rc3-00013-geca631b8fe80 #11 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:_raw_spin_lock_irqsave+0x1d/0x40 Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 41 54 9c 41 5c fa 65 ff 05 62 97 30 4c 31 c0 ba 01 00 00 00 0f b1 17 75 0a 4c 89 e0 41 5c c3 cc cc cc cc 89 c6 e8 2c 0b 00 RSP: 0018:ffffafe180580ca0 EFLAGS: 00010046 RAX: 0000000000000000 RBX: ffffafe180a3f7a8 RCX: 0000000000000011 RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffffafe180a40084 RBP: 0000000000000000 R08: 00000000001e7240 R09: 0000000000000011 R10: 0000000000000028 R11: 0000000000000888 R12: 0000000000000002 R13: ffffafe180a40084 R14: 0000000000000000 R15: 0000000000000003 FS: 0000000000000000(0000) GS:ffff9aaf1f280000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffafe180a40084 CR3: 000000010e428002 CR4: 0000000000770ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: try_to_wake_up+0x5a/0x6a0 rq_qos_wake_function+0x71/0x80 __wake_up_common+0x75/0xa0 __wake_up+0x36/0x60 scale_up.part.0+0x50/0x110 wb_timer_fn+0x227/0x450 ... So rq_qos_wake_function() calls wake_up_process(data->task), which calls try_to_wake_up(), which faults in raw_spin_lock_irqsave(&p->pi_lock). p comes from data->task, and data comes from the waitqueue entry, which is stored on the waiter's stack in rq_qos_wait(). Analyzing the core dump with drgn, I found that the waiter had already woken up and moved on to a completely unrelated code path, clobbering what was previously data->task. Meanwhile, the waker was passing the clobbered garbage in data->task to wake_up_process(), leading to the crash. What's happening is that in between rq_qos_wake_function() deleting the waitqueue entry and calling wake_up_process(), rq_qos_wait() is finding that it already got a token and returning. The race looks like this: rq_qos_wait() rq_qos_wake_function() ============================================================== prepare_to_wait_exclusive() data->got_token = true; list_del_init(&curr->entry); if (data.got_token) break; finish_wait(&rqw->wait, &data.wq); ^- returns immediately because list_empty_careful(&wq_entry->entry) is true ... return, go do something else ... wake_up_process(data->task) (NO LONGER VALID!)-^ Normally, finish_wait() is supposed to synchronize against the waker. But, as noted above, it is returning immediately because the waitqueue entry has already been removed from the waitqueue. The bug is that rq_qos_wake_function() is accessing the waitqueue entry AFTER deleting it. Note that autoremove_wake_function() wakes the waiter and THEN deletes the waitqueue entry, which is the proper order. Fix it by swapping the order. We also need to use list_del_init_careful() to match the list_empty_careful() in finish_wait(). Fixes: 38cfb5a45ee0 ("blk-wbt: improve waking of tasks") Cc: stable@vger.kernel.org Signed-off-by: Omar Sandoval Acked-by: Tejun Heo Reviewed-by: Johannes Thumshirn Link: https://lore.kernel.org/r/d3bee2463a67b1ee597211823bf7ad3721c26e41.1729014591.git.osandov@fb.com Signed-off-by: Jens Axboe Signed-off-by: May27May --- block/blk-rq-qos.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c index e83af7bc7591..26dd3e7bd00d 100644 --- a/block/blk-rq-qos.c +++ b/block/blk-rq-qos.c @@ -225,8 +225,8 @@ static int rq_qos_wake_function(struct wait_queue_entry *curr, data->got_token = true; smp_wmb(); - list_del_init(&curr->entry); wake_up_process(data->task); + list_del_init_careful(&curr->entry); return 1; } -- Gitee