From b7847cd34019463883fab9bcffb7583afd60fb93 Mon Sep 17 00:00:00 2001
From: meganz009
Date: Fri, 9 Jun 2023 16:56:36 +0800
Subject: [PATCH 1/5] ARM: enable irq in translation/section permission fault handlers
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

commit 02d7b4ff5ed37112bbc7cb2cd8ee97f43f8c8800 upstream.

Probably happens on all ARM, with
CONFIG_PREEMPT_RT_FULL
CONFIG_DEBUG_ATOMIC_SLEEP

This simple program....

int main() {
   *((char*)0xc0001000) = 0;
};

[ 512.742724] BUG: sleeping function called from invalid context at kernel/rtmutex.c:658
[ 512.743000] in_atomic(): 0, irqs_disabled(): 128, pid: 994, name: a
[ 512.743217] INFO: lockdep is turned off.
[ 512.743360] irq event stamp: 0
[ 512.743482] hardirqs last enabled at (0): [< (null)>] (null)
[ 512.743714] hardirqs last disabled at (0): [] copy_process+0x3b0/0x11c0
[ 512.744013] softirqs last enabled at (0): [] copy_process+0x3b0/0x11c0
[ 512.744303] softirqs last disabled at (0): [< (null)>] (null)
[ 512.744631] [] (unwind_backtrace+0x0/0x104)
[ 512.745001] [] (dump_stack+0x20/0x24)
[ 512.745355] [] (__might_sleep+0x1dc/0x1e0)
[ 512.745717] [] (rt_spin_lock+0x34/0x6c)
[ 512.746073] [] (do_force_sig_info+0x34/0xf0)
[ 512.746457] [] (force_sig_info+0x18/0x1c)
[ 512.746829] [] (__do_user_fault+0x9c/0xd8)
[ 512.747185] [] (do_bad_area+0x7c/0x94)
[ 512.747536] [] (do_sect_fault+0x40/0x48)
[ 512.747898] [] (do_DataAbort+0x40/0xa0)
[ 512.748181] Exception stack(0xecaa1fb0 to 0xecaa1ff8)

0xc0000000 belongs to the kernel address space, so a user task must not
be allowed to access it. In this case the correct result is that the
test program receives a segmentation fault and exits, without the splat
shown above.

The root cause is commit 02fe2845d6a8 ("avoid enabling interrupts in
prefetch/data abort handlers"), which deleted the irq enable block from
the data abort assembly code and moved it into the
page/breakpoint/alignment fault handlers instead.
However, the author did not enable irqs in the translation/section
permission fault handlers. ARM disables irqs when it enters
exception/interrupt mode; if the kernel does not enable them again, they
stay disabled for the whole translation/section permission fault.

We see the above splat because do_force_sig_info is still called with
IRQs off, and that code eventually does a:

        spin_lock_irqsave(&t->sighand->siglock, flags);

As this is architecture independent code, and we've not seen any other
need for other arch to have the siglock converted to raw lock, we can
conclude that we should enable irq for ARM translation/section
permission exception.

Signed-off-by: Yadi.hu
Signed-off-by: Sebastian Andrzej Siewior
---
 arch/arm/mm/fault.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index a9ee0d9dc740..20b0e146de98 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -439,6 +439,9 @@ do_translation_fault(unsigned long addr, unsigned int fsr,
 	if (addr < TASK_SIZE)
 		return do_page_fault(addr, fsr, regs);
 
+	if (interrupts_enabled(regs))
+		local_irq_enable();
+
 	if (user_mode(regs))
 		goto bad_area;
 
@@ -506,6 +509,9 @@ do_translation_fault(unsigned long addr, unsigned int fsr,
 static int
 do_sect_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
 {
+	if (interrupts_enabled(regs))
+		local_irq_enable();
+
 	do_bad_area(addr, fsr, regs);
 	return 0;
 }
--
Gitee


From 124448dcb1ae9c66c70453a2a50876070450bcac Mon Sep 17 00:00:00 2001
From: meganz009
Date: Fri, 9 Jun 2023 16:57:57 +0800
Subject: [PATCH 2/5] genirq: update irq_set_irqchip_state documentation

commit dd9707b2db096f5af75ee3207ac1ced6a8acec2d upstream.

On -rt kernels, the use of migrate_disable()/migrate_enable() is
sufficient to guarantee a task isn't moved to another CPU. Update the
irq_set_irqchip_state() documentation to reflect this.

Signed-off-by: Josh Cartwright
Signed-off-by: Sebastian Andrzej Siewior
---
 kernel/irq/manage.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 9acc2dd43bb4..f2d1dac5e581 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -2770,7 +2770,7 @@ EXPORT_SYMBOL_GPL(irq_get_irqchip_state);
  *	This call sets the internal irqchip state of an interrupt,
  *	depending on the value of @which.
  *
- *	This function should be called with preemption disabled if the
+ *	This function should be called with migration disabled if the
  *	interrupt controller has per-cpu registers.
  */
 int irq_set_irqchip_state(unsigned int irq, enum irqchip_irq_state which,
--
Gitee


From f7e79433843f122277d58d83a4ed18ab3132d6e5 Mon Sep 17 00:00:00 2001
From: meganz009
Date: Fri, 9 Jun 2023 16:58:12 +0800
Subject: [PATCH 3/5] KVM: arm/arm64: downgrade preempt_disable()d region to migrate_disable()

commit b697d50d198d98ced440f6f1474cb505ff3f70a8 upstream.

kvm_arch_vcpu_ioctl_run() disables the use of preemption when updating
the vgic and timer states to prevent the calling task from migrating to
another CPU. It does so to prevent the task from writing to the
incorrect per-CPU GIC distributor registers.

On -rt kernels, it's possible to maintain the same guarantee with the
use of migrate_{disable,enable}(), with the added benefit that the
migrate-disabled region is preemptible. Update
kvm_arch_vcpu_ioctl_run() to do so.

Cc: Christoffer Dall
Reported-by: Manish Jaggi
Signed-off-by: Josh Cartwright
Signed-off-by: Sebastian Andrzej Siewior
---
 virt/kvm/arm/arm.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index a118151c986c..8b8947c42b10 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -723,7 +723,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		 * involves poking the GIC, which must be done in a
 		 * non-preemptible context.
 		 */
-		preempt_disable();
+		migrate_disable();
 
 		kvm_pmu_flush_hwstate(vcpu);
 
@@ -772,7 +772,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 			kvm_timer_sync_hwstate(vcpu);
 			kvm_vgic_sync_hwstate(vcpu);
 			local_irq_enable();
-			preempt_enable();
+			migrate_enable();
 			continue;
 		}
 
@@ -850,7 +850,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		/* Exit types that need handling before we can be preempted */
 		handle_exit_early(vcpu, run, ret);
 
-		preempt_enable();
+		migrate_enable();
 
 		ret = handle_exit(vcpu, run, ret);
 	}
--
Gitee


From abef5a9f522b128d0b2cc7e8c9487e36e22256d1 Mon Sep 17 00:00:00 2001
From: meganz009
Date: Fri, 9 Jun 2023 16:58:24 +0800
Subject: [PATCH 4/5] arm64: fpsimd: use preempt_disable() in addition to local_bh_disable()

commit 3fb1b55dbb6832214ae31cb9f5ed08b8360162e7 upstream.

In v4.16-RT I noticed a number of warnings from task_fpsimd_load(). The
code disables BH and expects that it is not preemptible. On -RT the
task remains preemptible but stays on the same CPU. This may corrupt the
content of the SIMD registers if the task is preempted while saving or
restoring those registers.

Add preempt_disable()/preempt_enable() to enforce the required
semantics on -RT.

Signed-off-by: Sebastian Andrzej Siewior
---
 arch/arm64/kernel/fpsimd.c | 31 +++++++++++++++++++++++++++++--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 58c53bc96928..71252cd8b594 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -159,6 +159,16 @@ static void sve_free(struct task_struct *task)
 	__sve_free(task);
 }
 
+static void *sve_free_atomic(struct task_struct *task)
+{
+	void *sve_state = task->thread.sve_state;
+
+	WARN_ON(test_tsk_thread_flag(task, TIF_SVE));
+
+	task->thread.sve_state = NULL;
+	return sve_state;
+}
+
 /*
  * TIF_SVE controls whether a task can use SVE without trapping while
  * in userspace, and also the way a task's FPSIMD/SVE state is stored
@@ -547,6 +557,7 @@ int sve_set_vector_length(struct task_struct *task,
 	 * non-SVE thread.
 	 */
 	if (task == current) {
+		preempt_disable();
 		local_bh_disable();
 
 		fpsimd_save();
@@ -557,8 +568,10 @@ int sve_set_vector_length(struct task_struct *task,
 	if (test_and_clear_tsk_thread_flag(task, TIF_SVE))
 		sve_to_fpsimd(task);
 
-	if (task == current)
+	if (task == current) {
 		local_bh_enable();
+		preempt_enable();
+	}
 
 	/*
 	 * Force reallocation of task SVE state to the correct size
@@ -813,6 +826,7 @@ asmlinkage void do_sve_acc(unsigned int esr, struct pt_regs *regs)
 
 	sve_alloc(current);
 
+	preempt_disable();
 	local_bh_disable();
 
 	fpsimd_save();
@@ -826,6 +840,7 @@ asmlinkage void do_sve_acc(unsigned int esr, struct pt_regs *regs)
 		WARN_ON(1); /* SVE access shouldn't have trapped */
 
 	local_bh_enable();
+	preempt_enable();
 }
 
 /*
@@ -892,10 +907,12 @@ void fpsimd_thread_switch(struct task_struct *next)
 void fpsimd_flush_thread(void)
 {
 	int vl, supported_vl;
+	void *mem = NULL;
 
 	if (!system_supports_fpsimd())
 		return;
 
+	preempt_disable();
 	local_bh_disable();
 
 	memset(&current->thread.uw.fpsimd_state, 0,
@@ -904,7 +921,7 @@ void fpsimd_flush_thread(void)
 
 	if (system_supports_sve()) {
 		clear_thread_flag(TIF_SVE);
-		sve_free(current);
+		mem = sve_free_atomic(current);
 
 		/*
 		 * Reset the task vector length as required.
@@ -940,6 +957,8 @@ void fpsimd_flush_thread(void)
 	set_thread_flag(TIF_FOREIGN_FPSTATE);
 
 	local_bh_enable();
+	preempt_enable();
+	kfree(mem);
 }
 
 /*
@@ -951,9 +970,11 @@ void fpsimd_preserve_current_state(void)
 	if (!system_supports_fpsimd())
 		return;
 
+	preempt_disable();
 	local_bh_disable();
 	fpsimd_save();
 	local_bh_enable();
+	preempt_enable();
 }
 
 /*
@@ -1011,6 +1032,7 @@ void fpsimd_restore_current_state(void)
 	if (!system_supports_fpsimd())
 		return;
 
+	preempt_disable();
 	local_bh_disable();
 
 	if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE)) {
@@ -1019,6 +1041,7 @@ void fpsimd_restore_current_state(void)
 	}
 
 	local_bh_enable();
+	preempt_enable();
 }
 
 /*
@@ -1031,6 +1054,7 @@ void fpsimd_update_current_state(struct user_fpsimd_state const *state)
 	if (!system_supports_fpsimd())
 		return;
 
+	preempt_disable();
 	local_bh_disable();
 
 	current->thread.uw.fpsimd_state = *state;
@@ -1043,6 +1067,7 @@ void fpsimd_update_current_state(struct user_fpsimd_state const *state)
 	clear_thread_flag(TIF_FOREIGN_FPSTATE);
 
 	local_bh_enable();
+	preempt_enable();
 }
 
 /*
@@ -1088,6 +1113,7 @@ void kernel_neon_begin(void)
 
 	BUG_ON(!may_use_simd());
 
+	preempt_disable();
 	local_bh_disable();
 
 	__this_cpu_write(kernel_neon_busy, true);
@@ -1101,6 +1127,7 @@ void kernel_neon_begin(void)
 	preempt_disable();
 
 	local_bh_enable();
+	preempt_enable();
 }
 EXPORT_SYMBOL(kernel_neon_begin);
 
--
Gitee


From faf734afca252b9c70f91facffa64c2699b56f39 Mon Sep 17 00:00:00 2001
From: meganz009
Date: Fri, 9 Jun 2023 16:58:52 +0800
Subject: [PATCH 5/5] sysfs: Add /sys/kernel/realtime entry

commit 1c3cb84e399e50f312537ee0399deefc5557d1ea upstream.

Add a /sys/kernel entry to indicate that the kernel is a
realtime kernel.

Clark says that he needs this for udev rules: udev needs to evaluate
whether it is running on a PREEMPT_RT kernel a few thousand times, and
parsing uname output is too slow or so.

Are there better solutions? Should it exist and return 0 on !-rt?

Signed-off-by: Clark Williams
Signed-off-by: Peter Zijlstra
---
 kernel/ksysfs.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/kernel/ksysfs.c b/kernel/ksysfs.c
index 46ba853656f6..9a23632b6294 100644
--- a/kernel/ksysfs.c
+++ b/kernel/ksysfs.c
@@ -140,6 +140,15 @@ KERNEL_ATTR_RO(vmcoreinfo);
 
 #endif /* CONFIG_CRASH_CORE */
 
+#if defined(CONFIG_PREEMPT_RT_FULL)
+static ssize_t realtime_show(struct kobject *kobj,
+			     struct kobj_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%d\n", 1);
+}
+KERNEL_ATTR_RO(realtime);
+#endif
+
 /* whether file capabilities are enabled */
 static ssize_t fscaps_show(struct kobject *kobj,
 				  struct kobj_attribute *attr, char *buf)
@@ -230,6 +239,9 @@ static struct attribute * kernel_attrs[] = {
 #ifndef CONFIG_TINY_RCU
 	&rcu_expedited_attr.attr,
 	&rcu_normal_attr.attr,
+#endif
+#ifdef CONFIG_PREEMPT_RT_FULL
+	&realtime_attr.attr,
 #endif
 	NULL
 };
--
Gitee
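
Reviewer note on patch 5/5: as a rough illustration of how a userspace tool might consume the new attribute, here is a minimal sketch. The helper name kernel_is_realtime() is hypothetical and not part of this series. It assumes the attribute file is simply absent on non-RT kernels, which matches the #ifdef in the patch, and that it contains "1\n" when present.

```c
#include <stdio.h>

/*
 * Hypothetical userspace helper (not part of this patch series).
 * Returns 1 if the attribute file at `path` can be opened and its
 * first character is '1', and 0 otherwise. A missing file is treated
 * as "not a realtime kernel", since the patch only creates the
 * attribute when CONFIG_PREEMPT_RT_FULL is set.
 */
int kernel_is_realtime(const char *path)
{
	FILE *f = fopen(path, "r");
	int c;

	if (!f)
		return 0;	/* attribute absent or sysfs not mounted */

	c = fgetc(f);		/* only the first character matters */
	fclose(f);

	return c == '1';
}
```

A caller would typically pass "/sys/kernel/realtime" as the path. Taking the path as a parameter keeps the helper testable against an ordinary file.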