From 038fa384f07a8026607828080fef01b11366406c Mon Sep 17 00:00:00 2001 From: Feng Tang Date: Wed, 7 Jun 2023 15:54:33 +0800 Subject: [PATCH 1/2] x86/tsc: Extend watchdog check exemption to 4-Sockets platform commit 7134335fd1ce44cd451be438e268de77a3ae4319 anolis. ANBZ: #26501 commit 233756a640be811efae33763db718fe29753b1e9 upstream. There were reports again that the tsc clocksource on 4 sockets x86 servers was wrongly judged as 'unstable' by 'jiffies' and other watchdogs, and disabled [1][2]. Commit b50db7095fe0 ("x86/tsc: Disable clocksource watchdog for TSC on qualified platorms") was introduce to deal with these false alarms of tsc unstable issues, covering qualified platforms for 2 sockets or smaller ones. And from history of chasing TSC issues, Thomas and Peter only saw real TSC synchronization issue on 8 socket machines. So extend the exemption to 4 sockets to fix the issue. Rui also proposed another way to disable 'jiffies' as clocksource watchdog [3], which can also solve problem in [1]. in an architecture independent way, but can't cure the problem in [2]. whose watchdog is HPET or PMTIMER, while 'jiffies' is mostly used as watchdog in boot phase. 'nr_online_nodes' has known inaccurate problem for cases like platform with cpu-less memory nodes, sub numa cluster enabled, fakenuma, kernel cmdline parameter 'maxcpus=', etc. The harmful case is the 'maxcpus' one which could possibly under estimates the package number, and disable the watchdog, but bright side is it is mostly for debug usage. All these will be addressed in other patches, as discussed in thread [4]. [1]. https://lore.kernel.org/all/9d3bf570-3108-0336-9c52-9bee15767d29@huawei.com/ [2]. https://lore.kernel.org/lkml/06df410c-2177-4671-832f-339cff05b1d9@paulmck-laptop/ [3]. https://lore.kernel.org/all/bd5b97f89ab2887543fc262348d1c7cafcaae536.camel@intel.com/ [4]. https://lore.kernel.org/all/20221021062131.1826810-1-feng.tang@intel.com/ Hygon-SIG: commit 233756a640be upstream x86/tsc: Extend watchdog check exemption to 4-Sockets platform Hygon-SIG: commit 7134335fd1ce x86/tsc: Extend watchdog check exemption to 4-Sockets platform Backport x86/tsc adjust from anolis. Reported-by: Yu Liao Reported-by: Paul E. McKenney Signed-off-by: Feng Tang Signed-off-by: Paul E. McKenney [ Wenhui Fan: amend commit log ] Signed-off-by: Wenhui Fan Cc: hygon-arch@list.openanolis.cn Reviewed-by: Xiaochen Shen Reviewed-by: Guanghui Feng Reviewed-by: Guixin Liu Link: https://gitee.com/anolis/cloud-kernel/pulls/5911 [ Aichun Shi: amend commit log ] Signed-off-by: Aichun Shi --- arch/x86/kernel/tsc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c index 73be982e85f4..a5a5652b9440 100644 --- a/arch/x86/kernel/tsc.c +++ b/arch/x86/kernel/tsc.c @@ -1200,7 +1200,7 @@ static void __init check_system_tsc_reliable(void) if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) && boot_cpu_has(X86_FEATURE_NONSTOP_TSC) && boot_cpu_has(X86_FEATURE_TSC_ADJUST) && - nr_online_nodes <= 2) + nr_online_nodes <= 4) tsc_disable_clocksource_watchdog(); } -- Gitee From 94176e285b75296609e4666d97bee2ab221b66ae Mon Sep 17 00:00:00 2001 From: Feng Tang Date: Mon, 29 Jul 2024 10:12:02 +0800 Subject: [PATCH 2/2] x86/tsc: Use topology_max_packages() to get package number commit fbdf127a436c1a0d3de181d2bd84da1367ef70dd anolis. ANBZ: #26501 commit b4bac279319d3082eb42f074799c7b18ba528c71 upstream. Commit b50db7095fe0 ("x86/tsc: Disable clocksource watchdog for TSC on qualified platorms") was introduced to solve problem that sometimes TSC clocksource is wrongly judged as unstable by watchdog like 'jiffies', HPET, etc. In it, the hardware package number is a key factor for judging whether to disable the watchdog for TSC, and 'nr_online_nodes' was chosen due to, at that time (kernel v5.1x), it is available in early boot phase before registering 'tsc-early' clocksource, where all non-boot CPUs are not brought up yet. Dave and Rui pointed out there are many cases in which 'nr_online_nodes' is cheated and not accurate, like: * SNC (sub-numa cluster) mode enabled * numa emulation (numa=fake=8 etc.) * numa=off * platforms with CPU-less HBM nodes, CPU-less Optane memory nodes. * 'maxcpus=' cmdline setup, where chopped CPUs could be onlined later * 'nr_cpus=', 'possible_cpus=' cmdline setup, where chopped CPUs can not be onlined after boot The SNC case is the most user-visible case, as many CSP (Cloud Service Provider) enable this feature in their server fleets. When SNC3 enabled, a 2 socket machine will appear to have 6 NUMA nodes, and get impacted by the issue in reality. Thomas' recent patchset of refactoring x86 topology code improves topology_max_packages() greatly, by making it more accurate and available in early boot phase, which works well in most of the above cases. The only exceptions are 'nr_cpus=' and 'possible_cpus=' setup, which may under-estimate the package number. As during topology setup, the boot CPU iterates through all enumerated APIC IDs and either accepts or rejects the APIC ID. For accepted IDs, it figures out which bits of the ID map to the package number. It tracks which package numbers have been seen in a bitmap. topology_max_packages() just returns the number of bits set in that bitmap. 'nr_cpus=' and 'possible_cpus=' can cause more APIC IDs to be rejected and can artificially lower the number of bits in the package bitmap and thus topology_max_packages(). This means that, for example, a system with 8 physical packages might reject all the CPUs on 6 of those packages and be left with only 2 packages and 2 bits set in the package bitmap. It needs the TSC watchdog, but would disable it anyway. This isn't ideal, but it only happens for debug-oriented options. This is fixable by tracking the package numbers for rejected CPUs. But it's not worth the trouble for debugging. So use topology_max_packages() to replace nr_online_nodes(). Hygon-SIG: commit b4bac279319d upstream x86/tsc: Use topology_max_packages() to get package number. Hygon-SIG: commit fbdf127a436c x86/tsc: Use topology_max_packages() to get package number Backport x86/tsc adjust from anolis. Reported-by: Dave Hansen Signed-off-by: Feng Tang Signed-off-by: Thomas Gleixner Reviewed-by: Waiman Long [ Wenhui Fan: amend commit log ] Signed-off-by: Wenhui Fan Cc: hygon-arch@list.openanolis.cn Link: https://lore.kernel.org/all/20240729021202.180955-1-feng.tang@intel.com Closes: https://lore.kernel.org/lkml/a4860054-0f16-6513-f121-501048431086@intel.com/ Reviewed-by: Xiaochen Shen Reviewed-by: Guanghui Feng Reviewed-by: Guixin Liu Link: https://gitee.com/anolis/cloud-kernel/pulls/5911 [ Aichun Shi: amend commit log ] Signed-off-by: Aichun Shi --- arch/x86/kernel/tsc.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c index a5a5652b9440..68163264aec6 100644 --- a/arch/x86/kernel/tsc.c +++ b/arch/x86/kernel/tsc.c @@ -27,6 +27,7 @@ #include #include #include +#include #include unsigned int __read_mostly cpu_khz; /* TSC clocks / usec, not used here */ @@ -1192,15 +1193,12 @@ static void __init check_system_tsc_reliable(void) * - TSC which does not stop in C-States * - the TSC_ADJUST register which allows to detect even minimal * modifications - * - not more than two sockets. As the number of sockets cannot be - * evaluated at the early boot stage where this has to be - * invoked, check the number of online memory nodes as a - * fallback solution which is an reasonable estimate. + * - not more than four packages */ if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) && boot_cpu_has(X86_FEATURE_NONSTOP_TSC) && boot_cpu_has(X86_FEATURE_TSC_ADJUST) && - nr_online_nodes <= 4) + topology_max_packages() <= 4) tsc_disable_clocksource_watchdog(); } -- Gitee