From 503d047d651501bf0ec8bfcd1fc822bca8f3336d Mon Sep 17 00:00:00 2001
From: yjmstr
Date: Tue, 5 Sep 2023 21:13:50 +0800
Subject: [PATCH 1/5] initial commit for article 5:linux

---
 .../20230901-riscv-isa-discovery-5-linux.md | 814 ++++++++++++++++++
 1 file changed, 814 insertions(+)
 create mode 100644 articles/20230901-riscv-isa-discovery-5-linux.md

diff --git a/articles/20230901-riscv-isa-discovery-5-linux.md b/articles/20230901-riscv-isa-discovery-5-linux.md
new file mode 100644
index 0000000..4ebad4e
--- /dev/null
+++ b/articles/20230901-riscv-isa-discovery-5-linux.md
@@ -0,0 +1,814 @@
+> Author: YJMSTR [jay1273062855@outlook.com](mailto:jay1273062855@outlook.com)
+> Date: 2023/08/29
+> Revisor: Bin Meng, Falcon
+> Project: [RISC-V Linux 内核剖析](https://gitee.com/tinylab/riscv-linux)
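+> Sponsor: ISCAS
+
+# Linux RISC-V ISA Extension Support
+
+## Preface
+
+This is the fifth article in the series on software and hardware support for RISC-V extensions. It describes how the Linux kernel detects and supports RISC-V extensions. Before reading it, we recommend the first article in the series, [Current RISC-V ISA Extension Categories and Detection Methods][003], which explains how RISC-V ISA extensions are currently named, classified, and detected in hardware.
+
+RISC-V originally used a CSR named misa to detect ISA extensions. misa allocates 26 bits, each indicating whether a given extension or privilege mode is enabled, but as the number of extensions grew, misa ran out of bits. At that point, RISC-V introduced a CSR named mconfigptr, which holds an address pointing to a data structure that describes the hardware; firmware can use this data structure to generate SMBIOS, device tree, or ACPI data.
+
+The ISA information stored in SMBIOS covers only what misa contains, so it is still insufficient. The device tree describes the combination of ISA extensions with an ISA string, and ACPI contains an RHCT (RISC-V Hart Capabilities Table), which likewise includes an ISA string node.
+
+## Getting the Linux source
+
+The Linux kernel repository is large, so the steps below use `git fetch`, which can be rerun after an interruption, instead of a `git clone` that a network problem would abort.
+
+```sh
+$ mkdir linux-kernel
+$ cd linux-kernel
+$ git init
+$ git fetch https://gitee.com/mirrors/linux_old1.git
+$ git checkout FETCH_HEAD
+$ git remote add origin https://gitee.com/mirrors/linux_old1.git
+$ git pull origin
+```
+
+The output of `git checkout` reads `HEAD is now at 2dde18cd1d8f Linux 6.5`; this article analyzes that version. To get a specific version of the kernel source, download it from [kernel.org][001], or run `git checkout` to switch to the desired version after `git pull origin`.
+
+## Building and booting the kernel
+
+Building the kernel requires a RISC-V cross-compilation toolchain, which earlier articles in this series covered, so the details are not repeated here:
+
+```sh
+$ make ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- defconfig
+$ make ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- -j $(nproc)
+```
+
+Embedded systems commonly build their root filesystem with BusyBox. First download and build BusyBox:
+
+```sh
+$ git clone https://gitee.com/mirrors/busyboxsource
+$ cd busyboxsource
+$ export CROSS_COMPILE=riscv64-linux-gnu-
+$ make defconfig
+$ make menuconfig
+# Enable "Build static binary (no shared libs)" under Settings --> Build Options here
+$ make -j $(nproc)
+$ make install
+```
+
+Create the filesystem image and a new startup script:
+
+```sh
+$ cd ~
+$ qemu-img create rootfs.img 1g
+$ mkfs.ext4 rootfs.img
+$ mkdir rootfs
+$ sudo mount -o loop rootfs.img rootfs
+$ cd rootfs
+$ sudo cp -r ../busyboxsource/_install/* .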
+$ sudo mkdir proc sys dev etc etc/init.d
+$ cd etc/init.d/
+$ sudo touch rcS
+$ sudo vi rcS
+```
+
+Edit the startup script rcS so that it contains:
+
+```sh
+#!/bin/sh
+mount -t proc none /proc
+mount -t sysfs none /sys
+/sbin/mdev -s
+```
+
+Then make the script executable and unmount the image:
+
+```sh
+$ sudo chmod +x rcS
+$ cd ~
+$ sudo umount rootfs
+```
+
+Next, boot the kernel directly:
+
+```sh
+$ qemu-system-riscv64 -M virt -m 256M -nographic -kernel linux-kernel/arch/riscv/boot/Image -drive file=rootfs.img,format=raw,id=hd0 -device virtio-blk-device,drive=hd0 -append "root=/dev/vda rw console=ttyS0"
+```
+
+This produces the following Linux boot log:
+
+```sh
+$ qemu-system-riscv64 -M virt -m 256M -nographic -kernel linux-kernel/arch/riscv/boot/Image -drive file=rootfs.img,format=raw,id=hd0 -device virtio-blk-device,drive=hd0 -append "root=/dev/vda rw console=ttyS0"
+
+OpenSBI v1.3.1
+   ____                    _____ ____ _____
+  / __ \                  / ____|  _ \_   _|
+ | |  | |_ __   ___ _ __ | (___ | |_) || |
+ | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
+ | |__| | |_) |  __/ | | |____) | |_) || |_
+  \____/| .__/ \___|_| |_|_____/|___/_____|
+        | |
+        |_|
+
+Platform Name : riscv-virtio,qemu
+Platform Features : medeleg
+Platform HART Count : 1
+Platform IPI Device : aclint-mswi
+Platform Timer Device : aclint-mtimer @ 10000000Hz
+Platform Console Device : uart8250
+Platform HSM Device : ---
+Platform PMU Device : ---
+Platform Reboot Device : sifive_test
+Platform Shutdown Device : sifive_test
+Platform Suspend Device : ---
+Platform CPPC Device : ---
+Firmware Base : 0x80000000
+Firmware Size : 194 KB
+Firmware RW Offset : 0x20000
+Firmware RW Size : 66 KB
+Firmware Heap Offset : 0x28000
+Firmware Heap Size : 34 KB (total), 2 KB (reserved), 9 KB (used), 22 KB (free)
+Firmware Scratch Size : 4096 B (total), 760 B (used), 3336 B (free)
+Runtime SBI Version : 1.0
+
+Domain0 Name : root
+Domain0 Boot HART : 0
+Domain0 HARTs : 0*
+Domain0 Region00 : 0x0000000002000000-0x000000000200ffff M: (I,R,W) S/U: ()
+Domain0 Region01 : 0x0000000080000000-0x000000008001ffff M: (R,X) S/U: ()
+Domain0 Region02 : 0x0000000080020000-0x000000008003ffff M: (R,W) S/U: ()
+Domain0 Region03 : 0x0000000000000000-0xffffffffffffffff M: (R,W,X) S/U: (R,W,X)
+Domain0 Next Address : 0x0000000080200000
+Domain0 Next Arg1 : 0x000000008fe00000
+Domain0 Next Mode : S-mode
+Domain0 SysReset : yes
+Domain0 SysSuspend : yes
+
+Boot HART ID : 0
+Boot HART Domain : root
+Boot HART Priv Version : v1.12
+Boot HART Base ISA : rv64imafdch
+Boot HART ISA Extensions : time,sstc
+Boot HART PMP Count : 16
+Boot HART PMP Granularity : 4
+Boot HART PMP Address Bits: 54
+Boot HART MHPM Count : 16
+Boot HART MIDELEG : 0x0000000000001666
+Boot HART MEDELEG : 0x0000000000f0b509
+[    0.000000] Linux version 6.5.0 (mint@linux-lab-host) (riscv64-linux-gnu-gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #1 SMP Wed Aug 30 17:20:35 CST 2023
+[    0.000000] random: crng init done
+[    0.000000] Machine model: riscv-virtio,qemu
+[    0.000000] SBI specification v1.0 detected
+[    0.000000] SBI implementation ID=0x1 Version=0x10003
+[    0.000000] SBI TIME extension detected
+[    0.000000] SBI IPI extension detected
+[    0.000000] SBI RFENCE extension detected
+[    0.000000] SBI SRST extension detected
+[    0.000000] efi: UEFI not found.
+[    0.000000] OF: reserved mem: 0x0000000080000000..0x000000008001ffff (128 KiB) nomap non-reusable mmode_resv0@80000000
+[    0.000000] OF: reserved mem: 0x0000000080020000..0x000000008003ffff (128 KiB) nomap non-reusable mmode_resv1@80020000
+[    0.000000] Zone ranges:
+[    0.000000]   DMA32    [mem 0x0000000080000000-0x000000008fffffff]
+[    0.000000]   Normal   empty
+[    0.000000] Movable zone start for each node
+[    0.000000] Early memory node ranges
+[    0.000000]   node   0: [mem 0x0000000080000000-0x000000008003ffff]
+[    0.000000]   node   0: [mem 0x0000000080040000-0x000000008fffffff]
+[    0.000000] Initmem setup node 0 [mem 0x0000000080000000-0x000000008fffffff]
+[    0.000000] SBI HSM extension detected
+[    0.000000] riscv: base ISA extensions acdfhim
+[    0.000000] riscv: ELF capabilities acdfim
+[    0.000000] percpu: Embedded 19 pages/cpu s40888 r8192 d28744 u77824
+[    0.000000] Kernel command line: root=/dev/vda rw console=ttyS0
+[    0.000000] Dentry cache hash table entries: 32768 (order: 6, 262144 bytes, linear)
+[    0.000000] Inode-cache hash table entries: 16384 (order: 5, 131072 bytes, linear)
+[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 64512
+[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
+[    0.000000] Virtual kernel memory layout:
+[    0.000000]       fixmap : 0xff1bfffffea00000 - 0xff1bffffff000000   (6144 kB)
+[    0.000000]       pci io : 0xff1bffffff000000 - 0xff1c000000000000   (  16 MB)
+[    0.000000]      vmemmap : 0xff1c000000000000 - 0xff20000000000000   (1024 TB)
+[    0.000000]      vmalloc : 0xff20000000000000 - 0xff60000000000000   (16384 TB)
+[    0.000000]      modules : 0xffffffff0157b000 - 0xffffffff80000000   (2026 MB)
+[    0.000000]       lowmem : 0xff60000000000000 - 0xff60000010000000   ( 256 MB)
+[    0.000000]       kernel : 0xffffffff80000000 - 0xffffffffffffffff   (2047 MB)
+[    0.000000] Memory: 218332K/262144K available (8728K kernel code, 4974K rwdata, 4096K rodata, 2200K init, 482K bss, 43812K reserved, 0K cma-reserved)
+[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
+[    0.000000] rcu: Hierarchical RCU implementation.
+[    0.000000] rcu: RCU restricting CPUs from NR_CPUS=64 to nr_cpu_ids=1.
+[    0.000000] rcu: RCU debug extended QS entry/exit.
+[    0.000000] Tracing variant of Tasks RCU enabled.
+[    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
+[    0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
+[    0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
+[    0.000000] riscv-intc: 64 local interrupts mapped
+[    0.000000] plic: plic@c000000: mapped 95 interrupts with 1 handlers for 2 contexts.
+[    0.000000] riscv: providing IPIs using SBI IPI extension
+[    0.000000] rcu: srcu_init: Setting srcu_struct sizes based on contention.
+[    0.000000] clocksource: riscv_clocksource: mask: 0xffffffffffffffff max_cycles: 0x24e6a1710, max_idle_ns: 440795202120 ns
+[    0.000073] sched_clock: 64 bits at 10MHz, resolution 100ns, wraps every 4398046511100ns
+[    0.000182] riscv-timer: Timer interrupt in S-mode is available via sstc extension
+[    0.008216] Console: colour dummy device 80x25
+[    0.009505] Calibrating delay loop (skipped), value calculated using timer frequency.. 20.00 BogoMIPS (lpj=40000)
+[    0.009635] pid_max: default: 32768 minimum: 301
+[    0.010714] LSM: initializing lsm=capability,integrity
+[    0.012870] Mount-cache hash table entries: 512 (order: 0, 4096 bytes, linear)
+[    0.012948] Mountpoint-cache hash table entries: 512 (order: 0, 4096 bytes, linear)
+[    0.040252] RCU Tasks Trace: Setting shift to 0 and lim to 1 rcu_task_cb_adjust=1.
+[    0.040655] riscv: ELF compat mode supported
+[    0.041099] ASID allocator using 16 bits (65536 entries)
+[    0.042073] rcu: Hierarchical SRCU implementation.
+[    0.042107] rcu: Max phase no-delay instances is 1000.
+[    0.044384] EFI services will not be available.
+[    0.045645] smp: Bringing up secondary CPUs ...
+[    0.046625] smp: Brought up 1 node, 1 CPU
+[    0.057137] devtmpfs: initialized
+[    0.064078] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
+[    0.064309] futex hash table entries: 256 (order: 2, 16384 bytes, linear)
+[    0.066112] pinctrl core: initialized pinctrl subsystem
+[    0.072724] NET: Registered PF_NETLINK/PF_ROUTE protocol family
+[    0.080456] DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
+[    0.080750] DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
+[    0.081016] audit: initializing netlink subsys (disabled)
+[    0.084218] thermal_sys: Registered thermal governor 'step_wise'
+[    0.084758] cpuidle: using governor menu
+[    0.085806] audit: type=2000 audit(0.040:1): state=initialized audit_enabled=0 res=1
+[    0.099344] HugeTLB: registered 2.00 MiB page size, pre-allocated 0 pages
+[    0.099384] HugeTLB: 28 KiB vmemmap can be freed for a 2.00 MiB page
+[    0.103034] ACPI: Interpreter disabled.
+[    0.104629] iommu: Default domain type: Translated
+[    0.104662] iommu: DMA domain TLB invalidation policy: strict mode
+[    0.106649] SCSI subsystem initialized
+[    0.108322] usbcore: registered new interface driver usbfs
+[    0.108532] usbcore: registered new interface driver hub
+[    0.108678] usbcore: registered new device driver usb
+[    0.119431] vgaarb: loaded
+[    0.139122] clocksource: Switched to clocksource riscv_clocksource
+[    0.141072] pnp: PnP ACPI: disabled
+[    0.158856] NET: Registered PF_INET protocol family
+[    0.159756] IP idents hash table entries: 4096 (order: 3, 32768 bytes, linear)
+[    0.164351] tcp_listen_portaddr_hash hash table entries: 128 (order: 0, 4096 bytes, linear)
+[    0.164450] Table-perturb hash table entries: 65536 (order: 6, 262144 bytes, linear)
+[    0.164501] TCP established hash table entries: 2048 (order: 2, 16384 bytes, linear)
+[    0.164727] TCP bind hash table entries: 2048 (order: 5, 131072 bytes, linear)
+[    0.164988] TCP: Hash tables configured (established 2048 bind 2048)
+[    0.165942] UDP hash table entries: 256 (order: 2, 24576 bytes, linear)
+[    0.166364] UDP-Lite hash table entries: 256 (order: 2, 24576 bytes, linear)
+[    0.167395] NET: Registered PF_UNIX/PF_LOCAL protocol family
+[    0.170224] RPC: Registered named UNIX socket transport module.
+[    0.170280] RPC: Registered udp transport module.
+[    0.170291] RPC: Registered tcp transport module.
+[    0.170305] RPC: Registered tcp-with-tls transport module.
+[    0.170315] RPC: Registered tcp NFSv4.1 backchannel transport module.
+[    0.170463] PCI: CLS 0 bytes, default 64
+[    0.177633] workingset: timestamp_bits=46 max_order=16 bucket_order=0
+[    0.180895] NFS: Registering the id_resolver key type
+[    0.181717] Key type id_resolver registered
+[    0.181750] Key type id_legacy registered
+[    0.181976] nfs4filelayout_init: NFSv4 File Layout Driver Registering...
+[    0.182049] nfs4flexfilelayout_init: NFSv4 Flexfile Layout Driver Registering...
+[    0.182691] 9p: Installing v9fs 9p2000 file system support
+[    0.184022] NET: Registered PF_ALG protocol family
+[    0.184294] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 246)
+[    0.184406] io scheduler mq-deadline registered
+[    0.184462] io scheduler kyber registered
+[    0.184546] io scheduler bfq registered
+[    0.187615] pci-host-generic 30000000.pci: host bridge /soc/pci@30000000 ranges:
+[    0.188285] pci-host-generic 30000000.pci: IO 0x0003000000..0x000300ffff -> 0x0000000000
+[    0.188845] pci-host-generic 30000000.pci: MEM 0x0040000000..0x007fffffff -> 0x0040000000
+[    0.188906] pci-host-generic 30000000.pci: MEM 0x0400000000..0x07ffffffff -> 0x0400000000
+[    0.189388] pci-host-generic 30000000.pci: Memory resource size exceeds max for 32 bits
+[    0.189752] pci-host-generic 30000000.pci: ECAM at [mem 0x30000000-0x3fffffff] for [bus 00-ff]
+[    0.191528] pci-host-generic 30000000.pci: PCI host bridge to bus 0000:00
+[    0.191752] pci_bus 0000:00: root bus resource [bus 00-ff]
+[    0.191831] pci_bus 0000:00: root bus resource [io 0x0000-0xffff]
+[    0.191884] pci_bus 0000:00: root bus resource [mem 0x40000000-0x7fffffff]
+[    0.191897] pci_bus 0000:00: root bus resource [mem 0x400000000-0x7ffffffff]
+[    0.193325] pci 0000:00:00.0: [1b36:0008] type 00 class 0x060000
+[    0.287202] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
+[    0.297113] printk: console [ttyS0] disabled
+[    0.300265] 10000000.serial: ttyS0 at MMIO 0x10000000 (irq = 12, base_baud = 230400) is a 16550A
+[    0.301524] printk: console [ttyS0] enabled
+[    0.323403] SuperH (H)SCI(F) driver initialized
+[    0.343215] loop: module loaded
+[    0.344307] virtio_blk virtio0: 1/0/0 default/read/poll queues
+[    0.348817] virtio_blk virtio0: [vda] 2097152 512-byte logical blocks (1.07 GB/1.00 GiB)
+[    0.384041] e1000e: Intel(R) PRO/1000 Network Driver
+[    0.384260] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
+[    0.387880] usbcore: registered new interface driver uas
+[    0.388206] usbcore: registered new interface driver usb-storage
+[    0.389412] mousedev: PS/2 mouse device common for all mice
+[    0.393353] goldfish_rtc 101000.rtc: registered as rtc0
+[    0.394317] goldfish_rtc 101000.rtc: setting system clock to 2023-09-02T07:35:32 UTC (1693640132)
+[    0.398165] syscon-poweroff poweroff: pm_power_off already claimed for sbi_srst_power_off
+[    0.399069] syscon-poweroff: probe of poweroff failed with error -16
+[    0.401754] sdhci: Secure Digital Host Controller Interface driver
+[    0.401947] sdhci: Copyright(c) Pierre Ossman
+[    0.402639] sdhci-pltfm: SDHCI platform and OF driver helper
+[    0.403484] usbcore: registered new interface driver usbhid
+[    0.403664] usbhid: USB HID core driver
+[    0.404356] riscv-pmu-sbi: SBI PMU extension is available
+[    0.405010] riscv-pmu-sbi: 16 firmware and 18 hardware counters
+[    0.405216] riscv-pmu-sbi: Perf sampling/filtering is not supported as sscof extension is not available
+[    0.410466] NET: Registered PF_INET6 protocol family
+[    0.418128] Segment Routing with IPv6
+[    0.418632] In-situ OAM (IOAM) with IPv6
+[    0.419337] sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
+[    0.422999] NET: Registered PF_PACKET protocol family
+[    0.424737] 9pnet: Installing 9P2000 support
+[    0.425341] Key type dns_resolver registered
+[    0.466679] debug_vm_pgtable: [debug_vm_pgtable ]: Validating architecture page table helpers
+[    0.476006] clk: Disabling unused clocks
+[    0.569745] EXT4-fs (vda): recovery complete
+[    0.572787] EXT4-fs (vda): mounted filesystem b1c2c62f-2f6d-4da5-af97-58c3648c79f4 r/w with ordered data mode. Quota mode: disabled.
+[    0.573412] VFS: Mounted root (ext4 filesystem) on device 254:0.
+[    0.576056] devtmpfs: mounted
+[    0.618132] Freeing unused kernel image (initmem) memory: 2200K
+[    0.618994] Run /sbin/init as init process
+
+Please press Enter to activate this console.
+
+```
+
+## Source code analysis
+
+### base ISA extensions & ELF capabilities
+
+The Linux kernel boot log contains the following two lines:
+
+```sh
+[    0.000000] riscv: base ISA extensions acdfhim
+[    0.000000] riscv: ELF capabilities acdfim
+```
+
+Searching the source for the keyword `base ISA extensions` shows that the functions handling ISA extension information live in `arch/riscv/kernel/cpufeature.c`. This code was introduced in [commit 6bcff51][006]. The riscv_isa bitmap represents the intersection of the ISA extensions supported by all CPUs in the machine, whereas elf_hwcap represents only the intersection of the ISA extensions that are relevant to user space.
+
+The function is fairly long, so it is analyzed below through the `//` comments added to it:
+
+```c
+/* arch/riscv/kernel/cpufeature.c:102 */
+
+void __init riscv_fill_hwcap(void)
+{
+	// hwcap stands for "hardware capability"
+	struct device_node *node;
+	const char *isa;
+	char print_str[NUM_ALPHA_EXTS + 1];
+	int i, j, rc;
+	unsigned long isa2hwcap[26] = {0};
+	struct acpi_table_header *rhct;
+	acpi_status status;
+	unsigned int cpu;
+	// COMPAT_HWCAP_ISA_? == (1 << (ASCII code of the letter ? - 'A'))
+	isa2hwcap['i' - 'a'] = COMPAT_HWCAP_ISA_I;
+	isa2hwcap['m' - 'a'] = COMPAT_HWCAP_ISA_M;
+	isa2hwcap['a' - 'a'] = COMPAT_HWCAP_ISA_A;
+	isa2hwcap['f' - 'a'] = COMPAT_HWCAP_ISA_F;
+	isa2hwcap['d' - 'a'] = COMPAT_HWCAP_ISA_D;
+	isa2hwcap['c' - 'a'] = COMPAT_HWCAP_ISA_C;
+	isa2hwcap['v' - 'a'] = COMPAT_HWCAP_ISA_V;
+
+	elf_hwcap = 0;
+
+	// Zero the riscv_isa bitmap
+	bitmap_zero(riscv_isa, RISCV_ISA_EXT_MAX);
+
+	// If ACPI is enabled, fetch the RHCT table and return if that fails
+	if (!acpi_disabled) {
+		status = acpi_get_table(ACPI_SIG_RHCT, 0, &rhct);
+		if (ACPI_FAILURE(status))
+			return;
+	}
+
+	for_each_possible_cpu(cpu) {
+		// struct riscv_isainfo contains a 64-bit bitmap named isa
+		struct riscv_isainfo *isainfo = &hart_isa[cpu];
+		// hwcap of the current CPU
+		unsigned long this_hwcap = 0;
+
+		// Without ACPI, read the ISA information from the device tree
+		if (acpi_disabled) {
+			// Get the device tree node
+			node = of_cpu_device_node_get(cpu);
+			if (!node) {
+				pr_warn("Unable to find cpu node\n");
+				continue;
+			}
+			// Read the ISA string from the device tree
+			rc = of_property_read_string(node, "riscv,isa", &isa);
+			of_node_put(node);
+			if (rc) {
+				pr_warn("Unable to find \"riscv,isa\" devicetree entry\n");
+				continue;
+			}
+		} else {
+			// With ACPI enabled, get the ISA string through ACPI
+			rc = acpi_get_riscv_isa(rhct, cpu, &isa);
+			if (rc < 0) {
+				pr_warn("Unable to get ISA for the hart - %d\n", cpu);
+				continue;
+			}
+		}
+
+		/*
+		 * For all possible cpus, we have already validated in
+		 * the boot process that they at least contain "rv" and
+		 * whichever of "32"/"64" this kernel supports, and so this
+		 * section can be skipped.
+		 */
+		// The ISA string always starts with "rv" plus the bit width, 4 characters in total, so they are skipped
+		isa += 4;
+
+		while (*isa) {
+			const char *ext = isa++;
+			const char *ext_end = isa;
+			// ext_long indicates whether this is a multi-letter extension
+			bool ext_long = false, ext_err = false;
+
+			// Walk the ISA string and examine each extension
+			switch (*ext) {
+			case 's':
+				/*
+				 * Workaround for invalid single-letter 's' & 'u'(QEMU).
+				 * No need to set the bit in riscv_isa as 's' & 'u' are
+				 * not valid ISA extensions. It works until multi-letter
+				 * extension starting with "Su" appears.
+				 */
+
+				if (ext[-1] != '_' && ext[1] == 'u') {
+					++isa;
+					ext_err = true;
+					break;
+				}
+				fallthrough;
+			case 'S':
+			case 'x':
+			case 'X':
+			case 'z':
+			case 'Z':
+				/*
+				 * Before attempting to parse the extension itself, we find its end.
+				 * As multi-letter extensions must be split from other multi-letter
+				 * extensions with an "_", the end of a multi-letter extension will
+				 * either be the null character or the "_" at the start of the next
+				 * multi-letter extension.
+				 *
+				 * Next, as the extensions version is currently ignored, we
+				 * eliminate that portion. This is done by parsing backwards from
+				 * the end of the extension, removing any numbers. This may be a
+				 * major or minor number however, so the process is repeated if a
+				 * minor number was found.
+				 *
+				 * ext_end is intended to represent the first character *after* the
+				 * name portion of an extension, but will be decremented to the last
+				 * character itself while eliminating the extensions version number.
+				 * A simple re-increment solves this problem.
+				 */
+				ext_long = true;
+				for (; *isa && *isa != '_'; ++isa)
+					if (unlikely(!isalnum(*isa)))
+						ext_err = true;
+
+				ext_end = isa;
+				if (unlikely(ext_err))
+					break;
+
+				if (!isdigit(ext_end[-1]))
+					break;
+
+				while (isdigit(*--ext_end))
+					;
+
+				if (tolower(ext_end[0]) != 'p' || !isdigit(ext_end[-1])) {
+					++ext_end;
+					break;
+				}
+
+				while (isdigit(*--ext_end))
+					;
+
+				++ext_end;
+				break;
+			default:
+				/*
+				 * Things are a little easier for single-letter extensions, as they
+				 * are parsed forwards.
+				 *
+				 * After checking that our starting position is valid, we need to
+				 * ensure that, when isa was incremented at the start of the loop,
+				 * that it arrived at the start of the next extension.
+				 *
+				 * If we are already on a non-digit, there is nothing to do. Either
+				 * we have a multi-letter extension's _, or the start of an
+				 * extension.
+				 *
+				 * Otherwise we have found the current extension's major version
+				 * number. Parse past it, and a subsequent p/minor version number
+				 * if present. The `p` extension must not appear immediately after
+				 * a number, so there is no fear of missing it.
+				 *
+				 */
+				if (unlikely(!isalpha(*ext))) {
+					ext_err = true;
+					break;
+				}
+
+				if (!isdigit(*isa))
+					break;
+
+				while (isdigit(*++isa))
+					;
+
+				if (tolower(*isa) != 'p')
+					break;
+
+				if (!isdigit(*++isa)) {
+					--isa;
+					break;
+				}
+
+				while (isdigit(*++isa))
+					;
+
+				break;
+			}
+
+			/*
+			 * The parser expects that at the start of an iteration isa points to the
+			 * first character of the next extension. As we stop parsing an extension
+			 * on meeting a non-alphanumeric character, an extra increment is needed
+			 * where the succeeding extension is a multi-letter prefixed with an "_".
+			 */
+			if (*isa == '_')
+				++isa;
+
+#define SET_ISA_EXT_MAP(name, bit)						\
+			do {							\
+				if ((ext_end - ext == sizeof(name) - 1) &&	\
+				     !strncasecmp(ext, name, sizeof(name) - 1) &&	\
+				     riscv_isa_extension_check(bit))		\
+					set_bit(bit, isainfo->isa);		\
+			} while (false)						\
+
+			if (unlikely(ext_err))
+				continue;
+			if (!ext_long) {
+				// Single-letter extension
+				int nr = tolower(*ext) - 'a';
+				if (riscv_isa_extension_check(nr)) {
+					// Set the hwcap bit
+					this_hwcap |= isa2hwcap[nr];
+					set_bit(nr, isainfo->isa);
+				}
+			} else {
+				// Multi-letter extension: match the name and set the corresponding bit in the bitmap
+				// riscv_isa_extension_check() additionally checks some constraints for the Zicbom and Zicboz extensions
+				/* sorted alphabetically */
+				SET_ISA_EXT_MAP("smaia", RISCV_ISA_EXT_SMAIA);
+				SET_ISA_EXT_MAP("ssaia", RISCV_ISA_EXT_SSAIA);
+				SET_ISA_EXT_MAP("sscofpmf", RISCV_ISA_EXT_SSCOFPMF);
+				SET_ISA_EXT_MAP("sstc", RISCV_ISA_EXT_SSTC);
+				SET_ISA_EXT_MAP("svinval", RISCV_ISA_EXT_SVINVAL);
+				SET_ISA_EXT_MAP("svnapot", RISCV_ISA_EXT_SVNAPOT);
+				SET_ISA_EXT_MAP("svpbmt", RISCV_ISA_EXT_SVPBMT);
+				SET_ISA_EXT_MAP("zba", RISCV_ISA_EXT_ZBA);
+				SET_ISA_EXT_MAP("zbb", RISCV_ISA_EXT_ZBB);
+				SET_ISA_EXT_MAP("zbs", RISCV_ISA_EXT_ZBS);
+				SET_ISA_EXT_MAP("zicbom", RISCV_ISA_EXT_ZICBOM);
+				SET_ISA_EXT_MAP("zicboz", RISCV_ISA_EXT_ZICBOZ);
+				SET_ISA_EXT_MAP("zihintpause", RISCV_ISA_EXT_ZIHINTPAUSE);
+			}
+#undef SET_ISA_EXT_MAP
+		}
+
+		/*
+		 * These ones were as they were part of the base ISA when the
+		 * port & dt-bindings were upstreamed, and so can be set
+		 * unconditionally where `i` is in riscv,isa on DT systems.
+		 */
+
+		// Without ACPI, unconditionally mark the following extensions as enabled in the bitmap
+		if (acpi_disabled) {
+			set_bit(RISCV_ISA_EXT_ZICSR, isainfo->isa);
+			set_bit(RISCV_ISA_EXT_ZIFENCEI, isainfo->isa);
+			set_bit(RISCV_ISA_EXT_ZICNTR, isainfo->isa);
+			set_bit(RISCV_ISA_EXT_ZIHPM, isainfo->isa);
+		}
+
+		/*
+		 * All "okay" hart should have same isa. Set HWCAP based on
+		 * common capabilities of every "okay" hart, in case they don't
+		 * have.
+		 */
+		// elf_hwcap becomes the intersection of the hwcaps of all CPUs
+		if (elf_hwcap)
+			elf_hwcap &= this_hwcap;
+		else
+			elf_hwcap = this_hwcap;
+
+		// riscv_isa becomes the intersection of the ISA extension bitmaps of all CPUs
+		if (bitmap_empty(riscv_isa, RISCV_ISA_EXT_MAX))
+			bitmap_copy(riscv_isa, isainfo->isa, RISCV_ISA_EXT_MAX);
+		else
+			bitmap_and(riscv_isa, riscv_isa, isainfo->isa, RISCV_ISA_EXT_MAX);
+	}
+	// If ACPI is enabled and the RHCT exists, release the table
+	// RHCT is the RISC-V Hart Capabilities Table, the table structure used to convey CPU capabilities between the RISC-V CPU and the OS
+	if (!acpi_disabled && rhct)
+		acpi_put_table((struct acpi_table_header *)rhct);
+
+	/* We don't support systems with F but without D, so mask those out
+	 * here. */
+	// Linux requires the D extension whenever the F extension is enabled; otherwise it turns F off
+	if ((elf_hwcap & COMPAT_HWCAP_ISA_F) && !(elf_hwcap & COMPAT_HWCAP_ISA_D)) {
+		pr_info("This kernel does not support systems with F but not D\n");
+		elf_hwcap &= ~COMPAT_HWCAP_ISA_F;
+	}
+
+	if (elf_hwcap & COMPAT_HWCAP_ISA_V) {
+		riscv_v_setup_vsize();
+		/*
+		 * ISA string in device tree might have 'v' flag, but
+		 * CONFIG_RISCV_ISA_V is disabled in kernel.
+		 * Clear V flag in elf_hwcap if CONFIG_RISCV_ISA_V is disabled.
+		 */
+		// If the kernel config does not enable the V extension but the ISA string in the device tree contains it, clear V in elf_hwcap
+		if (!IS_ENABLED(CONFIG_RISCV_ISA_V))
+			elf_hwcap &= ~COMPAT_HWCAP_ISA_V;
+	}
+
+	memset(print_str, 0, sizeof(print_str));
+	for (i = 0, j = 0; i < NUM_ALPHA_EXTS; i++)
+		if (riscv_isa[0] & BIT_MASK(i))
+			print_str[j++] = (char)('a' + i);
+	// Print the enabled single-letter extensions in alphabetical order
+	pr_info("riscv: base ISA extensions %s\n", print_str);
+
+	memset(print_str, 0, sizeof(print_str));
+	for (i = 0, j = 0; i < NUM_ALPHA_EXTS; i++)
+		if (elf_hwcap & BIT_MASK(i))
+			print_str[j++] = (char)('a' + i);
+	// Print the extensions exposed through the ELF hwcap in alphabetical order
+	pr_info("riscv: ELF capabilities %s\n", print_str);
+}
+```
+
+To summarize the code above:
+
+- Running in S-mode, the Linux kernel obtains RISC-V ISA extension information from the ISA string in ACPI or the device tree. It does not use the string as is, but validates it and applies further adjustments; for example, the kernel enables the F and D extensions either together or not at all.
+- The set of supported ISA extensions that the kernel finally reports is the intersection of what each hart supports, further adjusted by the ISA-related options in the kernel config.
+- Without ACPI, the kernel gets the extension information from the ISA string in the device tree, and the Zicsr, Zifencei, Zicntr, and Zihpm extensions are enabled unconditionally.
+- With ACPI, Linux tries to fetch the RHCT (RISC-V Hart Capabilities Table), the ACPI structure that passes CPU information to the OS; if the RHCT exists, the ISA string is read from it.
+
+The bit numbers that these extensions occupy in the bitmap are defined in `arch/riscv/include/asm/hwcap.h`. Each extension has one bit: the low 26 bits (0 to 25) are reserved for single-letter extensions, multi-letter extensions are numbered from 26 (`RISCV_ISA_EXT_BASE`) upward, and the bitmap holds at most 64 bits (`RISCV_ISA_EXT_MAX`):
+
+```c
+/* arch/riscv/include/asm/hwcap.h:16 */
+
+#define RISCV_ISA_EXT_a		('a' - 'a')
+#define RISCV_ISA_EXT_c		('c' - 'a')
+#define RISCV_ISA_EXT_d		('d' - 'a')
+#define RISCV_ISA_EXT_f		('f' - 'a')
+#define RISCV_ISA_EXT_h		('h' - 'a')
+#define RISCV_ISA_EXT_i		('i' - 'a')
+#define RISCV_ISA_EXT_m		('m' - 'a')
+#define RISCV_ISA_EXT_s		('s' - 'a')
+#define RISCV_ISA_EXT_u		('u' - 'a')
+#define RISCV_ISA_EXT_v		('v' - 'a')
+
+/*
+ * These macros represent the logical IDs of each multi-letter RISC-V ISA
+ * extension and are used in the ISA bitmap. The logical IDs start from
+ * RISCV_ISA_EXT_BASE, which allows the 0-25 range to be reserved for single
+ * letter extensions. The maximum, RISCV_ISA_EXT_MAX, is defined in order
+ * to allocate the bitmap and may be increased when necessary.
+ *
+ * New extensions should just be added to the bottom, rather than added
+ * alphabetically, in order to avoid unnecessary shuffling.
+ */
+#define RISCV_ISA_EXT_BASE		26
+
+#define RISCV_ISA_EXT_SSCOFPMF		26
+#define RISCV_ISA_EXT_SSTC		27
+#define RISCV_ISA_EXT_SVINVAL		28
+#define RISCV_ISA_EXT_SVPBMT		29
+#define RISCV_ISA_EXT_ZBB		30
+#define RISCV_ISA_EXT_ZICBOM		31
+#define RISCV_ISA_EXT_ZIHINTPAUSE	32
+#define RISCV_ISA_EXT_SVNAPOT		33
+#define RISCV_ISA_EXT_ZICBOZ		34
+#define RISCV_ISA_EXT_SMAIA		35
+#define RISCV_ISA_EXT_SSAIA		36
+#define RISCV_ISA_EXT_ZBA		37
+#define RISCV_ISA_EXT_ZBS		38
+#define RISCV_ISA_EXT_ZICNTR		39
+#define RISCV_ISA_EXT_ZICSR		40
+#define RISCV_ISA_EXT_ZIFENCEI		41
+#define RISCV_ISA_EXT_ZIHPM		42
+
+#define RISCV_ISA_EXT_MAX		64
+#define RISCV_ISA_EXT_NAME_LEN_MAX	32
+
+#ifdef CONFIG_RISCV_M_MODE
+#define RISCV_ISA_EXT_SxAIA		RISCV_ISA_EXT_SMAIA
+#else
+#define RISCV_ISA_EXT_SxAIA		RISCV_ISA_EXT_SSAIA
+#endif
+```
+
+However, parsing the ISA string raises a number of problems. As the code above shows, the parsing is tedious and error-prone; see the discussion on the [mailing list][005]:
+
+ > There's been a bunch of off-list discussions about this, including at
+ > Plumbers. The original plan was to do something involving providing an
+ > ISA string to userspace, but ISA strings just aren't sufficient for a
+ > stable ABI any more: in order to parse an ISA string users need the
+ > version of the specifications that the string is written to, the version
+ > of each extension (sometimes at a finer granularity than the RISC-V
+ > releases/versions encode), and the expected use case for the ISA string
+ > (ie, is it a U-mode or M-mode string). That's a lot of complexity to
+ > try and keep ABI compatible and it's probably going to continue to grow,
+ > as even if there's no more complexity in the specifications we'll have
+ > to deal with the various ISA string parsing oddities that end up all
+ > over userspace.
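
To make the parsing burden concrete, here is a minimal user-space sketch of the same idea, under stated assumptions: `isa_next_ext()` and `isa_has_ext()` are hypothetical helper names invented for this illustration, not kernel or libc APIs, and the logic is a simplified rendering of the steps in `riscv_fill_hwcap()` (skip the 4-character `rv32`/`rv64` prefix, split multi-letter names on `_`, ignore version suffixes). It does not reproduce the QEMU `su` workaround or `riscv_isa_extension_check()`.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a kernel API): copy the next extension name from
 * *cursor into out, lowercased and with any version suffix removed, then
 * advance *cursor past it. Returns the name length, or 0 at the end of the
 * string, on malformed input, or if out is too small.
 */
static size_t isa_next_ext(const char **cursor, char *out, size_t out_len)
{
	const char *p = *cursor;
	const char *start, *end;
	size_t len, i;

	while (*p == '_')			/* skip separators */
		p++;
	if (!isalpha((unsigned char)*p))	/* end of string or malformed */
		return 0;

	start = p++;
	end = p;
	if (strchr("sSxXzZ", *start)) {
		/* Multi-letter extension: runs up to the next '_' or the end. */
		while (*p && *p != '_')
			p++;
		end = p;
		/* Drop a trailing "<major>" or "<major>p<minor>" version. */
		if (isdigit((unsigned char)end[-1])) {
			while (end - 1 > start && isdigit((unsigned char)end[-1]))
				end--;
			if (end - 2 > start &&
			    tolower((unsigned char)end[-1]) == 'p' &&
			    isdigit((unsigned char)end[-2])) {
				end--;
				while (end - 1 > start && isdigit((unsigned char)end[-1]))
					end--;
			}
		}
	} else if (isdigit((unsigned char)*p)) {
		/* Single-letter extension followed by a version: skip it. */
		while (isdigit((unsigned char)*p))
			p++;
		if (tolower((unsigned char)*p) == 'p' && isdigit((unsigned char)p[1])) {
			p++;
			while (isdigit((unsigned char)*p))
				p++;
		}
	}

	len = (size_t)(end - start);
	if (len + 1 > out_len)
		return 0;
	for (i = 0; i < len; i++)
		out[i] = (char)tolower((unsigned char)start[i]);
	out[len] = '\0';
	*cursor = p;
	return len;
}

/* Returns 1 if the lowercase name ext (e.g. "zba" or "c") appears in isa. */
int isa_has_ext(const char *isa, const char *ext)
{
	char name[32];
	const char *cursor;

	assert(strlen(isa) >= 4);	/* expects an "rv32" or "rv64" prefix */
	cursor = isa + 4;		/* skip the prefix, as the kernel does */
	while (isa_next_ext(&cursor, name, sizeof(name)) > 0) {
		if (strcmp(name, ext) == 0)
			return 1;
	}
	return 0;
}
```

In this sketch, `isa_has_ext("rv64imafdc_zicsr_zifencei2p0_sstc", "zifencei")` returns 1 and a query for `"v"` on the same string returns 0. Even this reduced version needs separate forward and backward scans for version numbers, which hints at why a stable user-space ABI built on ISA strings is hard to maintain.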
+
+The Linux kernel therefore introduced a new system call for user space, described in the next section.
+
+### RISC-V Hardware Probing Interface
+
+The elf_hwcap described in the previous section has only 64 bits available, but user space may need to detect more than 64 extensions, so this mechanism also runs out of bits.
+
+To address these problems, the Linux kernel introduced, in [commit ea3de9c][004], a system call for probing hardware from user space; it is documented in `Documentation/riscv/hwprobe.rst`.
+
+The system call takes an array of key-value pairs, the number of pairs, a CPU count, a CPU set, and a flags argument. It currently supports probing `m{arch,imp,vendor}id` and a small number of ISA extensions, and the key-value design allows more to be probed in the future:
+
+```c
+struct riscv_hwprobe {
+	__s64 key;
+	__u64 value;
+};
+
+long sys_riscv_hwprobe(struct riscv_hwprobe *pairs, size_t pair_count, size_t cpu_count, cpu_set_t *cpus, unsigned int flags);
+```
+
+The C code related to extension detection is as follows:
+
+```c
+/* arch/riscv/kernel/sys_riscv.c:125 */
+
+static void hwprobe_isa_ext0(struct riscv_hwprobe *pair,
+			     const struct cpumask *cpus)
+{
+	int cpu;
+	u64 missing = 0;
+
+	pair->value = 0;
+	if (has_fpu())
+		pair->value |= RISCV_HWPROBE_IMA_FD;
+
+	if (riscv_isa_extension_available(NULL, c))
+		pair->value |= RISCV_HWPROBE_IMA_C;
+
+	if (has_vector())
+		pair->value |= RISCV_HWPROBE_IMA_V;
+
+	/*
+	 * Loop through and record extensions that 1) anyone has, and 2) anyone
+	 * doesn't have.
+	 */
+	for_each_cpu(cpu, cpus) {
+		struct riscv_isainfo *isainfo = &hart_isa[cpu];
+
+		if (riscv_isa_extension_available(isainfo->isa, ZBA))
+			pair->value |= RISCV_HWPROBE_EXT_ZBA;
+		else
+			missing |= RISCV_HWPROBE_EXT_ZBA;
+
+		if (riscv_isa_extension_available(isainfo->isa, ZBB))
+			pair->value |= RISCV_HWPROBE_EXT_ZBB;
+		else
+			missing |= RISCV_HWPROBE_EXT_ZBB;
+
+		if (riscv_isa_extension_available(isainfo->isa, ZBS))
+			pair->value |= RISCV_HWPROBE_EXT_ZBS;
+		else
+			missing |= RISCV_HWPROBE_EXT_ZBS;
+	}
+
+	/* Now turn off reporting features if any CPU is missing it. */
+	pair->value &= ~missing;
+}
+```
+
+The code above checks whether the extensions in `IMAFDCV_Zba_Zbb_Zbs` are supported. The F and D extensions are tied together: in the Linux kernel they are either both enabled or both disabled. `riscv_isa_extension_available()` uses the bitmap set up in the previous section to decide whether a given extension is enabled. The floating-point and vector extensions are checked through `has_fpu()` and `has_vector()` respectively.
+
+## References
+
+- [kernel.org][001]
+- [RISC-V Linux Boot Process Analysis][002]
+- [Current RISC-V ISA Extension Categories and Detection Methods][003]
+- [commit ea3de9c: RISC-V: Add a syscall for HW probing][004]
+- [Mailing-list discussion introducing the RISC-V Hardware Probing User Interface][005]
+- [commit 6bcff51: RISC-V: Add bitmap reprensenting ISA features common across CPUs][006]
+
+[001]: https://mirrors.edge.kernel.org/pub/linux/kernel/
+[002]: https://tinylab.org/riscv-linux-startup/
+[003]: https://gitee.com/tinylab/riscv-linux/blob/master/articles/20230715-riscv-isa-extensions-discovery-1.md
+[004]: https://github.com/torvalds/linux/commit/ea3de9ce8aa280c5175c835bd3e94a3a9b814b74#diff-24372ab3ad2d22486b15d8a8f7e9e53a04e16efe0a392cec83786e24cb767bdd
+[005]: https://lore.kernel.org/all/20230411-primate-rice-a5c102f90c6c@wendy/
+[006]: https://github.com/torvalds/linux/commit/6bcff51539ccae5431a01f60293419dbae21100f
-- 
Gitee


From 23f6ce0abd051fb1c162d1667f6925094b3fb3d1 Mon Sep 17 00:00:00 2001
From: yjmstr
Date: Thu, 14 Sep 2023 16:07:10 +0800
Subject: [PATCH 2/5] initial commit

---
 articles/20230901-riscv-isa-discovery-5-linux.md | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/articles/20230901-riscv-isa-discovery-5-linux.md b/articles/20230901-riscv-isa-discovery-5-linux.md
index 4ebad4e..914fd35 100644
--- a/articles/20230901-riscv-isa-discovery-5-linux.md
+++ b/articles/20230901-riscv-isa-discovery-5-linux.md
@@ -1,5 +1,5 @@
 > Author: YJMSTR [jay1273062855@outlook.com](mailto:jay1273062855@outlook.com)
-> Date: 2023/08/29
+> Date: 2023/09/01
> Revisor: Bin Meng, Falcon
> Project: [RISC-V Linux 内核剖析](https://gitee.com/tinylab/riscv-linux)
> Sponsor: ISCAS @@ -10,10 +10,12 @@ 本文是 RISC-V 扩展软硬件支持系列的第 5 篇文章,将介绍 Linux 内核对 RISC-V 扩展的检测与支持方式。建议在阅读本文之前先阅读本系列第一篇文章:[《RISC-V 当前指令集扩展类别与检测方式》][003] 以对 RISC-V ISA 扩展目前的命名、分类与硬件检测方式有所了解。 -RISC-V 之前使用名为 misa 的 CSR 来检测 ISA 扩展,在 misa 中分配了 26 位,每一位用于标识某扩展/特权模式是否启用,但随着扩展数目增加,misa 的位数不够了。此时 RISC-V 引入了一个名为 mconfigptr 的 CSR,该 CSR 中存放有一个地址,指向包含硬件信息的数据结构,固件可以利用这个数据结构生成 SMBIOS/设备树/ACPI。 +RISC-V 之前使用名为 misa 的 CSR 来检测 ISA 扩展,在 misa 中分配了 26 位,每一位用于标识某扩展/特权模式是否启用,但随着扩展数目增加,misa 的位数不够了。 RISC-V 后来引入了一个名为 mconfigptr 的 CSR,该 CSR 中存放有一个地址,指向包含硬件信息的数据结构,固件可以利用这个数据结构来生成 SMBIOS/设备树/ACPI。 SMBIOS 中存放的 ISA 信息仅包含 misa 中的那些,还是不够用。设备树通过一个 ISA string 来表示 ISA 扩展组合,ACPI 则是包含有一个 RHCT (RISC-V Hart Capabilities Table),其中同样包含了一个 ISA string 结点。 +[本系列的上一篇文章][007]中介绍了 OpenSBI 的 RISC-V ISA 扩展检测情况,SBI 位于 M 模式,能够直接读取相关 CSR 来获得相应的扩展信息。 + ## 获取 Linux 源码 Linux 内核较大,使用 `git fetch` 可以断点续传,以免网络出问题导致 `git clone` 中断。 @@ -326,9 +328,7 @@ Please press Enter to activate this console. ``` -## 源码分析 - -### base ISA extensions & ELF capabilities +## Base ISA extensions & ELF capabilities Linux Kernel 的启动日志中有两行输出如下: @@ -729,7 +729,7 @@ void __init riscv_fill_hwcap(void) 于是 Linux Kernel 又在用户空间引入了新的系统调用,详见下一小节。 -### RISC-V Hardware Probing Interface +## RISC-V Hardware Probing Interface 上一小节中提到的 elf_hwcap 仅有 64 位可用,但用户空间需要检测的扩展可能不止 64 个,因此上述机制同样面临位数不够的问题。 @@ -795,7 +795,7 @@ static void hwprobe_isa_ext0(struct riscv_hwprobe *pair, } ``` -上述这段代码检查 `IMAFDCV_Zba_Zbb_Zbs` 这些扩展是否受支持。其中 F 和 D 扩展是绑定的,Linux 内核中这两个扩展要么都启用,要么都关闭。`riscv_isa_extension_available()` 会根据上一小节中设置的 bitmap 来判断指定的扩展是否启用。浮点扩展和向量扩展则是通过 `has_fpu()` 和 `has_vector()` 分别进行判断。 +上述这段代码检查 `IMAFDCV_Zba_Zbb_Zbs` 这些扩展是否受支持。其中 F 和 D 扩展是绑定的,Linux 内核中这两个扩展要么都启用,要么都关闭。 ## 参考资料 @@ -805,6 +805,7 @@ static void hwprobe_isa_ext0(struct riscv_hwprobe *pair, - [commit ea3de9c: RISC-V: Add a syscall for HW probing][004] - [引入 RISC-V Hardware Probing User Interface 的邮件讨论][005] - [commit 6bcff51: RISC-V: Add bitmap reprensenting ISA features common across CPUs][006] +- [OpenSBI
RISC-V ISA 扩展检测与支持方式分析][007] [001]: https://mirrors.edge.kernel.org/pub/linux/kernel/ [002]: https://tinylab.org/riscv-linux-startup/ @@ -812,3 +813,4 @@ static void hwprobe_isa_ext0(struct riscv_hwprobe *pair, [004]: https://github.com/torvalds/linux/commit/ea3de9ce8aa280c5175c835bd3e94a3a9b814b74#diff-24372ab3ad2d22486b15d8a8f7e9e53a04e16efe0a392cec83786e24cb767bdd [005]: https://lore.kernel.org/all/20230411-primate-rice-a5c102f90c6c@wendy/ [006]: https://github.com/torvalds/linux/commit/6bcff51539ccae5431a01f60293419dbae21100f +[007]: https://gitee.com/tinylab/riscv-linux/blob/master/articles/20230816-riscv-isa-discovery-4-opensbi.md -- Gitee From 673b573e9b466c81528ec1ec7808d39d7ed505a0 Mon Sep 17 00:00:00 2001 From: yjmstr Date: Sat, 16 Sep 2023 14:44:40 +0800 Subject: [PATCH 3/5] initial commit --- .../20230901-riscv-isa-discovery-5-linux.md | 39 ++++++++++++------- 1 file changed, 26 insertions(+), 13 deletions(-) diff --git a/articles/20230901-riscv-isa-discovery-5-linux.md b/articles/20230901-riscv-isa-discovery-5-linux.md index 914fd35..539b5de 100644 --- a/articles/20230901-riscv-isa-discovery-5-linux.md +++ b/articles/20230901-riscv-isa-discovery-5-linux.md @@ -1,3 +1,4 @@ +> Corrector: [TinyCorrect](https://gitee.com/tinylab/tinycorrect) v0.2-rc2 - [spaces]
> Author: YJMSTR [jay1273062855@outlook.com](mailto:jay1273062855@outlook.com)
> Date: 2023/09/01
> Revisor: Bin Meng, Falcon
@@ -10,9 +11,11 @@ 本文是 RISC-V 扩展软硬件支持系列的第 5 篇文章,将介绍 Linux 内核对 RISC-V 扩展的检测与支持方式。建议在阅读本文之前先阅读本系列第一篇文章:[《RISC-V 当前指令集扩展类别与检测方式》][003] 以对 RISC-V ISA 扩展目前的命名、分类与硬件检测方式有所了解。 -RISC-V 之前使用名为 misa 的 CSR 来检测 ISA 扩展,在 misa 中分配了 26 位,每一位用于标识某扩展/特权模式是否启用,但随着扩展数目增加,misa 的位数不够了。 RISC-V 后来引入了一个名为 mconfigptr 的 CSR,该 CSR 中存放有一个地址,指向包含硬件信息的数据结构,固件可以利用这个数据结构来生成 SMBIOS/设备树/ACPI。 +RISC-V 之前使用名为 misa 的 CSR 来检测 ISA 扩展,在 misa 中分配了 26 位,每一位用于标识某扩展/特权模式是否启用,但随着扩展数目增加,misa 的位数不够了。RISC-V 后来引入了一个名为 mconfigptr 的 CSR,该 CSR 中存放有一个地址,指向包含硬件信息的数据结构,固件可以利用这个数据结构来生成 SMBIOS/设备树/ACPI。 -SMBIOS 中存放的 ISA 信息仅包含 misa 中的那些,还是不够用。设备树通过一个 ISA string 来表示 ISA 扩展组合,ACPI 则是包含有一个 RHCT (RISC-V Hart Capabilities Table),其中同样包含了一个 ISA string 结点。 +SMBIOS 中存放的 ISA 信息仅包含 misa 中的那些,还是不够用。设备树通过一个 ISA string 来表示 ISA 扩展组合,ACPI 则是包含有一个 RHCT(RISC-V Hart Capabilities Table),其中同样包含了一个 ISA string 结点。 + +PLCT 在 2022 年曾经做过相关的[调研工作][008],当时的 Linux 内核可以通过 cpuinfo/环境变量/HWCAP/SIGILL 等方式在用户空间获取 ISA 扩展信息。 [本系列的上一篇文章][007]中介绍了 OpenSBI 的 RISC-V ISA 扩展检测情况,SBI 位于 M 模式,能够直接读取相关 CSR 来获得相应的扩展信息。 @@ -324,7 +327,7 @@ Boot HART MEDELEG : 0x0000000000f0b509 [ 0.618132] Freeing unused kernel image (initmem) memory: 2200K [ 0.618994] Run /sbin/init as init process -Please press Enter to activate this console. +Please press Enter to activate this console. ``` @@ -365,7 +368,7 @@ void __init riscv_fill_hwcap(void) isa2hwcap['v' - 'a'] = COMPAT_HWCAP_ISA_V; elf_hwcap = 0; - + // 将 riscv_isa 这个 bitmap 清零 bitmap_zero(riscv_isa, RISCV_ISA_EXT_MAX); @@ -375,7 +378,7 @@ void __init riscv_fill_hwcap(void) if (ACPI_FAILURE(status)) return; } - + for_each_possible_cpu(cpu) { // struct riscv_isainfo 结构体中包含了一个名为 isa 的长为 64 位的 bitmap struct riscv_isainfo *isainfo = &hart_isa[cpu]; @@ -420,7 +423,7 @@ void __init riscv_fill_hwcap(void) const char *ext_end = isa; // ext_long 表示是否有多字母扩展 bool ext_long = false, ext_err = false; - + // 逐字符检查 isa 字符串中的扩展 switch (*ext) { case 's': @@ -430,7 +433,7 @@ void __init riscv_fill_hwcap(void) * not valid ISA extensions. 
It works until multi-letter * extension starting with "Su" appears. */ - + if (ext[-1] != '_' && ext[1] == 'u') { ++isa; ext_err = true; @@ -582,7 +585,7 @@ void __init riscv_fill_hwcap(void) * port & dt-bindings were upstreamed, and so can be set * unconditionally where `i` is in riscv,isa on DT systems. */ - + // 如果未启用 ACPI,直接无条件将以下扩展在 bitmap 中标记为已启用 if (acpi_disabled) { set_bit(RISCV_ISA_EXT_ZICSR, isainfo->isa); @@ -601,8 +604,8 @@ void __init riscv_fill_hwcap(void) elf_hwcap &= this_hwcap; else elf_hwcap = this_hwcap; - - // 取出各个 CPU ISA 扩展的 bitmap 的交集作为 riscv_isa + + // 取出各个 CPU ISA 扩展的 bitmap 的交集作为 riscv_isa if (bitmap_empty(riscv_isa, RISCV_ISA_EXT_MAX)) bitmap_copy(riscv_isa, isainfo->isa, RISCV_ISA_EXT_MAX); else @@ -654,7 +657,7 @@ void __init riscv_fill_hwcap(void) - S-mode 的 Linux 内核会通过 ACPI 或设备树中的 ISA string 得到 RISC-V ISA 扩展信息,但不是直接拿来用,而是会进行一些合法性检测与其它设置,例如:内核中 F 扩展和 D 扩展要么都启用,要么都不启用。 - 内核最终输出的所支持的 ISA,是各个 hart 所支持的 ISA 的交集,并在此基础上应用一些 config 配置文件中有关 ISA 扩展的设置。 - 未启用 ACPI 时,Linux 内核通过设备树中的 ISA string 获得扩展信息,此时 Zicsr,Zifencei,Zicntr,Zihpm 扩展会无条件启用。 -- 启用 ACPI 时,Linux 会尝试获取 ACPI 中用于向 OS 传递 CPU 信息的 RHCT (RISC-V Hart Capabilities Table)结构,如果存在 RHCT,就从中读取出 ISA string。 +- 启用 ACPI 时,Linux 会尝试获取 ACPI 中用于向 OS 传递 CPU 信息的 RHCT(RISC-V Hart Capabilities Table)结构,如果存在 RHCT,就从中读取出 ISA string。 上述代码所检测的这些扩展在 bitmap 中所对应的位号定义在 `arch/riscv/include/asm/hwcap.h` 中,每个扩展对应一个位,其中低 26 位(位号 0 到 25)用于单字母扩展,多字母扩展对应的位号从 26 开始分配,bitmap 的最大容量为 64: @@ -715,13 +718,13 @@ void __init riscv_fill_hwcap(void) 但解析 ISA string 面临着诸多问题,从上述代码也可以看出来对 ISA string 的解析比较繁琐且容易出错,相关讨论见 [邮件列表][005]: > There's been a bunch of off-list discussions about this, including at - > Plumbers.
The original plan was to do something involving providing an > ISA string to userspace, but ISA strings just aren't sufficient for a > stable ABI any more: in order to parse an ISA string users need the > version of the specifications that the string is written to, the version > of each extension (sometimes at a finer granularity than the RISC-V > releases/versions encode), and the expected use case for the ISA string - > (ie, is it a U-mode or M-mode string). That's a lot of complexity to + > (ie, is it a U-mode or M-mode string). That's a lot of complexity to > try and keep ABI compatible and it's probably going to continue to grow, > as even if there's no more complexity in the specifications we'll have > to deal with the various ISA string parsing oddities that end up all @@ -797,6 +800,14 @@ static void hwprobe_isa_ext0(struct riscv_hwprobe *pair, 上述这段代码检查 `IMAFDCV_Zba_Zbb_Zbs` 这些扩展是否受支持。其中 F 和 D 扩展是绑定的,Linux 内核中这两个扩展要么都启用,要么都关闭。 +## 总结 + +本文通过对 `riscv_fill_hwcap` 函数的分析,介绍了 Linux 利用设备树/ACPI 检测 RISC-V ISA 扩展并生成 HWCAP 的方式,并介绍了 Linux 新引入的硬件检测系统调用 `riscv_hwprobe`。 + +2022 年,PLCT 曾经做过有关 RISC-V ISA 扩展检测机制的[调研][008],当时的 Linux 内核通过 M 模式传递来的设备树/SMBIOS/ACPI 中所包含的信息来检测 ISA 扩展,而用户态可以通过 HWCAP/cpuinfo/环境变量/SIGILL 等方式对 RISC-V ISA 进行检测,但大部分方法本质上都是基于 ISA string,对 ISA string 的解析容易出错,且用户态缺少相应的硬件检测系统调用。 + +如今,用户空间的 RISC-V ISA 检测机制发生了一些变化,在 Linux v6.4 版本中引入了新的系统调用 `riscv_hwprobe`,用于检测硬件。目前这一系统调用支持的功能比较少,但它能够解决 HWCAP 位数不够用、用户空间缺少硬件检测的系统调用等问题。 + ## 参考资料 - [kernel.org][001] @@ -806,6 +817,7 @@ static void hwprobe_isa_ext0(struct riscv_hwprobe *pair, - [引入 RISC-V Hardware Probing User Interface 的邮件讨论][005] - [commit 6bcff51: RISC-V: Add bitmap reprensenting ISA features common across CPUs][006] - [OpenSBI RISC-V ISA 扩展检测与支持方式分析][007] +- [PLCT 2022 年对 RISC-V ISA 扩展检测方式的调研][008] [001]: https://mirrors.edge.kernel.org/pub/linux/kernel/ [002]: https://tinylab.org/riscv-linux-startup/ @@ -814,3 +826,4 @@ static void hwprobe_isa_ext0(struct riscv_hwprobe *pair, [005]:
https://lore.kernel.org/all/20230411-primate-rice-a5c102f90c6c@wendy/ [006]: https://github.com/torvalds/linux/commit/6bcff51539ccae5431a01f60293419dbae21100f [007]: https://gitee.com/tinylab/riscv-linux/blob/master/articles/20230816-riscv-isa-discovery-4-opensbi.md +[008]: https://github.com/plctlab/PLCT-Open-Reports/blob/master/20220706-%E9%83%91%E9%88%9C%E5%A3%AC-discovery.pdf -- Gitee From 96ebb13c4c91ec387c548282f2a0d80e05bced71 Mon Sep 17 00:00:00 2001 From: yjmstr Date: Sat, 16 Sep 2023 14:48:15 +0800 Subject: [PATCH 4/5] initial commit --- .../20230901-riscv-isa-discovery-5-linux.md | 29 +++++++++---------- 1 file changed, 14 insertions(+), 15 deletions(-) diff --git a/articles/20230901-riscv-isa-discovery-5-linux.md b/articles/20230901-riscv-isa-discovery-5-linux.md index 539b5de..071541f 100644 --- a/articles/20230901-riscv-isa-discovery-5-linux.md +++ b/articles/20230901-riscv-isa-discovery-5-linux.md @@ -1,4 +1,3 @@ -> Corrector: [TinyCorrect](https://gitee.com/tinylab/tinycorrect) v0.2-rc2 - [spaces]
> Author: YJMSTR [jay1273062855@outlook.com](mailto:jay1273062855@outlook.com)
> Date: 2023/09/01
> Revisor: Bin Meng, Falcon
@@ -33,7 +32,7 @@ $ git remote add origin https://gitee.com/mirrors/linux_old1.git $ git pull origin ``` -`git checkout` 的输出表明:HEAD is now at 2dde18cd1d8f Linux 6.5,本文将基于这一版本进行分析。如果我们想要获取特定版本的 Linux 内核源码,可以从 [kernel.org][001] 上下载,也可以在 `git pull origin` 之后通过 `git checkout` 切换到指定版本。 +`git checkout` 的输出表明:HEAD is now at 2dde18cd1d8f Linux 6.5,本文将基于这一版本进行分析。如果我们想要获取特定版本的 Linux 内核源码,可以从 [kernel.org][001] 上下载,也可以在 `git pull origin` 之后通过 `git checkout` 切换到指定版本。 ## 编译内核并启动 @@ -358,7 +357,7 @@ void __init riscv_fill_hwcap(void) struct acpi_table_header *rhct; acpi_status status; unsigned int cpu; - // COMPAT_HWCAP_ISA_? == (1 << (字母 ? 的ASCII码 - 'A')) + // COMPAT_HWCAP_ISA_? == (1 << (字母 ? 的 ASCII 码 - 'A')) isa2hwcap['i' - 'a'] = COMPAT_HWCAP_ISA_I; isa2hwcap['m' - 'a'] = COMPAT_HWCAP_ISA_M; isa2hwcap['a' - 'a'] = COMPAT_HWCAP_ISA_A; @@ -717,18 +716,18 @@ void __init riscv_fill_hwcap(void) 但解析 ISA string 面临着诸多问题,从上述代码也可以看出来对 ISA string 的解析比较繁琐且容易出错,相关讨论见 [邮件列表][005]: - > There's been a bunch of off-list discussions about this, including at - > Plumbers. The original plan was to do something involving providing an - > ISA string to userspace, but ISA strings just aren't sufficient for a - > stable ABI any more: in order to parse an ISA string users need the - > version of the specifications that the string is written to, the version - > of each extension (sometimes at a finer granularity than the RISC-V - > releases/versions encode), and the expected use case for the ISA string - > (ie, is it a U-mode or M-mode string). That's a lot of complexity to - > try and keep ABI compatible and it's probably going to continue to grow, - > as even if there's no more complexity in the specifications we'll have - > to deal with the various ISA string parsing oddities that end up all - > over userspace. +> There's been a bunch of off-list discussions about this, including at +> Plumbers. 
The original plan was to do something involving providing an +> ISA string to userspace, but ISA strings just aren't sufficient for a +> stable ABI any more: in order to parse an ISA string users need the +> version of the specifications that the string is written to, the version +> of each extension (sometimes at a finer granularity than the RISC-V +> releases/versions encode), and the expected use case for the ISA string +> (ie, is it a U-mode or M-mode string). That's a lot of complexity to +> try and keep ABI compatible and it's probably going to continue to grow, +> as even if there's no more complexity in the specifications we'll have +> to deal with the various ISA string parsing oddities that end up all +> over userspace. 于是 Linux Kernel 又在用户空间引入了新的系统调用,详见下一小节。 -- Gitee From 5c8154cb3f5cb799474000c6bcc374bfcc0ea6b1 Mon Sep 17 00:00:00 2001 From: yjmstr Date: Fri, 29 Sep 2023 18:08:24 +0800 Subject: [PATCH 5/5] [v0]add article 20230928-riscv-isa-extensions-discovery-6-unified-discovery.md [v0]add article 20230928-riscv-isa-extensions-discovery-6-unified-discovery.md delete for merge --- .../20230901-riscv-isa-discovery-5-linux.md | 828 ------------------ ...xtensions-discovery-6-unified-discovery.md | 279 ++++++ 2 files changed, 279 insertions(+), 828 deletions(-) delete mode 100644 articles/20230901-riscv-isa-discovery-5-linux.md create mode 100644 articles/20230928-riscv-isa-extensions-discovery-6-unified-discovery.md diff --git a/articles/20230901-riscv-isa-discovery-5-linux.md b/articles/20230901-riscv-isa-discovery-5-linux.md deleted file mode 100644 index 071541f..0000000 --- a/articles/20230901-riscv-isa-discovery-5-linux.md +++ /dev/null @@ -1,828 +0,0 @@ -> Author: YJMSTR [jay1273062855@outlook.com](mailto:jay1273062855@outlook.com)
-> Date: 2023/09/01
-> Revisor: Bin Meng, Falcon
-> Project: [RISC-V Linux 内核剖析](https://gitee.com/tinylab/riscv-linux)
-> Sponsor: ISCAS - -# Linux RISC-V ISA 扩展支持 - -## 前言 - -本文是 RISC-V 扩展软硬件支持系列的第 5 篇文章,将介绍 Linux 内核对 RISC-V 扩展的检测与支持方式。建议在阅读本文之前先阅读本系列第一篇文章:[《RISC-V 当前指令集扩展类别与检测方式》][003] 以对 RISC-V ISA 扩展目前的命名、分类与硬件检测方式有所了解。 - -RISC-V 之前使用名为 misa 的 CSR 来检测 ISA 扩展,在 misa 中分配了 26 位,每一位用于标识某扩展/特权模式是否启用,但随着扩展数目增加,misa 的位数不够了。RISC-V 后来引入了一个名为 mconfigptr 的 CSR,该 CSR 中存放有一个地址,指向包含硬件信息的数据结构,固件可以利用这个数据结构来生成 SMBIOS/设备树/ACPI。 - -SMBIOS 中存放的 ISA 信息仅包含 misa 中的那些,还是不够用。设备树通过一个 ISA string 来表示 ISA 扩展组合,ACPI 则是包含有一个 RHCT(RISC-V Hart Capabilities Table),其中同样包含了一个 ISA string 结点。 - -PLCT 在 2022 年曾经做过相关的[调研工作][008],当时的 Linux 内核可以通过 cpuinfo/环境变量/HWCAP/SIGILL 等方式在用户空间获取 ISA 扩展信息。 - -[本系列的上一篇文章][007]中介绍了 OpenSBI 的 RISC-V ISA 扩展检测情况,SBI 位于 M 模式,能够直接读取相关 CSR 来获得相应的扩展信息。 - -## 获取 Linux 源码 - -Linux 内核较大,使用 `git fetch` 可以断点续传,以免网络出问题导致 `git clone` 中断。 - -```sh -$ mkdir linux-kernel -$ cd linux-kernel -$ git init -$ git fetch https://gitee.com/mirrors/linux_old1.git -$ git checkout FETCH_HEAD -$ git remote add origin https://gitee.com/mirrors/linux_old1.git -$ git pull origin -``` - -`git checkout` 的输出表明:HEAD is now at 2dde18cd1d8f Linux 6.5,本文将基于这一版本进行分析。如果我们想要获取特定版本的 Linux 内核源码,可以从 [kernel.org][001] 上下载,也可以在 `git pull origin` 之后通过 `git checkout` 切换到指定版本。 - -## 编译内核并启动 - -编译内核需要用到 RISC-V 交叉编译工具链,本系列之前的文章已经介绍过,此处不再赘述: - -```sh -$ make ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- defconfig -$ make ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- -j $(nproc) -``` - -嵌入式领域通常使用 busybox 来构建根文件系统,首先下载编译 busybox: - -```sh -$ git clone https://gitee.com/mirrors/busyboxsource -$ cd busyboxsource -$ export CROSS_COMPILE=riscv64-linux-gnu- -$ make defconfig -$ make menuconfig -# 这里启用了 Settings-->Build Options 里的 Build static binary (no shared libs) 选项 -$ make -j $(nproc) -$ make install -``` - -制作文件系统并新建一个启动脚本: - -```sh -$ cd ~ -$ qemu-img create rootfs.img 1g -$ mkfs.ext4 rootfs.img -$ mkdir rootfs -$ sudo mount -o loop rootfs.img rootfs -$ cd rootfs -$ sudo cp -r ../busyboxsource/_install/* . 
-$ sudo mkdir proc sys dev etc etc/init.d -$ cd etc/init.d/ -$ sudo touch rcS -$ sudo vi rcS -``` - -编辑启动脚本 rcS 中的内容如下: - -```sh -#!/bin/sh -mount -t proc none /proc -mount -t sysfs none /sys -/sbin/mdev -s -``` - -并修改文件权限: - -```sh -$ sudo chmod +x rcS -$ cd ~ -$ sudo umount rootfs -``` - -随后尝试直接引导内核: - -```sh -$ qemu-system-riscv64 -M virt -m 256M -nographic -kernel linux-kernel/arch/riscv/boot/Image -drive file=rootfs.img,format=raw,id=hd0 -device virtio-blk-device,drive=hd0 -append "root=/dev/vda rw console=ttyS0" -``` - -可以得到 Linux 启动日志如下: - -```sh -$ qemu-system-riscv64 -M virt -m 256M -nographic -kernel linux-kernel/arch/riscv/boot/Image -drive file=rootfs.img,format=raw,id=hd0 -device virtio-blk-device,drive=hd0 -append "root=/dev/vda rw console=ttyS0" - -OpenSBI v1.3.1 - ____ _____ ____ _____ - / __ \ / ____| _ \_ _| - | | | |_ __ ___ _ __ | (___ | |_) || | - | | | | '_ \ / _ \ '_ \ \___ \| _ < | | - | |__| | |_) | __/ | | |____) | |_) || |_ - \____/| .__/ \___|_| |_|_____/|___/_____| - | | - |_| - -Platform Name : riscv-virtio,qemu -Platform Features : medeleg -Platform HART Count : 1 -Platform IPI Device : aclint-mswi -Platform Timer Device : aclint-mtimer @ 10000000Hz -Platform Console Device : uart8250 -Platform HSM Device : --- -Platform PMU Device : --- -Platform Reboot Device : sifive_test -Platform Shutdown Device : sifive_test -Platform Suspend Device : --- -Platform CPPC Device : --- -Firmware Base : 0x80000000 -Firmware Size : 194 KB -Firmware RW Offset : 0x20000 -Firmware RW Size : 66 KB -Firmware Heap Offset : 0x28000 -Firmware Heap Size : 34 KB (total), 2 KB (reserved), 9 KB (used), 22 KB (free) -Firmware Scratch Size : 4096 B (total), 760 B (used), 3336 B (free) -Runtime SBI Version : 1.0 - -Domain0 Name : root -Domain0 Boot HART : 0 -Domain0 HARTs : 0* -Domain0 Region00 : 0x0000000002000000-0x000000000200ffff M: (I,R,W) S/U: () -Domain0 Region01 : 0x0000000080000000-0x000000008001ffff M: (R,X) S/U: () -Domain0 Region02 : 
0x0000000080020000-0x000000008003ffff M: (R,W) S/U: () -Domain0 Region03 : 0x0000000000000000-0xffffffffffffffff M: (R,W,X) S/U: (R,W,X) -Domain0 Next Address : 0x0000000080200000 -Domain0 Next Arg1 : 0x000000008fe00000 -Domain0 Next Mode : S-mode -Domain0 SysReset : yes -Domain0 SysSuspend : yes - -Boot HART ID : 0 -Boot HART Domain : root -Boot HART Priv Version : v1.12 -Boot HART Base ISA : rv64imafdch -Boot HART ISA Extensions : time,sstc -Boot HART PMP Count : 16 -Boot HART PMP Granularity : 4 -Boot HART PMP Address Bits: 54 -Boot HART MHPM Count : 16 -Boot HART MIDELEG : 0x0000000000001666 -Boot HART MEDELEG : 0x0000000000f0b509 -[ 0.000000] Linux version 6.5.0 (mint@linux-lab-host) (riscv64-linux-gnu-gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #1 SMP Wed Aug 30 17:20:35 CST 2023 -[ 0.000000] random: crng init done -[ 0.000000] Machine model: riscv-virtio,qemu -[ 0.000000] SBI specification v1.0 detected -[ 0.000000] SBI implementation ID=0x1 Version=0x10003 -[ 0.000000] SBI TIME extension detected -[ 0.000000] SBI IPI extension detected -[ 0.000000] SBI RFENCE extension detected -[ 0.000000] SBI SRST extension detected -[ 0.000000] efi: UEFI not found. 
-[ 0.000000] OF: reserved mem: 0x0000000080000000..0x000000008001ffff (128 KiB) nomap non-reusable mmode_resv0@80000000 -[ 0.000000] OF: reserved mem: 0x0000000080020000..0x000000008003ffff (128 KiB) nomap non-reusable mmode_resv1@80020000 -[ 0.000000] Zone ranges: -[ 0.000000] DMA32 [mem 0x0000000080000000-0x000000008fffffff] -[ 0.000000] Normal empty -[ 0.000000] Movable zone start for each node -[ 0.000000] Early memory node ranges -[ 0.000000] node 0: [mem 0x0000000080000000-0x000000008003ffff] -[ 0.000000] node 0: [mem 0x0000000080040000-0x000000008fffffff] -[ 0.000000] Initmem setup node 0 [mem 0x0000000080000000-0x000000008fffffff] -[ 0.000000] SBI HSM extension detected -[ 0.000000] riscv: base ISA extensions acdfhim -[ 0.000000] riscv: ELF capabilities acdfim -[ 0.000000] percpu: Embedded 19 pages/cpu s40888 r8192 d28744 u77824 -[ 0.000000] Kernel command line: root=/dev/vda rw console=ttyS0 -[ 0.000000] Dentry cache hash table entries: 32768 (order: 6, 262144 bytes, linear) -[ 0.000000] Inode-cache hash table entries: 16384 (order: 5, 131072 bytes, linear) -[ 0.000000] Built 1 zonelists, mobility grouping on. 
Total pages: 64512 -[ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off -[ 0.000000] Virtual kernel memory layout: -[ 0.000000] fixmap : 0xff1bfffffea00000 - 0xff1bffffff000000 (6144 kB) -[ 0.000000] pci io : 0xff1bffffff000000 - 0xff1c000000000000 ( 16 MB) -[ 0.000000] vmemmap : 0xff1c000000000000 - 0xff20000000000000 (1024 TB) -[ 0.000000] vmalloc : 0xff20000000000000 - 0xff60000000000000 (16384 TB) -[ 0.000000] modules : 0xffffffff0157b000 - 0xffffffff80000000 (2026 MB) -[ 0.000000] lowmem : 0xff60000000000000 - 0xff60000010000000 ( 256 MB) -[ 0.000000] kernel : 0xffffffff80000000 - 0xffffffffffffffff (2047 MB) -[ 0.000000] Memory: 218332K/262144K available (8728K kernel code, 4974K rwdata, 4096K rodata, 2200K init, 482K bss, 43812K reserved, 0K cma-reserved) -[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1 -[ 0.000000] rcu: Hierarchical RCU implementation. -[ 0.000000] rcu: RCU restricting CPUs from NR_CPUS=64 to nr_cpu_ids=1. -[ 0.000000] rcu: RCU debug extended QS entry/exit. -[ 0.000000] Tracing variant of Tasks RCU enabled. -[ 0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies. -[ 0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1 -[ 0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0 -[ 0.000000] riscv-intc: 64 local interrupts mapped -[ 0.000000] plic: plic@c000000: mapped 95 interrupts with 1 handlers for 2 contexts. -[ 0.000000] riscv: providing IPIs using SBI IPI extension -[ 0.000000] rcu: srcu_init: Setting srcu_struct sizes based on contention. 
-[ 0.000000] clocksource: riscv_clocksource: mask: 0xffffffffffffffff max_cycles: 0x24e6a1710, max_idle_ns: 440795202120 ns -[ 0.000073] sched_clock: 64 bits at 10MHz, resolution 100ns, wraps every 4398046511100ns -[ 0.000182] riscv-timer: Timer interrupt in S-mode is available via sstc extension -[ 0.008216] Console: colour dummy device 80x25 -[ 0.009505] Calibrating delay loop (skipped), value calculated using timer frequency.. 20.00 BogoMIPS (lpj=40000) -[ 0.009635] pid_max: default: 32768 minimum: 301 -[ 0.010714] LSM: initializing lsm=capability,integrity -[ 0.012870] Mount-cache hash table entries: 512 (order: 0, 4096 bytes, linear) -[ 0.012948] Mountpoint-cache hash table entries: 512 (order: 0, 4096 bytes, linear) -[ 0.040252] RCU Tasks Trace: Setting shift to 0 and lim to 1 rcu_task_cb_adjust=1. -[ 0.040655] riscv: ELF compat mode supported -[ 0.041099] ASID allocator using 16 bits (65536 entries) -[ 0.042073] rcu: Hierarchical SRCU implementation. -[ 0.042107] rcu: Max phase no-delay instances is 1000. -[ 0.044384] EFI services will not be available. -[ 0.045645] smp: Bringing up secondary CPUs ... 
-[ 0.046625] smp: Brought up 1 node, 1 CPU -[ 0.057137] devtmpfs: initialized -[ 0.064078] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns -[ 0.064309] futex hash table entries: 256 (order: 2, 16384 bytes, linear) -[ 0.066112] pinctrl core: initialized pinctrl subsystem -[ 0.072724] NET: Registered PF_NETLINK/PF_ROUTE protocol family -[ 0.080456] DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations -[ 0.080750] DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations -[ 0.081016] audit: initializing netlink subsys (disabled) -[ 0.084218] thermal_sys: Registered thermal governor 'step_wise' -[ 0.084758] cpuidle: using governor menu -[ 0.085806] audit: type=2000 audit(0.040:1): state=initialized audit_enabled=0 res=1 -[ 0.099344] HugeTLB: registered 2.00 MiB page size, pre-allocated 0 pages -[ 0.099384] HugeTLB: 28 KiB vmemmap can be freed for a 2.00 MiB page -[ 0.103034] ACPI: Interpreter disabled. -[ 0.104629] iommu: Default domain type: Translated -[ 0.104662] iommu: DMA domain TLB invalidation policy: strict mode -[ 0.106649] SCSI subsystem initialized -[ 0.108322] usbcore: registered new interface driver usbfs -[ 0.108532] usbcore: registered new interface driver hub -[ 0.108678] usbcore: registered new device driver usb -[ 0.119431] vgaarb: loaded -[ 0.139122] clocksource: Switched to clocksource riscv_clocksource -[ 0.141072] pnp: PnP ACPI: disabled -[ 0.158856] NET: Registered PF_INET protocol family -[ 0.159756] IP idents hash table entries: 4096 (order: 3, 32768 bytes, linear) -[ 0.164351] tcp_listen_portaddr_hash hash table entries: 128 (order: 0, 4096 bytes, linear) -[ 0.164450] Table-perturb hash table entries: 65536 (order: 6, 262144 bytes, linear) -[ 0.164501] TCP established hash table entries: 2048 (order: 2, 16384 bytes, linear) -[ 0.164727] TCP bind hash table entries: 2048 (order: 5, 131072 bytes, linear) -[ 0.164988] TCP: Hash tables configured (established 2048 bind 
2048) -[ 0.165942] UDP hash table entries: 256 (order: 2, 24576 bytes, linear) -[ 0.166364] UDP-Lite hash table entries: 256 (order: 2, 24576 bytes, linear) -[ 0.167395] NET: Registered PF_UNIX/PF_LOCAL protocol family -[ 0.170224] RPC: Registered named UNIX socket transport module. -[ 0.170280] RPC: Registered udp transport module. -[ 0.170291] RPC: Registered tcp transport module. -[ 0.170305] RPC: Registered tcp-with-tls transport module. -[ 0.170315] RPC: Registered tcp NFSv4.1 backchannel transport module. -[ 0.170463] PCI: CLS 0 bytes, default 64 -[ 0.177633] workingset: timestamp_bits=46 max_order=16 bucket_order=0 -[ 0.180895] NFS: Registering the id_resolver key type -[ 0.181717] Key type id_resolver registered -[ 0.181750] Key type id_legacy registered -[ 0.181976] nfs4filelayout_init: NFSv4 File Layout Driver Registering... -[ 0.182049] nfs4flexfilelayout_init: NFSv4 Flexfile Layout Driver Registering... -[ 0.182691] 9p: Installing v9fs 9p2000 file system support -[ 0.184022] NET: Registered PF_ALG protocol family -[ 0.184294] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 246) -[ 0.184406] io scheduler mq-deadline registered -[ 0.184462] io scheduler kyber registered -[ 0.184546] io scheduler bfq registered -[ 0.187615] pci-host-generic 30000000.pci: host bridge /soc/pci@30000000 ranges: -[ 0.188285] pci-host-generic 30000000.pci: IO 0x0003000000..0x000300ffff -> 0x0000000000 -[ 0.188845] pci-host-generic 30000000.pci: MEM 0x0040000000..0x007fffffff -> 0x0040000000 -[ 0.188906] pci-host-generic 30000000.pci: MEM 0x0400000000..0x07ffffffff -> 0x0400000000 -[ 0.189388] pci-host-generic 30000000.pci: Memory resource size exceeds max for 32 bits -[ 0.189752] pci-host-generic 30000000.pci: ECAM at [mem 0x30000000-0x3fffffff] for [bus 00-ff] -[ 0.191528] pci-host-generic 30000000.pci: PCI host bridge to bus 0000:00 -[ 0.191752] pci_bus 0000:00: root bus resource [bus 00-ff] -[ 0.191831] pci_bus 0000:00: root bus resource [io 0x0000-0xffff] -[ 
0.191884] pci_bus 0000:00: root bus resource [mem 0x40000000-0x7fffffff] -[ 0.191897] pci_bus 0000:00: root bus resource [mem 0x400000000-0x7ffffffff] -[ 0.193325] pci 0000:00:00.0: [1b36:0008] type 00 class 0x060000 -[ 0.287202] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled -[ 0.297113] printk: console [ttyS0] disabled -[ 0.300265] 10000000.serial: ttyS0 at MMIO 0x10000000 (irq = 12, base_baud = 230400) is a 16550A -[ 0.301524] printk: console [ttyS0] enabled -[ 0.323403] SuperH (H)SCI(F) driver initialized -[ 0.343215] loop: module loaded -[ 0.344307] virtio_blk virtio0: 1/0/0 default/read/poll queues -[ 0.348817] virtio_blk virtio0: [vda] 2097152 512-byte logical blocks (1.07 GB/1.00 GiB) -[ 0.384041] e1000e: Intel(R) PRO/1000 Network Driver -[ 0.384260] e1000e: Copyright(c) 1999 - 2015 Intel Corporation. -[ 0.387880] usbcore: registered new interface driver uas -[ 0.388206] usbcore: registered new interface driver usb-storage -[ 0.389412] mousedev: PS/2 mouse device common for all mice -[ 0.393353] goldfish_rtc 101000.rtc: registered as rtc0 -[ 0.394317] goldfish_rtc 101000.rtc: setting system clock to 2023-09-02T07:35:32 UTC (1693640132) -[ 0.398165] syscon-poweroff poweroff: pm_power_off already claimed for sbi_srst_power_off -[ 0.399069] syscon-poweroff: probe of poweroff failed with error -16 -[ 0.401754] sdhci: Secure Digital Host Controller Interface driver -[ 0.401947] sdhci: Copyright(c) Pierre Ossman -[ 0.402639] sdhci-pltfm: SDHCI platform and OF driver helper -[ 0.403484] usbcore: registered new interface driver usbhid -[ 0.403664] usbhid: USB HID core driver -[ 0.404356] riscv-pmu-sbi: SBI PMU extension is available -[ 0.405010] riscv-pmu-sbi: 16 firmware and 18 hardware counters -[ 0.405216] riscv-pmu-sbi: Perf sampling/filtering is not supported as sscof extension is not available -[ 0.410466] NET: Registered PF_INET6 protocol family -[ 0.418128] Segment Routing with IPv6 -[ 0.418632] In-situ OAM (IOAM) with IPv6 -[ 0.419337] sit: IPv6, 
IPv4 and MPLS over IPv4 tunneling driver
-[ 0.422999] NET: Registered PF_PACKET protocol family
-[ 0.424737] 9pnet: Installing 9P2000 support
-[ 0.425341] Key type dns_resolver registered
-[ 0.466679] debug_vm_pgtable: [debug_vm_pgtable ]: Validating architecture page table helpers
-[ 0.476006] clk: Disabling unused clocks
-[ 0.569745] EXT4-fs (vda): recovery complete
-[ 0.572787] EXT4-fs (vda): mounted filesystem b1c2c62f-2f6d-4da5-af97-58c3648c79f4 r/w with ordered data mode. Quota mode: disabled.
-[ 0.573412] VFS: Mounted root (ext4 filesystem) on device 254:0.
-[ 0.576056] devtmpfs: mounted
-[ 0.618132] Freeing unused kernel image (initmem) memory: 2200K
-[ 0.618994] Run /sbin/init as init process
-
-Please press Enter to activate this console.
-
-```
-
-## Base ISA extensions & ELF capabilities
-
-Two lines in the Linux kernel boot log are of interest here:
-
-```sh
-[ 0.000000] riscv: base ISA extensions acdfhim
-[ 0.000000] riscv: ELF capabilities acdfim
-```
-
-Searching the source tree for `base ISA extensions` shows that the functions dealing with ISA extension information live in `arch/riscv/kernel/cpufeature.c`. This code was introduced in [commit 6bcff51][006]. The `riscv_isa` bitmap holds the intersection of the ISA extensions supported by all CPUs in the machine, while `elf_hwcap` holds the intersection of only those extensions that are relevant to user space.
-
-The function is fairly long, so it is analyzed below through added `//` comments:
-
-```c
-/* arch/riscv/kernel/cpufeature.c:102 */
-
-void __init riscv_fill_hwcap(void)
-{
-	// hwcap stands for "hardware capability"
-	struct device_node *node;
-	const char *isa;
-	char print_str[NUM_ALPHA_EXTS + 1];
-	int i, j, rc;
-	unsigned long isa2hwcap[26] = {0};
-	struct acpi_table_header *rhct;
-	acpi_status status;
-	unsigned int cpu;
-	// COMPAT_HWCAP_ISA_? == (1 << (ASCII code of the letter ? - 'A'))
-	isa2hwcap['i' - 'a'] = COMPAT_HWCAP_ISA_I;
-	isa2hwcap['m' - 'a'] = COMPAT_HWCAP_ISA_M;
-	isa2hwcap['a' - 'a'] = COMPAT_HWCAP_ISA_A;
-	isa2hwcap['f' - 'a'] = COMPAT_HWCAP_ISA_F;
-	isa2hwcap['d' - 'a'] = COMPAT_HWCAP_ISA_D;
-	isa2hwcap['c' - 'a'] = COMPAT_HWCAP_ISA_C;
-	isa2hwcap['v' - 'a'] = COMPAT_HWCAP_ISA_V;
-
-	elf_hwcap = 0;
-
-	// Clear the riscv_isa bitmap
-	bitmap_zero(riscv_isa, RISCV_ISA_EXT_MAX);
-
-	// If ACPI is enabled, look up the RHCT; give up if the lookup fails
-	if (!acpi_disabled) {
-		status = acpi_get_table(ACPI_SIG_RHCT, 0, &rhct);
-		if (ACPI_FAILURE(status))
-			return;
-	}
-
-	for_each_possible_cpu(cpu) {
-		// struct riscv_isainfo contains a 64-bit bitmap named isa
-		struct riscv_isainfo *isainfo = &hart_isa[cpu];
-		// hwcap of the current cpu
-		unsigned long this_hwcap = 0;
-
-		// Without ACPI, read the ISA information from the device tree
-		if (acpi_disabled) {
-			// Get the device tree node of this cpu
-			node = of_cpu_device_node_get(cpu);
-			if (!node) {
-				pr_warn("Unable to find cpu node\n");
-				continue;
-			}
-			// Read the isa string from the device tree
-			rc = of_property_read_string(node, "riscv,isa", &isa);
-			of_node_put(node);
-			if (rc) {
-				pr_warn("Unable to find \"riscv,isa\" devicetree entry\n");
-				continue;
-			}
-		} else {
-			// With ACPI, get the isa string through ACPI
-			rc = acpi_get_riscv_isa(rhct, cpu, &isa);
-			if (rc < 0) {
-				pr_warn("Unable to get ISA for the hart - %d\n", cpu);
-				continue;
-			}
-		}
-
-		/*
-		 * For all possible cpus, we have already validated in
-		 * the boot process that they at least contain "rv" and
-		 * whichever of "32"/"64" this kernel supports, and so this
-		 * section can be skipped.
-		 */
-		// The isa string always starts with "rv" plus the width, 4 characters in total, so skip them
-		isa += 4;
-
-		while (*isa) {
-			const char *ext = isa++;
-			const char *ext_end = isa;
-			// ext_long records whether the current extension is a multi-letter one
-			bool ext_long = false, ext_err = false;
-
-			// Walk through the extensions in the isa string character by character
-			switch (*ext) {
-			case 's':
-				/*
-				 * Workaround for invalid single-letter 's' & 'u'(QEMU).
-				 * No need to set the bit in riscv_isa as 's' & 'u' are
-				 * not valid ISA extensions. It works until multi-letter
-				 * extension starting with "Su" appears.
-				 */
-
-				if (ext[-1] != '_' && ext[1] == 'u') {
-					++isa;
-					ext_err = true;
-					break;
-				}
-				fallthrough;
-			case 'S':
-			case 'x':
-			case 'X':
-			case 'z':
-			case 'Z':
-				/*
-				 * Before attempting to parse the extension itself, we find its end.
-				 * As multi-letter extensions must be split from other multi-letter
-				 * extensions with an "_", the end of a multi-letter extension will
-				 * either be the null character or the "_" at the start of the next
-				 * multi-letter extension.
-				 *
-				 * Next, as the extensions version is currently ignored, we
-				 * eliminate that portion. This is done by parsing backwards from
-				 * the end of the extension, removing any numbers. This may be a
-				 * major or minor number however, so the process is repeated if a
-				 * minor number was found.
-				 *
-				 * ext_end is intended to represent the first character *after* the
-				 * name portion of an extension, but will be decremented to the last
-				 * character itself while eliminating the extensions version number.
-				 * A simple re-increment solves this problem.
-				 */
-				ext_long = true;
-				for (; *isa && *isa != '_'; ++isa)
-					if (unlikely(!isalnum(*isa)))
-						ext_err = true;
-
-				ext_end = isa;
-				if (unlikely(ext_err))
-					break;
-
-				if (!isdigit(ext_end[-1]))
-					break;
-
-				while (isdigit(*--ext_end))
-					;
-
-				if (tolower(ext_end[0]) != 'p' || !isdigit(ext_end[-1])) {
-					++ext_end;
-					break;
-				}
-
-				while (isdigit(*--ext_end))
-					;
-
-				++ext_end;
-				break;
-			default:
-				/*
-				 * Things are a little easier for single-letter extensions, as they
-				 * are parsed forwards.
-				 *
-				 * After checking that our starting position is valid, we need to
-				 * ensure that, when isa was incremented at the start of the loop,
-				 * that it arrived at the start of the next extension.
-				 *
-				 * If we are already on a non-digit, there is nothing to do. Either
-				 * we have a multi-letter extension's _, or the start of an
-				 * extension.
-				 *
-				 * Otherwise we have found the current extension's major version
-				 * number. Parse past it, and a subsequent p/minor version number
-				 * if present. The `p` extension must not appear immediately after
-				 * a number, so there is no fear of missing it.
-				 *
-				 */
-				if (unlikely(!isalpha(*ext))) {
-					ext_err = true;
-					break;
-				}
-
-				if (!isdigit(*isa))
-					break;
-
-				while (isdigit(*++isa))
-					;
-
-				if (tolower(*isa) != 'p')
-					break;
-
-				if (!isdigit(*++isa)) {
-					--isa;
-					break;
-				}
-
-				while (isdigit(*++isa))
-					;
-
-				break;
-			}
-
-			/*
-			 * The parser expects that at the start of an iteration isa points to the
-			 * first character of the next extension. As we stop parsing an extension
-			 * on meeting a non-alphanumeric character, an extra increment is needed
-			 * where the succeeding extension is a multi-letter prefixed with an "_".
-			 */
-			if (*isa == '_')
-				++isa;
-
-#define SET_ISA_EXT_MAP(name, bit) \
-	do { \
-		if ((ext_end - ext == sizeof(name) - 1) && \
-		     !strncasecmp(ext, name, sizeof(name) - 1) && \
-		     riscv_isa_extension_check(bit)) \
-			set_bit(bit, isainfo->isa); \
-	} while (false) \
-
-			if (unlikely(ext_err))
-				continue;
-			if (!ext_long) {
-				// Single-letter extension
-				int nr = tolower(*ext) - 'a';
-				if (riscv_isa_extension_check(nr)) {
-					// Record it in this hart's hwcap
-					this_hwcap |= isa2hwcap[nr];
-					set_bit(nr, isainfo->isa);
-				}
-			} else {
-				// Multi-letter extension: match the name and set the corresponding bit
-				// riscv_isa_extension_check() additionally enforces some constraints for Zicbom and Zicboz
-				/* sorted alphabetically */
-				SET_ISA_EXT_MAP("smaia", RISCV_ISA_EXT_SMAIA);
-				SET_ISA_EXT_MAP("ssaia", RISCV_ISA_EXT_SSAIA);
-				SET_ISA_EXT_MAP("sscofpmf", RISCV_ISA_EXT_SSCOFPMF);
-				SET_ISA_EXT_MAP("sstc", RISCV_ISA_EXT_SSTC);
-				SET_ISA_EXT_MAP("svinval", RISCV_ISA_EXT_SVINVAL);
-				SET_ISA_EXT_MAP("svnapot", RISCV_ISA_EXT_SVNAPOT);
-				SET_ISA_EXT_MAP("svpbmt", RISCV_ISA_EXT_SVPBMT);
-				SET_ISA_EXT_MAP("zba", RISCV_ISA_EXT_ZBA);
-				SET_ISA_EXT_MAP("zbb", RISCV_ISA_EXT_ZBB);
-				SET_ISA_EXT_MAP("zbs", RISCV_ISA_EXT_ZBS);
-				SET_ISA_EXT_MAP("zicbom", RISCV_ISA_EXT_ZICBOM);
-				SET_ISA_EXT_MAP("zicboz", RISCV_ISA_EXT_ZICBOZ);
-				SET_ISA_EXT_MAP("zihintpause", RISCV_ISA_EXT_ZIHINTPAUSE);
-			}
-#undef SET_ISA_EXT_MAP
-		}
-
-		/*
-		 * These ones were as they were part of the base ISA when the
-		 * port & dt-bindings were upstreamed, and so can be set
-		 * unconditionally where `i` is in riscv,isa on DT systems.
-		 */
-
-		// Without ACPI, mark the following extensions as present unconditionally
-		if (acpi_disabled) {
-			set_bit(RISCV_ISA_EXT_ZICSR, isainfo->isa);
-			set_bit(RISCV_ISA_EXT_ZIFENCEI, isainfo->isa);
-			set_bit(RISCV_ISA_EXT_ZICNTR, isainfo->isa);
-			set_bit(RISCV_ISA_EXT_ZIHPM, isainfo->isa);
-		}
-
-		/*
-		 * All "okay" hart should have same isa. Set HWCAP based on
-		 * common capabilities of every "okay" hart, in case they don't
-		 * have.
-		 */
-		// elf_hwcap is the intersection of the hwcap of all CPUs
-		if (elf_hwcap)
-			elf_hwcap &= this_hwcap;
-		else
-			elf_hwcap = this_hwcap;
-
-		// riscv_isa is the intersection of the ISA extension bitmaps of all CPUs
-		if (bitmap_empty(riscv_isa, RISCV_ISA_EXT_MAX))
-			bitmap_copy(riscv_isa, isainfo->isa, RISCV_ISA_EXT_MAX);
-		else
-			bitmap_and(riscv_isa, riscv_isa, isainfo->isa, RISCV_ISA_EXT_MAX);
-	}
-	// If ACPI is enabled and the RHCT was found, release the table
-	// RHCT (RISC-V Hart Capabilities Table) is the table through which the platform describes hart capabilities to the OS
-	if (!acpi_disabled && rhct)
-		acpi_put_table((struct acpi_table_header *)rhct);
-
-	/* We don't support systems with F but without D, so mask those out
-	 * here. */
-	// Linux requires D whenever F is present; otherwise F is turned off
-	if ((elf_hwcap & COMPAT_HWCAP_ISA_F) && !(elf_hwcap & COMPAT_HWCAP_ISA_D)) {
-		pr_info("This kernel does not support systems with F but not D\n");
-		elf_hwcap &= ~COMPAT_HWCAP_ISA_F;
-	}
-
-	if (elf_hwcap & COMPAT_HWCAP_ISA_V) {
-		riscv_v_setup_vsize();
-		/*
-		 * ISA string in device tree might have 'v' flag, but
-		 * CONFIG_RISCV_ISA_V is disabled in kernel.
-		 * Clear V flag in elf_hwcap if CONFIG_RISCV_ISA_V is disabled.
-		 */
-		// If the ISA string advertises V but the kernel config does not enable it, clear V in elf_hwcap
-		if (!IS_ENABLED(CONFIG_RISCV_ISA_V))
-			elf_hwcap &= ~COMPAT_HWCAP_ISA_V;
-	}
-
-	memset(print_str, 0, sizeof(print_str));
-	for (i = 0, j = 0; i < NUM_ALPHA_EXTS; i++)
-		if (riscv_isa[0] & BIT_MASK(i))
-			print_str[j++] = (char)('a' + i);
-	// Print the enabled single-letter extensions in alphabetical order
-	pr_info("riscv: base ISA extensions %s\n", print_str);
-
-	memset(print_str, 0, sizeof(print_str));
-	for (i = 0, j = 0; i < NUM_ALPHA_EXTS; i++)
-		if (elf_hwcap & BIT_MASK(i))
-			print_str[j++] = (char)('a' + i);
-	// Print the extensions exposed through the ELF hwcap in alphabetical order
-	pr_info("riscv: ELF capabilities %s\n", print_str);
-}
-```
-
-To summarize the code above:
-
-- The S-mode Linux kernel obtains RISC-V ISA extension information from the ISA string in ACPI or the device tree, but it does not use the string as-is: it validates it and applies further adjustments. For example, if F is present without D, F is masked out of `elf_hwcap`.
-- The ISA finally reported by the kernel is the intersection of the ISAs supported by the individual harts, further adjusted by ISA-related kernel config options (such as `CONFIG_RISCV_ISA_V`).
-- Without ACPI, the kernel takes the extension information from the ISA string in the device tree, and Zicsr, Zifencei, Zicntr and Zihpm are set unconditionally.
-- With ACPI, Linux looks up the RHCT (RISC-V Hart Capabilities Table), the ACPI structure used to pass CPU information to the OS. If the RHCT exists, the ISA string is read from it; if the lookup fails, the function returns early.
-
-The bit numbers of the extensions handled above are defined in `arch/riscv/include/asm/hwcap.h`. Each extension has one bit: bits 0 to 25 are reserved for single-letter extensions, multi-letter extensions are numbered from 26 (`RISCV_ISA_EXT_BASE`) onwards, and the bitmap currently holds at most 64 bits (`RISCV_ISA_EXT_MAX`):
-
-```c
-/* arch/riscv/include/asm/hwcap.h:16 */
-
-#define RISCV_ISA_EXT_a		('a' - 'a')
-#define RISCV_ISA_EXT_c		('c' - 'a')
-#define RISCV_ISA_EXT_d		('d' - 'a')
-#define RISCV_ISA_EXT_f		('f' - 'a')
-#define RISCV_ISA_EXT_h		('h' - 'a')
-#define RISCV_ISA_EXT_i		('i' - 'a')
-#define RISCV_ISA_EXT_m		('m' - 'a')
-#define RISCV_ISA_EXT_s		('s' - 'a')
-#define RISCV_ISA_EXT_u		('u' - 'a')
-#define RISCV_ISA_EXT_v		('v' - 'a')
-
-/*
- * These macros represent the logical IDs of each multi-letter RISC-V ISA
- * extension and are used in the ISA bitmap. The logical IDs start from
- * RISCV_ISA_EXT_BASE, which allows the 0-25 range to be reserved for single
- * letter extensions. The maximum, RISCV_ISA_EXT_MAX, is defined in order
- * to allocate the bitmap and may be increased when necessary.
- *
- * New extensions should just be added to the bottom, rather than added
- * alphabetically, in order to avoid unnecessary shuffling.
- */
-#define RISCV_ISA_EXT_BASE		26
-
-#define RISCV_ISA_EXT_SSCOFPMF		26
-#define RISCV_ISA_EXT_SSTC		27
-#define RISCV_ISA_EXT_SVINVAL		28
-#define RISCV_ISA_EXT_SVPBMT		29
-#define RISCV_ISA_EXT_ZBB		30
-#define RISCV_ISA_EXT_ZICBOM		31
-#define RISCV_ISA_EXT_ZIHINTPAUSE	32
-#define RISCV_ISA_EXT_SVNAPOT		33
-#define RISCV_ISA_EXT_ZICBOZ		34
-#define RISCV_ISA_EXT_SMAIA		35
-#define RISCV_ISA_EXT_SSAIA		36
-#define RISCV_ISA_EXT_ZBA		37
-#define RISCV_ISA_EXT_ZBS		38
-#define RISCV_ISA_EXT_ZICNTR		39
-#define RISCV_ISA_EXT_ZICSR		40
-#define RISCV_ISA_EXT_ZIFENCEI		41
-#define RISCV_ISA_EXT_ZIHPM		42
-
-#define RISCV_ISA_EXT_MAX		64
-#define RISCV_ISA_EXT_NAME_LEN_MAX	32
-
-#ifdef CONFIG_RISCV_M_MODE
-#define RISCV_ISA_EXT_SxAIA		RISCV_ISA_EXT_SMAIA
-#else
-#define RISCV_ISA_EXT_SxAIA		RISCV_ISA_EXT_SSAIA
-#endif
-```
-
-Parsing the ISA string has a number of problems, however. As the code above already suggests, it is tedious and error-prone. The related discussion can be found on the [mailing list][005]:
-
-> There's been a bunch of off-list discussions about this, including at
-> Plumbers. The original plan was to do something involving providing an
-> ISA string to userspace, but ISA strings just aren't sufficient for a
-> stable ABI any more: in order to parse an ISA string users need the
-> version of the specifications that the string is written to, the version
-> of each extension (sometimes at a finer granularity than the RISC-V
-> releases/versions encode), and the expected use case for the ISA string
-> (ie, is it a U-mode or M-mode string). That's a lot of complexity to
-> try and keep ABI compatible and it's probably going to continue to grow,
-> as even if there's no more complexity in the specifications we'll have
-> to deal with the various ISA string parsing oddities that end up all
-> over userspace.
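
To give a feel for the parsing oddities the quote refers to, here is a minimal user-space sketch of one step from `riscv_fill_hwcap`: stripping an optional `<major>` or `<major>p<minor>` version suffix from a single multi-letter extension token. The helper name `ext_name_len` is hypothetical and does not exist in the kernel. The sketch also assumes the token has already been split at `_` and contains only alphanumeric characters, which the kernel checks separately.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a kernel function): return the length of the
 * name part of one multi-letter extension token, ignoring a trailing
 * version such as "2" or "1p0". Mirrors the backwards scan in the kernel.
 */
size_t ext_name_len(const char *ext)
{
    size_t end = strlen(ext);

    /* No trailing digit means there is no version suffix at all. */
    if (end == 0 || !isdigit((unsigned char)ext[end - 1]))
        return end;

    /* Drop the last run of digits: the minor number, or the major
     * number when no "p<minor>" part is present. */
    while (end > 0 && isdigit((unsigned char)ext[end - 1]))
        end--;

    /* A 'p' preceded by a digit means the run was a minor number,
     * so drop the 'p' and the major number in front of it as well. */
    if (end >= 2 && tolower((unsigned char)ext[end - 1]) == 'p' &&
        isdigit((unsigned char)ext[end - 2])) {
        end--;
        while (end > 0 && isdigit((unsigned char)ext[end - 1]))
            end--;
    }

    return end;
}
```

With this sketch, `ext_name_len("zba1p0")` yields 3 and `ext_name_len("sstc2")` yields 4, while a name that merely contains digits, such as `zve32x`, is left untouched because it does not end in a digit. Even this small step needs two passes and a special case for `p`, which illustrates why the kernel developers did not want every user-space program to reimplement it.
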
-
-The Linux kernel therefore introduced a new system call for user space, described in the next section.
-
-## RISC-V Hardware Probing Interface
-
-The `elf_hwcap` mentioned in the previous section has only 64 bits available, while user space may need to detect more than 64 extensions, so that mechanism also runs out of bits.
-
-To address these problems, [commit ea3de9c][004] added a system call for probing hardware from user space, documented in `Documentation/riscv/hwprobe.rst`. Its parameters are an array of key-value pairs, the number of pairs, a CPU count, a CPU set and a flags word. It currently supports querying `m{arch,imp,vendor}id` and a small number of ISA extensions, and further queries can be added later through new keys:
-
-```c
-struct riscv_hwprobe {
-	__s64 key;
-	__u64 value;
-};
-
-long sys_riscv_hwprobe(struct riscv_hwprobe *pairs, size_t pair_count, size_t cpu_count, cpu_set_t *cpus, unsigned int flags);
-```
-
-The C code related to extension detection is as follows:
-
-```c
-/* arch/riscv/kernel/sys_riscv.c:125 */
-
-static void hwprobe_isa_ext0(struct riscv_hwprobe *pair,
-			     const struct cpumask *cpus)
-{
-	int cpu;
-	u64 missing = 0;
-
-	pair->value = 0;
-	if (has_fpu())
-		pair->value |= RISCV_HWPROBE_IMA_FD;
-
-	if (riscv_isa_extension_available(NULL, c))
-		pair->value |= RISCV_HWPROBE_IMA_C;
-
-	if (has_vector())
-		pair->value |= RISCV_HWPROBE_IMA_V;
-
-	/*
-	 * Loop through and record extensions that 1) anyone has, and 2) anyone
-	 * doesn't have.
-	 */
-	for_each_cpu(cpu, cpus) {
-		struct riscv_isainfo *isainfo = &hart_isa[cpu];
-
-		if (riscv_isa_extension_available(isainfo->isa, ZBA))
-			pair->value |= RISCV_HWPROBE_EXT_ZBA;
-		else
-			missing |= RISCV_HWPROBE_EXT_ZBA;
-
-		if (riscv_isa_extension_available(isainfo->isa, ZBB))
-			pair->value |= RISCV_HWPROBE_EXT_ZBB;
-		else
-			missing |= RISCV_HWPROBE_EXT_ZBB;
-
-		if (riscv_isa_extension_available(isainfo->isa, ZBS))
-			pair->value |= RISCV_HWPROBE_EXT_ZBS;
-		else
-			missing |= RISCV_HWPROBE_EXT_ZBS;
-	}
-
-	/* Now turn off reporting features if any CPU is missing it. */
-	pair->value &= ~missing;
-}
-```
-
-This function reports whether F/D, C, V, Zba, Zbb and Zbs are supported on top of the IMA base. F and D are reported together through a single `RISCV_HWPROBE_IMA_FD` bit, in line with the kernel's rule that F is not supported without D. Zba, Zbb and Zbs are reported only if every CPU in the requested set has them.
-
-## Summary
-
-By analyzing the `riscv_fill_hwcap` function, this article described how Linux detects RISC-V ISA extensions through the device tree or ACPI and derives HWCAP from them, and introduced the newly added hardware probing system call `riscv_hwprobe`.
-
-In 2022, PLCT published a [survey][008] of RISC-V ISA extension discovery mechanisms. At that time the Linux kernel detected ISA extensions from the information in the device tree/SMBIOS/ACPI handed over from M-mode, while user space could detect the RISC-V ISA through HWCAP, cpuinfo, environment variables, SIGILL and similar means. Most of these methods were ultimately based on the ISA string, whose parsing is error-prone, and user space had no dedicated system call for hardware probing.
-
-Since then the user-space detection mechanism has changed: Linux v6.4 introduced the new `riscv_hwprobe` system call for probing hardware. It still supports only a few features, but it addresses both the shortage of HWCAP bits and the lack of a hardware probing system call in user space.
-
-## References
-
-- [kernel.org][001]
-- [RISC-V Linux 启动流程分析][002]
-- [RISC-V 当前指令集扩展类别与检测方式][003]
-- [commit ea3de9c: RISC-V: Add a syscall for HW probing][004]
-- [Mailing list discussion introducing the RISC-V Hardware Probing User Interface][005]
-- [commit 6bcff51: RISC-V: Add bitmap reprensenting ISA features common across CPUs][006]
-- [OpenSBI RISC-V ISA 扩展检测与支持方式分析][007]
-- [PLCT 2022 survey of RISC-V ISA extension discovery methods][008]
-
-[001]: https://mirrors.edge.kernel.org/pub/linux/kernel/
-[002]: https://tinylab.org/riscv-linux-startup/
-[003]: https://gitee.com/tinylab/riscv-linux/blob/master/articles/20230715-riscv-isa-extensions-discovery-1.md
-[004]: https://github.com/torvalds/linux/commit/ea3de9ce8aa280c5175c835bd3e94a3a9b814b74#diff-24372ab3ad2d22486b15d8a8f7e9e53a04e16efe0a392cec83786e24cb767bdd
-[005]: https://lore.kernel.org/all/20230411-primate-rice-a5c102f90c6c@wendy/
-[006]: https://github.com/torvalds/linux/commit/6bcff51539ccae5431a01f60293419dbae21100f
-[007]: https://gitee.com/tinylab/riscv-linux/blob/master/articles/20230816-riscv-isa-discovery-4-opensbi.md
-[008]: https://github.com/plctlab/PLCT-Open-Reports/blob/master/20220706-%E9%83%91%E9%88%9C%E5%A3%AC-discovery.pdf
diff --git a/articles/20230928-riscv-isa-extensions-discovery-6-unified-discovery.md b/articles/20230928-riscv-isa-extensions-discovery-6-unified-discovery.md
new file mode 100644
index 0000000..7fa730c
--- /dev/null
+++ b/articles/20230928-riscv-isa-extensions-discovery-6-unified-discovery.md
@@ -0,0 +1,279 @@
+> Corrector: [TinyCorrect](https://gitee.com/tinylab/tinycorrect) v0.2-rc2 - [spaces quotes comments tables urls epw]
+> Author: YJMSTR [jay1273062855@outlook.com](mailto:jay1273062855@outlook.com)
+> Date: 2023/09/28
+> Revisor: Bin Meng, Falcon
+> Project: [RISC-V Linux 内核剖析](https://gitee.com/tinylab/riscv-linux)
+> Sponsor: ISCAS + +# Unified Discovery 简介及其软硬件协作现状 + +## 前言 + +本文是 RISC-V ISA 扩展的软硬件支持调研系列的第 6 篇文章,将介绍 Unified Discovery 机制。之前的文章中已经对 RISC-V ISA 扩展的分类、命名、硬件支持机制、GCC 支持、OpenSBI 支持、QEMU 支持和 Linux 内核支持进行了分析。 + +RISC-V 官方的 GitHub 仓库于 2023 年 9 月 8 日更新了 Unified Discovery Spec 的[草稿][001],其定义了统一的底层硬件检测机制。本文会对此草稿进行翻译,并结合之前介绍过的 mconfigptr 和 ACPI/SMBIOS/设备树进行分析。 + +在本系列第 1 篇文章[《RISC-V 当前指令集扩展类别与检测方式》][002] 中,已经介绍过了 RISC-V ISA 以及 mconfigptr 相关的基本概念。系统固件可以将从配置结构或其它地方获取的信息编码为设备树/SMBIOS/ACPI 等,再传递给后续启动流程。 + +## Unified Discovery Spec 翻译 + +### Introduction + +> Unified Discovery is intended to be a low-level discovery mechanism. A low-level discovery mechanism is distinguished from a high-level discovery mechanism. The former typically prepare and produce necessary data that is consumed by the latter. Diversified applications would require different high-level mechanism, meanwhile, the low-level mechanism provides the foundation. + +Unified Discovery 旨在成为底层的检测机制。底层的检测机制与上层的检测机制不同,前者通常准备并提供用于后者使用的数据。多种多样的应用程序对上层检测机制的要求不同,于此同时,底层的检测机制为其提供了基础。 + +> This proposal describes a low-level discovery mechanism that is capable of supporting the following use cases: +> +> 1. Hosted discovery of features by firmware, operating systems and applications +> 1. Rich operating systems +> 2. Simple software applications +> 2. Discovery of features by external debug tools +> 3. Out-of-band discovery of features to allow development tools to specialise +> 4. firmware (e.g. choose the appropriate target flags for compilation and link in the required libraries) for deeply embedded applications + +本提案描述了适用于以下场景的底层检测机制: + +1. 固件,操作系统和应用程序托管而来的硬件特性检测 + 1. Rich OS + 2. 简单的软件应用 +2. 外部调试工具的硬件检测 +3. 允许开发工具特化的带外检测 +4. 用于深度嵌入式应用的固件(例:选择合适的 target flags 用于编译,并链接到所需的库) + +> As an example, consider a typical Linux stack: +> +> 1. Firmware performs system/machine-dependent discovery and populates either a Linux device tree or ACPI tables. +> 2. 
The Linux kernel parses either a Linux device tree or ACPI tables, exposing system specifics to userland processes through multiple interfaces: +> 1. special files the /proc filesystem +> 2. device drivers available under /dev +> 3. files in a mounted sysfs instance +> 4. flags injected into ELF binaries through the ELF auxiliary vector and accessible through getauxval() +> 5. configuration retrieved through the sysconf() system call +> 6. information retrievable through the vdso (a virtual DSO mapped by the kernel into each processes’ address space) + +例如,考虑如下的典型 Linux 栈: + +1. 固件执行依赖于系统/机器的检测机制并生成 Linux 设备树或 ACPI 表格 +2. Linux 内核解析 Linux 设备树或 ACPI 表,通过多个接口向用户侧进程暴露系统特性: + 1. /proc 文件系统中的特殊文件 + 2. /dev 目录下可用的设备驱动 + 3. 已挂载的 sysfs 实例中的文件 + 4. 通过 ELF auxiliary vector 注入到 ELF 二进制中并可以通过 getauxval() 访问的标志 + 5. 从 sysconf() 系统调用返回的配置信息 + 6. vdso(一个虚拟 dynamic shared object,由内核映射到每个进程的地址空间)返回的信息 + +#### No Central Registry + +> The proposal provides a solution that does not require a central registry. +> +> RISC-V allows the vendor-specific extensibility of the ISA without any coordination with RISC-V International (as long as no required features are removed and no incompatible features are introduced). + +本提案提供了一种不需要中间登记处的解决方案。 + +RISC-V 允许厂商在没有 RISC-V 国际基金会参与的情况下添加自定义 ISA 扩展(只要没有将要求的功能移除,并且没有引入新的不兼容的特性) + +> This should also be reflected in the architecture of the discovery format and require a minimum coordination between implementation and RISC-V International: +> +> 1. Based on the published “rules of the land” (i.e. modelling language, encoding rules and the top-level message description), implementers (both soft- and hardware) shall be able to: +> 1. add proprietary entries in the configuration message, that can safely be identified, skipped and linked back to the vendor (without a global vendor registry being operated by RISC-V) that specified the proprietary entry +> 2. parse any configuration message, including the ability: +> 1. 
to parse a newer-version message, identifying the new “extensions” and being able to safely skip over them +> 2. To parse a message containing vendor-extensions +> 2. Publish a basic message format that enforces the presence of required fields as a machine-readable document/schema. + +这同样需要体现在检测格式的架构中,并且要求实现和 RISC-V 国际基金会之间进行最低限度的协调: + +1. 基于已发布的规则(即建模语言、编码规则和顶层信息描述),实施者(软件和硬件)应该能够: + 1. 向配置信息报文中添加可以被安全地识别,跳过和链接回指定该属性条目的产商(在没有 RISC-V 操作的中间注册处的情况下)的属性条目 + 2. 解析任何配置信息报文,包括如下能力: + 1. 解析更新版本的报文,识别新扩展并能安全地跳过它们。 + 2. 解析包含产商信息的报文 +2. 发布基本报文格式,在基本报文格式中强制要求所需字段作为机器可读文档/表格存在。 + +#### Complete Consumption Requirement + +> A key requirement in any discovery process that avoids a centralized registry is a client’s ability to discover that it has read the entire message (including parts that it can not understand and skips over) and whether any unparsable extra elements were included in the message. This is termed the Complete Consumption Requirement. + +为了避免 centralized registry,所有检测过程都应满足一个关键要求:客户端要能够检测到它自己已经读取了完整的报文(包括它无法识别和跳过的部分)并能够检测到报文中是否包含任何无法解析的额外元素。这被称为 Complete Consumption Requirement。 + +> | Note | The complete consumption requirement is one of the unresolvable issues for CPUID-style instructions that query for values using keys. | +> | ---- | ------------------------------------------------------------ | + +| 注意 | Complete comsumption requirement 是用键查询值的类 CPUID 指令无法解决的问题 | +| ---- | ------------------------------------------------------------ | + +#### Note on Security + +> Independent of the underlying mechanism (i.e. whether a memory-based configuration message is read or CPUID-style instructions are used), securing the discovery mechanism will require cryptographically signed checksums (i.e. electronic signatures) to ascertain the authenticity, integrity and the originator of the configuration data. 
+ +不管底层使用了怎样的检测机制(即读取基于内存的配置报文或使用类 CPUID 的指令),确保检测机制的安全需要用到加密签名校验和(即电子签名)以确保配置数据的真实性、完整性和来源。 + +> Signing the configuration message should be an integral (albeit optional) part of the message format. While this can not address the playback of a valid configuration message, it allows the discovery of modified messages. + +配置报文的签名应当成为报文格式的组成部分之一(尽管是可选的)。虽然这不能解决有效配置报文的回放问题,但是它能够检测报文是否被修改过。 + +> We do not believe that the goal of end-to-end security can be efficiently achieved using a design-approach similar to Intel’s CPUID instruction: a cryptographic signature would need to be computed across the entire configuration space (and not merely individual elements). This precludes the absence of a central registry, as the valid key space needs to be known in advance to concatenate the plaintext for signing. + +我们认为,使用类似于 Intel 的 CPUID 指令无法有效实现端到端安全:因为加密签名需要在整个配置空间(而不仅仅是单个元素)内进行计算。这使得 central registry 必须存在,因为有效键空间需要被提前知晓,用于连接明文,进行签名。 + +### Solution Outline + +> 1. Schema + Value Notation (human readable form) + Parser +> 2. Reuse existing standards → ITU standards & examples (SNMP) +> 3. vendor-specific info +> +> A binary-encoded representation of a device’s configuration is made available to software within the device’s physical address space. The data structure is described as a subset of ASN.1 (see ITU-T X.680 and ISO/IEC 8824) and encoded using standardized encoding rules (see ITU-T X.690 and ISO 8825). For in-memory representations, the unaligned packed encoding rules (unaligned PER, see ITU-T X.691) are used. The configuration data can (optionally) be cryptographically signed. + +1. 模式 + 值标记(人类可读形式)+ 解析器 +2. 重用已存在的标准 → ITU 标准和例子(SNMP) +3. 
厂商特定的信息 + +用二进制编码表示的设备配置信息可供设备物理地址空间内的软件使用。配置信息数据结构用 ASN.1 的子集(见 ITU-T X.680 和 ISO/IEC 8824)进行描述,并用标准化编码规则进行编码(见 ITU-T X.690 和 ISO 8825)。内存中的数据使用非对齐打包编码规则(unaligned PER,见 ITU-T X.691)进行表示。配置数据可以(可选地)被加密签名。 + +> | Note | The data structure is static in the sense that no attempt to dynamically enumerate hardware resource is performed. For example, for devices on a hot-pluggable bus such as USB, it is up to the subsequent boot sequences, if needed, to enumerate the devices and add to the handover structure, such as a device tree, expected by the next boot stage after the boot loader. | +> | ---- | ------------------------------------------------------------ | + +| 注意 | 数据结构是静态的,即其不会对硬件资源进行动态枚举。例如,对于 USB 等热拔插总线上的设备,如果需要,可以交由引导子序列来枚举设备,并添加到设备树这样的结构中,交给 boot loader 之后的引导阶段。| +| ---- | ------------------------------------------------------------ | + +> This proposal provides a schema of the data structure that is generic and extensible. See Section 8 for the schema. Vendor-specific data can be included without hindering the successful parsing of the configuration.(?) + +本提案提供了数据结构的一种模式规则,这一模式是通用的、可扩展的。关于该模式的具体描述见 Section 8。制造商特定的数据可以在不妨碍配置的解析的情况下被包含。(?) + +> The base-address of the binary-encoded representation is accessible through a single CSR. No other ISA considerations, beyond the provision of an additional CSR, are required. + +二进制编码表示的基地址可以通过单个 CSR 来取得。除此之外不需要扩展 ISA。 + +> Target software (usually firmware) that performs discovery will read the uPER-encoded message to retrieve the relevant configuration elements. The message can be decoded either using a stream parser with small memory footprint (i.e. the parser reads from the beginning until it retrieves the requested data element) or can be converted start-to-finish into a firmware-specific data structure. 
Given the compact representation and the low memory requirements for parsers, a uPER message can be efficiently parsed even during the startup of a deeply embedded microcontroller application (even though we envision out-of-band discovery and specialization for deeply embedded and resource-constrained use-cases). + +负责进行检测的目标软件(通常是固件)会读取 uPER 编码的报文,来返回相关的配置元素。报文可以通过占用少量内存的流解析器(即从头开始读取,直到检索到所需元素的解析器)来解码,也可以将从头到尾的解析转换为固件特定的数据结构。鉴于 uPER 报文的表示紧凑,以及对解析器的内存要求低,即使是在深度嵌入式微控制器应用(即使我们设想了针对深度嵌入式设备和资源受限使用情况下的带外发现和专门化)的启动过程中,uPER 报文也可以被有效解析。 + +> The unified discovery mechanism for RISC-V builds on the following technology stack: +> +> 1. ASN.1 (X.680) for modelling the data structures, independent of their encoding +> 2. Packed Encoding Rules (X.691) for the binary encoding of data structures (in-band) +> 3. XML Encoding Rules (X.693) for the XML encoding of data structures (out-of-band) +> 4. RISC-V International specific guidelines to allow the efficient aggregation of RISC-V global and vendor-specific data elements without a central registration authority +> 5. RISC-V International specific guidelines for the encoding of detached signatures (PKCS#7/CMS) using Packed Encoding Rules + +RISC-V 的 Unified discovery 机制基于如下技术栈: + +1. ASN.1(X.680)用于对数据结构进行建模,与其编码无关 +2. 用于数据结构二进制编码(带内)的 Packed Encoding Rules (X.691)。 +3. XML Encoding Rules (X.693) 用于数据结构(带外)的 XML 编码。 +4. RISC-V 国际基金会的指南,允许在没有中间注册机构的情况下有效汇总 RISC-V 的全球产商特定元素。 +5. RISC-V 国际基金会的指南,使用 Packed Encoding Rules 对分离签名(PKCS#7/CMS)进行编码。 + +> | Note | The benefits of using X.680 and X.693 over vendor-specific (e.g., Google Protobuf, Apache Avro, …) marshalling frameworks are its international standardization, widespread adoption and availability of open-source and commercial codec libraries. 
| +> | ---- | ------------------------------------------------------------ | + +| 注意 | 与使用产商特定编组框架(如 Google Protobuf,Apache Avro 等)相比,使用 X.680 和 X.693 的好处是它们是国际标准化的,其开源和商业解码器库被广泛使用,可用性强。| +| ---- | ------------------------------------------------------------ | + +> Retrieval and decoding of the configuration structure can happen in any of the following scenarios: +> +> - Software (in-band) +> +> Firmware will access the CSR and read the configuration message to extract the device’s configuration as part of its discovery process. The implementation details of this process (e.g., whether firmware initiates a read from the top and searches for individual tags, or if firmware converts the entire discovery information into an in-memory representation at once) are left to device implementers. +> +> - External debug (in-band) +> +> External debug will retrieve the CSR and then read out (once) the referenced memory region to retrieve the configuration information for a specific target device. The retrieved configuration message is then parsed by the external debugger to determine the configuration, features and capabilities of the device. +> +> - Software development environment (out-of-band) +> +> For (deeply) embedded applications, firmware will be specialised to target the specific target device only by pushing the discovery and configuration to the software development environment. These cases can be efficiently supported either by reading the configuration structure from a target device using an external debugger, or by retrieving a configuration structure from the manufacturer’s website. 
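+
+The configuration structure may be retrieved and decoded in any of the following scenarios:
+
+- Software (in-band)
+
+Firmware accesses the CSR and reads the configuration message to extract the device's configuration as part of its discovery process. Implementation details (for example, whether firmware starts reading from the top and searches for individual tags, or converts the whole discovery information into an in-memory representation in one go) are left to the device implementer.
+
+- External debug (in-band)
+
+The external debugger reads the CSR and then reads out (once) the memory region it refers to, in order to obtain the configuration information of a specific target device. The debugger then parses the retrieved message to determine the configuration, features and capabilities of the device.
+
+- Software development environment (out-of-band)
+
+For (deeply) embedded applications, firmware is specialised for one specific target device by moving discovery and configuration into the software development environment. This can be supported efficiently either by reading the configuration structure from the target device with an external debugger, or by fetching it from the manufacturer's website.
+
+### The mconfig CSR
+
+> The machine config pointer (mconfigptr) CSR provides the base-address of the binary-encoded representation. The mconfigptr is a machine-mode CSR. On platforms that does not require runtime update of the address of the binary representation of the configuration, this register can be hardwired to zero.
+>
+> For backward compatibility, the firmware can emulate this CSR on platforms that does not implement this CSR prior to this proposal.
+
+In other words, `mconfigptr` is a machine-mode CSR that holds the base address of the binary-encoded representation. On platforms that never need to update that address at run time, the register may be hardwired to zero.
+
+For backward compatibility, firmware may emulate the CSR on platforms that predate this proposal and do not implement it.
+
+### Hypervisor: Unified Discovery for Guest OSes
+
+> For virtualisation purposes, only the retrieval of the mconfigptr CSR has to be intercepted (i.e. either a virtualized CSR would be provided to the guest that can be written by the hypervisor — or trap-and-emulate would be used) if the guest is to be provided with a configuration structure that may or may not be from what is retrieved from the underlying hardware.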
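+
+For virtualisation, if a guest is to be given a configuration structure, only reads of the mconfigptr CSR need to be intercepted, either by exposing a virtualized CSR that the hypervisor can write or by using trap-and-emulate. That structure may or may not be the one retrieved from the underlying hardware.
+
+### Remaining sections
+
+The remaining parts of the draft currently consist of headings only, as shown below:
+
+```
+RISC-V Unified Discovery Specification
+ Introduction
+ No Central Registry
+ Complete Consumption Requirement
+ Note on Security
+ Solution Outline
+ The mconfigptr CSR
+ Hypervisor: Unified Discovery for Guest OSes
+ /** the sections below currently consist of headings only * */
+ Referenced standards
+ Guidelines/Mappings from discoverable elements → ASN.1
+ Extensibility, versioning & “container format”
+ What types of discoverable elements do we support?
+ Existence
+ Structural elements (lists, arrays)
+ Parameters (enums, integer ranges, addresses)
+ How to map these to ASN.1
+ Encoding rules
+ Reference back to X.69x ?
+ Top-level schema → appendix ( normative )
+ container format
+ standard elements (vectors, bitmanip, …)
+```
+
+The draft is still being updated, so the translated portion in this article may be incomplete and is for reference only; the [GitHub repository][001] is authoritative. The translation here will be updated from time to time as the draft changes.
+
+## From the configuration structure to industry standards
+
+The SBI specification does not define a standard hardware discovery method. S-mode software has to rely on other industry-standard mechanisms for hardware discovery, namely the device tree or ACPI.
+
+The SBI specification also does not make the device tree part of the specification. An SBI implementation that does not pass the device tree address in the a1 register still conforms to the SBI specification, but Linux will fail to boot on it, because Linux reads the device tree address from a1 during boot.
+
+The configuration-structure specification states that system firmware can generate a device tree/SMBIOS/ACPI from the configuration structure for use by later boot stages. System firmware usually runs at power-on, initializes the hardware, and sets up the data structures needed by firmware services or by booting. Common examples are U-Boot on embedded systems and the BIOS on PCs and servers.
+
+Because mconfigptr is an M-mode CSR while U-Boot Proper runs in S-mode, U-Boot can reach mconfigptr only through the M-mode U-Boot SPL or with the help of the SBI implementation.
+
+How, then, do SBI implementations use the configuration-structure that mconfigptr points to? Taking OpenSBI as an example, its repository, source code and mailing list show that OpenSBI does not currently use the configuration-structure to generate a device tree/ACPI/SMBIOS, and its `include/sbi/riscv_encoding.h` does not define an encoding for the mconfigptr CSR either. From this we can conclude that OpenSBI v1.3.1 does not yet use mconfigptr or the configuration-structure. RISC-V Priv Spec v1.12, however, already treats mconfigptr as a CSR that must be implemented, so its number 0xf15 should not be assigned to any other CSR.
+
+## Summary
+
+A draft of the Unified Discovery specification recently appeared in RISC-V's GitHub repository. It describes the motivation for and properties of the new hardware discovery mechanism, together with an outline of the Unified Discovery solution. This article translated the parts that exist so far.
+
+RISC-V has usually relied on CSRs to discover hardware information, but this is inflexible for some kinds of information, and CSRs keep being consumed as the kinds and amount of information grow. For RISC-V ISA extensions, the misa register can describe only a limited number of extensions; supporting further extensions in the same way would require adding several more misa-like registers.
+
+The configuration-structure was proposed to solve these problems. A RISC-V device needs only one CSR, mconfigptr, which points to the address of the configuration-structure where the hardware information is stored. The documentation of this structure is still in draft state, however.
+
+System firmware can use the configuration-structure to generate structures such as the device tree/SMBIOS/ACPI, but OpenSBI does not do so yet. Once the relevant specifications are ratified, the findings of this series can serve as a basis for submitting patches to the relevant open-source software that add support for hardware discovery through the Unified Discovery mechanism.
+
+## References
+
+- [Unified Discovery Spec(Draft)][001]
+- [RISC-V 当前指令集扩展类别与检测方式][002]
+- [riscv-sbi-doc:Add Hart Discovery Extension #60][003]
+
+[001]: https://github.com/riscv/configuration-structure/blob/master/riscv-unified-discovery-draft.adoc
+[002]: https://gitee.com/tinylab/riscv-linux/blob/master/articles/20230715-riscv-isa-extensions-discovery-1.md
+[003]: https://github.com/riscv-non-isa/riscv-sbi-doc/pull/60
--
Gitee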