diff --git a/src/content/posts/2024-01/HorizonChaser__RVV_on_GCC_13.md b/src/content/posts/2024-01/HorizonChaser__RVV_on_GCC_13.md new file mode 100644 index 0000000000000000000000000000000000000000..2638d3511602a649ddac6a667409bb0cd2995022 --- /dev/null +++ b/src/content/posts/2024-01/HorizonChaser__RVV_on_GCC_13.md @@ -0,0 +1,293 @@ +--- +title: 使用 GCC 13 编译包含 RVV Intrinsic 的程序 +author: 侯文轩 +pubDate: 2024-06-17 +categories: ['2024 年第一期'] +description: '使用 GCC 13 配置 RVV 环境, 并进行简单分析' +--- + +## 0. 总结 + +1. GCC 13 仅支持 RVV 0.11 (但相对容易获取), GCC 14 则支持完整的 RVV 1.0 +2. GCC 的 RVV Intrinsic 是通过编译器实现的 + - `riscv_vector.h` 中没有相关 Intrinsic 与数据类型的声明, 这导致 Intellisense 等代码补全无法直接工作 +3. GCC 的 RVV Intrinsic 需要 `__riscv` 前缀 +4. Linux 内核已经合并了对 RVV 的支持 + +## 1. 获取 GCC 13 + +目前 Debian 12 源中的 `gcc-riscv64-unknown-elf` 版本还在 `12.2.0`, 不支持 RVV Intrinsic. 所以我们需要从 `riscv-collab/riscv-gnu-toolchain` 根据自己的系统版本下载[最新的 nightly release](https://github.com/riscv-collab/riscv-gnu-toolchain/releases/tag/2024.04.12). 下载后解压并将路径添加到 PATH, 再看看我们拿到的 GCC 版本: + +``` +horizon@horizon-vm-ubuntu20:~$ riscv64-unknown-elf-gcc --version +riscv64-unknown-elf-gcc () 13.2.0 +Copyright (C) 2023 Free Software Foundation, Inc. +This is free software; see the source for copying conditions. There is NO +warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. +``` + +GCC 13 仅支持 RVV 0.11, [GCC 14 则支持 RVV 1.0](https://gcc.gnu.org/onlinedocs/gcc-14.1.0/gcc/RISC-V-Options.html) . 这里我们出于简便起见不再自行编译 GCC (Debian sid 中 `gcc-riscv64-unknown-elf` 版本也只有 `13.2.0`; Arch 上有 [riscv64-elf-gcc 14.1.0](https://archlinux.org/packages/extra/x86_64/riscv64-elf-gcc/)). + +## 2. RVV Intrinsics + +目前最新的 RVV Intrinsic 规范的版本是 `1.0 RC0`, 可以在[这里](https://github.com/riscv-non-isa/rvv-intrinsic-doc/releases)获取. 每个 Intrinsic 函数名称可以分为如下部分: + +![rvv intrinsics nameing rule](imgs/horizonchaser__rvv_intrinsics_naming_rule.webp) + +- 前缀: `__risc_v` +- 操作 +- 操作数: `v` 表示向量, `x` 表示标量 +- 元素格式: `i` 整数, `f` IEEE-754 浮点数 +- `LMUL` 寄存器设定 + - `LMUL` 决定了将若干个向量寄存器视为一个整体, 例如 `LMUL=2` 时, v0 将表示 v0v1, v2 表示 v2v3; `LMUL=4` 时, v0 将表示 v0v1v2v3, v4 表示 v4v5v6v7. +- 可选的掩码或后缀 + +更多关于命名规则的信息可见 [Chapter 6. Naming scheme](https://github.com/riscv-non-isa/rvv-intrinsic-doc/releases/download/draft-20240612-a7127a7371caac3a6c76094fe4efb48cb560340a/v-intrinsic-spec.pdf#[{"num"%3A113%2C"gen"%3A0}%2C{"name"%3A"XYZ"}%2C0%2C841.89%2Cnull]). + +通过 `__riscv_v_intrinsic` 宏, 可以检测编译器对 RVV 的支持情况. 查看 `riscv_vector.h`, 可以看到这样一段: + +```c +/* NOTE: This implementation of riscv_vector.h is intentionally short. It does + not define the RVV types and intrinsic functions directly in C and C++ + code, but instead uses the following pragma to tell GCC to insert the + necessary type and function definitions itself. The net effect is the + same, and the file is a complete implementation of riscv_vector.h. */ +#pragma riscv intrinsic "vector" +``` + +由于不包含 intrinsic 函数以及相应的数据类型, 因此代码提示与补全不能生效 :( + +## 3. 编写一个 RVV demo 并在 qemu 中测试 + +### Vector Add + +参考张宇轩同学在[RVV C Intrinsic 配置教程](https://learning.eulixos.com/posts/2024-01/zyx_01_config/)中的实例代码, 增加了对拍: + +> **注意这里的 intrinsic 有 `__risc_v` 前缀, 如果你按照张宇轩同学的方法配置的环境的话, 可能需要去掉这些前缀** + +```c +// Reference: https://pages.dogdog.run/toolchain/riscv_vector_extension.html +// #define __riscv_vector // make VSC happy when reading riscv_vector.h +#include +#include + +int x[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 0}; +int y[10] = {0, 9, 8, 7, 6, 5, 4, 3, 2, 1}; +int z[10], z_real[10]; + +void vec_add_rvv(int* dst, int* lhs, int* rhs, size_t avl) { + vint32m2_t vlhs, vrhs, vres; + for (size_t vl; (vl = __riscv_vsetvl_e32m2(avl)); + avl -= vl, lhs += vl, rhs += vl, dst += vl) { + vlhs = __riscv_vle32_v_i32m2(lhs, vl); + vrhs = __riscv_vle32_v_i32m2(rhs, vl); + vres = __riscv_vadd_vv_i32m2(vlhs, vrhs, vl); + __riscv_vse32_v_i32m2(dst, vres, vl); + } +} + +void vec_add_real(int* dest, int* lhs, int* rhs, size_t vlen) { + for (size_t i = 0; i < vlen; i++) { + dest[i] = lhs[i] + rhs[i]; + } +} + +void print_vec(int* v, size_t vlen, char* msg) { + printf("%s={ ", msg); + for (size_t i = 0; i < vlen; i++) { + printf("%d, ", v[i]); + } + + printf("}\n"); +} + +int main(int argc, char const* argv[]) { + +// check RVV support +#ifndef __riscv_v_intrinsic + printf("RVV NOT supported in this compiler\n"); + return 0; +#endif + + vec_add_rvv(z, x, y, 10); + vec_add_real(z_real, x, y, 10); + + print_vec(x, 10, "x[10]"); + print_vec(y, 10, "y[10]"); + + for (size_t i = 0; i < 10; i++) { + if (z[i] != z_real[i]) { + printf("==========\nTest FAILED: pos %d mismatch\n", i); + print_vec(z, 10, "z[10]"); + print_vec(z_real, 10, "z_real[10]"); + + return -1; + } + } + + print_vec(z, 10, "z[10]"); + printf("==========\nTest PASSED\n"); + + return 0; +} +``` + +使用 `riscv64-unknown-elf-gcc -O -march=rv64gcv vadd_test.c -o vadd_out` 编译, `qemu-riscv64 -cpu rv64,v=true,vlen=128,vext_spec=v1.0 ./vadd_out` 运行: + +```text +horizon@horizon-vm-ubuntu20:~$ qemu-riscv64 -cpu rv64,v=true,vlen=128,vext_spec=v1.0 ./rvv_out +x[10]={ 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, } +y[10]={ 0, 9, 8, 7, 6, 5, 4, 3, 2, 1, } +z[10]={ 1, 11, 11, 11, 11, 11, 11, 11, 11, 1, } +========== +Test PASSED +``` + +可以看到 RVV Intrinsic 成功发挥了作用, 并且计算结果与我们手动计算完全相同~ + +#### One Step Further + +在 [这里](https://godbolt.org/z/T9qaebW1P), 我们可以看到 `vec_add_rvv` 函数反汇编后 (或者说汇编前? 总之是它对应的 RISC-V 汇编) 的结果. + +```ASM +vector_add: + beq a3,zero,.L1 +.L3: + vsetvli a5,a3,e32,m1,ta,ma + vle32.v v24,0(a1) + vle32.v v25,0(a2) + vfadd.vv v24,v24,v25 + vse32.v v24,0(a0) + sub a3,a3,a5 + slli a5,a5,2 + add a1,a1,a5 + add a2,a2,a5 + add a0,a0,a5 + bne a3,zero,.L3 +.L1: + ret +``` + +`beq` 对应 `for` 循环的初值边界: 若向量长度等于零, 则直接返回. + +`vsetvli a5, a3, e32, m1, ta, ma` 设置向量相关的配置: a3 是剩余的元素数,e32 表示每个元素是 32 位, m1 表示每个向量寄存器的长度是 1 倍的标准长度 (也就是说每个寄存器里面都是一个向量), ta 和 ma 分别表示尾部与掩码都是不可知的. + +两条 `vle32.v` 分别从 `a1` 与 `a2` 寄存器指向的地址加载一个 32 位的向量到对应的向量寄存器. `vfadd.vv` 进行两个向量之间的加法, `vse32.v` 将 `v24` 中的结果保存到 `a0` 指向的地址. 之后更新各个地址的值, 进行下一轮加法, 直到所有元素都被计算. + +### Vector Multiply + +类似地, 我们可以使用向量乘法, 这里使用 32 位浮点数, 每个向量由 8 个元素构成. + +> **注意这里的 intrinsic 有 `__risc_v` 前缀, 如果你按照张宇轩同学的方法配置的环境的话, 可能需要去掉这些前缀** + +```c +#include +#include + +void vector_mul_rvv(const float *a, const float *b, float *dest, size_t vlen) { + size_t vl; + size_t i = 0; + + for (; i < vlen; i += vl) { + // set vector length + vl = __riscv_vsetvl_e32m1(vlen - i); + + vfloat32m1_t va = __riscv_vle32_v_f32m1(&a[i], vl); + vfloat32m1_t vb = __riscv_vle32_v_f32m1(&b[i], vl); + + vfloat32m1_t vres = __riscv_vfmul_vv_f32m1(va, vb, vl); + + __riscv_vse32_v_f32m1(&dest[i], vres, vl); + } +} + +void vector_mul_real(const float *a, const float *b, float *dest, size_t vlen) { + for (size_t i = 0; i < vlen; i++) { + dest[i] = a[i] * b[i]; + } +} + +void print_vec(float *v, size_t vlen, char *msg) { + printf("%s={ ", msg); + for (size_t i = 0; i < vlen; i++) { + printf("%f, ", v[i]); + } + + printf("}\n"); +} + +int main() { + // check RVV support +#ifndef __riscv_v_intrinsic + printf("RVV NOT supported in this compiler\n"); + return 0; +#endif + + float a[8] = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0}; + float b[8] = {8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0}; + float c[8], c_real[8]; + + vector_mul_rvv(a, b, c, 8); + vector_mul_real(a, b, c_real, 8); + + print_vec(a, 8, "a[8]"); + print_vec(b, 8, "b[8]"); + + for (size_t i = 0; i < 8; i++) { + if (c[i] != c_real[i]) { + printf("==========\nTest FAILED: pos %d mismatch\n", i); + print_vec(c, 8, "c[10]"); + print_vec(c_real, 8, "c_real[10]"); + + return -1; + } + } + + print_vec(c, 8, "c[10]"); + printf("==========\nTest PASSED\n"); + + return 0; +} +``` + +编译: `riscv64-unknown-elf-gcc -O -march=rv64gcv rvv_mul.c -o rvv_mul` + +运行: ` qemu-riscv64 -cpu rv64,v=true,vlen=128,vext_spec=v1.0 ./rvv_mul` + +结果: +```text +horizon@horizon-vm-ubuntu20:~$ qemu-riscv64 -cpu rv64,v=true,vlen=128,vext_spec=v1.0 ./rvv_mul +a[8]={ 1.000000, 2.000000, 3.000000, 4.000000, 5.000000, 6.000000, 7.000000, 8.000000, } +b[8]={ 8.000000, 7.000000, 6.000000, 5.000000, 4.000000, 3.000000, 2.000000, 1.000000, } +c[10]={ 8.000000, 14.000000, 18.000000, 20.000000, 20.000000, 18.000000, 14.000000, 8.000000, } +========== +Test PASSED +``` + +## 4. Linux 内核目前对 RVV 的支持 + +RVV 1.0 的支持 [已经被合并到内核](https://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git/commit/?h=for-next&id=d5e45e810e0e08114035d31d88049544c038e6fc). + +内核通过 [`bool has_vector()`](https://github.com/torvalds/linux/blob/6ba59ff4227927d3a8530fc2973b80e94b54d58f/arch/riscv/include/asm/vector.h#L38) 检测对 RVV 的支持: +- [`has_vector()`](https://github.com/torvalds/linux/blob/6ba59ff4227927d3a8530fc2973b80e94b54d58f/arch/riscv/include/asm/vector.h#L38) -> +- [`riscv_has_extension_unlikely(RISCV_ISA_EXT_v)`](https://github.com/torvalds/linux/blob/6ba59ff4227927d3a8530fc2973b80e94b54d58f/arch/riscv/include/asm/cpufeature.h#L109) -> +- [`__riscv_isa_extension_available(NULL, ext)`](https://github.com/torvalds/linux/blob/6ba59ff4227927d3a8530fc2973b80e94b54d58f/arch/riscv/kernel/cpufeature.c#L64) -> +- [`test_bit(bit, bmap) ? true : false`](https://github.com/torvalds/linux/blob/6ba59ff4227927d3a8530fc2973b80e94b54d58f/arch/riscv/kernel/cpufeature.c#L71), 最终还是以检测特定比特位的形式完成. + +### 4.1 RVV 任务切换 + +在 [arch/riscv/include/asm/switch_to.h#L102](https://github.com/torvalds/linux/blob/6ba59ff4227927d3a8530fc2973b80e94b54d58f/arch/riscv/include/asm/switch_to.h#L102) 中, 可以看到首先判断是否存在 RVV 支持, 若存在, 则调用 [`__switch_to_vector(__prev, __next)`](https://github.com/torvalds/linux/blob/6ba59ff4227927d3a8530fc2973b80e94b54d58f/arch/riscv/include/asm/vector.h#L240) 完成 RVV 上下文的切换: + +1. 判断当前任务状态 +2. 调用 [`__riscv_v_vstate_save`](https://github.com/torvalds/linux/blob/6ba59ff4227927d3a8530fc2973b80e94b54d58f/arch/riscv/include/asm/vector.h#L103) 以 8 个一组保存 v0~v31 的值 +3. 判断下一个任务的状态 +4. 调用 [`riscv_v_vstate_set_restore`](https://github.com/torvalds/linux/blob/6ba59ff4227927d3a8530fc2973b80e94b54d58f/arch/riscv/include/asm/vector.h#L126), 同样分组恢复各向量寄存器的值. + +## 99. 更多关于 RVV 的材料 + +- [RVV Intrinsic Viewer](https://dzaima.github.io/intrinsics-viewer): 可以看到 RVV Intrinsics 的定义与功能分析 +- [RISC-V Vector Programming in C with Intrinsics +](https://fprox.substack.com/p/risc-v-vector-programming-in-c-with): 一份很好的 RVV 教程, 包括直接用汇编以及使用 intrinsics +- [RISC-V Vector Intrinsic Document](https://github.com/riscv-non-isa/rvv-intrinsic-doc): RVV Intrinsic 标准 + + diff --git a/src/content/posts/2024-01/imgs/horizonchaser__rvv_intrinsics_naming_rule.webp b/src/content/posts/2024-01/imgs/horizonchaser__rvv_intrinsics_naming_rule.webp new file mode 100755 index 0000000000000000000000000000000000000000..594d80556ea2e712ef62b9126364bf505592bb90 Binary files /dev/null and b/src/content/posts/2024-01/imgs/horizonchaser__rvv_intrinsics_naming_rule.webp differ