diff --git a/articles/20220315-Porting-Linux-to-a-new-processor-architecture, part-1-The-basics.md b/articles/20220315-Porting-Linux-to-a-new-processor-architecture, part-1-The-basics.md
new file mode 100644
index 0000000000000000000000000000000000000000..a6873c645c8a1f60b12e6053f88fa4cefc65d18a
--- /dev/null
+++ b/articles/20220315-Porting-Linux-to-a-new-processor-architecture, part-1-The-basics.md
@@ -0,0 +1,227 @@
+
+> Title: Porting Linux to a new processor architecture, part 1: The basics
+> Author: Joël Porquet@**August 26, 2015**
+> Translator: 985400330@qq.com
+> Revisor: ABC
+> Project: https://gitee.com/tinylab/riscv-linux
+
+# Porting Linux to a new processor architecture, part 1: The basics
+
+> Although a simple port may count as few as 4,000 lines of code (exactly 3,775 for the MMU-less Hitachi H8/300 recently reintroduced in Linux 4.2-rc1), getting the Linux kernel running on a new processor architecture is a difficult process. Worse still, there is not much documentation available describing the porting process. The aim of this series of three articles is to provide an overview of the procedure, or at least one possible procedure, that can be followed when porting the Linux kernel to a new processor architecture.
+
+Although the port of the MMU-less Hitachi H8/300, recently reintroduced in Linux 4.2-rc1, needed only 3,775 lines of code, getting the Linux kernel to run on a new processor architecture is still a difficult process, and there is little documentation describing how to do it. This three-article series aims to outline a procedure that can be followed when porting the Linux kernel to a new processor architecture.
+
+> After spending countless hours becoming almost fluent in many of the supported architectures, I discovered that a well-defined skeleton shared by the majority of ports exists. Such a skeleton can logically be split into two parts that intersect a great deal. The first part is the boot code, meaning the architecture-specific code that is executed from the moment the kernel takes over from the bootloader until `init` is finally executed. The second part concerns the architecture-specific code that is regularly executed once the booting phase has been completed and the kernel is running normally.
This second part includes starting new threads, dealing with hardware interrupts or software exceptions, copying data from/to user applications, serving system calls, and so on.
+
+After studying many of the supported architectures, I found that most ports share a well-defined skeleton. That skeleton splits logically into two heavily overlapping parts. The first is the boot code: the architecture-specific code that runs from the moment the kernel takes over from the bootloader until `init` is finally executed. The second is the architecture-specific code that runs regularly once booting is complete and the kernel is operating normally. It includes starting new threads, handling hardware interrupts and software exceptions, copying data between user space and the kernel, serving system calls, and so on.
+
+## 1. Is a new port necessary?
+
+> As LWN [reported](https://lwn.net/Articles/597351/) about another porting experience in an article published last year, there are three meanings to the word "porting".
+>
+> It can be a port to a new board with an already-supported processor on it. Or it can be a new processor from an existing, supported processor family. The third alternative is to port to a completely new architecture.
+
+The word "porting" has three meanings:
+1. A port to a new board built around an already-supported processor
+2. A port to a new processor from an existing, supported processor family
+3. A port to a completely new architecture
+
+> Sometimes, the answer to whether one should start a new port from scratch is crystal clear: if the new processor comes with a new instruction set architecture (ISA), that is usually a good indicator. Sometimes it is less clear. In my case, it took me a couple of weeks to figure out this first question.
+
+If a new processor comes with a new instruction set, a new architecture port is almost certainly needed; in other cases the answer is less clear. In my case it took a couple of weeks to settle this first question.
+
+> At the time, May 2013, I had just been hired by the French academic computer lab [LIP6](http://www.lip6.fr/?LANG=en) to port the Linux kernel to [TSAR](https://www-soc.lip6.fr/trac/tsar), an academic processor architecture that the system-on-chip research group was designing. TSAR is an architecture that follows many of the current trends: lots of small, single-issue, energy-efficient processor cores around a scalable network-on-chip. It also adds some nice innovations: a full-hardware cache-coherency protocol for both data/instruction caches and translation lookaside buffers (TLBs) as well as physically distributed but logically shared memory.
+
+It was May 2013, and I had just been hired by LIP6, a French academic computer science lab, to port the Linux kernel to TSAR, an architecture that its system-on-chip research group was designing. TSAR follows current trends: many small, single-issue, low-power cores connected by a scalable network-on-chip. It also brings some innovations: a fully hardware cache-coherency protocol covering the instruction cache, the data cache, and the TLBs, plus physically distributed but logically shared memory.
+
+> My dilemma was that the processor cores were compatible with the MIPS32 ISA, which meant the port could fall into the second category: "new processor from an existing processor family". But since TSAR had a virtual-memory model radically different from those of any MIPS processors, I would have been forced to drastically modify the entire MIPS branch in order to introduce this new processor, sometimes having almost no choice but to surround entire files with `#ifndef TSAR ... #endif`.
+>
+> Quickly enough, it came down to the most logical (and interesting) conclusion:
+>
+> ```
+> mkdir linux/arch/tsar
+> ```
+
+My dilemma was that the processor cores are compatible with the MIPS32 ISA, so the port could fall into the second category: "new processor from an existing processor family". But TSAR's virtual-memory model is radically different from that of any MIPS processor, so I would have had to modify the whole MIPS branch drastically to introduce it, sometimes with little choice but to wrap entire files in `#ifndef TSAR ... #endif`.
+The conclusion was that this is a port to a new architecture, which gets its own directory: `mkdir linux/arch/tsar`
+
+## 2. Get to know your hardware
+
+> *Really* knowing the underlying hardware is definitely the fundamental, and perhaps most obvious, prerequisite to porting Linux to it.
+
+Really knowing the underlying hardware is the fundamental, and most obvious, prerequisite for porting Linux to it.
+
+> The specifications of a processor are often split, logically or physically, into at least two parts (as were, for example, the recently published specifications for the new [RISC-V](http://www.riscv.org/) processor). The first part usually details the user-level ISA, which basically means the list of user-level instructions that the processor is able to understand and execute. The second part describes the privileged architecture, which includes the list of kernel-level-only instructions and the various system registers that control the processor status.
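+
+Processor specifications are usually split, logically or physically, into at least two parts (as is the case for the recently published RISC-V specifications). The first part usually details the user-level ISA, that is, the list of user-level instructions the processor can understand and execute. The second part describes the privileged architecture, which includes the kernel-only instructions and the system registers that control the processor state.
+
+> This second part contains the majority (if not the entirety) of the information that makes a port special and thus often prevents the developer from opportunely reusing code from other architectures.
+
+This second part holds most of the information that makes a port unique, and these differences are what usually prevent reusing code from other architectures.
+
+> Among the important questions that should be answered by such specifications are:
+
+The important questions such specifications should answer include:
+
+> - What are the virtual-memory model of the processor architecture, the format of the page table, and the translation mechanism?
+
+**What are the virtual-memory model of the processor architecture, the format of the page table, and the translation mechanism?**
+
+> Many processor architectures (e.g. x86, ARM, or TSAR) define a flexible virtual-memory layout. Their virtual address space can theoretically be split any way between the user and kernel spaces, although the default layout for 32-bit processors in Linux usually allocates the lower 3GiB to user space and reserves the upper 1GiB for kernel space. In some other architectures, this layout is strongly constrained by the hardware design. For instance, on MIPS32, the virtual address space is statically split into two regions of the same size: the lower 2GiB is dedicated to user space and the upper 2GiB to kernel space; the latter even contains predefined windows into the physical address space.
+
+Many processor architectures (such as x86, ARM, or TSAR) define a flexible virtual-memory layout. In theory their virtual address space can be split between user space and kernel space in any way, although the default Linux layout on 32-bit processors usually gives the lower 3GiB to user space and reserves the upper 1GiB for the kernel. On some other architectures the layout is strongly constrained by the hardware design. On MIPS32, for example, the virtual address space is statically split into two regions of equal size: the lower 2GiB is dedicated to user space and the upper 2GiB to kernel space, and the latter even contains predefined windows into the physical address space.
+
+> The format of the page table is intimately linked to the translation mechanism used by the processor.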
In the case of a hardware-managed mechanism, when the TLB (a hardware cache of limited size containing recently used translations between virtual and physical addresses) does not contain the translation for a given virtual address (referred to as a *TLB miss*), a hardware state machine will transparently fetch the proper translation from the page table structure in memory and fill the TLB with it. This means that the format of the page table must be fixed, and certainly defined by the processor's specifications. In a software-based mechanism, a TLB miss exception is handled by a piece of code, which theoretically leaves complete liberty as to how the page table is organized; only the format of TLB entries is specified.
+
+The format of the page table is closely tied to the translation mechanism used by the processor. With a hardware-managed mechanism, when the TLB (a hardware cache of limited size that holds recently used virtual-to-physical translations) does not contain the translation for a given virtual address (a TLB miss), a hardware state machine transparently fetches the right translation from the page table in memory and fills the TLB with it. The page-table format must therefore be fixed, and it is defined by the processor's specifications. With a software-based mechanism, the TLB miss exception is handled by a piece of code, which in theory leaves complete freedom in how the page table is organized; only the format of the TLB entries is specified.
+
+> - How to enable/disable the interrupts, switch from privileged mode to user mode and vice-versa, get the cause of an exception, etc.?
+
+**How are interrupts enabled and disabled, how does the processor switch between privileged mode and user mode, how is the cause of an exception obtained, and so on?**
+
+> Although all these operations generally only involve reading and/or modifying certain bit fields in the set of available system registers, they are always very particular to each architecture. It is for this reason that, most of the time, they are actually performed by small chunks of dedicated assembly code.
+
+Although these operations generally only involve reading or modifying certain bit fields in the available system registers, each architecture uses its registers differently. That is why, most of the time, they are performed by small pieces of dedicated assembly code.
+
+> - What is the ABI?
+
+**What is the ABI?**
+
+> Although one could think that the Application Binary Interface (ABI) is only supposed to concern compilation tools, as it defines the way the stack is formatted into stack-frames, the ways arguments and return values are given or returned by functions, and so on, it is actually absolutely necessary to be familiar with it when porting Linux.
For example, as the recipient of system calls (which are typically defined by the ABI), the kernel has to know where to get the arguments and how to return a value; or on a context switch, the kernel must know what to save and restore, as well as what constitutes the context of a thread, and so on.
+
+One might think that the Application Binary Interface (ABI) only concerns the compilation tools, since it defines how the stack is laid out in stack frames, how functions receive arguments and return values, and so on. In fact, being familiar with it is essential when porting Linux. For example, as the recipient of system calls (which are typically defined by the ABI), the kernel must know where to find the arguments and how to return a value; on a context switch, it must know what to save and restore, and what makes up the context of a thread.
+
+## 3. Get to know the kernel
+
+> Learning a few kernel concepts, especially concerning the memory layout used by Linux, will definitely help. I admit it took me a while to understand what exactly was the distinction between *low memory* and *high memory*, and between the *direct mapping* and *vmalloc* regions.
+
+Learning a few kernel concepts, especially the memory layout used by Linux, definitely helps. I admit it took me a while to understand the distinction between low memory and high memory, and between the direct mapping and vmalloc regions.
+
+> For a typical and simple port (to a 32-bit processor), in which the kernel occupies the upper 1GiB of the virtual address space, it is usually fairly straightforward. Within this 1GiB, Linux defines that the lower portion of it will be directly mapped to the lower portion of the system memory (hence referred to as low memory): meaning that if the kernel accesses the address `0xC0000000`, it will be redirected to the physical address `0x00000000`.
+
+For a typical, simple port to a 32-bit processor, in which the kernel occupies the upper 1GiB of the virtual address space, things are usually fairly straightforward. Within this 1GiB, Linux maps the lower portion directly onto the lower portion of system memory (hence the name low memory): if the kernel accesses the address `0xC0000000`, the access is redirected to the physical address `0x00000000`.
+
+> In contrast, in systems with more physical memory than that which is mappable in the direct mapping region, the upper portion of the system memory (referred to as high memory) is not normally accessible to the kernel. Other mechanisms must be used, such as `kmap()` and `kmap_atomic()`, in order to gain temporary access to these high-memory pages.
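The low-memory and high-memory split described above comes down to simple address arithmetic, which can be sketched as follows. This is a toy Python model, not kernel code: `PAGE_OFFSET` mirrors the kernel constant of the same name, while `PHYS_OFFSET`, `VMALLOC_RESERVE` (assumed here to be 128 MiB), and the function names are illustrative assumptions made for this sketch.

```python
"""Toy model of the 32-bit low-memory / high-memory split (illustrative only)."""

PAGE_OFFSET = 0xC0000000      # start of kernel space: upper 1 GiB of a 4 GiB address space
PHYS_OFFSET = 0x00000000      # assumption: physical RAM starts at address 0
VMALLOC_RESERVE = 128 << 20   # assumption: slice of kernel space kept for vmalloc and friends

# Size of the direct-mapping window: kernel space minus the reserved slice.
LOWMEM_LIMIT = (1 << 32) - PAGE_OFFSET - VMALLOC_RESERVE


def is_low_memory(paddr: int) -> bool:
    """True if the physical address falls inside the direct-mapping window."""
    return PHYS_OFFSET <= paddr < PHYS_OFFSET + LOWMEM_LIMIT


def virt_to_phys(vaddr: int) -> int:
    """Translate a direct-mapped kernel virtual address to a physical address."""
    if not PAGE_OFFSET <= vaddr < PAGE_OFFSET + LOWMEM_LIMIT:
        raise ValueError(f"{vaddr:#x} is outside the direct-mapping region")
    return vaddr - PAGE_OFFSET + PHYS_OFFSET


def phys_to_virt(paddr: int) -> int:
    """Translate a low-memory physical address to its direct-mapped virtual address."""
    if not is_low_memory(paddr):
        # High memory has no permanent kernel mapping; a temporary one is needed.
        raise ValueError(f"{paddr:#x} is high memory: no permanent kernel mapping")
    return paddr - PHYS_OFFSET + PAGE_OFFSET
```

With these assumed values, `virt_to_phys(0xC0000000)` returns `0`, matching the example in the text, and any physical address at or above 896 MiB is rejected by `phys_to_virt()`, which is the situation where a real kernel would need a temporary mapping such as `kmap()`.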
+
+In contrast, on systems with more physical memory than the direct mapping region can map, the upper portion of system memory (called high memory) is not normally accessible to the kernel. Other mechanisms, such as `kmap()` and `kmap_atomic()`, must be used to gain temporary access to these high-memory pages.
+
+> Above the direct mapping region is the vmalloc region that is controlled by `vmalloc()`. This allocation mechanism provides the ability to allocate pages of memory in a virtually contiguous way in spite of the fact that these pages may not necessarily be physically contiguous. It is particularly useful for allocating a large amount of memory pages in a virtually contiguous manner, as otherwise it can be impossible to find the equivalent amount of contiguous free physical pages.
+
+Above the direct mapping region lies the vmalloc region, which is controlled by `vmalloc()`. This allocation mechanism can allocate pages that are contiguous in virtual address space even though they are not necessarily physically contiguous. It is particularly useful for allocating a large number of pages as one virtually contiguous range, since finding the same number of contiguous free physical pages may be impossible.
+
+> Further reading about the memory management in Linux can be found in [*Linux Device Drivers* \[PDF\]](https://lwn.net/images/pdf/LDD3/ch15.pdf) and this [LWN article](https://lwn.net/Articles/356378/).
+
+Further reading on Linux memory management can be found in [*Linux Device Drivers* \[PDF\]](https://lwn.net/images/pdf/LDD3/ch15.pdf) and in this [LWN article](https://lwn.net/Articles/356378/).
+
+## 4. How to start?
+
+> With your head full of the processor's specifications and kernel principles, it is finally time to add some files to this newly created arch directory. But wait ... where and how should we start? As with any porting or even any code that must respect a certain API, the procedure is a two-step process.
+
+With your head full of processor specifications and kernel principles, it is finally time to add some files to the newly created arch directory. But wait: where and how should we start? As with any port, or any code that has to respect a given API, the procedure has two steps.
+
+> First, a minimal set of files that define a minimal set of symbols (functions, variables, defines) is necessary for the kernel to even compile. This set of files and symbols can often be deduced from compilation failures: if compilation fails because of a missing file/symbol, it is a good indicator that it should probably be implemented (or sometimes that some configuration options should be modified).
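In the case of porting Linux, this approach is particularly relevant when implementing the numerous headers that define the API between the architecture-specific code and the rest of the kernel.
+
+First, the kernel needs a minimal set of files defining a minimal set of symbols (functions, variables, defines) just to compile. This set can often be deduced from compilation failures: if compilation fails because a file or symbol is missing, that is a good sign that it should be implemented (or, sometimes, that some configuration options should be changed). When porting Linux, this approach is especially useful for implementing the many headers that define the API between the architecture-specific code and the rest of the kernel.
+
+> After the kernel finally compiles and is able to be executed on the target hardware, it is useful to know that the boot code is very sequential. That allows many functions to stay empty at first and to only be implemented gradually until the system finally becomes stable and reaches the `init` process. This approach is generally possible for almost all of the C functions executed after the early assembly boot code. However, it is advised to have the `early_printk()` infrastructure up and working; otherwise it can be difficult to debug.
+
+Once the kernel finally compiles and can run on the target hardware, it helps to know that the boot code is very sequential. Many functions can therefore stay empty at first and be implemented gradually, until the system becomes stable and reaches the `init` process. This approach generally works for almost all of the C functions executed after the early assembly boot code. It is advisable, however, to get the `early_printk()` infrastructure working first; otherwise debugging can be difficult.
+
+## 5. Finally getting started: the minimal set of non-code files
+
+> Porting the compilation tools to the new processor architecture is a prerequisite to porting the Linux kernel, but here we'll assume it has already been performed. All that is left to do in terms of compilation tools is to build a cross-compiler. Since at this point it is likely that porting a standard C library has not been completed (or even started), only a stage-1 cross-compiler can be created.
+
+Porting the compilation tools to the new processor architecture is a prerequisite for porting the Linux kernel, but here we assume that this has already been done. All that remains on the toolchain side is to build a cross-compiler. Since the port of a standard C library has probably not been completed (or even started) at this point, only a stage-1 cross-compiler can be created.
+
+> Such a cross-compiler is only able to compile source code for bare metal execution, which is a perfect fit for the kernel since it does not depend on any external library. In contrast, a stage-2 cross-compiler has built-in support for a standard C library.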
+
+Such a cross-compiler can only compile source code for bare-metal execution, which suits the kernel perfectly because the kernel does not depend on any external library. A stage-2 cross-compiler, in contrast, has built-in support for a standard C library.
+
+> The first step of porting Linux to a new processor is the creation of a new directory inside `arch/`, which is located at the root of the kernel tree (e.g. `linux/arch/tsar/` in my case). Inside this new directory, the layout is quite standardized:
+
+The first step in porting Linux to a new processor is to create a new directory inside `arch/`, which sits at the root of the kernel tree (`linux/arch/tsar/` in my case). Inside this new directory, the layout is quite standardized:
+
+> - `configs/`: default configurations for supported systems (i.e. `*_defconfig` files)
+> - `include/asm/` for the headers dedicated to internal use only, i.e. Linux source code
+> - `include/uapi/asm` for the headers that are meant to be exported to user space (e.g. the libc)
+> - `kernel/`: general kernel management
+> - `lib/`: optimized utility routines (e.g. `memcpy()`, `memset()`, etc.)
+> - `mm/`: memory management
+
+- `configs/`: default configurations for supported systems (i.e. `*_defconfig` files)
+
+- `include/asm/`: headers used only internally by the Linux source code
+
+- `include/uapi/asm`: headers meant to be exported to user space (e.g. the libc)
+
+- `kernel/`: general kernel management
+
+- `lib/`: optimized utility routines (e.g. `memcpy()`, `memset()`, etc.)
+
+- `mm/`: memory management
+
+> The great thing is that once the new arch directory exists, Linux automatically knows about it.
It only complains about not finding a Makefile, not about this new architecture:
+
+Once the new arch directory exists, Linux automatically knows about it. The build complains only about the missing Makefile, not about the new architecture:
+```bash
+~/linux $ make ARCH=tsar
+Makefile: ~/linux/arch/tsar/Makefile: No such file or directory
+```
+> As shown in the following example, a minimal arch Makefile only has a few variables to specify:
+
+As the following example shows, a minimal arch Makefile only needs to set a few variables:
+
+```
+# Name of the default configuration file in configs/
+KBUILD_DEFCONFIG := tsar_defconfig
+
+# Flags for the compiler and the assembler
+KBUILD_CFLAGS += -pipe -D__linux__ -G 0 -msoft-float
+KBUILD_AFLAGS += $(KBUILD_CFLAGS)
+
+# Objects and subdirectories to build into the kernel image
+head-y := arch/tsar/kernel/head.o
+
+core-y += arch/tsar/kernel/
+core-y += arch/tsar/mm/
+
+LIBGCC := $(shell $(CC) $(KBUILD_CFLAGS) -print-libgcc-file-name)
+libs-y += $(LIBGCC)
+libs-y += arch/tsar/lib/
+
+drivers-y += arch/tsar/drivers/
+```
+> - `KBUILD_DEFCONFIG` must hold the name of a valid default configuration, which is one of the `defconfig` files in the `configs` directory (e.g. `configs/tsar_defconfig`).
+> - `KBUILD_CFLAGS` and `KBUILD_AFLAGS` define compilation flags, respectively for the compiler and the assembler.
+> - `{head,core,libs,...}-y` list the objects (or subdirectory containing the objects) to be compiled in the kernel image (see [Documentation/kbuild/makefiles.txt](https://www.kernel.org/doc/Documentation/kbuild/makefiles.txt) for detailed information)
+
+- `KBUILD_DEFCONFIG` must hold the name of a valid default configuration, which is one of the `defconfig` files in the `configs` directory (e.g. `configs/tsar_defconfig`).
+
+- `KBUILD_CFLAGS` and `KBUILD_AFLAGS` define the compilation flags for the compiler and the assembler, respectively.
+
+- `{head,core,libs,...}-y` list the objects (or the subdirectories containing the objects) to be compiled into the kernel image (see [Documentation/kbuild/makefiles.txt](https://www.kernel.org/doc/Documentation/kbuild/makefiles.txt) for details).
+
+> Another file that has its place at the root of the arch directory is `Kconfig`. This file mainly serves two purposes: it defines new arch-specific configuration options that describe the features of the architecture, and it selects arch-independent configuration options (i.e.
options that are already defined elsewhere in Linux source code) that apply to the architecture.
+
+Another file that lives at the root of the arch directory is `Kconfig`. It mainly serves two purposes:
+1. It defines new arch-specific configuration options that describe the features of the architecture.
+2. It selects arch-independent configuration options (i.e. options already defined elsewhere in the Linux source code) that apply to the architecture.
+
+> As this will be the main configuration file for the newly created arch, its content also determines the layout of the menuconfig command (e.g. `make ARCH=tsar menuconfig`). It is difficult to give a snippet of the file as it depends very much on the targeted architecture, but looking at the same file for other (simple) architectures should definitely help.
+
+Since this is the main configuration file for the newly created arch, its content also determines the layout of the menuconfig command (e.g. `make ARCH=tsar menuconfig`). It is hard to give an example because the file depends heavily on the target architecture, but the `Kconfig` files of other, simple architectures are a useful reference.
+
+> The `defconfig` file (e.g. `configs/tsar_defconfig`) is necessary to complete the files related to the Linux kernel build system (kbuild). Its role is to define the default configuration for the architecture, which basically means specifying a set of configuration options that will be used as a seed to generate a full configuration for the Linux kernel compilation. Once again, starting from defconfig files of other architectures should help, but it is still advised to refine them, as they tend to activate many more features than a minimalistic system would ever need; support for USB, IOMMU, or even filesystems is, for example, too early at this stage of porting.
+
+The `defconfig` file (e.g. `configs/tsar_defconfig`) completes the set of files related to the Linux kernel build system (kbuild). Its role is to define the default configuration for the architecture, that is, a set of configuration options used as a seed to generate a full configuration for compiling the kernel. Once again, starting from the defconfig files of other architectures helps, but they should be trimmed down, because they tend to enable far more features than a minimal system needs; support for USB, IOMMU, or even filesystems, for example, comes too early at this stage of the port.
+
+> Finally the last "not really code but still really important" file to create is a script (usually located at `kernel/vmlinux.lds.S`) that will instruct the linker how to place the various sections of code and data in the final kernel image.
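For example, it is usually necessary for the early assembly boot code to be set at the very beginning of the binary, and it is this script that allows us to do so.
+
+Finally, the last "not really code but still really important" file to create is a script (usually located at `kernel/vmlinux.lds.S`) that tells the linker how to place the various code and data sections in the final kernel image. For example, the early assembly boot code usually has to sit at the very beginning of the binary, and this script is what makes that possible.
+
+## 6. Conclusion
+
+> At this point, the build system is ready to be used: it is now possible to generate an initial kernel configuration, customize it, and even start compiling from it. However, the compilation stops very quickly since the port still does not contain any code.
+
+At this point the build system is ready to use: it is now possible to generate an initial kernel configuration, customize it, and even start compiling from it. The compilation stops very quickly, however, because the port still does not contain any code.
+
+> In the next article, we will dive into some code for the second portion of the port: the headers, the early assembly boot code, and all the most important arch functions that are executed until the first kernel thread is created.
+
+In the next article, we will dive into some code for the second portion of the port: the headers, the early assembly boot code, and all the most important arch functions that are executed until the first kernel thread is created.
\ No newline at end of file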
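As a recap of the non-code files covered in this article, the following Python sketch reports which of them are still missing from a new arch directory. It is only an illustrative helper, not part of kbuild: the function name `missing_entries` and the two lists are assumptions made for this example, built from the directories and files named in the text, with `tsar` as the example architecture.

```python
"""Report which non-code files of a new arch directory are still missing (sketch)."""
from pathlib import Path
from typing import List

# Sub-directories and files named in the article, relative to arch/<arch>/.
EXPECTED_DIRS = ["configs", "include/asm", "include/uapi/asm", "kernel", "lib", "mm"]
EXPECTED_FILES = ["Makefile", "Kconfig", "configs/{arch}_defconfig", "kernel/vmlinux.lds.S"]


def missing_entries(kernel_root: str, arch: str) -> List[str]:
    """Return the expected directories (with a trailing '/') and files that do not exist yet."""
    arch_dir = Path(kernel_root) / "arch" / arch
    missing = [d + "/" for d in EXPECTED_DIRS if not (arch_dir / d).is_dir()]
    for template in EXPECTED_FILES:
        name = template.format(arch=arch)
        if not (arch_dir / name).is_file():
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical invocation from the top of a kernel tree, using the article's example arch.
    for entry in missing_entries(".", "tsar"):
        print("missing:", entry)
```

An empty result only means that the skeleton exists; as noted above, compilation will still stop early until the port contains actual code.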