# shecc **Repository Path**: studvc/shecc ## Basic Information - **Project Name**: shecc - **Description**: A self-hosting and educational C compiler - **Primary Language**: Unknown - **License**: BSD-2-Clause - **Default Branch**: master - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 1 - **Forks**: 0 - **Created**: 2021-01-30 - **Last Updated**: 2023-04-12 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # shecc : self-hosting and educational C compiler

logo image

## Introduction `shecc` is built from scratch, targeted at 32-bit Arm architecture, as a self-compiling compiler for a subset of the C language. ### Features * Generate executable Linux ELF binaries for ARMv7-A; * Provide a minimal C standard library for basic I/O on GNU/Linux; * The cross-compiler is written in ANSI C, arguably running on most platforms; * Self-contained C language front-end and machine code generator; * Two-pass compilation: on the first pass it checks the syntax of statements and constructs a table of symbols, while on the second pass it actually translates program statements into Arm machine code. ## Compatibility `shecc` is capable of compiling C source files written in the following syntax: * data types: char, int, struct, and pointer * condition statements: if, while, for, switch, case, break, return, and general expressions * compound assignments: `+=`, `-=`, `*=` * global/local variable initializations for supported data types - e.g. `int i = [expr]` The backend targets armv7hf with Linux ABI, verified on Raspberry Pi 3. ## Bootstrapping The steps to validate `shecc` bootstrapping: 1. `stage0`: `shecc` source code is initially compiled using an ordinary compiler which generates a native executable. The generated compiler can be used as a cross-compiler. 2. `stage1`: The built binary reads its own source code as input and generates an ARMv7-A binary. 3. `stage2`: The generated ARMv7-A binary is invoked (via QEMU or running on Arm devices) with its own source code as input and generates another ARMv7-A binary. 4. `bootstrap`: Build the `stage1` and `stage2` compilers, and verify that they are byte-wise identical. If so, `shecc` can compile its own source code and produce new versions of that same program. ## Prerequisites Code generator in `shecc` does not rely on external utilities. You only need ordinary C compilers such as `gcc` and `clang`. However, `shecc` would bootstrap itself, and Arm ISA emulation is required. Install QEMU for Arm user emulation on GNU/Linux: ```shell $ sudo apt-get install qemu-user ``` It is still possible to build `shecc` on macOS or Microsoft Windows. However, the second stage bootstrapping would fail due to `qemu-arm` absence. ## Build and Verify Run `make` and you should see this: ``` CC+LD out/inliner GEN out/libc.inc CC out/src/main.o LD out/shecc SHECC out/shecc-stage1.elf SHECC out/shecc-stage2.elf ``` File `out/shecc` is the first stage compiler. Its usage: ``` shecc [-o output] [-no-libc] [--dump-ir] ``` Compiler options: - `-o` : output file name (default: out.elf) - `--no-libc` : Exclude embedded C library (default: embedded) - `--dump-ir` : Dump intermediate representation (IR) Example: ```shell $ out/shecc -o fib tests/fib.c $ chmod +x fib $ qemu-arm fib ``` `shecc` comes with unit tests. To run the tests, give "check" as an argument: ```shell $ make check ``` Reference output: ``` ... int main(int argc, int argv) { exit(sizeof(char)); } => 1 int main(int argc, int argv) { int a; a = 0; switch (3) { case 0: return 2; case 3: a = 10; break; case 1: return 0; } exit(a); } => 10 int main(int argc, int argv) { int a; a = 0; switch (3) { case 0: return 2; default: a = 10; break; } exit(a); } => 10 OK ``` ## Intermediate Representation Once the option `--dump-ir` is passed to `shecc`, the intermediate representation (IR) will be generated. Take the file `tests/fib.c` for example. It consists of a recursive Fibonacci sequence function. ```c int fib(int n) { if (n == 0) return 0; else if (n == 1) return 1; return fib(n - 1) + fib(n - 2); } ``` Execute the following to generate IR: ```shell $ out/shecc --dump-ir -o fib tests/fib.c ``` Line-by-line explanation between C source and IR: ```asm C Source IR Explanation -------------------+--------------------------+---------------------------------------------------- int fib(int n) fib: Reserve stack frame for function fib { { if (n == 0) x0 = &n Get address of variable n x0 = *x0 (4) Read value from address into x0, length = 4 (int) x1 := 0 Set x1 to zero x0 == x1 ? Compare x0 with x1 if false then goto 1641 If x0 != x1, then jump to label 1641 return 0; x0 := 0 Set x0 to zero. x0 is the return value. return (from fib) Jump to function exit 1641: else if (n == 1) x0 = &n Get address of variable n x0 = *x0 (4) Read value from address into x0, length = 4 (int) x1 := 1 Set x1 to 1 x0 == x1 ? Compare x0 with x1 if true then goto 1649 If x0 != x1, then jump to label 1649 return 1; x0 := 1 Set x0 to 1. x0 is the return value. return (from fib) Jump to function exit 1649: return x0 = &n Get address of variable n fib(n - 1) x0 = *x0 (4) Read value from address into x0, length = 4 (int) x1 := 1 Set x1 to 1 x0 -= x1 Subtract x1 from x0 i.e. (n - 1) + x0 := fib() @ 1631 Call function fib() into x0 push x0 Store the result on stack fib(n - 2); x0 = &n Get address of variable n x0 = *x0 (4) Read value from address into x0, length = 4 (int) x1 := 2 Set x1 to 2 x0 -= x1 Subtract x1 from x0 i.e. (n - 2) x1 := fib() @ 1631 Call function fib() into x1 pop x0 Retrieve the result off stack into x0 x0 += x1 Add x1 to x0 i.e. the result of fib(n-1) + fib(n-2) return (from fib) Jump to function exit } Restore the previous stack frame exit fib ``` ## Known Issues 2. The generated ELF lacks of .bss and .rodata section 3. The unary `*` operator is not supported, which makes it necessary to use `[0]` syntax. Consider `int x = 5; int *ptr = &x;` and it is forbidden to use `*ptr`. However, it is valid to use `ptr[0]`, which behaves the same of `*ptr`. 4. The support of varying number of function arguments is incomplete. No `` can be used. Alternatively, check the implementation `printf` in source `lib/c.c` for `var_arg`. 7. The C front-end is a bit dirty because there is no effective AST. 8. No function pointer is supported. ## License `shecc` is freely redistributable under the BSD 2 clause license. Use of this source code is governed by a BSD-style license that can be found in the `LICENSE` file.