Riya Bisht

Creating a Toy Compiler

In this project, we are creating a toy compiler, Ukiyo, that takes Rust code and emits LLVM IR.

Rust (frontend) → LLVM IR (middle-end) → code generation to x86 assembly (backend)

Frontend of Compiler (Lexical and Syntactic Analysis): #

The frontend performs lexical analysis and syntactic analysis, commonly known as scanning and parsing.

Scanning → reads the source text and generates tokens

Parsing → takes the tokens as input and converts them into an abstract syntax tree (AST)
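
To make the two steps concrete, here is a minimal sketch of a scanner and a recursive-descent parser for arithmetic expressions. It is not Ukiyo's actual frontend: the token kinds, the `Num` and `BinOp` node types, and the tiny grammar are all assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass

# Token kinds for a tiny, hypothetical subset of Rust expressions.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=;()]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(source):
    """Scanning: turn source text into a flat list of (kind, text) tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":  # whitespace produces no token
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


@dataclass
class Num:
    value: int


@dataclass
class BinOp:
    op: str
    left: object
    right: object


def parse(tokens):
    """Parsing: build an AST from tokens with recursive descent.

    Grammar (assumed for this sketch):
        expr := term (('+' | '-') term)*
        term := atom (('*' | '/') atom)*
        atom := NUMBER | '(' expr ')'
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("EOF", "")

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def atom():
        kind, text = advance()
        if kind == "NUMBER":
            return Num(int(text))
        if text == "(":
            node = expr()
            if advance()[1] != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {text!r}")

    def term():
        node = atom()
        while peek()[1] in ("*", "/"):
            op = advance()[1]
            node = BinOp(op, node, atom())
        return node

    def expr():
        node = term()
        while peek()[1] in ("+", "-"):
            op = advance()[1]
            node = BinOp(op, node, term())
        return node

    tree = expr()
    if peek()[0] != "EOF":
        raise SyntaxError(f"unexpected trailing token {peek()[1]!r}")
    return tree
```

For example, `parse(scan("1 + 2 * 3"))` returns `BinOp("+", Num(1), BinOp("*", Num(2), Num(3)))`: the multiplication sits deeper in the tree because `term` binds tighter than `expr`.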

Middle-end of Compiler (Optimization): #

The abstract syntax tree is then converted into an intermediate representation (IR). Here we are using the LLVM infrastructure, so LLVM IR is generated. Optimizations, such as eliminating redundant computations, are then performed on the LLVM IR, producing optimized IR code.
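
As a rough illustration of this phase, the sketch below lowers a small expression tree to textual LLVM IR and applies one simple optimization, constant folding. In a real pipeline LLVM's own passes do this work on the IR itself; the tuple-based AST, the helper names `fold_constants` and `emit_function`, and the `%t0`-style temporaries are assumptions made for the example.

```python
# Expression nodes are ints (constants), strs (names of i32 parameters),
# or (op, left, right) tuples. This AST shape is an assumption for illustration.
LLVM_OPS = {"+": "add", "-": "sub", "*": "mul"}
FOLD = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def fold_constants(node):
    """Replace every operation on two constants with its result.

    The sketch ignores 32-bit overflow, which a real pass must handle.
    """
    if not isinstance(node, tuple):
        return node
    op, left, right = node
    left, right = fold_constants(left), fold_constants(right)
    if isinstance(left, int) and isinstance(right, int):
        return FOLD[op](left, right)
    return (op, left, right)


def emit_function(name, params, body):
    """Emit textual LLVM IR for an i32 function that returns `body`."""
    lines = []
    counter = 0

    def emit(node):
        nonlocal counter
        if isinstance(node, int):
            return str(node)
        if isinstance(node, str):
            return f"%{node}"
        op, left, right = node
        lhs, rhs = emit(left), emit(right)
        result = f"%t{counter}"  # fresh temporary; assumes no parameter is named t0, t1, ...
        counter += 1
        lines.append(f"  {result} = {LLVM_OPS[op]} i32 {lhs}, {rhs}")
        return result

    value = emit(body)
    args = ", ".join(f"i32 %{p}" for p in params)
    header = f"define i32 @{name}({args}) {{"
    return "\n".join([header, "entry:", *lines, f"  ret i32 {value}", "}"])
```

For the expression `x + 2 * 3`, written as `("+", "x", ("*", 2, 3))`, the unoptimized IR contains a `mul` followed by an `add`. After `fold_constants`, the tree becomes `("+", "x", 6)` and the emitted function body shrinks to a single `add i32 %x, 6`.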

Backend of Compiler (Code Generation): #

The optimized IR cannot be executed directly on the target machine architecture. Hence, the optimized IR is converted to machine code, a step known as code generation.

In a nutshell, Rust code is lowered to LLVM IR, which is then transformed into machine code for final execution.

Sometimes the code generation phase does not emit the final machine code but assembly code, a human-readable text form of the machine instructions.

This requires an additional step of converting the assembly code to machine code using a tool called an assembler.
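
To show the difference between assembly text and machine code, here is a toy assembler that understands just three x86-64 instructions in Intel syntax: `mov eax, <imm32>`, `nop`, and `ret`. The function name `assemble` and the supported subset are assumptions for this sketch; a real assembler handles the full instruction set, labels, and relocations.

```python
import struct


def assemble(lines):
    """Encode a tiny subset of x86-64 assembly (Intel syntax) into bytes."""
    code = bytearray()
    for line in lines:
        text = line.split(";")[0].strip()  # drop ';' comments and whitespace
        if not text:
            continue
        mnemonic, _, operands = text.partition(" ")
        if mnemonic == "ret":
            code += b"\xc3"  # opcode C3: near return
        elif mnemonic == "nop":
            code += b"\x90"  # opcode 90: no operation
        elif mnemonic == "mov":
            dest, imm = (part.strip() for part in operands.split(","))
            if dest != "eax":
                raise ValueError(f"only 'mov eax, imm32' is supported, got {text!r}")
            # opcode B8 followed by the 32-bit immediate in little-endian order
            code += b"\xb8" + struct.pack("<I", int(imm) & 0xFFFFFFFF)
        else:
            raise ValueError(f"unsupported instruction: {text!r}")
    return bytes(code)
```

Calling `assemble(["mov eax, 42", "ret"]).hex()` gives `b82a000000c3`: six bytes of machine code for two lines of assembly.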

Linking and Loading #

The machine code is generated, but it usually still refers to functions and data defined in other object files and libraries. The linker resolves these references and combines the files into a single executable or a shared object file (a .so file). At run time, the loader loads the executable and the .so files it depends on into main memory.
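
The core job of a linker, symbol resolution, can be sketched in a few lines. The object-file model below (dicts with `size`, `defines`, and `needs` keys) is purely hypothetical and far simpler than a real format such as ELF; it only shows how each symbol gets a final address and how a missing definition is detected.

```python
def link(objects):
    """Lay out object files back to back and resolve symbol references.

    Each object is a dict with hypothetical keys:
      "size":    number of bytes of code in the object,
      "defines": {symbol: offset within this object},
      "needs":   [symbols referenced here but defined elsewhere].
    Returns a {symbol: final address} table.
    """
    symbols = {}
    base = 0
    # First pass: assign every defined symbol its address in the combined output.
    for obj in objects:
        for name, offset in obj["defines"].items():
            if name in symbols:
                raise ValueError(f"duplicate symbol: {name}")
            symbols[name] = base + offset
        base += obj["size"]
    # Second pass: every referenced symbol must now have a definition.
    for obj in objects:
        for name in obj["needs"]:
            if name not in symbols:
                raise ValueError(f"undefined reference to {name}")
    return symbols
```

With `main_o = {"size": 16, "defines": {"main": 0}, "needs": ["add"]}` and `math_o = {"size": 8, "defines": {"add": 0}, "needs": []}`, `link([main_o, math_o])` returns `{"main": 0, "add": 16}`, while `link([main_o])` fails with an undefined reference to `add`.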

To learn more about linking and loading, see my upcoming blog posts on the topic.

[Diagram of the compiler phases with respect to the Ukiyo compiler]