This post is the introduction to a new series which aims to give a walk-through of the Rust compiler, starting from the initial entry point and going right the way through the compilation process. The goal is to give a decent understanding of what exactly is happening behind the scenes when you invoke rustc on a source file.

I've decided to undertake this for one main reason - I want to know what the compiler is doing and how it works.

I've decided to go for this "full stack" approach rather than dig into
one particular area mainly because I don't know what all the areas of
the compiler are and I want to discover as much about it as I can. I'm
sure there are plenty of super interesting parts that I could very easily
never encounter if I was to just dive in and focus on one particular
area over another. So, I'm going right from the start all the way down
to see what I find. I'm sure I'll learn lots, and perhaps some others
will find it beneficial too.

Firstly, if you haven't worked with the source of the compiler before,
I enourange you to check out my article on contributing to the Rust
compiler
which gives an overview on how to build the compiler from source which is useful to know if you want to dig into this sort of stuff.

Series overview

As I mentioned above, in this series I want to journey right through the
entire compilation process. Throughout this series I'll be taking the
following (very simple) Rust program from source through to executable:

use std::io::{Write, stdout};

fn main() {
    let mut out = stdout();
    writeln!(&mut out, "Hello world");
}

This is a pretty simple "Hello world" Rust program. It's more complex
than it needs to be however. The "typical" hello world program in Rust
looks some thing like:

fn main() {
    println!("Hello world");
}

This is obviously more concise than my first example, so why have I
chosen to go with the more verbose option? There are a couple of
reasons. Primarily because it uses a few different language features which I'm curious about, and my hope is that I'll get to learn about how they work throughout the series. Namely, the things my simple program displays:

  • Imports (use std::io::{...};)
  • Variable assignment (let mut out = ...)
  • Function calls (stdout();)
  • Macros (writeln!(...))
  • Warnings (writeln!() produces a result which I've intentionally
    ignored here so we can investigate how the compiler identifies & outputs
    warning messages)

Throughout the series we will track the various changes to &
representations of our simple Rust program and how the compiler deals
with each representation internally.

Compiler version

For the duration of this series I will be working against the version of the compiler as it was at this commit. All references to files in the compiler source will use this version of the compiler to ensure the links always point to the correct portions of code. There's no particular reason I've chosen this version other than the fact this was the latest commit when I started writing the series.

If you want to follow along and build the compiler with the same version I'm using throughout the series I encourage you to checkout and build from that commit. I built my version like so:

$ ./configure --enable-rustbuild
$ python src/bootstrap/bootstrap.py --stage 1 --jobs 2

Which results in rustc version 1.12.0-nightly (58c5716e2 2016-08-08).

Compilation overview

Starting off, how do we even know what happens when the compiler runs? A while back I stumbled across the -Z time-passes option. When rustc gets this option it prints the time taken for each pass of the compiler. For our purposes here this is a good starting place. Throughout the series we will track our program through as many of these passes as we can.

For my version of rustc, this is the output when compiling our sample
program:

time: 0.001; rss: 56MB  parsing
time: 0.000; rss: 56MB  configuration
time: 0.000; rss: 56MB  recursion limit
time: 0.000; rss: 56MB  crate injection
time: 0.000; rss: 56MB  plugin loading
time: 0.000; rss: 56MB  plugin registration
time: 0.275; rss: 93MB  expansion
time: 0.000; rss: 93MB  maybe building test harness
time: 0.000; rss: 93MB  assigning node ids
time: 0.000; rss: 93MB  checking for inline asm in case the target doesn't support it
time: 0.000; rss: 93MB  complete gated feature checking
time: 0.000; rss: 93MB  collecting defs
time: 0.035; rss: 93MB  external crate/lib resolution
time: 0.000; rss: 93MB  early lint checks
time: 0.000; rss: 97MB  AST validation
time: 0.004; rss: 97MB  name resolution
time: 0.000; rss: 97MB  lowering ast -> hir
time: 0.000; rss: 97MB  indexing hir
time: 0.000; rss: 97MB  attribute checking
time: 0.000; rss: 97MB  language item collection
time: 0.000; rss: 97MB  lifetime resolution
time: 0.000; rss: 97MB  looking for entry point
time: 0.000; rss: 97MB  looking for plugin registrar
time: 0.000; rss: 97MB  region resolution
time: 0.000; rss: 97MB  loop checking
time: 0.000; rss: 97MB  static item recursion checking
time: 0.000; rss: 101MB load_dep_graph
time: 0.000; rss: 101MB type collecting
time: 0.000; rss: 101MB variance inference
time: 0.095; rss: 107MB coherence checking
time: 0.001; rss: 107MB wf checking
time: 0.003; rss: 109MB item-types checking
time: 0.042; rss: 110MB item-bodies checking
time: 0.000; rss: 110MB drop-impl checking
time: 0.003; rss: 110MB const checking
time: 0.000; rss: 110MB privacy checking
time: 0.000; rss: 110MB stability index
time: 0.000; rss: 110MB intrinsic checking
time: 0.000; rss: 110MB effect checking
time: 0.000; rss: 110MB match checking
time: 0.000; rss: 110MB liveness checking
time: 0.001; rss: 110MB rvalue checking
time: 0.001; rss: 113MB MIR dump
time: 0.014; rss: 114MB MIR passes
time: 0.002; rss: 114MB borrow checking
time: 0.000; rss: 114MB reachability checking
time: 0.000; rss: 114MB death checking
time: 0.000; rss: 114MB stability checking
time: 0.000; rss: 114MB unused lib feature checking
<std macros>:2:1: 2:54 warning: unused result which must be used, #[warn(unused_must_use)] on by default 
<std macros>:2 $ dst . write_fmt ( format_args ! ( $ ( $ arg ) * ) ) )
               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<std macros>:2:1: 2:46 note: in this expansion of write! (defined in <std macros>)
/home/gchp/test.rs:5:5: 5:39 note: in this expansion of writeln! (defined in <std macros>)
time: 0.002; rss: 114MB lint checking
time: 0.018; rss: 114MB resolving dependency formats
time: 0.001; rss: 114MB Prepare MIR codegen passes
  time: 0.000; rss: 117MB       write metadata
  time: 0.472; rss: 124MB       translation item collection
  time: 0.005; rss: 124MB       codegen unit partitioning
  time: 0.011; rss: 131MB       internalize symbols
time: 0.965; rss: 131MB translation
time: 0.000; rss: 131MB assert dep graph
time: 0.000; rss: 132MB serialize dep graph
  time: 0.001; rss: 129MB       llvm function passes [0]
  time: 0.001; rss: 129MB       llvm module passes [0]
  time: 0.039; rss: 133MB       codegen passes [0]
  time: 0.000; rss: 133MB       codegen passes [0]
time: 0.043; rss: 133MB LLVM passes
time: 0.000; rss: 133MB serialize work products
  time: 0.297; rss: 133MB       running linker
time: 0.300; rss: 134MB linking

The above output can be somewhat daunting, so let's categorize them a little. There are six main phases in the compilation process.

  1. Parsing input
  2. Configuration & expansion
  3. Analysis passes
  4. Translation to LLVM
  5. LLVM passes
  6. Linking

My initial plan is to write one post per phase, though I may end up breaking some of them into several posts if they become too long.

The first one will cover parsing, which is the first phase. You can
subscribe to this series to be notified when that lands!

In the mean time, let's look at the entrypoint for the compiler.

Entry point

After a few levels of indirection while looking through the compiler source you'll come to the main function in librustc_driver. This is the main entry point which kicks off the entire process.

Elsewhere in this file there is code for handling command-line options, building configuration for the compilation session, getting the input source, and eventually calling out to compile_input . This compile_input function is what drives the various stages listed above, and will be the starting point of the next post.