The Story Behind My Compiler
I first became seriously interested in C compilers after Nora Sandler’s book “Writing a C Compiler” came out. Before that I had read the first couple of chapters of the LCC book. I had written compilers for several toy languages, mainly in Swift, with no AI/LLM assistance, because it didn’t exist yet. I particularly liked going through Thorsten Ball’s “Writing an Interpreter/Compiler” series, though Go was the implementation language in those books if memory serves. I first started working through the project using Swift. Ultimately I moved away from it for two reasons: XCTest’s testing harness was “weird”, and Swift is basically an Apple-only language. So sometime last summer I made the jump to C++.
The first thing I tackled was porting over the test infrastructure: I rewrote the initial harness using Catch2 and CMake, folding in the test suite that comes with Sandler’s book. A deliberate choice I made early on was to model my AST after Clang’s. I just didn’t see a compelling reason to design something completely novel in that specific department.
From there I moved on to lexing and parsing. It’s almost amusing to look at the early code compared to where things stand now. The codebase has grown far more sophisticated, yet it has clearly grown around the original architecture: there are still plenty of fingerprints from those early decisions. Some things have changed, though: what is now common_keyword_table in the current code was originally a long list of initializers for an unordered_map, populated dynamically in the constructor of Lexer rather than initialized statically.
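To make the keyword-table change concrete, here is a minimal sketch of what a statically initialized table can look like in C++17. Only the name common_keyword_table comes from the actual code; the TokenKind enum, the table layout, and the classify_word helper are assumptions made for illustration, not the compiler’s real definitions.

```cpp
#include <array>
#include <string_view>
#include <utility>

// Hypothetical token kinds; the compiler's real enum is not shown in this post.
enum class TokenKind { KwInt, KwReturn, KwIf, KwElse, KwWhile, Identifier };

// Built at compile time and shared by every Lexer, instead of being
// inserted entry by entry into an unordered_map inside the constructor.
// The name is from the post; the layout here is assumed.
inline constexpr std::array<std::pair<std::string_view, TokenKind>, 5>
    common_keyword_table{{
        {"int", TokenKind::KwInt},
        {"return", TokenKind::KwReturn},
        {"if", TokenKind::KwIf},
        {"else", TokenKind::KwElse},
        {"while", TokenKind::KwWhile},
    }};

// Hypothetical helper: linear scan over the table, falling back to
// Identifier when the word is not a keyword.
constexpr TokenKind classify_word(std::string_view word) {
    for (const auto& [spelling, kind] : common_keyword_table) {
        if (spelling == word) return kind;
    }
    return TokenKind::Identifier;
}

// The lookup itself can be checked at compile time.
static_assert(classify_word("if") == TokenKind::KwIf);
```

A linear scan is fine for a handful of keywords; a larger table might want a sorted array with binary search or a perfect hash, but that is a separate decision from moving the data out of the constructor.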
As the lexer and parser matured, though, codegen started to feel like the weak link. I moved away from the TACKY IR described in Sandler’s book and adopted a minimal IR inspired by QBE instead. Even so, backend work still didn’t feel right. I was constantly fighting ARM64 memory/register movement rules, and I didn’t like the interface between codegen and ARM64 emission. I considered using LLVM instead, but I didn’t know much about integrating with LLVM IR, so progress slowed for a while.
Until December 2025 I had hand-written all of the code. Dax Raad, who built the OpenCode coding agent, has talked about how he always writes a project’s initial code (“interface”) by hand and only brings in agents later on. I think that practice is a big part of why this compiler has scaled the way it has, especially contrasted with other projects of mine where I didn’t write or review any of the foundational code myself. I’m returning to handwriting the core pieces and interfaces for future projects.
December 2025 is when coding LLMs took a pretty dramatic leap in capability. I was home on winter break and had a “minor” realization: I’d spent essentially every previous break binging video games I didn’t even genuinely enjoy. Ruling the Balkans as Serbia-turned-Yugoslavia or as the Ottoman Empire in Victoria 3 gets old.
I wanted something more productive to fill the hours, and with these new tools suddenly available, it felt like the right moment to push the compiler project forward again. The timing worked out: LLM assistance was immediately handy for the ast2llvm glue code in particular. That gave me enough traction to keep the momentum going, and I’ve been actively developing the compiler since.
Later posts will go deeper into the technical evolution of the project. One example worth previewing: semantic analysis started out woven directly into the parser with no real separation between the two, and over time it was refactored into a distinct Sema pass that runs after parsing is complete, aside from the well-known typedef edge case. But that design involved duplicating the entire AST twice. That was a scalability issue, the old AST was never used for anything, and different areas of Sema were inconsistent about whether they duplicated nodes or replaced them in place. I also needed to lay groundwork for C++, where features like templates make parsing far more context-sensitive. So I ended up moving to an in-place Collect system, which is loosely modeled on Clang’s ActOn architecture. Neither migration was easy: to anyone writing a C/C++ compiler, I recommend starting with the Collect-style structure from the outset, even if you are keeping it to C only.
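For readers who haven’t seen the ActOn pattern, here is a rough sketch of the general idea: the parser never builds nodes itself, it hands each construct it recognizes to the semantic layer, which checks it and returns the finished node, so there is no second AST to copy into. Every name below (Type, Expr, the act_on_* methods) is hypothetical and chosen for illustration; this is not my actual Collect code, and it is far simpler than what Clang does.

```cpp
#include <memory>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical, deliberately tiny type system and expression node.
enum class Type { Int, IntPointer };

struct Expr {
    Type type;         // already resolved when the node is created
    std::string text;  // printable form, just for this demo
};

// The semantic layer. A parser would call these as it recognizes each
// construct, so every node is type-checked at the moment it is built.
class Sema {
public:
    void act_on_var_decl(const std::string& name, Type type) {
        if (!scope_.emplace(name, type).second)
            throw std::runtime_error("redefinition of '" + name + "'");
    }

    std::unique_ptr<Expr> act_on_int_literal(std::string spelling) {
        return std::make_unique<Expr>(Expr{Type::Int, std::move(spelling)});
    }

    std::unique_ptr<Expr> act_on_identifier(const std::string& name) {
        auto it = scope_.find(name);
        if (it == scope_.end())
            throw std::runtime_error("use of undeclared identifier '" + name + "'");
        return std::make_unique<Expr>(Expr{it->second, name});
    }

    std::unique_ptr<Expr> act_on_multiply(std::unique_ptr<Expr> lhs,
                                          std::unique_ptr<Expr> rhs) {
        // In this sketch only int * int is accepted.
        if (lhs->type != Type::Int || rhs->type != Type::Int)
            throw std::runtime_error("invalid operands to binary '*'");
        return std::make_unique<Expr>(
            Expr{Type::Int, "(" + lhs->text + " * " + rhs->text + ")"});
    }

private:
    std::unordered_map<std::string, Type> scope_;  // single flat scope
};
```

A parser for `x * 2` would call act_on_identifier("x"), then act_on_int_literal("2"), then pass both results to act_on_multiply. Because the symbol table is updated while parsing is still in progress, the same structure gives the parser a place to ask whether a name is a typedef, which is roughly why this shape suits C and C++.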