The Story Behind My Compiler
Recently I've noticed that people on certain social media platforms seem shook by the release of the compiler and have made various comments about it. Let me state upfront that I'm a real human being :). I'm also a native English speaker, although I can be a bit terse in commit messages. One commenter did get right that I am fairly young (somewhere in my 20s).
The site itself was a Hugo template generated by GPT 5.3 Codex xHigh, hence all the LLM-sounding placeholder text (I am not a web dev, but I told it to keep the JS minimal, since I don't like how many sites use JS when they don't need to). Believe it or not, it made some mistakes despite web dev being the flagship use case of these tools (at least it corrected them all after I pointed them out).
With that out of the way, I was a bit sad that none of the commenters picked up on my 67/cultural-themed testing convention scheme. That was the one aspect I hoped people would find the “weirdest”, and one that will carry over to my future projects. Maybe it’s just another instance of the generational gap between Gen Z and millennials.
I understand where they are coming from: when a literal who from nowhere releases a project that would normally take an experienced developer a lifetime to build, people are naturally going to have questions.
With that in mind, I thought it was worth taking some time to shed more light on why I built this and how the development actually unfolded over time. This will likely be a multi-part series.
A short history
I first became seriously interested in C compilers after the book “Writing a C Compiler” by Nora Sandler came out. Before that, I had written compilers, mainly in Swift, for several toy languages (with no AI/LLM assistance, because it didn’t exist yet). I particularly liked going through the “Writing an Interpreter/Compiler” series by Thorsten Ball, though Go was the implementation language in those books, if memory serves. I started working through Sandler’s project in Swift. Ultimately I moved away from it for two reasons: XCTest’s testing harness was “weird”, and Swift is basically an Apple-only language. So sometime last summer I made the jump to C++.
The first thing I tackled was porting over the test infrastructure: I rewrote the initial harness using Catch2 and CMake, folding in the test suite that comes with Sandler’s book. A deliberate choice I made early on was to model my AST after Clang’s. I just didn’t see a compelling reason to design something completely novel in that specific department.
From there I moved on to lexing and parsing. It’s almost amusing to compare the early code to where things stand now. The codebase has grown far more sophisticated, yet it has clearly grown around the original architecture: there are still plenty of fingerprints from those early decisions. Some things have changed, though: what is now common_keyword_table started as a long list of initializers for an unordered_map, populated dynamically in the Lexer constructor rather than statically.
As the lexer and parser matured, though, codegen started to feel like the weak link. I moved away from the TACKY IR described in Sandler’s book and adopted a minimal IR inspired by QBE instead. Even so, backend work still didn’t feel right. I was constantly fighting ARM64 assembly rules (memory/register movement rules, IIRC), and I didn’t like the interface between codegen and arm64 emission. I considered using LLVM instead, but I didn’t know much about integrating with LLVM IR, so progress slowed for a while.
Until December 2025 I had hand-written all of the code. Dax Reed, who built the OpenCode coding agent, has talked about how he always writes a project’s initial code (the “interface”) by hand and only brings agents in later. I think that practice is a big part of why this compiler has scaled the way it has, especially contrasted with other projects of mine where I didn’t write or review any of the foundational code myself. (Going forward, I’m returning to handwriting the core pieces and interfaces for future projects.)
December 2025 is when coding LLMs took a pretty dramatic leap in capability. I was home on winter break and had a “minor” realization: I’d spent essentially every previous break binging video games I didn’t even genuinely enjoy. (Ruling the Balkans as Serbia-turned-Yugoslavia or as the Ottoman Empire in Victoria 3 gets old.) Some of the last non-strategy games I truly enjoyed were Cyberpunk 2077, Earthbound, and the Resident Evil 4 remake, and I got through all of those a while back (anyone remember “brat summer”?).
I wanted something more productive to fill the hours, and with these new tools suddenly available, it felt like the right moment to push the compiler project forward again. The timing worked out: LLM assistance was immediately handy for the ast2llvm glue code in particular. That gave me enough traction to keep the momentum going, and I’ve been actively developing the compiler since.
Later posts will go deeper into the technical evolution of the project. One example worth previewing: semantic analysis started out woven directly into the parser with no real separation between the two, and over time it was refactored into a distinct Sema pass that runs after parsing is complete (aside from the well-known typedef edge case). But that approach involved duplicating the entire AST (a scalability issue; we never used the old AST for anything, and there was a mismatch between duplication and in-place node replacement across various areas of Sema), and I also needed to lay groundwork for C++, where features like templates make parsing far more context-sensitive. So I ended up moving to an in-place Collect system, loosely modeled on Clang’s ActOn architecture. Neither migration was easy: to anyone writing a C/C++ compiler, I recommend starting with the Collect-style structure right off the bat, even if you are keeping it to C only.
Questions
Q: Why are your site and GitHub new?
A: I’d never actually had a real GitHub account before: nothing I’d worked on was ever submitted anywhere, and none of my courses or jobs required it. My older code lived on GitLab, and before GitHub offered free private repositories, I kept things on Bitbucket.
The blog and the site came about because I intend to keep building this project and others in the open. I used to be perfectly happy just quietly working on things by myself. But the landscape has shifted: the industry increasingly values programmers who put their work out there and actually talk about it. There’s no telling how long this profession will persist in its current form (the METR estimates don’t look good for the future of human workers), but I’m not about to walk away from a hobby and interest that’s shaped most of my life for as long as there’s opportunity present.
Q: Which AI models do you use?
A: Initially I relied on the Gemini Code Assist plugin inside CLion. Once Anthropic’s C compiler effort was announced, I brought in Claude Code to handle a lot of the mid-to-late-stage C language features (Codex did most of the advanced preprocessor code after I’d handwritten the early parts). These days I’m working with Codex exclusively on this project. I still use the “Outline” feature of Gemini Code Assist.
That said, as the codebase grows, agents become noticeably worse at adding features and debugging end-to-end: human involvement is still essential. Commit 2ea3b0b0 is a good example: a new field was added to a struct in an earlier commit, which broke a caller using braced initialization without field labels, triggering a spurious error later in the codepath. I fixed it in about 15 minutes. Codex (xHigh) couldn’t find it after 5 (possibly more) hours of effort. There have been countless other bugs that I can resolve in ten to thirty minutes as a human but that take Codex and Claude twelve to twenty-four hours or more (not exaggerating). Although it could also be because they are redirecting compute for Spud and Mythos.
Outside of the compiler itself, I use a mix of Gemini, Claude, ChatGPT, and several Chinese models for research and auxiliary tasks. I’m intentionally diversifying my workflow as usage limits tighten and compute costs look likely to climb.
Q: What’s your take on Circle (the other independent C++ frontend)?
A: Circle is genuinely impressive. However, like my own compiler, it remains (remained?) a work in progress. At one point I ran it on godbolt.org against a file from the Fujitsu Compiler Test Suite (I forget exactly which one) and it didn’t pass (it didn’t at first with my compiler either, for what it’s worth). I’m sure the author could address that, but it goes to show that compilers have an enormous surface area. Insisting on 100% hand-written code simply wouldn’t have produced viable results in the timespan I have: Circle itself took a decade-plus to build. Plus, mine is open source: people can learn from, build on, and extend it.