A Compiler Writing Playground
Written by Nikos Vaggalis   
Friday, 25 November 2022

"Create Your Own Compiler" is an interactive tutorial that shows, step by step, how to write your own simple compiler that transforms Lisp into JavaScript. Along with it, we take a look at what a compiler actually is and at the state of the art that is Roslyn.

Compilers are important, but most people go from day to day using their favorite programming language and tools without thinking too much about them, ignoring what happens under the covers.

However, peeking into that black box and learning to write a compiler gives you superpowers. It allows you to write custom tools and mini languages/DSLs, make your own fully fledged language or, as in "Create Your Own Compiler", transform one language into another!

A prime example of why the latter, in other words transpilation, has proved indispensable is Babel. Since not all browsers can cope with the latest JavaScript language features, Babel translates code that uses them into a backwards compatible version of JavaScript that runs in current and older browsers or environments.

Yet another example is TypeScript, which adds optional typing to JavaScript (on that matter, make sure to also check Sorbet - Making Ruby Statically Typed), acting as a statically typed and better superset of it. The TypeScript compiler analyzes the TypeScript code and compiles it into JavaScript so that it can run on any browser. Since the VM engine that runs JavaScript is already there, why not reuse it instead of building one from scratch to support our own language? It's easier to convert!

Fable is yet another X-to-JavaScript transpiler. Fable transpiles F# to ES2015 JavaScript so that code written in F# can run anywhere JavaScript runs - the browser, Node.js, Electron, React Native or, generally, V8.

But a compiler's most popular application is translating programs from a higher-level language to a lower-level one in order to create an executable program; see C.

Every compiler works by executing several well-defined phases, each phase taking as its input the output of the previous one, until it finally produces runnable code.

The first phase is tokenization, performed by the part called the lexer. It takes a stream of characters and, typically using regexes, groups them according to the language syntax into what are called tokens - keywords, function names, operators, etc.
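As a rough illustration of the lexing phase, here is a minimal regex-based tokenizer sketched in Python. The token kinds (`LPAREN`, `NAME` and so on) and the tiny Lisp-like input are assumptions made for this example; they are not taken from the tutorial's code.

```python
import re

# Hypothetical token kinds for a tiny Lisp-like language (illustrative only).
TOKEN_SPEC = [
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("NUMBER", r"\d+"),
    ("NAME", r"[A-Za-z_]\w*"),
    ("SKIP", r"\s+"),
]
# One combined regex with a named group per token kind.
MASTER_RE = re.compile("|".join(f"(?P<{kind}>{pattern})" for kind, pattern in TOKEN_SPEC))


def tokenize(source):
    """Turn a string of characters into a list of (kind, text) tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":  # whitespace separates tokens but is not one
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("(add 2 (subtract 4 2))"))
```

Each token keeps both its kind and its original text, which is all the next phase needs in order to build a tree.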

The next phase is parsing. The parser takes the stream of tokens produced by the lexer and represents them as a tree structure, the abstract syntax tree (AST), which is much easier to work with.
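To make the idea of parsing concrete, the following Python sketch turns a flat token list into a nested tree using recursive descent. The token format and the node names (`Program`, `CallExpression`, `NumberLiteral`) are assumptions chosen for the example, and the token list is written by hand so that the snippet stands on its own.

```python
def parse(tokens):
    """Build a nested AST (plain dicts) from a flat list of (kind, text) tokens."""
    pos = 0

    def walk():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        kind, text = tokens[pos]
        pos += 1
        if kind == "NUMBER":
            return {"type": "NumberLiteral", "value": text}
        if kind == "LPAREN":
            # A call looks like: ( name arg arg ... )
            if pos >= len(tokens) or tokens[pos][0] != "NAME":
                raise SyntaxError("expected a function name after '('")
            node = {"type": "CallExpression", "name": tokens[pos][1], "params": []}
            pos += 1
            while pos < len(tokens) and tokens[pos][0] != "RPAREN":
                node["params"].append(walk())  # arguments may be nested calls
            if pos >= len(tokens):
                raise SyntaxError("missing ')'")
            pos += 1  # consume ')'
            return node
        raise SyntaxError(f"unexpected token {text!r}")

    body = []
    while pos < len(tokens):
        body.append(walk())
    return {"type": "Program", "body": body}


# Hand-written tokens for the source text "(add 2 (subtract 4 2))".
tokens = [
    ("LPAREN", "("), ("NAME", "add"), ("NUMBER", "2"),
    ("LPAREN", "("), ("NAME", "subtract"), ("NUMBER", "4"), ("NUMBER", "2"),
    ("RPAREN", ")"), ("RPAREN", ")"),
]
ast = parse(tokens)
print(ast)
```

The nesting of the parentheses in the source becomes the nesting of the `params` lists in the tree.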

The next phase is semantic analysis, where the compiler checks the AST against the language's semantic rules and data types. It makes sure that the code is well-formed and well-typed.
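A very small flavour of semantic analysis can be sketched in Python as a check that every call in the tree refers to a known function and passes the right number of arguments. The AST shape and the `KNOWN_FUNCTIONS` table are hypothetical, made up for this example; a real compiler would also track scopes and types.

```python
# Hypothetical table of known functions and how many arguments each takes.
KNOWN_FUNCTIONS = {"add": 2, "subtract": 2}


def check(node):
    """Return a list of semantic errors found in the AST rooted at node."""
    errors = []
    if node["type"] == "Program":
        for child in node["body"]:
            errors.extend(check(child))
    elif node["type"] == "CallExpression":
        name, args = node["name"], node["params"]
        if name not in KNOWN_FUNCTIONS:
            errors.append(f"unknown function '{name}'")
        elif len(args) != KNOWN_FUNCTIONS[name]:
            errors.append(f"'{name}' expects {KNOWN_FUNCTIONS[name]} arguments, got {len(args)}")
        for arg in args:
            errors.extend(check(arg))  # arguments can contain further calls
    return errors


# Syntactically valid but semantically wrong: (add 2) and (multiply 1 2)
program = {
    "type": "Program",
    "body": [
        {"type": "CallExpression", "name": "add",
         "params": [{"type": "NumberLiteral", "value": "2"}]},
        {"type": "CallExpression", "name": "multiply",
         "params": [{"type": "NumberLiteral", "value": "1"},
                    {"type": "NumberLiteral", "value": "2"}]},
    ],
}
print(check(program))
```

Both statements would pass the parser, which is exactly why a separate phase is needed to reject them.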

The next phase is optimizing the AST, for example by eliminating dead code using techniques like tree shaking. The result of this phase is the Intermediate Representation or IR. The IR itself undergoes optimizations specific to the target CPU architecture in order to produce machine code.
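To give a feel for what optimizing a tree means, here is a Python sketch of constant folding, a different and simpler optimization than the dead code elimination mentioned above: calls whose arguments are all known numbers are replaced by their result at compile time. The AST shape and the `FOLDABLE` table are assumptions made for the example.

```python
import operator

# Hypothetical set of functions the optimizer knows how to evaluate.
FOLDABLE = {"add": operator.add, "subtract": operator.sub}


def fold(node):
    """Return a copy of the AST with constant calls replaced by their value."""
    if node["type"] == "Program":
        return {**node, "body": [fold(child) for child in node["body"]]}
    if node["type"] == "CallExpression":
        params = [fold(p) for p in node["params"]]  # fold the innermost calls first
        if (node["name"] in FOLDABLE and len(params) == 2
                and all(p["type"] == "NumberLiteral" for p in params)):
            value = FOLDABLE[node["name"]](int(params[0]["value"]), int(params[1]["value"]))
            return {"type": "NumberLiteral", "value": str(value)}
        return {**node, "params": params}
    return node


# AST for (print (add 2 (subtract 4 2))); "print" is not foldable, so it stays a call.
tree = {
    "type": "CallExpression", "name": "print",
    "params": [{
        "type": "CallExpression", "name": "add",
        "params": [
            {"type": "NumberLiteral", "value": "2"},
            {"type": "CallExpression", "name": "subtract",
             "params": [{"type": "NumberLiteral", "value": "4"},
                        {"type": "NumberLiteral", "value": "2"}]},
        ],
    }],
}
print(fold(tree))
```

The output tree describes the same program with less work left to do at run time, which is the essence of any optimization pass.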

The last step is to produce a standalone executable (runtimes that work with bytecode, like the JVM, work with the IR instead of creating an executable). This is usual in C programming, but with the new tools now available even high-level languages like Java can compile to native executables, under GraalVM.

The above list is simplified of course, but in general the steps you have to take in order to transform an input source into the desired output are:

 

  • Lexing
  • Parsing
  • Building up an Abstract Syntax Tree (AST)
  • Generating IR code for the given AST
  • Optimizations on the generated IR code
  • Generating machine code

 

Add to those the step of defining the syntax of your new programming language, if you want to go that way.

The "Create Your Own Compiler" playground makes that complicated process easy to go through. It is actually an annotated walkthrough of Jamie Kyle's "The Super Tiny Compiler", a simple compiler written in JavaScript. The goal of the tutorial is to compile a Lisp statement into JavaScript. Along the way we go through the different stages of Lexical Analysis, Syntactic Analysis, Transformation and Code Generation.
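To illustrate the final stage, code generation, here is a Python sketch that walks a Lisp-style AST and prints JavaScript-style function calls. It is not the tutorial's code: the node names are assumptions, and for brevity it generates output straight from the Lisp tree and skips the separate transformation stage.

```python
def generate(node):
    """Emit JavaScript-style source text for a Lisp-style AST (plain dicts)."""
    if node["type"] == "Program":
        # One statement per line, each terminated by a semicolon.
        return "\n".join(generate(stmt) + ";" for stmt in node["body"])
    if node["type"] == "CallExpression":
        args = ", ".join(generate(p) for p in node["params"])
        return f"{node['name']}({args})"
    if node["type"] == "NumberLiteral":
        return node["value"]
    raise TypeError(f"unknown node type {node['type']!r}")


# AST for the Lisp statement (add 2 (subtract 4 2))
lisp_ast = {
    "type": "Program",
    "body": [{
        "type": "CallExpression", "name": "add",
        "params": [
            {"type": "NumberLiteral", "value": "2"},
            {"type": "CallExpression", "name": "subtract",
             "params": [{"type": "NumberLiteral", "value": "4"},
                        {"type": "NumberLiteral", "value": "2"}]},
        ],
    }],
}
print(generate(lisp_ast))  # add(2, subtract(4, 2));
```

The prefix notation of Lisp, with the function name inside the parentheses, becomes the familiar call syntax with the name in front.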

Each stage is broken into multiple steps and each step presents its annotated code interactively. It's a great way to get your feet wet and to grasp the bare concepts.

The other, post-modern way of building compilers is going the Roslyn way. Write a compiler for a language in that same language? Microsoft has done exactly that with its state of the art compiler platform, Roslyn.

As to the question of what Roslyn actually is, what better way to get an authoritative answer than from a member of the Roslyn team, the renowned C# guru himself, Eric Lippert? The opportunity came about in the form of an interview that he gave us back in 2014:

NV: Roslyn's official definition states that it is a "project to fully rewrite the Visual Basic and C# compilers and language services in their own respective managed code language; Visual Basic is being rewritten in Visual Basic and C# is being rewritten in C#."
How is C# being rewritten in C#?

EL: When I was at Microsoft I saw so many people write their own little C# parsers or IDEs or little mini compilers or whatever, for their own purposes. That's very difficult, it's time-consuming, it's expensive, and it's almost impossible to do right. Roslyn changes all that, by giving everyone a library of analysis tools for C# and VB which is correct, very fast, and designed specifically to make tool builders' lives better. I am very excited that it is almost done! I worked on it for many years and can't wait to get my hands on the release version.

The rest of Eric's comments can be found in the full interview, listed under Related Articles.

Building your compiler using Roslyn gives you distinct advantages:

 

  • Massive performance improvements and a built-in mechanism for handling dynamic objects, plus crucial functionality for code emitting, parsing assemblies and structuring the compiler itself. The result is portable assemblies and the possibility of integrating your compiler with tools otherwise available only for C# (code analysis, VS extensions).
  • Cross-platform capability, since Roslyn produces portable class libraries compatible with Mono and .NET Core.
  • Visual Studio integration and other functionality including code colourization, syntax highlighting and IntelliSense.

 

Magic!

More Information

Create your own compiler

Related Articles

C# Guru - An Interview With Eric Lippert

Fable - Write Front-End Apps For The Web In F#

Sorbet - Making Ruby Statically Typed

How To Create Pragmatic, Lightweight Languages

Take Cornell's CS 6120 Advanced Compilers For Free  

 


Last Updated ( Friday, 25 November 2022 )