How To Create Pragmatic, Lightweight Languages
Article Index
How To Create Pragmatic, Lightweight Languages
Part I: The Basics
Part II Compiling

 

6: Mapping: from the parse-tree to the Abstract Syntax Tree
It's not all over for the parsing phase yet. What's left is to transform the Parse tree to another tree, the Abstract Syntax Tree (AST) which will serve as the basis of performing validation and producing compiled code. At this point, as the book puts it "The kind of operations that we are going to perform are based on the semantic content of the code, not its syntactic structure."

Transforming the parse tree is a bit more complicated and requires working with the classes generated by ANTLR to represent the nodes of the parse tree. So the rest of the chapter does that, mapping each parse tree node to a Kotlin class.In other words, we treat the AST as a collection of Kotlin classes and interfaces.

In addition, we also have to resolve symbols. By resolving symbols we create new links between references and their declaration. In a sense we transform our tree into a graph, because links are not strictly hierarchical. This is further expanded in 7: Symbol resolution".

The following chapter, 8: Typesystem, is a very interesting one in that it attempts to be more theoretical in tackling the type system pertaining to a language, so it briefly catalogues the most common types found in a language such as Java:

  • Basic Types such as boolean, types for numbers, character, string
  • Declared types classes, structs, interfaces
  • Parametric types. I'll relay the explanation because I find it rather intuitive:
    Simple types are types which do not have any sort of parameter. So two instances of that type are indistinguishable. An int is just an int.Not all types are like that: think about arrays or collections. An array of int is not the same type as an array of string. This is because array is a parametric type. Or if you wish, it is not a type at all, it is more like a type template, something that you can use to create proper types like array of float, or array of double.
  • Subtypes.

We also define our type system's rules in order to check for semantic errors such as:

  •     defining twice variables with the same name
  •     referring to a symbol that was not defined
  •     trying to assign a value to a variable of an incompatible type

In case you're not in sync with the background theory, just jump into the code examples which leverage these concepts.Everything will become clear after going through the code which is annotated and explained by the author after it gets dissected.This principle runs throughout the book.Kotlin is surely playing its part with its clean syntax in rendering things more accessible.

Of course the chapter wouldn't end without defining a type system for our toy languages.

9: Validation, which has sections for both MiniCalc and StaMac, brings Prt 1 to an end.

 

bookrevunix

  (click on book cover for details on Lean Pub site)

 

Part II: Compiling

At this point you should be wondering, "OK, I've done both syntactic and semantic validation, what's next?" Doing something with the information obtained of course! This could be interpreting the code to execute something in response; compiling it to native code or bytecode; or even generating something else like a graph or some code for another language.

But first, in 10: Build an interpreter we look at the components required which, surprisingly, do not include data structures or garbage collectors; we can just reuse the ones from the host language:

Do you need a map in your language and you are writing the interpreter in Java? Just use a Java map in your interpreter. Same applies to lists, hash tables and so on. This significantly reduces our work.

That is the modern way of building compilers.Reusing existing, well crafted components to lessen the workload.

The interpreter code written in Kotlin follows, but don't imagine thousands of lines of code. Kotlin makes it a breeze since it reduces boilerplate.As far as the interpreter vs compiler debate goes, the book has the following to say, which also puts a cap on the chapter:

In many cases starting by writing an interpreter is just easier compared to writing a compiler.I would suggest going thorugh this path at least while you are designing the language and it is not yet stable. Have fun writing interpreters and when you are ready let’s move to the next chapter and explore an alternative: generating bytecode.

So bytecode it is, in 11: Generate JVM bytecode. But first what are the advantages of compiling to JVM bytecode?

we can leverage all the JVM libraries, we can run the same code on all platforms supported by the JVM, but the user must have the JVM installed.

In order to do that we have to build a Compiler and even before that we have to familiarize ourselves with the inner workings of the Java Virtual Machine, therefore we first examine how Java code ends up in bytecode. As such we see how Classes, Mathematical operations, Type conversions, Operations on objects, method invocations, etc are represented in bytecode.

As for building the compiler, another component is called into play, ASM, a library which can produce class files and bytecode. Again, advanced constructs and functions such as generics and arrays, inner classes, invokedynamic, control-flow statements, etc have been left out. Bear in mind that as happens in almost every chapter, the book doesn't cover advanced language elements. not because it can't but because it wants to keep it simple, rather acting as the gateway to further research. As such, this chapter presents the most common, useful concepts you can leverage to write compilers for the JVM.

Then there comes its how-to-compile-to-native-code counterpart, in 12: Generate an executable using LLVM. In considering the advantage of compiling to native code:

we do not need any software installed to run on the machine of the user, we can reuse all native libraries but we need to generate different executables for different platforms.

Discussing all the features of LLVM is beyond the scope of the chapter, but for our purposes we need to know that LLVM is able to work with an intermediate representation (IR) of low-level code. You can think of the IR as a sort of portable assembler and in fact the aim is to leverage it through KLLVM, a library to generate LLVM IR in Kotlin.

So after looking into bytecode, we now must examine IR and a good way to do that is to write simple examples in C and use the clang compiler to generate the corresponding IR code which we compare against the C listings to figure out what's happening.

This chapter thus gets very technical, and if, like me. you have no previous experience of how LLVM and IR works, you'll find it difficult to follow, even at a basic level. There's a lot of voodoo going on and probably justified as compiling to native code was always considered non trivial, as other languages like Perl have discovered through unsuccessful experiments such as the perlcc tool - although RPerl looks like making the C jump, but only for a subset of the Perl language. Makes you feel grateful for the existance of the Virtual Machines!

The final chapters 13:Syntax highlighting and 14: Autocomplete
put everything that we've accomplished so far together in demonstrating proof of concept.

In the end, what this book has managed to do is to untangle a skein which had the notions of lexing, parsing, transformation, interpretation and compilation tied tightly together, and broken them apart into clean, well-defined parts, which under the unix philosophy they do one thing and they do it well, working together in order to reach to something bigger and more complex.

Further, serving those parts in bite size chunks renders them easier to absorb as the basics of post-modern compiler design are laid out. Lightweight but precise won't turn you into a language engineer as such, but at the very least will pave the way to finding that path yourself by removing the barrier and mist surrounding the art of writing compilers.

For those of  you who have decided to  build your own language or tool, and thus have the necessary impetus, I can recommend this as THE entry level guide to lexing, parsing and compiling with no prior experience required. 

 

How To Create Pragmatic, Lightweight Languages

 

To keep up with our coverage of books for programmers, follow @bookwatchiprog on Twitter or subscribe to I Programmer's Books RSS feed for each day's new addition to Book Watch and for new reviews.

Banner


Continuous Architecture In Practice (Addison-Wesley)

Author: Murat Erder, Pierre Pureur and Eoin Woods
Publisher: Addison-Wesley
Pages: 352
ISBN: 978-0136523567
Print: 0136523560
Kindle: ‎B08ZRTQGLJ
Audience: Software Architects
Rating: 3
Reviewer: Kay Ewbank

This book sets out the case for why software architecture is more important than ever, and in p [ ... ]



PHP In Easy Steps, 4th Ed

Author: Mike McGrath
Publisher: In Easy Steps
Date: April 2021
Pages: 192
ISBN: 978-1840789232
Print: 1840789239
Kindle: B08ZSV3MNH
Audience: People wanting to learn PHP
Rating: 4
Reviewer: Ian Elliot
PHP isn't a fashionable language, but this doesn't mean it isn't worth learning.


More Reviews



Last Updated ( Tuesday, 08 August 2017 )