Compilers, Interpreters, VMs and JIT
Written by Mike James   
Thursday, 09 May 2024

The distinction between a compiler and an interpreter is one that can cause controversy. One programmer's compiler is another's interpreter and the whole subject gets very murky when you throw in the idea of the Virtual Machine and Just In Time compilation. So what is it all about?

If you want to start an argument between programmers or software engineers then just say “language X is implemented by an interpreter”.

If that doesn’t cause a heated exchange, then change “interpreter” to “compiler” in the same sentence. For while some languages are generally accepted as being implemented one way or the other, the situation is rarely clear cut.

Back in the early days of computing the distinction was much more important; the debate about compilers versus interpreters would rage in the pages of any magazine or journal, and verbal wars were common. Today, with more powerful machines, the issue isn’t quite as important, but it can still raise a good argument! 

It can also be the deeper cause of a language war. For example, C++ v C# isn't just about which language is better, it is also about the way that the languages are generally implemented. C++ is usually compiled to machine code, whereas C# is usually compiled to an intermediate code and then interpreted or JIT compiled.

But this is getting ahead of ourselves.

As well as being controversial, the subject is still at the leading edge of development, although now we tend to talk more about “virtual machines” and “Just In Time compilers” than straightforward interpreters and compilers, and this makes everything dissolve into shades of grey.

In other articles we have looked at the progression from machine code to assembler and on to high-level languages. An inherent assumption in this discussion is that the language would always be translated, or compiled, to machine code before it was expected to do anything much - but this isn’t the only possible approach. 

The Run Time

The idea of an interpreter as opposed to a compiler evolved slowly and it isn’t very easy to say exactly where the idea came from.

In the early days assemblers and compilers would translate every last instruction of a high-level language program into machine code, and the resulting machine code was then taken and run on a "real" machine. When you think about it, what other way could you get a program written in, say, Fortran to run on hardware that knew nothing of Fortran and only understood machine code? 

In other words, the process of compiling from a language to machine code seemed unavoidable and the only way to do things. This was a major problem, however, as at first no one had any idea how to do it, and it wasn't at all obvious that the resulting automatically generated code would be fast enough to be useful. Eventually the first compilers were created and they proved valuable.

Then some clever programmer had an idea. Why not make use of a library of machine code subroutines to make life easier?

For example, if the high-level language program contained the line

A=B*C

then a standard compiler approach would take the “B times C” part of the statement and translate it to the machine code equivalent which, if you were lucky, would be something like:

MUL B,C

However, many early machines didn’t have a multiply instruction and so the multiplication had to be built up using add and shift instructions. A true compiler would, and should, take the high-level multiplication B*C and convert it into a complete sequence of machine code instructions that performed the multiplication. 

You can see that, implemented this way, a simple multiplication would generate a lot of machine code each time it was used.
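
To make this concrete, here is a rough sketch in C of the sort of shift-and-add logic the compiler would have to expand B*C into on such a machine. A real compiler would, of course, emit the equivalent machine instructions rather than C, and the names here are invented purely for the example:

unsigned int multiply(unsigned int b, unsigned int c)
{
    unsigned int result = 0;
    while (c != 0) {
        if (c & 1)            /* if the lowest bit of c is set...       */
            result += b;      /* ...add in the current shifted value of b */
        b <<= 1;              /* shift b one place left                 */
        c >>= 1;              /* move on to the next bit of c           */
    }
    return result;            /* result now holds b times c             */
}

A sequence of instructions doing roughly this would be compiled in at every single multiplication in the program.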

The clever idea was to create a subroutine that would multiply two numbers together and then a multiplication operation would be compiled to a call to the new subroutine.

That is, B*C compiles to:

CALL Multiply

Of course the values in B and C would have to be loaded into the locations in which the subroutine expected to find them before the call, but in principle this is still a great simplification.
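
As a sketch of how this might look, again using C to stand in for the machine code, the shift-and-add loop now lives in a single run time routine and the compiler simply emits code to load the operands into the agreed locations and call it. ARG1, ARG2, RESULT, Multiply and compiled_statement are all invented names for the example:

unsigned int ARG1, ARG2, RESULT;    /* fixed locations the routine expects   */

void Multiply(void)                 /* the single shared copy of the         */
{                                   /* shift-and-add code sketched above     */
    unsigned int b = ARG1, c = ARG2, r = 0;
    while (c != 0) {
        if (c & 1)
            r += b;
        b <<= 1;
        c >>= 1;
    }
    RESULT = r;
}

void compiled_statement(unsigned int B, unsigned int C, unsigned int *A)
{
    ARG1 = B;                       /* load the operands...                  */
    ARG2 = C;
    Multiply();                     /* ...call the library routine...        */
    *A = RESULT;                    /* ...and store the result in A          */
}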

The advantage of this approach is that the compiled program is smaller because the multiplication code isn’t repeated every time it is used.

However, there are some disadvantages to the method. The first is that there is usually an overhead in calling a subroutine, which makes this approach slower than simply compiling in the instructions needed to multiply two numbers together. The second is that any program compiled in this way needs access, whenever it runs, to a chunk of standard code – usually called a “run time library”.

Compilers that use run time libraries are common and they can hide the fact that they are doing so by including a copy of the library with each and every program. This, of course, wastes space, but it does produce programs that can run without the obvious help of a run time library.

Smarter compilers only compile in the bits of the run time library that a program actually uses. This is what a linker does: it takes the separate sections of a program, including any routines needed from the run time library, and stitches them all together to create a complete program.

You can even package much of the run time library into separate modules which are used at run time and not compiled into the application - this is what Windows DLLs and Linux SO files are all about. They are shared libraries that are precompiled and available for use by any program.
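
For example, assuming a typical Linux C toolchain, a program that uses the standard sqrt function doesn't contain the square root code at all; it is normally supplied at run time from the shared math library, libm.so:

#include <stdio.h>
#include <math.h>   /* declares sqrt - the code itself lives in the math library */

int main(void)
{
    double x = 2.0;
    /* sqrt is not compiled into this program: it is typically resolved
       at load time from the shared library libm.so, the same idea as a
       DLL on Windows. */
    printf("%f\n", sqrt(x));
    return 0;
}

Built with something like gcc prog.c -lm, the executable contains little more than a reference to sqrt; the linker records the dependency and the loader brings in the shared copy of the code when the program runs.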

As time went on, run time libraries tended to get bigger and bigger and more sophisticated. The consequence is that more and more of the compiler's time is spent not in compiling to machine code but in simply putting in calls to routines in the run time library. 


