A Simple Virtual Machine
Written by Alexey Lyashko
Wednesday, 01 February 2012
Page 2 of 3
Pseudo Assembly Language

Now that we are done with the file format, we have to define our pseudo assembly language. This includes both the definition of the commands and the instruction encoding. As this VM is designed only to encode and decode a short text message, there is no need to develop a full-scale set of commands. All we need is MOV, XOR, ADD, LOOP and RET.

Before we start writing the macros that represent these commands, we have to think about instruction encoding. This is not going to be difficult - we are not trying to be Intel. For simplicity, every instruction will start with two bytes, followed by one or more immediate arguments if there are any. This allows us to encode all the needed information, such as the opcode, the types of the arguments, the size of the arguments and the operation direction:

typedef struct _INSTRUCTION

Define the following constants:

/* Operand types */

[Figure: Constants to be used with our pseudo assembly language]

It seems to me that there is no reason to list all the macros defining our pseudo assembly opcodes here, as it would be a waste of space. I will just list one as an example. This will be the definition of the MOV instruction:
[Figure: Macro defining the MOV instruction]
As you can see in the MOV macro above, I've been lazy again and decided that it would be easier to specify the size of the arguments explicitly, rather than writing some extra code to identify their size automatically. In addition, the name of each instruction tells what that specific instruction is intended to do. For example, mov_rm moves a value from memory to a register, and the letters 'r' and 'm' tell what types of arguments are in use (register, memory). In this case, moving a WORD from memory to a register would look like this:

mov_rm REG_A, address, _WORD

The whole code section (which currently contains only one function) is represented by the image below:

[Figure: the code section]

This loads the address of the message as an immediate value into the B register, loads the length of the message from the address given by message_len into the C register, then iterates message_len times, applying XOR to every byte of the message. "mov_rmi" performs the same operation as "mov_rm", but the address is held in the register specified as the second parameter.

This is what the output looks like in IDA Pro:

[Figure: Header]
[Figure: Code]
[Figure: Data and Export sections]
Last Updated ( Wednesday, 01 February 2012 )