A Simple Scripting Language – Part 5 


Posted by Tristan on June 21st, 2011.

Categories: programming.

Part 1 | Part 2 | Part 3 | Part 4 | Part 5

Its been a while since I’ve written a post on the simple scripting language.  While steadily working on Trudy’s Mechanicals,  I’ve recorded and released an album with my band Bacchus and integrated the simple scripting language into our internal projects. So without further ado, I’ll be looking at the virtual machine (VM) that the script’s compiled output will run on.

The reason I’ve skipped ahead is that I can’t start outputting assembly until I’ve designed the virtual machine (VM) that it will run on.

In the past I’ve designed stack based VMs exclusively, so this time I decided to try a register based VM.  Register based VMs  are predominantly RISC based, with fixed size instructions and simple addressing modes.  Register based VMs have the advantage that each instruction can do more than a stack based one, but stack-based assembly has the advantage that it  takes up less space.  There are exceptions to both of these cases.

An example of the size/complexity trade-off can be seen in the following arithmetic example.

int x,y;
int z = x*y + 1

Stack-based assembly (JVM inspired) output would be roughly:

  1. LOAD X (1 to 2 bytes) – push x on to the top of the stack
  2. LOAD Y (1 to 2 bytes) – push y on to the top of the stack
  3. MULTIPLY (1 byte) – remove the top two items from the stack, multiply them and push the result
  4. INCREMENT (1 byte) – increment the value on the top of the stack
  5. STORE Z (1 to 2 bytes) – pop the top value from the stack and place it into the local variable z

Register-based assembly (ARM inspired) output would be roughly:

  1. MULTIPLY reg[tmp] = reg[x]*reg[y] (4 bytes)
  2. ADD_IMMEDIATE reg[z] = reg[tmp] + 1 (4 bytes)

As you can see stack-based assembly would be 5 instructions, with a size between 5 and 8 bytes.  Register based assembly code would be 2 instructions, with a size of 8 bytes

The Virtual Machine

In physical computers each additional register has a significant manufacturing expense.   The only cost of adding more registers in a VM is decreasing the number of bits used for opcodes or decreasing the number of registers that can be specified in a single instruction.

Like LUA 5, I found that 6 bits (64 opcodes) is more than sufficient for the number of instruction types I wanted my VM to contain. The virtual machine I’ve designed has 256 registers, available to a function at a single time.  Each register can hold either a 32-bit integer or a floating point value.

Each 32 bit instruction falls into one of the following four categories:

FORMAT 1 – Three Registers src register 2 (10 bits) src register 1 (8 bits) dst register (8 bits) opcode (6 bits)
FORMAT 2 – Two register and an immediate value src immediate (10 bits) src register (8 bits) dst register (8 bits) opcode (6 bits)
FORMAT 3 – One register and an immediate values immediate (18 bits) register (8 bits) opcode (6 bits)
FORMAT 4 – Just an immediate value immediate (26 bits) opcode (6 bits)

In addition to the 256 general purpose registers, two special purpose “registers” are required for the VM to execute:

  • PC – program counter.  A pointer to the next instruction in memory to execute
  • BP – base pointer.  A pointer to the location in memory, where the 256 registers start at, for the currently executing function.

List of Instructions

Below is a table of op-codes that I’m currently supporting:

Arithmetic

  • MOV_IMM -  dst register = 18 bit sign extended immediate
  • MOV_CONST – dst register = constant_pool[18 bit immediate]
  • ADD – dst register = src register 1  + src register 2
  • ADD_IMM – dst register = src register 1 + 10 bit sign extended immediate
  • SUB – dst register = src register 1 – src register 2
  • MUL – dst register = src register 1 * src register 2
  • DIV – dst register = src register 1 / src register 2
  • FADD – dst register = src register 1  + src register 2  (treated as floats)
  • FSUB – dst register = src register 1  – src register 2 (treated as floats)
  • FMUL – dst register = src register 1  * src register 2 (treated as floats)
  • FDIV – dst register = src register 1  / src register 2 (treated as floats)

Compares / Jumps

  • CMPEQ – dst register = (src register 1 == src register 2) ? 1 : 0
  • CMPNE – dst register = (src register 1 != src register 2) ? 1 : 0
  • CMPLT – dst register = (src register 1 < src register 2) ? 1 : 0
  • CMPLE – dst register = (src register 1 <= src register 2) ? 1 : 0
  • FCMPLT – dst register = (src register 1 < src register 2) ? 1 : 0   (treated as floats)
  • FCMPLE – dst register = (src register 1 <= src register 2) ? 1 : 0  (treated as floats)
  • JNE – if (dst register != 0) {  pc += sign extended 18 bit immediate }
  • JEQ – if (dst register == 0) {  pc += sign extended 18 bit immediate }
  • J – pc += sign extended 26 bit immediate

Operational

  • CALL – call the function [ 18 bit immediate] retrieving parameters starting at  register’s index
  • YIELD – interrupt execution
  • RETURN – return the value contained in the destination register of the callee, and resume the exection of the caller
  • RETURN_I – return an 18 bit immediate, and resume the execution of the caller

nb.  this is in no way a final list

Calling Conventions

In stack based languages, the arguments to a  function  are taken from the top of the stack. However, in a register based machine, there is no notion of a “top element of the stack”.  When we CALL a function we need to specify where the arguments should be retrieved from.

The CALL opcode has a register and an 18 bit immediate value as its parameters.

The 18bit value specifies the index  into a table of functions (internal to the VM) to invoke.  The register parameter specifies where in the current register set that the arguments reside.

[addNumbers:  1  and: 2 and: 3];

Would compile to:

MOV     r0,  #1
MOV     r1, #2
MOV     r2, #3
CALL      r0, [index of function "addNumbers:and:and:"]

The table of functions contains the following information:

  • code pointer – code to execute for the function
  • number of arguments – the number of arguments that the function requires
  • register count – how many registers this function requires (I’ll use this later to check that we don’t exceed memory limits)

When the CALL is executed it shuffles the arguments 2 register indexes up and stores the current BP and PC below them for later use.

?*,?,1 , 2, 3 (inside caller)
?,?,BP,PC, 1* , 2, 3  (inside callee)

* represents where the current BP is pointing at

PC is now set to the code pointer found in the table of functions
BP is set to point to the first argument to the function.  This has the advantage that all arguments start at register indices 0 through N.  This also places the constraint that the caller must not have any useful data placed in registers higher than the ones being passed as arguments as they may be overwritten by the callee.

When a RETURN instruction is encountered within the callee, the old BP and PC are retrieved so that execution can continue from where it left off in the caller.   If a return value is specified, it is place into the position of the first argument inside the caller.

?,?,BP,PC, 1* , 2, 3  (inside callee)
?*,?,return value, ?,? (inside caller)

 

Implementation

The software implementation of a VM is usually a combination of  a loop and a switch or jump table:

int * bp = start address of the stack
int * pc = start address of the code to run
for (;;)
{
int instruction = *pc;
pc++;

switch(OP(instruction))
{
case    ADD:
bp[R0(instruction)] = bp[R1(instruction)] + bp[R2(instruction)];
break;

case ADD_IMM:
bp[R0(instruction)] = bp[R1(instruction)] + IMM10(instruction);
break;

….
}
}

Constant Pools

There are two types of constants contained within the VM:  32 bit constants and string constants.

  • 32-bit constants are used when a number (int or float) cannot be represented with the number of bits provided by an immediate value.  To get over this limitation, MOV_CONST moves a 32-bit value from the constant pool into a specified register.
  • Since strings are immutable in the language, the string pool is used to store the actual character representation of them.  When strings are placed into registers by the language,under the hood it is not a string pointer copied, but the index of the string within the string constant pool

Wrap Up

You can download the current version of the VM here.   Next time I’ll be bridging the gap between the last two posts and starting to output assembly code to run on this VM.

As always you can follow me on twitter at @igtristan