Gavare's eXperimental Emulator: Dynamic Translation

Dynamic Translation

Static vs. dynamic
Executable Intermediate Representation
Performance
Instruction Combinations
Native Code Generation Back-ends

Static vs. dynamic:

In order to support guest operating systems, which can overwrite old code pages in memory with new code, it is necessary to translate code dynamically. It is not possible to do a "one-pass" (static) translation. Self-modifying code and Just-in-Time compilers running inside the emulator are other things that would not work with a static translator. GXemul is a dynamic translator. However, it does not necessarily translate into native code, like many other emulators.

Executable Intermediate Representation:

Dynamic translators usually translate from the emulated architecture (e.g. MIPS) into a kind of intermediate representation (IR), and then to native code (e.g. AMD64 or x86 code). Since one of my main goals for GXemul is to keep everything as portable as possible, I have tried to make sure that the IR is something which can be executed regardless of whether the final step (translation from IR to native code) has been implemented or not.

The IR in GXemul consists of arrays of pointers to functions, and a few arguments which are passed along to those functions. The functions are implemented in either manually hand-coded C, or automatically generated C. In any case, this is all statically linked into the GXemul binary at link time.

Here is a simplified diagram of how these arrays work.

There is one instruction call slot for every possible program counter location. In the MIPS case, instruction words are 32 bits in length, and pages are (usually) 4 KB large, resulting in 1024 instruction call slots. After the last of these instruction calls, there is an additional call to a special "end of page" function (which doesn't count as an executed instruction). This function switches to the first instruction on the next virtual page (which might cause exceptions, etc).

The complexity of individual instructions vary. A simple example of what an instruction can look like is the MIPS addiu instruction:

	X(addiu)
	{
	        reg(ic->arg[1]) = (int32_t)
	            ((int32_t)reg(ic->arg[0]) + (int32_t)ic->arg[2]);
	}

It stores the result of a 32-bit addition of the register at arg[0] with the immediate value arg[2] (treating both as signed 32-bit integers) into register arg[1]. If the emulated CPU is a 64-bit CPU, then this will store a correctly sign-extended value into arg[1]. If it is a 32-bit CPU, then only the lowest 32 bits will be stored, and the high part ignored. X(addiu) is expanded to mips_instr_addiu in the 64-bit case, and mips32_instr_addiu in the 32-bit case. Both are compiled into the GXemul executable; no code is created during run-time.

Performance:

The performance of using this kind of executable IR is obviously lower than what can be achieved by emulators using native code generation, but can be significantly higher than using a naive fetch-decode-execute interpretation loop. In my opinion, using an executable IR is an interesting compromise.

The overhead per emulated instruction is usually around or below approximately 10 host instructions. This is very much dependent on your host architecture and what compiler and compiler switches you are using. Added to this instruction count is (of course) also the C code used to implement each specific instruction.

Instruction Combinations:

Short, common instruction sequences can sometimes be replaced by a "compound" instruction. An example could be a compare instruction followed by a conditional branch instruction. The advantages of instruction combinations are that

the amortized overhead per instruction is slightly reduced, and
the host's compiler can make a good job at optimizing the common instruction sequence.

The special cases where instruction combinations give the most gain are in the cores of string/memory manipulation functions such as memset() or strlen(). The core loop can then (at least to some extent) be replaced by a native call to the equivalent function.

The implementations of compound instructions still keep track of the number of executed instructions, etc. When single-stepping, these translations are invalidated, and replaced by normal instruction calls (one per emulated instruction).

Native Code Generation Back-ends:

In theory, it will be possible to implement native code generation, similar to what is used in high-performance emulators such as QEMU, as long as that generated code abides to the C ABI on the host.

However, since I wanted to make sure that GXemul works without such native code back-ends, there are no implemented backends in this release.