Chapter 1: TINSEL on the Raspberry Pi – The Beginning

Introduction

In this first chapter we will produce the first version of TINSEL running on the Raspberry Pi. It will be an exact translation of TINSEL from Linux on the X86 architecture to the ARM architecture and the Raspberry Pi. Various extensions will follow in the upcoming chapters.

First things first

Our Compiler is already structured in three key modules: the Scanner, the Parser and the Code Module. So, if this has been done well, we will have to make changes only in the Code Module, to make it produce ARM assembly output instead of X86. Given that I want to retain the X86 capability as well, I will create an

interface CodeModule

and two classes that will implement this interface

class X86_64Instructions(outFile: String = ""): CodeModule
class Arm_32Instructions(outFile: String = ""): CodeModule

The first one will produce X86 64-bit code, while the second one, which we will cover in this chapter, will produce ARM 32-bit code. To switch between the two modes I will introduce new command line arguments that will set the _cpuArchitecture_ variable as follows (defaults to x86):

when (arg) {
    ...
    "-x86" -> cpuArchitecture = CPUArch.x86
    "-arm" -> cpuArchitecture = CPUArch.arm
}

Based on this, the _code_ module object is initialised

code = when (cpuArchitecture) {
    CPUArch.x86 -> X86_64Instructions(outFile)
    CPUArch.arm -> Arm_32Instructions(outFile)
}

As it turned out, the separation of the functions was quite good in the Compiler. The only thing that needed changing outside the code module was the declaration of the size of our variables, which until now was a global variable:

val INT_SIZE = 8    // 64-bit integers

Given that now we will have X86 in 64 bits and ARM in 32 bits, this variable will be moved into the code module and will be set to the right size in each of the two code classes respectively:

val WORD_SIZE = 4  // 32-bit architecture
override val INT_SIZE = WORD_SIZE 

or

val WORD_SIZE = 8  // 64-bit architecture
override val INT_SIZE = WORD_SIZE 

And with that preliminary work done we are ready to start writing ARM assembly.
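
To show how the pieces above could fit together, here is a minimal sketch. It is not the actual compiler code: the interface is reduced to _INT_SIZE_ plus a hypothetical _targetName()_ member, and _parseArchitecture()_ and _createCodeModule()_ are helper names I made up for illustration.

```kotlin
enum class CPUArch { x86, arm }

// Hypothetical, cut-down interface: the real CodeModule has many more members.
interface CodeModule {
    val INT_SIZE: Int
    fun targetName(): String   // illustration only, not part of the compiler
}

class X86_64Instructions(outFile: String = "") : CodeModule {
    val WORD_SIZE = 8  // 64-bit architecture
    override val INT_SIZE = WORD_SIZE
    override fun targetName() = "x86-64"
}

class Arm_32Instructions(outFile: String = "") : CodeModule {
    val WORD_SIZE = 4  // 32-bit architecture
    override val INT_SIZE = WORD_SIZE
    override fun targetName() = "arm-32"
}

// Looks only at the two architecture switches; defaults to x86 as described in the text.
fun parseArchitecture(args: Array<String>): CPUArch {
    var cpuArchitecture = CPUArch.x86
    for (arg in args) {
        when (arg) {
            "-x86" -> cpuArchitecture = CPUArch.x86
            "-arm" -> cpuArchitecture = CPUArch.arm
        }
    }
    return cpuArchitecture
}

// Picks the code module that matches the selected architecture.
fun createCodeModule(cpuArchitecture: CPUArch, outFile: String): CodeModule =
    when (cpuArchitecture) {
        CPUArch.x86 -> X86_64Instructions(outFile)
        CPUArch.arm -> Arm_32Instructions(outFile)
    }

fun main() {
    val code = createCodeModule(parseArchitecture(arrayOf("prog1.tnsl", "-arm")), "prog1.s")
    println("${code.targetName()}: INT_SIZE = ${code.INT_SIZE}")
}
```

Because the rest of the compiler only talks to the interface, the Scanner and the Parser do not need to know which of the two classes is behind the _code_ object.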

Some basics on the ARM architecture

In terms of technicalities, let’s start with which Registers we will use: I will use _R3_ as the “Accumulator” register. Also other registers that will be used extensively in the new code module will be _SP, FP_ and _LR_ and of course _R0, R1_ and _R2_. Please feel free to look in the on-line documentation to discover more about the ARM registers.

In the following paragraphs I will explain how my implementation of ARM assembly in the Compiler differs from the X86 implementation, and finally, I will show how to install and run the new compiler on the Raspberry Pi.

Simple Arithmetic Instructions

These will generally translate directly from the X86 version, with one fundamental difference. In ARM there are no push or pop instructions. If you have seen them in ARM assembly listings, please note that these are Assembler pseudo-instructions and not ARM architecture instructions. Instead, the push looks like this:

str r3, [sp, #-4]!

which means “decrement the stack pointer by 4 and then store register R3 at the address it now points to” (pre-indexed addressing with write-back, which is what the ! stands for), while the pop looks like this:

ldr r2, [sp], #4

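which means “load register R2 from the address the stack pointer is pointing to and then increment the stack pointer by 4” (post-indexed addressing).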
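
If the two addressing modes are new to you, the toy model below may help. It is only a sketch of a descending stack in Kotlin (the class _ToyStack_ is hypothetical and has nothing to do with the compiler): the push moves the stack pointer first and then stores, while the pop loads first and then moves the stack pointer.

```kotlin
// Toy model of a descending stack of 32-bit words; addresses are byte offsets.
class ToyStack(sizeWords: Int = 16) {
    private val memory = IntArray(sizeWords)

    // Starts just past the highest word, which means "stack empty".
    var sp = sizeWords * 4
        private set

    // Models: str rX, [sp, #-4]!   (pre-indexed with write-back)
    fun push(value: Int) {
        sp -= 4                  // first move the stack pointer down
        memory[sp / 4] = value   // then store at the new address
    }

    // Models: ldr rX, [sp], #4   (post-indexed)
    fun pop(): Int {
        val value = memory[sp / 4]   // first load from the current address
        sp += 4                      // then move the stack pointer up
        return value
    }
}

fun main() {
    val stack = ToyStack()
    stack.push(10)
    stack.push(20)
    println(stack.pop())   // last value pushed comes out first
    println(stack.pop())
}
```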

With this in mind, an addition will look like this:

ldr r2, [sp], #4
adds r3, r3, r2

which says “pop register R2 from the stack, add it to R3, store the result to R3 and set the flags” (this is the -s suffix in adds). The s suffix can be used with _add, sub, mul_, but also _orr, and, eor_, not with _sdiv_ though. For this reason we need the below instruction to set the flags after a division:

tst r3, r3

As a side note, earlier versions of the ARM architecture did not have a division instruction; however, the Cortex-A53 processor that I’m using does have sdiv.

All the other binary operations look very much like the above.
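
As an illustration of “very much like the above”, here is a hedged sketch of how such sequences could be produced. The function _emitBinaryOp()_ is hypothetical. Only the add sequence and the _tst_ after the division are taken from this chapter; the operand order for sub and div (R2 holds the left operand that was pushed earlier, R3 the right one) is my assumption for this example.

```kotlin
// The pop sequence from the text: load R2 from the stack, then move SP up by 4.
const val POP_R2 = "ldr r2, [sp], #4"

// Hypothetical emitter returning the assembly lines for one binary operation.
fun emitBinaryOp(op: String): List<String> = when (op) {
    "add" -> listOf(POP_R2, "adds r3, r3, r2")
    "sub" -> listOf(POP_R2, "subs r3, r2, r3")   // assumed: left operand is the one popped into R2
    "mul" -> listOf(POP_R2, "muls r3, r2, r3")
    // sdiv has no s-suffixed form, so the flags are set by a separate tst
    "div" -> listOf(POP_R2, "sdiv r3, r2, r3", "tst r3, r3")
    "or"  -> listOf(POP_R2, "orrs r3, r3, r2")
    "and" -> listOf(POP_R2, "ands r3, r3, r2")
    "xor" -> listOf(POP_R2, "eors r3, r3, r2")
    else  -> throw IllegalArgumentException("unsupported operation: $op")
}

fun main() {
    for (op in listOf("add", "div")) {
        println("@ $op")
        emitBinaryOp(op).forEach { println("    $it") }
    }
}
```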

Global Variables

Assignment of global variables to the accumulator (R3) and back would generally be the same as in X86 but for one difference. If you remember, in X86 all the addresses of global variables (which are labels in the .data section) are relative to %RIP (the instruction pointer, also known as the program counter). We need to do something similar in ARM and there are two ways to do it:

One way would be to use the assembler shorthand

ldr r3, =symbol

that tells the assembler to place the address of the symbol in a literal pool near the code, load it from there and generate the necessary relocation information for the linker. However, I want to make sure I understand what exactly is happening in my assembly code, so I will follow the example of the GNU C Compiler instead and write this out by hand:

.data
.align 2
    v1:    .word 0
    ...
.text
.align 2
    ...
    ldr	r2, v1_addr
    ldr	r3, [r2]
    ...
    v1_addr:	.word v1

As you can see, a variable called _v1_ is declared in the .data section, its address is stored in the .text section under a label called _v1_addr_, so the value of this variable can be accessed by the _ldr_ instruction above.
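
A minimal sketch of how a code module could generate these lines is shown below. All four function names are hypothetical, and the store sequence (the same address load followed by _str_) is my assumption of the mirror image of the load shown above.

```kotlin
// Line for the .data section: reserves one word for the variable.
fun declareGlobal(name: String, initialValue: Int = 0) = "$name:\t.word $initialValue"

// Line for the .text section: a word that holds the address of the variable.
fun addressWord(name: String) = "${name}_addr:\t.word $name"

// Loads the value of the global variable into the accumulator R3.
fun loadGlobal(name: String) = listOf("ldr\tr2, ${name}_addr", "ldr\tr3, [r2]")

// Assumed counterpart: stores the accumulator R3 into the global variable.
fun storeGlobal(name: String) = listOf("ldr\tr2, ${name}_addr", "str\tr3, [r2]")

fun main() {
    println(".data")
    println(declareGlobal("v1"))
    println(".text")
    loadGlobal("v1").forEach(::println)
    storeGlobal("v1").forEach(::println)
    println(addressWord("v1"))
}
```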

Integer Constants

Integer constants pose another challenge in the ARM architecture. As you can read in the Arm Developer pages, “You cannot load an arbitrary 32-bit immediate constant into a register in a single instruction without performing a data load from memory. This is because ARM and Thumb-2 instructions are only 32 bits long.” The rule is that the constant used in an immediate MOV instruction must have no more than 8 significant bits, which can optionally be moved into position by an even number of bit places (strictly speaking, the encoding holds an 8-bit value rotated right by an even number of bits, which includes an 8-bit value shifted left by an even number of places). Here are some examples:

mov r2, #255         @valid - 8 significant bits: 1111 1111
mov r2, #257         @invalid - 9 significant bits: 1 0000 0001
mov r2, #1069547520  @valid - 8 significant bits shifted left by 22 places
mov r2, #534773760   @invalid - 8 significant bits shifted left by 21 places

This rule may sound a bit odd but it is dictated by the way the MOV instruction is encoded. There are various ways to load a 32-bit constant into a register, as described in the Arm Developer pages. In this version of the compiler, with simplicity in mind, I will follow a simplified version of the above rule and will use a direct MOV instruction for any constant between 0 and 255 (8 significant bits). For any negative constant or a constant above 255 I will use a memory location to store its value and then fetch it from there (same as the GNU C Compiler):

mov r3, #0
mov r2, #10

and

.text
    ...
    INTCONST_1: .word 500
    ...
    ldr	r3, INTCONST_1

Similarly to the previous section, you can see here the integer constant stored in memory with a suitable label. It’s then loaded to the register using again the _ldr_ instruction.
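
The simplified rule can be sketched in a few lines of Kotlin. The class _ConstantPool_ and its methods are hypothetical names; only the 0 to 255 test and the _INTCONST_ label style come from the text. Reusing one label when the same constant appears twice is my own assumption.

```kotlin
// Hypothetical helper: decides between an immediate mov and a load from a labelled word.
class ConstantPool {
    // Keeps insertion order so that the labels are numbered in order of first use.
    private val constants = LinkedHashMap<Int, String>()

    // Returns the instruction that loads the constant into the given register.
    fun loadConstant(value: Int, register: String = "r3"): String =
        if (value in 0..255) {
            "mov $register, #$value"
        } else {
            // Assumption: identical constants share one label.
            val label = constants.getOrPut(value) { "INTCONST_${constants.size + 1}" }
            "ldr $register, $label"
        }

    // The .word lines that must be written into the .text section.
    fun definitions(): List<String> =
        constants.map { (value, label) -> "$label: .word $value" }
}

fun main() {
    val pool = ConstantPool()
    println(pool.loadConstant(10, "r2"))
    println(pool.loadConstant(500))
    println(pool.loadConstant(-1))
    pool.definitions().forEach(::println)
}
```

One thing to keep in mind with this approach is that a PC-relative _ldr_ can only reach a limited distance, so in a long function the constant words have to be placed reasonably close to the code that uses them.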

Even though it may seem that I am deliberately sacrificing efficiency, please note that the constants between 0 and 255 are those most frequently used in most programs. Besides, I don’t see why a program would be more likely to use #1069547520 than #534773760 from the examples above. So I will leave the implementation of the full rule regarding 32-bit constants for some later stage.
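
For reference, the full rule is not hard to check. The sketch below is not part of this version of the compiler; the function name _isArmImmediate()_ is hypothetical. It tries every even rotation and tests whether the constant then fits in 8 bits, which reproduces the four examples given earlier.

```kotlin
// True if the 32-bit value can be encoded as an 8-bit value rotated right
// by an even number of bits (the immediate form accepted by mov).
fun isArmImmediate(value: Int): Boolean {
    for (rotation in 0 until 32 step 2) {
        // Rotating left undoes a rotate-right by the same amount.
        val rotated = Integer.rotateLeft(value, rotation)
        if ((rotated and 0xFF.inv()) == 0) return true
    }
    return false
}

fun main() {
    for (constant in listOf(255, 257, 1069547520, 534773760)) {
        val verdict = if (isArmImmediate(constant)) "valid" else "invalid"
        println("mov r2, #$constant  @ $verdict")
    }
}
```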

Comparisons

Comparisons are pretty straight-forward and the ARM code is almost the same as the X86 code. Here’s the _compareEquals()_:

ldr    r2, [sp], #4
cmp    r2, r3
mov    r3, #0
moveq  r3, #1
ands   r3, r3, #1

This code pops R2 out of the stack, compares it with R3, sets R3 to 0, sets R3 to 1 if the result of the comparison is “equal” and finally sets the flags. This is needed because, as I’m sure you remember, the result of the comparison must give us a 0 or a 1 (same logic as the boolean variables in C). The two _mov_ instructions are instead of _sete %al_ that we would do in X86.
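
The other comparisons can follow the same pattern with a different condition suffix on the second _mov_. The sketch below uses hypothetical names (_Comparison_, _emitComparison()_) and assumes signed comparisons with the left operand popped into R2; only the “equals” sequence is taken from the text above.

```kotlin
// ARM condition codes for signed comparisons of R2 (left) against R3 (right).
enum class Comparison(val condition: String) {
    EQUALS("eq"),
    NOT_EQUALS("ne"),
    LESS("lt"),
    LESS_EQUAL("le"),
    GREATER("gt"),
    GREATER_EQUAL("ge")
}

// Returns the assembly lines that leave 0 or 1 in R3 and set the flags.
fun emitComparison(comparison: Comparison): List<String> = listOf(
    "ldr    r2, [sp], #4",                       // pop the left operand
    "cmp    r2, r3",                             // compare left with right
    "mov    r3, #0",                             // assume false
    "mov${comparison.condition}  r3, #1",        // overwrite with 1 if the condition holds
    "ands   r3, r3, #1"                          // set the flags from the result
)

fun main() {
    emitComparison(Comparison.LESS).forEach(::println)
}
```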

Function Calls and Return

Here we will see a few differences between the two implementations. To call a function, the following instruction is used:

bl symbol

which means branch to symbol and save the return address in the link register (LR).

Similarly to X86, some of the function parameters (but only the first 4 in ARM) are passed via registers: _R0, R1, R2, R3_ (in this order) while any additional parameters or parameters that are larger in size are passed in the stack. Keeping with the principle of simplicity, and given that all our parameters are 32 bit long, I will limit the number of parameters in a function to 4 in the ARM implementation.

In the function prologue we need to save the frame pointer and the link register and setup a new stack frame:

stmdb  sp!, {fp, lr}     @ save registers
add    fp, sp, #4        @ new stack frame

The stmdb instruction stands for “store multiple registers and decrement (the SP) before”. It stores the previous value of the FP at the lower address and the LR just above it. After the add instruction the new FP points at the saved LR, so the previous value of the FP sits at [fp, #-4] and anything else goes below that, at lower addresses. This means that the first stack variable will be at [fp, #-8]. This is the reason the stack offset variable is set to -4 as opposed to 0 in the ARM Code Module:

// the offset from frame pointer for the next local variable (in the stack)
override var stackVarOffset = -4

Same as in the X86 implementation, the parameters are saved in the stack so that these four registers can be used for other purposes or as parameters when calling another function within the first function.

In the function epilogue we will do exactly the opposite, i.e. restore the stack pointer, the frame pointer and the link register, but with a little twist: we will restore the value of the link register that was saved at the beginning (which is the return address) directly into the program counter, which means that no separate return instruction is needed. The execution will continue at the instruction where the link register was pointing, which is the instruction after the bl instruction that resulted in this function call.

sub sp, fp, #4         @ restore stack pointer
ldmia  sp!, {fp, pc}   @ restore registers - lr goes into pc to return to caller

As you can guess, ldmia stands for “load multiple registers and increment (the SP) after”.
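
To convince myself that the prologue and the epilogue really mirror each other, it helps to step through them on paper. The toy model below does the same in Kotlin. The class _ToyCpu_ and the addresses used are made up for the example; the only thing it relies on is that a multiple store or load always places the lower-numbered register (FP) at the lower address.

```kotlin
// Toy model of the registers and memory touched by the prologue and epilogue.
class ToyCpu {
    val memory = HashMap<Int, Int>()   // byte address -> 32-bit word
    var sp = 0x1000                    // made-up stack pointer on entry
    var fp = 0x2000                    // made-up frame pointer of the caller
    var lr = 0x400                     // made-up return address
    var pc = 0

    fun prologue() {
        // stmdb sp!, {fp, lr}: decrement SP by 8, FP at the lower address, LR above it
        sp -= 8
        memory[sp] = fp
        memory[sp + 4] = lr
        // add fp, sp, #4: the new FP points at the saved LR
        fp = sp + 4
    }

    fun epilogue() {
        // sub sp, fp, #4: SP points at the saved FP again
        sp = fp - 4
        // ldmia sp!, {fp, pc}: saved FP back into FP, saved LR into PC
        fp = memory.getValue(sp)
        pc = memory.getValue(sp + 4)
        sp += 8
    }
}

fun main() {
    val cpu = ToyCpu()
    cpu.prologue()
    println("saved LR at [fp], saved FP at [fp, #-4]: ${cpu.memory[cpu.fp]}, ${cpu.memory[cpu.fp - 4]}")
    cpu.epilogue()
    println("sp=${cpu.sp} fp=${cpu.fp} pc=${cpu.pc}")
}
```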

Stack Variables

Stack variables are accessed in exactly the same way as in the X86 implementation relative to the frame pointer. As mentioned above the first stack variable after the new frame has been setup is at offset -8:

ldr r3, [fp, #-8]
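
The offset bookkeeping can be sketched as follows. The class _StackFrame_ and its methods are hypothetical; only _stackVarOffset_, its starting value of -4 and the first offset of -8 come from the text. I assume here that the offset is decreased by the integer size before each new variable is placed.

```kotlin
// Hypothetical bookkeeping for local variables in one stack frame.
class StackFrame(private val intSize: Int = 4) {
    // the offset from frame pointer for the next local variable (in the stack)
    var stackVarOffset = -4
        private set

    // Reserves room for one integer and returns its offset from FP.
    fun allocate(): Int {
        stackVarOffset -= intSize
        return stackVarOffset
    }

    // Loads the stack variable at the given offset into the accumulator R3.
    fun load(offset: Int) = "ldr r3, [fp, #$offset]"
}

fun main() {
    val frame = StackFrame()
    val first = frame.allocate()
    val second = frame.allocate()
    println(frame.load(first))
    println(frame.load(second))
}
```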

Runtime Library

In the X86 version I have written my own runtime library in assembly, where I implemented my own _read_s_, write_s_, read_i_, write_i_ and also _strlen_, strcpy_, strcat_, streq_, atoi_, itoa_, which were adequate for this version of TINSEL.

In the ARM version though, the focus will be not so much on developing in ARM assembly, but on enhancing TINSEL so that it will support as much Raspberry Pi functionality as possible. For this reason, and in order to make this more efficient, I will use the C library and its functions, which will be linked with the TINSEL output. That’s why you will see in the Compiler output for ARM calls to _read, printf, atoi, strcat_ and other standard C functions and system calls. This also changes the way the ARM assembly output is compiled on the Raspberry Pi (see the deployment section below).

The main function

Given that we will be using the C library, the entry point for the main program has to change from start: to _main:_

.text
.align 2
.global main
...
.type main %function
@ main program
main:
    ...

And at the end of the main program we need to return to the caller as the C library actually calls our main function and we must return to it. We need to set the exit code and same as any other function, restore the stack pointer, frame pointer and link register. Here I will actually restore the LR back into the LR (and not the PC) so that the _bx lr_ instruction at the end of the main will send the program back to where LR is pointing, which is the next instruction after the one that called main:

mov    r0, #0        @ exit code 0
sub    sp, fp, #4    @ restore stack pointer
ldmia  sp!, {fp, lr}
bx lr                @ return to caller

Deploying and Running the Compiler on the Raspberry Pi

And this is where we will see the final result: TINSEL actually running on the Raspberry Pi. But let’s go back to the beginning for one moment.

The first question I faced was “how would I develop the ARM implementation of the Compiler?”. One answer would be to load Linux on the Pi, then install IntelliJ and develop there. This would have worked well, but would most likely have needed proper SSD storage for the Pi, otherwise Linux would have been a bit slow. This would have increased the costs and the clutter on my desk.

The other option, which in the end I followed, was to continue developing on my Linux laptop and deploy the final product to the Raspberry Pi, where it would be finally tested. The Pi would be running its own Raspberry Pi OS, which runs at satisfactory speeds from an SD card, keeping things simple. And this is how I did it:

First, I use a headless Raspberry Pi, which is only connected to power and to my WiFi (no keyboard, no mouse, no screen, no clutter).

I connect to the Raspberry Pi from my Linux laptop using SSH, so I can open a terminal window on the Pi or copy files to it (e.g. the Compiler). There are numerous resources on the Internet about how to do this.

When the Compiler development was completed I produced a .jar file as follows:

jar cvfm tinsel_v3.jar META-INF/MANIFEST.MF -C out/production/CompilerV3-X86-Arm .

The contents of the MANIFEST file are here:

Manifest-Version: 1.0
Main-Class: mpdev.compilerv3.chapter_xa_01.CompilerMainKt
Class-Path: lib/kotlin/kotlin-stdlib-jdk8/1.6.10/kotlin-stdlib-jdk8-1.6.10.jar lib/kotlin/kotlin-stdlib/1.6.10/kotlin-stdlib-1.6.10.jar lib/annotations/13.0/annotations-13.0.jar lib/kotlin/kotlin-stdlib-common/1.6.10/kotlin-stdlib-common-1.6.10.jar lib/kotlin/kotlin-stdlib-jdk7/1.6.10/kotlin-stdlib-jdk7-1.6.10.jar

The Class-Path contains all the Java libraries that Kotlin needs. This list will be different for each implementation and will depend on the version of Kotlin used. The list of libraries is visible in the command line that IntelliJ executes when the Compiler is run under the IDE.

And finally here are the steps to deploy to the Raspberry Pi:

  1. Decide the directory to run TINSEL from – let’s call it $HOME/tinsel
  2. Create a lib directory under the tinsel directory and copy all the Kotlin libraries identified above from your Linux box to the Raspberry Pi (which are listed in the Manifest file)
  3. Copy the Compiler jar to the tinsel directory on the Raspberry Pi

And that’s it! At this stage you can run TINSEL programs on the Pi by running these two steps:

  • Compile the TINSEL source to ARM assembly by running:
java -jar tinsel_v3.jar prog1.tnsl -o prog1.s -arm
  • Compile the ARM assembly produced by running:
gcc prog1.s -o prog1

Needless to say, you can script these steps to make it easy to run a TINSEL program in one step, as I have done.
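
A shell script is the obvious choice for this, but to stay with Kotlin here is a minimal sketch of the same idea. It is not the script I use: the function _buildCommands()_ is a hypothetical helper, and it assumes that tinsel_v3.jar and the source file are in the current directory and that java and gcc are on the path.

```kotlin
import java.io.File

// Builds the two command lines from the text for a given TINSEL source file.
fun buildCommands(source: String): List<List<String>> {
    val base = File(source).nameWithoutExtension   // "prog1.tnsl" -> "prog1"
    return listOf(
        listOf("java", "-jar", "tinsel_v3.jar", source, "-o", "$base.s", "-arm"),
        listOf("gcc", "$base.s", "-o", base)
    )
}

fun main(args: Array<String>) {
    if (args.isEmpty()) {
        println("usage: <program>.tnsl")
        return
    }
    for (command in buildCommands(args[0])) {
        // Run each step in turn and stop at the first failure.
        val exitCode = ProcessBuilder(command).inheritIO().start().waitFor()
        if (exitCode != 0) {
            println("failed: ${command.joinToString(" ")}")
            return
        }
    }
}
```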

And with that you can now write and run a TINSEL program on the Raspberry Pi! Enjoy!

Coming up next: Control the Raspberry Pi GPIO – LED Blink
