Introduction
In my “Let’s Build a Compiler” blog (which was based on Jack Crenshaw’s original series) I developed an experimental compiler in Kotlin and in this process I developed TINSEL, my own programming language. That version of the compiler would produce assembly code for X86 architecture, which could run on Linux.
In this new series I will evolve the compiler so that it will be able to produce ARM assembly. This way the user will be able to run TINSEL programs on the Raspberry Pi. Please note, the purpose of this series is not to offer a tutorial on the Raspberry Pi. The “Pi” is very popular and its fans can find numerous resources on the Internet that describe anything and everything about it. What this series will try to achieve is to add an element of fun in building projects on the Raspberry Pi, by enabling its users to program it in their own programming language, which can be changed, enhanced or developed in any way the reader wishes.
In this first Chapter we will produce the first version of TINSEL running on the Raspberry Pi. It will be an exact translation of TINSEL from Linux and the X86 architecture to the ARM architecture and the Raspberry Pi. Various extensions will follow in the upcoming chapters.
First things first
Our Compiler is already structured in three key modules: the Scanner, the Parser and the Code Module. So, if this has been done well, we will have to make changes only in the Code Module, to make it produce ARM assembly output instead of X86. Given that I want to retain the X86 capability as well, I will create an
interface CodeModule
and two classes that will implement this interface
class X86_64Instructions(outFile: String = ""): CodeModule
class Arm_32Instructions(outFile: String = ""): CodeModule
The first one will produce X86 64-bit code, while the second one, which we will cover in this chapter, will produce ARM 32-bit code. To switch between the two modes I will introduce new command line arguments that will set the cpuArchitecture variable as follows (defaults to x86):
when (arg) {
    ...
    "-x86" -> cpuArchitecture = CPUArch.x86
    "-arm" -> cpuArchitecture = CPUArch.arm
}
Based on this, the code module object is initialised:
code = when (cpuArchitecture) {
CPUArch.x86 -> X86_64Instructions(outFile)
CPUArch.arm -> Arm_32Instructions(outFile)
}
As it turned out, the separation of the functions was quite good in the Compiler. The only thing that needed changing outside the code module was the declaration of the size of our variables, which until now was a global variable:
val INT_SIZE = 8 // 64-bit integers
Given that now we will have X86 in 64 bits and ARM in 32 bits, this variable will be moved into the code module and will be set to the right size in each of the two code classes respectively:
val WORD_SIZE = 4 // 32-bit architecture
override val INT_SIZE = WORD_SIZE
or
val WORD_SIZE = 8 // 64-bit architecture
override val INT_SIZE = WORD_SIZE
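To see how these pieces could fit together, here is a minimal Kotlin sketch. It is only an illustration: the real CodeModule interface has many more members, and the helpers selectArchitecture and createCodeModule are hypothetical names that do not come from the compiler sources.

```kotlin
// Minimal sketch; class bodies are heavily simplified and helper names are assumptions.
enum class CPUArch { x86, arm }

interface CodeModule {
    val INT_SIZE: Int              // size of a TINSEL integer in bytes
}

class X86_64Instructions(outFile: String = "") : CodeModule {
    val WORD_SIZE = 8              // 64-bit architecture
    override val INT_SIZE = WORD_SIZE
}

class Arm_32Instructions(outFile: String = "") : CodeModule {
    val WORD_SIZE = 4              // 32-bit architecture
    override val INT_SIZE = WORD_SIZE
}

// Hypothetical helper: pick the architecture from the command line (defaults to x86).
fun selectArchitecture(args: Array<String>): CPUArch {
    var cpuArchitecture = CPUArch.x86
    for (arg in args) {
        when (arg) {
            "-x86" -> cpuArchitecture = CPUArch.x86
            "-arm" -> cpuArchitecture = CPUArch.arm
        }
    }
    return cpuArchitecture
}

// Hypothetical helper: create the code module that matches the architecture.
fun createCodeModule(arch: CPUArch, outFile: String = ""): CodeModule = when (arch) {
    CPUArch.x86 -> X86_64Instructions(outFile)
    CPUArch.arm -> Arm_32Instructions(outFile)
}
```

With this arrangement the Parser never needs to know which architecture is active; it only asks the code module for INT_SIZE.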
And with that preliminary work done we are ready to start writing ARM assembly.
Some basics on the ARM architecture
Describing the ARM architecture is well beyond the scope of this series. But to cover the basics, if you look up “ARM architecture” on the Internet, you will find out that it stands for Advanced RISC Machines (where RISC = Reduced Instruction Set Computer, as I’m sure you know). And it is this R that is the main difference between X86 and ARM. X86 is a CISC (Complex Instruction Set Computer) architecture that can execute a complex operation with a single instruction (e.g. a division, or moving a block of bytes from one memory location to another). This power comes at a cost though: CISC processors have larger and more expensive chips, with more transistors that need more power and generate more heat, in order to process these complex instructions. ARM on the other hand supports a smaller set of instructions (those that are most commonly used), which results in smaller and cheaper chips that need less power and generate less heat. This is the reason why X86 dominates the PC and server sectors while ARM is the preferred choice for small electronics, mobile phones and of course the Raspberry Pi! Based on this, I’m sure you can guess that the assembly programs that our TINSEL compiler will produce for ARM will be a bit longer in number of lines than the X86 output. I will stop my little introduction here. For those who are interested in more, I have found this very informative page. And of course a plethora of technical documentation on ARM can be found here.
In terms of technicalities, let’s start with which Registers we will use: I will use R3 as the “Accumulator” register. Other registers that will be used extensively in the new code module are SP, FP and LR, and of course R0, R1 and R2. Please feel free to look in the on-line documentation to discover more about the ARM registers.
In the following paragraphs I will explain how my implementation of ARM assembly in the Compiler differs from the X86 implementation, and finally, I will show how to install and run the new compiler on the Raspberry Pi.
Simple Arithmetic Instructions
These will generally translate directly from the X86 version, with one fundamental difference. In ARM there are no push or pop instructions. If you have seen them in ARM assembly listings, please note, these are Assembler pseudo-instructions and not ARM architecture instructions. Instead the push looks like this:
str r3, [sp, #-4]!
which means “decrement the stack pointer by 4 and then store register R3 at the address the stack pointer is now pointing to”, while the pop looks like this:
ldr r2, [sp], #4
which I’m sure you can guess what it means.
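For readers who prefer to see the addressing modes spelled out, here is a toy Kotlin model of the stack. It is not part of the compiler; the class ToyStack and its members are made up for this illustration, and it only mimics what the two instructions above do to SP and to memory.

```kotlin
// Toy model of a full-descending stack of 32-bit words; not part of the compiler.
class ToyStack(sizeInWords: Int = 16) {
    private val memory = IntArray(sizeInWords)

    // byte address just past the end of the memory: the stack starts empty
    var sp = sizeInWords * 4
        private set

    // str rX, [sp, #-4]!  : pre-indexed, SP is decremented first, then the value is stored
    fun push(value: Int) {
        sp -= 4
        memory[sp / 4] = value
    }

    // ldr rX, [sp], #4    : post-indexed, the value is loaded first, then SP is incremented
    fun pop(): Int {
        val value = memory[sp / 4]
        sp += 4
        return value
    }
}
```

Pushing two values and popping them again returns them in reverse order and leaves SP where it started.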
With this in mind, an addition will look like this:
ldr r2, [sp], #4
adds r3, r3, r2
which says “pop register R2 from the stack, add it to R3, store the result to R3 and set the flags” (this is the -s suffix in adds). The s suffix can be used with add, sub, mul, but also orr, and, eor, not with sdiv though. For this reason we need the below instruction to set the flags after a division:
tst r3, r3
As a side note, earlier ARM processors did not have a division instruction; however, the Cortex-A53 that I’m using does have sdiv.
All the other binary operations look very much like the above.
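As a rough sketch of how the other binary operations might be generated, the Kotlin functions below return the assembly lines for an operation instead of writing them to the output file. The function names are hypothetical, and the operand order for subtraction and division (left operand popped into R2, right operand in R3) is my assumption rather than something taken from the compiler sources.

```kotlin
// Sketch only: names and operand order are assumptions, not the compiler's actual code.
val popR2 = "ldr r2, [sp], #4"     // pop the left operand into R2

// instruction is expected to be a flag-setting one: adds, subs, orrs, ands, eors
fun binaryOp(instruction: String, left: String, right: String): List<String> =
    listOf(popR2, "$instruction r3, $left, $right")

fun add() = binaryOp("adds", "r3", "r2")
fun subtract() = binaryOp("subs", "r2", "r3")   // left operand (R2) minus right operand (R3)
fun bitwiseOr() = binaryOp("orrs", "r3", "r2")

// sdiv has no flag-setting form, so tst is added to set the flags from the result
fun divide(): List<String> = listOf(popR2, "sdiv r3, r2, r3", "tst r3, r3")
```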
Global Variables
Assignment of global variables to the accumulator (R3) and back would generally be the same as in X86 but for one difference. If you remember, in X86 all the addresses of global variables (which are labels in the .data section) are relative to %RIP (also known as the program counter). We need to do something similar in ARM and there are two ways to do it:
One way would be to use the assembler shorthand
ldr r3, =symbol
that tells the assembler to generate the necessary relocation information for the linker. However, I want to make sure I understand what exactly is happening in my assembly code, so I will follow the example of the GNU C Compiler instead:
.data
.align 2
v1: .word 0
...
.text
.align 2
...
ldr r2, v1_addr
ldr r3, [r2]
...
v1_addr: .word v1
As you can see, a variable called v1 is declared in the .data section and its address is stored in the .text section under a label called v1_addr, so the value of this variable can be accessed by the two ldr instructions above.
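A code module could produce these lines with a few small helpers. The Kotlin sketch below is only an illustration: the function names are hypothetical, and the store helper is my assumption of how the reverse assignment would mirror the load.

```kotlin
// Sketch only: function names are hypothetical.

// Line for the .data section: reserve one word for the variable
fun declareGlobal(name: String, initialValue: Int = 0): String =
    "$name: .word $initialValue"

// Line for the .text section: a word holding the address of the variable
fun declareGlobalAddress(name: String): String =
    "${name}_addr: .word $name"

// Load the variable into the accumulator: first its address, then its value
fun loadGlobal(name: String): List<String> =
    listOf("ldr r2, ${name}_addr", "ldr r3, [r2]")

// Store the accumulator back into the variable (assumed to mirror the load)
fun storeGlobal(name: String): List<String> =
    listOf("ldr r2, ${name}_addr", "str r3, [r2]")
```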
Integer Constants
Integer constants pose another challenge in the ARM architecture. As you can read in the Arm Developer pages “You cannot load an arbitrary 32-bit immediate constant into a register in a single instruction without performing a data load from memory. This is because ARM and Thumb-2 instructions are only 32 bits long.” The rule is that the constant used in an immediate MOV instruction has to have no more than 8 significant bits, and it can also be the result of shifting such a value left by an even number of places (strictly speaking, the encoding holds an 8-bit value rotated right by an even number of bits). Here are some examples:
mov r2, #255 @ valid - 8 significant bits: 1111 1111
mov r2, #257 @ invalid - 9 significant bits: 1 0000 0001
mov r2, #1069547520 @ valid - 8 significant bits shifted left by 22 places
mov r2, #534773760 @ invalid - 8 significant bits shifted left by 21 places
This rule may sound a bit odd but it is dictated by the way the MOV instruction is encoded. There are various different ways to load a 32-bit constant to a register, as described in the ARM Developer pages. In this version of the compiler, with simplicity in mind, I will follow a simplified version of the above rule, and will use a direct MOV instruction for any constant between 0 and 255 (8 significant bits). For any negative constant or a constant above 255 I will use a memory location to store its value and then fetch it from there (same as the GNU C Compiler):
mov r3, #0
mov r2, #10
and
.text
...
INTCONST_1: .word 500
...
ldr r3, INTCONST_1
Similarly to the previous section, you can see here the integer constant stored in memory with a suitable label. It’s then loaded to the register using again the ldr instruction.
Even though it may seem that I deliberately sacrifice efficiency, please note that the constants between 0 and 255 are those that are most frequently used in most programs. Besides, I don’t see why it would be more likely to use #1069547520 but less likely to use #534773760 from the examples above. So I will leave the implementation of the full rule regarding 32-bit constants for some later stage.
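For reference, both the full rule and the simplified one can be written down in a few lines of Kotlin. This is a sketch under the assumption that the full rule is “an 8-bit value rotated right by an even number of bits”; the function names isArmImmediate and loadConstant are hypothetical and do not come from the compiler.

```kotlin
// Full rule (assumed form): the value fits in 8 bits after undoing
// a rotate-right by some even number of bits.
fun isArmImmediate(value: Int): Boolean {
    for (rotation in 0 until 32 step 2) {
        // rotating left undoes a rotate-right by the same amount
        val candidate = java.lang.Integer.rotateLeft(value, rotation)
        if ((candidate and 0xFF.inv()) == 0) return true
    }
    return false
}

// Simplified rule used in this chapter: direct mov only for 0..255,
// anything else is loaded from a labelled word in memory.
fun loadConstant(value: Int, label: String): String =
    if (value in 0..255) "mov r3, #$value" else "ldr r3, $label"
```

The first function agrees with the four examples above, so it could replace the 0..255 test when the full rule is implemented at a later stage.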
Comparisons
Comparisons are pretty straightforward and the ARM code is almost the same as the X86 code. Here’s the compareEquals():
ldr r2, [sp], #4
cmp r2, r3
mov r3, #0
moveq r3, #1
ands r3, r3, #1
This code pops R2 off the stack, compares it with R3, sets R3 to 0, sets R3 to 1 if the result of the comparison is “equal” and finally sets the flags. This is needed because, as I’m sure you remember, the result of the comparison must give us a 0 or a 1 (same logic as the boolean variables in C). The two mov instructions take the place of the sete %al that we would use in X86.
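The other comparisons should differ only in the condition code attached to the second mov. The Kotlin sketch below shows one way this could be factored out; the function names other than compareEquals are hypothetical, and I am assuming that the left operand is the one popped into R2.

```kotlin
// Sketch only: condition is an ARM condition code such as eq, ne, lt, le, gt, ge.
fun compare(condition: String): List<String> = listOf(
    "ldr r2, [sp], #4",        // pop the left operand into R2
    "cmp r2, r3",              // compare left (R2) with right (R3)
    "mov r3, #0",              // assume false (mov without s leaves the flags alone)
    "mov$condition r3, #1",    // overwrite with 1 if the condition holds
    "ands r3, r3, #1"          // set the flags from the 0 or 1 result
)

fun compareEquals() = compare("eq")
fun compareNotEquals() = compare("ne")
fun compareLess() = compare("lt")
```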
Function Calls and Return
Here we will see a few differences between the two implementations. To call a function, the following instruction is used:
bl symbol
which means branch to symbol and save the return address in the link register (LR).
Similarly to X86, some of the function parameters (but only the first 4 in ARM) are passed via registers R0, R1, R2, R3 (in this order), while any additional parameters or parameters that are larger in size are passed on the stack. Keeping with the principle of simplicity, and given that all our parameters are 32 bits long, I will limit the number of parameters in a function to 4 in the ARM implementation.
In the function prologue we need to save the frame pointer and the link register and setup a new stack frame:
stmdb sp!, {fp, lr} @ save registers
add fp, sp, #4 @ new stack frame
The stmdb instruction stands for “store multiple registers, decrement (the SP) before”. As you can see above, when a new frame is set up this way, the FP points at the saved LR and the previous value of the FP sits in the word right below it, at [fp, #-4]. Any local variables are placed after that, which means that the first stack variable will be at [fp, #-8]. This is the reason the stack offset variable is set to -4 as opposed to 0 in the ARM Code Module:
// the offset from frame pointer for the next local variable (in the stack)
override var stackVarOffset = -4
Same as in the X86 implementation, the parameters are saved in the stack so that these four registers can be used for other purposes or as parameters when calling another function within the first function.
In the function epilogue we will do exactly the opposite, i.e. restore the stack pointer, the frame pointer and the link register, but with a little twist: we will restore the value of the link register that was saved at the beginning (which is the return address) into the program counter; this means that no return instruction is needed. The execution will continue at the instruction where the link register was pointing, which is the instruction after the bl instruction that resulted in this function call.
sub sp, fp, #4 @ restore stack pointer
ldmia sp!, {fp, pc} @ restore registers - lr goes into pc to return to caller
As you can guess, ldmia stands for “load multiple registers, increment (the SP) after”.
Stack Variables
Stack variables are accessed in exactly the same way as in the X86 implementation, relative to the frame pointer. As mentioned above, the first stack variable after the new frame has been set up is at offset -8:
ldr r3, [fp, #-8]
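The offsets can be followed with a small Kotlin sketch. The class ArmFrameLayout and its functions are hypothetical, and I am assuming that the offset is decremented by the integer size before it is handed out, which is what takes the initial -4 to -8 for the first variable.

```kotlin
// Sketch only: hypothetical class, assuming the offset is decremented before use.
class ArmFrameLayout(private val intSize: Int = 4) {
    // the offset from frame pointer for the next local variable (in the stack)
    var stackVarOffset = -4
        private set

    // Reserve space for one integer and return its offset from FP
    fun allocateStackVar(): Int {
        stackVarOffset -= intSize
        return stackVarOffset
    }

    // Load the stack variable at the given offset into the accumulator
    fun loadStackVar(offset: Int): String = "ldr r3, [fp, #$offset]"
}
```

The word at [fp, #-4] is never handed out because it holds the saved frame pointer.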
Runtime Library
In the X86 version I have written my own runtime library in assembly, where I implemented my own read_s_, write_s_, read_i_, write_i_ and also strlen_, strcpy_, strcat_, streq_, atoi_, itoa_, which were adequate for this version of TINSEL.
In the ARM version though, the focus will be not so much on developing in ARM assembly, but on enhancing TINSEL so that it will support as much Raspberry Pi functionality as possible. For this reason and in order to make this more efficient, I will use the C library and its functions that will be linked with the TINSEL output. That’s why you will see in the Compiler output for ARM calls to read, printf, atoi, strcat and other standard C functions and system calls. This also changes the way the ARM assembly output is compiled on Raspberry Pi (see below section).
The main function
Given that we will be using the C library, the entry point for the main program has to change from _start: to main:
.text
.align 2
.global main
...
.type main, %function
@ main program
main:
...
And at the end of the main program we need to return to the caller, as the C library actually calls our main function and we must return to it. We need to set the exit code and, same as in any other function, restore the stack pointer, frame pointer and link register. Here I will actually restore the LR back into the LR (and not the PC) so that the bx lr instruction at the end of the main will send the program back to where LR is pointing, which is the next instruction after the one that called main:
mov r0, #0 @ exit code 0
sub sp, fp, #4 @ restore stack pointer
ldmia sp!, {fp, lr}
bx lr @ return to caller
Deploying and Running the Compiler on the Raspberry Pi
And this is where we will see the final result: TINSEL actually running on the Raspberry Pi. But let’s go back to the beginning for one moment.
The first question I faced was “how would I develop the ARM implementation of the Compiler?”. One answer would be to load Linux on the Pi, then load Intellij and develop there. This would have worked well, but would most likely need a proper SSD storage for the Pi, otherwise Linux would have been a bit slow. This would increase the costs and the clutter on my desk.
The other option, which in the end I followed, was to continue developing on my Linux laptop and deploy the final product to the Raspberry Pi, where it would be finally tested. The Pi would be running its own Raspberry Pi OS, which runs at satisfactory speeds from an SD card, keeping things simple. And this is how I did this:
First, I use a headless Raspberry Pi, which is only connected to power and to my WiFi (no keyboard, no mouse, no screen, no clutter).
I connect to the Raspberry Pi from my Linux laptop using SSH, so I can open a terminal window on the Pi or copy files to it (e.g. the Compiler). There are numerous resources on the Internet about how to do this.
When the Compiler development was completed I produced a .jar file as follows:
jar cvfm tinsel_v3.jar META-INF/MANIFEST.MF -C out/production/CompilerV3-X86-Arm .
The contents of the MANIFEST file are here:
Manifest-Version: 1.0
Main-Class: mpdev.compilerv3.chapter_xa_01.CompilerMainKt
Class-Path: lib/kotlin/kotlin-stdlib-jdk8/1.6.10/kotlin-stdlib-jdk8-1.6.10.jar lib/kotlin/kotlin-stdlib/1.6.10/kotlin-stdlib-1.6.10.jar lib/annotations/13.0/annotations-13.0.jar lib/kotlin/kotlin-stdlib-common/1.6.10/kotlin-stdlib-common-1.6.10.jar lib/kotlin/kotlin-stdlib-jdk7/1.6.10/kotlin-stdlib-jdk7-1.6.10.jar
The Class-Path contains all the Java libraries that Kotlin needs. This list will be different for each implementation and will depend on the version of Kotlin used. This list of libraries is visible in the command line that Intellij executes when the Compiler is run under the IDE.
And finally here are the steps to deploy to the Raspberry Pi:
- Decide the directory to run TINSEL from; let’s call it $HOME/tinsel
- Create a lib directory under the tinsel directory and copy all the Kotlin libraries identified above (which are listed in the Manifest file) from your Linux box to the Raspberry Pi
- Copy the Compiler jar to the tinsel directory on the Raspberry Pi
And that’s it! At this stage you can run TINSEL programs on the Pi by running these two steps:
- Compile the TINSEL source to ARM assembly by running:
java -jar tinsel_v3.jar prog1.tnsl -o prog1.s -arm
- Compile the ARM assembly produced by running:
gcc prog1.s -o prog1
Needless to say, you can script these steps to make it easy to run a TINSEL program in one step, as I have done.
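My own script is not shown here, but to give an idea of what such a wrapper could look like, below is a minimal Kotlin sketch. The helper names buildCommands and runAll are hypothetical, and the sketch assumes that the jar sits in the current directory and that the source file ends in .tnsl.

```kotlin
// Hypothetical helper: the two commands needed to build one TINSEL program.
fun buildCommands(source: String, jar: String = "tinsel_v3.jar"): List<List<String>> {
    val base = source.removeSuffix(".tnsl")
    return listOf(
        // step 1: TINSEL source to ARM assembly
        listOf("java", "-jar", jar, source, "-o", "$base.s", "-arm"),
        // step 2: ARM assembly to executable, linked with the C library
        listOf("gcc", "$base.s", "-o", base)
    )
}

// Runs the commands in order and stops at the first one that fails.
fun runAll(commands: List<List<String>>): Boolean =
    commands.all { ProcessBuilder(it).inheritIO().start().waitFor() == 0 }
```

Calling runAll(buildCommands("prog1.tnsl")) on the Pi would then perform both steps and report whether they succeeded.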
And with that you can now write and run a TINSEL program on the Raspberry Pi! Enjoy!
As usual, all the sources are in my GitHub repository.
Coming up next: Control the Raspberry Pi GPIO – LED Blink