Introduction

The template below shows the general layout of a program in MIPS assembly language:

# Title:
# Author:
# Date:
# Description:
# Input parameters:
# Return data:
#################### data segment #########################
.data
. . .
#################### code segment #########################
.text
.globl main
main: # start of main function
. . .
li $v0, 10
syscall # system call to exit

An assembly language program can contain three types of statements, each written on a separate line:

  • Assembler Directives: These provide information to the Assembler tool for translating a program into machine code. Directives are used to define segments and allocate space in memory for global program variables. An assembler directive always begins with a period (.). A typical MIPS assembly language program uses the following directives:
    • .data: defines the data segment of the program, containing the global program variables.
    • .text: defines the code segment of the program, containing the instructions to be executed.
    • .globl: declares a symbol (label) as global, so that it can be referenced from other files.
  • Executable Instructions: These generate machine code that will be executed by the processor. The instructions tell the processor what to do.

  • Pseudo-instructions and Macros: These are translated by the Assembler tool into one or more actual machine instructions. They simplify the coding task.

Additionally, comments can be inserted into the code. Comments are very important to programmers but are ignored by the Assembler tool. In MIPS, a comment begins with the symbol # and ends at the end of the line. Comments can appear at the beginning of a line or after an instruction. Good comments explain the purpose of the program, when it was written or revised and by whom, the data and registers used, the inputs and outputs, the sequence of instructions, and the algorithms implemented.

The Edit-Assemble-Link-Execute Cycle

Before we can run a MIPS program, we need to convert the source text written in assembly language into a form that can be executed by the processor. This is done in two steps:

  1. Assembling: Translates the assembly language text into a binary object file. This is done by the Assembler tool. If there is more than one assembly language source file, each file is assembled separately. The Assembler tool detects syntax errors and reports them to the programmer, who must then correct the program and assemble it again.

  2. Linking: Combines all the object files (if there is more than one), possibly together with function libraries. This task is done by the linker tool. It resolves references between files, such as a call in one object file to a function defined in another object file or in a library. The result of this step is an executable file.

It is typical for the first executable version of a program to contain runtime errors. These errors cannot be detected by the Assembler tool. They only show up when the program runs (e.g. the program may produce wrong results). You should therefore debug your program to find errors at runtime. Run it with different input values and in different operating conditions to make sure it behaves correctly. In essence, writing a program, whether in assembly or any other language, requires going through the edit-assemble-link-execute cycle until all errors are ironed out and the correct results are obtained. The figure below summarizes the Edit-Assemble-Link-Execute cycle.

[Figure: the Edit-Assemble-Link-Execute (EALE) cycle]

In MARS, you can use the “slow execution” mode, the “single-step” feature, or insert “breakpoints” into your program to debug errors. “Single-step” execution mode is a standard and essential feature in any debugger. It allows you to inspect the effect of each instruction on the processor registers and main memory.

MIPS Registers and Instructions

The MIPS architecture defines 32 general purpose registers, numbered from $0 to $31. The symbol $ is used to denote a register. To simplify assembly programming, we can also refer to the registers by their names as shown below. The Assembler tool will then convert the register names to their corresponding numbers.

Number           Name       Use
$0               $zero      always equal to 0
$1               $at        reserved for the assembler (used to expand pseudo-instructions)
$2-$3            $v0-$v1    hold the values returned by functions
$4-$7            $a0-$a3    used to pass arguments to functions
$8-$15, $24-$25  $t0-$t9    hold temporary data
$16-$23          $s0-$s7    hold local data of functions (saved registers)
$26-$27          $k0-$k1    kernel registers, used by exception/interrupt routines
$28              $gp        pointer to the global data of the program
$29              $sp        stack pointer
$30              $fp        frame pointer
$31              $ra        return address (i.e. where to return to in the calling function)
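The Assembler tool converts register names to numbers. You can picture this as a simple table lookup. The C sketch below is only an illustration under our own assumptions:

- The function name reg_number is hypothetical.
- $0 is spelled "zero" here, a name MARS also accepts.
- It is not meant to reflect how MARS is actually implemented.

```c
#include <string.h>

/* Register names indexed by register number, following the table above.
   Register $0 is written "zero" here (i.e. "$zero" in a program). */
static const char *const REG_NAMES[32] = {
    "zero", "at", "v0", "v1", "a0", "a1", "a2", "a3",
    "t0",   "t1", "t2", "t3", "t4", "t5", "t6", "t7",
    "s0",   "s1", "s2", "s3", "s4", "s5", "s6", "s7",
    "t8",   "t9", "k0", "k1", "gp", "sp", "fp", "ra"
};

/* Returns the register number for text such as "$t0" or "$8",
   or -1 if the text does not name a valid register. */
int reg_number(const char *text)
{
    if (text == NULL || text[0] != '$')
        return -1;
    const char *name = text + 1;

    /* Numeric form: $0 .. $31 */
    if (name[0] >= '0' && name[0] <= '9') {
        int value = 0;
        for (const char *p = name; *p != '\0'; p++) {
            if (*p < '0' || *p > '9')
                return -1;
            value = value * 10 + (*p - '0');
            if (value > 31)
                return -1;
        }
        return value;
    }

    /* Symbolic form: $zero, $at, $v0, ... */
    for (int i = 0; i < 32; i++)
        if (strcmp(name, REG_NAMES[i]) == 0)
            return i;
    return -1;
}
```

For example, reg_number("$t0") gives 8 and reg_number("$ra") gives 31, matching the table.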

The general syntax of an instruction in MIPS is:

[label:] mnemonic  [operands]  [# comment]
  • label is optional. It marks the address in memory of the instruction and must be followed by a colon (:). In addition, a label can be used to refer to the address of a variable in memory.

  • The mnemonic indicates the operation to be performed: add, sub, sll, etc.

  • operands specify the data required by the instruction. Different instructions have different numbers of operands. Operands can be registers, labels, or constants. Most arithmetic and logic instructions require three operands.

Here is an example of a MIPS instruction:

 L1: addiu $t0, $t0, 1	# increment $t0

A program written in assembly language consists of a sequence of instructions of this form. Depending on the operation performed, MIPS instructions can be grouped into five categories:

  • arithmetic instructions,
  • logic instructions,
  • data transfer instructions,
  • branching instructions, and
  • system call instructions

N.B.: In the lists below, each instruction is given in a generic form. The comment describes its effect in an RTL-like (register transfer) notation. This notation is close to assembly language but general enough to cover every choice of registers. Here, rd, rs and rt each stand for one of the 32 MIPS registers, and Imm stands for a constant encoded in the instruction (an immediate). For example, if rd = $a0, rs = $s0 and rt = $0, the first instruction in the list below becomes, in MIPS assembly:

 add   $a0, $s0, $0	   # $a0 = $s0 + $0

Arithmetic instructions

 add    rd, rs, rt     # rd = rs + rt                 (addition; traps on signed overflow)
 sub    rd, rs, rt     # rd = rs - rt                 (subtraction; traps on signed overflow)
 addu   rd, rs, rt     # rd = rs + rt                 (addition without overflow trap)
 subu   rd, rs, rt     # rd = rs - rt                 (subtraction without overflow trap)
 addi   rt, rs, Imm    # rt = rs + Imm                (Imm is a 16-bit signed constant; traps on overflow)
 addiu  rt, rs, Imm    # rt = rs + Imm                (Imm is a 16-bit signed constant; no overflow trap)

 mult   rs, rt         # [HI,LO] = rs * rt            (signed multiplication, 64-bit product)
 multu  rs, rt         # [HI,LO] = rs * rt            (unsigned multiplication, 64-bit product)

 div    rs, rt         # HI = rs % rt; LO = rs / rt   (signed division)
 divu   rs, rt         # HI = rs % rt; LO = rs / rt   (unsigned division)
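The C sketch below models some of these instructions on 32-bit values to make their differences concrete:

- addu wraps around silently.
- add would raise an exception on signed overflow; here we only report the overflow.
- mult produces a 64-bit result split across HI and LO.
- div leaves the quotient in LO and the remainder in HI.

The function names are hypothetical. The signed/unsigned conversions assume a two's-complement C implementation.

```c
#include <stdint.h>
#include <stdbool.h>

/* addu: 32-bit addition that wraps around silently (no overflow check). */
uint32_t mips_addu(uint32_t rs, uint32_t rt) { return rs + rt; }

/* add: same result bits as addu; this helper only reports the signed overflow
   on which the real instruction would raise an exception. */
bool mips_add_overflows(uint32_t rs, uint32_t rt)
{
    uint32_t sum = rs + rt;
    /* Overflow iff both operands have the same sign and the sum's sign differs. */
    return ((~(rs ^ rt) & (rs ^ sum)) >> 31) != 0;
}

/* mult: full 64-bit signed product, upper 32 bits in HI, lower 32 bits in LO. */
void mips_mult(int32_t rs, int32_t rt, uint32_t *hi, uint32_t *lo)
{
    int64_t product = (int64_t)rs * (int64_t)rt;
    *hi = (uint32_t)((uint64_t)product >> 32);
    *lo = (uint32_t)product;
}

/* div: LO = quotient, HI = remainder. C's / and % truncate toward zero, like MIPS.
   Precondition: rt != 0 and not (rs == INT32_MIN && rt == -1), cases that are
   undefined in C (and give no meaningful result on MIPS either). */
void mips_div(int32_t rs, int32_t rt, uint32_t *hi, uint32_t *lo)
{
    *lo = (uint32_t)(rs / rt);
    *hi = (uint32_t)(rs % rt);
}
```

Two examples of these helpers in action:

- mips_mult(-2, 3, &hi, &lo) leaves hi = 0xFFFFFFFF and lo = 0xFFFFFFFA, the two halves of -6 in 64-bit two's complement.
- mips_div(-7, 2, &hi, &lo) leaves -3 in LO and -1 in HI.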

Logic instructions

 or    rd, rs, rt      # rd = rs | rt                 (logical OR)
 and   rd, rs, rt      # rd = rs & rt                 (logical AND)
 xor   rd, rs, rt      # rd = rs ^ rt                 (logical eXclusive OR) 
 nor   rd, rs, rt      # rd = ~(rs | rt)              (logical NOR)
 ori   rt, rs, Imm     # rt = rs | Imm                (Imm is a 16-bit unsigned constant)
 andi  rt, rs, Imm     # rt = rs & Imm                (Imm is a 16-bit unsigned constant)
 xori  rt, rs, Imm     # rt = rs ^ Imm                (Imm is a 16-bit unsigned constant) 

 sllv  rd, rt, rs      # rd = rt << rs                (logical shift left with rs[4..0])
 srlv  rd, rt, rs      # rd = rt >>> rs               (logical shift right with rs[4..0]) 
 srav  rd, rt, rs      # rd = rt >> rs                (arithmetic shift right with rs[4..0]) 

 sll   rd, rt, Sha     # rd = rt << Sha               (logical shift left with 0 <= Sha <= 31)
 srl   rd, rt, Sha     # rd = rt >>> Sha              (logical shift right with 0 <= Sha <= 31)
 sra   rd, rt, Sha     # rd = rt >> Sha               (arithmetic shift right with 0 <= Sha <= 31)

 slt   rd, rs, rt      # rd = rs < rt ? 1 : 0         (if rs < rt then rd = 1 else rd = 0)
 sltu  rd, rs, rt      # rd = rs < rt ? 1 : 0         (rs and rt are unsigned operands)
 slti  rt, rs, Imm     # rt = rs < Imm ? 1 : 0        (similar to slt with Imm a 16-bit signed constant)
 sltiu rt, rs, Imm     # rt = rs < Imm ? 1 : 0        (similar to sltu; Imm is a 16-bit constant, sign-extended before the unsigned comparison)
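The two right shifts differ only in how they treat the sign bit, and so do the two set-less-than variants. The hypothetical C helpers below illustrate this on 32-bit values. The shift amount is masked to its low 5 bits, mirroring rs[4..0] in sllv, srlv and srav.

```c
#include <stdint.h>

/* srl: logical right shift, vacated bits filled with 0 (">>>" above). */
uint32_t mips_srl(uint32_t rt, unsigned sha) { return rt >> (sha & 31u); }

/* sra: arithmetic right shift, vacated bits filled with copies of the sign bit.
   Written so as not to rely on C's implementation-defined >> of negative values. */
int32_t mips_sra(int32_t rt, unsigned sha)
{
    sha &= 31u;
    return rt < 0 ? ~(~rt >> sha) : rt >> sha;
}

/* slt compares the bits as signed integers, sltu as unsigned integers. */
uint32_t mips_slt(uint32_t rs, uint32_t rt)  { return (int32_t)rs < (int32_t)rt; }
uint32_t mips_sltu(uint32_t rs, uint32_t rt) { return rs < rt; }
```

For example, take rt = 0xFFFFFFF0 (-16):

- srl by 4 gives 0x0FFFFFFF, while sra by 4 gives -1.
- slt treats 0xFFFFFFFF as -1, which is less than 1.
- sltu treats the same bits as 4294967295, which is not.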

Data transfer instructions

 mfhi  rd              # rd = HI                      (move HI register to rd register)
 mflo  rd              # rd = LO                      (move LO register to rd register)
 mthi  rs              # HI = rs                      (move rs register to HI register)
 mtlo  rs              # LO = rs                      (move rs register to LO register)
 lui   rt, Imm         # rt = Imm << 16               (rt[31..16] = Imm, rt[15..0] = 0)

 lw    rt, Imm(rs)     # rt = MEM[rs + Imm]           (load a 'word' from memory)
 sw    rt, Imm(rs)     # MEM[rs + Imm] = rt           (store a 'word' into memory)
 lh    rt, Imm(rs)     # rt = MEM[rs + Imm]           (load a 'half-word' from memory - i.e. do sign extension)
 lhu   rt, Imm(rs)     # rt = MEM[rs + Imm]           (load an unsigned 'half-word' from memory - i.e. do zero extension)
 sh    rt, Imm(rs)     # MEM[rs + Imm] = rt           (store a 'half-word' into memory)
 lb    rt, Imm(rs)     # rt = MEM[rs + Imm]           (load a byte from memory - i.e. do sign extension)
 lbu   rt, Imm(rs)     # rt = MEM[rs + Imm]           (load an unsigned byte from memory - i.e. do zero extension)
 sb    rt, Imm(rs)     # MEM[rs + Imm] = rt           (store a byte into memory)
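The signed and unsigned loads differ in how they fill the upper bits of rt: with copies of the loaded value's sign bit, or with zeros. The C sketch below shows this on a small hypothetical memory array. It rests on these assumptions:

- Half-words are stored little-endian, the byte order MARS uses.
- As with the real instructions, half-word addresses are even.
- Signed conversions behave as two's complement.

```c
#include <stdint.h>

/* Hypothetical memory contents; multi-byte values are little-endian. */
static const uint8_t memory[8] = { 0xF0, 0x80, 0x7F, 0x00, 0, 0, 0, 0 };

/* lb: load one byte and sign-extend it (bit 7 copied into bits 31..8). */
int32_t mips_lb(uint32_t addr)   { return (int32_t)(int8_t)memory[addr]; }

/* lbu: load one byte and zero-extend it (bits 31..8 become 0). */
uint32_t mips_lbu(uint32_t addr) { return memory[addr]; }

/* lh: load a 16-bit half-word and sign-extend it (bit 15 copied into bits 31..16). */
int32_t mips_lh(uint32_t addr)
{
    uint16_t half = (uint16_t)(memory[addr] | (memory[addr + 1] << 8));
    return (int32_t)(int16_t)half;
}

/* lhu: load a 16-bit half-word and zero-extend it. */
uint32_t mips_lhu(uint32_t addr)
{
    return (uint32_t)memory[addr] | ((uint32_t)memory[addr + 1] << 8);
}
```

With the sample bytes above, lb at address 0 yields -16 while lbu yields 240 (0xF0).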

Branching instructions

 beq   rs, rt, label   # if (rs == rt) goto label     (take the branch if equal)
 bne   rs, rt, label   # if (rs != rt) goto label     (take the branch if not equal)
 bgez  rs, label       # if (rs >= 0) goto label      (take the branch if greater than or equal to zero)
 bgtz  rs, label       # if (rs >  0) goto label      (take the branch if greater than zero)
 blez  rs, label       # if (rs <= 0) goto label      (take the branch if less than or equal to zero)
 bltz  rs, label       # if (rs <  0) goto label      (take the branch if less than zero)
 
 j     label           # goto label                   (jump to address)
 jal   label           # $ra = $pc + 4 ; goto label   (jump to address and link in $ra the returning position)
 jr    rs              # $pc = rs                     (jump to address in register rs)
 jalr  rs              # $ra = $pc + 4 ; $pc = rs     (jump to address in register rs and link in $ra the returning position)   
 jalr  rd, rs          # rd  = $pc + 4 ; $pc = rs     (jump to address in register rs and link in rd the returning position)

System call instructions

A program performs input and output through system calls. On a real machine, these calls are handled by the operating system (Windows, macOS, UN*X, etc.). In the MIPS architecture, a system call is made with the special syscall instruction.

Before executing the syscall instruction, the $v0 register is loaded with the required service number (i.e., the functionality requested from the operating system). In addition, registers $a0-$a3 are used to pass any arguments to the service routine. After the syscall instruction, any value returned by the service is retrieved from the $v0 register.

System call handling routines are specific to the operating system. On MARS, which is a simulator and not a real system, there is no operating system involved. Hence, it is the MARS simulator that provides system services to programs by handling the initiated system calls. The following table shows a small set of services provided by MARS for basic input/output operations.

Service                                        $v0   Argument(s)                                      Return
Print an integer                               1     $a0 = integer to print
Print a string of characters                   4     $a0 = address of the string of characters
Read an integer from the terminal              5                                                      $v0 = integer read
Read a string of characters from the terminal  8     $a0 = memory address where to store the string
                                                     $a1 = maximum number of characters to read
Terminate program execution                    10
Print a character                              11    $a0 = character to print
Read a character from the terminal             12                                                     $v0 = character read
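Since MARS handles syscall itself, you can picture the mechanism as a dispatch on the service number in $v0. The C sketch below is a hypothetical illustration, not MARS source code:

- The struct regs type and handle_syscall function are our own names.
- Only services 1, 10 and 11 from the table above are covered.
- Output goes into a buffer instead of a console.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical register state: only the registers these services use. */
struct regs { int32_t v0, a0, a1; };

/* Handles one syscall: $v0 selects the service, $a0/$a1 hold the arguments.
   Output is written into `out` instead of a console.
   Returns 1 when the program should stop, 0 otherwise. */
int handle_syscall(struct regs *r, char *out, size_t out_size)
{
    switch (r->v0) {
    case 1:   /* print the integer in $a0 */
        snprintf(out, out_size, "%d", (int)r->a0);
        return 0;
    case 11:  /* print the character in the low byte of $a0 */
        snprintf(out, out_size, "%c", (char)(r->a0 & 0xFF));
        return 0;
    case 10:  /* terminate the program */
        return 1;
    default:  /* services 4, 5, 8 and 12 would need memory and console input */
        snprintf(out, out_size, "unsupported service %d", (int)r->v0);
        return 1;
    }
}
```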

The code example below shows a simple program that prompts the user for a value (an integer) and then displays that value on the MARS console (our simulated screen). Five system calls are used, in this order:

  1. display the string str1,
  2. read an integer,
  3. display the string str2,
  4. display the value entered by the user,
  5. terminate the program.

.data
str1:     .asciiz   "Enter a value: "
str2:     .asciiz   "You entered the value: "


.globl main
.text
main:
   addi $v0, $0, 4      # $v0 = service number for 'print a string of characters'
   la   $a0, str1       # $a0 = memory address of the string to print (here str1)
   syscall              # a system call for printing the string str1

   addi $v0, $0, 5      # $v0 = service number for 'read an integer'
   syscall              # a system call for reading an integer from the console. 
                        # The value is returned in $v0

   add  $s0, $v0, $0    # $s0 = $v0  ($s0 = value entered by the user)
   addi $v0, $0, 4      # $v0 = service number for 'print a string of characters'
   la   $a0, str2       # $a0 = memory address of the string to print (here str2) 
   syscall              # a system call for printing the string str2

   addi $v0, $0, 1      # $v0 = service number for 'print an integer'
   add  $a0, $s0, $0    # $a0 = $s0 ($a0 = value to print)
   syscall              # a system call for printing the integer entered previously.

   addi $v0, $0, 10     # $v0 = service number for 'quit execution'
   syscall              # a system call for terminating the execution of the program    

Assembler Pseudo-instructions

Pseudo-instructions are recognised by the Assembler tool and behave as if they were real instructions. In fact, the Assembler expands each one into one or more basic instructions, much like a macro. Pseudo-instructions are useful because they make programming in assembly language easier.

We have already used pseudo-instructions in the examples above: li and la are not real MIPS instructions. (By contrast, the branches bgez, bgtz, blez and bltz listed earlier are real instructions.) The MIPS ISA also does not implement the following conditional branches:

 blt   rs, rt, label   # if (rs < rt) goto label        (take the branch if less than)
 bltu  rs, rt, label   # if (rs < rt) goto label        (take the branch if less than - unsigned comparison) 
 ble   rs, rt, label   # if (rs <= rt) goto label       (take the branch if less than or equal) 
 bleu  rs, rt, label   # if (rs <= rt) goto label       (take the branch if less than or equal - unsigned comparison) 
 bgt   rs, rt, label   # if (rs > rt) goto label        (take the branch if greater than)
 bgtu  rs, rt, label   # if (rs > rt) goto label        (take the branch if greater than - unsigned comparison) 
 bge   rs, rt, label   # if (rs >= rt) goto label       (take the branch if greater than or equal) 
 bgeu  rs, rt, label   # if (rs >= rt) goto label       (take the branch if greater than or equal - unsigned comparison) 

These instructions are not implemented in hardware by the MIPS ISA, because each of them can easily be expressed with a short sequence of real instructions. For example, the pseudo-instruction blt $s0, $s1, etiq can be replaced by the following sequence of real instructions:

 slt  $at, $s0, $s1
 bne  $at, $0, etiq

Similarly, the pseudo-instruction ble $s2, $s3, etiq will be converted by the Assembler tool into this sequence of real instructions:

 slt  $at, $s3, $s2
 beq  $at, $0, etiq

Note the use of register $at as a temporary register when converting pseudo-instructions to real instructions. This register is reserved by the Assembler tool for this purpose. The table below shows other examples of MIPS pseudo-instructions.
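You can check the two expansions above in C by modelling slt and the branch conditions directly. The functions below are hypothetical helpers. Each one returns whether the branch would be taken, and computes the intermediate value that the Assembler places in $at.

```c
#include <stdint.h>
#include <stdbool.h>

/* The real instruction used by both expansions: 1 if rs < rt (signed), else 0. */
static uint32_t slt(int32_t rs, int32_t rt) { return rs < rt ? 1u : 0u; }

/* blt rs, rt, label  ->  slt $at, rs, rt ; bne $at, $0, label */
bool blt_taken(int32_t rs, int32_t rt)
{
    uint32_t at = slt(rs, rt);
    return at != 0;            /* bne $at, $0, label */
}

/* ble rs, rt, label  ->  slt $at, rt, rs ; beq $at, $0, label
   rs <= rt is the same as "not (rt < rs)", hence the swapped operands. */
bool ble_taken(int32_t rs, int32_t rt)
{
    uint32_t at = slt(rt, rs);
    return at == 0;            /* beq $at, $0, label */
}
```

The swapped operands in ble_taken correspond to slt $at, $s3, $s2 in the ble example.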

Pseudo-instruction       Actual instruction(s)      Operation
move $s1, $s2            addu $s1, $s2, $0          $s1 = $s2
not  $s1, $s2            nor  $s1, $s2, $0          $s1 = not($s2)
li   $s1, 0xabcd         ori  $s1, $0, 0xabcd       $s1 = 0x0000abcd
li   $s1, 0xabcd1234     lui  $at, 0xabcd           $s1 = 0xabcd1234
                         ori  $s1, $at, 0x1234
sgt  $s1, $s2, $s3       slt  $s1, $s3, $s2         $s1 = ($s2 > $s3) ? 1 : 0
blt  $s1, $s2, label     slt  $at, $s1, $s2         if ($s1 < $s2) goto label
                         bne  $at, $0, label
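The two-instruction expansion of li for a 32-bit constant works because:

- lui fills the upper half-word and clears the lower one.
- ori has a zero-extended immediate, so it fills the lower half-word without disturbing the upper one.

A minimal C sketch of this splitting, with hypothetical function names:

```c
#include <stdint.h>

/* lui rt, Imm : rt = Imm << 16 (the lower 16 bits become 0) */
uint32_t lui(uint16_t imm) { return (uint32_t)imm << 16; }

/* ori rt, rs, Imm : Imm is zero-extended, so the upper half of rs is unchanged */
uint32_t ori(uint32_t rs, uint16_t imm) { return rs | imm; }

/* li rt, value (32-bit): split the value into its two 16-bit halves */
uint32_t li32(uint32_t value)
{
    uint32_t at = lui((uint16_t)(value >> 16));      /* lui $at, upper half     */
    return ori(at, (uint16_t)(value & 0xFFFFu));     /* ori rt, $at, lower half */
}
```

A sign-extending instruction such as addiu would not work for the lower half. For lower halves of 0x8000 and above, it would also change the upper half. The zero-extending ori avoids that problem.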

Translation of control structures in high-level languages to MIPS

Any high-level control structure can be translated to assembly using branching instructions. Consider the following if statement in C:

  if ( a == b )
     c = d + e;
  else
     c = d - e;

Assume the C compiler has associated registers $s0-$s4 with the variables a, b, c, d and e. The above test can then be implemented in MIPS assembly with the following instructions:

  bne  $s0, $s1, ELSE
  add  $s2, $s3, $s4 
  j EXIT
ELSE:	
  sub  $s2, $s3, $s4 
EXIT: 
  . . .
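Why does the assembly branch on the opposite condition (bne for a == b)? One way to see it is to write the same control flow in C with goto statements, one statement per instruction. The function below is a hypothetical illustration of that correspondence: a, b, d and e are passed as parameters and c is returned.

```c
/* Mirrors the MIPS code: test the opposite condition (a != b)
   and jump over the "then" part when it holds. */
int if_else_goto(int a, int b, int d, int e)
{
    int c;
    if (a != b) goto ELSE;   /* bne  $s0, $s1, ELSE */
    c = d + e;               /* add  $s2, $s3, $s4  */
    goto EXIT;               /* j    EXIT           */
ELSE:
    c = d - e;               /* sub  $s2, $s3, $s4  */
EXIT:
    return c;
}
```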

We can also implement a compound condition involving the logical operator &&:

  if ( (b > 0) && (c < 0) )
     d++;

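In assembly, the if statement above is implemented with fall-through. Each condition is tested in its negated form, and the code branches past the body (to next) as soon as one of them is false. Only when both conditions are true does execution fall through into the body. Here, b, c and d are assumed to be in registers $s1, $s2 and $s3: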

  blez   $s1, next    # branch to next if b <= 0
  bgez   $s2, next    # branch to next if c >= 0
  addiu  $s3, $s3, 1  # d++ ; both conditions are true
next: 
  . . .
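You can write the fall-through structure above in C with goto, which makes it easy to check that the body runs only when both conditions hold. A hypothetical sketch, returning the (possibly incremented) value of d:

```c
/* Each condition is tested in negated form; failing either one
   branches past the body, so reaching d++ means both were true. */
int and_condition_goto(int b, int c, int d)
{
    if (b <= 0) goto next;   /* blez  $s1, next   */
    if (c >= 0) goto next;   /* bgez  $s2, next   */
    d++;                     /* addiu $s3, $s3, 1 */
next:
    return d;
}
```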

Similarly, we can translate an if statement whose compound condition involves the logical operator ||. For example:

 if ( (b > c) || (c > d) ) 
   e = 1;

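The assembly implementation below uses short-circuit evaluation. If the first condition is true, it branches directly into the body of the if statement without checking the second condition. Here, b, c, d and e are assumed to be in registers $s1 to $s4: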

  bgt    $s1, $s2, L1    # branch to L1 if b > c (i.e. go directly into the body of the if statement)
  ble    $s2, $s3, next  # reached only if b <= c (the branch above was not taken). If c <= d as well,
                         # neither condition is true, so skip the body of the if statement.
L1:  
  addi   $s4, $0, 1      # e = 1
next: 
  . . .
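You can also mirror the || translation above in C with goto, following the assembly instruction by instruction. The function below is a hypothetical sketch. It returns e, which keeps its original value when neither condition holds.

```c
/* Short-circuit OR: a true first test jumps straight into the body,
   and a false second test skips it. */
int or_condition_goto(int b, int c, int d, int e)
{
    if (b > c) goto L1;      /* bgt  $s1, $s2, L1   */
    if (c <= d) goto next;   /* ble  $s2, $s3, next */
L1:
    e = 1;                   /* addi $s4, $0, 1     */
next:
    return e;
}
```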

Step by step from C to MIPS

To better understand the ideas and rules discussed above, we will walk through the translation of a C program into a MIPS program. The C program below prints the nth Fibonacci number.

#include <stdio.h>

int n = 9;

// Function to find the nth Fibonacci number
int main(void) {
    int curr_fib = 0, next_fib = 1;
    int new_fib;
    for (int i = n; i > 0; i--) {
        new_fib = curr_fib + next_fib;
        curr_fib = next_fib;
        next_fib = new_fib;
    }
    printf("%d\n", curr_fib);
    return 0;
}

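We are going to translate this program step by step. First, we rewrite the C source code as an “unstructured” program. Such a program replaces the for loop with labels, an if test and goto statements, which map directly onto branch and jump instructions.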

#include <stdio.h>

int n = 9;

// Function to find the nth Fibonacci number
int main(void) {
    int curr_fib = 0, next_fib = 1;
    int new_fib;
    int i = n;

WHILE:
    if (i == 0) goto ELIHW;
    new_fib  = curr_fib + next_fib;
    curr_fib = next_fib;
    next_fib = new_fib;
    i--;
    goto WHILE;

ELIHW:
    printf("%d\n", curr_fib);
    return 0;
}

Next, we need to define the global variable n. In MIPS, global variables are declared under the .data directive, which represents the data segment. The declaration in MIPS looks like this:

.data
n: .word 9
  • The label n is chosen to mimic the variable name in C. We could choose any other name for the label as long as we keep track of the association between the variable name in C and the defined label in MIPS.
  • .word allocates a 32-bit word of memory at the address designated by the label we just defined (i.e. n).
  • 9 is the value stored in the allocated space (this is the value assigned to the variable n in C).

Now, we need to declare and initialise curr_fib and next_fib. Since these are local variables in our C code, we will simply associate some registers with them. By convention, registers $s0-$s7 are used for local variables in MIPS.

.text
main:
    add  $s0, $0, $0    # curr_fib = 0
    addi $s1, $0,  1    # next_fib = 1
  • We added the .text directive here. Any instruction under this directive belongs to the executable code.
  • Recall that register $0 is immutable. It always contains the value 0.
  • The local variable new_fib is not declared (we try not to declare variables in assembly when we can associate registers to them. Can you tell why?).

Let’s move on to the loop and start by initialising the iteration variable. The following code sets i to the value of n.

    la $t0, n           # load in $t0 the address n (remember, n is a label, hence an address!)
    lw $t0, 0($t0)      # load in $t0 the value stored at the address defined by the label n. $t0 is now our variable i.

The la instruction loads the address associated with a label (here, n) into a register (here, $t0). The first line thus makes $t0 a pointer to n. The second line uses lw to dereference $t0, setting this register to the value stored in n.

In C, the two MIPS instructions above roughly correspond to the code below. MIPS reuses the single register $t0, while C needs two variables of different types.

 int *t0 = &n;   // la $t0, n      : t0 holds the address of n
 int  i  = *t0;  // lw $t0, 0($t0) : i holds the value stored at that address

You’re probably thinking: “Why can’t we directly set $t0 to n?” In the .text section, n cannot be used as if it were a register. We cannot write add $t0, n, $0, because the operands of add must be registers. n IS NOT a register but a label, i.e. an address in the .data section. Instead, we first get the address of n with the la instruction. Once the address is known, we read its content with lw (since n is a 32-bit value). Recall that lw assigns to register rt ($t0, in our case) the data stored at memory address Imm + rs (i.e. rt = MEM[Imm + rs]).
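A small C model of memory illustrates the distinction between a label (an address) and the value stored there. This sketch assumes the following:

- The data segment is a byte array starting at 0x10010000, the default .data base address in MARS.
- Words are stored little-endian.
- la, lw and store_word are hypothetical helpers, not a real API.
- There are no bounds or alignment checks.

```c
#include <stdint.h>

#define DATA_BASE 0x10010000u     /* assumed start of .data (MARS default) */

static uint8_t data_segment[64];  /* hypothetical backing store for .data */

/* n: .word 9  ->  the label n is simply an address in the data segment */
static const uint32_t n = DATA_BASE;

/* What ".word value" amounts to at load time: 4 bytes stored little-endian. */
void store_word(uint32_t addr, uint32_t value)
{
    uint32_t offset = addr - DATA_BASE;
    for (int i = 0; i < 4; i++)
        data_segment[offset + i] = (uint8_t)(value >> (8 * i));
}

/* la rt, label : rt = the address of the label (not its content) */
uint32_t la(uint32_t label_address) { return label_address; }

/* lw rt, Imm(rs) : rt = MEM[rs + Imm], with Imm sign-extended */
uint32_t lw(int16_t imm, uint32_t rs)
{
    uint32_t offset = rs + (uint32_t)(int32_t)imm - DATA_BASE;
    uint32_t value = 0;
    for (int i = 0; i < 4; i++)
        value |= (uint32_t)data_segment[offset + i] << (8 * i);
    return value;
}
```

After store_word(n, 9), la(n) yields 0x10010000 (the address), while lw(0, la(n)) yields 9 (the value). These are the same two steps as la $t0, n followed by lw $t0, 0($t0).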

Now for the loop itself. We begin with the code that controls the iterations:

WHILE:
    beq $t0, $0, ELIHW  # exit the loop once we have completed n iterations
    ...
    ...
    addi $t0, $t0, -1   # decrement the loop counter
    j WHILE             # loop again (jump to WHILE label)
ELIHW:
  • The first line (WHILE:) is a label that marks the start of the loop.

  • The following line (beq $t0, $0, ELIHW) implements the exit condition. Here, the code branches to label ELIHW once $t0 reaches the value 0 (remember that we set $t0 to represent the variable i).

  • At the end of the loop body, the line (addi $t0, $t0, -1) decrements the variable i.

  • The next statement (j WHILE) returns to the beginning of the loop.

Now, to the body of the loop…

WHILE:
    beq $t0, $0, ELIHW  # exit the loop once we have completed n iterations
     
    add $t2, $s0, $s1   # new_fib  = curr_fib + next_fib;
    add $s0, $s1, $0    # curr_fib = next_fib;
    add $s1, $t2, $0    # next_fib = new_fib;
     
    addi $t0, $t0, -1   # decrement the loop counter
    j WHILE             # loop again (jump to WHILE label)
ELIHW:

Nothing special here, the corresponding C lines are indicated in the comments.

Let’s move on to displaying the nth Fibonacci number! Recall that displaying data on the console (i.e. printf("..."); and the like) is done through system calls handled by the operating system. We therefore use the special syscall instruction. Since we want to print an integer (i.e. the value of curr_fib stored in $s0), we need to set $v0 to 1 and copy the integer value into $a0.

ELIHW:
    addi $v0,  $0, 1    # $v0 = service number for 'print an integer' 
    addi $a0, $s0, 0    # $a0 = the value to print (i.e. curr_fib) 
    syscall             # system call for printing curr_fib

Finally, let’s terminate and exit our program! This also requires a syscall!

    addi $v0, $0, 10    # service number for 'exit'
    syscall             # system call to terminate the execution of the program

And voila! Here is our complete program!

.data
n: .word 9

.text
main:
    add  $s0, $0, $0    # curr_fib = 0
    addi $s1, $0,  1    # next_fib = 1

    la $t0, n           # load in $t0 the address n (remember, n is a label, hence an address!)
    lw $t0, 0($t0)      # load in $t0 the value stored at the address defined by the label n. $t0 is now our variable i.

WHILE:
    beq $t0, $0, ELIHW  # exit the loop once we have completed n iterations
     
    add $t2, $s0, $s1   # new_fib  = curr_fib + next_fib;
    add $s0, $s1, $0    # curr_fib = next_fib;
    add $s1, $t2, $0    # next_fib = new_fib;
     
    addi $t0, $t0, -1   # decrement the loop counter
    j WHILE             # loop again (jump to WHILE label)

ELIHW:
    addi $v0,  $0, 1    # $v0 = service number for 'print an integer' 
    addi $a0, $s0, 0    # $a0 = the value to print (i.e. curr_fib) 
    syscall             # system call for printing curr_fib

    addi $v0, $0, 10    # service number for 'exit'
    syscall             # system call to terminate the execution of the program
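You can check the translation without a simulator. Transcribe the final program back into C, one variable per register and one statement per instruction, and compare it with the original for loop. The two functions below are a hypothetical sketch of that check. They assume n >= 0 and values small enough to fit in 32 bits.

```c
#include <stdint.h>

/* Transcription of the final MIPS program: returns the value
   that the program passes to the 'print an integer' syscall. */
int32_t fib_as_assembled(int32_t n)
{
    int32_t s0 = 0;           /* add  $s0, $0, $0  : curr_fib = 0 */
    int32_t s1 = 1;           /* addi $s1, $0, 1   : next_fib = 1 */
    int32_t t0 = n;           /* la + lw           : i = n        */
    int32_t t2;

WHILE:
    if (t0 == 0) goto ELIHW;  /* beq  $t0, $0, ELIHW */
    t2 = s0 + s1;             /* add  $t2, $s0, $s1  */
    s0 = s1;                  /* add  $s0, $s1, $0   */
    s1 = t2;                  /* add  $s1, $t2, $0   */
    t0 = t0 - 1;              /* addi $t0, $t0, -1   */
    goto WHILE;               /* j    WHILE          */

ELIHW:
    return s0;                /* $a0 = $s0, then syscall 1 */
}

/* Reference: the original structured C loop. */
int32_t fib_reference(int32_t n)
{
    int32_t curr_fib = 0, next_fib = 1, new_fib;
    for (int32_t i = n; i > 0; i--) {
        new_fib = curr_fib + next_fib;
        curr_fib = next_fib;
        next_fib = new_fib;
    }
    return curr_fib;
}
```

For n = 9, both functions return 34, the value the MIPS program prints.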