



## The MIPS Instruction Set

- Used as the example throughout the book
- Stanford MIPS commercialized by MIPS Technologies (www.mips.com)
- Large share of embedded core market
  - Applications in consumer electronics, network/storage equipment, cameras, printers, ...
- Typical of many modern ISAs
  - See MIPS Reference Data tear-out card, and Appendixes B and E







# Register Operand Example

- C code: `f = (g + h) - (i + j);`
  - f, ..., j in \$s0, ..., \$s4
- Compiled MIPS code:
  - `add $t0, $s1, $s2`
  - `add $t1, $s3, $s4`
  - `sub $s0, $t0, $t1`

```
Memory Operands

Main memory used for composite data
Arrays, structures, dynamic data
To apply arithmetic operations
Load values from memory into registers
Store result from register to memory
Memory is byte addressed
Each address identifies an 8-bit byte
Words are aligned in memory
Address must be a multiple of 4
MIPS is Big Endian
Most-significant byte at the least address of a word
cf. Little Endian: least-significant byte at the least address
```
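The byte-order and alignment rules above can be sketched in C. This is only an illustration: the function names (`load_word_big_endian`, `load_word_little_endian`, `is_word_aligned`) are made up for this example and are not part of any MIPS toolchain.

```c
#include <assert.h>
#include <stdint.h>

/* Big Endian: the most-significant byte sits at the least address (bytes[0]). */
uint32_t load_word_big_endian(const uint8_t bytes[4])
{
    assert(bytes != 0);
    return ((uint32_t)bytes[0] << 24) |
           ((uint32_t)bytes[1] << 16) |
           ((uint32_t)bytes[2] << 8)  |
            (uint32_t)bytes[3];
}

/* Little Endian: the least-significant byte sits at the least address. */
uint32_t load_word_little_endian(const uint8_t bytes[4])
{
    assert(bytes != 0);
    return ((uint32_t)bytes[3] << 24) |
           ((uint32_t)bytes[2] << 16) |
           ((uint32_t)bytes[1] << 8)  |
            (uint32_t)bytes[0];
}

/* A word address is aligned when it is a multiple of 4. */
int is_word_aligned(uint32_t address)
{
    return (address & 3u) == 0;
}
```

With the bytes 0x12, 0x34, 0x56, 0x78 stored at increasing addresses, the big-endian reading gives 0x12345678 and the little-endian reading gives 0x78563412.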

```
Memory Operand Example 1

C code:

g = h + A[8];

g in $s1, h in $s2, base address of A in $s3

Compiled MIPS code:

Index 8 requires offset of 32 (4 bytes per word)

lw  $t0, 32($s3)    # load word: offset 32, base register $s3
add $s1, $s2, $t0   # g = h + A[8]

Chapter 2 — Instructions: Language of the Computer — 9
```
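The offset arithmetic in this example (index times 4 bytes per word, added to the base address) can be checked with a small C sketch. The helper names `word_offset` and `add_element` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte offset of element `index` in an array of 32-bit words. */
size_t word_offset(size_t index)
{
    return index * sizeof(int32_t);   /* 4 bytes per word */
}

/* Computes h + A[index] using an explicit base address plus byte offset,
   in the spirit of a load with an offset and a base register. */
int32_t add_element(int32_t h, const int32_t *A, size_t index)
{
    assert(A != NULL);
    const unsigned char *base = (const unsigned char *)A;
    int32_t element;
    memcpy(&element, base + word_offset(index), sizeof element);
    return h + element;
}
```

For index 8, `word_offset` returns 32, which is the constant that appears in the load instruction.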



## Registers vs. Memory

- Registers are faster to access than memory
- Operating on memory data requires loads and stores
  - More instructions to be executed
- Compiler must use registers for variables as much as possible
  - Only spill to memory for less frequently used variables
  - Register optimization is important!



# The Constant Zero

- MIPS register 0 (\$zero) is the constant 0
  - Cannot be overwritten
- Useful for common operations
  - E.g., move between registers: `add $t2, $s1, $zero`

```
Unsigned Binary Integers

• Given an n-bit number

  x = x_{n-1} 2^{n-1} + x_{n-2} 2^{n-2} + \dots + x_1 2^1 + x_0 2^0

• Range: 0 to +2^n - 1
• Example
  0000 0000 0000 0000 0000 0000 0000 1011_2
  = 0 + \dots + 1 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 1 \times 2^0
  = 0 + \dots + 8 + 0 + 2 + 1 = 11_{10}

• Using 32 bits
  0 to +4,294,967,295
```
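As a rough check of the positional formula, the sketch below evaluates a string of binary digits, most-significant bit first. The function name `unsigned_value` is an assumption made for this example. It accumulates the sum by doubling the running value for each new bit, which gives the same result as summing each bit times its power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Evaluates a string of '0'/'1' characters (most-significant bit first)
   as an unsigned integer. Any other character, such as a space used for
   grouping, is skipped. At most 32 digits are meaningful here. */
uint32_t unsigned_value(const char *bits)
{
    assert(bits != 0);
    uint32_t x = 0;
    for (; *bits != '\0'; ++bits) {
        if (*bits == '0' || *bits == '1') {
            /* Shifting left doubles the value so far; the new bit has weight 2^0. */
            x = (x << 1) | (uint32_t)(*bits - '0');
        }
    }
    return x;
}
```

Applied to the 32-bit pattern ending in 1011 from the example, it yields 11; applied to 32 ones, it yields 4,294,967,295.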

```
2s-Complement Signed Integers

Bit 31 is sign bit

1 for negative numbers

0 for non-negative numbers

-(-2^{n-1}) can't be represented

Non-negative numbers have the same unsigned and 2s-complement representation

Some specific numbers

0: 0000 0000 ... 0000

-1: 1111 1111 ... 1111

Most-negative: 1000 0000 ... 0000

Most-positive: 0111 1111 ... 1111
```
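The specific patterns listed above can be checked with a short C sketch. It assumes 32-bit words and treats bit 31 as carrying weight -2^31 while the low 31 bits keep their unsigned weights; the function name `twos_complement_value` is hypothetical.

```c
#include <stdint.h>

/* Interprets a 32-bit pattern as a 2s-complement signed integer.
   A 64-bit result is used so that every intermediate value fits. */
int64_t twos_complement_value(uint32_t pattern)
{
    int64_t low31 = (int64_t)(pattern & 0x7FFFFFFFu);  /* bits 30..0, unsigned weights */
    int64_t sign  = (int64_t)(pattern >> 31);          /* 1 for negative numbers */
    return low31 - sign * 2147483648LL;                /* bit 31 has weight -2^31 */
}
```

Non-negative patterns (sign bit 0) come out equal to their unsigned value, matching the note that both representations agree there.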

```
Signed Negation

■ Complement and add 1

■ Complement means 1 \rightarrow 0, 0 \rightarrow 1

x + \overline{x} = 1111...111_2 = -1

\overline{x} + 1 = -x

■ Example: negate +2

■ +2 = 0000 0000 ... 0010_2

■ -2 = 1111 1111 ... 1101_2 + 1

     = 1111 1111 ... 1110_2
```
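The complement-and-add-1 rule can be tried out directly on 32-bit patterns. This is a minimal sketch; `negate` is a name chosen for this example, and unsigned arithmetic is used so that the bit patterns are well defined in C.

```c
#include <stdint.h>

/* Negates a 32-bit 2s-complement pattern: complement every bit, then add 1. */
uint32_t negate(uint32_t x)
{
    return ~x + 1u;
}
```

Negating the pattern for +2 gives 0xFFFFFFFE (1111 1111 ... 1110), as in the example. Negating the most-negative pattern 0x80000000 returns the same pattern, which is consistent with that value having no representable positive counterpart.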



































## Branch Instruction Design

- Why not blt, bge, etc?
- Hardware for <, ≥, ... slower than =, ≠
  - Combining with branch involves more work per instruction, requiring a slower clock
  - All instructions penalized!
- beq and bne are the common case
- This is a good design compromise





## Procedure Call Instructions

- Procedure call: jump and link
  - `jal ProcedureLabel`
  - Address of following instruction put in \$ra
  - Jumps to target address
- Procedure return: jump register
  - `jr $ra`
  - Copies \$ra to program counter
  - Can also be used for computed jumps, e.g., for case/switch statements
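The idea of a computed jump for case/switch statements can be illustrated in C with a table of code addresses indexed by the case value. This is only an analogy: the handler names and the `dispatch` function are hypothetical, and a C function-pointer call is an indirect call rather than a bare register jump, but the table lookup followed by a transfer to the loaded address is the same pattern.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*handler_fn)(int);   /* hypothetical handler type */

static int add_one(int x) { return x + 1; }
static int doubled(int x) { return x * 2; }
static int negated(int x) { return -x; }

/* Table of code addresses, indexed by the case value. */
static const handler_fn jump_table[] = { add_one, doubled, negated };

/* Plays the role of: switch (k) { case 0: ...; case 1: ...; case 2: ...; } */
int dispatch(size_t k, int x)
{
    assert(k < sizeof jump_table / sizeof jump_table[0]);
    return jump_table[k](x);   /* transfer control to a computed address */
}
```

For example, `dispatch(1, 5)` selects the second entry and returns 10.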







```
Non-Leaf Procedure Example

C code:
int fact (int n)
{
    if (n < 1) return 1;
    else return n * fact(n - 1);
}

Argument n in $a0
Result in $v0
```











```
String Copy Example

C code (naïve):

Null-terminated string

void strcpy (char x[], char y[])
{
    int i;
    i = 0;
    while ((x[i] = y[i]) != '\0')
        i += 1;
}

Addresses of x, y in $a0, $a1
i in $s0
```















# Synchronization

- Two processors sharing an area of memory
  - P1 writes, then P2 reads
  - Data race if P1 and P2 don't synchronize
    - Result depends on order of accesses
- Hardware support required
  - Atomic read/write memory operation
  - No other access to the location allowed between the read and write
- Could be a single instruction
  - E.g., atomic swap of register ↔ memory
  - Or an atomic pair of instructions
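An atomic swap is enough to build a simple lock. The sketch below uses C11 `<stdatomic.h>` rather than any MIPS instruction, so it illustrates the idea and not the hardware mechanism; the type `swap_lock` and the function names are assumptions made for this example.

```c
#include <stdatomic.h>

/* Hypothetical lock type: 0 means free, 1 means held. */
typedef struct {
    atomic_int locked;
} swap_lock;

/* Atomically swaps 1 into the lock and inspects the old value.
   Returns 1 if the lock was free (and is now held by the caller), else 0. */
int swap_try_lock(swap_lock *l)
{
    return atomic_exchange(&l->locked, 1) == 0;
}

/* Spins until the swap observes the lock as free. */
void swap_acquire(swap_lock *l)
{
    while (!swap_try_lock(l)) {
        /* busy-wait: another holder must release the lock */
    }
}

/* Releases the lock by storing 0. */
void swap_release(swap_lock *l)
{
    atomic_store(&l->locked, 0);
}
```

Because the read of the old value and the write of 1 happen as one indivisible operation, two processors attempting `swap_try_lock` at the same time cannot both see the lock as free.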







### **Producing an Object Module**

- Assembler (or compiler) translates program into machine instructions
- Provides information for building a complete program from the pieces
  - Header: describes contents of object module
  - Text segment: translated instructions
  - Static data segment: data allocated for the life of the program
  - Relocation info: for contents that depend on absolute location of loaded program
  - Symbol table: global definitions and external refs
  - Debug info: for associating with source code




### **Linking Object Modules**

- Produces an executable image
  1. Merges segments
  2. Resolves labels (determines their addresses)
  3. Patches location-dependent and external refs
- Could leave location dependencies for fixing by a relocating loader
  - But with virtual memory, no need to do this
  - Program can be loaded into absolute location in virtual memory space




### **Loading a Program**

- Load from image file on disk into memory
  1. Read header to determine segment sizes
  2. Create virtual address space
  3. Copy text and initialized data into memory
     - Or set page table entries so they can be faulted in
  4. Set up arguments on stack
  5. Initialize registers (including \$sp, \$fp, \$gp)
  6. Jump to startup routine
     - Copies arguments to \$a0, ... and calls main
     - When main returns, do exit syscall




### **Dynamic Linking**

- Only link/load library procedure when it is called
  - Requires procedure code to be relocatable
  - Avoids image bloat caused by static linking of all (transitively) referenced libraries
  - Automatically picks up new library versions








```
C Sort Example

Illustrates use of assembly instructions for a C bubble sort function

Swap procedure (leaf)

void swap(int v[], int k)

{
   int temp;
   temp = v[k];
   v[k] = v[k+1];
   v[k+1] = temp;
}

v in $a0, k in $a1, temp in $t0
```

```
The Procedure Swap

swap: sll $t1, $a1, 2     # $t1 = k * 4
      add $t1, $a0, $t1   # $t1 = v + (k * 4)
                          #   (address of v[k])
      lw  $t0, 0($t1)     # $t0 (temp) = v[k]
      lw  $t2, 4($t1)     # $t2 = v[k+1]
      sw  $t2, 0($t1)     # v[k] = $t2 (v[k+1])
      sw  $t0, 4($t1)     # v[k+1] = $t0 (temp)
      jr  $ra             # return to calling routine

```



















### Compare and Branch in ARM

- Uses condition codes for result of an arithmetic/logical instruction
  - Negative, zero, carry, overflow
  - Compare instructions to set condition codes without keeping the result
- Each instruction can be conditional
  - Top 4 bits of instruction word: condition value
  - Can avoid branches over single instructions




### The Intel x86 ISA

- Evolution with backward compatibility
  - 8080 (1974): 8-bit microprocessor
    - Accumulator, plus 3 index-register pairs
  - 8086 (1978): 16-bit extension to 8080
    - Complex instruction set (CISC)
  - 8087 (1980): floating-point coprocessor
    - Adds FP instructions and register stack
  - 80286 (1982): 24-bit addresses, MMU
    - Segmented memory mapping and protection
  - 80386 (1985): 32-bit extension (now IA-32)
    - Additional addressing modes and operations
    - Paged memory mapping as well as segments












# Implementing IA-32

- Complex instruction set makes implementation difficult
  - Hardware translates instructions to simpler microoperations
    - Simple instructions: 1–1
    - Complex instructions: 1–many
  - Microengine similar to RISC
  - Market share makes this economically viable
- Comparable performance to RISC
  - Compilers avoid complex instructions









### **Concluding Remarks**

- Measure MIPS instruction executions in benchmark programs
  - Consider making the common case fast
  - Consider compromises

| Instruction class | MIPS examples                     | SPEC2006 Int | SPEC2006 FP |
|-------------------|-----------------------------------|--------------|-------------|
| Arithmetic        | add, sub, addi                    | 16%          | 48%         |
| Data transfer     | lw, sw, lb, lbu, lh, lhu, sb, lui | 35%          | 36%         |
| Logical           | and, or, nor, andi, ori, sll, srl | 12%          | 4%          |
| Cond. Branch      | beq, bne, slt, slti, sltiu        | 34%          | 8%          |
| Jump              | j, jr, jal                        | 2%           | 0%          |