## **EEEN301 Embedded systems**

**Lecture 5 2023** 

**Computer organization** 

The Processor

### Introduction

- CPU performance factors
  - Instruction count
    - Determined by ISA and compiler
  - CPI and Cycle time
    - Determined by CPU hardware
- We will examine a simplified (non pipelined)
  LEGv8 implementation
- Simple subset, shows most aspects
  - Memory reference: LDUR, STUR
  - Arithmetic/logical: add, sub, and, or, slt
  - Control transfer: beq, j



### **Instruction Execution**

- PC → instruction memory, fetch instruction
- Register numbers → register file, read registers
- Depending on instruction class
  - Use ALU to calculate
    - Arithmetic result
    - Memory address for load/store
    - Branch target address
  - Access data memory for load/store
  - PC ← target address or PC + 4

# Logic Design Basics

- Information encoded in binary
  - Low voltage = 0, High voltage = 1
  - One wire per bit
  - Multi-bit data encoded on multi-wire buses
- Combinational element
  - Operate on data
  - Output is a function of input
- State (sequential) elements
  - Store information



## **Combinational Elements**

- AND-gate
  - Y = A & B

- Multiplexer
  - Y = S ? I1 : I0



Adder

$$\mathbf{Y} = \mathbf{A} + \mathbf{B}$$



Arithmetic/Logic Unit

• 
$$Y = F(A, B)$$



## Sequential Elements

- Register: stores data in a circuit
  - Uses a clock signal to determine when to update the stored value
  - Edge-triggered: update when Clk changes from 0 to 1





## Sequential Elements

- Register with write control
  - Only updates on clock edge when write control input is 1
  - Used when stored value is required later



# **Clocking Methodology**

- Combinational logic transforms data during clock cycles
  - Between clock edges
  - Input from state elements, output to state element
  - Longest delay determines clock period







## **Building a Datapath**

- Datapath
  - Elements that process data and addresses in the CPU
    - Registers, ALUs, mux's, memories, ...
- We will build a LEGv8 datapath incrementally
  - Refining the overview design

### **Instruction Fetch**



### **R-Format Instructions**

- Read two register operands
- Perform arithmetic/logical operation
- Write register result



a. Registers

b. ALU

#### **Load/Store Instructions**

- Read register operands
- Calculate address using 16-bit offset
  - Use ALU, but sign-extend offset
- Load: Read memory and update register
- Store: Write register value to memory



a. Data memory unit

b. Sign extension unit

#### **Branch Instructions**

- Read register operands
- Compare operands
  - Use ALU, subtract and check Zero output
- Calculate target address
  - Sign-extend displacement
  - Shift left 2 places (word displacement)
  - Add to PC + 4
    - Already calculated by instruction fetch

#### **Branch Instructions**



# Composing the Elements

- First-cut data path does an instruction in one clock cycle
  - Each datapath element can only do one function at a time
  - Hence, we need separate instruction and data memories
- Use multiplexers where alternate data sources are used for different instructions

# R-Type/Load/Store Datapath



# **Full Datapath**



#### **ALU Control**

- ALU used for
  - Load/Store: F = add
  - Branch: F = subtract
  - R-type: F depends on opcode

| ALU control | Function     |  |  |
|-------------|--------------|--|--|
| 0000        | AND          |  |  |
| 0001        | OR           |  |  |
| 0010        | add          |  |  |
| 0110        | subtract     |  |  |
| 0111        | pass input b |  |  |
| 1100        | NOR          |  |  |



#### **ALU Control**

- Assume 2-bit ALUOp derived from opcode
  - Combinational logic derives ALU control

| opcode | ALUOp | Operation                  | Opcode field | ALU function | ALU<br>control |
|--------|-------|----------------------------|--------------|--------------|----------------|
| LDUR   | 00    | load register              | XXXXXXXXXX   | add          | 0010           |
| STUR   | 00    | store register             | XXXXXXXXXX   | add          | 0010           |
| CBZ    | 01    | compare and branch on zero | XXXXXXXXXX   | pass input b | 0111           |
| R-type | 10    | add                        | 100000       | add          | 0010           |
|        |       | subtract                   | 100010       | subtract     | 0110           |
|        |       | AND                        | 100100       | AND          | 0000           |
|        |       | ORR                        | 100101       | OR           | 0001           |

### **The Main Control Unit**

opcode

#### Control signals derived from instruction

|                              | · ·          |         |       |     |     |  |  |  |  |
|------------------------------|--------------|---------|-------|-----|-----|--|--|--|--|
| Bit positions                | 31:21        | 20:16   | 15:10 | 9:5 | 4:0 |  |  |  |  |
| a. R-type instruction        |              |         |       |     |     |  |  |  |  |
|                              |              |         |       |     |     |  |  |  |  |
| Field                        | 1986 or 1984 | addres  | s 0   | Rn  | Rt  |  |  |  |  |
| Bit positions                | 31:21        | 20:12   | 11:10 | 9:5 | 4:0 |  |  |  |  |
| b. Load or store instruction |              |         |       |     |     |  |  |  |  |
|                              |              |         |       |     |     |  |  |  |  |
| Field                        | 180          | address |       |     | Rt  |  |  |  |  |
| Bit positions                | 31:26        |         | 4:0   |     |     |  |  |  |  |

Rm

shamt

Rn

Rd

c. Conditional branch instruction

Field

## **Datapath With Control**



# **R-Type Instruction**



### **Load Instruction**



### **CBZ** Instruction



# Implementing Uncnd'l Branch



- Jump uses word address
- Update PC with concatenation of
  - Top 4 bits of old PC
  - 26-bit jump address
  - **00**
- Need an extra control signal decoded from opcode

# Datapath With B Added



#### Performance Issues

- Longest delay determines clock period
  - Critical path: load instruction
  - Instruction memory → register file → ALU → data memory → register file
- Not feasible to vary period for different instructions
- Violates design principle
  - Making the common case fast
- Improve performance by pipelining