Single Cycle RISC V Processor - A Detailed Guide

§1 Foundation

Microarchitecture Overview

Microarchitecture = how to implement an ISA in hardware. Same ISA (RISC-V) can have multiple microarchitectures with different performance/cost trade-offs.

Single-Cycle

Each instruction executes in exactly one clock cycle. Simple but slow — clock period limited by the longest instruction (lw).

Multicycle (not in syllabus)

Instruction broken into shorter steps. Each step takes one cycle. Faster clock, but more cycles per instruction.

Pipelined

Multiple instructions execute simultaneously in different stages. Best throughput, but hazards must be handled.

Performance Equation — The Most Important Formula

Execution Time = (#Instructions) × CPI × T_c

CPI = Cycles per Instruction | T_c = Clock Period (seconds/cycle)

Single-Cycle CPI = 1

Every instruction, regardless of complexity, uses exactly 1 clock cycle. But T_c is long.

IPC = 1/CPI

Instructions per Cycle. Single-cycle IPC = 1. Pipelining aims to keep IPC near 1 with a faster clock.
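To make the equation concrete, here is a minimal Python sketch that evaluates it. The helper names `execution_time` and `ipc` are hypothetical. The inputs are the textbook figures quoted in §6: 100 billion instructions, 750 ps at CPI 1 for single-cycle, and 350 ps at CPI 1.23 for pipelined.

```python
def execution_time(n_instructions: float, cpi: float, tc_seconds: float) -> float:
    """Execution Time = (#Instructions) x CPI x T_c, in seconds."""
    return n_instructions * cpi * tc_seconds


def ipc(cpi: float) -> float:
    """Instructions per cycle is simply the reciprocal of CPI."""
    return 1.0 / cpi


N = 100e9  # 100 billion instructions, the workload used in Section 6

single = execution_time(N, cpi=1.0, tc_seconds=750e-12)      # single-cycle
pipelined = execution_time(N, cpi=1.23, tc_seconds=350e-12)  # pipelined

print(f"single-cycle: {single:.2f} s, IPC = {ipc(1.0):.2f}")
print(f"pipelined:    {pipelined:.2f} s, IPC = {ipc(1.23):.2f}")
print(f"speedup:      {single / pipelined:.2f}x")
```

The pipelined design loses some IPC (1/1.23) but gains more from the shorter clock, which is why its total time is lower.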

Critical Path → T_c

Clock period is set by the longest combinational path through the datapath. For single-cycle, that's the lw instruction.

RISC-V Instructions We Implement

Type	Instructions	Key Characteristic
R-type	add, sub, and, or, slt	Both operands from register file
I-type (Memory)	lw	ALU computes address; reads memory
S-type (Memory)	sw	ALU computes address; writes memory
B-type (Branch)	beq	Compares registers; may change PC
I-type (ALU)	addi	One operand is sign-extended immediate
J-type	jal	Jump and link; saves PC+4 to rd

§2 Datapath

Building the Datapath

The datapath is built incrementally, starting with lw (the most complex instruction). All other instructions reuse this hardware with different control signals.

Key Datapath Components

Program Counter (PC)

32-bit register storing the address of the current instruction. Updated every cycle: PC ← PCNext. A mux selects between PC+4 (sequential) and PCTarget (taken branch or jal).

Instruction Memory

Read-only during execution. Addressed by PC. Outputs 32-bit instruction word. In single-cycle, separate from data memory.

Register File

32 × 32-bit registers. Two read ports (A1→RD1, A2→RD2) and one write port (A3, WD3, WE3). Reads combinationally; writes on clock edge.
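The read/write timing can be sketched as a small behavioral model. `RegisterFile`, `read` and `clock_edge` are illustrative names, not a real library. The model also assumes the standard RISC-V rule that register x0 is hardwired to zero, which the description above does not mention.

```python
class RegisterFile:
    """Behavioral sketch: 32 x 32-bit registers, two combinational read ports
    (A1 -> RD1, A2 -> RD2) and one write port (A3, WD3, WE3) that updates on a clock edge."""

    def __init__(self) -> None:
        self.regs = [0] * 32

    def read(self, a1: int, a2: int) -> tuple[int, int]:
        # Combinational reads: the outputs reflect the current contents immediately.
        return self.regs[a1], self.regs[a2]

    def clock_edge(self, we3: bool, a3: int, wd3: int) -> None:
        # Writes take effect only on the clock edge, and only when WE3 is asserted.
        # Assumption: x0 always reads as 0, so writes to register 0 are ignored.
        if we3 and a3 != 0:
            self.regs[a3] = wd3 & 0xFFFFFFFF


rf = RegisterFile()
rf.clock_edge(we3=True, a3=5, wd3=42)  # e.g. the write-back at the end of an instruction
print(rf.read(5, 0))
```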

Sign Extend / ImmExt

Takes the immediate field from the instruction and sign-extends to 32 bits. ImmSrc[1:0] selects which format (I, S, B, J-type).

ALU

Performs arithmetic/logic. Controlled by ALUControl[2:0]. Outputs: ALUResult (32-bit) and Zero flag (used for branches).
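Below is a behavioral sketch of the ALU. It assumes the ALUControl encodings from the ALU decoder table in §4: 000 ADD, 001 SUB, 010 AND, 011 OR, 101 SLT. The function name `alu` is hypothetical. Operands are treated as 32-bit patterns, and SLT compares them as signed two's-complement numbers.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def alu(src_a: int, src_b: int, alu_control: int) -> tuple[int, int]:
    """Return (ALUResult, Zero) for the given 3-bit ALUControl."""
    a, b = src_a & MASK32, src_b & MASK32
    if alu_control == 0b000:
        result = a + b                   # ADD (lw/sw address, add, addi)
    elif alu_control == 0b001:
        result = a - b                   # SUB (sub, beq comparison)
    elif alu_control == 0b010:
        result = a & b                   # AND
    elif alu_control == 0b011:
        result = a | b                   # OR
    elif alu_control == 0b101:
        result = 1 if to_signed(a) < to_signed(b) else 0  # SLT (signed)
    else:
        raise ValueError(f"unsupported ALUControl {alu_control:03b}")
    result &= MASK32                     # wrap to 32 bits
    zero = 1 if result == 0 else 0       # Zero flag used by beq
    return result, zero


print(alu(7, 7, 0b001))  # beq-style compare of equal values sets Zero
```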

Data Memory

For lw: reads data at address given by ALUResult. For sw: writes RD2 to that address. WE (write enable) controlled by MemWrite.

Critical Muxes & Their Control Signals

// Control signal → MUX → Effect

ALUSrc: 0 = second ALU input is RD2 (register); 1 = second ALU input is ImmExt (immediate). Selects a register operand (R-type, beq) or an immediate operand (lw, sw, addi).

ResultSrc[1:0]: 00 = ALUResult (R-type, addi); 01 = ReadData (lw); 10 = PCPlus4 (jal). Selects what gets written back to the register file.

PCSrc: 0 = PC+4 (normal sequential); 1 = PCTarget (branch taken or jal). PCTarget = PC + ImmExt (sign-extended offset).

Branch target adder: a dedicated adder computes PCTarget = PC + ImmExt, while a second adder always computes PCPlus4 = PC + 4. The single-cycle datapath therefore has two adders in addition to the ALU.
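The three muxes and the two dedicated PC adders can be sketched as plain Python functions. The function names are hypothetical. The select encodings follow the descriptions above, and PC arithmetic wraps at 32 bits; `imm_ext` is passed as a signed integer.

```python
MASK32 = 0xFFFFFFFF


def alu_src_mux(alu_src: int, rd2: int, imm_ext: int) -> int:
    # ALUSrc: 0 -> RD2 (register operand), 1 -> ImmExt (immediate operand)
    return imm_ext if alu_src else rd2


def result_mux(result_src: int, alu_result: int, read_data: int, pc_plus4: int) -> int:
    # ResultSrc[1:0]: 00 -> ALUResult, 01 -> ReadData, 10 -> PCPlus4
    choices = {0b00: alu_result, 0b01: read_data, 0b10: pc_plus4}
    return choices[result_src]


def pc_next(pc: int, imm_ext: int, pc_src: int) -> int:
    # Two adders that are separate from the ALU:
    pc_plus4 = (pc + 4) & MASK32          # always computed
    pc_target = (pc + imm_ext) & MASK32   # branch/jump target
    return pc_target if pc_src else pc_plus4  # PCSrc: 0 -> PC+4, 1 -> PCTarget


print(hex(pc_next(0xC, 20, pc_src=1)))  # taken branch at 0xC with offset 20
```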

lw Datapath — The Critical Path

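lw rd, imm(rs1): data flows through all six of these elements in one cycle:

PC → Instr Fetch → RF Read (rs1→RD1), in parallel with ImmExt (imm sign ext) → ALU: Add (rs1+imm=addr) → Data Mem (read addr) → RF Write (rd←data)

This is why lw sets the critical path. It chains more slow components in series than any other instruction: PC, instruction memory, register file read, ALU, data memory, and register file write. Sign extension runs in parallel with the register file read, so it adds no delay as long as it is the faster of the two.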

💡 Why single-cycle needs TWO memories

In a single clock cycle, we must simultaneously fetch the instruction (from instruction memory) AND read/write data (from data memory). With one shared memory, you'd get a conflict — both the instruction fetch and lw's data read happen at the same time.

§3 Instruction Execution

How Each Instruction Uses the Datapath

Same hardware, different control signals. Understanding this is key for the "which datapath is active?" and "what instruction is this?" exam questions.

lw rd, imm(rs1) — Load Word

Active signals: RegWrite=1, ALUSrc=1, MemWrite=0, ResultSrc=01, PCSrc=0, ALUControl=Add

PC→InstrMem→decode rs1,rd,imm → RegFile reads rs1 → ImmExt extends imm → ALU adds rs1+imm (address) → DataMem reads at address → result written to rd → PC=PC+4

sw rs2, imm(rs1) — Store Word

Active signals: RegWrite=0, ALUSrc=1, MemWrite=1, ResultSrc=XX, PCSrc=0, ALUControl=Add

RegFile reads rs1 AND rs2 → ImmExt extends imm (S-type, ImmSrc=01) → ALU adds rs1+imm (address) → DataMem writes RD2 (rs2's value) to address → No write-back to RegFile → PC=PC+4

add/sub/and/or rd, rs1, rs2 — R-type

Active signals: RegWrite=1, ALUSrc=0, MemWrite=0, ResultSrc=00, PCSrc=0

RegFile reads rs1 AND rs2 → ALU operates on RD1 and RD2 directly (ALUSrc=0, no immediate) → ALUResult written to rd → DataMem NOT accessed → PC=PC+4

beq rs1, rs2, offset — Branch Equal

Active signals: RegWrite=0, ALUSrc=0, MemWrite=0, ALUControl=Sub, Branch=1

RegFile reads rs1 and rs2 → ALU subtracts → Zero flag = 1 if rs1==rs2 → PCSrc = Branch AND Zero → If taken: PC=PCTarget=PC+ImmExt, else PC=PC+4. Two adders: one for PC+4, one for branch target.

addi rd, rs1, imm — Add Immediate

Active signals: RegWrite=1, ALUSrc=1, MemWrite=0, ResultSrc=00, PCSrc=0, ALUControl=Add

Same as R-type ADD but ALUSrc=1 (second operand is ImmExt, not a register). ImmSrc=00 (I-type format). Result written to rd.

jal rd, offset — Jump and Link

Active signals: RegWrite=1, Jump=1, ResultSrc=10, MemWrite=0

PCTarget = PC + ImmExt (J-type). PC is set to PCTarget unconditionally. rd ← PC+4 (return address). ResultSrc=10 selects PCPlus4 for the write-back value.

🎯 Exam Trap: beq PCSrc

For beq, PCSrc = Branch AND Zero, not just Zero! If Branch=0 (not a branch instruction), PCSrc stays 0 even if Zero happens to be 1. This is an AND gate in hardware. The complete equation, which also covers jal, is PCSrc = (Branch AND Zero) OR Jump.
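The PCSrc logic is small enough to check exhaustively. This sketch (the name `pc_src` is hypothetical) implements PCSrc = (Branch AND Zero) OR Jump and prints all eight input combinations. The row Branch=0, Zero=1, Jump=0 shows the exam trap: Zero alone does not redirect the PC.

```python
def pc_src(branch: int, zero: int, jump: int) -> int:
    """PCSrc = (Branch AND Zero) OR Jump."""
    return (branch & zero) | jump


# Full truth table over the three 1-bit inputs.
for branch in (0, 1):
    for zero in (0, 1):
        for jump in (0, 1):
            print(f"Branch={branch} Zero={zero} Jump={jump} -> PCSrc={pc_src(branch, zero, jump)}")
```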

§4 Control

The Control Unit

The control unit takes the opcode (and funct3, funct7) and generates all the control signals. It has two parts: Main Decoder and ALU Decoder.

Main Decoder — Opcode → Control Signals

The main decoder looks at bits [6:0] of the instruction (the opcode field).

op[6:0]	Instruction	RegWrite	ImmSrc	ALUSrc	MemWrite	ResultSrc	Branch	ALUOp	Jump
0000011	lw	1	00	1	0	01	0	00	0
0100011	sw	0	01	1	1	XX	0	00	0
0110011	R-type	1	XX	0	0	00	0	10	0
1100011	beq	0	10	0	0	XX	1	01	0
0010011	I-type ALU (addi)	1	00	1	0	00	0	10	0
1101111	jal	1	11	X	0	10	0	XX	1
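The main decoder is essentially a lookup table keyed by op[6:0], so it can be sketched directly from the rows above. `MAIN_DECODER`, `FIELDS` and `main_decode` are hypothetical names. `None` stands in for the table's X (don't care) entries.

```python
# Signal order used in every row below.
FIELDS = ("RegWrite", "ImmSrc", "ALUSrc", "MemWrite", "ResultSrc", "Branch", "ALUOp", "Jump")

# opcode: (name, RegWrite, ImmSrc, ALUSrc, MemWrite, ResultSrc, Branch, ALUOp, Jump)
MAIN_DECODER = {
    0b0000011: ("lw",         1, 0b00, 1,    0, 0b01, 0, 0b00, 0),
    0b0100011: ("sw",         0, 0b01, 1,    1, None, 0, 0b00, 0),
    0b0110011: ("R-type",     1, None, 0,    0, 0b00, 0, 0b10, 0),
    0b1100011: ("beq",        0, 0b10, 0,    0, None, 1, 0b01, 0),
    0b0010011: ("I-type ALU", 1, 0b00, 1,    0, 0b00, 0, 0b10, 0),
    0b1101111: ("jal",        1, 0b11, None, 0, 0b10, 0, None, 1),
}


def main_decode(instr: int) -> tuple[str, dict]:
    """Look up control signals from the opcode field instr[6:0]."""
    op = instr & 0x7F
    if op not in MAIN_DECODER:
        raise ValueError(f"unsupported opcode {op:07b}")
    name, *values = MAIN_DECODER[op]
    return name, dict(zip(FIELDS, values))


name, signals = main_decode(0x00940A63)  # the beq from the practice questions
print(name, signals)
```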

ALU Decoder — ALUOp + funct3 + funct7 → ALUControl

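ALUOp	Instruction	funct3	{op[5], funct7[5]}	ALUControl[2:0]	Operation
00	lw / sw	X	X	000	ADD
01	beq	X	X	001	SUB (check Zero)
10	R-type / I-type	000	00, 01, 10	000	ADD (add, addi)
10	R-type	000	11	001	SUB (sub)
10	R-type / I-type	111	X	010	AND (and, andi)
10	R-type / I-type	110	X	011	OR (or, ori)
10	R-type / I-type	010	X	101	SLT (slt, slti)

op[5] (instruction bit 5) is 1 for R-type (0110011) and 0 for I-type ALU (0010011). In an I-type instruction, bit 30 is part of the immediate rather than funct7[5]. Checking op[5] therefore prevents addi from decoding as SUB when its immediate happens to set bit 30.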
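Here is the ALU decoder as a Python function; `alu_decode` is a hypothetical name. Besides ALUOp and funct3, it takes `op5` (instruction bit 5: 1 for R-type, 0 for I-type ALU) and `funct7_5`. SUB is then chosen only for a genuine R-type sub, never for an addi whose immediate sets bit 30.

```python
def alu_decode(alu_op: int, funct3: int, op5: int, funct7_5: int) -> int:
    """Return the 3-bit ALUControl for the given decoder inputs."""
    if alu_op == 0b00:            # lw / sw: address calculation
        return 0b000              # ADD
    if alu_op == 0b01:            # beq: compare by subtracting
        return 0b001              # SUB
    if alu_op == 0b10:            # R-type or I-type ALU: look at funct fields
        if funct3 == 0b000:
            # sub only when op5 = 1 (R-type) AND funct7[5] = 1; add/addi otherwise
            return 0b001 if (op5 and funct7_5) else 0b000
        if funct3 == 0b010:
            return 0b101          # SLT
        if funct3 == 0b110:
            return 0b011          # OR
        if funct3 == 0b111:
            return 0b010          # AND
    raise ValueError(f"unsupported ALUOp={alu_op:02b} funct3={funct3:03b}")


print(f"{alu_decode(0b10, 0b110, 1, 0):03b}")  # or -> 011
```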

💡 Two-Level Decoding

ALUOp is an intermediate signal from the main decoder to the ALU decoder. It avoids looking at funct3/funct7 when they don't matter (e.g. lw always adds, regardless of funct fields). Think of it as: main decoder says "what kind of ALU operation?" and ALU decoder says "specifically which one?"

ImmSrc Encoding

ImmSrc[1:0]	Type	Immediate Bits (sign bit is always inst[31])	Used by
00	I-type	inst[31:20]	lw, addi, jalr
01	S-type	{inst[31:25], inst[11:7]}	sw
10	B-type	{inst[31], inst[7], inst[30:25], inst[11:8], 0}	beq
11	J-type	{inst[31], inst[19:12], inst[20], inst[30:21], 0}	jal
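The extend unit can be sketched as bit slicing plus sign extension. The function names `field`, `sign_extend` and `imm_ext` are hypothetical. The result is a signed Python integer, so 32-bit wrapping is left to whatever adds it to PC or a register.

```python
def field(instr: int, hi: int, lo: int) -> int:
    """Extract instr[hi:lo] as an unsigned integer."""
    return (instr >> lo) & ((1 << (hi - lo + 1)) - 1)


def sign_extend(value: int, width: int) -> int:
    """Interpret the low `width` bits of value as two's complement."""
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value


def imm_ext(instr: int, imm_src: int) -> int:
    if imm_src == 0b00:  # I-type: instr[31:20]
        return sign_extend(field(instr, 31, 20), 12)
    if imm_src == 0b01:  # S-type: {instr[31:25], instr[11:7]}
        return sign_extend((field(instr, 31, 25) << 5) | field(instr, 11, 7), 12)
    if imm_src == 0b10:  # B-type: {instr[31], instr[7], instr[30:25], instr[11:8], 0}
        imm = ((field(instr, 31, 31) << 12) | (field(instr, 7, 7) << 11)
               | (field(instr, 30, 25) << 5) | (field(instr, 11, 8) << 1))
        return sign_extend(imm, 13)
    if imm_src == 0b11:  # J-type: {instr[31], instr[19:12], instr[20], instr[30:21], 0}
        imm = ((field(instr, 31, 31) << 20) | (field(instr, 19, 12) << 12)
               | (field(instr, 20, 20) << 11) | (field(instr, 30, 21) << 1))
        return sign_extend(imm, 21)
    raise ValueError("ImmSrc must be 0b00..0b11")


print(imm_ext(0x00940A63, 0b10))  # B-type immediate of the practice-question beq
```

The B and J cases rebuild the offset with a 0 in bit 0 before extending. This is why their sign bit sits at bit 12 and bit 20 rather than bit 11.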

§5 Datapath Tracing

Reading Control Signal Tables — Exam Skill

The exam gives you signals and asks what instruction it is, or gives you code and asks which signals are active. Master this lookup table.

🔑 Method: Code → Signals

1. Identify the instruction type from the mnemonic.
2. Look up its opcode in the main decoder table.
3. If R-type/I-type ALU, also look at funct3 and funct7[5] for ALUControl.
4. List all the active signals: RegWrite, ImmSrc, ALUSrc, MemWrite, ResultSrc, Branch/Jump, ALUControl.

🔑 Method: Signals → Code

1. Check RegWrite, MemWrite to narrow down type.
2. Check ALUSrc: 0 = register operand (R-type or beq), 1 = immediate operand (lw, sw, addi).
3. Check ResultSrc: 01=lw, 10=jal, 00=ALU result.
4. Check Branch and Jump for beq/jal.
5. Use ALUControl + ImmSrc to pin down the exact instruction.
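The signals-to-code method above can be sketched as a decision function. `identify` and `R_TYPE_OPS` are hypothetical names, and the input is a dict of whatever signals the question gives. The checks follow the listed steps, except that Jump/ResultSrc=10 is tested first because only jal can produce them. Only the six instruction types in this guide are recognized.

```python
# ALUControl -> R-type mnemonic (encodings from the ALU decoder table)
R_TYPE_OPS = {0b000: "add", 0b001: "sub", 0b010: "and", 0b011: "or", 0b101: "slt"}


def identify(sig: dict) -> str:
    """Guess the instruction from control signals; missing keys are treated as unknown."""
    # Step 4 (checked first): Jump=1 or ResultSrc=10 can only be jal
    if sig.get("Jump") == 1 or sig.get("ResultSrc") == 0b10:
        return "jal"
    # Step 1: no register write means sw (writes memory) or beq (branches)
    if sig.get("RegWrite") == 0:
        if sig.get("MemWrite") == 1:
            return "sw"
        if sig.get("Branch") == 1:
            return "beq"
        return "unknown"
    # Step 3: writing back ReadData means lw
    if sig.get("ResultSrc") == 0b01:
        return "lw"
    # Steps 2 and 5: an immediate operand means addi (the only I-type ALU op here);
    # otherwise ALUControl names the R-type operation
    if sig.get("ALUSrc") == 1:
        return "addi" if sig.get("ALUControl", 0b000) == 0b000 else "unknown"
    return R_TYPE_OPS.get(sig.get("ALUControl"), "unknown")


print(identify(dict(RegWrite=1, ALUSrc=1, MemWrite=0, ResultSrc=0b01, Branch=0, Jump=0)))
```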

Worked Example: Identify Instruction from Signals

Given: RegWrite=1, ALUSrc=1, MemWrite=0, ResultSrc=01, Branch=0, Jump=0

Step 1: RegWrite=1 → writes to register file → not sw or beq.
Step 2: ALUSrc=1 → second ALU input is immediate → not R-type.
Step 3: ResultSrc=01 → result is ReadData from memory → this is lw!
Step 4: MemWrite=0 → confirms read (not write). Answer: lw

Worked Example: Signals for "or s4, s5, s6"

or rd, rs1, rs2 — R-type with funct3=110

RegWrite=1 (writes rd), ImmSrc=XX (don't care, no immediate used), ALUSrc=0 (both operands from RF), MemWrite=0 (no memory), ResultSrc=00 (ALUResult to RF), Branch=0, Jump=0, ALUOp=10 → ALUControl=011 (OR, since funct3=110).

Worked Example: Signals for "beq s0, s1, target"

beq rs1, rs2, offset — B-type

RegWrite=0, ImmSrc=10 (B-type), ALUSrc=0 (both from RF), MemWrite=0, ResultSrc=XX, Branch=1, Jump=0, ALUControl=001 (SUB). Zero = 1 if s0==s1. PCSrc = Branch AND Zero = 1 AND Zero.

§6 Performance

Single-Cycle Performance Analysis

The clock period must accommodate the slowest instruction. With single-cycle, CPI=1 always, but T_c is long.

Critical Path Formula

T_{c_single} = t_{pcq_PC} + 2t_mem + t_RFread + t_ALU + t_mux + t_RFsetup

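The lw critical path: PC register → Instr Memory → RF read → ALU → Data Memory → ResultSrc mux → RF write setup. The formula assumes that the register-file read (t_RFread = 100 ps in the table below) is slower than the parallel decode → extend → ALUSrc-mux path (t_dec + t_ext + t_mux = 25 + 35 + 30 = 90 ps). Under that assumption the RF read, not the extend unit, is on the critical path.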

⚠️ Why 2× t_mem?

lw accesses memory TWICE: once to fetch the instruction (instruction memory) and once to read data (data memory). Both happen in the same clock cycle, and both are on the critical path for lw.

Standard Component Delays (from textbook)

Element	Parameter	Delay (ps)
Register (PC) clock-to-Q	t_{pcq_PC}	40 ps
Register setup time	t_setup	50 ps
Multiplexer	t_mux	30 ps
AND-OR gate	t_AND-OR	20 ps
ALU	t_ALU	120 ps
Decoder (Control Unit)	t_dec	25 ps
Extend unit	t_ext	35 ps
Memory read	t_mem	200 ps
Register file read	t_RFread	100 ps
Register file setup	t_RFsetup	60 ps

Worked Calculation

// Calculate single-cycle clock period

T_c = t_pcq + 2*t_mem + t_RFread + t_ALU + t_mux + t_RFsetup
    = 40 + 2(200) + 100 + 120 + 30 + 60
    = 40 + 400 + 100 + 120 + 30 + 60
    = 750 ps

For 100 billion instructions:
Execution Time = N × CPI × T_c
               = (100 × 10^9) × 1 × (750 × 10^-12 s)
               = 75 seconds
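The same arithmetic as a Python sketch, using the delay table above. `DELAYS_PS` and `single_cycle_tc` are hypothetical names. The function uses the fuller form of the lw path, in which the ALU waits for the slower of two parallel routes: the register-file read (SrcA) and decode → extend → ALUSrc mux (SrcB). With these delays, it reduces to the 2·t_mem formula.

```python
# Component delays in picoseconds, copied from the table above.
DELAYS_PS = {
    "pcq_PC": 40, "setup": 50, "mux": 30, "AND_OR": 20, "ALU": 120,
    "dec": 25, "ext": 35, "mem": 200, "RFread": 100, "RFsetup": 60,
}


def single_cycle_tc(d: dict) -> int:
    """Clock period set by the lw path, in picoseconds."""
    # SrcA comes straight from the register file; SrcB (the immediate) must pass
    # decode -> extend -> ALUSrc mux. The ALU starts when the slower one is ready.
    operands_ready = max(d["RFread"], d["dec"] + d["ext"] + d["mux"])
    return (d["pcq_PC"] + d["mem"]       # PC clock-to-Q, instruction memory read
            + operands_ready + d["ALU"]  # operand read, address addition
            + d["mem"]                   # data memory read
            + d["mux"] + d["RFsetup"])   # ResultSrc mux, register file setup


tc = single_cycle_tc(DELAYS_PS)
print(f"T_c = {tc} ps")  # T_c = 750 ps
```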

✅ Key Insight: CPI=1 is the advantage

Single-cycle has CPI=1, which is optimal. But T_c=750ps is the problem. Pipelining reduces T_c to ~350ps while keeping the average CPI close to 1 (1.23 in the textbook example), achieving ~1.7× speedup.

Performance Comparison (textbook example)

Processor	T_c	CPI	Execution Time (100B instr)	Speedup vs Single-Cycle
Single-Cycle	750 ps	1.0	75 s	1× (baseline)
Multicycle	~375 ps	4.12	155 s	0.5× (SLOWER!)
Pipelined	350 ps	1.23	43 s	1.7× faster

§7 Exam Prep

Practice MCQs

Based on the pattern of exam questions you've seen.

Q: In a single-cycle processor, which instruction determines the minimum clock period?

A) add B) beq C) lw D) sw

✅ C) lw. lw chains the most components in series: PC, instruction memory, register file read, ALU, data memory, and register file write (sign extension runs in parallel with the register file read). This makes it the longest path and sets T_c.

Q: For a single-cycle RISC-V processor, what is the CPI?

A) Depends on instruction B) Always 1 C) Between 1 and 5 D) 4.12

✅ B) Always 1. By definition, single-cycle means every instruction completes in exactly one clock cycle. The clock is long enough to handle even the worst case (lw), so all instructions finish within it.

Q: The control signals are: RegWrite=1, ALUSrc=0, MemWrite=0, ResultSrc=00, Branch=0, ALUOp=10, funct3=000, funct7[5]=0. What instruction is executing?

A) sub B) addi C) add D) lw

✅ C) add. ALUSrc=0 → R-type (both operands from RF). ResultSrc=00 → result is ALUResult. ALUOp=10 + funct3=000 + funct7[5]=0 → ALUControl=000 → ADD. Not sub (which has funct7[5]=1). Not addi (ALUSrc would be 1).

Q: What does PCSrc=1 mean in the single-cycle datapath?

A) Fetch next instruction at PC+4 B) Jump to PCTarget C) Stall the pipeline D) Write PC to register

✅ B) Jump to PCTarget. PCSrc selects the PC mux input. PCSrc=0 → PC+4 (sequential), PCSrc=1 → PCTarget (branch/jump target = PC + sign-extended offset). PCSrc = (Branch AND Zero) OR Jump.

Q: For single-cycle with t_pcq=40, t_mem=200, t_RFread=100, t_ALU=120, t_mux=30, t_RFsetup=60. What is T_c?

A) 550 ps B) 750 ps C) 870 ps D) 400 ps

✅ B) 750 ps. T_c = t_pcq + 2×t_mem + t_RFread + t_ALU + t_mux + t_RFsetup = 40 + 400 + 100 + 120 + 30 + 60 = 750 ps. Note: 2×t_mem because lw reads both instruction memory AND data memory.

Q: For sw instruction: which signal is 1 — RegWrite or MemWrite?

A) Only RegWrite=1 B) Only MemWrite=1 C) Both=1 D) Both=0

✅ B) Only MemWrite=1. sw writes to memory (MemWrite=1) but does NOT write back to the register file (RegWrite=0). There is no destination register — the data goes from rs2 to memory at address rs1+imm.

Q: The machine code 0x00940A63 is a beq instruction (B-type). Decoding its immediate bits gives imm = 20 (decimal). The beq is located at address 0x0000_000C. What is PCTarget?

A) 0x20 B) 0x2C C) 0x14 D) 0x1C

✅ A) 0x20. PCTarget = PC + ImmExt = 0xC + 20 = 0xC + 0x14 = 0x20. B-type immediates are already byte offsets measured from the address of the branch itself (not from PC+4), so no extra scaling or adjustment is needed. Always confirm the immediate decoding from the bit fields.

Q: Which signal selects between the ALU result and memory read data to write back to the register file?

A) ALUSrc B) MemWrite C) ResultSrc D) RegWrite

✅ C) ResultSrc. ResultSrc[1:0] controls the mux feeding WD3 (write data) of the register file: 00=ALUResult (R-type, addi), 01=ReadData (lw), 10=PCPlus4 (jal). RegWrite=1 is also needed to actually enable the write.

🎯 Remember for the B-type immediate (beq)

The B-type immediate is a sign-extended byte offset. The instruction stores only bits [12:1] of the offset. Bit 0 is always 0 (branch targets are at least 2-byte aligned), so it is not encoded. Reassemble imm[12:1] from the scattered fields and append a 0; the result is the byte offset that gets added to PC.

From machine code 0x00940A63: op=1100011 (beq), and imm = {inst[31], inst[7], inst[30:25], inst[11:8], 0} = {0, 0, 000000, 1010, 0} = 0b0000000010100 = 20.
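The whole decode of 0x00940A63 can be checked with a few shifts and masks. This is a standalone sketch; the helper `field` is a hypothetical name. It pulls out the opcode, funct3, rs1 and rs2, reassembles the B-type immediate, sign-extends it from 13 bits and adds it to the branch's address 0x0000_000C. Registers x8 and x9 are s0 and s1 in the standard RISC-V ABI naming, so this is a `beq s0, s1, target`.

```python
def field(instr: int, hi: int, lo: int) -> int:
    """Extract instr[hi:lo]."""
    return (instr >> lo) & ((1 << (hi - lo + 1)) - 1)


instr = 0x00940A63
op = field(instr, 6, 0)
funct3 = field(instr, 14, 12)
rs1 = field(instr, 19, 15)
rs2 = field(instr, 24, 20)

# imm = {inst[31], inst[7], inst[30:25], inst[11:8], 0}
imm = ((field(instr, 31, 31) << 12) | (field(instr, 7, 7) << 11)
       | (field(instr, 30, 25) << 5) | (field(instr, 11, 8) << 1))
if imm & (1 << 12):   # sign-extend the 13-bit offset
    imm -= 1 << 13

pc = 0x0000_000C
pc_target = (pc + imm) & 0xFFFFFFFF

print(f"op={op:07b} funct3={funct3:03b} rs1=x{rs1} rs2=x{rs2} imm={imm} PCTarget={pc_target:#x}")
# op=1100011 funct3=000 rs1=x8 rs2=x9 imm=20 PCTarget=0x20
```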
