Single-Cycle Processor — PainlessProgramming.com

Ch 7: Single-Cycle Processor

RISC-V Microarchitecture — Datapath, Control Signals, Performance Analysis & Exam Practice

§1 Foundation

Microarchitecture Overview

Microarchitecture = how to implement an ISA in hardware. Same ISA (RISC-V) can have multiple microarchitectures with different performance/cost trade-offs.

Single-Cycle

Each instruction executes in exactly one clock cycle. Simple but slow — clock period limited by the longest instruction (lw).

Multicycle (not in syllabus)

Instruction broken into shorter steps. Each step takes one cycle. Faster clock, but more cycles per instruction.

Pipelined

Multiple instructions execute simultaneously in different stages. Best throughput, but hazards must be handled. (Page 2)

Performance Equation — The Most Important Formula

Execution Time = (#Instructions) × CPI × Tc
CPI = Cycles per Instruction  |  Tc = Clock Period (seconds/cycle)

Single-Cycle CPI = 1

Every instruction, regardless of complexity, uses exactly 1 clock cycle. But Tc is long.

IPC = 1/CPI

Instructions per Cycle. Single-cycle IPC = 1. Pipelining aims to keep IPC near 1 with a faster clock.

Critical Path → Tc

Clock period is set by the longest combinational path through the datapath. For single-cycle, that's the lw instruction.
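
As a quick check of the performance equation, here is a minimal Python sketch. The function names are hypothetical, and the inputs (100 billion instructions, a 750 ps clock) are the figures used in the performance section later on this page.

```python
def execution_time(num_instructions, cpi, tc_seconds):
    """Execution time = #instructions x CPI x clock period."""
    return num_instructions * cpi * tc_seconds


def ipc(cpi):
    """Instructions per cycle is the reciprocal of CPI."""
    return 1 / cpi


# Single-cycle: CPI = 1, Tc = 750 ps, 100 billion instructions.
single_cycle_time = execution_time(100e9, 1, 750e-12)
print(f"{single_cycle_time:.1f} s")  # about 75 s
print(ipc(1))                        # 1.0
```

Lowering Tc helps only if CPI does not rise by a larger factor, which is the trade-off the multicycle and pipelined designs explore.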

RISC-V Instructions We Implement

| Type | Instructions | Key Characteristic |
| --- | --- | --- |
| R-type | add, sub, and, or, slt | Both operands from register file |
| I-type (Memory) | lw | ALU computes address; reads memory |
| S-type (Memory) | sw | ALU computes address; writes memory |
| B-type (Branch) | beq | Compares registers; may change PC |
| I-type (ALU) | addi | One operand is sign-extended immediate |
| J-type | jal | Jump and link; saves PC+4 to rd |

§2 Datapath

Building the Datapath

The datapath is built incrementally, starting with lw (the most complex instruction). All other instructions reuse this hardware with different control signals.

Key Datapath Components

Program Counter (PC)

32-bit register storing the address of the current instruction. Updated every cycle: PC ← PCNext. A mux selects between PC+4 (sequential) and PCTarget (taken branch or jal).

Instruction Memory

Read-only during execution. Addressed by PC. Outputs 32-bit instruction word. In single-cycle, separate from data memory.

Register File

32 × 32-bit registers. Two read ports (A1→RD1, A2→RD2) and one write port (A3, WD3, WE3). Reads combinationally; writes on clock edge.

Sign Extend / ImmExt

Takes the immediate field from the instruction and sign-extends to 32 bits. ImmSrc[1:0] selects which format (I, S, B, J-type).

ALU

Performs arithmetic/logic. Controlled by ALUControl[2:0]. Outputs: ALUResult (32-bit) and Zero flag (used for branches).

Data Memory

For lw: reads data at address given by ALUResult. For sw: writes RD2 to that address. WE (write enable) controlled by MemWrite.

Critical Muxes & Their Control Signals

| Control signal / mux | Effect |
| --- | --- |
| ALUSrc | 0 = second ALU input is RD2 (register). 1 = second ALU input is ImmExt (immediate). Selects between a register operand (R-type, beq) and an immediate operand (lw, sw, addi). |
| ResultSrc[1:0] | 00 = ALUResult (R-type, addi). 01 = ReadData (lw). 10 = PCPlus4 (jal). Selects what gets written back to the register file. |
| PCSrc | 0 = PC+4 (normal sequential). 1 = PCTarget (branch taken or jal). PCTarget = PC + ImmExt (sign-extended offset). |
| Branch target adder | Uses PC as one input and ImmExt as the other. The PC+4 adder always adds 4 to PC. These are two separate adders in single-cycle. |
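
The three muxes described above can be sketched as plain selection functions. This is an illustrative Python model, not hardware; the function names are hypothetical and simply mirror the control signal names used on this page.

```python
def alu_src_mux(alu_src, rd2, imm_ext):
    """Second ALU operand: register value (0) or extended immediate (1)."""
    return imm_ext if alu_src else rd2


def result_mux(result_src, alu_result, read_data, pc_plus4):
    """Write-back value: 00 = ALUResult, 01 = ReadData, 10 = PCPlus4."""
    choices = {0b00: alu_result, 0b01: read_data, 0b10: pc_plus4}
    return choices[result_src]


def pc_mux(pc_src, pc_plus4, pc_target):
    """Next PC: sequential (0) or branch/jump target (1)."""
    return pc_target if pc_src else pc_plus4


# lw writes back the memory data; jal writes back PC+4.
print(result_mux(0b01, alu_result=0x2000, read_data=10, pc_plus4=0x10))  # 10
print(result_mux(0b10, alu_result=0x2000, read_data=10, pc_plus4=0x10))  # 16
```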

lw Datapath — The Critical Path

lw rd, imm(rs1): data flows through ALL 6 elements:
1. Fetch: PC → Instr
2. RF Read: rs1 → RD1
3. ImmExt: imm sign-extended
4. ALU (Add): rs1 + imm = addr
5. Data Mem: read at addr
6. RF Write: rd ← data

This is why lw sets the critical path. It uses EVERY component: PC, InstMem, RegFile read, ImmExt, ALU, DataMem, RegFile write.

💡 Why single-cycle needs TWO memories

In a single clock cycle, we must simultaneously fetch the instruction (from instruction memory) AND read/write data (from data memory). With one shared memory, you'd get a conflict — both the instruction fetch and lw's data read happen at the same time.


§3 Instruction Execution

How Each Instruction Uses the Datapath

Same hardware, different control signals. Understanding this is key for the "which datapath is active?" and "what instruction is this?" exam questions.

lw rd, imm(rs1) — Load Word

Active signals: RegWrite=1, ALUSrc=1, MemWrite=0, ResultSrc=01, PCSrc=0, ALUControl=Add

PC→InstrMem→decode rs1,rd,imm → RegFile reads rs1 → ImmExt extends imm → ALU adds rs1+imm (address) → DataMem reads at address → result written to rd → PC=PC+4

sw rs2, imm(rs1) — Store Word

Active signals: RegWrite=0, ALUSrc=1, MemWrite=1, ResultSrc=XX, PCSrc=0, ALUControl=Add

RegFile reads rs1 AND rs2 → ImmExt extends imm (S-type, ImmSrc=01) → ALU adds rs1+imm (address) → DataMem writes RD2 (rs2's value) to address → No write-back to RegFile → PC=PC+4
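
The lw and sw walk-throughs above can be mimicked in a few lines of Python. This is a behavioural sketch under stated assumptions: the register file is a 32-entry list, data memory is a dict keyed by byte address that holds whole words, and the helper names `lw` and `sw` are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def lw(regs, dmem, rd, rs1, imm):
    """rd <- DataMem[rs1 + imm]; RegWrite=1, ResultSrc=01."""
    addr = (regs[rs1] + imm) & MASK32   # ALU adds base and offset
    if rd != 0:                         # x0 is hardwired to zero in RISC-V
        regs[rd] = dmem.get(addr, 0)    # assumption: unwritten memory reads 0


def sw(regs, dmem, rs2, rs1, imm):
    """DataMem[rs1 + imm] <- rs2; MemWrite=1, RegWrite=0."""
    addr = (regs[rs1] + imm) & MASK32   # same address calculation as lw
    dmem[addr] = regs[rs2]              # RD2 goes to the memory write port


regs = [0] * 32
regs[9] = 0x2004
dmem = {0x2000: 10}

lw(regs, dmem, rd=6, rs1=9, imm=-4)     # reads address 0x2000
sw(regs, dmem, rs2=6, rs1=9, imm=8)     # writes address 0x200C
print(regs[6], dmem[0x200C])            # 10 10
```

Both instructions share the ALU add for the address; they differ only in which write enable is asserted.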

add/sub/and/or rd, rs1, rs2 — R-type

Active signals: RegWrite=1, ALUSrc=0, MemWrite=0, ResultSrc=00, PCSrc=0

RegFile reads rs1 AND rs2 → ALU operates on RD1 and RD2 directly (ALUSrc=0, no immediate) → ALUResult written to rd → DataMem NOT accessed → PC=PC+4

beq rs1, rs2, offset — Branch Equal

Active signals: RegWrite=0, ALUSrc=0, MemWrite=0, ALUControl=Sub, Branch=1

RegFile reads rs1 and rs2 → ALU subtracts → Zero flag = 1 if rs1==rs2 → PCSrc = Branch AND Zero → If taken: PC=PCTarget=PC+ImmExt, else PC=PC+4. Two adders: one for PC+4, one for branch target.

addi rd, rs1, imm — Add Immediate

Active signals: RegWrite=1, ALUSrc=1, MemWrite=0, ResultSrc=00, PCSrc=0, ALUControl=Add

Same as R-type ADD but ALUSrc=1 (second operand is ImmExt, not a register). ImmSrc=00 (I-type format). Result written to rd.

jal rd, offset — Jump and Link

Active signals: RegWrite=1, Jump=1, ResultSrc=10, MemWrite=0

PCTarget = PC + ImmExt (J-type). PC is set to PCTarget unconditionally. rd ← PC+4 (return address). ResultSrc=10 selects PCPlus4 for the write-back value.

🎯 Exam Trap: beq PCSrc

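For beq, PCSrc = Branch AND Zero, not just Zero! When Branch=0 (not a branch instruction), the Zero flag is ignored even if it happens to be 1, because an AND gate masks it in hardware. With jal included, the full expression is PCSrc = (Branch AND Zero) OR Jump, so PCSrc can still be 1 for a non-branch instruction when Jump=1.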
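
A minimal sketch of the PC-select logic, written in Python for illustration. It uses the full expression PCSrc = (Branch AND Zero) OR Jump given in the practice questions on this page; the function names are hypothetical.

```python
def pc_src(branch, zero, jump):
    """PCSrc = (Branch AND Zero) OR Jump, returned as 0 or 1."""
    return int(bool((branch and zero) or jump))


def next_pc(pc, imm_ext, branch, zero, jump):
    """PCNext is PCTarget when PCSrc=1, otherwise PC+4 (32-bit wraparound)."""
    if pc_src(branch, zero, jump):
        return (pc + imm_ext) & 0xFFFFFFFF
    return (pc + 4) & 0xFFFFFFFF


print(pc_src(branch=0, zero=1, jump=0))   # 0: Zero alone does nothing
print(pc_src(branch=1, zero=1, jump=0))   # 1: taken beq
print(pc_src(branch=0, zero=0, jump=1))   # 1: jal
print(hex(next_pc(0xC, 20, branch=1, zero=1, jump=0)))  # 0x20
```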


§4 Control

The Control Unit

The control unit takes the opcode (and funct3, funct7) and generates all the control signals. It has two parts: Main Decoder and ALU Decoder.

Main Decoder — Opcode → Control Signals

The main decoder looks at bits [6:0] of the instruction (the opcode field).

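| op[6:0] | Instruction | RegWrite | ImmSrc | ALUSrc | MemWrite | ResultSrc | Branch | ALUOp | Jump |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0000011 | lw | 1 | 00 | 1 | 0 | 01 | 0 | 00 | 0 |
| 0100011 | sw | 0 | 01 | 1 | 1 | XX | 0 | 00 | 0 |
| 0110011 | R-type | 1 | XX | 0 | 0 | 00 | 0 | 10 | 0 |
| 1100011 | beq | 0 | 10 | 0 | 0 | XX | 1 | 01 | 0 |
| 0010011 | I-type ALU (addi) | 1 | 00 | 1 | 0 | 00 | 0 | 10 | 0 |
| 1101111 | jal | 1 | 11 | X | 0 | 10 | 0 | XX | 1 |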
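
The main decoder is a pure lookup, so it can be sketched as a Python dictionary keyed by opcode. This is an illustration of the table above, not the author's implementation: `None` stands in for a don't-care (X) entry, and the names `MAIN_DECODER` and `main_decoder` are hypothetical.

```python
X = None  # don't-care entry

MAIN_DECODER = {
    0b0000011: dict(name="lw",     RegWrite=1, ImmSrc=0b00, ALUSrc=1, MemWrite=0,
                    ResultSrc=0b01, Branch=0, ALUOp=0b00, Jump=0),
    0b0100011: dict(name="sw",     RegWrite=0, ImmSrc=0b01, ALUSrc=1, MemWrite=1,
                    ResultSrc=X,    Branch=0, ALUOp=0b00, Jump=0),
    0b0110011: dict(name="R-type", RegWrite=1, ImmSrc=X,    ALUSrc=0, MemWrite=0,
                    ResultSrc=0b00, Branch=0, ALUOp=0b10, Jump=0),
    0b1100011: dict(name="beq",    RegWrite=0, ImmSrc=0b10, ALUSrc=0, MemWrite=0,
                    ResultSrc=X,    Branch=1, ALUOp=0b01, Jump=0),
    0b0010011: dict(name="I-type ALU", RegWrite=1, ImmSrc=0b00, ALUSrc=1, MemWrite=0,
                    ResultSrc=0b00, Branch=0, ALUOp=0b10, Jump=0),
    0b1101111: dict(name="jal",    RegWrite=1, ImmSrc=0b11, ALUSrc=X, MemWrite=0,
                    ResultSrc=0b10, Branch=0, ALUOp=X,    Jump=1),
}


def main_decoder(instr):
    """Look up the control signals from instruction bits [6:0]."""
    return MAIN_DECODER[instr & 0x7F]


signals = main_decoder(0x00940A63)   # the beq encoding used in the exam practice
print(signals["name"], signals["Branch"], signals["ALUOp"])  # beq 1 1
```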

ALU Decoder — ALUOp + funct3 + funct7 → ALUControl

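| ALUOp | Instruction | funct3 | funct7[5] | ALUControl[2:0] | Operation |
| --- | --- | --- | --- | --- | --- |
| 00 | lw / sw | x | x | 000 | ADD |
| 01 | beq | x | x | 001 | SUB (check Zero) |
| 10 | R-type / I-type | 000 | 0 | 000 | ADD (add, addi) |
| 10 | R-type | 000 | 1 | 001 | SUB (sub) |
| 10 | R-type / I-type | 111 | x | 010 | AND (and, andi) |
| 10 | R-type / I-type | 110 | x | 011 | OR (or, ori) |
| 10 | R-type / I-type | 010 | x | 101 | SLT (slt, slti) |

Note: for addi, instruction bit 30 (the funct7[5] position) is part of the immediate, so it can be 1. The textbook decoder therefore also checks op[5] and selects SUB only when op[5] and funct7[5] are both 1.
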
💡 Two-Level Decoding

ALUOp is an intermediate signal from the main decoder to the ALU decoder. It avoids looking at funct3/funct7 when they don't matter (e.g. lw always adds, regardless of funct fields). Think of it as: main decoder says "what kind of ALU operation?" and ALU decoder says "specifically which one?"
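
The second decoding level can be sketched as a small Python function. This is a hedged illustration of the ALU decoder table: the extra `op5` parameter (instruction bit 5) follows the textbook design, where SUB is selected only when op[5] and funct7[5] are both 1, so that an addi whose immediate happens to set bit 30 still adds. The function name and defaults are hypothetical.

```python
def alu_decoder(alu_op, funct3=0, funct7_5=0, op5=1):
    """Return ALUControl[2:0] from ALUOp, funct3, funct7[5] and op[5]."""
    if alu_op == 0b00:          # lw / sw: always add
        return 0b000
    if alu_op == 0b01:          # beq: always subtract
        return 0b001
    if alu_op != 0b10:
        raise ValueError("unsupported ALUOp")
    if funct3 == 0b000:         # add/addi vs sub
        return 0b001 if (op5 and funct7_5) else 0b000
    by_funct3 = {0b111: 0b010,  # and, andi
                 0b110: 0b011,  # or, ori
                 0b010: 0b101}  # slt, slti
    return by_funct3[funct3]


print(format(alu_decoder(0b10, funct3=0b110), "03b"))              # 011 (or)
print(format(alu_decoder(0b10, funct3=0, funct7_5=1), "03b"))      # 001 (sub)
print(format(alu_decoder(0b10, funct3=0, funct7_5=1, op5=0), "03b"))  # 000 (addi)
```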

ImmSrc Encoding

| ImmSrc[1:0] | Type | Immediate Bits | Used by |
| --- | --- | --- | --- |
| 00 | I-type | inst[31:20] | lw, addi, jalr |
| 01 | S-type | inst[31:25], inst[11:7] | sw |
| 10 | B-type | inst[31], [7], [30:25], [11:8] | beq |
| 11 | J-type | inst[31], [19:12], [20], [30:21] | jal |
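
The Extend unit can be modelled directly from the ImmSrc table. The Python sketch below is illustrative: `imm_ext` and `sign_extend` are hypothetical helper names, and the example encodings in the usage lines are standard RV32I encodings of the instructions named in the comments. B-type and J-type offsets get a 0 appended as bit 0.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def imm_ext(instr, imm_src):
    """Extract and sign-extend the immediate selected by ImmSrc[1:0]."""
    if imm_src == 0b00:   # I-type: inst[31:20]
        return sign_extend(instr >> 20, 12)
    if imm_src == 0b01:   # S-type: inst[31:25], inst[11:7]
        imm = ((instr >> 25) << 5) | ((instr >> 7) & 0x1F)
        return sign_extend(imm, 12)
    if imm_src == 0b10:   # B-type: inst[31], [7], [30:25], [11:8], 0
        imm = (((instr >> 31) & 1) << 12) | (((instr >> 7) & 1) << 11) \
            | (((instr >> 25) & 0x3F) << 5) | (((instr >> 8) & 0xF) << 1)
        return sign_extend(imm, 13)
    if imm_src == 0b11:   # J-type: inst[31], [19:12], [20], [30:21], 0
        imm = (((instr >> 31) & 1) << 20) | (((instr >> 12) & 0xFF) << 12) \
            | (((instr >> 20) & 1) << 11) | (((instr >> 21) & 0x3FF) << 1)
        return sign_extend(imm, 21)
    raise ValueError("ImmSrc must be a 2-bit value")


print(imm_ext(0xFFC4A303, 0b00))  # lw x6, -4(x9)   -> -4
print(imm_ext(0x0064A423, 0b01))  # sw x6, 8(x9)    -> 8
print(imm_ext(0x00940A63, 0b10))  # beq s0, s1, +20 -> 20
print(imm_ext(0x0080006F, 0b11))  # jal x0, +8      -> 8
```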

§5 Datapath Tracing

Reading Control Signal Tables — Exam Skill

The exam gives you signals and asks what instruction it is, or gives you code and asks which signals are active. Master this lookup table.

🔑 Method: Code → Signals

1. Identify the instruction type from the mnemonic.
2. Look up its opcode in the main decoder table.
3. If R-type/I-type ALU, also look at funct3 and funct7[5] for ALUControl.
4. List all the active signals: RegWrite, ImmSrc, ALUSrc, MemWrite, ResultSrc, Branch/Jump, ALUControl.

🔑 Method: Signals → Code

1. Check RegWrite, MemWrite to narrow down type.
2. Check ALUSrc: 0=R-type or beq (both operands from registers), 1=I/S-type (immediate operand).
3. Check ResultSrc: 01=lw, 10=jal, 00=ALU result.
4. Check Branch and Jump for beq/jal.
5. Use ALUControl + ImmSrc to pin down the exact instruction.

Worked Example: Identify Instruction from Signals

Given: RegWrite=1, ALUSrc=1, MemWrite=0, ResultSrc=01, Branch=0, Jump=0

Step 1: RegWrite=1 → writes to register file → not sw or beq.
Step 2: ALUSrc=1 → second ALU input is immediate → not R-type.
Step 3: ResultSrc=01 → result is ReadData from memory → this is lw!
Step 4: MemWrite=0 → confirms read (not write). Answer: lw
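
The signals-to-code method can be written as a short decision function. This Python sketch is one possible ordering of the checks, not the only one, and the function name `identify_instruction` is hypothetical. It covers only the six instruction classes implemented on this page.

```python
def identify_instruction(reg_write, alu_src, mem_write, result_src, branch, jump):
    """Map main-decoder outputs back to an instruction class."""
    if jump:
        return "jal"
    if branch:
        return "beq"
    if mem_write:
        return "sw"
    if reg_write and result_src == 0b01:
        return "lw"
    if reg_write and alu_src:
        return "I-type ALU (e.g. addi)"
    if reg_write:
        return "R-type"
    return "unknown"


# The worked example: RegWrite=1, ALUSrc=1, MemWrite=0, ResultSrc=01.
print(identify_instruction(1, 1, 0, 0b01, 0, 0))  # lw
```

To separate add from sub, or and, and so on, the ALUControl value still has to be checked as in step 5 of the method.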

Worked Example: Signals for "or s4, s5, s6"

or rd, rs1, rs2 — R-type with funct3=110

RegWrite=1 (writes rd), ImmSrc=XX (don't care, no immediate used), ALUSrc=0 (both operands from RF), MemWrite=0 (no memory), ResultSrc=00 (ALUResult to RF), Branch=0, Jump=0, ALUOp=10, ALUControl=011 (OR, since funct3=110).

Worked Example: Signals for "beq s0, s1, target"

beq rs1, rs2, offset — B-type

RegWrite=0, ImmSrc=10 (B-type), ALUSrc=0 (both from RF), MemWrite=0, ResultSrc=XX, Branch=1, Jump=0, ALUControl=001 (SUB). Zero = 1 if s0==s1. PCSrc = Branch AND Zero = 1 AND Zero.


§6 Performance

Single-Cycle Performance Analysis

The clock period must accommodate the slowest instruction. With single-cycle, CPI=1 always, but Tc is expensive.

Critical Path Formula

T_c,single = t_pcq_PC + 2·t_mem + t_RFread + t_ALU + t_mux + t_RFsetup
The lw critical path: PC register → Instr Memory → RF read → ALU → Data Memory → Result mux → RF write setup
⚠️ Why 2× tmem?

lw accesses memory TWICE: once to fetch the instruction (instruction memory) and once to read data (data memory). Both happen in the same clock cycle, and both are on the critical path for lw.

Standard Component Delays (from textbook)

| Element | Parameter | Delay (ps) |
| --- | --- | --- |
| Register (PC) clock-to-Q | t_pcq_PC | 40 |
| Register setup time | t_setup | 50 |
| Multiplexer | t_mux | 30 |
| AND-OR gate | t_AND-OR | 20 |
| ALU | t_ALU | 120 |
| Decoder (Control Unit) | t_dec | 25 |
| Extend unit | t_ext | 35 |
| Memory read | t_mem | 200 |
| Register file read | t_RFread | 100 |
| Register file setup | t_RFsetup | 60 |

Worked Calculation

// Calculate single-cycle clock period
T_c = t_pcq + 2*t_mem + t_RFread + t_ALU + t_mux + t_RFsetup
    = 40 + 2(200) + 100 + 120 + 30 + 60
    = 40 + 400 + 100 + 120 + 30 + 60
    = 750 ps

For 100 billion instructions:
Execution Time = N × CPI × T_c
               = (100 × 10^9) × 1 × (750 × 10^-12 s)
               = 75 seconds
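
The same calculation can be reproduced by summing the delays along the lw path. This is a minimal Python sketch using the delay values from the table above; the names `DELAYS_PS` and `LW_CRITICAL_PATH` are hypothetical, and integer picoseconds are used to avoid rounding noise.

```python
# Component delays in picoseconds, taken from the table above.
DELAYS_PS = {
    "t_pcq": 40, "t_mem": 200, "t_RFread": 100,
    "t_ALU": 120, "t_mux": 30, "t_RFsetup": 60,
}

# Elements on the lw critical path, in order. t_mem appears twice:
# once for the instruction fetch and once for the data read.
LW_CRITICAL_PATH = ["t_pcq", "t_mem", "t_RFread", "t_ALU",
                    "t_mem", "t_mux", "t_RFsetup"]

tc_ps = sum(DELAYS_PS[stage] for stage in LW_CRITICAL_PATH)
print(tc_ps)    # 750

num_instructions = 100 * 10**9          # 100 billion, CPI = 1
seconds = num_instructions * tc_ps / 10**12
print(seconds)  # 75.0
```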
✅ Key Insight: CPI=1 is the advantage

Single-cycle has CPI=1, which is optimal, but Tc=750 ps is the problem. Pipelining reduces Tc to ~350 ps while keeping average CPI near 1 (1.23 in the table below), for a speedup of about 1.7×.

Performance Comparison (textbook example)

| Processor | Tc | CPI | Execution Time (100B instr) | Speedup vs Single-Cycle |
| --- | --- | --- | --- | --- |
| Single-Cycle | 750 ps | 1.0 | 75 s | 1× (baseline) |
| Multicycle | ~375 ps | 4.12 | 155 s | 0.5× (SLOWER!) |
| Pipelined | 350 ps | 1.23 | 43 s | 1.7× faster |
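
The speedup column follows from the execution-time column: speedup = baseline time / new time. A short Python sketch, using only the execution times listed in the table (the dictionary name is hypothetical):

```python
# Execution times in seconds for 100 billion instructions, from the table.
EXEC_TIME_S = {"Single-Cycle": 75, "Multicycle": 155, "Pipelined": 43}

baseline = EXEC_TIME_S["Single-Cycle"]
speedup = {name: baseline / t for name, t in EXEC_TIME_S.items()}

for name, factor in speedup.items():
    print(f"{name}: {factor:.2f}x")
# Single-Cycle: 1.00x
# Multicycle: 0.48x
# Pipelined: 1.74x
```

A factor below 1 means the design is slower than the single-cycle baseline, which is why the multicycle row is flagged.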

§7 Exam Prep

Practice MCQs

These questions follow the pattern of past exam questions. Each question is followed by its answer and explanation.

Q: In a single-cycle processor, which instruction determines the minimum clock period?
A) add   B) beq   C) lw   D) sw
C) lw. The lw instruction uses every component in the datapath: PC, instruction memory, register file (read), sign extension, ALU, data memory, and register file (write). This makes it the longest path and sets Tc.
Q: For a single-cycle RISC-V processor, what is the CPI?
A) Depends on instruction   B) Always 1   C) Between 1 and 5   D) 4.12
B) Always 1. By definition, single-cycle means every instruction completes in exactly one clock cycle. The clock is long enough to handle even the worst case (lw), so all instructions finish within it.
Q: The control signals are: RegWrite=1, ALUSrc=0, MemWrite=0, ResultSrc=00, Branch=0, ALUOp=10, funct3=000, funct7[5]=0. What instruction is executing?
A) sub   B) addi   C) add   D) lw
C) add. ALUSrc=0 → R-type (both operands from RF). ResultSrc=00 → result is ALUResult. ALUOp=10 + funct3=000 + funct7[5]=0 → ALUControl=000 → ADD. Not sub (which has funct7[5]=1). Not addi (ALUSrc would be 1).
Q: What does PCSrc=1 mean in the single-cycle datapath?
A) Fetch next instruction at PC+4   B) Jump to PCTarget   C) Stall the pipeline   D) Write PC to register
B) Jump to PCTarget. PCSrc selects the PC mux input. PCSrc=0 → PC+4 (sequential), PCSrc=1 → PCTarget (branch/jump target = PC + sign-extended offset). PCSrc = (Branch AND Zero) OR Jump.
Q: For single-cycle with t_pcq=40, t_mem=200, t_RFread=100, t_ALU=120, t_mux=30, t_RFsetup=60. What is T_c?
A) 550 ps   B) 750 ps   C) 870 ps   D) 400 ps
B) 750 ps. T_c = t_pcq + 2×t_mem + t_RFread + t_ALU + t_mux + t_RFsetup = 40 + 400 + 100 + 120 + 30 + 60 = 750 ps. Note: 2×t_mem because lw reads both instruction memory AND data memory.
Q: For sw instruction: which signal is 1 — RegWrite or MemWrite?
A) Only RegWrite=1   B) Only MemWrite=1   C) Both=1   D) Both=0
B) Only MemWrite=1. sw writes to memory (MemWrite=1) but does NOT write back to the register file (RegWrite=0). There is no destination register — the data goes from rs2 to memory at address rs1+imm.
Q: The machine code 0x00940A63 is a beq instruction (B-type). The immediate bits decode to imm=20 (decimal). The beq is at address 0x0000_000C. What is PCTarget?
A) 0x20   B) 0x2C   C) 0x14   D) 0x1C
A) 0x20. PCTarget = PC + ImmExt. With beq at 0x0000_000C and imm=20: PCTarget = 0xC + 20 = 0xC + 0x14 = 0x20. The decoded B-type immediate is already a byte offset relative to the branch's own PC, so no extra shift is needed. Always confirm the immediate decoding from the bit fields.
Q: Which signal selects between the ALU result and memory read data to write back to the register file?
A) ALUSrc   B) MemWrite   C) ResultSrc   D) RegWrite
C) ResultSrc. ResultSrc[1:0] controls the mux feeding WD3 (write data) of the register file: 00=ALUResult (R-type, addi), 01=ReadData (lw), 10=PCPlus4 (jal). RegWrite=1 is also needed to actually enable the write.
🎯 Remember for the B-type immediate (beq)

The B-type immediate is a sign-extended byte offset. The instruction encodes bits [12:1] of the offset; bit 0 is not stored and is always 0, because RISC-V branch offsets are multiples of 2 bytes (in RV32I programs, where instructions are 4-byte aligned, bit 1 is 0 too). Reassembling the fields and appending that 0 gives the byte offset, which is then added to the PC of the branch itself.

From machine code 0x00940A63: op=1100011 (beq), extract imm fields → imm = {inst[31], inst[7], inst[30:25], inst[11:8], 0} = {0, 0, 000000, 1010, 0} = 0b10100 = 20.
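
The field extraction for 0x00940A63 can be checked step by step. This Python sketch is illustrative; the variable names are hypothetical, and the branch address 0xC is the one assumed in the PCTarget practice question.

```python
INSTR = 0x00940A63

opcode = INSTR & 0x7F            # inst[6:0]
funct3 = (INSTR >> 12) & 0x7     # inst[14:12]
rs1 = (INSTR >> 15) & 0x1F       # inst[19:15]
rs2 = (INSTR >> 20) & 0x1F       # inst[24:20]

# B-type immediate: {inst[31], inst[7], inst[30:25], inst[11:8], 0}
imm = (((INSTR >> 31) & 1) << 12) | (((INSTR >> 7) & 1) << 11) \
    | (((INSTR >> 25) & 0x3F) << 5) | (((INSTR >> 8) & 0xF) << 1)
if imm & (1 << 12):              # sign-extend the 13-bit offset
    imm -= 1 << 13

pc = 0x0000000C                  # assumed address of the beq
pc_target = pc + imm

print(format(opcode, "07b"), funct3)  # 1100011 0 -> beq
print(rs1, rs2)                       # 8 9 -> x8 (s0), x9 (s1)
print(imm, hex(pc_target))            # 20 0x20
```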
