simulating fetch-decode-execute of a simple cpu model

October 03, 2021

I wanted to play with simulating what real computer hardware does, after reading the theory in CS:APP. In pursuit of that, I have built a simple machine that, given a quite limited set of instructions, can operate on its internal memory.

Take note, that this is, compared to the way a modern CPU operates, a gross simplification. Modern CPUs don’t even operate on a single ALU, but on multiple execution units. This implementation is based on a simplified version of the Von Neumann architecture, that has been developed in 1945.


The computer I am simulating is going to have a tiny amount of memory, and I’m gonna start with 256 bytes.

# this is the entire memory of the computer. Think of it as RAM
# the layout is as follows
# byte 0x00: return value of the program
# bytes 0x01-0x07 other data - we can use them to store any arbitrary data
# bytes 0x08-0xff the instructions for the program
memory = bytearray(256)

memory layout

The memory layout is going to be simple - since the memory is simulating RAM, it is a contiguous addressable array of cells, and I am putting aside first eight for general values and the latter part for storing the program itself. Also first address, 0x00 is a special place where the output of the entire program is going to be stored.

instruction set encoding

For simplification of the instruction encoding, I am going to encode each of them as a full byte - which is pretty wasteful, but will pay off in the ease of encoding. Each parameter passed to them is also going to be a single byte.

For example, an instruction to lad data from byte 0x01 to register 1 will look like this:

load r1 0x01

So in hexadecimal notation it will be 0x010101, or with spaces for readability, 0x 01 01 01. First byte being the instruction, second the register index, and third the location of data to load from. So far so good.


My machine will have three registers, to keep it really Simple. Although I am basing the general ideas on a C implementation of LC-3, I am narrowing the scope (as you can see from the memory size, and amount of registers, for example).

# create three registers. PC, R1, and R2
registers = bytearray(3)
# set PC to point at the first instruction of the program
registers[0] = 8

PC will serve as a program counter - a pointer to where the current byte of instruction or parameter is stored at, and two general purpose registers - R1 and R2. Notice that I declare them as elements of a bytearray, so the maximum range of available data in each (just as in the memory array cells) is 0x00 to 0xFF (0 to 255).


I am going to use a Python dictionary with keys that describe the instruction and integers that map to opcodes. For the sake of double-sided lookup I am going to also reverse the dictionary so I can use the dict[key] syntax both ways. Remember this isn’t production technique, this is wild west of programming for fun. A large dictionary in memory inverted like that is pointless. But here all bets are off. Here goes:

# mapping of instructions to their hex opcodes
instructions = {
    'load': 1,      # load r1 addr
    'store': 2,     # store r2 addr
    'add': 3,       # add r1 r2 (Set r1 = r1 + r2)
    'sub': 4,       # sub r1 r2 (Set r1 = r1 - r2)
    'addi': 5,      # add immediate
    'subi': 6,      # sub immediate
    'jump': 7,      # move PC to instruction no. x
    'beqz': 8,      # branch if equal to zero
    'halt': 255,

inverted_instructions = {v: k for k, v in instructions.items()}

writing programs

Well then now, we have a small set of instructions, let’s write a program! My first groundbreaking piece of code is going to take over the world by loading two variables into registers, adding them, storing it in the output register (see section on memory layout earlier) and halting. Bear in mind that there is an assumption that there is data in those memory cells! Let’s make sure it indeed is there.A

# store data
memory[1] = 2
memory[2] = 2

And here goes the program:

program = """
load    r1 0x01
load    r2 0x02
add     r1 r2  
store   r1 0x00

Now, our vm has absolutely no faculties whatsoever to run. There is no main loop that simulates the CPU clock, or any instruction interpretation functionality. Golly, there exists not a way to parse the program from this text above into the actual opcodes! And this is where I am going to start.

parsing the program

The program I want to run is but a string of text. I need to make sure that I can do a few things. Firstly, turn this utf-8 string into an array of bytes representing the opcodes and their operands. Then I need to load it into the memory. Only then I can dream of making my monster alive.

def parse_instruction(line):
    """parses a single line of instruction into opcodes"""
    line = line.split()
    instruction = line[0]

    # halt is the only one-octet instruction
    if instruction == 'halt':
        output = bytearray(1)
        output[0] = instructions[instruction]
        output = bytearray(3)
        output[0] = instructions[instruction]
        output[1] = parse_operand(line[1])
        output[2] = parse_operand(line[2])

    return output

The first function, parse_instruction() can turn a single line into a bytearray of integers. Notice the yet undefined parse_operand(), and notice the minor redundancy in branching. Remember that this is wild west of writing for fun? Refactor later.

Let’s define parse_operand():

def parse_operand(operand):
    if operand.startswith('r'):
        return int(list(operand)[1])
        return int(operand, 16)

And finally, we can parse the whole program into a bytearray that we will be able to load into our memory!

def parse_program_into_opcode_array(program):
    # maximum length of our program in the VM is:
    # whole RAM: 256 bytes
    # first byte is the output value
    # next 8 bytes are for data
    # thus we can make only an array of 247 bytes
    bytecode_array = bytearray(247)
    current_byte = 0
    for instruction in program.strip().splitlines():
        parsed_instruction = parse_instruction(instruction)
        for byte in parsed_instruction:
            bytecode_array[current_byte] = byte
            current_byte += 1
    return bytecode_array

And load it.

def load_memory_with_program(program_bytecode_array, memory):
    # program starts at 0x08
    index = 8
    for byte in program_bytecode_array:
        memory[index] = byte
        index += 1

main loop

Since we are simulating a fetch-decode-execute loop, our program is just a loop that goes through cases of what kind of opcodes are there, with what operands, and moves the program counter along. Please notice how all operations increase the program counter (registers[0]) by 3 to move along. Here I have implemented only the instructions that are in my sample program above. If I will return to simulating this, I intend to expand the functionality of that little VM. Anyway, here goes:

debug = True

while True:
    # get value of program counter
    pc = registers[0]
    # find out what operation to perform
    op = inverted_instructions[memory[pc]]

    match op:
        case "load":
            target_register = memory[pc + 1]
            data_location = memory[pc + 2]
            data = memory[data_location]
            registers[target_register] = data
            if debug:
                print(f"loading into r{target_register}, value of {data}")
            registers[0] += 3
        case "store":
            register_to_take_value_from = memory[pc + 1]
            output_location = memory[pc + 2]
            memory[output_location] = registers[register_to_take_value_from]
            if debug:
                    f"storing data from r{register_to_take_value_from}, into {output_location}")
            registers[0] += 3
        case "add":
            register_a = memory[pc + 1]
            register_b = memory[pc + 2]
            registers[register_a] = \
                registers[register_a] + \
            if debug:
                    f"adding value from r{register_a} to value from r{register_b}")
            registers[0] += 3
        case "halt":
            print(f"halt encountered at {pc}")
    cycle += 1


There’s a lot to expand on. Not mentioning technicalities of refactoring or performance tweaks, it would be great to do things like implement some memory protection, visualise the memory (a 16x16 grid?), implement a step-by-step debugger, etc. Time and will permitting.

Written by Daniel Kaczmarczyk, a software engineer and educator. you can find me on twitter or email me at

a pale blue and yellow circle