detecting cycles in a directed graph

February 24, 2023

what will you learn

I’d like to present a simple example of finding a cycle in a directed graph. It’s a useful method for implementing topological sort, i.e. within dependency trees. If you have the ability to perform a depth - first search on a directed graph, you’re halfway there.

the graph

We’re gonna work with the graph below:

directed graph image

And its representation as an adjacency matrix is here:

g = {
    1: [2, 4],
    2: [5],
    3: [5, 6],
    4: [2],
    5: [4],
    6: [6],
}

detection - heuristic

As in the intro, I am assuming you’re familiar with how to perform a depth-first search. If not, the wikipedia entry on DFS is pretty good. Make sure to familiarize yourself with it if you’re not sure how to write a simple DFS.

Due to the recursive nature of the DFS implementation, detecting a cycle takes very little code. The crux of the idea behind how to detect a cycle though is that in order to declare a graph having a cycle, it needs to fulfill one predicate:

In order for a graph to be cyclic, a depth-first search of that graph needs to yield at least one back edge

That’s it. If a DFS finds a back edge, the graph has a cycle.

back edge?

A back edge in a graph is an edge that points to an already visited vertex. Let’s look at it in practice. If you were to DFS traverse the graph starting from node 1, your order of traversal would be:

1 -> 2 -> 5 -> 4

or

1 -> 4 -> 2 -> 5

This order of traversal assumes the basic property of your DFS algorithm - that is, a node once visited, does not get visited again. Look at the image of the graph again to verify my claim about the path:

directed graph image

Let’s analyze the first path. After reaching vertex 4, we terminate DFS - the only path that is left to take from 4 is to go back to 2. The edge between vertices 4 and 2 is the back edge. You can think of a back edge as an edge that would ‘take you back’ on a path that you’ve already taken.

Bear in mind though, that an edge cannot be a back edge without the context of DFS. There’s nothing special about edge from 4 to 2, apart from the fact that in the current context of our search path, at a point where we are visiting 4, we have already visited 2. Here’s a visual for you, with the back edge labelled in yellow with the exclamation mark. The green edges represent the path we’ve already taken. The proper nomenclature for them is ‘tree edges’, since they form the edges of the current DFS tree we’re building as we traverse.

directed graph cycle 1

one more example

What if we started our traversal at edge 3? In that case our journey is going to be pretty short. All that will happen is we will visit 3, and then promptly move to 6, just to discover it only points at itself and nowhere else. Since at the time of discovering the 6 -> 6 edge, we have already visited 6, the edge that points at 6 itself is a back edge. Here’s an image with the same labels as the previous one:

directed graph cycle 2

Bear in mind that there is another path starting from 3 that will result in a cycle. I will leave you to discover that path yourself.

detection - implementation

Let’s look at the code that implements that detection method. We’ll use executable pseudocode (python) to demonstrate that. Notice that in the call to the function I’m using graph g, as defined at the beginning of the post. Also worth noting its the same graph (duh) that you see in the drawings.

def dfs(graph, node, visited):

    visited.add(node)
    # printing the currently visited node. Will help you trace execution.
    print(node)

    for child in graph[node]:
        if child in visited:
            print(f"LOOP found. currently at {node}, looking at {child}")
        if child not in visited:
            dfs(graph, child, visited)

if __name__ == "__main__":
    dfs(g, 3, set())

Executing this code yields this output (SPOILER - you’ll find the other cycle I mentioned in the ‘one more example’ section):

Example A

3
5
4
2
LOOP found. currently at 2, looking at 5
6
LOOP found. currently at 6, looking at 6

Whereas running dfs(g, 1, set()) (starting from node 1), yields this:

Example B

1
2
5
4
LOOP found. currently at 4, looking at 2
LOOP found. currently at 1, looking at 4

When analysing those examples of the output you may be surprised how the algorithm jumps from vertex 2 to vertex 6 in example A, and from 4 to 1 in B. If that surprised you, make sure to take another stab at how DFS works wiki link again. The jumps are a quality of using a call stack to track most-recently visited nodes.

If you’re still unsure how this works, try implementing the iterative example from the pseudocode sections in the wikipedia article. There isn’t a way to inspect the call stack from within python, and so using an explicit array you may find the process easier to analyze.

wrapping up

Hope this helps. If you found this post and wanted to understand how to detect cycles in a directed graph and you didn’t, please email me or DM me on twitter (links below). I am happy to issue corrections to the explanation!


Written by Daniel Kaczmarczyk, a software engineer and educator. you can find me on twitter or email me at daniel.kacz@protonmail.com

a pale blue and yellow circle