Cuthill-McKee Algorithm

2024-01-20

The Cuthill-McKee algorithm is an algorithm that permutes the node index of the nodes in a graph. This algorithm is generally used when:

You are creating a sparse matrix, in this case it allows you to reduce the memory needed to save the whole matrix.
You are performing operations through nodes, in such way that you also need the value of the neighbours of the node you are operating with.
- In this case, you are actually reducing the cache miss rate. When you are trying to get a data from the memory, the CPU actually takes a page of data from the memory to the CPU caches, If there are multiple data that you need in the same page (cache hit), you are saving the time to get the rest of data from the RAM memory.

The algorithm is fairly simple, but many information in the internet isn’t intuitive, and there are even implementations that give wrong results (I know this because I used them as reference when implementing).

I can promise you that my implementation is correct. The repository contains the code in Fortran. ~~Yes, I would preferred to have it written in Rust, or at least in C++, but it was a work for a Fortran program.~~

Implementation
Validation

Implementation

Before getting into the details of the algorithm, we have to first define some concepts used later:

Degree of a node: Number of nodes connected to that node (or the number of edges formed by that node). See the figure below for reference.
Queue: A list of node that contains a range of nodes, candidates of the next index.

node-degree — Example graph, where numbers in red are the degree of the node.

The scheme of the Cuthill-McKee algorithm is fairly simple:

Calculate the degree of all nodes
Choose a starting node, which should have the smallest degree. There could be multiple nodes with such value, thus we have to choose a convention that can be choosing the minimum original index or maximum original index.
Assign the node with a new index —which is 1—, and add the nodes connected to that node to a list of queue.
Assign the next index to the node of the smallest degree in the queue, add the connected nodes that have never been added to queue into a new queue (different from the current queue), and remove the node from the current queue.
Repeat the previous step until the current queue is empty. Move the contents of the new queue to the current queue and repeat until both queue are empty (sorting completed).¹

Using the example of figure, the procedure is:

Choose one of the nodes of $[A,D,F]$ as starting node, as they have the smallest degree with a value of 2. $A$ will be chosen as example for the following steps, but the reader can verify that choosing any of them will provide a similar ordering.
We add $[B,C]$ , nodes connected to $A$ , to a queue.
We choose the next node with the smallest degree from the current queue, which are $[B,C]$ . Again, similar ordering produces regardless of the node chosen. We will be choosing $B$ here, adding the nodes $[D,E]$ to the new queue (the node $C$ have already been visited, therefore we will ignore it).
Repeat the previous process, we now find the next index is $C$ . Adding the node $F$ that have not yet visited to the new queue, resulting in $[D,E,F]$ . As the current queue is now empty, we move the contents of the new queue to current queue.
No more nodes would be added to the new queue as all nodes have been visited. Finalizing the sorting we obtain the following ordering: $[A,B,C,D,F,E]$ $[A, B, C, D, F, E]$
- Note that this is just one of the many possibilities, depending on the node chosen when having multiple nodes with the smallest degree to choose from.

In order to obtain Reversed Cuthill-McKee, we only have to reverse the index of the ordering we obtained. This is accomplished by subtracting the index from the total number of nodes:

RCM_i=n-CM_i\quad/i=1,2,3\dots,n\quad;\quad n=\text{number of nodes}

The reversed form is more commonly used as it can generally reduce the bandwidth when Gaussian elimination is applied.

Validation

Some testing were done with the sorting:

Mesh	Edges	Nodes	Original	RCM	Random
Mesh 1 (32 threads)	$12\,785\,324$	$4\,313\,549$	$8\,767~\mathrm{s}$	$8\,353~\mathrm{s}$	$23\,611~\mathrm{s}$
Mesh 2 (8 threads)	$288\,160$	$115\,648$	$200~\mathrm{s}$	$199~\mathrm{s}$	$225~\mathrm{s}$

Time taken for various test cases with 1000 iterations.

From the results we observe that the RCM reordering does not have a significant impact to the original ordering, perhaps already sorted by the meshing software: RCM only have an impact of $\pm5\%$ to the original ordering. In some other test cases, it even made the iteration speed slower. In addition, the node search algorithm also have a significant impact in performance.

Cuthill-McKee Algorithm

Contents

Implementation

Validation

Footnotes