Cuthill-McKee Algorithm


The Cuthill-McKee algorithm is an algorithm that permutes the node index of the nodes in a graph. This algorithm is generally used when:

  • You are creating a sparse matrix, in this case it allows you to reduce the memory needed to save the whole matrix.
  • You are performing operations through nodes, in such way that you also need the value of the neighbours of the node you are operating with.
    • In this case, you are actually reducing the cache miss rate. When you are trying to get a data from the memory, the CPU actually takes a page of data from the memory to the CPU caches, If there are multiple data that you need in the same page (cache hit), you are saving the time to get the rest of data from the RAM memory.
result
Result of the Cuthill-Mckee algorithm ordering used in this post is shown in the right of the figure in red. The graph of the reversed result is obtained by flipping the graph vertically (line with negative slope).

The algorithm is fairly simple, but many information in the internet isn’t intuitive, and there are even implementations that give wrong results (I know this because I used them as reference when implementing).

I can promise you that my implementation is correct. The repository contains the code in Fortran. Yes, I would preferred to have it written in Rust, or at least in C++, but it was a work for a Fortran program.

Contents

Implementation

Before getting into the details of the algorithm, we have to first define some concepts used later:

  • Degree of a node: Number of nodes connected to that node (or the number of edges formed by that node). See the figure below for reference.
  • Queue: A list of node that contains a range of nodes, candidates of the next index.
node-degree
Example graph, where numbers in red are the degree of the node.

The scheme of the Cuthill-McKee algorithm is fairly simple:

  1. Calculate the degree of all nodes
  2. Choose a starting node, which should have the smallest degree. There could be multiple nodes with such value, thus we have to choose a convention that can be choosing the minimum original index or maximum original index.
  3. Assign the node with a new index —which is 1—, and add the nodes connected to that node to a list of queue.
  4. Assign the next index to the node of the smallest degree in the queue, add the connected nodes that have never been added to queue into a new queue (different from the current queue), and remove the node from the current queue.
  5. Repeat the previous step until the current queue is empty. Move the contents of the new queue to the current queue and repeat until both queue are empty (sorting completed).1

Using the example of figure, the procedure is:

  1. Choose one of the nodes of [A,D,F][A,D,F] as starting node, as they have the smallest degree with a value of 2. AA will be chosen as example for the following steps, but the reader can verify that choosing any of them will provide a similar ordering.
  2. We add [B,C][B,C], nodes connected to AA, to a queue.
  3. We choose the next node with the smallest degree from the current queue, which are [B,C][B,C]. Again, similar ordering produces regardless of the node chosen. We will be choosing BB here, adding the nodes [D,E][D,E] to the new queue (the node CC have already been visited, therefore we will ignore it).
  4. Repeat the previous process, we now find the next index is CC. Adding the node FF that have not yet visited to the new queue, resulting in [D,E,F][D,E,F]. As the current queue is now empty, we move the contents of the new queue to current queue.
  5. No more nodes would be added to the new queue as all nodes have been visited. Finalizing the sorting we obtain the following ordering: [A,B,C,D,F,E][A,B,C,D,F,E]
    • Note that this is just one of the many possibilities, depending on the node chosen when having multiple nodes with the smallest degree to choose from.

In order to obtain Reversed Cuthill-McKee, we only have to reverse the index of the ordering we obtained. This is accomplished by subtracting the index from the total number of nodes:

RCMi=nCMi/i=1,2,3,n;n=number of nodesRCM_i=n-CM_i\quad/i=1,2,3\dots,n\quad;\quad n=\text{number of nodes}

The reversed form is more commonly used as it can generally reduce the bandwidth when Gaussian elimination is applied.

Validation

Some testing were done with the sorting:

MeshEdgesNodesOriginalRCMRandom
Mesh 1 (32 threads)1278532412\,785\,32443135494\,313\,5498767 s8\,767~\mathrm{s}8353 s8\,353~\mathrm{s}23611 s23\,611~\mathrm{s}
Mesh 2 (8 threads)288160288\,160115648115\,648200 s200~\mathrm{s}199 s199~\mathrm{s}225 s225~\mathrm{s}
Time taken for various test cases with 1000 iterations.

From the results we observe that the RCM reordering does not have a significant impact to the original ordering, perhaps already sorted by the meshing software: RCM only have an impact of ±5%\pm5\% to the original ordering. In some other test cases, it even made the iteration speed slower. In addition, the node search algorithm also have a significant impact in performance.

Recommended further reading:


Footnotes

  1. If this happens before all nodes have been visited, this means that the graph is disconnected in somewhere.