Dijkstra's Algorithm Parallel Implementation

Parallel Processing and High Performance Computing, Siena University, Fall 2024

Project Description

Dijkstra's algorithm is one of the most fundamental algorithms in graphing and mapping. Given a weighted and connected graph, Dijkstra's algorithm finds the shortest path between two nodes. With its efficiency, the algorithm is used for large-scale graphs and has many real-world applications such as IP routing and building road networks. This study focuses on parallelizing the algorithm through the priority queue using Java Threads and analyzing its performance. By introducing threading to Dijkstra's Algorithm, this study looks to improve the speedup and runtime of the program while also finding the shortest path. This study looks at different ways to parallelize the algorithm and key challenges to it, such as thread synchronization and overhead. For testing the algorithm, we used various graph of different sizes from the METAL project.

Edsger W. Dijkstra
Edsger W. Dijkstra, 1930-2002

Methods and Implementation


Each thread has its own program

Using a serial version of Dijkstra's Algorithm as a baseline, we made modifications to parallelize the program using Java threads. One way we considered parallelizing the program was having each thread compute its own instance of the program independently using random start and ending points. However, we decided not to use this method because it introduced challenges. For example, if one thread chooses two points that do not connect, the program would stall leading to inefficiency.

Spanning Tree Approach

While we could have used a spanning tree rather than the shortest points, we decided not to take this approach as it had several disadvantages. For example, Threads would need to share updates on the shortest path. This constant synchronization would lead to overhead when accessing the priority queue. Our goal is to have an efficient algorithm that improves speedup and runtimes, and this would cause us to have a limited speedup on the algorithm.

Priority Queue

Another method we considered was just parallelizing the priority queue, a key component of the algorithm. By parallelizing the priority queue, each thread would get its own access to the queue and add items to compute the shortest path. We chose this method because we felt it's the most effective way to parallelize the program as you can update the distance to a vertex’s neighbors concurrently. While we found this approach to be more efficient, there were disadvantages such as the fact that it does not compute the absolute shortest path, but rather gives an approximation of the shortest path.

The worker thread class in our program is responsible for parallelizing the priority queue. Each thread repeatedly calls poll() to process the next edge from the queue. When an edge is processed, the thread checks if the start vertex has been visited, if it has then it is skipped. If it has not been visited, it is added to the result map which stores the shortest path from the start vertex to each destination. The threads are synchronized to make sure only one thread updates the map at a time. The threads then add all of the neighbors of the starting vertex to the priority queue if they haven’t been visited already, this is also synchronized. Then the threads repeat the process until it finds the end vertex or all of the vertices have been visited.

Advantages and Disadvantages

Implementing Dijkstra's algorithm in C

Initially, we were going to code our program in C. After some time working on the serial version in C, we decided to pivot to Java threads as Java has more better features for multithreading.If users have different operating systems, it is more challenging to program with Pthreads, as Pthreads are platform dependent because you need a UNIX like system.

Features of C
Figure 1: Features of C (Source: Unstop)

Advantages of Java Threads

Java has synchronized blocks and locking/unlocking that makes it easier to share the priority queue. Java threads are platform independent which makes it easier for us to work. Multithreading gives us more control in parallelization compared to message passing. Java does not use any garbage pointers, which reduces the chances of memory management errors. Java has a big library for multithreading which has thread safe data structures such as the ConcurrentLinkedQueue which was used in our program. The ConcurrentLinkedQueue is a thread safe queue with built-in methods for queues such as peek(), poll(), and pop().

Comparing Java and C
Figure 2: Comparing Java and C (Source: Unstop)

Timings and Performance

Graphs Details

Name |V| |E| Start End
RPI 73 85 NY2@CR139 NY378/US9
London 2269 3425 A414/A1019/A1169 B2036/B2037
Wolf Creek 128 128 CO149@SanJuanDr CO17Ant@CRD.5
Flagstaff 349 413 US160@BIA16 AZ89@ESCWAY
NY 6774 8077 NY18@OldLakeRd NY27@End
United States 186250 221643 US82Bus_E/US271Bus_S MA10@EarSt

Download graphs:
RPI
London
Wolf Creek
Flagstaff
NY
USA

Timings (MS) and Path Length (MI)

Threads RPI London Wolf Creek Flagstaff NY USA
MS Mi MS Mi MS Mi MS Mi MS Mi MS Mi
1 100 9.59 305 68.92 111 184.64 181 282.82 229 582.23 4826 1853.50
2 94 9.59 311 64.92 116 184.64 162 282.82 212 582.23 4808 1857.72
4 92 9.59 312 64.92 1114 184.64 181 233.84 193 593.86 5002 1871.49
8 93 9.59 286 64.92 157 184.64 187 239.22 204 580.36 6859 1846.38
16 98 11.57 303 64.92 559 184.64 192 238.82 205 595.92 6006 1859.99
32 99 9.59 307 61.14 513 184.64 172 272.29 354 591.99 5891 1910.97
64 108 11.27 302 88.76 1837 184.64 198 272.29 238 636.88 14213 1880.99
128 107 6.61 318 64.92 486 184.64 232 272.29 290 587.94 65829 1973.67

Findings

We noticed there is not a big speedup with multiple threads. This is because the threads don’t have a lot of work to do before locking and unlocking which causes overhead. A lower number of threads is better, specifically two, four, or eight threads, because we get too much overhead with a large number of threads. This is shown in the NY graph. The shortest path is also approximated when using more than one threads. The best number of threads, whether it is two, four, or eight, depends on the graph. We get a different answer in the parallel version with each run, as the shortest path found is close to the serial version, which is always the correct answer. With parallelizing the program, the sacrifice we make for speedup is an approximation of the shortest path. However, sometimes the parallel version gives us the correct shortest path. Having four threads for this map is the most efficient number to solve the problem, although it can be slower than the one thread version.