Dijkstra's algorithm is one of the most fundamental algorithms in graph theory and route mapping. Given a weighted, connected graph, Dijkstra's algorithm finds the shortest path between two nodes. Because of its efficiency, the algorithm scales to large graphs and has many real-world applications, such as IP routing and road network planning. This study focuses on parallelizing the algorithm through the priority queue using Java threads and analyzing its performance. By introducing threading to Dijkstra's algorithm, this study aims to improve the speedup and runtime of the program while still finding the shortest path. This study examines different ways to parallelize the algorithm and the key challenges involved, such as thread synchronization and overhead. For testing the algorithm, we used graphs of various sizes from the METAL project.
Using a serial version of Dijkstra's algorithm as a baseline, we made modifications to parallelize the program using Java threads. One way we considered parallelizing the program was to have each thread compute its own instance of the problem independently, using random start and end points. However, we decided against this method because it introduced challenges of its own. For example, if one thread chose two points with no connecting path, the program would stall, leading to inefficiency.
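For reference, the sketch below shows a minimal serial baseline of the kind we started from. It assumes an adjacency-list graph with vertices labeled 0 to n-1; the `Edge` record and class names are illustrative, not the project's actual code.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Minimal serial Dijkstra sketch (illustrative, not the project's actual code).
public class SerialDijkstra {
    record Edge(int to, double weight) {}

    static double[] shortestDistances(List<List<Edge>> adj, int start) {
        int n = adj.size();
        double[] dist = new double[n];
        Arrays.fill(dist, Double.POSITIVE_INFINITY);
        dist[start] = 0.0;

        // Min-heap ordered by tentative distance; entries are {vertex, distance}.
        PriorityQueue<double[]> pq =
                new PriorityQueue<>(Comparator.comparingDouble((double[] e) -> e[1]));
        pq.add(new double[]{start, 0.0});

        boolean[] visited = new boolean[n];
        while (!pq.isEmpty()) {
            int u = (int) pq.poll()[0];
            if (visited[u]) continue;        // stale queue entry: skip
            visited[u] = true;
            for (Edge e : adj.get(u)) {      // relax each outgoing edge of u
                double cand = dist[u] + e.weight();
                if (!visited[e.to()] && cand < dist[e.to()]) {
                    dist[e.to()] = cand;
                    pq.add(new double[]{e.to(), cand});
                }
            }
        }
        return dist;
    }
}
```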
We also considered building a shortest-path spanning tree rather than searching between two specific points, but we decided not to take this approach because it had several disadvantages. For example, threads would need to share updates on the shortest path, and this constant synchronization would create overhead when accessing the priority queue. Since our goal is an efficient algorithm with improved speedup and runtimes, this approach would have limited the achievable speedup.
Another method we considered was parallelizing just the priority queue, a key component of the algorithm. With a shared priority queue, each thread gets its own access to the queue and adds items as it computes the shortest path. We chose this method because it lets threads update the distances to a vertex's neighbors concurrently, which we felt was the most effective way to parallelize the program. While we found this approach to be more efficient, it has a disadvantage: it does not compute the exact shortest path, but rather an approximation of it.
The worker thread class in our program is responsible for parallelizing the priority queue. Each thread repeatedly calls poll() to take the next edge from the queue. When an edge is processed, the thread checks whether its start vertex has already been visited; if it has, the edge is skipped. If it has not been visited, the vertex is added to the result map, which stores the shortest path from the start vertex to each destination. The threads are synchronized so that only one thread updates the map at a time. Each thread then adds all of the vertex's unvisited neighbors to the priority queue; this step is also synchronized. The threads repeat this process until the end vertex is found or all of the vertices have been visited, as sketched below.
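The following is a minimal sketch of a worker in the style described above, assuming a shared ConcurrentLinkedQueue as the frontier and a synchronized result map; the class and field names are illustrative, not the project's actual code.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentLinkedQueue;

// Illustrative worker sketch: polls edges from a shared queue and records the
// first distance found for each vertex.
public class Worker implements Runnable {
    /** A frontier entry: `vertex` was reached with total distance `dist` from the start. */
    record Work(int vertex, double dist) {}

    private final List<List<double[]>> adj;          // adj.get(u) = list of {neighbor, weight}
    private final ConcurrentLinkedQueue<Work> queue; // shared frontier of edges to expand
    private final Map<Integer, Double> result;       // visited set + distance from start
    private final int end;                           // destination vertex

    Worker(List<List<double[]>> adj, ConcurrentLinkedQueue<Work> queue,
           Map<Integer, Double> result, int end) {
        this.adj = adj; this.queue = queue; this.result = result; this.end = end;
    }

    @Override
    public void run() {
        Work w;
        // Simplified termination: a worker exits when it sees an empty queue,
        // even though another worker might still add entries afterwards.
        while ((w = queue.poll()) != null) {
            synchronized (result) {                   // one thread updates the map at a time
                if (result.containsKey(w.vertex())) continue; // already visited: skip
                result.put(w.vertex(), w.dist());             // record first distance found
            }
            if (w.vertex() == end) return;            // destination reached
            for (double[] e : adj.get(w.vertex())) {
                int v = (int) e[0];
                synchronized (result) {               // enqueueing neighbors is also synchronized
                    if (!result.containsKey(v)) queue.add(new Work(v, w.dist() + e[1]));
                }
            }
        }
    }
}
```

Because ConcurrentLinkedQueue is FIFO rather than ordered by distance, the first distance recorded for a vertex is not necessarily the minimum, which is consistent with the approximate paths described above.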
Initially, we were going to code our program in C. After some time working on the serial version in C, we decided to pivot to Java threads, as Java has better features for multithreading. Pthreads are platform dependent, requiring a UNIX-like system, so programming with Pthreads is more challenging for users on different operating systems.
Java has synchronized blocks and locking/unlocking that make it easier to share the priority queue. Java threads are platform independent, which made them easier for us to work with. Multithreading gives us more control over parallelization compared to message passing. Java is garbage collected and does not expose raw pointers, which reduces the chance of memory management errors. Java also has a large multithreading library with thread-safe data structures, such as the ConcurrentLinkedQueue used in our program. The ConcurrentLinkedQueue is a thread-safe queue with built-in queue methods such as peek(), poll(), and offer().
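A small demonstration of these queue operations, under the assumption of a single-threaded caller for readability:

```java
import java.util.concurrent.ConcurrentLinkedQueue;

// Tiny demonstration of the thread-safe queue methods named above.
public class QueueDemo {
    public static void main(String[] args) {
        ConcurrentLinkedQueue<String> q = new ConcurrentLinkedQueue<>();
        q.offer("A");                     // enqueue without explicit locking
        q.offer("B");
        System.out.println(q.peek());     // "A" - inspect the head without removing it
        System.out.println(q.poll());     // "A" - remove and return the head
        System.out.println(q.poll());     // "B"
        System.out.println(q.poll());     // null - an empty queue returns null, not an exception
    }
}
```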
Test graphs from the METAL project:

| Name          | \|V\|  | \|E\|  | Start                | End           |
|---------------|--------|--------|----------------------|---------------|
| RPI           | 73     | 85     | NY2@CR139            | NY378/US9     |
| London        | 2269   | 3425   | A414/A1019/A1169     | B2036/B2037   |
| Wolf Creek    | 128    | 128    | CO149@SanJuanDr      | CO17Ant@CRD.5 |
| Flagstaff     | 349    | 413    | US160@BIA16          | AZ89@ESCWAY   |
| NY            | 6774   | 8077   | NY18@OldLakeRd       | NY27@End      |
| United States | 186250 | 221643 | US82Bus_E/US271Bus_S | MA10@EarSt    |
Runtime in milliseconds (ms) and reported path length in miles (mi) for each graph, by thread count:

| Threads | RPI (ms) | RPI (mi) | London (ms) | London (mi) | Wolf Creek (ms) | Wolf Creek (mi) | Flagstaff (ms) | Flagstaff (mi) | NY (ms) | NY (mi) | USA (ms) | USA (mi) |
|---------|----------|----------|-------------|-------------|-----------------|-----------------|----------------|----------------|---------|---------|----------|----------|
| 1       | 100      | 9.59     | 305         | 68.92       | 111             | 184.64          | 181            | 282.82         | 229     | 582.23  | 4826     | 1853.50  |
| 2       | 94       | 9.59     | 311         | 64.92       | 116             | 184.64          | 162            | 282.82         | 212     | 582.23  | 4808     | 1857.72  |
| 4       | 92       | 9.59     | 312         | 64.92       | 1114            | 184.64          | 181            | 233.84         | 193     | 593.86  | 5002     | 1871.49  |
| 8       | 93       | 9.59     | 286         | 64.92       | 157             | 184.64          | 187            | 239.22         | 204     | 580.36  | 6859     | 1846.38  |
| 16      | 98       | 11.57    | 303         | 64.92       | 559             | 184.64          | 192            | 238.82         | 205     | 595.92  | 6006     | 1859.99  |
| 32      | 99       | 9.59     | 307         | 61.14       | 513             | 184.64          | 172            | 272.29         | 354     | 591.99  | 5891     | 1910.97  |
| 64      | 108      | 11.27    | 302         | 88.76       | 1837            | 184.64          | 198            | 272.29         | 238     | 636.88  | 14213    | 1880.99  |
| 128     | 107      | 6.61     | 318         | 64.92       | 486             | 184.64          | 232            | 272.29         | 290     | 587.94  | 65829    | 1973.67  |
We noticed there is not a big speedup with multiple threads. This is because each thread does little work between locking and unlocking, which causes overhead. A lower number of threads is better, specifically two, four, or eight, because a large number of threads produces too much overhead; this is visible in the NY results. The shortest path is also approximated when using more than one thread. The best number of threads, whether two, four, or eight, depends on the graph. The parallel version can return a different answer on each run; the path it finds is close to that of the serial version, which is always correct. In parallelizing the program, the trade-off we make for speedup is an approximate shortest path, although the parallel version sometimes still finds the correct one. For the NY graph, four threads is the most efficient configuration, although it can be slower than the one-thread version.
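As a concrete reading of the table, speedup on p threads is S(p) = T(1) / T(p). For the NY graph, S(2) = 229/212 ≈ 1.08 and S(4) = 229/193 ≈ 1.19, the best speedup we observed; for the USA graph, S(128) = 4826/65829 ≈ 0.07, meaning 128 threads run roughly 14 times slower than one thread, consistent with lock contention on the shared queue dominating the runtime.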