Dijkstra's algorithm is one of the most fundamental algorithms in graph theory and route mapping. Given a weighted, connected graph, Dijkstra's algorithm finds the shortest path between two nodes. Because of its efficiency, the algorithm scales to large graphs and has many real-world applications, such as IP routing and road network planning. This study focuses on parallelizing the algorithm through the priority queue using Java threads and analyzing its performance. By introducing threading to Dijkstra's algorithm, this study aims to improve the speedup and runtime of the program while still finding the shortest path. This study examines different ways to parallelize the algorithm and the key challenges involved, such as thread synchronization and overhead. For testing the algorithm, we used graphs of various sizes from the METAL project.
Using a serial version of Dijkstra's algorithm as a baseline, we made modifications to parallelize the program using Java threads. One way we considered parallelizing the program was to have each thread compute its own instance of the problem independently, using random start and end points. However, we decided against this method because it introduced challenges of its own. For example, if one thread chose two points with no connecting path, the program would stall, leading to inefficiency.
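For reference, the sketch below shows a minimal serial baseline of the kind we started from. It assumes an adjacency-list graph with vertices labeled 0 to n-1; the `Edge` record and class names are illustrative, not the project's actual code.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Minimal serial Dijkstra sketch (illustrative, not the project's actual code).
public class SerialDijkstra {
    record Edge(int to, double weight) {}

    static double[] shortestDistances(List<List<Edge>> adj, int start) {
        int n = adj.size();
        double[] dist = new double[n];
        Arrays.fill(dist, Double.POSITIVE_INFINITY);
        dist[start] = 0.0;

        // Min-heap ordered by tentative distance; entries are {vertex, distance}.
        PriorityQueue<double[]> pq =
                new PriorityQueue<>(Comparator.comparingDouble((double[] e) -> e[1]));
        pq.add(new double[]{start, 0.0});

        boolean[] visited = new boolean[n];
        while (!pq.isEmpty()) {
            int u = (int) pq.poll()[0];
            if (visited[u]) continue;        // stale queue entry: skip
            visited[u] = true;
            for (Edge e : adj.get(u)) {      // relax each outgoing edge of u
                double cand = dist[u] + e.weight();
                if (!visited[e.to()] && cand < dist[e.to()]) {
                    dist[e.to()] = cand;
                    pq.add(new double[]{e.to(), cand});
                }
            }
        }
        return dist;
    }
}
```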
We also considered building a shortest-path spanning tree rather than searching between two specific points, but we decided not to take this approach because it had several disadvantages. For example, threads would need to share updates on the shortest path, and this constant synchronization would create overhead when accessing the priority queue. Since our goal is an efficient algorithm with improved speedup and runtimes, this approach would have limited the achievable speedup.
Another method we considered was parallelizing just the priority queue, a key component of the algorithm. With a shared priority queue, each thread gets its own access to the queue and adds items as it computes the shortest path. We chose this method because it lets threads update the distances to a vertex's neighbors concurrently, which we felt was the most effective way to parallelize the program. While we found this approach to be more efficient, it has a disadvantage: it does not compute the exact shortest path, but rather an approximation of it.
The worker thread class in our program is responsible for parallelizing the priority queue. Each thread repeatedly calls poll() to take the next edge from the queue. When an edge is processed, the thread checks whether its start vertex has already been visited; if it has, the edge is skipped. If it has not been visited, the vertex is added to the result map, which stores the shortest path from the start vertex to each destination. The threads are synchronized so that only one thread updates the map at a time. Each thread then adds all of the vertex's unvisited neighbors to the priority queue; this step is also synchronized. The threads repeat this process until the end vertex is found or all of the vertices have been visited, as sketched below.
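The following is a minimal sketch of a worker in the style described above, assuming a shared ConcurrentLinkedQueue as the frontier and a synchronized result map; the class and field names are illustrative, not the project's actual code.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentLinkedQueue;

// Illustrative worker sketch: polls edges from a shared queue and records the
// first distance found for each vertex.
public class Worker implements Runnable {
    /** A frontier entry: `vertex` was reached with total distance `dist` from the start. */
    record Work(int vertex, double dist) {}

    private final List<List<double[]>> adj;          // adj.get(u) = list of {neighbor, weight}
    private final ConcurrentLinkedQueue<Work> queue; // shared frontier of edges to expand
    private final Map<Integer, Double> result;       // visited set + distance from start
    private final int end;                           // destination vertex

    Worker(List<List<double[]>> adj, ConcurrentLinkedQueue<Work> queue,
           Map<Integer, Double> result, int end) {
        this.adj = adj; this.queue = queue; this.result = result; this.end = end;
    }

    @Override
    public void run() {
        Work w;
        // Simplified termination: a worker exits when it sees an empty queue,
        // even though another worker might still add entries afterwards.
        while ((w = queue.poll()) != null) {
            synchronized (result) {                   // one thread updates the map at a time
                if (result.containsKey(w.vertex())) continue; // already visited: skip
                result.put(w.vertex(), w.dist());             // record first distance found
            }
            if (w.vertex() == end) return;            // destination reached
            for (double[] e : adj.get(w.vertex())) {
                int v = (int) e[0];
                synchronized (result) {               // enqueueing neighbors is also synchronized
                    if (!result.containsKey(v)) queue.add(new Work(v, w.dist() + e[1]));
                }
            }
        }
    }
}
```

Because ConcurrentLinkedQueue is FIFO rather than ordered by distance, the first distance recorded for a vertex is not necessarily the minimum, which is consistent with the approximate paths described above.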
Initially, we were going to code our program in C. After some time working on the serial version in C, we decided to pivot to Java threads, as Java has better features for multithreading. Pthreads are platform dependent, requiring a UNIX-like system, so programming with Pthreads is more challenging for users on different operating systems.
Java has synchronized blocks and locking/unlocking that make it easier to share the priority queue. Java threads are platform independent, which made them easier for us to work with. Multithreading gives us more control over parallelization compared to message passing. Java is garbage collected and does not expose raw pointers, which reduces the chance of memory management errors. Java also has a large multithreading library with thread-safe data structures, such as the ConcurrentLinkedQueue used in our program. The ConcurrentLinkedQueue is a thread-safe queue with built-in queue methods such as peek(), poll(), and offer().
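A small demonstration of these queue operations, under the assumption of a single-threaded caller for readability:

```java
import java.util.concurrent.ConcurrentLinkedQueue;

// Tiny demonstration of the thread-safe queue methods named above.
public class QueueDemo {
    public static void main(String[] args) {
        ConcurrentLinkedQueue<String> q = new ConcurrentLinkedQueue<>();
        q.offer("A");                     // enqueue without explicit locking
        q.offer("B");
        System.out.println(q.peek());     // "A" - inspect the head without removing it
        System.out.println(q.poll());     // "A" - remove and return the head
        System.out.println(q.poll());     // "B"
        System.out.println(q.poll());     // null - an empty queue returns null, not an exception
    }
}
```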
Test graphs from the METAL project:

| Name          | \|V\|  | \|E\|  | Start                | End           |
|---------------|--------|--------|----------------------|---------------|
| RPI           | 73     | 85     | NY2@CR139            | NY378/US9     |
| London        | 2269   | 3425   | A414/A1019/A1169     | B2036/B2037   |
| Wolf Creek    | 128    | 128    | CO149@SanJuanDr      | CO17Ant@CRD.5 |
| Flagstaff     | 349    | 413    | US160@BIA16          | AZ89@ESCWAY   |
| NY            | 6774   | 8077   | NY18@OldLakeRd       | NY27@End      |
| United States | 186250 | 221643 | US82Bus_E/US271Bus_S | MA10@EarSt    |
Runtime in milliseconds (ms) and reported path length in miles (mi) for each graph, by thread count:

| Threads | RPI (ms) | RPI (mi) | London (ms) | London (mi) | Wolf Creek (ms) | Wolf Creek (mi) | Flagstaff (ms) | Flagstaff (mi) | NY (ms) | NY (mi) | USA (ms) | USA (mi) |
|---------|----------|----------|-------------|-------------|-----------------|-----------------|----------------|----------------|---------|---------|----------|----------|
| 1       | 100      | 9.59     | 305         | 68.92       | 111             | 184.64          | 181            | 282.82         | 229     | 582.23  | 4826     | 1853.50  |
| 2       | 94       | 9.59     | 311         | 64.92       | 116             | 184.64          | 162            | 282.82         | 212     | 582.23  | 4808     | 1857.72  |
| 4       | 92       | 9.59     | 312         | 64.92       | 1114            | 184.64          | 181            | 233.84         | 193     | 593.86  | 5002     | 1871.49  |
| 8       | 93       | 9.59     | 286         | 64.92       | 157             | 184.64          | 187            | 239.22         | 204     | 580.36  | 6859     | 1846.38  |
| 16      | 98       | 11.57    | 303         | 64.92       | 559             | 184.64          | 192            | 238.82         | 205     | 595.92  | 6006     | 1859.99  |
| 32      | 99       | 9.59     | 307         | 61.14       | 513             | 184.64          | 172            | 272.29         | 354     | 591.99  | 5891     | 1910.97  |
| 64      | 108      | 11.27    | 302         | 88.76       | 1837            | 184.64          | 198            | 272.29         | 238     | 636.88  | 14213    | 1880.99  |
| 128     | 107      | 6.61     | 318         | 64.92       | 486             | 184.64          | 232            | 272.29         | 290     | 587.94  | 65829    | 1973.67  |
We noticed there is not a big speedup with multiple threads. This is because each thread does little work between locking and unlocking, which causes overhead. A lower number of threads is better, specifically two, four, or eight, because a large number of threads produces too much overhead; this is visible in the NY results. The shortest path is also approximated when using more than one thread. The best number of threads, whether two, four, or eight, depends on the graph. The parallel version can return a different answer on each run; the path it finds is close to that of the serial version, which is always correct. In parallelizing the program, the trade-off we make for speedup is an approximate shortest path, although the parallel version sometimes still finds the correct one. For the NY graph, four threads is the most efficient configuration, although it can be slower than the one-thread version.
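As a concrete reading of the table, speedup on p threads is S(p) = T(1) / T(p). For the NY graph, S(2) = 229/212 ≈ 1.08 and S(4) = 229/193 ≈ 1.19, the best speedup we observed; for the USA graph, S(128) = 4826/65829 ≈ 0.07, meaning 128 threads run roughly 14 times slower than one thread, consistent with lock contention on the shared queue dominating the runtime.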