Imagine that you have a big log of financial transactions and you want to extract the largest 5 transactions that happened. We might have a few ways to approach this.
1. Sort them? Duh.
If we have 3 million transactions, let’s say, on a span of month. We will have to do the following:
- Store all the 3 million transactions in a data structure (array?)
- Sort the array
- Return the last 5 elements (if it’s a nondescending ordering)
But we have a fundamental challenge with this approach:
- Too many elements: we will use space equivalent to the number of elements or more, depending on the sorting algorithm
2. Compare each of the 3 million transactions with an array of 5 largest transactions seen so far.
- You might want to consider being more empathetic towards the computer resources if you see that this is a good approach.
- The time complexity of this approach would be O(N*M) where N is the number of elements, M is the amount of transactions we want to extract. It’s cumbersome if M is large.
3. Priority queues
We can process the transactions in chunks. This way we wouldn’t need to store all the 3 million transactions. We can create a collection of size 5, keep adding transactions to this collection, then when we reach its full capacity of 5, we remove the minimum. If we keep doing this, we can guarantee that we end up with that collection of size 5 with the largest 5.
Priority queue is a data type that has two fundamental operations: insert and remove the maximum.
So, what is a priority queue?
As we mentioned in the last paragraph in the previous section, a priority queue is a data type that can make our lives easier if we want to, for example, find a number of maximum or minimum values in a stream of data. Let’s see the applications.
Applications
There are situations where using a priority queue fits just perfectly. Here are some:
- Huffman coding
- Heapsort
- CPU Job Scheduling
- Graph algorithms (Dijkstra’s algorithm)
- …and many more…
But how is a priority queue formed?
In the next part, we will take a look at the implementation of priority queues.
References
- Algorithms, 4th ed.
- Wikipedia articles on Huffman Coding, Dijkstra’s algorithm, Priority Queue