Priority Queues part III: Implementation and Algorithms

We learned in the last post about possible ways of implementing a priority queue data type. In this part, we’re going to talk about representing a heap internally. We create a private array pq[] of length N+1. Our array will be 1-indexed.

Implementation and algorithms

Heapifying

As we have seen in the previous parts, for a max-oriented heap to be valid, a condition must hold. That is, the root node, which is the element at index k, is bigger than or equal to both its children, at 2k and 2k+1.

There are cases when this property is temporarily violated and should be corrected:

  • When a new node is inserted at the end (of the pq[] array) or if a value of the key becomes larger than its parent. (Parent must be larger)
  • When the root (max, in case of a max-heap) is removed.
  • When a node’s value becomes smaller than one of its children (the node is no more the max of the sub-heap)

Bottom-up reheapifying (swim)

If a violation happens and a node’s key becomes larger than its parent, we can fix this by exchanging the node with its parent.

If the node is still larger, which is the same case as the image above, we can fix it the exact same way we did: we exchange it again with its parent.

We keep going up the heap until we find that the key is not larger than the one in the parent or if we reached the root then we stop the swimming.

private void swim(int k) {
  while (k > 1 && less(k/2, k)) {
    exch(k/2, k);
    k = k/2;
  }
}

Top-down reheapifying (sink)

This is the second violation that we might have: a node is smaller than one or both of its children. We can do a slightly different approach than the previous violation fix, but similar in concept: we will exchange that node with the larger of its children. If there’s still a violation at the child, we can keep doing the same thing: pushing the node down the heap by exchanging with the larger of children, until we either reach a case where both children are smaller so the node would be indeed the max or we hit the bottom of the heap (end of the pq[]).

Let’s take a look at the implementation:

private void sink(int k) {
  while (2*k <= N) {
    int j = 2*k;
    if (j < N && less(j, j+1)) j++;
    if (!less(k, j)) break;
    exch(k, j);
    k = j;
  }
}

Complete implementation of a Heap priority queue

public class MaxPQ<Key extends Comparable<Key>> {
  private Key[] pq; // heap-ordered complete binary tree
  private int N = 0; // in pq[1..N] with pq[0] unused
  public MaxPQ(int maxN) { 
    pq = (Key[]) new Comparable[maxN+1]; 
  }
  public boolean isEmpty() { return N == 0; }
  public int size() { return N; }
  public void insert(Key v) {
    pq[++N] = v;
    swim(N);
  }
  public Key delMax() {
    Key max = pq[1]; // Retrieve max key from top.
    exch(1, N--); // Exchange with last item.
    pq[N+1] = null; // Avoid loitering.
    sink(1); // Restore heap property.
    return max;
  }

  private boolean less(int i, int j)
  private void exch(int i, int j)
  private void swim(int k)
  private void sink(int k)
}

Code is courtesy of Algorithms, 4th Edition by Robert Sedgewick and Kevin Wayne. Chapter 2, page 318.

References

Priority Queues Part II: Heap Implementation in Java

Brief

In the previous part, we talked about the theory and motivation for priority queues. In this part, we dive a little bit into the implementation. We will see possible implementations of a priority queue, especially heap implementation, and in the next part of the series we will have a look at the Java implementation.

Implementation

A priority queue is a data type that supports these operations:

  • insert: insert a new key into the queue
  • deleteMinimum (or maximum, we’ll get to that): pop and return the minimum (maximum) key from the queue.
  • isEmpty: is the collection empty?

These 3 operations would form our main API for the data type. We can go about implementing such a data type in number of ways and we’ll have different implementations of insert and removing max operations depending on how we will structure our priority queue:

  1. Array-based (unordered): our “lazy” approach
    1. insert:similar to stack’s push, we insert an element at the end of the unordered array, increase the pointer to the end of the array, N. This gives us an O(1) insert.
    2. removing maxwe linearly search for the max, as in selection sort, and when we find the max, we exchange it with the element at the end of the array and we decrement the pointer to the end of the array (we fake-remove it.) We now have O(n) deleteMaximum
  2. Array-based (ordered):
    1. insert:Like in insertion sort, find the correct index to insert the key, shift the remaining elements of the array. We have now an O(n)insertion.
    2. removing maxthe array is already sorted; we remove the last element by decreasing the pointer to the array and returning they key. We get O(1) deleteMaximum.

All elementary implementations leaves us in a position where we have a trade-off between the performances of insert and deleteMaximum operations. But we can get to a third implementation that could guarantee much better performance than O(n) on both operations.


Priority Queues Heap Implementation

Heap implementation of a priority queue gives us a fast insert and a fast deleteMaximum. Let’s dig deeper into priority queues heap implementation. A heap is a data structure in which keys are stored such that each key has value that is bigger than or equal to other two keys at other specific positions.

A heap is a balanced heap-ordered binary tree

If we thought of a heap as a binary tree, each root contains a key that is bigger than the keys in its children. Equivalently, the key in each node is smaller than or equal to the key in that node’s parent (if it has). As we move up from any node, we get a nondecreasing sequence of keys; as we move down, we get a nonincreasing sequence of keys. We can move up and down the tree (or heap) in logarithmic time.

The height of a heap of size N is log N

Heap representation

In the previous paragraphs, we learned that it’s most convenient to represent a heap as an array. Since a heap is a complete binary tree, we have a good opportunity to use a compact array representation as we don’t need explicit links.

We can see that each node at position has its parent at position k/2. Similarly, its children are at positions 2k and 2k+1.

This allows us to traverse the heap as we move up from a[k]by continuously going to the nodes parent, k/2. Or move down to 2k and 2k+1

Next part

In the next part, we will take a look at the algorithms that are performed on the heaps and we will be solving a coding challenge that we will try to use priority queues in.

References

  • Algorithms, 4th ed.
  • Wikipedia articles on Huffman Coding, Dijkstra’s algorithm, Priority Queue.

Priority Queues Part I: What Are They?

Imagine that you have a big log of financial transactions and you want to extract the largest 5 transactions that happened. We might have a few ways to approach this.

1. Sort them? Duh.

If we have 3 million transactions, let’s say, on a span of month. We will have to do the following:

  • Store all the 3 million transactions in a data structure (array?)
  • Sort the array
  • Return the last 5 elements (if it’s a nondescending ordering)

But we have a fundamental challenge with this approach:

  • Too many elements: we will use space equivalent to the number of elements or more, depending on the sorting algorithm

2. Compare each of the 3 million transactions with an array of 5 largest transactions seen so far.

  • You might want to consider being more empathetic towards the computer resources if you see that this is a good approach.
  • The time complexity of this approach would be O(N*M) where N is the number of elements, M is the amount of transactions we want to extract. It’s cumbersome if M is large.

3. Priority queues

We can process the transactions in chunks. This way we wouldn’t need to store all the 3 million transactions. We can create a collection of size 5, keep adding transactions to this collection, then when we reach its full capacity of 5, we remove the minimum. If we keep doing this, we can guarantee that we end up with that collection of size 5 with the largest 5.

Priority queue is a data type that has two fundamental operations: insert and remove the maximum.

So, what is a priority queue?

As we mentioned in the last paragraph in the previous section, a priority queue is a data type that can make our lives easier if we want to, for example, find a number of maximum or minimum values in a stream of data. Let’s see the applications.

Applications

There are situations where using a priority queue fits just perfectly. Here are some:

But how is a priority queue formed?

In the next part, we will take a look at the implementation of priority queues.

References

  • Algorithms, 4th ed.
  • Wikipedia articles on Huffman Coding, Dijkstra’s algorithm, Priority Queue