By Pavel Panchekha

Share under CC-BY-SA.

Tree Lookups

Series

This is post 2 of the Age-Aware Data Structures series.

Age-aware data structures respect some notion of “recency” in your data. They can be useful for tracking logs or statistics: the runtime of many operations is independent of the total amount of data. This second of four parts describes why age-aware data structures are useful. And once the motivation is out of the way, we take the first step into the land of age-aware data structures.

Where array search is lacking

Our age-aware array search takes us from an \(O(\log n)\) lookup in a sorted array to an \(O(\log s)\) lookup, where \(s\) is the age of the element we’re looking for: its distance from the most recent end of the data. This is a big improvement, but an array isn’t a good foundation for building more interesting age-aware data structures, because it doesn’t support fast insertions and deletions.

To insert or remove an element from an array, one needs to move all the elements that come after it. So insertion or deletion would cost \(O(\log s)\) to find the correct location, and then \(O(s)\) to shift the elements that come after that index. All in all, insert and delete would be age-aware, but they would take time linear in \(s\), not logarithmic.
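To make that cost concrete, here is a minimal sketch (my own illustration, not code from this series), keeping the most recent element at the end of an ArrayBuffer so that age 0 means the newest slot. Inserting at age \(s\) shifts exactly the \(s\) newer elements:

import scala.collection.mutable.ArrayBuffer

// Hypothetical helper: insert `elem` at age `s`, where age 0 is the
// newest element at the end of the buffer. ArrayBuffer.insert shifts
// the s trailing elements one slot rightwards, so the move is O(s).
def insertAtAge[T](buf: ArrayBuffer[T], s: Int, elem: T): Unit =
  buf.insert(buf.length - s, elem)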

The way we avoid linear-time insertions and deletions in a classical data structures class is to move from arrays to balanced trees. It turns out that the same ideas can be applied to age-aware data structures, with a twist.

Adapting a tree structure

An ordinary balanced tree is unsuited to our task, because it offers \(O(\log n)\) bounds on insert, delete, and find, instead of the \(O(\log s)\) bounds we want. Where does the time go? Into walking from the root of the tree down to the leaves.

If we want to save that time, we have to start at the leaves. In particular, we can start at the “leftmost” leaf, arranging the tree so that more recent elements are further to the left. Then the distance along the tree from our starting point to any node of age \(s\) is \(O(\log s)\), which gives us some hope of accomplishing useful operations like insertions and deletions in logarithmic time as well.

But how can we walk a tree from the leftmost node? To get to any other node, we will need to walk up a few edges, then walk back down once we’re high enough. In particular, the leftmost “spine” of the tree will have to have its edges pointing up, whereas the rest of the tree will have edges pointing down.

Searching the tree

To search the tree, we will need to first walk up the correct number of steps. Then, once we’re high enough up in the tree, we’ll take a step down and to the right, and finally search this subtree like any other binary tree.

So we must answer a key question: how high up should we walk? The easiest solution is to store extra information in each node on this leftmost spine, describing the range of keys in the subtree below it. Since these ranges only grow as we walk up the spine, it suffices to store each spine node’s largest key: the key of the rightmost node in its subtree. Then as we search, we walk upwards until this rightmost key reaches or passes the key we want; the key, if it is present at all, must then be at that spine node or in its right subtree.

An implementation

An implementation needs a data structure to store tree nodes. Each tree node needs to store a key, a value, and optional left and right children. Spine nodes also have to store the key of the rightmost node in their subtree, and to simplify things I’ll have every node store such a key, not just the spine nodes. This will also be important when we go ahead and add insertion/deletion.

class TreeNode[K, V](
  val key: K,                        // this node's key
  val value: V,                      // the value stored under that key
  val rightmost: K,                  // the largest key in this node's subtree
  val lft: Option[TreeNode[K, V]],   // left child; for spine nodes, the parent
  val rgt: Option[TreeNode[K, V]])   // right child

Spine nodes will also be TreeNode objects, but they’ll be a bit strange in that their lft member will be their parent pointer.
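To make the wiring concrete, here is a hand-built three-node tree (my own sketch, not from the original post). The root holds key 2, with 1 to its left and 3 to its right, so node 1 is the leftmost node and the spine runs from 1 up to 2. Note how n1’s lft points up to its parent, while n2, as the root, has no parent at all:

//      2        the spine is 1 -> 2, with lft pointers running upward
//     / \
//    1   3
val n3 = new TreeNode[Int, String](3, "c", 3, None, None)      // an ordinary node
val n2 = new TreeNode[Int, String](2, "b", 3, None, Some(n3))  // the root: its lft (parent) is None
val n1 = new TreeNode[Int, String](1, "a", 1, Some(n2), None)  // leftmost: lft points up to n2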

An age-aware tree will have to store the leftmost node in the tree; that, too, will be an Option[TreeNode[K, V]], because there might be no nodes in the tree at all.

class AATree[K <% Ordered[K], V] {
  var leftmost: Option[TreeNode[K, V]] = None  // the most recent node: our entry point
  var size = 0                                 // number of key/value pairs stored

  def lookup(key: K): Option[V] = {
    // ...
  }
}

Our lookup will have two phases: in the first, we walk up the tree, and in the second, we walk down it. The first phase:

if (leftmost.isEmpty) return None

// Phase one: climb the spine until the subtree below us is large
// enough that it could contain the key we want.
var node = leftmost
while (node.isDefined && (node.get.rightmost < key)) {
  node = node.get.lft                // for spine nodes, lft is the parent
}

if (node.isEmpty) {
  return None                        // the key is larger than everything in the tree
} else if (node.get.key == key) {
  return Some(node.get.value)        // the key sits on the spine itself
} else {
  node = node.get.rgt                // otherwise it can only be down and to the right
}

Since nodes are stored in Option types, there are a lot of checks for emptiness. I really like Option types because they force me to carefully check whether the node I expect really exists. Note the isDefined check in the loop; this makes sure that if the key we want is larger than all keys in the tree, and we walk to the top of the tree and then try to take another step up, we exit the loop and return None. Note also the equality check before stepping right: the spine node where we stop might itself hold the key we want, and descending into its right subtree would skip right past it.

The second phase is just like an ordinary binary search tree:

while (node.isDefined) {
  if (node.get.key == key) {
    return Some(node.get.value)
  } else if (key > node.get.key) {
    node = node.get.rgt              // below the spine, lft and rgt are ordinary children
  } else {
    node = node.get.lft
  }
}

return None
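Proper tests will have to wait until we can insert nodes, but we can already sanity-check the lookup against the hand-wired tree from earlier (again, my own sketch):

val tree = new AATree[Int, String]
tree.leftmost = Some(n1)             // n1, n2, n3 wired up as above
tree.size = 3

assert(tree.lookup(2) == Some("b"))  // a key found on the spine itself
assert(tree.lookup(3) == Some("c"))  // a key found down a right subtree
assert(tree.lookup(4) == None)       // larger than every key in the tree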

Beyond this sanity check, there are no real tests yet, because we don’t have a way of adding nodes to our tree. That’s for the next post.

Next Steps

The next step is to add insertion and deletion. These operations will have to keep the tree balanced, in order to give the runtime guarantees we want. But that won’t be so hard to do—in fact, we’ll end up using standard balancing algorithms, just modified for our weird left-spine-upward tree.