EECS 311: Trees

Terminology

There's a lot of terminology associated with trees. You should be familiar with the following basic terms:

There are also many kinds of trees, including:

Implementing Binary Trees

Linear or Array Representation

This method is easy to understand and implement. It's very useful for certain kinds of tree applications, such as heaps, and fairly useless for others. It's typically used on dense binary trees.

The idea is simple:

Three simple formulae allow you to go from the index of the parent to the index of its children and vice versa:

The advantage of the linear representation is this easy traversal up and down, and efficient use of space if the tree is complete. The disadvantage is inefficient use of space if the tree is sparse.

Linked Representation

Again, the idea is simple. A node in the tree has

The most important thing to remember about the linked representation is this:

A tree is represented by the pointer to the root node, not a node.

The empty tree is simply the NULL pointer, not an empty node.

Traversal and Recursion

Traversing a binary tree in any of the three orders, even in a linked representation without parent links, is trivial with recursion.

For example, here's the pseudo-code for inorder traversal:

inorderTraversal(bt, fn):
 
if (bt != NULL)
  inorderTraversal(bt->leftChild)
  fn(bt->info)
  inorderTraversal(bt->rightChild)

Here,

It should be obvious how to code the other two traversals.

Binary Search Trees

Binary search trees have the property that

What's the Point?

Consider the combinatorial complexity of the data structures we've seen so far. We give best and worse case Big O values:

  Add data Retrieve data
     
Array (unordered) constant (just add to end) O(N) (linear search)
Array (ordered) O(log2N) compares, O(N) data transfers (shift data) O(log2N) (binary search)
Linked list (unordered) constant (assuming end pointer) O(N) (linear search)
Linked list (ordered) O(log2N) compares, constant data transfers (change a few pointers), O(N) nodes visited sequentially O(log2N) (binary search), visit O(N) nodes
Binary search tree (best case) O(log2N) compares, constant data transfers, O(log2N) nodes visited O(log2N)
Binary search tree (worst case) O(N) compares, constant data transfers, O(N) nodes visited O(N)

On the average, then, a binary search tree will be

Binary search tree algorithms for adding and retrieving are also very simple.

In the worst case, however, a binary search tree will be as bad as a linked list. Many of the variations of binary search trees that we'll see will be attempts to get the best of both worlds: fast access and fast storage, albeit using more complex algorithms.

Adding Data

It's easy to add new data and "grow" the tree:

addData(&bt, x):
 
if (bt == NULL)
  bt = new BtNode
  bt->info = x
  bt->left = bt->right = NULL
else
if (x < bt->info)
  addData(bt->left, x)
else 
if (x > bt->info)
  addData(bt->right, x)
else // already in tree

Adding a new node takes O(log2n) steps, where n is the number of nodes in the tree. This is best and average case behavior. If the tree is very unbalanced, e.g., we passed it a sorted list of numbers, then adding a new node will take O(n2) steps.

Deleting Data

Deleting data is complicated when the data being removed is not in a leaf node. We can't just delete the node, because then our tree would "fall apart." We have to promote one of the children to become the new parent. The child has to be

There are at most two possible candidates:

It doesn't matter which one we pick. If neither subtree exists, we have a leaf node, which can be just deleted and it's pointer removed from the parent node.

The following algorithm deletes a node in a binary search tree:

  1. if there's a left subtree, use the rightmost child of the left subtree
  2. otherwise, if there's a right subtree, use the leftmost child of the right subtree
  3. otherwise, this is a leaf node, just delete it from its parent

Note that

Consider removing TALBOT from the tree in Figure 7.18. If we pick SELIGER, then we have to move SEFTON to where SELIGER was.

Heaps

A heap is a binary tree (not a binary search tree) with the following properties:

Full and dense means that only the bottom level of the tree has gaps, and the gaps are all on the right.

Being dense means the linear array representation is the most appropriate. A linked representation would waste space.

What's the Point?

A heap is a great data structure for implementing a priority queue, because

It also turns out we can use the routines for adding and removing items to create an N log2N sort algorithm called heap sort.

Adding Data

The algorithm for adding data to a heap is simple, with one unintuitve aspect: we add an item to the bottom first then move it upwards, if necessary:

Walking up the tree will take at most log2N steps (compares and data transfers), because the tree is always balanced, so the height of the tree is log2N.

Deleting Data

Both uses of heaps (priority queues and heap sort) only need to delete the root, so that's the only case we'll consider here. Deleting is somewhat similar to adding in that we'll replaced the deleted root with another item from the tree and bubble it down. The algorithm is made only slightly more complex because we have two children to worry about instead of just one parent.

  1. Remove the root.
  2. Replace it with the rightmost item in the bottom level of the tree.

    The new item is almost certainly in the wrong place because it's one of the smaller data elements, but this keeps the heap full and dense.

  3. If the new root has no children bigger than it is, we're done.
  4. If it has just a left child that is bigger, swap it with the child and go back to step 2 with the left subtree.
  5. If it has two children that are bigger, swap it with the larger child and go back to step 2 with the affected subtree.

Note that

Walking down the tree will take at most log2N steps (compares and data transfers). We may have to do three comparisons (parent with each child and child against child) but 3 log2N is still O(log2N).

Implementing Heaps

As mentioned above, heaps are best implemented using a linear array. The move up and down the tree are simple jumps around the array and the array is mostly filled. The only downside is that data has to be moved around, unless, of course, what we really have is an array of pointers to data, which is the best way to go with non-numeric data.

Heap Sort

There are two versions of heap sort that I know about. Both

The versions differ on how the first phase is done.

Version 1 of Heap Sort

Phase 1
Phase 2

The first phase takes no more than O(N log2N) swaps and compares, because each walk down takes log2N steps (N starting small and growing to the number of elements) and it's called N times. In fact, it's actually better than O(N log N), it's O(N), although proving this requires some mathematics.

Similarly, the second phase takes O(N log2N) swaps and compares for the same reason. Ths is both best and worst case behavior. In the best case, you have a heap when you start and no swaps are needed for the first phase, but they become necessary in the second phase.

Version 2 of Heap Sort

Phase 1

The first phase takes O(N log2N) swaps and compares, because each walkUp takes log2N steps and it's called N times.

Version 2 is simpler, especially if you've already implemented walking up and down. Version 1, however, only needs walking down, and Phase 1 of Version 1 is faster (O(N) instead of O(N log N)).


Comments? comment icon Send mail to c-riesbeck@northwestern.edu.

Valid HTML 4.01 Strict