CompSci 141 / CSE 141 / Informatics 101 Spring 2013, Project #2

Due date and time: Monday, April 29, 11:59pm

Introduction

Following from our recent discussion of static type systems and generic programming in Java during lecture, this project asks you to write a few generic classes in Java that interact with one another. In addition to the straightforward style of generics exhibited by the Stack example in lecture, this assignment will include concepts such as bounded type parameters, comparability, iteration, and the "foreach" loop, making for an interesting tour of some of the depth in Java's type system.

The task itself will have a somewhat algorithmic bent, though I'll provide you with details of how to implement the algorithms — I don't expect you to have built something like this before, but I do expect that your prior coursework will have prepared you to do so, given a description.

Background information

There are a few relatively brief articles you should read to give you more background on the Java features you'll be using in this assignment. If you attended the lecture on this topic, you've seen much of this — but certainly not all of it — already. If you missed the lecture, this reading will be a reasonable substitute.

The Generics lesson from "The Java Tutorial", which is a complete tutorial on the subject of Java generics. It would be wise to go through all of this material, even though some of it goes a bit beyond the scope of what we talked about in lecture.
A more thorough lesson on generics from Gilad Bracha. You don't have to read this, but it will provide you with some deeper insights if you're so inclined.
The Java Generics FAQ from Angelika Langer is a comprehensive set of information about generics in a question-and-answer format. Much of this is well beyond the scope of our needs in this course, but if you have a question about some of the inner workings of generics, you'll very likely find the answer to it here.

Additionally, be sure to have a look at my commented code example, a generic, iterable Stack class in Java, much of which we went through in lecture. Part of this project will require the use of a stack. You may use this provided stack class as-is if you wish, or adapt it if you'd like it to behave differently.

The program

For this project, you will be required to build a generic binary search tree implementation of the classical map data structure. Recall that, abstractly, a map is a set of associations, each of which contains a unique key and a value (not necessarily unique) that is associated with that key. Maps typically provide common operations such as adding a new key/value pair, looking up the value associated with a key, and removing a key/value pair given its key, among other possibilities.

In addition to implementing the basic concept of a map, binary search trees also add the ability to keep the keys sorted automatically, so, for example, you can iterate them in ascending order. In order to keep the keys sorted, the keys must be comparable, in the sense that it must be possible to compare two keys and decide which is smaller; this property also allows the tree to be constructed in a way that it can be efficiently searched.

Your binary search tree will need to be iterable, meaning that its associations can be accessed sequentially using an iterator. You are required to implement this iteration using the standard Java techniques that are demonstrated in the generic Stack example.

In addition to the generic binary search tree class, you will need to build a short program that demonstrates its functionality. Be sure that you've instantiated at least two different kinds of binary search trees (i.e., binary search trees with different actual type parameters) and demonstrated each of the methods in your binary search tree class, along with iteration both with and without the use of the "foreach" loop. For at least one of the instantiations, I'd like you to define and use a class of your own design that implements the Comparable<E> interface, which is discussed later in the write-up; the class can otherwise encapsulate whatever abstraction you'd like.

Recalling binary search trees

Recall, again, that a binary search tree is one way of implementing a map. It is a binary tree with one association (key/value pair) stored in each node, with the following rules used to enforce an ordering on the keys within the tree:

For all nodes n containing a key k:
- All of the keys in the left subtree of n will be less than k.
- All of the keys in the right subtree of n will be greater than k.

With these rules in place, a key can be found quickly by following one "path" from the root of the tree downward toward the key in question. In a tree with n nodes, this takes O(log n) time, assuming that the tree is relatively balanced.

Three fundamental binary search tree algorithms

The three most fundamental algorithms that you'll need to implement in a binary search tree are adding a new key/value pair, removing a key/value pair given its key, and looking up the value associated with a key. They are summarized in pseudocode below, which you're free to adapt into Java and use in your solution. Each of these algorithms is implemented recursively, though iterative implementations are also possible, and either is fine for this assignment.

    // Add a new association with key k and value v to a subtree whose
    // root is n, returning the resulting subtree.  Assumes k is not
    // already present; a duplicate key needs its own case.
    Node add(Node n, Key k, Value v):
        if n == null:
            return new Node(k, v);
        else if k < n.key:
            n.left = add(n.left, k, v);
            return n;
        else // k > n.key
            n.right = add(n.right, k, v);
            return n;


    // Lookup the value associated with the key k in the subtree whose
    // root is n.
    Value lookup(Node n, Key k):
        if n == null:
            return an error (e.g., throw an exception);
        else if k == n.key:
            return n.value;
        else if k < n.key:
            return lookup(n.left, k);
        else // k > n.key
            return lookup(n.right, k);


    // Remove the key/value pair with key k from the subtree whose root
    // is n, returning the resulting subtree.  Depends on two helper
    // functions:
    //    * findMin, which returns the node with the minimum key
    //      in a subtree
    //    * removeMin, which removes the node with the minimum key
    //      in a subtree, returning the resulting subtree
    Node remove(Node n, Key k):
        if n == null:
            return an error (e.g., throw an exception);
        else if k == n.key:
            if n.left == null && n.right == null:
                return null;
            else if n.left != null && n.right != null:
                Node min = findMin(n.right);
                n.key = min.key;
                n.value = min.value;
                n.right = removeMin(n.right);
                return n;
            else if n.left != null:
                return n.left;
            else // n.right != null
                return n.right;
        else if k < n.key:
            n.left = remove(n.left, k);
            return n;
        else // k > n.key
            n.right = remove(n.right, k);
            return n;
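
As a rough illustration of how the pseudocode's comparisons (k < n.key, k == n.key, k > n.key) might translate into Java, the sketch below uses the compareTo method of the Comparable interface, which is introduced later in this write-up. The class name SketchTree, its nested Node class, and the choice to replace the value on a duplicate key are all assumptions made for the example; this is not the class design required by the assignment, and remove is omitted.

```java
import java.util.NoSuchElementException;

// Hypothetical minimal tree, used only to show how the pseudocode's
// comparisons map onto compareTo. Not the required class design.
class SketchTree<K extends Comparable<K>, V> {
    private static class Node<K, V> {
        K key;
        V value;
        Node<K, V> left;
        Node<K, V> right;

        Node(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    private Node<K, V> root;

    void add(K key, V value) {
        root = add(root, key, value);
    }

    V lookup(K key) {
        return lookup(root, key);
    }

    private Node<K, V> add(Node<K, V> n, K k, V v) {
        if (n == null) {
            return new Node<K, V>(k, v);
        }
        int c = k.compareTo(n.key);   // negative: k < n.key, positive: k > n.key
        if (c < 0) {
            n.left = add(n.left, k, v);
        } else if (c > 0) {
            n.right = add(n.right, k, v);
        } else {
            n.value = v;              // assumption: a duplicate key replaces the old value
        }
        return n;
    }

    private V lookup(Node<K, V> n, K k) {
        if (n == null) {
            throw new NoSuchElementException("key not found: " + k);
        }
        int c = k.compareTo(n.key);
        if (c == 0) {
            return n.value;
        }
        return c < 0 ? lookup(n.left, k) : lookup(n.right, k);
    }
}
```

Note that every comparison goes through compareTo rather than the < or == operators, which do not compare the contents of arbitrary objects in Java.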

Bounded type parameters

The data-ordering rules of binary search trees, as well as the sketched algorithms above for implementing them, make clear a constraint on the keys: it must be possible to compare them to one another to determine, for all pairs of keys x and y, which of the following relationships holds between them: x = y, x < y, or x > y. Furthermore, this relationship must always be the same for any pair of keys, so that searches for a key will proceed along the same path as that followed during its insertion. Types for which these kinds of comparisons are not possible must not be eligible to be used as keys, because there would be no way to determine their appropriate order.

Generic classes, as we've discussed them, allow type parameters to be specified, for the purposes of making the class generic (allowing it to, say, store any kind of data) without resorting to the use of Object references (which are not type-safe, because there's no way to tell at compile time what type of objects the references will actually point to). For example, the provided Stack<E> class is an improvement over a non-generic Stack class, because it allows you to specify in your program — so that it can be known at compile time — what types of objects you intend to store in each stack you instantiate. Armed with this knowledge, the compiler can report an error when a program attempts, for example, to push a Student object into a Stack<String>, taking type errors that would otherwise be run-time errors and turning them into compile-time errors instead, which is the primary advantage of programming in a statically-typed language like Java.
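
To make the compile-time check concrete, here is a small sketch. SimpleStack is a hypothetical stand-in for the provided Stack<E> class, whose actual methods may differ; the point is only that the commented-out line would be rejected by the compiler rather than failing at run time.

```java
import java.util.ArrayList;

// Hypothetical stand-in for the provided Stack<E>; its real interface may differ.
class SimpleStack<E> {
    private final ArrayList<E> items = new ArrayList<E>();

    void push(E item) {
        items.add(item);
    }

    E pop() {
        return items.remove(items.size() - 1);
    }

    boolean isEmpty() {
        return items.isEmpty();
    }
}

class StackTypeSafetyDemo {
    public static void main(String[] args) {
        SimpleStack<String> names = new SimpleStack<String>();
        names.push("Alex");

        // names.push(Integer.valueOf(42));   // rejected at compile time: an Integer is not a String

        String top = names.pop();             // no cast needed; the compiler knows it is a String
        System.out.println(top);
    }
}
```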

The problem with generic classes that have unrestricted type parameters is that they can be, in a sense, too generic; in a particular instantiation of Stack<E>, E can be any kind of object. This is fine for stacks, since there's no reason why any kind of object couldn't be stacked. However, this causes a serious problem when you're talking about a class like BinarySearchTree<K, V>, where there needs to be a restriction placed on the types of the keys, namely that only "comparable" types can be used as keys.

One way to solve this problem is to check the keys at run time; any time a key of a non-comparable type is added to a binary search tree, an exception can be thrown. Of course, this solution negates part of the benefit of using generic classes in the first place. We want type errors to be caught at compile time, rather than run time, if they can be; inserting a key of a non-comparable type is a type error! It is important, then, that we empower the compiler with enough information to be able to report compile-time errors when an attempt is made to instantiate BinarySearchTree with a type of key that is not comparable.

This issue was not lost on the designers of generic classes in Java. The solution provided in Java is the use of bounded type parameters: type parameters that are restricted to types that extend some class or implement some interface. In general, class Foo<E extends F> means:

The class Foo takes one type parameter, E.
If F is a class, Foo must be instantiated with either the class F or a class that extends F. So, for example, Foo<F> would be a legal instantiation, as would Foo<G>, if G extends F.
If F is an interface, Foo must be instantiated with either the interface F or a class that implements F.

(Curiously, the keyword "extends" is used in this context whether F is a class or an interface.)
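
The rules above can be sketched with a few toy classes. F, G, and Foo mirror the placeholder names used in the text; the describe and describeItem methods are invented purely for illustration.

```java
// Hypothetical classes mirroring the placeholder names used above.
class F {
    String describe() {
        return "F";
    }
}

class G extends F {
    @Override
    String describe() {
        return "G";
    }
}

class Foo<E extends F> {
    private final E item;

    Foo(E item) {
        this.item = item;
    }

    // Because E is bounded by F, any method of F may be called on an E.
    String describeItem() {
        return item.describe();
    }
}

class BoundedDemo {
    public static void main(String[] args) {
        Foo<F> a = new Foo<F>(new F());   // legal: F itself
        Foo<G> b = new Foo<G>(new G());   // legal: G extends F

        // Foo<String> c = null;          // rejected at compile time: String does not extend F

        System.out.println(a.describeItem() + " " + b.describeItem());
    }
}
```

The bound buys two things at once: illegal instantiations are rejected by the compiler, and the body of Foo can rely on everything that F provides.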

In the Java library, an interface called Comparable<E> exists that embodies the notion of comparability, as we've talked about it. It contains one method called compareTo that is intended to be used to compare pairs of objects to see which is smaller than the other (or if they're equal). You should read more about it in the Java 7 API documentation if you're unfamiliar with it, as I'll expect you to be able to use it appropriately in this project.

Comparable<E>, as you might imagine when you see that it takes a type parameter, is generic. Its type parameter, E, refers to the type of object that something can be compared to. For example, String implements Comparable<String>, which means that Strings can be compared to other Strings. The technical reason that the type parameter is included is this: its compareTo method takes an object of some type as a parameter, comparing "this" object to the object passed as a parameter. If the Comparable interface were not generic, this method would be forced to take a parameter of some non-specific type such as Object, and then cast its parameter before making its comparison. By making the interface generic, it becomes possible to say that the objects of some class can be compared, but that they can only be compared to objects of a certain type. Most commonly, a class X that is intended to be comparable will implement Comparable<X>, to establish the fact that X's are comparable only to other X's.
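
As a minimal sketch of the "class X implements Comparable<X>" pattern, consider the following. The StudentId class and its single field are hypothetical; any class of your own design would follow the same shape.

```java
// Hypothetical key class; the name and field are illustrative only.
final class StudentId implements Comparable<StudentId> {
    private final int number;

    StudentId(int number) {
        this.number = number;
    }

    // The parameter is a StudentId, not an Object, so no cast is needed.
    // Returns a negative value if this < other, zero if they are equal,
    // and a positive value if this > other.
    @Override
    public int compareTo(StudentId other) {
        return Integer.compare(number, other.number);
    }

    // Kept consistent with compareTo, as the Comparable documentation recommends.
    @Override
    public boolean equals(Object o) {
        return o instanceof StudentId && ((StudentId) o).number == number;
    }

    @Override
    public int hashCode() {
        return number;
    }
}
```

Integer.compare is used instead of subtracting the two numbers, since subtraction can overflow for values that are far apart.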

Since implementing the Comparable<E> interface is the standard way in Java to specify that the objects of a class are comparable to one another, it makes sense to restrict the keys in a generic binary search tree class to be of some class that implements the Comparable<E> interface. On the other hand, there is no restriction necessary on the values; they can be any kind of object. So, the appropriate generic name for a binary search tree class, and the one I'll expect you to use in this assignment, is BinarySearchTree<K extends Comparable<K>, V>. In other words, a BinarySearchTree takes two parameters: a key type K (where keys must be comparable to other objects of that same type K), and a value type V, which is unconstrained.
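
The same bound, K extends Comparable<K>, can be seen at work on a much smaller scale in a generic method. The sketch below is only an illustration of what the bound permits; the class name KeyBoundDemo and the method smaller are invented for the example.

```java
class KeyBoundDemo {
    // The bound on K lets the body call compareTo on a K with no cast.
    static <K extends Comparable<K>> K smaller(K a, K b) {
        return a.compareTo(b) <= 0 ? a : b;
    }

    public static void main(String[] args) {
        System.out.println(smaller("pear", "apple"));   // String implements Comparable<String>
        System.out.println(smaller(3, 7));              // Integer implements Comparable<Integer>

        // smaller(new Object(), new Object());         // rejected at compile time: Object is not Comparable
    }
}
```

A BinarySearchTree declared with this bound gets the same guarantee for its keys, while V is left free to be any type.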

Design of the various classes

I would like you to break up the implementation of your binary search tree into the following classes. This is not necessarily an exhaustive list of the classes you'll need, but will give you a good idea of the design that I'm asking you to pursue. The emphases here are type safety and support of the standard Java iteration feature.

BinarySearchTree<K extends Comparable<K>, V> implements Iterable<Association<K, V>>. This class encapsulates a binary search tree with keys of type K and values of type V. Note that K is a bounded type parameter, as discussed in the previous section. On the other hand, V is not bounded, since any type of object can be used as a value. For our purposes, we'll say that iterating a BinarySearchTree should return key/value pairs, so that either the keys or the values can be iterated. These are implemented as Associations, which are described next; this is why the class implements Iterable<Association<K, V>>, rather than Iterable<K> or Iterable<V>. (The next section of the write-up provides more details about the Iterable interface, which is part of the Java library.)
Association<K, V>. This class encapsulates one association (a key/value pair). This may seem like an implementation detail, one that should be hidden as a private member of the BinarySearchTree class, but the class should actually be defined separately from, and outside of, BinarySearchTree. The reason is that an Iterator over a BinarySearchTree should return Associations, not keys or values by themselves. So, code outside of the BinarySearchTree class needs to be aware of the existence of Associations. As a protection, Associations should be immutable, meaning that you may not modify what keys/values are stored in them. This is vitally important for keys, since BinarySearchTree has the sole responsibility of ensuring that all keys in the tree are unique and that they're stored in an ordered fashion that will allow for fast lookups. It's philosophically important for values, too, since we're entrusting BinarySearchTree with the task of managing relationships between keys and values.
- Note that Associations are not nodes. The BinarySearchTree should consist of a collection of nodes, with each node storing an Association. You may well want to include the node class within the BinarySearchTree class and make it private.
BSTIterator implements Iterator<Association<K, V>>. This class should be defined privately within the BinarySearchTree class, similarly to the StackIterator implemented in the generic Stack code example. It should iterate the nodes of the tree in ascending order of the keys. Because the Iterator interface declares a remove( ) method, your class must define one, but you are not required to support removal through the iterator: the Java library allows an iterator to forgo support for remove( ) by having it throw UnsupportedOperationException whenever it's called. This is perfectly okay for this assignment, though you may implement a working remove( ) method if you wish.
Stack<E>. This class will be needed in your implementation of BSTIterator. You may use the Stack<E> class that I provided, or you may write your own. More information about how to use the stack in your iterator implementation follows in the next section of the write-up.
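
The immutability requirement on Association described above might be met along the following lines. This is a sketch under assumptions: the accessor names getKey and getValue and the toString format are not specified by the assignment.

```java
// A sketch of an immutable key/value pair; the accessor names are assumptions.
final class Association<K, V> {
    // Both fields are final and there are no setters, so the key and value
    // references cannot be changed once the object has been constructed.
    private final K key;
    private final V value;

    Association(K key, V value) {
        this.key = key;
        this.value = value;
    }

    K getKey() {
        return key;
    }

    V getValue() {
        return value;
    }

    @Override
    public String toString() {
        return key + " -> " + value;
    }
}
```

Immutability here means the Association itself cannot be pointed at a different key or value; whether the key and value objects themselves can change depends on their own classes.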

In addition to iteration (and presumably some kind of node class), your BinarySearchTree class must support the following operations:

Adding a new association, given a key and a value. Remember that keys must be unique in a binary search tree. You may handle this problem however you wish (e.g., by throwing an exception or replacing the old value with the new one). This method should take a key and a value as a parameter, rather than an Association. In other words, you shouldn't require a user of this class to create a new Association object in order to call this method.
Removing an association, given a key. (You are not required to support removal through an iterator, but I do want you to have an operative remove method in the BinarySearchTree class.)
Looking up a value, given a key. This should run in O(log n) time, assuming that the tree is relatively balanced.
Looking up a key, given a value. This will require a full tree traversal, which might easily be implemented by using an iterator. (More details about iterators follow in the next section.)
Clearing a tree out, removing all of the associations.
Getting the number of associations in the tree.

The algorithms for doing adds, removes, and lookups are summarized in the section titled Three fundamental binary search tree algorithms above.

Implementing an iterator over a binary search tree

Iterator<E> and Iterable<E> in Java

An iterator is an object that allows access to all of the elements in some collection (e.g., the objects in a linked list, the associations in a binary search tree, or even the lines of text in a text file) without exposing details of how the collection is implemented. This is a powerful abstraction, since it not only insulates code outside of the collection from changes in the implementation of that collection, but it also allows code to iterate over a collection without even knowing what kind of collection it's iterating, allowing it to work in a variety of contexts instead of being limited to just one. For instance, you could write a method printAll( ) that could print all of the elements in any kind of collection, simply by having the method take an iterator as a parameter, rather than a collection. In this way, not only can classes be generic, but so can algorithms.
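
A printAll( ) method of the kind described above could look roughly like this. The class name PrintAllDemo is invented, and the returned count is an addition made only so the behavior is easy to check; the essential point is that the method's parameter is an Iterator, so it neither knows nor cares what kind of collection lies behind it.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.TreeSet;

class PrintAllDemo {
    // Prints every remaining element, one per line, and returns how many
    // were printed. (The return value is an addition for easy checking.)
    static <E> int printAll(Iterator<E> it) {
        int count = 0;
        while (it.hasNext()) {
            System.out.println(it.next());
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // The same method works for a list...
        printAll(Arrays.asList("b", "a", "c").iterator());

        // ...and for a sorted set, with no change to printAll itself.
        printAll(new TreeSet<Integer>(Arrays.asList(3, 1, 2)).iterator());
    }
}
```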

The standard Java iterator, defined in the Iterator<E> interface in the Java library, supports three methods.

Method signature	Brief description
boolean hasNext()	returns true if there are more elements not yet returned by this iterator
E next()	returns the next element in the underlying collection
void remove()	removes the last element returned by next() from the underlying collection

(Be sure to look at the Java 7 API documentation for a more thorough description of this and other relevant parts of the Java library.)
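
To show the three methods in one place, here is a small iterator over an array. The class ArrayIterator is hypothetical and is not part of the assignment; it also shows the standard way for an iterator to decline support for remove( ).

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical example class: iterates over the elements of an array in index order.
class ArrayIterator<E> implements Iterator<E> {
    private final E[] items;
    private int nextIndex = 0;   // index of the element that next() will return

    ArrayIterator(E[] items) {
        this.items = items;
    }

    @Override
    public boolean hasNext() {
        return nextIndex < items.length;
    }

    @Override
    public E next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return items[nextIndex++];
    }

    // Removal is not supported by this iterator.
    @Override
    public void remove() {
        throw new UnsupportedOperationException();
    }
}
```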

We say that objects of a class are iterable if an iterator can be created to iterate over their contents. For example, we might like a linked list to be iterable, meaning that an iterator can be created and used to iterate over each of the elements in the list. In Java, the standard way of specifying that objects of a class are iterable is to have the class implement the Iterable<E> interface. If you were to build a LinkedList<E> class, you might have it implement the Iterable<E> interface. The Iterable<E> interface supports one method.

Method signature	Brief description
Iterator<E> iterator()	returns a new Iterator<E> that can be used to iterate over the elements of a collection

The "E" in all of these places links together the type of elements in the linked list with the type of elements that will be returned when you iterate through the list. A LinkedList<String> implements Iterable<String>, which means that you can ask the list to create an Iterator<String> that can be used to iterate over its elements. Notice that String is the type parameter in all of these cases, since you ought to get back Strings when you access the elements of a LinkedList of Strings.
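
The linked-list example above might be sketched as follows. The class is named SimpleLinkedList here, rather than LinkedList, only to avoid confusion with java.util.LinkedList; the addLast method and the private NodeIterator class are assumptions made for the illustration.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical singly linked list that supports standard Java iteration.
class SimpleLinkedList<E> implements Iterable<E> {
    private static class Node<E> {
        final E item;
        Node<E> next;

        Node(E item) {
            this.item = item;
        }
    }

    private Node<E> head;
    private Node<E> tail;

    void addLast(E item) {
        Node<E> node = new Node<E>(item);
        if (head == null) {
            head = node;
        } else {
            tail.next = node;
        }
        tail = node;
    }

    // Each call creates a fresh iterator positioned at the start of the list.
    @Override
    public Iterator<E> iterator() {
        return new NodeIterator();
    }

    // A non-static inner class, so it shares the enclosing list's E and can see head.
    private class NodeIterator implements Iterator<E> {
        private Node<E> current = head;

        @Override
        public boolean hasNext() {
            return current != null;
        }

        @Override
        public E next() {
            if (current == null) {
                throw new NoSuchElementException();
            }
            E item = current.item;
            current = current.next;
            return item;
        }

        @Override
        public void remove() {
            throw new UnsupportedOperationException();
        }
    }
}
```

Because each iterator keeps its own position, two iterators over the same list can be in progress at once without disturbing each other.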

Your binary search tree class is required to support iteration in the standard way that Java collections support it, meaning that it should be iterable and the iterator should, in this case, return associations (key/value pairs) in ascending order of the keys stored in the tree.

Iterable and Iterator enable the "foreach" loop

Java 5, released a decade or so ago, introduced not only generics, but also a loop that is commonly called the "foreach" or "enhanced for" loop. It abstracts the typical usage pattern of creating an iterator and iterating through the elements of a collection, wrapping it into a simple piece of syntax with the repetitive details hidden. For example, given an ArrayList<String> called a, the following code prints out the elements of that ArrayList in the order they're stored:

    for (String s : a)
        System.out.println(s);

This is a form of what is called syntactic sugar. Syntactic sugar is something that makes a language easier to use without introducing any new capabilities. In this case, the "foreach" loop above could have been written like this instead:

    Iterator<String> i = a.iterator();
    
    while (i.hasNext())
    {
        String s = i.next();
        System.out.println(s);
    }

In fact, a Java compiler will effectively take the "foreach" loop shown here and turn it into the lengthier code using the Iterator. This means that you won't need to write the lengthier code in the vast majority of cases, where the only thing you need the iterator for is to get the value of each element and do something with it. (In some cases, such as cases where you might need to call remove() on the iterator, or when you need to access the elements in an order other than the order that the iterator will return them, you'll need to write more specific code.)
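
As one example of a case where the explicit iterator is still needed, the sketch below removes elements while iterating, which the "foreach" loop gives no way to do. The class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class RemoveWhileIterating {
    // Removes every empty string from the list, in place.
    static void removeEmpty(List<String> items) {
        Iterator<String> i = items.iterator();

        while (i.hasNext()) {
            String s = i.next();
            if (s.isEmpty()) {
                i.remove();   // removes the element most recently returned by next()
            }
        }
    }

    public static void main(String[] args) {
        List<String> a = new ArrayList<String>(Arrays.asList("x", "", "y", ""));
        removeEmpty(a);
        System.out.println(a);
    }
}
```

This relies on ArrayList's iterator supporting remove( ); an iterator that throws UnsupportedOperationException from remove( ) could not be used this way.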

Hooking your own class into the "foreach" loop turns out to be relatively easy in Java: have your class implement the Iterable interface and implement an iterator for it. When you're done with your BinarySearchTree, it will be possible to use the "foreach" loop to iterate over its associations.

How iterators and traversals are different

Recall that an inorder traversal of a binary search tree visits all of the keys in ascending order. An inorder traversal is typically written as a recursive algorithm, roughly like this:

    inorder(Tree T):
        if T has a left subtree:
            inorder(left subtree of T)
        
        visit the root of T
        
        if T has a right subtree:
            inorder(right subtree of T)

The "visit" step in the algorithm is abstract; it can be used for different things in different contexts. It might involve printing a key/value pair to the screen, saving a key/value pair to a file, checking to see if a key/value pair meets some search criteria, or whatever.

This algorithm visits all of the keys in ascending order, by definition of a binary search tree: for all subtrees S of the tree T (including T itself), all of the keys in the left subtree of S are guaranteed to be less than the key in S's root, while all of the keys in the right subtree of S are guaranteed to be greater than the key in S's root. This recursive algorithm visits all of the keys in the left subtree before visiting the root, and it visits the root before visiting the keys in the right subtree. So the keys are visited in ascending order.
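
In Java, the recursive traversal might be sketched like this. The Node class with int keys and the choice to "visit" by appending to a list are simplifications made for the example.

```java
import java.util.ArrayList;
import java.util.List;

class InorderDemo {
    // Hypothetical node type with int keys, just for this illustration.
    static class Node {
        final int key;
        final Node left;
        final Node right;

        Node(int key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    // Appends the keys of the subtree rooted at t to out, in ascending order.
    static void inorder(Node t, List<Integer> out) {
        if (t == null) {
            return;
        }
        inorder(t.left, out);    // everything smaller than t.key
        out.add(t.key);          // the "visit" step
        inorder(t.right, out);   // everything larger than t.key
    }

    public static void main(String[] args) {
        Node root = new Node(4,
                new Node(2, new Node(1, null, null), new Node(3, null, null)),
                new Node(6, new Node(5, null, null), null));

        List<Integer> keys = new ArrayList<Integer>();
        inorder(root, keys);
        System.out.println(keys);
    }
}
```

Notice that inorder does not return until the whole subtree has been visited.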

The simplest way to implement this kind of traversal is to write a recursive method that does the same thing that the algorithm above does, returning when the entire traversal is complete. However, in Java, this kind of implementation is incompatible with the notion of an iterator, which is intended to perform a traversal one step at a time. In other words, an iterator will traverse to the "next" key in ascending order every time its next( ) method is called, then pause until the next( ) method is called again. When the next( ) method is called again, it will pick up where it left off, traversing to the "next" key in ascending order again, then pause again until the subsequent call to next( ). The key to implementing an iterator is to be able to "pause" the iteration, allowing it to pick up where it left off next time you ask for the "next" key.

What makes this kind of implementation difficult in Java is that there's no straightforward way to "pause" a method and have it pick up where it left off. (This is not to say that it can't be done in Java; for example, threading would allow it. But threading is a heavyweight solution to what should, in this case, be a lightweight problem. We'll see examples from other languages later this quarter in which implementing iterators becomes no more difficult than a straightforward traversal.)

One solution to this problem is to perform an entire traversal in the iterator's constructor, saving the key/value pairs into an ArrayList or other "flat" data structure in ascending order of the keys. However, if the tree is large, this approach is neither memory- nor time-efficient. A better approach is to perform the traversal one step at a time, each time the next( ) method is called. This requires a bit of extra complexity in the implementation of the iterator, but this is a small price to pay for a large improvement in memory usage. You are required to implement your iterator without performing an entire traversal up-front; instead, you'll need to run the traversal step by step, moving to the next key only when next( ) is called.

Running the traversal step by step in the iterator

In place of a recursive algorithm that runs the entire traversal to completion before returning, you'll instead need to implement your iterator so that it performs the traversal step by step, doing a little bit of work each time next( ) is called, then saving its state so it can pick up where it left off in the subsequent call to next( ). In order to implement your iterator this way, you'll first need to decide what state needs to be saved between calls to next( ).

At any given time throughout a traversal implemented using the recursive algorithm, the run-time stack is essentially keeping track of two things:

The current node on the top of the run-time stack, its parent below it, its parent's parent below it, and so on.
Along with each node, an indication of whether the left and right subtrees of the node have already been traversed.

Since you will not be implementing a recursive traversal that runs to completion, you will need to store this information yourself between calls to next( ). This is the reason why I suggested writing a Stack<E> class (or using the one that I provided). You'll need it to store this state information. (Note that the goal of using a Stack is not to push all of the binary search tree's elements into it, then pop them one at a time.)

With this in mind, the rough approach for implementing the iterator goes something like this. (I've purposefully left some of the details unspecified.)

In the iterator's constructor, begin by pushing the root node on to the stack, along with an indication that neither of its subtrees has been traversed yet. If the tree is empty, leave the stack empty as well.
The hasNext( ) method returns false if and only if the stack is empty.
In the next( ) method, the top of the stack is interpreted to be the "current node." You'll do different things here, depending on whether the current node's left and/or right subtrees have been traversed already. After each call to next( ), the current node should be the next node in ascending order of keys whose key/value pair has not yet been returned. As you advance your traversal down the tree, push nodes on to the stack; as your traversal moves up the tree, pop nodes from the stack. At all times, the stack should contain, from the top of the stack to the bottom, the entire path from the current node to the root. Each node should be accompanied in the stack by an indication of whether its left and/or right subtrees have already been traversed. (Note that these indications should be stored in the stack, not in the tree nodes themselves. This is because multiple iterators may be iterating over a tree simultaneously.)
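
To give a feel for the general idea of keeping traversal state in an explicit stack, here is a deliberately simplified sketch. It is not the approach described above in full: it works on a hypothetical node type with int keys, it records only whether a node's left subtree has been handled, and it pops a node as soon as its key is returned, so the stack holds only nodes not yet returned rather than the whole path to the root. It also uses java.util.ArrayDeque as a stand-in for the provided Stack<E> class. Treat it as one possible variation, under those assumptions, rather than as the expected solution.

```java
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Simplified sketch: stepwise inorder traversal over int-keyed nodes.
class StepwiseInorder implements Iterator<Integer> {
    // Hypothetical node type, just for this illustration.
    static class Node {
        final int key;
        final Node left;
        final Node right;

        Node(int key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    // A node paired with its traversal state. The flag lives here, not in
    // the tree node, so several iterators can share one tree.
    private static class Frame {
        final Node node;
        boolean leftDone;

        Frame(Node node) {
            this.node = node;
        }
    }

    // ArrayDeque is a stand-in for the provided Stack<E> class.
    private final ArrayDeque<Frame> stack = new ArrayDeque<Frame>();

    StepwiseInorder(Node root) {
        if (root != null) {
            stack.push(new Frame(root));
        }
    }

    @Override
    public boolean hasNext() {
        return !stack.isEmpty();
    }

    @Override
    public Integer next() {
        if (stack.isEmpty()) {
            throw new NoSuchElementException();
        }
        Frame top = stack.peek();
        // Move down to the left until the top frame's left subtree is handled.
        while (!top.leftDone) {
            top.leftDone = true;
            if (top.node.left != null) {
                top = new Frame(top.node.left);
                stack.push(top);
            }
        }
        // Every key smaller than top.node.key has already been returned.
        stack.pop();
        if (top.node.right != null) {
            stack.push(new Frame(top.node.right));
        }
        return top.node.key;
    }

    @Override
    public void remove() {
        throw new UnsupportedOperationException();
    }
}
```

Each call to next( ) does only as much work as is needed to reach the next key, and the stack never holds more frames than the height of the tree.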

For us, "unchecked" and "raw type" warnings are errors

The main purpose of this assignment is to give you practice with generic classes in Java. If you're using generics properly, you will have no type errors and will need no typecasts in your code. For this reason, when we grade this assignment, "unchecked" or "raw type" warnings will be considered errors. These indicate a problem with your use of generics (usually caused by leaving the type parameter off of a declaration somewhere) and, thus, need to be fixed. So you'll want to be sure that your compiler is configured to give you these warnings.

Be sure that your program compiles this way with no errors and no warnings before you submit it.
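
As an illustration of what these warnings look like, the sketch below shows a raw-type declaration (commented out) next to its properly parameterized form. It assumes you are compiling with javac from the command line, where the options -Xlint:unchecked and -Xlint:rawtypes ask for these warnings to be reported in detail; development environments generally have equivalent settings. The class name RawTypeDemo is invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Compile with, for example:
//     javac -Xlint:unchecked -Xlint:rawtypes RawTypeDemo.java
class RawTypeDemo {
    static int totalLength(List<String> words) {
        int total = 0;
        for (String w : words) {
            total += w.length();
        }
        return total;
    }

    public static void main(String[] args) {
        // List raw = new ArrayList();   // raw type: the type parameter has been left off
        // raw.add("x");                 // unchecked call: the compiler cannot verify the element type

        List<String> typed = new ArrayList<String>();   // parameterized: no warnings
        typed.add("generic");
        typed.add("code");
        System.out.println(totalLength(typed));
    }
}
```

If uncommenting the two raw-type lines produces warnings under your configuration, the compiler is set up the way this assignment expects.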

Limitations

You may not use a pre-existing binary search tree implementation (e.g., java.util.TreeMap) for this assignment. I'd like you to build your binary search tree class from scratch. As stated earlier, you may use the provided generic Stack<E> class as part of your implementation of the binary search tree iterator, though you may not use the pre-existing java.util.Stack from the Java library.

Deliverables

You need to submit all of your Java source files (.java), including any that were provided to you. Do not submit compiled versions of your program, or other files generated by your development environment.

Follow this link for a discussion of how to submit your project. Remember that we do not accept project submissions via email under any circumstances.