Treaps: How to Combine BSTs and Heaps for Highly Probable Balanced Trees

We have seen that the shape (and depth, the important metric for doing BST operations) of a BST depends on how that tree is built: the order in which its values are added (and removed). For example, if we add the values in strictly increasing or decreasing order, the result is a pathological tree (a "republican" or "democrat" tree, leaning entirely right or entirely left): one whose height is about equal to its size. There are other orders (zig-zagging) that also result in pathological BSTs: e.g., insert, in order, 10, 90, 20, 80, 30, 70, 40, 60, 50.

We have also seen that after we finish building a BST (in time O(N Log2 N) to O(N^2), depending on its structure), we can do an in-order traversal to collect all its values, in order, in an array (in time O(N) regardless of its shape), and then use these values to build a BST that is as well balanced as possible (in time O(N)). Likewise, in an offline algorithm, if we have all the values to put in a tree before we need to build it, we can sort them (in O(N Log2 N)) and then use these values to build a BST that is as well balanced as possible (in time O(N), the same as above, once we have the tree values in order). Of course, subsequent insertions and deletions can unbalance the BST.

If we build a BST with random values (drawn from a uniform distribution, such as that used by the Math.random method), what do BSTs look like (what are their heights compared to their sizes)? I have written a program that generates the integers [0,N-1] in an array, randomizes the array order, and then builds a BST by inserting the values in the resulting order (and does so repeatedly, collecting statistics). This program is named TreeHeights and is available via the "Sample Programs" link for the course. Here are the results of running this program on BSTs of size 10 to 1,000,000, each experiment generating 1,000,000 BSTs (but for the BSTs of size 1,000,000 I ran just 1,000 tests). Here Min/Average/Max measure the heights of the randomly built BSTs.

  Size        Min Possible  Min Achieved  Average  Max Achieved  Max Possible
  10               2.5            3           5          9              9
  100              6              8          12         23             99
  1,000            9             16          21         34            999
  10,000          12             24          30         44          9,999
  100,000         16             34          40         55         99,999
  1,000,000       19             44          49         57        999,999

Note that the average height of the constructed trees is about a factor of 2 to 2.5 times the height of the smallest possible BST; after size 100, each time we increase the size of the BST by a factor of 10, the average height goes up by about 9, meaning the average height is logarithmically related to the size.
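Here is a minimal sketch of this kind of experiment (it is not the actual TreeHeights program, and all class and method names are made up): shuffle the integers [0,N-1], insert them into an ordinary unbalanced BST, and report the resulting height.

  import java.util.ArrayList;
  import java.util.Collections;
  import java.util.List;

  // A minimal sketch of the experiment (not the actual TreeHeights program):
  // shuffle [0,N-1], insert the values into a plain unbalanced BST, and
  // measure the height. All class and method names here are made up.
  public class RandomBstHeight {
      private static class Node {                    // a bare-bones BST node
          int value;
          Node left, right;
          Node(int v) { value = v; }
      }

      private static Node insert(Node root, int v) { // standard BST insertion
          if (root == null)        return new Node(v);
          if (v < root.value)      root.left  = insert(root.left,  v);
          else if (v > root.value) root.right = insert(root.right, v);
          return root;                               // duplicates ignored
      }

      private static int height(Node root) {         // edges on the longest path
          if (root == null) return -1;
          return 1 + Math.max(height(root.left), height(root.right));
      }

      public static void main(String[] args) {
          int n = 1_000;
          List<Integer> values = new ArrayList<>();
          for (int i = 0; i < n; i++) values.add(i);
          Collections.shuffle(values);               // random insertion order

          Node root = null;
          for (int v : values) root = insert(root, v);
          System.out.println("size = " + n + ", height = " + height(root));
      }
  }

Running this repeatedly and recording the minimum, average, and maximum heights reproduces the kind of statistics shown in the table above.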
If we are given a bunch of values to add to a BST, the best approach would be to add them in random order. But this requires an offline algorithm: first get all the data, then put the values in a random order, then add them in that order to the BST. For an online algorithm, we must add each value as it arrives. If we are unlucky, we might end up with a tree whose height is much bigger than the heights of the random trees described above, all the way up to something that is truly pathological.

This leads us to the question of whether there is a way to make such pathological trees very improbable, or even impossible. And if so, how much extra work (what is the complexity class, and the constant in front of it) do we need to keep the tree reasonably well balanced? In this lecture we will examine Treaps, a data structure that combines the properties of BSTs and Heaps. When we use this data structure, it creates trees (online) whose heights are very similar to the heights of random trees, even if the order of the data would produce pathological BSTs. In the next lecture we will learn a variant of BSTs that guarantees that it creates well-balanced trees.

Treaps: Order and Structure Properties

A Treap is a tree that combines the order properties of a BST (primarily) and a Min-Heap (secondarily). Each Treap node has a value (the one added to the tree and the one that can be searched for) and a priority (automatically generated when a node is added to the Treap) in the range [0,1), which is used to make it improbable that the tree's structure will become pathological.

Order:     The values in all nodes must satisfy the BST order property; the priorities in all nodes must also satisfy the Min-Heap order property.

Structure: None: just like BSTs. Any structure is possible, but pathological ones are extremely unlikely.

The generic Java declaration for a Treap node (inside a class that uses a Treap to implement some data type: e.g., TreapSet) looks like

  private class TN {
    private E value;
    private double priority;
    private TN left, right, parent;

    private TN (E v, TN l, TN r, TN p)
      {value = v; priority = Math.random(); left = l; right = r; parent = p;}
  }

Here is an example of a Treap whose values are single-letter strings and whose priorities are simplified to just two decimal digits.

                  M:.05
               /         \
       F:.52                 W:.15
      /    \                /    \
  A:.95     K:.60       P:.50     Z:.23
            /           /   \
         G:.65      O:.67   T:.74
                            /   \
                        R:.80   V:.95

Notice that if you look just at the string values, the tree is a BST; if you look just at the ORDERING of the priorities, it is a Min-Heap (of course, if you look at the tree's STRUCTURE it is NOT a Min-Heap, because some levels above the bottom are not completely filled and the bottom level is not filled from the left).

Now we will see how Treaps are searched and built. Generally, we first use a BST algorithm, and then "fix up" the tree using a heap algorithm.

Operation: Location of a value

Just use the standard BST search algorithm (either iterative or recursive).
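As a hedged sketch of that search: assuming the TN class above sits inside a class (such as a TreapSet) that declares a root field and whose element type E is Comparable (both assumptions here, not shown in the declaration above), an iterative lookup might look like this; notice that the priorities play no role at all.

  // Sketch only: assumes the enclosing class declares  private TN root;
  // and that E implements Comparable<E>. Priorities are never examined;
  // this is exactly the standard iterative BST search.
  public boolean contains(E target) {
      TN current = root;
      while (current != null) {
          int cmp = target.compareTo(current.value);
          if (cmp == 0)
              return true;                     // found the value
          current = (cmp < 0) ? current.left   // smaller values are to the left
                              : current.right; // bigger values are to the right
      }
      return false;                            // fell off the tree: not present
  }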
Operation: Insertion of a value

To insert a value into a Treap, we first insert its node according only to the BST property of the Treap. We also generate a random number and store it permanently as that node's priority. Then we must ensure that the new node (a leaf) satisfies the Min-Heap property when compared to its parent in the Treap: if it doesn't, we must reestablish the Min-Heap property without violating the BST property. We do this with combinations of two "rotation" operations, described below. Note that when we start the rotation process, the newly added node is a leaf (which is how the BST insertion algorithm works).

Left Rotation: Suppose a node D is a right child of its parent B, and D has a lower priority than B (violating the Min-Heap property). We can perform a left rotation on B and D as shown below. The left tree is a BST, and nodes A, C, and E may be null or be parents of other nodes. This rotation keeps the BST property, while reestablishing the Min-Heap property that D's priority is less than B's. After this rotation we must also check whether D's new parent has a lower priority than D's, and possibly perform another rotation.

        B                      D          Note: the values A < B < C < D < E
       / \                    / \               (with respect to the BST property)
      A   D        ->        B   E
         / \                / \
        C   E              A   C

Right Rotation: Suppose a node B is a left child of its parent D, and B has a lower priority than D (violating the Min-Heap property). We can perform a right rotation on D and B as shown below. The left tree is a BST, and nodes A, C, and E may be null or be parents of other nodes. This rotation keeps the BST property, while reestablishing the Min-Heap property that B's priority is less than D's. After this rotation we must also check whether B's new parent has a lower priority than B's, and possibly perform another rotation.

        D                      B          Note: the values A < B < C < D < E
       / \                    / \               (with respect to the BST property)
      B   E        ->        A   D
     / \                        / \
    A   C                      C   E

We continue checking the priority of the inserted node against its (new) parent's, rotating, until it is the root or its priority is greater than its parent's priority.

Example: Suppose we start with the following Treap

                  M:.05
               /         \
       F:.52                 W:.15
      /    \                /    \
  A:.95     K:.60       P:.50     Z:.23
            /           /   \
         G:.65      O:.67   T:.74
                            /   \
                        R:.80   V:.95

and add the value S (with the random priority of .45), i.e., S:.45. The tree first becomes the following (after the BST insertion, but before any rotations to restore the Min-Heap property).

                  M:.05
               /         \
       F:.52                 W:.15
      /    \                /    \
  A:.95     K:.60       P:.50     Z:.23
            /           /   \
         G:.65      O:.67   T:.74
                            /   \
                        R:.80   V:.95
                           \
                           S:.45

Now, generally the parent of the newly added node has a pretty high priority (after all, it is at the bottom of the tree when interpreted as a Min-Heap), often higher than the newly added node's, so we expect to do some rotations. S's priority is lower than its parent R's, so we must rotate; since S is a right child of its parent R, we do a left rotation (where B is R and D is S in the rotation pattern), and get the following tree.

                  M:.05
               /         \
       F:.52                 W:.15
      /    \                /    \
  A:.95     K:.60       P:.50     Z:.23
            /           /   \
         G:.65      O:.67   T:.74
                            /   \
                        S:.45   V:.95
                        /
                     R:.80

We now check S's priority against its parent T's, and find again that we must rotate; since S is a left child of its parent T, we do a right rotation (where D is T and B is S in the rotation pattern), and get the following tree.

                  M:.05
               /         \
       F:.52                 W:.15
      /    \                /    \
  A:.95     K:.60       P:.50     Z:.23
            /           /   \
         G:.65      O:.67   S:.45
                            /   \
                        R:.80   T:.74
                                    \
                                   V:.95

Note that when restoring the Min-Heap property in Treaps, a rotation is required. This is more complicated than just exchanging the values in the parent and child nodes, which was all that was needed in real heaps. This is especially true higher in the tree, where the subtrees of the parent and children can be large.

We now check S's priority against its parent P's, and find again that we must rotate; since S is a right child of its parent P, we do a left rotation (where B is P and D is S in the rotation pattern), and get the following tree.

                  M:.05
               /         \
       F:.52                 W:.15
      /    \                /    \
  A:.95     K:.60       S:.45     Z:.23
            /           /   \
         G:.65      P:.50   T:.74
                    /   \       \
                O:.67   R:.80  V:.95

We now check S's priority against its parent W's, and find that no more rotations are required. If the priority of S had been < .05, it would have been rotated all the way to the root. Note that each rotation kept the BST order property, and got closer to satisfying the Min-Heap property, until it too was restored.

Generally, if the random number generated for the priority of an inserted value is high, the value stays near the bottom of the tree; if it is low, it is rotated towards the top of the tree.
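To make these mechanics concrete, here is a small, self-contained sketch of Treap insertion in Java. It is not the course's code: it uses int values instead of the generic TN class shown earlier, and it is written recursively, performing rotations as the recursion unwinds instead of bubbling the new node up with parent pointers; but the two rotations are exactly the ones pictured above.

  // A self-contained sketch (not the course's code) of Treap insertion using
  // the two rotations described above. Values are ints for simplicity.
  public class TreapSketch {
      private static class Node {
          int value;
          double priority = Math.random();   // generated once, never changed
          Node left, right;
          Node(int v) { value = v; }
      }

      private Node root;

      // Right rotation: B (the left child) moves up, D moves down to the right,
      // and C (B's old right subtree) becomes D's new left subtree.
      private static Node rotateRight(Node d) {
          Node b = d.left;
          d.left = b.right;      // C becomes D's left child
          b.right = d;           // D becomes B's right child
          return b;              // B is the new root of this subtree
      }

      // Left rotation: the mirror image of rotateRight.
      private static Node rotateLeft(Node b) {
          Node d = b.right;
          b.right = d.left;      // C becomes B's right child
          d.left = b;            // B becomes D's left child
          return d;              // D is the new root of this subtree
      }

      public void add(int v) { root = insert(root, v); }

      private static Node insert(Node node, int v) {
          if (node == null) return new Node(v);        // BST insertion at a leaf
          if (v < node.value) {
              node.left = insert(node.left, v);
              if (node.left.priority < node.priority)  // Min-Heap property violated?
                  node = rotateRight(node);            // rotate the child up
          } else if (v > node.value) {
              node.right = insert(node.right, v);
              if (node.right.priority < node.priority)
                  node = rotateLeft(node);
          }                                            // duplicates are ignored
          return node;
      }

      public static void main(String[] args) {
          TreapSketch t = new TreapSketch();
          for (int v : new int[] {10, 90, 20, 80, 30, 70, 40, 60, 50})
              t.add(v);                 // the zig-zag order from the start of the lecture
          System.out.println("root value: " + t.root.value);
      }
  }

Because each rotation happens on the way back out of the recursion, the effect is the same as repeatedly rotating the newly inserted node up until its priority is bigger than its parent's (or it becomes the root).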
Operation: Deletion of any value

Similarly to insertion, we will first delete the value using the BST property, and then restore the Min-Heap property using rotations.

To remove a value v from a Treap, we first remove v's node according only to the BST property of the Treap.

1) Recall that we can easily remove v's node if it is a leaf or has one child: just make the parent refer to nothing in the first case, or to v's one child in the second.

2) Otherwise, v's node has two children, and we find the node whose value is closest to v (for simplicity, call it c and make it the node with the biggest value less than v; it could also be the node with the smallest value greater than v). We take the value c and the priority that is in its node and put both in v's node, then delete c's original node as described above: it will be a leaf or have only a left child, because c is the biggest value in that subtree and therefore has no right child.

Now we restore the Min-Heap order property of the tree, by using rotations (if necessary). We check the node now storing the value c and rotate it with whichever of its children has the smaller priority, if that priority is less than the priority stored with the value c. We might have to continue doing this until the node storing c is in the correct position (or it becomes a leaf, in which case there are no children's priorities to compare against).

Example: Suppose we start with the following Treap. Verify that it satisfies the BST and Min-Heap order properties.

                  M:.05
               /         \
       F:.52                 W:.15
      /    \                /      \
  A:.95     K:.60       S:.45         Z:.23
            /           /   \         /
         G:.65      P:.50   T:.74  Y:.62
                    /   \       \
                O:.67   R:.80  V:.95

Now we remove the value W (with the priority of .15). We find that the biggest value less than W is V (the rightmost node in W's left subtree). We remove V's node and put the value V and its priority .95 in the node that originally contained W, the value we removed.

                  M:.05
               /         \
       F:.52                 V:.95
      /    \                /      \
  A:.95     K:.60       S:.45         Z:.23
            /           /   \         /
         G:.65      P:.50   T:.74  Y:.62
                    /   \
                O:.67   R:.80

This tree satisfies the BST order property; now we will use rotations to make it satisfy the Min-Heap order property. Because V's priority is bigger than the smallest priority of its children (.23 for Z, its right child), we perform a left rotation with the nodes storing V and Z. The result is the tree

                  M:.05
               /           \
       F:.52                    Z:.23
      /    \                  /
  A:.95     K:.60        V:.95
            /           /    \
         G:.65      S:.45     Y:.62
                    /   \
                 P:.50  T:.74
                 /   \
             O:.67  R:.80

Because V's priority is bigger than the smallest priority of its children (.45 for S, its left child), we perform a right rotation with the nodes storing V and S. The result is the tree

                  M:.05
               /             \
       F:.52                        Z:.23
      /    \                    /
  A:.95     K:.60        S:.45
            /          /      \
         G:.65    P:.50         V:.95
                 /    \         /   \
             O:.67    R:.80 T:.74   Y:.62

Because V's priority is bigger than the smallest priority of its children (.62 for Y, its right child), we perform a left rotation with the nodes storing V and Y. The result is the tree

                  M:.05
               /            \
       F:.52                      Z:.23
      /    \                   /
  A:.95     K:.60        S:.45
            /           /     \
         G:.65     P:.50       Y:.62
                   /   \        /
               O:.67   R:.80 V:.95
                             /
                          T:.74

Because V's priority is bigger than the smallest priority of its children (.74 for T, its left child), we perform a right rotation with the nodes storing V and T. The result is the tree

                  M:.05
               /            \
       F:.52                      Z:.23
      /    \                   /
  A:.95     K:.60        S:.45
            /           /     \
         G:.65     P:.50       Y:.62
                   /   \        /
               O:.67   R:.80 T:.74
                                 \
                                V:.95

Because V is now a leaf node, it has no children with a lower priority, so the rotations stop. Note that each rotation kept the BST order property, and got closer to satisfying the Min-Heap property, until it too was restored.
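Here is a sketch of just this "restore the Min-Heap property" step, reusing the (assumed) Node class and rotateLeft/rotateRight helpers from the insertion sketch earlier, and again written recursively rather than with parent pointers: the node whose priority may now be too big is rotated with its smaller-priority child until the property holds (or it becomes a leaf).

  // Sketch only, reusing Node, rotateLeft and rotateRight from the insertion
  // sketch above. After the BST deletion step has copied a value and priority
  // into some node n, this sinks n down until the Min-Heap property holds,
  // always rotating with the child that has the smaller priority. The caller
  // must store the returned node back into n's old parent link (or the root).
  private static Node siftDown(Node n) {
      if (n == null) return null;
      double leftP  = (n.left  == null) ? Double.POSITIVE_INFINITY : n.left.priority;
      double rightP = (n.right == null) ? Double.POSITIVE_INFINITY : n.right.priority;
      if (n.priority <= Math.min(leftP, rightP))
          return n;                          // heap property already holds (or n is a leaf)
      if (leftP < rightP) {                  // left child has the smaller priority
          Node top = rotateRight(n);         // that child moves up, n moves down to the right
          top.right = siftDown(top.right);
          return top;
      } else {                               // right child has the smaller priority
          Node top = rotateLeft(n);          // that child moves up, n moves down to the left
          top.left = siftDown(top.left);
          return top;
      }
  }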
Generally, because the priority of the value moved into the deleted node (.95 for V above) is high (it comes from low in the tree), there are likely going to be many rotations to re-satisfy the Min-Heap property, bringing that node back down towards the bottom of the tree.

We can traverse this tree just as we traversed BSTs.

How bad can a Treap get? In the worst case it is still pathological: for example, if the values arrive in increasing order and each newly generated priority happens to be bigger than the one generated just before it, no rotations ever occur and every value ends up on one long right path. But a rough estimate of the probability of this happening is (1/2)^N (each new priority must be bigger than the previous one), which becomes negligible very quickly as N grows.

For comparison, recall the height statistics for randomly built BSTs from the beginning of this lecture; Treaps built online have heights very similar to these, regardless of the order in which the values arrive.

  Size        Min Possible  Min Achieved  Average  Max Achieved  Max Possible
  10               2.5            3           5          9              9
  100              6              8          12         23             99
  1,000            9             16          21         34            999
  10,000          12             24          30         44          9,999
  100,000         16             34          40         55         99,999
  1,000,000       19             44          49         57        999,999

Here is a timing comparison of BSTs and Treaps (the Ratio column is the BST time divided by the Treap time); it shows that the extra work a Treap does costs less than a factor of 2.

  Size        BST       Treap     Ratio
  10             .14       .24    .58   (time was quick, not so accurate)
  100           1.5       2.7     .55
  1,000        19.1      31.8     .60
  10,000      274.3     415.3     .66
  100,000    4796.9    7398.7     .65

------------------------------------------------------------------------------

Merging: Finally, there is one operation common to priority queues that heaps do not do optimally: merging two priority queues into one. A simple way to do this for any heap implementation is just to add every value from one heap to the other: this would be an O(N Log2 N) operation (assuming each heap had N values). Of course, O(N Log2 N) is a good complexity class.

Another way to merge is to put both heaps into an array and then use the offline technique above to build a heap from their contents. This is an O(N) operation: putting the two heaps in an array big enough to hold both is O(N), and then doing the offline heap-construction algorithm is also O(N), so the resulting complexity is O(N).

There are more advanced implementations of priority queues that merge more quickly (while still quickly doing insert and remove-min operations): leftist, skew, binomial, etc. heaps. The key is to create order and structure properties that constrain the data enough (but not too much) to allow all the operations to work quickly.
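Here is a hedged sketch of that O(N) merge for array-based min-heaps (the class and method names are made up, not from any library): copy both heaps into one array, then run the bottom-up heap-construction algorithm on it.

  import java.util.Arrays;

  // A sketch of the O(N) merge described above: copy both array-based min-heaps
  // into one array, then run the bottom-up (offline) heap-construction algorithm.
  public class HeapMergeSketch {
      // Restore the min-heap property for the subtree rooted at index i,
      // assuming both of its child subtrees are already heaps.
      private static void siftDown(int[] a, int i, int size) {
          while (true) {
              int left = 2 * i + 1, right = 2 * i + 2, smallest = i;
              if (left  < size && a[left]  < a[smallest]) smallest = left;
              if (right < size && a[right] < a[smallest]) smallest = right;
              if (smallest == i) return;                 // heap property holds here
              int tmp = a[i]; a[i] = a[smallest]; a[smallest] = tmp;
              i = smallest;                              // continue sifting down
          }
      }

      public static int[] mergeHeaps(int[] h1, int[] h2) {
          int[] merged = Arrays.copyOf(h1, h1.length + h2.length);   // O(N)
          System.arraycopy(h2, 0, merged, h1.length, h2.length);     // O(N)
          for (int i = merged.length / 2 - 1; i >= 0; i--)           // bottom-up heapify: O(N)
              siftDown(merged, i, merged.length);
          return merged;
      }

      public static void main(String[] args) {
          int[] h1 = {1, 3, 5, 7}, h2 = {2, 4, 6};
          System.out.println(Arrays.toString(mergeHeaps(h1, h2)));   // a valid min-heap of all 7 values
      }
  }

The loop that calls siftDown starts at the last internal node and works back to the root, which is exactly the offline heap-construction algorithm, so the total work for the merge is O(N).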