Survey of binary search data structures General principle: handle insert, delete, successor, predecessor in logarithmic time "balanced" tree: each path has length O(log n) each operation modifies items on a single path various mechanisms for forcing tree to stay balanced various additional advantages/disadvantages beyond e.g. treaps faster for certain types of operation storage space ... treap: described last time used in LEDA algorithm library advantage: same properties as random-insert model for any seq of ops data structure state does not depend on history average access 2 ln n, relatively good conceptually simple disadvantages: deepest path larger (by constant) than other balanced trees randomized skip list: each new item flips coin until getting heads makes array of pointers #flips long pointer[i] -> sorted singly linked list of items with >= i flips search(x): i = max(#flips) pos = first item with i flips while i >= 0: if x <= pos.array[i]: pos = pos.array[i] else: i = i - 1 expect O(log n) steps per search advantage: simple data structure state does not depend on history disadvantages: variable size nodes randomized a,b-tree: [CLRS 18] NOT binary all items stored at leaves (can also mod to store items internally) each internal node has a <= #children <= b, b >= 2a-1 (so search step at each node more complicated) insert: find place to add if too many children, split and zipper upwards delete: remove child, rebalance w/sibling if necessary # nodes touched by update = O(log_a n) (2,3)-tree: O(log n) per update advantage: none? B-tree: node size = memory page, O(log_B n) per update advantage: adapts to memory hierarchy red-black tree: [CLRS 13] binary search tree (items stored in both internal nodes and leaves) each node stores one bit: colored red or black root must be black any child of a red node must be black all paths root->null pointer have same # black nodes view as similar to 2-4 tree: group red nodes with their parents into single supernode => height <= 2 log_2 n insert: do normal tree insert, color new node x red (preserves all-path property, may mess up any child) while x and parent(x) both red: if parent(x) = root: recolor it black else if sibling(parent(x)) red: flip colors among parent,grandparent,sibling x = grandparent else if x-parent-grandparent zigzags: rotate x-parent x = old parent (now child of x, no zigzag) else: rotate parent->grandparent and swap colors delete: similar with more cases (read the book) advantages: little extra memory (one bit per node) each insert does at most two rotations (useful for persistence, later) AVL trees: first balanced bst: Adel'son, Velskii, Landis 1962. at each node, heights of subtrees differ by at most one each node stores two bits: which of left and right is deeper similar case analysis and rotation-based updates as red-black trees advantages: little extra memory more balanced than red-black: height log_phi n instead of 2 log_2 n BB[alpha]-trees: at each node, #descendants(child) <= (1-alpha) * #descendants(node) each node stores #descendants so we can check this condition max height: log_(1/(1-alpha)) n insert, delete: perform standard binary search tree ins/del propagate new #descendants up tree at *highest* unbalanced node, replace whole subtree with new complete binary tree amortized analysis: Phi = sum_(nodes) | #left - #right | each ins/del adds O(log n) to Phi each rebuild takes time O(k), reduces Phi by alpha k - O(log n) => amortized time O(log n) advantages: can get height (1+epsilon) log_2 n at cost of more frequent rebuilds, slower amort update can store large complicated data structures at each node, rebuild them infrequently, not worry about rotate disadvantages: time is amortized instead of worst case scapegoat trees: no extra information per node (not even one bit), just tree structure otherwise similar to BB[alpha]-tree if insert places node too deep, computes balance of larger and larger subtrees at ancestors until finding non-alpha-balanced subtree then rebuilds that subtree to be balanced search time adds in geometric series to rebuild time time per search worst case O(log n) time per update amortized O(log n) weighted binary trees (e.g. BB[alpha]-trees with total weight instead of #) time to insert/del/find item i: O(log W/w_i) advantages: can speed up times for frequently accessed items disadvantages: more complex finger search trees (locality of reference) time to insert/del find item i: O(log |pos_i - pos_{i-1}|) advantages: locality of reference in input sequence (e.g. sequential order => linear time) disadvantages: more complex Splay trees: next time advantages: most of above no extra space, weighted w/o change to data struc, finger searches (again w/o change) disadvantages: amortized