What is a data structure? Module: set of variables and subroutines encapsulating some interface Object-oriented: interacting set of objects that implements some functionality What do we study? What sets of operations are commonly needed? How can we implement them efficiently? (various forms of analysis, amortized, etc) How can we form new specialized data structures by combining existing ones in simple ways? Why data structures? Theoretical: how do I find a fast algorithm? Fast ds = fast alg sorting examples -- selection sort => heap sort; ins sort Programming in the small (even when overall program does something non-theoretical) interval tree for fast tree scrolling example Software design principles (modular object oriented design) Python syntax key diffs from C etc: no variable declarations (data has a type but variables can hold any type data) block structure determined by indentation instead of brackets simple example fib = [1, 1] def fibonacci(n): while n < len(fib): fib.append(fib[-1]+fib[-2]) return fib[n] Basic data types object, string, list, dict list indexing zero-based, neg=from end Object oriented vs dictionary oriented representations: how to represent trees special tree node object having data parent left, right children 1st child, next sibling list of all siblings dictionary or dictionaries mapping tree nodes to these items advantage: flexibility tree nodes can be anything same object can be in multiple trees disadvantage: slower (dictionary lookup per access) how to represent sequences of objects? singly linked list easy forward traversal, ins/del at start hard to index doubly linked list easy traversal, ins/del anywhere hard to index array easy to index hard to ins/del anywhere python choice (also Java vector): partially filled array resize when gets filled easy to index easy to ins/del at end hard to ins/del elsewhere resize strategy? Python: always double Java: double by default, or fixed increment HOW TO COMPARE? WORST CASE ANALYSIS (PER OPERATION) index, del: constant ins: linear AMORTIZED: prove that, if we perform a sequence of n operations, total time of the sequence is small (amortized complexity = average = total/n for worst case sequence) here, worst case = repeat n*insertions (to trigger many resizes) fixed increment I: resize after every I inserts time per resize = proportional to current size so total = Theta(I) + Theta(2I) + Theta(3I) + ... + Theta(n) = Theta(n^2/I). Amortized complexity per operation: n/I = O(n) [I is a constant] double: total = Theta(1) + Theta(2) + Theta(4) + ... + Theta(n) = Theta(2n) = Theta(n) Amortized complexity per operation: O(1)