Linked Lists ICS-46 is concerned with studying the lower-level data structures that are used in Python to represent lists/tuples, dicts, sets/frozensets, and other not-built-into Python named data-types: stacks, queues, priority-queues, equivalence classes, and graphs. There are two primary components to all these data-types: arrays (a simpler version of Python lists; in fact Python's lists are built from arrays) and self-referential data-structures. Linked lists are the simplest kind of self-referential data-structures; trees (we will study binary search trees later this week) are more complex self-referential data-structures. Languages like Java/C++ don't build-in most of Python's useful data-types, but instead provide them in standard libraries, which are a bit more awkward to use than these data-types in Python. These data-type libraries are built on arrays and self-referential structures. This week is a peek at self-referential structures: linked lists and binary search trees. ------------------------------------------------------------------------------ Here is the trivial class that serves as the basis for all our linked list code (and the tree class covered later this week isn't much different). LN stands for List Node: a linked list is a sequence of zero or more list nodes, with each one explicitly referring to the next LN in the list, if there is one (via an attribute that is the next reference). class LN: def __init__(self : "LN", value : object, next : "LN or NoneType" = None): self.value = value self.next = next We write "LN" in the annotations above, because when defining LN we cannot use LN for an annotation yet (because it hasn't been completely defined yet); we can use it after the definition of the LN class. Basically the class allows us to create objects with two attribute names: value refers to some object (of any class), but next should either refer to an object constructed from the LN class or refer to the special value None (its default value in __init__ above): type(None) is NoneType. In this way we describe LN as a self-referential class: each of its objects refers to another one of its objects (although None will serve to stop this recursive definition from being infinite: it will be the base case in our recursive functions that process linked lists). None represents a linked list with no/0 nodes. Generally, a linked list is a sequential list of values, with their order being important. So a linked list is like a standard Python list (implemented by a simpler structure named arrays, which you'll learn tons about in ICS-45J/ICS-45C and ICS-46, but are hidden in Python). Here, and much more in these other courses, we will learn many details concerning the objects that implement linked lists and how we can use linked list to implement the kinds of operations we perform on standard Python lists. In ICS-46 we will focus the the performance tradeoffs (speed/space) for array vs. linked structures for representing lists and other data types. We already know a lot about objects storing attributes that refer to objects constructed from OTHER classes. Now we will study objects storing attributes that refer to objects constructed from the SAME class, storing attributes that refer to objects constructed from the same class, ... That is something new to learn about and explore. We will start with pictures, because pictures are essential to understanding and manipulating these data structures. In lecture, I will show some detailed diagrams of linked lists built with LN objects; then I will remove much redundant information to show more concise and easy-to-draw pictures. See the extra materials shown on the Weekly Schedule page for this lecture. I will show detailed pictures here during the lecture. Here is an abbreviated picture: name x refers to an LN object whose value attribute is 5 and whose next attribute is a reference to another LN object whose value attribute is 3 and whose next attribute is a reference to another LN object whose value attribute is 8 and whose next attribute is ..... A reference in the next attribute to the value None is represented symbolically by the symbol /. x +---+ +---+---+ +---+---+ +---+---+ +---+---+ +---+---+ | --+--->| 5 | --+--->| 3 | --+--->| 8 | --+--->| 2 | --+--->| 4 | / | +---+ +---+---+ +---+---+ +---+---+ +---+---+ +---+---+ Note that the tails of the arrows (references) are put INSIDE a box representing a place where a name's value is stored. The heads of the arrows refer to an entire LN object, not any particular attribute name/value in it. Note that with this notation, we show a value like 3 inside a box for simplicity, instead of a reference to the int object 3 inside the box. Often, the interesting part of linked-list programming has more to do with the next attributes of the object than the value attributes. In the code below, whenever we see a .name it means "follow the arrow to the object it refers to (all arrows refer to objects) and select the name attribute (in LN objects, all attributes store data). Read the following carefully; everything we do later with linked lists is built on understanding the meaning of .name (something we've been doing with class objects for a while, even if just doing something like writing print(self.name) or self.name = value. (1) x stores a reference to the first LN object (2) x.value stores a reference to the int object 5 in this first LN object (3) x.next stores a reference to the second LN object (4) x.next.value stores a reference to the int object 3 in this second LN object (5) x.next.next stores a reference to the third LN object (6) x.next.next.value stores a reference to the int object 8 in this third LN object Don't memorize this information; understand what .name means and carefully be able to analyze each of these expressions, and any others using .next and .value too. Typically we will look at classes for a list/tree data structure as representing just data and no methods. So, we will examine functions defined outside of LN, not methods defined inside the LN class (although most of these functions can be easily written as methods). We will discuss both iterative and recursive versions of most functions, as appropriate. See the download that contains all these functions and a simple driver that you can use to test them. ------------------------------------------------------------------------------ Functions that query/access linked lists One of the main operations we perform on linked lists (as we do with lists) is to iterate over them, processing all their values. The following function computes the sum of all the values in a linked list ll. def sum_ll(ll): def sum_ll(ll): sum = 0 sum = 0 while ll != None: while True: sum += ll.value if ll == None: ll = ll.next break; return sum sum += ll.value ll = ll.next return sum Lots of code that traverses (iterates over) linked lists looks similar. In class we will cover (hand simulate) how this code processes the linked list above, with the call sum_ll(x) and see exactly how it is that we visit each node in the linked list and stop processing it at the end (when ll goes beyond the last LN in the linked list and takes on the value None. There is no special iterator for LN objects (unless we create one, as we will at the end of this lecture); LN is just like any other Python class that we write. We can also define linked lists recursively and use such a definition to help us write functions that recursively process linked lists. (1) None is the smallest linked list: it contains no nodes (2) A list node (LN) whose next refers to a linked list is also linked list So None is a linked list (of 0 values); a list node whose next attribute is None is a linked list (of 1 value); a list node whose next attribute is a list node whose next attribute is None is a linked list (of 2 values); etc. So, we can recursively process a linked list by processing its first LN and then recursively processing the (one smaller) linked list its next attribute refers to; recursion ends at None (which is the base case: the smallest linked list). We can recursively compute the sum of linked list by def sum_ll_r(ll): if ll == None: # Could also test: type(ll) is NoneType return 0 else: return ll.value + sum_ll_r(ll.next) Back to the three rules we studied to prove a recursive functions correct: (1) It recognizes and computes the correct result for the smallest (None, no LN) linked list: it returns 0 which is the sum of no nodes. (2) Each recursive call is on a smaller linked list, which is closer to the base case: The recursive call is on ll.next, which is a linked list with one fewer nodes. It is similar to the call l[1:] when l is Python list. (3) Assuming sum_ll_r(ll.next) computes the sum of all values after the node representing the start of the linked list to be processed, this function returns the sum of all the nodes in this linked list: if we add the value of this first node to the sum of the values in all the following nodes in the linked list, then we have computed the sum of all the nodes in the linked list. It is elephants all the way down. ----- Aside: Efficiency in Time and Space In tuples/lists, using a slice to skip the first value in a recursive call is INEFFICIENT in both time and space. It must COPY the entire tuple/list. But using ll.next to skip the first value in a recursive call is EFFICIENT in both time and space. ----- An even simpler traversal of linked lists computes their length. Here are the iterative and recursive functions. def len_ll(ll): count = 0 while ll != None: count += 1 ll = ll.next return count def len_ll_r(ll): if ll == None: return 0 else: return 1 + len_ll_r(ll.next) These are simpler than the sum_ll functions: rather than adding the value of each list node, these add 1 to a count for each list node, ultimately computing the number of list nodes in the entire linked list: its length. Next lets look at computing a string representation for a list. There is no standard for how linked lists are represented as strings. We could convert them to look like a normal list: [...] but instead we will use the following form '5->3->8->2->4->None'. Here are the iterative and recursive functions to produce such strings. In the iterative method, for each node in the list we concatenate its value followed by '->', and concatenate just the value 'None' at the end, before returning. def str_ll(ll): answer = '' while ll != None: answer += str(ll.value) + '->' ll = ll.next return answer + 'None' In the recursive version, we return 'None' as the base-case, concatenating the value and '->' in front of the result returned on each recursive call. def str_ll_r(ll): if ll == None: return 'None' else: return str(ll.value) + '->' + str_ll_r(ll.next) In all these examples, the iterative and recursive code have approximately the same complexity. Let's now look at two other functions: one that converts a standard Python list into a linked list, and one that copies a linked list, observing that the recursive versions are a bit simpler to write and understand. BUT, you should hand simulate the iterative methods to understand how/why they work too, and apply the 3 proof rules to the recursive functions. First: two functions to convert a standard Python list into a linked list. In list_to_ll we must treat an empty list specially, returning None: otherwise (for a non-empty list) we can access the first value: l[0]. We make two names (front and rear) to refer to the LN constructed with that value (in this LN its next is None). We will not change front and eventually return its value (returning a reference to the front of all the list nodes in our list). We add each new value at the end of the list of nodes by extending the node rear refers to: changing its next from None to an actual list node (whose next is None), and then re-adjusting rear to refer to this new end-of-the-list node, extending it as many times as necessary. def list_to_ll(l): if l == []: return None front = rear = LN(l[0]) # next parameter is implicitly None in LN for v in l[1:]: rear.next = LN(v) # next parameter is implicitly None in LN rear = rear.next return front The recursive version of this function is simpler, and looks pretty much like all the recursive functions that we have seen for linked lists. One interesting feature of note: the recursive call is the second argument to LN's constructor. It calls this recursive function and passes a reference to the copied list to the constructor. def list_to_ll_r(l): if l == []: return None else: return LN( l[0], list_to_ll_r(l[1:]) ) Here is the proof this function is correct (1) It recognizes and computes the correct result for the smallest (empty) list: it returns None, which is the smallest (empty) linked list. (2) Each recursive call is on a smaller list, which is closer to the base case: The recursive call is on l[1:], the standard one-smaller list. (3) Assuming list_to_ll(l[1:]) returns a linked list with all the values in the l[1:], this function returns a linked list of all the values in the parameter list: it returns a reference to a new list node with the first value in the list (l[0]) and whose .next refers to a linked list with all the values in l[1:]. To find a value in a linked list (returning a reference to the node that contains that value; if the value appears in the list multiple times, it returns a reference to the first node that it is in), we write an iterative method and two recursive variants. Each returns None if the value is not found in the linked list. Iteratively, we use ll to traverse the list, looking for avalue: we either find it or "run off the end of the list by reaching None" and return None. def find_ll(ll, avalue): while ll != None: if ll.value == avalue: return ll ll = ll.next return None We can also write this more simply as follows (see code below), combining the two conditions for returning a value; when the loop terminates, the test ll != None and ll.value != avalue is False when the while loop ends, so either ll == None or ll.value == avalue; in both cases returning ll is correct. Note that the short-circuit evaluation of the and operator (and the order of the conjuncts) is critical: we should not follow the reference in ll (with ll.value) until we are sure that ll does not refer to the None object; if it does, ll.value would raise an exception. DeMorgan's law in boolean algebra is very important in programming. It says that not (A and B) == not (A) or not (B); also, not (A or B) == not (A) and not (B). Each part is negated and the connector flips: and -> or; or -> and. So the loop above terminates when its test is False: when not (test) is True. not (ll != None and ll.value != avalue) is equivalent to terminating when not (ll != None) or not(ll.value != avalue) removing the double negative we get ll == None or ll.value == avalue We could have also written the continuation test for this loop as not (ll == None or ll.value == avalue) or the entire function using while True: as def find_ll(ll, avalue): while True: if ll == None or ll.value == avalue: # short-circuit is critical return ll ll = ll.next For the recursive functions, the first uses the simplest base case/non-base case form. If the linked list isn't empty def find_ll_r(ll, avalue): if ll == None: return None else: if ll.value == avalue: return ll else: return find_ll_r(ll.next, avalue) We could replace this entire body by one complicated conditional expression: return (None if ll==None else ll if ll.value==avalue else find_ll_r(ll.next,avalues) But this version is very hard to read, and not in the standard recursive form that we have been using. As a slight variant (and similar to what we did in the while loop version) we can test both ll == None or ll.value == avalue and in both cases return ll (returning either None of a reference to a list node). Note that if ll == None is True, short-circuit evaluation of "or" means that the expression ll.value == avalue will not need to be evaluated: good thing, too, because accessing ll.value when ll is None would raise an exception. def find_ll_r2(ll, avalue): if ll == None or ll.value == avalue: # short-circuit or is critical return ll else: return find_ll_r(ll.next, avalue) Note that this function is tail-recursive and could automatically be written iteratively (as the code above shows). Most of the previous functions are also not tail-recursive, but we could convert them into tail-recursive functions that accumulate their answers in an additional parameter, and then convert them to be iterative. We have already examined code that returned the linked list equivalent of a standard Python list. Here is similar code that copies a linked list: constructs new nodes with the same values, arranged in the same order, for a linked list. In the iterative version we again use front/rear to remember the front of the list and extend the rear for each values we traverse in ll. def copy_ll(ll): if ll == None: return None front = rear = LN(ll.value) # next parameter is implicitly None in LN while ll.next != None: ll = ll.next rear.next = LN(ll.value) # next parameter is implicitly None in LN rear = rear.next return front As we expect, the recursive version is more elegant, and similar to the other recursive code that processes linked lists. It is similar to the code we wrote to translate a Python list into a linked list. Again here the recursive calls is the second argument to LN's constructor. So, for ll it builds a node whose value attribute is ll.value and whose next attribute is a copy of all linked list values after the first, therefore copying the entire linked list: each recursive calls creates a new LN for its ll's value attribute. def copy_ll_r(ll): if ll == None: return None else: return LN(ll.value, copy_ll_r(ll.next)) Finally, languages like Java/C++ don't easily support generators. But because Python does, we can easily write a generator that produces all the values in a linked list. def iterator(ll): while ll != None: yield ll.value ll = ll.next With this code, we could print every value in a linked list by simply writing for v in iterator(ll): print(v) In fact, we could put a variant of this code in the __iter__ method in the LN class itself: def __iter__(self): current = self while current != None: yield current.value current = current.next With this method, we could simply write just for v in ll: print(v) although this code (unlike the generator above) will not work when ll refers to None, because there is no __iter__ method in NoneType that would return immediately. But if ll refers to an LN object, the __iter__ code above will iterate through its values. When writing a linked list class using full Object Oriented Programming, we define both a list node class and list empty class: iterations on a list empty class would just immediately raise StopIteration. ------------------------------------------------------------------------------ Functions that command/mutate linked lists All the functions above queried/accessed/created but did not mutate linked lists: no changes were made to .value or .next of any LN object. If x refers to the first LN in a linked list, we can add a new value at the front of the linked list by the simple code: x = LN(new_value, x) Now x refers to a new list node, whose value is new_value, and whose next attribute refers to the LN that x used to refer to: all the nodes in the original linked list x. Draw a picture with x = None originally or x referring to the linked list above, when this assignment is executed. This operation takes the same amount of time, regardless of how long the linked list is: it just constructs an LN object and fills-in its two attributes. If l is a regular Python list, inserting a new_value at its front position (l.insert(0,new_value)) take more time the longer the list is: it inserts a new value after moving every value back one in the list. Linked lists are more efficient that regular lists for some operations, including insertion at the front. We can write the following iterative/recursive functions to append a value at the end of the linked list. In both cases the list is mutated: the last list node has its next attribute changed to refer to a new list node containing the new value (and whose .next is None). But, to handle the case where x is initially empty (stores None), the iterative/recursive functions must return a reference to the front of the list (maybe x itself, or if x stored None, a reference to a one-node linked list storing new_value). We call these functions like x = append_ll(x, new_value) and x = append_ll_r(x, new_value) As with list_to_ll and copy, the iterative version needs to remember the front while using ll to traverse down the list, to find the last list node to extend. def append_ll(ll,value): if ll == None: # special case for an empty list return LN(value) front = ll while ll.next != None: # while ll does not refer to the last node... ll = ll.next # advance: terminates when ll.next == None ll.next = LN(value) # (at end: ll.next == None) put value after end node return front # return reference to original front of ll (still front) The recursive method is again simpler to write. def append_ll_r(ll,value): if ll == None: return LN(value) else: ll.next = append_ll_r(ll.next,value) return ll Here is the proof this function is correct (1) It recognizes and computes the correct result for the smallest (empty) linked list: it returns a reference to a linked list with one node (storing value: it is both the front and end of the new list) (2) Each recursive call is on a smaller linked list, which is closer to the base case of None: the recursive call is on ll.next. (3) Assuming append_ll_r(ll.next,value) returns a reference to a linked list that is one longer than ll.next containing all its list nodes followed by value in the last list node, this function returns a linked list that is one longer than ll containing all its list nodes followed by value in the last list node (by storing in ll.next a reference to the extended linked list and returning the original reference to ll). ICS-46 studies the execution times of various code applied to data structures. We will do a bit of this analysis in ICS-33, in week 8. Lists in Python allow us to add a value at the end very quickly, but adding a value at the front of a long list takes much more time: Python must first move the value at index 0 into index 1; the value at index 1 to index 2; ...For linked lists, adding a value at the front is very quick, while adding a value at the rear requires traversing every value in the list (to find the end). Depending on how often we perform these two operations, it might be faster to use a list or a linked list. If we frequently update the information at the end of a linked list, we can keep (cache) a reference to it, so we can immediately find the last node in a linked, to speed up our functions. Here are two simple functions (not iterative or recursive) to mutate a list by adding/removing a value directly after the one referred to by their argument. Both functions return None implicitly. def add_after_ll(ll,value): # raises an exception if ll is None; otherwise splice value after ll ll.next = LN(value,ll.next) def remove_after_ll(ll): # raises exception if ll (no list) or ll.next (no value to remove) is None # excise value after ll ll.next = ll.next.next Note that to remove the first value in a linked list, we write x = x.next Finally, we could write an append method in the LN class itself, which scanned to the end of the linked list containing the LN, and appending the value there. As with generator/__iter__, this method won't work for an empty linked list, because its class is NoneType, not LN. ------------------------------------------------------------------------------ Problems 1) Define a recursive function that converts a linked list of values into a standard Python list? 2) To really understand how low-level linked list code works, use the list shown above execute the call. x = magic(x) It actually mutates the list in a complicated way and returns a reference to one of its nodes. Hand simulate the results (calling the function to see the result produced before trying to hand simulate it has zero educational value. I don't care whether you know the answer; I care whether you can hand simulate this code and code like it that you might write. def magic(ll): answer = None while ll != None: t_m = ll ll = ll.next t_m.next = answer answer = t_m return answer 3) Define a function named select with two arguments: a linked list (ll) and a non-negative integer (n); it returns the value of the nth value in the list or raises an exception if there are too few values in the list. Write this function iteratively and recursively. 4) Define a function named append, with two linked list arguments; it returns a reference to the first node in a new linked list (lots of new LN objects) with all the values in the first followed by all the values in the second. This method does not mutate the arguments lists, it copies every list node in each. Write this function iteratively and recursively. 5) Define a function named append, with two linked list arguments; it returns a reference to the first node in a linked list (no new LN objects) that contains all the values in the first followed by all the values in the second. This method mutates the arguments lists (by making the next of the last node in the first list refer to the first in the second list; but be careful about empty lists). Write this function iteratively and recursively. 6) Define a function named interleave, with two linked list arguments; it returns a reference to the first node in a new linked list (lots of new LN objects) with all the values in the first interleaved with all the values in the second. This method does not mutate the arguments lists. Write this function recursively. If one list is longer than the other, just copy all the leftover nodes from the longer list at the end of the result. 7) Define a function named interleave, with two linked list arguments; it returns a reference to the first node in a linked list (no new LN objects) that contains all the values in the first interleaved with all the values in the second. This method mutates the arguments lists. Write this function recursively. If one list is longer than the other, just refer to all the leftover nodes from the longer list at the end of the result. 8) Define a function named reverse, with one linked list argument; it returns a reference to the first node in a new linked (lots of new LN objects) with all the values in their reverse order. Write this function iteratively and recursively. 9) Define a function named reverse, with one linked list argument; it returns a reference to the first node in a linked (no new LN objects) with all the values in their reverse order. This method mutates the argument list. Write this iteratively and recursively 10) For the recursive functions written in 8 and 9, rewrite them to use a helper method with an extra value that accumulates the reversed linked list. Such functions will be tail recursive; translate their code to iterative functions. 11) Define a function named is_ordered, with one linked list argument; it returns whether or not the list values are in non-decreasing order (each one must be <= its successor). 12) Define a function named insert_ordered, with two arguments: one an ordered linked list (see problem 11) and one value; it returns a reference to a linked list with all the original values and the new one added so the linked list is still ordered. 13) Define the __iter__ method in LN such that we can iterate over the nodes (not the values) in a linked list. Hint: either return a nested class implementing __next__ or return a generator (simpler) to do the job. To sum all the values in a linked list using such an iterator, we would write for r in ll: print(r.value) 14) Reread the definition of list_to_ll above. Why doesn't the following code work equivalently? def list_to_ll(l): if l == []: return None front = rear = LN(l[0]) # next parameter is implicitly None in LN for v in l[1:]: rear = rear.next = LN(v) return front Hint: chaining the equal sign (as in rear = rear.next = LN(v)) works differently in Python than in C++; what result would the call list_to_ll([1,2,3,4,5])) produce for the code above? Can you change the code to produce the correct result?