Linked Lists ICS-46 is concerned with studying the lower-level data structures that are used in Python to represent lists/tuples, dicts, sets/frozensets, and other not-built-into Python named data-types: stacks, queues, priority-queues, equivalence classes, and graphs. There are two primary components to all these data-types: arrays and self-referential data-structures. Linked lists are the simplest kind of self-referential data-structures, and trees are more complex self-referential data-structures. Languages like Java/C++ don't build-in most of Python's useful data-types, but instead provide them in standard libraries, which are a bit more awkward to use than these data-types in Python. These data-type libraries are built on arrays and self-referential structures. This week is a peek at self-referential structures (the last week will be a peek at more of Java, and will address some of these same issues in a larger context). ------------------------------------------------------------------------------ Here is the trivial class that serves as the basis for all our linked list code (and the tree class isn't much different). LN stand for List Node: a linked list is a sequence of list nodes one referring to the next. class LN: def __init__(self : LN, value : object, next : LN =None): self.value = value self.next = next Basically the class allows us to create objects with two instance variables: value refers to some object (of any class) but next should either refer to an object constructed from the LN class or None (its default value). In this way we describe LN as a self-referential class: each of its objects refers to another one of its objects (although None will serve to stop this recursive definition from being infinite: it will be the base case in our recursive functions that process linked lists). So a linked list is like a standard Python list (implemented by arrays, which you'll learn tons about in ICS-45H/ICS-45C and ICS-46, but are hidden in Python). Here, and more in these classes, we will learn many details concerning the objects that implement linked lists and how we can implement, using linked lists, the kinds of operations we perform on standard Python lists. In ICS-46 we will focus the the performance tradesoff (speed) for array vs. linked structures for representing lists. We already know a lot about names refering to objects that have namespaces, but objects refering to objects (of the same class) that refer to objects (of the same class) that refer to objects (of the same class) ... will be something new to learn about. We will start with pictures, because pictures are essential to understanding and manipulating these data structures. In lecture, I will show some detailed diagrams of linked lists built with LN objects; then I will remove much redundant information to show more concise and easy to draw pictures. Please copy these down, as they are not in the notes. Show detailed pictures here in lecture Here is an abbreviated picture: name x refers to an LN object whose value is 5 and whose next is a reference to another LN object whose value 3 and whose next is a reference to another LN object whose value 8 and whose next is ..... The value None is represented symbolically by the symbol /. x +---+ +---+---+ +---+---+ +---+---+ +---+---+ +---+---+ | --+--->| 5 | --+--->| 3 | --+--->| 8 | --+--->| 2 | --+--->| 4 | / | +---+ +---+---+ +---+---+ +---+---+ +---+---+ +---+---+ Note that the tails of the arrows (references) are put INSIDE a box representing a place where a name's value is stored. The heads of the arrows refer to an entire LN object, not any particular instance variable. In the following, whenever we see a .name it means "follow the arrrow to the object it refers to (all arrows refer to objects) and access the name attribute. Read the following carefully; everything later we do with linked lists is built on understanding the meaning of .name (something you've been doing with class objects for a while, even if just doing something like alist.append(...). So (1) x refers to the first LN object (2) x.value refers to the int object 5 (3) x.next refers to the second LN object (4) x..next.value refers to the int object 3 (5) x.next.next refers to the third LN object (4) x..next..next.value refers to the int object 8 Don't memorize this; understand what .name means and carefully be able to analyze each of these expressions. A typically class in data structure looks at this class as representing just data and no methods. So, we will examine functions definined outside of LN, not methods defined inside LN (although must of these functions can be easily written as methods). We will discuss both iterative and recursive versions of most functions. See the download that contains all these functions and a simple driver you can use to test them. ------------------------------------------------------------------------------ Functions that query/access linked lists One of the main operations we perform on linked lists (as we do with lists) is to iterate over them, processing all their values. The following function computes the sum of all the values in a linked list ll. def sum_ll(ll): total = 0 while ll != None: total += ll.value ll = ll.next return sum Lots of code that traverses linked lists looks similar. We will go over how this code processes the linked list above, with the call sum_ll(x) and see exactly how it is that we visit each node in the linked list and stop processing it at the end. We can also define linked lists recursively and use such a definition to help us write functions that recursively process linked lists. (1) None is the smallest linked list: it contains no nodes (2) A list node whose next refers to a linked list is also linked list So, we can recursively process linked list by processing their first LN and the (one smaller) linked list they refer to; recursion ends at None (which is the smallest linked list). We can recursively compute the sum of list by def sum_ll_r(ll): if ll == None: return 0 else: return ll.value + sum_ll_r(ll.next) Back to the three rules we studied to prove a recursive functions correct: (1) It computes the correct result for the smalled (no node) list: it returns 0 which is the sum of no nodes. (2) Each recursive call is on a smaller list, which is closer to the base case: The recursive call is on ll.next, which is a list with one fewer nodes (3) Assuming sum(ll.next) computes the sum of all values after the node representing the start of the list to be processed,, this function returns the sum of all the list nodes: if we add the value of this first node to the sum of the values in all the following nodes in the list, then we have computed the sum of all the nodes in the list. An even simpler traversal of linked lists compute their length. Here are the iterative and recursive methods. def len_ll(ll): count = 0 while ll != None: count += 1 ll = ll.next return count def len_ll_r(ll): if ll == None: return 0 else: return 1 + len_ll_r(ll.next) These are simpler than sum_ll: rather than addind the value of each list node, these add 1 to a sum for each list node, ultimately computing the number of list nodes in the linked list (its length). Next lets look at computing a string representation for a list. There is no standard for how linked lists are represented as strings. We could convert them to look like a normal list: [...] but instead we will use the following form '5->3->8->2->4->None'. Here are the iterative and recursive functions. In the iterative method, for each node in the list we append its value followed by '->' into a string, and append just the value 'None' at the end, before returning. def str_ll(ll): answer = '' while ll != None: answer += str(ll.value)+'->' ll = ll.next return answer + 'None' In the recursive version, we return 'None' as the base-case, appending the value and '->' in front of the result returned on each recursive call. def str_ll_r(ll): if ll == None: return 'None' else: return str(ll.value)+'->'+str_ll_r(ll.next) In all these examples, the iterative and recursive code have been approximately the same complexity. Let's now look at functions that convert a standard Python list into a linked list, and copy a linked list, observing that the recursive versions are a bit simpler to write and understand. BUT, you should hand simulate the iterative methods to understand how/why they work. Next are two function to convert a standard Python list into a linked list. In list_to_ll we must treat an empyt list specially, returning None: otherwise (for a non-empty list) we can access the first value: l[0]. We make two names refer to is LN (which has next=None): front and rear. We will not change front and eventually return its value (returning a reference to the front of all the list nodes in our list). We add each new value at the end of the list of nodes by extending the node rear refers to: changing its next from None to an actual list node, and then re-adjusting rear to refer to this new end-of-the-list node, extending it as many times as necessary. def list_to_ll(l): if l == []: return None front = rear = LN(l[0]) for v in l[1:]: rear.next = LN(v) rear = rear.next return front The recursive version of this function looks pretty much like all the recursive function we have seen on linked list. One interesting feature of note: the constructor for LN has the recursive call as its second argument. def list_to_ll_r(l): if l == []: return None else: return LN(l[0], list_to_ll_r(l[1:])) (1) It computes the correct result for the smallest (empty) list: it returns None, which is an empty linked list. (2) Each recursive call is on a smaller list, which is closer to the base case: The recursive call is on l[1:], the standard one-smaller-list. (3) Assuming list_to_ll(l[1:]) returns a linked list with all the values in the ll[1:], this function returns a linked list of all the values in the parameter list: it return a reference to a new list node with the first value in the list (l[0]) and whose .next refers to a linked list with all the values in l[1:]. To find a value in a linked list (returning a reference to the node that contains it, we write an iterative method and two recursive variants. Each returns None if the value is not found in the linked list. Iteratively, we use ll to traverse the list, looking for avalue: we either find it or "run off the end of the list" and return None def find_ll(ll, avalue): while ll != None: if ll.value == avalue: return ll ll = ll.next return None We can also write this more simply as follows, combining the two conditions for returning a value; when the loop terminates, the test ll != None and ll.value != avalue is False, so either ll == None or ll.value == avalue; in both cases returning ll is correct. def find_ll(ll, avalue): while ll != None and ll.value != avalue: ll = ll.next return ll For the recursive functions, the firt uses the simplest base case/non-base case form. If the linked list isn't empty def find_ll_r(ll, avalue): if ll == None: return None else: if ll.value == avalue: return ll else: return find_ll_r(ll.next,avalue) We could replace this entire body by one complicated conditional expression: return (None if ll==None else ll if ll.value==avalue else find_ll_r(ll.next,avalues) But this version is very hard to read, and not in the standard recursive form we have been using. As a slight variant we can test both ll == None or ll.value == avalue and in both cases return ll (returning either None of a reference to a list node). Note taht if ll == None is True, short-circuit evaluation of or means that the expression ll.value == avalue will not need to be executed: good thing, too, because ll.value when ll is None would raise an exception. def find_ll_r2(ll, avalue): if ll == None or ll.value == avalue: # short-circuit or is critical return None else: return find_ll_r(ll.next,avalue) We examined code that returned the linked list equivalent of a standard Python list. Here is similar code that copies a linked list: constructs new nodes with the same values, arranged in a linked list. In the iterative version we again use front/rear to remember the front of the list and extned the rear for each values we traverse in ll. def copy_ll(ll): if ll == None: return None front = rear = LN(ll.value) while ll.next != None: ll = ll.next rear.next = LN(ll.value) rear = rear.next return front As we expect, the recursive version is more elegant, and similar to the other recursive code that processes linked lists. def copy_ll_r(ll): if ll == None: return None else: return LN(ll.value,copy_ll_r(ll.next)) Can you write a recursive function that converts a linked list of values into a standard Python list? Finally, languages like Java/C++ don't easily support generators. Since Python does we can easily write a generator that produces all the values in a linked list. def iterator(ll): while ll != None: yield ll.value ll = ll.next ------------------------------------------------------------------------------ Functions that command/mutate linked lists All the functions above queried/accessed but did not mutate linked lists: no changes were made to .value or .next of an LN object. If x refers to a linked list, we can add a new value at the front of the linked list by the simple code: x = LN(newvalue,x) Now x refers to a new list node, whose value is newvalue, and whose next refers to the node that x used to refer to: all the nodes in the original linked list. Draw a picture with x = None originally or x refering to the linked list above. We can write the following iterative/recursive functions to append a value at the end of the linked list. In both cases the list is mutated: the last list node has its next changed to refer to a new list node containing the new value (and whose .next is None). But, to handle the case where x stores None, the iterative/recursive functions must return a reference to the front of the list (maybe x itself, or if x stored None, a reference to a one-node linked list storing newvalue). We call these functions like x = append_ll(x,newvalue) and x = append_ll_r(x,newvalue) As with list_to_ll and copy, the iterative version needs to remember the front while using ll to traverse down the list, to find the last list node to extend. def append_ll(ll,value): if ll == None: return LN(value) front = ll while ll.next != None: # while ll does not refer to the last node... ll = ll.next # terminates when ll.next == None ll.next = LN(value) # append value at end return front # return reference to original front of ll The recurisive method is again simpler to write. def append_ll_r(ll,value): if ll == None: return LN(value) else: ll.next = append_ll_r(ll.next,value) return ll Here are two simple functions (not iterative or recursive) to mutate a list by adding/removing a value after one referred to by their argument. Both functions return None implicitly. def add_after_ll(ll,value): # raises an exception of ll is None ll.next = LN(value,ll.next) def remove_after_ll(ll): # raises an exception of ll is None or ll.next is None ll.next = ll.next.next Note that to remove the first value in a linked list, we write x = x.next ------------------------------------------------------------------------------ Do you really understand how low-level linked list cod works? Using the list shown above execute the call. x = magic(x) It actually mutates the list in a complicated way and returns a reference to one of its nodes. Hand simulate the results (calling the function to see the result produced before trying to hand simulate it has zero educational value. I don't care whether you know the answer; I care whether you can hand simulate this code and code like it that you might write. def magic(ll): answer = None while ll != None: t_m = ll ll = ll.next t_m.next = answer answer = t_m return answer