Linked Lists

ICS-46 is concerned with studying the lower-level data structures that are used
in Python to represent lists/tuples, dicts, sets/frozensets, and other named
data-types not built into Python: stacks, queues, priority-queues,
equivalence classes, and graphs. There are two primary components to all these
data-types: arrays (a simpler version of Python lists; in fact Python's lists
are built from arrays) and self-referential data-structures. Linked lists are
the simplest kind of self-referential data-structure; trees (we will study
binary search trees later this week) are more complex self-referential
data-structures.
 
Languages like Java/C++ don't build in most of Python's useful data-types, but
instead provide them in standard libraries, which are a bit more awkward to use
than these data-types in Python. These data-type libraries are built on arrays
and self-referential structures. This week is a peek at self-referential
structures: linked lists and binary search trees.

------------------------------------------------------------------------------

Here is the trivial class that serves as the basis for all our linked list code
(and the tree class covered later this week isn't much different). LN stands for
List Node: a linked list is a sequence of zero or more list nodes, with each one
explicitly referring to the next LN in the list, if there is one (via an
attribute that is the next reference).

class LN:
    def __init__(self : "LN", value : object, next : "LN or NoneType" = None):
        self.value = value
        self.next  = next

We write "LN" as a string in the annotations above because, while we are still
defining the LN class, the name LN is not yet defined and so cannot appear in
an annotation; we can use LN directly after the definition of the LN class.

Basically the class allows us to create objects with two attribute names: value
refers to some object (of any class), but next should either refer to an object
constructed from the LN class or refer to the special value None (its default
value in __init__ above): type(None) is NoneType.

In this way we describe LN as a self-referential class: each of its objects
refers to another one of its objects (although None will serve to stop this
recursive definition from being infinite: it will be the base case in our
recursive functions that process linked lists). None represents a linked
list with no/0 nodes. Generally, a linked list is a sequential list of values,
with their order being important.

So a linked list is like a standard Python list (implemented by a simpler
structure called an array, which you'll learn tons about in ICS-45J/ICS-45C and
ICS-46, but which is hidden in Python). Here, and much more in these other
courses, we will learn many details concerning the objects that implement
linked lists and how we can use linked lists to implement the kinds of
operations we perform on standard Python lists. In ICS-46 we will focus on the
performance tradeoffs (speed/space) for array vs. linked structures for
representing lists and other data types.

We already know a lot about objects storing attributes that refer to objects
constructed from OTHER classes. Now we will study objects storing attributes
that refer to objects constructed from the SAME class, storing attributes that
refer to objects constructed from the same class, ... That is something new to
learn about and explore.

We will start with pictures, because pictures are essential to understanding
and manipulating these data structures. In lecture, I will show some detailed
diagrams of linked lists built with LN objects; then I will remove much
redundant information to show more concise and easy-to-draw pictures. See the
extra materials shown on the Weekly Schedule page for this lecture.

     I will show detailed pictures here during the lecture.

Here is an abbreviated picture: name x refers to an LN object whose value
attribute is 5 and whose next attribute is a reference to another LN object
whose value attribute is 3 and whose next attribute is a reference to another
LN object whose value attribute is 8 and whose next attribute is .....

A reference in the next attribute to the value None is represented symbolically
by the symbol /. 

  x
+---+    +---+---+    +---+---+    +---+---+    +---+---+    +---+---+
| --+--->| 5 | --+--->| 3 | --+--->| 8 | --+--->| 2 | --+--->| 4 | / |
+---+    +---+---+    +---+---+    +---+---+    +---+---+    +---+---+

Note that the tails of the arrows (references) are put INSIDE a box representing
a place where a name's value is stored. The heads of the arrows refer to an
entire LN object, not any particular attribute name/value in it.

Note that with this notation, we show a value like 3 inside a box for
simplicity, instead of a reference to the int object 3 inside the box. Often,
the interesting part of linked-list programming has more to do with the next
attributes of the object than the value attributes.

In the code below, whenever we see .name it means "follow the arrow to the
object it refers to (all arrows refer to objects) and select the name attribute"
(in LN objects, all attributes store data). Read the following carefully;
everything we do later with linked lists is built on understanding the meaning
of .name (something we've been doing with class objects for a while, even if
just writing something like print(self.name) or self.name = value).
  (1) x stores a reference to the first LN object
  (2) x.value stores a reference to the int object 5 in this first LN object
  (3) x.next stores a reference to the second LN object
  (4) x.next.value stores a reference to the int object 3 in this second LN
        object
  (5) x.next.next stores a reference to the third LN object
  (6) x.next.next.value stores a reference to the int object 8 in this third
         LN object
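Here is a small sketch (assuming the LN class above, without its annotations)
that builds the linked list from the picture by nesting LN constructor calls,
innermost node first, and then evaluates the expressions listed in (1)-(6).

```python
class LN:
    def __init__(self, value, next=None):
        self.value = value
        self.next  = next

# Build 5->3->8->2->4->None from the back: each inner LN becomes the next
# attribute of the LN constructed around it
x = LN(5, LN(3, LN(8, LN(2, LN(4)))))

print(x.value)                        # 5: value in the first LN
print(x.next.value)                   # 3: follow one arrow, then select value
print(x.next.next.value)              # 8: follow two arrows, then select value
print(x.next.next.next.next.next)     # None: the next attribute of the last LN
```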

Don't memorize this information; understand what .name means and carefully be
able to analyze each of these expressions, and any others using .next and
.value too.

Typically we will look at classes for a list/tree data structure as representing
just data and no methods. So, we will examine functions defined outside of LN,
not methods defined inside the LN class (although most of these functions can be
easily written as methods). We will discuss both iterative and recursive
versions of most functions, as appropriate.

See the download that contains all these functions and a simple driver that you
can use to test them.

------------------------------------------------------------------------------

Functions that query/access linked lists

One of the main operations we perform on linked lists (as we do with lists)
is to iterate over them, processing all their values. The following function
computes the sum of all the values in a linked list ll.

def sum_ll(ll):
    sum = 0
    while ll != None:
        sum += ll.value
        ll  =  ll.next
    return sum

or, equivalently, using while True and break:

def sum_ll(ll):
    sum = 0
    while True:
        if ll == None:
            break
        sum += ll.value
        ll  =  ll.next
    return sum

Lots of code that traverses (iterates over) linked lists looks similar. In class
we will cover (hand simulate) how this code processes the linked list above
with the call sum_ll(x), and see exactly how it visits each node in the linked
list and stops processing it at the end (when ll goes beyond the last LN in the
linked list and takes on the value None).
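To mirror a hand simulation, here is a hypothetical sum_ll_traced (not part of
the download): the same while loop as sum_ll, with prints added so we can watch
ll visit each node and finally become None.

```python
class LN:
    def __init__(self, value, next=None):
        self.value = value
        self.next  = next

def sum_ll_traced(ll):
    # Same loop as sum_ll, printing each step the way we would hand simulate it
    total = 0
    while ll != None:
        total += ll.value
        print('visited', ll.value, 'running sum', total)
        ll = ll.next              # advance; becomes None after the last LN
    print('ll is now', ll)
    return total

x = LN(5, LN(3, LN(8, LN(2, LN(4)))))
result = sum_ll_traced(x)         # result is 22
```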

There is no special iterator for LN objects (unless we create one, as we will
at the end of this lecture); LN is just like any other Python class that we
write.

We can also define linked lists recursively and use such a definition to help
us write functions that recursively process linked lists. 

  (1) None is the smallest linked list: it contains no nodes
  (2) A list node (LN) whose next refers to a linked list is also a linked list

So None is a linked list (of 0 values); a list node whose next attribute is None
is a linked list (of 1 value); a list node whose next attribute is a list node
whose next attribute is None is a linked list (of 2 values); etc.

So, we can recursively process a linked list by processing its first LN and then
recursively processing the (one smaller) linked list its next attribute refers
to; recursion ends at None (which is the base case: the smallest linked list).
We can recursively compute the sum of a linked list as follows:

def sum_ll_r(ll):
    if ll == None:              # Could also test: ll is None
        return 0
    else:
        return ll.value + sum_ll_r(ll.next)

Back to the three rules we studied to prove a recursive function correct:

(1) It recognizes and computes the correct result for the smallest (None, no LN)
      linked list: it returns 0 which is the sum of no nodes.

(2) Each recursive call is on a smaller linked list, which is closer to the
      base case: The recursive call is on ll.next, which is a linked list with
      one fewer node. It is similar to the call l[1:] when l is a Python list.

(3) Assuming sum_ll_r(ll.next) computes the sum of all values after the node
      representing the start of the linked list to be processed, this function
      returns the sum of all the values in this linked list: if we add the value
      of this first node to the sum of the values in all the following nodes in
      the linked list, then we have computed the sum of all the values in the
      linked list. It is elephants all the way down.

-----
Aside: Efficiency in Time and Space

In tuples/lists, using a slice to skip the first value in a recursive
call is INEFFICIENT in both time and space: it must COPY all the remaining
values into a new tuple/list. But using ll.next to skip the first value in a
recursive call is EFFICIENT in both time and space: it just follows one
existing reference, copying nothing.
-----
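A small sketch of the difference, assuming the LN class above: slicing builds a
separate list, while ll.next is just another reference to an LN that already
exists (so changing it through either name changes the same object).

```python
class LN:
    def __init__(self, value, next=None):
        self.value = value
        self.next  = next

l = [5, 3, 8, 2, 4]
rest = l[1:]               # a NEW list: copies the 4 remaining references
print(rest is l)           # False
rest[0] = 99               # changing the copy...
print(l[1])                # 3: ...does not change the original

x = LN(5, LN(3, LN(8, LN(2, LN(4)))))
rest_ll = x.next           # no copying: just another reference to the second LN
print(rest_ll is x.next)   # True
rest_ll.value = 99         # changing it through rest_ll...
print(x.next.value)        # 99: ...changes the one shared LN object
```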

An even simpler traversal of linked lists computes their length. Here are the 
iterative and recursive functions.

def len_ll(ll):
    count = 0
    while ll != None:
        count += 1
        ll = ll.next
    return count

def len_ll_r(ll):
    if ll == None:
        return 0
    else:
        return 1 + len_ll_r(ll.next)
 
These are simpler than the sum_ll functions: rather than adding the value of
each list node, these add 1 to a count for each list node, ultimately computing
the number of list nodes in the entire linked list: its length.

Next let's look at computing a string representation for a linked list. There is no
standard for how linked lists are represented as strings. We could convert them
to look like a normal list: [...] but instead we will use the following form
'5->3->8->2->4->None'. Here are the iterative and recursive functions to produce
such strings.

In the iterative method, for each node in the list we concatenate its value
followed by '->', and concatenate just the value 'None' at the end, before
returning.

def str_ll(ll):
    answer = ''
    while ll != None:
        answer += str(ll.value) + '->'
        ll = ll.next
    return answer + 'None'

In the recursive version, we return 'None' as the base-case, concatenating the
value and '->' in front of the result returned on each recursive call.

def str_ll_r(ll):
    if ll == None:
        return 'None'
    else:
        return str(ll.value) + '->' + str_ll_r(ll.next)
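A quick check, copying str_ll and str_ll_r from above, that both produce the
same string for the picture's linked list and for the empty linked list None.

```python
class LN:
    def __init__(self, value, next=None):
        self.value = value
        self.next  = next

def str_ll(ll):
    answer = ''
    while ll != None:
        answer += str(ll.value) + '->'
        ll = ll.next
    return answer + 'None'

def str_ll_r(ll):
    if ll == None:
        return 'None'
    else:
        return str(ll.value) + '->' + str_ll_r(ll.next)

x = LN(5, LN(3, LN(8, LN(2, LN(4)))))
print(str_ll(x))        # 5->3->8->2->4->None
print(str_ll_r(x))      # 5->3->8->2->4->None
print(str_ll(None))     # None: only the base-case string
```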

In all these examples, the iterative and recursive code have approximately the
same complexity. Let's now look at two other functions: one that converts a
standard Python list into a linked list, and one that copies a linked list,
observing that the recursive versions are a bit simpler to write and understand.
BUT, you should hand simulate the iterative methods to understand how/why they
work too, and apply the 3 proof rules to the recursive functions.

First: two functions to convert a standard Python list into a linked list.

In list_to_ll we must treat an empty list specially, returning None: otherwise
(for a non-empty list) we can access the first value: l[0]. We make two names
(front and rear) to refer to the LN constructed with that value (in this LN its
next is None). We will not change front and eventually return its value
(returning a reference to the front of all the list nodes in our list). We add
each new value at the end of the list of nodes by extending the node rear
refers to: changing its next from None to an actual list node (whose next is
None), and then re-adjusting rear to refer to this new end-of-the-list node,
extending it as many times as necessary.

def list_to_ll(l):
    if l == []:
        return None
    front = rear = LN(l[0])   # next parameter is implicitly None in LN

    for v in l[1:]:
        rear.next = LN(v)     # next parameter is implicitly None in LN
        rear = rear.next
    return front

The recursive version of this function is simpler, and looks pretty much like
all the recursive functions that we have seen for linked lists. One interesting
feature of note: the recursive call is the second argument to LN's constructor.
Python evaluates this recursive call first, and then passes the reference it
returns (to a linked list of all the remaining values) to the constructor.

def list_to_ll_r(l):
    if l == []:
        return None
    else:
        return LN( l[0], list_to_ll_r(l[1:]) )

Here is the proof this function is correct

(1) It recognizes and computes the correct result for the smallest (empty) list:
    it returns None, which is the smallest (empty) linked list.

(2) Each recursive call is on a smaller list, which is closer to the base case:
      The recursive call is on l[1:], the standard one-smaller list.

(3) Assuming list_to_ll_r(l[1:]) returns a linked list with all the values in
      l[1:], this function returns a linked list of all the values in the
      parameter list: it returns a reference to a new list node with the first
      value in the list (l[0]) and whose .next refers to a linked list with
      all the values in l[1:].
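A sketch that checks list_to_ll and list_to_ll_r against each other, using
str_ll from above to display the results; the test lists are arbitrary choices.

```python
class LN:
    def __init__(self, value, next=None):
        self.value = value
        self.next  = next

def list_to_ll(l):
    if l == []:
        return None
    front = rear = LN(l[0])
    for v in l[1:]:
        rear.next = LN(v)     # extend the end of the linked list
        rear = rear.next      # rear again refers to the last LN
    return front

def list_to_ll_r(l):
    if l == []:
        return None
    else:
        return LN(l[0], list_to_ll_r(l[1:]))

def str_ll(ll):
    answer = ''
    while ll != None:
        answer += str(ll.value) + '->'
        ll = ll.next
    return answer + 'None'

for l in ([], [1], [5, 3, 8, 2, 4]):
    print(str_ll(list_to_ll(l)), str_ll(list_to_ll_r(l)))
```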

To find a value in a linked list (returning a reference to the node that
contains that value; if the value appears in the list multiple times, it returns
a reference to the first node that it is in), we write an iterative method and
two recursive variants. Each returns None if the value is not found in the
linked list.

Iteratively, we use ll to traverse the list, looking for avalue: we either find
it or "run off the end of the list by reaching None" and return None.

def find_ll(ll, avalue):
    while ll != None:
        if ll.value == avalue:
            return ll
        ll = ll.next
    return None

We can also write this more simply by combining the two conditions for
returning a value into the loop's continuation test

  ll != None and ll.value != avalue

This test is False when the while loop ends, so either ll == None or
ll.value == avalue; in both cases returning ll is correct. Note that the
short-circuit evaluation of the and operator (and the order of the conjuncts)
is critical: we should not follow the reference in ll (with ll.value) until we
are sure that ll does not refer to the None object; if it does, ll.value would
raise an exception.
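The simplified loop described above could look like the following sketch (a
reconstruction from the continuation test, assuming the LN class above); the
example list stores 3 twice to show that the first matching node is returned.

```python
class LN:
    def __init__(self, value, next=None):
        self.value = value
        self.next  = next

def find_ll(ll, avalue):
    # Advance while there is a node AND it doesn't store avalue; short-circuit
    # "and" means ll.value is never evaluated when ll is None
    while ll != None and ll.value != avalue:
        ll = ll.next
    return ll                 # either None or the first LN storing avalue

x = LN(5, LN(3, LN(8, LN(3))))
print(find_ll(x, 3) is x.next)   # True: the FIRST node storing 3
print(find_ll(x, 7))             # None: ran off the end of the list
```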

DeMorgan's law in boolean algebra is very important in programming. It says that
not (A and B) == not (A) or not (B); also, not (A or B) == not (A) and not (B).
Each part is negated and the connector flips: and -> or; or -> and.

So the loop above terminates when its test is False: when not (test) is True.
  not (ll != None and ll.value != avalue)
is equivalent to terminating when
  not (ll != None)  or  not(ll.value != avalue)
removing the double negative we get
  ll == None  or  ll.value == avalue
We could have also written the continuation test for this loop as 
  not (ll == None or ll.value == avalue)
or the entire function using while True: as

def find_ll(ll, avalue):
    while True:
        if  ll == None or ll.value == avalue:   # short-circuit is critical
            return ll
        ll = ll.next
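Because there are only four combinations of truth values for A and B, we can
check both forms of DeMorgan's law exhaustively. This is a small sketch using
itertools.product from Python's standard library (demorgan_holds is a
hypothetical name).

```python
from itertools import product

def demorgan_holds():
    # Check both forms of DeMorgan's law for all four combinations of A and B
    for A, B in product([False, True], repeat=2):
        if (not (A and B)) != ((not A) or (not B)):
            return False
        if (not (A or B)) != ((not A) and (not B)):
            return False
    return True

print(demorgan_holds())   # True
```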

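For the recursive functions, the first uses the simplest base case/non-base case
form. If the linked list isn't empty, it checks whether its first node stores
avalue, and if not, it recursively searches the rest of the linked list.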

def find_ll_r(ll, avalue):
    if ll == None:
        return None
    else:
        if ll.value == avalue:
            return ll
        else:
            return find_ll_r(ll.next, avalue)

We could replace this entire body by one complicated conditional expression:

return None if ll == None else (ll if ll.value == avalue else find_ll_r(ll.next, avalue))

But this version is very hard to read, and not in the standard recursive form
that we have been using.

As a slight variant (and similar to what we did in the while loop version) we
can test both ll == None or ll.value == avalue and in both cases return ll
(returning either None or a reference to a list node). Note that if ll == None
is True, short-circuit evaluation of "or" means that the expression
ll.value == avalue will not need to be evaluated: good thing, too, because
accessing ll.value when ll is None would raise an exception.

def find_ll_r2(ll, avalue):
    if ll == None or ll.value == avalue: # short-circuit or is critical
        return ll
    else:
        return find_ll_r2(ll.next, avalue)

Note that this function is tail-recursive, so it could automatically be
rewritten iteratively (as the while-loop code above shows). Most of the
previous functions are not tail-recursive, but we could convert them into
tail-recursive functions that accumulate their answers in an additional
parameter, and then convert them to be iterative.
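Here is one way that conversion might look for sum_ll_r (a sketch; sum_ll_tr
and sum_ll_iter are hypothetical names): the tail-recursive version carries the
running sum in an extra parameter acc, and the iterative version replaces each
tail call by rebinding ll and acc and looping again.

```python
class LN:
    def __init__(self, value, next=None):
        self.value = value
        self.next  = next

def sum_ll_tr(ll, acc=0):
    # Tail-recursive: the recursive call is the LAST thing done;
    # acc holds the sum of the values already visited
    if ll == None:
        return acc
    else:
        return sum_ll_tr(ll.next, acc + ll.value)

def sum_ll_iter(ll):
    # Mechanical translation: the parameters become loop variables, and the
    # tail call becomes "rebind the parameters and loop again"
    acc = 0
    while ll != None:
        ll, acc = ll.next, acc + ll.value   # right side uses the OLD ll
    return acc

x = LN(5, LN(3, LN(8, LN(2, LN(4)))))
print(sum_ll_tr(x), sum_ll_iter(x))   # 22 22
```

Python itself does not eliminate tail calls, so only the iterative version
avoids using one new call frame per node.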

We have already examined code that returned the linked list equivalent of a
standard Python list. Here is similar code that copies a linked list:
constructs new nodes with the same values, arranged in the same order, for a
linked list. In the iterative version we again use front/rear to remember the
front of the list and extend the rear for each value we traverse in ll.

def copy_ll(ll): 
    if ll == None:
        return None
    front = rear = LN(ll.value)     # next parameter is implicitly None in LN

    while ll.next != None:
        ll = ll.next
        rear.next = LN(ll.value)    # next parameter is implicitly None in LN
        rear = rear.next
    return front

As we expect, the recursive version is more elegant, and similar to the other
recursive code that processes linked lists. It is similar to the code we wrote
to translate a Python list into a linked list. Again the recursive call is
the second argument to LN's constructor. So, for ll it builds a node whose
value attribute is ll.value and whose next attribute refers to a copy of all
the linked list values after the first, therefore copying the entire linked
list: each recursive call creates a new LN for its ll's value attribute.
 
def copy_ll_r(ll):
    if ll == None:
        return None
    else:
        return LN(ll.value, copy_ll_r(ll.next))
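A small sketch, assuming LN and copy_ll_r from above, showing that the copy has
the same values in the same order but consists entirely of new LN objects, so
mutating the copy leaves the original unchanged.

```python
class LN:
    def __init__(self, value, next=None):
        self.value = value
        self.next  = next

def copy_ll_r(ll):
    if ll == None:
        return None
    else:
        return LN(ll.value, copy_ll_r(ll.next))

x = LN(5, LN(3, LN(8)))
y = copy_ll_r(x)
print(y is x, y.next is x.next)                  # False False: all-new LN objects
print(y.value, y.next.value, y.next.next.value)  # 5 3 8: same values, same order
y.next.value = 99                                # mutating the copy...
print(x.next.value)                              # 3: ...leaves the original alone
```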

Finally, languages like Java/C++ don't easily support generators. But because
Python does, we can easily write a generator that produces all the values in a
linked list.

def iterator(ll):
    while ll != None:
        yield ll.value
        ll = ll.next

With this code, we could print every value in a linked list by simply writing

  for v in iterator(ll):
      print(v)

In fact, we could put a variant of this code in the __iter__ method in the LN
class itself:

  def __iter__(self):
      current = self
      while current != None:
          yield current.value
          current = current.next

With this method, we could simply write just

  for v in ll:
      print(v)

although this code (unlike the generator above) will not work when ll refers
to None, because there is no __iter__ method in NoneType that would return
immediately. But if ll refers to an LN object, the __iter__ code above will
iterate through its values. When writing a linked list class using full
Object Oriented Programming, we define both a list node class and a list empty
class: iterating over an object of the list empty class would immediately raise
StopIteration.
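A sketch of an LN class with the __iter__ generator method above: list() and
sum() both use __iter__, while iterating over None raises a TypeError because
NoneType defines no __iter__.

```python
class LN:
    def __init__(self, value, next=None):
        self.value = value
        self.next  = next

    def __iter__(self):
        # Generator method: starts at self, which is never None inside a method
        current = self
        while current != None:
            yield current.value
            current = current.next

x = LN(5, LN(3, LN(8)))
print(list(x))                 # [5, 3, 8]
print(sum(x))                  # 16

try:
    for v in None:             # NoneType defines no __iter__
        pass
except TypeError:
    print('iterating over None raised TypeError')
```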

------------------------------------------------------------------------------

Functions that command/mutate linked lists

All the functions above queried/accessed/created but did not mutate linked
lists: no changes were made to .value or .next of any LN object.

If x refers to the first LN in a linked list, we can add a new value at the
front of the linked list by the simple code:

  x = LN(new_value, x)

Now x refers to a new list node, whose value is new_value, and whose next
attribute refers to the LN that x used to refer to: all the nodes in the
original linked list x. Draw a picture with x = None originally or x referring
to the linked list above, when this assignment is executed.

This operation takes the same amount of time, regardless of how long the linked
list is: it just constructs an LN object and fills in its two attributes. If l
is a regular Python list, inserting a new_value at its front position
(l.insert(0,new_value)) takes more time the longer the list is: it inserts the
new value after moving every value back one position in the list. Linked lists
are more efficient than regular lists for some operations, including insertion
at the front.
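A sketch of repeatedly adding at the front, starting from x = None; inserting
4, 2, 8, 3, 5 in that order rebuilds the linked list from the picture, since
each new value goes before all the earlier ones.

```python
class LN:
    def __init__(self, value, next=None):
        self.value = value
        self.next  = next

x = None                      # start with the empty linked list
for new_value in [4, 2, 8, 3, 5]:
    x = LN(new_value, x)      # constant-time insertion at the front

# Collect the values front to back: they are in reverse insertion order
values = []
ll = x
while ll != None:
    values.append(ll.value)
    ll = ll.next
print(values)                 # [5, 3, 8, 2, 4]
```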

We can write the following iterative/recursive functions to append a value at
the end of the linked list. In both cases the list is mutated: the last list
node has its next attribute changed to refer to a new list node containing the
new value (and whose .next is None). But, to handle the case where x is
initially empty (stores None), the iterative/recursive functions must return a
reference to the front of the list (maybe x itself, or if x stored None, a
reference to a one-node linked list storing new_value). We call these functions
like

  x = append_ll(x, new_value)

and

  x = append_ll_r(x, new_value)

As with list_to_ll and copy_ll, the iterative version needs to remember the front
while using ll to traverse down the list, to find the last list node to extend.

def append_ll(ll,value):
    if ll == None:          # special case for an empty list
        return LN(value)

    front = ll
    while ll.next != None:  # while ll does not refer to the last node...
        ll = ll.next        #   advance: terminates when ll.next == None

    ll.next = LN(value)     # (at end: ll.next == None) put value after end node
    return front            # return reference to original front of ll (still front)

The recursive method is again simpler to write.

def append_ll_r(ll,value):
    if ll == None:
        return LN(value)
    else:
        ll.next = append_ll_r(ll.next,value)
        return ll

Here is the proof this function is correct

(1) It recognizes and computes the correct result for the smallest (empty)
      linked list: it returns a reference to a linked list with one node
     (storing value: it is both the front and end of the new list)

(2) Each recursive call is on a smaller linked list, which is closer to the
      base case of None: the recursive call is on ll.next.

(3) Assuming append_ll_r(ll.next,value) returns a reference to a linked list
      that is one longer than ll.next containing all its list nodes followed by
      value in the last list node, this function returns a linked list that is
      one longer than ll containing all its list nodes followed by value in the
      last list node (by storing in ll.next a reference to the extended linked
      list and returning the original reference to ll).
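A sketch, copying append_ll and append_ll_r from above, that builds lists
starting from None; note that every call must be written in the form
x = append_ll(x, v).

```python
class LN:
    def __init__(self, value, next=None):
        self.value = value
        self.next  = next

def append_ll(ll, value):
    if ll == None:                # empty list: the new node IS the whole list
        return LN(value)
    front = ll
    while ll.next != None:        # stop at the last node
        ll = ll.next
    ll.next = LN(value)
    return front

def append_ll_r(ll, value):
    if ll == None:
        return LN(value)
    else:
        ll.next = append_ll_r(ll.next, value)
        return ll

x = None
for v in [5, 3, 8]:
    x = append_ll(x, v)           # rebind x: the first call returns a new node
y = None
for v in [5, 3, 8]:
    y = append_ll_r(y, v)
print(x.value, x.next.value, x.next.next.value)   # 5 3 8
print(y.value, y.next.value, y.next.next.value)   # 5 3 8
```

Without the assignment, the first call (when x is None) would build a new node
but leave x referring to None, because the function cannot rebind the caller's
name.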

ICS-46 studies the execution times of various code applied to data structures.
We will do a bit of this analysis in ICS-33, in week 8. Lists in Python allow
us to add a value at the end very quickly, but adding a value at the front of a
long list takes much more time: Python must move every value up one index
(the value at index 0 into index 1; the value at index 1 into index 2; ...).
For linked lists, adding a value at the front is very quick, while adding a
value at the rear requires traversing every node in the list (to find the
end). Depending on how often we perform these two operations, it might be
faster to use a list or a linked list. If we frequently update the information
at the end of a linked list, we can keep (cache) a reference to its last node,
so we can immediately find that node and speed up our functions.
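One way to cache the last node is a small wrapper object that stores references
to both the first and last LN. This is a hypothetical sketch (LLWithRear and its
method names are made up for illustration): add_rear never traverses the list,
but every mutating operation must keep rear up to date.

```python
class LN:
    def __init__(self, value, next=None):
        self.value = value
        self.next  = next

class LLWithRear:
    # Hypothetical wrapper caching references to BOTH the first and last LN
    def __init__(self):
        self.front = None
        self.rear  = None

    def add_front(self, value):
        self.front = LN(value, self.front)
        if self.rear == None:      # list was empty: the new node is also the last
            self.rear = self.front

    def add_rear(self, value):
        # No traversal: the cached rear reference locates the last node at once
        node = LN(value)
        if self.rear == None:      # list was empty: the new node is also the first
            self.front = node
        else:
            self.rear.next = node
        self.rear = node

q = LLWithRear()
q.add_rear(3)
q.add_rear(8)
q.add_front(5)
print(q.front.value, q.front.next.value, q.rear.value)   # 5 3 8
```

A matching operation that removes the first node would also need to reset rear
to None when it removes the last remaining node.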

Here are two simple functions (not iterative or recursive) to mutate a list by
adding/removing a value directly after the one referred to by their argument.
Both functions return None implicitly.

def add_after_ll(ll,value):
    # raises an exception if ll is None; otherwise splice value after ll
    ll.next = LN(value,ll.next)

def remove_after_ll(ll):
    # raises exception if ll (no list) or ll.next (no value to remove) is None
    # excise value after ll
    ll.next = ll.next.next

Note that to remove the first value in a linked list, we write

  x = x.next
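A small sketch exercising add_after_ll, remove_after_ll, and removal at the
front on a three-node list (str_ll is copied from above to display results).

```python
class LN:
    def __init__(self, value, next=None):
        self.value = value
        self.next  = next

def add_after_ll(ll, value):
    ll.next = LN(value, ll.next)   # new node refers to ll's old successor

def remove_after_ll(ll):
    ll.next = ll.next.next         # bypass (excise) the node after ll

def str_ll(ll):
    answer = ''
    while ll != None:
        answer += str(ll.value) + '->'
        ll = ll.next
    return answer + 'None'

x = LN(5, LN(3, LN(8)))
add_after_ll(x.next, 7)
print(str_ll(x))        # 5->3->7->8->None
remove_after_ll(x)
print(str_ll(x))        # 5->7->8->None
x = x.next              # remove the first value
print(str_ll(x))        # 7->8->None
```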

Finally, we could write an append method in the LN class itself, which scans
to the end of the linked list containing the LN and appends the value there.
As with the __iter__ method, this method won't work for an empty linked list,
because its class is NoneType, not LN.
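A sketch of such a method (hypothetical; not part of the download): it scans
from self to the last LN and then extends it, so it always has at least one node
to start from.

```python
class LN:
    def __init__(self, value, next=None):
        self.value = value
        self.next  = next

    def append(self, value):
        # Hypothetical method: scan from self to the last LN, then extend it
        last = self
        while last.next != None:
            last = last.next
        last.next = LN(value)

x = LN(5)
x.append(3)
x.append(8)
print(x.value, x.next.value, x.next.next.value)   # 5 3 8
# With x = None, x.append(3) would raise AttributeError (NoneType has no append)
```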

------------------------------------------------------------------------------

Problems

1) Define a recursive function that converts a linked list of values into a
standard Python list.

2) To really understand how low-level linked list code works, use the list
shown above to execute the call

x = magic(x)

It actually mutates the list in a complicated way and returns a reference to
one of its nodes. Hand simulate the results (calling the function to see the
result produced before trying to hand simulate it has zero educational value).
I don't care whether you know the answer; I care whether you can hand simulate
this code and code like it that you might write.

def magic(ll): 
    answer = None
    while ll != None:
        t_m      = ll
        ll       = ll.next
        t_m.next = answer
        answer   = t_m 
    return answer

3) Define a function named select with two arguments: a linked list (ll) and a
non-negative integer (n); it returns the nth value in the list
or raises an exception if there are too few values in the list. Write this
function iteratively and recursively.

4) Define a function named append, with two linked list arguments; it returns
a reference to the first node in a new linked list (lots of new LN objects)
with all the values in the first followed by all the values in the second. This
function does not mutate the argument lists; it copies every list node in each.
Write this function iteratively and recursively.

5) Define a function named append, with two linked list arguments; it returns
a reference to the first node in a linked list (no new LN objects) that contains
all the values in the first followed by all the values in the second. This
function mutates the argument lists (by making the next of the last node in the
first list refer to the first in the second list; but be careful about empty
lists). Write this function iteratively and recursively.

6) Define a function named interleave, with two linked list arguments; it
returns a reference to the first node in a new linked list (lots of new LN
objects) with all the values in the first interleaved with all the values in
the second. This function does not mutate the argument lists. Write this function
recursively. If one list is longer than the other, just copy all the leftover
nodes from the longer list at the end of the result.

7) Define a function named interleave, with two linked list arguments; it
returns a reference to the first node in a linked list (no new LN objects) that
contains all the values in the first interleaved with all the values in the
second. This function mutates the argument lists. Write this function
recursively. If one list is longer than the other, just refer to all the
leftover nodes from the longer list at the end of the result.

8) Define a function named reverse, with one linked list argument; it returns
a reference to the first node in a new linked list (lots of new LN objects) with all
the values in their reverse order. Write this function iteratively and
recursively.

9) Define a function named reverse, with one linked list argument; it returns
a reference to the first node in a linked list (no new LN objects) with all the
values in their reverse order. This function mutates the argument list. Write
this function iteratively and recursively.

10) For the recursive functions written in 8 and 9, rewrite them to use a helper
function with an extra parameter that accumulates the reversed linked list. Such
functions will be tail recursive; translate their code to iterative functions.

11) Define a function named is_ordered, with one linked list argument; it
returns whether or not the list values are in non-decreasing order (each one
must be <= its successor).

12) Define a function named insert_ordered, with two arguments: one an ordered
linked list (see problem 11) and one value; it returns a reference to a linked
list with all the original values and the new one added so the linked list is
still ordered.

13) Define the __iter__ method in LN such that we can iterate over the nodes
(not the values) in a linked list. Hint: either return a nested class
implementing __next__ or return a generator (simpler) to do the job. To print
all the values in a linked list using such an iterator, we would write

  for r in ll:
      print(r.value)

14) Reread the definition of list_to_ll above. Why doesn't the following code
work equivalently?

def list_to_ll(l):
    if l == []:
        return None
    front = rear = LN(l[0])   # next parameter is implicitly None in LN

    for v in l[1:]:
        rear = rear.next = LN(v)
    return front

Hint: chaining the equal sign (as in rear = rear.next = LN(v)) works differently
in Python than in C++; what result would the call list_to_ll([1,2,3,4,5])
produce for the code above? Can you change the code to produce the correct
result?