Reference Variables in Linked List Processing/Special Linked Lists Introduction (mostly Special Linked Lists): In this lecture we will first examine the use of references and pointers to pointers to learn alternative ways to write some linked list processing code. Then we will examine some standard variations to simple linear-linked lists: circular, header, trailer, and doubly-linked lists (and combinations of these). We will discuss the tradeoffs between using these lists and simple linear-linked lists. Standard linear-linked lists are used much more often than any of these variants, but it is interesting to see what tradeoffs the variants allow: Programming Assignment #2 (and some later programming assignments) will involve using these variants. ------------------------------------------------------------------------------ References and Pointers to Pointers Recall that if a parameter is passed by reference, the parameter aliases the storage location of the argument. We will draw pictures of aliased arguments by having a line (not an arrow, arrows are for pointers) connect the parameter to the argument (the parameter will not have its own box, it will just show the arrowless line connecting it to the box it aliases). In the code, whenever we examine/change a reference parameter we are really examining/changing the argument passed to it. The parameter is bound automatically to the address of the argument and examined/stored by automatic dereferencing the address. Note that we often see "const T&" as the mode/type of a parameter: that provides a fast way to do the equivalent of parameter mode/type "T": access an argument of type T and do not change it: for large chunks of data & is faster because it passes the address of of the argument; it doesn't copy all the values in the argument (but the const ensures the reference is not used to change the argument or any part of it). So this mode is fast and safe. For passing "small" types (like int), both take the same amount of time, but accessing a value through an address (indirectly) is a bit slower. So, there are efficiency tradeoffs. We can use references to simplify code for the functions that change linked lists (e.g., adding/removing values): by writing void functions that do not have to return a value. Here are the standard void add_front, add_rear, and remove_lookahead functions, where the pointer to the begining of the linked list is passed by reference (but NOT const). We read the mode/type specification backwards (right to left) "LN*& l" means "l is a reference to a pointer to an LN object instantiated with values of type T". In all these cases, we call these functions like: add_front(x,some_value); they do NOT return a result but instead change either x's state (by the reference mode parameter) or the state of some LN in the linked list x points to. The only changes to the code from the original functions that we studied are that the parameter l may be changed (meaning the argument matching l may be changed), and that no value is returned. template void add_front(LN*& l, const T& value) { l = new LN(value,l); //change argument aliased to l } template void add_rear(LN*& l, cont T& value) { if (l == nullptr) l = new LN(value); //change argument aliased to l else { for (LN* p = l; /*see body for termination*/; p = p->next) if (p->next == nullptr) { p->next = new LN(value); break; } } } template void remove_lookahead (LN*& l, const T& to_remove) { if (l == nullptr) return; if(l->value == to_remove) { LN* to_delete = l; l = l->next; //change argument aliased to l delete to_delete; }else{ for (LN* p = l; p->next != nullptr; p = p->next) if (p->next->value == to_remove) { LN* to_delete = p->next; p->next = p->next->next; delete to_delete; break; } } } Examine the first picture accompanying this lecture note. It illustrates how add_rear(x,...); works when x is nullptr (on the left) and when x points to a linked list (on the right). Again, do some hand simulations to better understand how the reference parameter l is used in the code, following the rules and graphics for reference parameters. Finally, we can get some more leverage from using pointers to pointers in our code, although the complexity and extra execution overhead can be higher. The following function removes a value from a list. It uses p to "point to a pointer" and uses a smaller amount of (complicated) code to remove the first LN containing to_remove from the list. In a call remove_ptp(x,5) it starts with p pointing to the storage occupied by the variable x. Here is the first example of a pointer that does NOT point to an object, but instead points to a storage location that points to an object: p will first point to x's storage, and then p will point to the storage of "next" instance variables in the LN (list node) objects in the linked list. In this code, p traverses the linked list until p points to a storage location (the "x" variable or the "next" instance variable in some LN object) such that (*p)->value equals to_remove: in this case the code updates the variable p points to; it stores a pointer to the LN beyond the first one storing to_remove. Note (*p)->value is equivalent to (**p).value. How do we "advance" p. We use the & operator to find the address of it argument, computing a pointer to it. So p = &((*p)->next) means, "go to the location that p points to, and follow that pointer to examine its next instance variable; then store a reference (address: pointer to) that instance variable into p". By using a pointer to a pointer we don't need to use either lookahead or ghost pointers, and there is no special case for removal at the front of the list. But, the extra indirection of p pointing to a variable containing a pointer makes the code complicated to understand and less efficient to execute. You should still try hand simulating it using pictures, but I consider understanding this code a bit beyond what is required for this course. template void remove_ptp (LN*& l, T to_remove) { for (LN** p = &l; (*p) != nullptr; p = &((*p)->next)) if ((*p)->value == to_remove) { LN* to_delete = *p; (*p) = (*p)->next; delete to_delete; break; } } ------------------------------------------------------------------------------ Circular Linked Lists: A circular linked list is a linked list whose rear LN points to its front LN (so nullptr does not mark the end of the list). It becomes a bit arbitrary to speak of a front and rear when these LN nodes are arranged in a circle. A circular list may be empty, but if there are LN nodes, none of them have a next instance variable storing nullptr. See the pictures of simple circular linked lists. The one LN version is especially interesting because its next instance variable points to itself (since its single node is both its front and rear). Here is code to print a circular linked listing by printing the first node, the second, ...., the last, and then literally "..." to show the nodes continue in a circle. template void print(LN* cl) { if (cl == nullptr) { std::cout << "nullptr"; // No nodes at all return; } std::cout << cl->value; // print "Front value" for (LN* p = cl->next; p != cl; p = p->next) // Loop until "Front" again std::cout << "->" << p->value; // Not "Front" print it std::cout << "..." There are a few applications for circular lists. One allows us to represent and efficiently process a queue by having a single pointer to the "rear" value in the queue (instead of one to the front and one to the rear). Note that in a circular list, the node one beyond the rear is the front, so both the front and rear nodes in a queue are easily reached from the rear node: all "action" in a queue takes place at the front (remove) or rear (add). If we pointed to the front node, there could be an arbitrary number of nodes between it and the rear node, so it would take lots of time to reach the rear node from the front; but it takes only one operation to reach the front node from the rear. So it isn't intuitive, but pointing to the rear node is the right node to point to to efficiently process queues represented by circular lists. If rear points to the last node an a queue represented by a circular list, here is code to add to the rear of the queue (enqueue). Draw a picture of an empty list, a circular list with one node, and a circular list with 3-5 nodes and simulate this code on those lists to see that it is correct. if (rear != nullptr) rear = rear->next = new LN(some_value,rear->next);//rear->next is front else rear = new LN(some_value); //Make it a 1 node circular list, rear->next = rear; // pointing to itself; cannot do in 1 line } If rear points to the last node in a queue represented by a circular list, here is code to remove the front of the queue. Draw a picture of an empty list, a circular list with one node, and a circular list with 3-5 nodes and simulate this code on those lists to see that it is correct. if (rear != nullptr) { int front_value = rear->next->value; //Front is after rear (even if same) if (rear->next == rear) //Just 1 node in the list? rear = nullptr; // Yes, make empty list else rear->next = rear->next->next; // No: rear->next refers to new front } // the one currently after front For simplicity, this code does not delete the removed LN. If we had to represent a huge number of queues, most of which are empty, using a class that stored one pointer to the rear of a circular linked list will save space when compared to using a class that stores two pointers to the front and rear of a linear-linked list. ------------------------------------------------------------------------------ Header Linked Lists: A header linked list is a linked list that always contains a special header node; an empty header list contains just this one node. The header node stores no real data (in its value: technically it stores the value computed by the type's default constructor); this node exists solely to simplify code that processes linked lists by ensuring that every "real" node in the linked list is preceded (is pointed to) by some node. This guarantee allows us to never have to change what front points to (it always refers to the header) and therefore we do not need any special-case code to handle this (now impossible) option, or reference parameters, or pointers to pointers (the other ways to simplify this code). For example, I originally wrote the following code for adding a value v to the rear of a linked list (which also caches a pointer to the rear LN) if (front == nullptr) front = rear = new LN(v); else rear = rear->next = new LN(v); If the linked list were a header list, an empty list would have a front and rear: would both point to the header node (whose ->next is set to nullptr in the constructor). We could simplify the codeabove to the following single line of code (because front would never store a nullptr) rear = rear->next = new LN(e); In a header list, one never changes front: it always refers to the header. So every "real" node that we manipulate in the list is guaranteed to have a previous node pointing to it (via its next instance variable). This guarantee often simplifies the code for adding and removing nodes (see the example above), although some other operations (like traversal) might be more complicated, because we must remember to skip the header node. ------------------------------------------------------------------------------ Trailer Linked Lists: A trailer linked list is a linked list that always contains a special trailer node; an empty trailer list contains just this one node. As with a header node, this node stores no real data in its value; this node exists solely to simplify code that processes linked lists by ensuring that every "real" node in the linked list is followed by a node. In "trailer lists", every list has a special (valueless) trailer node at the end. So, an "emtpy trailer list" would have one list node in it. All "real" nodes in the list come before the trailer. By using a trailer node, we can remove a node that we have a pointer to, without needing a pointer to the node before it! I'll show this "trick" in class; hand simulate the following code again to see how it works (do it with p pointing to the first, last and middle node in a trailer list). Note that the "last" real node IN a trailer list (the last one storing real data) is the one before the trailer node. This code works correctly only if the node p points to is always followed by another node (which is guaranteed for a trailer list; and p -the node to be removed- must never point to the trailer itself). LN* to_delete = p->next; p->value = to_delete->value; p->next = to_delete->next; delete to_delete; in fact, in C++ we can collapse the inner two statements into *p = *to_delete; The statement *p = *to_delete copies both the instance variables in the LN to_delete points to into the LN p points to. It uses the = operator to copy the contents of one LN to another LN. Being able to remove the value/node referred to by a pointer (without know what node is before it), is useful if we are implementing an iterator by pointing to its current node (the one we might need to erase). Otherwise the cursor for an iterator will have to keep track of the current node and the one before it (which might be "none" if the current node is the first). Sometimes it is useful to combine header and trailer nodes into one list, allowing all simple methods (both those simple for header lists and those simple for trailer list) to be written simply. The next picture illustrates an empty header/trailer list and one containing three int values. ------------------------------------------------------------------------------ Doubly-Linked Lists: A doubly-linked list is a non-linear linked list: it contains pointers both to the node that comes before (prev) and after (next) it in the linked list. We would define such a doubly-linked node (DLN) as template class DLN { public: LN () : prev(nullptr), next(nullptr){} LN (const DLN& dln) : value(dln.value), prev(dln.prev), next(dln.next){} LN (T v, DLN*p = nullptr, DLN* n = nullptr) : value(v), prev(p), next(n){} T value; DLN* prev; DLN* next; }; Given this definition, we can traverse a doubly-linked list in either direction: we can reach any node from any other node. The cost for this flexibility is an increase in the space needed to store a doubly-linked list: twice as many pointers in each node (2 instead of 1) and the requirement to change twice as many pointers when we mutate the list, e.g., when adding/removing values in nodes. For example, when we add a node to a doubly-linked list, we need to make the next variable of the one before it refer to the new node, and the prev variable of the one after it refer to the new node. And in the new node itself, we need to set its prev variable as well as its next instance variable. The following method illustrates how to remove a node from a doubly-linked list. Assume DLN* node_to_remove; if (node_to_remove->prev == nullptr) // or (front == node_to_remove) front = node_to_remove->next; else node_to_remove->prev->next = node_to_remove->next; if (node_to_remove->next != nullptr) node_to_remove->next->prev = node_to_remove->prev; delete node_to_remove; If the node added is at the front or rear of a doubly-linked list, there are all sorts of special cases to handle. By having both a header and a trailer node in a doubly-linked list, we can simplify this code and remove all the special cases. For example, in a doubly-linked lists with header and trailer nodes, we can simplify the remove code above to node_to_remove->prev->next = node_to_remove->next; node_to_remove->next->prev = node_to_remove->prev; delete node_to_remove; But now, even an empty list has two nodes: the header and trailer, linked to each other. Note that in a doubly-linked list with header and trailer nodes, node_to_remove->prev will never be nullptr, and likewise node_to_remove->next will never be nullptr; because, node_to_remove will point to a real node, not the header or trrailer. But, to create an empty doubly-linked header/trailer list requires the following code: front = new DLN() front->next = new DLN(); front->next->prev = front; which is a bit convoluted. The overhead and extra complexity of these special linked lists make them more cumbersome to use than simple linear linked lists, but sometimes we require their extra functionality. Probably trailer lists are the most practical of all these special lists, and we will use them in some part of out programming assignments. Added Fall 2018: There is a special data type called a double-ended queue (deque, pronounced like "deck"). It has a front and rear, and allows pushing/popping at the front (like a stack) and enqueueing/dequeueing at the rear (like a queue). We can efficiently implement a deque using arrays or linked lists. My standard 5 data types does not include an implementation of deques.