CompSci 261, Spring 2017, Homework 1

Suppose that you are writing code to process html files using the Python html.parser module. This module will read lines from a file, and call functions handle_starttag and handle_endtag (that you define yourself) when it encounters html tags of different types. Both of these functions take a string argument, the type of tag. However, it will not check that each end tag equals the corresponding start tag, as it should in correct html. Describe how your handle_starttag and handle_endtag functions could use a stack data structure to perform this check. Your description should be in English, not code or pseudocode.

For instance, your method should accept as correct the sequence of calls starttag("li"); starttag("p"); endtag("p"); endtag("li"). However, if the two endtags occur in the opposite order, or if any one of these four calls is omitted, your method should determine that there is a problem.

Solution: Maintain a stack of tags. Each starttag operation should push its argument onto the stack. Each endtag operation should pop a tag from the stack and check that it matches its argument. When the file has been entirely processed in this way, you should also check that the stack is empty.

We saw how to use a dynamic array to implement a stack, so that each push or pop operation calls a constant number of dynamic array operations. Similarly, describe how to use dynamic arrays to implement a queue so that each enqueue or dequeue operation calls a constant number of dynamic array operations. Your solution should not depend on the details of its dynamic array; in particular it should not notice when an array is resized and do anything different in that case. It should only access an array by reading and writing array elements, using operations that grow or shrink the array by one cell, or creating a new empty array.

(Hint: maintain two dynamic arrays, one for enqueuing and one for dequeueing, and switch them from one role to the other when necessary. Note that your solution should use a constant number of dynamic array operations in the worst case, not just on average. You may assume that deleting an array (even a large one) is a single operation. For full credit, the arrays maintained by your solution should have total length $O(n)$, where $n$ is the number of elements currently in the queue.)

Solution: Let $X$ and $Y$ be two dynamic arrays, stored by the queue data structure, and also store an index $d$ that counts how many elements of $X$ have already been dequeued. To enqueue a new element, add it to the end of $Y$ with a grow operation. To dequeue an element, first test whether $d$ equals the length of $X$; if so, set $X=Y$ (discarding the old array in $X$), set $d=0$, and set $Y$ to be a new empty array. The total length of $X$ and $Y$ will always be at most $2n$ where $n$ is the largest number of elements ever in the queue.

When we analyzed dynamic arrays that expand the array by a constant factor $c>1$ whenever the array becomes full (its size equals its length), we showed using geometric sequences that the average time per grow operation is constant. This is true for $c=2$, $c=3/2$, or $c=9/8$, the constants for the textbook version of the algorithm, the Java version, and the Python version. But what if we want to determine more accurately how the time depends on $c$ rather than just saying that it's a constant? Do the same analysis more carefully to give the average time per grow operation, in $O$-notation, as a function of $c$.

(Hint: consider the same geometric series $1+c+c^2+\dots+n+cn$, and divide its sum by the number $n$ of operations.)

Solution: The total length of all of the arrays created by the expand operations is given by the geometric series above, and the total time for these operations is proportional to this total length. The series adds to $\tfrac{cn}{1-1/c}$, and the number of operations is $n$, so the average time per operation is $O(\tfrac{c}{1-1/c})$. For $c$ near 1 this is also $O(1/(c-1))$.

The Wikipedia article on the potential function method includes a demonstration that, for a binary number undergoing operations that add one to the number, the amortized number of bits that change per operation is $O(1)$. It uses a potential function $\Phi$ that counts the number of nonzero bits of the number, and observes that each operation that changes a large number of bits also decreases $\Phi$ by a correspondingly large amount, so that using amortized analysis the large decrease in $\Phi$ cancels the large number of bit-changes. Now suppose that we are instead maintaining a decimal number, under the same operations, and that we want to show that the number of digits that change per operation is $O(1)$. Provide a potential function that works for this problem, and explain why it works.

Solution: Set the potential function to be the number of digits that are equal to 9. When $d$ digits change in an add-one operation, $d-1$ of them were 9's and become 0, so the actual time for the operation is cancelled by the decrease in the potential function. (It would also work to use a different potential function that counts the number of nonzero digits, or the number of odd digits.)