Informatics 42 • Winter 2008 • David G. Kay • UC Irvine

Fourth Homework

Get your work checked and signed off by a classmate, then show it to your TA in lab by Monday, February 4. This assignment is all pencil-and-paper, which should give you some more time to work on your lab assignment.

You have been asked to write the spell checker for SnazzyWrite, a word processing program that competes with Microsoft Word. Your software will take words from the user's document and look each word up in a list of correctly spelled words. (Sometimes this is called a dictionary, but for clarity we'll call it a word list even if it's not implemented as a list.)

If your program doesn't find a word in the word list, it notifies the user that the word is probably misspelled. In that case, it also gives the user the opportunity to add the word to the word list, since the word may be a new word rather than a misspelling.
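
The look-up-then-offer-to-add behavior described above can be sketched in Java as follows. This is only an illustration: the class name SpellCheckSketch, the method checkWord, and the userWantsToAdd flag (standing in for the user's response to a prompt) are assumptions, not part of SnazzyWrite. The word list is typed as a plain Set<String> so the sketch does not commit to any of the implementations considered below.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: names are illustrative, not part of the assignment.
class SpellCheckSketch {
    /**
     * Checks one word against the word list.
     * Returns true if the word was flagged as probably misspelled.
     * If the word is flagged and the user chooses to add it (userWantsToAdd
     * stands in for a real prompt), the word is added to the word list.
     */
    static boolean checkWord(Set<String> wordList, String word, boolean userWantsToAdd) {
        if (wordList.contains(word)) {
            return false;               // found: nothing to report
        }
        if (userWantsToAdd) {
            wordList.add(word);         // treat it as a new word, not a misspelling
        }
        return true;                    // not found: flagged
    }

    public static void main(String[] args) {
        Set<String> wordList = new TreeSet<>(Arrays.asList("apple", "banana"));
        System.out.println(checkWord(wordList, "apple", false));   // known word
        System.out.println(checkWord(wordList, "snazzy", true));   // flagged, then added
        System.out.println(checkWord(wordList, "snazzy", false));  // now known
    }
}
```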

You consider the following alternatives for implementing the word list:

I. A dynamically allocated singly-linked list, ordered alphabetically, plus a single integer to store the number of words

II. An array ordered alphabetically, plus a single integer to store the number of words

III. A hash table as described in class (with a good hash function, an appropriate table size, and collisions resolved by chaining)

IV. A reasonably balanced binary search tree, ordered alphabetically

(a) Complete the following table, giving the best O-notation for each operation on each data structure, assuming that each operation is implemented in Java using the most efficient algorithm available. Assume that the word list contains w words and the user's document contains d words; use these variables, not n.
Operation                                                   Linked List  Array       Hash Table  Binary Search Tree
Look up (search for) a single word in the word list         O(w)         O(log w)    O(1)        O(log w)
Add a new word to the word list                             O(w)         O(w)        O(1)        O(log w)
Spell-check an entire document (assuming no new words)      O(dw)        O(d log w)  O(d)        O(d log w)
Print all the words in the word list in alphabetical order  O(w)         O(w)        O(w log w)  O(w)
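
As one illustration of where entries like these come from, here is a minimal Java sketch of alternative II (an alphabetically ordered array plus a count). The class name SortedArrayWordList, its method names, and the initial capacity of 8 are assumptions made for this example. Lookup uses binary search, so it makes O(log w) comparisons; adding a word finds its position the same way but may then shift up to w entries to open a slot.

```java
import java.util.Arrays;

// Hypothetical sketch of alternative II: sorted array plus a count.
class SortedArrayWordList {
    private String[] words = new String[8];  // assumed initial capacity
    private int count = 0;                   // number of words stored

    /** Binary search over the first count entries: O(log w) comparisons. */
    boolean contains(String word) {
        return Arrays.binarySearch(words, 0, count, word) >= 0;
    }

    /** Adds a word in alphabetical position; returns false if already present. */
    boolean add(String word) {
        int pos = Arrays.binarySearch(words, 0, count, word);
        if (pos >= 0) {
            return false;                     // already in the list
        }
        int insertAt = -(pos + 1);            // binarySearch encodes the insertion point
        if (count == words.length) {
            words = Arrays.copyOf(words, count * 2);  // grow when full
        }
        // Shift the tail one slot to the right: up to w data movements.
        System.arraycopy(words, insertAt, words, insertAt + 1, count - insertAt);
        words[insertAt] = word;
        count++;
        return true;
    }

    int size() {
        return count;
    }

    /** Returns the i-th word in alphabetical order. */
    String get(int i) {
        if (i < 0 || i >= count) {
            throw new IndexOutOfBoundsException("index " + i);
        }
        return words[i];
    }

    public static void main(String[] args) {
        SortedArrayWordList list = new SortedArrayWordList();
        list.add("pear");
        list.add("apple");
        list.add("mango");
        // Printing in alphabetical order is a single pass over the array.
        for (int i = 0; i < list.size(); i++) {
            System.out.println(list.get(i));
        }
    }
}
```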

(b) In the table above, some of the entries measure primarily comparisons and others mainly measure data movements. Circle each entry above whose O-notation measures primarily data movements.

(c) Which of these data structures would be best for this task? Give a very brief but clear and valid reason for your choice.

(d) Considering only the O-notations above, which two data structures should, logically, never be used for this task?

(e) Let's think a little bit about designing SnazzyWrite to properly handle exceptions.

(e.1) As part of the SnazzyWrite application, you decide that you need the following method:

public ArrayList<String> readParagraphs(String filename)

The method is intended to take a filename as a parameter, open the specified file, read its contents, and return an ArrayList<String>, where each String is the text of one paragraph.

One of the first things that the method will do is to open the specified file, in preparation for reading its contents. As we discussed in class, this operation fails in some circumstances (e.g., the file doesn't exist, can't be accessed because it's locked by another program, or resides on a drive reached over a network connection that's down), throwing an IOException. Should the method catch and handle this exception, or should the method allow it to be thrown to its caller? In a couple of sentences, justify your answer.
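
To make the two options concrete, here is a minimal Java sketch of one of them: the variant in which readParagraphs declares throws IOException and leaves handling to its caller. It is not meant to settle the question; the other variant would wrap the file access in a try/catch inside the method and decide there what to return or report. Several details are assumptions for illustration only: the class name ParagraphReaderSketch, the helper splitParagraphs, making the methods static, and the rule that paragraphs are separated by blank lines.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: shows only the "let the caller handle it" variant.
class ParagraphReaderSketch {
    /** Groups lines into paragraphs, assuming blank lines separate paragraphs. */
    static ArrayList<String> splitParagraphs(List<String> lines) {
        ArrayList<String> paragraphs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                if (current.length() > 0) {          // a blank line ends a paragraph
                    paragraphs.add(current.toString());
                    current.setLength(0);
                }
            } else {
                if (current.length() > 0) {
                    current.append(' ');             // join lines of one paragraph
                }
                current.append(trimmed);
            }
        }
        if (current.length() > 0) {
            paragraphs.add(current.toString());      // last paragraph, no trailing blank
        }
        return paragraphs;
    }

    /** Opens and reads the file; any IOException propagates to the caller. */
    public static ArrayList<String> readParagraphs(String filename) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get(filename));
        return splitParagraphs(lines);
    }

    public static void main(String[] args) {
        // The caller sees the failure and chooses how to respond to it.
        try {
            ArrayList<String> paragraphs = readParagraphs("no-such-file.txt");
            System.out.println(paragraphs.size() + " paragraph(s) read");
        } catch (IOException e) {
            System.out.println("Could not read the document: " + e.getMessage());
        }
    }
}
```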

(e.2) Now we'll turn our attention back to the spell checker that we were considering in the previous parts of this homework. Suppose that you're going to write a method called checkDocument that spell-checks an entire document (perhaps it's given the ArrayList<String> returned by readParagraphs from part (e.1) as a parameter). Would it be a reasonable design strategy for checkDocument to throw an exception in the case that it finds a spelling error? In a couple of sentences, explain why or why not.

Written by David G. Kay, Winter 2001. Modified by Alex Thornton, Winter 2007.