| Introduction |
This programming assignment is designed to ensure that you know how to use
combinations of Python's most important data types to model and
compactly write code that solves a wide variety of different programming
problems.
The kind of abstract thinking that goes into modeling solutions to these
programming problems with these data types (and iteration over them) is
critical to your development as computer scientists.
There are five parts to this assignment (and and extra credit part at the end). In each you will be asked to write a module that contains a few functions and a script at the bottom, which ties these functions together to solve the problem. You should download the program1 project folder and use it to create an Eclipse project. You will create each modules in this project, and submit each module separately in Checkmate. The project folder contains no modules, but it does contain all the data files you need to test/debug you modules. You should work on this assignment in pairs, with someone in your lab section. Try to find someone who lives near you, with similar programming skills, and work habits/schedule: e.g., talk about whether you prefer to work mornings, nights, or weekends; what kind of commitment you will make to submit program early. If you believe that it is impossible for you to work with someone, because of some special reason(s), you should send me email stating them and asking for special permission to work alone (which I do grant, but not frequently). Only one student should submit all parts of the the assignment, but both student's names should appear in the comments at the top of each submitted .py file. It should look something like # Romeo Montague, Lab 1 # Juliet Capulet, Lab 1 # We certify that we worked cooperatively on this programming # assignment, according to the rules for pair programmingPlease turn in each program as you finish it, so that I can accurately assess the progress of the class as a whole during this assignment. Print this document and carefully read it, marking any parts that contain important detailed information that you find (for review before you turn in the files). The code you write should be as compact and elegant as possible, using appropriate Python idioms. You should familiarize yourselves with the safe_open function in the goody module and all the functions in the prompt module, both of which you should have installed in your courselib folder as part of the Eclipse/Python installation. Recall how to use the sep and end parameters in the print function. Finally, reread the section on Time Management from Programming Assignment 0 before starting this assignment. |
| Problem #1: Reachability |
Problem Summary:
Input and OutputRead a file of pairs of node names (representing edges) in a directed graph, building a dict whose key is a str source node and whose value is a set of str destination nodes that are each reachable from the source node key. We write this informally as dict[str] -> set{str}
Two nodes appear on each line: first the source node, then the destination
node, with these node names separated by one semicolon character.
For example, the input file graph1.txt contains the following
lines (which could appear in this order, or any other):
Print the graph, one source node per line (the source nodes are printed
alphabetically) followed by the set of all the destination nodes that the
source can immediately reach.
The graph above would print as
Note that the source nodes are sorted alphabetically, but the set
of desintation nodes does not have to be sorted:
in fact it makes no sense to talk about sorted sets; we could talk
about a sorted list whose contents came from a set.
Note that because node g is not a source node (it is only a
destination node), it does not appear first on any line (and appears only
in the sets for source nodes d and f.
Note that there are multiple data files for this program: graph1.txt,
graph2.txt and graph3.txt; test/debug your program on
the first file; when you are done, test it on the last two.
Draw the graph represented by each for to ensure that your code correctly
prints it and computes the nodes reachable from any source node (which you
can do by eyeballing the graphs: they are small).
Repeatedly prompt the user for a starting node in the graph (until quit
is entered) and compute and print all the nodes that are reachable from it by
following edges in the graph.
Reject any node not present in the graph.
An example interaction (processing the graph above) might be
I am supplying these number of lines not as requirements, but ballpark estimate
of the amount of code you should write.
Here is the basic algorithm for computing reachability; it is simple to explain
and not (very) complicated to implement.
But, you have to understand these instructions and carefully translate them into
Python.
You should hand-simulate this algorithm using the graph above, and verify that
it produces the results you expect before coding it.
You might be tempted to use recursion, but please don't: unless recursion is
done very carefully, it will run forever on graphs with cycles: one of the
input files is a graph with cycles.
Print the set containing all these node labels.
When debugging this algorith, print the set and list after every
interesting change, or use the debugger to see how these change.
|
| Problem #2: Instant Runoff Voting |
Problem Summary:
Input and OutputRead a file of voters and their ranking of the candidates, separated by semicolons, building a dict whose key is each voter and whose value is a list of candidates ranked by that voter (they appear in the file in order, from most favorite to least favorit) We write this informally as dict[str] -> list[str].
For example, the input file votepref1.txt contains the following
lines (which could appear in this order, or any other):
Print all the associations in this dict, one per line (the voters are
printed alphabetically) using the following form.
Each line contains the voter and his/her complete ranking of the candidates.
For example, the file above would produce:
Note that the voter names are sorted alphabetically, but the list of preference appear the same order they appeared in the file. There are multiple data files for this program: votepref1.txt, votepref2.txt and votepref3.txt; test/debug your program on the first file; when you are done, test it on the rest.
Start with all the candidates.
Evaluate the ballot to determine how many votes each candidate received.
Print this vote count two ways: sorted alphabetically and sorted numerically
(in decreasing order).
Remove the candidate(s) receiving the fewest votes, and repeat this process
until only one or no candidates remain.
Finally, print the outcome of the election: a single candidate winner or a tie.
An example interaction (processing the graph above) might be
Functions and Script
Sample InteractionThe program, as specified, will have the following interaction: user-typed information appears in italics. Your output should match this one. Enter file with voter preferences: votepref1.txt
Voter Preferences
A -> ['X', 'Y', 'Z']
B -> ['Y', 'Z', 'X']
C -> ['Y', 'Z', 'X']
D -> ['Z', 'Y', 'X']
E -> ['Z', 'Y', 'X']
Vote count on ballot #1 with candidates (alphabetically) = {'Z', 'Y', 'X'}
X -> 1
Y -> 2
Z -> 2
Vote count on ballot #1 with candidates (numerical) = {'Y', 'X', 'Z'}
Y -> 2
Z -> 2
X -> 1
Vote count on ballot #2 with candidates (alphabetically) = {'Y', 'Z'}
Y -> 3
Z -> 2
Vote count on ballot #2 with candidates (numerical) = {'Y', 'Z'}
Y -> 3
Z -> 2
Winner is {'Y'}
You can also try processing the votepref2.txt file (which leads to a No winner result) and votepref3.text. |
| Problem #3: Finite Automata |
Problem Summary:
Input and OutputRead a file that describes a FA: each line contains a state and an arbitrary number of input->state transitions. Build a dict such that each keys is a str state and whose values is another dict specifying of the transitions from that state: this second dict has keys that are str inputs and values are str states. The first token on each line is the str state and the remaining tokens (always coming in pairs) are str inputs and states. All tokens are separated by one semicolon character. We write this informally as dict[str] -> (dict[str] -> str).
For example, the input file faparity.txt contains the following lines
(which could appear in this order, or any other):
Here, the state even (meaning it has seen an even number of 1 inputs so far) is a key in the main dict. It's value is a dict with two key/value pairs 0/even and 1/odd. It means that in the even state, if the input is a 0 the FA stays in the even state; if the input is a 1 the FA goes to the odd state. And similarly (the next line) means that for the odd state, if the input is a 0 the FA stays in the odd state; if the input is a 1 the FA goes back to the even state. So, seeing an input of 0 keeps the FA in the same state; seeing an input of 1 flips the FA into the other state. Print the finite automaton, one state (and its transitions) per line; the states are printed alphabetically and the transition dict for each state is printed as a list of input/state items (tuples) such that these are printed alphabetically by the inputs.
For example, the file above would produce:
Note that there are multiple data files for this program: faparity.txt and fadivisibleby3.txt; test/debug your program on the first file; when you are done, test it on the last file. Draw the FA represented by each for to ensure that your code correctly prints and computes with it. Repeatedly process lines from a second input file, computing the results of the finite automaton for a start-state and its inputs; then print out all the results in a special form. Each line in the file contains a start-state followed by a sequence of inputs. The start-state will be a state in the FA (is a key in the outer dictdict).
For example, the input file fainputparity.txt contains the following
three lines:
The result of processing each line is to print the start-state, and then each
input and the new state it transitions to, and finally print the stop-state.
For the parity FA and the first line in this file, it should print
Functions and Script
Sample InteractionThe program, as specified, will have the following interaction: user-typed information appears in italics. Your output should match this one. Enter file with finite automaton: faparity.txt
Finite Automaton
even transitions: [('0', 'even'), ('1', 'odd')]
odd transitions: [('0', 'odd'), ('1', 'even')]
Enter file with start-state and input: fainputparity.txt
Starting new simulation
Start state = even
Input = 1; new state = odd
Input = 0; new state = odd
Input = 1; new state = even
Input = 1; new state = odd
Input = 0; new state = odd
Input = 1; new state = even
Stop state = even
Starting new simulation
Start state = odd
Input = 1; new state = even
Input = 0; new state = even
Input = 1; new state = odd
Input = 1; new state = even
Input = 0; new state = even
Input = 1; new state = odd
Stop state = odd
Starting new simulation
Start state = even
Input = 1; new state = odd
Input = 0; new state = odd
Input = 1; new state = even
Input = 1; new state = odd
Input = 0; new state = odd
Input = x; illegal input; terminated
Stop state = None
|
| Problem #4: Non-Deterministic FA |
Problem Summary:A non-deterministic finite automaton (NDFA) is machine described by its states and its transitions: each transition for a state specifies an input and what state (or states: that what makes it non-deterministic) that input leads to. We can illustrate a NDFA as a graph with labelled edges (see below). The critical difference is that a NDFA can have multiple edges with the same label going to different states (we'll see how to handle such transitions below).
Input and OutputRead a file that describes a NDFA: each line contains a state and an arbitrary number of input->state transitions. Build a dict such that each keys is a str state and whose values is another dict specifying of the transitions from that state: this second dict has keys that are str inputs and values are sets of str states: all the states a particular input can lead to. The first token on each line is the str state and the remaining tokens (always coming in pairs) are str inputs and states. All tokens are separated by one semicolon character. We write this informally as dict[str] -> (dict[str] -> set{str}).
For example, the input file ndfaendin01.txt contains the following lines
(which could appear in this order, or any other):
Here, the state start is a key in the main dict. It's value is a dict with two key/value pairs 0 mapping to the set containing start and near and 1 mapping to the set containing just start. It means that in the start state, if the input is a 0 the NDFA can stay in the start state or it can go to the near state; if the input is a 1 the NSFA must stay in the start state. And similarly the next line means that in the near state, if the input is a 1 the NDFA must go into the end state. The last line means that the end state has no transitions out of it. Print the ndfa, one state (and its transitions) per line; the states are printed alphabetically and the transition dict for each state is printed as a list of input/set of state items (tuples) such that these are printed alphabetically by the inputs. Note that the state end is a key in the main dict, whose associated transitions are an empty dict
For example, the file above would produce:
Note that there is only one pair of data files for this program. Repeatedly process lines from a second input file, computing the results of the non-determinisitc finite automaton for a start-state and its inputs; then print out all the results in a special form. Each line in the file contains a start-state followed by a sequence of inputs. The start-state will be a state in the FA (is a key in the outer dictdict).
For example, the input file ndfainputendin01.txt contains the following
two lines:
The result of processing each line is to print the start-state, and then each
input and the new states (plural) it could transition to (the could
is what makes it non-deterministic), and finally print the stop-states.
For the ndfaendin01 NDFA and the first line in this file, it should print
Note especially that in the start state, if the input is a 0, then the NDFA can either remain in the start state or go into the near state. For this program, we keep track of all states that the NDFA can be in, using a set of new possible states. For the next input, 1, we can be either in the start state (from the start state, an input of 1 allows us to stay in the start state) or the end state (from the near state, an input of 1 allows us to transition to the end state). Thus, we keep track of the set of states the NDFA can be in, and the new set of states the NDFA can be in after processing the next input. In this example, because 'end' is included in the stop-states, this input does end in 01.
Functions and Script
Sample InteractionThe program, as specified, will have the following interaction: user-typed information appears in italics. Your output should "match" this one (recall the order of values in sets is not important). Enter file with non-deterministic finite automaton: ndfaendin01.txt
Non-Deterministic Finite Automaton
end transitions: []
near transitions: [('1', {'end'})]
start transitions: [('0', {'near', 'start'}), ('1', {'start'})]
Enter file with start-state and input: ndfainputendin01.txt
Starting new simulation
Start state = start
Input = 1; new possible states = {'start'}
Input = 0; new possible states = {'near', 'start'}
Input = 1; new possible states = {'end', 'start'}
Input = 1; new possible states = {'start'}
Input = 0; new possible states = {'near', 'start'}
Input = 1; new possible states = {'end', 'start'}
Stop state(s) = {'end', 'start'}
Starting new simulation
Start state = start
Input = 1; new possible states = {'start'}
Input = 0; new possible states = {'near', 'start'}
Input = 1; new possible states = {'end', 'start'}
Input = 1; new possible states = {'start'}
Input = 0; new possible states = {'near', 'start'}
Input = 0; new possible states = {'near', 'start'}
Stop state(s) = {'near', 'start'}
|
| Problem #5: Word Generator |
Problem Summary:
Input and OutputAfter prompting for the order statistic, read a file of words, building a dict storing data described as: dict[tuple[strn]] -> list[str]. Here the dict's keys are tuples of n words (n is the order statistic) and each key's value is a list of all the words in the text that follow these words: e.g., if n were 2, the dict would contain a key for every pair of words appearing next to each other in the text, and a value that is a list of all the words following this key (no matter where the pair occurs, with NO DUPLICATES allowed in the values list).The easiest way to read the words one at a time is to use an iterator over the result returned by the function goody.read_file_values, which is passed an open object to read from. Build the dict by "prereading" n words (by calling next) into a list (assume that this is always possible; how might it not be?); then read the next word and put it in as a value associated with the list of pre-read words; then, drop the oldest word in the list, and add this next word to the end of the list (always keeping the list length at n); read the next word, and repeat the process for this word, continuing until there are no words to read. You can use a normal for iterator over the values remaining. Remember to convert this list to a tuple before using it as a key in the dict.
For a simple example, the file ndfainput1.txt contains the following
lines (it could have all this information on one line or more lines):
Print all the associations in the dict, one per line in standard lexical order; after printing all associations, print the length of the smallest and largest list that is a value in the dict. Each line contains an n word tuple, followed by the list of unique words that follow them in the text. In standard lexical order, the keys appear in order relative to the first word in the tuple; for all first words that are the same, they appear in order relative to the second word in the tuple; for all first and second words that are the same, they appear in order relative to the thrid word in the tuple; etc. (see the example below).
For example, the file above would produce:
For example, ('a','d') appears three times in the text above, twice followed by 'c' and once followed by nothing (at the end of the file); ('a','b') appears twince in the file above, first followed by 'c' and second followed by 'a'. Prompt the user for the words to start with (there are order statistic number of them) and the number of random words after that to generate. Produce the list of words and print it. A random 10 word list, after the words a and d might print as Random text = ['a', 'd', 'c', 'a', 'a', 'd', 'c', 'a', 'a', 'd', 'c', 'b']In the result we start with a d (specified by the user), we know only c can come next; then using d c we know that either b or a must come next; it randomly chooses a...
Functions and Script
Sample InteractionThe program, as specified, will have the following interaction: user-typed information appears in italics. Your output should match this one. Enter order statistic: 2
Enter file to process: wginput1.txt
Corpus
('a', 'a') can be followed by any of ['b', 'd']
('a', 'b') can be followed by any of ['c', 'a']
('a', 'd') can be followed by any of ['c']
('b', 'a') can be followed by any of ['d', 'a']
('b', 'c') can be followed by any of ['b']
('c', 'a') can be followed by any of ['a']
('c', 'b') can be followed by any of ['a']
('d', 'c') can be followed by any of ['b', 'a']
min/max = 1/2
Enter 2 words to start with
Enter word 1: a
Enter word 2: d
Enter # of words to generate: 10
Random text = ['a', 'd', 'c', 'a', 'a', 'd', 'c', 'a', 'a', 'd', 'c', 'b']
You can also try reading a much larger file included in this project folder huck.txt, Mark Twain's, "The Adventures of Huckleberry Finn". I tried it with an order statistic of 3. The corpus has over 90,000 entries; the biggest key triple had an associated value with 46 uniqeue words in it. The key was ('out', 'of', 'the') and its associated value was the list ['window', 'face', 'woods', 'fourth', 'front', 'jacket', 'hole', 'canoe', 'middle', "ferryboat's", 'cottonwood', "captain's", 'river', 'fog', 'range', 'presbyterian', 'tree', 'nest', 'wagon-troughs', 'reach', 'store', 'way', 'wigwam', 'ark', 'room', 'corner', 'grave', 'nonesuch', 'trouble', 'kitchen', 'old', 'first', 'hardest', 'nigger-patch', 'sugar-bowl', 'window-hole', 'brass', 'spoon', 'house', 'tooleries', 'bag', 'office', 'post-office', 'cabin', 'path', 'chains'] With the appropriate modification, we can use this same program to read/generate music or DNA sequences or any other data made of symbols. |
| Extra Credit |
Write up all three answers in and drop them off on checkmate.
|