Program 6

Dictionary and Set Processing

Major Data Types and Iteration:
List, Tuple, Set, and Dict (and Open for files)

ICS-31: Introduction to Programming


Introduction This programming assignment is designed to ensure that you know how to use combinations of Python's most important data types to model and compactly write code that solves a few different programming problems. The kind of abstract thinking that goes into modeling solutions to these programming problems, with these data types (and iteration over them), is critical to your development as programmers.

There are two parts to this assignment. In each you will be asked to write a module that contains a few functions and a script at the bottom, which ties these functions together to solve the problem.

You should download the program6 project folder and use it to create an Eclipse project. You will create each module in this project, and submit each module separately in Checkmate. The project folder contains no modules, but it does contain all the data files you need to test/debug you modules.

Only one student should submit all parts of the the assignment, but both student's names should appear in the comments at the top of each submitted .py file. It should look something like


# Romeo Montague, Lab 1
# Juliet Capulet, Lab 1
# We certify that we worked cooperatively on this programming
#   assignment, according to the rules for pair programming
Please turn in each program as you finish it, so that I can accurately assess the progress of the class as a whole during this assignment.

Print this document and carefully read it, marking any parts that contain important detailed information that you find (for review before you turn in the files). The code you write should be as compact and elegant as possible, using appropriate Python idioms.

You can use either dict or defaultdict, and you will need to use split to convert a string you read from a file into a list of names. Although the example strings are all one-character, your code should work for any length strings.

You should familiarize yourselves with the safe_open function in the goody module and all the functions in the prompt module, both of which you should have installed in your courselib folder as part of the Eclipse/Python installation. Recall how to use the sep and end parameters in the print function. Finally, reread the section on Time Management from Programming Assignment 0 before starting this assignment.


Problem #1: Reachability

Problem Summary:

  • Write a script that prompts the user to enter the name of a file representing a graph.
  • Read the information in the file, storing the graph in a dictionary.
  • Print the graph.
  • Repeatedly prompt the user for a starting node in the graph, and compute and print all the nodes that are reachable from it by following edges in the graph.

Input and Output

Read a file of pairs of node names (representing edges) in a directed graph, building a dict whose key is a str source node and whose value is a set of str destination nodes that are each reachable from the source node key. We write this annotation as {str : {str}}

Two nodes appear on each line: first the source node, then the destination node, with these node names separated by one semicolon character. For example, the input file graph1.txt contains the following lines (which could appear in this order, or any other):

  c;f
  b;d
  a;c
  c;e
  d;g
  a;b
  e;d
  f;g
  f;d
which represent the graph

Print the graph, one source node per line (the source nodes are printed alphabetically) followed by the set of all the destination nodes that the source can immediately reach. The graph above would print as

  Graph: source -> {destination} nodes
     a -> {'c', 'b'}
     b -> {'d'}
     c -> {'f', 'e'}
     d -> {'g'}
     e -> {'d'}
     f -> {'g', 'd'}

Note that the source nodes are sorted alphabetically, but the set of desintation nodes does not have to be sorted: in fact it makes no sense to talk about sorted sets; we could talk about a sorted list whose contents came from a set. Note that because node g is not a source node (it is only a destination node), it does not appear first on any line (and appears only in the sets for source nodes d and f.

Note that there are multiple data files for this program: graph1.txt, graph2.txt and graph3.txt; test/debug your program on the first file; when you are done, test it on the last two. Draw the graph represented by each for to ensure that your code correctly prints it and computes the nodes reachable from any source node (which you can do by eyeballing the graphs: they are small).

Repeatedly prompt the user for a starting node in the graph (until quit is entered) and compute and print all the nodes that are reachable from it by following edges in the graph. Reject any node not present in the graph. An example interaction (processing the graph above) might be

  Enter starting node: e
  From e the reachable nodes are {'g', 'e', 'd'}

  Enter starting node: a
  From a the reachable nodes are {'g', 'f', 'e', 'd', 'c', 'b', 'a'}

  Enter starting node: quit

Functions and Script

  • read_graph has an open (file) parameter; it returns the dict representing the graph (mine is 9 lines).
  • print_graph has a dict parameter (representing the graph); it returns nothing, but it prints the graph in the appropriate form (mine is 4 lines).
  • reachable has a dict parameter (representing the graph) and a str start node in the graph (technically a key in the dict; it returns a set of all the nodes reachable from it by following edges in the graph (mine is 9 lines).
  • Write a script at the bottom of this module that calls these functions to solve the problem (mine is 7 lines).

    I am supplying these number of lines not as requirements, but ballpark estimate of the amount of code you should write.

    Here is the basic algorithm for computing reachability; it is simple to explain and not (very) complicated to implement. But, you have to understand these instructions and carefully translate them into Python. You should hand-simulate this algorithm using the graph above, and verify that it produces the results you expect before coding it. If you know what recursion is, you might be tempted to use it; but please don't: unless recursion is done very carefully, it will run forever on graphs with cycles: one of the input files is a graph with cycles.

    1. To compute all the reachable nodes in a graph, create a set (initially empty) of reached nodes and a list (initially containing the parameter start node) of nodes that we are going to explore (to find nodes they can reach).

    2. While the exploring list still has nodes, remove the first one (recall the pop method for lists) and put it into the reached set; for all its destination nodes that are not already in the reached set, put them in the exploring list.

    3. When the exploring list becomes empty (can you argue that this always will happen -there is no infinite looping), return the reached set.

    Print the set containing all these node labels. When debugging this algorithm, print the set and list after every interesting change, or use the debugger to see how these change.

Sample Interaction

The program, as specified, will have the following interaction: user-typed information appears in italics. Your output should "match" this one (sets will match if they have the same contents, independent of their order). You should also check that it works for other starting nodes, and a variety of starting nodes in the other graphs.
  Enter file name with graph: graph1.txt

  Graph: source -> {destination} nodes
    a -> {'c', 'b'}
    b -> {'d'}
    c -> {'f', 'e'}
    d -> {'g'}
    e -> {'d'}
    f -> {'g', 'd'}

  Enter starting node: e
  From e the reachable nodes are {'g', 'e', 'd'}

  Enter starting node: x
    Entry Error: 'x';  Not a source node
    Please enter a legal String

  Enter starting node: a
  From a the reachable nodes are {'g', 'f', 'e', 'd', 'c', 'b', 'a'}

  Enter starting node: quit

Problem #2: Instant Runoff Voting

Problem Summary:

  • Write a script that prompts the user to enter the name of a file representing the preference of a sequence of voters.
  • Read the information in the file, storing it in a dictionary.
  • Print the voter preference.
  • Repeatedly display the vote count for ballots (sorted both by candidate and numerically), eliminating from the election the candidate receiving the fewest votes, until one candidate (the winner) or no candidates (a tie) remain.
This form of election is known as instant runoff voting. Every voter submits a ballot that ranks all the candidates the election, from from most favorite candidate to least favorite (we will use a list for this purpose: earlier in the list means more favored).

During the first ballot, votes are counted for each of the candidates according to the rankings of the voters. Then the candidate(s) with the fewest number of votes are removed from the election: if more than one candidate receives the least number of votes, they all are removed from the election.

During the second ballot, votes are tallied for the remaining candidates (there are at least 1 fewer candidates); if a voter's first ranked candidate is still in the election, then that candidate should receive the vote; if a voter's first ranked candidate is not still in the election, then his/her second ranked candidate should receive the vote; but if his/her second ranked cnadidate has been removed from the election, then his/her third ranked candidate should receive the vote ...).

This ballot process continues until either 1 candidate remains, or 0 candidates remain (meaning that all the previous candidates tallied the same number of votes and therefore were all removed).

Input and Output

Read a file of voters and their ranking of the candidates, separated by semicolons, building a dict whose key is each voter and whose value is a list of candidates ranked by that voter (they appear in the file in order, from most favorite to least favorit) We write this annotation as {str : [str]}

For example, the input file votepref1.txt contains the following lines (which could appear in this order, or any other):

  A;X;Y;Z
  B;Y;Z;X
  C;Y;Z;X
  D;Z;Y;X
  E;Z;Y;X
The first line means, voter A ranks candidate X first, candidate Y second, and candidate Z third. The second line means, voter B ranks candidate Y first, candidate Z second, and candidate X third. Each line will have a unique voter and a permutation of all the candidates running.

Print all the associations in this dict, one per line (the voters are printed alphabetically) using the following form. Each line contains the voter and his/her complete ranking of the candidates. For example, the file above would produce:

  Voter Preferences
     A  ->  ['X', 'Y', 'Z']
     B  ->  ['Y', 'Z', 'X']
     C  ->  ['Y', 'Z', 'X']
     D  ->  ['Z', 'Y', 'X']
     E  ->  ['Z', 'Y', 'X']

Note that the voter names are sorted alphabetically, but the list of preference appear the same order they appeared in the file. There are multiple data files for this program: votepref1.txt, votepref2.txt and votepref3.txt; test/debug your program on the first file; when you are done, test it on the rest.

Start with all the candidates. Evaluate the ballot to determine how many votes each candidate received. Print this vote count two ways: sorted alphabetically and sorted numerically (in decreasing order). Remove the candidate(s) receiving the fewest votes, and repeat this process until only one or no candidates remain. Finally, print the outcome of the election: a single candidate winner or a tie. An example interaction (processing the graph above) might be

  Vote count on ballot #1 with candidates (alphabetically) = {'X', 'Y', 'Z'}
    X  ->  1
    Y  ->  2
    Z  ->  2

  Vote count on ballot #1 with candidates (numerical) = {'Y', 'X', 'Z'}
    Y  ->  2
    Z  ->  2
    X  ->  1

  Vote count on ballot #2 with candidates (alphabetically) = {'Y', 'Z'}
    Y  ->  3
    Z  ->  2

  Vote count on ballot #2 with candidates (numerical) = {'Y', 'Z'}
    Y  ->  3
    Z  ->  2

  Winner is  {'Y'}
The first ballot consisted of all three candidates, X, Y, and Z (printed in the set after the =). For this ballot, the votes were counted and printed; candidate X received the fewest number of votes so is eliminated from the next ballot. The second ballot consisted of two candidates, Y and Z (printed in the set after the =). For this ballot, the votes were counted and printed; candidate Z received the fewest number of votes so is eliminated from the next ballot. There is only one candidate remaining so Y is declared the winner. An alternative outcome might be No winner: election is a tie among candidates remaining on the ballot

If you have any questions about how instant runoff voting works (these rules are a bit subtle), please post questions on the forum; you can include examples and what-if questions as long as you don't write code.

Functions and Script

  • print_dict has a str title, any kind of dict, a function(default None) and bool (default False)as parameters and returns nothing; but it prints the title followed by the dictionary in the appropriate form and order (specified by the function and bool). The function determines the ordering and the bool determines whether to reverse it: like the key and reverse parameters used to sort in Python. This function is used to print both the voter preference dict and the vote count dict for each ballot (mine is 4 lines).
  • read_voter_preferences has an open (file) parameter; it returns the dict representing each voter and his/her preferences: {str : [str]} (mine is 6 lines).
  • evaluate_ballot has a dict of voter preference ({str : [str]} read above) and a set of the remaining candidates as parameters; it returns a dictionary whose keys are these candidates and whose values are the number of votes they received on this ballot, based on the description of instant runnoff voting: {str : int}. Remember to count only one vote per voter (hint: you might have to break out of a loop), for his/her highest ranked candidate stil in the election (mine is 8 lines - I used a comprehension in this code).
  • remaining_candidates has a dict as a parameter whose keys are candidates and whose values are the number of votes they received ({str : int}); it returns a set containing all those candidates remaining in the election (the one(s) receiving the fewest number of votes are absent). Note that if all the candidates receive the same number of votes, then this function returns an empty set. (mine is 3 lines - I used a comprehension in this code).
  • Write a script at the bottom of this module the calls these functions to solve the problem (mine is 13 lines).

Sample Interaction

The program, as specified, will have the following interaction: user-typed information appears in italics. Your output should match this one.
  Enter file with voter preferences: votepref1.txt

  Voter Preferences
    A  ->  ['X', 'Y', 'Z']
    B  ->  ['Y', 'Z', 'X']
    C  ->  ['Y', 'Z', 'X']
    D  ->  ['Z', 'Y', 'X']
    E  ->  ['Z', 'Y', 'X']
  
  Vote count on ballot #1 with candidates (alphabetically) = {'Z', 'Y', 'X'}
    X  ->  1
    Y  ->  2
    Z  ->  2

  Vote count on ballot #1 with candidates (numerical) = {'Y', 'X', 'Z'}
    Y  ->  2
    Z  ->  2
    X  ->  1

  Vote count on ballot #2 with candidates (alphabetically) = {'Y', 'Z'}
    Y  ->  3
    Z  ->  2

  Vote count on ballot #2 with candidates (numerical) = {'Y', 'Z'}
    Y  ->  3
    Z  ->  2

  Winner is  {'Y'}

You can also try processing the votepref2.txt file (which leads to a No winner result) and votepref3.text.