ICS 31 -- Winter 2013 -- Quiz 9

  1. Complete the definition of seconds_to_mmss below, consistent with its header, docstring, and assertions. [Note: The integer division operator (a//b) gives the integer quotient of a/b. The mod operator (a%b) gives the remainder of a/b.] You do not have to worry about leading zeroes (like the 0 in "11:05").

    def seconds_to_mmss(seconds: int) -> str:
        ''' Convert a number of seconds to minutes and seconds in "mm:ss" format
        '''




    assert(seconds_to_mmss(15) == "0:15")
    assert(seconds_to_mmss(75) == "1:15")
    assert(seconds_to_mmss(3620) == "60:20")
    Answer Feedback:
    ANSWER:
        return str(seconds//60) + ":" + str(seconds % 60)
    ## Alternative:
    ## return "{:d}:{:2d}".format(seconds//60, seconds % 60)

    ## Alternative that fixes the leading zero (e.g., in "12:01"),
    ## using zfill() (which we haven't covered):
    ## return "{:d}:{:s}".format(seconds//60, str(seconds % 60).zfill(2))

    ## Alternative that fixes the leading zero without zfill():
    ## return "{:d}:{:02d}".format(seconds//60, seconds % 60)
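
    As a quick check of the integer-division and remainder arithmetic above, here is a minimal sketch that uses the built-in divmod(), which returns the quotient and the remainder together, along with the {:02d} format from the last alternative. The name seconds_to_mmss_padded is hypothetical, chosen only to distinguish this version from the quiz's function.

```python
def seconds_to_mmss_padded(seconds: int) -> str:
    ''' Convert a number of seconds to "m:ss", padding the seconds to two digits
    '''
    # divmod(a, b) returns the pair (a // b, a % b)
    minutes, remaining = divmod(seconds, 60)
    return "{:d}:{:02d}".format(minutes, remaining)

assert seconds_to_mmss_padded(15) == "0:15"
assert seconds_to_mmss_padded(75) == "1:15"
assert seconds_to_mmss_padded(3620) == "60:20"
assert seconds_to_mmss_padded(3605) == "60:05"   # leading zero in the seconds field
```

    The first three assertions are the quiz's own; the fourth shows the case the quiz says you may ignore.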
  2. Suppose we wish to process text files that contain some "front matter" (lines at the start of the file that we wish to ignore), similar to part of last week's lab. Let's say that we have read the file into a list of strings, that the end of the front matter is indicated by a line in the file that says "END OF FRONT MATTER", and that we are guaranteed that this line will occur in the file.

    Complete the definition of remove_front_matter below, consistent with its header, docstring, and assertions.

    def remove_front_matter(linelist: 'list of str') -> 'list of str':
        ''' Return input list with starting lines (through "END OF FRONT MATTER") removed
        '''










    test_list = ["To be skipped",
                 "Also to be skipped",
                 "END OF FRONT MATTER",
                 "To be included",
                 "Also to be included"]
    assert(remove_front_matter(test_list) == ["To be included",
                                              "Also to be included"])
    assert(remove_front_matter(test_list[2:]) == ["To be included",
                                                  "Also to be included"])
    assert(remove_front_matter(test_list[:3]) == [ ])
    Answer Feedback:
    ANSWER:
        result = [ ]
        found_dividing_line = False
        for line in linelist:
            if found_dividing_line:
                result.append(line)
            if line == "END OF FRONT MATTER":
                found_dividing_line = True
        return result

        ## Alternative approach:
        dividing_line = 0
        for line in linelist:
            if line == "END OF FRONT MATTER":
                break
            dividing_line += 1
        return linelist[dividing_line+1:]
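
    Because the dividing line is guaranteed to occur, its position can also be looked up directly. The sketch below uses the list method index(), which is not part of the quiz's answers; the name remove_front_matter_by_index is hypothetical. Note that index() raises a ValueError when the line is missing, so this version relies on the guarantee stated in the problem.

```python
def remove_front_matter_by_index(linelist: 'list of str') -> 'list of str':
    ''' Return input list with starting lines (through "END OF FRONT MATTER") removed
    '''
    # index() returns the position of the first matching item
    dividing_line = linelist.index("END OF FRONT MATTER")
    return linelist[dividing_line+1:]

test_list = ["To be skipped",
             "Also to be skipped",
             "END OF FRONT MATTER",
             "To be included",
             "Also to be included"]
assert remove_front_matter_by_index(test_list) == ["To be included",
                                                   "Also to be included"]
assert remove_front_matter_by_index(test_list[2:]) == ["To be included",
                                                       "Also to be included"]
assert remove_front_matter_by_index(test_list[:3]) == [ ]
```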

  3. Suppose an exam has two problems, each worth 20 points. We want to see how students' scores on Problem 1 relate to their scores on Problem 2, so we decide to make a scatter plot.

    Our students come in a list of Score records, defined as follows:

    from collections import namedtuple
    Score = namedtuple('Score', 'p1 p2')
    TOPSCORE = 20

    Both p1 and p2 are ints between 0 and TOPSCORE (inclusive).

    With this list:

    scorelist = [Score(p1=0, p2=0),
                 Score(p1=1, p2=1),
                 Score(p1=1, p2=5),
                 Score(p1=4, p2=2),
                 Score(p1=5, p2=0)]

    the scatter plot of scores would look like this (except that we've omitted the rows and columns that would show scores greater than 5):

    5| *
    4|
    3|
    2|    *
    1| *
    0|*    *
      ------
      012345

    To keep things simpler in our problem, we're going to omit the axes and the labels and print just the 21-by-21 body of the plot.

    # Initialize the table to a 21-by-21 table of blanks
    table = [ ]
    for row in range(TOPSCORE+1):
        table_row = [ ]
        for col in range(TOPSCORE+1):
            table_row.append(' ')
        table.append(table_row)

    # Populate the table with an asterisk for each
    # student's two scores. (When two students have
    # the same pair of scores, just one asterisk appears.)
    for s in scorelist:
        table[s.p2][s.p1] = '*'

    # Print the 21-by-21 table
    for row in range(TOPSCORE,-1,-1):
        for col in range(TOPSCORE+1):
            print(table[row][col], sep='', end='')
        print() # Print the default end= character, a newline

    Please answer each of the following questions in just a few English words:


    1. Why do we have to say range(TOPSCORE+1)?  

      Answer Feedback:
      ANSWERS:

      These answers are a little more complete
      than would be necessary for credit on an
      exam. It's also likelier that on an exam,
      you'd be given some choices rather than be
      asked to write a prose answer.

      range(TOPSCORE) goes from 0 to 19 in this case,
      but the scores go from 0 to 20. We have 20
      possible scores plus zero, for a total of 21.

    2. Why do we have to say table[s.p2][s.p1] and not table[s.p1][s.p2]?

      Answer Feedback:
      ANSWER:

      We want to plot the first score (p1) on the x-axis (left to right). But the first index in the table is the row number (y-axis), top to bottom; we want to plot p2 on that axis so p2 has to go first.

    3. When we print the table, why do we print the rows in a backwards range (TOPSCORE down to 0)?

      Answer Feedback:
      ANSWER:

      We think of the scatter plot as having the origin (0,0) in the lower left corner. But we have to print it from the top down, highest row number first.

    4. Why do we have sep='' and end='' when we print a row of the table?  

      Answer Feedback:
      ANSWER:

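      To avoid extra horizontal and vertical space. Setting end='' stops print() from starting a new line after every character, so each row stays on one line. Setting sep='' suppresses the space print() puts between multiple arguments (with a single argument, as here, it has no visible effect).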
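
    The four answers above can be checked against a small, self-contained version of the plotting code. This is only a sketch: the function name plot_rows is hypothetical, it takes the top score as a parameter so that a 6-by-6 plot can be tested, and it builds each row with the string method join() instead of printing one character at a time.

```python
from collections import namedtuple

Score = namedtuple('Score', 'p1 p2')

def plot_rows(scorelist: 'list of Score', topscore: int) -> 'list of str':
    ''' Return the body of the scatter plot as a list of strings, top row first
    '''
    # Scores run from 0 through topscore inclusive, so we need topscore+1 cells
    table = [ ]
    for row in range(topscore+1):
        table.append([' '] * (topscore+1))
    # First index is the row (p2, the y-axis); second is the column (p1, the x-axis)
    for s in scorelist:
        table[s.p2][s.p1] = '*'
    # The origin belongs at the lower left, so the highest row comes first
    result = [ ]
    for row in range(topscore, -1, -1):
        result.append(''.join(table[row]))
    return result

scorelist = [Score(p1=0, p2=0),
             Score(p1=1, p2=1),
             Score(p1=1, p2=5),
             Score(p1=4, p2=2),
             Score(p1=5, p2=0)]
for line in plot_rows(scorelist, 5):
    print(line)
```

    With a top score of 5, the printed rows match the body of the sample plot shown in the problem, without the axes and labels.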
  4. Suppose we have a list of names and that some names may occur more than once on the list. For example:

    NL = ['Joe', 'Sam', 'Joe', 'Jill', 'Joe', 'Joe', 'Jill', 'Sam', 'Jane', 'Jane', 'Jane', 'Joe', 'John']

    And suppose that we want to know which name occurs most frequently.


    1. First we can create a dictionary that gives us a collection of each distinct name on the list, along with the number of times it occurs. Fill in the blanks of the following definition, with one identifier, constant, or operator in each blank:


      def tally_names(L: [str]) -> dict:
          ''' Return a dictionary with each unique string in L as the key and
              the number of times that string occurs in L as the value.
          '''
          result = { }
          for s in __________:
              if __________ in __________:
                  __________ [__________] __________ __________
              else:
                  __________ [__________] __________ __________
          return __________

      assert tally_names(NL) == {'Sam': 2, 'Jill': 2, 'Joe': 5, 'Jane': 3, 'John': 1}
      Answer Feedback:
      def tally_names(L: [str]) -> dict:
          ''' Return a dictionary with each unique string in L as the key and
              the number of times that string occurs in L as the value.
          '''
          result = { }
          for s in L:
              if s in result:
                  result[s] += 1
              else:
                  result[s] = 1
          return result

      assert tally_names(NL) == {'Sam': 2, 'Jill': 2, 'Joe': 5, 'Jane': 3, 'John': 1}

    2. We want to find the most frequently occurring name, but we can't sort the dictionary itself because dictionaries, as we know, are inherently unsorted (they have no sorted order because of how they're built, using a "hash table," the details of which are a topic for ICS 33). But we can use a function like the one below to convert the dictionary to a list of key-value pairs (where each pair is a two-item list, [key, value]).


      def dict_to_list(d: dict) -> 'list of [key, value] pairs':
          ''' Convert dictionary (with key/value entries) to a list of [key, value] pairs
          '''
          result = [ ]
          for key in d:
              result.append([key, d[key]])
          return result


      What does the following statement print, using the definition of NL above? (Hint: Look at the assertion for the previous part.)


      print(dict_to_list(tally_names(NL)))
      Answer Feedback:
      [['Jane', 3], ['Jill', 2], ['John', 1], ['Joe', 5], ['Sam', 2]]

      (The order of the pairs doesn't matter for this problem; it's unpredictable because dictionaries are unordered.)


    3. The following sequence of statements prints the most frequently occurring string in the original list NL, along with the number of times it occurs. Complete each statement below to produce this result, supplying one identifier, operator, or constant for each blank. (Hint: The problem contains many clues.)


      def second_item(L: list) -> 'any':
          ''' Return second field (L[1]) of a list, to use with key= in sort() method
          '''
          return __________ [ __________ ]
       
      list_of_string_frequency_pairs = __________ (__________(NL))
      __________ . __________(key=__________, reverse=True)
      most_frequent_pair = __________ [ __________ ]
      print("The string '", __________ [ __________ ], "' occurs ", __________ [ __________ ], ' times.', sep='')
      Answer Feedback:
      def second_item(L: list) -> 'any':
          ''' Return second field (L[1]) of a list, to use with key= in sort() method
          '''
          return L[1]
      
      list_of_string_frequency_pairs = dict_to_list(tally_names(NL))
      list_of_string_frequency_pairs.sort(key=second_item, reverse=True)
      most_frequent_pair = list_of_string_frequency_pairs[0]
      print("The string '", most_frequent_pair[0], "' occurs ", most_frequent_pair[1], ' times.', sep='')

    4. What data structure that we've learned just recently could we use (instead of the two-item key-value list) to represent each word with its frequency? (One word.)

      Answer Feedback:
      A tuple (unnamed). Instead of ['John', 1], we could use ('John', 1). There's no major reason to prefer tuples over lists here, but technically, tuples are immutable, which lets Python store them more simply than it can store (mutable) lists.
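
    The parts of this problem can be put together with tuples in place of two-item lists. The sketch below is one possible arrangement, not the quiz's official answer: it uses the dictionary method items(), which yields (key, value) tuples, and the helper name most_frequent is hypothetical. When two names tie for the highest count, this code gives no guidance about which one should win, so the check uses a list with a single clear winner.

```python
def tally_names(L: [str]) -> dict:
    ''' Return a dictionary with each unique string in L as the key and
        the number of times that string occurs in L as the value.
    '''
    result = { }
    for s in L:
        if s in result:
            result[s] += 1
        else:
            result[s] = 1
    return result

def second_item(pair: tuple) -> 'any':
    ''' Return second field (pair[1]) of a pair, to use with key= in sort() method
    '''
    return pair[1]

def most_frequent(L: [str]) -> tuple:
    ''' Return a (string, count) tuple for the most frequent string in non-empty L
    '''
    pairs = list(tally_names(L).items())       # each item is a (key, value) tuple
    pairs.sort(key=second_item, reverse=True)  # highest count first
    return pairs[0]

NL = ['Joe', 'Sam', 'Joe', 'Jill', 'Joe', 'Joe', 'Jill', 'Sam',
      'Jane', 'Jane', 'Jane', 'Joe', 'John']
name, count = most_frequent(NL)
print("The string '", name, "' occurs ", count, ' times.', sep='')
```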