Seventh Homework

Informatics 42 • Winter 2012 • David G. Kay • UC Irvine

Get your work checked and signed off by a classmate, then show it to your TA in lab on Monday, February 27.

Part I

(a) For Labs E and F, we'll use an existing program that simulates visitors to an amusement park. Read over the problem description for the amusement park simulator, noting (i) that it's not essential that you retain every detail and (ii) that your task won't be to build this from scratch, but to enhance it in various ways (though for that, you'll need to become familiar with the existing code that we will supply).

(b) Using tkinter, write a GUI for a window that could be used to set some initial values in the dice-playing program from Lab A. As we discussed in class, your window should have three labeled text fields: one for the player's stake (the amount of money the player starts with), one for the computer's stake, and one for the number of test rolls. You should also have three buttons at the bottom: Cancel, Clear, and Submit or OK. When the user clicks Submit/OK, the entered values should be checked (empty values shouldn't be accepted) and transmitted back to the calling program.

To start out, create one label and one text entry field; then create a Submit button that transmits the entered text back to the calling program (which can just print it out). This much should be pretty easy to do by adapting the code we developed in class. Continue to work on the rest of this as time permits, but don't let it interfere with the rest of this homework or the current lab. Only if you're very ambitious or have a lot of spare time should you actually install your GUI in a copy of your Lab A program.
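As a starting point, the first step (one label, one entry field, one Submit button) might look something like the sketch below. It is not the code developed in class; the names validate_entry and ask_for_stake, the window title, and the layout are all assumptions made for illustration.

```python
def validate_entry(text):
    """Return the text with surrounding spaces removed, or None if it is empty."""
    cleaned = text.strip()
    return cleaned if cleaned else None


def ask_for_stake():
    """Show one labeled entry field and a Submit button; return the entered text.

    Returns None if the user closes the window without submitting.
    """
    # Imported here so that validate_entry can be used without a display.
    import tkinter as tk

    result = {"value": None}   # filled in by on_submit below
    root = tk.Tk()
    root.title("Dice game setup")   # hypothetical title

    tk.Label(root, text="Player's stake:").grid(row=0, column=0, padx=5, pady=5)
    entry = tk.Entry(root)
    entry.grid(row=0, column=1, padx=5, pady=5)

    def on_submit():
        value = validate_entry(entry.get())
        if value is not None:       # empty input is rejected; the window stays open
            result["value"] = value
            root.destroy()          # closing the window ends mainloop()

    tk.Button(root, text="Submit", command=on_submit).grid(row=1, column=1, pady=5)
    root.mainloop()
    return result["value"]
```

The calling program could then do something like print("You entered:", ask_for_stake()). Keeping the emptiness check in its own function means it can be reused unchanged for the other two fields.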

(c) In the California Lottery game SuperLotto, a player would pay $1.00 to pick six different numbers (between 1 and 51) for the next draw. (We're describing the original SuperLotto game, not the current SuperLotto Plus with a separate "Mega" number; the probabilities for SuperLotto Plus are a little more complicated.) Every Wednesday and Saturday, the Lottery draws six numbers. If the player's six numbers match the Lottery's six numbers, the player wins the multi-million-dollar jackpot (or splits it with any other players who also picked the same six winning numbers). If nobody matches the six winning numbers, the jackpot "rolls over" to the next draw; that is, the jackpot amount for the draw with no winners is added to the jackpot for the next draw. If many draws go by with no winners, the jackpot can get very large; it has been over $100,000,000.
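For a sense of scale, the number of possible tickets can be computed directly. This is a minimal sketch, assuming that the order of the six numbers doesn't matter and that every combination is equally likely to be drawn; the variable names are only illustrative.

```python
import math

NUMBERS_IN_POOL = 51   # numbers run from 1 to 51
NUMBERS_PICKED = 6     # a ticket is six different numbers

# Number of distinct six-number tickets, ignoring order.
possible_tickets = math.comb(NUMBERS_IN_POOL, NUMBERS_PICKED)

# Chance that one particular ticket matches all six drawn numbers.
single_ticket_probability = 1 / possible_tickets

print(possible_tickets)            # 18009460
print(single_ticket_probability)
```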

Which of the following statements are supported by the principles of probability? Give a yes or no answer to each, with a few words of explanation.

(c.1) If you pay $2.00 for two different sets of numbers, you are twice as likely to win as if you paid $1.00.

(c.2) If you pay $100.00 for 100 different sets of numbers, you are 100 times as likely to win as if you paid $1.00.

(c.3) If you buy one ticket for every drawing for ten years, your chances of winning are roughly a thousand times greater than if you buy just one ticket.

(c.4) If you decide to play the same six numbers in every draw from now on, you should check the winning numbers in the past to be sure your numbers haven't come up already.

(c.5) If you play for a few months and not a single number you choose is included in the winning numbers, you are a little more likely to win the next draw (because you're "due").

(c.6) If you play for a few months and two or three of your numbers are included in the winning numbers of each draw, you are a little more likely to win the next draw (because you're "on a roll").

(c.7) Since 50% of the ticket revenues go to prizes, in general the expected value of a $1.00 ticket is 50 cents.

(c.8) The expected value of your $1.00 ticket will be higher if you only play when the jackpot is over $20,000,000.

(c.9) Your probability of winning is greater if you pick numbers between 1 and 31 (because many people pick birthdates as their numbers).

(c.10) Your expected value is greater if you pick numbers greater than 31.

(d) [from Patterns of Problem Solving by Moshe F. Rubinstein] In one brief sentence, how is information related to probability? If you know that 5 people out of every 1000 have cancer, and if we have a perfectly accurate test that predicts whether a person has cancer, which of the following gives us more information:

The test indicates that a person has cancer.
The test indicates that a person does not have cancer.

(e) Make a small relevance tree (with five to ten "leaves") on any topic you like; then rank two or three alternatives on the tree you've designed. Be sure to state the criteria for the rating scales you use (e.g., 0: over $500; 1: $300-500; . . .) and pay attention to the "deal-breaking" bottom threshold; using a narrow scale, like 0 to 3, will make this easier.

(f) View the video "Sorting Out Sorting". This may be the greatest piece of computer science pedagogical cinematography ever made (but even so, it's about sorting algorithms, so don't be expecting Harry Potter or Star Wars). Here are a few points that will enhance your enjoyment and understanding of this video:

We don't expect you to learn the specific sorting algorithms described in the video. (You may have the opportunity to learn them in a subsequent course.)
We might ask you about the video's main concepts.
Don't fall asleep or let your attention wane during the boring parts—that's when the funniest lines come.
The music is goofy in places; deal with it. The graphics are fuzzy (it's 1980 technology); deal with that, too.
The best part of the video comes after the credits, so don't quit when the credits come up. The whole thing is 28 minutes long, in three parts on YouTube.

Part II

Take the opportunity to go back and make some enhancements to the Stats program from the third homework, with an eye toward continuing to solidify your Python programming skills. Everyone should be able to do this on his or her own, but if you can't finish it by Monday, it's fine to carry this part over to the following week's homework.

(a) Enhance your Stats program from the third homework to keep track of word (token) frequencies. Along with all the other statistics, it should print out the most frequently occurring token in its input file, along with the number of times that token occurred.

Which of Python's built-in data structures is right for keeping track of the tokens and the frequency of each? This is something everyone should be able to answer; talk with your classmates to make sure you all agree. Then implement it: Once you've processed the text and counted the frequencies, you can go through your structure to find the highest value and the associated token.

[You may feel uncomfortable returning to a program you wrote a few weeks ago and then making modifications to it. Parts of it may no longer seem familiar to you, or you may not remember why you made the decisions you did when you were working on it. It's important to realize that, in a real-world software development context, you might often be in a position where you have to work on something that you haven't seen in a while, or where you have to work on something that you didn't even build in the first place. Unlike undergraduate homework assignments, which you can often just "submit and forget," real-world projects tend to have a long lifespan. If you're finding it difficult to get yourself back into the swing of working on this assignment, take the opportunity to consider what you might have done differently four weeks ago to make this experience better. Are there design choices you might have made differently? Documentation that you might have written? Names that you might have chosen differently?]

(b) But what if you want the 10 most frequent tokens? You need to sort the frequency list. But since your frequency list is a Python dictionary [oops—gave it away!], which is stored as a hash table, you can't sort it in place. So you need to turn it into a list of key-value pairs (e.g., with L = list(d.items())), swap the pairs so the value comes first (either N = []; for i in L: N.append( (i[1],i[0]) ) or with a list comprehension, N = [ (v,k) for (k,v) in L]), sort on the value (S = sorted(N, reverse=True)), and then print the first 10.
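Put together, the steps just described might look like the sketch below. The function names count_tokens and most_frequent and the sample text are made up for illustration; your own Stats program will have its own structure.

```python
def count_tokens(tokens):
    """Build a dictionary mapping each token to the number of times it occurs."""
    freq = {}
    for token in tokens:
        freq[token] = freq.get(token, 0) + 1
    return freq


def most_frequent(freq, how_many=10):
    """Return up to how_many (count, token) pairs, highest counts first."""
    pairs = list(freq.items())                              # [(token, count), ...]
    swapped = [(count, token) for (token, count) in pairs]  # count comes first
    return sorted(swapped, reverse=True)[:how_many]


sample_tokens = "the cat and the dog and the bird".split()
frequencies = count_tokens(sample_tokens)
top = most_frequent(frequencies, 2)
print(top)   # [(3, 'the'), (2, 'and')]
```

Because the pairs are sorted as tuples, tokens with equal counts end up in reverse alphabetical order; that is a side effect of this approach rather than a requirement.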

(c) Now, let's keep track of actual words, not just tokens. Modify your Stats program so that it continues to gather all the statistics about tokens as before, but also gathers a parallel set of statistics about "real words."

What's a "real" word? At http://www.ics.uci.edu/~kay/wordlist.txt is a file of about 380,000 words. For our purposes, a word is real if it's on this list. (We could think about ways of managing the list, allowing the user to add new words and delete questionable ones, but let's just think about it and not do it for now.)

Implementation hints and advice (think before reading; that's how you learn): You'll need to read in the wordlist and store it; then as you process each token in the input text file, you'll look it up on the wordlist to determine whether to include it in the real-word statistics. There is a fine opportunity for code reuse here: If you have a class for a collection of tokens, with methods that produce various statistics, you can create one instance of the class for all the tokens in the input and a second instance that will contain just the real words.
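One way the two-instance idea could be sketched is shown below. Everything here is an assumption for illustration: the class name TokenStats, its methods, the helper functions, the guess that the wordlist file has one word per line, and the choice to compare in lowercase. A tiny in-memory word list stands in for the real file.

```python
class TokenStats:
    """Collects tokens and reports simple statistics (hypothetical class)."""

    def __init__(self):
        self.frequencies = {}

    def add(self, token):
        self.frequencies[token] = self.frequencies.get(token, 0) + 1

    def total(self):
        """Number of tokens added, counting repeats."""
        return sum(self.frequencies.values())

    def distinct(self):
        """Number of different tokens added."""
        return len(self.frequencies)


def load_wordlist(lines):
    """Build a set of lowercase words from an iterable of lines, such as an open file.

    Assumes one word per line. A set makes each lookup fast.
    """
    return {line.strip().lower() for line in lines if line.strip()}


def gather(tokens, wordlist):
    """Return two TokenStats objects: one for all tokens, one for real words only."""
    all_tokens = TokenStats()
    real_words = TokenStats()
    for token in tokens:
        all_tokens.add(token)
        if token.lower() in wordlist:   # case-insensitive lookup (an assumption)
            real_words.add(token)
    return all_tokens, real_words


wordlist = load_wordlist(["apple\n", "banana\n", "cherry\n"])
all_stats, real_stats = gather(["apple", "xyzzy", "Banana", "apple"], wordlist)
print(all_stats.total(), real_stats.total())   # 4 3
```

With the real file, load_wordlist could be given an open file object instead of a list of strings.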

Written by David G. Kay, Winter 2005; modified Winter 2006 and Winter 2012.