ICS 33 Summer 2013
Project #2: Eight Line Poem

Due date and time: Monday, July 22, 11:59pm


Introduction

When I was a young kid, one of my teachers introduced me to a computer for the first time — a Radio Shack TRS-80 Model I. First, I played little math games and messed around with other "state of the art" educational tools from the early 1980s; the state of the art wasn't much then, but it was fun and new.

Then one day, my teacher asked me if I wanted to learn how to write my own programs. I thought it sounded like a great idea. So I picked up a book about a language called BASIC — some of you may have experimented with it before — and typed in a short program that asked a user for a number of hits and a number of at-bats and printed out a batting average (foreshadowing my later interest in baseball, though I didn't know what it meant at the time). I ran the program, tried it out, and I was mesmerized; the computer did exactly what I asked it to, exactly the way I asked it to. I was hooked. And I still am.

BASIC was a good teaching tool for its day: versatile and easy to learn, much like Python is today. For this project, I've designed a considerably limited version of BASIC called Bumpkin, which supports only a small handful of kinds of statements. You'll be building a Bumpkin interpreter, a program written in Python that takes a Bumpkin program as its input and executes the Bumpkin program and shows its output. (This may sound a little mind-bending, but it's not as crazy as it sounds. The Python interpreter you've been using was most likely written in a language other than Python; the most popular one is written in a language called C.)

In building your interpreter, you'll gain experience in a few areas that will stretch your abilities:


The Bumpkin language

I'll discuss the requirements for your interpreter later in the write-up. First, let's talk about the Bumpkin language. Bumpkin is a programming language, though it behaves somewhat differently from Python, so we'll first need to acquaint ourselves with how it works.

A Bumpkin program is a sequence of statements, one per line. Here's an example of a Bumpkin program:

LET A 3
PRINT A
GOSUB 7
PRINT A
PRINT B
GOTO 10
LET A 4
LET B 6
RETURN
PRINT A
.

Each line contains exactly one statement (i.e., there may be no blank lines). Bumpkin assigns a line number to each of the lines, where the first line of the program is numbered 1, the second line is numbered 2, and so on. The last line of the program is a period (.) on a line by itself. Execution of a Buumpkin program always begins at line number 1. There is no predefined limit on the number of lines in a Bumpkin program.

Labels and spacing

Any line of a Bumpkin program can also begin with a label, which is a name that can be used to refer to that line elsewhere in the program without having to rely on its line number. Labels are a single word appearing at the beginning of a line, immediately followed by a colon. Labels must begin with a letter, optionally followed by one or more letters or digits.

        LET A 3
        PRINT A
        GOSUB CHUNK
        PRINT A
        PRINT B
        GOTO FINAL
CHUNK:  LET A 4
        LET B 6
        RETURN
FINAL:  PRINT A
        .

Spacing

One of the features of Python's syntax is that the way you space your program — indention, empty lines, and so on — has an effect on your program's meaning. Bumpkin, in that sense, is different. While Bumpkin programs may not have blank lines in them, the amount and placement of blank space between the words on each line is considered irrelevant. So the following is a legal Bumpkin equivalent to the previous one shown.

            LET        A    3
   PRINT        A
      GOSUB    CHUNK
        PRINT    A
  PRINT   B
     GOTO         FINAL
         CHUNK:  LET A 4
  LET   B                            6
              RETURN
  FINAL:     PRINT    A
          .

Variables

A Bumpkin program can utilize variables to store integer values. Variables have names that begin with a letter optionally followed by a sequence of letters and digits; legal variable names include X, BOO, U2, or THIS1ISTHELAST1.

Variables do not need to have values assigned to them before they are used, and any variable that is used before it is used has the value 0.

The value of a variable can be changed with a LET statement. A LET statement changes the value of one variable, by either assigning it an integer constant or the value of another variable.

You can print the value of a variable to the console by using a PRINT statement. A PRINT statement prints the value of one variable, followed by a newline.

So, consider the following short Bumpkin program:

LET A 3
LET Z -9
PRINT A
PRINT Z
.

Its output would be:

3
-9

Execution of a Bumpkin program

A Bumpkin program is executed one line at a time, beginning at line number 1. Ordinarily, execution proceeds forward, so that line 1 will execute first, followed by line 2, followed by line 3, and so on. Execution continues until either an END statement is reached, or until the "." line that appears on the last line of the program.

Like any programming language, it is possible in Bumpkin to write programs that execute out of sequence, though the mechanisms are a bit more primitive than they are in a language like Python. A GOTO statement causes execution to "jump" immediately to the given number. For example, the statement GOTO 4 jumps execution to line number 4. Here's an example Bumpkin program that uses GOTO:

LET A 1
GOTO 4
LET A 2
PRINT A
.

In this program, line 1 is executed first, setting the variable A's value to 1. Then the GOTO statement will immediately jump execution of the program to line 4, skipping the second LET. Line 4 prints the value of A, which is still 1. So, the output of the program is simply 1.

A GOTO statement may jump either forward or backward, meaning that the following program is a legal Bumpkin program. See if you can figure out what its output would be. (Remember that the value of a variable that hasn't yet been assigned with a LET is 0.)

LET Z 5
GOTO 7
LET C 4
PRINT C
PRINT Z
END
PRINT C
PRINT Z
GOTO 3
.

Alternatively, GOTO statements can specify a label instead of a line number, in which case execution jumps to the line that is marked with that label. A Bumpkin program equivalent to the previous one, but that uses labels instead of line numbers, follows.

        LET Z 5
        GOTO CZ
CCZ:    LET C 4
        PRINT C
        PRINT Z
        END
CZ:     PRINT C
        PRINT Z
        GOTO CCZ
        .

GOTO statements are not permitted to jump beyond the boundaries of the program, to lines before line 1 or lines after the "." that completes the program. If such a GOTO statement is encountered while a program is executed, the interpreter terminates with an error message. Similarly, GOTO statements are not permitted to jump to non-existent labels; this, too, terminates the interpreter with an error message.

Arithmetic operations

Bumpkin provides the typical arithmetic operations that can be performed on variables: addition, subtraction, multiplication, and division. Each operation is provided as a statement that changes the value of the given variable, making it equivalent to operators like +=, -=, etc., in Python. The first operand must be the name of a variable; the second can either be an integer constant or the name of a variable. Here are examples of their use:

LET A 4
ADD A 3
PRINT A
LET B 5
SUB B 3
PRINT B
LET C 6
MULT C B
PRINT C
LET D 7
DIV D 2
PRINT D
.

In the example above, the ADD statement adds 3 to the value of A, storing the result in A. So, printing A will display 7 on the console. The output of the program above is:

7
2
12
3

It is important to note that, since all variables in Bumpkin have integer values, the DIV statement implements integer division, meaning that its result is the floor (or integral part) of the quotient. So, in the example above, 7 / 2 = 3. The second operand must not be zero, meaning that the statement DIV A 0 is illegal. When a Bumpkin program encounters a division by zero, it immediately terminates with an error message.

The IF statement

Bumpkin provides an IF statement, which acts like a conditional GOTO. It compares the value of some variable to some value, and jumps execution of the program to the given line number or label if the comparison is true. The comparison can use one of the typical relational operators: <, <=, >, >=, = (equal to), or <> (not equal to).

LET A 3
LET B 5
IF A < 4 THEN 5
PRINT A
PRINT B
.

In the program above, the variables A and B are given the values 3 and 5, respectively. An IF statement then compares A to 4. Since A is less than 4, execution jumps to line 5. B's value is printed out. So this program's output is simply:

5

The IF statement in Bumpkin is substantially less flexible than its Python equivalent. In an IF statement in Bumpkin, the token IF must be followed by exactly five words. The first must be the name of a variable. The second must be one of the relational operators (<, <=, >, >=, =, or <>). The third must be an integer constant. The fourth must be the word THEN. The fifth must be a line number or a label. They behave in the way you might expect. For example: IF C <> 0 THEN X means "jump to the line labeled X if C is not equal to 0".

Like GOTO statements, IF statements are not permitted to jump beyond the boundaries of the program or to a non-existent label. An attempt to do so should cause the Bumpkin program to terminate with an error message.

Subroutines

There are no methods or functions in Bumpkin, but there is a simplified mechanism called a subroutine. A subroutine is a chunk of Bumpkin code that can be "called" by issuing a GOSUB statement. GOSUB is much like GOTO; it causes execution to jump to a particular line (given a line number or a label). However, GOSUB also causes the Bumpkin program to remember where it jumped from. Subsequently, when a RETURN statement is reached, execution continues at the line following the GOSUB statement that caused the jump. Here's an example:

LET A 1
GOSUB 6
PRINT A
PRINT B
END
LET A 2
LET B 3
RETURN
.

In the program above, line 1 is executed first, setting the value of A to 1. Next, a GOSUB statement is reached. Execution jumps to line 6, but Bumpkin also remembers that when a RETURN statement is reached, execution should jump back to the line following the GOSUB — in this case, line 3. Line 6 is executed next, setting A to 2, then line 7 sets B to 3. Now we reach a RETURN statement, causing execution to jump back to the line number that we're remembering — line 3. Line 3 prints the value of A (which is 2), then line 4 prints the value of B (which is 3). Next, we reach line 5, which is an END statement, so the program ends.

Subroutines can be used very similarly to Python functions or methods, except they do not take parameters or return a value. Consider the following example, which contains a subroutine that prints the values of A, B, and C each time it's called:

           LET A 3
           GOSUB PRINTABC
           LET B 4
           GOSUB PRINTABC
           LET C 5
           GOSUB PRINTABC
           LET A 1
           GOSUB PRINTABC
           END
PRINTABC:  PRINT A
           PRINT B
           PRINT C
           RETURN
           .

Subroutines may call other subroutines, meaning that two or more GOSUB's may be reached before a RETURN is reached. The rules for this are very similar to functions that call other functions in Python; for each GOSUB that is reached, Bumpkin will remember the line to which it should return. When a RETURN is reached, execution will move to the line remembered from the most recent GOSUB. Here's an example:

LET A 1
GOSUB 7
PRINT A
END
LET A 3
RETURN
PRINT A
LET A 2
GOSUB 5
PRINT A
RETURN
.

In this example, execution begins at line 1 by setting the variable A to 1. Next, we jump to line 7 with a GOSUB, but remember that we should jump back to line 3 when we encounter a RETURN. Line 7 prints A (which is 1), then line 8 changes A's value to 2. Now we've reached line 9, which is another GOSUB statement. At this point, execution will jump to line 5, but we'll also need to remember to jump back to the line following this GOSUB — line 10 — when we reach a RETURN. But we also need to remember the line from the previous GOSUB — line 3.

Line 5 sets A to 3, then we encounter our first RETURN statement. We're remembering two lines — line 3 and line 10. But line 10 is the most recently remembered line, so execution jumps to line 10. Line 10 prints A (which is 3). Now we encounter another RETURN statement on line 11. We're remembering the line 3 from the first GOSUB. So execution jumps to line 3, printing A (which is still 3), then ending the program on line 4.

So, the output of this program is:

1
3
3

Like GOTO statements, GOSUB statements are not permitted to jump beyond the boundaries of the program (to lines before line 1 or lines after the "." that completes the program) or to non-existent labels. If such a GOSUB statement is encountered while a program is executed, the interpreter terminates with an error message.

It is also an error for a RETURN statement to be encountered when there has been no previous GOSUB. The Bumpkin program will immediately terminate and print an error message in this case, as well.


Experimenting with Bumpkin

Click the link below to experiment with an implementation of a Bumpkin interpreter that you can run in your browser, so you can get an idea of how the language works.

A word of warning about my interpreter: I wrote it without making a serious attempt at handling syntax problems, so it assumes that the input is a legal Bumpkin program. If you attempt to run an invalid Bumpkin program, you may see a clean error message, but you also may see the interpreter simply crash.

Once you're comfortable with the Bumpkin language, it'll be your turn to implement a Bumpkin interpreter.


Your interpreter

For this project, you'll be building your own Bumpkin interpreter, which is a program that you'll write in Python that is capable of executing a Bumpkin program, generating the correct output according to the specification of the Bumpkin language from the previous sections of this write-up. (Since Python runs on many operating systems, that means, once completed, you'll be able to use your interpreter to run Bumpkin programs on Windows, Mac OS X, Linux, Unix, and several other platforms. Not too shabby!)

When your interpreter begins, it prints a prompt — a single greater-than sign followed by a space — at which a user can enter one of three commands. After each command, the prompt is displayed again and the user can continue entering commands until exiting.

Note that Bumpkin programs run in isolation from one another. If a user runs more than one Bumpkin program, the result of the first program (e.g., its output, the values stored in its variables) will not affect the result of the second (e.g., the values of all variables are zero when the second program begins).


How does an interpreter work?

A typical interpreter will execute a program one statement at a time, keeping track of what we might call the program state as it goes along. In the case of a Python interpreter, you might imagine that there would be quite a bit of work to be done, as Python is a fairly complex language. The interpreter would need to keep track of all of the objects — creating new ones and garbage-collecting old ones as necessary — as well as maintain the "call stack," along with various other tasks required by Python programs. Implementing an efficient, complete Python interpreter is a project that would easily take many programmer-months.

A Bumpkin interpreter is a much simpler program, since Bumpkin is a much simpler programming language. Your interpreter will need to execute a Bumpkin program one statement at a time, updating the program state as necessary, until either an END statement or the "." is reached. (The "." can simply be treated as an END statement, if you'd like.) The program state consists of the following information:

Each statement has a different effect on the program state. For example, a LET statement will cause the value of one of the variables to change, then cause the program counter to be incremented (since, after a LET statement, execution continues on to the next statement), a GOTO statement will cause the program counter to be changed to the line number specified in the statement, and so on.


Reading the program from an input file and representing it in memory

Your program will need to begin executing a Bumpkin program by reading it from a text file and representing it in memory. There are a number of ways to solve this problem. One way is to read the program into memory as a collection of strings, with each of the strings containing one line of the input program. Every time a particular line is executed, it would need to be parsed (to see what kind of statement it was), then executed. As you might imagine, this is a terribly inefficient way to implement an interpreter, since the same statement may need to be parsed over and over again. You are not permitted to use this approach for your interpreter.

A better approach — one that I'm requiring you to use instead — is to read the input program once, parse it once, and represent it as objects, one object per statement. Since statements have commonalities (e.g., printing a trace), inheritance provides a very natural design approach for this problem:

You'll need to write code that can parse the input file and create the appropriate sequence of Statement objects. You might want to store them in a list, though you should note that line numbers in Bumpkin start at 1, not 0, so I suggest storing None as the first element in the list, then storing the actual Statement objects with indices beginning at 1. (An alternative, storing the statements beginning at index 0, will require the error-prone practice of adding or subtracting one when converting between line numbers and list indices, which you might find difficult to get right in every case; giving up an empty list cell is an inexpensive way to prevent mistakes.)

While parsing the program, you'll also need to keep separate track of which line number is associated with each label, because you'll need this information when you execute the program.

You may assume that the input file contains a syntactically legal Bumpkin program. It's acceptable for your program to either print an error message, ignore lines that aren't understood, or even crash in the event that it's given an input file that is not legal Bumpkin. (It's a good thing Python interpreters don't behave this way!) We will only test your interpreter with syntactically legal Bumpkin programs, though the programs may have run-time errors in them. As was discussed above, there are four kinds of run-time errors: division by zero, a RETURN statement without a corresponding GOSUB, a GOTO/GOSUB/IF..THEN to a line outside of the boundaries of the program, or an invalid (i.e., non-existent) label. Your interpreter will need to behave reasonably in these cases, by printing a meaningful error message and ending the Bumpkin program's run. Note that you shouldn't print a Python stack trace; you should instead print an error message that is specific to Bumpkin. (This is for the same reason that Python interpreters show error messages specific to Python, as opposed to error messages that emanate from whatever programming language the Python interpreter was implemented in.)


Designing your interpreter

As the size of a program increases, one of the most difficult obstacles that programmers face is the need to separate their concerns. One of the primary strategies that programmers use to separate their concerns is to break a large program into a set of smaller pieces. The obvious mechanism for breaking up a program in Python is the use of classes and functions, remembering that classes have the role of defining new kinds of objects.

It is especially difficult for novice programmers to keep their concerns separate. The temptation is always to try to think about the complete picture, since this strategy works well for the short programs that you write when you're first starting out. As programs become larger, confusion naturally sets in, as the complete picture can be difficult to keep in your brain all at once. Even moderately small Python programs are typically built out of many interacting parts and encompass a great deal of complexity. My complete Bumpkin interpreter has approximately 15 classes and a handful of functions, spread across several modules. (Yours may have fewer, because I implemented some optional features, but this gives you a rough order of magnitude to think about.) Now, before you freak out, bear in mind that many of them are relatively short. I opted to write more classes with less code in each, so that I could concentrate my efforts on each one largely in the absence of the others. This project will encourage you to begin thinking about your programs the same way, which will give you the ability to write much larger programs than you could before.

The main tasks that your program must perform are:

I suggest breaking up your program in the following way:

It's a good idea to build as many of the underlying pieces as you need to implement a couple of the statements, say LET and PRINT, first. Afterward, add new kinds of statements one or two at a time, making any changes required in the underlying pieces. The goal, as always, is to find a way to stand on stable ground as early and as often as possible.


Unit testing

Along with your Bumpkin interpreter, you will be required to separately submit unit tests, implemented using the unittest module in the Python standard library (as we did in lecture), for the ProgramState class. You can feel free to write unit tests for any other part of your interpreter that you'd like — though there are a couple of things you should be aware of:

Remember that unit tests are separate from the program they test, so you would want to write your unit tests in separate modules, and executing the complete Bumpkin interpreter should not execute the unit tests (nor vice versa).


Bumpkin quick reference

Here is a list of all of the Bumpkin statements (and their different variants) that should be supported by your interpreter, with a brief description of the effect of each.

Statement Description
LET var value Changes the value of the variable var to the given value, which will either be an integer constant or the name of another variable.
PRINT value Prints the given value to the console; the value will be either an integer constant or the name of a variable.
ADD var value Adds the given value to the value of the variable var, where the value will be either an integer constant or the name of another variable.
SUB var value Subtracts the given value to the value of the variable var, where the value will be either an integer constant or the name of another variable.
MULT var value Multiplies the given value to the value of the variable var, where the value will be either an integer constant or the name of another variable.
DIV var value Divides the given value to the value of the variable var, where the value will be either an integer constant or the name of another variable.
GOTO line Jumps execution of the program to the given line (which will be specified as either a line number or a label).
IF value1 op value2 THEN line Compares the value value1 to the value value2 using the relational operator op (=, <>, <, <=, >, >=). value1 and value2 will either be the value of a variable or an integer constant. If the comparison is true, jumps execution of the program to the given line (which will be specified either as a line number or a label). If not, this statement has no effect.
GOSUB line Temporarily jumps to the given line (which will be specified as either a line number or a label). A RETURN statement will cause execution to jump back to the line following the GOSUB.
RETURN Jumps execution of the program back to the line following the most recently-executed GOSUB statement.
END Ends the program immediately.
. Special marker that indicates the end of the program text. Behaves as an END statement when encountered.


Limitations

Third-party libraries — i.e., anything not included in a standard Python 3.3.2 installation — are strictly off-limits in this project. Other than the standard Python library, all of the code should be written solely by you.


Deliverables

Put your names and student ID in a comment at the top of each of your .py files, then submit all of the files to Checkmate. Take a moment to be sure you've submitted all of your files and be sure you submit the right version; we will only be able to accept the files you submit before the deadline, so forgetting to submit one (or submitting the wrong version) can have a significant impact on the score you receive for this project.

Follow this link for a discussion of how to submit your project via Checkmate.

Can I submit after the deadline?

Yes, it is possible, subject to the late work policy for this course, which is described in the section titled Late work at this link.