ICS 142 Winter 2004
Assignment #4

Due date and time: Thursday, March 4, 11:59pm


Introduction

As we discussed at the outset of this course -- and as you had no doubt heard or read before -- there are two major types of language processors: compilers and interpreters. Compilers translate a program in one language into an equivalent program in another, often lower-level, language. Interpreters execute a program and generate its output, in a sense translating the program from the source to the target language as it executes. Though their overall aims are different, they share a great deal of functionality. Both compilers and interpreters must read and understand the structure and meaning of the source program. Both must be able to determine whether the program is syntactically or semantically erroneous, and must report errors in some usable fashion. (It should be noted that an interpreter may do this work before executing the program, or may do some or all of this work piecemeal as the program is executing.)

For a language like Monkie2004, in which the majority of semantic rules can be checked statically (i.e. before the program executes), either kind of language processor is best off starting with the phases that we've built in the first three assignments: scanning the input program and breaking it into tokens, parsing these tokens and determining the syntactic structure of the program, building an intermediate representation of the program (such as an abstract syntax tree), and performing static semantic checking on it. When these phases have been completed, it is clear that the program is syntactically and semantically correct -- at least with respect to semantic rules that can be checked statically -- and there is a convenient tree-based representation of the program ready to be operated upon.

From this point, however, the way we should proceed is radically different, depending on whether we intend to build a compiler or an interpreter. If we're building a compiler, we may proceed by writing a module that takes the abstract syntax tree and generates a "flat" intermediate representation for it, one that is closer to the machine code that we will eventually want to emit. We'll then want our compiler to perform various optimizations on this intermediate code to reduce, if possible, the amount of time and/or memory that the program will consume when executed. Next, we'll need to map this intermediate code to machine code, which will require us to select the appropriate sequence of machine instructions for each intermediate code instruction, and to decide how we'll use registers and/or cache memory to reduce the number of accesses to main memory (or, worse yet, virtual memory stored on disk!). Finally, our compiler, using all of the knowledge about the source program that it has gained during these phases, will emit target code.

On the other hand, an interpreter is not concerned with rewriting the input program; it's concerned with executing it and determining its output. Given an intermediate representation of the program, such as an abstract syntax tree, an interpreter operates by traversing it and evaluating the meaning of its nodes on the fly. For example, for some expression tree rooted with an addition operation, an interpreter can evaluate it by first evaluating the left subtree and determining its value, then evaluating the right subtree and determining its value, and finally adding these two values together to yield the final result of the addition. Since the interpreter is viewing the program at, essentially, the source level, it needs to maintain a symbol table, adding variables to it as their declarations are reached, then removing them when they fall out of scope. Some mechanism also needs to be included to support calls to subprograms. This may involve either the creation and maintenance of an explicit run-time stack (complete with activation records), or a simpler approach built on subprogram calls in the interpreter's implementation language.
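
To make the addition example concrete, here is a minimal sketch in Java of evaluating an expression tree on the fly. The names Expr, IntLiteral, Add, and AddDemo are hypothetical, chosen only for illustration; they are not taken from the starting point, and your own AST classes will almost certainly look different.

```java
// Hypothetical AST node types, for illustration only.
interface Expr {
    int evaluate();
}

// A leaf node: evaluating it simply yields its stored value.
final class IntLiteral implements Expr {
    private final int value;

    IntLiteral(int value) {
        this.value = value;
    }

    public int evaluate() {
        return value;
    }
}

// An interior node rooted with an addition operation.
final class Add implements Expr {
    private final Expr left;
    private final Expr right;

    Add(Expr left, Expr right) {
        this.left = left;
        this.right = right;
    }

    public int evaluate() {
        int leftValue = left.evaluate();    // evaluate the left subtree first
        int rightValue = right.evaluate();  // then the right subtree
        return leftValue + rightValue;      // finally, add the two values
    }
}

class AddDemo {
    public static void main(String[] args) {
        // The tree for 2 + (3 + 4).
        Expr tree = new Add(new IntLiteral(2),
                            new Add(new IntLiteral(3), new IntLiteral(4)));
        System.out.println(tree.evaluate()); // prints 9
    }
}
```

In a real interpreter, evaluate( ) would probably also need access to the symbol table and would return values of more than one type; this sketch leaves both out to keep the traversal itself visible.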

So, what we should do next depends on what kind of language processor we intend to build. This assignment will ask you to build upon the work you did in the previous assignment, extending your program to be a complete Monkie2004 interpreter. When you're done, you'll be able to execute Monkie2004 programs and view their output. Future assignments will explore some aspects of the remaining tasks performed by a compiler (though it should be pointed out here that we will not be building a complete Monkie2004 compiler this quarter).


Changes to the Monkie2004 language for this assignment

No changes have been made to the syntax or static semantic rules of the language; they remain as they were in the previous assignment.

There is one change to the apparent intent of the language, though it involves a rule that has never formally been specified: the meaning of the ref keyword. It was originally intended that ref would be used to signify that a formal parameter was to be passed using pass-by-reference semantics. For this assignment, the ref keyword may still appear in a parameter list as before, but, as a simplification, it will not have any meaning. All parameters will be passed by value. (Optionally, you may implement pass-by-reference semantics for ref if you wish, but it is not required, and I won't be offering any extra credit for it. If you want some ideas about how to implement it, feel free to contact me.)
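
If it helps to picture what "passed by value" means inside an interpreter, the following is a small sketch in Java. It assumes, purely for illustration, that each active procedure has a map from variable names to values; the class PassByValueDemo and its methods are hypothetical and do not appear in the starting point.

```java
import java.util.HashMap;
import java.util.Map;

final class PassByValueDemo {
    // Stands in for the body of a callee that assigns to its formal parameter n.
    static void incrementBody(Map<String, Integer> calleeFrame) {
        calleeFrame.put("n", calleeFrame.get("n") + 1);
    }

    // Simulates a call: the argument's current VALUE is copied into a
    // fresh frame for the callee, so the caller's variable is never touched.
    static int callIncrement(Map<String, Integer> callerFrame, String argumentName) {
        Map<String, Integer> calleeFrame = new HashMap<>();
        calleeFrame.put("n", callerFrame.get(argumentName));
        incrementBody(calleeFrame);
        return calleeFrame.get("n"); // returned only so the demo can inspect it
    }

    public static void main(String[] args) {
        Map<String, Integer> callerFrame = new HashMap<>();
        callerFrame.put("x", 5);
        int calleeValue = callIncrement(callerFrame, "x");
        System.out.println("callee's n: " + calleeValue);          // 6
        System.out.println("caller's x: " + callerFrame.get("x")); // still 5
    }
}
```

Pass-by-reference would instead require the callee's parameter to share storage with the caller's variable, which is exactly the complication this assignment lets you skip.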


The dynamic semantic rules of Monkie2004

The following is a list of the dynamic semantic rules for Monkie2004. It is considered a supplement to the static semantic rules presented in the previous assignment, and only applies to programs that have no lexical, syntactic, or static semantic errors. The rules below cover aspects of Monkie2004 programs that only have meaning at run-time. Most of the rules describe the behavior of legal Monkie2004 programs. In a few cases, dynamic semantic errors are described. When a Monkie2004 program encounters a dynamic semantic error, it prints an error message to the output and terminates immediately.

Execution of a Monkie2004 program begins with all global declarations being made. All global variables are assigned their default initial values, as described in the rules below. After all global declarations have been made, a procedure with the following signature is called:

    procedure program()

If no such procedure exists, it is a dynamic semantic error and the program terminates immediately. If it does exist, the execution of the program lasts until program( ) returns.
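
One way to handle this start-up rule is sketched below in Java. Everything here is an assumption made for illustration: the DynamicSemanticError exception, the ProgramStarter class, and the use of Runnable as a stand-in for a procedure's AST are all hypothetical, and the sketch checks only the procedure's name, whereas a real interpreter would also need to confirm that it takes no parameters.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical exception type used to signal a dynamic semantic error.
final class DynamicSemanticError extends RuntimeException {
    DynamicSemanticError(String message) {
        super(message);
    }
}

final class ProgramStarter {
    // 'procedures' maps each global procedure's name to something executable.
    // Runnable stands in for whatever your AST uses to represent a procedure.
    static void run(Map<String, Runnable> procedures) {
        Runnable entryPoint = procedures.get("program");
        if (entryPoint == null) {
            throw new DynamicSemanticError("no procedure program() was declared");
        }
        entryPoint.run(); // execution lasts until program() returns
    }
}

class StartDemo {
    public static void main(String[] args) {
        Map<String, Runnable> procedures = new HashMap<>();
        procedures.put("helper", () -> System.out.println("helper called"));
        try {
            ProgramStarter.run(procedures);
        } catch (DynamicSemanticError e) {
            // Print the error message and stop, as the rule above requires.
            System.out.println("Dynamic semantic error: " + e.getMessage());
        }
    }
}
```

Using an exception rather than terminating on the spot lets a single catch block in the driver print the message and end the run, no matter how deeply nested the failing code is.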

The dynamic semantic rules of Monkie2004 are:


Implementing your interpreter

Assuming that you completed a solution to at least Part 1 of the previous assignment, you have a completed CUP script specifying a parser that builds an AST for the input program. To support this, you also have a set of Java classes that implement the various kinds of AST nodes. Each AST node, at present, contains an analyze( ) method, which is used to perform static semantic checking on it (and its children, as appropriate).

Given an AST and a symbol table, an interpreter is relatively straightforward to implement. (Don't get me wrong; there are plenty of devilish details. But conceptually, it's not difficult to explain.) Much of what I've suggested here is provided as example code in the starting point.
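
As one possible shape for the run-time symbol table, here is a minimal sketch in Java that keeps a stack of scopes, each mapping names to current values. The class RuntimeSymbolTable and its method names are hypothetical; the symbol table in the starting point, and the one you built for static checking, may be organized quite differently.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

final class RuntimeSymbolTable {
    // The innermost scope is at the head of the deque.
    private final Deque<Map<String, Object>> scopes = new ArrayDeque<>();

    void enterScope() {
        scopes.push(new HashMap<>());
    }

    // Leaving a scope discards every variable declared in it.
    void exitScope() {
        scopes.pop();
    }

    // Declares a variable in the innermost scope; enterScope() must have
    // been called at least once beforehand.
    void declare(String name, Object initialValue) {
        scopes.peek().put(name, initialValue);
    }

    Object lookup(String name) {
        return findScope(name).get(name);
    }

    void assign(String name, Object value) {
        findScope(name).put(name, value);
    }

    // Searches from the innermost scope outward, so inner declarations
    // shadow outer ones.
    private Map<String, Object> findScope(String name) {
        for (Map<String, Object> scope : scopes) {
            if (scope.containsKey(name)) {
                return scope;
            }
        }
        // Should not happen for a program that passed static checking.
        throw new IllegalStateException("undeclared variable: " + name);
    }

    public static void main(String[] args) {
        RuntimeSymbolTable table = new RuntimeSymbolTable();
        table.enterScope();
        table.declare("x", 1);
        table.enterScope();
        table.declare("x", 2);                 // shadows the outer x
        System.out.println(table.lookup("x")); // prints 2
        table.exitScope();
        System.out.println(table.lookup("x")); // prints 1
    }
}
```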

One fact will greatly simplify your implementation: you may freely assume that, by the time the interpreter begins executing it, the input program is free of lexical, syntactic, and static semantic errors. So, for example, you may assume that both operands to a concatenation operation are strings, that the type of the expression on the right-hand side of an assignment statement matches the type of the variable on the left-hand side, and so on. This means that, even if you have to do quite a bit of casting, you can at least assume that the casts will succeed.
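
The casting described above might look something like the following sketch in Java. It assumes, hypothetically, that every expression node returns its value as an Object; the names ValueNode, StringLiteralNode, and ConcatNode are invented for this illustration and are not part of the starting point.

```java
// Hypothetical node interface whose evaluate() returns an untyped value.
interface ValueNode {
    Object evaluate();
}

final class StringLiteralNode implements ValueNode {
    private final String text;

    StringLiteralNode(String text) {
        this.text = text;
    }

    public Object evaluate() {
        return text;
    }
}

final class ConcatNode implements ValueNode {
    private final ValueNode left;
    private final ValueNode right;

    ConcatNode(ValueNode left, ValueNode right) {
        this.left = left;
        this.right = right;
    }

    public Object evaluate() {
        // Static checking has already guaranteed that both operands are
        // strings, so these casts need no instanceof test.
        String leftText = (String) left.evaluate();
        String rightText = (String) right.evaluate();
        return leftText + rightText;
    }
}

class ConcatDemo {
    public static void main(String[] args) {
        ValueNode expression = new ConcatNode(new StringLiteralNode("Monkie"),
                                              new StringLiteralNode("2004"));
        System.out.println(expression.evaluate()); // prints Monkie2004
    }
}
```

If a cast like this ever throws a ClassCastException, that points to a bug in the static checker or the interpreter rather than in the Monkie2004 program being run.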


Starting point

Officially, the starting point for this assignment is your solution to the previous assignment. We won't be testing your static semantic checker again for this assignment, meaning that we will only test your solution to this assignment using Monkie2004 programs that are syntactically correct and do not violate the static semantic rules. So, if you weren't able to get the previous assignment done, you will not be doubly penalized, unless your solution to the previous assignment reported errors for legal Monkie2004 programs. However, if you were unable to complete Part 1 of the previous assignment, you will need to get it finished before you can proceed with this one.

While I want you to use your own code as a starting point, I am providing some suggested approaches and example code from my interpreter, which you can use or ignore at your discretion. They are available as a Zip archive.

Be aware that the code I provided may not fit in perfectly with your design, so you may need to make some modifications to it. All of this code is provided as-is (much of it uncommented) to give you some ideas about how to proceed with your solution to this assignment. Since each of you will be starting with a somewhat different solution to the previous assignment, it was not really practical for me to provide code that would surely work with each of your previous designs. But I thought that the files that I've provided would help lead you in a good direction.


Deliverables

Place your completed CUP script and all of the .java files that comprise your program into a Zip archive, then submit that Zip archive. You need not include the .java files created by CUP (Parser.java and Tokens.java), but we won't penalize you if you do. However, you should be aware that we'll be regenerating these ourselves during the grading process, to be sure that they really did come from your CUP script. Please don't include other files, such as .class files, in your Zip archive. Also, don't include any of the example code from the starting point that you didn't end up using.

Follow this link for a discussion of how to submit your assignment. Remember that we do not accept paper submissions of your assignments, nor do we accept them via email under any circumstances.

In order to keep the grading process relatively simple, we require that you design your program so that it can be compiled and executed with the following set of commands:

    cup monkie.cup
    javac *.java
    java Driver inputfile.m

Limitations

The limitations from the previous assignment still apply to this one; you may not make changes to the Monkie2004 grammar that was given to you in the previous assignment, except for the actions you wrote to build your abstract syntax tree (and any modifications you needed to make to them for this assignment) and adding names to the symbols on the right-hand sides of rules when you need to refer to their associated values. Other changes to the CUP script are not permitted.