Compilers and Interpreters (CS 142A)

 

Office hours are set for Wednesdays 10-11 at 3065 Bren Hall and Fridays 3-4 at ICS 183 (Computer Lab).

You can submit your assignment on EEE as follows:

  1. Compress only your source code files (no sample input and no test harness code) into a single zip file named with your student ID.
  2. Log in to EEE (eee.uci.edu) using your UCI account.
  3. Go to MyEEE and find the "142 COMPILERS&INTPRETER lec A" entry.
  4. Click on DropBox.
  5. Click on the AssignmentSubmission under the appropriate assignment folder (For example: "142A Assignment #1" for Assignment #1).
  6. Click on "Upload Files" on the right side of the page and follow the instructions to upload your zip file.

NOTE: No hard copy of the assignments is needed.

Assignment 1

The description of assignment 1 can be found here. It is due on October 25th. Reading this file is useful (and arguably necessary) for learning how to construct your scanner.

I have provided a sketch of the code in Java that you can use, with comments in the files that might be useful. The files are:

1) FileReader.java: This file implements class FileReader, which should provide methods to read the input file character by character. The characters are passed to the scanner, which groups them into tokens.

2) Scanner.java: This class reads the input file token by token. It does not need to check the grammar of the language; instead, it identifies the tokens and keywords of the language and passes them to the parser, which checks the grammar and builds the necessary data structures.

3) Tester.java: You can download this class to test your FileReader and Scanner classes.

4) sample_input: You can use this file as sample input, but you should not rely on it. Write your own sample inputs and challenge your code to make sure that it catches all possible errors in the input program.
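To make the FileReader/Scanner split more concrete, here is a minimal, hypothetical Java sketch of a scanner loop. It reads from a String instead of going through FileReader, and the keyword and symbol sets (including `<-`) are placeholders rather than the token set of the assignment's language; replace them with the ones from the assignment description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical scanner sketch: turns source text into tagged token strings.
final class TokenScanner {
    // Placeholder token sets; replace them with those of the assignment's language.
    private static final Set<String> KEYWORDS = Set.of("var", "if", "then", "else", "while", "do");
    private static final Set<String> TWO_CHAR = Set.of("<-", "==", "!=", "<=", ">=");
    private static final String ONE_CHAR = "+-*/(){};,<>=";

    private final String text;
    private int pos = 0;

    TokenScanner(String text) { this.text = text; }

    List<String> tokens() {
        List<String> out = new ArrayList<>();
        while (true) {
            // Skip whitespace between tokens.
            while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) pos++;
            if (pos >= text.length()) return out;

            char c = text.charAt(pos);
            int start = pos;
            if (Character.isLetter(c)) {
                // Identifier or keyword: a letter followed by letters/digits.
                while (pos < text.length() && Character.isLetterOrDigit(text.charAt(pos))) pos++;
                String word = text.substring(start, pos);
                out.add((KEYWORDS.contains(word) ? "KEYWORD:" : "IDENT:") + word);
            } else if (Character.isDigit(c)) {
                // Unsigned integer literal.
                while (pos < text.length() && Character.isDigit(text.charAt(pos))) pos++;
                out.add("NUMBER:" + text.substring(start, pos));
            } else if (pos + 1 < text.length() && TWO_CHAR.contains(text.substring(pos, pos + 2))) {
                // Try two-character symbols before single-character ones.
                pos += 2;
                out.add("SYMBOL:" + text.substring(start, pos));
            } else if (ONE_CHAR.indexOf(c) >= 0) {
                pos++;
                out.add("SYMBOL:" + c);
            } else {
                // Report lexical errors together with the offending position.
                throw new IllegalStateException("unexpected character '" + c + "' at offset " + pos);
            }
        }
    }
}
```

For example, `new TokenScanner("var x1 <- 42;").tokens()` yields a keyword, an identifier, the symbol `<-`, a number and `;`, while an unknown character such as `#` raises an error, which is the kind of input your own test files should exercise.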

 

Grading Criteria for Assignment 1

Your code has been tested on 4 input files: input1, input2, input3 and input4. If your scanner code does not compile, you get 20% of the total points. A correct result for input1 yields 70% of the total points, and a correct result for each of the other input files yields 10%. If the result for a test case is incorrect, you lose 50% of the points for that test case.

NOTE: If you have a revision request for your assignment, you can email me and explain the problem, or you can meet me during my office hours.

 

Assignment 2

The description of assignment 2 can be found here. It might be useful to take a look at this file. It is due on November 8th.

1) You have two input files for your parser/interpreter. One is the program source code, which your parser/interpreter is supposed to parse and interpret. The other (the input values file) is the file from which InputNum reads integers. You should write the output to an output file.

2) If you have not been able to finish assignment 1, you can use the provided FileReader.java (use its Next() method) and Scanner.java (use its Next() method) source files for assignment 2.

3) You can also use the versions of FileReader.java and Scanner.java that have a GetSym() method as the interface to the parser.

4) A sketch of Parser.java has been provided for you. You can complete this sketch or, as long as you keep the same interface, implement your own code.

5) Tester.java simply calls your interpreter method and passes it the necessary files.

6) I have provided an erroneous input file so that you can check whether your compiler catches those errors correctly. A corrected version of the input is also provided, so you can check whether your interpreter generates the same output file as this one. You can use this input values file.

 

Grading criteria for Assignment 2

This is the compressed file for the test cases that have been used for grading assignment 2.

The grading breakdown for the 4 test cases is as follows:

Tests 1 and 2:
could not compile even after we looked into the code: 5
no output or completely wrong output, parser error: 15
some errors in the output: 30
perfect: 40

Tests 3 and 4:
could not compile even after we looked into the code: 2
parse error without any information / other error: 5
good error info: 10

So test cases 1 and 2 have 40 points each and test cases 3 and 4 have 10 points each.

 

Assignment 3

Those who have not been able to finish assignment 2 can use this working parser. To the best of my knowledge this parser works; however, I do not guarantee its full correctness, and you are responsible for making sure that it is fully functional. You can use the provided test cases to make sure that your interpreter runs correctly.

NEW: You can find a sample program and its assembly code here.

In this assignment you have to augment your interpreter to build a compiler, i.e., to generate machine code for a target processor. The description of assignment 3 can be found here. The target processor, for which you have to generate the machine code, is called DLX. Your compiler is supposed to provide the DLX processor with the machine code translation of the input high-level code.

DLX.java is a behavioral simulator for the DLX processor that you can use in your project. I have provided a tester file for you, which creates a DLX object. Your parser is called inside your DLX class; for this you have to slightly change your parser's interface as shown in the Parser.java file. You can also use this sample source code and input values file, and here is the output.

You should pass an array of integers from your compiler to the processor, with the instructions stored in the array. Since each instruction is 4 bytes (32 bits), a single integer can represent its machine code; you build that integer by manipulating it at the bit level. Let's say you want to represent AND 5, 2, 0 (Format 2 of the DLX instruction set) using the integer inst_int (the opcode for AND is 9):

inst_int = 9;

inst_int = (inst_int << 26) | (5 << 21) | (2 << 16) | (0);

Of course, you should make sure that the register indices are in the range 0 to 31.
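The bit packing above generalizes to a small helper. The sketch below uses hypothetical class and method names (it is not part of DLX.java): it packs a Format 2 instruction with the same shift amounts as the expression above, rejects register numbers outside 0 to 31, and extracts the fields again, which can help when checking the generated instruction array by hand. Opcodes other than AND = 9 should be taken from the DLX instruction set description.

```java
// Hypothetical helper: packs a DLX Format 2 instruction into one 32-bit int
// using the same layout as the shift expression above:
// opcode in bits 26..31, a in bits 21..25, b in bits 16..20, c in bits 0..4.
final class DlxEncoder {
    static final int AND = 9; // opcode value given in the assignment text

    static int encodeF2(int op, int a, int b, int c) {
        if (op < 0 || op > 63) {
            throw new IllegalArgumentException("opcode out of range: " + op);
        }
        checkRegister(a);
        checkRegister(b);
        checkRegister(c);
        return (op << 26) | (a << 21) | (b << 16) | c;
    }

    // Register indices must fit in 5 bits, i.e. lie in 0..31.
    private static void checkRegister(int r) {
        if (r < 0 || r > 31) {
            throw new IllegalArgumentException("register out of range: " + r);
        }
    }

    // Field extraction (unsigned shifts so opcodes >= 32 decode correctly).
    static int opcode(int inst) { return inst >>> 26; }
    static int regA(int inst)   { return (inst >>> 21) & 0x1F; }
    static int regB(int inst)   { return (inst >>> 16) & 0x1F; }
    static int regC(int inst)   { return inst & 0x1F; }
}
```

`DlxEncoder.encodeF2(DlxEncoder.AND, 5, 2, 0)` produces the same integer as the two-line computation of inst_int above, and decoding it gives back 9, 5, 2 and 0.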

You also need to change your interpreter so that it generates assembly code, as follows:

1) Variables are addressed by their displacement from a reference point. Register R30 holds the reference address, and its content is set inside class DLX. Your variables should be allocated at negative offsets relative to the address stored in R30; for this you need a hash table that holds the displacement of each variable. Whenever you want to perform an operation on a variable, you first load it from memory into a register using the Load/Store instructions of the DLX instruction set, and then generate the instruction that implements the operation.

2) In your parser, the functions that produce a value (except for relation) should now store the value in a specific register and return the number of that register. The following example clarifies this:

Let's assume we want to parse and compile the expression a + b * c:

We know that the sequence of the function calls would be as follows: expression -> term -> factor -> (return to term with value of "a" stored in R20) -> (return to expression with value of "a" stored in R20) -> (consume +) -> term -> factor -> (return to term with value "b" stored in R21) -> (consume *) -> factor -> (return to term with value of "c" stored in R22 and do the multiplication of R21 * R22 and store it in R21 and release R22) -> (return to expression with the value of "b * c" stored in R21 and do the addition R21 + R20 and store it in R21 and release R20) -> return to the caller function with value of "a+b*c" stored in R21

You can keep a global data structure as the pool of available registers. Once you no longer need a register, release it so that it can be reused for other instructions.
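The calling sequence above, together with a register pool and the offset table from item 1, can be sketched as a tiny recursive-descent compiler. This is a hypothetical Java sketch, not the provided Parser.java: it accepts only single-letter variables and the operators +, -, *, / without spaces or parentheses, assigns each declared variable a negative offset from R30 (assuming one 4-byte word per variable, the first at -4), and assumes R1 to R27 are free for temporaries. It always keeps the result in the left operand's register and releases the right one; any consistent choice works.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: expression/term/factor each return the number of the
// register that holds their value.
final class ExprCompiler {
    private final Map<Character, Integer> offsets = new HashMap<>(); // variable -> offset from R30
    private final Deque<Integer> freeRegs = new ArrayDeque<>();      // pool of available registers
    private final List<String> code = new ArrayList<>();             // generated assembly text
    private final String src;                                        // e.g. "a+b*c" (no spaces)
    private int pos = 0;

    ExprCompiler(String src, String declaredVars) {
        this.src = src;
        int offset = -4; // assumption: one 4-byte word per variable, first one at R30 - 4
        for (char v : declaredVars.toCharArray()) {
            offsets.put(v, offset);
            offset -= 4;
        }
        // Assumption: R0 and R28..R31 are reserved, so R1..R27 hold temporaries.
        for (int r = 1; r <= 27; r++) freeRegs.addLast(r);
    }

    int expression() {
        int left = term();
        while (pos < src.length() && (peek() == '+' || peek() == '-')) {
            String op = next() == '+' ? "ADD" : "SUB";
            emit(op, left, term());
        }
        return left;
    }

    int term() {
        int left = factor();
        while (pos < src.length() && (peek() == '*' || peek() == '/')) {
            String op = next() == '*' ? "MUL" : "DIV";
            emit(op, left, factor());
        }
        return left;
    }

    int factor() {
        if (pos >= src.length()) throw new IllegalStateException("unexpected end of input");
        char name = next();
        Integer off = offsets.get(name);
        if (off == null) throw new IllegalStateException("undeclared variable: " + name);
        int r = allocate();
        code.add("LDW R" + r + ", R30, " + off);
        return r;
    }

    // The result goes into the left operand's register; the right one is released.
    private void emit(String op, int left, int right) {
        code.add(op + " R" + left + ", R" + left + ", R" + right);
        release(right);
    }

    private int allocate() {
        Integer r = freeRegs.pollFirst();
        if (r == null) throw new IllegalStateException("out of registers");
        return r;
    }

    private void release(int r) { freeRegs.addFirst(r); }
    private char peek() { return src.charAt(pos); }
    private char next() { return src.charAt(pos++); }
    List<String> code() { return code; }
}
```

For `new ExprCompiler("a+b*c", "abc").expression()` the sketch returns register 1 and emits three LDW instructions (into R1, R2, R3), then `MUL R2, R2, R3` and `ADD R1, R1, R2`.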

The assembly code for this operation is:

LDW R20, R30, offset (a) (offset associated with variable a)

LDW R21, R30, offset (b)

LDW R22, R30, offset (c)

MUL R21, R21, R22

ADD R21, R21, R20

NOTE: The specific registers used here are only an example; you are not restricted to these registers.

Since we have a limited number of registers, you should reuse a register once you no longer need its current content. In the example, the value of R21 * R22 is stored back in R21 because the previous content of R21 is no longer needed.

NOTE: While you can implement the register allocator with any algorithm you want, there is an efficient approach that makes register allocation and release easier to manage. It is called a register stack and will be discussed in the lecture. In this approach you treat your register file as a stack and push or pop values to/from it. Using a register stack, the assembly code would be as follows:

stack pointer = sp = 1

LDW Rsp, R30, offset (a)    // R1

sp++ (not an assembly code but the scheme of the code in your compiler)

LDW Rsp, R30, offset (b)    // R2

sp++

LDW Rsp, R30, offset (c)    // R3

sp++

MUL R(sp - 2), R(sp - 2), R(sp - 1)    // R2, R2, R3

sp--

ADD R(sp - 2), R(sp - 2), R(sp - 1)    // R1, R1, R2

sp--

Advantage: Keeping the values of an expression in registers gives smaller code, because we don't need to push the values to a memory stack and then pop them to execute the operations on them.

Disadvantage: Because the number of registers is limited, we might not be able to compile an expression that needs more intermediate values at once than there are registers available.
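A compiler can drive the register stack with a single compile-time counter. The hypothetical Java sketch below mirrors the listing above: pushVariable emits an LDW into R(sp) and increments sp, and binary combines R(sp - 2) and R(sp - 1) into R(sp - 2) and decrements sp. The upper bound of R27 is an assumption; running past it is exactly the disadvantage just described.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical register-stack code generator: sp is a compile-time counter,
// not a run-time register.
final class RegisterStack {
    private static final int FIRST = 1;  // sp starts at 1, as in the listing
    private static final int LAST = 27;  // highest usable register (assumption)
    private int sp = FIRST;
    private final List<String> code = new ArrayList<>();

    // Push: load a variable into R(sp), then sp++.
    void pushVariable(int offset) {
        if (sp > LAST) throw new IllegalStateException("register stack overflow");
        code.add("LDW R" + sp + ", R30, " + offset);
        sp++;
    }

    // Binary op: R(sp-2) = R(sp-2) op R(sp-1), then sp--.
    void binary(String opcode) {
        if (sp - 2 < FIRST) throw new IllegalStateException("register stack underflow");
        int left = sp - 2, right = sp - 1;
        code.add(opcode + " R" + left + ", R" + left + ", R" + right);
        sp--;
    }

    int top() { return sp - 1; } // register holding the most recent value
    List<String> code() { return code; }
}
```

Pushing a, b and c (offsets -4, -8, -12 here) and then applying MUL and ADD reproduces the listing: the product lands in R2, the final sum in R1, and top() returns 1.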

The other way is to push the values onto a memory stack and pop them into the register file when we want to execute an operation on them; the result is then pushed back onto the memory stack. For the example above, the assembly code would look like this:

LDW R1, R30, offset (a)
PSH R1, R29, 4      // R29 = Stack Pointer

LDW R1, R30, offset (b)
PSH R1, R29, 4

LDW R1, R30, offset (c)
PSH R1, R29, 4

POP R1, R29, -4
POP R2, R29, -4
MUL R1, R1, R2
PSH R1, R29, 4

POP R1, R29, -4
POP R2, R29, -4
ADD R1, R1, R2

PSH R1, R29, 4

Advantage: In this approach two registers are enough to compile expressions of any size.

Disadvantage: The generated assembly code is larger, because every value has to be loaded and stored two or three times.

We advise you to use the memory stack approach (the second approach) for compiling your code, as it works for expressions of any size and never runs out of registers.
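The recommended memory-stack approach can be emitted mechanically. In this hypothetical Java sketch, each variable is loaded into R1 and pushed, and each binary operation pops two values, applies the operation and pushes the result, using PSH/POP on R29 with the same +4/-4 adjustments as the listing above. One deliberate difference: the top of the stack (the right operand) is popped into R2 and the next entry into R1, so that non-commutative operations such as SUB and DIV compute left op right; the listing pops in the other order, which is harmless for MUL and ADD.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical memory-stack code generator: only R1 and R2 are used, and every
// intermediate value lives on the run-time stack addressed by R29.
final class MemoryStackEmitter {
    private final List<String> code = new ArrayList<>();

    // Load a variable into R1 and push it (R29 moves by one 4-byte word).
    void pushVariable(int offset) {
        code.add("LDW R1, R30, " + offset);
        code.add("PSH R1, R29, 4");
    }

    // Pop both operands, apply the operation, push the result back.
    void binary(String opcode) {
        code.add("POP R2, R29, -4"); // top of stack = right operand
        code.add("POP R1, R29, -4"); // next entry   = left operand
        code.add(opcode + " R1, R1, R2");
        code.add("PSH R1, R29, 4");
    }

    List<String> code() { return code; }
}
```

For a + b * c (push a, b, c, then MUL, then ADD) this emits 14 instructions, with the final value of the expression left on top of the memory stack.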

3) As far as I understand, you do not need to handle if and while statements at this point, so you can disregard the rest of this item for now. Handling them requires adding a function to your parser for while statements based on the grammar, and generating branch instructions in your instruction stream for both if statements and while loops.

 

Grading criteria for Assignment 3

This is the compressed file for the test cases used for grading assignment 3.

The grading criteria are as follows:

Serious compilation problems: 10 points for test 1 and 5 points each for tests 2 and 3
Problems with the calculation (error messages from DLX): 15 points for test 1 and 10 points each for tests 2 and 3
Calculation errors / wrong results: 30 points for test 1 and 20 points each for tests 2 and 3
Correct outputs: 40 points for test 1 and 30 points each for tests 2 and 3

If we had to spend a few minutes fixing compilation errors and the solution then worked correctly, 5 points were deducted because the required interface had been violated.

Assignment 4

In this assignment you have to change your compiler to generate code for the Common Intermediate Language (CIL). Details of CIL and its instruction set can be found here; this file also contains an example showing the generated code for a sample input. The Wrapper.dll file mentioned in the PDF can be downloaded here. To call the functions defined in Wrapper.dll, your compiler needs to be written in C#. For this you can convert your Java source code using JLCA.

NEW: You can also use and modify a parser that is provided for you in C#. This zip file contains a working Visual Studio 2005 project; double-click A4.sln to open it. The bin\Debug\tests folder contains a dozen test input files that you can use to test your code. The source files are in the root directory of the zip file. You should insert your code into the ParserSkeleton.cs file.

Please note that the Main function is located in Driver.cs, and that the parser expects the input file to be inside a folder named "tests" under the folder where the assignment 4 executable is located. The extension of the input test file must be "spl". In addition, Wrapper.dll must be copied to the same folder as the executable. In Visual Studio, the executable is built in bin\Debug or bin\Release under the project folder, so copy Wrapper.dll to whichever of these two folders contains your executable (most probably bin\Debug).

To compile and assemble your test cases you can use the following commands:

CMD: a4 1 // To create an asm file from the source program
CMD: ilasm 1.asm // To assemble the asm file into 1.exe (This command is available through the Visual Studio Command Prompt)
CMD: 1 // To run the application

 

Grading Criteria for Assignment 4

These are the test files used for grading assignment 4.

The grading criteria are as follows:

Tests 1, 2 and 3 if not compiled: 5 points

Tests 1, 2 and 3 with wrong output or no output at all: 10 points

Tests 1 and 2 generating some output which is not totally correct: 20 points; test 3: 25 points

Tests 1 and 2 working correctly: 30 points; test 3: 40 points

 

Assignment 5

In this assignment you have to implement a full compiler that supports if statements, while loops, function/procedure calls and arrays. The grammar for this enhanced language, called SimPL, and a description of assignment 5 can be found here.

NEW: You can also use these test cases to run your code.

For handling arrays, CIL provides some features that make it easier to generate code for array operations. Some of these instructions include:

A sample assembly code (and its equivalent C# code) for handling arrays can be seen in this file.

You can find interesting and useful sample assembly codes here.

NEW: You should use this Wrapper.dll file for assignment 5.

Some notes on assignment 5:

1) Array ELEMENTS can be passed as function/procedure arguments, but you cannot pass the identifier of an array, or a sub-array of it, to your functions.

By sub-array I mean an array with the same identifier but fewer indices. For example, if a_array is a 3-dimensional array, you can't pass a_array with 2 or fewer indices to any function or procedure. You also can't assign a value to any sub-array!

2) An array cannot have the same name as a variable or a function. The names must be completely distinct, and your compiler should catch this error.

3) Your compiler must make sure that no return statement appears inside the main function and that no return statement returning a number appears inside a procedure. It must also make sure that every function contains at least one return statement that returns a number.

4) The wrapper methods that you will use for this assignment are:

** public void insertMain(string[] code, // string array of instructions
int localVarNum, // number of local variables
int[] oneDimArrays, // information on array variables
int maxStackNum); // max. stack length (a good number: 32)

NEW: The integer array oneDimArrays holds the total size of each array variable. For example, if your array variables are a[10] and a2[4][20], oneDimArrays holds two entries: oneDimArrays[0] = 10 and oneDimArrays[1] = 80.

NEW: The local variable numbers for arrays start after the scalar variables. If you have 2 arrays and 4 scalars, the arrays are local variables 4 and 5, NO MATTER WHAT THE DECLARATION ORDER IS.


** public void insertProcedure(string procedureName, // name of procedure
bool returnsVal, // true if procedure returns (scalar) value
int argNum, // number of arguments
string[] code, // same as insertMain
int localVarNum, // same as insertMain
int[] oneDimArrays, // same as insertMain
int maxStackNum); // same as insertMain

- This is almost exactly the same as insertMain, except that you also supply the procedure name, whether it returns a value, and the number of arguments ({"Main", false, 0} for main).


** public string getProcedureCall(string procName, returnsVal, argNum);

- Same as in assignment 4. You will use this to call declared methods as well as the predefined ones.


** public void WriteFile(string fileName);

- Same as in assignment 4. Called at the end of Parser.computation().
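The two NEW rules about arrays (total sizes in oneDimArrays, array local numbers after all scalars) can be captured in a small bookkeeping class. This is a hypothetical Java sketch in the style of the earlier skeletons, to be ported to C# for use with the wrapper. It assumes that the entries of oneDimArrays and the array local numbers both follow the order in which arrays are declared, and that array local numbers are read only after all scalars of the function have been declared.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical bookkeeping for the array-related arguments of insertMain/insertProcedure.
final class LocalLayout {
    private int scalarCount = 0;
    private final List<Integer> arraySizes = new ArrayList<>(); // in array declaration order

    void declareScalar() { scalarCount++; }

    // The oneDimArrays entry of an array is the product of its dimension sizes.
    void declareArray(int... dims) {
        int size = 1;
        for (int d : dims) size *= d;
        arraySizes.add(size);
    }

    int[] oneDimArrays() {
        int[] result = new int[arraySizes.size()];
        for (int i = 0; i < result.length; i++) result[i] = arraySizes.get(i);
        return result;
    }

    // Local variable number of the k-th declared array (k = 0, 1, ...):
    // arrays come after ALL scalars, whatever the declaration order.
    int arrayLocalIndex(int k) {
        if (k < 0 || k >= arraySizes.size()) throw new IndexOutOfBoundsException("no array #" + k);
        return scalarCount + k;
    }
}
```

With four scalars interleaved with a[10] and a2[4][20], oneDimArrays() gives {10, 80} and the arrays get local numbers 4 and 5, as in the example above.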

How to handle multi dimensional arrays:

You should represent a multi-dimensional array as a one-dimensional array in memory:

If you have an array int_array[m][n] (the first index ranges from 0 to m-1 and the second from 0 to n-1), the position of an element in the one-dimensional array stored in memory is computed as follows:

x = index in the one-dimensional array in memory

Converting a multi-dimensional index to a one-dimensional index:
int_array[i][j] -------> x = i * n + j


Converting a one-dimensional index to a multi-dimensional index:

x = k ------> i = floor(k / n), j = k - i * n
int_array[i][j]

Arrays with more than two dimensions are handled the same way!
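For any number of dimensions, the same row-major rule can be applied one dimension at a time: multiply the running index by the size of the next dimension and add the next index. The Java sketch below is a hypothetical helper, not part of the provided code. dims lists the dimension sizes in declaration order, so int_array[m][n] corresponds to dims = {m, n}, and flatten then returns i * n + j; unflatten is the inverse conversion. The bounds check is an optional addition.

```java
// Hypothetical row-major helpers for arrays with any number of dimensions.
final class RowMajor {
    // x = ((i0 * d1 + i1) * d2 + i2) ... ; for two dimensions this is i * n + j.
    static int flatten(int[] dims, int[] idx) {
        if (dims.length != idx.length) throw new IllegalArgumentException("wrong number of indices");
        int x = 0;
        for (int k = 0; k < dims.length; k++) {
            if (idx[k] < 0 || idx[k] >= dims[k]) {
                throw new IndexOutOfBoundsException("index " + idx[k] + " in dimension " + k);
            }
            x = x * dims[k] + idx[k];
        }
        return x;
    }

    // Inverse: peel indices off starting from the last dimension.
    // For two dimensions: j = x mod n, i = x / n (integer division).
    // Assumes 0 <= x < product of all dims.
    static int[] unflatten(int[] dims, int x) {
        int[] idx = new int[dims.length];
        for (int k = dims.length - 1; k >= 0; k--) {
            idx[k] = x % dims[k];
            x /= dims[k];
        }
        return idx;
    }
}
```

For example, element [2][3] of a [4][5] array maps to 2 * 5 + 3 = 13, and element [1][2][3] of a [3][5][10] array maps to (1 * 5 + 2) * 10 + 3 = 73.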

Since you can define several arrays, each with several dimensions, one problem is how to keep track of the number of elements in each dimension of each array in your source code. We need this information to map each multi-dimensional array onto a one-dimensional array in memory.

This is a suggestion on how you can do it and you might find better solutions too!

Let's assume that these are the arrays that are defined in your source code:

Array a_array [4] [5];
Array b_array [3] [5] [10];
Array c_array [6] [8] [11] [15] [16];

This is the information that you need to know:

Array a_array has two dimensions: the first dimension has 4 elements and the second has 5. Array b_array has 3 dimensions and ....!

If we had just one array (say a_array), we would need to define an array of size two (the number of dimensions) in our compiler. If we store the sizes starting from the last (innermost) dimension, the value of the first element is 5 and the value of the second one is 4. This array has to be allocated dynamically, because we don't know the number of dimensions of a_array until we scan its declaration.

However, the problem arises when several arrays can be defined in a SimPL source program. For that you can use an ArrayList object (let's call it array_dimensions) whose elements are ArrayLists; this is basically a two-dimensional dynamic array. You also need a hash table (let's call it array_indices) that assigns an index number to the identifier of each new array. This index number is the position of that array in the two-dimensional dynamic array.

So for the array declarations that I mentioned we might have something like this:

HashTable array_indices:

a_array ----> 0 (index in the ArrayList of the ArrayLists)
b_array ----> 1 (index in the ArrayList of the ArrayLists)
c_array ----> 2 (index in the ArrayList of the ArrayLists)

ArrayList array_dimensions:

array_dimensions[0] -----> 5, 4
array_dimensions[1] -----> 10, 5, 3
array_dimensions[2] -----> 16, 15, 11, 8, 6
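The suggested bookkeeping can be written with a HashMap for array_indices and a list of lists for array_dimensions. In this hypothetical Java sketch (generic List/ArrayList instead of an untyped ArrayList), each row stores the dimension sizes innermost-first as in the listing above, and flatIndex uses a row to compute the one-dimensional index for indices given in declaration order, which is equivalent to the i * n + j rule.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of array_indices / array_dimensions.
final class ArrayInfo {
    private final Map<String, Integer> arrayIndices = new HashMap<>();   // name -> row number
    private final List<List<Integer>> arrayDimensions = new ArrayList<>(); // rows, innermost first

    // dims are given in declaration order, e.g. declare("a_array", 4, 5).
    void declare(String name, int... dims) {
        if (arrayIndices.containsKey(name)) {
            throw new IllegalStateException("array already declared: " + name);
        }
        List<Integer> row = new ArrayList<>();
        for (int k = dims.length - 1; k >= 0; k--) row.add(dims[k]); // store innermost first
        arrayIndices.put(name, arrayDimensions.size());
        arrayDimensions.add(row);
    }

    List<Integer> dimensionsOf(String name) {
        Integer i = arrayIndices.get(name);
        if (i == null) throw new IllegalStateException("undeclared array: " + name);
        return arrayDimensions.get(i);
    }

    // One-dimensional index for indices given in declaration order (i0, i1, ...).
    int flatIndex(String name, int... idx) {
        List<Integer> row = dimensionsOf(name);
        if (idx.length != row.size()) throw new IllegalArgumentException("wrong number of indices");
        int x = 0, stride = 1;
        for (int k = 0; k < row.size(); k++) {      // walk from the innermost dimension outward
            int index = idx[idx.length - 1 - k];
            if (index < 0 || index >= row.get(k)) throw new IndexOutOfBoundsException("index " + index);
            x += index * stride;
            stride *= row.get(k);
        }
        return x;
    }
}
```

For the declarations above, dimensionsOf("b_array") is [10, 5, 3], and flatIndex("a_array", 2, 3) gives 2 * 5 + 3 = 13.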

Hope this helps,

 

Assignment 5 Grading Criteria

Here are the test cases that I have used to test assignment 5.

The grading criteria are as follows:

Tests 1, 2, 3 and 4 if not compiled: 5 points each

Tests 1, 2, 3 and 4 with wrong output or no output at all: 10 points each

Tests 1, 2, 3 and 4 generating some output which is not totally correct: 15 points each

Tests 1, 2, 3 and 4 working correctly: 25 points each

 

Should you have further concerns or questions, you can let me know by email.

Good luck