The Project FAQ


Project 1 (due Friday April 12):

Download the soot framework, get yourself familiar with various command-line options and phases, and implement an intra-procedural analysis that print out all heap load/store statements. Here is an article containing instructions to implement an intra-procedural analysis in soot Ignore the dataflow analysis part--all we need to do in the first project is to implement an intra-procedural transform that iterates over the set of all Jimple statements and print out those that read and write heap locations.


Specifically, each Java method in soot is represented by a SootMethod object. From this object, we retrieve the body of the method by calling getActiveBody(). This will give us a Body object, through which we have access to all information of the Java method, such as local variables and statements. As we are interested in obtaining the statements (i.e., Units), we can use method getUnits() to get a chain of statements the method contains. Write a while loop that iterates over the statements in the chain. In Jimple, each statement is an object of interfact Stmt, which has many different implementing classes. Among these Stmt classes, we are interested in JAssignStmt, which represents assignment statements (e.g., anything of the form a = b). In order to identify statements that access the heap, we need to understand what the left-hand-side (LHS) and RHS operands are for each statement. To do this, we can call methods getLeftOp() and getRightOp() on each statement object. These calls return objects of type Value, which is an interface representing all expressions, locals, constants, and references in a Java program. We are particularly interested in InstanceFieldRef, StaticFieldRef, and ArrayRef, each of which represents a type of heap access. In particular,

If the LHS/RHS of an assignment is an InstanceFieldRef, the assignment is of the form a.f = b(heap write) /b = a.f (heap read);

If the LHS/RHS of an assignment is a StaticFieldRef, the assignment is of the form A.f = b (heap write)/b=A.f (heap read);

If the LHS/RHS of an assignment is an ArrayRef, the assignment is of the form a[i] = b (heap write)/b=a[i] (heap read);


The output of the analysis should have the following format:

Statement a.f = b, heap write;
Statement b = a[i], heap read;


Project 2 (due Sunday May 3)

The goal of the second project is to implement an intraprocedural constant propagation analysis. There are two versions of the analysis. The first one is a MUST analysis that identifies, for each statement of the form x = a (where a can be a complex expression), the constant value that x must have. This analysis is used widely inside compilers. The second analysis is a MAY analysis, which computes, for each statement of the form x = a, all potential constant values that x *may* receive at run time.

We will implement both versions in the project by instantiating the general monotone framework we discussed in the class. You need to first define the lattice, the bottom element (i.e., unknown), the top element (i.e., anything), the transfer functions, and the meet operator. Implement a worklist-based fixedpoint computation algorithm. Make sure your analysis terminates in the presence of loops. Since the analysis is intraprocedural, there is no need to analyze call relationships. However, you do need to perform worst-case approximation when handling call sites.

We consider only simple assignments that manipulate integer values with +,-,*,/ operations.

Note that for the MAY analysis, it is not obvious how to make the analysis converge. One simple solution is to do ``widening`` (as discussed in abstract interpretation). For variables that are defined inside a loop, we can simply assign them a ``top`` lattice element so that the analysis can quickly stabilize.


The output of the analysis should have the following format:

Statement a = b, the must and may sets for a are {...} and {...}
Statement b = c * a, the must and may sets for b are {...} and {...}

Note that the must set must contain one single element and the may set might contain multiple elements.


Project 3 (due Wednesday May 22)

The goal of the third project is to implement a 0-CFA analysis for Java. As discussed in the class, 0-CFA for a functional language computes what function values may reach each variable in the program. In the object-oriented world, 0-CFA identifies, at each call site, what methods may potentially be invoked. Of course, the easiest way to get this information is to simply do a class hierarchy analysis (CHA) that identifies, at a call a.f(), the type of a and all its subtypes, and returns all of the methods defined in these types. A more precise analysis would first compute objects (i.e., allocation sites) that variable a points to and return only the methods in the classes instantiated in these allocation sites.

We will implement this analysis in the project by formulating the analysis as a constraint-based system. To simplify the analysis, we assume a program only has allocation sites of the form A a = new A(), assignment statements of the form a = b, and method calls. This is an interprocedural analysis, which means we do need to consider method calls. For each type of statement, the following constraints will be generated:


(1) A a = new A() => {new A()} pts(a)

(2) a = b => pts(b) pts(a)

(3) r = a.f(b), where f`s definition is

O f(B p) { ...; return v; }


=> pts(a) pts(this) pts(b) pts(p) pts(v) pts(r)


Build a constraint-graph where each node represents a variable and each edge represents a ``subset`` relationship. Develop an iterative algorithm to traverse the graph to find the solution.

Since this is a flow-insensitive analysis (note that all dataflow analyses are *flow-sensitive*), we do not even need a control flow graph. You can simply iterate over the statements (i.e. units) to build the constraint graph.

The output of the analysis should have the following format:

Call site a.f(...), the methods potentially invoked here are <A.f(...)>, <SubA1.f(...)> , <SubA2.f(...)>, etc.
Call site b.g(...), the methods potentially invoked here are <B.g(...)>, etc.


Project 4 (due Sunday June 9)

In this project, we will implement a 1-CFA analysis, based on the 0-CFA analysis we have implemented in Project 3. 1-CFA is a context-sensitive analysis that distinguishes points-to information of each variable under different calling contexts. What we need to do in this project is to create copies for nodes in the constraint graph we built in Project 3. Suppose we have a method m that has five variables, and m is invoked at six different places (i.e., there are six call sites that may have m as a target in the call graph). For each variable in m, we need to create six copies, each representing a calling context. At a call site m(a) //c1, we add an edge from a to a copy of the first formal parameter node in m representing c1. Because variables for different contexts are distinguished, the information passed into the method at one call site can only flow back to the same call site, and thus, we have much more precise points-to information computed from 1-CFA. The output of the analysis should have the same format as 0-CFA.


Information about the soot framework

1.     Soot is a Java compiler infrastructure that takes bytecode as input (i.e., .class files) and generates SSA-based intermediate representation (i.e., Jimple). You can implement your analysis on top of the Jimple representation. Soot also provides an option to do program transformation and bytecode rewriting.


2.     Where can I download it? ( Here you can find various soot tutorials.


3.     Soot API.


4.     The recent release of soot supports the use of option ``-pp`` to save the effort of explicitly adding all library classes into the soot classpath.