Informatics 42 • Winter 2008 • David G. Kay • UC Irvine

Fifth Homework

Get your work checked and signed off by a classmate, then show it to your TA in lab by Monday, February 11.

(a) Below is some Java code that implements a finite-state automaton.


public class FSA
{
    private static final int SECRET1 = 35;
    private static final int SECRET2 = 127;
    private static final int SECRET3 = 33;

    private static String[] stateList = { "Init", "GotFirst", "GotSecond", "Success" };

    public static void main(String[] args)
    {
        int number;
        int count = 1;
        String state;

        state = "Init";
        
        while (!state.equals("Success") && count <= 3)
        {
            number = getNext();   // getNext() is not shown here; assume it returns the next input integer
            
            if (state.equals("Init"))
            {
                if (number == SECRET1)
                    state = "GotFirst";
                else
                    state = "Init";
            }
            else if (state.equals("GotFirst"))
            {
                if (number == SECRET2)
                    state = "GotSecond";
                else
                    state = "Init";
            }
            else if (state.equals("GotSecond"))
            {
                if (number == SECRET3)
                    state = "Success";
                else
                    state = "Init";
            }
            
            count++;
        }

        if (state.equals("Success"))
            System.out.println("Input accepted.");
        else
            System.out.println("Input rejected.");
    }
}

(a.1) Draw the state transition diagram that represents the FSA this program implements. The input tokens here are whole integers, not individual characters.

(a.2) Describe in one brief English sentence what this FSA does. Try to think of a simple, real-world, non-computer-related object that this FSA models.

Intermezzo: State transition diagrams are one way to describe FSAs. Another way (which is easier to represent in a computer) is a transition table. A transition table has a row for each state and a column for each input (or each disjoint category of inputs); the value at each position in the table tells you what state to go to when you read a given input in a given state. Below is a transition table for the program above:
             Secret1     Secret2     Secret3     other
Init         GotFirst    Init        Init        Init
GotFirst     Init        GotSecond   Init        Init
GotSecond    Init        Init        Success     Init
Success

This table says just what the program and the state transition diagram say: If you're in Init and you read Secret1, you go into state GotFirst; if you're in state Init and you read anything else, you stay in Init. If you're in state GotFirst and you read Secret2, you go into state GotSecond; if you read anything else in state GotFirst you go to state Init. And finally, if you're in state GotSecond and you read Secret3, you go to state Success; otherwise, you go to Init. In state Success the machine stops, so you don't make any transitions out of that state; it's the accept state.

As we noted, transition tables make FSAs easy to represent in a computer. In fact, we can write a very simple but very general FSA simulator according to the following pseudocode:


initialize TransitionTable;
state ← initial state;

while there are more tokens:
    get a token;
    state ← TransitionTable[state][token];

if state = accept state
    then accept;
    else reject;

Because this code is so simple, it's the preferred way to implement FSAs in programs. The only tricky part is finding a data type that will represent the range of tokens and will at the same time be acceptable as an array index in your programming language. Scheme, for example, handles symbolic names very easily. But in the above Java code, you can't have an array with four columns labeled 35, 127, 33, and 'other'. The cleanest way to deal with this is to have a function that would translate each token (or category of tokens) to its corresponding column in the transition table—effectively a switch statement or sequence of if-statements that map the tokens (or token categories) to the range 0..3.
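As an illustration of the pseudocode and the column-translation function described above, here is a minimal Java sketch of a table-driven version of the same FSA. The class name TableFsa, the method names column and accepts, and the integer state constants are inventions of this sketch, not part of the original program. The sketch also reads tokens until they run out or the machine reaches Success, rather than stopping after three inputs as the program in part (a) does.

```java
public class TableFsa
{
    // Secret values copied from the FSA program in part (a).
    private static final int SECRET1 = 35;
    private static final int SECRET2 = 127;
    private static final int SECRET3 = 33;

    // State numbers double as row indices (these names are assumptions of this sketch).
    private static final int INIT = 0;
    private static final int GOT_FIRST = 1;
    private static final int GOT_SECOND = 2;
    private static final int SUCCESS = 3;

    // Rows are states; columns are Secret1, Secret2, Secret3, other.
    // Success has no row because the machine stops there.
    private static final int[][] TABLE = {
        { GOT_FIRST, INIT,       INIT,    INIT },   // Init
        { INIT,      GOT_SECOND, INIT,    INIT },   // GotFirst
        { INIT,      INIT,       SUCCESS, INIT },   // GotSecond
    };

    // Translate a token to its column in the transition table (0..3).
    public static int column(int token)
    {
        if (token == SECRET1) return 0;
        if (token == SECRET2) return 1;
        if (token == SECRET3) return 2;
        return 3;   // "other"
    }

    // Run the table-driven loop over the tokens and report acceptance.
    public static boolean accepts(int[] tokens)
    {
        int state = INIT;
        for (int token : tokens)
        {
            if (state == SUCCESS)
                break;   // the accept state has no outgoing transitions
            state = TABLE[state][column(token)];
        }
        return state == SUCCESS;
    }

    public static void main(String[] args)
    {
        System.out.println(accepts(new int[] { 35, 127, 33 }));   // prints true
        System.out.println(accepts(new int[] { 35, 33, 127 }));   // prints false
    }
}
```

Notice that the loop body is a single table lookup; all the knowledge about this particular machine lives in TABLE and column, so a different FSA needs only a different table and a different translation function.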

(b) Think about the task of extracting words from a stream of text. In Java, classes like Scanner or StringTokenizer do this for you, but sometimes you need to specify "words" idiosyncratically. You can do this kind of input-parsing task much more easily using state machines than by writing code directly.

(b.1) Draw a state transition diagram that accepts words defined as follows: a sequence of non-separator characters whose end is marked by a separator. Separators are symbols that separate English words—space, comma, semicolon, colon, and so on. Note that the hyphen (-), the apostrophe ('), and the percent sign (%) are not separators: treat "mother-in-law," "don't," and "23%" as single words. The end of the line is a separator, unless the last word of the line ends with a hyphen. That way, if a word like mother-in-law is hyphenated across two lines, it will still count as one word. (We will assume that in our input, only words that are always hyphenated will be hyphenated at the end of a line; that is, you should not expect normally-unhyphenated words to be broken across two lines.) Watch for multiple separators in a row—for example, a comma followed by a space is two separators, but there is no word between them.

You could code up this FSA into a method called getNextWord, and call it to parse a stream of input. Coding this isn't a required part of this assignment, though.

(b.2) Write a transition table for the state machine you drew in part (b.1).

(c) One of Scheme's attractions is that its syntax is very simple. Unlike Java, which has a few dozen different statements, each with its own grammar and punctuation rules, every program or expression in Scheme is just a list of words surrounded by parentheses. This provides a rich variety of expression because a "word" can be (a) any sequence of characters delimited (separated from other words) by white space, or (b) a parenthesized list of words nested within the outside list. The following are all valid Scheme expressions (each is one line long except the last, which starts with the word define):


(Fee fie fo fum)
(+ 3.14159 1776 -45 quantity)
(equal? (+ 2 2) (+ 3 1))
(define square
  (lambda (x)
    (* x x)))

Novice Scheme programmers sometimes worry about keeping all the parentheses balanced, but most Scheme systems have "syntax-based" text editors that automatically keep track of the parentheses, so that any time you type a right parenthesis it automatically flashes the left parenthesis that matches it. That way you can see effortlessly what matches what. (This idea has found its way into some program editors for Java and other languages, where it's also useful.)

Suppose you decide to write a syntax-based editor for Scheme, and as your first task you want to write some code that checks whether the parentheses are balanced in a Scheme expression. Astutely, you start by designing an FSA. To make it truly a finite-state machine, we have to put an upper limit on the depth to which parentheses can be nested; the example below shows the FSA for an upper limit of three-deep nesting. (In the diagram, "other" means an input symbol other than an open or close parenthesis.)

(c.1) After scanning the entire Scheme expression, in what state should your machine be if the parentheses were correctly balanced?

(c.2) This FSA works fine in theory, but for a realistic nesting depth of a few dozen, the diagram would be tediously repetitious. So you decide to simplify things and encapsulate the state information in a simple integer counter. Then you can have a single state on the page, and all the action happens in the transition steps, where you increment the counter for each left parenthesis and decrement it for each right parenthesis. (Having a variable may appear to violate the definition of a finite-state machine, all of whose information is encapsulated in a finite number of states. But since integer variables on computers (as opposed to integers in mathematics) always have a finite upper bound, we're technically safe. If our machine used a stack to keep track of the unbalanced parentheses (which is what our integer counter is modeling), it would no longer be an FSA—it would be a PDA (push-down automaton), which can accept a broader class of languages.)

The modified (augmented) machine appears below.



In the augmented machine, being in the stop state is not enough to know that the Scheme program has balanced parentheses; the value of the counter must be considered as well. What should the counter's value be if the machine accepts the Scheme source (that is, if the parentheses are correctly balanced)? What must have happened for the machine to end up in the error state?

(c.3) Things are never quite as simple as they first seem. Comments in Scheme programs start with a semicolon and extend to the end of the line. Thus, the following is a valid Scheme expression; everything to the right of the semicolon on each line is a comment. Of course the contents of comments are ignored when checking for balanced parentheses.


  (define print-it        ; In this routine we
    (lambda (p)           ; a) accept a parameter,
      (display p)         ; b) display it, and
      (newline)))         ; c) hit carriage return

Draw a new FSA-like machine, similar to the one above, to account for comments correctly; you will have to add more states.

(c.4) And there's one more wrinkle. Literal character strings in Scheme are enclosed in double-quote marks. As in Java, the contents of literal strings are ignored when analyzing the syntax of the program. The following three expressions are valid in Scheme.

(display "Oh; really?")
(list "a)" "b)" "c)" )
(let ((delims ".,;:)(("))) ; This has an extra '(' in quotes

Draw a new FSA-like machine to handle both strings and comments correctly.

(c.5) Write a transition table for the state machine you designed in part (c.4). Note that some of the transitions in some conditions will also increment or decrement the count of parentheses.

(c.6) Test your FSA from part (c.4) thoroughly on paper: devise a thorough test plan and work each test through your FSA.

(d) (Optional) Available on the web is a program called JFLAP, written at Duke University (http://www.jflap.org/). You can download this Java application and use it to build and test your own simple FSAs (as well as do other formal-language activities). Other state machine simulators are available on the web; you can find some of them by using search strings like "state machine applet" or "FSA animation." If you'd like to work on building or enhancing tools like these (especially to allow graphical construction of useful FSAs with regular expressions as the transitions), come talk to me some time.

(e) A grammar is a set of rules that can generate all the strings in a formal language. In the right form, a grammar for a programming language can be used with other software to produce automatically part of a compiler for that language.

Below is a grammar (in Backus-Naur Form, or BNF notation) that describes arithmetic expressions:

<expression>    ::= <real> | <variable> | ( <expression> ) |
                    <expression> <operator> <expression> |
                    ( <variable> = <expression> )
<real>          ::= <positive-real> | - <positive-real>
<positive-real> ::= <integer-part> | <integer-part> . <integer-part>
<integer-part>  ::= <digit> | <digit> <integer-part>
<variable>      ::= <letter>
<digit>         ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<letter>        ::= a | b | c | d | e | f | g | h | i | j | k | l | m |
                    n | o | p | q | r | s | t | u | v | w | x | y | z
<operator>      ::= + | - | * | / | %

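To see how grammar rules can turn into code, here is a minimal Java sketch of a recognizer for just the <real> rule and the rules it depends on. It writes one method per nonterminal. The class name RealRecognizer and all method names are assumptions of this sketch, and it assumes the input string contains no spaces between symbols. It does not attempt the recursive <expression> rule.

```java
public class RealRecognizer
{
    // <digit> ::= 0 | 1 | 2 | ... | 9
    public static boolean isDigit(char c)
    {
        return c >= '0' && c <= '9';
    }

    // <integer-part> ::= <digit> | <digit> <integer-part>   (one or more digits)
    // Returns the index just past the digits that start at pos, or -1 if there are none.
    public static int integerPart(String s, int pos)
    {
        int i = pos;
        while (i < s.length() && isDigit(s.charAt(i)))
            i++;
        return (i > pos) ? i : -1;
    }

    // <positive-real> ::= <integer-part> | <integer-part> . <integer-part>
    // Returns the index just past the longest match starting at pos, or -1 if none.
    public static int positiveReal(String s, int pos)
    {
        int end = integerPart(s, pos);
        if (end == -1)
            return -1;
        if (end < s.length() && s.charAt(end) == '.')
        {
            int fraction = integerPart(s, end + 1);
            if (fraction != -1)
                return fraction;   // digits, a point, then more digits
        }
        return end;                // digits only; a trailing point is left unmatched
    }

    // <real> ::= <positive-real> | - <positive-real>
    // The whole string must be consumed for it to count as a <real>.
    public static boolean isReal(String s)
    {
        int start = s.startsWith("-") ? 1 : 0;
        return positiveReal(s, start) == s.length();
    }

    public static void main(String[] args)
    {
        System.out.println(isReal("-3.14"));   // prints true
        System.out.println(isReal("3."));      // prints false
    }
}
```

Each method mirrors the right-hand side of one rule, which is why a grammar in the right form can be handed to other software that generates this kind of code automatically.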
(e.1) Some of the following expressions can be generated by this grammar; others cannot. Indicate which are the valid expressions. (The easiest way to do this might be to photocopy the page, or print it from the on-line version, and circle the valid expressions.)

3

(e.2) Using the grammar, generate four more expressions that aren't on the above list. Each expression should involve applying at least ten rules. For each expression, show its derivation tree (with <expression> at the root and terminal symbols—i.e., without angle brackets—at the leaves).

(e.3) Give three arithmetic expressions that are syntactically valid in Java but are not generated by this grammar.

(e.4) Modify the grammar to allow multi-letter variable names. This requires changing only one of the existing rules.


Written by David G. Kay, Winter 2005.

FSA exercises written by David G. Kay, Winter 1991 (based on materials from 1990 and earlier). Revised by Joe Hummel, Norman Jacobson, Theresa Millette, Brian Pitterle, Alex Thornton, Rasheed Baqai, Li-Wei (Gary) Chen, and David G. Kay, 1992-1999.

BNF grammar exercise written by David G. Kay, Spring 1999.