(Last modified Thu Jun 05 23:53 2008)
A grammar consists of terminals, nonterminals, and rules. Each grammar has a nonterminal that is its start symbol. A grammar defines a language (a set of strings).
Each grammar corresponds to a language, the set of strings it generates or recognizes. A grammar can be used to generate the strings of its language (going from its start symbol to strings of nonterminals), or to parse strings of its language (going from strings of nonterminals to grammatical structures of those strings).
There are several kinds of grammars, among which are:
The example will be the context-free language of propositional logic. Here is its grammar:
| formula | ::= | T | "True" is a formula |
| | | F | "False" is a formula | |
| | | variable | A variable is a formula | |
| | | negation | The negation of a formula is a formula | |
| | | conjunction | The conjunction of two formulae is a formula | |
| | | disjunction | The disjunction of two formulae is a formula | |
| variable | ::= | A | The letter 'A' is a variable |
| | | B | The letter 'B' is a variable | |
| | | C | The letter 'C' is a variable | |
| negation | ::= | ¬formula | Negate a formula by putting ¬ before it |
| conjunction | ::= | formula∧formula | Conjoin two formulae by putting ∧ between them |
| disjunction | ::= | formula∨formula | Disjoin two formulae by putting ∨ between them |
| parenthesization | ::= | (formula) | Parenthesize a formula by putting () around it |
Its start symbol is formula.
The nonterminals of a grammar are the symbols that are the alphabet of strings in the language the grammar defines.
Example: the characters '(', ')', '¬', '∧', '&or', and the letters A through C.
A nonterminal represents a grammatical category in the language.
Example: the symbols variable, negation, conjunction, disjunction, parenthesization, and formula.
A rule shows how to replace a nonterminal and (eventually) produce a string in the language.
Looking at it from the opposite direction, a rule shows how to parse a string in the language and identify its grammatical form.
A (context-free) rule consists of a nonterminal (the left-hand side or LHS), followed by "::=", followed by the string of terminals and nonterminals (the right-hand side or RHS).
Sometimes "→" is used instead of "::=", especially when the rules are being used as productions.
If there are two or more rules for the same LHS, we often list them successively with '|' for the second or later rules instead of "LHS ::=".
Example: Two rules
| formula | ::= | T | |
| | | F | (Note the | for the second rule for the same LHS) |
Using these as productions, we start with the nonterminal formula (the LHS) and replace it first by the first rule's RHS to get "T", and then by the second rule's RHS to get "F". This gives us a finite set { "T", "F" }.
Example: Also consider the rule
| formula | ::= | ¬formula |
This could go on forever, and we've hardly even used rule 2 yet. These three rules give us an infinite set of formulae { "T", "F", "¬T", "¬F", "¬¬T", "¬¬F", ... "¬¬¬¬F", ... , "¬¬¬¬¬¬¬¬¬T", ... }
Figure 1. Abstract syntax tree for ¬A∧B
We can also use a grammar's rules to parse a string and identify its grammatical structure. Using the full set of rules at the top of the page, we can parse "¬A∧B" thus:
We can't go any further because "A" and "B" are terminals (neither of them is not the LHS of any rule). "A" terminates its branch of the parse tree (thus "terminal"), and "B" terminates its branch. The structure is shown graphically in Figure 1 as an abstract syntax tree. The leaves of the tree are the terminals, and the interior nodes are the syntactic categories.
Note that we apply the weaker-binding rule first (conjunction), and the stronger-binding rule later (negation).
Figure 2. Abstract syntax tree for ¬(A∧B)
Similarly, we can parse "¬(A∧B)", a similar string with a different grammatical structure:
We can of course produce ¬(A∧B) by beginning with the start symbol formula and applying the rules as productions:
There are four principal classes of formal grammars: regular, context-free, context-sensitive, and unrestricted. Each is more powerful than the preceding kinds. We usually use "grammar" to mean "context-free grammar".
The rules of a regular grammar are restricted to contain a single nonterminal as the LHS and one of the following on the RHS:
The regular grammars are equivalent to the regular expressions, and there is an algorithm to translate any regular grammar into the equivalent regular expression (and vice versa).
The rules of a context-free grammar are described above. Each rule has a single nonterminal as its LHS and a (possibly empty) string of nonterminals and terminals as its RHS.
The rules of a context-sensitive grammar show how a nonterminal is expanded in a specific context. Each rule has a single nonterminal N surrounded by context α and β as its LHS, so that the LHS is αNβ, and a non-empty string γ of nonterminals and terminals in the same context as its RHS, so that the RHS is αγβ.
Most definitions of CSGs allow a production whose RHS is the empty string so that the context-free languages are a subset of the context-sensitive languages (see below).
Both the LHSs and RHSs of unrestricted grammars may contain any string of nonterminals and terminals.
Figure 3. The Chomsky hierarchy
The Chomsky hierarchy is the containment hierarchy of the four classes of languages defined by these types of grammars. It relates:
Figure 3 shows the containment hierarchy of these sets of languages.
Just about any language you would ordinarily encounter is context-sensitive, and all the known non-context-sensitive languages are specially-constructed ones such as the example given.
Each grammar of the four language classes is equivalent to an automaton, and an algorithm exists to convert a grammar to an automaton (and vice versa):
| Grammar type | Automaton type |
|---|---|
| Regular grammar | Finite state machine |
| Context-free grammar | Pushdown automaton |
| Context-sensitive grammar | Linear bounded automaton |
| Unurestricted grammar | Turing machine |
John E. Hopcroft and Jeffrey D. Ullman. Introduction to automata theory, languages, and computation. Addison-Wesley, 1979. (This is the classic text.)
John C. Martin, Introduction to languages and the theory of computation. McGraw-Hill, 1997. (This book is more recent and more accessible.)