Normal Forms for CFG's

Eliminating Useless Variables
Removing Epsilon
Removing Unit Productions
Chomsky Normal Form

Variables That Derive Nothing

Consider: S -> AB, A -> aA | a, B -> AB
Although A derives all nonempty strings of a's, B derives no terminal strings (can you prove this fact?).
Thus, S derives nothing, and the language is empty.

Testing Whether a Variable Derives Some Terminal String

Basis: If there is a production A -> w, where w has no variables, then A derives a terminal string.
Induction: If there is a production A -> α, where α consists only of terminals and variables known to derive a terminal string, then A derives a terminal string.

Testing (2)

Eventually, we can find no more variables.
An easy induction on the order in which variables are discovered shows that each one truly derives a terminal string.
Conversely, any variable that derives a terminal string will be discovered by this algorithm.

Proof of Converse

The proof is an induction on the height of the least-height parse tree by which a variable A derives a terminal string.
Basis: Height = 1. Tree looks like:
[Figure: parse tree with root A and leaves a1 ... an.]
Then the basis of the algorithm tells us that A will be discovered.

Induction for Converse

Assume IH for parse trees of height < h, and suppose A derives a terminal string via a parse tree of height h:
[Figure: parse tree with root A, children X1 ... Xn, and subtrees yielding w1 ... wn.]
By IH, those Xi's that are variables are discovered.
Thus, A will also be discovered, because it has a right side of terminals and/or discovered variables.

Algorithm to Eliminate Variables That Derive Nothing

Discover all variables that derive terminal strings.
For all other variables, remove all productions in which they appear either on the left or the right.

Example: Eliminate Variables

S -> AB | C, A -> aA | a, B -> bB, C -> c
Basis: A and C are identified because of A -> a and C -> c.
Induction: S is identified because of S -> C.
Nothing else can be identified.
Result: S -> C, A -> aA | a, C -> c
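
The discovery procedure and the elimination step above can be sketched in Python as a fixed-point loop. This is only an illustration under assumptions that are not in the slides: a grammar is a list of (head, body) pairs with tuple bodies, any symbol outside `variables` is a terminal, and the names `generating` and `drop_nongenerating` are hypothetical.

```python
def generating(productions, variables):
    """Return the variables that derive at least one terminal string."""
    found = set()
    changed = True
    while changed:  # repeat until a full pass discovers nothing new
        changed = False
        for head, body in productions:
            if head in found:
                continue
            # Basis and induction in one test: every symbol of the body is
            # either a terminal or a variable that was already discovered.
            if all(sym not in variables or sym in found for sym in body):
                found.add(head)
                changed = True
    return found


def drop_nongenerating(productions, variables):
    """Remove every production that mentions an undiscovered variable."""
    dead = variables - generating(productions, variables)
    return [(head, body) for head, body in productions
            if head not in dead and not any(sym in dead for sym in body)]


# Grammar from the example: S -> AB | C, A -> aA | a, B -> bB, C -> c
variables = {"S", "A", "B", "C"}
productions = [
    ("S", ("A", "B")), ("S", ("C",)),
    ("A", ("a", "A")), ("A", ("a",)),
    ("B", ("b", "B")),
    ("C", ("c",)),
]

found = generating(productions, variables)
cleaned = drop_nongenerating(productions, variables)
print(sorted(found))
for head, body in cleaned:
    print(head, "->", "".join(body))
```

On the example grammar the loop discovers A and C on the first pass and S on the second, and the productions S -> AB and B -> bB are dropped, which agrees with the result on the slide.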

Unreachable Symbols

Another way a terminal or variable deserves to be eliminated is if it cannot appear in any derivation from the start symbol.
Basis: We can reach S (the start symbol).
Induction: If we can reach A, and there is a production A -> α, then we can reach all symbols of α.

Unreachable Symbols (2)

Easy inductions in both directions show that when we can discover no more symbols, then we have all and only the symbols that appear in derivations from S.
Algorithm: Remove from the grammar all symbols not discovered reachable from S and all productions that involve these symbols.
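
The reachability test above is a graph search from the start symbol. The following Python sketch is one possible rendering, not the slides' own code: the (head, body) pair representation and the names `reachable` and `drop_unreachable` are assumptions made for illustration, and the sample grammar is the result of the earlier variable-elimination example.

```python
def reachable(productions, start):
    """Return every symbol (variable or terminal) reachable from start."""
    reached = {start}          # basis: the start symbol is reachable
    worklist = [start]
    while worklist:
        current = worklist.pop()
        for head, body in productions:
            if head != current:
                continue
            # Induction: all symbols of a reachable head's body are reachable.
            for sym in body:
                if sym not in reached:
                    reached.add(sym)
                    worklist.append(sym)
    return reached


def drop_unreachable(productions, start):
    """Keep only productions whose head is reachable.

    If the head is reachable, so is every symbol of the body, so testing
    the head alone is enough.
    """
    live = reachable(productions, start)
    return [(head, body) for head, body in productions if head in live]


# Assumed input: S -> C, A -> aA | a, C -> c, with start symbol S.
productions = [
    ("S", ("C",)),
    ("A", ("a", "A")), ("A", ("a",)),
    ("C", ("c",)),
]
live = reachable(productions, "S")
trimmed = drop_unreachable(productions, "S")
print(sorted(live))
print(trimmed)
```

Here A and the terminal a are never reached from S, so both productions for A are removed.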

Eliminating Useless Symbols

A symbol is useful if it appears in some derivation of some terminal string from the start symbol.
Otherwise, it is useless.
Eliminate all useless symbols by:
(1) Eliminate symbols that derive no terminal string.
(2) Eliminate unreachable symbols.

Example: Useless Symbols (2)

S -> AB, A -> C, C -> c, B -> bB
If we eliminated unreachable symbols first, we would find everything is reachable.
A, C, and c would never get eliminated.
In the correct order, step (1) removes B and the production S -> AB, and then step (2) finds A, C, and c unreachable.

Why It Works

After step (1), every symbol remaining derives some terminal string.
After step (2), the only symbols remaining are all derivable from S.
In addition, they still derive a terminal string, because such a derivation can only involve symbols reachable from S.

Epsilon Productions

We can almost avoid using productions of the form A -> ε (called ε-productions).
The problem is that ε cannot be in the language of any grammar that has no ε-productions.
Theorem: If L is a CFL, then L - {ε} has a CFG with no ε-productions.
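
Before moving on to ε-productions, the order dependence in the two-step removal of useless symbols can be checked mechanically. The sketch below is a hedged illustration: the list-of-pairs grammar representation and all four function names are assumptions, and both fixed points are written compactly as repeated set comprehensions.

```python
def generating(productions, variables):
    """Variables that derive some terminal string (fixed point)."""
    found = set()
    while True:
        new = {h for h, body in productions
               if all(s not in variables or s in found for s in body)}
        if new == found:
            return found
        found = new


def reachable(productions, start):
    """Symbols that appear in some derivation from start (fixed point)."""
    reached = {start}
    while True:
        new = reached | {s for h, body in productions if h in reached
                         for s in body}
        if new == reached:
            return reached
        reached = new


def remove_nongenerating(productions, variables):
    dead = variables - generating(productions, variables)
    return [(h, b) for h, b in productions
            if h not in dead and not dead.intersection(b)]


def remove_unreachable(productions, start):
    live = reachable(productions, start)
    return [(h, b) for h, b in productions if h in live]


# Grammar from the example: S -> AB, A -> C, C -> c, B -> bB
variables = {"S", "A", "B", "C"}
productions = [("S", ("A", "B")), ("A", ("C",)),
               ("C", ("c",)), ("B", ("b", "B"))]

# Step (1) then step (2), as prescribed.
right_order = remove_unreachable(
    remove_nongenerating(productions, variables), "S")
# The reverse order, for comparison.
wrong_order = remove_nongenerating(
    remove_unreachable(productions, "S"), variables)
print(right_order)
print(wrong_order)
```

In the prescribed order nothing survives, since S derives no terminal string. In the reverse order the productions A -> C and C -> c are left behind, which is the failure the example describes.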

Nullable Symbols

To eliminate ε-productions, we first need to discover the nullable variables, that is, variables A such that A =>* ε.
Basis: If there is a production A -> ε, then A is nullable.
Induction: If there is a production A -> α, and all symbols of α are nullable, then A is nullable.

Example: Nullable Symbols

S -> AB, A -> aA | ε, B -> bB | A
Basis: A is nullable because of A -> ε.
Induction: B is nullable because of B -> A.
Then, S is nullable because of S -> AB.
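
The nullable-symbol discovery has the same shape as the earlier fixed-point tests. A minimal Python sketch follows, assuming (these are illustration choices, not part of the slides) that a grammar is a list of (head, body) pairs and that an ε right side is written as the empty tuple; the name `nullable` is hypothetical.

```python
def nullable(productions):
    """Return the set of variables A with A =>* ε."""
    found = set()
    changed = True
    while changed:
        changed = False
        for head, body in productions:
            # An empty body is the basis (A -> ε), because all() over an
            # empty body is True. A body made up entirely of nullable
            # variables is the inductive case. Terminals are never in
            # found, so any body containing a terminal fails the test.
            if head not in found and all(sym in found for sym in body):
                found.add(head)
                changed = True
    return found


# Grammar from the example: S -> AB, A -> aA | ε, B -> bB | A
productions = [
    ("S", ("A", "B")),
    ("A", ("a", "A")), ("A", ()),
    ("B", ("b", "B")), ("B", ("A",)),
]
nulls = nullable(productions)
print(sorted(nulls))
```

The discovery order on this grammar is A, then B, then S, as in the example.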

Proof of Nullable-Symbols Algorithm

The proof that this algorithm finds all and only the nullable variables is very much like the proof that the algorithm for symbols that derive terminal strings works.
Do you see the two directions of the proof?
On what is each induction?

Eliminating ε-Productions

Key idea: turn each production A -> X1...Xn into a family of productions.
For each subset of nullable X's, there is one production with those eliminated from the right side in advance.
Except, if all X's are nullable, do not make a production with ε as the right side.

Example: Eliminating ε-Productions

S -> ABC, A -> aA | ε, B -> bB | ε, C -> ε
A, B, C, and S are all nullable.
New grammar:
S -> ABC | AB | AC | BC | A | B | C
A -> aA | a
B -> bB | b
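
The family-of-productions construction can be sketched with `itertools.product`, which enumerates every keep-or-drop choice for the nullable symbols of a right side. The representation (a list of (head, body) pairs, ε as the empty tuple) and the function names are assumptions for illustration only.

```python
from itertools import product


def nullable(productions):
    """Variables A with A =>* ε, computed as a fixed point."""
    found = set()
    while True:
        new = {h for h, body in productions if all(s in found for s in body)}
        if new == found:
            return found
        found = new


def eliminate_epsilon(productions):
    """Return the set of productions of the ε-free grammar."""
    nulls = nullable(productions)
    result = set()
    for head, body in productions:
        # A nullable symbol may be kept or dropped; any other symbol
        # must be kept.
        choices = [((s,), ()) if s in nulls else ((s,),) for s in body]
        for pick in product(*choices):
            new_body = tuple(s for part in pick for s in part)
            if new_body:  # never emit a production with ε as right side
                result.add((head, new_body))
    return result


# Grammar from the example: S -> ABC, A -> aA | ε, B -> bB | ε, C -> ε
productions = [
    ("S", ("A", "B", "C")),
    ("A", ("a", "A")), ("A", ()),
    ("B", ("b", "B")), ("B", ()),
    ("C", ()),
]
new_grammar = eliminate_epsilon(productions)
for head, body in sorted(new_grammar):
    print(head, "->", "".join(body))
```

The sketch yields the seven S-productions and the two productions each for A and B listed in the example. C ends up with no productions at all, so a later useless-symbol pass would remove it.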

Why it Works

Prove that for all variables A:
(1) If w ≠ ε and A =>*old w, then A =>*new w.
(2) If A =>*new w, then w ≠ ε and A =>*old w.
Then, letting A be the start symbol proves that L(new) = L(old) - {ε}.
(1) is an induction on the number of steps by which A derives w in the old grammar.

Proof of 1: Basis

If the old derivation is one step, then A -> w must be a production.
Since w ≠ ε, this production also appears in the new grammar.
Thus, A =>new w.

Proof of 1: Induction

Let A =>*old w be an n-step derivation, and assume the IH for derivations of less than n steps.
Let the first step be A =>old X1...Xn.
Then w can be broken into w = w1...wn, where Xi =>*old wi, for all i, in fewer than n steps.

Induction: Continued

By the IH, if wi ≠ ε, then Xi =>*new wi.
Also, the new grammar has a production with A on the left, and just those Xi's on the right such that wi ≠ ε.
Note: they all can't be ε, because w ≠ ε.
Follow a use of this production by the derivations Xi =>*new wi to show that A derives w in the new grammar.

Proof of Converse

We also need to show part (2): if w is derived from A in the new grammar, then it is also derived in the old.
Induction on number of steps in the derivation.
We'll leave the proof for reading in the text.

Unit Productions

A unit production is one whose right side consists of exactly one variable.
These productions can be eliminated.
Key idea: If A =>* B by a series of unit productions, and B -> α is a non-unit production, then add production A -> α.
Then, drop all unit productions.

Unit Productions (2)

Find all pairs (A, B) such that A =>* B by a sequence of unit productions only.
Basis: Surely (A, A).
Induction: If we have found (A, B), and B -> C is a unit production, then add (A, C).

Proof That We Find Exactly the Right Pairs

By induction on the order in which pairs (A, B) are found, we can show A =>* B by unit productions.
Conversely, by induction on the number of steps in the derivation by unit productions of A =>* B, we can show that the pair (A, B) is discovered.

Proof That the Unit-Production-Elimination Algorithm Works

Basic idea: there is a leftmost derivation A =>*lm w in the new grammar if and only if there is such a derivation in the old.
A sequence of unit productions and a non-unit production is collapsed into a single production of the new grammar.
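
The pair discovery and the collapsing step can be sketched as follows. The slides give no worked example for unit productions, so the small grammar S -> A | ab, A -> B | a, B -> b is a hypothetical one chosen for illustration, as are the representation (a list of (head, body) pairs) and the names `is_unit`, `unit_pairs`, and `eliminate_units`.

```python
def is_unit(body, variables):
    """A right side is a unit right side if it is exactly one variable."""
    return len(body) == 1 and body[0] in variables


def unit_pairs(productions, variables):
    """All pairs (A, B) with A =>* B using unit productions only."""
    pairs = {(a, a) for a in variables}       # basis: (A, A)
    changed = True
    while changed:
        changed = False
        for a, b in list(pairs):
            for head, body in productions:
                # Induction: (A, B) found and B -> C is a unit production.
                if head == b and is_unit(body, variables):
                    if (a, body[0]) not in pairs:
                        pairs.add((a, body[0]))
                        changed = True
    return pairs


def eliminate_units(productions, variables):
    """For each pair (A, B) and non-unit B -> α, add A -> α."""
    pairs = unit_pairs(productions, variables)
    return {(a, body)
            for a, b in pairs
            for head, body in productions
            if head == b and not is_unit(body, variables)}


# Hypothetical grammar: S -> A | ab, A -> B | a, B -> b
variables = {"S", "A", "B"}
productions = [
    ("S", ("A",)), ("S", ("a", "b")),
    ("A", ("B",)), ("A", ("a",)),
    ("B", ("b",)),
]
pairs = unit_pairs(productions, variables)
new_grammar = eliminate_units(productions, variables)
for head, body in sorted(new_grammar):
    print(head, "->", "".join(body))
```

Because the pair (A, A) is always present, each variable keeps its own non-unit productions, and the unit productions themselves are never copied into the result.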

Cleaning Up a Grammar

Theorem: If L is a CFL, then there is a CFG for L - {ε} that has:
No useless symbols.
No ε-productions.
No unit productions.
I.e., every right side is either a single terminal or has length at least 2.

Cleaning Up (2)

Proof: Start with a CFG for L.
Perform the following steps in order:
1. Eliminate ε-productions.
2. Eliminate unit productions.
3. Eliminate variables that derive no terminal string.
4. Eliminate variables not reached from the start symbol.

Chomsky Normal Form

A CFG is said to be in Chomsky Normal Form if every production is of one of these two forms:
1. A -> BC (right side is two variables).
2. A -> a (right side is a single terminal).
Theorem: If L is a CFL, then L - {ε} has a CFG in CNF.

Proof of CNF Theorem

Step 1: Clean the grammar, so every production right side is either a single terminal or of length at least 2.
Step 2: For each right side that is not a single terminal, make the right side all variables.
For each terminal a, create a new variable A_a and production A_a -> a.
Replace a by A_a in right sides of length at least 2.

Example: Step 2

Consider production A -> BcDe.
We need variables A_c and A_e, with productions A_c -> c and A_e -> e.
Note: you create at most one variable for each terminal, and use it everywhere it is needed.
Replace A -> BcDe by A -> B A_c D A_e.

CNF Proof: Continued

Step 3: Break right sides longer than 2 into a chain of productions with right sides of two variables.
Example: A -> BCDE is replaced by A -> BF, F -> CG, and G -> DE.
F and G must be used nowhere else.

Example of Step 3: Continued

Recall A -> BCDE is replaced by A -> BF, F -> CG, and G -> DE.
In the new grammar, A => BF => BCG => BCDE.
More importantly: once we choose to replace A by BF, we must continue to BCG and BCDE, because F and G have only one production.

CNF Proof: Concluded

We must prove that Steps 2 and 3 produce new grammars whose languages are the same as the previous grammar.
Proofs are of a familiar type and involve inductions on the lengths of derivations.
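
Steps 2 and 3 of the construction can be sketched in Python as below. The sketch assumes the grammar has already been cleaned (Step 1), represents a grammar as a list of (head, body) pairs, and uses hypothetical naming schemes `A_<terminal>` and `F1, F2, ...` for the new variables, which are assumed not to clash with existing variable names. None of these choices come from the slides.

```python
def to_cnf(productions, variables):
    """Apply Steps 2 and 3 to an already cleaned grammar."""
    term_var = {}  # terminal -> stand-in variable, created at most once

    # Step 2: in right sides of length >= 2, replace each terminal a
    # by its variable A_a.
    step2 = []
    for head, body in productions:
        if len(body) >= 2:
            new_body = []
            for sym in body:
                if sym not in variables:          # sym is a terminal
                    if sym not in term_var:
                        term_var[sym] = f"A_{sym}"
                    sym = term_var[sym]
                new_body.append(sym)
            body = tuple(new_body)
        step2.append((head, body))
    step2 += [(var, (term,)) for term, var in term_var.items()]

    # Step 3: break right sides longer than 2 into a chain. Each link
    # variable is fresh and is used by exactly one production.
    result = []
    fresh = 0
    for head, body in step2:
        while len(body) > 2:
            fresh += 1
            link = f"F{fresh}"
            result.append((head, (body[0], link)))
            head, body = link, body[1:]
        result.append((head, body))
    return result


# The single production from the Step 2 example: A -> BcDe
variables = {"A", "B", "D"}
cnf = to_cnf([("A", ("B", "c", "D", "e"))], variables)
for head, body in cnf:
    print(head, "->", " ".join(body))
```

On this input Step 2 gives A -> B A_c D A_e, and Step 3 then chains it as A -> B F1, F1 -> A_c F2, F2 -> D A_e, alongside A_c -> c and A_e -> e.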

Note (Example: Eliminating ε-Productions): C is now useless. Eliminate its productions.

Note (Cleaning Up (2), on eliminating ε-productions): Must be first. Can create unit productions or useless variables.

CS154 slides, Jeff Ullman, Stanford University, CS Dept.