
Simplifications of Context-Free Grammars

1
A Substitution Rule

Grammar:
S -> aB
A -> aaA
A -> abBc
B -> aA
B -> b

Substitute B -> b

Equivalent grammar:
S -> aB | ab
A -> aaA
A -> abBc | abbc
B -> aA
2
A Substitution Rule

Grammar:
S -> aB | ab
A -> aaA
A -> abBc | abbc
B -> aA

Substitute B -> aA

Equivalent grammar:
S -> aB | ab | aaA
A -> aaA
A -> abBc | abbc | abaAc
3
In general:

A -> xBz
B -> y1

Substitute B -> y1

Equivalent grammar:
A -> xBz | xy1z
4
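A minimal Python sketch of the substitution rule above. The function name substitute and the dictionary representation (each variable maps to a list of right-hand-side strings, uppercase letters are variables) are assumptions made for illustration. The sketch also assumes that the substituted right-hand side does not contain the variable itself.

```python
from itertools import product


def substitute(grammar, var, rhs):
    """Substitute production var -> rhs into every right-hand side, then drop it.

    grammar maps a variable (one uppercase letter) to a list of
    right-hand-side strings. Assumes rhs does not contain var.
    """
    assert var not in rhs, "sketch does not handle self-referencing productions"
    result = {}
    for head, bodies in grammar.items():
        new_bodies = []
        for body in bodies:
            if head == var and body == rhs:
                continue  # the substituted production itself is removed
            # every occurrence of var is either kept or replaced by rhs
            choices = [(sym, rhs) if sym == var else (sym,) for sym in body]
            for combo in product(*choices):
                candidate = "".join(combo)
                if candidate not in new_bodies:
                    new_bodies.append(candidate)
        result[head] = new_bodies
    return result


grammar = {"S": ["aB"], "A": ["aaA", "abBc"], "B": ["aA", "b"]}
step1 = substitute(grammar, "B", "b")   # first substitution from the slides
step2 = substitute(step1, "B", "aA")    # second substitution from the slides
```

After the second call B has no productions left, which matches the second slide, where only S and A productions remain.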
Nullable Variables

λ-production: A -> λ

Nullable variable: A => ... => λ

5
Removing Nullable Variables

Example Grammar:

S -> aMb
M -> aMb
M -> λ

Nullable variable: M

6
Grammar:
S -> aMb
M -> aMb
M -> λ

Substitute M -> λ

Final Grammar:
S -> aMb
S -> ab
M -> aMb
M -> ab

7
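The nullable-variable removal can be sketched in Python as follows. This is a minimal sketch under assumptions: the name remove_nullable and the dictionary representation (variable to list of right-hand-side strings, with the empty string standing for λ) are chosen for illustration. If the start variable is nullable, the resulting grammar no longer produces λ.

```python
from itertools import product


def remove_nullable(grammar):
    """Return a grammar without λ-productions (λ is written as "")."""
    # Fixpoint: a variable is nullable if some right-hand side consists
    # only of nullable variables (the empty right-hand side qualifies).
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head in nullable:
                continue
            if any(all(sym in nullable for sym in body) for body in bodies):
                nullable.add(head)
                changed = True

    result = {}
    for head, bodies in grammar.items():
        new_bodies = []
        for body in bodies:
            # each nullable occurrence is either kept or dropped
            choices = [(sym, "") if sym in nullable else (sym,) for sym in body]
            for combo in product(*choices):
                candidate = "".join(combo)
                if candidate and candidate not in new_bodies:
                    new_bodies.append(candidate)
        result[head] = new_bodies
    return result


example = {"S": ["aMb"], "M": ["aMb", ""]}
final = remove_nullable(example)
```

On the example grammar the result is S -> aMb | ab and M -> aMb | ab, the same final grammar as on the slide.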
Unit-Productions

Unit production: A -> B

(a single variable on both sides)

8
Removing Unit Productions

Observation:

A -> A

is removed immediately

9
Example Grammar:

S -> aA
A -> a
A -> B
B -> A
B -> bb

10
Grammar:
S -> aA
A -> a
A -> B
B -> A
B -> bb

Substitute A -> B

Result:
S -> aA | aB
A -> a
B -> A | B
B -> bb

11
Grammar:
S -> aA | aB
A -> a
B -> A | B
B -> bb

Remove B -> B

Result:
S -> aA | aB
A -> a
B -> A
B -> bb

12
Grammar:
S -> aA | aB
A -> a
B -> A
B -> bb

Substitute B -> A

Result:
S -> aA | aB | aA
A -> a
B -> bb

13
Remove repeated productions

Grammar:
S -> aA | aB | aA
A -> a
B -> bb

Final grammar:
S -> aA | aB
A -> a
B -> bb

14
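For comparison, here is a Python sketch of unit-production removal. Note that it does not follow the substitution steps of the slides: it uses the unit-closure technique instead (for each variable, collect every variable reachable through unit productions and copy their non-unit productions). The name remove_unit_productions and the grammar representation are assumptions for illustration.

```python
def is_unit(body):
    """A right-hand side is a unit production if it is a single variable."""
    return len(body) == 1 and body.isupper()


def remove_unit_productions(grammar):
    """grammar maps a variable to a list of right-hand-side strings."""
    result = {}
    for head in grammar:
        # variables reachable from head using unit productions only
        reachable, stack = {head}, [head]
        while stack:
            current = stack.pop()
            for body in grammar.get(current, []):
                if is_unit(body) and body not in reachable:
                    reachable.add(body)
                    stack.append(body)
        # head inherits every non-unit production of those variables
        bodies = []
        for var in sorted(reachable):
            for body in grammar.get(var, []):
                if not is_unit(body) and body not in bodies:
                    bodies.append(body)
        result[head] = bodies
    return result


example = {"S": ["aA"], "A": ["a", "B"], "B": ["A", "bb"]}
no_units = remove_unit_productions(example)
```

On the example this yields S -> aA, A -> a | bb, B -> a | bb. That grammar differs from the final grammar on the slides but generates the same language {aa, abb}; B is left unreachable from S.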
Useless Productions

S -> aSb
S -> λ
S -> A
A -> aA    (useless production)

Some derivations never terminate...

S => A => aA => aaA => ... => aa...aA => ...
15
Another grammar:

S -> A
A -> aA
A -> λ
B -> bA    (useless production: not reachable from S)

16
In general:

if S => ... => xAy => ... => w

where w ∈ L(G) (w contains only terminals)

then variable A is useful

otherwise, variable A is useless

17
A production A -> x is useless
if any of its variables is useless

S -> aSb
S -> λ
S -> A     (useless production)
A -> aA    (useless production; variable A is useless)
B -> C     (useless production; variable B is useless)
C -> D     (useless production; variable C is useless)


18
Removing Useless Productions

Example Grammar:

S -> aS | A | C
A -> a
B -> aa
C -> aCb

19
First: find all variables that can produce
strings with only terminals

S -> aS | A | C
A -> a
B -> aa
C -> aCb

Round 1: { A, B }
Round 2: { A, B, S }    (because S -> A)

20
Keep only the variables
that produce terminal strings: { A, B, S }
(the remaining variables are useless)

Grammar:
S -> aS | A | C
A -> a
B -> aa
C -> aCb

Remove useless productions

Result:
S -> aS | A
A -> a
B -> aa
21
Second: find all variables
reachable from S

Use a Dependency Graph

S -> aS | A
A -> a
B -> aa

Dependency graph: an edge from S to A; B is not reachable

22
Keep only the variables
reachable from S
(the remaining variables are useless)

Grammar:
S -> aS | A
A -> a
B -> aa

Remove useless productions

Final Grammar:
S -> aS | A
A -> a
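The two steps (first the variables that produce terminal strings, then the variables reachable from S) can be sketched in Python. The function name remove_useless, the start parameter and the dictionary representation are assumptions for illustration; uppercase letters are treated as variables and everything else as terminals.

```python
def remove_useless(grammar, start="S"):
    """grammar maps a variable to a list of right-hand-side strings."""

    def only_known(body, known):
        # every symbol is a terminal or a variable already in `known`
        return all(not sym.isupper() or sym in known for sym in body)

    # Step 1: variables that can produce strings with only terminals,
    # computed in rounds until nothing changes.
    generating = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head not in generating and any(only_known(b, generating) for b in bodies):
                generating.add(head)
                changed = True
    pruned = {
        head: [b for b in bodies if only_known(b, generating)]
        for head, bodies in grammar.items()
        if head in generating
    }

    # Step 2: variables reachable from the start variable
    # (a traversal of the dependency graph).
    reachable, stack = {start}, [start]
    while stack:
        current = stack.pop()
        for body in pruned.get(current, []):
            for sym in body:
                if sym.isupper() and sym not in reachable:
                    reachable.add(sym)
                    stack.append(sym)
    return {head: bodies for head, bodies in pruned.items() if head in reachable}


example = {"S": ["aS", "A", "C"], "A": ["a"], "B": ["aa"], "C": ["aCb"]}
final = remove_useless(example)
```

The order of the two steps matters in this sketch: removing non-generating variables first can make further variables unreachable, so reachability is computed on the pruned grammar.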

23
Removing All

Step 1: Remove Nullable Variables

Step 2: Remove Unit-Productions

Step 3: Remove Useless Variables

24
Normal Forms for Context-Free Grammars

25
Chomsky Normal Form

Each production has the form:

A -> BC    (B and C are variables)
or
A -> a     (a is a terminal)

26
Examples:

Chomsky Normal Form:
S -> AS
S -> a
A -> SA
A -> b

Not Chomsky Normal Form:
S -> AS
S -> AAS
A -> SA
A -> aa

27
Conversion to Chomsky Normal Form

Example:
S -> ABa
A -> aab
B -> Ac

Not Chomsky Normal Form

28
Introduce variables for terminals: Ta, Tb, Tc

Grammar:
S -> ABa
A -> aab
B -> Ac

Result:
S -> ABTa
A -> TaTaTb
B -> ATc
Ta -> a
Tb -> b
Tc -> c
29
Introduce intermediate variable: V1

Grammar:
S -> ABTa
A -> TaTaTb
B -> ATc
Ta -> a
Tb -> b
Tc -> c

Result:
S -> AV1
V1 -> BTa
A -> TaTaTb
B -> ATc
Ta -> a
Tb -> b
Tc -> c
30
Introduce intermediate variable: V2

Grammar:
S -> AV1
V1 -> BTa
A -> TaTaTb
B -> ATc
Ta -> a
Tb -> b
Tc -> c

Result:
S -> AV1
V1 -> BTa
A -> TaV2
V2 -> TaTb
B -> ATc
Ta -> a
Tb -> b
Tc -> c
31
Final grammar in Chomsky Normal Form:
S -> AV1
V1 -> BTa
A -> TaV2
V2 -> TaTb
B -> ATc
Ta -> a
Tb -> b
Tc -> c

Initial grammar:
S -> ABa
A -> aab
B -> Ac
32
In general:

From any context-free grammar
(which doesn’t produce λ)
not in Chomsky Normal Form

we can obtain:
an equivalent grammar
in Chomsky Normal Form

33
The Procedure

First remove:

Nullable variables

Unit productions

34
Then, for every terminal a:

Add production Ta -> a

In productions whose right-hand side has length 2 or more: replace a with Ta

New variable: Ta
35
Replace any production A -> C1C2...Cn (n > 2)

with A -> C1V1
     V1 -> C2V2
     ...
     Vn-2 -> Cn-1Cn

New intermediate variables: V1, V2, ..., Vn-2
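The two conversion steps above (terminal variables Ta, then intermediate variables Vi) can be sketched in Python. This is a sketch under assumptions: the name to_cnf is hypothetical, right-hand sides are tuples of symbol names, names starting with an uppercase letter are variables, the input has no λ-productions and no unit productions, and the generated names Ta and V1, V2, ... do not clash with existing variables. Unlike the slides, it only creates Ta for terminals that actually occur in a long right-hand side.

```python
def to_cnf(grammar):
    """grammar maps a variable name to a list of right-hand-side tuples."""

    def is_terminal(sym):
        return not sym[0].isupper()

    result = {}
    terminal_rules = {}
    counter = 0
    for head, bodies in grammar.items():
        for body in bodies:
            if len(body) == 1:
                # already of the form A -> a
                result.setdefault(head, []).append(body)
                continue
            # replace every terminal a by its variable Ta
            symbols = []
            for sym in body:
                if is_terminal(sym):
                    terminal_rules["T" + sym] = [(sym,)]
                    symbols.append("T" + sym)
                else:
                    symbols.append(sym)
            # break C1 C2 ... Cn into a chain of two-symbol productions
            current = head
            while len(symbols) > 2:
                counter += 1
                fresh = "V" + str(counter)
                result.setdefault(current, []).append((symbols[0], fresh))
                current = fresh
                symbols = symbols[1:]
            result.setdefault(current, []).append(tuple(symbols))
    result.update(terminal_rules)
    return result


example = {"S": [("A", "B", "a")], "A": [("a", "a", "b")], "B": [("A", "c")]}
cnf = to_cnf(example)
```

On the example grammar S -> ABa, A -> aab, B -> Ac this reproduces the final grammar of the worked conversion.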


36
Theorem: For any context-free grammar
(which doesn’t produce λ)
there is an equivalent grammar
in Chomsky Normal Form

37
Observations

• Chomsky normal forms are good
  for parsing and proving theorems

• It is very easy to find the Chomsky normal
  form for any context-free grammar

38
Greibach Normal Form

All productions have the form:

A -> a V1V2...Vk    k ≥ 0

where a is a terminal symbol and V1, ..., Vk are variables

39
Observations

• Greibach normal forms are very good
  for parsing

• It is hard to find the Greibach normal
  form of any context-free grammar

40
Compilers

41
Program:
v = 5;
if (v>5)
  x = 12 + v;
while (x != 3) {
  x = x - 3;
  v = 10;
}
......

Compiler

Machine Code:
Add v,v,0
cmp v,5
jmplt ELSE
THEN:
add x, 12,v
ELSE:
WHILE:
cmp x,3
...
42
Compiler

input (program) -> lexical analyzer -> parser -> output (machine code)
43
A parser knows the grammar
of the programming language

44
Parser

PROGRAM -> STMT_LIST
STMT_LIST -> STMT; STMT_LIST | STMT;
STMT -> EXPR | IF_STMT | WHILE_STMT
      | { STMT_LIST }

EXPR -> EXPR + EXPR | EXPR - EXPR | ID

IF_STMT -> if (EXPR) then STMT
         | if (EXPR) then STMT else STMT
WHILE_STMT -> while (EXPR) do STMT

45
The parser finds the derivation
of a particular input

Input: 10 + 2 * 5

Parser grammar:
E -> E + E
   | E * E
   | INT

Derivation:
E => E + E
  => E + E * E
  => 10 + E * E
  => 10 + 2 * E
  => 10 + 2 * 5

46
derivation:
E => E + E
  => E + E * E
  => 10 + E * E
  => 10 + 2 * E
  => 10 + 2 * 5

derivation tree:
        E
      / | \
     E  +  E
     |   / | \
    10  E  *  E
        |     |
        2     5

47
derivation tree:
        E
      / | \
     E  +  E
     |   / | \
    10  E  *  E
        |     |
        2     5

machine code:
mult a, 2, 5
add b, 10, a
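One way to get from the derivation tree to the machine code above is a post-order walk: emit code for both operands first, then for the operator. The Python sketch below is an illustration under assumptions. The name generate_code, the tuple encoding of the tree (operator, left, right) and the temporaries a, b, c, ... are hypothetical; only the mnemonics add and mult are taken from the slide.

```python
def generate_code(tree):
    """Emit three-address instructions for an expression tree.

    A tree is either an int (a leaf) or a tuple (op, left, right)
    with op being "+" or "*".
    """
    code = []
    names = iter("abcdefghijklmnopqrstuvwxyz")  # temporaries, assumed sufficient
    opcodes = {"+": "add", "*": "mult"}

    def visit(node):
        if isinstance(node, int):
            return str(node)
        op, left, right = node
        left_operand = visit(left)     # code for the left subtree first
        right_operand = visit(right)   # then the right subtree
        dest = next(names)             # then the operator itself
        code.append(f"{opcodes[op]} {dest}, {left_operand}, {right_operand}")
        return dest

    visit(tree)
    return code


# Tree for 10 + 2 * 5, with * nested under + as in the derivation tree
instructions = generate_code(("+", 10, ("*", 2, 5)))
```

For the tree of 10 + 2 * 5 this produces the two instructions shown on the slide, in the same order.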

48
Parsing

49
input string -> Parser (grammar) -> derivation

50
Example:

Parser grammar:
S -> SS
S -> aSb
S -> bSa
S -> λ

input: aabb

derivation: ?

51
Exhaustive Search

S -> SS | aSb | bSa | λ

Find derivation of aabb

Phase 1:
S => SS
S => aSb
S => bSa
S => λ

All possible derivations of length 1
52
Find derivation of aabb

S => SS
S => aSb
S => bSa    (discarded: cannot lead to aabb)
S => λ      (discarded: cannot lead to aabb)

53
S -> SS | aSb | bSa | λ

Find derivation of aabb

Phase 1:
S => SS
S => aSb

Phase 2:
S => SS => SSS
S => SS => aSbS
S => SS => bSaS
S => SS => S
S => aSb => aSSb
S => aSb => aaSbb
S => aSb => abSab
S => aSb => ab
54
S -> SS | aSb | bSa | λ

Find derivation of aabb

Phase 2:
S => SS => SSS
S => SS => aSbS
S => SS => S
S => aSb => aSSb
S => aSb => aaSbb

Phase 3:
S => aSb => aaSbb => aabb
55
Final result of exhaustive search
(top-down parsing)

Parser grammar:
S -> SS
S -> aSb
S -> bSa
S -> λ

input: aabb

derivation:
S => aSb => aaSbb => aabb


56
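The phase-by-phase search can be sketched in Python as a breadth-first search over leftmost derivations. Everything named here is an assumption for illustration: the functions exhaustive_parse and viable, the max_phases cap (needed because this grammar has a λ-production, so the search might otherwise not stop for strings outside the language), and the pruning rule, which drops a sentential form when its leading terminals do not match the input or when it already has too many terminals.

```python
def viable(form, target):
    """Can this sentential form still lead to target?"""
    variable_positions = [i for i, sym in enumerate(form) if sym.isupper()]
    if not variable_positions:
        return False  # a terminal string other than target
    terminal_count = len(form) - len(variable_positions)
    prefix = form[:variable_positions[0]]
    return terminal_count <= len(target) and target.startswith(prefix)


def exhaustive_parse(grammar, target, start="S", max_phases=10):
    """Return a leftmost derivation of target as a list of forms, or None."""
    frontier = [[start]]  # derivations found in the previous phase
    seen = {start}
    for _ in range(max_phases):
        next_frontier = []
        for derivation in frontier:
            form = derivation[-1]
            # expand the leftmost variable with every production
            pos = next(i for i, sym in enumerate(form) if sym.isupper())
            for body in grammar.get(form[pos], []):
                new_form = form[:pos] + body + form[pos + 1:]
                if new_form == target:
                    return derivation + [new_form]
                if new_form in seen or not viable(new_form, target):
                    continue
                seen.add(new_form)
                next_frontier.append(derivation + [new_form])
        frontier = next_frontier
    return None


grammar = {"S": ["SS", "aSb", "bSa", ""]}
derivation = exhaustive_parse(grammar, "aabb")
```

For the input aabb the search succeeds in phase 3 with the derivation S, aSb, aaSbb, aabb.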
Time complexity of exhaustive search

Suppose there are no productions of the form

A -> λ
A -> B
Number of phases for string w : approx. |w|

57
For grammar with k rules

Time for phase 1: k

k possible derivations

58
Time for phase 2: k^2

k^2 possible derivations

59
Time for phase |w| is k^|w|:

A total of k^|w| possible derivations

60
Total time needed for string w:

k + k^2 + ... + k^|w|

(phase 1, phase 2, ..., phase |w|)

Extremely bad!!!
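As a rough worked bound, the per-phase counts form a geometric series. Here n is an assumed symbol for the number of phases (the estimate above takes n to be about |w|), and k ≥ 2 is assumed:

```latex
\[
  \sum_{i=1}^{n} k^{i} \;=\; k + k^{2} + \cdots + k^{n}
  \;=\; \frac{k^{\,n+1} - k}{k - 1}
  \;=\; O\!\left(k^{\,n}\right), \qquad k \ge 2 .
\]
```

So the total is dominated by the last phase and grows exponentially in the number of phases, and therefore in |w|.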
61
For general context-free grammars:

There exists a parsing algorithm
that parses a string w
in time |w|^3

The CYK parser
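A compact Python sketch of CYK as a membership test, assuming the grammar is already in Chomsky Normal Form. The function name cyk and the grammar representation (variable name to list of right-hand-side tuples) are assumptions for illustration. The three nested loops over length, start position and split point give the cubic dependence on |w|.

```python
def cyk(grammar, word, start="S"):
    """Return True if the CNF grammar derives word."""
    n = len(word)
    if n == 0:
        return False  # a CNF grammar as defined above does not produce λ
    # table[i][l - 1]: variables that derive the substring word[i:i + l]
    table = [[set() for _ in range(n)] for _ in range(n)]

    # substrings of length 1 come from productions A -> a
    for i, ch in enumerate(word):
        for head, bodies in grammar.items():
            if (ch,) in bodies:
                table[i][0].add(head)

    # longer substrings come from productions A -> BC
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for head, bodies in grammar.items():
                    for body in bodies:
                        if len(body) == 2 and body[0] in left and body[1] in right:
                            table[i][length - 1].add(head)
    return start in table[0][n - 1]


# The Chomsky Normal Form grammar obtained earlier for S -> ABa, A -> aab, B -> Ac
cnf = {
    "S": [("A", "V1")],
    "V1": [("B", "Ta")],
    "A": [("Ta", "V2")],
    "V2": [("Ta", "Tb")],
    "B": [("A", "Tc")],
    "Ta": [("a",)],
    "Tb": [("b",)],
    "Tc": [("c",)],
}
accepted = cyk(cnf, "aabaabca")
rejected = cyk(cnf, "aab")
```

That grammar generates the single string aabaabca (A gives aab, B gives aabc, followed by a), so the first call accepts and the second rejects.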

62
