Simplifications of Context-Free Grammars

Courtesy Costas Buch - RPI

A Substitution Rule

Grammar:
S → aB
A → aaA
A → abBc
B → aA
B → b

Substitute B → b

Equivalent grammar:
S → aB | ab
A → aaA
A → abBc | abbc
B → aA

A Substitution Rule

Grammar:
S → aB | ab
A → aaA
A → abBc | abbc
B → aA

Substitute B → aA

Equivalent grammar:
S → aB | ab | aaA
A → aaA
A → abBc | abbc | abaAc

In general:

A → xBz
B → y_1

Substitute B → y_1

Equivalent grammar:
A → xBz | x y_1 z
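The substitution rule can be tried out mechanically. The following is a minimal Python sketch, not taken from the slides. The dictionary representation (single-character symbols, uppercase letters for variables) and the name `substitute` are assumptions made for illustration. It also assumes that the substituted variable is not the start symbol and that the substituted body does not contain the variable itself.

```python
from itertools import product


def substitute(grammar, var, body):
    """Substitute the production var -> body into every right-hand side.

    The original alternatives are kept (as on the slides) and the
    production var -> body itself is dropped.
    """
    result = {}
    for head, bodies in grammar.items():
        new_bodies = []
        for rhs in bodies:
            if head == var and rhs == body:
                continue  # the substituted production is removed
            # Each occurrence of var is either kept or replaced by body.
            choices = [(sym, body) if sym == var else (sym,) for sym in rhs]
            for combo in product(*choices):
                candidate = "".join(combo)
                if candidate not in new_bodies:
                    new_bodies.append(candidate)
        result[head] = new_bodies
    return result


grammar = {"S": ["aB"], "A": ["aaA", "abBc"], "B": ["aA", "b"]}
after_b = substitute(grammar, "B", "b")       # first slide
after_aA = substitute(after_b, "B", "aA")     # second slide
```

Applied to the example grammar, the two calls reproduce the two equivalent grammars shown on the slides (up to the order of alternatives).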

Nullable Variables

λ-production:

A → λ

Nullable variable:

A ⇒ ... ⇒ λ

Removing Nullable Variables


Example Grammar:

S → aMb
M → aMb
M → λ

Nullable variable: M

S → aMb
M → aMb
M → λ

Substitute M → λ

Final Grammar:
S → aMb
S → ab
M → aMb
M → ab
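Removing λ-productions can be automated. Below is a minimal Python sketch, assuming a grammar stored as a dictionary from single-character variables (uppercase) to lists of right-hand-side strings, with the empty string standing for λ. The function names are illustrative and do not come from the slides. Note that if the start symbol itself is nullable, the resulting grammar no longer generates λ.

```python
from itertools import product


def nullable_variables(grammar):
    """Return the set of variables A with A => ... => lambda."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head in nullable:
                continue
            # A body is nullable when all of its symbols are nullable
            # (this includes the empty body, i.e. A -> lambda).
            if any(all(s in nullable for s in rhs) for rhs in bodies):
                nullable.add(head)
                changed = True
    return nullable


def remove_lambda_productions(grammar):
    """Add every variant with nullable variables omitted, drop lambda bodies."""
    nullable = nullable_variables(grammar)
    result = {}
    for head, bodies in grammar.items():
        new_bodies = []
        for rhs in bodies:
            # Each nullable symbol is either kept or erased.
            choices = [(s, "") if s in nullable else (s,) for s in rhs]
            for combo in product(*choices):
                candidate = "".join(combo)
                if candidate and candidate not in new_bodies:
                    new_bodies.append(candidate)
        result[head] = new_bodies
    return result


example = {"S": ["aMb"], "M": ["aMb", ""]}
final = remove_lambda_productions(example)
# final == {"S": ["aMb", "ab"], "M": ["aMb", "ab"]}
```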

Unit-Productions

Unit Production:

A → B

(a single variable on both sides)

Removing Unit Productions


Observation:

A → A is removed immediately

Example Grammar:

S → aA
A → a
A → B
B → A
B → bb

Substitute A → B:

S → aA | aB
A → a
B → A | B
B → bb

Remove B → B:

S → aA | aB
A → a
B → A
B → bb

Substitute B → A:

S → aA | aB | aA
A → a
B → bb

Remove repeated productions

Final grammar:

S → aA | aB
A → a
B → bb
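Unit productions can also be removed in one pass. The sketch below is a Python illustration under stated assumptions: it uses the standard unit-closure method rather than the step-by-step substitutions of the slides, it represents a grammar as a dictionary from single-character variables to lists of right-hand-side strings, and the name `remove_unit_productions` is made up for this example. The resulting grammar can differ in form from the slides' final grammar while generating the same language.

```python
def remove_unit_productions(grammar):
    """Replace unit productions A -> B by the non-unit bodies B can reach."""
    variables = set(grammar)
    result = {}
    for head in grammar:
        # Variables reachable from head using unit productions only.
        closure, stack = {head}, [head]
        while stack:
            for rhs in grammar[stack.pop()]:
                if rhs in variables and rhs not in closure:
                    closure.add(rhs)
                    stack.append(rhs)
        # head inherits every non-unit body of every variable in its closure.
        bodies = []
        for var in sorted(closure):
            for rhs in grammar[var]:
                if rhs not in variables and rhs not in bodies:
                    bodies.append(rhs)
        result[head] = bodies
    return result


example = {"S": ["aA"], "A": ["a", "B"], "B": ["A", "bb"]}
final = remove_unit_productions(example)
# final == {"S": ["aA"], "A": ["a", "bb"], "B": ["a", "bb"]}
```

For the example grammar, both this result and the slides' final grammar (S → aA | aB, A → a, B → bb) generate the language {aa, abb}. In the sketch's output, B is no longer reachable from S, so it would disappear when useless productions are removed.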

Useless Productions

S → aSb
S → λ
S → A
A → aA      (Useless Production)

Some derivations never terminate...

S ⇒ A ⇒ aA ⇒ aaA ⇒ ... ⇒ aa...aA ⇒ ...

Another grammar:

S → A
A → aA
A → λ
B → bA      (Useless Production)

Not reachable from S

In general:

If S ⇒ ... ⇒ xAy ⇒ ... ⇒ w, where w ∈ L(G) contains only terminals, then variable A is useful.

Otherwise, variable A is useless.

A production A → x is useless if any of its variables is useless

Production    Status
S → aSb
S → λ
S → A         useless
A → aA        useless
B → C         useless
C → D         useless

Useless variables: A, B, C (and D, which has no productions)

Removing Useless Productions


Example Grammar:

S → aS | A | C
A → a
B → aa
C → aCb

First: find all variables that can produce strings with only terminals

S → aS | A | C
A → a
B → aa
C → aCb

Round 1: {A, B}

Round 2: {A, B, S}     (because S → A)

Keep only the variables that produce terminal symbols: {A, B, S}
(the remaining variables are useless)

S → aS | A | C
A → a
B → aa
C → aCb

Remove useless productions:

S → aS | A
A → a
B → aa

Second: Find all variables reachable from S

S → aS | A
A → a
B → aa

Use a Dependency Graph:

S → A
B        (not reachable)

Keep only the variables reachable from S
(the remaining variables are useless)

S → aS | A
A → a
B → aa

Remove useless productions

Final Grammar:

S → aS | A
A → a
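The two passes (variables that produce terminal strings, then variables reachable from S) can be sketched in Python as follows. This is an illustration under assumptions: the grammar is a dictionary from single-character uppercase variables to lists of right-hand-side strings, lowercase letters are terminals, and the function name `remove_useless` is hypothetical.

```python
def remove_useless(grammar, start="S"):
    """Drop variables that cannot produce terminals or are unreachable."""

    def all_known(rhs, known):
        # True when every variable in rhs is already in the known set.
        return all(not s.isupper() or s in known for s in rhs)

    # First: variables that can produce strings with only terminals.
    generating = set()
    changed = True
    while changed:  # each loop iteration is one "round"
        changed = False
        for head, bodies in grammar.items():
            if head not in generating and any(
                all_known(rhs, generating) for rhs in bodies
            ):
                generating.add(head)
                changed = True
    pruned = {
        head: [rhs for rhs in bodies if all_known(rhs, generating)]
        for head, bodies in grammar.items()
        if head in generating
    }

    # Second: variables reachable from the start symbol.
    reachable, stack = set(), [start]
    while stack:
        var = stack.pop()
        if var in reachable or var not in pruned:
            continue
        reachable.add(var)
        for rhs in pruned[var]:
            stack.extend(s for s in rhs if s.isupper())
    return {head: bodies for head, bodies in pruned.items() if head in reachable}


example = {"S": ["aS", "A", "C"], "A": ["a"], "B": ["aa"], "C": ["aCb"]}
final = remove_useless(example)
# final == {"S": ["aS", "A"], "A": ["a"]}
```

The order of the two passes matters: checking reachability first could keep a variable that is reachable only through a production that is deleted later.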

Removing All

Step 1: Remove Nullable Variables
Step 2: Remove Unit-Productions
Step 3: Remove Useless Variables

Normal Forms for Context-free Grammars


Chomsky Normal Form


Each production has the form:

A → BC      (A, B, C are variables)

or

A → a       (A is a variable, a is a terminal)

Examples:

Chomsky Normal Form:
S → AS
S → a
A → SA
A → b

Not Chomsky Normal Form:
S → AS
S → AAS
A → SA
A → aa

Conversion to Chomsky Normal Form

Example:

S → ABa
A → aab
B → Ac

Not Chomsky Normal Form

Introduce variables for terminals: Ta, Tb, Tc

S → ABa
A → aab
B → Ac

becomes

S → ABTa
A → TaTaTb
B → ATc
Ta → a
Tb → b
Tc → c

Introduce intermediate variable: V1

S → AV1
V1 → BTa
A → TaTaTb
B → ATc
Ta → a
Tb → b
Tc → c

Introduce intermediate variable: V2

S → AV1
V1 → BTa
A → TaV2
V2 → TaTb
B → ATc
Ta → a
Tb → b
Tc → c

Initial grammar:

S → ABa
A → aab
B → Ac

Final grammar in Chomsky Normal Form:

S → AV1
V1 → BTa
A → TaV2
V2 → TaTb
B → ATc
Ta → a
Tb → b
Tc → c

In general: From any context-free grammar (which doesn't produce λ) not in Chomsky Normal Form, we can obtain an equivalent grammar in Chomsky Normal Form

The Procedure

First remove:
Nullable variables
Unit productions

Then, for every symbol a:

New variable: Ta
Add production Ta → a
In productions of length at least 2: replace a with Ta

Replace any production

A → C_1 C_2 ... C_n      (n > 2)

with

A → C_1 V_1
V_1 → C_2 V_2
...
V_{n-2} → C_{n-1} C_n

New intermediate variables: V_1, V_2, ..., V_{n-2}
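The two steps of the procedure (a new variable per terminal, then intermediate variables for long productions) can be sketched in Python. This is a minimal illustration, assuming the grammar already has no λ-productions and no unit productions, that productions are given as (head, tuple of symbols) pairs, and that the generated names T<a> and V<i> do not clash with existing variables. The function name `to_cnf` is hypothetical.

```python
def to_cnf(productions, terminals):
    """Convert a list of (head, body) productions to Chomsky Normal Form.

    Assumes there are no lambda-productions and no unit productions.
    """
    result = []
    terminal_var = {}  # terminal a -> new variable Ta
    counter = 0        # numbering of intermediate variables V1, V2, ...
    for head, body in productions:
        if len(body) == 1:
            result.append((head, body))  # A -> a is already in CNF
            continue
        # Replace terminals by their variables in bodies of length >= 2.
        symbols = []
        for s in body:
            if s in terminals:
                terminal_var.setdefault(s, "T" + s)
                symbols.append(terminal_var[s])
            else:
                symbols.append(s)
        # Split A -> C1 C2 ... Cn into a chain of two-variable productions.
        while len(symbols) > 2:
            counter += 1
            new_var = f"V{counter}"
            result.append((head, (symbols[0], new_var)))
            head, symbols = new_var, symbols[1:]
        result.append((head, tuple(symbols)))
    # Finally add Ta -> a for every terminal that was replaced.
    result.extend((var, (t,)) for t, var in sorted(terminal_var.items()))
    return result


example = [("S", ("A", "B", "a")), ("A", ("a", "a", "b")), ("B", ("A", "c"))]
for head, body in to_cnf(example, {"a", "b", "c"}):
    print(head, "->", " ".join(body))
```

On the example grammar S → ABa, A → aab, B → Ac, the sketch produces the same eight productions as the final grammar in the worked example, with the same variable names.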

Theorem:

For any context-free grammar (which doesn't produce λ) there is an equivalent grammar in Chomsky Normal Form

Observations
Chomsky normal forms are good for parsing and proving theorems

It is very easy to find the Chomsky normal form for any context-free grammar

Greibach Normal Form

All productions have the form:

A → a V_1 V_2 ... V_k      k ≥ 0

where a is a symbol (terminal) and V_1, ..., V_k are variables

Examples:

Greibach Normal Form:
S → cAB
A → aA | bB | b
B → b

Not Greibach Normal Form:
S → abSb
S → aa

Conversion to Greibach Normal Form:

S → abSb
S → aa

becomes

S → aTbSTb
S → aTa
Ta → a
Tb → b

Greibach Normal Form
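The form and the easy conversion shown in this example can be checked with a short Python sketch. It is an illustration only: bodies are tuples of symbols, terminals are single lowercase letters, variable names start with an uppercase letter, and both function names are hypothetical. The conversion handles only the simple case from the example, where every body already starts with a terminal. It is not the general conversion algorithm.

```python
def is_greibach(grammar):
    """Check that every body is one terminal followed by variables only."""
    return all(
        len(rhs) >= 1
        and rhs[0].islower()
        and all(s[0].isupper() for s in rhs[1:])
        for bodies in grammar.values()
        for rhs in bodies
    )


def lift_inner_terminals(grammar):
    """Replace terminals after the first position by new variables T<a>."""
    result, added = {}, {}
    for head, bodies in grammar.items():
        new_bodies = []
        for rhs in bodies:
            tail = []
            for s in rhs[1:]:
                if s.islower():
                    added["T" + s] = [(s,)]  # new production Ta -> a
                    tail.append("T" + s)
                else:
                    tail.append(s)
            new_bodies.append(rhs[:1] + tuple(tail))
        result[head] = new_bodies
    result.update(added)
    return result


example = {"S": [("a", "b", "S", "b"), ("a", "a")]}
converted = lift_inner_terminals(example)
# converted["S"] == [("a", "Tb", "S", "Tb"), ("a", "Ta")]
```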

Theorem:

For any context-free grammar (which doesn't produce λ) there is an equivalent grammar in Greibach Normal Form

Observations
Greibach normal forms are very good for parsing

It is hard to find the Greibach normal form of any context-free grammar

Compilers


Program → Compiler → Machine Code

Program:

v = 5;
if (v>5)
  x = 12 + v;
while (x != 3) {
  x = x - 3;
  v = 10;
}
......

Machine Code:

      Add v,v,0
      cmp v,5
      jmplt ELSE
THEN: add x, 12,v
ELSE:
WHILE: cmp x,3
      ...

Lex


Lex: a lexical analyzer


A Lex program recognizes strings

For each kind of string found the lex program takes an action


Input → Lex program → Output

Input:

Var = 12 + 9;
if (test > 20)
  temp = 0;
else
  while (a < 20)
    temp++;

Output:

Identifier: Var
Operator: =
Integer: 12
Operator: +
Integer: 9
Semicolon: ;
Keyword: if
Parenthesis: (
Identifier: test
....

In Lex strings are described with regular expressions

Lex program
Regular expressions

"+"  "-"  "="        /* operators */

"if"  "then"         /* keywords */

Lex program
Regular expressions

(0|1|2|3|4|5|6|7|8|9)+        /* integers */

(a|b|..|z|A|B|...|Z)+         /* identifiers */

integers

(0|1|2|3|4|5|6|7|8|9)+   is the same as   [0-9]+

identifiers

(a|b|..|z|A|B|...|Z)+    is the same as   [a-zA-Z]+

Each regular expression has an associated action (in C code)

Examples:

Regular expression    Action
\n                    linenum++;
[0-9]+                printf("integer");
[a-zA-Z]+             printf("identifier");

Default action:

ECHO;

Prints the matched string to the output

A small lex program


%%
[ \t\n]      ;   /*skip spaces*/
[0-9]+       printf("Integer\n");
[a-zA-Z]+    printf("Identifier\n");

Input:

1234 test
var 566 78
9800

Output:

Integer
Identifier
Identifier
Integer
Integer
Integer

Another program

%{
int linenum = 1;
%}
%%
[ \t]        ;   /*skip spaces*/
\n           linenum++;
[0-9]+       printf("Integer\n");
[a-zA-Z]+    printf("Identifier\n");
.            printf("Error in line: %d\n", linenum);

Input:

1234 test
var 566 78
9800 +
temp

Output:

Integer
Identifier
Identifier
Integer
Integer
Integer
Error in line: 3
Identifier

Lex matches the longest input string

Example:

Regular expressions: "if", "ifend"

Input: ifend      Matches: "ifend"
Input: if         Matches: "if"
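The longest-match rule can be imitated in a few lines of Python. This is only a sketch of the matching rule, not of how Lex is implemented: it tries every rule at the current position with the standard `re` module and keeps the longest match, preferring the earlier rule on ties. The rule names (IF, IFEND, SPACE, Error) and the function `tokenize` are assumptions made for this example.

```python
import re


def tokenize(text, rules):
    """Split text into (rule name, lexeme) pairs using longest match.

    rules is a list of (name, regular expression) in priority order.
    """
    compiled = [(name, re.compile(pattern)) for name, pattern in rules]
    pos, tokens = 0, []
    while pos < len(text):
        best_name, best_len = None, 0
        for name, regex in compiled:
            match = regex.match(text, pos)
            # Strictly longer wins, so the earliest rule wins a tie.
            if match and len(match.group()) > best_len:
                best_name, best_len = name, len(match.group())
        if best_name is None:
            tokens.append(("Error", text[pos]))  # no rule applies
            pos += 1
        else:
            tokens.append((best_name, text[pos:pos + best_len]))
            pos += best_len
    return tokens


rules = [("IF", r"if"), ("IFEND", r"ifend"), ("SPACE", r"[ \t\n]+")]
print(tokenize("ifend if", rules))
# prints [('IFEND', 'ifend'), ('SPACE', ' '), ('IF', 'if')]
```

Although the rule for "if" is listed first and also matches at the start of "ifend", the longer match for "ifend" is chosen, as in the example.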

Internal Structure of Lex

Lex: Regular expressions → NFA → DFA → Minimal DFA

The final states of the DFA are associated with actions

Compiler

input program → lexical analyzer → parser → output machine code

A parser knows the grammar of the programming language


Parser

PROGRAM → STMT_LIST
STMT_LIST → STMT; STMT_LIST | STMT;
STMT → EXPR | IF_STMT | WHILE_STMT | { STMT_LIST }
EXPR → EXPR + EXPR | EXPR - EXPR | ID
IF_STMT → if (EXPR) then STMT | if (EXPR) then STMT else STMT
WHILE_STMT → while (EXPR) do STMT

The parser finds the derivation of a particular input

Input: 10 + 2 * 5

Parser grammar: E → E + E | E * E | INT

Derivation:

E ⇒ E + E
  ⇒ E + E * E
  ⇒ 10 + E * E
  ⇒ 10 + 2 * E
  ⇒ 10 + 2 * 5

derivation → derivation tree

Derivation: E ⇒ E + E ⇒ E + E * E ⇒ 10 + E * E ⇒ 10 + 2 * E ⇒ 10 + 2 * 5

Derivation tree:

        E
      / | \
     E  +  E
     |   / | \
    10  E  *  E
        |     |
        2     5

derivation tree → machine code

The derivation tree for 10 + 2 * 5 is translated to:

mult a, 2, 5
add b, 10, a

Parsing


input string → Parser (grammar) → derivation

Example:

Parser grammar:
S → SS
S → aSb
S → bSa
S → λ

Input: aabb

Derivation: ?

Exhaustive Search

S → SS | aSb | bSa | λ

Find derivation of aabb

Phase 1: all possible derivations of length 1

S ⇒ SS
S ⇒ aSb
S ⇒ bSa
S ⇒ λ

Target: aabb

S ⇒ SS
S ⇒ aSb
S ⇒ bSa      (discarded: cannot lead to aabb)
S ⇒ λ        (discarded: cannot lead to aabb)

S → SS | aSb | bSa | λ          Target: aabb

Phase 1 (remaining derivations):

S ⇒ SS
S ⇒ aSb

Phase 2:

S ⇒ SS ⇒ SSS
S ⇒ SS ⇒ aSbS
S ⇒ SS ⇒ bSaS
S ⇒ SS ⇒ S
S ⇒ aSb ⇒ aSSb
S ⇒ aSb ⇒ aaSbb
S ⇒ aSb ⇒ abSab
S ⇒ aSb ⇒ ab

S → SS | aSb | bSa | λ          Target: aabb

Phase 2 (remaining derivations):

S ⇒ SS ⇒ SSS
S ⇒ SS ⇒ aSbS
S ⇒ SS ⇒ S
S ⇒ aSb ⇒ aSSb
S ⇒ aSb ⇒ aaSbb

Phase 3:

S ⇒ aSb ⇒ aaSbb ⇒ aabb

Final result of exhaustive search (top-down parsing)

Parser grammar:
S → SS
S → aSb
S → bSa
S → λ

Input: aabb

Derivation: S ⇒ aSb ⇒ aaSbb ⇒ aabb
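The phase-by-phase search can be sketched in Python as a breadth-first search over leftmost derivations. This is an illustration under assumptions, not the slides' exact procedure: uppercase letters are variables, the empty string stands for λ, a sentential form is discarded when it has too many terminals or a terminal prefix that does not match the input, and a `max_phases` bound (a made-up parameter) guarantees termination even though this grammar contains S → SS and S → λ.

```python
def leftmost_variable(form):
    """Index of the first variable in a sentential form, or None."""
    for index, symbol in enumerate(form):
        if symbol.isupper():
            return index
    return None


def viable(form, target):
    """Can this sentential form still lead to target?"""
    index = leftmost_variable(form)
    if index is None:
        return False  # only terminals, and it is not the target
    terminal_count = sum(1 for s in form if not s.isupper())
    return terminal_count <= len(target) and target.startswith(form[:index])


def exhaustive_parse(grammar, start, target, max_phases=8):
    """Return a leftmost derivation of target as a list of forms, or None."""
    frontier = [[start]]
    for _ in range(max_phases):  # one iteration per phase
        next_frontier = []
        for derivation in frontier:
            form = derivation[-1]
            index = leftmost_variable(form)
            if index is None:
                continue
            for rhs in grammar[form[index]]:
                new_form = form[:index] + rhs + form[index + 1:]
                if new_form == target:
                    return derivation + [new_form]
                if viable(new_form, target):
                    next_frontier.append(derivation + [new_form])
        frontier = next_frontier
    return None


grammar = {"S": ["SS", "aSb", "bSa", ""]}
print(exhaustive_parse(grammar, "S", "aabb"))
# prints ['S', 'aSb', 'aaSbb', 'aabb']
```

The derivation is found in the third phase, matching the result of the worked example.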

Time complexity of exhaustive search

Suppose there are no productions of the form

A → λ
A → B

Number of phases for string w: at most 2|w|

For a grammar with k rules:

Time for phase 1: k possible derivations

Time for phase 2: k^2 possible derivations

Time for phase 2|w|: k^(2|w|) possible derivations

Total time needed for string w:

k + k^2 + ... + k^(2|w|)
(phase 1, phase 2, ..., phase 2|w|)

Extremely bad!!!
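The total over all phases is a geometric series, so it has a closed form. The following is a worked version of the sum, assuming k ≥ 2 rules. The numbers in the example (k = 4, |w| = 4) are chosen only for illustration.

```latex
\[
  \sum_{i=1}^{2|w|} k^{i}
  \;=\; k + k^{2} + \cdots + k^{2|w|}
  \;=\; \frac{k^{2|w|+1} - k}{k - 1}
\]
% Example: k = 4 rules and |w| = 4 give
\[
  \sum_{i=1}^{8} 4^{i} \;=\; \frac{4^{9} - 4}{3} \;=\; \frac{262140}{3} \;=\; 87380 .
\]
```

The dominant term is k^(2|w|), so the running time grows exponentially in the length of the input string.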

There exist faster algorithms for specialized grammars

S-grammar:

A → ax

where a is a symbol and x is a string of variables

Pair (A, a) appears once

S-grammar example:

S → aS
S → bSS
S → c

Each string has a unique derivation

S ⇒ aS ⇒ abSS ⇒ abcS ⇒ abcc

For S-grammars:

In the exhaustive search parsing there is only one choice in each phase

Time for a phase: 1

Total time for parsing string w: |w|
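Because each pair (A, a) selects at most one production, an S-grammar can be parsed with one table lookup per input symbol. The Python sketch below illustrates this under assumptions: the rule table maps (variable, symbol) pairs to the string of variables x, the pending variables are kept on a stack, and the function name `parse_s_grammar` is hypothetical.

```python
def parse_s_grammar(rules, start, word):
    """Return True if word is derived by the S-grammar given by rules.

    rules maps (variable, terminal) to the string of variables that
    follows the terminal in the unique production A -> a x.
    """
    stack = [start]  # variables still to be expanded, leftmost on top
    for symbol in word:
        if not stack:
            return False  # input left over but nothing to expand
        variable = stack.pop()
        body = rules.get((variable, symbol))
        if body is None:
            return False  # no production A -> a x for this pair
        stack.extend(reversed(body))  # leftmost variable ends up on top
    return not stack  # every variable must have been expanded


rules = {("S", "a"): "S", ("S", "b"): "SS", ("S", "c"): ""}
print(parse_s_grammar(rules, "S", "abcc"))
# prints True
```

The loop body runs once per input symbol, which gives the |w| bound stated for S-grammars.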

For general context-free grammars:

There exists a parsing algorithm that parses a string w in time |w|^3
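The slides do not name the algorithm. One well-known algorithm with this cubic bound is CYK (Cocke-Younger-Kasami), which works on grammars in Chomsky Normal Form. The sketch below is a plain Python membership test under assumptions: productions are (head, body) pairs where a body is either a one-terminal tuple or a two-variable tuple, and the function name `cyk` is chosen for this example.

```python
def cyk(productions, start, word):
    """Decide whether a CNF grammar generates word (CYK algorithm)."""
    n = len(word)
    if n == 0:
        return False  # a CNF grammar as defined here never produces lambda
    # table[i][length] = variables that derive word[i : i + length]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, symbol in enumerate(word):
        for head, body in productions:
            if body == (symbol,):
                table[i][1].add(head)
    for length in range(2, n + 1):            # substring length
        for i in range(n - length + 1):       # substring start
            for split in range(1, length):    # length of the left part
                left = table[i][split]
                right = table[i + split][length - split]
                for head, body in productions:
                    if len(body) == 2 and body[0] in left and body[1] in right:
                        table[i][length].add(head)
    return start in table[0][n]


# CNF grammar from the conversion example (S -> ABa, A -> aab, B -> Ac).
cnf = [
    ("S", ("A", "V1")), ("V1", ("B", "Ta")),
    ("A", ("Ta", "V2")), ("V2", ("Ta", "Tb")),
    ("B", ("A", "Tc")),
    ("Ta", ("a",)), ("Tb", ("b",)), ("Tc", ("c",)),
]
print(cyk(cnf, "S", "aabaabca"))
# prints True
```

The three nested loops over length, start and split point give the |w|^3 factor. The innermost loop over productions adds a factor that depends only on the grammar.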
