Translator
A translator is a program that takes a program
written in one programming language as input and
produces a program in another language as output. If the
source language is a high-level language and the object
language is a low-level language, then such a translator is
called a compiler.
[Diagram: Source Program → Compiler → Object Program]
Analysis of Source Program
[Diagram: Lexical Analyzer → (token stream) → Syntax Analyzer → (syntax tree) → Semantic Analyzer → (syntax tree) → Intermediate code generator]
The lexical analyzer reads the stream of characters from the source
program and groups the characters into meaningful sequences
called lexemes.
[Syntax tree for id1 = id2 + id3 * 60: = at the root with children <id,1> and +; + has children <id,2> and *; * has children <id,3> and 60]
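As a sketch of how a lexical analyzer groups characters into lexemes, the following minimal tokenizer assumes a toy token set (ID, NUMBER, and single-character operators); the token names and regular expressions are illustrative, not part of the lecture material.

```python
import re

# Token specifications tried in order; whitespace is skipped.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z_][A-Za-z_0-9]*"),
    ("OP",     r"[=+\-*/]"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def tokenize(source):
    """Group the character stream into (token-name, lexeme) pairs."""
    tokens = []
    for m in MASTER.finditer(source):
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
    return tokens

print(tokenize("position = initial + rate * 60"))
```

Running it on the classic `position = initial + rate * 60` input yields the lexeme sequence that the later phases consume.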
Semantic Analysis
The semantic analyzer uses the syntax tree and the information in the
SYMTAB to check the source program for semantic consistency with
the language definition.
It also gathers type information and saves it in either the syntax tree
or the SYMTAB, for subsequent use during intermediate code
generation.
Eg. t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
Code Optimization
The machine independent code optimization phase
attempts to improve the intermediate code so that
better target code will result.
A simple intermediate code generation algorithm
followed by code optimization is a reasonable way
to generate good target code.
The optimizer can deduce that the conversion of 60
from int to float can be done once. So the
inttofloat operation can be eliminated by replacing
int 60 by float 60.0
Eg. t1 = id3 * 60.0
id1 = id2 + t1
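The optimization described above can be sketched as a tiny pass over three-address code: when a temporary is defined as inttofloat of a constant, fold the conversion into a float literal at compile time and substitute it into later uses. The four-tuple encoding and the function name are assumptions made for this sketch.

```python
def fold_inttofloat(code):
    """Fold t = inttofloat(c) for a constant c into a float literal and
    substitute it into later uses (a sketch of one optimization)."""
    out, subst = [], {}
    for op, a1, a2, dest in code:
        # substitute previously folded temporaries into the operands
        a1, a2 = subst.get(a1, a1), subst.get(a2, a2)
        if op == "inttofloat" and a1.isdigit():
            subst[dest] = a1 + ".0"   # conversion done once, at compile time
        else:
            out.append((op, a1, a2, dest))
    return out

code = [
    ("inttofloat", "60", "", "t1"),
    ("*", "id3", "t1", "t2"),
    ("+", "id2", "t2", "t3"),
    ("copy", "t3", "", "id1"),
]
print(fold_inttofloat(code))
```

A real optimizer would also propagate the copy through `t3`, reaching the two-instruction form shown in the example.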
Code Generation
The code generator takes as input an
intermediate representation of the source
program and maps it to the target language.
If the target language is machine code,
registers or memory locations are selected for
each of the variables used by the program.
Then the intermediate instructions are
translated into sequences of machine
instructions.
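The mapping from intermediate instructions to machine instructions can be sketched as follows, assuming a simple load/store target with floating-point instructions (LDF, STF, MULF, ADDF) and a naive allocator that gives each loaded value its own register; the instruction mnemonics and the allocator are illustrative assumptions, not a real code generator.

```python
def gen(code):
    """Map three-address instructions to pseudo machine code."""
    asm, reg_of, n = [], {}, 0

    def load(operand):
        """Return a register (or immediate) holding the operand."""
        nonlocal n
        if operand in reg_of:
            return reg_of[operand]
        if operand[0].isdigit():
            return "#" + operand          # numeric literal → immediate
        n += 1
        reg = f"R{n}"
        asm.append(f"LDF {reg}, {operand}")
        reg_of[operand] = reg
        return reg

    for op, a1, a2, dest in code:
        if op in ("*", "+"):
            r1, r2 = load(a1), load(a2)
            asm.append(("MULF" if op == "*" else "ADDF") + f" {r1}, {r1}, {r2}")
            reg_of[dest] = r1             # result stays in r1
        elif op == "store":
            r = load(a1)
            asm.append(f"STF {dest}, {r}")
    return asm

for line in gen([("*", "id3", "60.0", "t1"),
                 ("+", "id2", "t1", "t2"),
                 ("store", "t2", "", "id1")]):
    print(line)
```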
[Diagram: the phases of the compiler applied to id1 = id2 + id3 * 60.
Lexical Analyzer → <id,1> = <id,2> + <id,3> * 60
Syntax Analyzer → syntax tree: = ( <id,1> , + ( <id,2> , * ( <id,3> , 60 ) ) )
Semantic Analyzer → annotated tree: = ( <id,1> , + ( <id,2> , * ( <id,3> , inttofloat(60) ) ) )
Intermediate code generator →
t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
Code optimizer →
t1 = id3 * 60.0
id1 = id2 + t1
Code Generator → target code.
All phases consult the Symbol Table.]
Tasks(Role) of Lexical Analyzer
Identification of lexemes
Removal of comments and whitespace (blank, newline, tab, etc.)
Correlating error messages generated by
the compiler with the source program.
The lexical analyzer is often divided into two processes:
a) Scanning consists of the simple processes that do not
require tokenization of the input, such as deletion
of comments and compaction of consecutive whitespace
characters into one.
b) Lexical analysis proper, the more complex process, where the
scanner produces the sequence of tokens as output.
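The simpler "scanning" stage described in (a) can be sketched in a few lines; the comment syntax handled here (`/* ... */` and `// ...`) is an assumption for the demo, not fixed by the lecture.

```python
import re

def scan(text):
    """Scanning stage: delete comments and compact runs of
    whitespace (blank, tab, newline) into a single blank."""
    text = re.sub(r"/\*.*?\*/", " ", text, flags=re.S)  # block comments
    text = re.sub(r"//[^\n]*", " ", text)               # line comments
    return re.sub(r"\s+", " ", text).strip()            # compact whitespace

print(scan("x = 1;  /* init */\n\ty = 2;  // twice\n"))
```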
Tokens, Patterns and Lexemes
Input Buffering
[Diagram: the input buffer holds E = M * C * * 2 eof; the lexemeBegin pointer marks the start of the current lexeme while the forward pointer scans ahead until a lexeme is matched.]
OPERATION and DEFINITION
Union of L and M, written L U M: L U M = { s | s is in L or s is in M }
Concatenation of L and M, written LM: LM = { st | s is in L and t is in M }
Kleene closure of L, written L*: L* = U_{i>=0} L^i (the union of L^i over all i >= 0)
Regular Expression
A notation that allows us to define a pattern in
a high level language.
Regular language
Each regular expression r denotes a language
L(r) (the set of strings described by the regular
expression r)
Notes: Each word in a program can be expressed as a
regular expression
Eg. Suppose we want to describe the set of valid
C identifiers.
If letter_ stands for any letter or the underscore,
and digit stands for any digit, then we would
describe the language of C identifiers by:
letter_ ( letter_ | digit )*
The | means union.
( ) are used to group sub expressions.
* means “zero or more occurrences of”
The juxtaposition of letter_ with the remainder
of the expression signifies concatenation.
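The same pattern can be written directly as a Python regular expression; this is a sketch of the idea, using character classes in place of the letter_ and digit names.

```python
import re

# letter_ ( letter_ | digit )* as a Python regular expression:
# [A-Za-z_] is letter_, [A-Za-z_0-9] is letter_ | digit.
C_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z_0-9]*\Z")

def is_c_identifier(s):
    return C_IDENTIFIER.match(s) is not None

print(is_c_identifier("_count1"))  # a valid identifier
print(is_c_identifier("1count"))  # invalid: cannot start with a digit
```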
Rules for constructing regular expressions
The regular expressions are built recursively out of smaller regular
expressions using the following rules.
εr = rε = r (ε is the identity element for concatenation)
Transition Diagrams
There are two ways that we can handle
reserved words that look like identifiers:
1) Install the reserved words in the
symbol table initially.
When we find an identifier, a call to installID
places it in the symbol table if it is not
already there and returns a pointer to the
symbol-table entry for the lexeme found.
Any identifier not in the symbol table during
lexical analysis cannot be a reserved word, so
its token is id.
The function getToken examines the
symbol table entry for the lexeme found,
and returns whatever token name the
symbol table says this lexeme represents
— either id or one of the keyword tokens
that was initially installed in the table.
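Approach 1 can be sketched as follows; the dictionary-based symbol table and the uppercase token names are assumptions made for the demo, with installID and getToken mirroring the names used above.

```python
# Keywords are preinstalled in the symbol table, mapped to their tokens.
symtab = {kw: kw.upper() for kw in ("if", "then", "else", "begin", "end")}

def installID(lexeme):
    """Install lexeme as an identifier if not already present."""
    if lexeme not in symtab:
        symtab[lexeme] = "ID"
    return lexeme

def getToken(entry):
    """Return the keyword token if preinstalled, otherwise ID."""
    return symtab[entry]

print(getToken(installID("then")))   # a reserved word
print(getToken(installID("then2")))  # an ordinary identifier
```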
2) Create separate transition diagrams for
each keyword.
[Transition diagram for the keyword then: start →t→ →h→ →e→ →n→ →nonletter/digit*→ accept (with retraction)]
A transition diagram for whitespace
[Diagram: start →delim→ a state that loops on delim, leaving on any other character (with retraction).]
[Diagram for identifiers: a state loops on letter or digit; on any other character, State 2 executes RETRACT( ) and return (id, INSTALL( )).]
[Transition diagrams for keywords (states 0–22; each keyword is followed by blank/newline, with * marking retraction):
B E G I N → return(1, )
E N D → return(2, )
E L S E → return(5, )
I F → return(3, )
T H E N → return(4, )]
Identifier:
[Transition diagram (states 23–25): start →letter→ a state looping on letter or digit; on any other character* → return(6, INSTALL( ))]
Constant:
[Transition diagram: start →digit→ a state looping on digit; on any non-digit* → return the constant token]
Relational operators:
[Transition diagrams (states 29–37) for <, <=, <>, >, >=, =, each returning a token (8, c) with c in 1–6 identifying the operator: return(8,1), return(8,2), …, return(8,6)]
Regular Expressions
String
A string is a finite sequence of symbols
Eg: 001, 10101, …
Operations on strings
Suffix
A suffix of x is obtained by discarding 0 or more leading symbols of x
Eg: cde, e, … are suffixes of abcde
Substring
A substring of x is obtained by deleting a prefix and a suffix from x
Eg: cd, abc, de, abcde, … are substrings of abcde
Every suffix and every prefix of x is a substring of x, but a substring
need not be a suffix or a prefix
ε and x itself are prefixes, suffixes, and substrings of x
Language
A language is a set of strings formed from a specific alphabet
If L and M are two languages, the possible operations are
Concatenation
The concatenation of L and M, denoted L.M, is found by selecting a
string x from L and a string y from M and joining
them in that order
LM = { xy | x is in L and y is in M }
∅L = L∅ = ∅
Exponentiation
L^i = LLL…L (i times)
L^0 = {ε};  {ε}L = L{ε} = L
Union
L U M = { x | x is in L or x is in M }
∅ U L = L U ∅ = L
Closure
‘*’ denotes ‘0 or more instances of’:  L* = U_{i=0}^{∞} L^i
Eg: let L = { aa }
L* is the set of all strings of an even number of a’s
L^0 = {ε}, L^1 = { aa }, L^2 = { aaaa }, …
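These operations on small finite languages can be computed directly; the function names below and the truncation of the (infinite) closure to a bounded exponent are assumptions made so the sketch is runnable.

```python
from itertools import product

def concat(L, M):
    """LM = { xy | x in L and y in M }"""
    return {x + y for x, y in product(L, M)}

def power(L, i):
    """L^i = L concatenated with itself i times; L^0 = {ε}."""
    result = {""}
    for _ in range(i):
        result = concat(result, L)
    return result

def closure(L, up_to=3):
    """L* truncated to exponents 0..up_to (the true closure is infinite)."""
    out = set()
    for i in range(up_to + 1):
        out |= power(L, i)
    return out

L = {"aa"}
print(sorted(closure(L)))  # strings of an even number of a's
```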
Language Recognizer
How does it work?
[NFA diagram: start state 0 with ε-transitions to states 1 and 3; 1 →a→ 2 and 3 →b→ 4.]
Algorithm to construct an NFA from a Regular Expression
NFA for “R1|R2”
[Diagram: a new initial state i with ε-transitions to the initial states of N1 and N2, and ε-transitions from the final states of N1 and N2 to a new final state f.]
There is a transition on ε from the new initial state to the initial states of
N1 and N2. There is an ε-transition from the final states of N1 and N2 to
the new final state f. Any path from i to f must go through either N1 or N2.
4. NFA for “R1R2”
[Diagram: i → N1 → N2 → f]
The initial state of N2 is identified with the accepting state of N1. A path
from i to f must go first through N1, then through N2.
5. NFA for “R1*”
[Diagram: new states i and f; ε-transitions from i into N1 and from i directly to f, and ε-transitions from N1’s final state back to its initial state and on to f.]
Example: constructing an NFA for (a|b)*abb
The expression is decomposed into subexpressions:
R1 = a, R2 = b, R3 = R1 | R2, R4 = (R3), R5 = (R4)*, R6 = a,
R7 = R5R6, R8 = b, R9 = R7R8, R10 = b, R11 = R9R10
N1 (for R1 = a): 2 →a→ 3
N2 (for R2 = b): 4 →b→ 5
N3 (for R3 = R1|R2): new states 1 and 6; 1 →ε→ 2, 1 →ε→ 4, 3 →ε→ 6, 5 →ε→ 6
N4 (for R4 = (R3)): same as N3
N5 (for R5 = (R4)*): new states 0 and 7; 0 →ε→ 1, 0 →ε→ 7, 6 →ε→ 1, 6 →ε→ 7
N6 (for R6 = a): 7' →a→ 8
N7 (for R7 = R5R6): 7 →a→ 8 (7' identified with 7)
N8 (for R8 = b): 8' →b→ 9
N9 (for R9 = R7R8): 8 →b→ 9
N10 (for R10 = b): 9' →b→ 10
N11 (for R11 = R9R10), the complete NFA:
start 0; 0 →ε→ {1, 7}; 1 →ε→ {2, 4}; 2 →a→ 3; 4 →b→ 5; 3 →ε→ 6;
5 →ε→ 6; 6 →ε→ {1, 7}; 7 →a→ 8; 8 →b→ 9; 9 →b→ 10 (accepting)
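The complete NFA for (a|b)*abb can be written as transition tables and simulated directly. The dictionary encoding below is one possible sketch (states numbered 0–10, state 10 accepting), not the lecture's own notation.

```python
# ε-edges and labeled edges of the NFA for (a|b)*abb.
EPS = {0: {1, 7}, 1: {2, 4}, 3: {6}, 5: {6}, 6: {1, 7}}
MOVE = {(2, "a"): {3}, (4, "b"): {5}, (7, "a"): {8}, (8, "b"): {9}, (9, "b"): {10}}

def eps_closure(states):
    """All states reachable from `states` using ε-transitions only."""
    stack, closure = list(states), set(states)
    while stack:
        for t in EPS.get(stack.pop(), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure

def accepts(word):
    """Standard ε-NFA simulation: closure, move, closure, ..."""
    current = eps_closure({0})
    for ch in word:
        moved = set().union(*(MOVE.get((s, ch), set()) for s in current))
        current = eps_closure(moved)
    return 10 in current

print(accepts("abb"), accepts("babb"), accepts("abab"))
```

Every string ending in abb is accepted; all others are rejected.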
Deterministic Automata (DFA)
For each NFA, we can find a DFA accepting the same language.
[Diagram: the NFA N11 for (a|b)*abb (states 0–10) and an equivalent DFA with states A, B, C, D, E:
A →a→ B, A →b→ C; B →a→ B, B →b→ D; C →a→ B, C →b→ C;
D →a→ B, D →b→ E; E →a→ B, E →b→ C (E accepting).]
Constructing a DFA from an NFA
Algorithm
– Input: an NFA N.
– Output: a DFA D accepting the same language.
Let us define the function ε-CLOSURE(s) to be the set of
states of N built by applying the following rules:
1. s is added to ε-CLOSURE(s).
2. If t is in ε-CLOSURE(s), and there is an edge labeled ε
from t to u, then u is added to ε-CLOSURE(s) if u is not
already there. Rule 2 is repeated until no more states can be
added to ε-CLOSURE(s).
Thus, ε-CLOSURE(s) is the set of states that can be reached
from s on ε-transitions only. If T is a set of states, then
ε-CLOSURE(T) is the union over all states s in T of
ε-CLOSURE(s).
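The two rules above amount to a fixed-point (worklist) computation; a minimal sketch, with the ε-edges passed in as a dictionary (the sample edges are those of the (a|b)*abb NFA, an assumption for the demo):

```python
def eps_closure(s, eps_edges):
    """ε-CLOSURE(s) computed by the two rules above."""
    closure = {s}                 # rule 1: s itself is in the closure
    worklist = [s]
    while worklist:               # rule 2, repeated to a fixed point
        t = worklist.pop()
        for u in eps_edges.get(t, ()):
            if u not in closure:
                closure.add(u)
                worklist.append(u)
    return closure

eps = {0: {1, 7}, 1: {2, 4}, 3: {6}, 5: {6}, 6: {1, 7}}
print(sorted(eps_closure(0, eps)))
```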
Minimizing the Number of States of a DFA
Algorithm
– Input: a DFA M.
– Output: a minimum-state DFA M’.
• If some states in M ignore some inputs, add transitions
to a “dead” state.
• Let P = { accepting states, all nonaccepting states } (the
initial partition into two groups).
• Let P’ = {}.
• Loop: for each group G in P do
Partition G into subgroups so that s and t (in G) belong
to the same subgroup if and only if, for each input
a, states s and t have transitions to states in the
same group of P.
Put those subgroups in P’.
If P != P’, set P = P’ and go to Loop.
• Remove any dead states and unreachable states.
NFA to DFA Example-2
[NFA diagram: states 0–8 with ε-transitions from state 0 and moves on a and b as computed below; state 8 accepting.]
ε-closure({0}) = {0,1,3,7}
subset({0,1,3,7}, a) = {2,4,7};  subset({0,1,3,7}, b) = {8}
ε-closure({2,4,7}) = {2,4,7}
subset({2,4,7}, a) = {7};  subset({2,4,7}, b) = {5,8}
ε-closure({8}) = {8};  subset({8}, a) = ∅;  subset({8}, b) = {8}
ε-closure({7}) = {7};  subset({7}, a) = {7};  subset({7}, b) = {8}
DFA states:
A = {0,1,3,7}
B = {2,4,7}
C = {8}
D = {7}
E = {5,8}
F = {6,8}
[DFA diagram: start state A, with transitions among A–F on inputs a and b as determined by the subset computations above.]
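The subset construction itself can be sketched generically: each DFA state is a set of NFA states, built by alternating move and ε-closure. For a checkable result, the code below applies it to the (a|b)*abb NFA from the earlier example (its transitions are restated here as an assumption); it yields the five DFA states A–E.

```python
from itertools import chain

def eps_closure(states, eps):
    """ε-CLOSURE(T) for a set T of NFA states."""
    stack, out = list(states), set(states)
    while stack:
        for u in eps.get(stack.pop(), ()):
            if u not in out:
                out.add(u)
                stack.append(u)
    return frozenset(out)

def subset_construction(start, accept, eps, move, alphabet):
    """Build DFA states as sets of NFA states."""
    d_start = eps_closure({start}, eps)
    dfa, worklist = {}, [d_start]
    while worklist:
        T = worklist.pop()
        if T in dfa:
            continue
        dfa[T] = {}
        for a in alphabet:
            # U = ε-closure(move(T, a))
            U = eps_closure(set(chain.from_iterable(
                    move.get((s, a), ()) for s in T)), eps)
            dfa[T][a] = U
            if U not in dfa:
                worklist.append(U)
    accepting = {T for T in dfa if accept in T}
    return d_start, dfa, accepting

# the (a|b)*abb NFA (states 0-10, state 10 accepting)
eps = {0: {1, 7}, 1: {2, 4}, 3: {6}, 5: {6}, 6: {1, 7}}
move = {(2, "a"): {3}, (4, "b"): {5}, (7, "a"): {8}, (8, "b"): {9}, (9, "b"): {10}}
start, dfa, accepting = subset_construction(0, 10, eps, move, "ab")
print(len(dfa))  # number of DFA states
```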
Minimizing the Number of States of a DFA
[Diagram: the five-state DFA for (a|b)*abb (states A, B, C, D, E) and the minimized
four-state DFA obtained by merging the equivalent states A and C:
A →a→ B, A →b→ A; B →a→ B, B →b→ D; D →a→ B, D →b→ E; E →a→ B, E →b→ A (E accepting).]
A language for specifying Lexical Analyzers
Auxiliary Definitions