
3/7/15

CS 4110 Compiler Design


Semantic Analysis

http://www.labouseur.com/courses/compilers/compilers/alan/

Instructor: Jiaofei (Fay) Zhong, Email: jiaofei.zhong@csueastbay.edu

Syntax Directed Definitions

Semantic Analysis
Beyond context-free grammar
  Is x declared before it is used?
  Is x declared but never used?
  Is an expression type consistent?
  Is an array reference in bounds?

Choices
  Use context-sensitive grammars
    Hard to define and costly to use
  Use attribute grammar
    Can help to some extent
  Use ad hoc methods
    Mostly required

Attribute Grammar
Augment a context-free grammar with rules
  Each grammar symbol has an associated set of attributes
  The attributes are evaluated by the semantic rules attached to the productions
For example
  Semantic rules: the rules you added to the yacc input file for building a parse tree
  Attributes: the pointers for each grammar symbol in the stack

Syntax-directed translation
  Use attribute grammar for semantic analysis
    Type checking
    Generate intermediate code or representation

Attribute Grammar
Example: expression evaluation

Production       Semantic rule
L -> E           print(E.val)
E -> E1 + T      E.val := E1.val + T.val
E -> T           E.val := T.val
T -> T1 * F      T.val := T1.val * F.val
T -> F           T.val := F.val
F -> ( E )       F.val := E.val
F -> digit       F.val := digit.lexval

By the way, now you see why we do not like to do left-recursion elimination and left factoring: it is more difficult to write attribute rules for an incomprehensible grammar.

Input: 2 * 3 + 4

[Figure: annotated parse tree for 2 * 3 + 4 under the expression grammar. Leaves: digit = 2, digit = 3, digit = 4. Left subtree: F.val = 2, T.val = 2, F.val = 3, T.val = 6, E.val = 6. Right subtree: F.val = 4, T.val = 4. Root: E.val = 10.]
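
The bottom-up evaluation of val can be sketched in Python. This is only an illustration under stated assumptions, not the yacc-based implementation used in the course: the function name evaluate and its helpers are hypothetical, and the left-recursive productions are handled with loops, where each loop iteration applies one rule E.val := E1.val + T.val or T.val := T1.val * F.val.

```python
def evaluate(text):
    """Return E.val for an expression over single digits, +, * and parentheses."""
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def factor():
        nonlocal pos
        tok = peek()
        if tok == "(":                      # F -> ( E ): F.val := E.val
            pos += 1
            val = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            pos += 1
            return val
        if tok is not None and tok.isdigit():
            pos += 1
            return int(tok)                 # F -> digit: F.val := digit.lexval
        raise SyntaxError(f"unexpected token {tok!r}")

    def term():
        nonlocal pos
        val = factor()                      # T -> F: T.val := F.val
        while peek() == "*":
            pos += 1
            val = val * factor()            # T -> T1 * F: T.val := T1.val * F.val
        return val

    def expr():
        nonlocal pos
        val = term()                        # E -> T: E.val := T.val
        while peek() == "+":
            pos += 1
            val = val + term()              # E -> E1 + T: E.val := E1.val + T.val
        return val

    result = expr()
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return result


print(evaluate("2 * 3 + 4"))  # L -> E: print(E.val)
```

For the input 2 * 3 + 4 the intermediate values match the annotated tree: T.val = 6 for 2 * 3, then E.val = 10 at the root.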

Semantics and Attributed Grammar
Semantics
  Give the meaning of the language
What is this?
  S -> ( T )
  T -> F T N | N
  F -> a! | b!
  N -> num
  Input: (a! b! a! 5 3 2 6)

With the attributed grammar

Production       Semantic rule
S -> ( T )       print(T.list)
T -> N           T.list := [N.val]
T -> F T1 N      if (F.op = a) T.list := append(T1.list, N.val)
                 if (F.op = b) T.list := removeLastN(T1.list, N.val)
F -> a!          F.op := a
F -> b!          F.op := b
N -> num         N.val := num

removeLastN: remove the last N.val elements from T1.list

Semantics and Attributed Grammar
Semantics
  Give the meaning of the language
What is this?
  S -> ( T )
  T -> F T N | N
  F -> a! | b!
  N -> num
  Input: (a! b! a! 5 3 2 6)

With the attributed grammar

Production       Semantic rule
S -> ( T )       print(T.val)
T -> F T1 N      T.val := T1.val F.op N.val
T -> N           T.val := N.val
F -> a!          F.op := +
F -> b!          F.op := *
N -> num         N.val := num

The attributed grammar defines the meaning of the production rules, which gives the meaning of the input program
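
To make the point concrete, the same parse of the grammar T -> F T N | N can be run under either attribution. The sketch below is an assumption-laden illustration: the names attribute, leaf, and combine are hypothetical, the parentheses of S -> ( T ) are assumed to be stripped already, and the parser is a simple recursive descent rather than a yacc-generated one.

```python
def attribute(tokens, leaf, combine):
    """Parse T -> F T1 N | N and compute T's attribute with the given rules."""
    pos = 0

    def parse_t():
        nonlocal pos
        tok = tokens[pos]
        if tok in ("a!", "b!"):            # T -> F T1 N
            pos += 1
            t1 = parse_t()
            n = int(tokens[pos])           # N.val := num
            pos += 1
            return combine(tok, t1, n)
        pos += 1                           # T -> N
        return leaf(int(tok))

    result = parse_t()
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return result


def list_rule(op, t1_list, n):
    # a!: append N.val; b!: remove the last N.val elements
    if op == "a!":
        return t1_list + [n]
    return t1_list[:max(0, len(t1_list) - n)]


def arith_rule(op, t1_val, n):
    # a! means +, b! means *
    return t1_val + n if op == "a!" else t1_val * n


tokens = "a! b! a! 5 3 2 6".split()
print(attribute(tokens, lambda n: [n], list_rule))   # list semantics
print(attribute(tokens, lambda n: n, arith_rule))    # arithmetic semantics
```

Under the list rules the input yields [6] (append 3, remove the last 2, append 6); under the arithmetic rules it yields ((5 + 3) * 2) + 6 = 22. The syntax is identical, and only the semantic rules differ.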

Build Parse Tree
Parse tree construction
Can be done with only synthesized attributes:
  child: first-child pointer
  sib: right-sibling pointer
  addr: parse tree node address

Production       Semantic rule
S -> A B         S.addr = new-node(S)
                 S.child = A.addr; A.sib = B.addr;
A -> C d E       A.addr = new-node(A); d.addr = new-node(d);
                 A.child = C.addr; C.sib = d.addr; d.sib = E.addr;
A -> F           A.addr = new-node(A)
                 A.child = F.addr;
B -> E F         B.addr = new-node(B)
                 B.child = E.addr; E.sib = F.addr

Attribute Grammar
Two types of attributes
  Synthesized attributes
    Attribute values are evaluated from bottom up
    The value of the parent is defined on the values of the children
  Inherited attributes
    Attribute values are evaluated from top down
    The value of a node is defined on the values of its parent and/or siblings

Attribute rules
  Only reference local information
  Only refer to symbols in the corresponding production

S-Attributed Grammar
Only has synthesized attributes
Suitable for shift-reduce parsing
Derive the parent's value from the children's values
Example: expression evaluation

Production       Semantic rule
L -> E           print(E.val)
E -> E1 + T      E.val := E1.val + T.val
E -> T           E.val := T.val
T -> T1 * F      T.val := T1.val * F.val
T -> F           T.val := F.val
F -> ( E )       F.val := E.val
F -> digit       F.val := digit.lexval

lexval: value from lex

[Figure: parse tree with E at the root, children E + T, and T * F and F -> digit subtrees below.]

Inherited Attributes
Example: adding type to the symbol table

Production       Semantic rule
D -> T L         L.type := T.type
T -> int         T.type := integer
T -> real        T.type := real
L -> L1 , id     L1.type := L.type; addtype(id.entry, L.type)
L -> id          addtype(id.entry, L.type)

L's attribute value type depends on its left sibling T's type
T's attribute type is still synthesized
id.entry is not defined here
  It is the entry to the symbol table
L1's attribute value type depends on its parent's type value
  type is an inherited attribute

Dependency Graph
Dependency specifies the evaluation order

Production       Semantic rule
D -> T L         L.type := T.type
T -> int         T.type := integer
T -> real        T.type := real
L -> L1 , id     L1.type := L.type; addtype(id.entry, L.type)
L -> id          addtype(id.entry, L.type)

Input: int id, id, id

[Figure: dependency graph on the parse tree of int id, id, id. T.type = int; L1.type = T.type; L2.type = L1.type; L3.type = L2.type; each id is entered with addtype(id.entry, Lk.type) using the type of its own L node.]
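
The top-down flow of the inherited attribute type can be sketched as a walk over the L chain. This is a minimal illustration under assumptions: declare and visit_L are hypothetical names, the symbol table is a plain dictionary, and the parse tree is represented only by the list of identifiers in source order.

```python
def declare(type_name, ids):
    """Evaluate 'T id, id, ...' by passing T.type down the L chain as L.type."""
    symtab = {}

    def visit_L(names, inherited_type):
        # L -> L1 , id : L1.type := L.type; addtype(id.entry, L.type)
        # L -> id      : addtype(id.entry, L.type)
        *rest, last = names
        if rest:
            visit_L(rest, inherited_type)   # L1 inherits the parent's type
        symtab[last] = inherited_type       # addtype(id.entry, L.type)

    t_type = {"int": "integer", "real": "real"}[type_name]  # T.type is synthesized
    visit_L(ids, t_type)                                     # D -> T L: L.type := T.type
    return symtab


print(declare("int", ["a", "b", "c"]))
```

Every identifier receives the type that was computed once at T and then copied from parent to child, which is the order the dependency graph prescribes.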

Dependency Graph
Evaluation based on the dependencies
Circularity check
  Dependency graph should be acyclic
Build parse tree
  First build the parse tree, then evaluate
  Cannot be done while parsing; requires a separate evaluation pass
Can we do better?
  L-attributed grammar
    Parser-stack-based technique
    Marker nonterminals (not covered)
  Rewrite grammar rules

Left-Attributed Grammar
L-attributed grammar
  May have synthesized attributes
  Has inherited attributes
  For any production A -> X1 X2 ... Xn:
    Any inherited attribute of Xk depends only on the attributes of X1, X2, ..., Xk-1 (all the symbols to its left) and/or the inherited attributes of A

The earlier inherited-attribute example is an L-attributed grammar

Production       Semantic rule
D -> T L         L.type := T.type
T -> int         T.type := integer
T -> real        T.type := real
L -> L1 , id     L1.type := L.type; addtype(id.entry, L.type)
L -> id          addtype(id.entry, L.type)

L-Attributed Grammar
What's the importance of L-attributed?
  The needed attribute value has already been evaluated
  But how to obtain the value even though it is already evaluated?
Generally, it is not possible for the start symbol to contribute to the attribute value

[Figure: parse tree of int id, id, id with T.type = int, L1.type = T.type, L2.type = L1.type, L3.type = L2.type, and addtype(id.entry, Lk.type) at each id. Callouts: for L1, the attribute value comes from the left sibling, which is already evaluated; for L2 and L3, the attribute value comes from the parent, more specifically from the parent's left sibling, which is also already evaluated.]

Technique based on Parser Stack
What's the importance of L-attributed?
  The needed attribute value has already been evaluated
  But how to obtain the value even though it is already evaluated?
    From the stack

Input: real p, q, r $

Input             Stack        Action
real p, q, r $                 shift
p, q, r $         real         reduce T -> real
p, q, r $         T            shift
, q, r $          T p          reduce L -> id
, q, r $          T L          shift
q, r $            T L ,        shift
, r $             T L , q      reduce L -> L , id
, r $             T L          shift
r $               T L ,        shift
$                 T L , r      reduce L -> L , id
$                 T L          reduce D -> T L
$                 D            accept

T.type = real; T is kept in the stack till the D reduction
Obtain L1.type by:
  + Knowing that T is below L in the stack
  + Knowing that L.type comes from T.type
  + Obtaining T.type directly
For addtype(id.entry, L.type) in L -> L1 , id, T is 3 below the top of the stack

Technique based on Parser Stack
L-attributed grammar

Production       Semantic rule
D -> T L         L.type = T.type
T -> int         T.type = integer
T -> real        T.type = real
L -> L1 , id     L1.type = L.type; addtype(id.entry, L.type)
L -> id          addtype(id.entry, L.type)

Convert to parser-stack-based attributes

Production       Semantic rule
D -> T L         -- no longer needs an attribute rule
T -> int         val[top] := integer
T -> real        val[top] := real
L -> L , id      addtype(val[top], val[top-3])
L -> id          addtype(val[top], val[top-1])
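
The stack-based rules can be sketched as a hand-written shift-reduce loop for this one grammar. The sketch makes several assumptions: declare_from_stack is a hypothetical name, the input is already tokenized, any token other than int, real, and the comma is treated as an id, and the reductions are hard-coded instead of being driven by a generated parse table.

```python
KEYWORD_TYPES = {"int": "integer", "real": "real"}


def declare_from_stack(tokens):
    """Parse 'T id , id ...' keeping attribute values on a stack parallel to the symbols."""
    symtab = {}
    sym, val = [], []                      # parallel stacks: grammar symbols and values
    for tok in tokens:
        if tok in KEYWORD_TYPES:
            sym.append("T")                # shift, then reduce T -> int | real
            val.append(KEYWORD_TYPES[tok]) # val[top] := integer or real
        elif tok == ",":
            sym.append(",")
            val.append(None)
        else:
            sym.append("id")
            val.append(tok)
            if sym[-4:] == ["T", "L", ",", "id"]:
                symtab[val[-1]] = val[-4]  # L -> L , id: addtype(val[top], val[top-3])
                del sym[-2:]               # pop ", id"; the L below stays
                del val[-2:]
            elif sym[-2:] == ["T", "id"]:
                symtab[val[-1]] = val[-2]  # L -> id: addtype(val[top], val[top-1])
                sym[-1] = "L"
                val[-1] = None
            else:
                raise SyntaxError(f"unexpected identifier {tok!r}")
    if sym != ["T", "L"]:                  # D -> T L must be the final reduction
        raise SyntaxError("incomplete declaration")
    return symtab


print(declare_from_stack("real p , q , r".split()))
```

No type attribute is ever copied onto L: the action simply reads T's value at a fixed distance below the top of the stack, which is why D -> T L no longer needs a rule.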

Inherited Attribute
Example: not an L-attributed grammar

Production       Semantic rule
D -> L : T       L.type = T.type
L -> L1 , id     L1.type = L.type; addtype(id.entry, L.type)
L -> id          addtype(id.entry, L.type)
T -> int         T.type = integer
T -> real        T.type = real

L.type depends on its right sibling T; the grammar is no longer L-attributed
Needs two-pass evaluation
  First build the parse tree
  Then evaluate

[Figure: parse tree for p, q, r : int, with the inherited attribute type flowing from T through L1, L2, L3 to the ids p, q, r.]

Rewrite Grammar
Example: the original grammar
  D -> L : T
  T -> int
  T -> real
  L -> L1 , id
  L -> id

Rewrite the grammar
  Force ids to go onto the stack
  This grammar will reduce nothing till T is present
  When reduction starts, type information is ready

Production       Semantic rule
D -> id L        id.type := L.type
L -> , id L1     L.type := L1.type; addtype(id.entry, L1.type)
L -> : T         L.type := T.type
T -> int         T.type := integer
T -> real        T.type := real

Input             Stack            Action
p, q : real $                      shift (D -> id L)
, q : real $      p                shift (L -> , id L)
q : real $        p ,              shift (L -> , id L)
: real $          p , q            shift (L -> : T)
real $            p , q :          shift (T -> real)
$                 p , q : real     reduce (T -> real)
$                 p , q : T        reduce (L -> : T)
$                 p , q L          reduce (L -> , id L)
$                 p L              reduce (D -> id L)
$                 D                accept

Issues in Attributed Grammar
Attribute rules are confined to local production info
  Only allow references to local symbols
  May have a lot of copying

Production       Semantic rule
A -> a B         A.u := B.u + h3(a); print(A.u, B.v);
B -> b C         B.u := C.u; B.v := C.v + g3(b)
C -> c D         C.u := D.u + h2(c); C.v := D.v
D -> d E F       D.u := E.u; D.v := F.v + g2(d)
E -> e           E.u := h1(e)
F -> f           F.v := g1(f)

Both u and v are synthesized attributes
  A.u is computed from C.u and E.u
  B.v is computed from D.v and F.v
  D.u := E.u, C.v := D.v, B.u := C.u are only for value passing

Can something be done to save the copying effort?
  Use global table
  Is global attribute worse than stack technique?

Issues in Attributed Grammar
Attribute rules are confined to local production info
Excessive copying due to production rules in the grammar

Production       Semantic rule
E -> E1 + T      E.val := E1.val + T.val
E -> T           E.val := T.val
T -> T1 * F      T.val := T1.val * F.val
T -> F           T.val := F.val
F -> ( E )       F.val := E.val
F -> digit       F.val := digit.lexval

E -> T and T -> F cause additional copying of the val attribute

Can something be done to save the copying effort?
  Use global table

Issues in Attributed Grammar
Attribute rules are confined to local production info
Not able to optimize
  Need to build the parse tree
  Need to have a separate optimization phase
    Requires tree traversal
    Reuse the result

[Figure: parse tree with E, T, and F nodes, the operators + and *, and the operand b.]

Attribute Grammars
Other works in semantic analysis
  Attribute grammar
    yacc
  Operational semantics
    Actions that are executed directly rather than by translation
    E.g., computation for expressions
  Denotational semantics
    A mathematical formalism for semantic specification
  Axiomatic semantics
    Mainly for program proof

Semantic Analysis -- Summary
Read Textbook
  Attributes and attribution rules
  Synthesized and inherited attributes
  S-attributed grammar
  L-attributed grammar
    Hack to the parser stack
    Rewrite grammar rules
  Efficiency issues -- Excessive copying
Intermediate code generation
  For various types of statements

Type Checking
Intermediate Code Generation

Symbol Table
Used by all phases of the compiler
A data structure to keep track of the binding of the identifiers
  At type checking time
    Keep track of identifier types
  At code generation time
    Determine memory allocation
    Keep track of starting address of the identifier (offsets)
Searched every time an identifier is encountered
Data structures used to implement symbol tables
  Binary tree
  Hash table

Symbol Table
Identifier categories
Variable names
Defined constants
Compiler generated temporary variables
Function names and parameter names
Labels
Type names

Attributes for the symbol table


Category
Data type: array, record, etc.
Other category dependent attributes
Label: location
Variable: type, address, etc.

Example: Attributes for Record/Structure


Attributes needed
Identifier name
Data type = record
Total size of the record
May be dynamic

Starting address
Element type = (e.g., integer for integer array)
Linked list of the fields
In each field, specify: name of the field, type of the field, size
Offset for each field
A field may itself be a record/structure

Type Checking
First handle type definitions
  Statement for basic type definitions (discussed earlier)

Production       Semantic rule
D -> id L        addtype(id.entry, L.type)
L -> , id L1     L.type := L1.type; addtype(id.entry, L1.type)
L -> : T         L.type := T.type
T -> int         T.type := integer
T -> real        T.type := real

Then perform type checking within statements

Type Definition
Basic Types
Integer, real, Boolean, char, etc.

Compound types
Pointers
Type = pointer (basic-type)

Array
Type = array (dimension)

Structure
Type = structure with the types of all fields

Function
Type = function with the types of all parameters


Type Checking
Type checking for the statements from bottom up

L -> S ; L1
  L.type := if (S.type = void and L1.type = void) then void else type error
S -> id := E
  S.type := if (id.type = E.type) then void else type error
S -> if E then S1
  S.type := if E.type = Boolean then S1.type else type error
S -> while E do S1
  S.type := if E.type = Boolean then S1.type else type error
S -> if E then S1 else S2
  S.type := if E.type = Boolean then ? else type error
  The best that can be done is to return the supertype of S1 and S2
  Requires run-time type checking if supported
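
The bottom-up rules for statements can be sketched as a recursive function over a small tuple-based syntax tree. Everything about the representation is an assumption made for illustration: the tuple tags, the function names check and expr_type, and the tiny expression language (numbers, identifiers, and a < comparison). The if-then-else case is left out because the slide leaves its result type open.

```python
VOID, ERROR = "void", "type error"


def expr_type(expr, env):
    """Return the type of an expression: ('num', n), ('id', name) or ('<', e1, e2)."""
    kind = expr[0]
    if kind == "num":
        return "integer"
    if kind == "id":
        return env.get(expr[1], ERROR)     # undeclared identifiers are errors
    if kind == "<":
        left, right = expr_type(expr[1], env), expr_type(expr[2], env)
        return "Boolean" if left == right != ERROR else ERROR
    raise ValueError(f"unknown expression {kind!r}")


def check(stmt, env):
    """Return S.type (void or type error) following the rules above."""
    kind = stmt[0]
    if kind == "assign":                   # S -> id := E
        _, name, expr = stmt
        return VOID if env.get(name) == expr_type(expr, env) else ERROR
    if kind in ("if", "while"):            # S -> if E then S1 | while E do S1
        _, cond, body = stmt
        return check(body, env) if expr_type(cond, env) == "Boolean" else ERROR
    if kind == "seq":                      # L -> S ; L1
        _, first, rest = stmt
        both_void = check(first, env) == VOID and check(rest, env) == VOID
        return VOID if both_void else ERROR
    raise ValueError(f"unknown statement {kind!r}")


env = {"x": "integer", "done": "Boolean"}
loop = ("while", ("<", ("id", "x"), ("num", 5)), ("assign", "x", ("num", 1)))
print(check(loop, env))
```

Here env plays the role of the symbol table filled in by the declaration rules; a condition that is not Boolean, or an assignment between different types, turns the whole statement into a type error.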

Type Checking
Type equivalence
  array(a,b)
  array(c,d)
    In some situations, like parameter passing, they are structurally equivalent
  struct A { int ai; char ac; }
  struct B { int bi; char bc; }
    A and B are structurally equivalent

Subtypes
Define subtype relation
e.g., A B: A is a subtype of B
A may be accepted anywhere B is expected

Intermediate Code Generation
Advantages
  Separates machine-independent and machine-dependent parts of the compiler
    Makes the front end of the compiler retargetable
  Optimization can be performed on intermediate code
    Can be represented in a form that allows easier analysis
Disadvantages
  Extra phase in the compilation
  If the target machine language is relatively high level, then this step can be skipped

Intermediate Code Generation
Common types of intermediate languages
  Abstract syntax tree
    Similar to parse tree, but without the grammar symbols (only tokens in the language)
  Three address code
    Similar to assembly code
    Supports some high-level concepts
    Bridges the gap between high-level language and assembly
    We consider this

Three Address Code Statements


Assignment with binary operator
x := y op z
op: arithmetic or logic operators

Assignment with unary operator


x := op y
op: minus sign, logic negation, shift, type conversion

Copy
x := y

Indexed assignment
x := y[i] or x[i] := y or x[i] = y[j]

Address and pointer assignment


x := &y or x := *y or *x := y

Three Address Code Statements
Unconditional jump
  goto L
Conditional jump
  if x rel-op y goto L
Procedure call
  Call: p(x1, x2, ..., xn)
  Represented as
    param x1
    param x2
    ...
    param xn
    call p, n
  n is the number of parameters

Handling Declaration
When entering a procedure
Create a child symbol table

Declaration statements
A new entry for each identifier declared
Compute the relative address of the identifier


Handling Declaration
Processing declarations in a basic block (e.g., procedure)

Production               Action
P -> L                   { offset := 0 }
L -> D L | D
D -> id : T ;            { add(id.name, T.type, offset);
                           offset := offset + T.width }
T -> integer             { T.type := integer; T.width := 4 }
T -> real                { T.type := real; T.width := 8 }
T -> array[num] of T1    { T.type := array(num.val, T1.type);
                           T.width := num.val * T1.width }
T -> ^T1                 { T.type := pointer(T1.type); T.width := 4 }

{ offset := 0 } initializes offset, but it will not be executed till the very end
offset is a global variable
  Easier to use and less copying
  Need to initialize outside
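
The offset computation can be sketched as follows. The widths (4 for integer and pointer, 8 for real) come from the actions above; the tuple encoding of types and the names width and layout are assumptions made for this illustration, and offset is initialized before the loop, which mirrors the remark that it must be initialized outside.

```python
def width(t):
    """Return T.width for a type encoded as a tuple."""
    kind = t[0]
    if kind == "integer":
        return 4
    if kind == "real":
        return 8
    if kind == "pointer":                  # T -> ^T1
        return 4
    if kind == "array":                    # T -> array[num] of T1
        _, count, element = t
        return count * width(element)
    raise ValueError(f"unknown type {kind!r}")


def layout(declarations):
    """Assign a relative address to each declared identifier, in order."""
    offset = 0                             # initialized outside the D actions
    table = {}
    for name, t in declarations:           # D -> id : T ;
        table[name] = (t, offset)          # add(id.name, T.type, offset)
        offset += width(t)                 # offset := offset + T.width
    return table, offset


decls = [
    ("i", ("integer",)),
    ("x", ("real",)),
    ("a", ("array", 10, ("integer",))),
    ("p", ("pointer", ("real",))),
]
table, total = layout(decls)
for name, (t, off) in table.items():
    print(name, off)
print("total", total)
```

For these four declarations the offsets are 0, 4, 12, and 52, and the block needs 56 bytes in total.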

Handling Statements
In declaration
Create entries in symbol table

In statements
Need to look up the symbol table
lookup(id.name)
Return the pointer to the id entry in the table
If not found, then return nil

Need to generate code


emit()
Output three address code

Sometimes, need to create new variables (temporary variables)


newtemp()
Insert the new variable into symbol table

Assignment Statement Translation
Bottom up translation

S -> id := E     { p := lookup(id.name);
                   if p != nil then emit(p := E.place) else error; }
E -> E1 + E2     { E.place := newtemp();
                   emit(E.place := E1.place + E2.place) }
E -> E1 * E2     { E.place := newtemp();
                   emit(E.place := E1.place * E2.place) }
E -> - E1        { E.place := newtemp();
                   emit(E.place := - E1.place) }
E -> ( E1 )      { E.place := E1.place; }
E -> id          { p := lookup(id.name);
                   if p != nil then E.place := p else error; }
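
A minimal sketch of these actions in Python, assuming the expression has already been parsed into nested tuples. The names translate, place, and newtemp mirror the slide, but the tuple encoding, the "neg" tag for unary minus, and the use of a Python set as the symbol table are assumptions of this sketch.

```python
import itertools


def translate(target, expr, symtab):
    """Emit three address code for 'target := expr' and return it as a list of strings."""
    code = []
    counter = itertools.count(1)

    def newtemp():
        name = f"t{next(counter)}"
        symtab.add(name)                   # temporaries also go into the symbol table
        return name

    def place(e):
        if isinstance(e, str):             # E -> id
            if e not in symtab:
                raise NameError(f"undeclared identifier {e!r}")
            return e
        op, *operands = e
        if op == "neg":                    # E -> - E1
            operand = place(operands[0])
            temp = newtemp()
            code.append(f"{temp} := - {operand}")
            return temp
        left = place(operands[0])          # E -> E1 + E2 or E1 * E2
        right = place(operands[1])
        temp = newtemp()
        code.append(f"{temp} := {left} {op} {right}")
        return temp

    if target not in symtab:               # S -> id := E
        raise NameError(f"undeclared identifier {target!r}")
    result = place(expr)
    code.append(f"{target} := {result}")
    return code


for line in translate("x", ("+", "a", ("*", "b", "c")), {"x", "a", "b", "c"}):
    print(line)
```

For x := a + b * c this emits t1 := b * c, then t2 := a + t1, then x := t2. Parenthesized subexpressions need no code of their own, matching the rule E.place := E1.place.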

Array Addressing
Array elements are stored consecutively
  Allows quick computation of the physical memory location
Compute the address of an array reference
  One-dimensional array A[n]
    Address of A[i]: base + (i - low) * w
    low: lower bound of the subscript
    w: width of each element
    Assume low = 0: address = base + i * w
  k-dimensional array A[n0, n1, ..., nk]
    Address of A[i0, i1, ..., ik]:
      base + ((...((i0*n1 + i1)*n2 + i2)...)*nk + ik) * w
    Assume low = 0
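
The nested formula is a Horner-style accumulation over the subscripts, which the following sketch makes explicit. The function name element_address and its parameters are hypothetical; it assumes row-major storage and a lower bound of 0 in every dimension, as on the slide.

```python
def element_address(base, dims, index, width):
    """Return base + ((...(i0*n1 + i1)*n2 + ...)*nk + ik) * width."""
    if len(dims) != len(index):
        raise ValueError("one subscript is required per dimension")
    offset = 0
    for n, i in zip(dims, index):
        if not 0 <= i < n:
            raise IndexError(f"subscript {i} out of bounds for size {n}")
        offset = offset * n + i            # the first step gives i0, since offset starts at 0
    return base + offset * width


print(element_address(100, (3, 4), (1, 2), 4))
```

For a 3 by 4 array of 4-byte elements starting at address 100, element [1, 2] lies at 100 + (1*4 + 2) * 4 = 124. Note that n0 never enters the offset; it is only needed for the bounds check.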

Array Addressing
Bottom up translation

S -> L := E           { if (L.array = true and E.array = false) then
                          emit(L.place [ L.offset ] := E.place); }
                        (3 other cases)
E -> E + E            {}
E -> ( E )            {}
E -> L | num          {}
L -> id
L -> A                { L.array := true;
                        L.place := A.place; L.offset := A.offset }
A -> Elist ]
Elist -> Elist1 , E
Elist -> id [ E

Array Addressing
Bottom up translation

L -> A                { L.array := true; L.offset := A.offset;
                        L.place := A.place; }
A -> Elist ]          { A.place := arrayPlace; A.offset := offset;
                        emit(offset := offset * elementWidth); }
Elist -> Elist1 , E   { Elist.dim := Elist1.dim + 1;
                        emit(offset := offset * arraySize[Elist.dim]);
                        emit(offset := offset + E.place); }
Elist -> id [ E       { Elist.dim := 0;
                        arrayPlace := lookup(id.name);
                        offset := newtemp();
                        p := lookup(id.name, arrayInfo);
                        arraySize := p.arraySize;
                        elementWidth := p.elementWidth;
                        emit(offset := E.place); }

arrayPlace, offset, arraySize, elementWidth are global variables
offset: computed dynamically
  Need a temp variable for it
  Initialize it to i0
Compute the new offset
  E.g., offset := i0 * n1
        offset := offset + i1
0 is the highest dim
The i-th element of arraySize is the size of the i-th dimension
elementWidth: size of each array element
Address: base + ((...((i0*n1 + i1)*n2 + i2)...)*nk + ik) * w

Conditional Branch
if E then S1 else S2
Evaluation of Boolean expression
Similar to evaluation of arithmetic expressions

Code generation for conditional branch


Two methods
Use label
Intermediate code supports labels
Facilitate control flow analysis and code optimization

Use actual location
  Same as machine code
  Loses the structure information, harder to do optimization

While loop, case statement
  Similar to conditional branch

Conditional Branch
if E then S1 else S2
if a < b then S1 else S2
Use label
  test := a < b
  if test goto L
  S2
  goto Lnext         -- skip S1 after S2
  L: S1
  Lnext:
  -- in case the expression is more complicated, separate the evaluation from the branch
  -- create label L in symbol table
  -- assign starting location for L

Use address
  Do not know how much space S2 requires
  Cannot do it in one pass
  Back-patching
    Keep the location requiring backpatching in a stack
    When the location information is available, come back and fill in the info
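
Back-patching can be sketched with instruction indices standing in for actual locations. This is an illustration under assumptions: gen_if_else is a hypothetical name, the code for S1 and S2 is given as ready-made lists of instructions, addresses are simply positions in the output list, and the layout (S2 first, then S1) follows the label version above.

```python
def gen_if_else(cond, then_code, else_code):
    """Lay out 'if cond then S1 else S2' with jump targets filled in afterwards."""
    code = []
    pending = []                           # stack of locations that still need a target

    def emit_jump(template):
        code.append(template)              # target not known yet
        pending.append(len(code) - 1)

    def backpatch():
        location = pending.pop()
        code[location] = code[location].format(len(code))

    emit_jump(f"if {cond} goto {{}}")      # jump to S1, whose address is unknown
    code.extend(else_code)                 # S2
    code.append("goto {}")                 # jump over S1, target unknown
    skip_location = len(code) - 1
    backpatch()                            # S1 starts here: fill in the conditional jump
    pending.append(skip_location)
    code.extend(then_code)                 # S1
    backpatch()                            # end of the statement: fill in the skip jump
    return code


for address, instruction in enumerate(gen_if_else("a < b", ["x := 1"], ["x := 2", "y := 3"])):
    print(address, instruction)
```

The conditional jump is emitted before the size of S2 is known and is completed only once S2 has been laid out, so a single pass over the statement suffices.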


Procedure Call
Just like a program
Handle declaration, but in a child symbol table
Search may go back to parent symbol table

Handle statements

Need to handle parameters
  Type checking of the parameters: done in the type checking phase
  Create new variables for the parameters and allocate space
  Copy the values of the input parameters from parent table to local table
  Record current instruction location
Need to handle return
  Copy the output parameters from local table to parent table
  Jump back to recorded instruction location

Intermediate Code Generation -- Summary
Read Textbook