
Syntax-Directed Translation

1. Grammar symbols are associated with attributes to associate information with the programming language constructs that they represent.
2. Values of these attributes are evaluated by the semantic rules associated with the production rules.
3. Evaluation of these semantic rules:
   - may generate intermediate code
   - may put information into the symbol table
   - may perform type checking
   - may issue error messages
   - may perform some other activities; in fact, they may perform almost any activity.
4. An attribute may hold almost anything: a string, a number, a memory location, a complex record.
5. When we associate semantic rules with productions, we use two notations:
   - Syntax-Directed Definitions
   - Translation Schemes

Syntax-Directed Definitions:
   - give high-level specifications for translations
   - hide many implementation details, such as the order of evaluation of semantic actions
   - associate a production rule with a set of semantic actions, without saying when they will be evaluated.

Translation Schemes:
   - indicate the order of evaluation of the semantic actions associated with a production rule
   - in other words, give a little information about implementation details.

Syntax-Directed Definitions

1. A syntax-directed definition is a generalization of a context-free grammar in which:
   - Each grammar symbol is associated with a set of attributes. This set of attributes for a grammar symbol is partitioned into two subsets, called the synthesized and inherited attributes of that grammar symbol.
   - Each production rule is associated with a set of semantic rules.
   Semantic rules set up dependencies between attributes, which can be represented by a dependency graph.
2. This dependency graph determines the evaluation order of these semantic rules.
3. Evaluation of a semantic rule defines the value of an attribute. But a semantic rule may also have some side effects, such as printing a value.
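The way a dependency graph fixes an evaluation order can be sketched with a topological sort. The following is a minimal Python sketch, assuming an illustrative production E → E1 + T with the rule E.val = E1.val + T.val, followed by a print of E.val; the dictionary layout and the function name evaluation_order are assumptions made for this illustration only.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each key is an attribute instance (or a
# side-effecting rule), each value is the set of attributes it depends on.
dependencies = {
    "E1.val": set(),
    "T.val": set(),
    "E.val": {"E1.val", "T.val"},   # E.val = E1.val + T.val
    "print(E.val)": {"E.val"},      # side effect that needs E.val first
}


def evaluation_order(graph):
    """Return one order in which every attribute follows its dependencies."""
    return list(TopologicalSorter(graph).static_order())


order = evaluation_order(dependencies)
print(order)
```

Any order produced this way evaluates E1.val and T.val before E.val. If the graph contained a cycle, graphlib would raise CycleError, which corresponds to a definition with no valid evaluation order.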

[Figure: Dependency graph for synthesized attributes]

[Figure: A dependency graph with inherited attributes]

Annotated Parse Tree

1. A parse tree showing the values of attributes at each node is called an annotated parse tree.
2. The process of computing the attribute values at the nodes is called annotating (or decorating) the parse tree. Of course, the order of these computations depends on the dependency graph induced by the semantic rules.

[Figure: Annotated Parse Tree Example]

Syntax-Directed Definition

In a syntax-directed definition, each production A → α is associated with a set of semantic rules of the form:

   b = f(c1, c2, ..., cn)

where f is a function and b can be one of the following:
   - b is a synthesized attribute of A, and c1, c2, ..., cn are attributes of the grammar symbols in the production (A → α), OR
   - b is an inherited attribute of one of the grammar symbols in α (on the right side of the production), and c1, c2, ..., cn are attributes of the grammar symbols in the production (A → α).

Attribute Grammar

So, a semantic rule b = f(c1, c2, ..., cn) indicates that the attribute b depends on attributes c1, c2, ..., cn. In a syntax-directed definition, a semantic rule may just evaluate the value of an attribute, or it may have side effects such as printing values. An attribute grammar is a syntax-directed definition in which the functions in the semantic rules cannot have side effects (they can only evaluate values of attributes).

Syntax-Directed Definition Example

   Production         Semantic Rules
   L → E return       print(E.val)
   E → E1 + T         E.val = E1.val + T.val
   E → T              E.val = T.val
   T → T1 * F         T.val = T1.val * F.val
   T → F              T.val = F.val
   F → ( E )          F.val = E.val
   F → digit          F.val = digit.lexval

1. Symbols E, T, and F are associated with a synthesized attribute val.
2. The token digit has a synthesized attribute lexval (it is assumed that it is evaluated by the lexical analyzer).

Syntax-Directed Definition Example2

   Production         Semantic Rules
   E → E1 + T         E.loc = newtemp(), E.code = E1.code || T.code || add E1.loc,T.loc,E.loc
   E → T              E.loc = T.loc, E.code = T.code
   T → T1 * F         T.loc = newtemp(), T.code = T1.code || F.code || mult T1.loc,F.loc,T.loc
   T → F              T.loc = F.loc, T.code = F.code
   F → ( E )          F.loc = E.loc, F.code = E.code
   F → id             F.loc = id.name, F.code = "" (the empty string)
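The loc and code rules of the second example can be tried out on a small expression tree. This is a minimal Python sketch under stated assumptions: the tuple encoding of the tree, the helper make_newtemp, and the function translate are hypothetical names for this illustration, and code is kept as a list of instruction strings rather than one concatenated string.

```python
import itertools


def make_newtemp():
    """Return a newtemp() function that yields t1, t2, ... on successive calls."""
    counter = itertools.count(1)
    return lambda: f"t{next(counter)}"


def translate(node, newtemp):
    """Return the pair (loc, code) for an expression tree node.

    A node is an identifier name (str) or a tuple (op, left, right)
    with op being '+' or '*'.
    """
    if isinstance(node, str):
        # F -> id : F.loc = id.name, F.code = "" (empty)
        return node, []
    op, left, right = node
    left_loc, left_code = translate(left, newtemp)
    right_loc, right_code = translate(right, newtemp)
    loc = newtemp()  # E.loc = newtemp() or T.loc = newtemp()
    mnemonic = {"+": "add", "*": "mult"}[op]
    # code = left.code || right.code || one new instruction
    code = left_code + right_code + [f"{mnemonic} {left_loc},{right_loc},{loc}"]
    return loc, code


# a + b * c
loc, code = translate(("+", "a", ("*", "b", "c")), make_newtemp())
print(loc)
print("\n".join(code))
```

Both attributes are synthesized, so each node only needs the values returned by its children.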

1. Symbols E, T, and F are associated with synthesized attributes loc and code.
2. The token id has a synthesized attribute name (it is assumed that it is evaluated by the lexical analyzer).
3. It is assumed that || is the string concatenation operator.

Syntax-Directed Definition Inherited Attributes

   Production         Semantic Rules
   D → T L            L.in = T.type
   T → int            T.type = integer
   T → real           T.type = real
   L → L1 id          L1.in = L.in, addtype(id.entry,L.in)
   L → id             addtype(id.entry,L.in)

1. Symbol T is associated with a synthesized attribute type.
2. Symbol L is associated with an inherited attribute in.

Syntax Trees

1. Decoupling translation from parsing: trees.
2. Syntax tree: an intermediate representation of the compiler's input.
3. Example procedures: mknode, mkleaf
4. Employment of the synthesized attribute nptr (pointer)

   PRODUCTION         SEMANTIC RULE
   E → E1 + T         E.nptr = mknode('+', E1.nptr, T.nptr)
   E → E1 - T         E.nptr = mknode('-', E1.nptr, T.nptr)
   E → T              E.nptr = T.nptr
   T → ( E )          T.nptr = E.nptr
   T → id             T.nptr = mkleaf(id, id.lexval)
   T → num            T.nptr = mkleaf(num, num.val)

Draw the syntax tree for a-4+c.
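The construction of the syntax tree for a-4+c can be traced with a few calls to mknode and mkleaf. The sketch below is a minimal Python illustration; the Node record, its field names, and the helper show are assumptions made here, and the calls are written in the order in which a bottom-up evaluation of the rules above would make them.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Node:
    label: str                            # operator, or leaf kind ("id"/"num")
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    value: Union[str, int, None] = None   # lexical value for leaves


def mknode(op, left, right):
    """Create an interior node for a binary operator."""
    return Node(op, left, right)


def mkleaf(kind, value):
    """Create a leaf for an identifier or a number."""
    return Node(kind, value=value)


# a - 4 + c, with - and + left-associative as in E -> E1 + T | E1 - T
p1 = mkleaf("id", "a")
p2 = mkleaf("num", 4)
p3 = mknode("-", p1, p2)
p4 = mkleaf("id", "c")
p5 = mknode("+", p3, p4)   # root of the syntax tree


def show(node):
    """Render the tree as a fully parenthesized string."""
    if node.left is None and node.right is None:
        return str(node.value)
    return f"({show(node.left)} {node.label} {show(node.right)})"


print(show(p5))
```

The root is the + node, its left child is the - node over a and 4, and its right child is the leaf for c.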

Directed Acyclic Graphs for Expressions

S-Attributed Definitions

1. Syntax-directed definitions are used to specify syntax-directed translations.
2. Creating a translator for an arbitrary syntax-directed definition can be difficult.
3. We would like to evaluate the semantic rules during parsing (i.e. in a single pass, we parse and also evaluate the semantic rules).
4. We will look at two sub-classes of syntax-directed definitions:
   - S-Attributed Definitions: only synthesized attributes are used in the syntax-directed definition.
   - L-Attributed Definitions: in addition to synthesized attributes, inherited attributes may also be used in a restricted fashion.
   For both S-Attributed and L-Attributed Definitions, the semantic rules can be evaluated in a single pass during parsing.
5. Implementations of S-Attributed Definitions are a little easier than implementations of L-Attributed Definitions.

L-Attributed Definitions

S-Attributed Definitions can be efficiently implemented. We are looking for a larger subset of syntax-directed definitions (larger than S-Attributed Definitions) that can also be efficiently evaluated: L-Attributed Definitions. L-Attributed Definitions can always be evaluated by a depth-first visit of the parse tree.

This means that they can also be evaluated during parsing. A syntax-directed definition is L-attributed if each inherited attribute of Xj, where 1 ≤ j ≤ n, on the right side of A → X1 X2 ... Xn depends only on:

1. the attributes of the symbols X1, ..., Xj-1 to the left of Xj in the production, and
2. the inherited attributes of A.

Every S-attributed definition is L-attributed, since the restrictions apply only to inherited attributes (not to synthesized attributes).

Translation Schemes

In a syntax-directed definition, we do not say anything about the evaluation times of the semantic rules (that is, when the semantic rules associated with a production should be evaluated). A translation scheme is a context-free grammar in which:
   - attributes are associated with the grammar symbols, and
   - semantic actions enclosed between braces {} are inserted within the right sides of productions.

Ex: A → { ... } X { ... } Y { ... }     (the parts in braces are semantic actions)

When designing a translation scheme, some restrictions should be observed to ensure that an attribute value is available when a semantic action refers to that attribute. These restrictions (motivated by L-attributed definitions) ensure that a semantic action does not refer to an attribute that has not yet been computed. In translation schemes, we use the term semantic action instead of the term semantic rule used in syntax-directed definitions. The position of the semantic action on the right side indicates when that semantic action will be evaluated.

Translation Schemes for S-attributed Definitions

If our syntax-directed definition is S-attributed, the construction of the corresponding translation scheme is simple. Each associated semantic rule in an S-attributed syntax-directed definition is inserted as a semantic action at the end of the right side of the associated production.

   Production         Semantic Rule
   E → E1 + T         E.val = E1.val + T.val          (a production of a syntax-directed definition)

   E → E1 + T { E.val = E1.val + T.val }              (the production of the corresponding translation scheme)
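Placing each action at the end of the right side means the action runs only after the attributes of every symbol in that production are known, which amounts to a post-order walk of the parse tree. A minimal Python sketch, assuming the expression grammar with the val attribute from the earlier example; the tuple encoding of parse-tree nodes and the names ACTIONS and evaluate are illustrative assumptions, not part of the notes.

```python
# One action per production, applied to the val attributes of the children.
ACTIONS = {
    "E -> E + T": lambda e1, t: e1 + t,   # { E.val = E1.val + T.val }
    "E -> T": lambda t: t,                # { E.val = T.val }
    "T -> T * F": lambda t1, f: t1 * f,   # { T.val = T1.val * F.val }
    "T -> F": lambda f: f,                # { T.val = F.val }
    "F -> ( E )": lambda e: e,            # { F.val = E.val }
    "F -> digit": lambda lexval: lexval,  # { F.val = digit.lexval }
}


def evaluate(node):
    """Compute val for a parse-tree node; children are visited first."""
    if isinstance(node, int):
        return node                       # token digit: lexval from the lexer
    production, *children = node
    child_vals = [evaluate(child) for child in children]
    # The action sits at the end of the right side, so it runs last.
    return ACTIONS[production](*child_vals)


# Parse tree for 3 * 5 + 4 (punctuation tokens are omitted from the tuples)
tree = (
    "E -> E + T",
    ("E -> T",
     ("T -> T * F",
      ("T -> F", ("F -> digit", 3)),
      ("F -> digit", 5))),
    ("T -> F", ("F -> digit", 4)),
)
print(evaluate(tree))
```

A bottom-up parser would perform the same computation on its value stack, running each action when it reduces by the corresponding production.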

A Translation Scheme Example

A simple translation scheme that converts infix expressions to the corresponding postfix expressions.

   E → T R
   R → + T { print('+') } R1
   R → ε
   T → id { print(id.name) }

   a+b+c  →  ab+c+
   (infix expression → postfix expression)
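The scheme above maps directly onto a recursive-descent translator: one function per non-terminal, with each action executed at the point where it appears in the right side. A minimal Python sketch, assuming single-character tokens and collecting the output in a list instead of printing; the function name to_postfix and the error handling are assumptions for this illustration.

```python
def to_postfix(tokens):
    """Translate a list of tokens (identifiers and '+') to a postfix string."""
    out = []
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def T():
        # T -> id { print(id.name) }
        nonlocal pos
        name = peek()
        if name is None or not name.isalnum():
            raise SyntaxError(f"identifier expected at position {pos}")
        pos += 1
        out.append(name)

    def R():
        # R -> + T { print('+') } R1  |  epsilon
        nonlocal pos
        if peek() == "+":
            pos += 1
            T()
            out.append("+")   # the action sits between T and R1
            R()

    # E -> T R
    T()
    R()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at position {pos}")
    return "".join(out)


print(to_postfix(list("a+b+c")))
```

Because the action print('+') is placed after T and before R1, each operator is emitted right after its second operand, which is what yields postfix order.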

Inherited Attributes in Translation Schemes

If a translation scheme has to contain both synthesized and inherited attributes, we have to observe the following rules:

1. An inherited attribute of a symbol on the right side of a production must be computed in a semantic action before that symbol.
2. A semantic action must not refer to a synthesized attribute of a symbol to the right of that semantic action.
3. A synthesized attribute for the non-terminal on the left can only be computed after all attributes it references have been computed (we normally put this semantic action at the end of the right side of the production).

With an L-attributed syntax-directed definition, it is always possible to construct a corresponding translation scheme which satisfies these three conditions. (This may not be possible for a general syntax-directed translation.)

A Translation Scheme with Inherited Attributes

   D → T id { addtype(id.entry,T.type), L.in = T.type } L
   T → int { T.type = integer }
   T → real { T.type = real }
   L → id { addtype(id.entry,L.in), L1.in = L.in } L1
   L → ε

This is a translation scheme for an L-attributed definition.
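In a recursive-descent implementation of this scheme, an inherited attribute becomes a function parameter and a synthesized attribute becomes a return value. A minimal Python sketch, assuming whitespace-separated tokens such as "real x y z" and a plain dictionary as the symbol table; translate_declaration and the nested helper names are assumptions for this illustration, and identifier tokens are not validated.

```python
def translate_declaration(tokens):
    """Run the translation scheme on a token list and return the symbol table."""
    symtab = {}
    pos = 0

    def next_token():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        token = tokens[pos]
        pos += 1
        return token

    def addtype(name, type_):
        symtab[name] = type_

    def T():
        # T -> int { T.type = integer } | real { T.type = real }
        token = next_token()
        if token == "int":
            return "integer"
        if token == "real":
            return "real"
        raise SyntaxError(f"type expected, got {token!r}")

    def L(inherited_type):
        # L -> id { addtype(id.entry, L.in), L1.in = L.in } L1 | epsilon
        if pos < len(tokens):
            addtype(next_token(), inherited_type)
            L(inherited_type)   # L1.in = L.in is passed down as the argument

    def D():
        # D -> T id { addtype(id.entry, T.type), L.in = T.type } L
        t_type = T()
        addtype(next_token(), t_type)
        L(t_type)

    D()
    return symtab


print(translate_declaration("real x y z".split()))
```

The value of L.in is available before L is parsed because the action that sets it appears to the left of L in the production, as rule 1 above requires.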

What is Three Address Code?

- A statement of the form x = y op z is a three address statement. x, y and z are the three operands and op is any logical or arithmetic operator.
- Here "three address" refers to the three addresses in the statement, viz. the addresses of x, y and z.
- For example, for the statement a = b + c * d the three address code is:
     t1 = c * d
     a = b + t1

Types: There are different types of three address statements. Some of them are:

1. Assignment statements of the form x := y op z, where op is a binary arithmetic or logical operation.
   Assignment instructions of the form x := op y, where op is a unary operation such as unary plus, unary minus, shift, etc.
2. Copy statements of the form x := y, where the value of y is assigned to x.
3. Unconditional jump goto L. The three address statement with label L is the next to be executed.
   Conditional jumps such as if x relop y goto L. This instruction applies a relational operator (<, >, <=, >=) to x and y and executes the statement with label L next if the relation holds. Otherwise the statement following if x relop y goto L is executed.
4. param x and call p,n for procedure calls, and return y, where y represents an (optional) returned value. The three address statements
     param x1
     param x2
     param x3
     ...
     param xn
     call p,n
   are generated as part of the three address code for a call of the procedure p(x1,x2,x3,...,xn), where n is the number of arguments passed to the procedure.
5. Indexed assignments of the form x := y[i] and x[i] := y. The first of these sets x to the value in the location i memory units beyond location y. The statement x[i] := y sets the contents of the location i units beyond x to the value of y.
6. Address and pointer assignments of the form x := &y, x := *y and *x := y. The first of these sets the value of x to be the location of y.
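The meaning of the assignment, copy and jump statements can be made concrete with a tiny interpreter. This is a minimal Python sketch under stated assumptions: statements are encoded as tuples, labels are simply list indices, and only a few operators are supported; the names run, ARITH and RELOP are hypothetical and cover only statement types 1 to 3 above.

```python
import operator

ARITH = {"+": operator.add, "-": operator.sub, "*": operator.mul}
RELOP = {"<": operator.lt, ">": operator.gt, "<=": operator.le, ">=": operator.ge}


def run(program, env):
    """Execute a list of three-address statements and return the environment."""

    def value(operand):
        # Names are looked up in env; anything else is a literal constant.
        return env[operand] if isinstance(operand, str) else operand

    pc = 0
    while pc < len(program):
        stmt = program[pc]
        kind = stmt[0]
        pc += 1                               # default: fall through
        if kind == "binop":                   # x := y op z
            _, x, y, op, z = stmt
            env[x] = ARITH[op](value(y), value(z))
        elif kind == "copy":                  # x := y
            _, x, y = stmt
            env[x] = value(y)
        elif kind == "goto":                  # goto L
            pc = stmt[1]
        elif kind == "if":                    # if x relop y goto L
            _, x, relop, y, target = stmt
            if RELOP[relop](value(x), value(y)):
                pc = target
        else:
            raise ValueError(f"unknown statement kind: {kind}")
    return env


# a = b + c * d  becomes  t1 = c * d ; a = b + t1
straight_line = [("binop", "t1", "c", "*", "d"), ("binop", "a", "b", "+", "t1")]
print(run(straight_line, {"b": 2, "c": 3, "d": 4})["a"])

# s = 1 + 2 + ... + n, written with a conditional and an unconditional jump
sum_program = [
    ("copy", "s", 0),
    ("copy", "i", 1),
    ("if", "i", ">", "n", 6),
    ("binop", "s", "s", "+", "i"),
    ("binop", "i", "i", "+", 1),
    ("goto", 2),
]
print(run(sum_program, {"n": 4})["s"])
```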

Implementations of Three-Address Statements


A three-address statement is an abstract form of intermediate code. In a compiler, these statements can be implemented as records with fields for the operator and the operands. Three such representations are quadruples, triples, and indirect triples.

Quadruples
A quadruple is a record structure with four fields, which we call op, arg1, arg2, and result. The op field contains an internal code for the operator. The three-address statement x := y op z is represented by placing y in arg1, z in arg2, and x in result. Statements with unary operators like x := -y or x := y do not use arg2. Operators like param use neither arg2 nor result. Conditional and unconditional jumps put the target label in result. The quadruples in the table below are for the assignment a := b * -c + b * -c. The contents of fields arg1, arg2, and result are normally pointers to the symbol-table entries for the names represented by these fields. If so, temporary names must be entered into the symbol table as they are created.

        op       arg1   arg2   result
   (0)  uminus   c             t1
   (1)  *        b      t1     t2
   (2)  uminus   c             t3
   (3)  *        b      t3     t4
   (4)  +        t2     t4     t5
   (5)  :=       t5            a

Triples

To avoid entering temporary names into the symbol table, we might refer to a temporary value by the position of the statement that computes it. If we do so, three-address statements can be represented by records with only three fields: op, arg1 and arg2. The fields arg1 and arg2, for the arguments of op, are either pointers to the symbol table (for programmer-defined names or constants) or pointers into the triple structure (for temporary values). Since three fields are used, this intermediate code format is known as triples.

        op       arg1   arg2
   (0)  uminus   c
   (1)  *        b      (0)
   (2)  uminus   c
   (3)  *        b      (2)
   (4)  +        (1)    (3)
   (5)  :=       a      (4)
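The relationship between the two representations can be checked by converting quadruples for a := b * -c + b * -c into triples. A minimal Python sketch, assuming that every quadruple other than := writes its result to a temporary; the record types Quad and Triple and the function quads_to_triples are names chosen for this illustration. Integers stand for pointers into the triple structure, strings for symbol-table names.

```python
from collections import namedtuple

Quad = namedtuple("Quad", "op arg1 arg2 result")
Triple = namedtuple("Triple", "op arg1 arg2")

quads = [
    Quad("uminus", "c", None, "t1"),
    Quad("*", "b", "t1", "t2"),
    Quad("uminus", "c", None, "t3"),
    Quad("*", "b", "t3", "t4"),
    Quad("+", "t2", "t4", "t5"),
    Quad(":=", "t5", None, "a"),
]


def quads_to_triples(quads):
    """Replace each temporary name by the index of the triple computing it."""
    position = {}   # temporary name -> index of the triple that computes it
    triples = []

    def ref(arg):
        # Temporaries become integer positions; other names stay as they are.
        return position.get(arg, arg)

    for quad in quads:
        if quad.op == ":=":
            # The target name moves to arg1, the assigned value to arg2.
            triples.append(Triple(":=", quad.result, ref(quad.arg1)))
        else:
            triples.append(Triple(quad.op, ref(quad.arg1), ref(quad.arg2)))
            position[quad.result] = len(triples) - 1
    return triples


triples = quads_to_triples(quads)
for index, triple in enumerate(triples):
    print(f"({index})", triple.op, triple.arg1, triple.arg2)
```

The temporaries t1 to t5 never appear in the result, which is the point of the triple format: no temporary names need to be entered into the symbol table.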

Indirect Triples
Another implementation of three-address code that has been considered is that of listing pointers to triples, rather than listing the triples themselves. This implementation is naturally called indirect triples. For example, let us use an array statement to list pointers to triples in the desired order. Then the triples for a := b * -c + b * -c might be represented as follows.

   statement
   (0)  (11)
   (1)  (12)
   (2)  (13)
   (3)  (14)
   (4)  (15)
   (5)  (16)

         op       arg1   arg2
   (11)  uminus   c
   (12)  *        b      (11)
   (13)  uminus   c
   (14)  *        b      (13)
   (15)  +        (12)   (14)
   (16)  :=       a      (15)
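One consequence of the extra level of indirection is that statements can be reordered by permuting the statement array alone, without renumbering any reference inside the triples. A minimal Python sketch, assuming the triple numbers (11) to (16) from the table above and integer values for b and c; the dictionary layout and the function execute are assumptions for this illustration.

```python
# Triples keyed by their number; integer arguments point to other triples,
# string arguments are programmer-defined names.
triples = {
    11: ("uminus", "c", None),
    12: ("*", "b", 11),
    13: ("uminus", "c", None),
    14: ("*", "b", 13),
    15: ("+", 12, 14),
    16: (":=", "a", 15),
}

statements = [11, 12, 13, 14, 15, 16]   # the statement array


def execute(statements, triples, env):
    """Evaluate the triples in the order given by the statement array."""
    results = {}                         # triple number -> computed value

    def value(arg):
        return results[arg] if isinstance(arg, int) else env[arg]

    for number in statements:
        op, arg1, arg2 = triples[number]
        if op == "uminus":
            results[number] = -value(arg1)
        elif op == "*":
            results[number] = value(arg1) * value(arg2)
        elif op == "+":
            results[number] = value(arg1) + value(arg2)
        elif op == ":=":
            env[arg1] = value(arg2)      # arg1 names the target variable
        else:
            raise ValueError(f"unknown operator: {op}")
    return env


print(execute(statements, triples, {"b": 3, "c": 2})["a"])

# Moving (13) and (14) to the front only changes the statement array.
reordered = [13, 14, 11, 12, 15, 16]
print(execute(reordered, triples, {"b": 3, "c": 2})["a"])
```

With plain triples the same reordering would change the positions of the statements, so every reference to a moved triple would have to be updated.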

The difference between triples and quadruples may be regarded as a matter of how much indirection is present in the representation. When we ultimately produce target code, each name, temporary or programmer-defined, will be assigned some run-time memory location. This location will be placed in the symbol-table entry for the datum. Using the quadruple notation, a three-address statement defining or using a temporary can immediately access the location for that temporary via the symbol table.

Syntax-Directed Translation Into Three-Address Code

In general, when generating three-address statements, the compiler has to create new temporary variables (temporaries) as needed. We use a function newtemp( ) that returns a new temporary each time it is called. (Recall Topic-2 when working through this topic.) The syntax-directed definition for E in a production id := E has two attributes:

1. E.place - the location (variable name or offset) that holds the value corresponding to the nonterminal
2. E.code - the sequence of three-address statements representing the code for the nonterminal

   term ::= ID              { term.place := ID.place ; term.code := "" }
   term1 ::= term2 * ID     { term1.place := newtemp( ) ;
                              term1.code := term2.code || ID.code || gen(term1.place ':=' term2.place '*' ID.place) }
   expr ::= term            { expr.place := term.place ; expr.code := term.code }
   expr1 ::= expr2 + term   { expr1.place := newtemp( ) ;
                              expr1.code := expr2.code || term.code || gen(expr1.place ':=' expr2.place '+' term.place) }

Syntax tree vs. Three address code

Expression: (A+B*C) + (-B*A) B

Three address code is a linearized representation of a syntax tree (or a DAG) in which explicit names (temporaries) correspond to the interior nodes of the graph.

DAG vs. Three address code

Expression: D = ((A+B*C) + (A*B*C)) / -C
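The correspondence between interior nodes and temporaries can be sketched by walking an expression tree and emitting one statement per interior node. In this minimal Python sketch the tuple encoding of the tree, the function gen_three_address, and the flag share_common are assumptions for illustration; with the flag set, identical subexpressions reuse one temporary, which mimics generating code from a DAG instead of a tree. A*B*C is assumed to group as (A*B)*C.

```python
def gen_three_address(target, expr, share_common=False):
    """Return three-address statements that assign expr to target.

    expr is a name (str), a unary node ("uminus", operand),
    or a binary node (op, left, right).
    """
    code = []
    memo = {}     # (op, operand places...) -> temporary holding that value
    count = 0

    def place(node):
        nonlocal count
        if isinstance(node, str):
            return node                    # leaves need no temporary
        op, *operands = node
        key = (op, *[place(operand) for operand in operands])
        if share_common and key in memo:
            return memo[key]               # DAG: reuse the existing node
        count += 1
        temp = f"t{count}"
        if len(key) == 2:
            code.append(f"{temp} = {op} {key[1]}")
        else:
            code.append(f"{temp} = {key[1]} {op} {key[2]}")
        memo[key] = temp
        return temp

    result = place(expr)
    code.append(f"{target} = {result}")
    return code


# D = ((A+B*C) + (A*B*C)) / -C
expr = ("/",
        ("+", ("+", "A", ("*", "B", "C")), ("*", ("*", "A", "B"), "C")),
        ("uminus", "C"))
print("\n".join(gen_three_address("D", expr)))

# a := b * -c + b * -c has a repeated subexpression that a DAG shares
repeated = ("+", ("*", "b", ("uminus", "c")), ("*", "b", ("uminus", "c")))
print("\n".join(gen_three_address("a", repeated, share_common=True)))
```

Under the grouping assumed here, the expression for D has no repeated interior node, so the tree and the DAG give the same statements; the second expression shrinks from six statements to four when sharing is enabled.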
