
Chapter 3

Describing Syntax and Semantics


CS 350 Programming Language Design
Indiana University Purdue University
Fort Wayne

Chapter 3 Topics
Introduction
The General Problem of Describing Syntax
Formal Methods of Describing Syntax
Attribute Grammars
Describing the Meanings of Programs: Dynamic
Semantics

Introduction
Who must use language definitions?
Language designers
Implementors
Programmers (the users of the language)

Syntax
The form or structure of the expressions, statements, and
program units
Defines what is grammatically correct

Semantics
The meaning of the expressions, statements, and program units

Describing syntax is easier than describing semantics


3

Some definitions
A sentence is a string of characters over some alphabet
A language is a set of valid sentences
The syntax rules of the language specify which strings of
characters are valid sentences

A lexeme is the lowest level syntactic unit of a language


For example: sum, +, 1234

A token is a category of lexemes


For example: identifier, plus_op, int_literal
Each token may be described by separate syntax rules

Thus we may think of sentences as strings of lexemes rather than as strings of characters
4

Describing syntax
Syntax may be formally described using
recognition or generation
Recognition involves a recognition device R
Given an input string, R either accepts the string as
valid or rejects it
R is only used in trial-and-error mode
A recognizer is not effective in enumerating all
sentences in a language

Languages are usually infinite

The syntax analyzer part of a compiler (the parser) is a recognizer
5

Describing syntax
Generation
A language generator generates the sentences of a
language
A grammar is a language generator
One can determine if a string is a sentence by
comparing it with the structure given by a generator

Formal methods for describing syntax


Noam Chomsky and John Backus independently
developed similar formalisms in the 1950s
In the mid-1950s, Chomsky identified four classes of
grammars for studying linguistics
Regular grammars
Recognizer: Deterministic Finite Automaton (DFA)

Context-free grammars
Recognizer: Push-down automaton

Context-sensitive grammars
Recognizer: Linear-bounded automaton

Phrase structure grammars
Recognizer: Turing machine

The first (regular grammars) is useful for describing tokens
Most programming languages can be described by the second (context-free grammars)

Formal methods for describing syntax


Context-Free Grammar (CFG)
A language generator
Not powerful enough to describe syntax of natural
languages
Defines a class of languages called context-free languages

Backus-Naur Form (BNF)


Presented in 1959 by John Backus to describe Algol 58
Notation was slightly improved by Peter Naur
BNF is equivalent to Chomsky's context-free grammars
8

Formal methods for describing syntax


A meta-language is a language used to describe
another language
BNF is a meta-language for programming
languages
In BNF . . .
A terminal symbol is used to represent a lexeme or a
token
A nonterminal symbol is used to represent a syntactic
class
A production rule defines one nonterminal symbol in
terms of terminal symbols and other nonterminal
symbols
9

Production rule example


The following production rule defines the syntactic
class of a while statement
<while_stmt> → while ( <logic_expr> ) <stmt>

The syntactic class being defined is on the left-hand side of the arrow (LHS)
The text on the right-hand side (RHS) gives the
definition of the LHS
The RHS above consists of 3 terminals (tokens)
and 2 nonterminals (syntactic classes)
Terminals: while, (, and )
Nonterminals: <logic_expr> and <stmt>
10

Formal methods for describing syntax


Nonterminal symbols may have multiple distinct
definitions, as in . . .
<if_stmt> → if <logic_expr> then <stmt>
<if_stmt> → if <logic_expr> then <stmt> else <stmt>

Alternative form
<if_stmt> → if <logic_expr> then <stmt>
            if <logic_expr> then <stmt> else <stmt>

More compactly, . . .
<if_stmt> → if <logic_expr> then <stmt> | if <logic_expr> then <stmt> else <stmt>

The vertical bar | is read 'or'


11

Formal methods for describing syntax


The nonterminal symbol being defined on the LHS
may appear on the RHS
Such a production rule is recursive
Example
Lists can be described using recursion
<identifier_list> → identifier
                  | identifier , <identifier_list>

12

Formal methods for describing syntax


A grammar G = ( T, N, P, S ), where

T is a finite set of terminal symbols


N is a finite set of nonterminal symbols
P is a finite nonempty set of production rules
S is a start symbol representing a complete sentence

The start symbol is typically named <program>


Generation of a sentence is called a derivation
Beginning with the start symbol, a derivation
applies production rules repeatedly until a
complete sentence is generated (all terminal
symbols)
13

Formal methods for describing syntax
An example grammar
<program> → <stmts>
<stmts>   → <stmt> | <stmt> ; <stmts>
<stmt>    → <var> = <expr>
<var>     → a | b | c | d
<expr>    → <term> + <term> | <term> - <term>
<term>    → <var> | integer

Any nonterminal appearing on the RHS of a production rule needs to be defined with a production rule (and thus appear on the LHS)
In the grammar, integer is a token representing any integer
lexeme
14

Formal methods for describing syntax


An example derivation
<program> => <stmts>
=> <stmt>
=> <var> = <expr>
=> a = <expr>
=> a = <term> + <term>
=> a = <var> + <term>
=> a = b + <term>
=> a = b + integer

The symbol => is read 'derives'


15
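A minimal recursive-descent recognizer can make the recognition view of the example grammar and derivation above concrete. The following Python sketch is not part of the slides; the token spellings ('a', '=', '+', 'integer') and the function names are assumptions for illustration.

def recognize(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat(expected):
        nonlocal pos
        if peek() == expected:
            pos += 1
            return True
        return False

    def var():                       # <var> -> a | b | c | d
        return eat('a') or eat('b') or eat('c') or eat('d')

    def term():                      # <term> -> <var> | integer
        return var() or eat('integer')

    def expr():                      # <expr> -> <term> + <term> | <term> - <term>
        return term() and (eat('+') or eat('-')) and term()

    def stmt():                      # <stmt> -> <var> = <expr>
        return var() and eat('=') and expr()

    def stmts():                     # <stmts> -> <stmt> | <stmt> ; <stmts>
        if not stmt():
            return False
        return stmts() if eat(';') else True

    return stmts() and pos == len(tokens)    # <program> -> <stmts>

print(recognize(['a', '=', 'b', '+', 'integer']))   # True
print(recognize(['a', '=', '+', 'integer']))        # False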

Derivation

Every string of symbols in the derivation is called a sentential form
Including <program>

A sentence is a sentential form that has only terminal symbols
A leftmost derivation is one in which the leftmost
nonterminal in each sentential form is the one that
is expanded next
A derivation may be leftmost or rightmost or neither
Derivation order has no effect on the language
generated by a grammar

16

Parse Tree

A parse tree is a hierarchical representation of a derivation
Each internal node is labeled
with a nonterminal symbol
and each leaf is labeled with
a terminal symbol
A grammar is ambiguous if
and only if it generates a
sentential form that has two
or more distinct parse trees

Parse tree for a = b + integer:

<program>
└── <stmts>
    └── <stmt>
        ├── <var> ── a
        ├── =
        └── <expr>
            ├── <term> ── <var> ── b
            ├── +
            └── <term> ── integer
17

Ambiguous grammar example


<expr> → <expr> <op> <expr> | int
<op>   → + | *

[Figure: two distinct parse trees for the sentence int + int * int, one grouping the operands as ( int + int ) * int and the other as int + ( int * int )]
18

Ambiguity
The compiler decides what code to generate
based on the structure of the parse tree
The parse tree indicates the precedence of the operators
Does it mean ( int + int ) * int or int + ( int * int ) ?

Ambiguity cannot be tolerated


In the case of operator precedence, ambiguity can
be avoided by having separate nonterminals for
operators of different precedence
19

A non-ambiguous grammar
<expr> → <expr> + <term> | <term>
<term> → <term> * int | int

Derivation
<expr> => <expr> + <term>
       => <term> + <term>
       => int + <term>
       => int + <term> * int
       => int + int * int

[Figure: the corresponding parse tree. The root <expr> has children <expr>, +, and <term>; the left <expr> derives int, and the right <term> has children <term>, *, and int, with the inner <term> deriving int]
20

Associativity of operators
Operator associativity can also be indicated by a
grammar
<expr> → <expr> + <expr> | int     (ambiguous)
<expr> → <expr> + int | int        (unambiguous)

Example: a parse tree using the unambiguous grammar
The unambiguous grammar is left recursive and produces a parse tree in which the order of addition is left associative
Addition is performed in a left-to-right manner

[Figure: parse tree built with the unambiguous grammar. Each <expr> node expands to <expr> + int, so the additions nest toward the left]
21

Dangling-else problem
Consider the grammar
<stmt> → <if_stmt> | . . .
<if_stmt> → if <logic_expr> then <stmt>
          | if <logic_expr> then <stmt> else <stmt>

This grammar is ambiguous since


if <logic_expr> then if <logic_expr> then <stmt> else <stmt>

has distinct parse trees represented by


if <logic_expr> then
    if <logic_expr> then
        <stmt>
    else
        <stmt>

and

if <logic_expr> then
    if <logic_expr> then
        <stmt>
else
    <stmt>

22

Dangling-else problem
Most languages match each else with the nearest
preceding elseless if
The ambiguity can be eliminated by developing a
grammar that distinguishes elseless ifs from ifs
with else clauses
See text, page 131

23

Extended BNF (denoted EBNF)


Three abbreviations are added for convenience
Optional parts on the RHS of a production rule can be
placed in brackets
[ <optional> ]

Braces on the RHS indicate that the enclosed part may be repeated 0 or more times
{ <repeated> }

When a single element must be chosen from a group, the options are placed in parentheses and separated by vertical bars
( a | b | c )
24

Extended BNF examples


Brackets
<proc_call> → ident [ ( <expr_list> ) ]
Generates: myProcedure and myProcedure( a, b, c )

Braces
<identifier_list> → ident { , ident }
Generates: Larry, Curly, Moe

Choice among options


<term> → int ( + | - ) int
Generates: 5 + 7 and 5 - 7
25

BNF and EBNF example


BNF:
<expr> → <expr> + <term>
       | <expr> - <term>
       | <term>
<term> → <term> * <factor>
       | <term> / <factor>
       | <factor>

EBNF:
<expr> → <term> { ( + | - ) <term> }
<term> → <factor> { ( * | / ) <factor> }
26

Extended BNF
EBNF uses metasymbols |, {, }, (, ), [, and ]
When metasymbols are also terminal symbols in
the language being defined, instances that are
terminal symbols must be quoted
<proc_call> → ident [ '(' <expr_list> ')' ]

When regular BNF indicates that an operator is left associative, the corresponding EBNF does not
BNF:  <sum> → <sum> + int
EBNF: <sum> → int { + int }

This must be overcome during syntax analysis
27
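As an illustration of the last point, the following Python sketch (assumed, not from the slides) parses the EBNF rule <sum> → int { + int } with a loop, yet still combines the operands left-to-right in its semantic action.

def parse_sum(tokens):
    pos = 0

    def next_int():
        nonlocal pos
        value = int(tokens[pos])      # assume the token is an integer literal
        pos += 1
        return value

    result = next_int()               # first int
    while pos < len(tokens) and tokens[pos] == '+':
        pos += 1                      # consume '+'
        result = result + next_int()  # fold to the left: ((a + b) + c) ...
    return result

print(parse_sum(['1', '+', '2', '+', '3']))   # 6, grouped as (1 + 2) + 3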

Extended BNF
Sometimes a superscript + is used as an additional
metasymbol to indicate one or more repetitions
Example: The production rules
<compound_stmt> → begin <stmt> { <stmt> } end

and
<compound_stmt> → begin { <stmt> }+ end

are equivalent

28

BNF homework assignment


Announced in class

29

Attribute grammars
Context-free grammars (CFGs) cannot describe all
of the syntax of programming languages
Typical example: a variable must be declared before it
can be referenced
Something like this is called a context-sensitive
constraint
Text refers to it as static semantics

30

Attribute grammars
Static semantics refers to the legal form of a
program
This is actually syntax rather than semantics
The term semantics is used because the check is done during semantic analysis rather than during parsing
The term static is used because the analysis required
to check the constraint can be done at compile time

31

Attribute grammars (AGs)


An attribute grammar is an extension to a CFG
Concept developed by Donald Knuth in 1968
The additional AG features describe static semantics
These features carry some semantic info along through
parse trees

Additional features
Attributes

Can be assigned values like variables

Attribute computation functions

Specify how attribute values are calculated

Predicate functions

Do the checking

32

Attribute grammars defined

Definition: An attribute grammar is a context-free


grammar G = (T, N, P, S) with the following additions:
For each grammar symbol X there is a set A(X) of
attributes

Some of these are synthesized


These pass information up the parse tree

The remaining attributes are inherited


These pass information down the parse tree

Each production rule has a set of attribute computation functions that define certain attributes for the nonterminals in the rule
Each production rule has a (possibly empty) set of
predicate functions to check for attribute consistency
33

Attribute grammars defined


Let X0 → X1 ... Xn be a rule
Synthesized attributes are computed with functions of the form
S(X0) = f(A(X1), ... , A(Xn))
S(X0) depends only on X0's child nodes

Inherited attributes for symbols Xj on the RHS are computed with functions of the form
I(Xj) = f(A(X0), ... , A(Xn))
I(Xj) depends on Xj's parent as well as its siblings
34

Attribute grammars defined


Initially, there are synthesized intrinsic attributes
on the leaves
When all attributes of a parse tree have been
computed, the parse tree is fully attributed
Predicate functions for X0 → X1 ... Xn are Boolean
functions defined over the attribute set
{A(X0), ... , A(Xn)}
For a program to be correct, every predicate
function for every production rule must be true
Any false predicate function value indicates a
violation of the static semantics of the language
35

Attribute grammars

Example: expressions of the form id + id


id's can be either int_type or real_type
types of the two id's must be the same
type of the expression must match its expected type

BNF:
<assign> → <var> = <expr>
<expr> → <var> + <var>
<var> → id

Attributes:
actual_type

Synthesized for <var> and <expr>


Intrinsic for id

expected_type

Inherited for <expr> from <var> in <assign> → <var> = <expr>

36

The attribute grammar


Syntax rule: <expr> → <var>[1] + <var>[2]
Attribute computation function:
<expr>.actual_type ← <var>[1].actual_type
Predicates:
<var>[1].actual_type == <var>[2].actual_type
<expr>.expected_type == <expr>.actual_type

Syntax rule: <var> → id
Attribute computation function:
<var>.actual_type ← lookup (id.type)
37

Attribute grammars
In what order are attribute values computed?
If all attributes were inherited, the tree could be
decorated in top-down order
If all attributes were synthesized, the tree could be
decorated in bottom-up order
In many cases, both kinds of attributes are used, and it
is some combination of top-down and bottom-up that
must be used
Complex problem in general
May require construction of a dependency graph showing all
attribute dependencies
38

Computation of attributes
For the generated expression: sum + increment
<expr>.expected_type ← inherited from parent
<var>[1].actual_type ← lookup (sum.type)
<var>[2].actual_type ← lookup (increment.type)
<var>[1].actual_type =? <var>[2].actual_type
<expr>.actual_type ← <var>[1].actual_type
<expr>.actual_type =? <expr>.expected_type

39
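The attribute computations and predicate checks of the last two slides can be sketched in Python. Everything below (the dictionary symbol table and the lookup and check_assign helpers) is a hypothetical illustration, not part of the text.

symbol_table = {'sum': 'int_type', 'increment': 'int_type', 'rate': 'real_type'}

def lookup(name):
    return symbol_table[name]                      # intrinsic attribute of id

def check_assign(target, left, right):
    var1_type = lookup(left)                       # <var>[1].actual_type
    var2_type = lookup(right)                      # <var>[2].actual_type
    assert var1_type == var2_type, "operand types differ"            # predicate 1
    expr_actual = var1_type                        # <expr>.actual_type
    expr_expected = lookup(target)                 # <expr>.expected_type, inherited
    assert expr_actual == expr_expected, "assignment type mismatch"  # predicate 2
    return expr_actual

print(check_assign('sum', 'sum', 'increment'))     # int_type
# check_assign('sum', 'sum', 'rate') would fail the first predicate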

Semantics
The meaning of expressions, statements, and
program units is known as dynamic semantics
We consider three methods of describing dynamic
semantics
Operational semantics
Axiomatic semantics
Denotational semantics

40

Operational semantics
Operational semantics describes the meaning of a
language statement by executing the statement on a
machine, either real or simulated
The meaning of the statement is defined by the
observed change in the state of the machine
i.e., the change in memory, registers, etc.

41

Operational semantics
The best approach is to use an idealized, low-level virtual
computer, implemented as a software simulation
Then, build a translator to translate source code to the
machine code of the idealized computer
The state changes in the virtual machine brought about by executing the code that results from translating a given statement define the meaning of the statement
In effect, this describes the meaning of a high-level
language statement in terms of the statements of a
simpler, low-level language
42

Operational semantics example


The C statement
for ( expr1; expr2; expr3 ) { }
is equivalent to:

      expr1;
loop: if expr2 = 0 goto out
      expr3;
      goto loop
out:

The human reader can informally be considered to be the virtual computer
Evaluation of operational semantics:
Good if used informally (language manuals, etc.)
Based on lower-level languages, not mathematics and logic
43

Operational semantics homework


Assigned in class

44

Axiomatic semantics
Based on formal logic (predicate calculus)
Original purpose: formal program verification
Each statement in a program is both preceded by
and followed by an assertion about program
variables
Assertions are also known as predicates
Assertions will be written with braces { } to
distinguish them from program statements
45

Axiomatic semantics
A precondition is an assertion immediately before a
statement that describes the relationships and constraints
among variables that are true at that point in execution
A postcondition is an assertion immediately following a
statement that describes the situation at that point
Our point of view is to compute the preconditions for a given
statement from the corresponding postconditions
It is also possible to set things up in the opposite direction

A weakest precondition is the least restrictive precondition that will guarantee the validity of the associated postcondition
46

Axiomatic semantics
Notation: {P} S {Q}
P is the precondition
S is a statement
Q is the postcondition

Example
Find the weakest precondition P for: {P} a = b + 1 {a > 1}
One possible precondition: {b > 10}
Weakest precondition:
{b > 0}
47

Axiomatic semantics
If the weakest precondition can be computed for
each statement in a program, then a correctness
proof can be constructed for the program
Start by using the desired result as the
postcondition of the last statement and work
backward
The resulting precondition of the first statement
defines the conditions under which the program
will compute the desired result
If this precondition is the same as the program
specification, the program is correct

48

Axiomatic semantics
Weakest preconditions can be computed using an
axiom or using an inference rule
An axiom is a logical statement assumed to be
true
An inference rule is a method of inferring the truth
of one assertion on the basis of the values of other
assertions
Each statement type in the language must have an
axiom or an inference rule
We consider assignments, sequences, selection,
and loops
49

Assignment statements
Let x=E be a generic assignment statement
An axiom giving the precondition is sufficient in this case:
{Q_{x→E}} x = E {Q}
Here the weakest precondition P is given by Q_{x→E}
In other words, P is the same as Q with all instances of x replaced by expression E

For example, consider a = a + b - 3 {a > 10}
Replace all instances of a in {a > 10} by a + b - 3
This gives a + b - 3 > 10, or b > 13 - a
So, Q_{x→E} is { b > 13 - a }
50
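The substitution Q_{x→E} can be illustrated by modeling assertions as Python predicates over a state dictionary. The sketch below, including the wp_assign name, is an assumption made for illustration rather than anything from the text.

def wp_assign(var, expr, post):
    """Weakest precondition of 'var = expr' w.r.t. postcondition 'post':
    evaluate post in the state the assignment would produce."""
    def pre(state):
        new_state = dict(state)
        new_state[var] = expr(state)       # substitute E for x
        return post(new_state)
    return pre

# Example from the slide: a = a + b - 3 with postcondition { a > 10 }
post = lambda s: s['a'] > 10
pre = wp_assign('a', lambda s: s['a'] + s['b'] - 3, post)

print(pre({'a': 1, 'b': 12}))   # False: 1 + 12 - 3 = 10 is not > 10  (b > 13 - a fails)
print(pre({'a': 1, 'b': 13}))   # True:  1 + 13 - 3 = 11 > 10         (b > 13 - a holds)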

Inference rules
The general form of an inference rule is
S1, S2, S3, ..., Sn
-------------------
         S

This states that if S1, S2, S3, ..., and Sn are true, then
the truth of S can be inferred

51

The Rule of Consequence


{P} S {Q},  P' => P,  Q => Q'
-----------------------------
        {P'} S {Q'}

Here, => means implies


This says that a postcondition can always be weakened
and a precondition can always be strengthened
Thus in the earlier example
the postcondition { a>10 } can be weakened to { a>5 }
the precondition { b>13-a } can be strengthened to
{ b>15-a }
52

Sequence statements
Since a precondition for a sequence depends on the
statements in the sequence, the weakest precondition
cannot be described by an axiom
An inference rule is needed for sequences
Consider the sequence S1;S2 of two statements with
preconditions and postconditions as follows:
{P1} S1 {P2}
{P2} S2 {P3}

The inference rule is:

{P1} S1 {P2},  {P2} S2 {P3}
---------------------------
     {P1} S1; S2 {P3}

53

Sequence statements example


Consider the following sequence and postcondition
y = 3*x + 1; x = y + 3 { x < 10 }

The weakest precondition for x = y + 3 is { y < 7 }
Since this is the postcondition for y = 3*x + 1, the weakest precondition for the sequence is { x < 2 }

54
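A small, self-contained Python sketch of this sequence example (assumed, not from the text): the effect of each assignment is applied backwards to the postcondition.

# Weakest precondition of  y = 3*x + 1; x = y + 3  with postcondition { x < 10 }
post = lambda s: s['x'] < 10                        # { x < 10 }
wp2 = lambda s: post({**s, 'x': s['y'] + 3})        # wp of x = y + 3: { y < 7 }
wp1 = lambda s: wp2({**s, 'y': 3 * s['x'] + 1})     # wp of the sequence: { x < 2 }

print(wp1({'x': 1}))   # True:  1 < 2
print(wp1({'x': 2}))   # False: 2 is not < 2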

Selection statements
Consider only if-then-else statements
The inference rule is
{ B and P } S1 { Q },  { (not B) and P } S2 { Q }
-------------------------------------------------
        { P } if B then S1 else S2 { Q }

Example:

if ( x > 0 ) then y = y - 5 else y = y + 3 { y > 0 }

The precondition for S2 is { x <= 0 } and {y > -3 }


The precondition for S1 is { x > 0 } and {y > 5 }
What is P ?

Note that { y > 5 } => { y > -3 }

By the rule of consequence, P is { y > 5 }


55
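A small Python sketch (assumed, not from the text) that spot-checks the reasoning above on a few sample states:

# if ( x > 0 ) then y = y - 5 else y = y + 3   with postcondition { y > 0 }
post = lambda s: s['y'] > 0
wp_then = lambda s: post({**s, 'y': s['y'] - 5})   # { y > 5 }
wp_else = lambda s: post({**s, 'y': s['y'] + 3})   # { y > -3 }

# Since { y > 5 } implies { y > -3 }, P = { y > 5 } works for both branches
candidate_P = lambda s: s['y'] > 5
for s in ({'x': 1, 'y': 6}, {'x': -1, 'y': 6}, {'x': 2, 'y': 9}):
    if candidate_P(s):
        branch_wp = wp_then if s['x'] > 0 else wp_else
        assert branch_wp(s), s
print("P = { y > 5 } is sufficient on the sample states")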

Loops
We consider a logical pretest (while) loop
{P} while B do S end {Q}
Computing the weakest precondition is more
difficult than for a sequence because the number of
iterations is not predetermined
An assertion called a loop invariant must be found
A loop invariant corresponds to finding the inductive
hypothesis when proving a mathematical theorem
using induction
56

Loops
The inference rule is
          { I and B } S { I }
----------------------------------------
{ I } while B do S end { I and (not B) }

where I is the loop invariant


The loop invariant must satisfy each of the following
P => I                    (the loop invariant must be true initially)
{I and B} S {I}           (I is not changed by the body of the loop)
(I and (not B)) => Q      (if I is true and B is false, Q is implied)
The loop terminates       (this can be difficult to prove)
57

Example

Consider the loop: { P } while y <> x do y = y + 1 end { y = x }


An appropriate loop invariant is: I = { y <= x }
Let P = {y<=x} be the precondition for the while statement
Then
P => I is true
{ y <= x and y <> x } y = y + 1 { y <= x }
implies { y < x } y = y + 1 { y <= x },
which implies {I and B} S {I}
(I and (not B)) => Q is true because
{ y <= x and not (y <> x) } implies { y = x }, which is just Q
The loop terminates since P guarantees that initially y <= x
58
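A quick Python sanity check (assumed, not from the text) that the invariant { y <= x } is preserved by the loop body on some sample states:

def invariant(s):
    return s['y'] <= s['x']          # I = { y <= x }

def guard(s):
    return s['y'] != s['x']          # B = ( y <> x )

def body(s):
    return {**s, 'y': s['y'] + 1}    # y = y + 1

samples = [{'x': 5, 'y': 0}, {'x': 5, 'y': 4}, {'x': 0, 'y': -3}]
for s in samples:
    if invariant(s) and guard(s):
        assert invariant(body(s)), s     # {I and B} S {I}
print("invariant preserved on all sample states")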

Loops
The loop invariant I is a weakened version of the
loop postcondition, and it is also a precondition.
I must be weak enough to be satisfied prior to the
beginning of the loop
When combined with the loop exit condition, I must
be strong enough to force the truth of the
postcondition

59

Axiomatic semantics
Evaluation of axiomatic semantics
Developing axioms or inference rules for all of the
statements in a language can be difficult
Axiomatic semantics is . . .
a good tool for correctness proofs
an excellent framework for reasoning about programs

Axiomatic semantics is not as useful for language users and compiler writers

60

Axiomatic semantics homework


Assigned in class

61

Denotational semantics
Denotational semantics
Is the most rigorous, widely known method for
describing the meaning of programs
Based on recursive function theory

Fundamental concept
Define a mathematical object for each language entity
The mathematical objects can be rigorously defined
and manipulated
Define functions that map instances of the language
entities onto instances of the corresponding
mathematical objects
62

Denotational semantics
As is the case with operational semantics, the
meaning of a language construct is defined in
terms of the state changes
In denotational semantics, state is defined in terms
of the various mathematical objects
State is defined only in terms of the values of the
program's variables
The value of a variable is an instance of an appropriate
mathematical object
63

Denotational semantics
The state s of a program consists of the values of all
its current variables
s = {<i1, v1>, <i2, v2>, ... , <in, vn>}
Here, ik is a variable and vk is the associated value
Each vk is a mathematical object

Most semantics mapping functions for program constructs map states to states
The state change defines the meaning of the program
construct
Expression statements (among others) map states to
values
64

Denotational semantics
Let VARMAP be a function that, when given a
variable name and a state, returns the current
value of the variable
VARMAP(ik, s) = vk
Any variable can have the special value undef
i.e., currently undefined

65

Denotational semantics example


The syntax of decimal numbers is described by
the EBNF grammar
<dec_num> → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
          | <dec_num> (0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9)

The denotational semantics of decimal numbers


involves a semantic function that maps decimal
numbers as strings of symbols into numeric values
(mathematical objects)

66

Semantic function for decimal numbers


Mdec('0') = 0, Mdec('1') = 1, ..., Mdec('9') = 9
Mdec(<dec_num> '0') = 10 * Mdec(<dec_num>)
Mdec(<dec_num> '1') = 10 * Mdec(<dec_num>) + 1
. . .
Mdec(<dec_num> '9') = 10 * Mdec(<dec_num>) + 9

67
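A small Python sketch (an assumption, not the text's notation) of the Mdec semantic function, treating a decimal numeral as a string of digit characters:

def m_dec(numeral):
    if len(numeral) == 1:
        return '0123456789'.index(numeral)               # Mdec('0') = 0, ..., Mdec('9') = 9
    return 10 * m_dec(numeral[:-1]) + m_dec(numeral[-1]) # Mdec(<dec_num> d) = 10 * Mdec(<dec_num>) + d

print(m_dec('309'))   # 309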

Denotational semantics of expressions

Assume expressions consist of decimal integer


literals, variables, or binary expressions having one
arithmetic operator and two operands, each of
which can only be a variable or integer literal
The value of an expression is an integer
The value of an expression is error if it involves an undef value
Thus, expressions map onto Z ∪ {error}

<expr> → <dec_num> | <var> | <binary_expr>
<binary_expr> → <left_expr> <operator> <right_expr>
<left_expr> → <dec_num> | <var>
<right_expr> → <dec_num> | <var>
<operator> → + | *

68

Semantic function for expressions


Me(<expr>, s) =
case <expr> of
<dec_num> => Mdec(<dec_num>)
<var> =>
if VARMAP(<var>, s) == undef
then error
else VARMAP(<var>, s)
<binary_expr> =>
if (Me(<binary_expr>.<left_expr>, s) == undef
or Me(<binary_expr>.<right_expr>, s) == undef)
then error
else if (<binary_expr>.<operator> == '+')
then Me(<binary_expr>.<left_expr>, s) + Me(<binary_expr>.<right_expr>, s)
else Me(<binary_expr>.<left_expr>, s) * Me(<binary_expr>.<right_expr>, s)
end case

An expression is mapped to a value

69
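An illustrative Python sketch of Me follows; the representation choices (expressions as nested tuples, a dict for the state, the string 'undef' for undefined variables) are assumptions, not the text's definitions.

UNDEF, ERROR = 'undef', 'error'

def varmap(name, state):
    return state.get(name, UNDEF)                  # VARMAP(i, s)

def m_e(expr, state):
    if isinstance(expr, int):                      # <dec_num>
        return expr
    if isinstance(expr, str):                      # <var>
        value = varmap(expr, state)
        return ERROR if value == UNDEF else value
    left, op, right = expr                         # <binary_expr>
    lv, rv = m_e(left, state), m_e(right, state)
    if lv == ERROR or rv == ERROR:
        return ERROR
    return lv + rv if op == '+' else lv * rv

state = {'x': 3}
print(m_e(('x', '+', 2), state))    # 5
print(m_e(('y', '*', 2), state))    # error (y is undefined in the state)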

Denotational semantics of assignments


Assignment statements map states to states
Ma(x = E, s) =
if Me(E, s) == error
then
error
else
s' = {<i1,v1'>,<i2,v2'>,...,<in,vn'>},
where, for j = 1, 2, ..., n,
vj' = VARMAP(ij, s) when ij <> x and
vj' = Me( E, s) when ij == x
70
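A Python sketch of Ma under similar assumptions; the expression is modeled here as a function from states to a value or 'error' so that the example stays self-contained.

ERROR = 'error'

def m_a(var, expr_meaning, state):          # Ma(x = E, s)
    value = expr_meaning(state)             # Me(E, s)
    if value == ERROR:
        return ERROR
    return {**state, var: value}            # all other variables keep their values

print(m_a('y', lambda s: s['x'] + 2, {'x': 3}))   # {'x': 3, 'y': 5}
print(m_a('y', lambda s: ERROR, {'x': 3}))        # error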

Denotational semantics of logical pretest loops

Logical pretest loops map states to states
Assume Msl maps a statement list to a state
Assume Mb maps a Boolean expression to a Boolean value or to error

Ml( while B do L end, s ) =
    if Mb(B, s) == undef then
        error
    else if Mb(B, s) == false then
        s
    else if Msl(L, s) == error then
        error
    else
        Ml( while B do L end, Msl(L, s) )
71
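An illustrative Python sketch of Ml, again with assumed representations (states as dicts, Mb and Msl supplied as functions); it shows the iteration-as-recursion idea directly.

ERROR = 'error'

def m_l(mb, msl, state):                    # Ml(while B do L end, s)
    b = mb(state)                           # Mb(B, s)
    if b == ERROR:
        return ERROR
    if b is False:
        return state                        # loop exits: the state is unchanged
    new_state = msl(state)                  # Msl(L, s)
    if new_state == ERROR:
        return ERROR
    return m_l(mb, msl, new_state)          # recurse on the new state

# while y <> x do y = y + 1 end, starting from { x: 3, y: 0 }
result = m_l(lambda s: s['y'] != s['x'],
             lambda s: {**s, 'y': s['y'] + 1},
             {'x': 3, 'y': 0})
print(result)   # {'x': 3, 'y': 3}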

Denotational semantics of loops


The meaning of the loop is the value of the
program variables after the statements in the loop
have been executed the prescribed number of
times (assuming there have been no errors)
In essence, the loop has been converted from
iteration to recursion, where the recursive control
is mathematically defined by other recursive state
mapping functions
Recursion, when compared to iteration, is easier to
describe with mathematical rigor
72

Denotational semantics
Evaluation of denotational semantics:
Can be used to determine meaning of complete
programs in a given language
Provides a rigorous way to think about programs
Can be an aid to language design

73
