English Syntax

Lexical and Syntactic Analysis
Here, we look at two of the tasks involved in the compilation process.

Given source code, we first break it into meaningful units (lexical analysis) and then parse these lexical components into their syntactic uses (syntactic analysis).
The goal of lexical analysis is to identify each lexeme in the source code and assign it the proper token.
The goal of syntactic analysis is to parse the lexemes into a parse tree.
During both lexical and syntactic analysis, errors can be detected and reported.
Note that this two-step process omits the actual translation from source code to machine language; this is also required, but we will not consider it here.
Lexical Analysis
Given source code that consists of reserved words, identifiers, punctuation, blank spaces, and comments, identify each item for what it is:

if it is a reserved word or punctuation, categorize its type
if it is an identifier, add it to the symbol table (if not already present)
if it is blank space or a comment, discard (ignore) it
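The "add it to the symbol table (if not already present)" step can be sketched as a lookup-or-insert routine. This is a minimal sketch, not the slides' own code: the fixed-size array, the limits, and the name internIdentifier are all assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>

#define MAX_SYMBOLS 64      /* assumed capacity, chosen arbitrarily */
#define MAX_NAME_LEN 32     /* assumed maximum identifier length + 1 */

/* Hypothetical symbol table: names are stored in insertion order. */
char symbols[MAX_SYMBOLS][MAX_NAME_LEN];
int symbolCount = 0;

/* Return the index of name in the table, adding it first if it is absent.
   Returns -1 if the table is full or the name is too long. */
int internIdentifier(const char *name) {
    assert(name != NULL);
    for (int i = 0; i < symbolCount; i++) {
        if (strcmp(symbols[i], name) == 0)
            return i;                     /* already present: reuse the entry */
    }
    if (symbolCount == MAX_SYMBOLS || strlen(name) >= MAX_NAME_LEN)
        return -1;
    strcpy(symbols[symbolCount], name);   /* new identifier: append it */
    return symbolCount++;
}
```

A linear search keeps the sketch short; a production compiler would more likely use a hash table so that each lookup does not scan every earlier identifier.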
How will we perform this operation?

we can use a relatively simple state transition diagram to describe the various entities of interest and then implement this diagram in a program
the lexical analysis program's task is to scan the input and produce each item individually (the lexeme)
each item should include its type (the token)
Recognizing Names/Words/Numbers
int lex( ) {
    getChar( );
    switch (charClass) {
        case LETTER:
            addChar( );
            getChar( );
            while (charClass == LETTER || charClass == DIGIT) {
                addChar( );
                getChar( );
            }
            return lookup(lexeme);
            break;
        case DIGIT:
            addChar( );
            getChar( );
            while (charClass == DIGIT) {
                addChar( );
                getChar( );
            }
            return INT_LIT;
            break;
    } /* End of switch */
} /* End of function lex */
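The excerpt relies on helpers (getChar, addChar, lookup) and variables (charClass, lexeme) that are not shown. The following is a hedged, self-contained sketch of how they might fit together. It assumes the source is a C string rather than a file, that "if" and "else" are the only reserved words, and that the token code values are arbitrary; startLexer, UNKNOWN, and EOF_CODE are names introduced here. Unlike the excerpt, it reads one character ahead before lex is first called, so the character that ends one lexeme is not lost.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Character classes and token codes; the numeric values are arbitrary. */
enum { LETTER, DIGIT, OTHER, END };
enum { INT_LIT = 10, IDENT = 11, IF_CODE = 20, ELSE_CODE = 21,
       UNKNOWN = 98, EOF_CODE = 99 };

const char *input;     /* remaining source text (assumption: a C string) */
int charClass;         /* class of nextChar */
char nextChar;         /* one-character lookahead */
char lexeme[100];      /* the lexeme currently being built */
int lexLen;

/* Append nextChar to the lexeme being built. */
void addChar(void) {
    assert(lexLen < (int)sizeof lexeme - 1);
    lexeme[lexLen++] = nextChar;
    lexeme[lexLen] = '\0';
}

/* Read the next character and classify it. */
void getChar(void) {
    nextChar = *input;
    if (nextChar == '\0') { charClass = END; return; }
    input++;
    if (isalpha((unsigned char)nextChar)) charClass = LETTER;
    else if (isdigit((unsigned char)nextChar)) charClass = DIGIT;
    else charClass = OTHER;
}

/* Map a name to its reserved-word token, or to IDENT otherwise. */
int lookup(const char *name) {
    if (strcmp(name, "if") == 0) return IF_CODE;
    if (strcmp(name, "else") == 0) return ELSE_CODE;
    return IDENT;
}

/* Point the lexer at new source text and prime the lookahead. */
void startLexer(const char *source) { input = source; getChar(); }

/* Return the token of the next lexeme; the lexeme text is left in lexeme. */
int lex(void) {
    lexLen = 0;
    lexeme[0] = '\0';
    while (charClass == OTHER && isspace((unsigned char)nextChar))
        getChar();                       /* blank space is discarded */
    switch (charClass) {
    case LETTER:                         /* name: letter (letter | digit)* */
        do { addChar(); getChar(); } while (charClass == LETTER || charClass == DIGIT);
        return lookup(lexeme);
    case DIGIT:                          /* integer literal: digit+ */
        do { addChar(); getChar(); } while (charClass == DIGIT);
        return INT_LIT;
    case OTHER:                          /* punctuation is not classified here */
        addChar(); getChar();
        return UNKNOWN;
    default:
        return EOF_CODE;
    }
}
```

Calling startLexer("if x1 42") and then lex repeatedly yields IF_CODE, IDENT (with lexeme "x1"), INT_LIT (with lexeme "42"), and finally EOF_CODE.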
Parsing
The process of
generating a parse tree from a set of input that identifies the grammatical categories of each element of the input
identifying if and where errors occur
Parsing is similar whether for a natural language or a programming language

a good parser will continue parsing even after errors have been found
this requires a recovery process
Two general forms of parsers

Top-down (used in LL parser algorithm)
start with LHS rules, map to RHS rules until terminal symbols have been identified, match these against the input
Bottom-up (used in LR parser algorithms)

start with RHS rules and input, collapse terminals and non-terminals into non-terminals until you have reached the starting non-terminal
Parsing is an O(n^3) problem, where n is the number of items in the input

if we cannot determine a single token for each lexeme, the problem becomes O(2^n)
by restricting our parser to work only on the grammar of the given language, we can reduce the complexity to O(n)
Top-Down Parsing
Using a BNF of a language, we generate a recursive-descent parser
each of the non-terminal grammatical categories in the BNF is converted into a function (e.g., <expr>, <if>, <factor>)
when called, each function obtains the next lexeme using a function called lex( ), and either matches it against terminal symbols or calls further functions
this approach is known as an LL parser: a left-to-right parse, using leftmost derivations
it is simple to generate the recursive-descent parser
There are two restrictions that we must make on the grammar

the grammar specifying the language cannot have left recursion
if a rule has recursive parts, those parts must not be the first items on the RHS of a rule
the grammar must pass the pairwise disjointness test
Algorithms exist to alter a grammar so that it passes both restrictions
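As a hedged illustration of one such alteration (the standard textbook transformation for direct left recursion, not taken from these slides), a left-recursive rule can be rewritten with a new non-terminal so that the recursion moves to the right. Here α and β stand for arbitrary symbol strings, β does not begin with A, and A' and E' are new non-terminals introduced by the rewrite:

```latex
\begin{align*}
\text{before:}\quad & A \to A\,\alpha \mid \beta \\
\text{after:}\quad  & A \to \beta\,A' \qquad A' \to \alpha\,A' \mid \varepsilon \\[4pt]
\text{e.g.}\quad    & E \to E + T \mid T
  \;\;\Longrightarrow\;\;
  E \to T\,E' \qquad E' \to +\,T\,E' \mid \varepsilon
\end{align*}
```

Both forms generate the same strings (a β followed by zero or more α), but in the rewritten form a function for A never calls itself before consuming input.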
Recursive-Descent Parser Example

Recall our example expression grammar from chapter 3:
<expr> → <term> {(+ | -) <term>}
<term> → <factor> {(* | /) <factor>}
<factor> → id | ( <expr> )

void expr( ) {
    term( );
    while (nextToken == PLUS_CODE || nextToken == MINUS_CODE) {
        lex( );
        term( );
    }
}

void term( ) {
    factor( );
    while (nextToken == MULT_CODE || nextToken == DIV_CODE) {
        lex( );
        factor( );
    }
}

void factor( ) {
    if (nextToken == ID_CODE)
        lex( );
    else if (nextToken == LEFT_PAREN_CODE) {
        lex( );
        expr( );
        if (nextToken == RIGHT_PAREN_CODE)
            lex( );
        else
            error( );
    }
    else
        error( );
}
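To try the three functions, they need a lex( ) that sets nextToken and an error( ) routine, neither of which the slides show. The sketch below supplies minimal versions under stated assumptions: the input is a C string, an id is a letter followed by letters or digits, error( ) merely counts errors instead of attempting recovery, and parses, END_CODE, and BAD_CODE are names introduced here.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum { ID_CODE, PLUS_CODE, MINUS_CODE, MULT_CODE, DIV_CODE,
       LEFT_PAREN_CODE, RIGHT_PAREN_CODE, END_CODE, BAD_CODE };

const char *src;      /* remaining input (assumption: a C string) */
int nextToken;        /* one-token lookahead, as in the slides */
int errorCount;       /* assumption: errors are counted, not recovered from */

void expr(void);

/* Advance nextToken to the next token in src. */
void lex(void) {
    while (isspace((unsigned char)*src)) src++;
    char c = *src;
    if (c == '\0') { nextToken = END_CODE; return; }
    if (isalpha((unsigned char)c)) {          /* id: letter (letter | digit)* */
        while (isalnum((unsigned char)*src)) src++;
        nextToken = ID_CODE;
        return;
    }
    src++;
    switch (c) {
    case '+': nextToken = PLUS_CODE; break;
    case '-': nextToken = MINUS_CODE; break;
    case '*': nextToken = MULT_CODE; break;
    case '/': nextToken = DIV_CODE; break;
    case '(': nextToken = LEFT_PAREN_CODE; break;
    case ')': nextToken = RIGHT_PAREN_CODE; break;
    default:  nextToken = BAD_CODE; break;
    }
}

void error(void) { errorCount++; }

/* <factor> -> id | ( <expr> ) */
void factor(void) {
    if (nextToken == ID_CODE)
        lex();
    else if (nextToken == LEFT_PAREN_CODE) {
        lex();
        expr();
        if (nextToken == RIGHT_PAREN_CODE) lex();
        else error();
    } else
        error();
}

/* <term> -> <factor> {(* | /) <factor>} */
void term(void) {
    factor();
    while (nextToken == MULT_CODE || nextToken == DIV_CODE) { lex(); factor(); }
}

/* <expr> -> <term> {(+ | -) <term>} */
void expr(void) {
    term();
    while (nextToken == PLUS_CODE || nextToken == MINUS_CODE) { lex(); term(); }
}

/* Return 1 if text is a complete, error-free <expr>, 0 otherwise. */
int parses(const char *text) {
    assert(text != NULL);
    src = text;
    errorCount = 0;
    lex();                  /* prime the lookahead before the first call */
    expr();
    return errorCount == 0 && nextToken == END_CODE;
}
```

For instance, parses("(a + b) * c / d - e") returns 1, while parses("a + * b") and parses("(a + b") return 0. The final END_CODE check rejects input such as "a b", where a valid expression is followed by leftover tokens.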
If Statement Example
We expect an if statement to look like this:
if (boolean expr) statement;
optionally followed by:
else statement;
Otherwise, we return an error

void ifstmt( ) {
    if (nextToken != IF_CODE)
        error( );
    else {
        lex( );
        if (nextToken != LEFT_PAREN_CODE)
            error( );
        else {
            lex( );    /* consume the ( before parsing the condition */
            boolexpr( );
            if (nextToken != RIGHT_PAREN_CODE)
                error( );
            else {
                lex( );    /* consume the ) before parsing the statement */
                statement( );
                if (nextToken == ELSE_CODE) {
                    lex( );
                    statement( );
                }
            }
        }
    }
}
LL Parser Restriction
Recall that one of our restrictions for the use of the LL parser was that the grammar pass the pairwise disjointness test
The parser needs to be able to select the proper right-hand side to apply while parsing
if the rule currently being applied is <factor>, should we apply id or ( <expr> )?
For the parser to know which right-hand side to apply, the first terminal that each right-hand side of a rule can begin with must differ
consider a rule
<A> → a<B> | a<C>
if the parser finds an a in the input, which right-hand side should be applied?

should it call function B or C?
Pairwise Disjointness Test

The pairwise disjointness test examines a grammar to make sure that
all right-hand sides with the same LHS are pairwise disjoint, i.e., no two of them can begin with the same terminal
the book provides a formal definition that we will skip
here are some examples

A → aB | bAb | c
is pairwise disjoint
A → aB | aAb
is not pairwise disjoint
<var> → id | id[<expr>]
is not pairwise disjoint, but we can make it so:
<var> → id<next>
<next> → ε | [<expr>]

ε denotes the empty string
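The first two examples above can be checked mechanically. The sketch below is a simplified illustration, not the formal test from the book: it assumes every terminal is a single character and that the set of terminals each right-hand side can begin with has already been worked out by hand and is passed in as a string. The name pairwiseDisjoint is introduced here.

```c
#include <assert.h>
#include <string.h>

/* Return 1 if no two alternatives can begin with the same terminal.
   first[i] lists, one character per terminal, the terminals that
   alternative i can begin with (assumed to be computed beforehand). */
int pairwiseDisjoint(const char *first[], int count) {
    assert(first != NULL && count >= 0);
    for (int i = 0; i < count; i++) {
        for (int j = i + 1; j < count; j++) {
            /* strpbrk finds the first character of first[i] that also
               occurs in first[j]; non-NULL means the two sets overlap. */
            if (strpbrk(first[i], first[j]) != NULL)
                return 0;
        }
    }
    return 1;
}
```

For A → aB | bAb | c the sets are {"a", "b", "c"} and the function returns 1; for A → aB | aAb they are {"a", "a"} and it returns 0. Alternatives that can derive the empty string need extra care that this sketch does not attempt.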
Bottom-Up Parsing
Because of the two restrictions placed on grammars to qualify for the LL parser, an alternative approach is the LR parser, which does bottom-up parsing
LR: Left-to-right parsing, Rightmost derivation (produced in reverse)
The parser is implemented using a pushdown automaton

a stack added to the state diagrams seen earlier
The parser has two basic processes

shift: move items from the input onto the stack
reduce: take consecutive items from the top of the stack and reduce them; for instance, if we have a rule <A> → a<B> and we have a and <B> on top of the stack, reduce them to <A>
while the parser is easy to implement, it relies on an LR parsing table, which is difficult to generate
there are numerous algorithms to generate the parsing table; we will skip how to do that and assume we already have one available
Parser Algorithm
Given input S0, a1, ..., an, $
S0 is the start state
a1, ..., an are the lexemes that make up the program
$ is a special end-of-input symbol
Sm is the state on top of the stack and ai is the next input symbol
If action[Sm, ai] = Shift S, then push ai, S onto the stack and change state to S
If action[Sm, ai] = Reduce R, then use rule R in the grammar to reduce the items on the stack: pop the right-hand side of R, then push the LHS non-terminal of R and change state to the GOTO entry for the state now exposed on the stack and that non-terminal
If action[Sm, ai] = Accept, then the parse is complete with no errors
If action[Sm, ai] = Error (or the entry in the table is blank), then call an error-handling and recovery routine
The parsing table stores the values of action[x, y] and GOTO[x, y]
Example
Parse of id+id*id$
Stack            Input        Action
0                id+id*id$    S5
0id5             +id*id$      R6 (GOTO[0,F])
0F3              +id*id$      R4 (GOTO[0,T])
0T2              +id*id$      R2 (GOTO[0,E])
0E1              +id*id$      S6
0E1+6            id*id$       S5
0E1+6id5         *id$         R6 (GOTO[6,F])
0E1+6F3          *id$         R4 (GOTO[6,T])
0E1+6T9          *id$         S7
0E1+6T9*7        id$          S5
0E1+6T9*7id5     $            R6 (GOTO[7,F])
0E1+6T9*7F10     $            R3 (GOTO[6,T])
0E1+6T9          $            R1 (GOTO[0,E])
0E1              $            ACCEPT
Grammar:
1. E → E + T
2. E → T
3. T → T * F
4. T → F
5. F → (E)
6. F → id
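The shift/reduce loop can be sketched as a small table-driven program for this grammar. The slides do not show the parsing table, so the ACTION and GOTO arrays below are an assumption: they are the SLR table usually given for this grammar in compiler textbooks, and they agree with the states in the trace above. The encoding (positive = shift, negative = reduce, 0 = blank) and the names lrParse and nextTerminal are choices made for this sketch; only states are kept on the stack, since the grammar symbols are implied by them.

```c
#include <assert.h>
#include <stddef.h>

enum { T_ID, T_PLUS, T_STAR, T_LPAREN, T_RPAREN, T_END, T_BAD }; /* terminals */
enum { N_E, N_T, N_F };                                         /* non-terminals */

/* ACTION entries: n > 0 shifts to state n, n < 0 reduces by rule -n,
   ACC accepts, 0 is a blank (error) entry. Assumed table, see lead-in. */
#define ACC 100
static const int ACTION[12][6] = {
    /*         id   +   *   (   )   $  */
    /*  0 */ {  5,  0,  0,  4,  0,  0 },
    /*  1 */ {  0,  6,  0,  0,  0, ACC },
    /*  2 */ {  0, -2,  7,  0, -2, -2 },
    /*  3 */ {  0, -4, -4,  0, -4, -4 },
    /*  4 */ {  5,  0,  0,  4,  0,  0 },
    /*  5 */ {  0, -6, -6,  0, -6, -6 },
    /*  6 */ {  5,  0,  0,  4,  0,  0 },
    /*  7 */ {  5,  0,  0,  4,  0,  0 },
    /*  8 */ {  0,  6,  0,  0, 11,  0 },
    /*  9 */ {  0, -1,  7,  0, -1, -1 },
    /* 10 */ {  0, -3, -3,  0, -3, -3 },
    /* 11 */ {  0, -5, -5,  0, -5, -5 },
};
/* GOTO[state][non-terminal]; 0 marks entries that are never consulted. */
static const int GOTO[12][3] = {
    {1, 2, 3}, {0, 0, 0}, {0, 0, 0}, {0, 0, 0}, {8, 2, 3}, {0, 0, 0},
    {0, 9, 3}, {0, 0, 10}, {0, 0, 0}, {0, 0, 0}, {0, 0, 0}, {0, 0, 0},
};
/* For rules 1..6: number of RHS symbols and the LHS non-terminal. */
static const int RHS_LEN[7] = { 0, 3, 1, 3, 1, 3, 1 };
static const int LHS[7]     = { 0, N_E, N_E, N_T, N_T, N_F, N_F };

/* Read one terminal from *p; end of string or '$' both act as $. */
static int nextTerminal(const char **p) {
    while (**p == ' ') (*p)++;
    switch (**p) {
    case '\0': case '$': return T_END;
    case 'i': if ((*p)[1] == 'd') { *p += 2; return T_ID; } return T_BAD;
    case '+': (*p)++; return T_PLUS;
    case '*': (*p)++; return T_STAR;
    case '(': (*p)++; return T_LPAREN;
    case ')': (*p)++; return T_RPAREN;
    default:  return T_BAD;
    }
}

/* Return 1 if text is accepted, 0 on a syntax error. If reductions is
   non-NULL, the number of reduce steps performed is stored there on accept. */
int lrParse(const char *text, int *reductions) {
    assert(text != NULL);
    int stack[128];                 /* state stack; stack[0] is S0 */
    int top = 0, steps = 0;
    stack[0] = 0;
    int tok = nextTerminal(&text);
    for (;;) {
        if (tok == T_BAD) return 0;
        int act = ACTION[stack[top]][tok];
        if (act == ACC) {
            if (reductions != NULL) *reductions = steps;
            return 1;
        }
        if (act > 0) {                          /* shift */
            if (top + 1 >= 128) return 0;       /* input nested too deeply */
            stack[++top] = act;
            tok = nextTerminal(&text);
        } else if (act < 0) {                   /* reduce by rule -act */
            int rule = -act;
            top -= RHS_LEN[rule];               /* pop the RHS */
            int exposed = stack[top];
            stack[++top] = GOTO[exposed][LHS[rule]];
            steps++;
        } else {
            return 0;                           /* blank entry: syntax error */
        }
    }
}
```

Running lrParse("id+id*id$", &n) accepts after 8 reductions, matching the eight R entries in the trace. This sketch simply stops at the first error; the error-handling and recovery routine mentioned in the algorithm is not modeled.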
