During both lexical and syntactic analysis, errors can be detected and reported. Note that this two-step process omits the actual translation from source code to machine language; that step is also required, but we will not consider it here.
Lexical Analysis
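Given source code that consists of reserved words, identifiers, punctuation, blank spaces, and comments, the lexical analyzer groups the characters into lexemes, identifies the token for each lexeme, and skips the blank spaces and comments.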
Recognizing Names/Words/Numbers
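int lex( ) {
    getChar( );                    /* read the next character and set charClass */
    switch (charClass) {
    /* Names and reserved words: a letter followed by letters or digits */
    case LETTER:
        addChar( );
        getChar( );
        while (charClass == LETTER || charClass == DIGIT) {
            addChar( );
            getChar( );
        }
        return lookup(lexeme);     /* reserved-word token or identifier */
    /* Integer literals: one or more digits */
    case DIGIT:
        addChar( );
        getChar( );
        while (charClass == DIGIT) {
            addChar( );
            getChar( );
        }
        return INT_LIT;
    } /* End of switch */
    /* Other character classes (operators, end of input) are not handled in this fragment */
} /* End of function lex */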
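The fragment above relies on helpers (getChar, addChar, lookup) and globals (charClass, lexeme) that the notes do not define. Below is a minimal self-contained sketch in C; the helper bodies, the token codes (IDENT, KW_IF, KW_ELSE, END_OF_INPUT, UNKNOWN_TOKEN), and the functions initLexer and getNonBlank are assumptions made for illustration. One design change: the lookahead character is read once in initLexer and kept between calls, so lex( ) does not begin with getChar( ), which, depending on how getChar( ) is defined, could discard the character that ended the previous lexeme.

```c
#include <ctype.h>
#include <string.h>

/* Character classes and token codes: hypothetical values for this sketch. */
enum { LETTER, DIGIT, UNKNOWN, EOF_CLASS };
enum { IDENT = 10, INT_LIT = 11, KW_IF = 12, KW_ELSE = 13,
       END_OF_INPUT = -1, UNKNOWN_TOKEN = 99 };

static const char *src;   /* input being scanned */
static size_t pos;        /* index of the next unread character */
static int nextChar;      /* one-character lookahead */
static int charClass;     /* class of nextChar */
static char lexeme[100];  /* characters of the current lexeme */
static size_t lexLen;

/* Read one character into nextChar and classify it. */
static void getChar(void) {
    nextChar = src[pos] ? (unsigned char)src[pos++] : '\0';
    if (nextChar == '\0')        charClass = EOF_CLASS;
    else if (isalpha(nextChar))  charClass = LETTER;
    else if (isdigit(nextChar))  charClass = DIGIT;
    else                         charClass = UNKNOWN;
}

/* Append nextChar to the lexeme (overly long lexemes are truncated). */
static void addChar(void) {
    if (lexLen + 1 < sizeof lexeme) {
        lexeme[lexLen++] = (char)nextChar;
        lexeme[lexLen] = '\0';
    }
}

/* Skip blanks so that lex() starts on a meaningful character. */
static void getNonBlank(void) {
    while (charClass != EOF_CLASS && isspace(nextChar)) getChar();
}

/* Map a name to a reserved-word token, or IDENT otherwise. */
static int lookup(const char *name) {
    if (strcmp(name, "if") == 0)   return KW_IF;
    if (strcmp(name, "else") == 0) return KW_ELSE;
    return IDENT;
}

void initLexer(const char *text) {
    src = text;
    pos = 0;
    getChar();                    /* prime the lookahead */
}

int lex(void) {
    lexLen = 0;
    lexeme[0] = '\0';
    getNonBlank();
    switch (charClass) {
    case LETTER:                  /* names: letter (letter | digit)* */
        addChar(); getChar();
        while (charClass == LETTER || charClass == DIGIT) { addChar(); getChar(); }
        return lookup(lexeme);
    case DIGIT:                   /* integer literals: digit+ */
        addChar(); getChar();
        while (charClass == DIGIT) { addChar(); getChar(); }
        return INT_LIT;
    case EOF_CLASS:
        return END_OF_INPUT;
    default:                      /* any other single character */
        addChar(); getChar();
        return UNKNOWN_TOKEN;
    }
}
```

Calling initLexer("if x1 else 42 ;") and then lex( ) repeatedly yields KW_IF, IDENT (lexeme "x1"), KW_ELSE, INT_LIT (lexeme "42"), UNKNOWN_TOKEN for ';', and finally END_OF_INPUT.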
The Process of Parsing
Top-Down Parsing
Using a BNF description of a language, we can write a recursive-descent parser.
Each non-terminal grammatical category in the BNF (e.g., <expr>, <if>, <factor>) is converted into a function. When a function is called, it examines the next lexeme, obtained by calling lex( ), and either matches it against terminal symbols or calls the functions for further non-terminals.
This approach is known as an LL parser: the input is scanned Left to right, and the parser produces a Leftmost derivation.
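To make the function-per-non-terminal idea concrete, here is a minimal recognizer sketch in C for an assumed expression grammar written in EBNF: <expr> → <term> {+ <term>}, <term> → <factor> {* <factor>}, <factor> → id | (<expr>). Tokens are simplified to single characters (a lowercase letter stands for an id), and lex( ), error( ) and parse( ) here are hypothetical stand-ins, not the notes' own definitions. The repetition braces become while loops, which avoids left-recursive rules that a recursive-descent function would call endlessly.

```c
#include <ctype.h>

/* Hypothetical single-character tokens: a lowercase letter is an id,
   '+', '*', '(' and ')' stand for themselves, '\0' ends the input. */
static const char *input;
static char nextToken;    /* current lookahead token */
static int failed;        /* set by error() */

static void lex(void) {
    while (*input == ' ') input++;   /* skip blanks */
    nextToken = *input;
    if (*input) input++;
}

static void error(void) { failed = 1; }

static void expr(void);

/* <factor> -> id | ( <expr> ) */
static void factor(void) {
    if (islower((unsigned char)nextToken)) {
        lex();
    } else if (nextToken == '(') {
        lex();
        expr();
        if (nextToken == ')') lex();
        else error();
    } else {
        error();
    }
}

/* <term> -> <factor> { * <factor> } */
static void term(void) {
    factor();
    while (!failed && nextToken == '*') { lex(); factor(); }
}

/* <expr> -> <term> { + <term> } */
static void expr(void) {
    term();
    while (!failed && nextToken == '+') { lex(); term(); }
}

/* Returns 1 if the whole string is a sentence of the grammar. */
int parse(const char *text) {
    input = text;
    failed = 0;
    lex();                 /* load the first token */
    expr();
    return !failed && nextToken == '\0';
}
```

With this sketch, parse("a+b*(c+d)") returns 1, while parse("a+*b") and parse("(a") return 0.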
void ifstmt( ) {
    if (nextToken != IF_CODE)
        error( );
    else {
        lex( );                          /* consume 'if' */
        if (nextToken != LEFT_PAREN_CODE)
            error( );
        else {
            lex( );                      /* consume '(' */
            boolexpr( );
            if (nextToken != RIGHT_PAREN_CODE)
                error( );
            else {
                lex( );                  /* consume ')' */
                statement( );
                if (nextToken == ELSE_CODE) {
                    lex( );              /* consume 'else' */
                    statement( );
                }
            }
        }
    }
}
If Statement Example
We expect an if statement to have the form if (boolean expr) statement, optionally followed by else statement. Otherwise, the parser reports an error.
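A runnable version of ifstmt( ) needs the surrounding pieces. The sketch below assumes a pre-scanned array of token codes and hypothetical simplified rules <boolexpr> → ID and <statement> → <ifstmt> | ID ';'. The codes ID_CODE, SEMI_CODE and END_CODE, the numeric values of all codes, and parseIf are inventions for illustration. It consumes '(' and ')' with lex( ) before parsing what follows them, and it uses early returns instead of a nested else chain while keeping the same order of checks.

```c
/* Hypothetical token codes; only the *_CODE naming follows the notes. */
enum { IF_CODE, ELSE_CODE, LEFT_PAREN_CODE, RIGHT_PAREN_CODE,
       ID_CODE, SEMI_CODE, END_CODE };

static const int *tokens;   /* pre-scanned token stream */
static int nextToken;
static int errors;

/* Advance to the next token; stay on END_CODE once it is reached. */
static void lex(void) { nextToken = *tokens; if (nextToken != END_CODE) tokens++; }
static void error(void) { errors++; }

static void ifstmt(void);

/* Simplified stand-in: <boolexpr> -> ID */
static void boolexpr(void) {
    if (nextToken == ID_CODE) lex(); else error();
}

/* Simplified stand-in: <statement> -> <ifstmt> | ID ; */
static void statement(void) {
    if (nextToken == IF_CODE) {
        ifstmt();
    } else if (nextToken == ID_CODE) {
        lex();
        if (nextToken == SEMI_CODE) lex(); else error();
    } else {
        error();
    }
}

/* <ifstmt> -> if ( <boolexpr> ) <statement> [ else <statement> ] */
static void ifstmt(void) {
    if (nextToken != IF_CODE) { error(); return; }
    lex();                                   /* consume 'if' */
    if (nextToken != LEFT_PAREN_CODE) { error(); return; }
    lex();                                   /* consume '(' */
    boolexpr();
    if (nextToken != RIGHT_PAREN_CODE) { error(); return; }
    lex();                                   /* consume ')' */
    statement();
    if (nextToken == ELSE_CODE) {
        lex();                               /* consume 'else' */
        statement();
    }
}

/* Returns the number of errors found; 0 means the if statement parsed. */
int parseIf(const int *toks) {
    tokens = toks;
    errors = 0;
    lex();
    ifstmt();
    if (nextToken != END_CODE) error();     /* leftover tokens */
    return errors;
}
```

For the token sequence if ( ID ) ID ; else ID ; END, parseIf returns 0; dropping the left parenthesis makes it return a nonzero error count.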
LL Parser Restriction
Recall that one of our restrictions on grammars for an LL parser is that the grammar must pass the pairwise disjointness test.
The parser must be able to select the correct right-hand side to apply while parsing.
For example, if the parser is expanding a <factor>, should it apply the right-hand side <id> or (<expr>)?
For the parser to know which right-hand side to apply, the sets of terminal symbols that can begin the right-hand sides of the same non-terminal (their FIRST sets) must be pairwise disjoint; in particular, no two right-hand sides may start with the same terminal.
Consider the rules
<A> → a<B> | a<C>
A → aB | aAb
Neither is pairwise disjoint: in each rule, both right-hand sides begin with the terminal a.
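The pairwise disjointness test can be checked mechanically. The sketch below assumes a toy encoding in which uppercase letters are non-terminals, every other character is a terminal, and no rule derives the empty string, so FIRST of a right-hand side is just FIRST of its first symbol. Rule, firstOf and pairwiseDisjoint are hypothetical names. Because firstOf recurses through non-terminals, it also assumes the grammar has no left recursion, which an LL grammar must avoid anyway.

```c
#include <ctype.h>

/* One rule lhs -> rhs; uppercase = non-terminal, anything else = terminal. */
typedef struct { char lhs; const char *rhs; } Rule;

/* Add FIRST(sym) to set (a 256-entry membership table). Assumes no left
   recursion and no empty right-hand sides, so the recursion terminates. */
static void firstOf(char sym, const Rule *g, int n, unsigned char set[256]) {
    if (!isupper((unsigned char)sym)) {       /* terminal: FIRST is itself */
        set[(unsigned char)sym] = 1;
        return;
    }
    for (int i = 0; i < n; i++)               /* union over sym's rules */
        if (g[i].lhs == sym) firstOf(g[i].rhs[0], g, n, set);
}

/* Pairwise disjointness test for non-terminal A: returns 1 if the FIRST
   sets of all of A's right-hand sides are pairwise disjoint, else 0. */
int pairwiseDisjoint(char A, const Rule *g, int n) {
    for (int i = 0; i < n; i++) {
        if (g[i].lhs != A) continue;
        for (int j = i + 1; j < n; j++) {
            if (g[j].lhs != A) continue;
            unsigned char fi[256] = {0}, fj[256] = {0};
            firstOf(g[i].rhs[0], g, n, fi);
            firstOf(g[j].rhs[0], g, n, fj);
            for (int c = 0; c < 256; c++)
                if (fi[c] && fj[c]) return 0; /* shared first terminal */
        }
    }
    return 1;
}
```

With A → aB | aAb the function returns 0; with F → i | (E), using i for id, it returns 1; and with A → Bc | b, B → bd it returns 0, because FIRST(Bc) = {b} even though Bc begins with a non-terminal.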
The rule
<var> → id | id[<expr>]
is not pairwise disjoint, but we can make it so by left factoring:
<var> → id <new>
<new> → ε | [<expr>]
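Left factoring rewrites <var> as <var> → id <new> and <new> → ε | [<expr>] (the name <new> is illustrative). With that form, one token of lookahead is enough: after the id, the parser only has to check whether the next token is [. Here is a minimal sketch in C, assuming single-character tokens and a hypothetical simplified <expr> that is a single lowercase id or digit; expr, newTail, var and parseVar are illustrative names.

```c
#include <ctype.h>

static const char *p;   /* current position in the input */
static int ok;          /* cleared on a syntax error */

/* Simplified stand-in for <expr>: a single lowercase id or digit. */
static void expr(void) {
    if (islower((unsigned char)*p) || isdigit((unsigned char)*p)) p++;
    else ok = 0;
}

/* <new> -> ε | [ <expr> ] : the lookahead alone picks the alternative. */
static void newTail(void) {
    if (*p == '[') {
        p++;
        expr();
        if (ok && *p == ']') p++;
        else ok = 0;
    }
    /* otherwise choose ε and consume nothing */
}

/* <var> -> id <new> */
static void var(void) {
    if (islower((unsigned char)*p)) { p++; newTail(); }
    else ok = 0;
}

/* Returns 1 if the whole string is a <var>. */
int parseVar(const char *text) {
    p = text;
    ok = 1;
    var();
    return ok && *p == '\0';
}
```

parseVar("x") and parseVar("x[3]") return 1; parseVar("x[3") and parseVar("[3]") return 0.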
Bottom-Up Parsing
Because of the two restrictions placed on grammars that qualify for an LL parser (no left recursion and pairwise disjointness), an alternative approach is the LR parser, which does bottom-up parsing.
LR: Left-to-right scan of the input, producing a Rightmost derivation in reverse
Parser Algorithm
S0 is the start state; a1, ..., an are the lexemes that make up the program; $ is a special end-of-input symbol.
Let Sm be the state on top of the stack and ai the current input lexeme.
If action[Sm, ai] = Shift S, then push ai and then S onto the stack, making S the current state.
If action[Sm, ai] = Reduce R, where rule R is A → β, then pop β (each symbol together with its state) off the stack, leaving some state S' on top; then push A and the state GOTO[S', A].
If action[Sm, ai] = Accept, then the parse is complete with no errors.
If action[Sm, ai] = Error (or the entry in the table is blank), then call the error-handling and recovery routine.
The parsing table stores the values of action[x, y] and GOTO[x, y].
Example
Parse of id+id*id$
Stack           Input        Action
0               id+id*id$    S5
0id5            +id*id$      R6 (GOTO[0,F])
0F3             +id*id$      R4 (GOTO[0,T])
0T2             +id*id$      R2 (GOTO[0,E])
0E1             +id*id$      S6
0E1+6           id*id$       S5
0E1+6id5        *id$         R6 (GOTO[6,F])
0E1+6F3         *id$         R4 (GOTO[6,T])
0E1+6T9         *id$         S7
0E1+6T9*7       id$          S5
0E1+6T9*7id5    $            R6 (GOTO[7,F])
0E1+6T9*7F10    $            R3 (GOTO[6,T])
0E1+6T9         $            R1 (GOTO[0,E])
0E1             $            ACCEPT
Grammar:
1. E → E + T
2. E → T
3. T → T * F
4. T → F
5. F → (E)
6. F → id
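The parser algorithm can be run on this grammar with a small table-driven driver. The ACTION and GOTO values below are the standard SLR(1) table for this expression grammar (states 0 through 11). They are consistent with every shift, reduce and GOTO step in the trace above, but the full table is reconstructed here rather than taken from the notes. As a simplification, only states are kept on the stack (each state determines the grammar symbol beneath it), so a reduce by rule R pops |RHS| entries instead of 2|RHS|. Tokens are single characters, with i standing for id; the names lrParse, termIndex, goTo, lhs and rhsLen are hypothetical.

```c
#include <string.h>

/* Terminal columns; 'i' '+' '*' '(' ')' '$' encode id + * ( ) $. */
enum { ID, PLUS, STAR, LP, RP, END, NTERM };
enum { E_, T_, F_ };              /* non-terminal columns for GOTO */

#define ACC 99                    /* accept */
/* ACTION: n > 0 shift to state n, -r reduce by rule r,
   ACC accept, 0 error (blank entry). */
static const int action[12][NTERM] = {
    /*         id    +    *    (    )    $  */
    /*  0 */ {  5,   0,   0,   4,   0,   0 },
    /*  1 */ {  0,   6,   0,   0,   0, ACC },
    /*  2 */ {  0,  -2,   7,   0,  -2,  -2 },
    /*  3 */ {  0,  -4,  -4,   0,  -4,  -4 },
    /*  4 */ {  5,   0,   0,   4,   0,   0 },
    /*  5 */ {  0,  -6,  -6,   0,  -6,  -6 },
    /*  6 */ {  5,   0,   0,   4,   0,   0 },
    /*  7 */ {  5,   0,   0,   4,   0,   0 },
    /*  8 */ {  0,   6,   0,   0,  11,   0 },
    /*  9 */ {  0,  -1,   7,   0,  -1,  -1 },
    /* 10 */ {  0,  -3,  -3,   0,  -3,  -3 },
    /* 11 */ {  0,  -5,  -5,   0,  -5,  -5 },
};
/* GOTO[state][E, T, F]; -1 marks an unused entry. */
static const int goTo[12][3] = {
    { 1, 2, 3}, {-1,-1,-1}, {-1,-1,-1}, {-1,-1,-1},
    { 8, 2, 3}, {-1,-1,-1}, {-1, 9, 3}, {-1,-1,10},
    {-1,-1,-1}, {-1,-1,-1}, {-1,-1,-1}, {-1,-1,-1},
};
/* Rules 1..6: left-hand side and right-hand-side length. */
static const int lhs[7]    = {0, E_, E_, T_, T_, F_, F_};
static const int rhsLen[7] = {0, 3, 1, 3, 1, 3, 1};

/* Map an input character to its terminal column, or -1 if invalid. */
static int termIndex(char c) {
    const char *syms = "i+*()$";
    const char *q = strchr(syms, c);
    return (c != '\0' && q) ? (int)(q - syms) : -1;
}

/* Returns 1 if input (which must end in '$') is accepted, else 0. */
int lrParse(const char *input) {
    int stack[100], top = 0;
    stack[0] = 0;                          /* S0, the start state */
    const char *a = input;
    for (;;) {
        int t = termIndex(*a);
        if (t < 0) return 0;               /* invalid or missing $ */
        int act = action[stack[top]][t];
        if (act == ACC) return 1;
        if (act > 0) {                     /* shift: push state, advance */
            if (top + 1 >= 100) return 0;
            stack[++top] = act;
            a++;
        } else if (act < 0) {              /* reduce by rule -act */
            int r = -act;
            top -= rhsLen[r];              /* pop |RHS| states */
            int next = goTo[stack[top]][lhs[r]];
            stack[++top] = next;           /* push GOTO[S', A] */
        } else {
            return 0;                      /* blank entry: error */
        }
    }
}
```

lrParse("i+i*i$") passes through the same sequence of states as the trace and returns 1; lrParse("i+*i$") hits a blank entry in state 6 and returns 0.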