Professional Documents
Culture Documents
Siddhesh S Pange
M.Sc.CS:1 Roll no.:06
Top-down parsing is a parsing strategy where one first looks at the highest level of the parse tree and works down the parse tree by using the rewriting rules of a formal grammar. LL parsers are a type of parser that uses a top-down parsing strategy. Top-down parsing is a strategy of analyzing unknown data relationships by hypothesizing general parse tree structures and then considering whether the known fundamental structures are compatible with the hypothesis. It occurs in the analysis of both natural languages and computer languages. Top down parsing
set ip to point to the first symbol of w$. repeat let X be the top stack symbol and a the symbol pointed to by ip. if X is a terminal of $ then if X=a then pop X from the stack and advance ip else error() else if M[X,a]=X->Y1Y2...Yk then begin pop X from the stack; push Yk,Yk-1...Y1 onto the stack, with Y1 on top; output the production X-> Y1Y2...Yk end else error() until X=$
Consider the following example to understand the concept of First and Follow.
E -> TE' E'-> +TE' | e T -> FT' T'-> *FT' | e F -> (E) | id
Then:
FIRST (E) =FIRST (T) =FIRST (F) = {(, id} FIRST (E') = {+, e} FIRST (T') = {*, e}
FOLLOW (E) =FOLLOW (E') = {), $} FOLLOW (T) =FOLLOW (T') = {+,), $} FOLLOW (F) = {+,*,), $}
For example, id and left parenthesis are added to FIRST (F) by rule 3 in definition of FIRST with i=1 in each case, since FIRST (id)=(id) and FIRST('(')= {(} by rule 1. Then by rule 3 with i=1, the production T > FT' implies that id and left parenthesis belong to FIRST (T) also.
To compute FOLLOW, we put $ in FOLLOW (E) by rule 1 for FOLLOW. By rule 2 applied to production F-> (E), right parenthesis is also in FOLLOW (E). By rule 3 applied to production E -> TE', $ and right parenthesis are in FOLLOW (E').
(
E TE
E E T T F
E TE
Parsing procedure
Now we can have a predictive parsing mechanism Use a stack to keep track of the expanded form Initialization Put starting symbol S and $ into the stack Add $ to the end of the input string $ is for the recognition of the termination configuration
If a is at the top of the stack and a is the next input symbol then Simply pop a from stack and advance on the input string
If A is on top of the stack and a is the next input symbol then Assume that M [A, a] = A Replace A by in the stack
If A is on top of the stack and a is the next input but M[A, a] = empty Error
Stack E$ T E $ F T E $ T E $ E $ T E $ F T E $ T E $ F T E $ T E $ E $ $
id
num
E TE Error T FT Error F num
*
Error Error Error T *FT Error
+
Error E +TE Error T Error
(
E TE Error T FT Error F (E)
)
Error E Error T Error
$
Error E Error T Error
E E T T F
E TE Error T FT Error F id
1. As a starting point, we can place all symbols in FOLLOW (A) into the synchronizing set for nonterminal A. If we skip tokens until an element of FOLLOW (A) is seen and pop A from the stack, it is likely that parsing can continue. 2. It is not enough to use FOLLOW (A) as the synchronizing set for A. For example, if semicolons terminate statements, as in C, then keywords that begin statements may not appear in the FOLLOW set of the nonterminal generating expressions. A missing semicolon after an assignment may therefore result in the keyword beginning the next statement being skipped. Often, there is a hierarchical structure on constructs in a language; e.g., expressions appear within statement, which appear within blocks, and so on. We can add to the synchronizing set of a lower construct the symbols that begin higher constructs. For example, we might add keywords that begin statements to the synchronizing sets for the nonterminals generating expressions. 3. If we add symbols in FIRST(A) to the synchronizing set for nonterminal A, then it may be possible to resume parsing according to A if a symbol in FIRST(A) appears in the input. 4. If a nonterminal can generate the empty string, then the production deriving e can be used as a default. Doing so may postpone some error detection, but cannot cause an error to be missed. This approach reduces the number of nonterminals that have to be considered during error recovery.
5. If a terminal on top of the stack cannot be matched, a simple idea is to pop the terminal, issue a message saying that the terminal was inserted, and continue parsing. In effect, this approach takes the synchronizing set of a token to consist of all other tokens.