
BOTTOM UP PARSING

A bottom-up parse corresponds to the construction of a parse tree for an input string
beginning at the leaves (the bottom) and working up towards the root (the top). It is
convenient to describe parsing as the process of building parse trees, although a front end
may in fact carry out a translation directly without building an explicit tree.
We can think of bottom-up parsing as the process of "reducing" a string w to the start symbol
of the grammar. At each reduction step, a specific substring matching the body of a
production is replaced by the nonterminal at the head of that production. The key decisions
during bottom-up parsing are about when to reduce and about what production to apply, as
the parse proceeds.
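
A single reduction step can be sketched in C. This is only an illustration: the toy grammar S -> (S) | x and the helper name reduce_at are assumptions made for the example and do not come from these notes. The helper replaces one occurrence of a production body with the head; deciding where and when to reduce is left to the caller, which is exactly the hard part a real parser has to solve.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Replace the occurrence of `body` that starts at s[pos] with the single
   symbol `head`, shrinking the string in place.
   Returns 1 on success, 0 if `body` does not occur at that position. */
int reduce_at(char *s, size_t pos, const char *body, char head)
{
    size_t n = strlen(body);
    assert(n > 0);                      /* epsilon bodies are not handled here */
    if (pos > strlen(s) || strncmp(s + pos, body, n) != 0)
        return 0;
    s[pos] = head;
    /* close the gap left by the rest of the body, including the final NUL */
    memmove(s + pos + 1, s + pos + n, strlen(s + pos + n) + 1);
    return 1;
}
```

Starting from the buffer "(x)", calling reduce_at(w, 1, "x", 'S') and then reduce_at(w, 0, "(S)", 'S') rewrites it to "(S)" and then to "S", the start symbol. Read backwards, these two steps are the rightmost derivation S => (S) => (x).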

LR PARSERS:
The most prevalent type of bottom-up parser today is based on a concept called LR(k)
parsing; the "L" is for left-to-right scanning of the input, the "R" for constructing a rightmost
derivation in reverse, and the k for the number of input symbols of lookahead that are used in
making parsing decisions.
LR parsing is attractive for a variety of reasons:
1. LR parsers can be constructed to recognize virtually all programming-language
constructs for which context-free grammars can be written. Non-LR context-free
grammars exist, but these can generally be avoided for typical programming-language constructs.
2. The LR-parsing method is the most general nonbacktracking shift-reduce parsing
method known, yet it can be implemented as efficiently as other, more primitive shift-reduce methods.
3. An LR parser can detect a syntactic error as soon as it is possible to do so on a left-to-right scan of the input.
4. LR grammars can describe more languages than LL grammars: every LL(k) grammar is also an LR(k) grammar, but not the other way around.

ITEMS AND THE LR(0) AUTOMATON:


An LR parser makes shift-reduce decisions by maintaining states to keep track of where we
are in a parse. States represent sets of "items." An LR(0) item (item for short) of a grammar
G is a production of G with a dot at some position of the body. Thus, production A -> XYZ
yields the four items
A -> .XYZ
A -> X.YZ
A -> XY.Z
A -> XYZ.
The production A -> ε generates only one item, written A -> . (the dot alone).
Intuitively, an item indicates how much of a production we have seen at a given point in the
parsing process. For example, the item A -> .XYZ indicates that we hope to see a string
derivable from XYZ next on the input. Item A -> X.YZ indicates that we have just seen on
the input a string derivable from X and that we hope next to see a string derivable from YZ.
Item A -> XYZ. indicates that we have seen the body XYZ and that it may be time to reduce
XYZ to A.
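
The way a production yields its items can be sketched in C as follows. The string encoding ("A->XYZ", one character per grammar symbol) and the names make_item, item_count and print_items are assumptions made for this illustration; they are not taken from the implementation later in these notes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write into `out` the item of `prod` (form "A->XYZ") whose dot stands
   before body position `pos`. Returns 0 on success, -1 if `prod` has no
   "->", `pos` is past the end of the body, or `out` is too small. */
int make_item(const char *prod, size_t pos, char *out, size_t out_size)
{
    const char *arrow = strstr(prod, "->");
    if (arrow == NULL)
        return -1;
    const char *body = arrow + 2;
    size_t head_len = (size_t)(body - prod);   /* head plus the "->" */
    size_t body_len = strlen(body);
    if (pos > body_len || head_len + body_len + 2 > out_size)
        return -1;
    memcpy(out, prod, head_len + pos);         /* everything before the dot */
    out[head_len + pos] = '.';
    strcpy(out + head_len + pos + 1, body + pos);
    return 0;
}

/* A body of n symbols has n + 1 dot positions, so it yields n + 1 items. */
size_t item_count(const char *prod)
{
    const char *arrow = strstr(prod, "->");
    assert(arrow != NULL);
    return strlen(arrow + 2) + 1;
}

/* Print every item of one production, one per line. */
void print_items(const char *prod)
{
    char buf[32];
    size_t n = item_count(prod);
    for (size_t pos = 0; pos < n; pos++)
        if (make_item(prod, pos, buf, sizeof buf) == 0)
            puts(buf);
}
```

With this encoding an empty body ("A->") has exactly one dot position, which matches the single item of an ε-production.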

CLOSURE OF ITEM SETS:


If I is a set of items for a grammar G, then CLOSURE(I) is the set of items constructed from I
by the two rules:
1. Initially, add every item in I to CLOSURE(I).
2. If A -> α.Bβ is in CLOSURE(I) and B -> γ is a production, then add the item B -> .γ to
CLOSURE(I), if it is not already there. Apply this rule until no more new items can
be added to CLOSURE(I).
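
The two closure rules can be sketched in C as a simple worklist loop. The sketch assumes one character per grammar symbol, uppercase letters for nonterminals, and a '.' that never appears as a grammar symbol; the type item_set and the functions contains, add_item and closure_of are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_ITEMS 32
#define ITEM_LEN  16

/* A set of items, each stored as a string such as "E->E.+T". */
struct item_set {
    char items[MAX_ITEMS][ITEM_LEN];
    int count;
};

int contains(const struct item_set *s, const char *item)
{
    for (int i = 0; i < s->count; i++)
        if (strcmp(s->items[i], item) == 0)
            return 1;
    return 0;
}

/* Add `item` unless it is already in the set. */
void add_item(struct item_set *s, const char *item)
{
    if (contains(s, item))
        return;
    assert(s->count < MAX_ITEMS && strlen(item) < ITEM_LEN);
    strcpy(s->items[s->count++], item);
}

/* Expand `s` in place to CLOSURE(s). Rule 1 holds trivially because the
   caller's items are already in `s`; the loop applies rule 2 until nothing
   new appears. `grammar` holds productions of the form "B->body". */
void closure_of(struct item_set *s, const char *grammar[], int n_prods)
{
    /* s->count grows during the loop, so newly added items get scanned too */
    for (int i = 0; i < s->count; i++) {
        const char *dot = strchr(s->items[i], '.');
        if (dot == NULL || !isupper((unsigned char)dot[1]))
            continue;                   /* no nonterminal right after the dot */
        char b = dot[1];
        for (int p = 0; p < n_prods; p++) {
            if (grammar[p][0] != b)
                continue;
            char item[ITEM_LEN];
            snprintf(item, sizeof item, "%c->.%s", b, grammar[p] + 3);
            add_item(s, item);
        }
    }
}
```

For the usual expression grammar (E -> E+T | T, T -> T*F | F, F -> (E) | i) with an added start production S -> E, the closure of { S->.E } computed this way contains seven items.
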
THE FUNCTION GOTO:
The second useful function is GOTO(I, X), where I is a set of items and X is a grammar
symbol. GOTO(I, X) is defined to be the closure of the set of all items [A -> αX.β] such that
[A -> α.Xβ] is in I. Intuitively, the GOTO function is used to define the transitions in the
LR(0) automaton for a grammar. The states of the automaton correspond to sets of items, and
GOTO(I, X) specifies the transition from the state for I under input X.
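
A minimal sketch of GOTO in C, under the same assumptions as before: items are strings with one character per symbol, uppercase letters are nonterminals, and '.' is never a grammar symbol. All names here (item_set, has_item, put_item, close_set, goto_set) are hypothetical and chosen only for this example. The function moves the dot over X in every item that has X right after the dot, then closes the result.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_ITEMS 32
#define ITEM_LEN  16

struct item_set {
    char items[MAX_ITEMS][ITEM_LEN];
    int count;
};

int has_item(const struct item_set *s, const char *item)
{
    for (int i = 0; i < s->count; i++)
        if (strcmp(s->items[i], item) == 0)
            return 1;
    return 0;
}

void put_item(struct item_set *s, const char *item)
{
    if (has_item(s, item))
        return;
    assert(s->count < MAX_ITEMS && strlen(item) < ITEM_LEN);
    strcpy(s->items[s->count++], item);
}

/* Expand `s` in place to its closure; productions look like "B->body". */
void close_set(struct item_set *s, const char *grammar[], int n_prods)
{
    for (int i = 0; i < s->count; i++) {
        const char *dot = strchr(s->items[i], '.');
        if (dot == NULL || !isupper((unsigned char)dot[1]))
            continue;
        char b = dot[1];
        for (int p = 0; p < n_prods; p++) {
            if (grammar[p][0] != b)
                continue;
            char item[ITEM_LEN];
            snprintf(item, sizeof item, "%c->.%s", b, grammar[p] + 3);
            put_item(s, item);
        }
    }
}

/* Fill `out` with GOTO(in, x). An empty result means there is no
   transition on x from this state. */
void goto_set(const struct item_set *in, char x,
              const char *grammar[], int n_prods, struct item_set *out)
{
    out->count = 0;
    for (int i = 0; i < in->count; i++) {
        const char *dot = strchr(in->items[i], '.');
        if (x == '\0' || dot == NULL || dot[1] != x)
            continue;
        char moved[ITEM_LEN];
        size_t d = (size_t)(dot - in->items[i]);
        strcpy(moved, in->items[i]);
        moved[d] = x;                   /* swap the dot with the symbol after it */
        moved[d + 1] = '.';
        put_item(out, moved);
    }
    close_set(out, grammar, n_prods);
}
```

For the expression grammar with start production S -> E, applying goto_set to the initial state on symbol E yields the two items S->E. and E->E.+T.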

THE LR PARSING ALGORITHM:


A schematic of an LR parser is shown in the figure below. It consists of an input, an output, a stack, a
driver program, and a parsing table that has two parts (ACTION and GOTO). The driver
program is the same for all LR parsers; only the parsing table changes from one parser to
another. The parsing program reads characters from an input buffer one at a time. Where a
shift-reduce parser would shift a symbol, an LR parser shifts a state. Each state summarizes
the information contained in the stack below it.

FIG: LR PARSING ALGORITHM.

STRUCTURE OF LR PARSING TABLE:


The parsing table consists of two parts: a parsing-action function ACTION and a goto
function GOTO.
1. The ACTION function takes as arguments a state i and a terminal a (or $, the input
end marker). The value of ACTION[i, a] can have one of four forms:
a) Shift j, where j is a state. The action taken by the parser effectively shifts input a
to the stack, but uses state j to represent a.
b) Reduce A -> β. The action of the parser effectively reduces β on the top of the
stack to head A.
c) Accept. The parser accepts the input and finishes parsing.
d) Error. The parser discovers an error in its input and takes some
corrective action.
2. We extend the GOTO function, defined on sets of items, to states: if GOTO[Ii, A] = Ij,
then GOTO also maps a state i and a nonterminal A to state j.
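
To make the table structure concrete, here is a small table-driven driver in C. It is a sketch under stated assumptions: the grammar (1) S -> (S), (2) S -> x is a toy chosen for the example, the six-state table was worked out by hand for that grammar with reductions placed under FOLLOW(S) = { ), $ }, and every identifier (ACTION_TAB, GOTO_S, BODY_LEN, lr_parse and so on) is hypothetical. Only the driver loop is general; a different grammar would need different tables.

```c
#include <assert.h>
#include <stddef.h>

enum act_kind { ACT_ERROR = 0, ACT_SHIFT, ACT_REDUCE, ACT_ACCEPT };
struct act { enum act_kind kind; int arg; };  /* arg: next state or production */

#define N_STATES  6
#define N_TERMS   4                 /* columns: '('  ')'  'x'  '$' */
#define STACK_MAX 64

static const struct act ACTION_TAB[N_STATES][N_TERMS] = {
    /* 0 */ { {ACT_SHIFT, 2}, {ACT_ERROR, 0},  {ACT_SHIFT, 3}, {ACT_ERROR, 0}  },
    /* 1 */ { {ACT_ERROR, 0}, {ACT_ERROR, 0},  {ACT_ERROR, 0}, {ACT_ACCEPT, 0} },
    /* 2 */ { {ACT_SHIFT, 2}, {ACT_ERROR, 0},  {ACT_SHIFT, 3}, {ACT_ERROR, 0}  },
    /* 3 */ { {ACT_ERROR, 0}, {ACT_REDUCE, 2}, {ACT_ERROR, 0}, {ACT_REDUCE, 2} },
    /* 4 */ { {ACT_ERROR, 0}, {ACT_SHIFT, 5},  {ACT_ERROR, 0}, {ACT_ERROR, 0}  },
    /* 5 */ { {ACT_ERROR, 0}, {ACT_REDUCE, 1}, {ACT_ERROR, 0}, {ACT_REDUCE, 1} },
};
/* GOTO column for the only nonterminal, S; -1 marks "no entry". */
static const int GOTO_S[N_STATES] = { 1, -1, 4, -1, -1, -1 };
/* Body length of each production, indexed by production number. */
static const int BODY_LEN[3] = { 0, 3, 1 };

static int term_index(char c)
{
    switch (c) {
    case '(': return 0;
    case ')': return 1;
    case 'x': return 2;
    default:  return -1;            /* not a terminal of this grammar */
    }
}

/* Returns 1 if `input` is accepted, 0 otherwise. The end of the C string
   plays the role of the end marker $. */
int lr_parse(const char *input)
{
    int stack[STACK_MAX];           /* a stack of states, not of symbols */
    int top = 0;
    size_t i = 0;
    stack[0] = 0;
    for (;;) {
        int t = input[i] ? term_index(input[i]) : 3;
        if (t < 0)
            return 0;
        struct act a = ACTION_TAB[stack[top]][t];
        switch (a.kind) {
        case ACT_SHIFT:
            if (top + 1 >= STACK_MAX)
                return 0;           /* nesting too deep for this sketch */
            stack[++top] = a.arg;
            i++;
            break;
        case ACT_REDUCE:
            top -= BODY_LEN[a.arg]; /* pop one state per body symbol */
            assert(top >= 0 && GOTO_S[stack[top]] >= 0);
            stack[top + 1] = GOTO_S[stack[top]];
            top++;
            break;
        case ACT_ACCEPT:
            return 1;
        default:
            return 0;               /* error entry */
        }
    }
}
```

Note that a reduce does not consume input: the same lookahead is examined again in the state reached through GOTO.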

ALGORITHMS USED:
Algorithm to compute closure of an item:

Algorithm to compute set of canonical LR(0) items:

Algorithm to construct LR(0) Parsing Table:

LR PARSING ALGORITHM:

IMPLEMENTATION CODE IN C LANGUAGE:


#include<stdio.h>
#include<conio.h>
#include<stdlib.h>
#include<string.h>
#define size 20
struct state
{
char productions[size][10],on_symbol;
short int scanned_productions[size],no_of_productions,state_number,shift_info[10],number_of_shift;
struct state *link;
};
typedef struct state * NODE;
struct action
{
int state;
char act;
};
typedef struct action ACTION;
ACTION field[30][15];
NODE first = NULL,last = NULL;
int number_of_states,no_of_variables,no_of_terminals,count=1,jp=0,p_goto[30][10],tp;
char closure_productions[size][10],input[10][10],p_first[10][10],p_follow[10][10],variables[size],terminals[size],p[10];
void closure(NODE,int *);
void items();
NODE getnode()
{
NODE temp;
int i;
temp = (NODE) malloc(sizeof(struct state));
for(i = 0 ;i < size ;i++)

{
strcpy(temp->productions[i],"\0");
temp->scanned_productions[i] = 0;
}
temp->no_of_productions = 0;
temp->number_of_shift = 0;
temp->link = NULL;
return temp;
}
void insert(NODE temp)
{
if(first == NULL)
{
first = temp;
last = temp;
}
else
{
last->link = temp;
last = temp;
}
}
void dot_productions(char input[][10],int count)
{
int i = 1,j,k = 1;
char buffer[10]={'\0'};
while(i < count)
{
k = 0;
j = 0;
while(input[i][k] != '\0')
{
if(input[i][k] == '>')
{
buffer[j++] = '>';
buffer[j++] = '.';
}
else
buffer[j++] = input[i][k];
k++;
}
buffer[j] = '\0';
strcpy(closure_productions[i],buffer);
i++;
}
}
int check_for_presence_in_productions(NODE temp,char *buffer)
{
int i = 0;
while(temp->productions[i][0])
{
if(!strcmp(temp->productions[i],buffer))
return 1;
i++;
}
return 0;
}
void augment_grammar(char input[][10])
{
input[0][0] = input[1][0];

input[0][1]='1';
input[0][2]='-';
input[0][3]='>';
input[0][4] = input[1][0];
input[0][5] = '\0';

}
void initial_state(char input[][10])
{
int i = 0,j = 0,place;
char buffer[30] = {'\0'};
NODE temp = NULL;
while(input[0][i] != '\0')
{
if(input[0][i] == '>')
{
buffer[j++] = '>';
buffer[j++] = '.';
}
else
buffer[j++] = input[0][i];
i++;
}
buffer[j] = '\0';
temp = getnode();
insert(temp);
temp->state_number = 0;
temp->on_symbol = '\0';
strcpy(temp->productions[0],buffer);
temp->no_of_productions += 1;
place = 1;
closure(temp,&place);
}
int state_not_added(NODE temp1,int sno)
{
NODE temp;
int count;
temp = first;
while(temp != NULL)
{
if(temp->no_of_productions == temp1->no_of_productions)
{
count = 0;
while(count < temp->no_of_productions)
{
if(!strcmp(temp->productions[count],temp1->productions[count]))
count++;
else
break;
}
if(count == temp->no_of_productions)
{
temp->shift_info[++temp->number_of_shift] = sno;
return 1;
}
}
temp = temp->link;
}
return 0;
}
int findv(char c)
{
int i=0;
for(i=0;i<no_of_variables;i++)
{
if(c==variables[i])
{
return i;
}

}
return -1;

}
int findt(char c)
{
int i=0;
for(i=0;i<no_of_terminals;i++)
{
if(c==terminals[i])
return 1;
}
return 0;
}
int eptrans(char c)
{
int i=0;
for(i=1;i<=count;i++)
{
if(input[i][0]==c)
{
if(input[i][3]=='?')
return 1;
}
}
return 0;
}
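/* Append the FIRST set of nonterminal c to p_first[index], writing at the global index jp. */
void compute_first(char c,int index)
{
    int j,k,y;
    for(j=1;j<=count;j++)
    {
        if(input[j][0]==c)
        {
            if(input[j][3]=='?')
            {
                /* epsilon production: record '?' once */
                y=0;
                while(p_first[index][y]!='\0')
                {
                    if(p_first[index][y]=='?')break;
                    y++;
                }
                if(y==strlen(p_first[index]))
                    p_first[index][jp++]='?';
            }
            else if(findt(input[j][3]))p_first[index][jp++]=input[j][3]; /* body starts with a terminal */
            else if(input[j][3]==c); /* skip immediate left recursion */
            else
            {
                /* body starts with a nonterminal: walk the body from its first symbol */
                k=3;
                while(input[j][k]!='\0')
                {
                    if(findt(input[j][k]))
                    {
                        /* a terminal reached after nullable nonterminals belongs to FIRST */
                        p_first[index][jp++]=input[j][k];
                        break;
                    }
                    else if(eptrans(input[j][k]))
                    {
                        /* nullable nonterminal: take its FIRST and continue with the next symbol */
                        compute_first(input[j][k],index);
                        k++;
                        tp++;
                    }
                    else
                    {
                        compute_first(input[j][k],index);
                        break;
                    }
                }
            }
        }
    }
}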
int detect_epslon(char c)
{
int var,i;
var = findv(c);
if(var == -1)
return 0;
for(i=0;i<strlen(p_first[var]);i++)
{
if(p_first[var][i]=='?')
return 1;
}
return 0;
}
int presence(char c,int index)
{
int i;
for(i=0;i<strlen(p_follow[index]);i++)
if(c==p_follow[index][i])
return 0;
return 1;
}
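/* Append to p_follow[index] the symbols that can follow nonterminal c, writing at the global index jp. */
void compute_follow(char c,int index)
{
    int j,i,k,var,m;
    char temp;
    for(j=1;j<=count;j++)
    {
        for(k=3;k<strlen(input[j]);k++)
        {
            if(c==input[j][k])
            {
                /* c occurs in the body of production j: inspect the symbols after it */
                m=k;
                while(input[j][m]!='\0')
                {
                    temp=input[j][m+1];
                    if(findt(temp))
                    {
                        /* a terminal follows c */
                        if(presence(temp,index))
                            p_follow[index][jp++]=temp;
                        break;
                    }
                    else if(temp=='\0')
                    {
                        /* nothing (or only nullable symbols) follows c: add FOLLOW(head) */
                        var = findv(input[j][0]);
                        for(i=0;i<strlen(p_follow[var]);i++)
                            if(presence(p_follow[var][i],index))
                                p_follow[index][jp++]=p_follow[var][i];
                    }
                    if((var=findv(temp))!=-1)
                    {
                        /* a nonterminal follows c: add its FIRST set without epsilon */
                        for(i=0;i<strlen(p_first[var]);i++)
                            if(p_first[var][i] != '?'&&presence(p_first[var][i],index))
                                p_follow[index][jp++]=p_first[var][i];
                    }
                    /* keep scanning only while the following symbol can derive epsilon */
                    if(detect_epslon(temp))
                        m++;
                    else
                        break;
                }
            }
        }
    }
}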
int findp(char *str)
{ int i;
char buff[20];
strcpy(buff,str);
buff[strlen(str)-1]='\0';
for(i=1;i<=count;i++)
{
if(!strcmp(buff,input[i]))
return i;
}
return 0;
}
int posterm(char c)
{
int i=0;
for(i=0;i<no_of_terminals;i++)
{
if(c==terminals[i])
return i;
}
return 0;
}
void compute_action()
{
NODE temp;
int i,j,k,l,m;
char ch;
for(i=0;i<no_of_terminals;i++)
{
ch=terminals[i];
temp = first;
while(temp!= NULL)
{
if(temp->on_symbol==ch)
{
for(j=1;j<=temp->number_of_shift;j++)
{
field[temp->shift_info[j]][i].state=temp->state_number;
field[temp->shift_info[j]][i].act='s';
}
}
temp=temp->link;
}
}
temp=first;
while(temp!=NULL)
{
if(temp->state_number!=1)
for(i=0;i<temp->no_of_productions;i++)
{
k=strlen(temp->productions[i]);
k--;

if(temp->productions[i][k]=='.')
{
/* dot at the end of the item: reduce by production l on every terminal in FOLLOW(head) */
l=findp(temp->productions[i]);
m=findv(input[l][0]);
for(j=0;j<strlen(p_follow[m]);j++)
{
int col=posterm(p_follow[m][j]); /* table column of this lookahead */
if(field[temp->state_number][col].act=='\0')
{
field[temp->state_number][col].state=l;
field[temp->state_number][col].act='r';
}
else
{
/* the cell already holds a shift or reduce entry */
printf("%cr conflict\n",field[temp->state_number][col].act);
getch();
exit(0);
}
}
}
}
else
{
field[1][posterm('$')].state=1;
field[1][posterm('$')].act='a';
}
temp=temp->link;
}
}
void compute_goto()
{ int i,j;
char ch;
NODE temp;
for(i=0;i<no_of_variables;i++)
{
temp=first;
ch=variables[i];
while(temp!= NULL)
{
if(temp->on_symbol==ch)
{
for(j=1;j<=temp->number_of_shift;j++)
{p_goto[temp->shift_info[j]][i]=temp->state_number;
}
}
temp=temp->link;
}
}
}
void parse(char *str)
{
int i=0,stack[15],j,top = 0,k,m,pos=-1,l=0;

char ch,action[8]={'\0'},symbol[20]={'\0'},*p,temp;
stack[0] = 0;
while(str[i] !='\0')
{
j = posterm(str[i]);
p=&str[i];
if(field[stack[top]][j].act == 'a')
{
strcpy(action,"Accept");
printf("\n");
printf("%15s ",action);
for(l=0;l<=top;l++)
printf("%d",stack[l]);
printf("%15s",symbol);
printf("%15s",p);
printf("\nString parsed");
break;
}
else if(field[stack[top]][j].act=='s')
{
stack[top+1] = field[stack[top]][j].state;
symbol[++pos]=str[i];
strcpy(action,"Shift");
top++;
i++;
}
else if(field[stack[top]][j].act=='r')
{
strcpy(action,"Reduce");
k = strlen(input[field[stack[top]][j].state])-3;
m = findv(input[field[stack[top]][j].state][0]);
temp=input[field[stack[top]][j].state][0];
while(k>0)
{
stack[top--] = 0;
symbol[pos--]='\0';
k--;
}
symbol[++pos]=temp;
stack[top+1]=p_goto[stack[top]][m];
top++;

}
else
{

printf("\nERROR");
return;
}
printf("\n");
printf("%15s ",action);
for(l=0;l<=top;l++)
printf("%d",stack[l]);
printf("%15s",symbol);
printf("%15s",p);

}
}
int validi(char *str)
{
int i=0;
for(i=0;i<strlen(str);i++)
{
if(!findt(str[i]))
return 0;
}
return 1;
}

int main()
{
int i = 0,j,k;
char buffer[10],str[20]={'\0'};NODE temp;
printf("\nenter the variables\n");
scanf("%s",variables);
no_of_variables=strlen(variables);
printf("\nenter the terminals\n");
scanf("%s",terminals);
terminals[strlen(terminals)]='$';
// terminals[strlen(terminals)]='?';
no_of_terminals=strlen(terminals);
printf("\nEnter the productions(? FOR EPSILON)$ to end the input\n");
while(1)
{
scanf("%s",buffer);
if(!strcmp(buffer,"$"))
break;
else
{
strcpy(input[count],buffer);
count++;
}
}
for(i=0;i<no_of_variables;i++)
{
jp=0;
tp=0;
compute_first(variables[i],i);
//printf("%c--->%s\n",variables[i],p_first[i]);
}
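/* repeat the FOLLOW computation once per variable so that sets depending on each other settle */
for(j=0;j<no_of_variables;j++)
{ for(i=0;i<no_of_variables;i++)
{
jp=strlen(p_follow[i]); /* append after what earlier passes found instead of overwriting it */
compute_follow(variables[i],i);
if(j==0&&i==0)p_follow[j][jp++]='$'; /* the end marker follows the start symbol */
}
}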
/*for(i=0;i<no_of_variables;i++)
{printf("f%c--->%s\n",variables[i],p_follow[i]);
}*/
dot_productions(input,count);
augment_grammar(input);
initial_state(input);
i = 0;
/*while(first->productions[i][0] != '\0')

printf("\n%s",first->productions[i++]);
printf(" %d\n",first->no_of_productions);
*/
items();
temp = first;
while(temp != NULL)
{
i = 0;
printf("\nI%d ",temp->state_number);
while(i < temp->no_of_productions)
printf("\n%s",temp->productions[i++]);
for(i = 1;i <=temp->number_of_shift;i++)
printf("\nI%d on %c = %d",temp->shift_info[i],temp->on_symbol,temp->state_number);
temp =temp->link;
}
getch();
compute_action();
compute_goto();
printf("\n--------------------------LR(0) PARSING TABLE---------------------------------");
printf("\n       ");
for(i=0;i<no_of_terminals;i++)
{ printf("%7c",terminals[i]);
}
printf(" ");
for(i=0;i<no_of_variables;i++)
{
printf("%7c",variables[i]);
}
for(i=0;i<=number_of_states;i++)
{
printf("\n%7d",i);
for(j=0;j<no_of_terminals;j++)
printf(" %c%d ",field[i][j].act,field[i][j].state);
for(j=0;j<no_of_variables;j++)
printf("%7d",p_goto[i][j]);
}
printf("\nEnter the string to be parsed\n");
scanf("%s",str);
if(!validi(str))
{
printf("invalid input\n");
getch();
exit(0);
}
printf("\n------------------------------------PARSING------------------------------------");
parse(str);
getch();
return 0;
}
void closure(NODE temp,int *place)
{
int i = 0,j = 0,l = 0,k = 0,m;
i = (*place)-1;
j = 0;
while(temp->productions[i][0] != '\0')
{
/* find the symbol immediately after the dot in item i */
m = 0;
while(temp->productions[i][m] != '.')
m++;
m++;
if(temp->productions[i][m] >= 'A' && temp->productions[i][m] <= 'Z') /* a nonterminal follows the dot */
{
j = 1;
while(closure_productions[j][0] != '\0')
{
l=0;
if(closure_productions[j][0] == temp->productions[i][m])
{
if(!check_for_presence_in_productions(temp,closure_productions[j]))
{
strcpy(temp->productions[(*place)++],closure_productions[j]);
temp->no_of_productions += 1;
}
}
j++;

}
}
i++;
}
}
void items()
{
NODE temp,temp1 = NULL;
int i=0,j=0,k=0,place=0;
char ch,buffer[10];
temp = first;
while(temp != NULL)
{
while(i < temp->no_of_productions)
{
if(temp->scanned_productions[i] == 0)
{
j = 0;
temp->scanned_productions[i] = 1;
while(temp->productions[i][j++] != '.');
ch = temp->productions[i][j];
if( ch != '\0'&&ch!='?')
{
place = 0;
temp1 = getnode();
temp1->on_symbol = ch;
temp1->shift_info[++temp1->number_of_shift] = temp->state_number;
strcpy(buffer,temp->productions[i]);
buffer[j-1] = ch;
buffer[j] = '.';
strcpy(temp1->productions[0],buffer);
temp1->no_of_productions = 1;
place = 1;
closure(temp1,&place);
k = 0;
while(temp->productions[k][0] != '\0')
{
/* move the dot over ch in every other unscanned item that has ch after the dot */
j = 0;
if(temp->scanned_productions[k] == 0)
{
while(temp->productions[k][j++] != '.');
if(ch == temp->productions[k][j] )
{
temp->scanned_productions[k] = 1;
strcpy(buffer,temp->productions[k]);
buffer[j-1] = ch;
buffer[j] = '.';
strcpy(temp1->productions[place++],buffer);
temp1->no_of_productions += 1;
closure(temp1,&place);
}

}
k++;
}
/*for(i = 0;i < temp1->no_of_productions;i++)
printf("\n%s",temp1->productions[i]);*/
if(!state_not_added(temp1,temp->state_number))
{
insert(temp1);
temp1->state_number = ++number_of_states;
}
else
free(temp1);
}
i=0;
}
else
i++;
}
temp = temp->link;
i = 0;
}
}

SAMPLE OUTPUT:
