
CS143 Summer 2012

Introduction to bison

Handout 12 July 9th, 2012

Handout written by Maggie Johnson and revised by Julie Zelenski.

bison is a parser generator. It is to parsers what flex is to scanners. You provide the input of a grammar specification and it generates an LALR(1) parser to recognize sentences in that grammar. The name is a great example of a CS in-joke. bison is an upgraded version of the older tool yacc, "yet another compiler compiler", and it is probably the most common of the LALR tools out there. Our programming projects are configured to use the updated version bison, a close relative of the yak, but all of the features we use are present in the original tool, so this handout serves as a brief overview of both. Our course web page includes a link to an online bison user's manual for those who really want to dig deep and learn everything there is to learn about parser generators.

How It Works

bison is designed for use with C code and generates a parser written in C. The parser is configured for use in conjunction with a flex-generated scanner and relies on standard shared features (token types, yylval, etc.) and calls the function yylex as a scanner coroutine. You provide a grammar specification file, which is traditionally named using a .y extension. You invoke bison on the .y file and it creates the y.tab.h and y.tab.c files containing a thousand or so lines of intense C code that implements an efficient LALR(1) parser for your grammar, including the code for the actions you specified. (These yacc-style file names come from bison's -y flag, and the header is written when -d is given; without -y, bison names its output after the input file, for example myFile.tab.c.) The file provides an extern function yyparse that will attempt to successfully parse a valid sentence. You compile that C file normally, link with the rest of your code, and you have a parser! By default, the parser reads from stdin and writes to stdout, just like a flex-generated scanner does.

    % bison -dy myFile.y                        creates y.tab.c and y.tab.h for parser
    % gcc -c y.tab.c                            compiles parser code
    % gcc -o parse y.tab.o lex.yy.o -ll -ly     link parser, scanner, libraries
    % ./parse                                   invokes parser, reads from stdin

The Makefiles we provide for the projects will execute the above compilation steps for you, but it is worthwhile to understand the steps required.

bison File Format

Your input file is organized as follows (note the intentional similarities to flex):


    %{
    Declarations
    %}
    Definitions
    %%
    Productions
    %%
    User subroutines

The optional Declarations and User subroutines sections are used for ordinary C code that you want copied verbatim to the generated C file; declarations are copied to the top of the file, user subroutines to the bottom. The optional Definitions section is where you configure various parser features such as defining token codes, establishing operator precedence and associativity, and setting up the global variables used to communicate between the scanner and parser. The required Productions section is where you specify the grammar rules. As in flex, you can associate an action with each pattern (this time a production), which allows you to do whatever processing is needed as you reduce using that production.

Example

Let's look at a simple, but complete, specification to get our bearings. Here is a bison input file for a simple calculator that recognizes and evaluates binary postfix expressions using a stack.

    %{
    #include <stdio.h>
    #include <assert.h>

    /* Helpers for the evaluation stack, defined after the rules. */
    static int Pop();
    static int Top();
    static void Push(int val);
    %}

    %token T_Int

    %%

    S : S E '\n'   { printf("= %d\n", Top()); }
      |
      ;

    E : E E '+'    { Push(Pop() + Pop()); }
      | E E '-'    { int op2 = Pop(); Push(Pop() - op2); }
      | E E '*'    { Push(Pop() * Pop()); }
      | E E '/'    { int op2 = Pop(); Push(Pop() / op2); }
      | T_Int      { Push(yylval); }
      ;

    %%

    static int stack[100], count = 0;

    static int Pop()
    {
        assert(count > 0);
        return stack[--count];
    }

    static int Top()
    {
        assert(count > 0);
        return stack[count-1];
    }

    static void Push(int val)
    {
        assert(count < sizeof(stack)/sizeof(*stack));
        stack[count++] = val;
    }

    int main()
    {
        return yyparse();
    }

A few things worth pointing out in the above example:

- All token types returned from the scanner must be defined using %token in the definitions section. This establishes the numeric codes that will be used by the scanner to tell the parser about each token scanned. In addition, the global variable yylval is used to store additional attribute information about the lexeme itself.
- For each rule, a colon is used in place of the arrow, a vertical bar separates the various productions, and a semicolon terminates the rule. Unlike flex, bison pays no attention to line boundaries in the rules section, so you're free to use lots of space to make the grammar look pretty.
- Within the braces for the action associated with a production is just ordinary C code. If no action is present, the parser will take no action upon reducing that production.
- The first rule in the file is assumed to identify the start symbol for the grammar.
- yyparse is the function generated by bison. It reads input from stdin, attempting to work its way back from the input through a series of reductions back to the start symbol. The return code from the function is 0 if the parse was successful and 1 otherwise. If it encounters an error (i.e. the next token in the input stream cannot be shifted), it calls the routine yyerror, which by default prints the generic "parse error" message and quits.
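To see what the actions accomplish, it can help to trace them by hand. The following is a minimal sketch in plain C, not bison output: the function eval_postfix and its string-based input are assumptions made for illustration. It performs the same Push and Pop calls that the actions above would perform as each production is reduced.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

static int stack[100];
static size_t count = 0;

static int Pop(void)
{
    assert(count > 0);
    return stack[--count];
}

static void Push(int val)
{
    assert(count < sizeof(stack) / sizeof(*stack));
    stack[count++] = val;
}

/* Hypothetical helper: evaluates one line of postfix input such as
 * "4 5 + 2 *", doing for each token what the matching action does. */
int eval_postfix(const char *s)
{
    assert(s != NULL);
    count = 0;
    while (*s != '\0' && *s != '\n') {
        if (isdigit((unsigned char)*s)) {
            int value = 0;
            while (isdigit((unsigned char)*s))
                value = value * 10 + (*s++ - '0');
            Push(value);                                    /* E : T_Int  */
            continue;
        }
        switch (*s) {
        case '+': Push(Pop() + Pop()); break;               /* E : E E '+' */
        case '-': { int op2 = Pop(); Push(Pop() - op2); break; }
        case '*': Push(Pop() * Pop()); break;
        case '/': { int op2 = Pop(); Push(Pop() / op2); break; }
        default:  break;                    /* ignore everything else */
        }
        s++;
    }
    return Pop();            /* the value the S production would print */
}
```

For the input 4 5 + 2 * the sketch pushes 4 and 5, replaces them with 9, pushes 2, and finishes with 18 on the stack, which mirrors the order in which the real parser would perform its reductions.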

In order to try out our parser, we need to create the scanner for it. Here is the flex file we used:

    %{
    #include "y.tab.h"
    %}
    %%
    [0-9]+     { yylval = atoi(yytext); return T_Int; }
    [-+*/\n]   { return yytext[0]; }
    .          { /* ignore everything else */ }

Given the flex specification above, yylex will return the ASCII representation of the calculator operators, recognize integers, and ignore all other characters. When it assembles a series of digits into an integer, it converts it to a numeric value and stores it in yylval (the global reserved for passing attributes from the scanner to the parser). The token type T_Int is returned.

In order to tie this all together, we first run bison on the grammar specification to generate the y.tab.c and y.tab.h files, and then run flex on the scanner specification to generate the lex.yy.c file. Compile the two .c files and link them together, and voilà, a calculator is born! Here is the Makefile we could use:

    calc: lex.yy.o y.tab.o
    	gcc -o calc lex.yy.o y.tab.o -ly -ll
    lex.yy.c: calc.l y.tab.c
    	flex calc.l
    y.tab.c: calc.y
    	bison -vdty calc.y

Tokens, Productions, and Actions

By default, flex and bison agree to use the ASCII character codes for all single-char tokens without requiring explicit definitions. For all multi-char tokens, there needs to be an agreed-upon numbering system, and all of these tokens need to be specifically defined. The %token directive establishes token names and assigns numbers.


    %token T_Int T_Double T_String T_While

The above line would be included in the definitions section. It is translated by bison into C code as a sequence of #defines for T_Int, T_Double, and so on, using the numbers 257 and higher, which keeps them clear of the single-character codes (the exact starting value can differ between yacc and bison versions). These are the codes returned from yylex as each multi-character token is scanned and identified. The C declarations for the token codes are exported in the generated y.tab.h header file. #include that file in other modules (in the scanner, for example) to stay in sync with the parser-generated definitions.

Productions and their accompanying actions are specified using the following format:


    left_side : right_side1 { action1 }
              | right_side2 { action2 }
              | right_side3 { action3 }
              ;

The left side is the nonterminal being described. Nonterminals are named like typical identifiers: using a sequence of letters and/or digits, starting with a letter. Nonterminals do not have to be defined before use, but they do need to be defined eventually. The first nonterminal defined in the file is assumed to be the start symbol for the grammar.

Each right side is a valid expansion for the left side nonterminal. Vertical bars separate the alternatives, and the entire list is punctuated by a semicolon. Each right side is a list of grammar symbols, separated by whitespace. The symbols can either be nonterminals or terminals. Terminal symbols can either be individual character constants, e.g. 'a', or token codes defined using %token such as T_While.

The C code enclosed in curly braces after each right side is the associated action. Whenever the parser recognizes that production and is ready to reduce, it executes the action to process it. When the entire right side is assembled on top of the parse stack and the lookahead indicates a reduce is appropriate, the associated action is executed right before popping the handle off the stack and following the goto for the left side nonterminal. The code you include in the actions depends on what processing you need. The action might be to build a section of the parse tree, evaluate an arithmetic expression, declare a variable, or generate code to execute an assignment statement.

Although it is most common for actions to appear at the end of the right side, it is also possible to place actions in between grammar symbols. Those actions will be executed at that point when the symbols to the left are on the stack and the symbols to the right are coming up. These embedded actions can be a little tricky because they require the parser to commit to the current production early, more like a predictive parser, and can introduce conflicts if there are still open alternatives at that point in the parse.

Symbol Attributes

The parser allows you to associate attributes with each grammar symbol, both terminals and nonterminals. For terminals, the global variable yylval is used to communicate the particulars about the token just scanned from the scanner to the parser. For nonterminals, you can explicitly access and set their attributes using the attribute stack.

By default, YYSTYPE (the attribute type) is just an integer. Usually you want different information for various symbol types, so a union type can be used instead. You indicate what fields you want in the union via the %union directive.

    %union {
        int intValue;
        double doubleValue;
        char *stringValue;
    }

The above declaration would be included in the definitions section. It is translated by bison into C code as a YYSTYPE typedef for a new union type with the above fields. The global variable yylval is of this type, and the parser stores values of this type on the parser stack, one for each symbol currently on the stack.
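In C terms, the declaration amounts to something like the sketch below. The typedef mirrors what the handout describes; the value_stack array and the helpers shift_attribute and top_attribute are simplifications invented for illustration and are not the code bison actually emits.

```c
#include <assert.h>
#include <stddef.h>

/* The union type described by the %union directive. */
typedef union {
    int intValue;
    double doubleValue;
    char *stringValue;
} YYSTYPE;

YYSTYPE yylval;                 /* filled in by the scanner for each token */

#define STACK_MAX 100
static YYSTYPE value_stack[STACK_MAX];   /* hypothetical attribute stack */
static size_t value_count = 0;

/* Hypothetical: shifting a token copies its attribute onto the stack. */
void shift_attribute(void)
{
    assert(value_count < STACK_MAX);
    value_stack[value_count++] = yylval;
}

/* Hypothetical: attribute of the symbol currently on top of the stack. */
YYSTYPE top_attribute(void)
{
    assert(value_count > 0);
    return value_stack[value_count - 1];
}
```

Because YYSTYPE is a union, each stack entry holds only one of the fields at a time, so the parser needs to be told which field belongs to which symbol.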

When defining each token, you can identify which field of the union is applicable to this token type by preceding the token name with the field name enclosed in angle brackets. The field name is optional (for example, it is not relevant for tokens without attributes).

    %token <intValue> T_Int
    %token <doubleValue> T_Double
    %token T_While T_Return

To set the attribute for a nonterminal, use the %type directive, also in the definitions section. This establishes which field of the union is applicable to instances of the nonterminal:


    %type <intValue> Integer IntExpression
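The effect of these declarations can be pictured in C. Once a symbol has a declared field, every use of its attribute in an action becomes an access to that field of the union. The sketch below models this for a hypothetical rule IntExpression : IntExpression '+' Integer whose action adds the attributes of the first and third symbols; the function reduce_add and the rhs array are assumptions for illustration, not bison's generated code.

```c
#include <assert.h>
#include <stddef.h>

typedef union {
    int intValue;
    double doubleValue;
    char *stringValue;
} YYSTYPE;

/* Hypothetical stand-in for reducing
 *     IntExpression : IntExpression '+' Integer
 * rhs points at the three attribute-stack entries for the right side.
 * Both IntExpression and Integer were declared with <intValue>, so the
 * intValue field is the one read and written. */
YYSTYPE reduce_add(const YYSTYPE *rhs)
{
    YYSTYPE lhs;
    assert(rhs != NULL);
    lhs.intValue = rhs[0].intValue + rhs[2].intValue;
    return lhs;                 /* becomes the attribute of the left side */
}
```

The middle entry, rhs[1], belongs to the '+' token, which carries no attribute and is simply ignored here.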

To access a grammar symbol's attribute from the parse stack, there are special variables available for the C code within an action. $n is the attribute for the nth symbol of the current right side, counting from 1 for the first symbol. The attribute for the nonterminal on the left side is denoted by $$. If you have set the type of the token or nonterminal, then it is clear which field of the attributes union you are accessing. If you have not set the type (or you want to overrule the defined field), you can specify with the notation $<fieldname>n. A typical use of attributes in an action might be to gather the attributes of the various symbols on the right side and use that information to set the attribute for the nonterminal on the left side.

A similar mechanism is used to obtain information about symbol locations. For each symbol on the stack, the parser maintains a variable of type YYLTYPE, which is a structure containing four members: first_line, first_column, last_line, and last_column. To obtain the location of a grammar symbol on the right side, you simply use the notation @n, completely parallel to $n. The location of a terminal symbol is furnished by the lexical analyzer via the global variable yylloc. During a reduction, the location of the nonterminal on the left side is automatically set using the combined location of all symbols in the handle that is being reduced.

Conflict Resolution

What happens when you feed bison a grammar that is not LALR(1)? bison reports any conflicts when trying to fill in the table, but rather than just throwing up its hands, it has automatic rules for resolving the conflicts and building a table anyway. For a shift/reduce conflict, bison will choose the shift. In a reduce/reduce conflict, it will reduce using the rule declared first in the file. These heuristics can change the language that is accepted and may not be what you want. Even if it happens to work out, it is not recommended to let bison pick for you. You should control what happens by explicitly declaring precedence and associativity for your operators.

For example, ask bison to generate a parser for this ambiguous expression grammar that includes addition, multiplication, and exponentiation (using '^').


    %token T_Int
    %%
    E : E '+' E
      | E '*' E
      | E '^' E
      | T_Int
      ;

When you run bison on this file, it reports:

    conflicts: 9 shift/reduce

In the generated y.output, it tells you more about the issue:

    ...
    State 6 contains 3 shift/reduce conflicts.
    State 7 contains 3 shift/reduce conflicts.
    State 8 contains 3 shift/reduce conflicts.
    ...

If you look through the human-readable y.output file, you will see it contains the family of configurating sets and the transitions between them. When you look at states 6, 7, and 8, you will see the place we are in the parse and the details of the conflict. Understanding all that LR(1) construction stuff just might be useful after all! Rather than rewriting the grammar to implicitly control the precedence and associativity with a bunch of intermediate nonterminals, we can directly indicate the precedence so that bison will know how to break ties. In the definitions section, we can add any number of precedence levels, one per line, from lowest to highest, and indicate the associativity (either left, right, or nonassociative). Several terminals can be on the same line to assign them equal precedence.

    %token T_Int
    %left '+'
    %left '*'
    %right '^'
    %%
    E : E '+' E
      | E '*' E
      | E '^' E
      | T_Int
      ;
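One way to picture how bison uses declarations like these when a shift/reduce conflict arises is the small C model below. It is a sketch, not bison's own code: the names resolve_conflict, Assoc, and Action are invented here, and precedence levels are represented as plain integers where a larger number means a later (higher) declaration. The precedence of the incoming token is compared with that of the rule that could be reduced, and a tie is settled by the token's associativity.

```c
#include <assert.h>

typedef enum { ASSOC_LEFT, ASSOC_RIGHT, ASSOC_NONASSOC } Assoc;
typedef enum { ACTION_SHIFT, ACTION_REDUCE, ACTION_ERROR } Action;

/* Hypothetical model of precedence-based conflict resolution.
 * rule_prec:   precedence level of the rule that could be reduced
 * token_prec:  precedence level of the lookahead token
 * token_assoc: associativity declared for the lookahead token */
Action resolve_conflict(int rule_prec, int token_prec, Assoc token_assoc)
{
    assert(rule_prec > 0 && token_prec > 0);
    if (token_prec > rule_prec)
        return ACTION_SHIFT;            /* token binds tighter */
    if (token_prec < rule_prec)
        return ACTION_REDUCE;           /* rule binds tighter */
    switch (token_assoc) {              /* equal precedence: use associativity */
    case ASSOC_LEFT:  return ACTION_REDUCE;
    case ASSOC_RIGHT: return ACTION_SHIFT;
    default:          return ACTION_ERROR;  /* nonassociative: reject */
    }
}
```

With the levels from the declarations above numbered 1 for '+', 2 for '*', and 3 for '^', a call such as resolve_conflict(1, 2, ASSOC_LEFT) models having E '+' E on the stack with '*' as the lookahead, and it selects the shift.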

The grammar file above says that addition has the lowest precedence and it associates left to right. Multiplication is higher, and is also left associative. Exponentiation is highest precedence and it associates right. Notice that this is the reverse of how flex regular expression precedence works: the last precedence rule has highest precedence, not lowest! These directives disambiguate the conflicts. When we feed bison the changed file, it uses the precedence and associativity as specified to break ties. For a shift/reduce conflict, if the precedence of the token to be shifted is higher than that of the rule to reduce, it chooses the shift and vice versa. The precedence of a rule is determined by the precedence of the rightmost terminal on the right-hand side (or can be explicitly set with the %prec directive). So if 4 + 5 is on the stack and * is coming up, the * has higher precedence than the 4 + 5, so it shifts. If 4 * 5 is on the stack and + is coming up, it reduces. If 4 + 5 is on the stack and + is coming up, the associativity breaks the tie: a left-to-right associativity would reduce the rule and then go on, a right-to-left would shift and postpone the reduction.

Another way to set precedence is by using the %prec directive. When placed at the end of a production with a terminal symbol as its argument, it explicitly sets the precedence of the production to the same precedence as that terminal. This can be used when the right side has no terminals or when you want to overrule the precedence given by the rightmost terminal.

Even though it doesn't seem like a precedence problem, the dangling else ambiguity can be resolved using precedence rules. Think carefully about what the conflict is: identify what the token is that could be shifted and the alternate production that could be reduced. What would be the effect of choosing the shift? What is the effect of choosing to reduce? Which is the one we want?

Using bison's precedence rules, you can force the choice you want by setting the precedence of the token being shifted versus the precedence of the rule being reduced. Whichever precedence is higher wins out. The precedence of the token is set using the ordinary %left, %right, or %nonassoc directives. The precedence of the rule being reduced is determined by the precedence of the rightmost terminal (set the same way) or via an explicit %prec directive on the production.

Error Handling

When a bison-generated parser encounters an error (i.e. the next input token cannot be shifted given the sequence of things so far on the stack), it calls the default yyerror routine to print a generic "parse error" message and halt parsing. However, quitting in response to the very first error is not particularly helpful!
bison supports a form of error resynchronization that allows you to define where in the stream to give up on an unsuccessful parse and how far to scan ahead and try to clean up to allow the parse to restart. The special error token can be used in the right side of a production to mark a context in which to attempt error recovery. The usual use is to add the error token possibly followed by a sequence of one or more synchronizing tokens, which tell the parser to discard any tokens until it sees that "familiar" sequence that allows the parser to continue. A simple and useful strategy might be simply to skip the rest of the current input line or current statement when an error is detected.

When an error is encountered (and reported via yyerror), the parser will discard any partially parsed rules (i.e. pop states from the parse stack) until it finds one in which it can shift an error token. It then reads and discards input tokens until it finds one that can follow the error token in that production. For example:

    Var : Modifiers Type IdentifierList ';'
        | error ';'
        ;

The second production allows for handling errors encountered when trying to recognize Var by accepting the alternate sequence error followed by a semicolon. What happens if the parser is in the middle of processing the IdentifierList when it encounters an error? The error recovery rule, interpreted strictly, applies to the precise sequence of an error and a semicolon. If an error occurs in the middle of an IdentifierList, there are Modifiers and a Type and whatnot on the stack, which doesn't seem to fit the pattern. However, bison can force the situation to fit the rule by discarding the partially processed rules (i.e. popping states from the stack) until it gets back to a state in which the error token is acceptable (i.e. all the way back to the state at which it started before matching the Modifiers). At this point the error token can be shifted. Then, if the lookahead token couldn't possibly be shifted, the parser reads tokens, discarding them until it finds a token that is acceptable. In this example, bison reads and discards input until the next semicolon so that the error rule can apply. It then reduces that to Var, and continues on from there. This basically allows a mangled variable declaration to be ignored up to the terminating semicolon, which is a reasonable attempt at getting the parser back on track. Note that if a specific token follows the error symbol, it will discard until it finds that token; otherwise it will discard until it finds any token in the lookahead set for the nonterminal (for example, anything that can follow Var in the example above).

In our postfix calculator from the beginning of the handout, we can add an error production to recover from a malformed expression by discarding the rest of the line up to the newline and allow the next line to be parsed normally:

    S : S E '\n'     { printf("= %d\n", Top()); }
      | error '\n'   { printf("Error! Discarding entire line.\n"); }
      |
      ;
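The token-discarding half of this recovery can be modeled in a few lines of C. This is a deliberately simplified sketch: the function resync and its array-of-tokens input are invented for illustration, and it leaves out the other half of the process, in which the parser pops states until the error token can be shifted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the "error '\n'" production: after an error is
 * detected at index pos, discard tokens up to and including the next
 * synchronizing token. Returns the index at which normal parsing would
 * resume, or n if the synchronizing token never appears. */
size_t resync(const int *tokens, size_t n, size_t pos, int sync)
{
    assert(tokens != NULL && pos <= n);
    while (pos < n && tokens[pos] != sync)
        pos++;                  /* discard a token that cannot be used */
    if (pos < n)
        pos++;                  /* consume the synchronizing token itself */
    return pos;
}
```

Applied to a token stream for two input lines, a call made at the point of the error skips to just past the first newline, so the second line is parsed as if nothing had gone wrong.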

Like any other bison rule, one that contains an error can have an associated action. It would be typical at this point to clean up after the error and do other necessary housekeeping so that the parse can resume.

Where should one put error productions? It's more of an art than a science. Putting error tokens fairly high up tends to protect you from all sorts of errors by ensuring there is always a rule to which the parser can recover. On the other hand, you usually want to discard as little input as possible before recovering, and error tokens at lower-level rules can help minimize the number of partially matched rules that are discarded during recovery.

One reasonable strategy is to add error tokens fairly high up and use punctuation as the synchronizing tokens. For example, in a programming language expecting a list of declarations, it might make sense to allow error as one of the alternatives for a list entry. If punctuation separates elements of a list, you can use that punctuation in error rules to help find synchronization points, skipping ahead to the next parameter or the next statement in a list. Trying to add error actions at too low a level (say in expression handling) tends to be more difficult because of the lack of structure that allows you to determine when the expression ends and where to pick back up. Sometimes adding error actions into more than one level may introduce conflicts into the grammar because more than one error action is a possible recovery.

Another mechanism: deliberately add incorrect productions into the grammar specification, allow the parser to help you recognize these illegal forms, and then use the action to handle the error manually. For example, let's say you're writing a Java compiler and you know some poor C programmer is going to forget that you can't specify the size in a Java array declaration. You could add an alternate array declaration that allows for a size, but knowing it was illegal, the action for this production reports an error and discards the erroneous size token.

Most compilers tend to focus their efforts on trying to recover from the most common error situations (forgetting a semicolon, mistyping a keyword), but don't put a lot of effort into dealing with more wacky inputs. Error recovery is largely a trial and error process. For fun, you can experiment with your favorite compilers to see how good a job they do at recognizing errors (they are expected to hit this one perfectly), reporting errors clearly (less than perfect), and gracefully recovering (far from perfect).

Other Features

This handout doesn't detail all the great features available in bison and yacc; please refer to the man pages and online references for more details on tokens and types, conflict resolution, symbol attributes, embedded actions, error handling, and so on.
