
CS2352 PRINCIPLES OF COMPILER DESIGN INTRODUCTION

UNIT-I

COMPILER

A compiler is a program that can read a program in one language (the source language) and translate it into an equivalent program in another language (the target language). An important role of the compiler is to report any errors in the source program that it detects during the translation process.

COUSINS OF THE COMPILER

The preprocessor, assembler, linker and loader are referred to as the cousins of the compiler. In addition to a compiler, several other programs may be required to create an executable target program. A source program may be divided into modules stored in separate files. The task of collecting the source program is sometimes entrusted to a separate program, called a preprocessor. The preprocessor may also expand shorthands, called macros, into source-language statements. The modified source program is then fed to a compiler. The compiler may produce an assembly-language program as its output, because assembly language is easier to produce as output and is easier to debug. The assembly language is then processed by a program called an assembler that produces relocatable machine code as its output.

C.S.Anita, Assoc.Prof/CSE


Large programs are often compiled in pieces, so the relocatable machine code may have to be linked together with other relocatable object files and library files into the code that actually runs on the machine.

The linker resolves external memory addresses, where the code in one file may refer to a location in another file. The loader then puts together all of the executable object files into memory for execution.

ANALYSIS-SYNTHESIS MODEL OF COMPILATION

The process of compilation has two parts, namely analysis and synthesis.

Analysis: The analysis part breaks up the source program into constituent pieces and creates an intermediate representation of the source program.

Synthesis: The synthesis part constructs the desired target program from the intermediate representation.


The analysis part is often called the front end of the compiler; the synthesis part is the back end of the compiler.

PHASES OF THE COMPILER There are six phases of the compiler namely

1. Lexical Analysis
2. Syntax Analysis
3. Semantic Analysis
4. Intermediate Code Generation
5. Code Optimization
6. Code Generation


Lexical Analysis

The first phase of a compiler is called lexical analysis, linear analysis, or scanning. The lexical analyzer reads the stream of characters making up the source program and groups the characters into meaningful sequences called lexemes.


For each lexeme, the lexical analyzer produces as output a token of the form (token-name, attribute-value) that it passes on to the subsequent phase, syntax analysis.

For example, suppose a source program contains the assignment statement

position := initial + rate * 60

The characters in this assignment could be grouped into the following lexemes and mapped into the following tokens passed on to the syntax analyzer:

1. The identifier position
2. The assignment symbol :=
3. The identifier initial
4. The plus sign
5. The identifier rate
6. The multiplication sign
7. The number 60

The blanks separating the characters are eliminated during lexical analysis
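The grouping described above can be sketched as a toy lexer. This is an illustrative sketch only, not the implementation the notes describe: the token names and the pattern list are assumptions chosen to match the example statement.

```python
import re

# Toy lexer sketch: try each pattern at the current position, skipping blanks.
# Token names (id, assign, number, plus, times) are illustrative assumptions.
TOKEN_SPEC = [
    ("id",     r"[A-Za-z][A-Za-z0-9]*"),
    ("assign", r":="),
    ("number", r"[0-9]+"),
    ("plus",   r"\+"),
    ("times",  r"\*"),
]

def tokenize(source):
    tokens = []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():   # blanks separating lexemes are eliminated
            pos += 1
            continue
        for name, pattern in TOKEN_SPEC:
            m = re.match(pattern, source[pos:])
            if m:
                tokens.append((name, m.group()))
                pos += len(m.group())
                break
        else:
            raise ValueError(f"no token matches at position {pos}")
    return tokens

print(tokenize("position := initial + rate * 60"))
```

Running it on the example statement yields the seven (token-name, lexeme) pairs listed above, in order.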

Syntax Analysis

The second phase of the compiler is syntax analysis, hierarchical analysis, or parsing. The tokens from the lexical analyzer are grouped hierarchically into nested collections with collective meaning. This is represented using a parse tree. For example, for the assignment statement

position := initial + rate * 60

the parse tree is as follows


A syntax tree is a compressed representation of a parse tree in which each interior node represents an operation and the children of the node represent the arguments of the operation. The syntax tree for the above is as follows

Semantic Analysis

The semantic analyzer uses the syntax tree and the information in the symbol table to check the source program for semantic consistency with the language definition. It also gathers type information and saves it in either the syntax tree or the symbol table, for subsequent use during intermediate-code generation. An important part of semantic analysis is type checking, where the compiler checks that each operator has matching operands.


For example, a binary arithmetic operator may be applied to either a pair of integers or to a pair of floating-point numbers. If the operator is applied to a floating-point number and an integer, the compiler may convert the integer into a floating-point number.

Applying this type conversion to the above syntax tree, and considering all the identifiers to be real values, we get

Intermediate Code Generation

The intermediate representation should have two important properties: it should be easy to produce and it should be easy to translate into the target machine.

We consider an intermediate form called three-address code, which consists of a sequence of assembly-like instructions with three operands per instruction.

Properties of three-address instructions:

1. Each three-address assignment instruction has at most one operator on the right side.
2. The compiler must generate a temporary name to hold the value computed by a three-address instruction.
3. Some three-address instructions may have fewer than three operands.
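A minimal sketch of generating three-address code for the running example, assuming the syntax tree from the semantic-analysis step (with inttofloat applied to 60). The tree encoding and the temporary-name scheme t1, t2, t3 are illustrative assumptions, not the book's implementation.

```python
# Sketch: three-address code from a tiny hard-coded syntax tree.
temp_count = 0
code = []

def new_temp():
    """Generate a fresh compiler temporary name (property 2 above)."""
    global temp_count
    temp_count += 1
    return f"t{temp_count}"

def gen(node):
    """Return the name holding node's value, emitting instructions as needed."""
    if isinstance(node, str):          # leaf: identifier or literal
        return node
    op, left, right = node             # interior node: (operator, arg1, arg2)
    if op == "inttofloat":             # unary conversion: fewer than 3 operands
        t = new_temp()
        code.append(f"{t} = inttofloat {left}")
        return t
    l, r = gen(left), gen(right)
    t = new_temp()                     # at most one operator on the right side
    code.append(f"{t} = {l} {op} {r}")
    return t

tree = ("=", "position",
        ("+", "initial", ("*", "rate", ("inttofloat", "60", None))))
op, target, expr = tree
code.append(f"{target} = {gen(expr)}")
for line in code:
    print(line)
```

This prints the familiar sequence: t1 = inttofloat 60, t2 = rate * t1, t3 = initial + t2, position = t3.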


Code Optimization

The machine-independent code-optimization phase attempts to improve the intermediate code so that better target code will result. There is great variation in the amount of code optimization different compilers perform. Those that do the most are called "optimizing compilers," and a significant amount of time is spent on this phase. There are simple optimizations that significantly improve the running time of the target program without slowing down compilation too much.

Code Generation

The code generator takes as input an intermediate representation of the source program and maps it into the target language. If the target language is machine code, registers or memory locations are selected for each of the variables used by the program. Then, the intermediate instructions are translated into sequences of machine instructions that perform the same task.

Symbol-Table Management

An essential function of a compiler is to record the variable names used in the source program and collect information about various attributes of each name. These attributes may provide information about the storage allocated for a name, its type, its scope (where in the program its value may be used), and, in the case of procedure names, such things as the number and types of its arguments, the method of passing each argument (for example, by value or by reference), and the type returned. The symbol table is a data structure containing a record for each variable name, with fields for the attributes of the name. The data structure should be designed to allow the compiler to find the record for each name quickly and to store or retrieve data from that record quickly.


Error Detection and Reporting

Each phase can encounter errors.


After detecting an error, a phase must be able to recover from it so that compilation can proceed and further errors can be detected. A compiler that stops after detecting the first error is not useful.

Translation of an assignment statement


GROUPING OF PHASES

Front and Back Ends:

The phases are collected into a front end and a back end.


Front End: Consists of those phases or parts of phases that depend primarily on the source language and are largely independent of the target machine. Lexical and syntactic analysis, symbol-table management, semantic analysis and the generation of intermediate code are included. A certain amount of code optimization can be done by the front end as well. It also includes the error handling that goes along with each of these phases.

Back End: Includes those portions of the compiler that depend on the target machine; these portions generally do not depend on the source language. This covers aspects of the code-optimization phase and code generation, along with the necessary error handling and symbol-table operations.

Passes: Several phases of compilation are usually implemented in a single pass consisting of reading an input file and writing an output file. It is common for several phases to be grouped into one pass, and for the activity of these phases to be interleaved during the pass. Eg: Lexical analysis, syntax analysis, semantic analysis and intermediate code generation might be grouped into one pass. If so, the token stream after lexical analysis may be translated directly into intermediate code.

Reducing the number of passes: It is desirable to have relatively few passes, since it takes time to read and write intermediate files. However, if we group several phases into one pass, we may be forced to keep the entire program in memory, because one phase may need information in a different order than a previous phase produces it.


The internal form of the program may be considerably larger than either the source program or the target program, so this space may not be a trivial matter.

COMPILER-CONSTRUCTION TOOLS

Some commonly used compiler-construction tools include

1. Parser generators
2. Scanner generators
3. Syntax-directed translation engines
4. Automatic code generators
5. Data-flow engines

1. Parser generators produce syntax analyzers from input that is based on a context-free grammar. Earlier, syntax analysis consumed a large fraction of the running time of a compiler and a large fraction of the intellectual effort of writing a compiler. This phase is now considered one of the easiest to implement. Many parser generators utilize powerful parsing algorithms that are too complex to be carried out by hand.

2. Scanner generators


automatically generate lexical analyzers from a specification based on regular expressions. The basic organization of the resulting lexical analyzer is a finite automaton.

3. Syntax-directed translation engines produce collections of routines that walk a parse tree and generate intermediate code. The basic idea is that one or more translations are associated with each node of the parse tree. Each translation is defined in terms of translations at its neighbor nodes in the tree.

4. Automatic code generators take a collection of rules that define the translation of each operation of the intermediate language into the machine language for a target machine. The rules must include sufficient detail that the different possible access methods for data can be handled.

5. Data-flow analysis engines facilitate the gathering of information about how values are transmitted from one part of a program to each other part. Data-flow analysis is a key part of code optimization.


THE ROLE OF THE LEXICAL ANALYZER

The main task of the lexical analyzer is to read the input characters of the source program, group them into lexemes, and produce as output a sequence of tokens for each lexeme in the source program. The stream of tokens is sent to the parser for syntax analysis. When the lexical analyzer discovers a lexeme constituting an identifier, it needs to enter that lexeme into the symbol table.

Interactions between the lexical analyzer and the parser

Other functions of the lexical analyzer

1. Stripping out comments and whitespace (blank, newline, and tab characters that are used to separate tokens in the input).
2. Correlating error messages generated by the compiler with the source program. For instance, the lexical analyzer may keep track of the number of newline characters seen, so it can associate a line number with each error message.


Issues in Lexical Analysis


1. Simplicity of design is the most important consideration. The separation of lexical and syntactic analysis often allows us to simplify at least one of these tasks.
2. Compiler efficiency is improved. A separate lexical analyzer allows us to apply specialized techniques that serve only the lexical task, not the job of parsing. In addition, specialized buffering techniques for reading input characters can speed up the compiler significantly.
3. Compiler portability is enhanced.

Tokens, Patterns, and Lexemes

Token A token is a pair consisting of a token name and an optional attribute value. The token name is an abstract symbol representing a kind of lexical unit, e.g., a particular keyword, or a sequence of input characters denoting an identifier. The token names are the input symbols that the parser processes.

Pattern A pattern is a description of the form that the lexemes of a token may take. In the case of a keyword as a token, the pattern is just the sequence of characters that form the keyword. For identifiers and some other tokens, the pattern is a more complex structure that is matched by many strings.

Lexeme


A lexeme is a sequence of characters in the source program that matches the pattern for a token and is identified by the lexical analyzer as an instance of that token.

Examples of tokens

Lexical Errors

It is hard for a lexical analyzer to tell, without the aid of other components, that there is a source-code error. For instance, in the following C statement

fi ( a == f(x) ) ...

a lexical analyzer cannot tell whether fi is a misspelling of the keyword if or an undeclared function identifier. Since fi is a valid lexeme for the token id, the lexical analyzer must return the token id to the parser and let some other phase of the compiler (probably the parser in this case) handle an error due to transposition of the letters.

However, suppose a situation arises in which the lexical analyzer is unable to proceed because none of the patterns for tokens matches any prefix of the remaining input. The simplest recovery strategy is "panic mode" recovery: we delete successive characters from the remaining input until the lexical analyzer can find a well-formed token at the beginning of what input is left.

Other possible error-recovery actions are:

1. Delete one character from the remaining input.
2. Insert a missing character into the remaining input.
3. Replace a character by another character.


4. Transpose two adjacent characters.
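Panic-mode recovery can be sketched as follows. This is an illustrative sketch, not a production scanner: the pattern list and the return shape of next_token are assumptions made for the example.

```python
import re

# Panic-mode recovery sketch: if no token pattern matches a prefix of the
# remaining input, delete successive characters until one does.
PATTERNS = [r"[A-Za-z][A-Za-z0-9]*", r"[0-9]+", r":=", r"[+*()]"]

def next_token(text):
    """Return (lexeme, remaining_input, chars_deleted)."""
    deleted = 0
    while text:
        for p in PATTERNS:
            m = re.match(p, text)
            if m:
                return m.group(), text[len(m.group()):], deleted
        text = text[1:]       # panic mode: drop one character and retry
        deleted += 1
    return None, "", deleted

lexeme, rest, deleted = next_token("#@x1 + y")
print(lexeme, deleted)        # → x1 2  (two stray characters deleted)
```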


INPUT BUFFERING

We often have to look one or more characters beyond the next lexeme before we can be sure we have the right lexeme. Hence we introduce a two-buffer scheme that handles large lookaheads safely. We then consider an improvement involving "sentinels" that saves time checking for the ends of buffers.

Buffer Pairs

Each buffer is of the same size N, and N is usually the size of a disk block, e.g., 4096 bytes. Using one system read command we can read N characters into a buffer. If fewer than N characters remain in the input file, then a special character, represented by eof, marks the end of the source file. Two pointers to the input are maintained:

1. Pointer lexeme_beginning marks the beginning of the current lexeme, whose extent we are attempting to determine.
2. Pointer forward scans ahead until a pattern match is found.

Once the next lexeme is determined, forward is set to the character at its right end. Then, after the lexeme is recorded as an attribute value of a token returned to the parser, lexeme_beginning is set to the character immediately after the lexeme just found.

Advancing forward requires that we first test whether we have reached the end of one of the buffers, and if so, we must reload the other buffer from the input, and move forward to the beginning of the newly loaded buffer.


if forward at end of first half then begin
    reload second half;
    forward := forward + 1
end
else if forward at end of second half then begin
    reload first half;
    move forward to beginning of first half
end
else forward := forward + 1;


Code to advance forward pointer

Sentinels

For each character read, we make two tests: one for the end of the buffer, and one to determine what character is read. We can combine the buffer-end test with the test for the current character if we extend each buffer to hold a sentinel character at the end. The sentinel is a special character that cannot be part of the source program, and a natural choice is the character eof. Note that eof retains its use as a marker for the end of the entire input. Any eof that appears other than at the end of a buffer means that the input is at an end.

forward := forward + 1;
if forward = eof then begin
    if forward at end of first half then begin
        reload second half;
        forward := forward + 1
    end
    else if forward at end of second half then begin
        reload first half;
        move forward to beginning of first half
    end
    else /* eof within a buffer signifying end of input */
        terminate lexical analysis
end

Lookahead code with sentinels
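The sentinel scheme above can be sketched in executable form. This is a toy model under stated assumptions: the buffer size N is shrunk to 4 so the reloads are visible, and "\0" stands in for the eof sentinel (assumed absent from source text).

```python
# Sentinel sketch: each half of a two-buffer pair ends with an eof sentinel,
# so advancing `forward` needs only one test per character instead of two.
EOF = "\0"   # stands in for eof; assumed never to appear in the source
N = 4        # toy buffer size (a real lexer would use a disk-block size)

class DoubleBuffer:
    def __init__(self, text):
        self.text = text
        self.pos = 0                      # position in the underlying input
        self.buf = [""] * (2 * N + 2)     # two halves, each with a sentinel
        self._reload(0)
        self.forward = -1

    def _reload(self, half):
        start = half * (N + 1)
        chunk = self.text[self.pos:self.pos + N]
        self.pos += len(chunk)
        for i in range(N):                # pad short reads with eof
            self.buf[start + i] = chunk[i] if i < len(chunk) else EOF
        self.buf[start + N] = EOF         # sentinel at the end of this half

    def advance(self):
        self.forward += 1
        c = self.buf[self.forward]
        if c == EOF:                      # the single combined test
            if self.forward == N:         # end of first half: reload second
                self._reload(1)
                self.forward += 1
                c = self.buf[self.forward]
            elif self.forward == 2 * N + 1:  # end of second half: wrap around
                self._reload(0)
                self.forward = 0
                c = self.buf[self.forward]
            # otherwise: eof within a buffer, genuine end of input
        return c

buf = DoubleBuffer("ab + cd")
out = ""
while True:
    c = buf.advance()
    if c == EOF:
        break
    out += c
print(out)   # → ab + cd
```

In the common case (a non-eof character) only one comparison is made; the two buffer-end tests run only when the sentinel is actually hit.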

SPECIFICATION OF TOKENS

Regular expressions are an important notation for specifying lexeme patterns.

Strings and Languages

An alphabet is a finite set of symbols. A string over an alphabet is a finite sequence of symbols drawn from that alphabet. A language is any countable set of strings over some fixed alphabet.


In language theory, the terms "sentence" and "word" are often used as synonyms for "string." The length of a string s, usually written |s|, is the number of occurrences of symbols in s.

For example, banana is a string of length six. The empty string, denoted ε, is the string of length zero.

Terms for Parts of Strings The following string-related terms are commonly used:

1. A prefix of string s is any string obtained by removing zero or more symbols from the end of s. For example, ban, banana, and ε are prefixes of banana.

2. A suffix of string s is any string obtained by removing zero or more symbols from the beginning of s. For example, nana, banana, and ε are suffixes of banana.

3. A substring of s is obtained by deleting any prefix and any suffix from s. For example, banana, nan, and ε are substrings of banana.

4. The proper prefixes, suffixes, and substrings of a string s are those prefixes, suffixes, and substrings, respectively, of s that are not ε and not equal to s itself.

5. A subsequence of s is any string formed by deleting zero or more not-necessarily-consecutive symbols of s. For example, baan is a subsequence of banana.

Operations on Languages

In lexical analysis, the most important operations on languages are union, concatenation, and closure.


Definitions of operations on languages

Example: Let L be the set of letters {A, B, ..., Z, a, b, ..., z} and let D be the set of digits {0, 1, ..., 9}.

L ∪ D is the set of letters and digits.
LD is the set of strings of length two, each consisting of one letter followed by one digit.
L4 is the set of all 4-letter strings.
L* is the set of all strings of letters, including ε, the empty string.
L(L ∪ D)* is the set of all strings of letters and digits beginning with a letter.
D+ is the set of all strings of one or more digits.
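The operations above can be demonstrated on small finite samples. This is a sketch: closure (L*) is an infinite set, so only strings built from a bounded number of concatenations are computed, and tiny two-element stand-ins are used for L and D.

```python
from itertools import product

L = {"a", "b"}        # tiny stand-in for the set of letters
D = {"0", "1"}        # tiny stand-in for the set of digits

def concat(L1, L2):
    """L1 L2 = all strings xy with x in L1 and y in L2."""
    return {x + y for x, y in product(L1, L2)}

def power(L1, k):
    """L1^k = k-fold concatenation of L1 with itself (L1^0 = {ε})."""
    result = {""}
    for _ in range(k):
        result = concat(result, L1)
    return result

def closure_upto(L1, k):
    """The strings of L1* built from at most k concatenations."""
    result = set()
    for i in range(k + 1):
        result |= power(L1, i)
    return result

print(sorted(L | D))             # union: letters and digits
print(sorted(concat(L, D)))      # LD: one letter followed by one digit
print(len(power(L, 4)))          # L^4 over a 2-letter alphabet: 16 strings
print("" in closure_upto(L, 3))  # L* contains ε
```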

Regular Expressions

Each regular expression r denotes a language L(r). Here are the rules that define the regular expressions over some alphabet Σ and the languages that those expressions denote.

ε is a regular expression, and L(ε) is {ε}, that is, the language whose sole member is the empty string.

If a is a symbol in Σ, then a is a regular expression, and L(a) = {a}, that is, the language with one string, of length one, with a in its one position.


Suppose r and s are regular expressions denoting languages L(r) and L(s), respectively.

1. (r)|(s) is a regular expression denoting the language L(r) ∪ L(s).
2. (r)(s) is a regular expression denoting the language L(r)L(s).
3. (r)* is a regular expression denoting (L(r))*.
4. (r) is a regular expression denoting L(r).

The unary operator * has highest precedence and is left associative. Concatenation has second-highest precedence and is left associative. | has lowest precedence and is left associative.

A language that can be defined by a regular expression is called a regular set. If two regular expressions r and s denote the same regular set, we say they are equivalent and write r = s. For instance, (a|b) = (b|a). There are a number of algebraic laws for regular expressions.

Algebraic laws for regular expressions
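Some of these laws and the precedence rules can be checked on sample strings with Python's `re` module. This is an illustrative sketch: `re`'s syntax is richer than the book's notation, but `|`, concatenation, and `*` carry the same relative precedences described above.

```python
import re

def matches(pattern, s):
    """True iff the regular expression matches the whole string s."""
    return re.fullmatch(pattern, s) is not None

# | is commutative: (a|b) and (b|a) denote the same language
for s in ["a", "b", "ab", ""]:
    assert matches("a|b", s) == matches("b|a", s)

# * binds tighter than concatenation: ab* means a(b*), not (ab)*
assert matches("ab*", "abbb")
assert not matches("ab*", "abab")

# concatenation binds tighter than |: a|bc means a|(bc), not (a|b)c
assert matches("a|bc", "bc")
assert not matches("a|bc", "ac")

print("laws hold on all sampled strings")
```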


Regular Definitions

Giving names to regular expressions is referred to as a regular definition. If Σ is an alphabet of basic symbols, then a regular definition is a sequence of definitions of the form:

d1 → r1
d2 → r2
...
dn → rn

where:

1. Each di is a distinct name.
2. Each ri is a regular expression over the alphabet Σ ∪ {d1, d2, ..., di-1}.

E.g.: Identifiers form the set of strings of letters and digits beginning with a letter. A regular definition for this set:

letter → A | B | ... | Z | a | b | ... | z
digit → 0 | 1 | ... | 9
id → letter ( letter | digit )*
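A regular definition can be expanded mechanically by substituting each name with its defining expression. The sketch below does this for the identifier definition using Python regex syntax; real scanner generators instead compile the definition into a finite automaton.

```python
import re

# Expand the regular definition by textual substitution (a sketch).
defs = {
    "letter": "[A-Za-z]",
    "digit":  "[0-9]",
}
# id → letter ( letter | digit )*  — each name replaced by its expression
defs["id"] = f"{defs['letter']}({defs['letter']}|{defs['digit']})*"

assert re.fullmatch(defs["id"], "position")
assert re.fullmatch(defs["id"], "x1")
assert not re.fullmatch(defs["id"], "1x")   # must begin with a letter
print(defs["id"])   # → [A-Za-z]([A-Za-z]|[0-9])*
```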

Notational Shorthands Certain constructs occur so frequently in regular expressions that it is convenient to introduce notational shorthands for them.


1. One or more instances (+)

The unary postfix operator + means "one or more instances of".


If r is a regular expression that denotes the language L(r), then (r)+ is a regular expression that denotes the language (L(r))+. Thus the regular expression a+ denotes the set of all strings of one or more a's.

The operator + has the same precedence and associativity as the operator *.

2. Zero or one instance (?)

The unary postfix operator ? means "zero or one instance of". The notation r? is a shorthand for r | ε. If r is a regular expression, then (r)? is a regular expression that denotes the language L(r) ∪ {ε}.

3. Character Classes

The notation [abc], where a, b and c are alphabet symbols, denotes the regular expression a | b | c. A character class such as [a-z] denotes the regular expression a | b | c | ... | z. Identifiers can be described as the strings generated by the regular expression [A-Za-z][A-Za-z0-9]*.

4. Regular Set

A language denoted by a regular expression is said to be a regular set.

5. Non-regular Set

A language which cannot be described by any regular expression.

E.g., the set of all strings of balanced parentheses and the set of repeating strings cannot be described by a regular expression. Such sets can be specified by a context-free grammar.

Example 1:

Regular definition for identifiers in Pascal

letter → A | B | ... | Z | a | b | ... | z
digit → 0 | 1 | 2 | ... | 9
id → letter (letter | digit)*


Example 2:

Regular definition for unsigned numbers in Pascal

digit → 0 | 1 | 2 | ... | 9
digits → digit digit*
optional-fraction → . digits | ε
optional-exponent → ( E ( + | - | ε ) digits ) | ε
num → digits optional-fraction optional-exponent
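The unsigned-number definition can likewise be expanded into a single pattern. A sketch in Python regex syntax, where each ε alternative becomes an optional group:

```python
import re

# Expand the regular definition for unsigned numbers (a sketch):
digit  = "[0-9]"
digits = f"{digit}+"                      # digit digit*
optional_fraction = f"(\\.{digits})?"     # . digits | ε
optional_exponent = f"(E[+-]?{digits})?"  # ( E ( + | - | ε ) digits ) | ε
num = digits + optional_fraction + optional_exponent

for s in ["6", "3.14", "6.336E4", "1.89E-4"]:
    assert re.fullmatch(num, s), s
assert not re.fullmatch(num, ".5")        # must begin with digits
print(num)   # → [0-9]+(\.[0-9]+)?(E[+-]?[0-9]+)?
```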

Tokens, their patterns and attribute values

RECOGNITION OF TOKENS

Recognition is done using finite automata (transition diagrams). Consider the following grammar fragment:


Transition Diagrams

As an intermediate step in the construction of a lexical analyzer, we first convert patterns into stylized flowcharts, called "transition diagrams." Transition diagrams have a collection of nodes or circles, called states. Each state represents a condition that could occur during the process of scanning the input looking for a lexeme that matches one of several patterns. Edges are directed from one state of the transition diagram to another. Each edge is labeled by a symbol or set of symbols.

Some important conventions about transition diagrams are:

1. Certain states are said to be accepting, or final. These states indicate that a lexeme has been found. We always indicate an accepting state by a double circle, and if there is an action to be taken, typically returning a token and an attribute value to the parser, we attach that action to the accepting state.


2. In addition, if it is necessary to retract the forward pointer one position (i.e., the lexeme does not include the symbol that got us to the accepting state), then we additionally place a * near that accepting state.
3. One state is designated the start state, or initial state; it is indicated by an edge, labeled "start," entering from nowhere.
4. The transition diagram always begins in the start state before any input symbols have been read.

Transition diagram for relop

Transition diagram for unsigned numbers


Transition diagram for whitespace characters

Transition diagram for identifiers and keywords
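The identifier/keyword diagram can be simulated directly: a start state with an edge on letters, a state that loops on letters and digits, and an accepting state marked with * (the forward pointer is retracted one position, and the lexeme is then looked up to decide keyword versus id). A sketch, where the keyword set is an illustrative assumption:

```python
# Simulate the identifier/keyword transition diagram (a sketch).
KEYWORDS = {"if", "then", "else", "while"}   # assumed keyword set

def run_id_diagram(text, start):
    """Return (token, lexeme, next_position), or None if the diagram fails."""
    forward = start
    if forward >= len(text) or not text[forward].isalpha():
        return None                       # no edge out of the start state
    forward += 1
    while forward < len(text) and text[forward].isalnum():
        forward += 1                      # loop on letter-or-digit edges
    # Accepting state marked *: the delimiter is not part of the lexeme,
    # so forward already points just past the identifier.
    lexeme = text[start:forward]
    token = lexeme if lexeme in KEYWORDS else "id"
    return token, lexeme, forward

print(run_id_diagram("count := 0", 0))    # → ('id', 'count', 5)
print(run_id_diagram("if x", 0))          # → ('if', 'if', 2)
```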

A LANGUAGE FOR SPECIFYING A LEXICAL ANALYZER

There is a tool called Lex, or in a more recent implementation Flex, that allows one to specify a lexical analyzer by writing regular expressions to describe patterns for tokens. The input notation for the Lex tool is referred to as the Lex language and the tool itself is the Lex compiler. The Lex compiler transforms the input patterns into a transition diagram and generates code, in a file called lex.yy.c, that simulates this transition diagram. An input file, which we call lex.l, is written in the Lex language and describes the lexical analyzer to be generated. The Lex compiler transforms lex.l to a C program, in a file that is always named lex.yy.c. The latter file is compiled by the C compiler into a file called a.out, as always. The C-compiler output is a working lexical analyzer that can take a stream of input characters and produce a stream of tokens.


Creating a lexical analyzer with Lex

Lex Specifications

A Lex program has three parts:

declarations
%%
translation rules
%%
auxiliary procedures

The translation rules are statements of the form

p1 {action1}
p2 {action2}
...
pn {actionn}
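The pattern-action structure of the translation rules can be modeled in miniature. This is a sketch in Python, not Lex itself: the rules below are illustrative assumptions, but the matching discipline is the one Lex uses, namely take the longest match at the front of the input, with earlier rules winning ties.

```python
import re

# Lex-style translation rules as (pattern, action) pairs (illustrative).
rules = [
    (r"[ \t\n]+",             lambda lex: None),              # skip whitespace
    (r"if|then|else",         lambda lex: (lex.upper(), lex)),  # keywords
    (r"[A-Za-z][A-Za-z0-9]*", lambda lex: ("ID", lex)),
    (r"[0-9]+",               lambda lex: ("NUM", int(lex))),
    (r"<=|<|=|>=|>",          lambda lex: ("RELOP", lex)),
]

def lex(text):
    tokens, pos = [], 0
    while pos < len(text):
        # Longest match wins; max() returns the first maximum, so earlier
        # rules win ties (keyword beats identifier for "if").
        m, action = max(
            ((re.match(p, text[pos:]), a) for p, a in rules),
            key=lambda ma: len(ma[0].group()) if ma[0] else -1,
        )
        if not m:
            raise ValueError(f"no rule matches at position {pos}")
        result = action(m.group())
        if result is not None:            # whitespace action returns no token
            tokens.append(result)
        pos += len(m.group())
    return tokens

print(lex("if count <= 10"))
```

Note that longest-match resolves "ifx" as an identifier (three characters beat the two-character keyword match), while rule order resolves the exact string "if" as a keyword.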

Example Lex program for a few tokens


Part-A

1. Define a preprocessor. What are the functions of a preprocessor? [AU MAY/JUN 2007(Reg 2004), NOV/DEC 2007(Reg 2004)]

A preprocessor produces input to compilers. A source program may be divided into modules stored in separate files. The task of collecting the source program is sometimes entrusted to a distinct program called a preprocessor. Its functions are macro processing, file inclusion, rational preprocessors and language extensions.

2. What are the issues in lexical analysis? [AU MAY/JUN 2007(Reg 2004)]

Simpler design; compiler efficiency is improved; compiler portability is enhanced.

3. Define a symbol table. [AU NOV/DEC 2007(Reg 2004)]

The symbol table is a data structure containing a record for each variable name, with fields for the attributes of the name.

4. Differentiate compiler and interpreter. [AU APR/MAY 2008(Reg 2004)]

Compiler: It is a translator that translates a high-level language to a low-level language. It displays the errors after the whole program is compiled.
Interpreter: It is a translator that translates a high-level language to a low-level language. It checks line by line for errors during execution.


5. Write short notes on buffer pair. [AU APR/MAY 2008(Reg 2004)]

Each buffer is of the same size N, and N is usually the size of a disk block, e.g., 4096 bytes. Using one system read command we can read N characters into a buffer. If fewer than N characters remain in the input file, then a special character, represented by eof, marks the end of the source file. Two pointers to the input buffer are maintained, namely pointer lexeme_beginning and pointer forward.

6. What is a language processing system? [AU NOV/DEC 2008(Reg 2004), MAY/JUNE 2012(Reg 2004)]

A language processing system consists of a preprocessor, a compiler, an assembler, and a linker/loader, which together translate a source program into an executable target program.

7. What are the error recovery actions in a lexical analyzer? [AU NOV/DEC 2008(Reg 2004),MAY/JUN 2012(Reg 2008)]

1. Delete one character from the remaining input.
2. Insert a missing character into the remaining input.
3. Replace a character by another character.
4. Transpose two adjacent characters.

8. What are the issues to be considered in the design of lexical analyzer? [AU MAY/JUN 2009(Reg 2004)]

1. Simplicity of design
2. Compiler efficiency
3. Compiler portability

9. Define concrete and abstract syntax with example. [AU MAY/JUN 2009(Reg 2004)] Abstract syntax: what are the significant parts of the expression? Example: a sum expression has its two operand expressions as its significant parts

Concrete syntax: what does the expression look like? Example: the same sum expression can look in different ways: 2+3 -- infix


(+ 2 3) -- prefix
(2 3 +) -- postfix

10. What is a sentinel? What is its purpose? [AU NOV/DEC 2010(Reg 2004),MAY/JUNE 2012(Reg 2004)]

The sentinel is a special character that cannot be part of the source program, and a natural choice is the character eof. eof retains its use as a marker for the end of the entire input. Any eof that appears other than at the end of a buffer means that the input is at an end. This is used for speeding-up the lexical analyzer.

11. Define lexeme and pattern. [AU NOV/DEC 2010(Reg 2004)] Lexeme A lexeme is a sequence of characters in the source program that matches the pattern for a token and is identified by the lexical analyzer as an instance of that token. Pattern A pattern is a description of the form that the lexemes of a token may take.

12. What is an interpreter? [AU APR/MAY 2011(Reg 2008)]

It is one of the translators that translate high level language to low level language. During execution, it checks line by line for errors.

13. Define token and lexeme. [AU APR/MAY 2011(Reg 2008)]

Token A token is a pair consisting of a token name and an optional attribute value. Lexeme A lexeme is a sequence of characters in the source program that matches the pattern for a token and is identified by the lexical analyzer as an instance of that token.

14. What is the role of the lexical analyzer? [AU NOV/DEC 2011(Reg 2008)]

The main task of the lexical analyzer is to read the input characters of the source program, group them into lexemes, and produce as output a sequence of tokens for each lexeme in the source program. The stream of tokens is sent to the parser for syntax analysis. When the lexical analyzer discovers a lexeme constituting an identifier, it needs to enter that lexeme into the symbol table.

15. Give the transition diagram of identifier. [AU NOV/DEC 2011(Reg 2008)]

16. Mention a few cousins of the compiler. [AU MAY/JUN 2012 (Reg 2008)]

The following are the cousins of compilers:
i. Preprocessors
ii. Assemblers
iii. Loaders
iv. Link editors

17. What is a compiler?

A compiler is a program that reads a program written in one language (the source language) and translates it into an equivalent program in another language (the target language). As an important part of this translation process, the compiler reports to its user the presence of errors in the source program.

18. State some software tools that manipulate source programs.

i. Structure editors
ii. Pretty printers
iii. Static checkers
iv. Interpreters

19. What are the two main parts of compilation?

The two main parts are:
a) The analysis part breaks up the source program into constituent pieces and creates an intermediate representation of the source program.
b) The synthesis part constructs the desired target program from the intermediate representation.

20. How many phases does analysis consist of?

Analysis consists of three phases:
i. Linear analysis
ii. Hierarchical analysis
iii. Semantic analysis

21. State some compiler-construction tools.

i. Parser generators
ii. Scanner generators
iii. Syntax-directed translation engines
iv. Automatic code generators
v. Data-flow engines

22. State the general phases of a compiler.

i) Lexical analysis
ii) Syntax analysis
iii) Semantic analysis
iv) Intermediate code generation

v) Code optimization vi) Code generation

23. Give the transition diagram for whitespace characters.

24. What is an assembler?

An assembler is a program that converts an assembly-language program into relocatable machine code.

