Lexical Analyzer
Presenters
1. Tofael Mahmud Rizvi
ID: 133-15-2821
2. S. M Neoaz Mahfuz
ID: 133-15-2982
3. Zahidul Islam
ID: 133-15-3061
4. Masuma Akter
ID: 133-15-2989
5. Md. Al-amin
ID: 133-15-3037
Lexical Analyzer
A lexical analyzer reads the source program character by
character and returns the tokens of the source program.
It puts information about identifiers into the symbol table.
It classifies each element of the program (keyword, identifier,
constant, operator) so that later phases work on tokens instead of raw characters.
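The behaviour described above can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' implementation: the token names, the patterns in TOKEN_SPEC, the KEYWORDS set and the tokenize function are all assumptions chosen for a toy language.

```python
import re

# Hypothetical token patterns for a toy language. Order matters:
# the first alternative that matches at a position wins.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("ID", r"[A-Za-z_]\w*"),
    ("OP", r"<=|[+\-*/<=()]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))
KEYWORDS = {"if", "then", "begin", "integer"}  # assumed keyword set


def tokenize(source):
    """Scan source left to right and return (tokens, symbol_table)."""
    tokens, symbol_table = [], {}
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise ValueError(f"lexical error at position {pos}: {source[pos]!r}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "SKIP":
            continue  # whitespace separates tokens but is not itself a token
        if kind == "ID" and lexeme in KEYWORDS:
            kind = "KEYWORD"
        elif kind == "ID":
            # Enter the identifier into the symbol table on first sight.
            symbol_table.setdefault(lexeme, len(symbol_table))
        tokens.append((kind, lexeme))
    return tokens, symbol_table


tokens, table = tokenize("if a <= 10 then zap")
print(tokens)
print(table)
```

Each token is returned as a (name, lexeme) pair, and only identifiers are entered into the symbol table; keywords are recognised by a lookup after the identifier pattern has matched.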
Lexical Analysis
Terminologies
Token:
A token is a pair consisting of a token name and an optional attribute
value. The token name is an abstract symbol representing a kind of
lexical unit.
Typically,
1. Each keyword is a token, e.g., then, begin, integer.
2. Each identifier is a token, e.g., a, zap.
3. Each constant is a token, e.g., 123, 123.45, 1.2E3.
4. Each operator or punctuation sign is a token, e.g., (, <, <=, +.
Lexeme:
A lexeme is a sequence of characters in the source program
that matches the pattern for a token.
Pattern:
A pattern is a rule describing the set of lexemes
that can represent a particular token in the source
program.
Regular expressions are an important notation for specifying
patterns.
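To make the three terms concrete, the sketch below writes one pattern as a Python regular expression and checks which character sequences are lexemes of it. The CONSTANT pattern and the helper is_lexeme_of are hypothetical; the sample lexemes are the constants used as examples earlier.

```python
import re

# Hypothetical pattern for the token name "constant":
# digits, an optional fraction, and an optional exponent.
CONSTANT = re.compile(r"\d+(\.\d+)?([eE][+-]?\d+)?")


def is_lexeme_of(pattern, text):
    """True if the whole of text matches the pattern."""
    return pattern.fullmatch(text) is not None


for candidate in ["123", "123.45", "1.2E3", "zap", "1.2E"]:
    print(candidate, is_lexeme_of(CONSTANT, candidate))
```

Here "constant" is the token name, CONSTANT is the pattern, and each string that matches it in full is a lexeme of that token.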
Regular Expression
The regular expressions are built recursively out of smaller regular
expressions.
Each regular expression r denotes a language L(r).
BASIS: There are two rules that form the basis:
1. ε is a regular expression, and L(ε) is {ε}, that is, the language
whose sole member is the empty string.
2. If a is a symbol in the alphabet Σ, then a is a regular expression, and
L(a) = {a}, that is, the language with one string, of length one, with a
in its one position.
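The two basis rules can be mirrored with Python sets of strings. This is only an illustrative sketch: the function names basis_epsilon and basis_symbol and the alphabet sigma are assumptions, and Python's empty string stands in for the empty string of the theory.

```python
EPSILON = ""  # Python's empty string stands in for the empty string


def basis_epsilon():
    """Rule 1: the language of the empty-string expression has one member."""
    return {EPSILON}


def basis_symbol(a, alphabet):
    """Rule 2: for a symbol a of the alphabet, L(a) = {a}."""
    if a not in alphabet:
        raise ValueError(f"{a!r} is not a symbol of the alphabet")
    return {a}


sigma = {"0", "1"}  # hypothetical alphabet
print(basis_epsilon())           # {''}
print(basis_symbol("0", sigma))  # {'0'}
```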
Pattern Specifications
Alphabets:
An alphabet is any finite set of symbols.
{0,1} is the binary alphabet.
{0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F} is the hexadecimal alphabet.
{a-z, A-Z} is the alphabet of English letters.
Strings:
Any finite sequence of symbols from an alphabet is called a string.
The length of a string is the total number of symbol occurrences in it.
A string of zero length is known as the empty string and is denoted
by ε (epsilon).
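These definitions translate directly into a short Python check. The names BINARY, HEX and is_string_over are illustrative assumptions, not part of the original slides.

```python
BINARY = set("01")
HEX = set("0123456789ABCDEF")


def is_string_over(s, alphabet):
    """True if every symbol of s belongs to the given alphabet."""
    return all(ch in alphabet for ch in s)


print(is_string_over("1F", HEX))     # True
print(is_string_over("1F", BINARY))  # False
print(len("1F"))                     # 2, the length of the string
print(len(""))                       # 0, the empty string
```

The empty string counts as a string over every alphabet, since it contains no symbol that could fall outside the alphabet.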
Special Symbols
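A typical C-like language uses symbols such as:
Arithmetic Symbols- Addition (+), Subtraction (-), Modulo (%), Multiplication (*), Division (/)
Punctuation- Comma (,), Semicolon (;), Dot (.)
Assignment- =
Preprocessor- #
Location Specifier- &
Logical- &&, ||, !
Shift Operator- <<, >>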
Language
A language is a set of strings over some finite alphabet.
Computer languages are treated as sets of strings, so
set operations can be performed on them mathematically.
Finite languages can always be described by means of regular
expressions.
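Because a language is a set of strings, small finite languages can be modelled with Python sets. The sketch below is an illustration under that assumption; the languages letters and digits are made up, and concatenation is included as one further operation that is commonly defined on languages, although it is not named above.

```python
def union(l1, l2):
    """Strings that belong to either language."""
    return l1 | l2


def intersection(l1, l2):
    """Strings that belong to both languages."""
    return l1 & l2


def concatenation(l1, l2):
    """Every string of l1 followed by every string of l2."""
    return {x + y for x in l1 for y in l2}


letters = {"a", "b"}  # hypothetical finite language
digits = {"0", "1"}   # hypothetical finite language

print(sorted(union(letters, digits)))          # ['0', '1', 'a', 'b']
print(sorted(intersection(letters, digits)))   # []
print(sorted(concatenation(letters, digits)))  # ['a0', 'a1', 'b0', 'b1']
```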
Lexical Errors
A lexical error is raised by the lexer when it is unable to continue.
This means that no pattern matches the remaining input, so there is no way to recognize a lexeme as a valid token.
The simplest recovery strategy is "panic mode" recovery: delete successive
characters from the remaining input until the lexer can find a well-formed token.
Other error-recovery actions are:
i. Delete one character from the remaining input.
ii. Insert a missing character into the remaining input.
iii. Replace a character by another character.
iv. Transpose two adjacent characters.
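Both kinds of recovery can be sketched in Python. This is a hedged illustration only: the TOKEN pattern, tokenize_panic and single_edit_repairs are hypothetical names, and a real lexer would normally pick among the candidate repairs rather than just enumerate them.

```python
import re

# Hypothetical token pattern: numbers, identifiers, a few operators, whitespace.
TOKEN = re.compile(r"\d+|[A-Za-z_]\w*|[+\-*/=()]|\s+")


def tokenize_panic(source):
    """Panic mode: drop characters until a token can be matched again."""
    tokens, errors, pos = [], [], 0
    while pos < len(source):
        m = TOKEN.match(source, pos)
        if m is None:
            errors.append((pos, source[pos]))  # report the offending character
            pos += 1                           # discard it and retry
            continue
        if not m.group().isspace():
            tokens.append(m.group())
        pos = m.end()
    return tokens, errors


def single_edit_repairs(text, alphabet):
    """All strings reachable from text by one of the four repair actions."""
    n = len(text)
    deletes = {text[:i] + text[i + 1:] for i in range(n)}
    inserts = {text[:i] + c + text[i:] for i in range(n + 1) for c in alphabet}
    replaces = {text[:i] + c + text[i + 1:] for i in range(n) for c in alphabet}
    transposes = {text[:i] + text[i + 1] + text[i] + text[i + 2:] for i in range(n - 1)}
    return deletes | inserts | replaces | transposes


print(tokenize_panic("x = 4 @# 2"))
print("then" in single_edit_repairs("thne", "abcdefghijklmnopqrstuvwxyz"))
```

In the first call the characters @ and # are skipped and reported, and scanning resumes at the next well-formed token. The second call shows that the keyword then is one transposition away from the misspelling thne.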
Thank You