Programming Languages
Lecture 03:
Lexical Analysis
Javier Gonzalez-Sanchez
javiergs@asu.edu
BYENG M1-21
Office Hours: By appointment
Keywords
Lexical
Alphabet
Symbol
String
Word
Regular Expression
Token
Rules
Deterministic Finite Automata
Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 2
Regular expression
A rule that describes which finite combinations of symbols
(sequences) are considered well-formed.
A regular expression is built from symbols and operators.
Symbols are defined in the alphabet.
The operators used in regular expressions are: * (0 or
more), + (1 or more), ? (0 or 1), | (or). Besides those,
we can use [ ] to enclose sets of symbols without
enumerating all of them, such as [0-9] or [A-Z]. We can
also use parentheses to group subexpressions.
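As a quick check of the operators listed above, the following sketch tries each one with Python's re module. The helper name is_word and the sample strings are illustrative choices, not part of the lecture; re.fullmatch is used so that the whole string, not just a prefix, must be well-formed.

```python
import re

# Each entry: (operator, pattern, strings that should match, strings that should not)
examples = [
    ("* (0 or more)", r"a*", ["", "a", "aaa"], ["b"]),
    ("+ (1 or more)", r"a+", ["a", "aaa"], [""]),
    ("? (0 or 1)", r"a?", ["", "a"], ["aa"]),
    ("| (or)", r"a|b", ["a", "b"], ["ab"]),
    ("[ ] (set)", r"[0-9]", ["7"], ["x"]),
    ("( ) (grouping)", r"(ab)+", ["ab", "abab"], ["aba"]),
]

def is_word(pattern, text):
    """True when the entire text is described by the regular expression."""
    return re.fullmatch(pattern, text) is not None

for operator, pattern, good, bad in examples:
    # Every string in `good` must be accepted and every string in `bad` rejected.
    assert all(is_word(pattern, s) for s in good)
    assert not any(is_word(pattern, s) for s in bad)
    print(f"{operator:15} {pattern:8} accepts {good}, rejects {bad}")
```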
Examples (words)
foobarOne = (a | b)            words: {a, b}
foobarTwo = (a | b)(a | b)
foobarThree = a*
foobarFour = (a | b)*
foobarFive = a+
foobarSix = [a-z]+
number = [0-9]+
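One way to see which words each expression describes is to enumerate short strings over the alphabet and keep the ones that match. The sketch below does that with Python's re module; the function name words, the alphabet {a, b}, and the length bound of 2 are assumptions made for illustration. Note that the spaces written around | on the slide are dropped, because Python treats a space inside a pattern as a literal symbol.

```python
import re
from itertools import product

def words(pattern, alphabet="ab", max_len=2):
    """Return every string over `alphabet`, up to `max_len` symbols, that the pattern accepts."""
    found = []
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            candidate = "".join(symbols)
            if re.fullmatch(pattern, candidate):
                found.append(candidate)
    return found

print(words(r"(a|b)"))        # ['a', 'b']
print(words(r"(a|b)(a|b)"))   # ['aa', 'ab', 'ba', 'bb']
print(words(r"a*"))           # ['', 'a', 'aa']
print(words(r"a+"))           # ['a', 'aa']
print(words(r"(a|b)*"))       # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```

The expressions with * and + describe infinitely many words; the bound on length only truncates the listing.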
Example (word)
digit = 0 | 1 | 2 | 3 | ... | 9
integer = digit+                                  e.g. 1945
fraction = .digit+                                e.g. .55
exponent = e(+|-)?digit+                          e.g. e+210
floatDraftOne = integer(fraction?) (exponent?)    e.g. 340.08e-14
floatDraftTwo = {[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][-+]?[0-9]+)?}
binary = 0b(0|1)+                                 e.g. 0b1010
Note: these definitions are NOT fully complete or correct. Their purpose is only to exemplify regular expressions. For
instance, 07 matches as an integer, which will NOT be the case for our language.
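The named definitions above can be composed mechanically. The following sketch transcribes them into Python's re syntax (the variable names are adapted from the slide; the dot and plus sign are escaped because they are operators in Python patterns) and checks the sample words from the slide, including the 07 case that the note warns about.

```python
import re

digit = r"[0-9]"                                  # 0 | 1 | 2 | ... | 9
integer = rf"{digit}+"
fraction = rf"\.{digit}+"
exponent = rf"e(\+|-)?{digit}+"
float_draft_one = rf"{integer}({fraction})?({exponent})?"
float_draft_two = r"[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][-+]?[0-9]+)?"
binary = r"0b(0|1)+"

def is_word(pattern, text):
    """True when the entire text is described by the regular expression."""
    return re.fullmatch(pattern, text) is not None

# Sample words taken from the slide.
assert is_word(integer, "1945")
assert is_word(fraction, ".55")
assert is_word(exponent, "e+210")
assert is_word(float_draft_one, "340.08e-14")
assert is_word(float_draft_two, "340.08e-14")
assert is_word(binary, "0b1010")

# Gaps in the drafts: a leading zero is accepted as an integer, and
# floatDraftOne also accepts a plain integer but rejects a bare fraction.
print(is_word(integer, "07"))            # True
print(is_word(float_draft_one, "340"))   # True
print(is_word(float_draft_one, ".55"))   # False
print(is_word(float_draft_two, ".55"))   # True
```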
DFA | Examples
binary = 0b(0|1)+
string = ([a-z] | [A-Z] | [0-9])*
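A DFA for the binary expression can be written down as a transition table. The sketch below is one possible automaton, not a transcription of the slide's diagram: the state names q0 to q3 are invented here, and any missing table entry is treated as a move to an implicit dead state.

```python
# Transition table for 0b(0|1)+ : (current state, symbol) -> next state
TRANSITIONS = {
    ("q0", "0"): "q1",   # leading 0
    ("q1", "b"): "q2",   # the b of the 0b prefix
    ("q2", "0"): "q3",   # first binary digit (at least one is required)
    ("q2", "1"): "q3",
    ("q3", "0"): "q3",   # any further binary digits
    ("q3", "1"): "q3",
}
START = "q0"
ACCEPTING = {"q3"}

def accepts(word):
    """Run the DFA over the word; reject as soon as no transition exists."""
    state = START
    for symbol in word:
        state = TRANSITIONS.get((state, symbol))
        if state is None:
            return False
    return state in ACCEPTING

print(accepts("0b1010"))  # True
print(accepts("0b"))      # False: no digit after the prefix
print(accepts("0b102"))   # False: 2 is not a binary digit
```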
DFA | Examples
[Figure: DFA diagram. States: Start, Char, Operator, String, Delimiter, ID, Integer, Float. Edge labels: {+,-,*,/,%,<,>,=,!}, {(, ), {, }, [, ]}, {a-z}, {0-9}, {_}, {$}, {\.}, {.}]
Additional Examples
Regular Expressions and Deterministic Finite Automata
[Handwritten notes: worked regular expressions and deterministic finite automata, not reproduced here. Legible captions: "Regular expression", "Deterministic Finite Automata", "Is this correct?", "Error", "Correct".]
Examples (words)
foobar4
foobar5
{"ab", "c"}*
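Reading {"ab", "c"}* as zero or more repetitions of the words ab and c, the same set can be written as the regular expression (ab|c)*. The short sketch below checks a few strings against that reading; the helper name in_language and the sample strings are illustrative only.

```python
import re

# Regular-expression form of {"ab", "c"}* under the reading described above.
PATTERN = r"(ab|c)*"

def in_language(text):
    """True when text is a concatenation of zero or more copies of 'ab' and 'c'."""
    return re.fullmatch(PATTERN, text) is not None

print([w for w in ["", "ab", "c", "abc", "cab", "ccab"] if in_language(w)])
print([w for w in ["a", "ba", "acb", "abb"] if in_language(w)])
```

Every string in the first list is accepted and every string in the second is rejected, since a must always be followed immediately by b.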
DFA | Examples
Define a DFA for each case
a) URLs
b) Email addresses
c) ZIP codes
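As a starting point for case (c), here is a sketch of a DFA for ZIP codes. It assumes the US format of five digits optionally followed by a hyphen and four more digits; that format, the numeric state encoding, and the function names are assumptions for illustration, and other formats would need a different automaton.

```python
import string

DIGITS = set(string.digits)
START = 0
ACCEPTING = {5, 10}   # five digits read, or five digits + hyphen + four digits

def next_state(state, ch):
    """States 0-5 count leading digits; 6 means the hyphen was read; 7-10 count trailing digits."""
    if ch in DIGITS and (0 <= state <= 4 or 6 <= state <= 9):
        return state + 1
    if ch == "-" and state == 5:
        return 6
    return None  # implicit dead state

def is_zip(text):
    state = START
    for ch in text:
        state = next_state(state, ch)
        if state is None:
            return False
    return state in ACCEPTING

print(is_zip("85281"))       # True
print(is_zip("85281-1234"))  # True
print(is_zip("85281-"))      # False
```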
Our Tokens
Which tokens are needed for a programming language?
a) Reserved words
b) Special Symbols: Operators and delimiters
c) Identifiers
d) Literals or constants
Drafting a Lexer
Keywords =
Operator =
Delimiter =
ID =
Drafting a Lexer
Float =
Integer =
Hexadecimal =
Octal =
Binary =
String =
Char =
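The slides leave these definitions blank as an exercise. Purely to show how such a draft fits together, the sketch below fills a few of them with placeholder regular expressions and combines them into one tokenizer using Python's re module. Every pattern here is an assumption, not the course's answer: the keyword list is hypothetical, the operator and delimiter sets are borrowed from the edge labels of the DFA figure, and Hexadecimal and Octal are omitted.

```python
import re

# Order matters: earlier alternatives win, so FLOAT and BINARY come before INTEGER
# and KEYWORD comes before ID.
TOKEN_SPEC = [
    ("FLOAT", r"[0-9]+\.[0-9]+"),
    ("BINARY", r"0b[01]+"),
    ("INTEGER", r"[0-9]+"),
    ("KEYWORD", r"\b(?:if|else|while|return)\b"),   # hypothetical keyword list
    ("ID", r"[A-Za-z_$][A-Za-z0-9_$]*"),
    ("STRING", r'"[^"\n]*"'),
    ("CHAR", r"'[^'\n]'"),
    ("OPERATOR", r"[+\-*/%<>=!]"),
    ("DELIMITER", r"[(){}\[\]]"),
    ("SKIP", r"\s+"),                                # whitespace, dropped from the output
    ("ERROR", r"."),                                 # any symbol no other rule accepts
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Split text into (token type, lexeme) pairs, skipping whitespace."""
    tokens = []
    for match in MASTER.finditer(text):
        kind = match.lastgroup
        if kind != "SKIP":
            tokens.append((kind, match.group()))
    return tokens

for token in tokenize("while (x1 < 0b101) { y = 3.14 }"):
    print(token)
```

A regex-based tokenizer like this hides the automaton inside the library; a hand-built DFA makes the states and transitions explicit instead.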
Homework