CSE340 - Principles of Programming Languages
Lecture 03: Lexical Analysis

Javier Gonzalez-Sanchez
javiergs@asu.edu
BYENG M1-21
Office Hours: By appointment

Keywords

Lexical
Alphabet
Symbol
String
Word
Regular Expression
Token
Rules
Deterministic Finite Automata
Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 2

Regular expression
A rule that describes the finite combinations of symbols (sequences) that are considered well-formed.
A regular expression has symbols and operators.
Symbols are defined in the alphabet.
The operators used in regular expressions are: * (0 or more), + (1 or more), ? (0 or 1), | (or). Besides those, we can use [ ] to enclose sets of symbols without enumerating all of them, such as [0-9] or [A-Z]. We can also use parentheses for grouping.
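
The operators listed above can be tried out with Python's standard re module. This is a minimal sketch, not part of the original slides: the sample words and the helper name matches are illustrative assumptions.

```python
import re

# Each entry: (regular expression, a word it accepts, a word it rejects).
# The sample words are illustrative choices, not taken from the slides.
cases = [
    (r"a*", "", "b"),           # * : zero or more
    (r"a+", "aaa", ""),         # + : one or more
    (r"ab?", "a", "abb"),       # ? : zero or one
    (r"a|b", "b", "c"),         # | : alternation
    (r"[0-9]", "7", "x"),       # [ ] : a set of symbols given as a range
    (r"(ab)+", "abab", "aba"),  # ( ) : grouping
]

def matches(rule: str, word: str) -> bool:
    """Return True when the whole word is described by the rule."""
    return re.fullmatch(rule, word) is not None

for rule, good, bad in cases:
    print(f"{rule!r}: {good!r} -> {matches(rule, good)}, {bad!r} -> {matches(rule, bad)}")
```

re.fullmatch is used instead of re.search so that the rule must describe the entire word, not just a part of it.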

Regular expression | Examples

Token        Regular Expression (rule)   Examples (words)
foobarOne    (a | b)                     {a, b}
foobarTwo    (a | b)(a | b)              {aa, bb, ba, ab}
foobarThree  a*                          {ε, a, aa, aaa, aaaa, ...}
foobarFour   (a | b)*                    {ε, a, b, aa, bb, ..., abba, ...}
foobarFive   a+                          {a, aa, aaa, aaaa, ...}
foobarSix    [a-z]+                      {hello, world, etc, ...}
number       [0-9]+                      {1934, 0101, 33, 12321}
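
The word sets in the examples can be reproduced by brute force: enumerate every string over the alphabet up to some length and keep the ones the rule accepts. A small sketch, assuming the alphabet {a, b} and a length bound of 2; the function names are hypothetical.

```python
import re
from itertools import product

def words_up_to(alphabet: str, max_len: int):
    """Yield every string over the alphabet with length 0..max_len, shortest first."""
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            yield "".join(symbols)

def language(rule: str, alphabet: str = "ab", max_len: int = 2):
    """List the words of bounded length that the rule accepts."""
    return [w for w in words_up_to(alphabet, max_len) if re.fullmatch(rule, w)]

print(language(r"(a|b)"))       # ['a', 'b']
print(language(r"(a|b)(a|b)"))  # ['aa', 'ab', 'ba', 'bb']
print(language(r"a*"))          # ['', 'a', 'aa']
print(language(r"a+"))          # ['a', 'aa']
```

The empty string '' in the output for a* is the empty word that * allows and + does not.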


Regular expression | Examples

Token [1]      Regular Expression (rule)                               Example (word)
digit          0 | 1 | 2 | 3 | ... | 9
integer        digit+                                                  1945
fraction       .digit+                                                 .55
exponent       e(+|-)?digit+                                           e+210
floatDraftOne  integer(fraction?) (exponent?)                          340.08e-14
floatDraftTwo  {[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][-+]?[0-9]+)?}
binary         0b(0|1)+                                                0b1010

[1] These definitions are NOT fully complete or correct. Their purpose is only to exemplify regular expressions. For instance, 07 matches as an integer, which will NOT be the case for our language.
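
The way these rules build on each other (integer uses digit, floatDraftOne uses integer, fraction, and exponent) can be sketched in Python by composing strings. This is an illustration under assumptions, not the course's definition: the variable and function names are hypothetical, and the dot and plus sign are escaped because Python's re syntax treats them as operators.

```python
import re

# Named sub-rules, composed by string substitution.
# Deliberate change: "." and "+" are escaped (\. and \+), since unescaped
# they are operators in Python's re syntax.
digit = r"[0-9]"
integer = rf"{digit}+"
fraction = rf"\.{digit}+"
exponent = rf"e(\+|-)?{digit}+"
float_draft_one = rf"{integer}({fraction})?({exponent})?"
binary = r"0b(0|1)+"

def is_word(rule: str, word: str) -> bool:
    """Return True when the whole word is described by the rule."""
    return re.fullmatch(rule, word) is not None

print(is_word(integer, "1945"))                # True
print(is_word(fraction, ".55"))                # True
print(is_word(exponent, "e+210"))              # True
print(is_word(float_draft_one, "340.08e-14"))  # True
print(is_word(binary, "0b1010"))               # True
print(is_word(integer, "07"))                  # True: the caveat about 07
print(is_word(float_draft_one, "1945"))        # True: draft one also accepts plain integers
```

The last two lines show why these drafts are incomplete: 07 is accepted as an integer, and floatDraftOne cannot tell an integer from a float.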

Deterministic Finite Automata

A deterministic finite automaton (DFA) is a finite state machine that accepts or rejects finite strings of symbols and produces a unique result for each input string.
In the example automaton, there are three states (drawn as circles) and transition arrows connecting one state to another.
Upon reading a symbol, a DFA jumps deterministically from one state to another by following the transition arrow labeled with that symbol.
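
The "read a symbol, follow the arrow" behavior can be sketched as a table lookup. The example below is an assumed DFA for the binary rule 0b(0|1)+; the state names start, zero, prefix, and digits are hypothetical labels, not taken from the slides.

```python
# Transition table: (current state, symbol) -> next state.
# State names are hypothetical labels chosen for this sketch.
TRANSITIONS = {
    ("start", "0"): "zero",
    ("zero", "b"): "prefix",
    ("prefix", "0"): "digits",
    ("prefix", "1"): "digits",
    ("digits", "0"): "digits",
    ("digits", "1"): "digits",
}
ACCEPTING = {"digits"}

def accepts(word: str) -> bool:
    """Run the DFA over the word; accept only if it ends in an accepting state."""
    state = "start"
    for symbol in word:
        state = TRANSITIONS.get((state, symbol))
        if state is None:  # no arrow for this symbol: reject
            return False
    return state in ACCEPTING

print(accepts("0b1010"))  # True
print(accepts("0b"))      # False: at least one binary digit is required
print(accepts("0b12"))    # False: 2 is not a binary digit
```

Because each (state, symbol) pair maps to at most one next state, the run is deterministic: the same input always produces the same result.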


DFA | Examples
binary
0b(0|1)+

string
([a-z] | [A-Z] | [0-9])*


DFA | Examples
[Figure: DFA with the states Start, Char, Operator, Delimiter, String, ID, Integer, and Float. Legible transition labels: {+,-,*,/,%,<,>,=,!}, {(, ), {, }, [, ]}, {a-z}, {_}, {$}, {0-9}, {\.}, {.}, {$, _, 0-9, a-z}.]
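
Part of the lexer DFA on this slide can be sketched in code. The figure did not survive extraction, so the edges below are one assumed reading of the labels for the ID, Integer, and Float states only; the intermediate state Dot and the function name classify are hypothetical.

```python
import string

LETTERS = set(string.ascii_lowercase)  # {a-z}
DIGITS = set(string.digits)            # {0-9}

# (source state, symbols on the arrow, target state).
# Assumed reading of the figure; "Dot" is a hypothetical helper state.
EDGES = [
    ("Start", LETTERS | {"_", "$"}, "ID"),
    ("ID", LETTERS | DIGITS | {"_", "$"}, "ID"),
    ("Start", DIGITS, "Integer"),
    ("Integer", DIGITS, "Integer"),
    ("Integer", {"."}, "Dot"),
    ("Dot", DIGITS, "Float"),
    ("Float", DIGITS, "Float"),
]
ACCEPTING = {"ID", "Integer", "Float"}

def classify(word: str):
    """Return the token name of the accepting state reached, or None on rejection."""
    state = "Start"
    for symbol in word:
        for source, symbols, target in EDGES:
            if source == state and symbol in symbols:
                state = target
                break
        else:  # no arrow leaves the current state on this symbol
            return None
    return state if state in ACCEPTING else None

print(classify("x_1"))   # ID
print(classify("42"))    # Integer
print(classify("3.14"))  # Float
print(classify("3."))    # None
```

In this sketch the final state doubles as the token type, which is how a single DFA can recognize several kinds of tokens at once.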


Additional Examples
Regular Expressions and Deterministic Finite Automata


[Handwritten notes, slides 10 to 15: worked regular expressions and deterministic finite automata, including examples marked "Is this correct?", "Error", and "Correct".]

Regular expressions - Examples

Token     Regular Expression (rule)   Examples (words)
foobar4   {"a", "b", "c"}*            {ε, "a", "b", "c", "aa", "ab", "ac", "ba", "bb", "bc", "ca", "cb", "cc", ...}
foobar5   {"ab", "c"}*                {ε, "ab", "c", "abab", "abc", "cab", "cc", "ababab", "ababc", "abcab", ...}
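
The * operator applied to a set of strings, as in {"ab", "c"}*, means "concatenate zero or more elements of the set". A short sketch that lists those concatenations up to a bound; the function name star and the bound are assumptions for illustration.

```python
from itertools import product

def star(pieces, max_pieces):
    """Concatenations of 0..max_pieces elements of pieces, fewest pieces first."""
    result = []
    for n in range(max_pieces + 1):
        for combo in product(pieces, repeat=n):
            word = "".join(combo)
            if word not in result:  # different combinations can give the same word
                result.append(word)
    return result

print(star(["ab", "c"], 2))  # ['', 'ab', 'c', 'abab', 'abc', 'cab', 'cc']
```

Note that "a" alone never appears for {"ab", "c"}*, because "ab" is treated as a single indivisible piece.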

Regular expressions - Examples


Define a regular expression for each case
a) URLs
b) Email addresses
c) ZIP codes
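
As a starting point for discussion, here are deliberately simplified attempts at the three cases, restricted to lowercase letters and digits. They are assumptions for illustration only and do not follow the official URL or email address standards; the constant and function names are hypothetical.

```python
import re

# Simplified drafts. Each one rejects many valid real-world inputs.
ZIP_CODE = r"[0-9][0-9][0-9][0-9][0-9](-[0-9][0-9][0-9][0-9])?"
EMAIL = r"[a-z0-9._]+@[a-z0-9]+(\.[a-z0-9]+)+"
URL = r"https?://[a-z0-9]+(\.[a-z0-9]+)+(/[a-z0-9._]*)*"

def is_word(rule: str, word: str) -> bool:
    """Return True when the whole word is described by the rule."""
    return re.fullmatch(rule, word) is not None

print(is_word(ZIP_CODE, "85281"))              # True
print(is_word(ZIP_CODE, "85281-1234"))         # True
print(is_word(EMAIL, "javiergs@asu.edu"))      # True
print(is_word(URL, "https://www.asu.edu/cse340/"))  # True
print(is_word(URL, "asu.edu"))                 # False: the scheme is required here
```

A useful exercise is to find valid inputs that these drafts reject, such as uppercase letters or hyphens in a host name, and refine the rules.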


DFA - Examples
Define a DFA for each case
a) URLs
b) Email addresses
c) ZIP codes


Our Tokens
Which tokens are needed for a programming language?
a) Reserved words
b) Special symbols: operators and delimiters
c) Identifiers
d) Literals or constants


Drafting a Lexer
Keywords =

Operator =

Delimiter =

ID =


Drafting a Lexer
Float =
Integer =
Hexadecimal =
Octal =
Binary =
String =
Char =
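
Once the rules are drafted, they can be combined into a small lexer. The sketch below is not the course's answer to the blanks above: every rule, the keyword list, and the names RULES, MASTER, and tokenize are hypothetical choices made only to show the mechanism.

```python
import re

# Hypothetical rules, tried in order, so more specific ones come first.
RULES = [
    ("FLOAT", r"[0-9]+\.[0-9]+"),
    ("BINARY", r"0b[01]+"),
    ("INTEGER", r"[0-9]+"),
    ("KEYWORD", r"(?:if|else|while|return)\b"),
    ("ID", r"[a-zA-Z_$][a-zA-Z0-9_$]*"),
    ("OPERATOR", r"[-+*/%<>=!]"),
    ("DELIMITER", r"[(){}\[\];,]"),
    ("SKIP", r"\s+"),   # whitespace separates tokens and is discarded
    ("ERROR", r"."),    # any other single character
]
# One master expression with a named group per rule.
MASTER = re.compile("|".join(f"(?P<{name}>{rule})" for name, rule in RULES))

def tokenize(source: str):
    """Split the source text into (token type, word) pairs."""
    tokens = []
    for match in MASTER.finditer(source):
        kind = match.lastgroup  # name of the rule that matched
        if kind != "SKIP":
            tokens.append((kind, match.group()))
    return tokens

for token in tokenize("while (x1 < 0b101) y = 3.5;"):
    print(token)
```

Rule order matters in this design: FLOAT and BINARY are listed before INTEGER so that 3.5 and 0b101 are not split into smaller tokens, and KEYWORD is listed before ID so that reserved words are not reported as identifiers.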

Homework

Define the necessary lexical rules for a programming language


Express these rules using a DFA and Regular Expressions
Share them on Blackboard and discuss their correctness with your classmates.


Disclaimer. These slides can only be used as study material for the class CSE340 at ASU. They cannot be distributed or used for another purpose.
