
Regular Expressions

Reading: Chapter 3

Regular Expressions vs. Finite Automata

Regular expressions offer a declarative way to express the pattern of the strings we want to accept

E.g., 01* + 10*

Automata => more machine-like

< input: string, output: [accept/reject] >

Regular expressions => more program syntax-like

Unix environments heavily use regular expressions

E.g., bash shell, grep, vi & other editors, sed

Perl scripting is good for string processing

Lexical analyzers such as Lex or Flex

Regular Expressions
Regular expressions: syntactical expressions

Finite automata (DFA, NFA, ε-NFA): automata/machines

Regular languages: formal language classes



Language Operators

Union of two languages:

L U M = all strings that are either in L or M
Note: A union of two languages produces a third language

Concatenation of two languages:

L . M = all strings that are of the form xy s.t. x ∈ L and y ∈ M
The dot operator is usually omitted, i.e., LM is the same as L.M
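
The two operators can be tried out on small finite languages. The sketch below models a language as a Python set of strings, which is only an assumption for illustration (real languages may be infinite); the function names union and concat are made up for this example.

```python
# Languages modelled as finite Python sets of strings (illustrative assumption).

def union(L, M):
    """All strings that are in L or in M."""
    return L | M

def concat(L, M):
    """All strings xy with x in L and y in M."""
    return {x + y for x in L for y in M}

L = {"0", "01"}
M = {"1", "11"}
print(sorted(union(L, M)))   # ['0', '01', '1', '11']
print(sorted(concat(L, M)))  # ['01', '011', '0111']
```

Note that concat(L, M) has only three strings here, because 0.11 and 01.1 produce the same string 011.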



Kleene Closure (the * operator)

Kleene Closure of a given language L:

L^0 = {ε}
L^1 = {w | w ∈ L}
L^2 = {w1w2 | w1 ∈ L, w2 ∈ L (duplicates allowed)}
L^i = {w1w2…wi | all w's chosen are ∈ L (duplicates allowed)}

Here i refers to how many strings from the parent language L are concatenated to produce the strings in the language L^i.
(Note: the choice of each wi is independent)

L* = U_{i≥0} L^i (arbitrary number of concatenations)

Example: Let L = {1, 00}

L^0 = {ε}
L^1 = {1, 00}
L^2 = {11, 100, 001, 0000}
L^3 = {111, 1100, 1001, 10000, 000000, 00001, 00100, 0011}
L* = L^0 U L^1 U L^2 U …
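
The powers in the example can be recomputed mechanically. This is a minimal sketch, again assuming finite languages stored as Python sets, with the empty string "" standing in for ε; power and star_up_to are hypothetical helper names, and star_up_to only approximates L* by cutting the union off at a chosen n.

```python
def concat(L, M):
    """All strings xy with x in L and y in M."""
    return {x + y for x in L for y in M}

def power(L, i):
    """L^i: concatenations of exactly i strings chosen from L."""
    result = {""}              # L^0 = {ε}; "" plays the role of ε
    for _ in range(i):
        result = concat(result, L)
    return result

def star_up_to(L, n):
    """Finite approximation of L*: the union of L^0 through L^n."""
    out = set()
    for i in range(n + 1):
        out |= power(L, i)
    return out

L = {"1", "00"}
for i in range(4):
    print(i, sorted(power(L, i)))
```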



Kleene Closure (special notes)

L* is an infinite set iff |L| ≥ 1 and L ≠ {ε}
If L = {ε}, then L* = {ε}
If L = ∅, then L* = {ε}

Σ* denotes the set of all words over an alphabet Σ

Therefore, an abbreviated way of saying there is an arbitrary language L over an alphabet Σ is:

L ⊆ Σ*

Building Regular Expressions

Let E be a regular expression and let L(E) be the language represented by E. Then:

(E) = E
L(E + F) = L(E) U L(F)
L(E F) = L(E) L(F)
L(E*) = (L(E))*

Example: how to use these regular expression properties and language operators?

L = { w | w is a binary string which does not contain two consecutive 0s or two consecutive 1s anywhere }

E.g., w = 01010101 is in L, while w = 10010 is not in L

Goal: Build a regular expression for L

Four cases for w, with the regular expression for each:

Case A: w starts with 0 and |w| is even: (01)*
Case B: w starts with 1 and |w| is even: (10)*
Case C: w starts with 0 and |w| is odd: 0(10)*
Case D: w starts with 1 and |w| is odd: 1(01)*

Since L is the union of all 4 cases:

Reg Exp for L = (01)* + (10)* + 0(10)* + 1(01)*

If we introduce ε then the regular expression can be simplified to:

Reg Exp for L = (ε+1)(01)*(ε+0)
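
Both forms of the expression can be checked against the definition of L by brute force. The sketch below uses Python's re module, where | plays the role of + and an ε-alternative such as (ε+1) is written 1?; the function names and the length bound are choices made for this example, and agreement on short strings is evidence rather than a proof.

```python
import re
from itertools import product

FOUR_CASES = re.compile(r"(01)*|(10)*|0(10)*|1(01)*")
SIMPLIFIED = re.compile(r"1?(01)*0?")      # (ε+1)(01)*(ε+0)

def no_repeated_neighbours(w):
    """Direct definition of L: no two consecutive equal symbols."""
    return all(a != b for a, b in zip(w, w[1:]))

def agree_up_to(n):
    """Compare both expressions with the definition on all strings of length <= n."""
    for k in range(n + 1):
        for w in map("".join, product("01", repeat=k)):
            expected = no_repeated_neighbours(w)
            if bool(FOUR_CASES.fullmatch(w)) != expected:
                return False
            if bool(SIMPLIFIED.fullmatch(w)) != expected:
                return False
    return True

print(agree_up_to(10))  # True
```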

Precedence of Operators

Highest to lowest

* operator (star)

. operator (concatenation)

+ operator

Example:

01* + 1 is read as ( 0 . ((1)*) ) + 1
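
Python's re module follows the same precedence order (repetition, then concatenation, then alternation), so the grouping can be observed directly. In this sketch | stands for the + of the slides, and the third pattern is a deliberately different grouping added for contrast.

```python
import re

implicit = re.compile(r"01*|1")        # no parentheses, precedence decides
explicit = re.compile(r"(0(1*))|1")    # the grouping given in the example
other = re.compile(r"(01)*|1")         # a different grouping, for contrast

for w in ["0", "0111", "1", "0101"]:
    print(repr(w),
          bool(implicit.fullmatch(w)),
          bool(explicit.fullmatch(w)),
          bool(other.fullmatch(w)))
```

The first two columns agree on every input, while the string 0101 is accepted only by the third pattern.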

Finite Automata (FA) & Regular Expressions (Reg Ex)

To show that they are interchangeable, consider the following theorems:

Proofs in the book

Theorem 1: For every DFA A there exists a regular expression R such that L(R)=L(A)

Theorem 2: For every regular expression R there exists an ε-NFA E such that L(E)=L(R)
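
[Figure: conversion diagram over Reg Ex, ε-NFA, NFA and DFA, labeled "Kleene Theorem". Theorem 2 is the Reg Ex to ε-NFA step; Theorem 1 is the DFA to Reg Ex step.]

Theorem 1: DFA → Reg Ex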

DFA to RE construction
Informally, trace all distinct paths (traversing cycles only once) from the start state to each of the final states and enumerate all the expressions along the way.

Example:

[Figure: DFA with states q0, q1 and q2. Expressions annotated along the path: 1*, 00*, 1, (0+1)*.]

Resulting expression: 1*00*1(0+1)*

Q) What is the language?
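
The resulting expression can be compared with a direct simulation of the DFA. The transition table below is an assumption: it is reconstructed from the expression 1*00*1(0+1)* (loop on 1 at q0, 0 to q1, loop on 0 at q1, 1 to q2, loop on 0,1 at q2, with q2 final), not copied from the original figure.

```python
import re
from itertools import product

# Assumed transition table, reconstructed from the expression in the example.
DELTA = {
    ("q0", "1"): "q0", ("q0", "0"): "q1",
    ("q1", "0"): "q1", ("q1", "1"): "q2",
    ("q2", "0"): "q2", ("q2", "1"): "q2",
}
START, FINAL = "q0", {"q2"}

def dfa_accepts(w):
    """Run the DFA on w and report whether it ends in a final state."""
    state = START
    for symbol in w:
        state = DELTA[(state, symbol)]
    return state in FINAL

REGEX = re.compile(r"1*00*1(0|1)*")    # | plays the role of +

def agree_up_to(n):
    """Compare DFA and expression on every binary string of length <= n."""
    return all(
        dfa_accepts(w) == bool(REGEX.fullmatch(w))
        for k in range(n + 1)
        for w in map("".join, product("01", repeat=k))
    )

print(agree_up_to(10))  # True
```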

Theorem 2: Reg Ex → ε-NFA

RE to ε-NFA construction
Example: (0+1)*01(0+1)*

[Figure: ε-NFA for the example, assembled from sub-automata for (0+1)*, 01 and (0+1)*.]
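
The construction can be sketched in code. What follows is one common textbook-style (Thompson-like) way to build an ε-NFA from an expression tree, offered as an illustration and not necessarily identical to the automaton drawn on the slide. The tuple encoding of expressions ("sym", "alt", "cat", "star") and all function names are assumptions made for this example; None labels an ε-transition.

```python
from itertools import count

def build(expr, trans, fresh):
    """Add an ε-NFA fragment for expr to trans; return its (start, accept)."""
    kind = expr[0]
    start, accept = next(fresh), next(fresh)
    trans.setdefault(start, [])
    trans.setdefault(accept, [])
    if kind == "sym":                          # single symbol
        trans[start].append((expr[1], accept))
    elif kind == "alt":                        # E + F
        for sub in expr[1:]:
            s, a = build(sub, trans, fresh)
            trans[start].append((None, s))
            trans[a].append((None, accept))
    elif kind == "cat":                        # E F
        s1, a1 = build(expr[1], trans, fresh)
        s2, a2 = build(expr[2], trans, fresh)
        trans[start].append((None, s1))
        trans[a1].append((None, s2))
        trans[a2].append((None, accept))
    elif kind == "star":                       # E*
        s, a = build(expr[1], trans, fresh)
        trans[start] += [(None, s), (None, accept)]
        trans[a] += [(None, s), (None, accept)]
    else:
        raise ValueError(kind)
    return start, accept

def eclose(states, trans):
    """All states reachable from states using only ε-transitions."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for label, target in trans[q]:
            if label is None and target not in seen:
                seen.add(target)
                stack.append(target)
    return seen

def accepts(expr, w):
    """Build the ε-NFA for expr and simulate it on w."""
    trans = {}
    start, accept = build(expr, trans, count())
    current = eclose({start}, trans)
    for symbol in w:
        moved = {t for q in current for label, t in trans[q] if label == symbol}
        current = eclose(moved, trans)
    return accept in current

# (0+1)*01(0+1)*
ANYTHING = ("star", ("alt", ("sym", "0"), ("sym", "1")))
EXPR = ("cat", ANYTHING, ("cat", ("sym", "0"), ("cat", ("sym", "1"), ANYTHING)))

print(accepts(EXPR, "1101"), accepts(EXPR, "110"))  # True False
```

Each operator contributes two new states and a few ε-edges, so the automaton grows linearly with the size of the expression.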


Algebraic Laws of Regular Expressions

Commutative:

E+F = F+E

Associative:

(E+F)+G = E+(F+G)
(EF)G = E(FG)

Identity:

E+∅ = E
εE = Eε = E

Annihilator:

∅E = E∅ = ∅

Distributive:

E(F+G) = EF + EG
(F+G)E = FE+GE

Idempotent:

E + E = E

Involving Kleene closures:

(E*)* = E*
∅* = ε
ε* = ε
E^+ = EE*
E? = ε+E
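
A few of these laws can be spot-checked on concrete expressions. The sketch below compares two patterns on every short binary string using Python's re module; the sample expressions E, F and G are arbitrary choices, same_language is a hypothetical helper, and agreement up to a length bound is evidence only, not a proof. In Python syntax | is union, and the postfix + means one-or-more (the E^+ of the slides).

```python
import re
from itertools import product

def same_language(p, q, n=8, alphabet="01"):
    """True if patterns p and q agree on every string of length <= n."""
    rp, rq = re.compile(p), re.compile(q)
    for k in range(n + 1):
        for w in map("".join, product(alphabet, repeat=k)):
            if bool(rp.fullmatch(w)) != bool(rq.fullmatch(w)):
                return False
    return True

# Arbitrary sample expressions, each wrapped in a non-capturing group.
E, F, G = "(?:01)", "(?:1|00)", "(?:0*1)"

checks = {
    "E(F+G) = EF+EG": (f"{E}(?:{F}|{G})", f"{E}{F}|{E}{G}"),
    "(F+G)E = FE+GE": (f"(?:{F}|{G}){E}", f"{F}{E}|{G}{E}"),
    "E+E = E":        (f"{E}|{E}", E),
    "(E*)* = E*":     (f"(?:{E}*)*", f"{E}*"),
    "E^+ = EE*":      (f"{E}+", f"{E}{E}*"),
    "E? = ε+E":       (f"{E}?", f"|{E}"),   # empty alternative stands for ε
}
for law, (lhs, rhs) in checks.items():
    print(law, same_language(lhs, rhs))
```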

True or False?
Let R and S be two regular expressions. Then:
1. ((R*)*)* = R*

2. (R+S)* = R* + S* ?

3. (RS + R)* RS = (RR*S)* ?
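
One way to probe claims like these is to fix concrete R and S and search for a short string on which the two sides disagree. The sketch below assumes R = 0 and S = 1 and uses Python's re module (| for +); first_difference is a hypothetical helper. A witness string refutes a claim, whereas finding none up to the length bound only suggests the claim may hold.

```python
import re
from itertools import product

def first_difference(p, q, n=6, alphabet="01"):
    """Shortest string of length <= n matched by exactly one pattern, else None."""
    rp, rq = re.compile(p), re.compile(q)
    for k in range(n + 1):
        for w in map("".join, product(alphabet, repeat=k)):
            if bool(rp.fullmatch(w)) != bool(rq.fullmatch(w)):
                return w
    return None

# Claims 1 to 3 instantiated with R = 0 and S = 1 (an assumption for this test).
claims = [
    ("(?:(?:0*)*)*", "0*"),
    ("(?:0|1)*", "0*|1*"),
    ("(?:01|0)*01", "(?:00*1)*"),
]
for number, (lhs, rhs) in enumerate(claims, start=1):
    witness = first_difference(lhs, rhs)
    # Compare with None explicitly: the empty string is a valid witness.
    print(number, "no difference found" if witness is None else repr(witness))
```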


Summary

Regular expressions
Equivalence to finite automata
DFA to regular expression conversion
Regular expression to ε-NFA conversion
Algebraic laws of regular expressions
Unix regular expressions and Lexical Analyzer