Regular Expressions

Regular Languages

A language is called a regular language iff either of the following equivalent conditions holds:

- It is accepted by some DFA.
- It is accepted by some NFA.

Regular languages are closed under various operations:

- Complement
- Union
- Intersection

Concatenation

The concatenation of two languages L1 and L2 over the alphabet Σ is the language

    L1L2 = { wx | w ∈ L1 ∧ x ∈ L2 }

The set of strings that can be split into two pieces: a string in L1 followed by a string in L2.

Concatenation Example

Example: Let Σ = { a, b, …, z, A, B, …, Z }

    Noun = { Velociraptor, Rainbow, Whale, … }
    Verb = { Eats, Juggles, Loves, … }
    The = { The }

The language TheNounVerbTheNoun is

    { TheVelociraptorEatsTheVelociraptor, TheWhaleLovesTheRainbow, TheRainbowJugglesTheVelociraptor, … }
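For finite languages, the definition of concatenation can be tried out directly. The following is a minimal Python sketch; the function name `concatenate` and the three-word samples of Noun and Verb are illustrative assumptions (the sets in the example are larger).

```python
from functools import reduce


def concatenate(l1: set[str], l2: set[str]) -> set[str]:
    """Return L1L2 = { wx | w in L1 and x in L2 } for finite languages."""
    return {w + x for w in l1 for x in l2}


# Finite samples of the languages from the example (assumed, for illustration).
the = {"The"}
noun = {"Velociraptor", "Rainbow", "Whale"}
verb = {"Eats", "Juggles", "Loves"}

# TheNounVerbTheNoun, built by concatenating the five languages left to right.
sentences = reduce(concatenate, [the, noun, verb, the, noun])
```

With these samples, `sentences` contains strings such as TheWhaleLovesTheRainbow, one for each choice of noun, verb, and noun.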

Concatenating Regular Languages

If L1 and L2 are regular languages, is L1L2?

Intuition: can we split a string w into two strings x and y such that w = xy, x ∈ L1, and y ∈ L2?

Idea: Run the automaton for L1 on w, and whenever it reaches an accepting state, hand the rest of w off to the automaton for L2.

- If L2 accepts the remainder, then L1 accepted the first part and the string is in L1L2.
- If L2 rejects the remainder, then the split was incorrect.

[Figure: automaton diagram for concatenating regular languages]

Lots and Lots of Concatenation


Consider the language L = { aa, b }

LL is the set of strings formed by concatenating pairs of strings in L.

{ aaaa, aab, baa, bb }

LLL is the set of strings formed by concatenating triples of strings in L.

{ aaaaaa, aaaab, aabaa, aabb, baaaa, baab, bbaa, bbb }

LLLL is the set of strings formed by concatenating quadruples of strings in L

{ aaaaaaaa, aaaaaab, aaaabaa, aaaabb, aabaaaa, aabaab, aabbaa, aabbb, baaaaaa, baaaab, baabaa, baabb, bbaaaa, bbaab, bbbaa, bbbb }

Language Exponentiation

We can define what it means to exponentiate a language as follows:

    L^0 = { ε }

The set containing just the empty string. Idea: Any string formed by concatenating zero strings together is the empty string.

    L^(n+1) = L L^n

Idea: Concatenating (n + 1) strings together works by concatenating n strings, then concatenating one more.
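The inductive definition of language exponentiation translates almost line for line into code when the language is finite. This is a small Python sketch under that assumption; the name `language_power` is hypothetical.

```python
def language_power(language: set[str], n: int) -> set[str]:
    """Return L^n for a finite language L and a natural number n."""
    if n < 0:
        raise ValueError("n must be a natural number")
    result = {""}  # L^0 = { ε }: just the empty string
    for _ in range(n):
        # L^(k+1) = L L^k: put one more string from L in front.
        result = {w + x for w in language for x in result}
    return result


L = {"aa", "b"}
LL = language_power(L, 2)
LLL = language_power(L, 3)
```

For L = { aa, b } this reproduces the sets LL and LLL listed earlier.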

The Kleene Closure

An important operation on languages is the Kleene Closure, which is defined as

    L* = ⋃_{i=0}^{∞} L^i = L^0 ∪ L^1 ∪ L^2 ∪ …

This is an infinite union of sets. It is defined as the set of all strings x contained in L^i for some natural number i.

Intuitively, all possible ways of concatenating any number of copies of strings in L together.
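L* is usually infinite, so it cannot be listed in full, but the part of it up to a given length can be. The sketch below is one way to do that in Python for a finite L; the function name and the length bound `max_len` are assumptions made for illustration.

```python
def kleene_closure_up_to(language: set[str], max_len: int) -> set[str]:
    """Return every string of L* whose length is at most max_len."""
    result = {""}    # L^0 contributes the empty string
    frontier = {""}  # strings found in the previous round
    while frontier:
        # Extend last round's strings by one more string from L,
        # keeping only new strings that are still short enough.
        frontier = {
            w + x
            for w in frontier
            for x in language
            if len(w + x) <= max_len
        } - result
        result |= frontier
    return result


short_strings = sorted(kleene_closure_up_to({"aa", "b"}, 3), key=lambda s: (len(s), s))
```

The loop stops because each round only adds strings not seen before, and there are finitely many strings of bounded length.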

Reasoning about Infinity

How do we prove properties of this infinite union? A bad line of reasoning:

- L^0 = { ε } is regular.
- L^1 = L is regular.
- L^2 = LL is regular.
- L^3 = (LL)L is regular.
- …
- So their infinite union is regular.

Reasoning About the Infinite

If a series of finite objects all have some property, their infinite union does not necessarily have that property!

No matter how many times we zigzag that line, it's never straight. Concluding that it must be equal in the limit is not mathematically precise. (This is why calculus is interesting).

A better intuition: Can we convert an NFA for the language L to an NFA for the language L*?

The Kleene Star

[Figure: NFA construction for L*]

Kleene Star in Action

L = { ma, mom, mommy, mum }

[Figure: NFA for L* processing the input string mamommum]
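Membership in L* for a finite L can also be checked without building an automaton. The sketch below uses dynamic programming over prefixes of the input, which is a different technique from the NFA shown in the figure; the function name `in_kleene_star` is hypothetical.

```python
def in_kleene_star(word: str, language: set[str]) -> bool:
    """Decide whether word can be split into pieces that are all in language."""
    # reachable[i] is True when word[:i] is a concatenation of strings from L.
    reachable = [False] * (len(word) + 1)
    reachable[0] = True  # the empty prefix uses zero strings
    for end in range(1, len(word) + 1):
        reachable[end] = any(
            reachable[end - len(piece)] and word.endswith(piece, 0, end)
            for piece in language
            if 0 < len(piece) <= end
        )
    return reachable[-1]


L = {"ma", "mom", "mommy", "mum"}
accepted = in_kleene_star("mamommum", L)  # splits as ma + mom + mum
```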

Summary

- NFAs are a powerful type of automaton that allows for nondeterministic choices.
- NFAs can also have ε-transitions that move from state to state without consuming any input.
- The subset construction shows that NFAs are not more powerful than DFAs, because any NFA can be converted into a DFA that accepts the same language.
- The union, intersection, difference, complement, concatenation, and Kleene closure of regular languages are all regular languages.

Rethinking Regular Languages

We currently have several tools for showing a language is regular.


- Construct a DFA for it.
- Construct an NFA for it.
- Apply closure properties to existing languages.

We have not spoken much of this last idea.

Constructing Regular Languages

Idea: Build up all regular languages as follows:

- Start with a small set of simple languages we already know to be regular.
- Using closure properties, combine these simple languages together to form more elaborate languages.

A bottom-up approach to the regular languages.

Regular Expressions

Regular expressions are a family of descriptions that can be used to capture the regular languages.

- They often provide a compact and human-readable description of the language.
- They are used as the basis for numerous software systems (Perl, flex, grep, etc.)

Atomic Regular Expressions

The regular expressions begin with three simple building blocks.

- The symbol ∅ is a regular expression that represents the empty language ∅.
- The symbol ε is a regular expression that represents the language { ε }. This is not the same as ∅!
- For any a ∈ Σ, the symbol a is a regular expression for the language { a }.

Compound Regular Expressions

We can combine existing regular expressions in four ways.

- If R1 and R2 are regular expressions, R1R2 is a regular expression that represents the concatenation of the languages of R1 and R2.
- If R1 and R2 are regular expressions, R1 | R2 is a regular expression representing the union of the languages of R1 and R2.
- If R is a regular expression, R* is a regular expression for the Kleene closure of the language of R.
- If R is a regular expression, (R) is a regular expression with the same meaning as R.

Operator Precedence

Regular expression operator precedence, from highest to lowest, is

    (R)
    R*
    R1R2
    R1 | R2

So ab*c|d is parsed as ((a(b*))c)|d
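One way to see the precedence rules at work is a small recursive-descent parser with one function per precedence level. The sketch below is an illustration only: it assumes single-character symbols, no whitespace, and only the operators |, *, and parentheses, and it returns a fully parenthesized copy of its input (including an outermost pair of parentheses).

```python
def parenthesize(regex: str) -> str:
    """Return regex with every operator application wrapped in parentheses."""
    pos = 0

    def peek():
        return regex[pos] if pos < len(regex) else None

    def parse_union():  # lowest precedence: R1 | R2
        nonlocal pos
        left = parse_concat()
        while peek() == "|":
            pos += 1
            left = f"({left}|{parse_concat()})"
        return left

    def parse_concat():  # next: R1R2
        left = parse_star()
        while peek() is not None and peek() not in "|)":
            left = f"({left}{parse_star()})"
        return left

    def parse_star():  # next: R*
        nonlocal pos
        atom = parse_atom()
        while peek() == "*":
            pos += 1
            atom = f"({atom}*)"
        return atom

    def parse_atom():  # highest: a single symbol or (R)
        nonlocal pos
        ch = peek()
        if ch is None or ch in "|)*":
            raise ValueError(f"unexpected input at position {pos}")
        pos += 1
        if ch != "(":
            return ch
        inner = parse_union()
        if peek() != ")":
            raise ValueError("missing closing parenthesis")
        pos += 1
        return inner

    result = parse_union()
    if pos != len(regex):
        raise ValueError(f"unexpected input at position {pos}")
    return result


grouped = parenthesize("ab*c|d")
```

Here `grouped` is `(((a(b*))c)|d)`, the same grouping as ((a(b*))c)|d with one extra outer pair.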

Regular Expression Examples

- The regular expression trick|treat represents the regular language { trick, treat }
- The regular expression booo* represents the regular language { boo, booo, boooo, … }
- The regular expression candy!(candy!)* represents the regular language { candy!, candy!candy!, candy!candy!candy!, … }
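These three expressions happen to use only syntax that Python's `re` module shares with the notation here, so they can be tried as written. A brief sketch follows; note that `re.fullmatch` is used because membership in the language means the whole string must match, and that `re` supports many features beyond the regular expressions defined in these slides.

```python
import re


def in_language(pattern: str, word: str) -> bool:
    """Return True when the entire word matches the pattern."""
    return re.fullmatch(pattern, word) is not None


examples = {
    r"trick|treat": ["trick", "treat"],
    r"booo*": ["boo", "booo", "boooo"],
    r"candy!(candy!)*": ["candy!", "candy!candy!", "candy!candy!candy!"],
}

all_match = all(
    in_language(pattern, word)
    for pattern, words in examples.items()
    for word in words
)
```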

Regular Expressions, Formally

The language of a regular expression is the language described by that regular expression. Formally:

    ℒ(ε) = { ε }
    ℒ(∅) = ∅
    ℒ(a) = { a }
    ℒ(R1 R2) = ℒ(R1) ℒ(R2)
    ℒ(R1 | R2) = ℒ(R1) ∪ ℒ(R2)
    ℒ(R*) = ℒ(R)*
    ℒ((R)) = ℒ(R)

Regular Expressions are Awesome


Let Σ = { 0, 1 }
Let L = { w ∈ Σ* | w contains 00 as a substring }

    (0 | 1)*00(0 | 1)*

Example strings: 11011100101, 0000, 11111011110011111

Regular Expressions are Awesome

Let Σ = { 0, 1 }
Let L = { w ∈ Σ* | |w| = 4 }

The length of a string w is denoted |w|.

    (0|1)(0|1)(0|1)(0|1)

or, more compactly,

    (0|1)^4

Example strings: 0000, 1010, 1111, 1000

Regular Expressions are Awesome

Let Σ = { 0, 1 }
Let L = { w ∈ Σ* | w contains at most one 0 }

    1*(0 | ε)1*

or, more compactly,

    1*0?1*

Example strings: 11110111, 111111, 0111, 0
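A quick sanity check for expressions like these is to compare them against the plain-English description on every short binary string. The sketch below does this with Python's `re` module. The translations are assumptions about notation only: spaces are dropped, ε becomes an empty alternative, and the fourth power is written `{4}`; the helper names are hypothetical.

```python
import re
from itertools import product


def binary_strings(max_len: int):
    """Yield every string over {0, 1} of length at most max_len."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)


def agrees(pattern: str, predicate, max_len: int = 8) -> bool:
    """Check that pattern and predicate accept the same short strings."""
    return all(
        (re.fullmatch(pattern, w) is not None) == predicate(w)
        for w in binary_strings(max_len)
    )


checks = [
    (r"(0|1)*00(0|1)*", lambda w: "00" in w),           # contains 00
    (r"(0|1)(0|1)(0|1)(0|1)", lambda w: len(w) == 4),   # length exactly 4
    (r"(0|1){4}", lambda w: len(w) == 4),               # same, with a power
    (r"1*(0|)1*", lambda w: w.count("0") <= 1),         # at most one 0
    (r"1*0?1*", lambda w: w.count("0") <= 1),           # same, with ?
]

results = [agrees(pattern, predicate) for pattern, predicate in checks]
```

Agreement on short strings is evidence rather than proof, but it catches most mistakes in a hand-written expression.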

Regular Expressions are Awesome

Let Σ = { a, ., @ }, where a represents some letter. A regular expression for email addresses is

    aa* (.aa*)* @ aa*.aa* (.aa*)*

Rewriting aa* as a+ gives

    a+ (.a+)* @ a+.a+ (.a+)*

and rewriting .a+ (.a+)* as (.a+)+ gives

    a+(.a+)*@a+(.a+)+

Example strings: cs103@cs.stanford.edu, first.middle.last@mail.site.org, barack.obama@whitehouse.gov
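The same shape can be written for Python's `re` module. This is a sketch, not a real email validator: the character class `[a-z0-9]` standing in for the symbol a is an assumption (chosen so that cs103 matches), and the dot has to be escaped because an unescaped `.` means "any character" in `re`.

```python
import re

# a+(.a+)*@a+(.a+)+ with a replaced by [a-z0-9] and the dot escaped.
EMAIL = re.compile(r"[a-z0-9]+(\.[a-z0-9]+)*@[a-z0-9]+(\.[a-z0-9]+)+")


def looks_like_email(text: str) -> bool:
    """Return True when the whole text has the shape described above."""
    return EMAIL.fullmatch(text) is not None


samples = [
    "cs103@cs.stanford.edu",
    "first.middle.last@mail.site.org",
    "barack.obama@whitehouse.gov",
]
sample_results = [looks_like_email(s) for s in samples]
```

The final `(\.[a-z0-9]+)+` is what forces at least one dot after the @ sign, so a string such as user@localhost is rejected.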

Regular Expressions are Awesome

    a+(.a+)*@a+(.a+)+

[Figure: automaton for this regular expression over { a, ., @ }, with states q0 through q8]

Extensions to Regular Expressions

To simplify regular expressions, we introduce the following shorthand (which is just syntactic sugar for some other regular expression).

- R^n represents RR...R (n times).
- R? represents R | ε. Either zero or one copies of R.
- R+ represents RR*. One or more copies of R. Sometimes called Kleene Plus.

Regular Expressions and Languages

So far, we have two characterizations of regular languages:


- Any language accepted by some DFA.
- Any language accepted by some NFA.

Theorem: A language is regular iff it is described by some regular expression.

- Need to show that if a regular expression exists for L, then L is regular.
- Need to show that if L is regular, there is a regular expression for L.

The second direction is not obvious!

A Marvelous Construction

To show that any language described by a regular expression is regular, we show how to convert a regular expression into an NFA.

Theorem: For any regular expression R, there is an NFA N such that

- ℒ(R) = ℒ(N)
- N has exactly one accepting state.
- N has no transitions into its start state.
- N has no transitions out of its accepting state.

These are stronger requirements than are necessary for a normal NFA. We enforce these rules to simplify the construction.

Base Cases

[Figures: automata for ε, for ∅, and for a single character a]

Construction for R1R2

[Figure: NFA for R1R2 built from the NFAs for R1 and R2]

Construction for R1 | R2

[Figure: NFA for R1 | R2 built from the NFAs for R1 and R2]

Construction for R*

[Figure: NFA for R* built from the NFA for R]
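The recursive conversion from regular expressions to NFAs can be sketched in code. The version below follows the standard Thompson-style wiring with ε-transitions, which is an assumption, since the exact wiring used in the figures may differ. Regular expressions are given as nested tuples such as `("cat", r1, r2)`; that encoding and the function names are hypothetical.

```python
from itertools import count


def build_nfa(regex):
    """Return (start, accept, edges); an edge is (src, label, dst), label None is ε."""
    fresh = count()
    edges = []

    def build(node):
        kind = node[0]
        start, accept = next(fresh), next(fresh)  # one new start, one new accept
        if kind == "empty":
            pass                                   # ∅: accept is unreachable
        elif kind == "eps":
            edges.append((start, None, accept))
        elif kind == "sym":
            edges.append((start, node[1], accept))
        elif kind == "cat":
            s1, a1 = build(node[1])
            s2, a2 = build(node[2])
            edges.extend([(start, None, s1), (a1, None, s2), (a2, None, accept)])
        elif kind == "alt":
            s1, a1 = build(node[1])
            s2, a2 = build(node[2])
            edges.extend([(start, None, s1), (start, None, s2),
                          (a1, None, accept), (a2, None, accept)])
        elif kind == "star":
            s, a = build(node[1])
            edges.extend([(start, None, accept), (start, None, s),
                          (a, None, s), (a, None, accept)])
        else:
            raise ValueError(f"unknown node kind: {kind}")
        return start, accept

    start, accept = build(regex)
    return start, accept, edges


def accepts(nfa, word: str) -> bool:
    """Simulate the NFA on word by tracking the set of reachable states."""
    start, accept, edges = nfa

    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            state = stack.pop()
            for src, label, dst in edges:
                if src == state and label is None and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    current = closure({start})
    for ch in word:
        current = closure({dst for src, label, dst in edges
                           if src in current and label == ch})
    return accept in current


# ab*c|d as a tuple tree.
sym = lambda c: ("sym", c)
example = ("alt", ("cat", ("cat", sym("a"), ("star", sym("b"))), sym("c")), sym("d"))
nfa = build_nfa(example)
```

Because every case allocates a fresh start and accepting state, the resulting NFA has exactly one accepting state, no transitions into its start state, and no transitions out of its accepting state.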

The Other Direction

Proving that if L is regular, there is a regular expression R for L is much trickier. Idea: Convert an NFA for L into a regular expression for L. How do we do this?

From NFAs to Regular Expressions

[Figure: automaton whose transitions are labeled s1, s2, …, sn]

Regular expression: (s1 | s2 | … | sn)*

Key idea: Allow transitions to be labeled with arbitrary regular expressions.

[Figure: the same automaton with a single transition labeled s1 | s2 | … | sn]

Regular expression: (s1 | s2 | … | sn)*

[Figure: automaton with a single transition labeled R]

Regular expression: R

Key idea: If we can convert any NFA into something that looks like this, we can easily read off the regular expression.


From NFAs to Regular Expressions

[Figure: two-state automaton with states q1 and q2 and transitions labeled R11, R12, R21, R22]

Regular expression: R11* R12 (R22 | R21 R11* R12)*

[Figure sequence: a new start state qs and a new accepting state qf are added. Could we eliminate this state from the NFA? Eliminating q1 leaves a transition labeled R11* R12 into q2 and a self-loop labeled R22 | R21 R11* R12 on q2. Eliminating q2 then leaves a single transition from qs to qf labeled R11* R12 (R22 | R21 R11* R12)*.]
The Construction at a Glance

- Start with an NFA for the language.
- For simplicity, add a new start state qs and accept state qf to the NFA, then eliminate all other accepting states.
- Repeatedly remove states other than qs and qf from the NFA by shortcutting them until only two states remain: qs and qf.
- The transition from qs to qf is then a regular expression for the NFA.
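The steps above can be sketched in Python. This is an illustration under several assumptions: transition labels are strings in Python `re` syntax, a missing transition stands for ∅, the empty string stands for ε, the names qs and qf are not already used as state names, and the three-state automaton at the end is a made-up example over { 0, 1 }.

```python
import re


def union(r, s):
    """R | S, where None stands for the empty language."""
    if r is None:
        return s
    if s is None:
        return r
    return r if r == s else f"(?:{r}|{s})"


def concat(r, s):
    """RS; anything concatenated with the empty language is empty."""
    return None if r is None or s is None else r + s


def star(r):
    """R*; both ∅* and ε* are just ε."""
    if r is None or r == "":
        return ""
    return r + "*" if len(r) == 1 else f"(?:{r})*"


def nfa_to_regex(states, start, accepting, edges):
    """edges maps (src, dst) to a label; returns a regex string, or None for ∅."""
    table = dict(edges)
    table[("qs", start)] = ""          # new start state, ε into the old start
    for state in accepting:
        table[(state, "qf")] = ""      # ε from each old accepting state to qf
    for q in states:                   # remove the original states one by one
        loop = star(table.get((q, q)))
        sources = [s for (s, d) in table if d == q and s != q]
        targets = [d for (s, d) in table if s == q and d != q]
        for s in sources:
            for d in targets:
                # Shortcut s -> q -> d, allowing any number of loops at q.
                shortcut = concat(concat(table[(s, q)], loop), table[(q, d)])
                table[(s, d)] = union(table.get((s, d)), shortcut)
        table = {key: label for key, label in table.items() if q not in key}
    return table.get(("qs", "qf"))


# Hypothetical automaton: q0 and q1 accept, q2 is a dead state.
example_regex = nfa_to_regex(
    states=["q2", "q1", "q0"],
    start="q0",
    accepting={"q0", "q1"},
    edges={
        ("q0", "q1"): "0",
        ("q0", "q2"): "1",
        ("q1", "q0"): union("0", "1"),
        ("q2", "q2"): union("0", "1"),
    },
)
```

The order in which states are removed changes how the resulting expression looks but not the language it describes.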

Another Example

[Figure sequence: state elimination on an automaton over { 0, 1 } with states q0, q1, and q2. After adding qs and qf and eliminating q2 and q1, q0 has a self-loop labeled 0(0 | 1) and a transition to qf labeled 0 | ε. Eliminating q0 leaves a single transition from qs to qf.]

Regular expression: (0(0 | 1))*(0 | ε)

One More Example

[Figure sequence: state elimination on an automaton over { 0, 1 } with states q0, q1, q2, q3, and q5. After adding qs and qf, the states are eliminated one at a time, producing intermediate transition labels such as 11*0, 0 | 11*0, 00*1, 1 | 00*1, and (1 | 00*1)11* | 00*.]

Regular expression: (1 (0 | 11*0) | 0) ((0 | 11*0)(1 | 00*1))*((1 | 00*1)11* | 00*) | 111*

Our Transformations

- DFA to NFA: direct conversion
- NFA to DFA: subset construction
- NFA to Regexp: state elimination
- Regexp to NFA: recursive transform

Regular Languages

A language L is regular iff


- L is accepted by some DFA.
- L is accepted by some NFA.
- L is described by some regular expression.

What constructions on regular languages can we do with regular expressions?

Reversal

The reverse of a string w is the string w^R of the characters of w in the opposite order.

    hello^R = olleh
    velociraptor^R = rotparicolev
    aibohphobia^R = aibohphobia

Reversing a Language

Given a language L, the reverse of L is the language L^R defined as

    L^R = { w^R | w ∈ L }

    { whale, rainbow }^R = { elahw, wobniar }
    { mom, momm, mommm, … }^R = { mom, mmom, mmmom, … }

Reversing a Regular Language


If L is regular, then L^R is regular.

Idea: Get a regular expression for L, then transform it into a regular expression for L^R.

We could also transform DFAs or NFAs, but the regular expression transformation is a bit easier.

Reversing a Regular Expression


Let REV(E) denote a regular expression for ℒ(E)^R. REV is defined inductively as follows:

    REV(a) = a, for any a ∈ Σ
    REV(ε) = ε
    REV(∅) = ∅
    REV(R1R2) = REV(R2) REV(R1)
    REV(R1 | R2) = REV(R1) | REV(R2)
    REV(R*) = REV(R)*
    REV((R)) = (REV(R))

Reversing a Regular Expression


  REV( (2 | 1*)0 )
= REV(0) REV( (2 | 1*) )
= 0 REV( (2 | 1*) )
= 0 ( REV(2 | 1*) )
= 0 ( REV(2) | REV(1*) )
= 0 ( 2 | REV(1*) )
= 0 ( 2 | REV(1)* )
= 0 ( 2 | 1* )
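The inductive definition of REV maps directly onto a recursive function over a syntax tree. In the Python sketch below, regular expressions are nested tuples such as `("cat", r1, r2)`; this encoding and the helper `show` are assumptions made for illustration. Parenthesized groups have no node of their own, because the tree structure already records the grouping.

```python
def rev(node):
    """Return a tree for REV(node), following the inductive definition."""
    kind = node[0]
    if kind in ("sym", "eps", "empty"):
        return node                                   # REV(a) = a, REV(ε) = ε, REV(∅) = ∅
    if kind == "cat":
        return ("cat", rev(node[2]), rev(node[1]))    # swap the two halves
    if kind == "alt":
        return ("alt", rev(node[1]), rev(node[2]))
    if kind == "star":
        return ("star", rev(node[1]))
    raise ValueError(f"unknown node kind: {kind}")


def show(node) -> str:
    """Render a tree as text, adding parentheses only where needed."""
    kind = node[0]
    if kind == "eps":
        return "ε"
    if kind == "empty":
        return "∅"
    if kind == "sym":
        return node[1]
    if kind == "star":
        inner = show(node[1])
        return inner + "*" if node[1][0] == "sym" else f"({inner})*"
    if kind == "cat":
        return "".join(
            f"({show(child)})" if child[0] == "alt" else show(child)
            for child in node[1:]
        )
    return f"{show(node[1])}|{show(node[2])}"         # alt


# (2 | 1*)0 from the worked example.
original = ("cat", ("alt", ("sym", "2"), ("star", ("sym", "1"))), ("sym", "0"))
reversed_text = show(rev(original))
```

For this input, `show(original)` gives `(2|1*)0` and `reversed_text` is `0(2|1*)`, matching the derivation.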

String Homomorphism

Let Σ1 and Σ2 be alphabets. Consider any function h : Σ1 → Σ2* that associates symbols in Σ1 with strings in Σ2*. For example:

    Σ1 = { 0, 1 }, Σ2 = { a, b, c, d }
    h(0) = acdb
    h(1) = ccc

    Σ1 = { a, b, c, d, … }, Σ2 = { A, B, C, D, … }
    h(a) = A
    h(b) = B
    ...

    Σ1 = { 0, 1 }, Σ2 = { 0, 1 }
    h(0) = ε
    h(1) = 1

String Homomorphism

Given a function h : Σ1 → Σ2*, consider the function h* : Σ1* → Σ2* defined recursively as follows:

    h*(ε) = ε
    h*(wa) = h*(w) h(a)

This function is called a string homomorphism (from the Greek for "same shape").

A Simple Homomorphism

Example: h(a) = A, h(b) = B


  h*(baa)
= h*(ba) h(a)
= h*(b) h(a) h(a)
= h*(ε) h(b) h(a) h(a)
= h(b) h(a) h(a)
= B h(a) h(a)
= BA h(a)
= BAA

A Simple Homomorphism

Example: h(0) = a, h(1) = bc


  h*(0110)
= h*(011)h(0)
= h*(01)h(1)h(0)
= h*(0)h(1)h(1)h(0)
= h*(ε)h(0)h(1)h(1)h(0)
= h(0)h(1)h(1)h(0)
= a h(1)h(1)h(0)
= abc h(1)h(0)
= abcbc h(0)
= abcbca

A Simple Homomorphism

Example: h(0) = ε, h(1) = 1

  h*(0110)
= h*(011)h(0)
= h*(01)h(1)h(0)
= h*(0)h(1)h(1)h(0)
= h*(ε)h(0)h(1)h(1)h(0)
= h(0)h(1)h(1)h(0)
= h(1)h(1)h(0)
= 1 h(1)h(0)
= 11 h(0)
= 11
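The recursive definition of h* can be written down almost verbatim. In this Python sketch, h is assumed to be given as a dictionary from single symbols to strings, with the empty string playing the role of ε; the function names are hypothetical.

```python
def h_star(h: dict[str, str], word: str) -> str:
    """Apply the string homomorphism induced by h to word."""
    if word == "":
        return ""                                  # h*(ε) = ε
    return h_star(h, word[:-1]) + h[word[-1]]      # h*(wa) = h*(w) h(a)


def image(h: dict[str, str], language: set[str]) -> set[str]:
    """Return h*(L) = { h*(w) | w in L } for a finite language L."""
    return {h_star(h, w) for w in language}


upper = h_star({"a": "A", "b": "B"}, "baa")
stretched = h_star({"0": "a", "1": "bc"}, "0110")
erased = h_star({"0": "", "1": "1"}, "0110")
```

The three results are BAA, abcbca, and 11, agreeing with the derivations. For long inputs, `"".join(h[a] for a in word)` computes the same string without deep recursion.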

String Homomorphism, Intuitively

String homomorphism represents building a new string that has the same structure as an older string.

Example: Let Σ1 = { 0, 1, 2 } and consider the string 0121. If Σ2 = { A, B, C, …, Z, a, b, …, z, ', [, ], . }, define h : Σ1 → Σ2* as

    h(0) = That's the way
    h(1) = [Uh huh uh huh]
    h(2) = I like it

Then h*(0121) = That's the way [Uh huh uh huh] I like it [Uh huh uh huh]

Note that h*(0121) has the same structure as 0121, just expressed differently.

Homomorphisms of Languages

If L ⊆ Σ1* is a language and h* : Σ1* → Σ2* is a homomorphism, the language h*(L) is defined as

    h*(L) = { h*(w) | w ∈ L }

The language formed by applying the homomorphism to every string in L.

Homomorphisms of Regular Languages

If L is a regular language over Σ1 and h* : Σ1* → Σ2* is a homomorphism, then is h*(L) a regular language? If so, how might we prove it? If not, why not?

Homomorphisms of Regular Languages

Idea: Transform a regular expression for L into a regular expression for h*(L). Define HOM(R) as

    HOM(ε) = ε
    HOM(∅) = ∅
    HOM(a) = (h(a))
    HOM(R1 R2) = HOM(R1) HOM(R2)
    HOM(R1 | R2) = HOM(R1) | HOM(R2)
    HOM(R*) = HOM(R)*
    HOM((R)) = (HOM(R))

Homomorphisms of Regular Languages

Consider the language (0120)* and the function

    h(0) = n
    h(1) = y
    h(2) = a

Then h*((0120)*) = ((n)(y)(a)(n))*

Homomorphisms of Regular Languages

Consider the language 011* and the function

    h(0) = Here
    h(1) = Kitty

Then h*(011*) = (Here)(Kitty)(Kitty)*
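HOM can be sketched as a recursive function that rewrites a regular expression symbol by symbol. In the Python version below, regular expressions are nested tuples such as `("cat", r1, r2)` and h is a dictionary from symbols to strings; both encodings, and the helper `cat_all`, are assumptions for illustration. The output is text, with each symbol's image wrapped in parentheses as in HOM(a) = (h(a)).

```python
from functools import reduce


def hom(node, h: dict[str, str]) -> str:
    """Return the text of HOM(node) for the symbol map h."""
    kind = node[0]
    if kind == "eps":
        return "ε"
    if kind == "empty":
        return "∅"
    if kind == "sym":
        return f"({h[node[1]] or 'ε'})"            # HOM(a) = (h(a))
    if kind == "cat":
        return hom(node[1], h) + hom(node[2], h)
    if kind == "alt":
        return f"({hom(node[1], h)}|{hom(node[2], h)})"
    if kind == "star":
        inner = hom(node[1], h)
        # A single symbol is already parenthesized; anything else needs a group.
        return inner + "*" if node[1][0] == "sym" else f"({inner})*"
    raise ValueError(f"unknown node kind: {kind}")


def cat_all(*nodes):
    """Concatenate several trees left to right."""
    return reduce(lambda left, right: ("cat", left, right), nodes)


sym = lambda c: ("sym", c)

# (0120)* with h(0) = n, h(1) = y, h(2) = a
nyan = hom(("star", cat_all(sym("0"), sym("1"), sym("2"), sym("0"))),
           {"0": "n", "1": "y", "2": "a"})

# 011* with h(0) = Here, h(1) = Kitty
kitty = hom(cat_all(sym("0"), sym("1"), ("star", sym("1"))),
            {"0": "Here", "1": "Kitty"})
```

Here `nyan` is `((n)(y)(a)(n))*` and `kitty` is `(Here)(Kitty)(Kitty)*`. The outer group in the first result matters: without it the star would apply only to the final (n).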

The Big List of Closure Properties

The regular languages are closed under

- Union
- Intersection
- Complement
- Set Difference (why?)
- Set Symmetric Difference (why?)
- Concatenation
- Kleene Closure
- Reversal
- String Homomorphism
- Plus a whole lot more!
