Regular Languages
Concatenation
The concatenation of two languages L1 and L2 over the alphabet Σ is the language $L_1 L_2 = \{ wx \mid w \in L_1 \land x \in L_2 \}$, the set of strings that can be split into two pieces: a string in L1 followed by a string in L2.
Concatenation Example
Example: Let Σ = { a, b, …, z, A, B, …, Z } and define
Noun = { Velociraptor, Rainbow, Whale, … }
Verb = { Eats, Juggles, Loves, … }
The = { The }
Then The Noun Verb The Noun = { TheVelociraptorEatsTheVelociraptor, TheWhaleLovesTheRainbow, TheRainbowJugglesTheVelociraptor, … }
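For finite languages, concatenation can be computed straight from the definition. The Python sketch below represents a language as a set of strings; the function name `concat` is our own, and since the Noun and Verb sets above continue with "…", it uses only the three nouns and three verbs that are actually listed.

```python
def concat(l1: set[str], l2: set[str]) -> set[str]:
    """Return L1 L2 = { wx | w in L1 and x in L2 } for finite languages."""
    return {w + x for w in l1 for x in l2}


noun = {"Velociraptor", "Rainbow", "Whale"}  # only the listed nouns
verb = {"Eats", "Juggles", "Loves"}          # only the listed verbs
the = {"The"}

sentences = concat(concat(concat(concat(the, noun), verb), the), noun)
print(len(sentences))                          # 27 = 3 * 3 * 3
print("TheWhaleLovesTheRainbow" in sentences)  # True
```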
If L1 and L2 are regular languages, is L1L2 regular?
Intuition: can we split a string w into two strings x and y such that w = xy, x ∈ L1, and y ∈ L2?
Idea: Run the automaton for L1 on w, and whenever it reaches an accepting state, nondeterministically hand the rest of w off to the automaton for L2.
If the automaton for L2 accepts the remainder, then the automaton for L1 accepted the first part, so the string is in L1L2. If it rejects the remainder, then that particular split was incorrect, although some other split may still work.
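The hand-off idea can also be tested deterministically by trying every point at which the automaton for L1 is in an accepting state. This is a brute-force membership check rather than the NFA construction itself (an NFA guesses the split instead of trying them all). The DFA encoding, the function names, and the two example DFAs (for a+ and b+ over { a, b }) are our own assumptions.

```python
# A DFA is (start_state, accepting_states, transitions), where transitions
# maps (state, symbol) -> state.
DFA = tuple[int, frozenset[int], dict[tuple[int, str], int]]


def run(dfa: DFA, w: str) -> bool:
    """Return True if the DFA accepts w."""
    state, accepting, delta = dfa
    for c in w:
        state = delta[(state, c)]
    return state in accepting


def in_concat(w: str, dfa1: DFA, dfa2: DFA) -> bool:
    """Decide whether w is in L1 L2 by trying every hand-off point."""
    state, accepting, delta = dfa1
    for i in range(len(w) + 1):
        # dfa1 has read w[:i]; if it accepts here, hand w[i:] to dfa2.
        if state in accepting and run(dfa2, w[i:]):
            return True
        if i < len(w):
            state = delta[(state, w[i])]
    return False


# Hypothetical examples: L1 = a+ and L2 = b+ over {a, b}; state 2 is dead.
A_PLUS: DFA = (0, frozenset({1}), {(0, "a"): 1, (0, "b"): 2, (1, "a"): 1,
                                   (1, "b"): 2, (2, "a"): 2, (2, "b"): 2})
B_PLUS: DFA = (0, frozenset({1}), {(0, "b"): 1, (0, "a"): 2, (1, "b"): 1,
                                   (1, "a"): 2, (2, "a"): 2, (2, "b"): 2})

print(in_concat("aabbb", A_PLUS, B_PLUS))  # True
print(in_concat("ba", A_PLUS, B_PLUS))     # False
```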
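Consider the language L = { aa, b }. LL is the set of strings formed by concatenating pairs of strings in L, so LL = { aaaa, aab, baa, bb }. Likewise, LLLL is the set of strings formed by concatenating four strings in L:
LLLL = { aaaaaaaa, aaaaaab, aaaabaa, aaaabb, aabaaaa, aabaab, aabbaa, aabbb, baaaaaa, baaaab, baabaa, baabb, bbaaaa, bbaab, bbbaa, bbbb }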
Language Exponentiation
$L^0 = \{ \varepsilon \}$: the set containing just the empty string. Idea: Any string formed by concatenating zero strings together is the empty string.
$L^{n+1} = L L^n$. Idea: Concatenating (n + 1) strings together works by concatenating n strings, then concatenating one more.
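The recursive definition maps directly onto a recursive function for finite languages. A minimal Python sketch, with languages as sets of strings (the names `concat` and `power` are ours), reproduces the L = { aa, b } example from above:

```python
def concat(l1: set[str], l2: set[str]) -> set[str]:
    """L1 L2 for finite languages."""
    return {w + x for w in l1 for x in l2}


def power(lang: set[str], n: int) -> set[str]:
    """L^n, following L^0 = {ε} and L^(n+1) = L L^n."""
    if n == 0:
        return {""}  # the only concatenation of zero strings is ε
    return concat(lang, power(lang, n - 1))


L = {"aa", "b"}
print(sorted(power(L, 2)))  # ['aaaa', 'aab', 'baa', 'bb']
print(len(power(L, 4)))     # 16
```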
The Kleene closure of L is the infinite union
$$L^* = \bigcup_{i=0}^{\infty} L^i$$
that is, the set of all strings x such that $x \in L^i$ for some natural number i. Intuitively, $L^*$ contains all possible ways of concatenating any number of copies of strings in L together.
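Because L* is infinite whenever L contains a nonempty string, a program can only list a finite slice of it. This Python sketch (the name `star_upto` is ours) collects every string of L* with length at most a chosen bound by repeatedly appending strings of L until nothing new fits.

```python
def star_upto(lang: set[str], max_len: int) -> set[str]:
    """All strings of L* whose length is at most max_len."""
    result = {""}    # L^0 = {ε} is always included
    frontier = {""}  # strings added in the previous round
    while frontier:
        # Append one more string of L; keep only new strings that fit.
        frontier = {w + x for w in frontier for x in lang
                    if len(w + x) <= max_len} - result
        result |= frontier
    return result


print(sorted(star_upto({"aa", "b"}, 3)))
# ['', 'aa', 'aab', 'b', 'baa', 'bb', 'bbb']
```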
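If L is regular, is L* regular? Each $L^i$ is regular, since the regular languages are closed under concatenation, but that alone does not settle the question: if a series of finite objects all have some property, their infinite union does not necessarily have that property!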
No matter how many times we zigzag that line, it's never straight. Concluding that it must become straight in the limit is not mathematically justified. (This is why calculus is interesting.)
A better intuition: Can we convert an NFA for the language L to an NFA for the language L*?
Summary
NFAs are a powerful type of automaton that allows for nondeterministic choices.
NFAs can also have ε-transitions that move from state to state without consuming any input.
The subset construction shows that NFAs are not more powerful than DFAs, because any NFA can be converted into a DFA that accepts the same language.
The union, intersection, difference, complement, concatenation, and Kleene closure of regular languages are all regular languages.
To show that a language is regular, we can construct a DFA for it, construct an NFA for it, or apply closure properties to languages already known to be regular.
Start with a small set of simple languages we already know to be regular. Using closure properties, combine these simple languages together to form more elaborate languages.
Regular Expressions
Regular expressions are a family of descriptions that can be used to capture the regular languages. They often provide a compact and human-readable description of a language, and they are used as the basis for numerous software systems (Perl, flex, grep, etc.).
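The regular expressions begin with three simple building blocks. The symbol Ø is a regular expression that represents the empty language Ø. The symbol ε is a regular expression that represents the language { ε }. For any symbol a ∈ Σ, the symbol a is a regular expression that represents the language { a }.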
We can combine existing regular expressions in four ways:
If R1 and R2 are regular expressions, R1R2 is a regular expression representing the concatenation of the languages of R1 and R2.
If R1 and R2 are regular expressions, R1 | R2 is a regular expression representing the union of the languages of R1 and R2.
If R is a regular expression, R* is a regular expression representing the Kleene closure of the language of R.
If R is a regular expression, (R) is a regular expression with the same meaning as R.
Operator Precedence
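From highest to lowest precedence, the operators are parentheses, Kleene star, concatenation, and union. For example:
The regular expression trick|treat represents the regular language { trick, treat }.
The regular expression booo* represents the regular language { boo, booo, boooo, … }, since the star applies only to the final o.
The regular expression candy!(candy!)* represents the regular language { candy!, candy!candy!, candy!candy!candy!, … }.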
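Python's `re` module also gives `*` higher precedence than concatenation, and concatenation higher precedence than `|`, so `re.fullmatch` (which succeeds only when the entire string matches) can spot-check these examples. Python's regex syntax goes far beyond the notation used here, so treat this only as a sanity check.

```python
import re

# Each pair is (pattern, candidate string); fullmatch must consume all of it.
for pattern, s in [("trick|treat", "treat"),
                   ("booo*", "boooo"),
                   ("booo*", "booboo"),          # * binds only to the last o
                   ("candy!(candy!)*", "candy!candy!"),
                   ("candy!(candy!)*", "candy!!")]:
    print(f"{pattern!r} on {s!r}: {bool(re.fullmatch(pattern, s))}")
```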
The language of a regular expression is the language described by that regular expression. Formally:
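$\mathcal{L}(\emptyset) = \emptyset$
$\mathcal{L}(\varepsilon) = \{ \varepsilon \}$
$\mathcal{L}(a) = \{ a \}$
$\mathcal{L}(R_1 R_2) = \mathcal{L}(R_1)\,\mathcal{L}(R_2)$
$\mathcal{L}(R_1 \mid R_2) = \mathcal{L}(R_1) \cup \mathcal{L}(R_2)$
$\mathcal{L}(R^*) = \mathcal{L}(R)^*$
$\mathcal{L}((R)) = \mathcal{L}(R)$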
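The inductive definition translates almost clause for clause into a function that lists the strings of L(R). In this Python sketch, regular expressions are encoded as nested tuples (an encoding of our own, not the slides' notation), and because L(R*) is usually infinite, `lang` only enumerates the strings of length at most n.

```python
# Hypothetical encoding: ("empty",), ("eps",), ("sym", c), ("cat", r1, r2),
# ("alt", r1, r2), ("star", r), ("paren", r)
def lang(r, n: int) -> set[str]:
    """The strings of L(r) with length <= n, one branch per clause."""
    kind = r[0]
    if kind == "empty":
        return set()
    if kind == "eps":
        return {""}
    if kind == "sym":
        return {r[1]} if n >= 1 else set()
    if kind == "cat":
        return {x + y for x in lang(r[1], n) for y in lang(r[2], n)
                if len(x + y) <= n}
    if kind == "alt":
        return lang(r[1], n) | lang(r[2], n)
    if kind == "star":
        base, result, frontier = lang(r[1], n), {""}, {""}
        while frontier:  # keep appending strings of L(R) while they fit
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= n} - result
            result |= frontier
        return result
    if kind == "paren":
        return lang(r[1], n)
    raise ValueError(f"unknown node {kind!r}")


b, o = ("sym", "b"), ("sym", "o")
booo_star = ("cat", ("cat", ("cat", b, o), o), ("star", o))  # booo*
print(sorted(lang(booo_star, 6)))  # ['boo', 'booo', 'boooo', 'booooo']
```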
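(0 | 1)*00(0 | 1)* describes the binary strings that contain 00 as a substring, such as 11011100101, 0000, and 11111011110011111.
(0|1)(0|1)(0|1)(0|1) describes the binary strings of length exactly four, such as 0000, 1010, 1111, and 1000. Using shorthand, it can also be written (0|1)⁴.
1*(0 | ε)1* describes the binary strings containing at most one 0, such as 11110111, 111111, 0111, and 0. Using shorthand, it can also be written 1*0?1*.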
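These examples can be spot-checked with Python's `re.fullmatch`. Python has no ε symbol, so as an assumption of this translation, (0 | ε) is written as the empty alternative (0|), and (0|1)⁴ as Python's (0|1){4}.

```python
import re

# Each pattern paired with the example strings listed above.
EXAMPLES = {
    r"(0|1)*00(0|1)*": ["11011100101", "0000", "11111011110011111"],
    r"(0|1)(0|1)(0|1)(0|1)": ["0000", "1010", "1111", "1000"],
    r"(0|1){4}": ["0000", "1010", "1111", "1000"],
    r"1*(0|)1*": ["11110111", "111111", "0111", "0"],
    r"1*0?1*": ["11110111", "111111", "0111", "0"],
}

for pattern, strings in EXAMPLES.items():
    assert all(re.fullmatch(pattern, s) for s in strings), pattern
print("all examples match")
```

Strings outside each language are rejected, for instance 0101 (no 00), 000 (length three), and 0110 (two 0s).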
Let Σ = { a, ., @ }, where a represents some letter. A regular expression for email addresses is
aa*(.aa*)*@aa*.aa*(.aa*)*
It matches addresses such as cs103@cs.stanford.edu, first.middle.last@mail.site.org, and barack.obama@whitehouse.gov. Writing R+ for RR*, and noticing that a+.a+(.a+)* means the same as a+(.a+)+, the expression simplifies step by step:
a+(.aa*)*@aa*.aa*(.aa*)*
a+(.a+)*@a+.a+(.a+)*
a+(.a+)*@a+(.a+)+
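A quick check of the final expression with Python's `re` module. Two assumptions of this translation: the symbol a stands for any single ASCII letter or digit (the example addresses contain digits), and the literal "." must be escaped because it is special in Python regexes.

```python
import re

A = "[A-Za-z0-9]"                        # our stand-in for the symbol a
EMAIL = rf"{A}+(\.{A}+)*@{A}+(\.{A}+)+"  # a+(.a+)*@a+(.a+)+

for address in ["cs103@cs.stanford.edu", "first.middle.last@mail.site.org",
                "barack.obama@whitehouse.gov", "user@localhost", "a..b@c.d"]:
    print(address, bool(re.fullmatch(EMAIL, address)))
```

The last two are rejected: the part after @ must contain at least one dot, and two dots in a row are never allowed.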
[Figure: a DFA for the same email-address language, with states q0 through q8 and transitions on a, ., and @.]
To simplify regular expressions, we introduce the following shorthand (which is just syntax sugar for some other regular expression):
Rⁿ represents the regular expression RR…R (n times).
R? represents R | ε.
R+ represents RR*.
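The shorthands can be checked mechanically. This Python sketch (function names ours) expands each shorthand into core operators only, written in Python `re` syntax with an empty alternative standing for ε, and compares the result against Python's built-in `{n}`, `?`, and `+` quantifiers on every binary string up to length 6.

```python
import re
from itertools import product


def power(r: str, n: int) -> str:
    """R^n: n copies of R; with n = 0 this is the empty pattern, i.e. ε."""
    return "".join(f"(?:{r})" for _ in range(n))


def optional(r: str) -> str:
    """R? is R | ε, written with an empty alternative."""
    return f"(?:{r}|)"


def plus(r: str) -> str:
    """R+ is RR*."""
    return f"(?:{r})(?:{r})*"


def strings(max_len: int):
    """All binary strings of length <= max_len."""
    for k in range(max_len + 1):
        for t in product("01", repeat=k):
            yield "".join(t)


for w in strings(6):
    assert bool(re.fullmatch(power("0|1", 4), w)) == bool(re.fullmatch("(?:0|1){4}", w))
    assert bool(re.fullmatch("1*" + optional("0") + "1*", w)) == bool(re.fullmatch("1*0?1*", w))
    assert bool(re.fullmatch(plus("01"), w)) == bool(re.fullmatch("(?:01)+", w))
print("shorthands agree")
```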
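So far, a regular language is any language accepted by some DFA, or equivalently, any language accepted by some NFA. To show that regular expressions describe exactly the regular languages, we need to show two things:
If a regular expression exists for L, then L is regular.
If L is regular, then there is a regular expression for L.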
A Marvelous Construction
To show that any language described by a regular expression is regular, we show how to convert a regular expression into an NFA. Theorem: For any regular expression R, there is an NFA N such that
$\mathcal{L}(R) = \mathcal{L}(N)$,
N has exactly one accepting state,
N has no transitions into its start state, and
N has no transitions out of its accepting state.
These requirements are stronger than are necessary for a normal NFA; we enforce them to simplify the construction.
Base Cases
[Figure: automata for the base-case regular expressions.]
[Figure: construction for R1R2, combining the automata for R1 and R2.]
Construction for R1 | R2
[Figure: the automata for R1 and R2 combined into a single automaton for R1 | R2.]
Construction for R*
[Figure: the automaton for R extended into an automaton for R*.]
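A recursive transform in this style can be sketched in Python. The sketch is essentially Thompson's construction and maintains the three invariants above (one accepting state, no transitions into the start state, none out of the accepting state). The tuple encoding, the function names, and the exact wiring of each case are our assumptions and may differ in detail from the figures. A small simulator that follows ε-transitions is included so the result can be tested.

```python
from itertools import count

# Hypothetical encoding: ("empty",), ("eps",), ("sym", c), ("cat", r1, r2),
# ("alt", r1, r2), ("star", r). Tuples already group, so no paren node.
# An edge (p, label, q) with label None is an ε-transition.


def to_nfa(regex):
    """Return (start, accept, edges) for the given regex AST."""
    fresh = count()
    edges = []

    def build(r):
        kind = r[0]
        if kind == "cat":
            s1, f1 = build(r[1])
            s2, f2 = build(r[2])
            edges.append((f1, None, s2))  # R1's accept hands off to R2's start
            return s1, f2
        s, f = next(fresh), next(fresh)
        if kind == "eps":
            edges.append((s, None, f))
        elif kind == "sym":
            edges.append((s, r[1], f))
        elif kind == "alt":
            for sub in (r[1], r[2]):
                si, fi = build(sub)
                edges.extend([(s, None, si), (fi, None, f)])
        elif kind == "star":
            si, fi = build(r[1])
            edges.extend([(s, None, si), (s, None, f),     # enter R, or skip it
                          (fi, None, si), (fi, None, f)])  # repeat R, or stop
        elif kind != "empty":  # "empty": no edges, so f is unreachable
            raise ValueError(f"unknown node {kind!r}")
        return s, f

    start, accept = build(regex)
    return start, accept, edges


def accepts(nfa, w: str) -> bool:
    """Simulate the NFA on w, taking ε-closures after every step."""
    start, accept, edges = nfa

    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            p = stack.pop()
            for a, label, b in edges:
                if a == p and label is None and b not in seen:
                    seen.add(b)
                    stack.append(b)
        return seen

    current = closure({start})
    for c in w:
        current = closure({b for a, label, b in edges if a in current and label == c})
    return accept in current


bit = ("alt", ("sym", "0"), ("sym", "1"))
contains_00 = ("cat", ("cat", ("star", bit), ("cat", ("sym", "0"), ("sym", "0"))),
               ("star", bit))  # (0 | 1)*00(0 | 1)*
nfa = to_nfa(contains_00)
print(accepts(nfa, "11011100101"), accepts(nfa, "1010"))  # True False
```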
Proving that if L is regular, there is a regular expression R for L is much trickier. Idea: Convert an NFA for L into a regular expression for L. How do we do this?
Idea: Generalize NFAs so that transitions can be labeled with arbitrary regular expressions. A transition labeled s1, s2, …, sn means the same thing as a single transition labeled s1 | s2 | … | sn, and a self-loop labeled s1 | s2 | … | sn can be followed any number of times, matching (s1 | s2 | … | sn)*.
Key idea: If we can convert any NFA into one with just a start state and an accepting state joined by a single transition labeled R, we can easily read off the regular expression R.
[Figure: states q1 and q2 with self-loops labeled R11 and R22, a transition from q1 to q2 labeled R12, and a transition from q2 to q1 labeled R21, plus a new start state qs and a new accepting state qf.]
Could we eliminate q1 from the NFA? Any path through q1 enters it, loops on it zero or more times, and then leaves. So a path from q2 through q1 and back to q2 matches R21 R11* R12, which joins R22 on the self-loop of q2, and a path from qs through q1 to q2 (with qs entering q1 on ε) matches R11* R12. Once these shortcut transitions are added, q1 can be deleted without changing the language.
In summary, state elimination works as follows:
Start with an NFA for the language.
For simplicity, add a new start state qs with an ε-transition to the old start state, and a new accepting state qf with an ε-transition from every old accepting state; then make all other states non-accepting.
Repeatedly remove states other than qs and qf from the NFA by shortcutting them, until only qs and qf remain.
The label on the transition from qs to qf is then a regular expression for the language of the NFA.
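The procedure above can be sketched in Python. Transition labels are stored as Python `re` pattern strings, with "" standing for ε and a missing entry meaning "no transition". The state names `qs` and `qf`, the helper names, and the small test NFA are our own assumptions; the test NFA was chosen so that its language is (0(0 | 1))*(0 | ε), and the output is checked against that expression by brute force.

```python
import re
from itertools import product


def union(a, b):
    """R1 | R2, where None means 'no transition'."""
    if a is None:
        return b
    if b is None:
        return a
    return f"(?:{a}|{b})"


def concat(*parts):
    """R1 R2 ...; a missing piece means there is no such path."""
    if any(p is None for p in parts):
        return None
    return "".join(f"(?:{p})" for p in parts if p)


def star(a):
    return f"(?:{a})*" if a else ""  # Ø* and ε* are both ε


def nfa_to_regex(states, start, accepting, delta):
    """State elimination; delta maps (p, q) to the regex labeling p -> q."""
    qs, qf = "qs", "qf"  # assumed not to clash with existing state names
    R = dict(delta)
    R[(qs, start)] = ""            # ε from qs to the old start state
    for q in accepting:
        R[(q, qf)] = ""            # ε from each old accepting state to qf
    nodes = [qs, *states, qf]
    for x in states:               # eliminate every original state
        loop = star(R.get((x, x)))
        nodes.remove(x)
        for p in nodes:
            for r in nodes:
                via = concat(R.get((p, x)), loop, R.get((x, r)))
                if via is not None:
                    R[(p, r)] = union(R.get((p, r)), via)
        R = {k: v for k, v in R.items() if x not in k}
    return R.get((qs, qf))         # None means the language is empty


# Test NFA: q2 is a dead state that never reaches an accepting state.
states = ["q0", "q1", "q2"]
delta = {("q0", "q1"): "0", ("q1", "q0"): "0|1",
         ("q0", "q2"): "1", ("q2", "q2"): "0|1"}
regex = nfa_to_regex(states, "q0", {"q0", "q1"}, delta)

expected = "(?:0(?:0|1))*0?"  # (0(0 | 1))*(0 | ε) in Python syntax
for k in range(9):
    for w in map("".join, product("01", repeat=k)):
        assert bool(re.fullmatch(regex, w)) == bool(re.fullmatch(expected, w))
print("state elimination agrees with (0(0 | 1))*(0 | ε)")
```

The states are eliminated in list order; a different order generally produces a different, but equivalent, regular expression.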
Another Example
[Figure: an NFA over { 0, 1 } with start state q0 and states q1 and q2, including transitions labeled 0, 1.]
First, each transition labeled 0, 1 is relabeled with the single regular expression 0|1, and the new states qs and qf are added. State q2 never leads to qf, so it can be deleted outright. Eliminating q1 produces a self-loop labeled 0(0 | 1) on q0 and a transition labeled 0 | ε from q0 to qf. Eliminating q0 then leaves a single transition from qs to qf labeled
(0(0 | 1))*(0 | ε)
which is a regular expression for the language of the original NFA.
[Figure: a longer worked example of state elimination on an automaton with states q0, q1, q2, q3, and q5, whose intermediate transition labels include 11*0, 00*1, and (0 | 11*0)(1 | 00*1).]
Our Transformations
DFA → NFA: direct conversion
NFA → DFA: subset construction
Regexp → NFA: recursive transform
NFA → Regexp: state elimination
Regular Languages
The following statements about a language L are equivalent:
L is accepted by some DFA.
L is accepted by some NFA.
L is described by some regular expression.
Reversal
The reverse of a string w, written $w^R$, is the string consisting of the characters of w in the opposite order.
Reversing a Language
Given a language L, the reverse of L is the language $L^R = \{ w^R \mid w \in L \}$. For example:
$\{ \text{whale}, \text{rainbow} \}^R = \{ \text{elahw}, \text{wobniar} \}$
$\{ \text{mom}, \text{momm}, \text{mommm}, \ldots \}^R = \{ \text{mom}, \text{mmom}, \text{mmmom}, \ldots \}$
If L is regular, then $L^R$ is regular. Idea: Get a regular expression for L, then transform it into a regular expression for $L^R$. We could also transform DFAs or NFAs, but the regular expression transformation is a bit easier.
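Let $\mathrm{REV}(E)$ denote a regular expression for $\mathcal{L}(E)^R$. REV is defined inductively as follows:
$\mathrm{REV}(a) = a$ for any $a \in \Sigma$
$\mathrm{REV}(\varepsilon) = \varepsilon$
$\mathrm{REV}(\emptyset) = \emptyset$
$\mathrm{REV}(R_1 R_2) = \mathrm{REV}(R_2)\,\mathrm{REV}(R_1)$
$\mathrm{REV}(R_1 \mid R_2) = \mathrm{REV}(R_1) \mid \mathrm{REV}(R_2)$
$\mathrm{REV}(R^*) = \mathrm{REV}(R)^*$
$\mathrm{REV}((R)) = (\mathrm{REV}(R))$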
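REV can be sketched in Python over a tuple encoding of regular expressions (our own encoding, not the slides' notation). To check the sketch, the block enumerates the short strings of both languages and compares them; `word` is a hypothetical helper that turns a fixed string into a regular expression, used here to rebuild the whale/rainbow and mom examples.

```python
# Hypothetical encoding: ("empty",), ("eps",), ("sym", c), ("cat", r1, r2),
# ("alt", r1, r2), ("star", r), ("paren", r)
def rev(r):
    """REV(r), one branch per case of the inductive definition."""
    kind = r[0]
    if kind in ("empty", "eps", "sym"):
        return r                              # Ø, ε and symbols are unchanged
    if kind == "cat":
        return ("cat", rev(r[2]), rev(r[1]))  # REV(R1 R2) = REV(R2) REV(R1)
    if kind == "alt":
        return ("alt", rev(r[1]), rev(r[2]))
    if kind in ("star", "paren"):
        return (kind, rev(r[1]))
    raise ValueError(f"unknown node {kind!r}")


def lang(r, n):
    """Strings of L(r) with length <= n (used only to check rev)."""
    kind = r[0]
    if kind == "empty":
        return set()
    if kind == "eps":
        return {""}
    if kind == "sym":
        return {r[1]} if n >= 1 else set()
    if kind == "cat":
        return {x + y for x in lang(r[1], n) for y in lang(r[2], n) if len(x + y) <= n}
    if kind == "alt":
        return lang(r[1], n) | lang(r[2], n)
    if kind == "star":
        base, result, frontier = lang(r[1], n), {""}, {""}
        while frontier:
            frontier = {x + y for x in frontier for y in base if len(x + y) <= n} - result
            result |= frontier
        return result
    return lang(r[1], n)  # paren


def word(s):
    """A regex matching exactly the string s (ε if s is empty)."""
    r = ("eps",)
    for c in s:
        r = ("cat", r, ("sym", c))
    return r


animals = ("alt", word("whale"), word("rainbow"))
print(sorted(lang(rev(animals), 10)))  # ['elahw', 'wobniar']

mom_plus = ("cat", word("mom"), ("star", ("sym", "m")))  # mom, momm, mommm, ...
assert lang(rev(mom_plus), 6) == {w[::-1] for w in lang(mom_plus, 6)}
```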
String Homomorphism
Let Σ1 and Σ2 be alphabets. Consider any function $h : \Sigma_1 \to \Sigma_2^*$ that associates each symbol in Σ1 with a string over Σ2. For example:
Σ1 = { 0, 1 }, Σ2 = { 0, 1 }, h(0) = ε, h(1) = 1
The function h extends to a string homomorphism $h^* : \Sigma_1^* \to \Sigma_2^*$ that replaces each symbol of a string by its image under h: $h^*(\varepsilon) = \varepsilon$ and $h^*(wa) = h^*(w)\,h(a)$ for $w \in \Sigma_1^*$ and $a \in \Sigma_1$.
A Simple Homomorphism
A string homomorphism builds a new string that has the same structure as an older string. Example: Let Σ1 = { 0, 1, 2 } and consider the string 0121. If Σ2 = { A, B, C, …, Z, a, b, …, z, ', [, ], . }, define $h : \Sigma_1 \to \Sigma_2^*$ as
h(0) = That's the way
h(1) = [Uh huh uh huh]
h(2) = I like it
Then h*(0121) = That's the way [Uh huh uh huh] I like it [Uh huh uh huh]. Note that h*(0121) has the same structure as 0121, just expressed differently.
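Extending h to h* is a one-liner: apply h to each symbol and concatenate the results. In this Python sketch (the name `h_star` is ours), trailing spaces are added to the example's phrases so the words do not run together; that spacing is our assumption, not part of the example.

```python
def h_star(h: dict[str, str], w: str) -> str:
    """h*(w1 w2 ... wn) = h(w1) h(w2) ... h(wn), and h*(ε) = ε."""
    return "".join(h[c] for c in w)


# Trailing spaces are our assumption, so the phrases stay separated.
h = {"0": "That's the way ", "1": "[Uh huh uh huh] ", "2": "I like it "}
print(h_star(h, "0121").rstrip())
# That's the way [Uh huh uh huh] I like it [Uh huh uh huh]

# Homomorphisms respect concatenation: h*(xy) = h*(x) h*(y).
assert h_star(h, "01" + "21") == h_star(h, "01") + h_star(h, "21")
```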
Homomorphisms of Languages
If $L \subseteq \Sigma_1^*$ is a language and $h^* : \Sigma_1^* \to \Sigma_2^*$ is a homomorphism, the language $h^*(L)$ is defined as $h^*(L) = \{ h^*(w) \mid w \in L \}$: the language formed by applying the homomorphism to every string in L.
If L is a regular language over Σ1 and $h^* : \Sigma_1^* \to \Sigma_2^*$ is a homomorphism, is $h^*(L)$ a regular language? If so, how might we prove it? If not, why not?
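Idea: Transform a regular expression for L into a regular expression for $h^*(L)$. Define $\mathrm{HOM}(R)$ as
$\mathrm{HOM}(\varepsilon) = \varepsilon$
$\mathrm{HOM}(\emptyset) = \emptyset$
$\mathrm{HOM}(a) = (h(a))$, where the string h(a) is read as a regular expression (ε if h(a) is empty)
$\mathrm{HOM}(R_1 R_2) = \mathrm{HOM}(R_1)\,\mathrm{HOM}(R_2)$
$\mathrm{HOM}(R_1 \mid R_2) = \mathrm{HOM}(R_1) \mid \mathrm{HOM}(R_2)$
$\mathrm{HOM}(R^*) = \mathrm{HOM}(R)^*$
$\mathrm{HOM}((R)) = (\mathrm{HOM}(R))$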
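HOM can be sketched the same way. In this Python sketch, the tuple encoding, the helper names, and the sample homomorphism h(0) = a, h(1) = bb are all our assumptions. HOM(a) reads the string h(a) as a regular expression by concatenating its symbols. The bounded check at the end relies on h never mapping a symbol to ε, so that |w| ≤ |h*(w)| and enumerating strings up to the same length bound on both sides is enough.

```python
# Hypothetical encoding: ("empty",), ("eps",), ("sym", c), ("cat", r1, r2),
# ("alt", r1, r2), ("star", r), ("paren", r)
def word(s: str):
    """Read a string as a regex: its symbols concatenated (ε if empty)."""
    r = ("eps",)
    for c in s:
        r = ("cat", r, ("sym", c))
    return r


def hom(r, h: dict[str, str]):
    """HOM, following the inductive definition case by case."""
    kind = r[0]
    if kind in ("empty", "eps"):
        return r
    if kind == "sym":
        return ("paren", word(h[r[1]]))  # HOM(a) = (h(a))
    if kind in ("cat", "alt"):
        return (kind, hom(r[1], h), hom(r[2], h))
    if kind in ("star", "paren"):
        return (kind, hom(r[1], h))
    raise ValueError(f"unknown node {kind!r}")


def lang(r, n):
    """Strings of L(r) with length <= n (used only for checking)."""
    kind = r[0]
    if kind == "empty":
        return set()
    if kind == "eps":
        return {""}
    if kind == "sym":
        return {r[1]} if n >= 1 else set()
    if kind == "cat":
        return {x + y for x in lang(r[1], n) for y in lang(r[2], n) if len(x + y) <= n}
    if kind == "alt":
        return lang(r[1], n) | lang(r[2], n)
    if kind == "star":
        base, result, frontier = lang(r[1], n), {""}, {""}
        while frontier:
            frontier = {x + y for x in frontier for y in base if len(x + y) <= n} - result
            result |= frontier
        return result
    return lang(r[1], n)  # paren


def h_star(w: str, h: dict[str, str]) -> str:
    return "".join(h[c] for c in w)


h = {"0": "a", "1": "bb"}  # hypothetical, non-erasing
bit = ("alt", ("sym", "0"), ("sym", "1"))
r = ("cat", ("star", bit), ("cat", ("sym", "0"), ("sym", "0")))  # (0 | 1)*00

N = 7
assert lang(hom(r, h), N) == {h_star(w, h) for w in lang(r, N) if len(h_star(w, h)) <= N}
print(sorted(lang(hom(r, h), 5)))
```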
The regular languages are closed under:
Union
Intersection
Complement
Set difference (why?)
Symmetric difference (why?)
Concatenation
Kleene closure
Reversal
String homomorphism
Plus a whole lot more!