
DFA to Regular Expressions

Proposition: If L is regular then there is a regular expression r such that L = L(r).

Proof Idea: Let $M = (Q, \Sigma, \delta, q_1, F)$ be a DFA recognizing L, with $Q = \{q_1, q_2, \ldots, q_n\}$ and $F = \{q_{f_1}, \ldots, q_{f_k}\}$.

• Construct a regular expression $r_{1f_i}$ such that $L(r_{1f_i}) = \{w \mid \delta(q_1, w) = q_{f_i}\}$, i.e., $r_{1f_i}$ describes the set of strings on which M ends up in state $q_{f_i}$ when started in the initial state $q_1$.

• Then

$$L = L(M) = L(r_{1f_1}) \cup L(r_{1f_2}) \cup \cdots \cup L(r_{1f_k}) = L(r_{1f_1} + r_{1f_2} + \cdots + r_{1f_k})$$

Thus, the desired regular expression is $r_{1f_1} + r_{1f_2} + \cdots + r_{1f_k}$.

Constructing $r_{1f_i}$

Idea 1 For every i, j, build regular expressions $r_{ij}$ describing the strings taking M from state $q_i$ to state $q_j$, where $q_i$ need not be the initial state and $q_j$ need not be a final state.

Idea 2 Build the expression $r_{ij}$ inductively.

• Start with expressions that describe paths from $q_i$ to $q_j$ that do not pass through any intermediate states; i.e., these are single nodes or single edges.
• Inductively build expressions that describe paths passing through progressively larger sets of states.

Definitions and Notation

Define $R_{ij}^{k}$ to be the set (not regular expression) of strings leading from $q_i$ to $q_j$ such that every intermediate state has index $\le k$.

• Note, the superscript k refers only to the intermediate states; so i and j could be greater than k.
• $R_{ij}^{0}$ is the set of strings that go from $q_i$ to $q_j$ without passing through any intermediate states; in other words they are $\epsilon$ or single edges.
• $R_{ij}^{n}$ is the set of all strings going from $q_i$ to $q_j$.
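
The definition can be made concrete with a small membership test. The sketch below is illustrative only: the function name `in_R`, the dictionary encoding of δ, and the two-state DFA `DELTA` (self-loops on 1, with 0 switching between states 1 and 2) are assumptions chosen for the example. It checks whether a string w belongs to $R_{ij}^{k}$ by following the path and rejecting as soon as an intermediate state has index greater than k.

```python
# Hypothetical two-state DFA used only for illustration:
# state 1 and state 2 each loop on "1"; "0" switches between them.
DELTA = {(1, "0"): 2, (1, "1"): 1, (2, "0"): 1, (2, "1"): 2}


def in_R(delta, i, j, k, w):
    """Return True if w leads from q_i to q_j and every intermediate
    state on the way has index <= k (the endpoints are not restricted)."""
    state = i
    for pos, symbol in enumerate(w):
        state = delta[(state, symbol)]
        # Every state reached before the last symbol is intermediate.
        if pos < len(w) - 1 and state > k:
            return False
    return state == j


# "10" goes 1 -> 1 -> 2: the intermediate state is q_1,
# so the string is in R_12^1 but not in R_12^0.
print(in_R(DELTA, 1, 2, 0, "10"), in_R(DELTA, 1, 2, 1, "10"))
```

Note that the empty string passes the test exactly when i = j, which matches the presence of $\epsilon$ in $R_{ii}^{0}$.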

Constructing set $R_{ij}^{k}$: Base Case

$$R_{ij}^{0} = \begin{cases} \{a \mid \delta(q_i, a) = q_j\} & \text{if } i \neq j \\ \{a \mid \delta(q_i, a) = q_j\} \cup \{\epsilon\} & \text{if } i = j \end{cases}$$

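Constructing set $R_{ij}^{k}$: Inductive Step

Assume we have $R_{ij}^{k-1}$ for all i, j. A path from $q_i$ to $q_j$ whose intermediate states have index at most k either never passes through $q_k$, or it reaches $q_k$ a first time, returns to $q_k$ zero or more times, and finally goes from $q_k$ to $q_j$. Hence

$$R_{ij}^{k} = R_{ij}^{k-1} \cup R_{ik}^{k-1} (R_{kk}^{k-1})^{*} R_{kj}^{k-1}$$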

Constructing the Regular Expression

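Task: Construct expression $r_{ij}^{k}$ such that $L(r_{ij}^{k}) = R_{ij}^{k}$.

Base Case

$$r_{ij}^{0} = \begin{cases} \emptyset & \text{if } R_{ij}^{0} = \emptyset \\ a_1 + a_2 + \cdots + a_m & \text{if } R_{ij}^{0} = \{a_1, a_2, \ldots, a_m\} \\ \epsilon + a_1 + \cdots + a_m & \text{if } R_{ij}^{0} = \{\epsilon, a_1, \ldots, a_m\} \end{cases}$$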

Constructing the Regular Expression: Inductive step

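Assume inductively that $r_{ij}^{k-1}$ is the regular expression for $R_{ij}^{k-1}$. Then

$$\begin{aligned}
R_{ij}^{k} &= R_{ij}^{k-1} \cup R_{ik}^{k-1} (R_{kk}^{k-1})^{*} R_{kj}^{k-1} \\
&= L(r_{ij}^{k-1}) \cup L(r_{ik}^{k-1}) (L(r_{kk}^{k-1}))^{*} L(r_{kj}^{k-1}) \\
&= L(r_{ij}^{k-1} + r_{ik}^{k-1} (r_{kk}^{k-1})^{*} r_{kj}^{k-1})
\end{aligned}$$

So $r_{ij}^{k} = r_{ij}^{k-1} + r_{ik}^{k-1} (r_{kk}^{k-1})^{*} r_{kj}^{k-1}$ is the regular expression for $R_{ij}^{k}$.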
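
The base case and inductive step translate almost line by line into a table-filling procedure. What follows is a minimal sketch under stated assumptions, not a reference implementation: states are numbered 1..n with state 1 initial, δ is a dictionary, and the output is written in Python `re` syntax (`|` for +, the empty pattern for $\epsilon$, and `None` standing in for $\emptyset$) so that it can be tested directly. The helper names `union`, `concat`, `star`, and `dfa_to_regex` are invented for this example.

```python
import re

EMPTY = None  # stands for the regular expression ∅; "" stands for ε


def union(x, y):
    """x + y, simplified with ∅ + L = L."""
    if x is EMPTY:
        return y
    if y is EMPTY:
        return x
    return f"(?:{x}|{y})"


def concat(x, y):
    """xy, simplified with ∅L = L∅ = ∅."""
    if x is EMPTY or y is EMPTY:
        return EMPTY
    return x + y


def star(x):
    """x*, simplified with ∅* = ε."""
    return "" if x is EMPTY else f"(?:{x})*"


def dfa_to_regex(n, alphabet, delta, finals):
    """Assumes states 1..n, initial state 1, delta[(state, symbol)] -> state."""
    states = range(1, n + 1)
    # Base case: r[i, j] holds r_ij^0.
    r = {}
    for i in states:
        for j in states:
            e = EMPTY
            for a in alphabet:
                if delta[(i, a)] == j:
                    e = union(e, re.escape(a))
            r[i, j] = union(e, "") if i == j else e
    # Inductive step: compute all r_ij^k from the table of r_ij^(k-1).
    for k in states:
        r = {
            (i, j): union(r[i, j], concat(concat(r[i, k], star(r[k, k])), r[k, j]))
            for i in states
            for j in states
        }
    # Union of r_1f^n over the final states f.
    result = EMPTY
    for f in finals:
        result = union(result, r[1, f])
    return result


# Hypothetical two-state DFA accepting strings with an odd number of 0s.
DELTA = {(1, "0"): 2, (1, "1"): 1, (2, "0"): 1, (2, "1"): 2}
PATTERN = dfa_to_regex(2, "01", DELTA, finals=[2])
print(PATTERN)
```

The sketch performs no simplification beyond the $\emptyset$ identities, so the printed pattern is much longer than a hand-simplified expression; that growth is inherent in the construction.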

Completing the Proof

Proposition: If L is regular then there is a regular expression r such that L = L(r).

Proof: Let $q_1$ be the initial state and $\{q_{f_1}, q_{f_2}, \ldots, q_{f_k}\}$ the final states of M (which recognizes L). Then the desired regular expression is

$$r_{1f_1}^{n} + r_{1f_2}^{n} + \cdots + r_{1f_k}^{n}$$

Example

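[Figure: DFA M with two states q1 and q2; each state has a self-loop labeled 1, and edges labeled 0 go from q1 to q2 and from q2 to q1.]

$$r_{11}^{0} = 1 + \epsilon \qquad r_{22}^{0} = 1 + \epsilon$$
$$r_{12}^{0} = 0 \qquad r_{21}^{0} = 0$$

$$r_{12}^{1} = r_{12}^{0} + r_{11}^{0} (r_{11}^{0})^{*} r_{12}^{0} = 0 + (1 + \epsilon)^{+} 0$$
$$r_{22}^{1} = r_{22}^{0} + r_{21}^{0} (r_{11}^{0})^{*} r_{12}^{0} = (1 + \epsilon) + 0(1 + \epsilon)^{*} 0$$

$$\begin{aligned}
r_{12}^{2} &= r_{12}^{1} + r_{12}^{1} (r_{22}^{1})^{*} r_{22}^{1} \\
&= (0 + (1 + \epsilon)^{+} 0) + (0 + (1 + \epsilon)^{+} 0)((1 + \epsilon) + 0(1 + \epsilon)^{*} 0)^{+} \\
&= (0 + (1 + \epsilon)^{+} 0)(1 + \epsilon + 0(1 + \epsilon)^{*} 0)^{*} \\
&= (1 + \epsilon)^{*} 0 (1 + \epsilon + 0 1^{*} 0)^{*} \\
&= 1^{*} 0 (1 + 0 1^{*} 0)^{*}
\end{aligned}$$

$$L(M) = L(r_{12}^{2})$$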
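
As a sanity check on the simplification, the final expression can be compared with the automaton on all short strings. The sketch below assumes the transitions read off from the base-case expressions (self-loops on 1, with 0 switching states), q1 as the initial state, and q2 as the only final state; the names `DELTA`, `accepts`, and `agree_up_to` are made up for this example. Agreement up to a bounded length is evidence, not a proof.

```python
import re
from itertools import product

# Transitions of the example DFA (assumed as described above).
DELTA = {(1, "0"): 2, (1, "1"): 1, (2, "0"): 1, (2, "1"): 2}

# 1*0(1 + 01*0)* written in Python `re` syntax.
PATTERN = re.compile(r"1*0(?:1|01*0)*")


def accepts(w, start=1, finals=(2,)):
    """Run the DFA on w and report whether it ends in a final state."""
    state = start
    for symbol in w:
        state = DELTA[(state, symbol)]
    return state in finals


def agree_up_to(max_len):
    """Compare DFA and regular expression on every string of length <= max_len."""
    for length in range(max_len + 1):
        for letters in product("01", repeat=length):
            w = "".join(letters)
            if accepts(w) != (PATTERN.fullmatch(w) is not None):
                return False
    return True


print(agree_up_to(8))
```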

Analysis of the Translation

Size of the constructed regular expression

• Number of regular expressions = $O(n^3)$
• At each step the regular expression may blow up by a factor of 4
• Each regular expression $r_{ij}^{n}$ can be of size $O(4^n)$

The above method works for both NFAs and DFAs.

For converting a DFA there is a slightly more efficient method (see textbook).
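
The $O(4^n)$ bound can be recovered from a simple recurrence. In the sketch below, $s_k$ denotes the largest size among the expressions $r_{ij}^{k}$ and $c$ is a constant covering the operators and parentheses added at each step; both symbols are introduced here for illustration and are not part of the original notation.

```latex
\begin{aligned}
s_k &\le 4\,s_{k-1} + c
  && \text{($r_{ij}^{k}$ is built from four expressions of level $k-1$)} \\
s_k + \tfrac{c}{3} &\le 4\left(s_{k-1} + \tfrac{c}{3}\right) \\
s_n &\le 4^{n}\left(s_0 + \tfrac{c}{3}\right) = O(4^{n})
  && \text{(for a fixed alphabet, $s_0 = O(|\Sigma|)$ is a constant)}
\end{aligned}
```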

Thus far . . .

[Figure: diagram relating NFA, DFA, ε-NFA, and Regular Exp.]

Regular Expression Identities

Associativity and Commutativity

$L + M = M + L$
$(L + M) + N = L + (M + N)$
$(LM)N = L(MN)$

Note: $LM \neq ML$ in general

Distributivity

$L(M + N) = LM + LN$
$(M + N)L = ML + NL$

More Identities

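Identities and Annihilators

$\emptyset + L = L + \emptyset = L$
$\epsilon L = L \epsilon = L$
$\emptyset L = L \emptyset = \emptyset$

Idempotent Law

$L + L = L$

Closure Laws

$(L^{*})^{*} = L^{*}$
$\emptyset^{*} = \epsilon$
$\epsilon^{*} = \epsilon$
$L^{+} = LL^{*} = L^{*}L$
$L^{*} = L^{+} + \epsilon$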

Testing Regular Expression Identities

To test if an equation E = F holds [Example: $(L + M)^{*} = (L^{*}M^{*})^{*}$]

1. Convert E and F into concrete expressions C and D, by replacing each language variable in E and F by a distinct concrete symbol
[Replacing L by a and M by b, we get $C = (a + b)^{*}$ and $D = (a^{*}b^{*})^{*}$]

2. L(C) = L(D) iff the equation E = F holds
[$L((a + b)^{*}) = L((a^{*}b^{*})^{*})$ and so $(L + M)^{*} = (L^{*}M^{*})^{*}$ holds]

Correctness of the algorithm follows from Theorems 3.13 and 3.14 in the book.
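
Step 2 requires deciding whether L(C) = L(D), which in general is done by converting both expressions to automata and testing equivalence. The sketch below takes a shortcut and is only a bounded check: it compares the two concrete expressions, written in Python `re` syntax, on every string up to a chosen length. The function name `first_difference` and the length bound are assumptions made for this example; a `None` result supports the identity but does not prove it, while a returned string refutes it.

```python
import re
from itertools import product


def first_difference(c, d, alphabet, max_len):
    """Return the first string (shortest first) on which patterns c and d
    disagree, or None if they agree on all strings of length <= max_len."""
    pc, pd = re.compile(c), re.compile(d)
    for length in range(max_len + 1):
        for letters in product(alphabet, repeat=length):
            w = "".join(letters)
            if (pc.fullmatch(w) is None) != (pd.fullmatch(w) is None):
                return w
    return None


# (L + M)* = (L* M*)* concretized with L -> a, M -> b.
print(first_difference(r"(?:a|b)*", r"(?:a*b*)*", "ab", 8))
# A false candidate identity, (L + M)* = L* M*, for contrast.
print(first_difference(r"(?:a|b)*", r"a*b*", "ab", 8))
```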

However, caution!!

The algorithm only applies to equations of regular expressions and not to all equations

• Consider the equation L ∩ M ∩ N = L ∩ M, where ∩ means set intersection
• The equation is clearly false in general (for example, take L = M = {a} and N = ∅)
• Concretizing we get {a} ∩ {b} ∩ {c} and {a} ∩ {b}, which are clearly equal (both are ∅)!!
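
The pitfall is easy to reproduce with finite sets. In this small sketch, languages are modeled as Python sets of strings; the names `lhs`, `rhs`, `concrete`, and `counterexample` are chosen for illustration. The concretized instance makes both sides empty, yet another choice of languages separates them.

```python
def lhs(L, M, N):
    """Left-hand side of the candidate equation: L ∩ M ∩ N."""
    return L & M & N


def rhs(L, M, N):
    """Right-hand side of the candidate equation: L ∩ M."""
    return L & M


# Concretization: each variable replaced by a distinct single symbol.
concrete = ({"a"}, {"b"}, {"c"})
# A different assignment on which the two sides differ.
counterexample = ({"a"}, {"a"}, set())

print(lhs(*concrete) == rhs(*concrete))              # both sides are empty
print(lhs(*counterexample) == rhs(*counterexample))  # the sides differ
```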
