
Fundamentals of Theory of Computer Science Unit 9

Sikkim Manipal University Page No.: 169


Unit 9 Regular Expressions and
Regular Languages
Structure
9.1 Introduction
Objectives
9.2 Regular expressions
9.3 Regular Expressions accepted by the Language
9.4 Finite Automaton from Regular Grammar
9.5 Regular Grammar from Finite Automata
Self Assessment Questions
9.6 Summary
9.7 Terminal Questions
9.8 Answers

9.1 Introduction
In this unit, you will learn about regular expressions along with finite
automata, which act as devices for recognizing the languages that regular
expressions describe. A regular expression describes a set of strings of
symbols that can be generated by a regular grammar, and it is built using
operations such as union, concatenation and iteration. Regular expressions
also obey identities analogous to those of common mathematical operations
such as addition and multiplication. These identities help simplify a
regular expression. The language described by a regular expression can be
accepted by both deterministic and non-deterministic finite automata.
Objectives:
After going through this unit, you will be able to
- explain the concept of regular expressions.
- describe the language denoted by a given regular expression.
- construct a finite automaton from a regular grammar.
- construct a regular grammar from a finite automaton.
9.2 Regular Expressions
In computing, regular expressions are used to represent sets of strings and
consist of symbols that are arranged using certain syntax rules. We can
define a regular expression $R_1$ using the symbols ε and φ together with
the terminal symbols that are elements of $\Sigma$. Some of the algebraic
operations defined on regular expressions are:
1. Union: The union of two regular expressions is also a regular
expression. For example, if $R_1$ and $R_2$ are two regular expressions,
then the union $R_1 + R_2$ is also a regular expression.
2. Concatenation: The concatenation of two regular expressions is a
regular expression. For example, if $R_1$ and $R_2$ are two regular
expressions, then the concatenation $R_1 R_2$ is also a regular expression.
3. Iteration: The iteration of a regular expression is also a regular
expression. For example, if $R_1$ is a regular expression, then the
iteration $R_1^*$ is also a regular expression.
4. Order of evaluation: Parentheses fix the order in which a regular
expression is evaluated. For example, if $R_1$ is a regular expression,
then $(R_1)$ is also a regular expression.
9.2.1 Definition:
A regular expression is recursively defined as follows.
1. φ is a regular expression denoting the empty language.
2. ε is a regular expression denoting the language that contains only the
empty string.
3. For each $a \in \Sigma$, a is a regular expression denoting the
language {a}.
4. If R is a regular expression denoting the language $L_R$ and S is a
regular expression denoting the language $L_S$, then
a. R+S is a regular expression corresponding to the language
$L_R \cup L_S$.
b. RS is a regular expression corresponding to the language $L_R \cdot L_S$.
c. R* is a regular expression corresponding to the language $L_R^*$.
5. The expressions obtained by applying any of the rules from 1 to 4 are
regular expressions.
Note: If parentheses are not present in a regular expression, the
precedence of the operators is as follows: iteration, concatenation and
union. That is, the iteration operation is performed first, then the
concatenation operation and finally the union operation.
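The precedence rule can be tried out with Python's `re` module, whose operators happen to follow the same order (star binds tighter than concatenation, which binds tighter than alternation). This is only an illustrative sketch: Python writes union as `|` rather than `+`, and the helper name `matches` is an assumption made for this example.

```python
import re

def matches(pattern, text):
    """Return True when the whole of `text` is matched by `pattern`."""
    return re.fullmatch(pattern, text) is not None

# a + bc* in the unit's notation is written a|bc* in Python.
# Without parentheses it is read as a + (b(c*)).
unparenthesised = "a|bc*"
explicit = "a|(b(c*))"
other_grouping = "((a|b)c)*"   # a different expression: union first, star last

for text in ["a", "b", "bcc", "ac", "acbc"]:
    print(repr(text),
          matches(unparenthesised, text),
          matches(explicit, text),
          matches(other_grouping, text))
```

The first two columns always agree, while the third differs, which shows that parentheses are needed whenever a different grouping is intended.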
Note: Any set that can be represented by a regular expression is known as
a regular set. If the regular expression is R, then the regular set of R is
L(R).
9.2.2 Example:
Let $x, y \in \Sigma$. Then
- x represents the set {x}
- x + y represents the set {x, y}
- xy represents the set {xy}
- x* represents the set {ε, x, xx, xxx, …}
- (x + y)* represents the set {x, y}*
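The recursive definition can be turned into a small evaluator. The following is a minimal sketch, assuming a regular expression is encoded as a nested tuple (the tags `"phi"`, `"eps"`, `"sym"`, `"union"`, `"concat"` and `"star"` are names chosen for this example). Because R* usually denotes an infinite set, the function lists only the strings up to a chosen length.

```python
def language(regex, max_len):
    """Strings of length <= max_len denoted by `regex` (a nested tuple)."""
    kind = regex[0]
    if kind == "phi":                       # rule 1: the empty language
        return set()
    if kind == "eps":                       # rule 2: only the empty string
        return {""}
    if kind == "sym":                       # rule 3: a single symbol
        return {regex[1]} if max_len >= 1 else set()
    if kind == "union":                     # rule 4a: R + S
        return language(regex[1], max_len) | language(regex[2], max_len)
    if kind == "concat":                    # rule 4b: RS
        left = language(regex[1], max_len)
        right = language(regex[2], max_len)
        return {x + y for x in left for y in right
                if len(x) + len(y) <= max_len}
    if kind == "star":                      # rule 4c: R*
        base = language(regex[1], max_len) - {""}
        result, frontier = {""}, {""}
        while frontier:                     # stops once no new short string appears
            frontier = {x + y for x in frontier for y in base
                        if len(x) + len(y) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node: {kind!r}")

x, y = ("sym", "x"), ("sym", "y")
# (x + y)* restricted to strings of length at most 2
print(sorted(language(("star", ("union", x, y)), 2), key=lambda s: (len(s), s)))
```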

9.3 Regular Expressions accepted by the Language
9.3.1 Example:
Some examples of regular expressions and the language corresponding to
these regular expressions are given here.


| Regular expression | Meaning |
|---|---|
| (a+b)* | Set of strings of a's and b's of any length, including the null string. |
| (a+b)*abb | Set of strings of a's and b's ending with the string abb. |
| ab(a+b)* | Set of strings of a's and b's starting with the string ab. |
| (a+b)*aa(a+b)* | Set of strings of a's and b's having the substring aa. |
| a*b*c* | Set of strings consisting of any number of a's (possibly none) followed by any number of b's (possibly none) followed by any number of c's (possibly none). |
| a⁺b⁺c⁺ | Set of strings consisting of at least one a followed by at least one b followed by at least one c. |
| aa*bb*cc* | Set of strings consisting of at least one a followed by at least one b followed by at least one c. |
| (a+b)*(a + bb) | Set of strings of a's and b's ending with either a or bb. |
| (aa)*(bb)*b | Set of strings consisting of an even number of a's followed by an odd number of b's. |
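The meanings listed above can be spot-checked mechanically. The sketch below assumes a tiny translator, `to_python`, that rewrites the unit's notation into Python's `re` syntax (union `+` becomes `|`, and ε becomes an empty group). Both helper names are inventions of this example, and the translator handles only the plain expressions used here.

```python
import re

def to_python(expr):
    """Rewrite the unit's notation into Python regex syntax (simple cases only)."""
    return expr.replace(" ", "").replace("+", "|").replace("ε", "(?:)")

def accepts(expr, text):
    """True when `text` belongs to the language of the unit-style expression."""
    return re.fullmatch(to_python(expr), text) is not None

samples = {
    "(a+b)*abb": ["aabb", "abba"],
    "ab(a+b)*": ["abba", "ba"],
    "(a+b)*aa(a+b)*": ["baab", "abab"],
    "aa*bb*cc*": ["abc", "ac"],
    "(aa)*(bb)*b": ["aabbb", "aabb"],
}
for expr, texts in samples.items():
    for text in texts:
        print(f"{expr:18} {text:6} {accepts(expr, text)}")
```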
9.3.2 Example
Obtain a regular expression to accept a language consisting of strings of
alternating a's and b's.
Solution: Alternating a's and b's can be obtained by concatenating the
string ab zero or more times, which can be represented by the regular
expression
(ab)*
and then adding an optional b at the front and an optional a at the end.
Thus, the complete expression is given by
(ε + b)(ab)*(ε + a)

9.3.3 Note
The expression can also be obtained as shown below:
Strings of alternating a's and b's can be generated in one of the
following ways:
i) (ab)*
ii) b(ab)*
iii) (ba)*
iv) a(ba)*
Therefore the expression to generate alternating a's and b's can be
obtained by taking the union of these regular expressions:
(ab)* + b(ab)* + (ba)* + a(ba)*
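That two differently written expressions denote the same language can be checked by brute force for all strings up to some length. The sketch below is such a bounded check, not a proof. The function name `bounded_language` and the length bound 8 are choices made for this example, and the expressions are written in Python's `re` syntax (`b?` plays the role of ε + b, and `|` is union).

```python
import re
from itertools import product

def bounded_language(pattern, alphabet, max_len):
    """All strings over `alphabet` of length <= max_len fully matched by `pattern`."""
    compiled = re.compile(pattern)
    return {
        "".join(chars)
        for n in range(max_len + 1)
        for chars in product(alphabet, repeat=n)
        if compiled.fullmatch("".join(chars))
    }

first = "b?(ab)*a?"                     # (ε + b)(ab)*(ε + a)
second = "(ab)*|b(ab)*|(ba)*|a(ba)*"    # union of the four cases

same = bounded_language(first, "ab", 8) == bounded_language(second, "ab", 8)
print("agree on all strings up to length 8:", same)
```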

9.3.4 Example
Obtain a regular expression to accept a language consisting of strings of 0s
and 1s with at most one pair of consecutive 0s.
Solution: It is clear from the statement that a string containing one pair
of consecutive 0s may
o begin with any combination of 1s and 01s, represented by (1 + 01)*
o end with any number of 1s, represented by 1*.
Therefore a regular expression for strings of 0s and 1s that contain the
pair 00 in this form is given by
(1 + 01)* 00 1*.
Note that this expression requires the pair 00 to occur exactly once and
allows only 1s after it. To cover every string with at most one pair of
consecutive 0s, including strings with no such pair, the expression can be
widened to
(1 + 01)* (ε + 0 + 00(1 + 10)*).
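A regular expression can be compared against the property it is meant to capture by enumerating short strings. The following sketch assumes that "a pair of consecutive 0s" means an occurrence of the substring 00, counted with overlaps, so 000 contains two pairs. It tests the expression (1 + 01)* 00 1* and a widened variant, (1 + 01)*(ε + 0 + 00(1 + 10)*), both written in Python's `re` syntax. The names and the length bound are choices of this example.

```python
import re
from itertools import product

def count_00(s):
    """Number of (possibly overlapping) occurrences of '00' in s."""
    return sum(1 for i in range(len(s) - 1) if s[i:i + 2] == "00")

def binary_strings(max_len):
    """Yield every string over {0, 1} of length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product("01", repeat=n):
            yield "".join(chars)

narrow = re.compile("(1|01)*001*")             # (1 + 01)* 00 1*
wide = re.compile("(1|01)*(0|00(1|10)*)?")     # (1 + 01)*(ε + 0 + 00(1 + 10)*)

narrow_ok = all(count_00(s) == 1
                for s in binary_strings(9) if narrow.fullmatch(s))
wide_ok = all((wide.fullmatch(s) is not None) == (count_00(s) <= 1)
              for s in binary_strings(9))
print("narrow matches only strings with exactly one pair:", narrow_ok)
print("wide matches exactly the strings with at most one pair:", wide_ok)
```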
9.3.5 Example
Obtain a regular expression to accept a language of strings containing at
least one a and at least one b, where $\Sigma = \{a, b, c\}$.
Solution: Strings of a's, b's and c's can be generated using the regular
expression
(a + b + c)*.
But each string should have at least one a and at least one b. There are
two cases to be considered:
The first a precedes the first b, which can be represented using
c*a(a + c)*b
The first b precedes the first a, which can be represented using
c*b(b + c)*a
Either of these two expressions can then be followed by the regular
expression (a + b + c)*, which generates the rest of the string.
Therefore the final regular expression is
c*a(a + c)*b(a + b + c)* + c*b(b + c)*a(a + b + c)*
This expression can also be written as shown below:
[c*a(a + c)*b + c*b(b + c)*a](a + b + c)*
9.3.6 Example
Obtain a regular expression to accept a language consisting of strings of
a's and b's of even length.
Solution: Strings of a's and b's of even length can be obtained by
combining the strings aa, ab, ba and bb.
The language also contains the empty string, denoted by ε.
Therefore the regular expression can be of the form
(aa + ab + ba + bb)*
The * closure includes the empty string.
The language corresponding to the regular expression is denoted by
$L(R) = \{(aa + ab + ba + bb)^n \mid n \ge 0\}$.
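The claim that n-fold combinations of aa, ab, ba and bb give exactly the strings of length 2n can be checked for small n. This is a minimal sketch using `itertools.product`; the names `power` and `all_strings` are assumptions of this example.

```python
from itertools import product

BLOCKS = ["aa", "ab", "ba", "bb"]

def power(n):
    """All strings obtained by concatenating exactly n of the four blocks."""
    return {"".join(choice) for choice in product(BLOCKS, repeat=n)}

def all_strings(length):
    """Every string of a's and b's of the given length."""
    return {"".join(chars) for chars in product("ab", repeat=length)}

for n in range(4):
    print(n, len(power(n)), power(n) == all_strings(2 * n))
```

For n = 0 the only combination is the empty string, which is why the closure includes ε.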


9.3.7 Example
Obtain a regular expression to accept a language consisting of strings of
a's and b's of odd length.
Solution: Strings of a's and b's of odd length can be obtained by
combining the strings aa, ab, ba and bb and following the result with
either a or b.
Therefore the regular expression can be of the form
(aa + ab + ba + bb)* (a + b)
Strings of a's and b's of odd length can also be obtained by combining
the strings aa, ab, ba and bb and preceding the result with either a or b.
Therefore the regular expression can also be represented as
(a + b) (aa + ab + ba + bb)*.
Observation: Even though these two expressions look different, the
language corresponding to them is the same.
9.3.8 Example
Obtain a regular expression such that
$L(R) = \{\, w \mid w \in \{0, 1\}^* \text{ and } w \text{ has at least three consecutive 0s} \,\}$.
Solution: A string consisting of 0s and 1s can be represented by the
regular expression
(0 + 1)*
This arbitrary string can precede three consecutive zeros and can follow
three consecutive zeros.
Therefore the regular expression can be written as
(0 + 1)* 000 (0 + 1)*.
The language corresponding to the regular expression can be written as
$L(R) = \{(0 + 1)^m \, 000 \, (0 + 1)^n \mid m \ge 0 \text{ and } n \ge 0\}$.

9.4 Finite Automaton from Regular Grammar
9.4.1 Definition
A grammar $G = (V_N, V_T, S, \Phi)$ is said to be a regular grammar if it
is right regular or left regular.
A grammar G is said to be right regular if all the productions are of the
form A → wB and/or A → w, where $A, B \in V_N$ and $w \in V_T^*$.
A grammar G is said to be left regular if all the productions are of the
form A → Bw and/or A → w, where $A, B \in V_N$ and $w \in V_T^*$.
9.4.2 Example
(i) The grammar with the set of productions
S → aaB | bbA | ε
A → aA | b
B → bB | a | ε
is a right linear grammar.
(ii) The grammar with the set of productions
S → Baa | Abb | ε
A → Aa | b
B → Bb | a | ε
is a left linear grammar.
9.4.3 Definition
A grammar which has at most one nonterminal on the right side of any
production, without restriction on the position of this nonterminal (in
particular, it may be leftmost or rightmost), is called a linear grammar.
9.4.4 Theorem
Let $G = (V_N, V_T, S, \Phi)$ be a right linear grammar. Then the language
L(G) is accepted by a finite automaton; that is, the language generated by
a regular grammar is a regular language.
Proof: Let $V = \{q_0, q_1, \ldots\}$ be the variables and $S = q_0$ be the
start state.
Let the productions in the grammar be
$q_0 \to x_1 q_1$
$q_1 \to x_2 q_2$
$q_2 \to x_3 q_3$
$\vdots$
$q_n \to x_{n+1}$
Assume that w is a string of the language L(G) generated from these
productions. Corresponding to each production in the grammar we can have an
equivalent transition in the FA to accept the string w.
After accepting the string w, the FA will be in the final state.
The procedure to obtain the FA from these productions is given below.
Step 1: The start symbol $q_0$ in the grammar is the start state of the FA.
Step 2: For each production of the form $q_i \to w q_j$, the corresponding
transition defined will be of the form
$\delta^*(q_i, w) = q_j$.
Step 3: For each production of the form $q_i \to w$, the corresponding
transition defined will be of the form
$\delta^*(q_i, w) = q_f$, where $q_f$ is the final state.
Since every string $w \in L(G)$ is also accepted by the FA, by applying the
transitions obtained in steps 1 through 3, the language is regular.
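Steps 1 to 3 can be sketched in code. The example below assumes that each production is given as a triple (head, terminal string, next nonterminal or `None`), that a single extra state named `"q_f"` serves as the final state, and that acceptance is decided by searching over (state, position) pairs. The tiny grammar S → aS | b is a hypothetical example, not one from this unit.

```python
FINAL = "q_f"  # assumed name for the single final state

def grammar_to_fa(productions, start):
    """Turn right linear productions into string-labelled transitions."""
    transitions = []
    for head, w, tail in productions:
        target = tail if tail is not None else FINAL   # Step 3 when no tail
        transitions.append((head, w, target))          # Step 2 otherwise
    return transitions, start, {FINAL}                 # Step 1: same start symbol

def accepts(transitions, start, finals, s):
    """Search all ways of consuming s along the string-labelled transitions."""
    stack, seen = [(start, 0)], set()
    while stack:
        state, pos = stack.pop()
        if (state, pos) in seen:
            continue
        seen.add((state, pos))
        if state in finals and pos == len(s):
            return True
        for src, w, dst in transitions:
            if src == state and s.startswith(w, pos):
                stack.append((dst, pos + len(w)))
    return False

# Hypothetical grammar: S -> aS | b, which generates a*b.
fa = grammar_to_fa([("S", "a", "S"), ("S", "b", None)], "S")
print(accepts(*fa, "aab"), accepts(*fa, "aba"))
```

Labelling a transition with a whole string w corresponds to the extended transition function; splitting such a label into single symbols requires intermediate states.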





9.4.5 Problem: Construct a DFA and the transition diagram to accept the
language generated by the following grammar.
S → 01A
A → 10B
B → 0A | 11
Solution: Observe that for each production of the form
A → wB
the corresponding transition will be δ(A, w) = B.
Also, for each production of the form A → w, we can introduce the transition
$\delta(A, w) = q_f$, where $q_f$ is the final state.
The transitions obtained from the grammar G are shown in the table.
| Production | Transition |
|---|---|
| S → 01A | δ(S, 01) = A |
| A → 10B | δ(A, 10) = B |
| B → 0A | δ(B, 0) = A |
| B → 11 | $\delta(B, 11) = q_f$ |

The transition diagram is shown below.

The DFA is
$M = (Q, \Sigma, \delta, q_0, F)$ where
$Q = \{S, A, B, q_f, q_1, q_2, q_3\}$,
$\Sigma = \{0, 1\}$, $q_0 = S$ (start state), $F = \{q_f\}$, and δ is shown
in the table. Here, the additional vertices (states) introduced are
$q_1$, $q_2$, $q_3$.
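The additional states arise when a transition labelled with a string of two symbols is split into single-symbol steps. The following sketch applies this to the grammar S → 01A, A → 10B, B → 0A | 11. The order in which the names q1, q2, q3 are assigned is an assumption of this example, as is treating a missing transition as a rejection (a complete DFA would add a trap state instead).

```python
def split_transitions(string_transitions):
    """Break each (state, string, state) edge into single-symbol DFA steps."""
    delta, counter = {}, 0
    for src, w, dst in string_transitions:
        state = src
        for i, symbol in enumerate(w):
            if i == len(w) - 1:
                nxt = dst                      # last symbol reaches the target
            else:
                counter += 1
                nxt = f"q{counter}"            # fresh intermediate state
            delta[(state, symbol)] = nxt
            state = nxt
    return delta

def run(delta, start, finals, s):
    """Simulate the partial DFA; an undefined move rejects the string."""
    state = start
    for ch in s:
        state = delta.get((state, ch))
        if state is None:
            return False
    return state in finals

edges = [("S", "01", "A"), ("A", "10", "B"), ("B", "0", "A"), ("B", "11", "qf")]
delta = split_transitions(edges)
for key in sorted(delta):
    print(key, "->", delta[key])
print(run(delta, "S", {"qf"}, "011011"), run(delta, "S", {"qf"}, "0110"))
```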
9.4.6 Problem:
Construct a DFA and the corresponding transition diagram to accept the
language generated by the following grammar.
S → aA | ε
A → aA | bB | ε
B → bB | ε
Solution: Observe that for each production of the form
A → wB
the corresponding transition will be
δ(A, w) = B
Also, for each production of the form
A → w,
we can introduce the transition
$\delta(A, w) = q_f$, where $q_f$ is the final state.
The transitions obtained from the grammar G are shown in the table.

Observe that for each production of the form A → ε, we make A a final
state.
The transition diagram corresponding to this is shown below.


9.5 Regular Grammar from Finite Automata
9.5.1 Theorem:
Let $M = (Q, \Sigma, \delta, q_0, F)$ be a finite automaton. If L is the
regular language accepted by the FA, then there exists a right linear
grammar $G = (V_N, V_T, S, \Phi)$ so that L = L(G).
Proof: Let $M = (Q, \Sigma, \delta, q_0, F)$, where
$Q = \{q_0, q_1, \ldots, q_n\}$ and $\Sigma = \{a_1, a_2, \ldots, a_m\}$.
A regular grammar $G = (V_N, V_T, S, \Phi)$ can be constructed where
$V_N = \{q_0, q_1, \ldots, q_n\}$, $V_T = \Sigma$, $S = q_0$.
The set of productions $\Phi$ can be obtained as shown below.
Step 1: For each transition of the form $\delta(q_i, a) = q_j$, the
corresponding production is
$q_i \to a q_j$
Step 2: If $q \in F$, that is, q is a final state of the FA, then introduce
the production q → ε.
Since these productions are obtained from the transitions defined for the
FA, the language accepted by the FA is also generated by the grammar.
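The two steps translate directly into code. This is a minimal sketch, assuming the transition function is stored as a dictionary from (state, symbol) to state. The two-state automaton used to try it out is hypothetical and is not one of the diagrams in this unit.

```python
def fa_to_grammar(delta, finals):
    """Return right linear productions as (head, body) pairs."""
    productions = []
    for (state, symbol), target in delta.items():
        productions.append((state, f"{symbol}{target}"))   # Step 1: q_i -> a q_j
    for state in finals:
        productions.append((state, "ε"))                    # Step 2: q -> ε
    return productions

# Hypothetical DFA accepting (ab)*: q0 --a--> q1, q1 --b--> q0, final state q0.
delta = {("q0", "a"): "q1", ("q1", "b"): "q0"}
productions = fa_to_grammar(delta, {"q0"})
for head, body in productions:
    print(f"{head} → {body}")
```

The start symbol of the grammar is the start state of the automaton, here q0.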
9.5.2 Example:
Obtain a regular grammar from the following DFA given by the transition diagram.

Solution: For each transition of the form δ(A, a) = B, introduce the
production A → aB. If $A \in F$ (A is a final state), introduce the
production A → ε.
The productions obtained from the transitions defined for the DFA are shown
below.

From the diagram, it is clear that the state B is a final state.
Therefore we introduce the production B → ε.
The grammar G corresponding to the productions obtained is shown below.



9.5.3 Example
Construct a regular grammar for the following DFA given by the transition
diagram.

Solution: For each transition of the form δ(A, a) = B, introduce the
production A → aB.
If $A \in F$ (A is a final state), introduce the production A → ε. The
productions obtained from the transitions defined for the DFA are shown below.

Since the set of final states is {S, A, B}, we introduce the productions
S → ε, A → ε, and B → ε.
Therefore the grammar G is:
$G = (V_N, V_T, S, \Phi)$ where
$V_N = \{S, A, B, C\}$
$V_T = \{a, b\}$

Observation: The finite automaton in this problem accepts strings of a's
and b's except those containing the substring abb. Therefore from the
grammar G we obtain a regular language which consists of strings of a's
and b's without the substring abb.
9.5.4 Example
Obtain a right linear grammar for the regular expression ((aab)*ab)*, given
by the transition diagram.

The right linear grammar is given by
$G = (V_N, V_T, S, \Phi)$ where
$V_N = \{S, A, B\}$
$V_T = \{a, b\}$


9.5.5 Note
A left linear grammar can be obtained from an FA as follows.
Step 1: Obtain the reverse of the given DFA.
Step 2: Obtain the right linear grammar from the reversed DFA.
Step 3: Obtain the left linear grammar from the right linear grammar.
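Step 1 can be sketched as follows. The example assumes the DFA is stored as a dictionary from (state, symbol) to state. The reversed machine may be nondeterministic, so it is simulated by tracking a set of current states. The three-state DFA used here is hypothetical, chosen only to exercise the function.

```python
def reverse_fa(delta, start, finals):
    """Reverse every arrow and swap the roles of start and final states."""
    edges = {(dst, symbol, src) for (src, symbol), dst in delta.items()}
    return edges, set(finals), {start}      # edges, new starts, new finals

def accepts(edges, starts, finals, s):
    """Simulate a (possibly nondeterministic) FA on the string s."""
    current = set(starts)
    for ch in s:
        current = {dst for (src, symbol, dst) in edges
                   if src in current and symbol == ch}
    return bool(current & finals)

# Hypothetical DFA for abb*: A --a--> B, B --b--> C, C --b--> C, final state C.
delta = {("A", "a"): "B", ("B", "b"): "C", ("C", "b"): "C"}
reversed_fa = reverse_fa(delta, "A", {"C"})
print(accepts(*reversed_fa, "bba"), accepts(*reversed_fa, "abb"))
```

The reversed machine accepts the reversal of every string accepted by the original one, which is what the later steps rely on.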
9.5.6 Example
Obtain a left linear grammar for the DFA shown below.

Step 1: Reverse the DFA. That is, make A the final state and C the start
state, and reverse the direction of every arrow. The reversed DFA is shown
below.

Step 2: Obtain the right linear grammar for the above DFA. The
corresponding productions are shown below.


Step 3: Reverse the productions of the right linear grammar to get the left
linear grammar.
If A → abcdB is a production in the right linear grammar, then after
reversing, the production will be of the form
A → Bdcba.
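The reversal of a single production can be written as a one-line transformation. This is a small sketch, assuming a right linear production is stored as (head, terminal string, nonterminal), with an empty string standing for a missing nonterminal. The function names are illustrative only.

```python
def to_left_linear(head, terminals, tail):
    """head → terminals tail   becomes   head → tail reversed(terminals)."""
    return head, tail, terminals[::-1]

def show(head, first, second):
    """Format a production for printing."""
    return f"{head} → {first}{second}"

print(show(*to_left_linear("A", "abcd", "B")))
print(show(*to_left_linear("A", "ab", "")))
```

Reversing every production reverses each generated string, so applying this step to the grammar of the reversed DFA gives back the language of the original DFA.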
The conversion of right linear grammar to the left linear grammar is shown
below.

Therefore the final left linear grammar is
$G = (V_N, V_T, S, \Phi)$ where
$V_N = \{C, A, B\}$
$V_T = \{0, 1\}$

Now we show that the string 10101 is accepted by the DFA.




Hence the left linear grammar obtained is equivalent to the given FA.

Self Assessment Questions
1. The regular expression (11)* stands for _______
2. The regular expression (01)* + 1 stands for _____
3. The regular expression (0 + 10)*1* stands for ______
4. Obtain a left linear grammar for the regular expression ((aab)*ab)*.


9.6 Summary
In this unit, a special type of grammar called a regular grammar was
considered. Different forms of regular expressions and the languages they
denote were given. We provided a method of obtaining a regular grammar from
a finite automaton (and vice versa). A sufficient number of examples were
given.

9.7 Terminal Questions
1. Obtain a right linear grammar for the language
$L = \{a^n b^m \mid n \ge 2, m \ge 3\}$.
2. Obtain the left linear grammar for the right linear grammar shown below.
9.8 Answers
Self Assessment Questions
1. Set of strings consisting of an even number of 1s.
2. The language consists of the string 1, or of the string 01 repeated zero
or more times.
3. Strings formed by concatenating 0s and 10s, followed by any number of 1s
(possibly none).
4. $G = (V_N, V_T, S, \Phi)$ where $V_N = \{A, B, S\}$, $V_T = \{a, b\}$
