A regular expression describes a set of strings of symbols. Such sets are built from single symbols using the operations union, concatenation and iteration (Kleene star), and they are exactly the languages generated by regular grammars. The languages described by regular expressions are accepted by deterministic as well as non-deterministic finite automata. After going through this unit, you will be able to
- explain the concept of regular expressions
- understand the language described by a regular expression.
Unit 9 Regular Expressions and Regular Languages

Structure
9.1 Introduction
    Objectives
9.2 Regular Expressions
9.3 Regular Expressions accepted by the Language
9.4 Finite Automaton from Regular Grammar
9.5 Regular Grammar from Finite Automata
    Self Assessment Questions
9.6 Summary
9.7 Terminal Questions
9.8 Answers
9.1 Introduction

In this unit, you will learn about regular expressions and about finite automata, which act as devices for recognizing the languages that regular expressions describe. A regular expression describes a set of strings of symbols that can also be generated by a regular grammar; regular expressions are built using the operations union, concatenation and iteration. Regular expressions also satisfy identities that resemble those of ordinary arithmetic (union behaves like addition and concatenation like multiplication), and these identities help simplify regular expressions. The language described by a regular expression can be accepted by deterministic as well as non-deterministic finite automata.

Objectives:
After going through this unit, you will be able to
- explain the concept of regular expressions
- understand the language described by a regular expression
- construct a finite automaton from a regular grammar
- construct a regular grammar from a finite automaton.

Fundamentals of Theory of Computer Science Unit 9 Sikkim Manipal University Page No.: 170

9.2 Regular Expressions

In computing, regular expressions are used to represent sets of strings; they are built from symbols arranged according to certain syntax rules. A regular expression $R_1$ over an alphabet $\Sigma$ is written using the symbols of $\Sigma$ together with the special symbols $\varepsilon$ (the empty string) and $\emptyset$ (the empty set). The algebraic operations defined on regular expressions are:
1. Union: If $R_1$ and $R_2$ are regular expressions, then their union $R_1 + R_2$ is also a regular expression.
2. Concatenation: If $R_1$ and $R_2$ are regular expressions, then their concatenation $R_1 R_2$ is also a regular expression.
3. Iteration: If $R_1$ is a regular expression, then its iteration (Kleene star) $R_1^*$ is also a regular expression.
4. Order of evaluation: If $R_1$ is a regular expression, then the parenthesized expression $(R_1)$ is also a regular expression; parentheses fix the order in which the operations are evaluated.

9.2.1 Definition: A regular expression is recursively defined as follows.
1. $\emptyset$ is a regular expression denoting the empty language.
2. $\varepsilon$ is a regular expression denoting the language $\{\varepsilon\}$, which contains only the empty string.
3. For each $a \in \Sigma$, $a$ is a regular expression denoting the language $\{a\}$.
4. If $R$ is a regular expression denoting the language $L_R$ and $S$ is a regular expression denoting the language $L_S$, then
   a. $R + S$ is a regular expression denoting the language $L_R \cup L_S$.
   b. $RS$ is a regular expression denoting the language $L_R L_S$.
   c. $R^*$ is a regular expression denoting the language $L_R^*$.
5. Only the expressions obtained by applying rules 1 to 4 a finite number of times are regular expressions.

Note: If parentheses are not present in a regular expression, the precedence of the operators, from highest to lowest, is iteration, concatenation and union. That is, you first perform the iteration operation, then the concatenation operation and finally the union operation. For example, $a + bc^*$ is read as $a + (b(c^*))$.

Note: Any set that is represented by a regular expression is known as a regular set. If the regular expression is $R$, then the regular set of $R$ is written $L(R)$.

9.2.2 Example: Let $x, y \in \Sigma$. Then
- $x$ represents the set $\{x\}$
- $x + y$ represents the set $\{x, y\}$
- $xy$ represents the set $\{xy\}$
- $x^*$ represents the set $\{\varepsilon, x, xx, xxx, \ldots\}$
- $(x + y)^*$ represents the set $\{x, y\}^*$, that is, all strings of $x$s and $y$s, including $\varepsilon$.
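The recursive definition in 9.2.1 maps naturally onto a small syntax tree with one node type per rule. The sketch below is only an illustration: the class names (`Empty`, `Eps`, `Sym`, `Alt`, `Concat`, `Star`) and the function `lang` are hypothetical, and because most regular languages are infinite, `lang` returns only the strings of length at most `n`.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union


@dataclass(frozen=True)
class Empty:          # rule 1: the empty language
    pass


@dataclass(frozen=True)
class Eps:            # rule 2: the language {ε}
    pass


@dataclass(frozen=True)
class Sym:            # rule 3: a single symbol a of Σ, denoting {a}
    a: str


@dataclass(frozen=True)
class Alt:            # rule 4a: R + S
    r: "Regex"
    s: "Regex"


@dataclass(frozen=True)
class Concat:         # rule 4b: RS
    r: "Regex"
    s: "Regex"


@dataclass(frozen=True)
class Star:           # rule 4c: R*
    r: "Regex"


Regex = Union[Empty, Eps, Sym, Alt, Concat, Star]


def lang(e: Regex, n: int) -> FrozenSet[str]:
    """Return the strings of L(e) whose length is at most n."""
    if isinstance(e, Empty):
        return frozenset()
    if isinstance(e, Eps):
        return frozenset({""})
    if isinstance(e, Sym):
        return frozenset({e.a}) if len(e.a) <= n else frozenset()
    if isinstance(e, Alt):
        return lang(e.r, n) | lang(e.s, n)
    if isinstance(e, Concat):
        left, right = lang(e.r, n), lang(e.s, n)
        return frozenset(x + y for x in left for y in right if len(x + y) <= n)
    if isinstance(e, Star):
        base = lang(e.r, n) - {""}     # ε adds nothing new inside a star
        result, frontier = {""}, {""}
        while frontier:                # append one more factor each round
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= n} - result
            result |= frontier
        return frozenset(result)
    raise TypeError(f"not a regular expression: {e!r}")


# Usage: expressions from Example 9.2.2, with x and y as the symbols
x, y = Sym("x"), Sym("y")
print(sorted(lang(Star(x), 3)))            # ['', 'x', 'xx', 'xxx']
print(sorted(lang(Star(Alt(x, y)), 2)))    # ['', 'x', 'xx', 'xy', 'y', 'yx', 'yy']
```

Each `isinstance` branch corresponds to one rule of the definition, so rule 5 (nothing else is a regular expression) shows up as the final `TypeError`.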
9.3 Regular Expressions accepted by the Language

9.3.1 Example: Some examples of regular expressions and the languages they denote are given here.
Regular expression: Meaning
- $(a+b)^*$: Set of strings of a's and b's of any length, including the empty string.
- $(a+b)^*abb$: Set of strings of a's and b's ending with the string abb.
- $ab(a+b)^*$: Set of strings of a's and b's starting with the string ab.
- $(a+b)^*aa(a+b)^*$: Set of strings of a's and b's containing the substring aa.
- $a^*b^*c^*$: Set of strings consisting of any number of a's (possibly none) followed by any number of b's (possibly none) followed by any number of c's (possibly none).
- $a^+b^+c^+$ (where $R^+ = RR^*$ denotes one or more repetitions of $R$): Set of strings consisting of at least one a followed by at least one b followed by at least one c.
- $aa^*bb^*cc^*$: Set of strings consisting of at least one a followed by at least one b followed by at least one c (the same language as $a^+b^+c^+$).
- $(a+b)^*(a+bb)$: Set of strings of a's and b's ending with either a or bb.
- $(aa)^*(bb)^*b$: Set of strings consisting of an even number of a's followed by an odd number of b's.

9.3.2 Example: Obtain a regular expression to accept the language consisting of strings of alternating a's and b's.

Solution: Alternating a's and b's can be obtained by concatenating the string ab zero or more times, which is represented by the regular expression $(ab)^*$, and then adding an optional b at the front and an optional a at the end. Thus, the complete expression is $(\varepsilon + b)(ab)^*(\varepsilon + a)$.
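Each entry of the table in 9.3.1 pairs an expression with an English description, so the entries over {a, b}, together with the answer to Example 9.3.2, can be spot-checked by brute force. This sketch assumes Python's `re` syntax, in which the textbook's union `+` is written `|` and $(\varepsilon + x)$ is written `x?`; the helper names are hypothetical, and only strings up to a fixed length are compared, so agreement is evidence rather than proof.

```python
import re
from itertools import product


def strings(alphabet: str, max_len: int):
    """Yield every string over `alphabet` of length 0..max_len."""
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            yield "".join(t)


def even_a_then_odd_b(w: str) -> bool:
    """True if w is an even number of a's followed by an odd number of b's."""
    rest = w.lstrip("a")               # drop the leading run of a's
    num_a = len(w) - len(rest)
    return set(rest) <= {"b"} and num_a % 2 == 0 and len(rest) % 2 == 1


# (textbook expression, Python re pattern, the table's meaning as a predicate)
CHECKS = [
    ("(a+b)*", r"(a|b)*", lambda w: True),
    ("(a+b)*abb", r"(a|b)*abb", lambda w: w.endswith("abb")),
    ("ab(a+b)*", r"ab(a|b)*", lambda w: w.startswith("ab")),
    ("(a+b)*aa(a+b)*", r"(a|b)*aa(a|b)*", lambda w: "aa" in w),
    ("(a+b)*(a+bb)", r"(a|b)*(a|bb)", lambda w: w.endswith(("a", "bb"))),
    ("(aa)*(bb)*b", r"(aa)*(bb)*b", even_a_then_odd_b),
    # Example 9.3.2: strings of alternating a's and b's
    ("(ε+b)(ab)*(ε+a)", r"b?(ab)*a?",
     lambda w: all(p != q for p, q in zip(w, w[1:]))),
]


def mismatches(pattern: str, meaning, max_len: int = 8) -> list:
    """Strings over {a, b} on which the pattern and the predicate disagree."""
    return [w for w in strings("ab", max_len)
            if (re.fullmatch(pattern, w) is not None) != meaning(w)]


for text, pattern, meaning in CHECKS:
    print(f"{text:18} {'OK' if not mismatches(pattern, meaning) else 'MISMATCH'}")
```

`re.fullmatch` is used rather than `re.match` because a regular expression denotes whole strings, not prefixes.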
9.3.3 Note: The expression can also be obtained as follows. Strings of alternating a's and b's can be generated in one of the following ways:
i) $(ab)^*$
ii) $b(ab)^*$
iii) $(ba)^*$
iv) $a(ba)^*$
Therefore the expression that generates alternating a's and b's is the union of these regular expressions:
$(ab)^* + b(ab)^* + (ba)^* + a(ba)^*$
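The expression of Example 9.3.2 and the union in Note 9.3.3 should denote the same language. A minimal brute-force sketch, assuming Python's `re` syntax and a hypothetical helper `same_up_to`, compares them on every string of a's and b's up to length 10; this supports the equivalence but does not prove it.

```python
import re
from itertools import product

# Example 9.3.2 and Note 9.3.3 in Python `re` syntax: `|` is union, `x?` is (ε + x)
SINGLE = r"b?(ab)*a?"                     # (ε + b)(ab)*(ε + a)
UNION = r"(ab)*|b(ab)*|(ba)*|a(ba)*"      # (ab)* + b(ab)* + (ba)* + a(ba)*


def same_up_to(p: str, q: str, alphabet: str = "ab", max_len: int = 10) -> bool:
    """True if p and q fully match exactly the same strings of length <= max_len."""
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            w = "".join(t)
            if (re.fullmatch(p, w) is None) != (re.fullmatch(q, w) is None):
                return False
    return True


print(same_up_to(SINGLE, UNION))   # True
```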
9.3.4 Example: Obtain a regular expression to accept the language consisting of strings of 0s and 1s with at most one pair of consecutive 0s.

Solution: Here pairs are counted with overlaps, so 000 contains two pairs of consecutive 0s. A string with at most one pair of consecutive 0s
- begins with any combination of 1s and 01s, represented by $(1 + 01)^*$ (this part contains no pair of 0s and is either empty or ends with 1),
- then contains either nothing, a single 0 or the single pair 00, represented by $(\varepsilon + 0 + 00)$,
- and ends with any combination of 1s and 10s, represented by $(1 + 10)^*$ (this part contains no pair of 0s and is either empty or begins with 1).
Therefore the complete regular expression for strings of 0s and 1s with at most one pair of consecutive 0s is $(1 + 01)^*(\varepsilon + 0 + 00)(1 + 10)^*$.

9.3.5 Example: Obtain a regular expression to accept the language of strings containing at least one a and at least one b, where $\Sigma = \{a, b, c\}$.

Solution: Strings of a's, b's and c's can be generated using the regular expression $(a + b + c)^*$. But each string should contain at least one a and at least one b. There are two cases to be considered:
- the first a precedes the first b, which can be represented using $c^*a(a + c)^*b$;
- the first b precedes the first a, which can be represented using $c^*b(b + c)^*a$.
Either of these expressions can then be followed by the regular expression $(a + b + c)^*$. Therefore the final regular expression is
$c^*a(a + c)^*b(a + b + c)^* + c^*b(b + c)^*a(a + b + c)^*$
This expression can also be written as
$[c^*a(a + c)^*b + c^*b(b + c)^*a](a + b + c)^*$

9.3.6 Example: Obtain a regular expression to accept the language consisting of strings of a's and b's of even length.

Solution: Strings of a's and b's of even length can be obtained by concatenating any number of the strings aa, ab, ba and bb. The language also contains the empty string $\varepsilon$. Therefore the regular expression is $(aa + ab + ba + bb)^*$; the * closure includes the empty string. The language corresponding to the regular expression is $L(R) = \{(aa + ab + ba + bb)^n \mid n \ge 0\}$.
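Examples 9.3.4 to 9.3.6 can be spot-checked the same way, by comparing each expression with a direct test of the property it should describe on all short strings. In this sketch (Python `re` syntax; helper names hypothetical), pairs of 0s are counted with overlaps. For Example 9.3.4 it compares two candidates: $(1 + 01)^*(\varepsilon + 0 + 00)(1 + 10)^*$, and the shorter $(1 + 01)^*00\,1^*$, which demands exactly one pair followed only by 1s and therefore rejects strings such as 1 and 0010.

```python
import re
from itertools import product


def strings(alphabet: str, max_len: int):
    """Yield every string over `alphabet` of length 0..max_len."""
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            yield "".join(t)


def agrees(pattern: str, meaning, alphabet: str, max_len: int = 8) -> bool:
    """True if the pattern fully matches exactly the strings satisfying `meaning`."""
    return all((re.fullmatch(pattern, w) is not None) == meaning(w)
               for w in strings(alphabet, max_len))


def zero_pairs(w: str) -> int:
    """Number of (overlapping) occurrences of 00 in w."""
    return sum(1 for i in range(len(w) - 1) if w[i:i + 2] == "00")


def at_most_one_pair(w: str) -> bool:
    return zero_pairs(w) <= 1


# 9.3.4: (1 + 01)*(ε + 0 + 00)(1 + 10)* versus the shorter candidate (1 + 01)*00 1*
print(agrees(r"(1|01)*(0|00)?(1|10)*", at_most_one_pair, "01"))   # True
print(agrees(r"(1|01)*001*", at_most_one_pair, "01"))              # False
# 9.3.5: [c*a(a + c)*b + c*b(b + c)*a](a + b + c)*
print(agrees(r"(c*a(a|c)*b|c*b(b|c)*a)(a|b|c)*",
             lambda w: "a" in w and "b" in w, "abc"))              # True
# 9.3.6: (aa + ab + ba + bb)*
print(agrees(r"(aa|ab|ba|bb)*", lambda w: len(w) % 2 == 0, "ab"))  # True
```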
9.3.7 Example: Obtain a regular expression to accept the language consisting of strings of a's and b's of odd length.

Solution: Strings of a's and b's of odd length can be obtained by any combination of the strings aa, ab, ba and bb followed by either a or b. Therefore the regular expression can be of the form $(aa + ab + ba + bb)^*(a + b)$. Strings of a's and b's of odd length can also be obtained by any combination of the strings aa, ab, ba and bb preceded by either a or b. Therefore the regular expression can also be written as $(a + b)(aa + ab + ba + bb)^*$.

Observation: Even though these two expressions look different, they denote the same language.

9.3.8 Example: Obtain a regular expression $R$ such that $L(R) = \{w \mid w \in \{0, 1\}^* \text{ and } w \text{ contains at least three consecutive 0s}\}$.

Solution: An arbitrary string of 0s and 1s can be represented by the regular expression $(0 + 1)^*$. Such an arbitrary string can precede and follow the three consecutive 0s. Therefore the regular expression can be written as $(0 + 1)^*000(0 + 1)^*$. The language corresponding to the regular expression can be written as $L(R) = \{(0 + 1)^m\, 000\, (0 + 1)^n \mid m \ge 0,\ n \ge 0\}$.
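The observation in Example 9.3.7, that the two expressions denote the same language, and the answer to Example 9.3.8 can be spot-checked in the same brute-force style. As before, this is a sketch that assumes Python's `re` syntax, uses a hypothetical helper `language`, and compares strings only up to a fixed length.

```python
import re
from itertools import product


def language(pattern: str, alphabet: str, max_len: int = 10) -> set:
    """Strings over `alphabet` of length <= max_len that `pattern` fully matches."""
    return {w for n in range(max_len + 1)
            for w in map("".join, product(alphabet, repeat=n))
            if re.fullmatch(pattern, w)}


ODD_SUFFIX = language(r"(aa|ab|ba|bb)*(a|b)", "ab")    # (aa + ab + ba + bb)*(a + b)
ODD_PREFIX = language(r"(a|b)(aa|ab|ba|bb)*", "ab")    # (a + b)(aa + ab + ba + bb)*
THREE_ZEROS = language(r"(0|1)*000(0|1)*", "01")       # (0 + 1)*000(0 + 1)*

print(ODD_SUFFIX == ODD_PREFIX)                                              # True
print(all(len(w) % 2 == 1 for w in ODD_SUFFIX))                              # True
print(THREE_ZEROS == {w for w in language(r"(0|1)*", "01") if "000" in w})   # True
```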
9.4 Finite Automaton from Regular Grammar

9.4.1 Definition: A grammar $G = (V_N, V_T, S, P)$, where $P$ is the set of productions, is said to be a regular grammar if it is right regular or left regular. A grammar $G$ is said to be right regular (right linear) if all its productions are of the form $A \to wB$ or $A \to w$, where $A, B \in V_N$ and $w \in V_T^*$. A grammar $G$ is said to be left regular (left linear) if all its productions are of the form $A \to Bw$ or $A \to w$, where $A, B \in V_N$ and $w \in V_T^*$.

9.4.2 Example:
(i) The grammar with the set of productions
$S \to aaB \mid bbA \mid \varepsilon$
$A \to aA \mid b$
$B \to bB \mid a \mid \varepsilon$
is a right linear grammar.
(ii) The grammar with the set of productions
$S \to Baa \mid Abb \mid \varepsilon$
$A \to Aa \mid b$
$B \to Bb \mid a \mid \varepsilon$
is a left linear grammar.

9.4.3 Definition: A grammar in which every production has at most one nonterminal on its right side, with no restriction on the position of that nonterminal (it may be leftmost, rightmost or anywhere in between), is called a linear grammar.

9.4.4 Theorem: Let $G = (V_N, V_T, S, P)$ be a right linear grammar. Then the language $L(G)$ is accepted by a finite automaton; that is, the language generated by a regular grammar is a regular language.

Proof: Let $V_N = \{q_0, q_1, \ldots\}$ be the variables, with $S = q_0$ as the start symbol. Let the productions in the grammar be
$q_0 \to x_1 q_1$
$q_1 \to x_2 q_2$
$q_2 \to x_3 q_3$
$\vdots$
$q_n \to x_{n+1}$
where each $x_i \in V_T^*$.
A derivation using these productions produces the string $w = x_1 x_2 \cdots x_{n+1} \in L(G)$. Corresponding to each production in the grammar we can define an equivalent transition in the FA, so that the FA accepts the string $w$: after reading $w$, the FA is in a final state. The procedure to obtain the FA from these productions is given below.
Step 1: The start symbol $q_0$ of the grammar is the start state of the FA.
Step 2: For each production of the form $q_i \to w q_j$, define the transition $\delta^*(q_i, w) = q_j$.
Step 3: For each production of the form $q_i \to w$, define the transition $\delta^*(q_i, w) = q_f$, where $q_f$ is the final state.
When $w$ has more than one symbol, the transition $\delta^*(q_i, w)$ is realized by a chain of single-symbol transitions through newly introduced intermediate states. Since every string $w \in L(G)$ is accepted by the FA obtained by applying Steps 1 to 3, the language $L(G)$ is regular.
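Steps 1 to 3 can be sketched in Python as follows. This is an illustration under stated assumptions, not the text's own algorithm: a grammar is a hypothetical dictionary from each nonterminal to its right sides `(w, B)`, with `B = None` for a production $A \to w$; unit productions $A \to B$ are not handled; a production $A \to \varepsilon$ simply makes $A$ final (the convention used in Problem 9.4.6); and the fresh intermediate states are named `q1`, `q2`, ... in the order they are created.

```python
from itertools import count
from typing import Dict, List, Optional, Set, Tuple

# Right sides (w, B) stand for A -> wB; B is None for A -> w (hypothetical encoding)
Grammar = Dict[str, List[Tuple[str, Optional[str]]]]
Delta = Dict[Tuple[str, str], Set[str]]

FINAL = "qf"   # the final state q_f of Step 3


def grammar_to_fa(grammar: Grammar, start: str) -> Tuple[Delta, str, Set[str]]:
    """Build an FA from a right linear grammar, one symbol per transition."""
    delta: Delta = {}
    finals = {FINAL}
    fresh = (f"q{i}" for i in count(1))       # names for intermediate states

    def add_path(src: str, w: str, dst: str) -> None:
        # realise delta*(src, w) = dst as a chain of single-symbol transitions
        state = src
        for k, sym in enumerate(w):
            nxt = dst if k == len(w) - 1 else next(fresh)
            delta.setdefault((state, sym), set()).add(nxt)
            state = nxt

    for a, right_sides in grammar.items():
        for w, b in right_sides:
            if w:
                add_path(a, w, b if b is not None else FINAL)   # Steps 2 and 3
            elif b is None:
                finals.add(a)                                    # A -> ε
            else:
                raise ValueError(f"unit production {a} -> {b} is not handled")
    return delta, start, finals                                  # Step 1: start = S


def accepts(fa: Tuple[Delta, str, Set[str]], w: str) -> bool:
    """Simulate the (possibly non-deterministic) FA on w."""
    delta, start, finals = fa
    current = {start}
    for sym in w:
        current = set().union(*(delta.get((q, sym), set()) for q in current))
    return bool(current & finals)


# Usage: the grammar of Problem 9.4.5, S -> 01A, A -> 10B, B -> 0A | 11
g945 = {"S": [("01", "A")], "A": [("10", "B")], "B": [("0", "A"), ("11", None)]}
fa = grammar_to_fa(g945, "S")
print(accepts(fa, "011011"), accepts(fa, "011001011"), accepts(fa, "0110"))
# True True False
```

For this grammar the sketch creates exactly three intermediate states, one for each two-symbol right side.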
9.4.5 Problem: Construct a DFA and its transition diagram to accept the language generated by the following grammar:
$S \to 01A$
$A \to 10B$
$B \to 0A \mid 11$

Solution: For each production of the form $A \to wB$, the corresponding transition is $\delta(A, w) = B$. Also, for each production of the form $A \to w$, we introduce the transition $\delta(A, w) = q_f$, where $q_f$ is the final state. This gives $\delta(S, 01) = A$, $\delta(A, 10) = B$, $\delta(B, 0) = A$ and $\delta(B, 11) = q_f$. The transitions obtained from grammar G are shown in the table.
The transition diagram is shown below.
The DFA is $M = (Q, \Sigma, \delta, q_0, F)$, where $Q = \{S, A, B, q_f, q_1, q_2, q_3\}$, $\Sigma = \{0, 1\}$, $q_0 = S$ (the start state), $F = \{q_f\}$ and $\delta$ is shown in the table. Here, the additional vertices (states) introduced are $q_1$, $q_2$ and $q_3$; they split the two-symbol transitions into single-symbol ones.

9.4.6 Problem: Construct a DFA and the corresponding transition diagram to accept the language generated by the following grammar:
$S \to aA \mid \varepsilon$
$A \to aA \mid bB \mid \varepsilon$
$B \to bB \mid \varepsilon$

Solution: For each production of the form $A \to wB$, the corresponding transition is $\delta(A, w) = B$. Also, for each production of the form $A \to w$, we introduce the transition $\delta(A, w) = q_f$, where $q_f$ is the final state. The transitions obtained from grammar G are shown in the table.
Observe that for each production of the form $A \to \varepsilon$, we make $A$ a final state. The transition diagram corresponding to this is shown below.
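The two machines of Problems 9.4.5 and 9.4.6 can be written out as transition tables and compared with a regular expression for each language. In this sketch the assignment of $q_1$, $q_2$ and $q_3$ to the three split transitions is an assumption, missing table entries stand for rejection, and the expressions $0110(010)^*11$ and $\varepsilon + a^+b^*$ are our own reading of the two grammars, checked only on short strings. The helper names `run` and `agrees` are hypothetical.

```python
import re
from itertools import product

# Partial DFAs: (state, symbol) -> next state; a missing entry rejects the input.
DFA_945 = ({("S", "0"): "q1", ("q1", "1"): "A",     # S -> 01A
            ("A", "1"): "q2", ("q2", "0"): "B",     # A -> 10B
            ("B", "0"): "A",                        # B -> 0A
            ("B", "1"): "q3", ("q3", "1"): "qf"},   # B -> 11
           "S", {"qf"})
DFA_946 = ({("S", "a"): "A", ("A", "a"): "A",       # S -> aA, A -> aA
            ("A", "b"): "B", ("B", "b"): "B"},      # A -> bB, B -> bB
           "S", {"S", "A", "B"})                    # S, A, B -> ε


def run(dfa, w: str) -> bool:
    """Run a partial DFA on w."""
    delta, state, finals = dfa
    for sym in w:
        if (state, sym) not in delta:
            return False
        state = delta[(state, sym)]
    return state in finals


def agrees(dfa, pattern: str, alphabet: str, max_len: int = 10) -> bool:
    """True if the DFA and the pattern accept the same strings up to max_len."""
    return all(run(dfa, w) == (re.fullmatch(pattern, w) is not None)
               for n in range(max_len + 1)
               for w in map("".join, product(alphabet, repeat=n)))


print(agrees(DFA_945, r"0110(010)*11", "01"))   # True
print(agrees(DFA_946, r"(a+b*)?", "ab"))        # True
```

Here Python's `a+` means one or more a's, so `(a+b*)?` is the sketch's rendering of $\varepsilon + a^+b^*$.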
9.5 Regular Grammar from Finite Automata

9.5.1 Theorem: Let $M = (Q, \Sigma, \delta, q_0, F)$ be a finite automaton. If $L$ is the regular language accepted by the FA, then there exists a right linear grammar $G = (V_N, V_T, S, P)$ such that $L = L(G)$.

Proof: Let $M = (Q, \Sigma, \delta, q_0, F)$, where $Q = \{q_0, q_1, \ldots, q_n\}$ and $\Sigma = \{a_1, a_2, \ldots, a_m\}$. A regular grammar $G = (V_N, V_T, S, P)$ can be constructed, where $V_N = \{q_0, q_1, \ldots, q_n\}$, $V_T = \Sigma$ and $S = q_0$. The set of productions $P$ is obtained as follows.
Step 1: For each transition of the form $\delta(q_i, a) = q_j$, introduce the production $q_i \to a q_j$.
Step 2: For each final state $q \in F$, introduce the production $q \to \varepsilon$.
Since these productions are obtained from the transitions defined for the FA, a string is accepted by the FA exactly when it can be derived in the grammar, so $L(G) = L$.

9.5.2 Example: Obtain a regular grammar from the DFA given by the following transition diagram.
Solution: For each transition of the form $\delta(A, a) = B$, introduce the production $A \to aB$. If $A \in F$ (a final state), introduce the production $A \to \varepsilon$. The productions obtained from the transitions defined for the DFA are shown below.
From the diagram, it is clear that the state B is a final state. Therefore we introduce the production $B \to \varepsilon$. The grammar G corresponding to the productions obtained is shown below.
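Steps 1 and 2 of Theorem 9.5.1 are mechanical, so they are easy to sketch in code. Because the DFA of Example 9.5.2 is given only as a diagram, the sketch uses a hypothetical two-state DFA with start state A and final state B that accepts the strings of a's and b's ending in a; the function names and the rule encoding `(a, B)`, with `("", None)` for $q \to \varepsilon$, are likewise assumptions. `generate` lists the strings derivable in the grammar up to a length bound, so the grammar can be compared with the automaton.

```python
from itertools import product
from typing import Dict, List, Optional, Set, Tuple

Rule = Tuple[str, Optional[str]]     # (a, B) for q -> aB, ("", None) for q -> ε


def fa_to_grammar(delta: Dict[Tuple[str, str], str], finals: Set[str]) -> Dict[str, List[Rule]]:
    """Theorem 9.5.1: each delta(q, a) = p gives q -> a p; each final q gives q -> ε."""
    productions: Dict[str, List[Rule]] = {}
    for (q, a), p in sorted(delta.items()):
        productions.setdefault(q, []).append((a, p))        # Step 1
    for q in sorted(finals):
        productions.setdefault(q, []).append(("", None))    # Step 2
    return productions


def generate(productions: Dict[str, List[Rule]], start: str, max_len: int) -> Set[str]:
    """All terminal strings of length <= max_len derivable from `start`."""
    results: Set[str] = set()
    frontier = {("", start)}          # sentential forms: terminal prefix + nonterminal
    while frontier:
        next_frontier = set()
        for prefix, nt in frontier:
            for a, nxt in productions.get(nt, []):
                if nxt is None:
                    results.add(prefix + a)
                elif len(prefix + a) <= max_len:
                    next_frontier.add((prefix + a, nxt))
        frontier = next_frontier
    return results


def show(rule: Rule) -> str:
    a, p = rule
    return a + p if p is not None else (a or "ε")


# Hypothetical DFA: start A, final B, accepts strings over {a, b} that end in a
delta = {("A", "a"): "B", ("A", "b"): "A", ("B", "a"): "B", ("B", "b"): "A"}
grammar = fa_to_grammar(delta, {"B"})
for nt, rules in grammar.items():
    print(nt, "->", " | ".join(show(r) for r in rules))
# A -> aB | bA
# B -> aB | bA | ε

expected = {w for n in range(7) for w in map("".join, product("ab", repeat=n))
            if w.endswith("a")}
print(generate(grammar, "A", 6) == expected)   # True
```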
9.5.3 Example Construct a regular grammar for the following DFA given by the transition diagram.
Solution: For each transition of the form $\delta(A, a) = B$, introduce the production $A \to aB$. If $A \in F$ (a final state), introduce the production $A \to \varepsilon$. The productions obtained from the transitions defined for the DFA are shown below.
Since the set of final states is $\{S, A, B\}$, we introduce the productions $S \to \varepsilon$, $A \to \varepsilon$ and $B \to \varepsilon$. Therefore the grammar is $G = (V_N, V_T, S, P)$, where $V_N = \{S, A, B, C\}$ and $V_T = \{a, b\}$.
Observation: The finite automaton in this problem accepts the strings of a's and b's that do not contain the substring abb. Therefore the grammar G generates the regular language consisting of the strings of a's and b's without the substring abb.

9.5.4 Example: Obtain a right linear grammar for the regular expression $((aab)^*ab)^*$, whose DFA is given by the following transition diagram.
The right linear grammar is $G = (V_N, V_T, S, P)$, where $V_N = \{S, A, B\}$ and $V_T = \{a, b\}$.
9.5.5 Note: A left linear grammar can be obtained from an FA as follows.
Step 1: Obtain the reverse of the given DFA. (The reversed automaton may be non-deterministic; Step 2 works for it just as well.)
Step 2: Obtain the right linear grammar from the reversed DFA.
Step 3: Obtain the left linear grammar from the right linear grammar.

9.5.6 Example: Obtain a left linear grammar for the DFA shown below.
Step 1: Reverse the DFA. That is, make A the final state and C the start state, and reverse the direction of every arrow. The reversed DFA is shown below.
Step 2: Obtain the right linear grammar for the reversed DFA. The corresponding productions are shown below.
Step 3: Reverse the productions of the right linear grammar to get the left linear grammar. If $A \to abcdB$ is a production in the right linear grammar, the reversed production is $A \to Bdcba$. The conversion of the right linear grammar to the left linear grammar is shown below.
Therefore the final left linear grammar is $G = (V_N, V_T, C, P)$, where $V_N = \{C, A, B\}$, $V_T = \{0, 1\}$ and the start symbol is $C$, the start state of the reversed DFA.
Now we show that the string 10101 is accepted by the DFA.
Hence the left linear grammar obtained is equivalent to the given FA.
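The three steps of Note 9.5.5 can also be sketched in code. The DFA of Example 9.5.6 appears only as a diagram, so the machine below is hypothetical: states A (start), B and C (final) over {0, 1}, accepting the strings that end in 01. The helper names and the rule encoding are also assumptions, and the sketch handles only a DFA with a single final state. The check derives every string up to a length bound from the left linear grammar, expanding from the right, and compares the result with the strings the DFA accepts.

```python
from itertools import product
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical DFA (start A, final C): accepts strings over {0, 1} ending in 01
DELTA = {("A", "0"): "B", ("A", "1"): "A",
         ("B", "0"): "B", ("B", "1"): "C",
         ("C", "0"): "B", ("C", "1"): "A"}
START, FINAL = "A", "C"

LeftRule = Tuple[Optional[str], str]   # (B, w) for q -> Bw, (None, "") for q -> ε


def left_linear(delta: Dict[Tuple[str, str], str], start: str, final: str):
    """Note 9.5.5 for a DFA with one final state; returns (grammar, start symbol)."""
    # Steps 1 and 2: reversing p --a--> q gives the edge q --a--> p, which becomes
    # the right linear production q -> a p; the old start state gets start -> ε.
    right = [(q, a, p) for (p, a), q in delta.items()] + [(start, "", None)]
    # Step 3: reverse each right side, so q -> w p becomes q -> p w^R.
    grammar: Dict[str, List[LeftRule]] = {}
    for q, w, p in right:
        grammar.setdefault(q, []).append((p, w[::-1]))
    return grammar, final              # the old final state is the start symbol


def derive(grammar: Dict[str, List[LeftRule]], symbol: str, max_len: int) -> Set[str]:
    """Strings of length <= max_len derivable in a left linear grammar."""
    results: Set[str] = set()
    frontier = {(symbol, "")}          # sentential forms: nonterminal + terminal suffix
    while frontier:
        next_frontier = set()
        for nt, suffix in frontier:
            for p, w in grammar.get(nt, []):
                if p is None:
                    results.add(w + suffix)
                elif len(w + suffix) <= max_len:
                    next_frontier.add((p, w + suffix))
        frontier = next_frontier
    return results


def dfa_accepts(w: str) -> bool:
    state = START
    for sym in w:
        state = DELTA[(state, sym)]
    return state == FINAL


grammar, start_symbol = left_linear(DELTA, START, FINAL)
for nt, rules in grammar.items():
    print(nt, "->", " | ".join((p or "") + w if (p or w) else "ε" for p, w in rules))
accepted = {w for n in range(9) for w in map("".join, product("01", repeat=n))
            if dfa_accepts(w)}
print(start_symbol, derive(grammar, start_symbol, 8) == accepted)   # C True
```

For this hypothetical machine the string 10101 is derived as C, B1, C01, B101, A0101, A10101, 10101.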
Self Assessment Questions
1. The regular expression $(11)^*$ stands for _______
2. The regular expression $(01)^* + 1$ stands for _______
3. The regular expression $(0 + 10)^*1^*$ stands for _______
4. Obtain a left linear grammar for the regular expression $((aab)^*ab)^*$.
9.6 Summary
In this unit, a special type of grammar, called a regular grammar, was considered. Different forms of regular expressions and the languages they describe were presented. We provided a method of obtaining a finite automaton from a regular grammar, and a regular grammar from a finite automaton. A sufficient number of examples were given.
9.7 Terminal Questions
1. Obtain a right linear grammar for the language $L = \{a^n b^m \mid n \ge 2, m \ge 3\}$.
2. Obtain the left linear grammar for the right linear grammar shown below.

9.8 Answers
Self Assessment Questions
1. The set of strings consisting only of 1s, with an even number of 1s (including the empty string).
2. The language consisting of the string 1 together with the strings formed by repeating 01 zero or more times.
3. Strings of 0s and 1s in which every 1 that is not part of the block of 1s at the end of the string is immediately followed by a 0; equivalently, the strings that do not contain the substring 110.
4. $G = (V_N, V_T, S, P)$, where $V_N = \{A, B, S\}$ and $V_T = \{a, b\}$.
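Answers 1 to 3 can be spot-checked by brute force, comparing each expression with a direct test of the described property on all strings of 0s and 1s up to a fixed length. This is a sketch in Python `re` syntax with a hypothetical helper name; for Question 3 the property tested is that the string does not contain the substring 110.

```python
import re
from itertools import product


def language(pattern: str, alphabet: str = "01", max_len: int = 8) -> set:
    """Strings over `alphabet` of length <= max_len that `pattern` fully matches."""
    return {w for n in range(max_len + 1)
            for w in map("".join, product(alphabet, repeat=n))
            if re.fullmatch(pattern, w)}


ALL = language(r"(0|1)*")
q1 = language(r"(11)*")        # (11)*
q2 = language(r"(01)*|1")      # (01)* + 1
q3 = language(r"(0|10)*1*")    # (0 + 10)*1*

print(q1 == {w for w in ALL if set(w) <= {"1"} and len(w) % 2 == 0})   # True
print(q2 == {"1"} | {"01" * k for k in range(5)})                      # True
print(q3 == {w for w in ALL if "110" not in w})                        # True
```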