
Foundations of Computer Science (COSC-3302), Lecture 2
(prepared after Chapter 2 of Martin's 2011 textbook)

Stefan Andrei

04/22/16

Lecture 2 COSC 3302

Course content
1.1. Mathematical Tools and Techniques (Chapter 1)
1.2. Finite Automata and the Languages they Accept (Chapter 2)
1.3. Regular Expressions, Nondeterminism, and Kleene's Theorem (Chapter 3)
Exam 1
2.1. Context-Free Languages (Chapter 4)
2.2. Pushdown Automata (Chapter 5)
2.3. Context-Free and Non-Context-Free Languages (Chapter 6)
Exam 2
3.1. Turing Machines (Chapter 7)
3.2. Recursively Enumerable Languages (Chapter 8)
3.3. Undecidable Problems (Chapter 9)
3.4. Computable Functions (Chapter 10)
3.5. Introduction to Computational Complexity (Chapter 11)
Exam 3

Overview of Previous Lecture

Mathematical Tools and Techniques

1. Logic and Proofs
2. Sets
3. Functions and Equivalence Relations
4. Languages
5. Recursive Definitions
6. Structural Induction

Finite Automata and the Languages They Accept

1. Finite Automata: Examples and Definitions
2. Accepting the Union, Intersection, or Difference of Two Languages
3. Distinguishing One String from Another
4. The Pumping Lemma
5. How to Build a Simple Computer Using Equivalence Classes
6. Minimizing the Number of States in a Finite Automaton

Finite Automata: Examples and Definitions

A finite automaton is a simple type of computer.
Its output is limited to "yes" or "no".
It has very primitive memory capabilities.
Any computer that answers "yes" or "no" acts as a language acceptor.
For this chapter, consider that:
The input comes in the form of a string of individual input symbols.
The computer gives an answer for the current string (the string of symbols that have been read so far).

Finite Automata: Examples and Definitions (cont'd)

A finite automaton (FA), or finite state machine, is always in one of a finite number of states.
At each step it makes a move that depends only on the state it's currently in and the input symbol it gets. We'll see later that these FAs are deterministic (DFA).
The move is to enter a particular state (possibly the same as the one it was already in).
States are either accepting or nonaccepting.
Entering an accepting state means answering "yes".
Entering a nonaccepting state means answering "no".
An FA has an initial state, which is an accepting state if and only if the language the FA accepts includes the empty string $\Lambda$.

Finite Automata: Examples and Definitions (cont'd)

An FA can be described by the set of states, the input alphabet, the initial state, the set of accepting states, and a transition function.
This can be described by a diagram or a table of values.
In a diagram, states are represented by circles, transitions by arrows labeled with input symbols, and accepting states by double circles.

Finite Automata: Examples and Definitions (cont'd)

This FA accepts the language of strings that end in aa:

The three states represent strings that end with no a's, one a, and two a's, respectively.
From each state, if the input is anything but an a, go back to the initial state, because now the current string doesn't end with a.
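The three-state FA described above can be sketched as a transition table. This is a minimal illustration, assuming the alphabet {a, b} and using the state names q0, q1, q2 as labels for "ends with no a", "ends with one a", and "ends with aa"; the names and the `accepts` helper are choices made for this sketch.

```python
# Transition table for the three-state FA described above.
# State names q0, q1, q2 are labels assumed for this sketch.
DELTA = {
    ("q0", "a"): "q1", ("q0", "b"): "q0",
    ("q1", "a"): "q2", ("q1", "b"): "q0",
    ("q2", "a"): "q2", ("q2", "b"): "q0",
}
START = "q0"
ACCEPTING = {"q2"}


def accepts(s):
    """Run the FA on s and report whether it ends in an accepting state."""
    state = START
    for symbol in s:
        state = DELTA[(state, symbol)]
    return state in ACCEPTING
```

For example, `accepts("baa")` is true and `accepts("aab")` is false, because reading the final b sends the machine back to q0.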

Finite Automata: Examples and Definitions (cont'd)

This FA accepts the strings that end in b and do not contain aa.

The idea is to go to a permanently nonaccepting state if you ever read two a's in a row.
Go to an accepting state if you see a b (and haven't read two a's); leave it when you see anything else.

Finite Automata: Examples and Definitions (cont'd)

This FA accepts strings that contain abbaab.
What do we do when a prefix of abbaab has been read but the next symbol doesn't match?
Go back to the state representing the longest prefix of abbaab at the end of the new current string.
If we've read abba and the next symbol is b, go to q2, because ab is the longest prefix at the end of abbab.
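The "longest prefix at the end of the current string" rule can be computed mechanically. The sketch below assumes states are numbered 0 to 6, where state k means the last k symbols read match the first k symbols of abbaab; `next_state` and `contains_pattern` are hypothetical helper names, not part of the lecture.

```python
PATTERN = "abbaab"


def next_state(k, symbol):
    """State k means the last k symbols read equal PATTERN[:k]."""
    if k == len(PATTERN):
        return k  # pattern already seen: stay in the accepting state
    seen = PATTERN[:k] + symbol
    # Find the longest prefix of PATTERN that is a suffix of what was read.
    # Length 0 always matches, so the loop always returns.
    for length in range(len(seen), -1, -1):
        if seen.endswith(PATTERN[:length]):
            return length


def contains_pattern(s):
    """Simulate the FA; accept when the last state (full match) is reached."""
    state = 0
    for symbol in s:
        state = next_state(state, symbol)
    return state == len(PATTERN)
```

With this numbering, `next_state(4, "b")` returns 2, which matches the slide's example of reading b after abba.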

Finite Automata: Examples and Definitions (cont'd)

An FA that accepts binary representations of integers divisible by 3:

States 0, 1, and 2 represent the current remainder.
The initial state is nonaccepting: at least one bit is required.
Leading zeros are prohibited.
Transitions represent multiplication by two, then addition of the input bit.
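A possible rendering of this machine is sketched below. The slide's diagram is not available here, so the handling of the single string 0 (accepted) and of anything following a leading 0 (rejected via a dead state) is an assumption of this sketch, as are the state names "start", "zero", and "dead".

```python
def step(state, bit):
    """One transition; remainder states are the integers 0, 1, 2."""
    if state == "start":
        # "0" alone is treated as a valid representation (assumption);
        # a leading 1 starts with remainder 1.
        return "zero" if bit == "0" else 1
    if state in ("zero", "dead"):
        return "dead"  # a leading zero followed by more bits is rejected
    return (2 * state + int(bit)) % 3  # multiply by two, then add the bit


def divisible_by_3(bits):
    state = "start"  # nonaccepting: at least one bit is required
    for bit in bits:
        state = step(state, bit)
    return state == "zero" or state == 0
```

The remainder update works because appending a bit b to a binary numeral with value m produces the value 2m + b.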

Finite Automata: Examples and Definitions (cont'd)

FAs are ideally suited for lexical analysis, the first stage in
compiling a computer program.
Real-world example: the drink machine from the
hallway!
A lexical analyzer takes a string of characters and provides
a string of tokens.
Tokens have a simple structure: e.g., 41.3, main.
The next slide shows an FA that accepts tokens for a
simple language based on C:
The only tokens are identifiers, semicolons, =, aa, and
numeric literals; tokens are separated by spaces.
Accepting states represent scanned tokens; each
accepting state represents a category of token.

Finite Automata: Examples and Definitions (cont'd)

D is any digit.
L is a lowercase letter other than a.
M is D or L.
N is D or L or a.
$\Delta$ is a space.
All transitions not shown explicitly go to an error state and stay there.

Finite Automata: Examples and Definitions (cont'd)

Definition: a finite automaton is a 5-tuple $M = (Q, \Sigma, q_0, A, \delta)$, where:
$Q$ is a finite set of states
$\Sigma$ is a finite input alphabet
$q_0 \in Q$ is the initial state
$A \subseteq Q$ is the set of accepting states
$\delta : Q \times \Sigma \to Q$ is the transition function.

Note: From state $q$ the machine will move to state $\delta(q, \sigma)$ if it receives input symbol $\sigma$.
In other words, the destination state is unique; hence the FA is deterministic (DFA).

Finite Automata: Examples and Definitions (cont'd)

The notation $\delta^*(q, x)$ describes the state the FA is in after starting in state $q$ and receiving input string $x$. The extended transition function $\delta^* : Q \times \Sigma^* \to Q$ is defined recursively as follows:
For every $q \in Q$, $\delta^*(q, \Lambda) = q$.
For every $q \in Q$, every $y \in \Sigma^*$, and every $\sigma \in \Sigma$, $\delta^*(q, y\sigma) = \delta(\delta^*(q, y), \sigma)$.
This just says that if you already know how to process the string $y$, then to process one additional symbol $\sigma$, you use the ordinary transition function starting from the state $\delta^*(q, y)$.
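The recursive definition translates almost line for line into code. The sketch below is illustrative only: it assumes the transition table of the earlier "strings ending in aa" FA (with state labels q0, q1, q2 chosen here), and represents the empty string by Python's `""`.

```python
# Transition function of the "ends in aa" FA; state labels are assumptions.
DELTA = {
    ("q0", "a"): "q1", ("q0", "b"): "q0",
    ("q1", "a"): "q2", ("q1", "b"): "q0",
    ("q2", "a"): "q2", ("q2", "b"): "q0",
}


def delta_star(q, x):
    """Extended transition function, following the recursive definition."""
    if x == "":
        return q                      # base case: the empty string
    y, sigma = x[:-1], x[-1]          # x = y followed by one symbol sigma
    return DELTA[(delta_star(q, y), sigma)]
```

The recursion peels symbols off the right end of the string, exactly as the definition does, although an iterative left-to-right loop computes the same state.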

Finite Automata: Examples and Definitions (cont'd)

Suppose we wanted to evaluate $\delta^*(q_0, baa)$ in the example above.
This is easy using the diagram (follow the arrows), but let's use the recursive formula on the previous slide, to see how it works.

$$
\begin{aligned}
\delta^*(q_0, baa) &= \delta(\delta^*(q_0, ba), a) \\
&= \delta(\delta(\delta^*(q_0, b), a), a) \\
&= \delta(\delta(\delta^*(q_0, \Lambda b), a), a) \\
&= \delta(\delta(\delta(\delta^*(q_0, \Lambda), b), a), a) \\
&= \delta(\delta(\delta(q_0, b), a), a) \\
&= \delta(\delta(q_0, a), a) \\
&= \delta(q_1, a) = q_1
\end{aligned}
$$

We had to look at the diagram only in the last 3 steps, to get the values of $\delta$.

Finite Automata: Examples and Definitions (cont'd)

Definition: Let $M = (Q, \Sigma, q_0, A, \delta)$ be an FA, and let $x \in \Sigma^*$. Then $x$ is accepted by $M$ if $\delta^*(q_0, x) \in A$ and rejected otherwise.
The language accepted by $M$ is $L(M) = \{x \in \Sigma^* \mid x \text{ is accepted by } M\}$.

Accepting the Union, Intersection, or Difference of Two Languages

Suppose that $L_1$ and $L_2$ are languages over $\Sigma$.
Given an FA that accepts $L_1$ and another that accepts $L_2$, we can construct one that accepts $L_1 \cup L_2$.
The same approach works for intersection and difference as well.
The idea is to construct an FA that executes both of the original FAs at the same time.
This works because if $x \in \Sigma^*$, then knowing whether $x \in L_1$ and whether $x \in L_2$ is enough to determine whether $x \in L_1 \cup L_2$.

Accepting the Union, Intersection, or Difference of Two Languages (cont'd)

Theorem: Suppose $M_1 = (Q_1, \Sigma, q_1, A_1, \delta_1)$ and $M_2 = (Q_2, \Sigma, q_2, A_2, \delta_2)$ are FAs accepting $L_1$ and $L_2$.
Let $M = (Q, \Sigma, q_0, A, \delta)$ be defined as follows:
$Q = Q_1 \times Q_2$
$q_0 = (q_1, q_2)$
$\delta((p, q), \sigma) = (\delta_1(p, \sigma), \delta_2(q, \sigma))$
Then, if:
$A = \{(p, q) \mid p \in A_1 \text{ or } q \in A_2\}$, $M$ accepts $L_1 \cup L_2$
$A = \{(p, q) \mid p \in A_1 \text{ and } q \in A_2\}$, $M$ accepts $L_1 \cap L_2$
$A = \{(p, q) \mid p \in A_1 \text{ and } q \notin A_2\}$, $M$ accepts $L_1 - L_2$
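The product construction in the theorem can be sketched directly. The dictionary layout (`states`, `alphabet`, `start`, `accepting`, `delta`), the function names, and the two example machines (strings ending in aa, strings of even length) are assumptions made for this illustration; they are not the machines drawn in the lecture.

```python
from itertools import product


def product_fa(m1, m2, mode):
    """Build the product FA; mode is 'union', 'intersection', or 'difference'."""
    a1, a2 = m1["accepting"], m2["accepting"]
    accept_rule = {
        "union": lambda p, q: p in a1 or q in a2,
        "intersection": lambda p, q: p in a1 and q in a2,
        "difference": lambda p, q: p in a1 and q not in a2,
    }[mode]
    states = set(product(m1["states"], m2["states"]))  # Q = Q1 x Q2
    delta = {
        ((p, q), sym): (m1["delta"][(p, sym)], m2["delta"][(q, sym)])
        for (p, q) in states
        for sym in m1["alphabet"]
    }
    return {
        "states": states,
        "alphabet": m1["alphabet"],
        "start": (m1["start"], m2["start"]),
        "accepting": {(p, q) for (p, q) in states if accept_rule(p, q)},
        "delta": delta,
    }


def accepts(m, x):
    state = m["start"]
    for sym in x:
        state = m["delta"][(state, sym)]
    return state in m["accepting"]


# Example inputs chosen for this sketch.
ENDS_IN_AA = {
    "states": {0, 1, 2}, "alphabet": "ab", "start": 0, "accepting": {2},
    "delta": {(0, "a"): 1, (0, "b"): 0, (1, "a"): 2,
              (1, "b"): 0, (2, "a"): 2, (2, "b"): 0},
}
EVEN_LENGTH = {
    "states": {"even", "odd"}, "alphabet": "ab", "start": "even",
    "accepting": {"even"},
    "delta": {("even", "a"): "odd", ("even", "b"): "odd",
              ("odd", "a"): "even", ("odd", "b"): "even"},
}
```

Only the accepting set changes between the three modes; the states and transitions of the product machine are the same in every case.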

Accepting the Union, Intersection, or Difference of Two Languages (cont'd)

Given two machines, create the Cartesian product of the state sets, and draw the necessary transitions.

Accepting the Union, Intersection, or Difference of Two Languages (cont'd)

Simplify the resulting machine, if possible, and designate the appropriate accepting states.
The machine below accepts the union of the two languages.

Accepting the Union, Intersection, or Difference of Two Languages (cont'd)

For the intersection, we can simplify further, and we end up with the machine on the right.
The simplification involved turning states CP, CQ, and CR into a single state (none of them was accepting, and there was no way to leave them).

Distinguishing One String from Another

Any three-state FA, such as the one that accepts the strings ending in aa, ignores, or forgets, a lot of information:
aba and aabbabbabaaaba lead to the same state ($q_1$); there is no way for the FA to remember which string has been seen.
aba and ab, however, lead to different states; the essential difference is that one ends with a and the other does not.
aba and ab are distinguishable with respect to the language accepted by the FA; there is at least one string z (such as a) so that abaz is in the language (i.e., is accepted) and abz is not, or vice versa.
(get a *) Do ababa and babab go to the same state? Justify.

Distinguishing One String from Another (cont'd)

Definition: If $L$ is a language over the alphabet $\Sigma$, and $x$ and $y$ are strings in $\Sigma^*$, then $x$ and $y$ are distinguishable with respect to $L$, or $L$-distinguishable, if there is a string $z \in \Sigma^*$ such that either $xz \in L$ and $yz \notin L$, or $xz \notin L$ and $yz \in L$.
A string $z$ having this property is said to distinguish $x$ and $y$ with respect to $L$. Another term is "disagree".
Equivalently, $x$ and $y$ are $L$-distinguishable if $L/x \neq L/y$, where $L/x = \{z \in \Sigma^* \mid xz \in L\}$.
Real-world example: If $L$ = {4092345678, 4091234567, ...}, then $L/409$ = {2345678, 1234567, ...} represents the phone numbers to be called from the Beaumont/Galveston area. We use numbers without the prefix when we call from Beaumont/Galveston.
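For a concrete language, a distinguishing string can be searched for by brute force. The sketch below assumes $L$ is the language of strings over {a, b} ending in aa; `in_L`, `distinguishing_string`, and the length bound `max_len` are hypothetical names introduced here. A bounded search can find a witness, but returning `None` only means no witness of that length exists, not that the strings are indistinguishable.

```python
from itertools import product


def in_L(s):
    """Membership test for the example language: strings ending in aa."""
    return s.endswith("aa")


def distinguishing_string(x, y, member=in_L, alphabet="ab", max_len=4):
    """Return a shortest z (up to max_len) with exactly one of xz, yz in L."""
    for n in range(max_len + 1):
        for tup in product(alphabet, repeat=n):
            z = "".join(tup)
            if member(x + z) != member(y + z):
                return z
    return None  # no witness found within the length bound
```

Calling `distinguishing_string("aba", "ab")` returns `"a"`, the same witness used in the previous slide.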

Distinguishing One String from Another (cont'd)

Theorem: Suppose $M = (Q, \Sigma, q_0, A, \delta)$ is an FA accepting $L \subseteq \Sigma^*$.
1. If $x$ and $y$ are two strings in $\Sigma^*$ that are $L$-distinguishable, then $\delta^*(q_0, x) \neq \delta^*(q_0, y)$.
2. For every $n \geq 2$, if there is a set of $n$ pairwise $L$-distinguishable strings in $\Sigma^*$, then $Q$ must contain at least $n$ states.
Example: This theorem shows why we need at least three states in any machine that accepts the language $L$ of strings ending in aa: $\{\Lambda, a, aa\}$ contains 3 pairwise $L$-distinguishable strings.

Distinguishing One String from Another (cont'd)

To create an FA to accept $L = \{aa, aab\}^*\{b\}$, we notice first that $\Lambda$ is not in $L$, b is in $L$, a is not, and $\Lambda$ and a are $L$-distinguishable (for example, $\Lambda b \in L$, $ab \notin L$).
We need at least the states in the first diagram.
$L$ contains b but nothing else that begins with b, so we add a state s to take care of illegal prefixes.
If the input starts with aa, we need to leave state p because a and aa are $L$-distinguishable; create state t.

Distinguishing One String from Another (cont'd)

$\delta(t, b)$ must be accepting, because $aab \in L$; call that new state u.
Let $\delta(u, b)$ be r, because aabb is in $L$ but not a prefix of any other string in $L$.
States t and u can be thought of as representing the end of an occurrence of aa or aab; if the next symbol is a, it's the start of a new occurrence, so go back to p.
The result is shown here.
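Since the diagram itself is not reproduced here, the following sketch rebuilds the machine from the description in the text and checks it against a regular expression. The state names p, r, s, t, u come from the slides; the name q0 for the initial state and the transitions not spelled out in the text (for example, p to s on b) are reconstructions and should be read as assumptions.

```python
import re
from itertools import product

# Transitions reconstructed from the textual description (assumption).
DELTA = {
    ("q0", "a"): "p", ("q0", "b"): "r",
    ("p", "a"): "t",  ("p", "b"): "s",
    ("r", "a"): "s",  ("r", "b"): "s",
    ("s", "a"): "s",  ("s", "b"): "s",   # s traps illegal prefixes
    ("t", "a"): "p",  ("t", "b"): "u",
    ("u", "a"): "p",  ("u", "b"): "r",
}
ACCEPTING = {"r", "u"}


def accepts(x):
    state = "q0"
    for symbol in x:
        state = DELTA[(state, symbol)]
    return state in ACCEPTING


def agrees_with_regex(max_len=8):
    """Compare the FA with the pattern (aa|aab)*b on all short strings."""
    pattern = re.compile(r"(aa|aab)*b")
    for n in range(max_len + 1):
        for tup in product("ab", repeat=n):
            x = "".join(tup)
            if accepts(x) != bool(pattern.fullmatch(x)):
                return False
    return True
```

The exhaustive comparison is only evidence for short strings, not a proof, but it is a quick way to catch a mistaken transition.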

The Pumping Lemma

Suppose that $M = (Q, \Sigma, q_0, A, \delta)$ is an FA accepting $L$ and that it has $n$ states.
If it accepts a string $x$ such that $|x| \geq n$, then by the time $n$ symbols have been read, $M$ must have entered some state more than once; i.e., there must be two different prefixes $u$ and $uv$ such that $\delta^*(q_0, u) = \delta^*(q_0, uv)$.

The Pumping Lemma (cont'd)

This implies that there are many more strings in $L$, because we can traverse the loop $v$ any number of times (including leaving it out altogether).
In other words, writing $x = uvw$, all of the strings $uv^iw$ for $i \geq 0$ are in $L$.
This fact is known as the Pumping Lemma for Regular Languages.

The Pumping Lemma (cont'd)

Theorem: Suppose $L$ is a language over $\Sigma$.
If $L$ is accepted by the FA $M = (Q, \Sigma, q_0, A, \delta)$, then there is an integer $n$ so that for every $x$ in $L$ satisfying $|x| \geq n$, there are three strings $u$, $v$, and $w$ such that $x = uvw$ and
1. $|uv| \leq n$
2. $|v| > 0$ (i.e., $v \neq \Lambda$)
3. For every $i \geq 0$, the string $uv^iw$ belongs to $L$.
The way we found $n$ was to take the number of states in an FA accepting $L$.
In many applications we do not need to know this, only that there is such an $n$.
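The decomposition promised by the theorem can be found by watching for the first repeated state. The sketch below is one way to do that, assuming the "strings ending in aa" FA with state labels q0, q1, q2; `pump_split` is a hypothetical helper name.

```python
# Example FA (strings ending in aa); state labels are assumptions.
DELTA = {
    ("q0", "a"): "q1", ("q0", "b"): "q0",
    ("q1", "a"): "q2", ("q1", "b"): "q0",
    ("q2", "a"): "q2", ("q2", "b"): "q0",
}
ACCEPTING = {"q2"}


def accepts(x, start="q0"):
    state = start
    for symbol in x:
        state = DELTA[(state, symbol)]
    return state in ACCEPTING


def pump_split(delta, start, x):
    """Split x = u v w at the first state that is visited twice."""
    seen = {start: 0}          # state -> number of symbols read on first visit
    state = start
    for i, symbol in enumerate(x, start=1):
        state = delta[(state, symbol)]
        if state in seen:
            j = seen[state]
            return x[:j], x[j:i], x[i:]   # v labels the loop from j to i
        seen[state] = i
    return None                # x is too short to force a repeated state
```

Because the machine has 3 states, a repeat must occur within the first 3 symbols, so the returned split satisfies $|uv| \leq 3$ and $|v| > 0$.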

The Pumping Lemma (cont'd)

The most common application of the pumping lemma is to show that a language cannot be accepted by an FA, because it doesn't have the properties that the pumping lemma says are required for every language that can be.
The proof is by contradiction.
We suppose that the language can be accepted by an FA, and we let $n$ be the integer in the pumping lemma.
Then we choose a string $x$ with $|x| \geq n$ to which we can apply the lemma so as to get a contradiction.

The Pumping Lemma (cont'd)

Let $L$ be the language AnBn $= \{a^ib^i \mid i \geq 0\}$; let us prove that it cannot be accepted by an FA.
Suppose, for the sake of contradiction, that $L$ is accepted by an FA; let $n$ be as in the pumping lemma.
Choose $x = a^nb^n$; then $x \in L$ and $|x| \geq n$.
Therefore, by the pumping lemma, there are strings $u$, $v$, and $w$ such that $x = uvw$ and the 3 conditions hold.
Because $|uv| \leq n$ and $x$ starts with $n$ a's, all the symbols in $u$ and $v$ are a's; therefore, $v = a^k$ for some $k > 0$.
By condition 3, $uvvw \in L$, so $a^{n+k}b^n \in L$. This is our contradiction, because $a^{n+k}b^n$ has more a's than b's, and we conclude that $L$ cannot be accepted by an FA.
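The key step of the proof, that every allowed split of $a^nb^n$ breaks when $v$ is doubled, can be checked exhaustively for small $n$. This is a sanity check rather than a proof; `in_anbn` and `every_split_fails` are names introduced for the sketch.

```python
def in_anbn(s):
    """Membership test for {a^i b^i | i >= 0}."""
    half, remainder = divmod(len(s), 2)
    return remainder == 0 and s == "a" * half + "b" * half


def every_split_fails(n):
    """True if no split x = uvw with |uv| <= n, |v| > 0 keeps uvvw in the language."""
    x = "a" * n + "b" * n
    for uv_len in range(1, n + 1):
        for v_len in range(1, uv_len + 1):
            u = x[:uv_len - v_len]
            v = x[uv_len - v_len:uv_len]
            w = x[uv_len:]
            if in_anbn(u + v + v + w):
                return False
    return True
```

For each tested $n$, every candidate $v$ consists only of a's, so doubling it unbalances the string, which is exactly the argument on the slide.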

The Pumping Lemma (cont'd)

Let us show that $L = \{a^{i^2} \mid i \geq 0\}$ is not accepted by an FA.
Suppose $L$ is accepted by an FA, and let $n$ be the integer in the pumping lemma.
Choose $x = a^{n^2}$.
$x = uvw$, where $0 < |v| \leq n$.
Then $n^2 = |uvw| < |uv^2w| = n^2 + |v| \leq n^2 + n < (n+1)^2$.
This is a contradiction, because $|uv^2w|$ must be $i^2$ for some integer $i$ (because $uv^2w \in L$), but there is no integer $i$ whose square is strictly between $n^2$ and $(n+1)^2$.

The Pumping Lemma (cont'd)

Examples: There are other languages that are not accepted by any FA, among them:
1. Balanced, the set of balanced strings of parentheses
2. Expr, the language of simple algebraic expressions
3. The set L of legal/valid C programs.
In all three examples, because of the nature of these languages, a proof using the pumping lemma might look a lot like the proof for AnBn, our first example.

The Pumping Lemma (cont'd)

We can formulate several decision problems involving the language $L$ accepted by an FA:
The membership problem (Given $x$, is $x \in L$?)
Given an $n$-state FA $M$, is the language $L(M)$ empty?
It follows from the pumping lemma that this can be solved by looking at all possible strings of length 0 to $n-1$; if none of those is accepted, the language is empty.
Given an $n$-state FA $M$, is $L(M)$ infinite?
The pumping lemma implies that the language is infinite if and only if at least one of the strings with length from $n$ to $2n-1$ is accepted.
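Both tests can be written as brute-force searches over the length ranges given above. The sketch below follows the slide's bounds literally; the dictionary layout and the two example machines (`ENDS_IN_AA`, and `ONLY_A`, which accepts just the string a) are assumptions for illustration. The search is exponential in $n$, so it is meant to illustrate the bounds, not to be efficient.

```python
from itertools import product


def accepts(fa, x):
    state = fa["start"]
    for sym in x:
        state = fa["delta"][(state, sym)]
    return state in fa["accepting"]


def strings_of_length(alphabet, lo, hi):
    """Yield every string over alphabet with lo <= length <= hi."""
    for n in range(lo, hi + 1):
        for tup in product(alphabet, repeat=n):
            yield "".join(tup)


def is_empty(fa):
    n = len(fa["states"])
    return not any(accepts(fa, x)
                   for x in strings_of_length(fa["alphabet"], 0, n - 1))


def is_infinite(fa):
    n = len(fa["states"])
    return any(accepts(fa, x)
               for x in strings_of_length(fa["alphabet"], n, 2 * n - 1))


# Example machines chosen for this sketch.
ENDS_IN_AA = {
    "states": {0, 1, 2}, "alphabet": "ab", "start": 0, "accepting": {2},
    "delta": {(0, "a"): 1, (0, "b"): 0, (1, "a"): 2,
              (1, "b"): 0, (2, "a"): 2, (2, "b"): 0},
}
ONLY_A = {
    "states": {"s0", "s1", "dead"}, "alphabet": "ab", "start": "s0",
    "accepting": {"s1"},
    "delta": {("s0", "a"): "s1", ("s0", "b"): "dead",
              ("s1", "a"): "dead", ("s1", "b"): "dead",
              ("dead", "a"): "dead", ("dead", "b"): "dead"},
}
```

Here `ENDS_IN_AA` is nonempty and infinite, while `ONLY_A` is nonempty but finite because no string of length 3 to 5 is accepted.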

How to Build a Simple Computer Using Equivalence Classes

Consider $M$, an FA accepting the language of strings ending in aa.
(get a *) We've shown that three states are needed; why is this enough?
We can confirm that $M$ really does accept $L$ by showing that if $x$ and $y$ are two strings that cause $M$ to end in the same state, then $M$ does not need to distinguish them, because they are not $L$-distinguishable.
The three states of $M$ correspond to three sets of strings: those not ending in a, those ending in a but not aa, and those ending in aa.

How to Build a Simple Computer Using Equivalence Classes (cont'd)

No two strings in any one of these sets are $L$-distinguishable, and a string in one of these sets is distinguishable from a string in any other one.
These two facts say that the three sets are the equivalence classes for a certain equivalence relation.
Definition: For a language $L \subseteq \Sigma^*$ (here $\Sigma = \{a, b\}$), we define the relation $I_L$ on $\Sigma^*$ as follows:
For $x, y \in \Sigma^*$, $x \, I_L \, y$ if and only if $x$ and $y$ are $L$-indistinguishable.
Note: $I_L$ is an equivalence relation, and we can start building an FA accepting $L$ by using the induced equivalence classes as its states.

How to Build a Simple Computer Using Equivalence Classes (cont'd)

The initial state should be $[\Lambda]$, because when we start we have received no input.
The accepting state should be the equivalence class of strings ending in aa, since that's the language we want to accept.
The transitions are defined in a natural way:
1. Take any element $x$ of one class, and consider $xa$ or $xb$.
2. The new string is in some equivalence class.
3. The a-transition or b-transition from $[x]$ simply goes to that class.

How to Build a Simple Computer Using Equivalence Classes (cont'd)

Theorem (Myhill-Nerode): $L \subseteq \Sigma^*$ can be accepted by an FA if and only if the set $Q_L$ of equivalence classes of the relation $I_L$ on $\Sigma^*$ (that is, $\Sigma^*/I_L$) is finite.
If the set $Q_L$ is finite, then the finite automaton $M_L = (Q_L, \Sigma, q_0, A, \delta)$ accepts $L$, where:
1. $q_0 = [\Lambda]$,
2. $A = \{q \in Q_L \mid q \subseteq L\}$,
3. For every $x \in \Sigma^*$ and every $\sigma \in \Sigma$, $\delta([x], \sigma) = [x\sigma]$.
Note: $M_L$ has the fewest states of any FA accepting $L$.

How to Build a Simple Computer Using Equivalence Classes (cont'd)

It is often easier to construct an FA directly than to determine the set of equivalence classes.
The above theorem serves to answer the question of how much a computer accepting a language $L$ needs to remember about the current string $x$: only its equivalence class.
Identifying the equivalence classes, if we already have an FA accepting $L$, is not too hard.
1. For each state $q$, we define $L_q = \{x \in \Sigma^* \mid \delta^*(q_0, x) = q\}$.
2. Every $L_q$ is a subset of some equivalence class of $I_L$ (it is the whole class if the FA has as few states as possible).

How to Build a Simple Computer Using Equivalence Classes (cont'd)

The Myhill-Nerode Theorem provides a way of showing a language cannot be accepted by an FA (and it might apply even if the pumping lemma doesn't, since it's an if-and-only-if result).
Consider the equivalence classes of $I_L$ for $L$ = AnBn:
for $i \neq j$, $a^i$ and $a^j$ are $L$-distinguishable, because $a^ib^i \in L$ and $a^jb^i \notin L$.
This implies that there are an infinite number of equivalence classes, and thus there can be no FA accepting $L$.
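The distinguishing witness $z = b^i$ used in this argument can be checked for a finite range of exponents. This only illustrates the pattern for small values; the infinite claim rests on the argument above. The function names are hypothetical.

```python
def in_anbn(s):
    """Membership test for {a^i b^i | i >= 0}."""
    half, remainder = divmod(len(s), 2)
    return remainder == 0 and s == "a" * half + "b" * half


def pairwise_distinguishable(k):
    """Check that a^0, ..., a^(k-1) are pairwise distinguished by z = b^i."""
    for i in range(k):
        for j in range(k):
            if i == j:
                continue
            z = "b" * i
            # a^i z must be in the language and a^j z must not be.
            if not in_anbn("a" * i + z) or in_anbn("a" * j + z):
                return False
    return True
```

Since the check succeeds for every `k` tried, any FA for AnBn would need at least `k` states for each such `k`, which is the informal content of the slide.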

Minimizing the Number of States in a Finite Automaton

Suppose $M = (Q, \Sigma, q_0, A, \delta)$ accepts $L \subseteq \Sigma^*$.
Define $L_q = \{x \in \Sigma^* \mid \delta^*(q_0, x) = q\}$.
The first step in minimizing the number of states is to remove every state $q$ for which $L_q = \emptyset$, along with transitions from these states; removing them has no effect on the language.
Now define $\equiv$ on $Q$: $p \equiv q$ means that strings in $L_p$ are $L$-indistinguishable from strings in $L_q$.
This is the same as saying $L_p$ and $L_q$ are subsets of the same equivalence class of $I_L$.

Minimizing the Number of States in a Finite Automaton (cont'd)

Two strings $x$ and $y$ are $L$-distinguishable if, for some string $z$, exactly one of $xz$, $yz$ is in $L$.
Therefore, $p \not\equiv q$ if, for some string $z$, exactly one of the states $\delta^*(p, z)$, $\delta^*(q, z)$ is in $A$.
Define $S_M$ as the set of unordered pairs $(p, q)$ of distinct states satisfying $p \not\equiv q$.
A systematic way of finding $S_M$ is this:
If exactly one of $p$, $q$ is in $A$, then $(p, q) \in S_M$.
For every pair of states $r$ and $s$, and every symbol $\sigma \in \Sigma$, if $(\delta(r, \sigma), \delta(s, \sigma)) \in S_M$, then $(r, s) \in S_M$.

Minimizing the Number of States in a Finite Automaton (cont'd)

An algorithm to identify the pairs $(p, q)$ with $p \not\equiv q$:
1. List all unordered pairs $(p, q)$ of distinct states.
2. Make a sequence of passes through these pairs:
3. On the first pass, mark each pair $(p, q)$ such that exactly one of the two states is in $A$.
4. On each subsequent pass, for each unmarked pair $(r, s)$, if $\delta(r, \sigma) = p$ and $\delta(s, \sigma) = q$ for some $\sigma \in \Sigma$, and $(p, q)$ is marked, then mark $(r, s)$.
5. After a pass in which no pairs are marked, stop.
6. The marked pairs are the pairs $(p, q)$ for which $p \not\equiv q$.
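The marking passes above can be sketched as follows. The example machine is not the five-state FA from the lecture (its diagram is not available here); it is a four-state FA for "strings ending in aa" in which states 0 and 3 are deliberately redundant. One small liberty, flagged in the comments: a pair marked during a pass may be used later in the same pass, which reaches the same final set of marked pairs.

```python
from itertools import combinations


def distinguishable_pairs(states, alphabet, delta, accepting):
    """Return the set of marked (distinguishable) unordered pairs."""
    # First pass: exactly one of the two states is accepting.
    marked = {frozenset((p, q)) for p, q in combinations(states, 2)
              if (p in accepting) != (q in accepting)}
    changed = True
    while changed:                     # stop after a pass that marks nothing
        changed = False
        for r, s in combinations(states, 2):
            pair = frozenset((r, s))
            if pair in marked:
                continue
            for sym in alphabet:
                target = frozenset((delta[(r, sym)], delta[(s, sym)]))
                # Pairs marked earlier in this same pass are also used here.
                if len(target) == 2 and target in marked:
                    marked.add(pair)
                    changed = True
                    break
    return marked


# Example FA assumed for this sketch: states 0 and 3 behave identically.
STATES = [0, 1, 2, 3]
ACCEPTING = {2}
DELTA = {
    (0, "a"): 1, (0, "b"): 3,
    (1, "a"): 2, (1, "b"): 0,
    (2, "a"): 2, (2, "b"): 3,
    (3, "a"): 1, (3, "b"): 0,
}
MARKED = distinguishable_pairs(STATES, "ab", DELTA, ACCEPTING)
```

In this example every pair except (0, 3) ends up marked, so 0 and 3 are the only states that can be combined.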

Minimizing the Number of States in a Finite Automaton (cont'd)

When the algorithm terminates, each unmarked pair $(p, q)$ represents two states that can be combined into one.
Make one final pass through the states.
The first state represents a state in the new minimal FA.
Every subsequent state $q$ determines a new state only if $(p, q)$ is marked for every $p$ considered previously.
Once we have the states in the new minimum-state FA, determining the transitions is straightforward.
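The final pass and the resulting transitions can be sketched as below. The input is a set of marked pairs such as the marking algorithm produces; the one hard-coded here belongs to a four-state example FA assumed for this sketch (strings ending in aa, with states 0 and 3 equivalent). Comparing a state only with the representatives kept so far is a shortcut that relies on the unmarked relation being an equivalence relation.

```python
def merge_states(states, marked):
    """Pick one representative per class; states is taken in a fixed order."""
    representatives = []
    rep_of = {}
    for q in states:
        for p in representatives:
            if frozenset((p, q)) not in marked:
                rep_of[q] = p          # q is equivalent to an earlier state
                break
        else:
            representatives.append(q)  # (p, q) is marked for every earlier p
            rep_of[q] = q
    return representatives, rep_of


def minimized_delta(delta, rep_of):
    """Redirect every transition to the representative of its class."""
    return {(rep_of[p], sym): rep_of[q] for (p, sym), q in delta.items()}


# Example data assumed for this sketch.
STATES = [0, 1, 2, 3]
DELTA = {
    (0, "a"): 1, (0, "b"): 3,
    (1, "a"): 2, (1, "b"): 0,
    (2, "a"): 2, (2, "b"): 3,
    (3, "a"): 1, (3, "b"): 0,
}
MARKED = {frozenset(pair) for pair in
          [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]}

REPS, REP_OF = merge_states(STATES, MARKED)
MIN_DELTA = minimized_delta(DELTA, REP_OF)
```

The result has the three states 0, 1, 2 and the same transitions as the three-state "ends in aa" machine, with state 3 folded into state 0.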


Let us consider the following FA with five states (A initial, E accepting):

The table DISTINCT after executing step (1) and one iteration of step (2) is:


The table DISTINCT after executing another iteration of step (2) is:

The minimized automaton is:

Summary

Finite Automata and the Languages They Accept

1. Finite Automata: Examples and Definitions
2. Accepting the Union, Intersection, or Difference of Two Languages
3. Distinguishing One String from Another
4. The Pumping Lemma
5. How to Build a Simple Computer Using Equivalence Classes
6. Minimizing the Number of States in a Finite Automaton

Reading suggestions

From [Martin; 2011]:
Chapter 2 (Finite Automata and the Languages They Accept)

Coming up next

From [Martin; 2011]:
Chapter 3 (Regular Expressions, Nondeterminism, and Kleene's Theorem)

Thank you for your attention!
Questions?
