You are on page 1of 14

cover

Models of Computation:
Automata and Formal Languages

Sam Myo Kim


Kee Young Yoo
Kyungpook National University
Preface
This book is designed as an introductory course on automata and formal languages for the second or third year of an
undergraduate computer science or computer engineering curriculum. Departments often shy away from courses in this
area because of the misunderstanding that such theoretical disciplines end up being mere theory with no tangible
applications. The advancement of Internet and multimedia computing has made computer science more vocational and
application oriented. Interestingly, however, such advancement also brings these theoretical topics closer to the practical
realm because they provide efficient conceptual tools and foster the ability to use them for applications.
When we say “Tom has a good eye for people,” it means that Tom has the ability to identify important characteristics of
a person that might influence his or her relationship with others. Tom can abstract away superficial details such as the
person’s clothes, make-up, weight etc., while attending carefully to kindness, patience, leadership, or other attributes, that
are important (and would remain unchanged) for the personal relationship. Tom’s talent is in the capability to abstract
and thereby focus on important personal characteristics of the persons he meets.
Our scientific and engineering perspective on computers and computation can be nurtured by studying the concepts and
principles of computing. Today, computer science is so diversified that the use of computers involves a large number of
specific details that are constantly evolving. However, under this “bush” of details, there are fundamental concepts and
principles that do not change. To study these principles and concepts, we construct abstract models that embody the
important features of computing and computer systems. Such models are powerful conceptual tools that practitioners use
in computer engineering.
Automata are the models for computers, and formal languages are the models for programming languages. In this book
we will learn how to build, manipulate, and use them. We will also learn the relationships among different models. We
will start our discussion with the definition of formal languages and their grammars and other models for language
representation. Then our topic will switch to automata, i.e., the machine models for language recognition. Finally, we will
discuss the relationship between the languages and the automata. This last part of the subject presents the beautifully
hierarchical classification of various models, the Chomsky Hierarchy (named after Noam Chomsky, who first designed
the formal languages).

ii
Preface

Models of Language
Generation and Recognition

iii
Preface

The knowledge that we will learn in this course has so permeated other areas of computer science (and even other disciplines)
that virtually all of it has practical application. The following page shows just a few application examples that you can easily find
in other CS text books on your bookshelf. In the near future, you will be dealing with string searching and pattern matching on
the Internet, and you will be reminded of the finite state automata and regular expressions that you will learn in this course. To
design a new programming language that can be used in a small embedded system, you would review the topics for the formal
languages. Most importantly, you would be using the ability, which you might have nurtured while studying the subject, to reason
and express precisely in dealing with the problems in your profession.
Theory is the product of abstraction, the process of conceptualization of important features that are common to a wide range of
special topics. Automata are the abstract models for computers that embody the essential features for computing. Thus it is not
surprising to see that automata theory is applicable to solving many problems that we encounter while dealing with real
computers. So it is with the formal languages as well. We will learn that programming languages, such as C++, HTML, XML,
etc., are formal languages.
While the most of the material of this book is consistent with the usual content of an introductory course for automata and
formal languages, the novelty of the book is in its unique way of presentation. In a course for automata and formal languages, it
has been traditional to study the material bottom-up. Usually it starts with the lowest level models, finite automata and regular
languages, and moves up to the higher level. The contents of this book are organized the other way around. It begins with a quick
introduction to the higher level models; the Turing machines in the topic for automata, and their languages in the topic for formal
languages. Then it moves down to the models in the lower level; context-sensitive languages (linear bounded automata), context-
free languages (pushdown automata) and regular languages (finite automata) for the language models (respective, for the
machine models). In this layout, the lower level models are presented as restricted ones in the upper level. This approach helps
the reader see more clearly the whole picture of the computational models and their relationships.

iv
Preface
Application examples
Running block
0,1/ 0 1,2 / 1 dispatch
0/1
X Y
timer run out
0 2

Ready
2/0 Blocked
wakeup

(a) Digital system: state diagram of a relay (b) Operating systems : process state transition
c

c
not (c or u) stmt
[c] u
start c
[cu] if
expr then
not (c or p) stmt

p E1
any if expr stmt
then else
stmt
[cup] S2
not(c) E2 S1

(c) Web browsing: pattern matching for “cup” (d) Compiler: constructing a parse tree

v
To help the reader fully grasp important concepts, at the end of each chapter they are brought up again under the title of
“rumination” with more examples and illustrations. The manuscript carries six appendices containing some challenging material
that is too heavy to present in the main chapters.
The book is written in the PowerPoint form, with its wealth of figures, often animated, that would help the reader understand
the topics more easily. To deliver the subject with this manuscript as a text, the instructor need not develop class presentation
material. For the convenience of class presentation, whenever a figure or a formula is needed to refer, a copy of it is presented in
the current slide instead.
Texts for a theory course under the title of automata and formal languages usually include the topics on the computational
complexity and unsolvable problems. I have not included these topics because they are too advanced to study in a course intended
for the students in the second or third year of their undergraduate computer science curriculum. I usually teach Chapters 1 through
14 in one-semester course for students in their second year. Other options are possible; for example, (a) Chapters 1 through 10,
and 14, or (b) Chapters 1 through 12 and 14. For an upper or graduate level course, the full text could be covered in depth.

Acknowledgements
We would like to thank the following people: Professor Oscar Ibarra who had taught Sam, the first author, and nurtured his
knowledge in the field, Professor Robert McNaughton who helped him to grow it to its maturity. We thank Professor Mukkai
Krishnamoorthy who reviewed the early version of the material and suggested interesting exercise problems. We also extend thanks
to all former students (of CS Departments of Rensselaer Polytechnic Institute and CE Department of Kyungpook National
University) who gave valuable criticism while taking the course with an early draft. The book is still far from perfect. Any
remaining inadequacies are the responsibility of the authors. Please send any feedback to kims351535@yahoo.co.kr

S. M. Kim
K. Y. Yoo
Kyungpook National University
Daegu, Korea
December, 2009

vi
Contents
Chapter 1 Preliminaries 1
1.1 Notation 2
Symbols, Strings, Composite symbols
Sets and set operations
1.2 Propositions and logical connectives 6
1.3 Proof techniques 10
Proof by contrapositive, Proof by cases, Proof by contradiction
Proof by example, Proof by generalization
Proof by induction, Proof by pigeonhole principle
Proof by counting, Diagonalization technique
Rumination 22
Exercises 26

Chapter 2 Formal Languages and Grammars 30


2.1 Languages 31
2.2 Deriving a Language with rewriting rules: Examples 33
2.3 Definitions: Formal languages and grammars 48
Type 0 (phrase structured) grammars
Type 1 (context-sensitive) grammars
Type 2 (context-free) grammars
Type 3 (regular) grammars
2.4 Grammars and their languages: Examples 53
Rumination 55
Exercises 58

vii
Contents

Chapter 3 Other Models for Language Representation 61


3.1 L-systems 62
Definitions and examples
Application
3.2 Syntax flow graph 67
Definition and examples
3.3 Regular Expressions 70
Definition and examples
Algebraic properties of regular expressions
Rumination 74
Exercises 75

Chapter 4 Automata 78
4.1 Deterministic Turing machines (DTM) 80
Design example
State transition graph, State transition table, State transition function
Formal definition
4.2 DLBA (Deterministic Linear Bounded Automata) 98
Definition and Design example
4.3 DPDA (Deterministic Pushdown Automata) 100
Definition
Design example
4.4 DFA (Deterministic Finite Automata) 111
Definition
Design example
Rumination 114
Exercises 120

viii
Contents
Chapter 5 Nondeterministic Automata 124
5.1 NFA (Nondeterministic Finite Automata) 126
Definition and examples
5.2 NPDA (Nondeterministic Pushdown Automata) 132
Definition and examples
5.3 NTM (Nondeterministic Turing Machines) and
NLBA (Nondeterministic Linear Bounded Automata) 139
5.4 Nondeterministic Computing 140
Nondeterministic computing
Example
5.5 NFA and -transitions 145
-transitions in an FA
Eliminating -transitions
Rumination 154
Exercises 156

Chapter 6 Variations of Automata 160


6.1 Transducers: automata with output 161
Mealy automata, Moore automata
6.2 Variations of TM and LBA 162
Multi-track TM, Multi-tape TM, 2D-tape TM
6.3 Variations of PDA 168
2-way PDA, Multi-stack PDA
PDA accepting with empty stack
6.4 Variations of FA 176
2-way FA, 2D-tape FA, Multi-head FA,
FA array: cellular automata, Hidden Markov models (HMM)
6.5 Church’s hypothesis 181
Exercises 182

ix
Contents

Chapter 7 Hierarchy of the Models: Characterization 185


7.1 Chomsky Hierarchy 187
7.2 Proof of Characterization 191
Constructing an FA from a regular grammar G that recognizes L(G).
Constructing a regular grammar from an FA M that produces L(M).
Constructing an FA from a regular expression R that recognizes L(R).
Constructing a regular expression from an FA M that expresses L(M).
Rumination 215
Exercises 216

Chapter 8 Manipulating FA 218


8.1 Converting an NFA to DFA 221
8.2 State Minimization 224
Rumination 231
Exercises 236

Chapter 9 Properties of Formal Languages 239


9.1 Properties of regular languages 240
Union, Concatenation, Kleene star,
Reverse, Complement, Intersection, Set subtraction
9.2 Properties of context-free languages 245
Union, Concatenation, Kleene star,
Reverse, Complement
Rumination 248
Exercises 251

x
Contents

Chapter 10 Manipulating Context-free Grammars 253


10.1 Reducing -production rules 254
Algorithm and example
10.2 Eliminating unit production rules 262
Algorithm and example
10.3 Eliminating useless symbols 266
Algorithm and example
10.4 Normal forms of CFG’s 277
Chomsky normal form (CNF)
Greibach normal form (GNF)
Exercises 285

Chapter 11 Ambiguity of Context-free Grammars 288


11.1 Parse tree 290
11.2 Parse Tree and Ambiguity 293
11.3 Eliminating Ambiguity of an ambiguous CFG 295
Using parenthesis, Fixing the order of rule applications
Eliminating redundant rules
Setting up precedence and associativity
Rumination 305
Exercises 306

xi
Contents

Chapter 12 Hierarchy of the Models: Proper Containment 309


12.1 Relationship of the language classes: proper containment 312
12.2 The pumping lemma 316
The pumping lemma and proof
12.3 Applications of the pumping lemma 323
Examples
12.4 The pumping lemma for context-free languages 328
12.5 Application of the pumping lemma for context-free languages 331
12.6 Ogden’s lemma 336
The lemma and an application
12.7 A proof of the pumping lemma for context-free languages 339
Rumination 346
Exercises 350

Chapter 13 Parsing 352


13.1 Derivation 354
Leftmost derivation, Rightmost-derivation
Derivations and parse trees
13.2 LL(k) parsing strategy 357
13.3 Designing an LL(k) parser 367
Examples
Definition of LL(k) grammars
13.4 LR(k) parsing strategy 379
13.5 Designing LR(k) parsers 387
Examples
Definition of LR(k) grammars
13.6 Lex and YACC 404
Rumination 409
Exercises 412

xii
Contents

Chapter 14 Web Programming and Bioinformatics 414


14.1 Hyper Text Markup Language (HTML) 415
14.2 Document Type Definition (DTD) and XML 418
14.3 Genetic code and grammar 425

Chapter 15 Hierarchy of the Models: Final Proof 436


15.1 Languages that TM’s cannot recognize 437
Enumeration
Enumerating TM’s
15.2 Universal TM 448
15.3 Enumerable Languages 450
15.4 Recursive Languages 459
15.5 Proof of characterizations 471
Type 0 grammar and TM
CSG and LBA
CFG and PDA
Rumination 506
Exercises 507

Appendix
A. Pascal Syntax Flow Graph 508
B. Converting 2-way FA to 1-way FA 515
C. Computing Regular Expression for the Language Recognized by an FA 532
D. Properties of Deterministic Context-free Languages 542
E. A CFL Satisfying the Pumping Lemma for Regular Languages 564
F. CYK Algorithm for the Membership Test for context-free languages 569
References 576
Index 577

xiii

You might also like