Grammar
SAURABH SINGH
Language
Language is the ability to acquire and use complex systems of
Linguistics
In linguistics, formal languages are used for the scientific study of
human language.
Linguists privilege a generative approach, as they are interested in
defining a (finite) set of rules stating the grammar based on which any
reasonable sentence in the language can be constructed.
A grammar does not describe the meaning of the sentences or what can
be done with them in whatever context - but only their form.
Chomsky Hierarchy
Noam Chomsky (born 1928) is an American linguist, philosopher, cognitive
Chomsky Hierarchy
Chomsky proposed a hierarchy that partitions formal grammars into
classes with increasing expressive power, i.e. each successive class can
generate a broader set of formal languages than the one before.
Interestingly, modelling some aspects of human language requires a
more complex formal grammar (as measured by the Chomsky
hierarchy) than modelling others.
Example: while a regular language is powerful enough to model
English morphology (symbols, words), it is not powerful enough to
model English syntax.
automata?
There is an equivalence between the Chomsky hierarchy and the
different kinds of automata. Thus, theorems about formal languages
can be dealt with as either grammars or automata.
Generation process
Start symbol
Expand with rewrite rules.
Stop when a word of the language is generated.
Recognition process
A formal language is a set of words, that is, finite strings of symbols taken from the alphabet over which the
language is defined.
Σ₁ = { 0, 1 }
Σ₂ = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 }
Σ₃ = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F }
Σ₄ = { a, b, c, ..., z }
Notation
a, b, c, ... denote symbols
1010 ∈ Σ₁*
123 ∈ Σ₂*
hello ∈ Σ₄*
Notation
|a| = 1
|125| = 3
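|ε| = 0
Σ^k = { a₁…aₖ | a₁, …, aₖ ∈ Σ }
Example
Σ^0 = {ε} for any Σ
Σ₁^1 = {0, 1}
Σ₁^2 = {00, 01, 10, 11}
Σ* = Σ^0 ∪ Σ^1 ∪ Σ^2 ∪ … (all words over Σ)
Σ+ = Σ^1 ∪ Σ^2 ∪ … (all non-empty words over Σ)
Example
εw = wε = w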
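The sets Σ^k can be enumerated mechanically. The following is a minimal Python sketch, assuming words are represented as strings and ε as the empty string; the function name words_of_length is illustrative only.

```python
from itertools import product

def words_of_length(alphabet, k):
    """Return the set of all words of length k over the given alphabet."""
    # product(..., repeat=0) yields one empty tuple, so k = 0 gives {""}, i.e. {ε}.
    return {"".join(symbols) for symbols in product(sorted(alphabet), repeat=k)}

sigma1 = {"0", "1"}
print(words_of_length(sigma1, 0))          # the set containing only the empty word
print(sorted(words_of_length(sigma1, 2)))  # ['00', '01', '10', '11']
```

Taking the union of these sets for k = 0, 1, 2, ... gives Σ*, and starting at k = 1 gives Σ+.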
Example
English, Chinese, . . .
C, Pascal, Java, HTML, . . .
the set of binary numbers whose value is prime:
{10, 11, 101, 111, 1011, ...}
∅ (the empty language)
{ε} (the language containing only the empty word)
Operations on languages
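Let L₁ and L₂ be languages over the alphabets Σ₁ and Σ₂, respectively.
Then:
Union: L₁ ∪ L₂ = { w | w ∈ L₁ ∨ w ∈ L₂ }
Complement: L̄₁ = { w ∈ Σ₁* | w ∉ L₁ }
Concatenation: L₁L₂ = { w₁w₂ | w₁ ∈ L₁ ∧ w₂ ∈ L₂ }
Kleene star: L₁* = {ε} ∪ L₁ ∪ L₁^2 ∪ …
Intersection: L₁ ∩ L₂ = { w | w ∈ L₁ ∧ w ∈ L₂ }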
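For finite languages these operations map directly onto set operations. The sketch below assumes languages are Python sets of strings. Because L* and the complement are infinite in general, the helpers star_up_to and complement_up_to (hypothetical names) only compute finite approximations.

```python
from itertools import product

def union(l1, l2):
    return l1 | l2

def intersection(l1, l2):
    return l1 & l2

def concatenation(l1, l2):
    """Every word of l1 followed by every word of l2."""
    return {w1 + w2 for w1 in l1 for w2 in l2}

def star_up_to(language, n):
    """Finite part of L*: words built by concatenating at most n words of L."""
    result = {""}   # the empty word ε is always in L*
    layer = {""}
    for _ in range(n):
        layer = concatenation(layer, language)
        result |= layer
    return result

def complement_up_to(language, alphabet, max_len):
    """Words over the alphabet of length <= max_len that are not in the language."""
    universe = {
        "".join(p)
        for k in range(max_len + 1)
        for p in product(sorted(alphabet), repeat=k)
    }
    return universe - language

l1 = {"a", "ab"}
l2 = {"b", ""}
print(sorted(concatenation(l1, l2)))   # ['a', 'ab', 'abb']
print(sorted(star_up_to({"ab"}, 2)))   # ['', 'ab', 'abab']
```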
Grammar
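A grammar is a tuple G = (V, T, S, P) where V is a finite set of non-terminals (variables),
T is a finite set of terminals, S ∈ V is the start symbol, and P is a finite set of production rules.
Example: V = {Sentence, Subject, Verb, Object}, T = {I, You, Eat, Buy, Pen, Apple}, S = Sentence,
and P = {Sentence → Subject Verb Object, Subject → I | You, Verb → Eat | Buy,
Object → Pen | Apple}.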
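The generation process for this small grammar can be sketched in Python. The dictionary encoding and the function name expand are assumptions made for illustration, and the recursion only terminates because this particular grammar is not recursive.

```python
from itertools import product

# Production rules P: each non-terminal maps to its alternative right-hand sides.
PRODUCTIONS = {
    "Sentence": [["Subject", "Verb", "Object"]],
    "Subject": [["I"], ["You"]],
    "Verb": [["Eat"], ["Buy"]],
    "Object": [["Pen"], ["Apple"]],
}

def expand(symbol):
    """Return every sequence of terminals derivable from the given symbol."""
    if symbol not in PRODUCTIONS:          # a terminal derives only itself
        return [[symbol]]
    results = []
    for rhs in PRODUCTIONS[symbol]:
        parts = [expand(s) for s in rhs]   # expansions of each symbol on the right-hand side
        for combo in product(*parts):
            results.append([word for part in combo for word in part])
    return results

sentences = [" ".join(words) for words in expand("Sentence")]
print(len(sentences))   # 8
print(sentences[0])     # I Eat Pen
```

The grammar produces 2 × 2 × 2 = 8 sentences, and it says nothing about whether "I Eat Pen" is meaningful, only that it is well-formed.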
Language type        Grammar rules              Accepting machines
Regular              X → ε, X → Y, X → aY       NFAs (or DFAs)
Context-free         X → β                      Nondeterministic pushdown automata
Context-sensitive    α → β with |α| ≤ |β|       Nondeterministic linear bounded automata
Unrestricted         α → β (unrestricted)       Turing machines
(Type-3)Regular Grammar
Type-3 grammars generate regular languages. The productions must have a single
non-terminal on the left-hand side and a right-hand side consisting of a single terminal or a single
terminal followed by a single non-terminal.
The productions must be in the form X → a or X → aY
where X, Y ∈ N (Non-terminal)
and a ∈ T (Terminal)
The rule S → ε is allowed if S does not appear on the right side of any rule.
Example
S → aB
B → bB
B → ε
What language does this define? ab*
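One way to check the answer is to enumerate derivations and compare them against the regular expression ab*. The sketch below assumes the convention that upper-case letters are non-terminals and lower-case letters are terminals; RULES and generate are illustrative names.

```python
import re
from collections import deque

# S → aB, B → bB, B → ε (ε is written as the empty string)
RULES = {"S": ["aB"], "B": ["bB", ""]}

def generate(max_len):
    """Return all terminal words of length <= max_len derivable from S."""
    words = set()
    queue = deque(["S"])
    while queue:
        form = queue.popleft()
        # Position of the leftmost non-terminal, or None if the form is a finished word.
        idx = next((i for i, c in enumerate(form) if c.isupper()), None)
        if idx is None:
            words.add(form)
            continue
        for rhs in RULES[form[idx]]:
            new_form = form[:idx] + rhs + form[idx + 1:]
            # Prune forms that already contain more than max_len terminals.
            if sum(c.islower() for c in new_form) <= max_len:
                queue.append(new_form)
    return words

print(sorted(generate(3)))   # ['a', 'ab', 'abb']
print(all(re.fullmatch(r"ab*", w) for w in generate(6)))   # True
```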
Finite Automata
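A finite automaton is a 5-tuple M = (Q, Σ, δ, q0, F)
Finite Automata
M = (Q, Σ, δ, q0, F) where
Q is the finite set of states
Σ = {0, 1} is the alphabet
δ : Q × Σ → Q is the transition function
q0 ∈ Q is the start state
F ⊆ Q is the set of accepting states
[State diagram: states q0, q1, q2, q3, with transitions labelled 0 and 1]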
Build an automaton that accepts all and only those strings that
contain 001
[State diagram: states q, q0, q00, q001, with transitions labelled 0, 1, and 0,1]
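A possible solution can be sketched in Python. The state names follow the diagram, but the transition table is an assumption reconstructed from what the state names suggest (each state records how much of 001 has just been read), with q as the start state and q001 as the only accepting state.

```python
# Assumed transition function δ: (state, input symbol) -> next state.
TRANSITIONS = {
    ("q", "0"): "q0",      ("q", "1"): "q",
    ("q0", "0"): "q00",    ("q0", "1"): "q",
    ("q00", "0"): "q00",   ("q00", "1"): "q001",
    ("q001", "0"): "q001", ("q001", "1"): "q001",
}
START = "q"
ACCEPTING = {"q001"}

def accepts(word):
    """Run the DFA on a binary string and report whether it ends in an accepting state."""
    state = START
    for symbol in word:
        state = TRANSITIONS[(state, symbol)]
    return state in ACCEPTING

print(accepts("1001"))   # True
print(accepts("0101"))   # False
```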
(Type-2)Context-free grammars
S → X
X → ab | aXb
L = { a^n b^n | n ≥ 1 }
appear in the rules, now they can appear anywhere. Hence the term context-free.
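A derivation in the grammar S → X, X → ab | aXb can be traced step by step. This is a minimal sketch; the function name derive and the string representation of sentential forms are assumptions for illustration.

```python
def derive(n):
    """Return the sentential forms of a derivation of a^n b^n, for n >= 1."""
    if n < 1:
        raise ValueError("the grammar only generates a^n b^n with n >= 1")
    steps = ["S", "X"]                        # S → X
    form = "X"
    for _ in range(n - 1):
        form = form.replace("X", "aXb")       # X → aXb, applied n - 1 times
        steps.append(form)
    steps.append(form.replace("X", "ab"))     # X → ab ends the derivation
    return steps

print(" => ".join(derive(3)))   # S => X => aXb => aaXbb => aaabbb
```

The non-terminal X sits in the middle of the sentential form, which is exactly what a regular grammar of the form X → a or X → aY cannot express.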
Pushdown Automata
Pushdown automata extend FAs in one very important way. We are
now given a stack on which we can store information. This works like a
standard LIFO stack, where information gets pushed onto the top and
popped off the top.
This means that we can now choose transitions based not just on the
input, but also based on what's on the top of the stack.
We also now have transition actions available to us. We can either
push a specific element to the top of the stack, or pop the top element
off the stack.
L = { a^n b^n : n ≥ 1 }
With an end marker #: L = { a^n b^n # : n ≥ 1 }
Transitions (input symbol, top of stack / replacement):
a, $/A$
b, A/ε (on two edges)
#, $/ε
[Diagrams: three runs of this automaton, one accepting and two rejecting]
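The behaviour of such an automaton can be sketched in Python with a list as the stack. The state names push, pop and accept are hypothetical, and the sketch assumes an additional transition a, A/AA so that more than one a can be pushed; only the labels a, $/A$, b, A/ε and #, $/ε come from the diagrams.

```python
def accepts(word):
    """Simulate a pushdown automaton for { a^n b^n # : n >= 1 }."""
    stack = ["$"]      # $ marks the bottom of the stack
    state = "push"     # hypothetical state names: push, pop, accept
    for symbol in word:
        top = stack[-1] if stack else None
        if state == "push" and symbol == "a" and top in ("$", "A"):
            stack.append("A")    # a, $/A$ and (assumed) a, A/AA
        elif state in ("push", "pop") and symbol == "b" and top == "A":
            stack.pop()          # b, A/ε on both edges
            state = "pop"
        elif state == "pop" and symbol == "#" and top == "$":
            stack.pop()          # #, $/ε
            state = "accept"
        else:
            return False         # no transition applies: reject
    return state == "accept"

print(accepts("aabb#"))   # True
print(accepts("aab#"))    # False
print(accepts("abb#"))    # False
```

The two rejected inputs fail for different reasons: in aab# an A is still on the stack when # arrives, while in abb# the second b finds $ on top instead of A.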
(Type-1)Context-Sensitive Grammar
Type-1 grammars generate context-sensitive languages. The productions must be in the form
αAβ → αγβ
where A ∈ N (Non-terminal) and α, β, γ ∈ (T ∪ N)* (Strings of terminals and non-terminals)
The strings α and β may be empty, but γ must be non-empty.
The rule S → ε is allowed if S does not appear on the right side of any rule. The languages generated by these grammars
are recognized by a linear bounded automaton.
Example
S → abc | aAbc
Ab → bA
Ac → Bbcc
bB → Bb
aB → aa | aaA
L = { a^n b^n c^n ; n ≥ 1 }
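The example grammar can be explored by exhaustive rewriting. The sketch below assumes upper-case letters are non-terminals; because no rule shortens the sentential form, any form longer than max_len can be discarded safely. RULES and generate are illustrative names.

```python
from collections import deque

# Each rule rewrites an occurrence of the left string into the right string.
RULES = [
    ("S", "abc"), ("S", "aAbc"),
    ("Ab", "bA"),
    ("Ac", "Bbcc"),
    ("bB", "Bb"),
    ("aB", "aa"), ("aB", "aaA"),
]

def generate(max_len):
    """Return all terminal words of length <= max_len derivable from S."""
    seen = {"S"}
    queue = deque(["S"])
    words = set()
    while queue:
        form = queue.popleft()
        if form.islower():               # only terminals left: a word of the language
            words.add(form)
            continue
        for lhs, rhs in RULES:
            start = form.find(lhs)
            while start != -1:           # try every occurrence of the left-hand side
                new_form = form[:start] + rhs + form[start + len(lhs):]
                if len(new_form) <= max_len and new_form not in seen:
                    seen.add(new_form)
                    queue.append(new_form)
                start = form.find(lhs, start + 1)
    return words

print(sorted(generate(9), key=len))   # ['abc', 'aabbcc', 'aaabbbccc']
```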
Alternate Definition:
P = { α → β ; |α| ≤ |β| }
(Type-0)Unrestricted Grammar
Type-0 grammars generate recursively enumerable languages. The productions have no
restrictions. They are any phrase structure grammar including all formal grammars.
They generate the languages that are recognized by a Turing machine.
The productions can be in the form α → β where α is a string of terminals and non-terminals
with at least one non-terminal (so α cannot be null), and β is a string of terminals and non-terminals.
Example
S → ACaB
Bc → acB
CB → DB
aD → Db
Turing Machines
Recall that NFAs are essentially memory-less, whilst NPDAs are
equipped with memory in the form of a stack.
To use a Turing machine T as an acceptor, set up the tape with the test string s ∈ Σ* written left-to-right starting at the read position, and with blank symbols everywhere else.
Then let the machine run (maybe overwriting s), and if it enters
the final state, declare that the original string s is accepted.
The language accepted by T (written L(T )) consists of all strings
s that are accepted in this way.
Theorem: A set L ⊆ Σ* is generated by some unrestricted (Type 0)
grammar if and only if L = L(T ) for some Turing machine T . So
both Type 0 grammars and Turing machines lead to the same class
of recursively enumerable languages.
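The acceptance procedure can be illustrated with a small simulator. Everything specific in this sketch is an assumption chosen for illustration: the blank symbol "_", the state names, and the transition table DELTA, which describes one possible deterministic machine for { a^n b^n : n ≥ 1 } that marks matched symbols with X and Y.

```python
from collections import defaultdict

BLANK = "_"   # assumed blank symbol

# (state, symbol read) -> (next state, symbol written, head move)
DELTA = {
    ("q0", "a"): ("q1", "X", "R"),   # mark an unmatched a
    ("q0", "Y"): ("q3", "Y", "R"),   # no a left: check that only Y remain
    ("q1", "a"): ("q1", "a", "R"),
    ("q1", "Y"): ("q1", "Y", "R"),
    ("q1", "b"): ("q2", "Y", "L"),   # mark the matching b
    ("q2", "a"): ("q2", "a", "L"),
    ("q2", "Y"): ("q2", "Y", "L"),
    ("q2", "X"): ("q0", "X", "R"),   # back at the last marked a
    ("q3", "Y"): ("q3", "Y", "R"),
    ("q3", BLANK): ("accept", BLANK, "R"),
}

def accepts(word, max_steps=10_000):
    """Write the word on a blank tape, run the machine, accept if it reaches the final state."""
    tape = defaultdict(lambda: BLANK, enumerate(word))   # blank everywhere else
    state, head = "q0", 0
    for _ in range(max_steps):
        if state == "accept":
            return True
        key = (state, tape[head])
        if key not in DELTA:             # machine halts without accepting
            return False
        state, tape[head], move = DELTA[key]
        head += 1 if move == "R" else -1
    return state == "accept"

print(accepts("aabb"))   # True
print(accepts("aab"))    # False
```

The max_steps bound is a practical safeguard only: a Turing machine may run forever on some inputs, which is why the languages it accepts are recursively enumerable rather than necessarily decidable.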
containing just the test string s with end markers on either side:
Language type        Grammar rules              Accepting machines
Regular              X → ε, X → Y, X → aY       NFAs (or DFAs)
Context-free         X → β                      Nondeterministic pushdown automata
Context-sensitive    α → β with |α| ≤ |β|       Nondeterministic linear bounded automata
Unrestricted         α → β (unrestricted)       Turing machines