
Natural Language Processing (COSC 709)

Lecture 03: Syntax and Parsing

Department of Computer Science, Addis Ababa University

Yaregal Assabie

2014/15—Sem II
Introduction
Phrases
Sentences
Tree Representation
Formal Language Theory
Parsing

Introduction

Syntax - refers to the way words are related to each other in a sentence.
Syntactic Analysis - analyzes:
• how words are grouped together into phrases;
• which words modify other words;
• which words are of central importance to the sentence.

Syntactic Analysis is used in many NLP applications such as:
• Grammar Checking
• Question Answering
• Information Extraction
• Machine Translation

Phases of Text Generation: Words → Phrases → Sentences
Corresponding analyses: Morphological Analysis (words), Syntactic Analysis (phrases and sentences)


Noun Phrases

English Noun Phrases


Student, the student, that student, two students, many students
Clever student
A student of computer science
AAU students, long queues, the student with long hair, the city where I lived

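Amharic Noun Phrases

ቀለበት (ring)
የወርቅ ቀለበት (gold ring)
ትልቅ የወርቅ ቀለበት (big gold ring)
ያ ትልቅ የወርቅ ቀለበት (that big gold ring)

ጠጅ (mead)
የማር ጠጅ (honey mead)
ንፁህ የማር ጠጅ (pure honey mead)
ሁለት ሊትር ንፁህ የማር ጠጅ (two liters of pure honey mead)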


Adjective Phrases

English Adjective Phrases


incredibly short
rather difficult
very happy
unbelievably quick
exceedingly sorry about the mistake
amazingly rich in minerals

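Amharic Adjective Phrases

ደግ (kind)
በጣም ደግ (very kind)

ፈሪ (fearful)
ሰው ፈሪ (fearful of people)
እንደ ወንድሙ ሰው ፈሪ (fearful of people, like his brother)
በጣም እንደ ወንድሙ ሰው ፈሪ (very fearful of people, like his brother)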


Verb Phrases

English Verb Phrases


turn, turn on, is turning on, have been working
threatened to throw himself into the window
was an understandable reaction by the visitors
is amazingly rich in minerals

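Amharic Verb Phrases

ሰጠችው (she gave him)
ለካሳ መፅሀፍ ሰጠችው (she gave a book to Kasa)

ልኮላታል (he has sent to her)
ለአስቴር ገንዘብ ልኮላታል (he has sent money to Aster)
በባንክ ለአስቴር ገንዘብ ልኮላታል (he has sent money to Aster through the bank)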


Prepositional Phrases

English Prepositional Phrases


on the table
across the world
over your head
in the hotel
to their house

Amharic Adpositional Phrases

ወደ ቤት (to the house)

ከወንድሙ ጋር (with his brother)


Adverbial Phrases

English Adverbial Phrases


immediately
unbelievably quickly
very carefully

Amharic Adverbial Phrases

ክፉኛ (badly)
እንደ ወንድሙ ክፉኛ (badly, like his brother)
በጣም እንደ ወንድሙ ክፉኛ (very badly, like his brother)


Simple Sentences

Simple Sentences (English)


The computer is on the table
He went home
They are always happy

Simple Sentences (Amharic)

አስቴር ወደገበያ ሄደች (Aster went to the market)


Complex Sentences

Complex Sentences (English)


He was driving the car that he bought from his father
We rented our house to friends while we were abroad

Complex Sentences (Amharic)

ካሳ አስቴር ቤት እንደሰራች ሰምቷል (Kasa has heard that Aster built a house)


Examples (English): Simple Sentences

SS: The computer is on the table
  NP: The computer
    Det: The
    N: computer
  VP: is on the table
    V: is
    PP: on the table
      P: on
      NP: the table
        Det: the
        N: table
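
The tree for "The computer is on the table" can also be held in code as a nested structure. The following is a minimal sketch, assuming a (label, children...) tuple encoding chosen here purely for illustration; the names TREE, leaves and show are hypothetical and not part of the lecture.

```python
# Each node is (label, child, child, ...); a leaf child is a plain word string.
TREE = ("SS",
        ("NP", ("Det", "The"), ("N", "computer")),
        ("VP", ("V", "is"),
               ("PP", ("P", "on"),
                      ("NP", ("Det", "the"), ("N", "table")))))

def leaves(node):
    """Return the words at the leaves, left to right."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:
        words.extend(leaves(child))
    return words

def show(node, depth=0):
    """Return one indented line per node, mirroring the tree diagram."""
    if isinstance(node, str):
        return ["  " * depth + node]
    lines = ["  " * depth + node[0]]
    for child in node[1:]:
        lines.extend(show(child, depth + 1))
    return lines

print(" ".join(leaves(TREE)))
print("\n".join(show(TREE)))
```

Reading the leaves from left to right gives back the original sentence, which is a quick sanity check on any hand-built tree.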


Examples (Amharic): Noun Phrases

[Figure: tree diagrams of Amharic noun phrases; node labels: NP, Det, A (Modifier), N (Comp.), N (Head)]


Examples (Amharic): Adjective Phrases

[Figure: tree diagrams of Amharic adjective phrases; node labels: AP, Det, PP, P, N, A (Head)]


Examples (Amharic): Verb Phrases

[Figure: tree diagrams of Amharic verb phrases; node labels: VP, PP, PP (Modifier), P, N, N (Comp.), V (Head)]


Examples (Amharic): Adpositional Phrases

[Figure: tree diagrams of Amharic adpositional phrases; node labels: PP, P (Head), P, N]


Examples (Amharic): Adverbial Phrases

[Figure: tree diagram of an Amharic adverbial phrase; node labels: AdvP, Det, PP, P, N, Adv (Head)]


Examples (Amharic): Simple Sentences

[Figure: tree diagram of an Amharic simple sentence; node labels: SS, N, VP, PP, P, V]


Examples (Amharic): Complex Sentences

[Figure: tree diagram of an Amharic complex sentence; node labels: CS, SS, N, VP, V]


Concepts: Alphabet, String, and Language

Formal Language Theory - considers a language as a mathematical object defined by alphabets, strings and grammar.

Alphabet - a finite set of symbols.


e.g. Binary Alphabet: {0, 1}
Decimal Alphabet: {0, 1, 2 , … , 9}
Amharic Alphabet: {ሀ, ለ, ሐ, … , ፐ}
English Alphabet: {a, b, c, … , z, A, B, C, …, Z}

String - a finite sequence of symbols from an alphabet.


e.g. Binary String: 0100101, 01101, 00110
Decimal String: 176392, 12, 398702
Amharic String: ቀለበት, አስቴር, ጠጅ
English String: killed, Abebe, lion, the

Language - a (potentially infinite) set of strings over an alphabet.


e.g. Binary Language: {0100101, 01101, 00110, ….}
Decimal Language: {176392, 12, 398702, ….}
Amharic Language: {ቀለበት, አስቴር, ጠጅ, ….}
English Language: {killed, Abebe, lion, the, ….}
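
The three definitions map directly onto basic data types: an alphabet is a set of symbols, a string is a sequence of them, and a language is a set of strings. The sketch below is only an illustration under that reading; the helper name is_string_over is hypothetical.

```python
def is_string_over(s, alphabet):
    """True if every symbol of s belongs to the alphabet."""
    return all(symbol in alphabet for symbol in s)

BINARY = {"0", "1"}
DECIMAL = set("0123456789")

# A language is a set of strings; a finite part of it can be listed explicitly.
binary_language = {"0100101", "01101", "00110"}

print(is_string_over("0100101", BINARY))    # a binary string
print(is_string_over("176392", BINARY))     # decimal digits are not all in {0, 1}
print(is_string_over("176392", DECIMAL))
print(all(is_string_over(s, BINARY) for s in binary_language))
```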


Concepts: Grammar

Grammar - a formalism for generating the strings of a language by repeatedly replacing symbols.
- defined as a 4-tuple G = (N, Σ, P, S) where
N is a finite set of non-terminal symbols. In natural languages, these can be syntactic categories, phrases or sentences.
Σ is a finite set of terminal symbols (disjoint from N). It consists of elements of the target language, such as words and letters in natural language.
P is a finite set of production rules of the form α → β with at least one non-terminal in α.
S is a member of N called the start symbol (a special non-terminal symbol). In natural languages, the start symbol is a sentence.
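
The 4-tuple can be written down as a small record type whose checks mirror the conditions in the definition. This is a minimal sketch, not a standard API: the class name Grammar, its field names and the problems method are assumptions made for illustration, and the example grammar is the small "the student(s) came" grammar used later in this lecture.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Grammar:
    nonterminals: frozenset   # N
    terminals: frozenset      # Σ
    productions: tuple        # P: (lhs, rhs) pairs, each side a tuple of symbols
    start: str                # S

    def problems(self):
        """Return a list of violated conditions (empty if the grammar is well formed)."""
        found = []
        if self.nonterminals & self.terminals:
            found.append("N and Σ are not disjoint")
        if self.start not in self.nonterminals:
            found.append("S is not a member of N")
        for lhs, rhs in self.productions:
            if not any(sym in self.nonterminals for sym in lhs):
                found.append(f"no non-terminal on the left of {lhs} → {rhs}")
        return found

G = Grammar(
    nonterminals=frozenset({"S", "NP", "VP", "Det", "N", "V"}),
    terminals=frozenset({"the", "student", "students", "came"}),
    productions=(
        (("S",), ("NP", "VP")),
        (("NP",), ("Det", "N")),
        (("VP",), ("V",)),
        (("Det",), ("the",)),
        (("N",), ("student",)),
        (("N",), ("students",)),
        (("V",), ("came",)),
    ),
    start="S",
)
print(G.problems())   # an empty list means all conditions hold
```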


Hierarchy of Grammars/Languages

• Also known as the Chomsky classification, the hierarchy of grammars/languages ranks grammars by their expressiveness.
• Different classes of grammars/languages are defined by putting different constraints on production rules, which results in different structural complexity of the sentences they can describe.
• The Chomsky classification consists of the following four levels of grammars/languages:
  - Type 0 (Unrestricted / Recursively Enumerable)
  - Type I (Context-Sensitive)
  - Type II (Context-Free)
  - Type III (Regular)


Hierarchy of Grammars/Languages: Type 0 (Unrestricted)

- No limitation on production rules
- At least one non-terminal on the left hand side
e.g. S → S S
S → A B C
A B → B A
B A → A B
A C → C A
C A → A C
B C → C B
C B → B C
A → a
B → b
C → c
S → ε
Valid strings generated include: ε, abc, aabbcc, cabcab, etc.
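
Whether a string is generated by this grammar can be checked by searching over sentential forms. The sketch below is an illustration only: it assumes every symbol is a single character so that forms can be plain Python strings, it bounds the search by the length of the target (a simplification that is adequate for this grammar but not for unrestricted grammars in general), and it includes the rule C B → B C on the assumption that it belongs to the grammar, since aabbcc cannot be derived without it. The function name derivable is hypothetical.

```python
from collections import deque

RULES = [("S", "SS"), ("S", "ABC"), ("S", ""),      # "" stands for ε
         ("AB", "BA"), ("BA", "AB"), ("AC", "CA"), ("CA", "AC"),
         ("BC", "CB"), ("CB", "BC"),                # CB -> BC is an assumption
         ("A", "a"), ("B", "b"), ("C", "c")]

def derivable(target, start="S"):
    """Breadth-first search over sentential forms no longer than the target."""
    limit = max(len(target), 1)
    seen, queue = {start}, deque([start])
    while queue:
        form = queue.popleft()
        if form == target:
            return True
        for lhs, rhs in RULES:
            i = form.find(lhs)
            while i != -1:
                # Replace this occurrence of the left hand side.
                new = form[:i] + rhs + form[i + len(lhs):]
                if len(new) <= limit and new not in seen:
                    seen.add(new)
                    queue.append(new)
                i = form.find(lhs, i + 1)
    return False

for s in ["", "abc", "aabbcc", "cabcab", "aab"]:
    print(repr(s), derivable(s))
```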


Hierarchy of Grammars/Languages: Type I (Context-Sensitive)

- Production rule:
α1 B α2 → α1 β α2 where
B is a non-terminal symbol;
α1 and α2 are (possibly empty) sequences of terminal and non-terminal symbols (α1 is the left context and α2 is the right context);
β is a non-empty sequence of terminal and non-terminal symbols.
S → ε is allowed if S does not appear on the right hand side of any rule.

These rules are used in natural languages to describe subject-verb agreement with respect to number (singular or plural), as reflected in the sentences "the students come" and "the student comes". For example, the following production rules can be used to describe such contexts.

S → NP VP [S = Sentence, NP = Noun Phrase, VP = Verb Phrase]
NP → Det Nsing [Det = Determiner, Nsing = Noun (singular)]
NP → Det Nplur [Nplur = Noun (plural)]
Nsing VP → Nsing Vsing [Vsing = Verb (singular)]
Nplur VP → Nplur Vplur [Vplur = Verb (plural)]
Det → the
Nsing → student
Nplur → students
Vsing → comes
Vplur → come

Note: Context-Sensitive Languages/Grammars are subsets of Unrestricted Languages/Grammars.
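
To see how the context rules enforce agreement, the agreement grammar can be rewritten exhaustively and the fully terminal results collected. This is a minimal sketch under the assumption that symbols are stored as tuples of strings; the function name language is hypothetical, and the exhaustive search only terminates because this particular grammar has no recursive rules.

```python
RULES = [
    (("S",), ("NP", "VP")),
    (("NP",), ("Det", "Nsing")),
    (("NP",), ("Det", "Nplur")),
    (("Nsing", "VP"), ("Nsing", "Vsing")),   # VP becomes Vsing only after Nsing
    (("Nplur", "VP"), ("Nplur", "Vplur")),   # VP becomes Vplur only after Nplur
    (("Det",), ("the",)),
    (("Nsing",), ("student",)),
    (("Nplur",), ("students",)),
    (("Vsing",), ("comes",)),
    (("Vplur",), ("come",)),
]
NONTERMINALS = {"S", "NP", "VP", "Det", "Nsing", "Nplur", "Vsing", "Vplur"}

def language(start=("S",)):
    """Apply every rule at every position; keep forms made only of terminals."""
    seen, stack, sentences = {start}, [start], set()
    while stack:
        form = stack.pop()
        if not any(sym in NONTERMINALS for sym in form):
            sentences.add(" ".join(form))
            continue
        for lhs, rhs in RULES:
            n = len(lhs)
            for i in range(len(form) - n + 1):
                if form[i:i + n] == lhs:
                    new = form[:i] + rhs + form[i + n:]
                    if new not in seen:
                        seen.add(new)
                        stack.append(new)
    return sentences

print(sorted(language()))
```

Forms such as "the student VP", where the noun was rewritten before the verb phrase, are dead ends: no rule applies to VP once its left context is a terminal, so they never reach the result set.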


Hierarchy of Grammars/Languages: Type II (Context-Free)

- Production rule:
Exactly one non-terminal on left hand side, but anything on the right hand side.

These rules describe those parts of natural-language grammar that do not depend on context. For example, the English past tense does not vary with the number of the subject, so both "the students came" and "the student came" are grammatical. The following production rules can be used to represent such a context-free grammar.

S → NP VP [S=Sentence, NP= Noun Phrase, VP= Verb Phrase]


NP → Det N [Det= Determiner, N= Noun]
VP → V [V= Verb]
Det → the
N → student
N → students
V → came

Context-Free Grammars are important since they are:


• Restricted enough to build efficient parsers
• Powerful enough to describe the syntax of most programming languages

Note: Context-Free Languages/Grammars are subsets of Context-Sensitive Languages/Grammars.
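
Because every rule has a single non-terminal on its left hand side, a context-free grammar can be stored as a mapping from each non-terminal to its alternatives, and each symbol can be expanded without looking at its neighbours. The sketch below illustrates this for the "the student(s) came" grammar; the names CFG and expand are hypothetical, and the recursion assumes a grammar without recursive rules (otherwise it would not terminate).

```python
from itertools import product

CFG = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V"]],
    "Det": [["the"]],
    "N": [["student"], ["students"]],
    "V": [["came"]],
}

def expand(symbol):
    """Return every word sequence derivable from symbol."""
    if symbol not in CFG:              # terminal: it stands for itself
        return [[symbol]]
    results = []
    for rhs in CFG[symbol]:
        # Each right-hand-side symbol is expanded independently of the others.
        for parts in product(*(expand(s) for s in rhs)):
            results.append([word for part in parts for word in part])
    return results

sentences = [" ".join(words) for words in expand("S")]
print(sentences)
```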


Hierarchy of Grammars/Languages: Type III (Regular)

- Production rule:
Exactly one non-terminal on the left hand side, and on the right hand side one terminal and at most one non-terminal.
Examples:
A → aB (right regular grammar)
A → Ba (left regular grammar)
A → a
A single grammar must use only one of the two forms (right or left) throughout.
Note: Regular Languages/Grammars are subsets of Context-Free Languages/Grammars.
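
A right regular grammar can be run directly as a finite automaton, with the non-terminals acting as states. The sketch below uses a hypothetical grammar, A → aB | a and B → bA (generating a, aba, ababa, ...), which is not from the lecture; the function name accepts and the "ACCEPT" marker are likewise assumptions for illustration.

```python
# Hypothetical right regular grammar: A -> aB | a,  B -> bA
RULES = {
    "A": [("a", "B"), ("a", None)],   # None marks a rule with no non-terminal
    "B": [("b", "A")],
}

def accepts(string, start="A"):
    """Simulate the grammar as a finite automaton over the input symbols."""
    states = {start}
    for i, ch in enumerate(string):
        last = i == len(string) - 1
        nxt = set()
        for state in states:
            for terminal, target in RULES.get(state, []):
                if terminal != ch:
                    continue
                if target is None:
                    if last:              # a terminal-only rule must end the string
                        nxt.add("ACCEPT")
                else:
                    nxt.add(target)
        states = nxt
    return "ACCEPT" in states

for s in ["a", "aba", "ab", "aa", ""]:
    print(repr(s), accepts(s))
```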


Hierarchy of Grammars/Languages

Containment Hierarchy: Regular ⊂ Context-Free ⊂ Context-Sensitive ⊂ Unrestricted (Recursively Enumerable)


Automata and Machines

Reading assignment: automata and machines, and their equivalence to the different classes of languages/grammars.


Parsing

Parsing - a derivation process which identifies the structure of sentences using a given grammar.
- considered a special case of a search problem.
- two basic search methods are used:
  • top-down strategy
  • bottom-up strategy
- methods of improving efficiency:
  • storing lexical rules separately
  • chunking


Parsing Strategies: Top-down Parsing

Top-down parsing starts with the symbol S and then searches through different ways to
rewrite the symbols until the input sentence is generated.

Given the following English grammar, the sentence "Abebe killed the lion" can be parsed using the top-down strategy as follows.

Grammar:
S → NP VP
VP → V NP
NP → NAME
NP → DET N
NAME → Abebe
V → killed
DET → the
N → lion

Derivation:
S ⇒ NP VP [rewriting S]
⇒ NAME VP [rewriting NP]
⇒ Abebe VP [rewriting NAME]
⇒ Abebe V NP [rewriting VP]
⇒ Abebe killed NP [rewriting V]
⇒ Abebe killed DET N [rewriting NP]
⇒ Abebe killed the N [rewriting DET]
⇒ Abebe killed the lion [rewriting N]
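
The top-down derivation can be reproduced with a small backtracking search that always rewrites the leftmost non-terminal. This is a minimal sketch, not the lecture's algorithm: the function name top_down and the pruning tests are assumptions, and the length-based pruning relies on this grammar having no ε rules.

```python
GRAMMAR = {
    "S": [["NP", "VP"]],
    "VP": [["V", "NP"]],
    "NP": [["NAME"], ["DET", "N"]],
    "NAME": [["Abebe"]],
    "V": [["killed"]],
    "DET": [["the"]],
    "N": [["lion"]],
}

def top_down(form, words, trace):
    """Rewrite the leftmost non-terminal; backtrack when the form cannot match."""
    for i, sym in enumerate(form):
        if sym in GRAMMAR:
            break
    else:
        # No non-terminal left: succeed only if the input sentence was generated.
        return trace if form == words else None
    # Prune: the words already generated must agree with the input,
    # and the form must not be longer than the input.
    if form[:i] != words[:i] or len(form) > len(words):
        return None
    for rhs in GRAMMAR[sym]:
        new = form[:i] + rhs + form[i + 1:]
        result = top_down(new, words, trace + [new])
        if result is not None:
            return result
    return None

words = "Abebe killed the lion".split()
steps = top_down(["S"], words, [["S"]])
for step in steps:
    print(" ".join(step))
```

The rule NP → NAME is tried first for the object as well; it leads to "Abebe killed Abebe", which does not match the input, so the search backs up and tries NP → DET N.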


Parsing Strategies: Bottom-up Parsing

Bottom-up parsing starts with words in a sentence and uses production rules backward
to reduce the sequence of symbols until it consists solely of S.

Given the following English grammar, the sentence "Abebe killed the lion" can be parsed using the bottom-up strategy as follows.

Grammar:
S → NP VP
VP → V NP
NP → NAME
NP → DET N
NAME → Abebe
V → killed
DET → the
N → lion

Reduction:
Abebe killed the lion
NAME killed the lion [rewriting Abebe]
NAME V the lion [rewriting killed]
NAME V DET lion [rewriting the]
NAME V DET N [rewriting lion]
NP V DET N [rewriting NAME]
NP V NP [rewriting DET N]
NP VP [rewriting V NP]
S [rewriting NP VP]
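
The bottom-up reduction can be sketched as a search that applies rules backward, replacing a right hand side by its left hand side, until only S remains. The function name bottom_up is hypothetical, and the order in which reductions are tried is an arbitrary choice of this sketch, so its trace reduces NAME to NP earlier than the steps listed above while reaching the same result.

```python
RULES = [("S", ["NP", "VP"]), ("VP", ["V", "NP"]),
         ("NP", ["NAME"]), ("NP", ["DET", "N"]),
         ("NAME", ["Abebe"]), ("V", ["killed"]),
         ("DET", ["the"]), ("N", ["lion"])]

def bottom_up(form, trace=None):
    """Reduce the form with backward rule applications; backtrack on dead ends."""
    trace = (trace or []) + [form]
    if form == ["S"]:
        return trace
    for lhs, rhs in RULES:
        n = len(rhs)
        for i in range(len(form) - n + 1):
            if form[i:i + n] == rhs:
                # Replace the matched right hand side by the left hand side.
                result = bottom_up(form[:i] + [lhs] + form[i + n:], trace)
                if result is not None:
                    return result
    return None

reduction = bottom_up("Abebe killed the lion".split())
for step in reduction:
    print(" ".join(step))
```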


Towards Efficient Parsing: Separating Lexical Rules

The efficiency of parsing algorithms can be improved if lexical rules are stored separately in a structure called a lexicon, which specifies the possible categories for each word.

The following example shows the lexical rules separated from other grammatical rules.

Grammatical rules (with lexical rules):
S → NP VP
VP → V NP
NP → NAME
NP → DET N
NAME → Abebe
V → killed
V → fly
DET → the
N → lion
N → fly

Grammatical rules (without lexical rules):
S → NP VP
VP → V NP
NP → NAME
NP → DET N

Lexical rules (lexicon):
Abebe: NAME
killed: V
the: DET
lion: N
fly: V, N
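
In code, the lexicon is naturally a dictionary from words to sets of categories, so that finding the categories of a word is a single lookup rather than a scan over all rules. The sketch below is illustrative; the names LEXICON and categories are hypothetical.

```python
LEXICON = {
    "Abebe": {"NAME"},
    "killed": {"V"},
    "the": {"DET"},
    "lion": {"N"},
    "fly": {"V", "N"},     # an ambiguous word keeps all of its categories
}

def categories(sentence):
    """Pair each word with its possible categories (empty list if unknown)."""
    return [(word, sorted(LEXICON.get(word, set()))) for word in sentence.split()]

print(categories("Abebe killed the fly"))
```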


Towards Efficient Parsing: Chunking

Chunking, also called partial parsing, is a technique which attempts to model human parsing
by breaking the text up into small pieces, each parsed separately. Chunk boundaries
correspond roughly to the pauses in everyday speech.

For example, consider the following sentence.


When I read a sentence, I read it a chunk at a time.

Then, the following chunks can be identified.


[When I read] [a sentence], [I read it] [a chunk] [at a time].

Each chunk can then be parsed separately. In addition to perhaps being a better model of human behavior than full parsing methods, chunk parsing has the following advantages:
• Because a chunk parser only needs to deal with small, non-recursive clauses, it is able to process text much more quickly.
• A chunk parser is easier to implement and requires much less memory to parse.
• When a full parser fails, it must discard an entire sentence, even if it got much of the structure correct. A chunk parser only discards a few words when it cannot figure out how to proceed.
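
A very simple chunker can be sketched as pattern matching over part-of-speech tags. Everything specific in the example below is an assumption made for illustration: the tags were assigned by hand using Penn Treebank style labels, and the four patterns were chosen only to reproduce the chunks of the example sentence; a practical chunker would need a tagger and a much richer pattern set.

```python
TAGGED = [("When", "WRB"), ("I", "PRP"), ("read", "VBP"),
          ("a", "DT"), ("sentence", "NN"), (",", ","),
          ("I", "PRP"), ("read", "VBP"), ("it", "PRP"),
          ("a", "DT"), ("chunk", "NN"),
          ("at", "IN"), ("a", "DT"), ("time", "NN"), (".", ".")]

# Hypothetical chunk patterns, tried in order at each position.
PATTERNS = [("WRB", "PRP", "VBP"), ("PRP", "VBP", "PRP"),
            ("IN", "DT", "NN"), ("DT", "NN")]

def chunk(tagged):
    """Scan left to right, grouping tokens whose tags match a pattern."""
    chunks, i = [], 0
    while i < len(tagged):
        for pattern in PATTERNS:
            window = tagged[i:i + len(pattern)]
            if tuple(tag for _, tag in window) == pattern:
                chunks.append(" ".join(word for word, _ in window))
                i += len(pattern)
                break
        else:
            i += 1   # token fits no pattern (e.g. punctuation): skip only it
    return chunks

print(chunk(TAGGED))
```

The skip in the final branch reflects the last advantage listed above: when no pattern applies, only the current token is passed over and chunking continues with the next one.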

TOC: Course Syllabus

Previous: Morphological Analysis

Current: Syntax and Parsing

Next: Semantic Analysis
