
DAI 2005

Speech and Natural Language Processing (NLP)

Schedule:
3 weeks: 8th, 15th and 22nd November
2 hours of lectures each week, 1 lab hour in week 3
3 hours of independent work
Assessment: coursework and exam
Aims:
To understand two processing paradigms:
- The symbolic, rule based approach
- Data driven, empirical methods
To realise the scope and limitations of current
NLP and Speech Recognition applications

Overview
Week 1
Introduction
Applications
Biological background
Rule based methods for NLP
Week 2
Empirical methods for NLP
Week 3
Speech recognition
Practical work: Link parser

Applications
Automated Speech Recognition: the role of the language model
Speech recognisers for individuals, subtitles for live TV, etc.
Speech Synthesis
Information Retrieval: the TREC Question Answering competition
Automated Dialogue systems: the SYRINX Spoken Language System
Machine Assisted Translation: Verbmobil
Text compression
Spelling correctors, etc.
Copy detection: the Ferret

Biological background
Only humans use language
Speech is the primary language form
Physiological cost of developing speech production mechanisms
Difference in physiology of humans and other primates: longer vocal tract
Advantages
Higher transmission rates
Larger range of sounds
Sounds which are less susceptible to perceptual confusion
Sounds which are more easily combined
Disadvantage
Humans' liability to choke (after infancy)

Language acquisition

Controversy over how language is acquired:
a) in the lifetime of the species
b) in historical times
c) in the lifetime of the individual
Chomsky's hypothesis that there is an innate universal grammatical
module, fleshed out in the native language
The contrasting theory that infants learn from ambient speech; grammar
develops to reduce complexity and increase ease of understanding

Decomposition of NLP task


Because of the complexity of NLP, we decompose the processing task into
stages (though the stages are not entirely independent):

Prosodic
Morphological
Syntactic
Semantic
Pragmatic
Dialogue management

Week 1: Focus on syntax, automatic parsing

Grammar and parsing


A grammar is a set of rules that describes patterns in a linear sequence of
elements, such as natural language, where the elements are words.
Parsing is the process of structuring a linear sequence in accordance with
a given grammar.
We need grammar in order to understand language.

Parse Trees
Using a Phrase Structure Grammar, parse a sentence and display it.
Take the sentence:
the man crossed the river
Give the words part-of-speech tags:
the      determiner (det)
man      noun
crossed  verb
the      determiner (det)
river    noun
Group parts-of-speech into phrases
NP (noun phrase) --> det noun
e.g. the man
VP (verb phrase) --> verb NP
e.g. crossed the river
Group phrases into a sentence
S --> NP VP

( S (NP the man ) (VP crossed (NP the river) ) )
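The steps above can be sketched in code. This is a minimal hand-written recursive-descent parser for exactly the three rules on this slide (the function names and error handling are illustrative, not from any particular library); it produces the bracketed parse shown above.

```python
# Lexicon and rules taken from the slide: NP --> det noun,
# VP --> verb NP, S --> NP VP.
LEXICON = {"the": "det", "man": "noun", "crossed": "verb", "river": "noun"}

def parse_np(words, i):
    """NP --> det noun; returns (bracketed string, next position)."""
    if LEXICON[words[i]] == "det" and LEXICON[words[i + 1]] == "noun":
        return f"(NP {words[i]} {words[i + 1]})", i + 2
    raise ValueError(f"no NP at position {i}")

def parse_vp(words, i):
    """VP --> verb NP"""
    if LEXICON[words[i]] == "verb":
        np_tree, j = parse_np(words, i + 1)
        return f"(VP {words[i]} {np_tree})", j
    raise ValueError(f"no VP at position {i}")

def parse_s(words):
    """S --> NP VP; the whole input must be consumed."""
    np_tree, i = parse_np(words, 0)
    vp_tree, j = parse_vp(words, i)
    assert j == len(words), "trailing words left over"
    return f"(S {np_tree} {vp_tree})"

print(parse_s("the man crossed the river".split()))
# -> (S (NP the man) (VP crossed (NP the river)))
```

A real parser must search among alternative rules rather than commit deterministically, but for this grammar and sentence one pass suffices.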


             Sentence
            /        \
          NP          VP
         /  \        /   \
      det   noun   verb    NP
       |     |      |     /  \
      the   man  crossed det  noun
                          |     |
                         the  river

Semantic knowledge for syntactic analysis


Examples with prepositional phrase attachment
He cut the tree with the axe.
He cut the tree with the blossom.
I saw a film with Arnold Schwarzenegger.
Attach the amplifier to the terminal with the red wire.
She walked to the park with the pond.
She walked to the park with the dogs.
She walked to the park with the peacocks.
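Each of these sentences is syntactically ambiguous: the grammar alone licenses both attachments, and only semantics picks one. The sketch below (an illustrative toy, not from the slides) uses a CYK-style chart over a small phrase-structure grammar in Chomsky normal form to count how many purely syntactic parses the first sentence receives.

```python
from collections import defaultdict

BINARY = [  # A --> B C
    ("S", "NP", "VP"),
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # noun attachment: "the tree with the axe"
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # verb attachment: "cut ... with the axe"
    ("PP", "P", "NP"),
]
LEXICAL = {"he": "NP", "cut": "V", "the": "Det",
           "tree": "N", "with": "P", "axe": "N"}

def count_parses(words):
    n = len(words)
    # chart[(i, j)][A] = number of ways A derives words[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, w in enumerate(words):
        chart[(i, i + 1)][LEXICAL[w]] += 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for a, b, c in BINARY:
                    chart[(i, j)][a] += chart[(i, k)][b] * chart[(k, j)][c]
    return chart[(0, n)]["S"]

print(count_parses("he cut the tree with the axe".split()))  # -> 2
```

Both parses are equally valid to the grammar; deciding that axes cut trees (verb attachment) while blossom belongs to trees (noun attachment) requires semantic knowledge.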

Capabilities and limitations of PSG

A PSG can
Capture some of the language structure
Model some significant features, such as recursion.
E.g. Direct recursion:
NP --> det NP
Indirect recursion:
NP --> noun PP
PP --> prep NP
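The power of recursion is that a finite set of rules describes unboundedly long phrases. This illustrative sketch (the words and function names are my own, not from the slides) expands the indirect recursion NP --> noun PP, PP --> prep NP to increasing depths:

```python
def np(depth):
    """Expand an NP to the given recursion depth."""
    if depth == 0:
        return "the house"            # base case: a plain NP
    return "the house " + pp(depth)   # NP --> noun PP (det added for readability)

def pp(depth):
    return "near " + np(depth - 1)    # PP --> prep NP

for d in range(3):
    print(np(d))
# the house
# the house near the house
# the house near the house near the house
```

Each extra level of recursion adds one more prepositional phrase, so no fixed bound on sentence length is needed in the grammar itself.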
Deal with some limited part-of-speech ambiguity
E.g. Can parse:
They water the flowers
They swim in the water
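In these two sentences, "water" resolves to verb or noun purely from its position in the sentence pattern. A sketch of that disambiguation (the flat patterns below stand in for full PSG rules, and all identifiers are illustrative):

```python
from itertools import product

# Ambiguous lexicon: "water" carries two possible tags.
LEXICON = {"they": {"pronoun"}, "water": {"noun", "verb"},
           "the": {"det"}, "flowers": {"noun"},
           "swim": {"verb"}, "in": {"prep"}}

# Flat sentence patterns standing in for full phrase-structure rules.
PATTERNS = [
    ("pronoun", "verb", "det", "noun"),           # They water the flowers
    ("pronoun", "verb", "prep", "det", "noun"),   # They swim in the water
]

def tag(sentence):
    """Try every tag assignment; return the one matching a pattern."""
    words = sentence.lower().split()
    for tags in product(*(sorted(LEXICON[w]) for w in words)):
        if tags in PATTERNS:
            return dict(zip(words, tags))
    return None

print(tag("They water the flowers")["water"])  # -> verb
print(tag("They swim in the water")["water"])  # -> noun
```

Only one tag sequence per sentence fits a pattern, so the grammatical context alone resolves this class of ambiguity.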

Limitations of PSG
Many unwanted parses are allowed.
With real language and longer sentences, many alternative parses are
produced, as with other rule based methods.
Syntactic knowledge alone is not enough; semantic information is
needed.
