Schedule:
3 weeks: 8th, 15th and 22nd November
2 hours lectures each week, 1 lab hour in week 3
3 hours independent work
Assessment: course work and exam
Aims:
To understand two processing paradigms:
- The symbolic, rule based approach
- Data driven, empirical methods
To realise the scope and limitations of current
NLP and Speech Recognition applications
Overview
Week 1
Introduction
Applications
Biological background
Rule based methods for NLP
Week 2
Empirical methods for NLP
Week 3
Speech recognition
Practical work: Link parser
Applications
Automated Speech Recognition: role of the language model
Speech recognisers for individuals, subtitles for live TV, etc.
Speech Synthesis
Information Retrieval: TREC Question Answering competition
Automated Dialog systems: SYRINX Spoken Language System
Machine Assisted Translation: Verbmobil
Text compression
Spelling correctors, etc.
Copy detection: the Ferret
Biological background
Only humans use language
Speech is the primary language form
Physiological cost of developing speech production mechanisms
Difference in physiology of humans and other primates: longer vocal tract
Advantages
Higher transmission rates
Larger range of sounds
Sounds which are less susceptible to perceptual confusion
Sounds which are more easily combined
Disadvantage
Humans' liability to choke (after infancy)
Language acquisition
Prosodic
Morphological
Syntactic
Semantic
Pragmatic
Dialog management
Parse Trees
Using a Phrase Structure Grammar, parse a sentence and display it.
Take the sentence:
the man crossed the river
Give the words part-of-speech tags
the      determiner (det)
man      noun
crossed  verb
river    noun
Group parts-of-speech into phrases
NP (noun phrase) --> det noun
e.g. the man
VP (verb phrase) --> verb NP
e.g. crossed the river
Group phrases into a sentence
S --> NP VP
The resulting parse tree:
S
  NP
    det   the
    noun  man
  VP
    verb  crossed
    NP
      det   the
      noun  river
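The three steps above (tag the words, group tags into phrases, group phrases into a sentence) can be sketched as a small hand-written parser. This is only an illustration under stated assumptions: the lexicon covers just the four words of the example, and the function names (tag, parse_np, parse_vp, parse_s, show) are made up for this sketch, not taken from any parsing tool.

```python
# Toy lexicon and grammar taken from the example sentence.
# All names here are illustrative assumptions.
LEXICON = {"the": "det", "man": "noun", "crossed": "verb", "river": "noun"}

def tag(sentence):
    """Step 1: give each word its part-of-speech tag."""
    return [(word, LEXICON[word]) for word in sentence.split()]

def parse_np(tagged, i):
    """NP --> det noun. Returns (subtree, next position) or None."""
    if i + 1 < len(tagged) and tagged[i][1] == "det" and tagged[i + 1][1] == "noun":
        return ("NP", ("det", tagged[i][0]), ("noun", tagged[i + 1][0])), i + 2
    return None

def parse_vp(tagged, i):
    """VP --> verb NP. Returns (subtree, next position) or None."""
    if i < len(tagged) and tagged[i][1] == "verb":
        result = parse_np(tagged, i + 1)
        if result:
            np, j = result
            return ("VP", ("verb", tagged[i][0]), np), j
    return None

def parse_s(tagged):
    """S --> NP VP. The whole input must be consumed."""
    result = parse_np(tagged, 0)
    if result:
        np, i = result
        result = parse_vp(tagged, i)
        if result:
            vp, j = result
            if j == len(tagged):
                return ("S", np, vp)
    return None

def show(tree, depth=0):
    """Return the tree as indented lines, one node per line."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return ["  " * depth + f"{label}  {children[0]}"]
    lines = ["  " * depth + label]
    for child in children:
        lines.extend(show(child, depth + 1))
    return lines

tree = parse_s(tag("the man crossed the river"))
print("\n".join(show(tree)))
```

Each grammar rule becomes one function, so the call structure of the program mirrors the shape of the parse tree. A sentence the grammar does not cover, such as "the man crossed", makes parse_s return None.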
A PSG can
Capture some of the language structure
Model some significant features, such as recursion.
E.g. Direct recursion:
NP --> det NP
Indirect recursion:
NP --> noun PP
PP --> prep NP
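A rough sketch of how these recursive rules behave in a recogniser, working on tag sequences rather than words. The base rule NP --> noun is an assumption added here so that the recursion can stop; it does not appear in the rules above. The function names are hypothetical.

```python
# Recogniser for the recursive NP rules, operating on part-of-speech tags.
# Assumption: a base rule NP --> noun is added so the recursion terminates.

def parse_np(tags, i):
    """Return the position after the longest NP starting at i, or None."""
    if i >= len(tags):
        return None
    if tags[i] == "det":
        # NP --> det NP (direct recursion: NP calls itself)
        return parse_np(tags, i + 1)
    if tags[i] == "noun":
        # NP --> noun PP (indirect recursion: NP reaches itself via PP)
        after_pp = parse_pp(tags, i + 1)
        # Fall back to the assumed base rule NP --> noun
        return after_pp if after_pp is not None else i + 1
    return None

def parse_pp(tags, i):
    """PP --> prep NP, which leads back into parse_np."""
    if i < len(tags) and tags[i] == "prep":
        return parse_np(tags, i + 1)
    return None

def is_np(tags):
    """True if the whole tag sequence is a single NP."""
    return parse_np(tags, 0) == len(tags)

# "the man on the bridge over the river"
nested = ["det", "noun", "prep", "det", "noun", "prep", "det", "noun"]
print(is_np(nested))
```

Because of the indirect recursion, prepositional phrases can be nested to any depth. Note that the direct rule NP --> det NP also accepts the tag sequence det det noun ("the the man"), a small instance of a grammar allowing more than is wanted.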
Deal with some limited part-of-speech ambiguity
E.g. Can parse:
They water the flowers (water is a verb)
They swim in the water (water is a noun)
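One way to picture how a grammar resolves this ambiguity: list every tag the lexicon allows for each word, then keep only the tag sequences the grammar accepts. The sketch below assumes a small grammar that is not given above (S --> NP VP, NP --> pronoun | det noun, VP --> verb NP | verb PP, PP --> prep NP), and both the lexicon and the function names are illustrative.

```python
from itertools import product

# Assumed lexicon: "water" may be a noun or a verb.
LEXICON = {
    "they": ["pronoun"], "water": ["noun", "verb"], "the": ["det"],
    "flowers": ["noun"], "swim": ["verb"], "in": ["prep"],
}

def np_end(tags, i):
    """NP --> pronoun | det noun. Return position after the NP, or None."""
    if i < len(tags) and tags[i] == "pronoun":
        return i + 1
    if i + 1 < len(tags) and tags[i] == "det" and tags[i + 1] == "noun":
        return i + 2
    return None

def accepts(tags):
    """S --> NP VP, VP --> verb NP | verb PP, PP --> prep NP."""
    i = np_end(tags, 0)
    if i is None or i >= len(tags) or tags[i] != "verb":
        return False
    i += 1
    if i < len(tags) and tags[i] == "prep":
        i += 1  # the verb is followed by a PP rather than a bare NP
    return np_end(tags, i) == len(tags)

def valid_taggings(sentence):
    """All tag assignments for the sentence that the grammar accepts."""
    words = sentence.lower().split()
    options = [LEXICON[w] for w in words]
    return [list(zip(words, tags)) for tags in product(*options) if accepts(tags)]

for s in ["They water the flowers", "They swim in the water"]:
    print(s, "->", valid_taggings(s))
```

In each sentence only one of the two readings of "water" survives: the verb reading in the first and the noun reading in the second. The syntax alone is enough here because the sentences are short; that is what "limited" ambiguity means in this context.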
Limitations of PSG
Many unwanted parses are allowed.
With real language and longer sentences, many alternative parses are produced, as with other rule-based methods.
Syntactic knowledge alone is not enough. Semantic information is needed.
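To get a feel for how quickly alternative parses multiply, the sketch below counts the parses of a tag sequence under a small grammar. The grammar is an assumption chosen for illustration (it adds NP --> NP PP and VP --> VP PP so that a prepositional phrase can attach to either a noun phrase or a verb phrase), and count_parses is a hypothetical helper, not part of any tool mentioned in these notes.

```python
from functools import lru_cache

# Assumed grammar: every rule is (parent, left child, right child).
RULES = [
    ("S", "NP", "VP"),
    ("NP", "det", "noun"), ("NP", "NP", "PP"),
    ("VP", "verb", "NP"), ("VP", "VP", "PP"),
    ("PP", "prep", "NP"),
]

def count_parses(tags, goal="S"):
    """Count the distinct parse trees deriving the tag sequence from goal."""
    tags = tuple(tags)

    @lru_cache(maxsize=None)
    def count(cat, i, j):
        """Number of ways cat can derive tags[i:j]."""
        if j - i == 1:
            return 1 if tags[i] == cat else 0
        total = 0
        for parent, left, right in RULES:
            if parent == cat:
                # Try every split point between the two children.
                for k in range(i + 1, j):
                    total += count(left, i, k) * count(right, k, j)
        return total

    return count(goal, 0, len(tags))

# e.g. "the man saw the river" followed by n phrases like "with the telescope"
base = ["det", "noun", "verb", "det", "noun"]
pp = ["prep", "det", "noun"]
for n in range(1, 4):
    print(n, count_parses(base + pp * n))
# prints: 1 2, then 2 5, then 3 14
```

With one trailing prepositional phrase there are 2 parses, with two there are 5, and with three there are 14. The grammar cannot say which attachment is intended; choosing between them needs semantic information.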