
Natural Language Understanding

Understanding natural language means determining the meaning of a
sentence with respect to the context in which it is used.
It requires an analysis of the sentence on several different
levels:
Syntactic
Semantic
Pragmatic
Discourse
Syntactic: the syntax (grammar) of a sentence is checked for
correctness. Syntax is a tool for describing the structure of
sentences in a language.
Semantics: denotes the literal meaning we ascribe to a
sentence.
Pragmatics: refers to the intended meaning of a sentence, how
sentences are used in different contexts, and how context affects
their interpretation.
Discourse: refers to a conversation between two or more
individuals.
Basic Parsing Techniques
Context free grammars
S  → NP, VP
VP → verb, NP
NP → det, NP
NP → det, noun, NP
NP → det, adj*, NP
One can have top-down or bottom-up parsers.
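As a concrete illustration, a top-down parser for such a CFG can be
written directly in Prolog. The following is only a minimal sketch under
assumed names (rule/2, recognize/1 and match/3 are choices made here, not
part of these notes), using the rules NP → det, noun and
NP → det, adj, noun for concreteness and a small word_type/2 lexicon of
the kind used later in the RTN implementation:

/* grammar rules: left-hand side and a list of right-hand-side symbols */
rule(s,  [np, vp]).
rule(vp, [verb, np]).
rule(np, [det, noun]).
rule(np, [det, adj, noun]).

/* a tiny lexicon */
word_type(the, det).    word_type(an, det).
word_type(man, noun).   word_type(apple, noun).
word_type(eats, verb).

/* top-down, left-to-right expansion of the leftmost symbol:
   match(Symbols, Words, Rest) succeeds if Symbols derive a prefix of
   Words, leaving Rest unconsumed */
recognize(Words) :- match([s], Words, []).

match([], Rest, Rest).
match([Sym|Syms], [W|Ws], Rest) :-     /* terminal category: consume a word */
    word_type(W, Sym),
    match(Syms, Ws, Rest).
match([Sym|Syms], Words, Rest) :-      /* non-terminal: expand a rule */
    rule(Sym, Body),
    match(Body, Words, Mid),
    match(Syms, Mid, Rest).

/* ?- recognize([the, man, eats, an, apple]).   succeeds */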
Simple Transition Networks
A more convenient way of visualizing a grammar (CFG).
It consists of nodes and labeled arcs.
For example, the network for NP is shown below:

NP --det--> NP1 --noun--> NP2 --pop-->
NP1 --adj--> NP1   (adj self-loop at NP1)
Arcs are labeled by word category.
Starting at a given node, we can traverse an arc if the current word in
the sentence belongs to the category labelling that arc.
This network recognizes the same set of noun phrases as the following
CFG:
NP  → det, NP1
NP1 → adj, NP1
NP1 → noun
The simple transition network formalism is not powerful enough to
describe all languages that can be described by a CFG.
A recursive grammar cannot be defined by a simple transition network.
A Prolog encoding of the NP network above is sketched below.
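The NP network can be encoded directly in Prolog. This is only an
illustrative sketch; the predicate names arc/3, pop/1 and traverse/3 are
assumptions made here (the notes use a different, state-based
implementation later), and word_type/2 is the same lexicon format used in
that implementation:

/* arcs of the NP network: arc(FromNode, WordCategory, ToNode) */
arc(np,  det,  np1).
arc(np1, adj,  np1).
arc(np1, noun, np2).
pop(np2).                      /* np2 has a pop arc */

word_type(the, det).  word_type(old, adj).  word_type(man, noun).

/* traverse(Node, Words, Rest): starting at Node, consume a prefix of
   Words until a pop arc is reached; Rest is the remaining input */
traverse(Node, Rest, Rest) :- pop(Node).
traverse(Node, [W|Ws], Rest) :-
    arc(Node, Cat, Next),
    word_type(W, Cat),
    traverse(Next, Ws, Rest).

/* ?- traverse(np, [the, old, man], []).   succeeds */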
Recursive Transition Networks (RTN)
To get the descriptive power of CFG, you need a notion of
recursion in the network grammar.
An RTN is like a simple transition network except that it also allows
arc labels that refer to other networks rather than word categories.
The network for simple English sentences can be expressed as:

S --NP--> S1 --verb--> S2 --NP--> S3 --pop-->
Uppercase labels on arcs refer to networks.
The arc from S to S1 can be followed only if the NP network can be
successfully traversed to a pop arc.
An RTN may even have an arc labelled with its own network's name.
Any language generated by a CFG can be generated by an RTN and vice
versa; thus they are equivalent in their generative capacity.
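Extending the earlier sketch (again with assumed predicate names, and
repeating the NP network for completeness), a push arc can be represented
by labelling the arc with a term net(Name); following it means recursively
traversing that sub-network to a pop arc before continuing in the calling
network:

/* the S network: the two NP arcs are push arcs */
arc(s,  net(np), s1).
arc(s1, verb,    s2).
arc(s2, net(np), s3).
pop(s3).

/* the NP sub-network, as before */
arc(np,  det,  np1).
arc(np1, adj,  np1).
arc(np1, noun, np2).
pop(np2).

word_type(the, det).   word_type(an, det).
word_type(man, noun).  word_type(apple, noun).
word_type(eats, verb).

traverse(Node, Rest, Rest) :- pop(Node).
traverse(Node, [W|Ws], Rest) :-        /* word-category arc */
    arc(Node, Cat, Next), atom(Cat),
    word_type(W, Cat),
    traverse(Next, Ws, Rest).
traverse(Node, Words, Rest) :-         /* push arc: recurse into sub-network */
    arc(Node, net(Sub), Next),
    traverse(Sub, Words, Mid),
    traverse(Next, Mid, Rest).

/* ?- traverse(s, [the, man, eats, an, apple], []).   succeeds */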
Implementation of RTN in Prolog
The vocabulary for the RTN can be stored as a set of facts:
word_type(an, det).
word_type(the, det).
word_type(man, noun).
word_type(apple, noun).
word_type(eats, verb).
The top-level clause for the RTN is named run and is defined as follows:
run :- set_state(s0), writeln('Enter your sentence'),
       readln(Sent), analyze(Sent),
       writeln('Your sentence is syntactically correct'),
       clear_state.
run :- writeln('Your sentence is syntactically wrong'),
       clear_state.
set_state(S) :- asserta(current_state(S)).
/* asserta/1 puts the new state first so that get_state/1 always sees
   the most recent state; the state is initialized to s0 */
/* the input sentence is a list of words ending with the atom '.' */
analyze(S) :- S = ['.'], final_state(_).
analyze(S) :- get_state(NS), !, transition(NS, S, S1),
              analyze(S1).
get_state(S) :- current_state(S).
/* state transitions */
transition(s0, A, B) :- check_np(np, A, B),
                        set_state(s1).
transition(s1, A, B) :- get_token(A, W, B),
                        word_type(W, verb),
                        set_state(s2).
transition(s2, A, B) :- check_np(np, A, B),
                        set_state(s3).
transition(np, A, B) :- get_token(A, W, B),
                        word_type(W, det),
                        set_state(np1).
transition(np1, A, B) :- get_token(A, W, B),
                         word_type(W, noun),
                         set_state(np2).
transition(np1, A, B) :- get_token(A, W, B),
                         word_type(W, adj),
                         set_state(np1).
check_np(np2, A, B) :- A = B, !.
check_np(St, A, B) :- transition(St, A, C),
get_state(Ns),
check_np(Ns, C, B), !.
final_state(s3) :- current_state(s3).
/* get_token/3 binds W to the first word of A and unifies B with the
   remaining sequence of words */
get_token([W|B], W, B).
clear_state :- retract(current_state(_)), fail.
clear_state.
Query:
?- run.
the man eats an apple.
Yes.
?- run.
man eat apple.
Yes. (This is actually wrong, but we have not taken number agreement
into account.)
These formalisms are limited in the following way: they can only accept
or reject a sentence rather than produce an analysis of the structure
of the sentence.
Augmenting the RTN formalism, which involves generalizing the network
notation, introducing more information about the words, and collecting
and testing features while parsing, gives the Augmented Transition
Network (ATN).
A similar augmentation and extension of CFG is the Definite Clause
Grammar (DCG); a sketch is given below.
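Prolog supports DCGs directly. The following is only an assumed
illustration (the rule names and the shape of the returned term are
choices made here, not the notes' grammar): each non-terminal carries an
extra argument that builds the parse structure, and word categories are
checked against the word_type/2 lexicon facts listed earlier:

s(s(NP, VP))       --> np(NP), vp(VP).
vp(vp(V, NP))      --> [V], { word_type(V, verb) }, np(NP).
np(np(D, Adjs, N)) --> [D], { word_type(D, det) },
                       adjs(Adjs),
                       [N], { word_type(N, noun) }.
adjs([A|As])       --> [A], { word_type(A, adj) }, adjs(As).
adjs([])           --> [].

/* ?- s(Tree, [the, man, eats, an, apple], []).
      Tree = s(np(the, [], man), vp(eats, np(an, [], apple)))  */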
Recording Sentence Structure while Parsing in an ATN
We want to collect the structure of legal sentences in order to analyze
them further.
For instance, we can identify one particular noun phrase as the
syntactic subject (SUBJ) of a sentence and another as the syntactic
object of the verb (OBJ).
Within a noun phrase we might identify the determiner structure, the
adjectives, the head noun and so on.
Thus the sentence "Jack found a bag" might be represented by the
structure:
(S (SUBJ (NP NAME jack)
    MAIN-V found
    TENSE PAST
    OBJ (NP DET a
            HEAD bag)))
Such a structure is created with an RTN parser by allowing each network
to have a set of registers.
Registers are local to each network. Each time a new network is pushed,
a new set of empty registers is created. When the network is popped,
the registers disappear.
Registers can be set to values while parsing, and these values can
later be retrieved.
The NP network has registers named DET, ADJS, HEAD and NUM.
Registers are set by actions that can be specified on the arc.
When an arc is followed, the actions associated with it are
executed.
The most common action involves setting a register to a certain value.
When a pop arc is followed, all the registers set in the current
network are automatically collected to form a structure consisting of
the network name followed by a list of the registers with their values.
When a category arc, such as name or verb, is followed, the current
input word is put into a special variable named *.
S1 --name--> S2
Thus a plausible action on the arc from S1 to S2 would be to set the
NAME register to the current word.
It is written as
NAME ← *
Arcs labelled with a network name, such as NP (called push arcs), must
be treated differently.
Typically many words will be consumed by the network called through a
push arc.
The pushed network has its own set of registers that capture the
structure of the constituent that was parsed.
The structure built by the pushed network is returned in the value of *.
S --NP--> S1

Thus the action on the arc from S to S1 might be SUBJ ← *.
Therefore, an RTN with registers, together with tests and actions on
those registers, is an Augmented Transition Network (ATN).
S network:  S --NP--> S1 --verb--> S2 --NP--> S3 --pop-->
            S2 --jump--> S3

NP network: NP --det--> NP1 --noun--> NP2 --pop-->
            NP --name--> NP2
            NP1 --adj--> NP1
Arc     Test                      Actions
NP/1    none                      DET ← *
                                  NUM ← NUM*
NP/2    none                      NAME ← *
                                  NUM ← NUM*
NP1/1   NUM ∩ NUM* ≠ {}           HEAD ← *
        (if the test fails,       NUM ← NUM ∩ NUM*
        the arc cannot be taken)
NP1/2   none                      ADJS ← Append(ADJS, *)
S/1     none                      SUBJ ← *
S1/1    NUM_SUBJ ∩ NUM* ≠ {}      MAIN_V ← *
        (if the test fails,       NUM ← NUM_SUBJ ∩ NUM*
        the arc cannot be taken)
S2/1    none                      OBJ ← *
Notations:
NUM*      is the NUM register of the structure in *
NUM_SUBJ  is the NUM register of the structure in SUBJ
The values of the registers are often viewed as sets, and the
intersection (∩) and union (∪) of sets are allowed to combine the
values of different registers.
For registers that may take a list of values, an Append function is
permitted; Append(ADJS, *) returns the list of adjectives collected so
far.
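The tests on arcs NP1/1 and S1/1 are simply non-empty set intersections.
A minimal Prolog sketch of such a number-agreement check (the predicate
name num_agree/3 is an assumption made here; intersection/3 comes from
SWI-Prolog's library(lists)):

:- use_module(library(lists)).

/* num_agree(+Num1, +Num2, -Num): succeeds if the two NUM sets share at
   least one value; Num is the resulting (intersected) NUM value */
num_agree(Num1, Num2, Num) :-
    intersection(Num1, Num2, Num),
    Num \= [].

/* ?- num_agree([s, p], [p], N).   N = [p]  ("the" agrees with "dogs") */
/* ?- num_agree([s], [p], N).      fails: no agreement                 */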
EXAMPLE:
The following sentence, with word positions indicated, is used:

1 The  2 dogs  3 love  4 john  5 .
A simple lexicon:

Word   Representation
dogs   (NOUN ROOT dog  NUM {p})
dog    (NOUN ROOT dog  NUM {s})
the    (DET  ROOT the  NUM {s, p})
love   (VERB ROOT love NUM {p})
john   (NAME ROOT john)
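If this lexicon were carried over into the Prolog implementation below,
one possible representation (an assumption made here, richer than the
word_type/2 facts used earlier) is one fact per word holding its
category, root and NUM set:

lex(dogs, noun, dog,  [p]).
lex(dog,  noun, dog,  [s]).
lex(the,  det,  the,  [s, p]).
lex(love, verb, love, [p]).
lex(john, name, john, []).      /* no NUM set is given for john above */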
Trace of the S Network

Step  Node  Position  Arc followed                   Registers set
1     S     1         S/1                            -
2     NP    1         NP/1                           DET ← the, NUM ← {s, p}
3     NP1   2         NP1/1 (check {s, p} ∩ {p})     HEAD ← dogs, NUM ← {p}
4     NP2   3         NP2/1 (pop) returns structure  (NP DET the
                                                          HEAD dogs
                                                          NUM {p})
5     S1    3         S/1 succeeds                   SUBJ ← (NP DET the
                                                                 HEAD dogs
                                                                 NUM {p})
                      S1/1 (check {p} ∩ {p})         MAIN_V ← love, NUM ← {p}
6     S2    4         S2/1                           OBJ ← *
7     NP    4         NP/2                           NAME ← john, NUM ← {p}
8     NP2   5         NP2/1 (pop) returns structure  OBJ ← (NP NAME john
                                                                NUM {p})
9     S3    5         S3/1 (pop) succeeds, returns   (S SUBJ (NP DET the
                                                                  HEAD dogs
                                                                  NUM {p})
                                                         MAIN_V love
                                                         NUM {p}
                                                         OBJ (NP NAME john
                                                                 NUM {p}))
Implementation of ATN in Prolog
Database clauses will be used to store and read the registers.
run :- set_state(s0),            /* initialize state */
       write('ATN analyses your sentence'), nl,
       write('Please type in your sentence'),
       readln(Sent), analyse(Sent),
       write('Your sentence is syntactically correct'), nl,
       clear_dbase.
run :- write('Your sentence is syntactically wrong'),
       clear_dbase.
/* clear all state and register facts asserted during the parse;
   retract/1 needs a concrete clause head, so each is cleared in turn */
clear_dbase :- retractall(current_state(_)), retractall(type_reg(_)),
               retractall(verb_reg(_)), retractall(subj_reg(_)),
               retractall(det_reg(_)), retractall(adj_reg(_)),
               retractall(noun_reg(_)).
analyse(S) :- S = ['.'], final_state(_).
analyse(S) :- current_state(N_state), !,
              transition(N_state, S, S1), !,
              analyse(S1).
/* main transitions */
transition(s0, A, B) :- get_token(A, W, B),
                        word_type(W, verb),
                        asserta(type_reg(quest)),    /* a question */
                        asserta(verb_reg(W)), set_state(s2).
transition(s0, A, B) :- check_np(np, A, B),
                        asserta(type_reg(decl)),     /* a declarative */
                        build_phrase(np, STR),
                        assert(subj_reg(STR)), set_state(s1).
/* NP transition */
check_np(np2, C, B) :- B = C, !.
check_np(np3, C, B) :- B = C, !.
check_np(St, A, B) :- transition(St, A, C),
get_state(N), check_np(N, C, B).
/* Build phrases: fill the NP template with the register values
   collected while parsing the noun phrase */
build_phrase(np, STR) :- det_reg(DET),
                         adj_reg(ADJ), noun_reg(NOUN),
                         get_template(np, T),
                         fill_template(T, DET, T1),
                         fill_template(T1, ADJ, T2),
                         fill_template(T2, NOUN, STR).
