Professional Documents
Culture Documents
Yoram Singer
Abstract
INTRODUCTION
Markov models are a natural candidate for language modeling and temporal pattern recognition,
mostly due to their mathematical simplicity. However, it is obvious that nite memory Markov models cannot capture the recursive nature of language, nor can they be trained eectively with
long memories. The notion of variable context
length also appears naturally in the context of universal coding (Rissanen, 1978; Rissanen and Langdon, 1981). This information theoretic notion is
now known to be closely related to ecient modeling (Rissanen, 1988). The natural measure that
Si
Si+1
P(Ti+1|Si+1)
P(Ti|Si)
;n
;n
;n
;n
;n
P(t1;n; w1;n) =
P(t1)P(w1jt1)P(t2jt1 ; w1)P(w2jt1;2; w1)
: : :P(tnjt1;n?1; w1;n?1)P(wnjt1;n; w1;n?1)
n
Y
= P(t jt
i=1
With the simplifying assumption that the probability of a tag only depends on previous tags and
that the probability of a word only depends on its
tags, we get:
P(t1;n; w1;n) =
Yn P(t jt
i=1
i 1;i?1)P(wijti)
;n
Yn P(t jS ; M)P(w jt )
i i?1
i i
i=1
i; k
l
i; k
l
i; k
n
ANALYSIS OF RESULTS
VMM:
correct:
NN
VBD
NNS
VBN
JJ
VB
CS
NP
IN
VBG
RB
QL
JJ VBN NN VBD
259
228
63
CS NP RP QL RB VB VBG
102 100
227
165
142
110
IN
194
69
219
66
71
112
94
103
63
76
64
DISCUSSION
ACKNOWLEDGMENT
Part of this work was done while the second author was visiting the Department of Computer
and Information Sciences, University of California,
Santa-Cruz, supported by NSF grant IRI-9123692.
We would like to thank Jan Pedersen and Naftali Tishby for helpful suggestions and discussions
of this material. Yoram Singer would like to thank
the Charles Clore foundation for supporting this
research. We express our appreciation to faculty
and students for the stimulating atmosphere at
the 1993 Connectionist Models Summer School at
which the idea for this paper took shape.
References
174, 1974.
J. Berger, Statistical decision theory and Bayesian
analysis, New-York: Springer-Verlag, 1985.
E. Brill. Automatic grammar induction and parsing free text: A transformation-based approach. In Proceedings of ACL 31, pp. 259{
265, 1993.
E. Charniak, Curtis Hendrickson, Neil Jacobson,
and Mike Perkowitz, Equations for Part-ofSpeech Tagging, Proceedings of the Eleventh
National Conference on Articial Intelligence,
pp. 784{789, 1993.