Professional Documents
Culture Documents
Name
unigram
bigram
trigram
4-gram?
Example
sky
the sky
across the sky
comes across the sky
Probability of a bigram w1 w2
c(w1 w2)
P (w1 w2) =
N
Unigrams
a
screaming
comes
across
the
sky
c(w1 . . . wn)
N
Bigrams
<s1> a
a screaming
screaming comes
comes across
across the
the sky
Trigrams
<s1> <s2> a
<s2> a screaming
a screaming comes
screaming comes across
comes across the
across the sky
If the arguments are vectors, they are concatenated term-byterm, and recycled as needed
> paste(c("a","b"), c(1,2))
[1] "a 1" "b 2"
# Again, remove the first element and add a period at the end
> uni3 <- c(uni2[-1], ".")
"across" "the"
"sky"
"."
"."
> uni2
[1] "screaming" "comes" "across" "the" "sky" "."
> uni3
[1] "comes"
of the
the lord
and the
in the
and he
shall be
to the
all the
and they
unto the
i will
of israel
for the
the king
said unto
and the
6268
in the
5031
and he shall be
2791
2461
> head(freq)
the children of
1355
the lord and
816
the son of
1450
saith the lord
854
0.01457614
0.008884348
0.00791572
0.00635354
0.003524693
0.003107943
0.002717714
0.002707611
0.002634364
0.002566169
0.002427252
0.002143104
0.002115321
0.002095115
0.002082486
# Again, remove the first element and add a period at the end
> tokens3 <- c(tokens2[-1], ".")
trigrams
of the lord
1775
the house of
883
11542
7035
6268
5031
2791
2461
2152
2144
2086
2032
1922
1697
1675
1659
1649
10
1775
1450
1355
883
854
816
805
672
647
617
571
561
560
510
509
0.002241609
0.001831173
0.0017112
0.001115121
0.001078498
0.001030509
0.001016617
0.0008486542
0.0008170822
0.0007791958
0.0007211035
0.0007084747
0.0007072118
0.0006440679
0.000642805
11
Name
unigram
bigram
trigram
4-gram?
Example
sky
the sky
across the sky
comes across the sky
12
13
Independence of events
14
15
Conditional probability
P (A|B) =
P (A B)
P (B)
16
17
More examples
P (wn|w1 . . . wn1) =
P (F E)
P (E)
1
1
6
1 = 3
Bigram example
P (lunch|eat) =
P (eat lunch)
P (eat)
Trigram example
P (lunch|to eat) =
P (w1 . . . wn)
P (w1 . . . wn1)
18
19
c(w1 . . . wn)
c(w1 . . . wn1)
c(of the)
1496
=
= 0.086
c(of )
17481
c(eat lunch)
c(eat)
Trigram example
P (lunch|to eat) =
Bigram example
P (lunch|eat) =
P (king|the) =
c(to eat lunch)
c(to eat)
877
c(the king)
=
= 0.032
c(the)
27378
20
21
22
203,759
176,022
120,449
118,057
36,444
23
Smoothing
Add-one smoothing
24
Good-Turing discounting
Good-Turing smoothes P (a) =
ca
N
to P 0(a) =
ca
N
Nc+1
Nc ,
25
c (GT)
0.0000270
0.446
1.26
2.24
3.24
4.22
5.19
6.21
7.24
8.25
C(a) + 1
N +Vn
c (MLE)
0
1
2
3
4
5
6
7
8
9
P 0(a) =
+3P (wi)
Reasonable weights might be 1 = 0.6, 2 = 0.3, 3 = 0.1
A better method is to compute separate weights for each
bigram: 1(wi2 wi1), 2(wi2 wi1), 3(wi2 wi1)
The larger C(wi2 wi1) is, the more weight we give to 1
26
27
P P (W ) = P (w1 . . . w )
1
P (w1 . . . wN )
Model
Unigram
Bigram
Trigram
QN
28
P (w1 . . . wn)
P (w1 . . . wn1)
Perplexity
2H(p,m)
962
170
109
Cross
entropy
H(p, m)
9.91
7.41
6.77
30
29