(1 slide)
(2 slides)
(2 slides)
FM-Index(13 slides)
(12 slides)
(1 slide)
(1 slide)
(2 slides)
(1 slide)
ID: echizen_tm
Blog: EchizenBlog-Zwei
(http://d.hatena.ne.jp/echizen_tm/)
web
()
(1/2)
()
LOUDS
(Information-Theoretical Lower Bound = ITLB)
(O(1) or O(log N))
ic
(2/2)
(DSIRNLP#2)
LOUDS
(DSIRNLP#1)
(DSIRNLP#3)
(1/2)
(Full-Text Search Engine)
()
(Inverted Index)
(Suffix Array)
(2/2)
FM-Index
(Suffix Array)
(4)
FM-Index
(0.3)
FM-Index(1/13)
FM-Index
Proposed by Ferragina and Manzini
[Ferragina+ 2000]
Name: Ferragina & Manzini Index
Uses the Burrows-Wheeler Transform (BWT)
A self-index
[Ferragina+ 2004]
FM-Index(2/13)
(Suffix Array)
mississippi
(1) Enumerate the suffixes (Suffix), written here as rotations of mississippi#

Position  Rotation
 0        mississippi#
 1        ississippi#m
 2        ssissippi#mi
 3        sissippi#mis
 4        issippi#miss
 5        ssippi#missi
 6        sippi#missis
 7        ippi#mississ
 8        ppi#mississi
 9        pi#mississip
10        i#mississipp
11        #mississippi

Sorted in lexicographic order (# sorts first):

Position  Rotation
11        #mississippi
10        i#mississipp
 7        ippi#mississ
 4        issippi#miss
 1        ississippi#m
 0        mississippi#
 9        pi#mississip
 8        ppi#mississi
 6        sippi#missis
 3        sissippi#mis
 5        ssippi#missi
 2        ssissippi#mi
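The sort above can be sketched in C++ as follows. This is a minimal illustration, not the presenter's code: the function name build_suffix_array is hypothetical, the text is assumed to end with a unique sentinel '#' that compares smaller than every other character, and the plain comparison sort is chosen for clarity rather than speed.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical helper: returns the start positions of the rotations of `text`
// in lexicographic order. Assumes `text` ends with a unique sentinel ('#')
// that is smaller than every other character.
std::vector<int> build_suffix_array(const std::string& text) {
  const int n = static_cast<int>(text.size());
  std::vector<int> sa(n);
  std::iota(sa.begin(), sa.end(), 0);  // start positions 0, 1, ..., n-1
  std::sort(sa.begin(), sa.end(), [&](int a, int b) {
    // Compare the rotations starting at a and b character by character.
    for (int k = 0; k < n; ++k) {
      const char ca = text[(a + k) % n];
      const char cb = text[(b + k) % n];
      if (ca != cb) return ca < cb;
    }
    return false;  // identical rotations (cannot happen with a unique sentinel)
  });
  return sa;
}
```

For "mississippi#" this should give the position column of the sorted table: 11, 10, 7, 4, 1, 0, 9, 8, 6, 3, 5, 2.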
FM-Index(3/13)
(Suffix Array)
mississippi
(2)
(3)
Position  Rotation (sorted)
11        #mississippi
10        i#mississipp
 7        ippi#mississ
 4        issippi#miss
 1        ississippi#m
 0        mississippi#
 9        pi#mississip
 8        ppi#mississi
 6        sippi#missis
 3        sissippi#mis
 5        ssippi#missi
 2        ssissippi#mi
FM-Index(4/13)
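Text length: N characters
Each suffix array entry: 4 bytes
Text (N) + suffix array (4N)
= 5N bytes (5 times the text size)
Text (O(N)) + suffix array (O(N log N))
= O(N log N)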
FM-Index(5/13)
FM-Index
Uses the Burrows-Wheeler Transform (BWT)
BWT text (N)
Auxiliary data for the BWT text
(size o(N))
FM-Index size:
N + o(N)
(o(N)0.3
= 1.33)
FM-Index(6/13)
Burrows-Wheeler(BWT)
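BWT: sort the rotations and take the last character of each row

Sorted rotation  Last character
#mississippi     i
i#mississipp     p
ippi#mississ     s
issippi#miss     s
ississippi#m     m
mississippi#     #
pi#mississip     p
ppi#mississi     i
sippi#missis     s
sissippi#mis     s
ssippi#missi     i
ssissippi#mi     i

BWT text: ipssm#pissii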
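As a rough sketch of the transform shown above (the function name bwt is hypothetical, and the text is assumed to end with a unique sentinel '#' smaller than every other character), the BWT can be computed by sorting the rotations and reading the last column:

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical helper: returns the BWT of `text`, i.e. the last column of its
// sorted rotations. Assumes `text` ends with a unique sentinel ('#') that is
// smaller than every other character.
std::string bwt(const std::string& text) {
  const int n = static_cast<int>(text.size());
  std::vector<int> rot(n);
  std::iota(rot.begin(), rot.end(), 0);
  std::sort(rot.begin(), rot.end(), [&](int a, int b) {
    // Lexicographic comparison of the rotations starting at a and b.
    for (int k = 0; k < n; ++k) {
      const char ca = text[(a + k) % n];
      const char cb = text[(b + k) % n];
      if (ca != cb) return ca < cb;
    }
    return false;
  });
  std::string last(n, ' ');
  for (int i = 0; i < n; ++i) {
    // The last character of a rotation is the one just before its start.
    last[i] = text[(rot[i] + n - 1) % n];
  }
  return last;
}
```

Sorting whole rotations is only for illustration; it is far from the fastest way to build the BWT.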
FM-Index(7/13)
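Restoring the original text from the BWT text T: O(N)
Each step: O(1)
Uses the LF mapping (LF())

LF step        Character read
LF(0) = 1      T[0] = i
LF(1) = 6      T[1] = p
LF(6) = 7      T[6] = p
LF(7) = 2      T[7] = i
LF(2) = 8      T[2] = s
LF(8) = 10     T[8] = s
LF(10) = 3     T[10] = i
LF(3) = 9      T[3] = s
LF(9) = 11     T[9] = s
LF(11) = 4     T[11] = i
LF(4) = 5      T[4] = m

The characters are read from the end of the text: i, p, p, i, s, s, i, s, s, i, m reversed is mississippi.

Row  Sorted rotation  T[row]
 0   #mississippi     i
 1   i#mississipp     p
 2   ippi#mississ     s
 3   issippi#miss     s
 4   ississippi#m     m
 5   mississippi#     #
 6   pi#mississip     p
 7   ppi#mississi     i
 8   sippi#missis     s
 9   sissippi#mis     s
10   ssippi#missi     i
11   ssissippi#mi     i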
FM-Index(8/13)
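Definition of the LF mapping (LF())
LF(i) = (number of characters in T smaller than T[i])
+
(number of occurrences of T[i] in the first i characters of T)
Example: T = ipssm#pissii, T[9] = s
LF(9) = (number of characters smaller than s)
(# × 1 + i × 4 + m × 1 + p × 2 = 8)
+
(number of s before position 9: T[2], T[3], T[8])
= 8 + 3
= 11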
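The LF computation above can be sketched in C++ as follows, under stated assumptions: the names lf_mapping and inverse_bwt are hypothetical, T is treated as a byte string (256 possible character values), and the sentinel is assumed to be the smallest character so that row 0 is the rotation beginning with it. The occurrence counts are kept in a running table here, which is exactly the part a real FM-Index replaces with rank.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: LF(i) = (characters in t smaller than t[i])
//                            + (occurrences of t[i] in the first i characters).
std::vector<int> lf_mapping(const std::string& t) {
  std::array<int, 256> count{};
  for (unsigned char c : t) ++count[c];
  std::array<int, 256> smaller{};  // smaller[c] = characters in t less than c
  int sum = 0;
  for (int c = 0; c < 256; ++c) {
    smaller[c] = sum;
    sum += count[c];
  }
  std::array<int, 256> seen{};  // occurrences of each character so far
  std::vector<int> lf(t.size());
  for (std::size_t i = 0; i < t.size(); ++i) {
    const unsigned char c = static_cast<unsigned char>(t[i]);
    lf[i] = smaller[c] + seen[c];
    ++seen[c];
  }
  return lf;
}

// Hypothetical helper: restores the original text (without the sentinel).
// Assumes the sentinel is the smallest character, so row 0 starts with it.
std::string inverse_bwt(const std::string& t) {
  const std::vector<int> lf = lf_mapping(t);
  std::string out;
  int row = 0;
  for (std::size_t k = 0; k + 1 < t.size(); ++k) {
    out.push_back(t[row]);  // characters come out last-to-first
    row = lf[row];
  }
  std::reverse(out.begin(), out.end());
  return out;
}
```

With T = ipssm#pissii, lf_mapping gives LF(9) = 11 as in the worked example, and inverse_bwt walks rows 0, 1, 6, 7, 2, 8, 10, 3, 9, 11, 4.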
FM-Index(9/13)
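Computing the LF mapping (LF())
LF(i) = (number of characters in T smaller than T[i])
+
(number of occurrences of T[i] in the first i characters of T)
First term: number of characters in T smaller than T[i]
(a table of 256 entries)
Second term: number of occurrences of T[i] in the first i characters of T
(a naive table needs (256) × (N) entries)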
FM-Index(10/13)
Number of occurrences of T[i] in the first i characters of T
FM-Index(11/13)
DSIRNLP#2
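(O(1) or O(log N))
rank(i) = number of 1s in the first i bits
select(i) = position of the i-th 1
rank() for characters:
number of occurrences of c in the first i characters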
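To make the two operations concrete, here is a naive sketch. The names rank1 and select1 are hypothetical, the bit vector is a string of '0' and '1' for readability, and the indexing convention (0-based positions, i-th 1 counted from 1) is an assumption. Both functions scan linearly, so they only show the semantics; a succinct structure answers the same queries in O(1) or O(log N).

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helper: number of '1' bits among the first i bits.
int rank1(const std::string& bits, int i) {
  const int end = std::min<int>(i, static_cast<int>(bits.size()));
  return static_cast<int>(std::count(bits.begin(), bits.begin() + end, '1'));
}

// Hypothetical helper: 0-based position of the i-th '1' (i counted from 1),
// or -1 if there are fewer than i ones.
int select1(const std::string& bits, int i) {
  int seen = 0;
  for (int pos = 0; pos < static_cast<int>(bits.size()); ++pos) {
    if (bits[pos] == '1' && ++seen == i) return pos;
  }
  return -1;
}
```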
FM-Index(12/13)
LOUDS
BP
DFUDS
FM-Index(13/13)
/
(4)
BWT(FM-Index)
BWT
ic
(1/12)
(Wavelet Tree)
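For length N: O(N) + o(N)
O(1) or O(log N)
rank(i, c): number of occurrences of c in the first i characters
select(i, c): position of the i-th c
The FM-Index relies on rank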
(2/12)
0, 1: 2 symbols
a, b, c, d: 4 symbols
(3/12)
Example: abcdabdc
rank(5, a) = 2
The first 5 characters of abcdabdc contain 2 occurrences of a
(4/12)
rank(5, a) on abcdabdc
Split the 4 symbols
into 2 groups
abcdabdc
abab (a, b)
cddc (c, d)
(5/12)
abab => 0101
cddc => 0110
Each half is now a bit vector,
so bit-vector rank applies
rank(5, a) on abcdabdc
becomes rank(i, a) on abab
i is also found with rank
(6/12)
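In the first 5 characters of abcdabdc:
a × 2, b × 1
a and b together: 3
rank(5, a)
= number of a in the first 5 characters
The first 5 characters contain 3 characters that are a or b
= rank(3, a) on abab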
(7/12)
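Among the first 5 characters of abcdabdc,
count those that are a or b (those that go to abab)
Encode abcdabdc as bits:
characters of abab: 0
characters of cddc: 1
abcdabdc => 00110011
rank(5, 0) = 3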
(8/12)
rank(5, a) on abcdabdc
On abab:
rank(3, a)
On 0101:
rank(3, 0)
rank(3, 0) = 2
(9/12)
rank(5, a) on abcdabdc
Split into abab, cddc
a, b => 0; c, d => 1
rank(5, 0) on 00110011
rank(5, 0) = 3
(10/12)
bv = abcdabdc
= 00110011 (abcdabdc)
y[0] = 0101 (abab)
y[1] = 0110 (cddc)
(11/12)
bv = abcdabdc
= 00110011 (abcdabdc)
y[0] = 0101 (abab)
y[1] = 0110 (cddc)
(12/12)
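With 4 symbols,
rank(i, c)
follows the bits of c,
one bit-vector rank per level
4 symbols => 2
256 symbols => 8
1 character = 1 byte
A character rank costs 8 bit-vector rank calls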
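The two-level walk for the 4-symbol example can be sketched as below. This is an illustration only: the struct name TinyWaveletTree is hypothetical, the alphabet is fixed to a, b, c, d (coded 00, 01, 10, 11), and the bit-vector rank is a linear scan where a real wavelet tree would use a succinct bit vector. The members bv, y[0] and y[1] follow the names used on the slides.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical two-level wavelet tree over the alphabet {a, b, c, d}.
struct TinyWaveletTree {
  std::string bv;    // top level: 0 for a/b, 1 for c/d
  std::string y[2];  // second level: y[0] for a/b, y[1] for c/d

  explicit TinyWaveletTree(const std::string& s) {
    for (char ch : s) {
      const int code = ch - 'a';
      assert(code >= 0 && code < 4);  // only a, b, c, d are supported
      const int hi = code >> 1;       // which half the character goes to
      const int lo = code & 1;        // its bit inside that half
      bv.push_back(static_cast<char>('0' + hi));
      y[hi].push_back(static_cast<char>('0' + lo));
    }
  }

  // Number of occurrences of `bit` among the first i bits (naive scan).
  static int bit_rank(const std::string& bits, int i, char bit) {
    const int end = std::min<int>(i, static_cast<int>(bits.size()));
    return static_cast<int>(std::count(bits.begin(), bits.begin() + end, bit));
  }

  // rank(i, c): occurrences of c in the first i characters.
  int rank(int i, char c) const {
    const int code = c - 'a';
    const int hi = code >> 1;
    const int lo = code & 1;
    // Level 1: how many of the first i characters fall into c's half.
    const int j = bit_rank(bv, i, static_cast<char>('0' + hi));
    // Level 2: how many of those are c itself.
    return bit_rank(y[hi], j, static_cast<char>('0' + lo));
  }
};
```

For abcdabdc this builds bv = 00110011, y[0] = 0101 and y[1] = 0110, and rank(5, 'a') goes through rank(5, 0) = 3 on bv and then rank(3, 0) = 2 on y[0], matching the walkthrough.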
FM-Index
FM-Index
FM-Index uses the Burrows-Wheeler Transform
FM-Index
LOUDS
(
)
(1/2)
FM-Index library: Shellinford
Shellinford
Shellinford()
(2/2)
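shellinford::fm_index fm;
fm.push_back();        // argument (a document string) is missing in the source
fm.push_back();
fm.push_back();
fm.search(, values);   // the query string argument is missing in the source
i = values.begin();    // the declaration of i is not shown in the source
while (i != values.end()) {
  cout << fm.get_document(i->first) << endl;  // print each matching document
  i++;
}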
[Ferragina+ 2000]
Opportunistic Data Structures with Applications, FOCS 2000
[Ferragina+ 2004]
An Alphabet-Friendly FM-index, SPIRE 2004
(FM-Index)
FM-index version2 (http://www.di.unipi.it/~ferragin/Libraries/fmindexV2/index.html)
FM-index++ (http://code.google.com/p/fmindex-plus-plus/)
Shellinford (http://code.google.com/p/shellinford/)

wat-array (http://code.google.com/p/wat-array/)