
echizen_tm

Mar. 24, 2012


(1 slide)

(2 slides)
(2 slides)
FM-Index(13 slides)

(12 slides)
(1 slide)
(1 slide)

(2 slides)
(1 slide)


ID: echizen_tm

EchizenBlog-Zwei
(http://d.hatena.ne.jp/echizen_tm/)

web

()

(1/2)

()


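LOUDS
Size close to the information-theoretical lower bound
(Information Theoretical Lower Bound = ITLB)
Operations run in O(1) or O(log N)
e.g. the number of occurrences of character c up to position i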

(2/2)

(DSIRNLP#2)

LOUDS
(DSIRNLP#1)

(DSIRNLP#3)

(1/2)
(Full-Text Search Engine)

()


(Inverted Index)



(Suffix Array)


(2/2)
FM-Index

(Suffix Array)



(index is about 4x the text size)

FM-Index



(index is about 0.3x the text size)

FM-Index(1/13)
FM-Index
Proposed by Ferragina and Manzini
[Ferragina+ 2000]
Ferragina & Manzini - Index

Built on the Burrows-Wheeler Transform (BWT)

(self-index)
()

[Ferragina+ 2004]

FM-Index(2/13)
(Suffix Array)
Example text: mississippi
(1) List every suffix (Suffix), shown as rotations of mississippi#

  i    rotation
  0    mississippi#
  1    ississippi#m
  2    ssissippi#mi
  3    sissippi#mis
  4    issippi#miss
  5    ssippi#missi
  6    sippi#missis
  7    ippi#mississ
  8    ppi#mississi
  9    pi#mississip
  10   i#mississipp
  11   #mississippi

The same rows in dictionary order:

  i    rotation
  11   #mississippi
  10   i#mississipp
  7    ippi#mississ
  4    issippi#miss
  1    ississippi#m
  0    mississippi#
  9    pi#mississip
  8    ppi#mississi
  6    sippi#missis
  3    sissippi#mis
  5    ssippi#missi
  2    ssissippi#mi

FM-Index(3/13)
(Suffix Array)
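Example text: mississippi
(2) Sort the rows in dictionary order
(3) The column of start positions is the suffix array

  i    rotation
  11   #mississippi
  10   i#mississipp
  7    ippi#mississ
  4    issippi#miss
  1    ississippi#m
  0    mississippi#
  9    pi#mississip
  8    ppi#mississi
  6    sippi#missis
  3    sissippi#mis
  5    ssippi#missi
  2    ssissippi#mi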
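
The three steps above can be sketched in C++ as follows. This is a deliberately naive version that sorts rotation start positions directly (quadratic comparisons in the worst case), not how practical libraries build suffix arrays. The function name `build_suffix_array` and the convention that the text ends in a unique `#` smaller than every other character are assumptions taken from the mississippi example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Returns the start positions of all rotations of `text`, sorted in
// dictionary order. `text` is assumed to end with a unique terminator
// ('#') that compares smaller than every other character.
std::vector<std::size_t> build_suffix_array(const std::string& text) {
  const std::size_t n = text.size();
  std::vector<std::size_t> sa(n);
  std::iota(sa.begin(), sa.end(), 0);  // step (1): positions 0, 1, ..., n-1
  // Step (2): sort positions by the rotation that starts there.
  std::sort(sa.begin(), sa.end(), [&](std::size_t a, std::size_t b) {
    for (std::size_t k = 0; k < n; ++k) {
      const char ca = text[(a + k) % n];
      const char cb = text[(b + k) % n];
      if (ca != cb) return ca < cb;
    }
    return false;  // identical rotations; impossible with a unique '#'
  });
  return sa;  // step (3): the sorted positions are the suffix array
}
```

Calling `build_suffix_array("mississippi#")` should reproduce the position column of the sorted table.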

FM-Index(4/13)

Text length: N
Suffix array entry: 4 bytes

text (N) + suffix array (4N)
= 5N (5x the text size)

text (O(N)) + suffix array (O(N log N))
= O(N log N)

FM-Index(5/13)
FM-Index
Burrows-Wheeler(BWT)
BWT(N)
BWT
()


( o(N))

FM-Index
N + o(N)
(o(N)0.3
= 1.33)

FM-Index(6/13)
Burrows-Wheeler(BWT)

BWT: take the last character of each sorted row

  rotation       last
  #mississippi   i
  i#mississipp   p
  ippi#mississ   s
  issippi#miss   s
  ississippi#m   m
  mississippi#   #
  pi#mississip   p
  ppi#mississi   i
  sippi#missis   s
  sissippi#mis   s
  ssippi#missi   i
  ssissippi#mi   i

BWT text: ipssm#pissii
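
As a rough illustration of the table above, the following C++ sketch materialises every rotation, sorts them, and reads off the last column. The function name `bwt` is hypothetical, and building all rotations costs O(N^2) memory, so this is only meant to mirror the slide, not to be used on real data.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Naive BWT: build every rotation, sort them, read the last column.
// `text` is assumed to end with a unique terminator ('#') that is
// smaller than every other character.
std::string bwt(const std::string& text) {
  const std::size_t n = text.size();
  std::vector<std::string> rotations;
  rotations.reserve(n);
  for (std::size_t i = 0; i < n; ++i) {
    rotations.push_back(text.substr(i) + text.substr(0, i));
  }
  std::sort(rotations.begin(), rotations.end());
  std::string last_column;
  for (const std::string& r : rotations) last_column.push_back(r.back());
  return last_column;
}
```

With the input `mississippi#` this should yield the string in the `last` column, read top to bottom.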

FM-Index(7/13)
Restoring the original text from the BWT text T takes O(N)

Each step takes O(1)
using the LF mapping LF()

  LF(0) = 1     T[0] = i
  LF(1) = 6     T[1] = p
  LF(6) = 7     T[6] = p
  LF(7) = 2     T[7] = i
  LF(2) = 8     T[2] = s
  LF(8) = 10    T[8] = s
  LF(10) = 3    T[10] = i
  LF(3) = 9     T[3] = s
  LF(9) = 11    T[9] = s
  LF(11) = 4    T[11] = i
  LF(4) = 5     T[4] = m

Read in this order, T gives ippississim, which is mississippi backwards.

  i    rotation       T[i]
  0    #mississippi   i
  1    i#mississipp   p
  2    ippi#mississ   s
  3    issippi#miss   s
  4    ississippi#m   m
  5    mississippi#   #
  6    pi#mississip   p
  7    ppi#mississi   i
  8    sippi#missis   s
  9    sissippi#mis   s
  10   ssippi#missi   i
  11   ssissippi#mi   i
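
The walk shown above can be sketched in C++. Here the LF mapping is derived straight from its meaning (Last-to-First: the k-th occurrence of a character in the last column is the k-th occurrence of that character in the first column) by stable-sorting positions by character. The names `build_lf` and `inverse_bwt` are hypothetical, and the stable sort makes this O(N log N) rather than the O(N) quoted on the slide.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// lf[i] = row of the first column that holds the same character
// occurrence as position i of the last column (the BWT text).
std::vector<std::size_t> build_lf(const std::string& bwt) {
  std::vector<std::size_t> order(bwt.size());
  std::iota(order.begin(), order.end(), 0);
  // Stable sort keeps equal characters in their original order, so
  // order[j] is the BWT position of the character in first-column row j.
  std::stable_sort(order.begin(), order.end(),
                   [&](std::size_t a, std::size_t b) {
                     return static_cast<unsigned char>(bwt[a]) <
                            static_cast<unsigned char>(bwt[b]);
                   });
  std::vector<std::size_t> lf(bwt.size());
  for (std::size_t j = 0; j < order.size(); ++j) lf[order[j]] = j;
  return lf;
}

// Restores the text (without its terminator) by starting at row 0, the
// row that begins with the terminator, and following LF. Characters come
// out last-to-first, so the result is filled from the back.
std::string inverse_bwt(const std::string& bwt) {
  if (bwt.empty()) return "";
  const std::vector<std::size_t> lf = build_lf(bwt);
  std::string text(bwt.size() - 1, '\0');
  std::size_t row = 0;
  for (std::size_t k = text.size(); k-- > 0;) {
    text[k] = bwt[row];
    row = lf[row];
  }
  return text;
}
```

For the BWT text `ipssm#pissii`, `build_lf` should reproduce the LF values listed on the slide and `inverse_bwt` should return the original text.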

FM-Index(8/13)
LF()
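LF(i) = (number of characters in T smaller than T[i])
      + (number of occurrences of T[i] in T before position i)

T = ipssm#pissii
LF(9): T[9] = s
  characters smaller than s: # x1 + i x4 + m x1 + p x2 = 8
  +
  occurrences of s before position 9: 3 (T[2], T[3], T[8])
  = 8 + 3
  = 11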

FM-Index(9/13)
LF()
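LF(i) = (number of characters in T smaller than T[i])
      + (number of occurrences of T[i] in T before position i)

Number of characters in T smaller than T[i]:
  a table with one entry per character value (256 entries)
Number of occurrences of T[i] in T before position i:
  a naive table needs (256) x (N) entries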
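
A minimal C++ sketch of the two terms, assuming 1-byte characters: the first term comes from a 256-entry table built once, while the second is computed here by a plain linear scan, which is exactly the part that is too slow (or too large if tabulated) and needs a better structure. All function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// smaller[c] = number of characters in `bwt` that are smaller than c.
// This is the 256-entry table from the slide.
std::array<std::size_t, 256> build_smaller_table(const std::string& bwt) {
  std::array<std::size_t, 256> count{};
  for (unsigned char c : bwt) ++count[c];
  std::array<std::size_t, 256> smaller{};
  std::size_t sum = 0;
  for (std::size_t c = 0; c < 256; ++c) {
    smaller[c] = sum;
    sum += count[c];
  }
  return smaller;
}

// Occurrences of c in bwt[0..i). The linear scan is a placeholder for
// a rank structure.
std::size_t occurrences_before(const std::string& bwt, std::size_t i, char c) {
  std::size_t n = 0;
  for (std::size_t k = 0; k < i && k < bwt.size(); ++k) {
    if (bwt[k] == c) ++n;
  }
  return n;
}

// LF(i) = smaller[T[i]] + occurrences of T[i] before position i.
std::size_t lf(const std::string& bwt,
               const std::array<std::size_t, 256>& smaller, std::size_t i) {
  const char c = bwt[i];
  return smaller[static_cast<unsigned char>(c)] + occurrences_before(bwt, i, c);
}
```

For T = `ipssm#pissii`, the table gives 8 for `s`, and `lf` at position 9 adds the 3 earlier occurrences of `s`.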

FM-Index(10/13)

Number of occurrences of T[i] in T before position i

FM-Index(11/13)
DSIRNLP#2

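(operations run in O(1) or O(log N))
rank(i) = number of 1s in the first i bits
select(i) = position of the i-th 1

rank()

number of occurrences of character c up to position i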
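
To make rank and select concrete, here is a small C++ sketch. It is not the succinct bit vector from DSIRNLP#2: bits are kept as a string of '0'/'1' for readability, rank uses per-block counts plus a short scan, and select is a binary search over rank. The class name `BitVector`, the block size of 8, and the 0-based return value of `select` are all assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

class BitVector {
 public:
  // `bits` is a string of '0' and '1', e.g. "00110011".
  explicit BitVector(const std::string& bits) : bits_(bits) {
    // block_ones_[b] = number of 1s in bits[0 .. b * kBlock).
    block_ones_.push_back(0);
    std::size_t ones = 0;
    for (std::size_t i = 0; i < bits_.size(); ++i) {
      if (bits_[i] == '1') ++ones;
      if ((i + 1) % kBlock == 0) block_ones_.push_back(ones);
    }
  }

  // Number of `bit` values (0 or 1) in the first i bits.
  std::size_t rank(std::size_t i, int bit = 1) const {
    if (i > bits_.size()) i = bits_.size();
    std::size_t ones = block_ones_[i / kBlock];
    for (std::size_t k = i / kBlock * kBlock; k < i; ++k) {
      if (bits_[k] == '1') ++ones;
    }
    return bit ? ones : i - ones;
  }

  // 0-based position of the i-th 1 (i counted from 1), or size() if
  // there are fewer than i ones. Uses O(log N) rank calls.
  std::size_t select(std::size_t i) const {
    std::size_t lo = 0, hi = bits_.size();
    while (lo < hi) {
      const std::size_t mid = lo + (hi - lo) / 2;
      if (rank(mid + 1) >= i) {
        hi = mid;
      } else {
        lo = mid + 1;
      }
    }
    return lo;
  }

  std::size_t size() const { return bits_.size(); }

 private:
  static constexpr std::size_t kBlock = 8;
  std::string bits_;
  std::vector<std::size_t> block_ones_;
};
```

A real implementation would pack the bits into machine words and count with popcount instructions; the interface, however, stays the same.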

FM-Index(12/13)
LOUDS

BP

DFUDS

FM-Index(13/13)

/

(4)
BWT(FM-Index)
BWT
ic


(1/12)
(Wavelet Tree)

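Stores a string of length N in O(N) + o(N)
Operations run in O(1) or O(log N)
rank(i, c) = number of occurrences of character c in the first i characters
select(i, c) = position of the i-th occurrence of character c

The FM-Index uses rank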
rank


(2/12)
A bit vector has 2 kinds of symbols: 0 and 1

Here the text has 4 kinds of characters: a, b, c, d


(3/12)
Example: abcdabdc

rank(5, a) = 2

The first 5 characters of abcdabdc (abcda)
contain 2 occurrences of a


(4/12)
Computing rank(5, a) on abcdabdc

Split the 4 kinds of characters
into 2 groups
abcdabdc

abab (a and b only)
cddc (c and d only)


(5/12)

abab => 0101
cddc => 0110


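Each half is now a bit vector,

so bit-vector rank can be used
rank(5, a) on abcdabdc
becomes rank(i, a) on abab

The value of i is also found with rank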


(6/12)
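The first 5 characters of abcdabdc contain
2 a's and 1 b,
so 3 of them are a or b

rank(5, a)
= number of a's in the first 5 characters

The first 5 characters contain 3 characters that are a or b,

so the answer equals rank(3, a) on abab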


(7/12)
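How many of the first 5 characters of abcdabdc
are a or b (i.e. belong to abab)?

Encode abcdabdc by group:
characters in abab => 0
characters in cddc => 1
abcdabdc => 00110011

rank(5, 0) = 3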


(8/12)
rank(5, a) on abcdabdc

= rank(3, a) on abab

= rank(3, 0) on 0101

rank(3, 0) = 2


(9/12)

To compute rank(5, a) on abcdabdc:
split into abab and cddc
encode a,b => 0 and c,d => 1
compute rank(5, 0) on 00110011
rank(5, 0) = 3

In abab, encode a => 0 and b => 1


compute rank(3, 0) on 0101
rank(3, 0) = 2


(10/12)
bv = abcdabdc

x    = 00110011 (abcdabdc)
y[0] = 0101 (abab)
y[1] = 0110 (cddc)

a = {0, 0}, b = {0, 1}, c = {1, 0}, d = {1, 1}


bv.rank(5, a)
= y[a[0]].rank(x.rank(5, a[0]), a[1])
= y[0].rank(x.rank(5, 0), 0)
= y[0].rank(3, 0)
= 2


(11/12)
bv = abcdabdc

x    = 00110011 (abcdabdc)
y[0] = 0101 (abab)
y[1] = 0110 (cddc)

a = {0, 0}, b = {0, 1}, c = {1, 0}, d = {1, 1}


bv.rank(6, c)
= y[c[0]].rank(x.rank(6, c[0]), c[1])
= y[1].rank(x.rank(6, 1), 0)
= y[1].rank(2, 0)
= 1
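
The two worked examples can be checked with a small C++ sketch that builds x, y[0] and y[1] exactly as on the slides. It assumes the text contains only the characters a to d and uses a linear scan in place of a real bit-vector rank; the names `bit_rank` and `TwoLevelWaveletTree` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Number of `bit` characters ('0' or '1') in the first i characters of
// `bits`. A linear scan stands in for a succinct bit vector's rank.
std::size_t bit_rank(const std::string& bits, std::size_t i, char bit) {
  std::size_t n = 0;
  for (std::size_t k = 0; k < i && k < bits.size(); ++k) {
    if (bits[k] == bit) ++n;
  }
  return n;
}

struct TwoLevelWaveletTree {
  std::string x;     // first code bit of every character
  std::string y[2];  // second code bits, grouped by the first bit

  // Codes follow the slide: a = 00, b = 01, c = 10, d = 11.
  // The text is assumed to contain only 'a'..'d'.
  explicit TwoLevelWaveletTree(const std::string& text) {
    for (char ch : text) {
      const int code = ch - 'a';
      const int hi = (code >> 1) & 1;
      const int lo = code & 1;
      x.push_back(static_cast<char>('0' + hi));
      y[hi].push_back(static_cast<char>('0' + lo));
    }
  }

  // Occurrences of ch in the first i characters of the text:
  // y[ch[0]].rank(x.rank(i, ch[0]), ch[1]) in the slide's notation.
  std::size_t rank(std::size_t i, char ch) const {
    const int code = ch - 'a';
    const int hi = (code >> 1) & 1;
    const int lo = code & 1;
    const std::size_t j = bit_rank(x, i, static_cast<char>('0' + hi));
    return bit_rank(y[hi], j, static_cast<char>('0' + lo));
  }
};
```

Constructing the tree from `abcdabdc` should give the same x, y[0] and y[1] as above, and `rank(5, 'a')` and `rank(6, 'c')` follow the two derivations step by step.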


(12/12)

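With 4 kinds of characters,

rank(i, c)
follows the bits of the code of c,
one bit-vector rank per bit
4 kinds => 2 bit-vector ranks
256 kinds => 8 bit-vector ranks

1 character = 1 byte
one wavelet-tree rank = 8 bit-vector ranks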
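
The same idea extended to 1-byte characters can be sketched as an 8-level tree, one level per bit of the character code. This is only an illustration: each node stores a full prefix-sum array instead of a succinct bit vector, so it is nowhere near the space bound on the slides, and the class name `WaveletTree` and its layout are assumptions rather than the structure used by any particular library.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

class WaveletTree {
  struct Node {
    // ones_before[k] = number of 1 bits among the first k characters
    // routed through this node (a stand-in for a bit vector with rank).
    std::vector<std::size_t> ones_before;
    std::unique_ptr<Node> child[2];

    std::size_t rank(std::size_t i, bool bit) const {
      const std::size_t n = ones_before.size() - 1;
      if (i > n) i = n;
      return bit ? ones_before[i] : i - ones_before[i];
    }
  };

 public:
  explicit WaveletTree(const std::string& text) : root_(build(text, 7)) {}

  // Occurrences of c in the first i characters: one bit-vector rank per
  // level, 8 in total for 1-byte characters.
  std::size_t rank(std::size_t i, char c) const {
    const unsigned char code = static_cast<unsigned char>(c);
    const Node* node = root_.get();
    if (node == nullptr) return 0;  // empty text
    for (int level = 7; level >= 0 && node != nullptr; --level) {
      const bool bit = (code >> level) & 1;
      i = node->rank(i, bit);
      node = node->child[bit].get();
    }
    return i;
  }

 private:
  // Splits `text` by bit `level` of each character and recurses.
  static std::unique_ptr<Node> build(const std::string& text, int level) {
    if (text.empty() || level < 0) return nullptr;
    std::unique_ptr<Node> node(new Node);
    std::string part[2];
    node->ones_before.push_back(0);
    for (char ch : text) {
      const bool bit = (static_cast<unsigned char>(ch) >> level) & 1;
      node->ones_before.push_back(node->ones_before.back() + (bit ? 1 : 0));
      part[bit].push_back(ch);
    }
    node->child[0] = build(part[0], level - 1);
    node->child[1] = build(part[1], level - 1);
    return node;
  }

  std::unique_ptr<Node> root_;
};
```

Because it works on raw bytes, the grouping at each level follows the ASCII codes rather than the a,b / c,d split used in the walkthrough, but `rank` returns the same counts.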


FM-Index

FM-Index
FM-IndexBurrows-Wheeler

FM-Index

LOUDS

The Burrows-Wheeler Transform


BWT


(
)

(1/2)
Shellinford, an FM-Index implementation
Shellinford

Shellinford()

(2/2)
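shellinford::fm_index fm;
fm.push_back(/* text of document 1 */);
fm.push_back(/* text of document 2 */);
fm.push_back(/* text of document 3 */);
fm.search(/* query string */, values);
auto i = values.begin();
while (i != values.end()) {
  // i->first is the ID of a matching document
  cout << fm.get_document(i->first) << endl;
  i++;
}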



[Ferragina+ 2000] Opportunistic Data Structures with Applications, FOCS 2000
[Ferragina+ 2004] An Alphabet-Friendly FM-index, SPIRE 2004

(FM-Index)
FM-index version2 (http://www.di.unipi.it/~ferragin/Libraries/fmindexV2/index.html)
FM-index++ (http://code.google.com/p/fmindex-plus-plus/)
Shellinford (http://code.google.com/p/shellinford/)

()
wat-array(http://code.google.com/p/wat-array/)
