Idan Szpektor
Boyer and Moore
What It’s About
A String Matching Algorithm
T: aaaaaaaaaaaaaaaaaaaaaaaaa
P: abaaaa
The Good Suffix Rule (GSR)
1 2 3 4 5 6 7 8 9 10 11 12 13
P: b b a b b a a b b c a b b
L: 0 0 0 0 0 0 0 0 0 0 9 2 12
Preprocessing the GSR – l(i)
P: b b a b b a a b b c a b b
l: - 2 2 2 2 2 2 2 2 2 2 2 1
Using L(i) and l(i) in GSR
s: b b a c d c b b a a b b c d d
Z: - 1 0 0 0 0 3 1 0 0 2 1 0 0 0
s: d d c b b a a b b c d c a b b
N: 0 0 0 1 2 0 0 1 3 0 0 0 0 1 -
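The two tables above can be checked straight from the definitions. The sketch below is a brute-force illustration, not the linear-time construction: `z_naive` and `n_naive` are hypothetical helper names, and they return the values for positions 2..n and 1..n-1 respectively, as in the tables.

```python
def z_naive(s):
    """Z(i) for i = 2..n: length of the longest substring starting at i
    that matches a prefix of s (quadratic time, for illustration only)."""
    n = len(s)
    z = []
    for i in range(1, n):              # 0-based start i is position i + 1
        k = 0
        while i + k < n and s[k] == s[i + k]:
            k += 1
        z.append(k)
    return z


def n_naive(s):
    """N(j) for j = 1..n-1: length of the longest suffix of s[1..j]
    that is also a suffix of s (quadratic time, for illustration only)."""
    n = len(s)
    out = []
    for j in range(1, n):              # j is the 1-based end of the prefix
        k = 0
        while k < j and s[j - 1 - k] == s[n - 1 - k]:
            k += 1
        out.append(k)
    return out


s = "bbacdcbbaabbcdd"
r = s[::-1]                            # "ddcbbaabbcdcabb"
print(z_naive(s))                      # [1, 0, 0, 0, 0, 3, 1, 0, 0, 2, 1, 0, 0, 0]
print(n_naive(r))                      # [0, 0, 0, 1, 2, 0, 0, 1, 3, 0, 0, 0, 0, 1]
# N of the reversed string is Z of the original, read right to left.
assert n_naive(r) == z_naive(s)[::-1]
```

This reversal relation is what lets N be computed in linear time once Z can be.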
Building L(i) in O(n)
L(i) – The largest index j < n such that P[i..n] is a
suffix of the prefix P[1..j] but P[i-1..n] is not;
L(i) = 0 if no such j exists
for i := 1 to n, L(i) := 0
for j := 1 to n-1
    if N(j) > 0   (N(j) = 0 would give i = n + 1)
        i := n - N(j) + 1
        L(i) := j
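The loop above might look as follows in Python. This is a sketch under two assumptions: N is obtained by a brute-force helper (`n_values`, a hypothetical name) so the block stays self-contained, and positions with N(j) = 0 are skipped because they would index position n + 1.

```python
def n_values(p):
    """Brute-force N(j) for j = 1..n, stored 1-based (index 0 unused).
    A linear-time version would derive N from Z of the reversed pattern."""
    n = len(p)
    N = [0] * (n + 1)
    for j in range(1, n + 1):
        k = 0
        while k < j and p[j - 1 - k] == p[n - 1 - k]:
            k += 1
        N[j] = k
    return N


def build_L(p):
    """L(i) for i = 1..n, stored 1-based (index 0 unused)."""
    n = len(p)
    N = n_values(p)
    L = [0] * (n + 1)
    for j in range(1, n):          # j = 1 .. n-1
        if N[j] > 0:               # N(j) = 0 would point past the table
            i = n - N[j] + 1
            L[i] = j               # later (larger) j overwrites smaller j
    return L


L = build_L("bbabbaabbcabb")
print(L[1:])
```

Because j only increases, each L(i) ends up holding the largest qualifying j, which is why a single pass suffices.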
Building l(i) in O(n)
l(i) – The length of the longest suffix of P[i..n]
that is also a prefix of P
k := 0
for j := 1 to n-1
    if N(j) = j, k := j
    l(n - j + 1) := k
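A Python rendering of this loop, as a sketch: N(j) = j means the prefix P[1..j] is itself a suffix of P, so k tracks the longest such prefix seen so far. The helper `n_values` is a hypothetical brute-force stand-in for a linear-time N.

```python
def n_values(p):
    """Brute-force N(j) for j = 1..n, stored 1-based (index 0 unused)."""
    n = len(p)
    N = [0] * (n + 1)
    for j in range(1, n + 1):
        k = 0
        while k < j and p[j - 1 - k] == p[n - 1 - k]:
            k += 1
        N[j] = k
    return N


def build_l(p):
    """l(i) for i = 2..n, stored 1-based (indices 0 and 1 unused)."""
    n = len(p)
    N = n_values(p)
    l = [0] * (n + 1)
    k = 0
    for j in range(1, n):          # j = 1 .. n-1
        if N[j] == j:              # prefix P[1..j] is also a suffix of P
            k = j
        l[n - j + 1] = k           # fills l(n), l(n-1), ..., l(2)
    return l


l = build_l("bbabbaabbcabb")
print(l[2:])                       # [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1]
```

For the example pattern only the prefixes "b" and "bb" are also suffixes, which is why every entry is 1 or 2.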
Building Z in O(n)
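[Figure: string S with arrows marking the start of S and positions i', j, and i]
Characters beyond the end of the Z-box are compared explicitly.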
If j + Z(j) < i + Z(i), j := i
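A minimal sketch of the linear-time Z computation, using the slide's names where possible: j is the start of the Z-box that reaches farthest right, and i - j plays the role of i'. Indices are 0-based here, and leaving z[0] at 0 is a convention chosen for this sketch.

```python
def z_array(s):
    """z[i] = length of the longest substring starting at i (0-based)
    that matches a prefix of s; z[0] is left as 0."""
    n = len(s)
    z = [0] * n
    j = 0                              # start of the farthest-reaching Z-box
    for i in range(1, n):
        end = j + z[j]                 # first position past that Z-box
        if i < end:
            # Inside the box, s[i..end-1] equals s[i-j..], so reuse z[i-j],
            # capped at the part the box actually covers.
            z[i] = min(z[i - j], end - i)
        # Compare explicitly past what is already known.
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        if j + z[j] < i + z[i]:        # the slide's update rule
            j = i
    return z


print(z_array("bbacdcbbaabbcdd")[1:])  # [1, 0, 0, 0, 0, 3, 1, 0, 0, 2, 1, 0, 0, 0]
```

Every successful explicit comparison moves the right end of the farthest box forward, and each position causes at most one failed comparison, which is where the O(n) bound comes from.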
Building Z in O(n) - Analysis
1. Properties of strings
2. Proof of search in O(m) if P is not in T, using
only the good suffix rule.
3. Proof of search in O(m) even if P is in T,
adding the Galil rule.
Properties of Strings
If two strings δ, γ satisfy δγ = γδ, then there is a
string β such that δ = β^i and γ = β^j for some i, j > 0
- Proof by induction
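A small check of this property, as a sketch. It relies on the known fact that when δγ = γδ a common root β of length gcd(|δ|, |γ|) exists; `common_root` is a hypothetical helper name.

```python
from math import gcd


def common_root(d, g):
    """If d + g == g + d for non-empty d and g, return (beta, i, j)
    with d == beta * i and g == beta * j; otherwise return None."""
    if not d or not g or d + g != g + d:
        return None
    k = gcd(len(d), len(g))            # length of a common root
    beta = d[:k]
    return beta, len(d) // k, len(g) // k


print(common_root("abab", "ab"))       # ('ab', 2, 1)
print(common_root("ab", "ba"))         # None, since "abba" != "baab"
```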
[Figure: a string of the form β' β β β]
Properties of Strings (Cont…)
[Figure: two copies of α' α α α, one above the other, with a position q marked between them]
Proof - when P is Not Found in T
Σ s_i ≤ m
Σ f_i = m
We want to prove that g_i ≤ 3 s_i for every round i,
which gives Σ g_i ≤ 3 Σ s_i ≤ 3m.
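Written out, the counting argument could run as below. The reading of the symbols is an assumption, since the slides do not define them here: s_i is the shift after round i, f_i counts the characters of T compared for the first time in round i, and g_i counts the characters compared again.

```latex
\[
\sum_i g_i \;\le\; \sum_i 3 s_i \;=\; 3 \sum_i s_i \;\le\; 3m,
\qquad
\sum_i (f_i + g_i) \;\le\; m + 3m \;=\; 4m .
\]
```

Under that reading, the search makes at most 4m character comparisons when P does not occur in T.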
Proof (Cont…)
In each round that does not find P, the search matches a
substring t_i of T and then hits one bad (mismatched)
character x_i, so x_i t_i is a substring of T
T: bbacdcbaabcbbabdbabcaabcbcb
P: bdbabc
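To see the rounds on this example, the sketch below runs a search that uses only the good suffix rule and records, for each round, the matched substring t_i, the bad character x_i, and the shift s_i. It is an illustration under stated assumptions, not the slides' own code: the names `z_array`, `preprocess`, and `gsr_search` are hypothetical, and the shift after a full match is taken as n - l(2).

```python
def z_array(s):
    """z[i] = longest prefix of s that also starts at i (0-based)."""
    n = len(s)
    z = [0] * n
    j = 0
    for i in range(1, n):
        end = j + z[j]
        if i < end:
            z[i] = min(z[i - j], end - i)
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        if j + z[j] < i + z[i]:
            j = i
    return z


def preprocess(p):
    """Build the 1-based tables L and l from N, where N comes from Z
    of the reversed pattern."""
    n = len(p)
    z = z_array(p[::-1])
    N = [0] + [z[n - j] for j in range(1, n)]   # N[j] for j = 1..n-1
    L = [0] * (n + 2)
    l = [0] * (n + 2)
    k = 0
    for j in range(1, n):
        if N[j] > 0:
            L[n - N[j] + 1] = j
        if N[j] == j:
            k = j
        l[n - j + 1] = k
    return L, l


def gsr_search(t, p):
    """Return (0-based match starts, list of (t_i, x_i, s_i) per round)."""
    n, m = len(p), len(t)
    L, l = preprocess(p)
    found, rounds = [], []
    k = n                                  # 1-based position in T under P[n]
    while k <= m:
        i, h = n, k
        while i > 0 and p[i - 1] == t[h - 1]:   # compare right to left
            i -= 1
            h -= 1
        if i == 0:                         # full match
            found.append(k - n)
            shift = n - l[2]
            rounds.append((t[h:k], "", shift))
        else:                              # mismatch at P[i]; P[i+1..n] matched
            if i == n:
                shift = 1                  # nothing matched
            elif L[i + 1] > 0:
                shift = n - L[i + 1]
            else:
                shift = n - l[i + 1]
            rounds.append((t[h:k], t[h - 1], shift))
        k += shift
    return found, rounds


T = "bbacdcbaabcbbabdbabcaabcbcb"
P = "bdbabc"
found, rounds = gsr_search(T, P)
print(found)                               # [14] (0-based start)
for t_i, x_i, s_i in rounds:
    print(repr(t_i), repr(x_i), s_i)
```

The shifts recorded in `rounds` never sum to more than len(T), which matches the bound on the total shift used in the proof.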
Proof: by Lemma 1.
Lemma 3 (|t_i| + 1 > 3 s_i)
Suppose P overlapped t_i during round i. We
shall examine in what ways P could have overlapped t_i
in previous rounds.
|t_i| + 1 ≤ 3 s_i
Σ (matches in round i) ≤ Σ 3 s_i ≤ 3m