
(360 IT, Yahoo)
http://nfabo.cn
rockeet@163.com
2014-07-15
QQ: 18016168

4
11

www.eeqee.com
eeqee_com

20115

DFA
DFA
AC
DFA
API

DFA, NFA

ADFA: Acyclic DFA

Trie: an ADFA (tree-shaped)
MinADFA: the minimized ADFA

Regular Expression

Lexical Analyzing

Pattern Matching (AC)

Dictionary Compressing

DFA &

DFA

DFA
DFA
DFA DFA
DFA

DFA

DFA

[Figure: the numbers 0~63 as a tree-shaped DFA (Trie) vs a DAG-shaped DFA (DAWG)]

[Figure: the numbers 0~62 as a tree-shaped DFA (Trie) vs a DAG-shaped DFA (DAWG)]

The numbers 1 ~ 99999 as decimal strings: the DAWG needs only O(log(n)) states

Word set: int, int_t, intmax_t, int8_t, int16_t, int32_t, int64_t, uint, uint_t, uintmax_t, uint8_t, uint16_t, uint32_t, uint64_t

[Figure: this word set as a tree-shaped DFA (Trie) vs a DAG-shaped DFA (DAWG)]

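DFA minimization

Hopcroft

Myhill-Nerode

Two states p, q are equivalent (indistinguishable in the Myhill-Nerode sense) iff, for every input string w, the states reached from p and from q are both final or both non-final:
$\forall w \in \Sigma^*:\ \delta(p,w) \in F \iff \delta(q,w) \in F$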

Partition Refinement

Hopcroft's algorithm is a Partition Refinement algorithm

Hopcroft
P := {F, Q \ F};   // Q \ F: the states of Q that are not in F
W := {F, Q \ F};   // W: the Waiting Set
                   // it suffices to start with only the smaller of the two:
                   // W := { min(F, Q \ F) }
while (W is not empty) do
    choose and remove a set A from W
    for each c in Σ do
        let X be the set of states for which a transition on c leads to a state in A
        for each set Y in P for which X ∩ Y is nonempty and Y \ X is nonempty do
            replace Y in P by the two sets X ∩ Y and Y \ X
            if Y is in W
                replace Y in W by the same two sets X ∩ Y and Y \ X
            else
                add min(X ∩ Y, Y \ X) to W   // the smaller of the two sets

Hopcroft implementation notes

DFA -> NFA (the reversed DFA, used to find predecessor states)
Trie
smallmap

O(1) split via a (permutation) array

Waiting Set

ADFA

State Register

Online

map<IsFinal+TargetSet, StateID>
TargetSet: the set of out-edges pair(Char, Target), each Target being a StateID

Keymap

Offline

DFA -> minimized DFA

Hopcroft: O(n log n); register-based ADFA minimization: O(n)

ADFA

Online ADFA

DFA (path through)

Offline ADFA: graph-post-order-walk

ADFA Online

CommonPrefix
CommonPrefixLen
State Register
DFA

ADFA: Confluence State

DAWG: ADFA + key numbering, which implements Map<string, Data>

An ADFA alone implements Set<string>

DAWG (Directed Acyclic Word Graph)

ADFA, (minimized) ADFA
map<Key,Value>: Value

DAWG
[Figure: DAWG example, cases A and B]

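DFA Map

Store each (key, val) pair as one string, joined by a delim:
key \t value
i.e. key delim value

If delim is a byte in [0, 256), a key that contains that byte becomes ambiguous.
Fix: use the alphabet [0,257) with delim=256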

key

30%

DFA

CPU Cache

DAWG

DFA

DFA
MinDFA

DFA DFA

DFA

AC (Aho-Corasick)

AC Trie

fail link

AC Double Array

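DFA representation (dense transition table)

typedef unsigned int state_id_t;        // state id
typedef unsigned char char_t;           // input symbol: one byte
typedef state_id_t automata_t[][256];   // automata_t[state][ch] = next state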

Demo
DFA
DFA

DFA
Google RE2
DFA

Double Array Trie

99.9%
Offline: BFS/DFS
Online

Bitmap: Byte Char Set DFA

min/max char +

Bitmap + popcnt, ctz

Succinct Representation

Rank-Select

30%

bzip2

Trie
Tree Edge Rank-Select
Non-Tree Edge

Memory Mapping

DFA: Memory Mapping

memcpy

mmap

keyonly : strset/dawg
key \t val : map
regex \t data/regex_id

dfa tools:
adfa_build, dawg_build,
regex_build, kvbin_build, ac_build, ...
build
MapReduce

DFA -> dot

dot -> svg, pdf, png

DFA

DFA

dfa = DFA_Interface::load_from(dfafile);

build

adfa_build

dawg_build

map<string, AnyValue>

kvbin_build (delim=256)

set<string>
delim map<string, set<string>>
nested map<string, map<string, map<.> > >

map<ByteArray,set<ByteArray>>
nested map

ac_build ( AC )

build

regex_build

(Multi Regular Expression Matching)


O(strlen(Input)+matched_regex)

One Pass submatch

dfa_union

DFA

pinyin_build

API: DFA_Interface

DFA_Interface::load_from(filename) loads every DFA type:

regex DFA
adfa DFA
dawg DFA
ac DFA (Double Array)
dfa (filename)

#include <febird/automata/dfa_interface.hpp>

API: dawg & ac

DFA_Interface* dfa = DFA_Interface::load_from(somefile);
// use as normal dfa
// ...
const DAWG_Interface* dawg = dfa->get_dawg();
if (NULL != dawg) {
    // use as dawg
}
const AC_Scan_Interface* ac = dfa->get_ac();
if (NULL != ac) {
    // use as Aho-Corasick DFA
}

Anchor

map<url, AnchorSet>: url -> AnchorSet

C++11

map<word, SynonymSet>

ADFA onfly build + NFA/DFA + DFA Minimize

(pinyin_build)

P1 * P2 * ... * Pn

SLCF: Straight Line Context Free grammar

SLCF: n -> O(log(log(n)))
ADFA: n -> O(log(n))
SLCF: building the smallest grammar is NP-hard

http://nfab.cn

C++11

Compilers & platforms:

Linux: gcc-4.7+, icc-14.0, clang-3.4/3.5


Windows: Visual Studio 2013+, Cygwin-32/64

C++11, C++98
