
TRIE

What is a trie?
The term trie comes from (information) retrieval
The trie data structure (abstract data type) is a specialized (very efficient) implementation of an (ordered) index for text-based keys

The standard Trie data structure

Definitions:
Alphabet Σ = a set of characters
Let S = a set of s strings (= keys) over the alphabet Σ

A trie T that stores the keys in S is a structure where:
Each node of T (except the root node) is labeled with a character c
The root has no label
Each internal node of T can have at most |Σ| keys (Σ = the alphabet)
The keys are stored in alphabetical order inside an internal node

The trie T has s external nodes
Each external node is associated with one string in S
The path from the root node to an external node yields exactly one string in S

Example:
S = { bear, bell, bid, bull, buy, sell, stock, stop } (s = 8)

How to implement a trie
Use an array of references
Use a binary tree

Use an array of references
Each array element represents one letter of the alphabet
Each array reference points to a sub-trie that corresponds to the strings that start with the corresponding letter
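The array-of-references layout can be sketched roughly as follows. This is only an illustration, assuming a lowercase alphabet a-z; the names ArrayTrieNode, child_index, and insert are hypothetical and do not come from the slides.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"  # assumption: keys use lowercase a-z only


class ArrayTrieNode:
    """One trie node holding one reference slot per letter of the alphabet."""

    def __init__(self):
        # children[i] refers to the sub-trie for ALPHABET[i], or None if absent
        self.children = [None] * len(ALPHABET)


def child_index(ch):
    """Map a letter to its slot in the children array."""
    return ord(ch) - ord("a")


def insert(root, key):
    """Walk down from the root, creating missing sub-tries along the way."""
    node = root
    for ch in key:
        i = child_index(ch)
        if node.children[i] is None:
            node.children[i] = ArrayTrieNode()
        node = node.children[i]


root = ArrayTrieNode()
for key in ["bear", "bell", "bid", "bull", "buy", "sell", "stock", "stop"]:
    insert(root, key)

# Letters whose slot is in use at the root
used = [ALPHABET[i] for i, c in enumerate(root.children) if c is not None]
print(used)  # ['b', 's']
```

Each node reserves 26 slots even when only a few are used, which is the usual trade-off of this layout: constant-time child access in exchange for extra memory.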

The binary tree implementation:
The binary tree implementation of a trie is known as a bitwise trie
The implementation uses the alphabet Σ = {0, 1}
In other words, the implementation stores sequences of bits
The keys are read as a sequence of bits
A bitwise trie is a binary tree: each node has at most two children, one for bit 0 and one for bit 1
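A bitwise trie might look like the sketch below. It assumes that keys are small non-negative integers read as fixed-width bit strings, most significant bit first; the width of 4 and the names BitNode, bits, insert, and contains are illustrative choices, not part of the slides.

```python
class BitNode:
    """Node of a bitwise trie: at most two children, for bit 0 and bit 1."""

    def __init__(self):
        self.child = [None, None]
        self.is_key = False  # True if a stored key ends at this node


def bits(value, width):
    """Yield the bits of value from most significant to least significant."""
    for shift in range(width - 1, -1, -1):
        yield (value >> shift) & 1


def insert(root, value, width=4):
    node = root
    for b in bits(value, width):
        if node.child[b] is None:
            node.child[b] = BitNode()
        node = node.child[b]
    node.is_key = True


def contains(root, value, width=4):
    node = root
    for b in bits(value, width):
        node = node.child[b]
        if node is None:
            return False
    return node.is_key


root = BitNode()
for v in (3, 5, 12):  # bit strings 0011, 0101, 1100
    insert(root, v)
print(contains(root, 5), contains(root, 4))  # True False
```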

Structural properties of the standard trie

Every internal node has at most |Σ| children (Σ = the alphabet)
This follows from the way that the trie is constructed

The trie T on the set S with s strings (keys) has exactly s external nodes
This is because a path from the root node to one external node corresponds to 1 key

The height of the trie T on the set S = the length of the longest string in S (longest path = longest key in S)
This is because a path from the root node to one external node corresponds to 1 key

The number of nodes in the trie T on the set S = O(n), where n = # characters in the strings in S
In the worst case, no two keys share a prefix, so every character in the keys gets its own node
Example: S = { try, die, flaw, pack }
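The structural properties can be checked on small examples with a sketch like the one below. It represents the trie as nested dictionaries and assumes that no key is a prefix of another key; the helper names build_trie, count_nodes, count_external, and height are hypothetical.

```python
def build_trie(keys):
    """Build a trie as nested dicts: each dict maps a character to a child."""
    root = {}
    for key in keys:
        node = root
        for ch in key:
            node = node.setdefault(ch, {})
    return root


def count_nodes(node):
    """Number of labeled nodes below node (the unlabeled root is not counted)."""
    return sum(1 + count_nodes(child) for child in node.values())


def count_external(node):
    """Number of external (leaf) nodes."""
    if not node:
        return 1
    return sum(count_external(child) for child in node.values())


def height(node):
    """Length of the longest root-to-leaf path, counted in edges."""
    if not node:
        return 0
    return 1 + max(height(child) for child in node.values())


worst = ["try", "die", "flaw", "pack"]  # no shared prefixes
shared = ["bear", "bell", "bid", "bull", "buy", "sell", "stock", "stop"]

for keys in (worst, shared):
    trie = build_trie(keys)
    n = sum(len(k) for k in keys)  # total number of characters
    print(count_nodes(trie), n, count_external(trie), height(trie))
# prints "14 14 4 4" and then "21 31 8 5"
```

In the first set every character needs its own node (14 nodes for 14 characters); in the second set shared prefixes reduce 31 characters to 21 labeled nodes.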

Inserting into a standard trie

Example: insert "stock" in a trie
First we traverse the prefix that is already stored in the trie: "sto"
Then we create nodes for the letters that are not in the trie: "ck"

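The two insertion steps (traverse the stored prefix, then create the missing nodes) can be sketched as follows. The trie is a nested dictionary here, and the function returns the two parts of the key purely for illustration; the name insert and this return value are assumptions, not part of the slides.

```python
def insert(root, key):
    """Insert key and return (prefix already stored, suffix newly created)."""
    node = root
    i = 0
    # Step 1: traverse the prefix that is already stored in the trie
    while i < len(key) and key[i] in node:
        node = node[key[i]]
        i += 1
    # Step 2: create nodes for the letters that are not in the trie
    for ch in key[i:]:
        node[ch] = {}
        node = node[ch]
    return key[:i], key[i:]


root = {}
for key in ["bear", "bell", "bid", "bull", "buy", "sell", "stop"]:
    insert(root, key)

print(insert(root, "stock"))  # ('sto', 'ck')
```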
Advantages of tries over an ordinary map
Looking up data in a trie is, in the worst case, O(m), where m = length of the key
Lookup in an ordinary ordered map (e.g., a balanced search tree) is O(lg(n)), where n = # entries
So a trie has a performance level that is similar to that of a hash table
Unlike a hash table, a trie can provide an alphabetical ordering of the entries by key
(i.e., a trie implements an ordered map, while a hash table cannot)
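A rough sketch of a trie used as an ordered map is shown below. For brevity it marks the end of a key with a flag on the node rather than with external nodes, which is a simplification of the standard trie; Node, put, get, and items are hypothetical names.

```python
class Node:
    def __init__(self):
        self.children = {}      # character -> child Node
        self.has_value = False  # True if a key ends at this node
        self.value = None


def put(root, key, value):
    node = root
    for ch in key:
        if ch not in node.children:
            node.children[ch] = Node()
        node = node.children[ch]
    node.has_value, node.value = True, value


def get(root, key, default=None):
    """Follow one child reference per character: at most len(key) steps."""
    node = root
    for ch in key:
        node = node.children.get(ch)
        if node is None:
            return default
    return node.value if node.has_value else default


def items(node, prefix=""):
    """Yield (key, value) pairs in alphabetical order of the keys."""
    if node.has_value:
        yield prefix, node.value
    for ch in sorted(node.children):
        yield from items(node.children[ch], prefix + ch)


root = Node()
for i, key in enumerate(["stop", "bid", "sell", "bear"]):
    put(root, key, i)

print(get(root, "sell"))            # 2
print([k for k, _ in items(root)])  # ['bear', 'bid', 'sell', 'stop']
```

The lookup cost depends only on the key length m, not on the number of entries, and the traversal returns the keys in sorted order without a separate sorting step over the entries.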

Handling keys that are prefixes of another key
The standard trie has the property that:
Only the external nodes can store information
(The path formed by the internal nodes represents the key)
When a key (string) is a prefix of another key, the path of the first key would end in an internal node
Example: at and ate

Solution:
Add a special termination symbol to the alphabet
The termination symbol has the lowest value in the alphabet, i.e., the termination symbol precedes every other character in the alphabet
We append the termination symbol to each keyword stored in the trie
We typically use the NUL character '\0' as the termination symbol
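The termination-symbol idea can be sketched as follows, using the keys "at" and "ate". The sketch assumes that no key contains the NUL character itself; the names END, insert, contains, and keys_in_order are illustrative.

```python
END = "\0"  # termination symbol; sorts before every other character


def insert(root, key):
    """Store key + END so that every key ends in an external node."""
    node = root
    for ch in key + END:
        node = node.setdefault(ch, {})


def contains(root, key):
    node = root
    for ch in key + END:
        if ch not in node:
            return False
        node = node[ch]
    return True


def keys_in_order(node, prefix=""):
    """Yield the stored keys in alphabetical order."""
    for ch in sorted(node):
        if ch == END:
            yield prefix
        else:
            yield from keys_in_order(node[ch], prefix + ch)


root = {}
insert(root, "ate")
insert(root, "at")

print(contains(root, "at"), contains(root, "a"))  # True False
print(list(keys_in_order(root)))                  # ['at', 'ate']
```

Because the termination symbol precedes every other character, a key is always listed before the longer keys that extend it.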

APPLICATIONS OF TRIE DATA STRUCTURES

TRIES IN AUTOCOMPLETE
A trie is a tree-like data structure in which each node contains an array of pointers, one pointer for each character in the alphabet.
Starting at the root node, we can trace a word, and so check whether it exists in the trie, by following the pointers corresponding to the letters in the target word.

AUTOCOMPLETE
Auto-complete functionality is used widely over the internet and in mobile apps. A lot of websites and apps try to complete your input as soon as you start typing.
All the descendants of a node have a common prefix: the string associated with that node.

AUTOCOMPLETE IN GOOGLE SEARCH

WHY TRIES IN AUTOCOMPLETE
Implementing auto-complete using a trie is easy.
We simply trace pointers to get to the node that represents the string the user entered. By exploring the trie from that node down, we can enumerate all strings that complete the user's input.
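The two steps described here (walk to the node for the typed prefix, then explore everything below it) can be sketched as follows. The word list and the names TrieNode, insert, and complete are assumptions made for the example.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # character -> child TrieNode
        self.is_word = False  # True if a stored word ends here


def insert(root, word):
    node = root
    for ch in word:
        if ch not in node.children:
            node.children[ch] = TrieNode()
        node = node.children[ch]
    node.is_word = True


def complete(root, prefix):
    """Return all stored words that start with prefix, in alphabetical order."""
    # Step 1: trace pointers to the node that represents the typed prefix
    node = root
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return []
    # Step 2: explore the trie from that node down
    results = []

    def explore(current, text):
        if current.is_word:
            results.append(text)
        for ch in sorted(current.children):
            explore(current.children[ch], text + ch)

    explore(node, prefix)
    return results


root = TrieNode()
for word in ["bear", "bell", "bid", "bull", "buy", "sell", "stock", "stop"]:
    insert(root, word)

print(complete(root, "be"))  # ['bear', 'bell']
print(complete(root, "st"))  # ['stock', 'stop']
```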

CRIMINOLOGY
Suppose that you are at the scene of a crime and observe the first few characters CRX on the registration plate of the getaway car. If we have a trie of registration numbers, we can use the characters CRX to reach a subtrie that contains all registration numbers that begin with CRX. The elements in this subtrie can then be examined to see which cars satisfy other properties that might have been observed.

AUTOMATIC COMMAND COMPLETION
When using an operating system such as Unix or DOS, we type in system commands to accomplish certain tasks. For example, the Unix and DOS command cd may be used to change the current directory.

Commands that have the prefix ps

ps2ascii   ps2pdf   psbook      psmandup   psselect
ps2epsi    ps2pk    pscal       psmerge    pstopnm
ps2frag    ps2ps    psidtopgm   psnup      pstops
ps2gif     psbb     pslatex     psresize   pstruct

Figure 10 Commands that begin with "ps"

We can simplify the task of typing in commands by providing a command completion facility, which automatically types in the command suffix once the user has typed in a prefix long enough to uniquely identify the command. For instance, once the letters psi have been entered, we know that the command must be psidtopgm, because there is only one command that has the prefix psi. In this case, we replace the need to type in a 9-character command name by the need to type in just the first 3 characters of the command!
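A command completion facility of this kind could be sketched as below. The function completes a prefix only when exactly one stored command starts with it and returns None otherwise; the names CommandNode, add_command, and unique_completion are hypothetical, and the command list is the one from Figure 10.

```python
class CommandNode:
    def __init__(self):
        self.children = {}       # character -> child CommandNode
        self.is_command = False  # True if a command name ends here


def add_command(root, name):
    node = root
    for ch in name:
        if ch not in node.children:
            node.children[ch] = CommandNode()
        node = node.children[ch]
    node.is_command = True


def unique_completion(root, prefix):
    """Return the only command starting with prefix, or None if 0 or 2+ match."""
    node = root
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return None  # no command has this prefix
    completed = prefix
    # Follow the path for as long as there is exactly one way to continue
    while not node.is_command:
        if len(node.children) != 1:
            return None  # the prefix is still ambiguous
        ch, node = next(iter(node.children.items()))
        completed += ch
    # A command that is itself a prefix of longer commands is ambiguous too
    return completed if not node.children else None


COMMANDS = [
    "ps2ascii", "ps2epsi", "ps2frag", "ps2gif", "ps2pdf", "ps2pk", "ps2ps",
    "psbb", "psbook", "pscal", "psidtopgm", "pslatex", "psmandup", "psmerge",
    "psnup", "psresize", "psselect", "pstopnm", "pstops", "pstruct",
]
root = CommandNode()
for name in COMMANDS:
    add_command(root, name)

print(unique_completion(root, "psi"))  # psidtopgm
print(unique_completion(root, "ps2"))  # None
```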

LONGEST PREFIX MATCHING
Longest prefix match (also called maximum prefix length match) refers to an algorithm used by routers in Internet Protocol (IP) networking to select an entry from a routing table.
Because each entry in a routing table may specify a network, one destination address may match more than one routing table entry. The most specific table entry, the one with the longest subnet mask, is called the longest prefix match. It is called this because it is also the entry where the largest number of leading address bits in the table entry match those of the destination address.

For example, consider this IPv4 routing table (CIDR notation is used):
192.168.20.16/28
192.168.0.0/16

When the address 192.168.20.19 needs to be looked up, both entries in the routing table "match". That is, both entries contain the looked-up address. In this case, the longest prefix of the candidate routes is 192.168.20.16/28, since its subnet mask (/28) is longer than the other entry's mask (/16), making the route more specific.
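One way to carry out this lookup is a bitwise trie over the address bits, as in the sketch below. It assumes IPv4 only and uses Python's standard ipaddress module to parse the entries; RouteNode, add_route, and longest_prefix_match are hypothetical names, and real routers use considerably more compact structures.

```python
import ipaddress


class RouteNode:
    def __init__(self):
        self.child = [None, None]  # sub-tries for bit 0 and bit 1
        self.route = None          # routing table entry that ends here, if any


def add_route(root, cidr):
    """Insert the first prefixlen bits of the network address into the trie."""
    net = ipaddress.ip_network(cidr)
    addr = int(net.network_address)
    node = root
    for i in range(net.prefixlen):
        bit = (addr >> (31 - i)) & 1
        if node.child[bit] is None:
            node.child[bit] = RouteNode()
        node = node.child[bit]
    node.route = cidr


def longest_prefix_match(root, address):
    """Walk the address bits and remember the deepest route seen on the way."""
    addr = int(ipaddress.ip_address(address))
    node = root
    best = node.route
    for i in range(32):
        bit = (addr >> (31 - i)) & 1
        node = node.child[bit]
        if node is None:
            break
        if node.route is not None:
            best = node.route
    return best


table = RouteNode()
add_route(table, "192.168.20.16/28")
add_route(table, "192.168.0.0/16")

print(longest_prefix_match(table, "192.168.20.19"))  # 192.168.20.16/28
print(longest_prefix_match(table, "192.168.1.1"))    # 192.168.0.0/16
```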

A web browser keeps a history of the URLs of sites that you have visited. By organizing this history as a trie, the user need only type the prefix of a previously used URL and the browser can complete the URL.

SPELL CHECKERS
Spell checkers are ubiquitous. Word processors have spell checkers, as do browser-based e-mail clients. They all work the same way: a dictionary is stored in some data structure, then each word of input is submitted to a search in the data structure, and those that fail are flagged as spelling errors.
There are many appropriate data structures to store the word list, including a sorted array accessed via binary search, a hash table, or a Bloom filter. In this exercise you are challenged to store the word list character-by-character in a trie.
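A minimal sketch of such a trie-based spell checker is given below. The five-word dictionary is only a sample, the input is split on whitespace without handling punctuation, and the names build_dictionary, is_word, and misspelled are assumptions for the example.

```python
END = "\0"  # termination symbol marking the end of a dictionary word


def build_dictionary(words):
    """Store the word list character by character in a nested-dict trie."""
    root = {}
    for word in words:
        node = root
        for ch in word + END:
            node = node.setdefault(ch, {})
    return root


def is_word(root, word):
    node = root
    for ch in word + END:
        node = node.get(ch)
        if node is None:
            return False
    return True


def misspelled(root, text):
    """Return the input words whose search in the trie fails."""
    return [w for w in text.lower().split() if not is_word(root, w)]


dictionary = build_dictionary(["page", "peg", "pest", "pig", "pug"])
print(misspelled(dictionary, "pig pag pest pe"))  # ['pag', 'pe']
```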

Spell check example (figure): a trie storing the words page, pug, pig, peg, pest

PHONE BOOK SEARCH
Trie data structures are commonly used to search for a contact in a phone book.

Prefix Matching
A query a matches every contact of the form a* (that is, every contact that starts with a)

Example
Contacts in phone book: alberto, ram, sankar, star, stella
(Figure: trie built from these contacts)

Suffix Matching
A trie can be used to index all suffixes in a text in order to carry out fast full-text searches.

TRIES IN T9
T9 is a technology used on many mobile phones to make typing text messages easier.
The idea is simple: each number key on the phone's keypad corresponds to 3-4 letters of the alphabet.
Many phones will notice when you type in a word that is not in their dictionary, and will add that word. Others keep track of the frequency of certain words and favor those words over other words that have the same sequence of keypresses.

How does a T9 dictionary work?
It can be implemented in several ways; one of them is a trie. The path is represented by the digits, and the nodes point to collections of words.
T9 works by filtering the possibilities down sequentially, starting with the first possible letters.

It can be implemented using nested hash tables as well: the key of each hash table is a letter, and on every digit the algorithm calculates all possible routes (O(3^n) routes).
For example, if we type '4663' we get 'good'; when we press the down button we get 'gone', then 'home', etc.
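The trie variant of a T9 dictionary, in which the path is given by the digits and each node points to a collection of words, might be sketched like this. The keypad mapping is the standard phone layout; returning candidates in insertion order is an assumption of this sketch (a real implementation would likely order them by frequency), and T9Node, add_word, and candidates are hypothetical names.

```python
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_DIGIT = {ch: d for d, letters in KEYPAD.items() for ch in letters}


class T9Node:
    def __init__(self):
        self.children = {}  # digit -> child T9Node
        self.words = []     # words whose digit sequence ends at this node


def add_word(root, word):
    """Insert a word along the path formed by its keypad digits."""
    node = root
    for ch in word:
        digit = LETTER_TO_DIGIT[ch]
        if digit not in node.children:
            node.children[digit] = T9Node()
        node = node.children[digit]
    node.words.append(word)


def candidates(root, digits):
    """Return the words typed by this digit sequence, in insertion order."""
    node = root
    for d in digits:
        node = node.children.get(d)
        if node is None:
            return []
    return list(node.words)


root = T9Node()
for word in ["good", "gone", "home", "in"]:
    add_word(root, word)

print(candidates(root, "4663"))  # ['good', 'gone', 'home']
```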

Applications
Spell checkers.
Data compression.
Princeton U-CALL.
Computational biology.
Routing tables for IP addresses.
Storing and querying XML documents.
Associative arrays, associative indexing.
Modern application: inverted index of the Web.
Insert each word of every web page into the trie, storing a URL list in the leaves.
Find the query keywords in the trie, and take the intersection of the URL lists.
Use the PageRank algorithm to rank the resulting web pages.
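The first two steps of the inverted-index application can be sketched as below; the ranking step is left out. The example.com URLs and page texts are made up for illustration, the URL set is stored under a reserved key that plays the role of the termination symbol, and index_page, urls_for, and search are hypothetical names.

```python
URLS = "\0"  # reserved key under which a word's URL set is stored


def index_page(root, url, text):
    """Insert every word of a page into the trie, recording the page URL."""
    for word in text.lower().split():
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node.setdefault(URLS, set()).add(url)


def urls_for(root, word):
    """Return the set of URLs whose page contains word (empty if none)."""
    node = root
    for ch in word.lower():
        node = node.get(ch)
        if node is None:
            return set()
    return node.get(URLS, set())


def search(root, query):
    """Take the intersection of the URL sets of all query keywords."""
    words = query.split()
    if not words:
        return set()
    result = set(urls_for(root, words[0]))
    for word in words[1:]:
        result &= urls_for(root, word)
    return result


root = {}
index_page(root, "http://example.com/a", "tries store text keys")
index_page(root, "http://example.com/b", "hash tables store keys")

print(sorted(search(root, "store keys")))  # both pages
print(sorted(search(root, "tries keys")))  # only http://example.com/a
```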

Reference
http://www.mathcs.emory.edu/~cheung/Courses/323/Syllabus/Text/trie01.html
