
Group members:

Rahul Kelaskar A - 636
Anish Khale A - 638
Dhaval Doshi A - 682

Guide: Mr. Gautam Borkar
• Process of exploring and analyzing data
• Iterative multi-step process
• Involves data preparation, search for patterns, knowledge
evaluation and interpretation
• Arrangement or Ordering
• Existence of organization of underlying structure
• Application of algorithms to extract patterns in data.
• Act of taking in raw data and taking “action” based on the “category” of the pattern.
Identifies underlying patterns from transformed data.
• Input:
  A database DB, represented by an FP-tree, and a minimum support S.
• Output:
  The complete set of frequent patterns.
• Method:
  call FP-growth(FP-tree, null)

Procedure FP-growth(Tree, α)
{
  if Tree contains a single prefix path      // Mining single prefix-path FP-tree
  then {
    let P be the single prefix-path part of Tree;
    let Q be the multipath part with the top branching node replaced by a null root;
    for each combination (denoted as β) of the nodes in the path P do
      generate pattern β ∪ α with support = minimum support of nodes in β;
    let freq_pattern_set(P) be the set of patterns so generated; }
  else let Q be Tree;
  for each item a_i in Q do {                // Mining multipath FP-tree
    generate pattern β = a_i ∪ α with support = a_i.support;
    construct β’s conditional pattern base and then β’s conditional FP-tree Tree_β;
    if Tree_β ≠ ∅
    then call FP-growth(Tree_β, β);
    let freq_pattern_set(Q) be the set of patterns so generated; }
  return (freq_pattern_set(P) ∪ freq_pattern_set(Q) ∪ (freq_pattern_set(P) × freq_pattern_set(Q)))
}
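The procedure above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the reference implementation: the names Node, build_tree and fp_growth are made up for this sketch, the single prefix-path shortcut is left out (every tree is mined through the multipath branch), and each conditional FP-tree is rebuilt from its conditional pattern base. The input is a list of (items, count) pairs; the sample database is read off the root-to-leaf paths of the FP-tree in the example from [1]. Ties in item frequency are broken alphabetically here, so the tree shape may differ from the slide while the mined patterns stay the same.

```python
from collections import defaultdict


class Node:
    """One FP-tree node: an item, a count, a parent link and child links."""

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_tree(weighted_transactions, min_support):
    """Build an FP-tree from (items, count) pairs; return (root, header)."""
    support = defaultdict(int)
    for items, count in weighted_transactions:
        for item in set(items):
            support[item] += count
    frequent = {i: s for i, s in support.items() if s >= min_support}

    root = Node(None, None)
    header = defaultdict(list)  # item -> all tree nodes carrying that item
    for items, count in weighted_transactions:
        # Keep frequent items only, ordered by descending support (ties by name).
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return root, header


def fp_growth(weighted_transactions, min_support, suffix=frozenset()):
    """Return {frozenset(pattern): support} for all frequent patterns."""
    _, header = build_tree(weighted_transactions, min_support)
    patterns = {}
    for item, nodes in header.items():
        beta = suffix | {item}                      # pattern beta = a_i ∪ alpha
        patterns[beta] = sum(n.count for n in nodes)
        # Conditional pattern base: the prefix path of every node of `item`.
        base = []
        for n in nodes:
            path, p = [], n.parent
            while p is not None and p.item is not None:
                path.append(p.item)
                p = p.parent
            if path:
                base.append((path, n.count))
        if base:  # recurse only when the conditional tree can be non-empty
            patterns.update(fp_growth(base, min_support, beta))
    return patterns


# Paths of the example FP-tree, used as a weighted transaction list (assumption).
example_db = [("fcamp", 2), ("fcabm", 1), ("fb", 1), ("cbp", 1)]
frequent_patterns = fp_growth(example_db, min_support=3)
```

With a minimum support of 3, the patterns containing m that come out of this sketch are m, fm, cm, am, fcm, fam, cam and fcam, each with support 3.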
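Example [1]:

Header table
Item   Frequency
f      4
c      4
a      3
b      3
m      3
p      3

FP-tree (root {}; one line per root-to-leaf path, nodes written as item:count)
{} -> f:4 -> c:3 -> a:3 -> m:2 -> p:2
{} -> f:4 -> c:3 -> a:3 -> b:1 -> m:1
{} -> f:4 -> b:1
{} -> c:1 -> b:1 -> p:1

Conditional pattern bases
Item   Cond. pattern base
c      f:3
a      fc:3
b      fca:1, f:1, c:1
m      fca:2, fcab:1
p      fcam:2, cb:1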
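As a rough illustration of how the conditional pattern bases in this example arise, the sketch below collects, for every item, the prefix that precedes it on each tree path together with that path's count. The function name conditional_pattern_bases and the path list are assumptions made for this sketch; the paths (fcamp:2, fcabm:1, fb:1, cbp:1) are read off the example FP-tree.

```python
from collections import Counter, defaultdict


def conditional_pattern_bases(ordered_paths):
    """Map each item to {prefix path: count} over all (path, count) pairs."""
    bases = defaultdict(Counter)
    for path, count in ordered_paths:
        for i, item in enumerate(path):
            if i > 0:  # an item at the top of a path has no prefix
                bases[item][path[:i]] += count
    return {item: dict(prefixes) for item, prefixes in bases.items()}


# Root-to-leaf paths of the example FP-tree with their counts (assumption).
paths = [("fcamp", 2), ("fcabm", 1), ("fb", 1), ("cbp", 1)]
bases = conditional_pattern_bases(paths)
print(bases["m"])  # {'fca': 2, 'fcab': 1}
print(bases["p"])  # {'fcam': 2, 'cb': 1}
```

The item f gets no entry because it always sits directly under the root.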
m-conditional pattern base:
fca:2, fcab:1

m-conditional FP-tree:
{} -> f:3 -> c:3 -> a:3

All frequent patterns relating to m:
m, fm, cm, am, fcm, fam, cam, fcam
GENERALIZED SEQUENTIAL PATTERN MINING ALGORITHM
1. Initially, every item in DB is a candidate of
length-1.
2. For each level (i.e., sequences of length-k) do
2.1 Scan database to collect support count for each
candidate sequence.
2.2 Generate candidate length-(k+1) sequences from
length-k frequent sequences using Apriori.
3. Repeat until no frequent sequence or no
candidate can be found.
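The level-wise loop in steps 1 to 3 can be sketched as below. This is a simplified illustration, not GSP as published: candidates of length k+1 are produced by extending each frequent length-k sequence with one frequent item (either as a new element or inside the last element), which stands in for GSP's join-and-prune step. A sequence is modelled as a tuple of frozensets, and the names seq, contains, support and mine_sequences are assumptions made for this sketch. The sample database is the five-sequence one used in this deck.

```python
def seq(text):
    """Parse 'bd c b ac' into a tuple of itemsets: ({b,d}, {c}, {b}, {a,c})."""
    return tuple(frozenset(element) for element in text.split())


def contains(sequence, pattern):
    """True if `pattern` is a subsequence of `sequence` (greedy left-to-right)."""
    pos = 0
    for element in pattern:
        while pos < len(sequence) and not element <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True


def support(db, pattern):
    """Number of database sequences that contain `pattern`."""
    return sum(contains(sequence, pattern) for sequence in db)


def mine_sequences(db, min_sup):
    """Return {pattern: support} for all frequent sequential patterns."""
    items = sorted({i for s in db for element in s for i in element})
    candidates = [(frozenset([i]),) for i in items]  # step 1: length-1 candidates
    frequent_items, result, first_level = [], {}, True
    while candidates:
        # Step 2.1: one scan of the database to count every candidate.
        counted = {c: support(db, c) for c in candidates}
        level = {c: s for c, s in counted.items() if s >= min_sup}
        if not level:
            break  # step 3: no frequent sequence at this length
        result.update(level)
        if first_level:
            frequent_items = sorted(next(iter(c[0])) for c in level)
            first_level = False
        # Step 2.2 (simplified): grow each frequent sequence by one item.
        candidates = []
        for pattern in level:
            for x in frequent_items:
                candidates.append(pattern + (frozenset([x]),))
                if x > max(pattern[-1]):  # keeps each itemset generated once
                    candidates.append(pattern[:-1] + (pattern[-1] | {x},))
    return result


db = [seq("bd c b ac"), seq("bf ce b fg"), seq("ah bf a b f"),
      seq("be ce d"), seq("a bd b c b ade")]
patterns = mine_sequences(db, min_sup=2)
print(patterns[seq("bd c b a")])  # 2
```

Because a frequent sequence stays frequent when its last item is removed, this extension scheme does not miss any frequent pattern; it may count more candidates per scan than GSP's own candidate generation would.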
Minimum support = 2

Seq. ID   Sequence
10        <(bd)cb(ac)>
20        <(bf)(ce)b(fg)>
30        <(ah)(bf)abf>
40        <(be)(ce)d>
50        <a(bd)bcb(ade)>

Cand   Sup
<a>    3
<b>    5
<c>    4
<d>    3
<e>    3
<f>    2
<g>    1
<h>    1
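The support column can be reproduced with a short counting pass, sketched below. Each item is counted at most once per sequence, however often it occurs inside that sequence. The dictionary layout and the name item_supports are assumptions made for this sketch; the data is the sequence database shown above, with each element written as a string of its items.

```python
from collections import Counter

# Sequence database: ID -> list of elements, each element a string of items.
sequence_db = {
    10: ["bd", "c", "b", "ac"],
    20: ["bf", "ce", "b", "fg"],
    30: ["ah", "bf", "a", "b", "f"],
    40: ["be", "ce", "d"],
    50: ["a", "bd", "b", "c", "b", "ade"],
}


def item_supports(db):
    """Count, for every item, the number of sequences it appears in."""
    counts = Counter()
    for elements in db.values():
        seen = set().union(*elements)  # distinct items of this sequence
        counts.update(seen)
    return counts


supports = item_supports(sequence_db)
frequent = sorted(i for i, s in supports.items() if s >= 2)
print(frequent)  # ['a', 'b', 'c', 'd', 'e', 'f']
```

With a minimum support of 2, g and h drop out and six length-1 patterns remain.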
Length-1 Candidates
<a> <b> <c> <d> <e> <f>

Length-2 Candidates

Sequences of two single-item elements, <xy>:

       <a>    <b>    <c>    <d>    <e>    <f>
<a>    <aa>   <ab>   <ac>   <ad>   <ae>   <af>
<b>    <ba>   <bb>   <bc>   <bd>   <be>   <bf>
<c>    <ca>   <cb>   <cc>   <cd>   <ce>   <cf>
<d>    <da>   <db>   <dc>   <dd>   <de>   <df>
<e>    <ea>   <eb>   <ec>   <ed>   <ee>   <ef>
<f>    <fa>   <fb>   <fc>   <fd>   <fe>   <ff>

Sequences of one two-item element, <(xy)>:

       <a>    <b>      <c>      <d>      <e>      <f>
<a>           <(ab)>   <(ac)>   <(ad)>   <(ae)>   <(af)>
<b>                    <(bc)>   <(bd)>   <(be)>   <(bf)>
<c>                             <(cd)>   <(ce)>   <(cf)>
<d>                                      <(de)>   <(df)>
<e>                                               <(ef)>
<f>
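The two candidate tables can be enumerated with a few lines of Python, shown here as a sketch. The function name length2_candidates is an assumption made for this illustration; the input is the list of frequent length-1 items from the example.

```python
from itertools import combinations, product


def length2_candidates(frequent_items):
    """Build all length-2 candidates from the frequent length-1 items."""
    items = sorted(frequent_items)
    # <xy>: two single-item elements in order; x and y may be the same item.
    sequential = [f"<{x}{y}>" for x, y in product(items, repeat=2)]
    # <(xy)>: one element holding two different items; order inside is irrelevant.
    simultaneous = [f"<({x}{y})>" for x, y in combinations(items, 2)]
    return sequential + simultaneous


candidates = length2_candidates("abcdef")
print(len(candidates))  # 51
```

For six frequent items this gives 6 × 6 = 36 candidates of the first kind and 6 × 5 / 2 = 15 of the second, 51 in total.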
min_sup = 2

Seq. ID   Sequence
10        <(bd)cb(ac)>
20        <(bf)(ce)b(fg)>
30        <(ah)(bf)abf>
40        <(be)(ce)d>
50        <a(bd)bcb(ade)>

1st scan: 8 cand., 6 length-1 seq. pat.     <a> <b> <c> <d> <e> <f> <g> <h>
2nd scan: 51 cand., 19 length-2 seq. pat.   <aa> <ab> … <af> <ba> <bb> … <ff> <(ab)> … <(ef)>
3rd scan: 46 cand., 19 length-3 seq. pat.   <abb> <aab> <aba> <baa> <bab> …
4th scan: 8 cand., 6 length-4 seq. pat.     <abba> <(bd)bc> …
5th scan: 1 cand., 1 length-5 seq. pat.     <(bd)cba>

Candidates are dropped either because they cannot pass the support threshold or because they do not occur in the DB at all.
• Security (credit card fraud)
• Global climate modeling
• Business
• Disaster management
[1] Florian Verhein, Frequent Pattern Growth (FP-Growth) Algorithm, 2008.
[2] An Introduction to Apriori-based Method: GSP (Generalized Sequential Patterns), Srikant & Agrawal [EDBT’96].