
Session 4.

Strategy 3: First Phrase Mining then Topic Modeling

First phrase construction, then topic mining

Contrast with KERT: topic modeling, then phrase mining

ToPMine [El-Kishky et al. VLDB15]

The ToPMine Framework:

Perform frequent contiguous pattern mining to extract candidate phrases and their counts

Perform agglomerative merging of adjacent unigrams as guided by a significance score; this segments each document into a bag-of-phrases

The newly formed bags-of-phrases are passed as input to PhraseLDA, an extension of LDA that constrains all words in a phrase to share the same latent topic
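The first step above can be sketched as a simple contiguous n-gram counter with a minimum-support filter. This is an illustrative sketch, not the ToPMine implementation; the function name, parameters, and toy documents are invented for the example.

```python
from collections import Counter

def mine_candidate_phrases(docs, max_len=4, min_support=2):
    """Count contiguous n-grams (candidate phrases) across tokenized docs,
    keeping only those that meet the minimum support threshold."""
    counts = Counter()
    for tokens in docs:
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return {p: c for p, c in counts.items() if c >= min_support}

docs = [
    "support vector machine classifiers".split(),
    "training support vector machine models".split(),
]
phrases = mine_candidate_phrases(docs)
print(phrases[("support", "vector", "machine")])  # -> 2 (appears in both docs)
```

Only contiguous token sequences are counted, which keeps the candidate set small compared to general (gappy) frequent pattern mining.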

Why First Phrase Mining then Topic Modeling?


With Strategy 2, tokens in the same phrase may be assigned to different topics
Ex. knowledge discovery using least squares support vector machine classifiers
Knowledge discovery and support vector machine should have coherent topic labels
Solution: switch the order of phrase mining and topic model inference

[knowledge discovery] using [least squares] [support vector machine] [classifiers]

Techniques
Phrase mining and document segmentation
Topic model inference with phrase constraint

Phrase Mining: Frequent Pattern Mining + Statistical Analysis

Quality phrases are selected based on a significance score [Church et al. 91]:

sig(P1, P2) = (f(P1 P2) - mu0(P1, P2)) / sqrt(f(P1 P2))

where f(P1 P2) is the observed frequency of the concatenated phrase and mu0(P1, P2) is its expected frequency if P1 and P2 were independent


[Markov blanket] [feature selection] for [support vector machines]
[knowledge discovery] using [least squares] [support vector machine] [classifiers]
[support vector] for [machine learning]
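The significance-guided agglomerative merging can be sketched as follows: repeatedly merge the adjacent pair of units with the highest score until no pair exceeds a threshold. The function names, the frequency dictionary, and the total count are toy values invented for illustration, not ToPMine's actual data structures.

```python
from math import sqrt

def significance(f1, f2, f12, total):
    """Significance score [Church et al. 91]:
    (observed - expected-under-independence) / sqrt(observed)."""
    mu0 = f1 * f2 / total  # expected frequency of P1 P2 if P1, P2 independent
    return (f12 - mu0) / sqrt(f12) if f12 > 0 else float("-inf")

def segment(tokens, freq, total, threshold=1.0):
    """Greedy agglomerative merging of adjacent units into phrases."""
    units = [(t,) for t in tokens]
    while len(units) > 1:
        scores = [
            significance(freq.get(a, 0), freq.get(b, 0), freq.get(a + b, 0), total)
            for a, b in zip(units, units[1:])
        ]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] < threshold:
            break  # no adjacent pair is significant enough to merge
        units[best:best + 2] = [units[best] + units[best + 1]]
    return units

freq = {
    ("knowledge",): 50, ("discovery",): 40, ("knowledge", "discovery"): 30,
    ("using",): 500, ("support",): 100, ("vector",): 90,
    ("support", "vector"): 70,
}
result = segment("knowledge discovery using support vector".split(), freq, total=10_000)
print(result)  # -> [('knowledge', 'discovery'), ('using',), ('support', 'vector')]
```

Because merges happen in order of significance, frequent but non-compositional pairs like "discovery using" never merge: their observed frequency does not exceed chance.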

Phrase                      Raw freq.   True freq.
[support vector machine]        90          80
[vector machine]                95           0
[support vector]               100          20

Collocation Mining

Collocation: a sequence of words that occurs more frequently than expected

Collocations are often interesting and, due to their non-compositionality, often convey information not carried by their constituent terms (e.g., made an exception, strong tea)

Many different measures are used to extract collocations from a corpus [Dunning 93, Pedersen 96]

E.g., mutual information, t-test, z-test, chi-squared test, likelihood ratio

Many of these measures can be used to guide the agglomerative phrase-segmentation algorithm
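One of the measures above, pointwise mutual information, is simple enough to show in a few lines. The counts below are invented toy values, not corpus statistics from the lecture.

```python
from math import log2

def pmi(f1, f2, f12, n):
    """Pointwise mutual information: log2 of the ratio between the observed
    co-occurrence probability and the one expected under independence."""
    return log2((f12 / n) / ((f1 / n) * (f2 / n)))

# A collocation like "strong tea" co-occurs far more often than chance,
# so its PMI is high.
print(round(pmi(f1=1000, f2=500, f12=100, n=1_000_000), 2))  # -> 7.64
```

A positive score indicates the pair co-occurs more than expected; scores near zero or below suggest the words are independent, so the pair is not a collocation.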

ToPMine: PhraseLDA (Constrained Topic Modeling)

The generative model for PhraseLDA is the same as LDA
Difference: the model incorporates constraints obtained from the bag-of-phrases input
A chain graph shows that all words in a phrase are constrained to take on the same topic values

[knowledge discovery] using [least squares] [support vector machine] [classifiers]

Topic model inference with phrase constraints
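The phrase constraint can be sketched schematically: during sampling, one topic is drawn per phrase and assigned to all of its words, whereas plain LDA would sample each word's topic separately. This is an illustrative sketch of the constraint only, not the actual PhraseLDA inference; the count structures and toy document are invented.

```python
import random

def sample_topics(doc_phrases, num_topics, doc_topic, topic_word, alpha=0.1, beta=0.01):
    """Draw one topic per phrase and assign it to every word in that phrase
    (the PhraseLDA constraint)."""
    assignments = []
    for phrase in doc_phrases:
        weights = []
        for k in range(num_topics):
            w = doc_topic[k] + alpha       # document-topic affinity
            for word in phrase:            # each word contributes a topic-word factor
                w *= topic_word[k].get(word, 0) + beta
            weights.append(w)
        k = random.choices(range(num_topics), weights=weights)[0]
        assignments.append([k] * len(phrase))  # all words share the sampled topic
    return assignments

random.seed(0)
doc = [["knowledge", "discovery"], ["support", "vector", "machine"], ["classifiers"]]
z = sample_topics(doc, num_topics=2,
                  doc_topic=[3, 2],
                  topic_word=[{"knowledge": 5, "discovery": 4},
                              {"support": 6, "vector": 6, "machine": 5, "classifiers": 3}])
# every word within a phrase receives one shared topic
print(all(len(set(phrase_z)) == 1 for phrase_z in z))
```

This is exactly the coherence fix motivating Strategy 3: "support", "vector", and "machine" can never be split across different topics.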



Example Topical Phrases: A Comparison

PDLDA [Lindsey et al. 12] (Strategy 1, 3.72 hours), sample phrases from Topics 1 and 2:
social networks, information retrieval, web search, text classification, time series, machine learning, search engine, support vector machines, management system, information extraction, real time, neural networks, decision trees, text categorization, ...

ToPMine [El-Kishky et al. 14] (Strategy 3, 67 seconds), sample phrases from Topics 1 and 2:
information retrieval, feature selection, social networks, web search, search engine, information extraction, machine learning, semi supervised, large scale, support vector machines, question answering, active learning, web pages, face recognition

ToPMine: Experiments on DBLP Abstracts

ToPMine: Topics on Associated Press News (1989)

ToPMine: Experiments on Yelp Reviews

