Data Mining Query Languages
Why? The need for a data mining query language
Three Different Answers
• DMQL: A Data Mining Query Language for Relational Databases (Han et al., Simon Fraser University)
• Integrating Data Mining with SQL Databases: OLE DB for Data Mining (Netz et al., Microsoft)
• MSQL: A Query Language for Database Mining (Imielinski & Virmani, Rutgers University)
Some Common Ground
• Create and manipulate data mining models through a SQL-based interface ("command-driven" data mining)
• Abstract away the data mining particulars
• Perform data mining on the data in the database, with no need to export it to a special-purpose environment
• The approaches differ on what kinds of models should be created and what operations should be supported on them
DMQL
Commands specify the following:
• The set of data relevant to the data mining task (the training set)
• The kinds of knowledge to be discovered:
  • Generalized relation
  • Characteristic rules
  • Discriminant rules
  • Classification rules
  • Association rules
  • Others
DMQL
use database Hospital
find association rules as Heart_Health
related to Salary, Age, Smoker,
Heart_Disease
from Patient_Financial f, Patient_Medical m
where f.ID = m.ID and m.age >= 18
with support threshold = .05
with confidence threshold = .7
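The `from`/`where` part of this DMQL query is ordinary SQL, which is what lets the mining run on data inside the database. The sketch below, assuming SQLite as a stand-in DBMS, invented rows, and one hand-picked candidate rule (Smoker = 'yes' => Heart_Disease = 'yes', hypothetical), computes that rule's support and confidence over the task-relevant data and checks it against the two thresholds. It does not enumerate candidate rules the way a real DMQL engine would.

```python
import sqlite3

# Toy in-memory database. Table and column names mirror the DMQL example,
# but every row below is invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Patient_Financial (ID INTEGER PRIMARY KEY, Salary TEXT);
    CREATE TABLE Patient_Medical (ID INTEGER PRIMARY KEY, Age INTEGER,
                                  Smoker TEXT, Heart_Disease TEXT);
""")
conn.executemany("INSERT INTO Patient_Financial VALUES (?, ?)",
                 [(1, "high"), (2, "low"), (3, "low"),
                  (4, "high"), (5, "low"), (6, "low")])
conn.executemany("INSERT INTO Patient_Medical VALUES (?, ?, ?, ?)",
                 [(1, 45, "yes", "yes"), (2, 60, "yes", "yes"),
                  (3, 30, "no", "no"), (4, 50, "yes", "no"),
                  (5, 16, "yes", "yes"),  # under 18: dropped by the where clause
                  (6, 70, "no", "yes")])

def rule_stats(conn):
    """Support and confidence of the hypothetical rule
    Smoker = 'yes' => Heart_Disease = 'yes', computed inside the database
    over the task-relevant data (the join restricted to Age >= 18)."""
    n, body, both = conn.execute("""
        SELECT COUNT(*),
               SUM(m.Smoker = 'yes'),
               SUM(m.Smoker = 'yes' AND m.Heart_Disease = 'yes')
        FROM Patient_Financial f, Patient_Medical m
        WHERE f.ID = m.ID AND m.Age >= 18
    """).fetchone()
    # support = P(body and head), confidence = P(head | body)
    return both / n, both / body

MIN_SUPPORT, MIN_CONFIDENCE = 0.05, 0.7   # thresholds from the DMQL query
support, confidence = rule_stats(conn)
kept = support >= MIN_SUPPORT and confidence >= MIN_CONFIDENCE
print(f"support={support:.2f} confidence={confidence:.2f} kept={kept}")
# prints: support=0.40 confidence=0.67 kept=False
```

Here the rule clears the support threshold but falls below the 0.7 confidence threshold, so it would be pruned.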
DMQL
Retrieve all rules with descriptors of the form "Age = x" in the body, except when there is another rule, whose body contains a superset of those descriptors, with equal or greater support and confidence
MSQL
Correlated:
GetRules(C) R1
where <pruning-conds>
and not exists ( GetRules(C) R2
                 where <same pruning-conds>
                 and R2.Body HAS R1.Body )

Stratified:
GetRules(C) R1
where <pruning-conds>
and consequent is {(X=*)}
and consequent in ( SelectRules(R2)
                    where consequent is {(X=*)} )
MSQL
Nested GetRules Queries and Their Optimization
• Stratified (non-correlated) queries are evaluated bottom-up: the subquery is evaluated first, and its results are substituted into the outer query.
• Correlated queries are evaluated either top-down or bottom-up (similar to loop unfolding), and there are rules for choosing between the two options.
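The evaluation strategies above can be sketched in Python, assuming rules are reduced to (body, consequent) pairs of descriptor sets and assuming `R2.Body HAS R1.Body` means a strict superset (otherwise every rule would eliminate itself). The function names and the sample rules are hypothetical; this shows the idea, not MSQL's actual optimizer.

```python
from itertools import combinations
from typing import FrozenSet, List, Tuple

# A rule is a (body, consequent) pair of descriptor sets such as "Age=40-49".
Rule = Tuple[FrozenSet[str], FrozenSet[str]]

def stratified_bottom_up(outer: List[Rule], inner: List[Rule]) -> List[Rule]:
    """'consequent in (subquery)': the subquery never refers to the outer rule,
    so it is evaluated once and replaced by its materialized result."""
    inner_consequents = {head for _, head in inner}
    return [r for r in outer if r[1] in inner_consequents]

def correlated_top_down(outer: List[Rule], inner: List[Rule]) -> List[Rule]:
    """'not exists (... R2.Body HAS R1.Body)': re-run the subquery for each
    outer rule R1 (nested-loop style). Assumes HAS = strict superset."""
    return [r1 for r1 in outer
            if not any(r2[0] > r1[0] for r2 in inner)]

def correlated_bottom_up(outer: List[Rule], inner: List[Rule]) -> List[Rule]:
    """Evaluate the subquery once, unfolding it into the set of every body that
    some inner rule strictly contains; NOT EXISTS becomes a set lookup."""
    covered = set()
    for body, _ in inner:
        for k in range(len(body)):  # sizes 0 .. len(body)-1: proper subsets only
            covered.update(frozenset(c) for c in combinations(body, k))
    return [r1 for r1 in outer if r1[0] not in covered]

# Invented sample rules.
rules = [
    (frozenset({"Age=40-49"}), frozenset({"Heart_Disease=yes"})),
    (frozenset({"Age=40-49", "Smoker=yes"}), frozenset({"Heart_Disease=yes"})),
    (frozenset({"Age=60-69"}), frozenset({"Heart_Disease=no"})),
]

print(correlated_top_down(rules, rules) == correlated_bottom_up(rules, rules))  # True
print(stratified_bottom_up(rules, [rules[1]]))
```

Both correlated strategies return the same rules; they differ in cost. Top-down performs roughly |outer| × |inner| subset tests, while this unfolded bottom-up version scans the inner rules once but enumerates every proper subset of each body, so it only pays off for short bodies. That is the kind of trade-off a choice rule would weigh.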
MSQL
GetRules(Patients) R1
where Body has {Age = *}
and Support > .05 and Confidence > .7
and not exists ( GetRules(Patients) R2
                 where Support > .05 and Confidence > .7
                 and R2.Body HAS R1.Body )
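The semantics of this nested query can be sketched as a Python filter over an already-mined rule list. Assumptions: the `Rule` record and the sample rules are invented, `{Age = *}` is read as "some descriptor on attribute Age, any value", and `HAS` is read as a strict superset. The subquery applies only the support and confidence conditions, exactly as written above, so a more specific rule that fails those thresholds does not suppress a general one.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

@dataclass(frozen=True)
class Rule:
    body: FrozenSet[str]   # descriptors written "Attribute=value"
    head: FrozenSet[str]
    support: float
    confidence: float

def has_wildcard(body: FrozenSet[str], attribute: str) -> bool:
    """Models the pattern {Age = *}: some descriptor on this attribute."""
    return any(d.split("=", 1)[0] == attribute for d in body)

def passes(r: Rule) -> bool:
    """Pruning conditions shared by the outer query and the subquery."""
    return r.support > 0.05 and r.confidence > 0.7

def age_rules_query(rules: List[Rule]) -> List[Rule]:
    candidates = [r for r in rules if passes(r)]
    return [r1 for r1 in candidates
            if has_wildcard(r1.body, "Age")
            # not exists R2 among the qualifying rules whose body strictly contains R1's
            and not any(r2.body > r1.body for r2 in candidates)]

hd = frozenset({"Heart_Disease=yes"})
rules = [
    Rule(frozenset({"Age=40-49"}), hd, 0.10, 0.75),               # suppressed by next rule
    Rule(frozenset({"Age=40-49", "Smoker=yes"}), hd, 0.08, 0.90),
    Rule(frozenset({"Age=60-69"}), hd, 0.12, 0.72),
    Rule(frozenset({"Age=60-69", "Smoker=no"}), hd, 0.03, 0.95),  # fails support
    Rule(frozenset({"Smoker=yes"}), hd, 0.20, 0.80),              # no Age descriptor
]

for r in age_rules_query(rules):
    print(sorted(r.body), r.support, r.confidence)
```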
MSQL
Top-Down Evaluation
GetRules(Patients) R1
where Body has {Age = *}
and Support > .05 and Confidence > .7
Bottom-Up Evaluation
not exists ( GetRules(Patients) R2
             where Support > .05 and Confidence > .7
             and R2.Body HAS R1.Body )