DATA MINING QUERY LANGUAGES


DMQL: A Data Mining Query Language

A data mining language must be designed to facilitate flexible and effective knowledge discovery.


Having a query language for data mining may help standardize the development of platforms for data mining systems. But designing such a language is challenging because data mining covers a wide spectrum of tasks, and each task has different requirements. Hence, the design of a language requires a deep understanding of the limitations and underlying mechanisms of the various kinds of tasks.


So how would you design an efficient query language?
Based on the primitives discussed earlier.


DMQL allows mining of different kinds of knowledge from relational databases and data warehouses at multiple levels of abstraction

Adopts SQL-like syntax. Hence, it can be easily integrated with relational query languages


Defined in BNF grammar


[ ] represents 0 or one occurrence

{ } represents 0 or more occurrences

Words in sans serif represent keywords


A DMQL can provide the ability to support ad hoc and interactive data mining

By providing a standardized language like SQL, we hope to achieve an effect similar to the one SQL has had on relational databases:

Foundation for system development and evolution

Facilitate information exchange, technology transfer, commercialization, and wide acceptance

Design

DMQL is designed with the primitives described as follows, each with its own syntax:

Specification of task-relevant data

The kind of knowledge to be mined

Concept hierarchy specification

Pattern presentation and visualization

Putting it all together: a DMQL query

Syntax of DMQL

<DMQL> ::= <DMQL_Statement>; {<DMQL_Statement>}

<DMQL_Statement> ::= <Data_Mining_Statement>
    | <Concept_Hierarchy_Definition_Statement>
    | <Visualization_and_Presentation>

<Data_Mining_Statement> ::=
    use database <database_name> | use data warehouse <data_warehouse_name>
    {use hierarchy <hierarchy_name> for <attribute_or_dimension>}
    <Mine_Knowledge_Specification>
    in relevance to <attribute_or_dimension_list>
    from <relation(s)/cube(s)>
    [where <condition>]
    [order by <order_list>]
    [group by <grouping_list>]
    [having <condition>]
    {with [<interest_measure_name>] threshold = <threshold_value> [for <attribute(s)>]}

<Mine_Knowledge_Specification> ::= <Mine_Char> | <Mine_Desc> | <Mine_Assoc> | <Mine_Class>

<Mine_Char> ::= mine characteristics [as <pattern_name>] analyze <measure(s)>

<Mine_Desc> ::= mine comparison [as <pattern_name>]
    for <target_class> where <target_condition>
    {versus <contrast_class_i> where <contrast_condition_i>}
    analyze <measure(s)>

<Mine_Assoc> ::= mine association [as <pattern_name>] [matching <metapattern>]

<Mine_Class> ::= mine classification [as <pattern_name>] analyze <classifying_attribute_or_dimension>

<Concept_Hierarchy_Definition_Statement> ::=
    define hierarchy <hierarchy_name> [for <attribute_or_dimension>]
    on <relation_or_cube_or_hierarchy> as <hierarchy_description>
    [where <condition>]

<Visualization_and_Presentation> ::= display as <result_form>
    | {<Multilevel_Manipulation>}

<Multilevel_Manipulation> ::= roll up on <attribute_or_dimension>
    | drill down on <attribute_or_dimension>
    | add <attribute_or_dimension>
    | drop <attribute_or_dimension>

DMQL: Syntax for task-relevant data specification

Names of the relevant database or data warehouse, conditions, and relevant attributes or dimensions must be specified:

use database <database_name> or use data warehouse <data_warehouse_name>

from <relation(s)/cube(s)> [where <condition>]

in relevance to <attribute_or_dimension_list>

order by <order_list>

group by <grouping_list>

having <condition>

Syntax for specifying the kind of knowledge to be mined

Characterization

Mine_Knowledge_Specification ::= mine characteristics [as pattern_name] analyze measure(s)

Specifies that characteristic descriptions are to be mined

Analyze specifies aggregate measures


Example: mine characteristics as customerPurchasing analyze count%
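
As a rough illustration of what `analyze count%` could produce, the following Python sketch groups task-relevant tuples by the chosen attributes and reports each group's count and share of the total. It assumes that count% means a group's percentage of all tuples; the function name and sample rows are hypothetical and not part of DMQL.

```python
from collections import Counter

def characterize(rows, attributes):
    """Group rows by the given attributes; report count and count% per group."""
    counts = Counter(tuple(row[a] for a in attributes) for row in rows)
    total = sum(counts.values())
    return {
        group: {"count": n, "count%": 100.0 * n / total}
        for group, n in counts.items()
    }

# Hypothetical task-relevant data (not taken from the source).
rows = [
    {"age": "young", "type": "TV"},
    {"age": "young", "type": "TV"},
    {"age": "senior", "type": "radio"},
    {"age": "young", "type": "radio"},
]
summary = characterize(rows, ["age", "type"])
```

Here `summary[("young", "TV")]` holds a count of 2 and a count% of 50.0, since two of the four rows fall in that group.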


Discrimination
Mine_Knowledge_Specification ::=
    mine comparison [as pattern_name]
    for target_class where target_condition
    {versus contrast_class_i where contrast_condition_i}
    analyze measure(s)

Specifies that discriminant descriptions are to be mined, comparing a given target class of objects with one or more contrasting classes (thus referred to as comparison)

Analyze specifies aggregate measures

Example: mine comparison as purchaseGroups for bigSpenders where avg(I.price) >= $100 versus budgetSpenders where avg(I.price) < $100 analyze count
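
The comparison example can be sketched in Python as follows. This is only a minimal illustration: it assumes each class is defined by a condition on a customer's average item price and that `analyze count` counts the customers in each class. The threshold of 100, the sample purchases, and the function name are assumptions made for the example.

```python
from collections import defaultdict

def compare_classes(purchases, target_cond, contrast_cond):
    """Assign customers to target/contrast classes by average price; count each class."""
    prices = defaultdict(list)
    for customer, price in purchases:
        prices[customer].append(price)
    counts = {"target": 0, "contrast": 0}
    for values in prices.values():
        avg = sum(values) / len(values)
        if target_cond(avg):
            counts["target"] += 1
        elif contrast_cond(avg):
            counts["contrast"] += 1
    return counts

# Hypothetical (customer, item price) pairs.
purchases = [("ann", 250), ("ann", 150), ("bob", 40), ("cy", 90), ("cy", 70)]
result = compare_classes(
    purchases,
    target_cond=lambda avg: avg >= 100,   # plays the role of bigSpenders
    contrast_cond=lambda avg: avg < 100,  # plays the role of budgetSpenders
)
```

With this data, one customer lands in the target class and two in the contrasting class.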

Association
Mine_Knowledge_Specification ::= mine associations [as pattern_name] [matching <metapattern>]

Specifies the mining of patterns of association

Can provide templates (metapatterns) with the matching clause

Example: mine associations as buyingHabits matching P(X: customer, W) and Q(X, Y) => buys(X, Z)

Classification
Mine_Knowledge_Specification ::= mine classification [as pattern_name] analyze classifying_attribute_or_dimension

Specifies that patterns for data classification are to be mined

Analyze clause specifies that classification is performed according to the values of <classifying_attribute_or_dimension>

For categorical attributes or dimensions, each value represents a class (such as low-risk, medium-risk, high-risk)

For numeric attributes, each class is defined by a range (such as 20-39, 40-59, 60-89 for age)

Example: mine classifications as classifyCustomerCreditRating analyze credit_rating

To specify what concept hierarchies to use:

use hierarchy <hierarchy> for <attribute_or_dimension>

We use different syntax to define different types of hierarchies

schema hierarchies

define hierarchy time_hierarchy on date as [date, month, quarter, year]

set-grouping hierarchies

define hierarchy age_hierarchy for age on customer as
    level1: {young, middle_aged, senior} < level0: all
    level2: {20, ..., 39} < level1: young
    level2: {40, ..., 59} < level1: middle_aged
    level2: {60, ..., 89} < level1: senior
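
One way to picture a set-grouping hierarchy is as a lookup from raw values to the named groups one level up. The Python sketch below mirrors the age hierarchy with the ranges 20-39, 40-59, and 60-89; the dictionary layout and the function name `generalize_age` are illustrative assumptions, not DMQL features.

```python
# Level-2 age values grouped under their level-1 concepts.
AGE_HIERARCHY = {
    "young": range(20, 40),        # 20..39
    "middle_aged": range(40, 60),  # 40..59
    "senior": range(60, 90),       # 60..89
}

def generalize_age(age, level=1):
    """Return the concept for `age` at a level (2 = raw value, 1 = group, 0 = all)."""
    if level == 0:
        return "all"
    if level == 2:
        return age
    for concept, ages in AGE_HIERARCHY.items():
        if age in ages:
            return concept
    raise ValueError(f"age {age} is outside the hierarchy")
```

For example, `generalize_age(25)` gives "young", and `generalize_age(25, level=0)` gives "all".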

operation-derived hierarchies

define hierarchy age_hierarchy for age on customer as
    {age_category(1), ..., age_category(5)} := cluster(default, age, 5) < all(age)

rule-based hierarchies

define hierarchy profit_margin_hierarchy on item as
    level_1: low_profit_margin < level_0: all
        if (price - cost) < $50
    level_1: medium_profit_margin < level_0: all
        if ((price - cost) > $50) and ((price - cost) <= $250)
    level_1: high_profit_margin < level_0: all
        if (price - cost) > $250
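
The rule-based hierarchy can be read as a function from an item's price and cost to a level-1 concept. The Python sketch below follows the three rules literally; the function name is hypothetical. Note that, as the rules are written, a margin of exactly $50 matches none of them, so the sketch returns `None` in that case rather than guessing a class.

```python
def profit_margin_level(price, cost):
    """Map an item to its level_1 profit-margin concept using the three rules."""
    margin = price - cost
    if margin < 50:
        return "low_profit_margin"
    if 50 < margin <= 250:
        return "medium_profit_margin"
    if margin > 250:
        return "high_profit_margin"
    return None  # margin == 50 is not covered by any rule as written
```

A practical definition would probably assign the boundary value to one of the classes, for instance by changing the first rule to `<=`.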

Syntax for pattern presentation and visualization specification

We have syntax which allows users to specify the display of discovered patterns in one or more forms

display as <result_form>

<result_form> = rules, tables, crosstabs, pie or bar charts, decision trees, cubes, curves, or surfaces

To facilitate interactive viewing at different concept levels, the following syntax is defined:

Multilevel_Manipulation ::= roll up on attribute_or_dimension
    | drill down on attribute_or_dimension
    | add attribute_or_dimension
    | drop attribute_or_dimension
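
The four manipulation operations can be thought of as changes to a view that records which concept level each displayed dimension is shown at. The Python sketch below is one possible model, assuming each dimension has an ordered hierarchy from lowest to highest level; the `View` class and its method names are hypothetical.

```python
TIME_HIERARCHY = ["date", "month", "quarter", "year"]  # lowest to highest level

class View:
    """Tracks the hierarchy level at which each displayed dimension is shown."""

    def __init__(self, hierarchies):
        self.hierarchies = hierarchies  # dimension -> ordered list of levels
        self.levels = {}                # dimension -> index into its hierarchy

    def add(self, dim):
        # A newly added dimension starts at its lowest level.
        self.levels.setdefault(dim, 0)

    def drop(self, dim):
        self.levels.pop(dim, None)

    def roll_up(self, dim):
        top = len(self.hierarchies[dim]) - 1
        self.levels[dim] = min(self.levels[dim] + 1, top)

    def drill_down(self, dim):
        self.levels[dim] = max(self.levels[dim] - 1, 0)

    def current(self, dim):
        return self.hierarchies[dim][self.levels[dim]]

view = View({"time": TIME_HIERARCHY})
view.add("time")
view.roll_up("time")
view.roll_up("time")
after_roll_up = view.current("time")
view.drill_down("time")
after_drill_down = view.current("time")
```

Two roll-ups move the time dimension from date to quarter, and one drill-down brings it back to month.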

Putting it all together: a DMQL query

use database AllElectronics_db
use hierarchy location_hierarchy for B.address
mine characteristics as customerPurchasing
analyze count%
in relevance to C.age, I.type, I.place_made
from customer C, item I, purchases P, items_sold S, works_at W, branch B
where I.item_ID = S.item_ID and S.trans_ID = P.trans_ID
    and P.cust_ID = C.cust_ID and P.method_paid = "AmEx"
    and P.empl_ID = W.empl_ID and W.branch_ID = B.branch_ID
    and B.address = "Canada" and I.price >= 100
with noise threshold = 0.05
display as table
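
To see how such a query breaks down into its clauses, the Python sketch below splits a DMQL string at the clause keywords of the grammar. It is a naive illustration rather than a real parser: it assumes lowercase keywords and does not handle keywords that appear inside string literals. The shortened query, the `KEYWORDS` list, and `split_clauses` are all assumptions made for the example.

```python
import re

# Clause keywords from the grammar; multi-word keywords are listed first.
KEYWORDS = [
    "use data warehouse", "use database", "use hierarchy",
    "in relevance to", "order by", "group by", "display as",
    "mine", "analyze", "from", "where", "having", "with",
]
_PATTERN = re.compile(r"\b(" + "|".join(re.escape(k) for k in KEYWORDS) + r")\b")

def split_clauses(query):
    """Split a DMQL query into (keyword, body) pairs in order of appearance."""
    normalized = " ".join(query.split())  # collapse newlines and extra spaces
    parts = _PATTERN.split(normalized)    # [prefix, kw1, body1, kw2, body2, ...]
    return [(kw, body.strip()) for kw, body in zip(parts[1::2], parts[2::2])]

# A shortened, hypothetical variant of the full query.
query = """
use database AllElectronics_db
mine characteristics as customerPurchasing
analyze count%
in relevance to C.age, I.type
from customer C, item I
where I.price >= 100
with noise threshold = 0.05
display as table
"""
clauses = split_clauses(query)
```

Each pair in `clauses` corresponds to one primitive: the data source, the kind of knowledge, the measure, the relevant attributes, the relations, the condition, the threshold, and the display form.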


Other Data Mining Languages & Standardization Efforts

Association rule language specifications

  MSQL (Imielinski & Virmani '99)

  MineRule (Meo, Psaila and Ceri '96)

  Query flocks based on Datalog syntax (Tsur et al. '98)

OLE DB for DM (Microsoft 2000) and, more recently, DMX (Microsoft SQL Server 2005)

  Based on OLE, OLE DB, OLE DB for OLAP, C#

  Integrating DBMS, data warehouse and data mining

CRISP-DM (CRoss-Industry Standard Process for Data Mining)

  Providing a platform and process structure for effective data mining

  Emphasizing the deployment of data mining technology to solve business problems

DMML (Data Mining Mark-up Language) by DMG (www.dmg.org)

  Providing a platform and process structure for effective data mining

Hierarchy Specification
A hierarchy is a root member of an alternate hierarchy, which is always at generation 2 of a dimension. Member value expressions are not allowed as hierarchy arguments.
Alternate hierarchies are applicable to aggregate storage databases only.
The dimension of the hierarchy argument passed to a function must match the dimension of the other arguments passed to the function. If they do not match, an error is returned and the query is aborted.
