
, .

Data Mining
Data Mining,
, Data Mining.
.
Data Mining
OLAP-, , Data Mining
(, , , , ).
Data Mining. Web Mining.
Data Mining: , ,
, , ,
, .

, Data Mining.
Data Mining . OLTP, OLAP,
ROLAP, MOLAP. Data
Mining. .
,
Data Mining, .

Data Mining, ,
,
Data Mining

, , ,
( ),
.


Data Mining?
, Data Mining
Data Mining
Data Mining
Data Mining
Data Mining.
Data Mining
Data Mining.
Data Mining.
Data Mining
Data Mining -
Data Mining
Microsoft Excel
SVM
k -
Matlab
SPSS
k- (k-means)
PAM (partitioning around medoids)
SPSS
Apriori
Data Mining
Data Mining
4 +
Data Mining, OLAP
OLAP-
OLAP-
OLAP Data Mining
Data Mining.
Data Mining.
Data Mining.
Data Mining
Data Mining. Data Mining
. Data Mining
CRISP-DM
SEMMA
Data Mining
PMML
Data Mining
Data Mining
Data Mining
Data Mining
Data Mining
Data Mining. SAS Enterprise Miner
SAS -
SAS Enterprise Miner
Data Mining. PolyAnalyst
PolyAnalyst Workplace -
PolyAnalyst
PolyAnalyst
WebAnalyst
Data Mining. Cognos STATISTICA Data Miner
Cognos 4Thought
STATISTICA Data Miner
STATISTICA Data Miner
Oracle Data Mining Deductor
Oracle Data Mining
Deductor
KXEN
Data Mining
Data Mining-

Data Mining?
" , , ,
,
- :
, ".


.
,

,
.
,
. Data
Mining . ,
, ,
.
,
, Data Mining ,
.
Data Mining :
(data) (mining).
,
.
Data Mining , ,
, , ,
, , " ",
, , "" .
" " (Knowledge Discovery in Databases, KDD)
Data Mining [1].
Data Mining, 1978 ,
1990- .
,
.
Data Mining , "Data
Mining" Google ( 2005 ) - 18
.
Data Mining?
Data Mining - ,
, , ,
., . . 1.1.

. 1.1. Data Mining

,
Data Mining.

- ,
, .
, ,
,
.
,
. .

.

. 1996 : " - ,
,
".

.

- ,

, .
(intelligence) intellectus, ,
, , .
, (AI, Artificial Intelligence)
.

, .
, Data Mining, .
.
, Data Mining

o , Data Mining, .
o .

o .
o .
Data Mining.
o .
o , ,
, .

Data Mining ,
.

1960- .
1968 IMS
IBM.
1970- .
1975 Conference on Data Systems Languages (CODASYL),
,
.
.. ,
.
1980- .


.

. , 1985 , SQL.
.
1990- .
- " ", "", "", "".
, ,
SQL.
Data Mining, , web- .
Data Mining ,
[2]:

;
;
;
.

Data Mining

Data Mining - ,
( ) [3].
Data Mining -
(Gregory Piatetsky-Shapiro) - :
Data Mining - ,
, ,
.
Data Mining : ,
,
.
- ,
.
- ,
, ,
.
- , ,
.
- , ,
, , ..


(knowledge deployment)
(,
).
Data Mining.
Data Mining -
, .
Data Mining - ,
(patterns)
( SAS Institute).
Data Mining - , - ,


( Gartner Group).
Data Mining (patterns),
, ,
, .
"Mining" - " ",
.
- ,
.
.
Data Mining

Gartner Group, ,
1980- "Business Intelligence" (BI), . ,

.
1996 .
Business Intelligence - ,
,
,
.
BI
.
BI-, -
.

BI- (,
DSS, Decision Support System). ,
, .. .
Gartner Group Business Intelligence
:

(data warehousing, );
(OLAP);
- (Enterprise Information Systems, EIS);
(data mining);
(query and reporting tools).

Gartner ,

.
Data Mining

[4] -,
.
Data Mining (Enterprise Data Mining Buying
Guide) Aberdeen Group: "Data Mining -
.
, ,
Data
Mining .
Data Mining
, ,
, , ,
Data Mining .
Data Mining ,
" " . 75%
Data Mining , ,
. ,
,
".
(Herb Edelstein), Data
Mining, CRM: " Two Crows
, Data Mining .
,
. : Data Mining
, .
IT- , Data Mining .
, ,
. , Data Mining

, ,
".
Data Mining,
, , ,
, .
Data Mining
, .
,
.
Data Mining
,
, Data Mining,
, .

Data Mining ""
.
.
Data Mining .
Data Mining, ,
.

,
.

Data Mining. .
Data Mining .

.
, 80%
Data Mining-.
, , ,
, .
,
Data Mining ,
.

Data Mining
. , Data Mining-
. ,
.

Data Mining- .

, - .

Data Mining, ,
.
, , ,
.
.
Data Mining

( ) OLAP
(verification-driven data
mining) "" ,
(Online Analytical Processing, OLAP),
Data Mining - .
Data Mining
.
,
Data Mining .

,
, Data Mining .
OLAP , Data Mining
.
Data Mining

Data Mining " "


. Data Mining
:

,
Data Mining,
;
,

Data Mining ;

Data Mining, ,
, ;
Data
Mining .

Data Mining , ,
, .
Data Mining
, , , ,
.
Data Mining
-
, .
Data Mining -
,
, :
"Amazon"
"
", Data Mining,
.
,
. - ,
- ,
(, , ..). ,
, , .
-
.
, , Data
Mining, [5]. ,
Data Mining, , , :

, ;
;
, ;
.

Data Mining
, " " (Pregibon, 1997).
Data Mining.
,
. - , Data Mining
. ,

Data Mining
.
Data Mining ,
,
.
,
Data Mining, - Knowledge Discovery Data
Mining (International Conferences on Knowledge Discovery and Data Mining).
WWW- - www.kdnuggets.com,
Data Mining -.
Data Mining: Data Mining and Knowledge Discovery, KDD
Explorations, ACM-TODS, IEEE-TKDE, JIIS, J. ACM, Machine Learning, Artificial
Intelligence.
: ACM-SIGKDD, IEEE-ICDM, SIAM-DM, PKDD, PAKDD,
Machine learning (ICML), AAAI, IJCAI, COLT (Learning Theory).


, , , ,
, -.
, ,
.
, ,
.
, - ,

.

2.1 , .
2.1. "-"

18   Single     125
22   Married    100
30   Single     70
32   Married    120
24   Divorced   95
25   Married    60
32   Divorced   220
19   Single     85
22   Married    75
10   40   Single   90

.
- .
.
, , , ..
- , .
: , ..
, , , .
[6], ..
, .
(variable) - ,
, .
(value) .
, ,
.
, ,
, .
, ,
.
,
.
.
(population) - ,
.
(sample) - ,

.
- .
- .
.
. - ,
.
- ,
,
.

:
.
, - . ,
, ( )
( , ,
..), .
.
.
.

-
.
, .
- , .
Data Mining
/
(, , ).
.
.
, , .
,
,
.
. (
): 10, 15, 25 .
- ,
.
.
: , , , ..

: , , ,
.
(nominal scale) - , ;
,
.

, ,
.
: , , .
: (=), ( ).
(ordinal scale) - ,
,
.
.
,
" ", "
".
: (1, 2, 3-), ,
(1-, 23-, ..), ,
, .
: (=), ( ), (>),
(<).
(interval scale) - ,
, .
,
,
.
: - 19 , - 24, ..
5 , , 1,26 .
, ,
, , , .
: (=), ( ), (>),
(<), (+) (-).
(ratio scale) - ,
.
: (4 3 ). 1,33
.
1,2 , .
.

: (=), ( ), (>),
(<), (+) (-), (*) (/).
(dichotomous scale) - ,
.
: ( ).
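
The scales described above differ in which operations are meaningful on their values, and each stronger scale keeps the operations of the weaker ones. The following Python sketch illustrates that hierarchy under stated assumptions: the names `Scale`, `ALLOWED_OPS` and `is_meaningful` are hypothetical, `!=` is assumed to be the second operation permitted on a nominal scale, and the numbers 19, 24, 4 and 3 are taken from the interval-scale and ratio-scale examples in the text. The dichotomous scale is not modelled here.

```python
from enum import Enum


class Scale(Enum):
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Operations that are meaningful on each scale. Each scale keeps the
# operations of the weaker scale and adds its own.
# Assumption: "!=" is the second operation allowed on a nominal scale.
ALLOWED_OPS = {
    Scale.NOMINAL: {"==", "!="},
    Scale.ORDINAL: {"==", "!=", ">", "<"},
    Scale.INTERVAL: {"==", "!=", ">", "<", "+", "-"},
    Scale.RATIO: {"==", "!=", ">", "<", "+", "-", "*", "/"},
}


def is_meaningful(scale: Scale, op: str) -> bool:
    """Return True if the operation makes sense for values on this scale."""
    return op in ALLOWED_OPS[scale]


# Interval scale: the difference of two readings is meaningful,
# their quotient (24 / 19, about 1.26) is not.
first_reading, second_reading = 19, 24
difference = second_reading - first_reading

# Ratio scale: the quotient itself is meaningful.
ratio = round(4 / 3, 2)
```

A check such as `is_meaningful(Scale.INTERVAL, "/")` returns `False`, which mirrors the point that a quotient of interval-scale values should not be interpreted.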
,
, ,
2.2.
2.2.

(
)

( (
)
)

22

55

47

,
, , 2.3.
2.3.

( )

8
( )


( )

22

17

23

. , ,
.
.
. ,
.
,
.


- , (record data) [7].


: , ,
, .
- , ,
.
, ,
, .
, ,
. 2.1.

. 2.1.

: WWW-; ; (.
2.2); .

. 2.2.

, ,
, .

, , , .
, ( ,
), . 2.3.

. 2.3. " "


. :
Benzene Molecule: C6H6 (. 2.4)

. 2.4.

Kdnuggets, www.kdnuggets.com (, 2004 .) "


",
"" (flat table) (26% 24% ),
(14%) (11%).
: web-, XML,
, , .
,
Kdnuggets,
Data Mining.

,
. :
, , (, .).
"".
.
, Data Mining
,
.

- .
Data Mining /
,
Data Mining .
, " ",
. 2.5.

. 2.5.

(23%)
, . Text, CSV - 18%, 14%
Text, space or tab separated SAS; Excel - 9%, SPSS 8%, S-Plus/R - 4%, Weka ARFF - 6%, Data Mining - 2%.
,
Data Mining .
.


. .
(Database) -
.
,
,

.
.
,
.
,
,
.
.
.
.
- ,
.
-
.
,

- ,
, , , ,
.
(Database Management System, DBMS) ,

.
(Relational Database Management
System) - , .

( ).
.
, .
, -,
, , .

(C, C++, Pascal, Object Pascal). ,
, , , .
, ,
Data Mining, . ,
, FoxPro
, .
Access .

, ,
(, ),
, .
.
, :
1.
2.
3.
4.
5.
6.
7.
8.

;
;
;
;
;
( );
;
.

, ..
.
-
.

.


.
-
, .
: .
- ,
,
.
- . :
, ;
( ) ;
( ) .

:

/ ; ;
, .. ,

.


,

.
.
: ,
, .

? .
- ().
- , OLAP.
(dimension) - -
, .
:

;
;
-.

- ,
.
- ,
( , )
.
- - ,
, ,
.
, , ,
, , .
.
, .

;
.

.
: , .

. : ,
.

. - ,
, .
, ,
.

.
(Metadata) - .
: , , .
, , , ,
, , , ,
, ,
.
- .
, , ,
. -
.
- - ,
.
- ,
:

;
(, );
, , ..

.
, ,
, .
. , ,
, , .
. ,
, .


Data Mining
Data Mining -
(
) .
Data Mining
, .. .
Data Mining :
, , , k-
, , , , ; ,
, k- k-;
, Apriori;
, ,
.
, Data Mining -
.
,
.
, Data Mining
.
(method) , , ,
, , ,
.
.

, - ,
.
(algorithm) -
(), .
Data Mining

Data Mining [8] [9]:


1. ( ).
2.
( ).
[10],
. -
. , ,

, , ,
,
.
3. -
, .
, Data Mining
[11]:
( ) ->
-> ->
->
1. (Discovery)

.
.
(law) - ,
,
.
Data Mining ,
OLAP, ,
. -
. ,
,
.
:

(conditional logic);
(associations and affinities);
(trends and variations).

, , ,
.
:
25 35 1200
. ,
.
" ..., ...".
, , " < 20
> 700 , 75%
" " >35

> 1200 , 90%


". .
, , , :
" > 15 , >
35 65 % ".
, , :

( ,
);
(
);
( ).

,
.. ,
.

.
2. (Predictive Modeling)
Data Mining - -
.
.
:

(outcome prediction);
(forecasting).


.
( )
, ,
, .
(
) ( )
().
, .
, > 15 , 65 %
, > 35 . , > 35
> 1200 , 90%
, .


. .
, , .

.
: " < 20 > 700
, 75% "
, .. " < 20 "
" > 700 ",
, : - .
, , . ,
, .
:

, ;
, .

, > 15 , 65%
, > 35 .
, : -
> 15 , - > 35 .
, , , ,
, .. ( ),
, " ".
- .
3. (forensic analysis)
Data Mining ,
.
, , - (deviation detection).
,
.
, .
" > 35 > 1200
, 90 % ".
- 10 % ?
. -
, .
10% - .
[12].

Data Mining

Data Mining
.
Data Mining

Data Mining
.
, Data Mining
.
1. , .

/
. -
.
: , , k-
, .
2. ,
.
()
,
Data Mining.
, .

, .
,
(" ").
: ; ; ; , .
, , :
; ; ; .
, , -
, ,
.
. ,

.

. ,
, .


-: , () , -
. Data Mining . ,
-
Data Mining - ,
Data Mining [13].
.

- . ,
,
. ,
,
.
:

. ,
, , - , ,
, .
Data Mining :
.
[14].
, Data
Mining. Data Mining,

. ,
Data Mining.
[5, 14].
:

, ,
;
,
.

: ,

.
-

( ), ..
Data Mining.
.

Data Mining
[14] :

(
, , , ,
, ..);
( ,
.);
( ,
, , .);
.

Data Mining
:
1. .
2. ( , ,
).
3. ( , ,
, .).
4. ( ).

Data Mining
Data Mining - ,
.
:

(, , );
( .. );
();
( , );
;
;
.

Data Mining Data Mining.


. -
Data Mining (..
) .
Data Mining
.
,
, .


, ,
, : k-, k-,
, ,
- , .

/ ()
() .
, ,
: , , ,
, .
Data Mining

Data Mining ,
.
, .
Data Mining :
, , , , ,
, .
- ,
, , ,
., .
3.1
[15]. ,
: , , /,
/, , /, , .
,
. ,
, Data Mining.
Data Mining,
, , , ,
, ,
.
(, SPSS, SAS,
STATGRAPHICS, Statistica, .)
( , ). ,
,
(, , , .)
.

.


, . , ,
Statistica, "
".

.

3.1. Data Mining

,
-

- -

-
-

(
)


-
-

/ / /


-
-
-

/ /
- /
-
-
-


k--


-
- /


Data Mining.
, Data Mining ,
. ,
Data Mining.
, , ,
Data Mining.
(tasks) Data Mining (regularity) [16]
(techniques) [17].
, Data Mining, .
: ,
, , , ,
, , , .
, , - Data Mining,
, ,
. Data Mining - ,
, , -
. ,
[18], Data Mining.
Data Mining
.
Data Mining

(Classification)
. Data Mining.
,
- ;
.
. :
(Nearest Neighbor); k- (k-Nearest Neighbor);
(Bayesian Networks); ;
(neural networks).
(Clustering)
.
. , ,
.
.
: " "
- .

(Associations)
.
.
Data Mining:
,
, .
-
Apriori.
(Sequence), (sequential association)
.
. ,

, , (..
). ,
.
,
, . Data Mining
(sequential pattern).
: X
Y.
. 60%
, 50%
. ,
, (Customer Lifecycle Management).
(Forecasting)
.

.
,
.
(Deviation Detection),

. - ,
,
.
(Estimation)
.

(Link Analysis) - .
(Visualization, Graph Mining)
.
,
.
- 2-D 3-D .
(Summarization) - , -
.
Data Mining

, Data Mining
:

;
;
.

Data Mining:
, , .
.
, .
Data Mining, ,
.
, Data Mining.
, Data Mining
.
(descriptive) ,
, .
,
, , .
.

.
.
(predictive) , ,
.


Data Mining : , ,
.
( )
: .
.

: .
: , , , .

,
.
.

:
.
: , , , .
, 50 , - 30 ,
- .
.
.

, Data Mining ,
Data Mining.
Data Mining.
, Data Mining -
, ,
, .
, Data Mining,
, ,
, .
, , , ,
, , , ?

:
1. - -
2. - -

" ", ,
.

. . 4.1. "",
"" "", .

. 4.1. ,

, .
, . ,
, ..

, , - ,
, .
, ..
. , , Business Intelligence
.


. . 4.2.
[17], , ,
Data Mining.

. 4.2. , ,

, (, , )
,
.
- - (
), . :
, , .
- - ,
Data Mining; :

( ),
, .
- Data Mining,
, ;
, , , .
, .
4.1. Data Mining
3

Data Mining

,
( ) ,
, ,
.
( ).
. - . (,
, , ). ,
, ; .
- .
- .

, , , , , , .
.
, , ,
, ,
.
,
"", "", "", "".

.
,
. , , .
, , ,
.

. (. informatio) 1. -;
2. , , (
);
3. () -
(), ; - ,
, , ,
.

- , - , ,
.., ,
.
, , ,
.
, , ,
.. ,
,
.


, ..
.
. " " ,
, .
. " ,
." .

.
, ,
.

.
.


. -
.

, .. .

.

,
.


.
.

,
( ).
,

, ..
.
.

.
.
.
, , . ,
, .
,
.
.
, ,
. ,
(, ,
, ,
).
.
,

; .
, ,
.

- , ,
.
, , ..
. ?

[19].
, .
. , - -
, -
.
, " -
, , , ,
, ".
, [20].
1. . " ".
2. . -
, , ; - .
3. .
" ".
- ,
- Internet .
4. . .
5. . , .
-
. .

, ..
.
"", "", ""

"", "", "",


,
. , .
-
. , Data Mining
: , ,
.


,
.
.
1. , , .
2. , , .
3. , , .

- - ,
, , .
.
, .
.
, , .
"" "", ,
, "", .
""
, ""
. "" .
, , ,
. :
, ,
, , .
,
.
, , .

. , ,
.
, .
, .
.
,
.
. , Data Mining
, ,
,
.


Data Mining.
Data Mining.
- - .


Data Mining.
.
.
- , ,
, , , -
;
, .
- ,
( ),
.
:

;
, ..
;
,
;
.

() ,
(, )
;
, ,
.
, ..
.

,
:

-
. ,
,
(.. : " ");

-
.
.

(, )
.
- ,
. ,
, ,
(
).
(supervised learning),
.

(.. , )
/ .
, ,
, - , , , .. ,
(, , 0 1).
,
. ,
, .
.
( ) (
).

. ,
, . (1930 .),

.
.
. ,
.
: ,
. , : 1 2.
5.1.
5.1.

1

18

25

1

22

100

30

70

32

120

24

15

25

22

32

50

19

45

22

75

10

40

90

. ,
.
(
), , 1 ( ) 2
( ). . 5.1 .

. 5.1.

, ,
, .


, ,

.
.
, ,
.

, ,
. .
( ) .
( ) :
.
(training set) - , ,
() .
() .
.
(test set) .
.
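
The division of the available records into a training set and a test set, as described above, can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `split_train_test`, the 30% test share and the fixed random seed are hypothetical choices, not values given in the text.

```python
import random


def split_train_test(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of the records and return (training set, test set).

    Assumption: test_fraction=0.3 and seed=0 are illustrative defaults.
    """
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(records)
    # A seeded generator makes the split reproducible.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    # The first n_test shuffled records form the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]


records = list(range(10))          # stand-in for ten database records
train, test = split_train_test(records)
```

The model is then built on `train` only, and `test` is kept aside to check how well the constructed model works on records it has not seen.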
[21]:
.
1. : .
o .
o ,
.
o ,
.
2. : .
o () .
1.
.
2. -
.
3. , .. ,
, .
o ,
, .

, , ,
. 5.2. - 5.3.


. 5.2. .

. 5.3. .
,

. :

;
() ;
;
;
, , ;

;
CBR-;
.

(
, ) . 5.4 - 5.6.

. 5.4.
if X > 5 then grey
else if Y > 3 then orange
else if X > 2 then grey
else orange

. 5.5.
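
The if-then rules shown with the figure above translate directly into a small function. The sketch below is written in Python; the thresholds and the class labels `grey` and `orange` come from the rules themselves, while the function name `classify` and the parameter types are assumptions made for illustration.

```python
def classify(x: float, y: float) -> str:
    """Assign a point (x, y) to a class using the nested if-then rules."""
    if x > 5:
        return "grey"
    if y > 3:
        return "orange"
    if x > 2:
        return "grey"
    return "orange"
```

For example, `classify(3, 1)` falls through the first two conditions and is labelled `grey` by the third, while `classify(1, 1)` satisfies none of them and is labelled `orange`.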


. 5.6.
:

-. (Cross-validation) -
, - .

.
, , ,
-.

, -
- .
. ,
,
.

, [21]:
, , , .
,
.
, .. - ,

.
.
:

;

, "
".
, ,
, ,
, .
,
, ,
.
"" " ",
" " "".

( ).
, "
".
- .
,
, "
".
"" :
"". (cluster) "", "".
, .
:

;
.

, , ,
, .. .

, , .
-
.


5.2
.
5.2.



,
,

. 5.7 .

. 5.7.


, (non-overlapping,
exclusive), (overlapping) [22].
. 5.8.

. 5.8.

,
. ,
"" , "",
..,
.

(, )
.
,
- .

,
.
.
,
.
.
, ,
.
[21].

, (Partitioning algorithms), .. :
o k ;
o .
(Hierarchical algorithms):
o : , ,
, ..
, (Density-based methods):

;
, .
- (Grid-based methods):
o -.
(Model-based):
o ,
.
o
o

;
;

;
.
, .

.


.
, , ,
, ..
-
.
,
.

. ,
.
, , (Hartigan,
1975).
, ,
, , ..
..
.

, ,
, . -
. -
.


-
, ,
.
, ..
,
, :

,
. , (1974),
(1981).
, , ,
.. , ,
. ,

.
, ,
.
, , .
,
.
, , , .

,
.
1971
, .
1974 (Sexton),
- ,
. ,
.
1981 ,
,
.

.
,
. .

, Data Mining,
" ",
, () . ,
, Data Mining, " ",
.. .
, .
. :
k- ( ),
( ), SOM.
.


Data Mining.

Data Mining.
.


, , , .

, .

, ,
,
Data Mining.
, ,
Data Mining, ,
.
( Prognosis), ,
.
.
(forecasting) Data Mining
.
(prognostics) - .

, ..
. ,
.
-
.

.
, .
: ,
, .
(market
forecasting).

, ,
( , ,
).
:

(, );
, ;
.

,
: , ;
.

:

;
.

.
.
Data Mining
. , , , ,
( - ).
.
?


.
,
, -
, ( ).
, ,
,
, ,
, .

,
.
Data Mining (Time-Series Data Mining).

[23].

Data Mining. . 6.1
Data Mining . , (23%)
. ( 14%),
( 9%), (8%).
6%.

. 6.1. Data Mining

,
.

:

, ,
.
.

- - ,
.


. ,
.

- .
, ,
,
.
:

;
.


: , , ,
.
"" .
, , ,
.
.
.
,
Data Mining.
,

.
,
.
,
.
,
, .
, , .
.
(..
),
.
,
[24]:
, , .

,
.


, ,
,
.
,
.

, .. .

.
,
, ,
.
,
.
. . 6.2. ,
" " , .
, ,

.

. 6.2.

( 12 ),
. 6.3, . ,
,
.


. 6.3. 12-

,
, , , .
,
.
,
.
, .
.
, () .
- ,
.

,
, .
.
, . , ,
, ,
.
.
. (..
); .
:
1. , , , ;
2. , , .


-
, , ,
.
:
1. ?
2. ()?
3. ?

, ,
. , ,
, ..
,
, , Data Mining.

, .
-
:

;
;
.

- , .
, .
- .
- , .
12 , ,
- , - 12 .
- , .
.
.
,
, , ,
. .
, , ,
- .


, ,
, , , ,
.
:
.
,
, - .

.

, ,
.
.
,
.
. .
.
:

(). .
-
.
(). .
, .
, " " .
(SSE), .
( ) .
.
().
.
.

, .
, ..
3%
1-3 .
- 3-5% , 7-12
;
.
.

- 5% .

, "" ,
"".
,
. ,
.
1. , ,
, .
2.
.

.
, ,
,
(
).
, , -
.
, .

Data Mining, ,
. Data Mining,
, .
,
.
, .
Data Mining, ,
.
,
, ,
, , .

- ,
,
,
[25].


,
, , CHI ACM-SIGGraph, ,
, "IEEE Trans. visualization and computer graphics".
.
,
, .
,
.
,
.
, :

.
, Data Mining , .
-
.
, .
,
.
- ,
.
: , , ,
..
:

;
;
(), .

.
. , " "
2000 2005 , 6.1.
6.1.

2000    1100
2001    1101
2002    1104
2003    1105
2004    1106
2005    1107

Excel .
.

, .
, ,
, . 6.4.

. 6.4. , y 1096

2000
2005 . , y,
, , x , 1096. ,
y 1096 1108 .
, y, ,
. 6.5.


. 6.5. , y 0

0 2000
.
,
, , ,
, [26].

. ,
Data Mining.
Data Mining.
, -, -,
, ,
, , " ",
.

Data Mining
Data Mining.
, ,
. ,
Data Mining - ,
- . Data
Mining.
,
, Data Mining .
[16] Data Mining:
.
,
Data Mining -. , ,
Data Mining ,
, 1000%
.
Data Mining
, .
Data Mining
[22, 27]: , , Web-.

Data Mining -. :
, , , CRM, , ,
, , .

Data Mining .
: , ; .
Data Mining . : ,
, , , ,
, , , .
Data Mining Web-. :
(search engines), .

Data Mining -

Data Mining
.
" ?"
Data Mining -
.
" ?".
Data Mining
, ,
.
Data Mining.
()
, .
" ?" Data Mining
. (
); , ,
"" ;
(" ", " ").
.
Data Mining "
" " " .

.
.
Data Mining ,
,
- , ,
.
.
.


, Data Mining,
.
.
.

" ",
, .

. Data Mining
, ,
, .

. ,
Data Mining, .
, ,
.

.

-.

Data Mining
, ,
, - .

. ,
.
,
. , ,
.
, ,
, .

Data Mining
Web-.


. Data Mining
Web Mining [28].


Data Mining
.
,
;
.. ,
, Data Mining.
Data Mining [29]:

;
;
;

;

;
;
;

;

;

;
,
.

Data Mining .
" ?", " ?", "
?"
, ,
, , ,
.
-
.
.

, , :

(
, ).
,
..
, ,
.
,
.

, Data
Mining [30]:


;
( - , , )
(, ..);
, ,
;
;
;
;
;
.

, Data Mining
,
.
Data Mining CRM

Data Mining -
CRM.
CRM (Customer Relationship Management) - .
"
" .

, , ,
. CRM
, .
: ,
, , .
Data Mining, ,
, ,
.


Data Mining
. ,
.
.
, .

.
Data Mining
, .
,
Data Mining, ,
. CRM Data Mining
.

,
, . :
,
,
( , .).
10 . ,
- Accenture.

,
(Data Mining),
.
(, , e-mail,
),
.
, , .
, ,
, , ,
. - ,
.
Data Mining

Data Mining - ,
,
.
, ,
.

, Data Mining

.
, ,
, , . Data
Mining .

Data Mining
.
, ,
.
.

Data
Mining, - (Microarray Data
Analysis, MDA). Microarray Data Analysis
[22].
:

;
;
;
.

Data Mining -
; ,
; .
, Data Mining "
" - , .. ,
.
Data Mining
.

Data Mining
. Data Mining - ,
.
, Mining
"".

Web Mining

Web Mining " Web". Web Intelligence Web


" " .
,
,
.
Web Mining , ,
Web-, Web-
,
.
Web Mining ,
,
. , Web Mining
Data Mining , ,
, Web-.
Web Mining [31], :
Web Content Mining Web Usage Mining.
Web Content Mining
, "
".
.
, , : ,
, , .
, (Agent Based Approach), :

(Intelligent Search Agents);


/ ;
.

Harvest (Brown et al., 1994),
FAQ-Finder (Hammond et al., 1995),
Information Manifold (Kirk et al., 1995),
OCCAM (Kwok and Weld, 1996) and ParaSite (Spertus, 1997),
ILA (Internet Learning Agent) (Perkowitz and Etzioni, 1995),
ShopBot (Doorenbos et al., 1996).

, (Database Approach), :

;
web- (Web Query Systems);

web-:

W3QL (Konopnicki and Shmueli, 1995),
WebLog (Lakshmanan et al., 1996),
Lorel (Quass et al., 1995),
UnQL (Buneman et al., 1995 and 1996),
TSIMMIS (Chawathe et al., 1994).

Web Usage Mining


Web- .
:

;
.

,
Web-.
Web Usage Mining :

;
;
;
.

Web Mining .
, - .

, ,
, .
Web-
.
Web Mining [31] :

Web Mining.
,
, ;
.

Text Mining

Text Mining ,
. Text Mining KDT
(Knowledge Discovery in Text - ).
Data Mining,
, Text Mining
.
, ,


. , Text Mining , -
.
Call Mining

[32], " "


.
Call Mining , Data Mining.
- -,
.
, ,
.
Call Mining ("" ) CallMiner, Nexidia, ScanSoft, Witness Systems. Call Mining
-
.
, ,
CallMiner. Call Mining
, ,
.
, , .
- -
Nexidia. ,
. .
.
, Call Mining
. ,
, , .. , ,
,
- .
, Datamonitor: "
".
Nexidia 100
300 . . CallMiner
450 . .
, Audio Mining Video Mining
, ,
. Audio Mining Video
Mining , -
.



,
, , .
,
.
- , ,
. , ,
.
,
,
.
. Microsoft Excel
,
.
, ,
, -
.
Microsoft Excel

Microsoft Excel .
, .
.
. , ,
. ,
/ " ".
, .

(Descriptive statistics ) -
,
, .
- ,
.
, 8.1.
8.1.
x     y
3     9
2     7
4     12
5     15
6     17
7     19
8     21
9     23,4
10    25,6
11    27,8

" " "


", ,
.
: ;
; ; ; ; ; ;
; ; ; ; ; .
" "
8.2.
8.2.
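                            x              y
Mean                        6,5            17,68
Standard error              0,957427108    2,210922382
Median                      6,5            18
Standard deviation          3,027650354    6,991550456
Sample variance             9,166666667    48,88177778
Kurtosis                    -1,2           -1,106006058
Skewness                    0              -0,128299221
Range                       9              20,8
Minimum                     2              7
Maximum                     11             27,8
Sum                         65             176,8
Count                       10             10
Largest (1)                 11             27,8
Smallest (1)                2              7
Confidence level (95,0%)    2,16585224     5,001457714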

, .

,
.
, . ,
: ,
.
- ,
.

. , ,
, .
:
, . "" ,
.

.
.
,
. " "
" ",
.

.
.
.


,
.
, "" .

.
.
.
.
,
.
- ,
.
.
,
(n+1)/2, n - .
n/2
(n+2)/2.
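The position rule above can be checked with a short sketch. The helper name `median` and the sample values are assumptions chosen for illustration.

```python
def median(values):
    """Median by the position rule (1-based positions in the sorted list):
    element (n+1)/2 for odd n, mean of elements n/2 and (n+2)/2 for even n."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]
    lower = ordered[n // 2 - 1]
    upper = ordered[(n + 2) // 2 - 1]
    return (lower + upper) / 2


print(median([3, 1, 2]))       # odd number of observations
print(median([4, 1, 3, 2]))    # even number of observations
```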

.
,
,
.

.
- .
- .
- .
- .
- - ,
.
" " ,

.
( ).
( ).

,
, , , , (,
). .

. ,
, .
, ; .
(outliers) - , .
: .
.
.
, ,
, , (),
.
, , ,
.

"" , , ,
..
.


, .
, .
, r,
.
( ) , .
,
-1 +1 .
. 8.1.


. 8.1.

r,
-1,0 1,0 ,
.

:

x - ;
y - ;
n - .
- :
.
,
:


( ) - ;
(
) - ;
( ) -
.

( 8.1).
x y.

, x y. ,
, . 8.2. ,
x y,
x y.
.


. 8.2.

, x y.
(x y)
MS Excel (1;2).
0,998364, .. x y
. MS Excel "",
.
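The correlation coefficient quoted above (0,998364) can be reproduced outside Excel. The sketch below is an illustration only: the x and y values are an assumption, reconstructed from Table 8.1 and checked against the sums 65 and 176,8 reported in the descriptive statistics, and the function name `pearson_r` is hypothetical.

```python
import math

# Assumed data of Table 8.1 (reconstructed; treat as an assumption).
x = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
y = [7, 9, 12, 15, 17, 19, 21, 23.4, 25.6, 27.8]


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    s_xx = sum((a - mean_x) ** 2 for a in xs)
    s_yy = sum((b - mean_y) ** 2 for b in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


r = pearson_r(x, y)
print(round(r, 6))
```

A value this close to +1 indicates an almost perfectly linear, increasing relationship between x and y.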
:
. ,

. , .
, .
. ,
, .
.

:
,
.

.
1. .
.
2. () .
3. . ,
.
4. ( ,
).

5. (
)
6. .
7. .
.
.
8. .


.
.
:
, , ,
, , , - .

: ,
, .
.

:

( );
;
;
( );
;
.

,
. .
.

, ,
.
.
.
:


, .. ; .
, ..
; .


.
() .
, .
, .. ,
. ,
.
, ..
, .
. ,
.
.
.
, ,
, .


.
.
: Y=a+b*X
Y a
( ) b, X.
a , -
B-.
( )
.
- ()
( ).
MS Excel
" " "". X Y.
Y - ,
. X - ,
.
16.
, 8.3
- 8.3.


8.3.

Multiple R            0,998364
R-squared             0,99673
Adjusted R-squared    0,996321
Standard error        0,42405
Observations          10

, 8.3, .
R-, ,
.
( ).
[0;1].
R- ,
, .. .
R- , ,
. ,
R-, , .
0,99673,
.
R - R -
(X) (Y).
R ,
.
R
. , R
(0,998364).
8.3.
                Coefficients    Standard error    t-statistic
Y-intercept     2,694545455     0,33176878        8,121757129
X variable 1    2,305454545     0,04668634        49,38177965

*

, 8.3.
b (2,305454545) , .. a
(2,694545455).
, :
Y= x*2,305454545+2,694545455


( ) ( b).
- ,
.
, , .
- ,
().
8.3. .
, ""
"".

8.3.
Observation    Predicted Y    Residuals       Standard residuals
1              9,610909091    -0,610909091    -1,528044662
2              7,305454545    -0,305454545    -0,764022331
3              11,91636364    0,083636364     0,209196591
4              14,22181818    0,778181818     1,946437843
5              16,52727273    0,472727273     1,182415512
6              18,83272727    0,167272727     0,418393181
7              21,13818182    -0,138181818    -0,34562915
8              23,44363636    -0,043636364    -0,109146047
9              25,74909091    -0,149090909    -0,372915662
10             28,05454545    -0,254545455    -0,636685276



.
- 0,778, - 0,043.
, .
8.3. , ""
.
,
.

. 8.3.


, ..
.
, Y=
x*2,305454545+2,694545455 x.
Y 8.4.
8.4. Y
x     Y (predicted)
11    28,05455
12    30,36
13    32,66545
14    34,97091
15    37,27636
16    39,58182
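The regression line Y = a + b*X and the forecast values can be reproduced with ordinary least squares. The sketch below is a hedged illustration, not the Excel procedure itself: the input data are an assumption (reconstructed from Table 8.1), and `fit_line` is a hypothetical helper name.

```python
# Assumed data of Table 8.1 (reconstructed; treat as an assumption).
x = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
y = [7, 9, 12, 15, 17, 19, 21, 23.4, 25.6, 27.8]


def fit_line(xs, ys):
    """Least-squares estimates of intercept a and slope b in Y = a + b*X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((u - mean_x) * (v - mean_y) for u, v in zip(xs, ys))
    s_xx = sum((u - mean_x) ** 2 for u in xs)
    b = s_xy / s_xx
    a = mean_y - b * mean_x
    return a, b


a, b = fit_line(x, y)

# Forecast for the x values used in the forecast table (11..16).
forecast = {x_new: a + b * x_new for x_new in range(11, 17)}
for x_new, y_hat in forecast.items():
    print(x_new, round(y_hat, 5))
```

The fitted coefficients agree with the values in the text (intercept about 2,6945, slope about 2,3055), so the forecast is simply the line evaluated at each new x.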

, Microsoft
Excel :

;
, ;
;
;
;
.

, ,
, ,
.
, , ,
.


, , , ,
. .

,
, .

.
, .


.
(decision trees)
. Data Mining
, .
,
.
, .. ,
.
,
, ..
.
(Hovland, Hunt)
50- . .,
- " " ("Experiments in
Induction") - 1966 .
-
, . - ""
"" .
. 9.1 , - :
" ?" , .. , ,
( "" " "). ,
, .
"?" , .. .
,
, - . ,
.
.., ,
. : "" "
" .
( )
, .. - ""
" " .


. 9.1. " ?"


.
, :
: "?"
: " ?", "
?"
, , : "", " "
( ): "", "".
, ..
.
.
, ..
("" "").
, .
, , , ,
, .
. ,
,
, : , , ,
, . ,

( ) ,
.
, ,
: .
, ,
. ,
, ,
" ?"
, " : :".
. 9.2. ,
" ?". ,
.

. 9.2. " ?"

, (, ,
) .

, (splitting attribute).
, , ,
"" " " .
, , .
.
:
-.

(splitting criterion) [33].
. 9.2.
. , " ?",
: "" " ".
.
, ( )
,
.

.
.
"" [34].
,
.

. ,
, .
, , ,
, " ",
.
,
. ,
.

. : > 35 > 200, .
,
.

( ).
, ,
. , ,
, ,
.

, ,
( ,
).
,
;
, ,
, .. , . :
SLIQ, SPRINT.
.
,
, , .

.
,
, ,
, .
,
, , ,
, ,
. , , .
, Data Mining,
.

,
, .

.
""
"" (tree building) "" (tree pruning).
(
).
.
.

, .. .
,
, ,
.
. :
, ,

,
. ,
, "",
.
. -
Gini.

,
" "
(information gain measure) .
, (Breiman) .,
CART Gini.
.
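If a data set T contains examples from n classes, the Gini index gini(T) is defined as:

$$gini(T) = 1 - \sum_{j=1}^{n} p_j^2,$$

where T is the current node, pj is the relative frequency of class j in T, and n is the number of classes.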
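A minimal sketch of the Gini index for a list of class labels, assuming the standard CART definition gini(T) = 1 minus the sum of squared class frequencies pj. The function name `gini` and the sample labels are illustrative assumptions.

```python
from collections import Counter


def gini(labels):
    """Gini index of a node: 1 - sum over classes of p_j squared."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


print(gini(["yes", "yes", "yes", "yes"]))  # pure node
print(gini(["yes", "yes", "no", "no"]))    # evenly mixed two-class node
```

A pure node scores 0 and an evenly mixed two-class node scores 0.5, so a split that lowers the index produces more homogeneous child nodes.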
, ""

,
. ""
"", ,
,
. ""
, .
, ,
, ,
" " (Breiman,1984).
?
, ,
[39]. ,
, ,
, .
.
, .
,

" " ,
.
, "
" , , . 1984 . ,
, ,
.
,
, : ;
.
,
. :
;
.

. ,
, ,
, .. .
- ,
.
- " " (prepruning),
. .
. "
" (Breiman, 1984).
- .
, .
- ,
.
,
.
, ,
, [35].


(pruning) .
, ,
: .

,
, ,
.
,
, ,
.
,
. , ..
. ,
. , ,
.
,
, .
.
.

,
: CART, C4.5, CHAID, CN2, NewId, ITrule .
CART

CART (Classification and Regression Tree), ,


. 1974-1984
- Leo Breiman (Berkeley), Jerry Friedman (Stanford), Charles Stone (Berkeley)
Richard Olshen (Stanford).
, .
CART .
.
.
CART:

;
;
;
.

,
. ,
. ,
, .
( right) - , ; ( left) , .

,
, - Gini - . ,
. , ,
.
50 , -
100 0 .
. , CART
.
. ,
xi <= c, c
xi .
,
xi V(xi), V(xi) -
xi .
. , minimal cost-complexity tree
pruning, CART
. -
" "
.
, , "
".
(V-fold cross-validation)
CART.
, ,
,
.
, CART: ,
- Gini, minimal cost-complexity tree pruning V-fold crossvalidation, " , ", ,
.
C4.5

C4.5 .

. C4.5
.
C4.5 :


, .. .
.
.

.

- C4.8 - Weka J4.8


(Java). : C5.0, RuleQuest, .
C4.5 .
CART C4.5.
, .. .
:

- (binary), (multi-way)
- , Gini,


.


,
.

,
- , ..
.
, - Sprint,
[36]. Sprint,
CART,
.

;
,
.

, .
, , , ,

.


.
. " ".

;
: ,
( )
.

(Support Vector Machine - SVM)


. .
.
.
(plane) .
.10.1 , .
, - brown (), yellow (). , ,
brown - yellow,
. .

. 10.1.

- , ;
. 10.2.
: , - .


. 10.2.

, , ..
; . 10.3.

. 10.3.

, .
, .
. 10.3. , .
SVM


,
. , .

f(x),
- .
, ..
f(x), ,
.
f(x). b, ..
f(x)=ax+b. . 10.4.
, .. SVM-, ,
-
.
.

. 10.4. SVM

,
.
.
,
. ,
.

.
- .
,
,
.
,
, ,
.

- , ..
, ,
.
, .
,
,
.
: SVM- ,
, .
- -
,
.
, SVM ,
.
,
, , .
, ,
, .
, ,
.
[37, 38]:


( );
,
.

" "

, " " ("nearest neighbour")


,
.
,
( ) .
, ,
(, ..).
" "
, ..
"k- " ("k-nearest neighbour").
, k "" ()
" ".
, "" .

(Case Based Reasoning, CBR),


, .
- ,
.
, , :

;
, ,
;
, , ;
, ;
;
.

, , ,
,
, .
" ", ..
"" ,
.

, ,
- , .
, CBR- .
,
"" .

.
,
.
, .

" "

- ,
, -
, , .
"" ().
,
.
.

, - .
-
.

.
k-
().

. 10.5. ( )
"+" "-",
("+" "-"), , ,
. .
()
. ,
, : "+"
"-".

. 10.5. k

k-
.
, .
. k ,
(..
).
5. ,
(
( ) ). 2 "+" 3
"-" , k- "-"
.
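The voting step described above (with k = 5, two "+" neighbours and three "-" neighbours give the answer "-") can be sketched as follows. The training points, the query point and the name `knn_classify` are hypothetical; Euclidean distance is assumed as the measure of closeness.

```python
import math
from collections import Counter


def knn_classify(train, query, k):
    """train: list of ((x, y), label). Returns the majority label
    among the k training points closest to the query point."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical data: the five points nearest to the query are 2 "+" and 3 "-".
train = [
    ((1.0, 1.0), "+"), ((1.5, 2.0), "+"),
    ((3.0, 3.0), "-"), ((2.5, 1.0), "-"), ((1.0, 3.0), "-"),
    ((8.0, 8.0), "+"), ((9.0, 7.0), "+"),
]
query = (2.0, 2.0)

print(knn_classify(train, query, k=1))
print(knn_classify(train, query, k=5))
```

With k = 1 the single closest point decides the class, while with k = 5 the majority of the five closest points decides it, so the two answers can differ.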


k-
.
.
, . 10.6. (
) x
y ( ). (..
); k-
X ( ).

. 10.6. k

k-
, .. k, .
( ) X.
- (x4 ;y4). x4 (.. y4), ,
X (.. Y). ,
: Y y4 (Y = y4 ).
, k , ..
. X .
y3 y4 . ,
Y Y = (y3 + y4)/2.

,
Y X k .
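For the regression case, the prediction is the average of the outputs of the k nearest points: Y = y4 for k = 1 and Y = (y3 + y4)/2 for k = 2. A minimal sketch follows; the sample points and the name `knn_regress` are assumptions made for illustration.

```python
def knn_regress(samples, x_query, k):
    """samples: list of (x, y) pairs. Returns the mean y of the k samples
    whose x value is closest to x_query."""
    nearest = sorted(samples, key=lambda s: abs(s[0] - x_query))[:k]
    return sum(y for _, y in nearest) / k


# Hypothetical one-dimensional data; the query lies between x3 = 3 and x4 = 4,
# slightly closer to x4.
samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0), (6.0, 12.0)]
x_query = 3.6

print(knn_regress(samples, x_query, k=1))  # y4
print(knn_regress(samples, x_query, k=2))  # (y3 + y4) / 2
```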


,
.
, - .
k-
, "
".
k-
k. ,
.
k,
. ,
. , ,
k.
, ,
, , k
.
, k ,
() .
k -

k - - (Bishop, 1995).
, , STATISTICA (StatSoft) [39].
- - .
- v "". V ""
.
k k-
v- ( )
.
,
( ).
v.
v "" (),
(.. ).
k, ,
( ),
( -).
, - - ,
,
.


k - .
,
, ,
.
k-
.

Dell,
Inference. ,

.
, ,
. CBR
Intranet Dell.
Data Mining, k- CBR-,
. : CBR Express Case Point (Inference Corp.),
Apriori (Answer Systems), DP Umbrella (VYCOR Corp.), KATE tools (Acknosoft, ),
Pattern Recognition Workbench (Unica, ), ,
, Statistica.

: , ,
.
[11].

[40],
Data Mining.
- (naive-bayes
approach) [43] ,
. ,
"" .
"" - .
"" ,
.
:
1. .
2. :
o ;
o , ..
.


,
, ,
; .
, ,
. ,
?
, .
.
, - .
?
.
Data Mining [41]:

,
, ;

", ";
,
, , , , ;

(overfitting), ,
(, ).

- :

,
;

,
, [42];
-
, ;
[43];
-
,
[43].
,
, .


. (Paul Graham).
.


- ,
.
, .
- " - ".
, "
" , , .
" " ,
, ,
. ,
" ", ,
, .


.
,

.
(Neural Networks) - ,
, ,
( ).
,
, - .
.
-
, , , ,
, ,
.
, , ,
, ,
.
.
.
,
, .
, .
, - ..
Data Mining, ,
:

( ). :
, , .
.
: ,
( ). ,
.
( ).
.
, , .
.

,
.

.
.
,
.
( ).

.
. ,
, ,
"".

( ) - ,
.
-
, .

, ,
, (
).
. 11.1.


. 11.1.

, .
- , (
) .
( wi).
:

$$s = \sum_{i=1}^{n} x_i w_i$$

:
y = f(s).
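A single neuron can be sketched as a weighted sum followed by an activation function, y = f(s). The sketch assumes s is the sum of inputs multiplied by their weights wi and uses a logistic activation as one possible f; both the activation choice and the names `logistic` and `neuron_output` are illustrative assumptions.

```python
import math


def logistic(s):
    """Logistic (sigmoid) activation, one common choice for f."""
    return 1.0 / (1.0 + math.exp(-s))


def neuron_output(inputs, weights, activation=logistic):
    """Compute y = f(s), where s is the weighted sum of the inputs."""
    s = sum(w * x for w, x in zip(weights, inputs))
    return activation(s)


# Hypothetical inputs and weights: s = 1.0*0.5 + 2.0*(-0.25) = 0.
print(neuron_output([1.0, 2.0], [0.5, -0.25]))
```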

, , -
, .
:

.
.
.


, .
- ,
( )
( ).
() - ,
.

, .

.

.

- , ,
[44].
- [45, 46].
.
- ,
.
- ,
() , .
i- ,
(i+1) . k- ,
.
, .
,
- .
, , , ,
, , .
. , ,

.
, , [46].
- , .
- , .
, -
, - . , -
, .
, - .
, , .
(input neuron),
.
(hidden neuron) - ,
.
(output neuron), ,
.



, .
.
.

.

.
,
.
- ,
, ,
.
.

.
.
, - ()
. ,
.
.
,
, .
.
,
" ".
.

.
,
.
,
, .

() .
.


- ,
.

. , .

.

,
(overfitting).
, -
,
.
,
.
, ()
. .
( )
.
"" , . .
-
( ).
.
. .
,
.

. -
.
,
.
, .
. 11.2.


. 11.2. .

.
,
, ..
.

. ,
.

:
.


[47]. - .
( ) - ,
.

. 1960-
.


- . 11.3.

. 11.3.

, , n , ,
3 .
.
(MLP) -
( ), ,
.
, - .
.
,
, .
.

.
. 11.4.


. 11.4.

, , n . ,
3 , .
. , ,
.
(Back propagation, backprop) -
,
.
, , ,
.
.

, ,
.
:

(
).
.

,
.
: BrainMaker, NeuroOffice,
NeuroPro, .
: ,
, , ,
. .

" "
Deductor (BaseGroup).
,
, : , , , , ,
, , , .
, , ,
, , ..
" ?".
, .. .
credit.txt.
.

- .
. - " ", - .
. 11.5.


. 11.5. " "


.
"". . 11.6.


. 11.6. " "

, ..
- 33 ( ),
- 1, - 1 ( ).
- , . .
11.7.

. 11.7. " "

.
" ", . 11.8.


. 11.8. " "

.
, 0,005,
10000.

.
, 4536 83,10%
, - 85,71% .
. 11.9.


. 11.9. " "


. :
, , ", ",
[48].
. 11.10 . ,
, .. 55 , ,
89 , .
, (1 4). ,
- 96,64%.

. 11.10.

"-" .
,

" " - "" "", ..


.
Matlab

MATLAB (The MathWorks)


. MATLAB "Neural
Network Toolbox"
.
MATLAB ,
,
, ,
.
Matlab.
15 -
- . .
.
.
, 15 (
), 8 1 (
).
: - logsig, - logsig,
- purelin.
Matlab :
net = newff(PR, [S1, S2, ..., Sn], {TF1, TF2, ..., TFn}, btf, blf, pf),

PR - R ;
Si - i- ;
TFi - i;
btf - , ;
blf - , ;
pf - .
,
, tansig, logsig, purelin.
net = newff(minmax(P), [n, m, l], {'logsig', 'logsig', 'purelin'}, 'trainrp'),

P - ;

n - ;
m - ;
l - .
. ,
, , : net.performFcn='sse'.
10000
: net.trainParam.epochs=10000.
:
[net,tr]=train(net,P,T);

, , nn1.mat.
:
save nn1 net;

,
, .
Matlab
, Neural Network Toolbox.
Neural Network Toolbox
[49, 50].


. .

- .
.

,
, .
.

(, , ).


.
-
.
.

( ).
( ).

,
.
- ,
.
:
.

.
Back Propagation.
.
.

. -
, .

.
- , ,
,
.
.

.

.
. ,
.

, .

.
, ,
. .
,
.


.
. ,
, .
.
.
().
,
.
,
.
,
. ,
0 10,
.
.
, , .. ,
, ,
.
,

, , , [0..1].
,
.
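A minimal sketch of bringing input values into the interval [0..1] mentioned above, assuming that linear (min-max) scaling is meant. The function name `min_max_scale` and the sample values are illustrative assumptions.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min
    and the largest maps to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: map everything to the lower bound.
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]


# Hypothetical variable measured on a 0..10 scale.
print(min_max_scale([0, 5, 10]))
```

Scaling each input variable this way keeps variables with large numeric ranges from dominating those with small ranges during network training.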


. [44,
51, 52].

.
,
[53]:
1. ,
;
2.
;
3. ( , ,
- , .)
.


.
,
. , ,
, ,
, , ,
- ,
(
).

(Self-Organizing Maps, SOM)

, , - ,
,
. ,
,
. ,
.
(1982 ).
-
.
.
.

, ,
(, ,
).
,

,
, , ,
.
-
, .. .
, ,
( ).
, .. , .

.
:
[39].
. ,
. ,
, .
, ,
.
, , - ,
.
.
.
, ,
.

, , ;
: .
. , ,
. . 12.1


. 12.1.

.
,
, .
.

(,
).
.
"" "-".
,
( )
.
,
.
,
.
. ,
.
,
,
"".
, .. -.
, [39].
,
. ,
, .
.


( )
.
n-
. ,
.
,
.
,
.
. 12.2

. 12.2.

? .12.3 , , i-
( pr_a), . , -
, -
.

. 12.3. i-


, .12.2, ,
( ,
), - ( ,
).
, ( ) :

,
;
.

;
;
.

. ,
[15:30] , 15- 30-
. , .
.

. ,
.
.
- ,

.

. .
.

. -
, .
. , ,
, ,
.
, -
.
.


, ,
. ,
,
, - ;
.
, , SoMine,
Statistica, NeuroShell, NeuroScalp, Deductor .
Deductor.

. , ..
, - 21.
"banks.xls".
.
xls- .

" ". , ..
: , ,
. ,
, "". "" .
,
. , 95% - 5%.
,
.
5, . 12.4 :
Y ( ).


. 12.4. 5 " "

" ",
. 12.5, ,
.


. 12.5. 6 " "

, . 12.6,
: , .
:

( ). ,
.

. 12.6. 7 " "

- ""
.
.
,
.
" "
"-". ,
. 12.7.


. 12.7. " 10 "

, , ,
" " .

.
, , . 12.8

. 12.8.

.
.
,
. , ,

, .
,
.
, , : ,
, du (
) akt ( ) pr_a
( ).
,
: ,
, , ..
, .
, , .
(. 12.9) ,
- . (
) ,
.

. 12.9.


" ".
. 12.10. ,
, . ,
, .


. 12.10.


,
.
7 ,
, .


.
. ,

.
, :
- , - .
, - .

.
.

.
.
"" ,
- .
, (Tryon) 1939 ,
100 .
,
,
,
( , , ). ,
.
, .
,

.
,
, , .,
. .
:
1.
2.
3.
4.

.
.
.
, (),
, .

,
.
.
, , 14- ,
X Y. 13.1.
13.1.
X Y
1

27

19

11

46

25

15

36

27

35

25

10

43

11

44

36

24

26

14

10

26

14

11

45

12

33

23

13

27

16

14

10

47

. X
Y , . 13.1.

. 13.1. X Y


"" . (),
X Y "" , ();
.

. "",
.
, , .
- i j
, X Y:

$$d_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2} \tag{13.1}$$
: ,
, ,
.
, , :
, ()
. ,
( . 13.2),
(13.1) :

$$d_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2 + (z_i - z_j)^2} \tag{13.2}$$
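The distance between two objects can be computed for any number of coordinates with one function. This is a minimal sketch assuming the Euclidean distance; the name `euclidean` and the sample points are illustrative.

```python
import math


def euclidean(p, q):
    """Euclidean distance: square root of the summed squared
    coordinate differences, for points of any equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


print(euclidean((0, 0), (3, 4)))        # two variables, X and Y
print(euclidean((0, 0, 0), (1, 2, 2)))  # three variables
```

Adding a variable adds one more squared difference under the root, so variables measured on larger scales contribute more unless the data are standardized first.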

. 13.2.

: , ,
, .
- .

- .
,
. ,
.
. .
- ,
.
,
.
, .
, .
.
.
() .
, -
.
. .
,
: 100 700,
- 0 1.
, ,
, , , ..
,
, .. . -
.
.
(standardization) (normalization)

,
. .
:


;
Z- .

,
, ,
. ,
- .



.
,
, .

;
.

.
,
. .
.


.
(Agglomerative Nesting, AGNES)

.
.
.
,
.
() (DIvisive ANAlysis, DIANA)
.
,
,
.
.
13.3.


. 13.3.


Data Mining,
. , SPSS,
- Statgraf.
.
,
"" ( )
( ).

.
.
( dendron ""), .
,
()
.
(dendrogram) - , n ,

.
, ,
.
,
.

.
.
. 13.4.

. 13.4.

11, 10, 3 ..
. ,
( ), : 11
10; 3, 4 5; 8 9; 2 6. :
11, 10, 3, 4, 5 7, 8, 9. ,
.


( ), .
, .
.


.
( ),
"" "-" .
.
,
. , ,
,
.
. ,
"", -
.
. ,
.

,
. -
? ,
.
.

( ) .
,
.
"" "" ,
" " ,
.
.

(.. " ").
, "".

"", .
(Ward's method).
,
(Ward, 1963).
, .
,
, .. .
""
.
(
- unweighted pair-group method using arithmetic averages,
UPGMA (Sneath, Sokal, 1973)).

. ,
"", "" ,
.
(
- weighted pair-group method using arithmetic averages, WPGMA
(Sneath, Sokal, 1973)). ,
,
( , ).

.


(
- unweighted pair-group method using the centroid average (Sneath and Sokal,
1973)).

.
(
- weighted pair-group method using the centroid average, WPGMC (Sneath, Sokal
1973)). , ,
( ), .
,
.
SPSS

SPSS (SPSS).
SPSS
( ), () [54]. ,
, - .
,
.
, .
, .
N-1. ,
. ,
. ,

.

. SPSS :

(Between-groups linkage),
.

(Within-groups linkage).
- (Nearest
neighbor).
(Furthest neighbor).
(Centroid clustering) .
,
, .
- ,
(Median clustering).
.


( )
13.2. :

Stage - ();
Cluster Combined - (
);
Coefficients - .

13.2.
Cluster Combined

Coefficients

Cluster 1 Cluster 2
1

10

,000

14

1,461E-02

1,461E-02

1,461E-02

1,461E-02

13

3,490E-02

11

3,651E-02

4,144E-02

5,118E-02

10 4

12

,105

11 1

,120

12 1

1,217

13 1

7,516

, Cluster Combined :
9 10,
9, 10 .
2 14, 3 9, ..
Coefficients ,
;
, .
,

. ,
, .
SPSS :

Z- (Z-Scores). ,
.
-1 1.
-1 1.
0 1.
0 1.
1. .
1. .
1. .

, , ,
, .
, 0 1.

.
.
/ .

, .
,
. ,
,

.
13.2 , Coefficients ,
, ,
, .
1,217 7,516.
, (14)
(12).
,
, .
.

( , 0 25).
. 13.5.
, 9 5 .


,
25 .

. 13.5.


. .

. ,
,
. ,
.

. .

, .. ,
" ".

k- (k-means)

k-,
.
(Hartigan and Wong, 1978).
,
,
.
k- k ,
. , k-, () ,
, . k
, .
: k
, ( )
.

1. .

k, "" .
.
:
o
o
o

k- ;
k-;
k-.

.

2. .

,
. .

, :
o
o

, ..
, ;
.
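The two steps of k-means (pick k initial centers, then repeat assignment and center recomputation until nothing changes) can be sketched as follows. This is an illustration under stated assumptions, not the algorithm of any particular package: the first k points serve as initial centers, Euclidean distance is used, and the name `kmeans` and the sample points are hypothetical.

```python
import math


def kmeans(points, k, max_iter=100):
    """Plain k-means. Returns (centers, assignment), where assignment[i]
    is the cluster index of points[i]."""
    centers = [points[i] for i in range(k)]  # assumption: first k points
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster: stop
        assignment = new_assignment
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centers, assignment


# Hypothetical data with two well-separated groups.
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 9), (8.5, 9.5)]
centers, assignment = kmeans(points, k=2)
print(assignment)
print(centers)
```

Although both initial centers fall in the same group here, the iterations still separate the two groups; with less clearly separated data the result can depend on the initial centers.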

. 14.1 k- k, .


. 14.1. k- (k=2)

.
, 2 , 3, 4, 5 ..,
.

k-
(.. , ).
.


.
k-:

;
;
.

k-:

, .
k-;
.
.

PAM (Partitioning Around Medoids)

PAM k-, k- (k-medoids).


, k-means,
.
PAM ,
.

. ,
. 25 .

.
.
,

, . ..
.
; - .
.

- ,
.
, :

;

-
.

,
.
, ,
.

. .

, .
"" .
, .
,
.

, .
,
.
- ,
.
.
- -
.
SPSS

,
(,
), (,
).
SPSS.
.
: Analyze ()/Data Reduction ( )/Factor
( ):
Extraction:() .
, .
- -
. ""
"Save as variables" ( ).


"
", - ,
.
, fact1_1, fact1_2
.., k-.
:
Analyze ()/Classify()/K-Means Cluster: (
k-).
K Means Cluster Analysis ( k-)
fact1_1, fact1_2 ..
. .

, ,
, .
, k-
.
, , .
, ,
, . ,
,
,
.
:

;
;
.

,
.
SPSS, , (,
), (, ) ,
,
, ,
.

-. -
. ,
. ,
, , ,
. ,
.

, ,
, .
,
.
.
( ).
( ).
,
.
. :
;
;
,
; .
.
.
, .
:

,
;
-;
;
;
.

-
.
,
.
,

,
, .. , .
,
.
,
.

,
.

, .
, ,
- .
.
.
, :

.
.
.
.
, ,
, .
.
. ,
,
.

,
.
, .

, ,
, . ,
, "".
, ,
. .
,
. ,
- ,
, ,
, . ""
.
, ,
, .
: ;
; .
-
.



, , .
.

.
, .
.
,
.
, , .. ,
.
.
,
,
.
,
.

, , ""
. ,
, : ,
.
, ,
, . ,
, - .
, :
;
.
,
.
.
(summarized cluster representation), ,
[33].
,
. : BIRCH, CURE, CHAMELEON,
ROCK.
BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies)

[55].
,
, .

.
.
-
.
[33] , .
, ,
""
.
, ,
, ,
.
WaveCluster

WaveCluster
[56].
.
, ,
.
.
.
WaveCluster:
1.
2.
3.
4.

;
;
;
.

CLARA (Clustering LARge Applications)

CLARA Kaufman Rousseeuw 1990


.
, , S+.
. CLARA
. ,
.
, PAM.
.

.
Clarans, CURE, DBScan

Clarans (Clustering Large Applications based upon RANdomized Search) [14]


.


, . ""
. Clarans
.
,
.
CURE [57] , DBScan [58],
(density).
BIRCH, Clarans, CURE, DBScan
, ,
. ,

[59].
,
- ,
.



, - Data Mining.
(association rule)
.
:

?
?
?
?

, .
. , , ,
, , .
.
:

: , ;
; ;
;
: , A,
, ?
: , ;
: ,
;
, ,
( );
Web-.

: ,
, 50%.

(association rule mining)


, ,
(market basket analysis).
- ,
.
, , ,
.
- , .

- ,
.
, .

, .
(Transaction database)
, (TID) ,
.
TID - , .
, ,
15.1. (TID) ,
,
.
15.1.
TID

100 , ,
200 ,
300 , , ,
400 ,
500 , , ,

,
.

, D.
( 15.2).
= a
= b
= c
= d
= e

= f
15.2.
TID    Items
100    a, b, c
200    b, d
300    b, a, d, c
400    e, d
500    a, b, c, d
600    f

(Itemset), , , {, , }.
:
abc={a,b,c}

, ..
3:
SUP(abc)=3.

, , abc
.
min_sup=3, {, , } - .
,
.
, ,
50%.
SUP(abc)=(3/6)*100%=50%

.
, ,
(min support).
(frequent).


: " A B".
:
" ( ) A,
, B)"

, .

.
" " ,
15.1.
. .
s, s%
A B , , .
- A, - B. "
" 3, 50%.
, , A
B.
" A B" , c%
, A, B.
, , , ,
, , (3/4)*100%, .. 75%.
" " 75%, .. 75%
, , B.
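Support and confidence can be computed directly from the coded transactions of Table 15.2. The sketch below is illustrative: the function names are hypothetical, and the rule b -> c is chosen as one rule consistent with the figures in the text (support 3, that is 50%, and confidence 3/4 = 75%).

```python
# Transactions of Table 15.2, keyed by TID.
transactions = {
    100: {"a", "b", "c"},
    200: {"b", "d"},
    300: {"b", "a", "d", "c"},
    400: {"e", "d"},
    500: {"a", "b", "c", "d"},
    600: {"f"},
}


def support_count(itemset):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for items in transactions.values() if itemset <= items)


def support(itemset):
    """Support as a fraction of all transactions."""
    return support_count(itemset) / len(transactions)


def confidence(antecedent, consequent):
    """Share of transactions containing A that also contain B."""
    return support_count(antecedent | consequent) / support_count(antecedent)


print(support_count({"a", "b", "c"}), support({"a", "b", "c"}))
print(confidence({"b"}, {"c"}))
```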


" A B",
. ,

.
,
.
, ,
, ,
. , , "
", ,
- .

,
. , 3%
.

AIS. , AIS [62],


( Agrawal, Imielinski and Swami)
IBM Almaden 1993 .
; 90-
,
.
AIS "
", .
SETM.
SQL . AIS,
SETM " ",
. SQL
, SETM .
AIS SETM -
, .
Apriori [63].
,
:

;
.

(candidate generation) - , ,
, i- (i - ).
.
(candidate counting) - ,
i- . ,
, (min_sup).
i- .
Apriori D.
. 15.1. 3.


. 15.1. Apriori

.
.
, 3, . e f,
, 1.
: a, b, c, d.
,
, 3.
, ab, ac,
bd, .
,
: abc, abd, bcd, acd,
, 3. abc
.
Apriori , - - ,
,
.
,
.
,
,
.
ad, bc, cd ,
abd, bcd, acd.

, .
,
abc
( ).
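The level-wise search described above (generate candidates, count their support, keep those reaching min_sup, then build larger candidates only from frequent smaller sets) can be sketched as follows. This is a simplified illustration of the Apriori idea, not an optimized implementation; the function name `apriori` is hypothetical and the data are the six transactions of Table 15.2 with min_sup = 3.

```python
from itertools import combinations


def apriori(transactions, min_sup):
    """Return {frequent itemset: support count} using level-wise search."""
    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items]  # 1-item candidates
    frequent = {}
    k = 1
    while current:
        # Candidate counting: how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_sup}
        frequent.update(level)
        k += 1
        # Candidate generation: join frequent sets into k-item sets ...
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # ... and prune any candidate that has an infrequent (k-1)-subset.
        current = [
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent


transactions = [
    {"a", "b", "c"}, {"b", "d"}, {"b", "a", "d", "c"},
    {"e", "d"}, {"a", "b", "c", "d"}, {"f"},
]
frequent = apriori(transactions, min_sup=3)
largest = max(frequent, key=len)
print(sorted(largest), frequent[largest])
```

On these data the items e and f are dropped after the first pass, and abc is the only frequent 3-item set, with support 3. The pruning step is what keeps the number of candidates that must be counted small.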
Apriori ,
. (negative border),
-, ,
,
.
Apriori

Apriori
. Apriori,
,
, - [31].
Apriori: AprioriTID AprioriHybrid.


AprioriTid

- , D
.
,
.
, , .
AprioriHybrid

Apriori AprioriTid ,
Apriori , AprioriTid; AprioriTid
Apriori . ,
-. ,
AprioriHybrid ,
Apriori AprioriTid. AprioriHybrid Apriori
AprioriTid, ,

. , Apriori AprioriTid
.

, Apriori.
, [31,
33].
- DHP, (J. Park, M.
Chen and P. Yu, 1995 ). - ,
Apriori [63, 64].
, k- -
. k-1
-. -
k- , . k k -. ,
Apriori, -, -
, .
: PARTITION, DIC,
" ".
PARTITION (A. Savasere, E. Omiecinski and S. Navathe, 1995 ).
()
,
[65].
Apriori "" .

. ,
.

DIC, Dynamic Itemset Counting (S. Brin, R. Motwani, J. Ullman and S. Tsur, 1997
). ,
" " (start point),
[64].

,

.
Deductor.
, ,
, MS Excel.
MS Excel Deductor, ,
. - .
( - " ")
" (ID)", - "".
MS Excel Deductor . 15.2.
, 140 .

. 15.2. , Deductor MS Excel



" ".
,
"ID" "".
, . 15.3,
, ..
. "" ,

. - ,
, .

. 15.3.

:
20% 60% ,
40% 90% .
, ,
.
, , 30% 50%, ,
.
.

.
. 15.4.


. 15.4.

, -
, - .

; : " ", "", "
", "-". , .
" ".
- , ,
. ,
, .
, ,
15.3.
.
. , ,
: , , .

54,55 24

52,27 23

50,00 22

10

45,45 20

43,18 19

31,82 14

31,82 14

12

22,73 10

11

22,73 10

22,73 10

22,73 10

13 20,45 9
9

20,45 9

""

. ,
"-",
, . ,
, .
15.4. , ,
, 71%
. . , ,
.

, %

22,73 10

71,43

22,73 10

71,43

22,73 10

43,48

22,73 10

52,63

20,45 9

40,91

45,45 20

86,96

45,45 20

83,33

22,73 10

52,63

22,73 10

41,67

10

22,73 10

45,45

11

22,73 10

41,67

12

20,45 9

90,00

13

20,45 9

45,00

20,45 9

90,00

20,45 9

47,37

14
15


.
" " "-".
" " ,
: .
, , ,
, . ,
,
.
,
, . ,

.
"-" , ,
.
, "",
"", " ", " ",
. . 15.5.


. 15.5. "-"


.
, ,
.
, ,
, ,
.


.
", , , ,
." [65]

,
Data Mining,
"" . , ,
Data Mining - .
, .
1987 ACM SIGGRAPH IEEE Computer Society Technical Committee
of Computer Graphics, ,
,
.
,
, , , , , ..

, .
:

;
, ;
;
;
.

Data Mining

Data Mining .
Data Mining.
, , ,
, ,
.

, ,
.
,
, .
: , , .
.
: , .


.

(, ()
);
;
;
( , ).

Data Mining

( ), , Data
Mining . ,
"".
, , ,
. Data Mining
, ,
.
, Data
Mining. ,
, " ".
, . , ,
- .
.
- , .
, ""
. , .
,
, .
.
. .
, ..
Data Mining.
,
, .
", ". "" .
,
Data Mining . , , ,
..
, ,
, ,
.


,
, .
.
.
.
.
.
.
.
. ,

"" .

.
, ,
: , ,
.

, ,
[22]:

, ;
.

,
.
Data Mining
.

:

(univariate) , 1-D;
(bivariate) , 2-D;
(projection) , 3-D.

,
.
-
:

(,
);

;
;
;
, .

, :

(
);
-, - .
, - -
.

.

4 +


.
.
:

;
" ";
.

,
. ,
, . 16.1 [22].
(Alfred
Inselberg ) 1985 .


. 16.1.
" "

" "
[66].
"" .16.2.

. 16.2. " "

"". ""

(, , , , ).

.

. 16.3 , "
".

. 16.3. " "

, - .
, .
, ,
.


. ,
, ,
. ,
, , .
.
-
-

. , ,

.
,
, , .

.
,
- .
.



.
,
, .. .
. :
- , -
.
,
.
.
(,
)
.

, Data Mining,
.
, ,
: , , , ,
..
,
,
.
(Tufte's Principles)
[67] :

, ,
;
.

[65]
:
1.
2.
3.
4.
5.
6.
7.

.
.
.
.
.
.
.

,
,
, (
) .

,
, ,
.


.

.
. 16.4 MineSet [26], ,
, Data Mining.
,
( ,
).
-
.

. 16.4. MineSet.

,
, - .
,

, .
(Philip Russom)
[68]:

.

.
, .

1. .


( , ...).
, .
,
, .
,
. , , -
,
.
, ,
.
, ,
, -
.
"", "" "".

.
-
OLAP, Text Mining Data Mining,
CRM- .

, ,
, web- (
).
2. .


, .
,
,
,
.
,
.
, - , ,
. (
)
, .

.
(drill down) ,

, OLAP, Data Mining


.

.

, ,
"" ,
. OLAP
Data Mining
. Text Mining

,
.

. , ,
,
, .
,

. ,
.
3. , .


.

- , .
, OLAP (
) .

.
, ( ,
, , .),

.
Data Mining
. (
)
. Data Mining
,
,
, .
,
,
, , ..

Text Mining. , Text Mining


,
.
,
.
:
o
o

;
,
.

,
, .. Data Mining.
,
, ,

.
:
,
, ,
, ,
,
.


Data Mining, OLAP



,
, ,
. Data Mining,
, ,

().
, Data Mining
, ,
[11].

.

70- .
,
.
.
, ,
. ,
,
.
, .
, ,
;
. .

, ,

[71].
- ,
,
.
- " ", "
, ,
( - ), " [72].


- " ,
, " [73].
- " ,
,
,
". ,
, [74].
, , , ,
, .
: - ,

[75].
:

(..
);
( );
(
).

,
, [75, 76].
,
,
.
,
.

" ,
" [76].
, ,
:
,
[75].
,
[11, 77]:

;
OLAP;
Data Mining.


:
,
.
, ( OLAP)
.
Data Mining ,
,
, .

,
. .
(D.J. Power,
2000). C [75].

, (Data-driven DSS, Data-oriented DSS);


, (Model-driven DSS);
, (Knowledge-driven DSS);
, (Document-driven DSS);
, (Communications-Driven and Group DSS);
- - (Inter-Organizational or Intra-Organizational DSS);
(Function-Specific or General Purpose DSS);
Web (Web-Based DSS).

, ,
: EIS DSS [75,78].
EIS (Executive Information System) - , .
,
.
, ,
,
. ,
; , ,
.
DSS (Decision Support System).
.
, .
, EIS, ,
,
. ,

.
,
.
, .. DSS.
, ..
(ad hoc) .
[79].
[11].
1. (OLTP-).

,
- .
.
2. ( OLAP-).

OLAP , ,
.
.
3. (Data Mining).

EIS DSS
. ,
.
, [80], :

;
;
;
, ;
.

OLAP-

OLAP, (On-Line
Analytical Processing),
(Multidimensional conceptual view).
OLAP (E. F. Codd) 1993 .
,
.
. ,
,
. OLAP-,
, ,
, , ,

. OLAP-
.
OLAP-

OLAP-.
: ,
, OLAP-,
. .
OLAP- OLAP [77]:

MOLAP (Multidimensional OLAP);


ROLAP (Relational OLAP);
HOLAP (Hybrid OLAP).

, OLAP-
.

MOLAP,
.
OLAP-.
. , , .
.
" " ,
.
ROLAP-
-.
.
OLAP-.
,
.
, .. HOLAP-,
, . OLAP OLAP-
. .
.

- OLAP-.
OLAP- OLAP- OLAP-.
OLAP-
- .
, . OLAP-
, -
. OLAP-
: MOLAP, ROLAP HOLAP.
OLAP- Microsoft.
OLAP- -. OLAP-
.

OLAP-
[81],
. ,
.
? OLAP

.
, SQL-.
17.1
[81]:
17.1.

OLTP


OLAP


OLAP Data Mining


. : OLAP

, Data Mining
.
OLAP Data Mining "" ,
. ,

. N. Raden, " ...
, ,

,
" [82].
K. Parsaye [83] "OLAP Data Mining" ( Data Mining)
.

,
.
,
( , , ), ,
.
( )
. J. Han
- "OLAP Mining" .
1. "Cubing then mining".

, .
2. "Mining then cubing". , ,

.
3. "Cubing while mining".

( ,
..).

Data Mining
. , Data Mining, ,
,
.

,
, ..
.
() .
, .
,
,
.
, , ,
. ,
.
,
, , ,
;

, ;
..
.
,

. ,
Data Mining.
(Bill Inmon) "
, , ,
, "
" ",
,
[84].
,
, ,
, .
,
, . ,
, ,
.
,
"" : .
.
, , ,
. .
, , ,
- " " [86].
,
.
""
"".


, [88] :

,
- -
,
, .

,
-. ,
.
-.
.
,
.

.
.
().


.
.

. ,
, .

.
. ,
, - .
( ,
, , ). ,
,
.
, - .
,

. ,
,
, .
. -,
. -,
. -,
.
.
.
.
, , ,
. .
; ,
-, .

OLAP (On-Line Analytical Processing).
, . OLAP
. , ,
, (,

..). , ,
( ),

.
OLAP-.
- OLAP-.
, "" .

,
,
Data Mining. ,


.


Data Mining.
Data Mining . ,
, , ,
, , , .
Data Mining .
Data Mining ,
.
Data Mining. :

;
;
;
;
;
;
;
.

Data Mining,
.
1.

- ,
.

, -,
.

.
- ,
.
,
- .
- ,
, , .
. ,
" ?"
, , ,

, ..
. ,
, .
.
- .

.
: , SADT IDEF0,
-, -
UML . ,
, , .
Data Mining. ,
, Data Mining.
2.

Data Mining :

;
.


.
. ,
.
.
. . "": ,
, .
,
.
. .
, .
. .
Data Mining ,
. Data
Mining, ,
.
.
3.

: Data Mining.

2 .
,
Data Mining.
, , , ,
80% , .
, .
1.

, ..
, Data Mining.
(, ,
); , ,
/ ;
( , ,
.).
2.

,
, , ,
.
.
, , ..
.
Data Mining
, , ,
.
,
, ,
, .
. ,
,
: , , , , .
,
. ,
.

,
.
, ,
/ .

/ ,
/.
, ,
.
.
. ,
. ,
-
, . ,
, .
.
()
. ()
().

.
.
3.

, .
, .
,
Data Mining.
. , ,
. ,
Data Mining - .
(Data quality) - , , ,
.
, -
"" .
- , , ,
.
: ,
.
, "
" ,
2005 Business Intelligence
Knightsbridge Solutions. 2005 ,
2005 (Duffie Brunson),
Knightsbridge Solutions, .

[90].
.
. ,
,
, -
,
.
.
. , .
, Basel II.
,

. ,
.
, , ""
, , .
. ,
. ,
,
(Extraction, Transformation, Loading - ETL),
, .
.
, - ,
(,
, ).
, .
,
, ,
, ,
, - ..
; ,

,
[91].
[92],
33
.
, :

, ;
, ;
, ;
, .


,
.
:

;
;
.

(Missing Values).
, :

(, );
(,
" " ).

.
.
.
.

(Duplicate Data).
, .. .
.

.
. ,
.
?
.
, . ,
, .
.
.
- .
.
,
. - ,

.
, .
-
- .
Data Mining ,
.
Data Mining .
, ,
.
. 18.1. , (
).

. 18.1.

, Data Mining
.
.
/ ,
.

(data cleaning, data cleansing scrubbing)


.
-
. (
,
-), .
,
.

.
- - .

,
.
[93].
[93].
1. ,
, .
2. ,
,
.
3.
, .
4.

.
5.
,

.

.
, , ,
Data Mining
. ,
,
, .

, [93] (
,
).
1.
2.
3.
4.
5.

.
.
.
.
.

1. .

.
,
.
2. .
,
,

.
;
.
, .
/
, , .
ETL ,
.
, , , ,
,
, , ,
. ,
.

, .
3. .

. , ,
, - , -
. ,
, , ,
.
4. .
ETL
,
.
5. .
,
,

.
.
(,
, .).
, ,
.

, ,

. , . 3
,
.

(
), .
;
.

Data Mining, ,
,
.
, ,
Data Mining ,
, , , ..
.
,
.


Data Mining.


, .
.
(Erhard Rahm) (Hong Hai Do)
.
1. .
2. :
o ;
o .
3. ETL.

[93] ,
.
1.
, ,
,
, Data
Mining.
. MIGRATIONARCHITECT (Evoke Software)
.
: , , ,
,
, . MIGRATIONARCHITECT
.
Data Mining. , WIZRULE (WizSoft) DATAMININGSUITE
(Information Discovery) ,
, .
WIZRULE : , if-then
("-") , , - ,
" Edinburgh 52 ; 2
". WIZRULE
.
, , INTEGRITY (Vality),

, .. . INTEGRITY
- , , .

,
, ,
. INTEGRITY
(,
, , ) . INTEGRITY
.
,
, .
2.
-
- .
, , ,
.
.
- ,
- ,
. ,
,

.
2.1.
,

. ,
IDCENTRIC (First Logic), PUREINTEGRATE (Oracle), QUICKADDRESS (QAS Systems),
REUNION (Pitney Bowes) TRILLIUM (Trillium Software),
. :
,
, ,
.
, . ,
TRILLIUM () 200000
-.
,
.
2.2.
DATACLEANSER
(EDD), MERGE/PURGELIBRARY (Sagent/QMSoftware), MATCHIT (HelpITSystems)
MASTERMERGE (Pitney Bowes). ,
.
; DATACLEANSER
MERGE/PURGE LIBRARY ,
.

3. ETL
ETL
.
ETL API
,
.
ETL
, , COPYMANAGER (Information Builders), DATASTAGE
(Informix/Ardent), EXTRACT (ETI), POWERMART (Informatica), DECISIONBASE
(CA/Platinum), DATATRANSFORMATIONSERVICE (Microsoft), METASUITE
(Minerva/Carleton), SAGENTSOLUTIONPLATFORM (Sagent)
WAREHOUSEADMINISTRATOR (SAS).
, , , ..
.
"" DBMS,
- ODBC EDA.
.

.
,
C/C++
.
, ,
. (, COPYMANAGER,
DECISIONBASE, POWERMART, DATASTAGE, WAREHOUSEADMINISTRATOR),

.
(,
/ ).
ETL ,
API.
,
. ,

(sum, count, min, max, median, variance, deviation,:).

- ( ,
), (, , ,
), , ..
,
,
.
if-then case,
, - , ,
.
.


,
soundex. ,
,
,
.
, ,
:

, ;
/ .

, [94], .
. ,
. : Enterprise Integrator (Apertus); Integrity Data Reengineering Tool (Vality Technology); Data Quality Administrator (Gladstone Computer Services); Inforefiner (Platinum Technology); QDB Analyze (QDB Solutions); Trillium Software System (Harte-Hanks Data Technologies).
,
, , .
.
, Trillium,
. ,
(, ),
. ,
Apertus Vality, .
, Object Query Language. ,
.
Vality
, ,
. .
/. , ,
. :
Nadis Group 1 Software Postalsoft.
: ,
/. ,
, .
, , ,
.
,
. , Nadis Universal
Name and Address data standard.
Group 1, Code-1 Plus,
. ZIP-

. , ,
,
, , ,
.
-

. -
, .
(Rich Olshefski) ,
[95].
. ,
- " "
.
1 , ,
. 1 ,
, .
2 ,
.
2 .
, .
" ". ,
. -
, ,
.

, ,
1 2. 1
, .
2. 2
, , ,
, , - 1.

,
"" .
, .
,
.
1
. - 2,
, , "".

, ,
. :

;
;
;
, .

" " 1 2.
?
,
. ,
.
,

.

.
.
, .

.
. ,
1 2. , ,
, -
, , .
. -,
,
. , -
, .
- " ", - .
, .
, ,
.
. , .

" " ,
, .

[96].
. .
,
, ( , ,
, , ). ,
,
. , , ,
- (

) .

,
.
. ,
. , "", "." ""
.

.
, .
,
, .
.
.
,
.
. ,
, , ,

.
, -,
.
, .
. ,

.
( )
.

.
,
, ,
,
.
,
Data Mining .
,
.
. ,
, .. , .
, ,
, Data Mining
.

, 80% ,
.


Data Mining.
Data Mining
, .
Data Mining, :

;
;
;
;
.

"".
""
"".

- , .
-

, [97].
- ,
.
,
. .

, , ,
.
,
.
, .
Data Mining.
Data Mining .
Data Mining , ,
, .
,
.

Data Mining
, .
Data Mining
, , ,
. Data Mining
.
.
, , ..
, ,
.

.
,
.
, ,
.
. ,
, , ..
,
.
, ,
, (
).
(, , ,
). . ,

.

.
, , .
,
,
, .
,
,
1.
2.
3.
4.
5.

(, ) ;
;
;
;
; ; , Data Mining;

6. () .

.
Data Mining :
.
(predictive) .
, ..
().
, ,
.
() . Data Mining

. (
, , , )
,
.
,
.
- (
) .
, ,
.
,
. :
.
Data Mining ,
.
- ,
.
:

;
;
;
, .

() .
(descriptive)
.
, , , .

, ,
;
.
, ,
() "".

.

.
, , , .
, (
).

- ,
, , ..
/,

.
.
.
- , ,
.
- , ,
.
, .

:
Y=f(x1,...,xn),

x1,...,xn - , Y - .
:
Y=f(x1,...,xn,z1,...,zr,w1,...,ws),

x1,...,xn - ,
;

z1,...,zr - , ,
;
w1,...,ws - .
Y - .

. ,
,
.
, , ,
,
.
, , ,
. Data Mining
.
4.

Data Mining.
.
, 6
. ,
: 1 (,
, ) 2 (, ,
).
,

: ()
.

( ).
, , : " >20
= "married", "1".
, ,
( ) .
"" "
", (.. ),
, .
, , ,
. , , ,
.

Data Mining.
,
. ,
, . Data
Mining ,
. Data Mining
.
Data Mining Group PMML
(Predictive Model Markup Language), ,
Data
Mining. .
Data Mining
,
.

.
, - . :
,
, , ;
. ( )
.
, ,
, ,
, , .
, ,

" ". " ".
, ,
:

;
;
,
. .

, Data Mining .
- ,
.
- .
, .

, ,
.
- ,
:

( - );
( -
).


.
t-1. -> t-1-> .
t. -> t -> .
t+1. -> t+1 -> .

.
5.

.
.
.
(adequacy of a model) -
.
,
,
, .
, ,
.
.
.
,
.
.
.
"" ,
, , -
.
. .
,

. ,
, .
, ,
,
.
, [98].

.
( ), ,
.

- ,
(, ).

.
, Data
Mining, : , , .
,
.
6.

,
.
,
, .
.
, , -
[77].
,
. " ",

.
, Statistica (Statsoft) [39]
" ", : (,
); ; -.
7.

, .

, Data Mining.
()
(target attribute).

8.


Data Mining ,
, ""
.
,
. ,
, (
).
. ,
().
. . ,
, .
, , .. ,
.
:

;
;
;
, ;
(, ,
- , ..).

, , ,
.
.
: " >20 =
"married", "1". -
, , , ,
. :
" >30 = "married", "1".
Data Mining

Data Mining . Data Mining



.
. :
;
; , ,
, ; .



.
, . Data
Mining , ,
.
, .. , ""
.
, 18 20 .
, ,
, , , .
, , , .
,
.
.

,
, . ,
, ,
. , ,
.
, , ,
. , , .

, " ,
, ". ,
, ,
,
.
. , , ,
, .
(..
), , , ,
..

Data Mining (,
).
Data Mining, Data
Mining, , , ..
, ,
, ( , " ",
).

Data Mining ,
.
.
Data Mining
, .
Data Mining ,
. , ..
. ,
.
, Data Mining .
, ,
.


Data Mining.
Data Mining
, - .
, , ,
. ,
- ,
.
, :
" ,
?". "", , ,
. , , ,
, ,
. Data
Mining.
Data Mining ,
-
.
Data Mining , ,
,
Data Mining.

Data Mining, ,
: " ?"
Data Mining, ,
.
.
(flow of Data) Data Mining [17],
..
. -
.
, Data Mining:
.
,
,
, ;
.
, [99].


.
-
. , ,
.

, , ,
. ,
Data Mining.
. Data Mining .
Data
Mining, .
, .
,
.
Data Mining
.
. Data Mining

Data Mining -
, Data Mining.
, Data Mining, ,
. 21.1: ,
, .

Fig. 21.1. Data Mining

.
(Domain experts) - ,
, , , , , , ..
.

, ,
, , ,
, . , .
(Database administrator) - , ,
,
.
,
, , .
:
; ;
; ; ;
; ;
.
(Mining specialists) - ,
, , .
Data Mining
.

.
Data Mining
,
.
.
Data Mining- ,
..

. Data Mining
. ,
, - .

, ,
. Data Mining
, ,
( - - ),
.
, , , Data Mining
.
.
:

(Project Manager);
IT (IT Architect);
(Solution Architect);
(Data Architect);
(Data Modeler);
Data Mining (Data Mining Expert);
(Business Analyst).


. , ..
, (outsourcing).
,
.
Data Mining .
Data Mining, ,
:

( );
( );
( Data Mining-
);
( );
- ( , data
mining);
( );
.

KDnuggets, -
, Data Mining
(34%), (19%), , -,
.
Data Mining ,
, , ,
-.
, Data Mining ,
.
Data Mining ,
.
Data Mining, ,
.

( , Data Mining ).
- ,
. ,

, ,
. Data Mining
.

, Data Mining.

.
Data Mining
.
Data Mining
, ,
. Data Mining
.
, ,
.
Data Mining ,
, .
CRISP-DM

Data Mining :
, Data Mining.
- , Data Mining.
Data Mining Data Mining.
CRISP-DM [100] (The Cross-Industry Standard Process for Data Mining -
Data Mining)
. CRISP-DM NCR, SPSS DaimlerChrysler.
CRISP, Data Mining
.
Data Mining CRISP-DM :
1. Business understanding.
2. Data understanding.
3. Data preparation.
4. Modeling.
5. Evaluation.
6. Deployment.

- , .
Data Mining CRISP-DM . 21.2.

Fig. 21.2. , CRISP-DM

CRISP-DM Data Mining -,


Data Mining .
CRISP-DM, Data Mining,
,
Data Mining.
CRISP-DM
[101], ,
( ): , ,
.
Data Mining
, .
,
() , Data
Mining-. , ..
, .
,
Data Mining.
CRISP-DM - , Data Mining.
, ,
, Two Crows, SEMMA,
.

SEMMA

SEMMA SAS Data Mining Solution (SAS) [102].


Sample (" ", .. ), Explore
(" "), Modify (" "), Model
(" "), Assess ("
"). Data Mining
SEMMA . 21.3.

Fig. 21.3. Data Mining SEMMA

SEMMA ,
,
. SEMMA
,
. , SEMMA
,
,
, ,
.
- .
SEMMA
, , .
KDnuggets (2004 .), 42%
CRISP-DM, 10% - SEMMA, 6% -
, 28% - ,
6% . 7% .

Data Mining

, Data Mining, ..
Data Mining.
, , -
Data Mining,
.
:
1. ,
Data Mining.
2. , .
PMML

PMML (Predictive Model Markup Language) - ( )


.
PMML Data Mining.
IT- DMG (Data Mining
Group). DMG [103] - , ,
.
- XML. ,
XML, .
PMML Data Mining
.
PMML -
.
PMML-
PMML-. , ,
,
.
PMML, " Data Mining ",
Data Mining.
.
PMML
.
PMML :

( );
( );
(, );
, .

PMML
, , , , ,
, .
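Because PMML is an XML vocabulary, a model description can be inspected with any XML parser. The sketch below is only an illustration in Python and is not taken from the source: the document is a hand-written fragment that follows the general PMML layout (Header, DataDictionary, one model element) and has not been validated against the DMG schema; the field names, the model name, and the `summarize` helper are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hand-written, illustrative PMML fragment; field and model names are hypothetical.
PMML_TEXT = """
<PMML version="4.4" xmlns="http://www.dmg.org/PMML-4_4">
  <Header description="Toy decision tree"/>
  <DataDictionary numberOfFields="2">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="response" optype="categorical" dataType="string"/>
  </DataDictionary>
  <TreeModel modelName="toy_tree" functionName="classification">
    <MiningSchema>
      <MiningField name="age"/>
      <MiningField name="response" usageType="target"/>
    </MiningSchema>
    <Node score="0">
      <True/>
      <Node score="1">
        <SimplePredicate field="age" operator="greaterThan" value="20"/>
      </Node>
    </Node>
  </TreeModel>
</PMML>
""".strip()

PMML_NS = "http://www.dmg.org/PMML-4_4"


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def summarize(pmml_text):
    """Return the PMML version, the declared data fields, and the model elements."""
    root = ET.fromstring(pmml_text)
    ns = {"p": PMML_NS}
    fields = [
        (f.get("name"), f.get("optype"), f.get("dataType"))
        for f in root.findall("p:DataDictionary/p:DataField", ns)
    ]
    # Every top-level child other than Header and DataDictionary is treated as a model.
    models = [
        local_name(child.tag)
        for child in root
        if local_name(child.tag) not in ("Header", "DataDictionary")
    ]
    return {"version": root.get("version"), "fields": fields, "models": models}


summary = summarize(PMML_TEXT)
print(summary)
```

A tool that consumes the file needs only the data dictionary and the model element, which is what makes the format usable for moving models between products from different vendors.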
,


Data Mining. ,
, ,
SQL.
,
, : CWM Data Mining, JDM.
2000 MDC (MetaData Coalition, www.mdcinfo.com) OMG (Object
Management Group, www.omg.org), -
- OIM (Open Information Model)
CWM (Common Warehouse Metamodel) -
OMG. CWM
, , XML,
, OLAP, ,
.
JDM (The Java Data Mining standard - Java Specification Request 73, JSR-73). ,
JSR 73, Java Data Mining API (JDM) -
Java API ( )
Data Mining Java-.
SQL,
Data Mining,
. :
SQL/MM, OLE DB for Data Mining.
SQL/MM SQL
Data Mining.
The OLE DB for Data Mining standard of Microsoft. ,
SQL/MM, Data Mining .
OLE DB.
, Data Mining,
:

, Data Mining ( ,
, , ,
);
web- (SOAP/XML, WSRF, .), Grid- (OGSA, OGSA/DAI, ..),
Web (RDF, OWL, ..);


, :
, ,
(real time) Data Mining, (data webs).

, Data Mining , ,
, .
"" Data Mining .


Data Mining
Data Mining
, . .
, ,
, .
Data Mining (Enterprise Data
Mining Buying Guide) Aberdeen Group: "Data Mining -
.
, ,
Data
Mining ".
Data Mining,
:

Data Mining;
Data Mining,
;
Data Mining- ;
Data Mining- ;
, ,
,
Data Mining.

, ,
, Data Mining.
Data Mining

90- Data Mining


. 90- , ,
, 50 .
, Data Mining,
. ,
Data Mining, Data Mining, OLAP
. Data Mining
BI-,
, (ad-hoc query),
(reporting), OLAP.
Data Mining
, .
Data Mining ,
.

.
, , ,
,
Data Mining. SPSS (SPSS, Clementine),
Statistica (StatSoft), SAS Institute (SAS Enterprise Miner). OLAP Data Mining, ,
Cognos. , Data Mining :
Microsoft (Microsoft SQL Server), Oracle, IBM (IBM Intelligent Miner for Data).
Data Mining .
- .
" Data Mining,
", 2005 Kdnuggets.
. 22.1.


Fig. 22.1. Data Mining, 2005

2002 2003 , ,
, - .
, . ,
: 2003 , 2002 ,
Weka Prudsys Xelopes R, 2005
Weka , Xelopes

.
: Microsoft SQL Data Mining 2003
, 2002 , , 2005 - .
,
.
, ,
. , ,
. ,
,
.
,
,
, ,
,
( ). ,
,
Data Mining.
, ,
, -
,
.
, 2005 , :

: (US $10000 )

Fair Isaac, IBM, Insightful, KXEN, Oracle, SAS, SPSS.

: ( $1000 $9999)

Angoss, CART/MARS/TreeNet/Random Forests, Equbits, GhostMiner, Gornik, Mineset,


MATLAB, Megaputer, Microsoft SQL Server, Statsoft Statistica, ThinkAnalytics.

: ( $1 $999): Excel, See5.


: C4.5, R, Weka, Xelopes.

Data Mining .
Data Mining
. : .
, .
1. .


- ,
, ,
.
, .. ,
, , , .
, .

, " -"
.

(,
).
2. / .
Data Mining-
, .
, , . Data Mining
() .
()
. :
txt, dbf, xls, csv .

( )
.
3.
,
,
.
4.
5. Data Mining-
6. .
,
Data Mining.
7. .

(Wizard).


8. , ,
,
.
9.
.
10. .

.
11. , .
- ,

. ,
. ,
.
,
, .
12. .
Data Mining ,
. ()
(
),
, .
13. .
14. (
), .
15. , , .
, .
(),
.

Data Mining .
, Data Mining.
.
, ,
, ,
. -
.
16. , . Data Mining
,
.

17. , ,
: PC Standalone (95/98/2000/NT), Unix Server, Unix Standalone, PC Client, NT
Server.
, ,
Data Mining.
, , .
, , ,
,
. , , Data Mining
,
, .
Data Mining

Data Mining
- .
Data Mining KDnuggets:
; .
:

;
;
;
;
(Text Mining), (Information Retrieval (IR));
.

. ,
, .
:

Clementine (http://www.spss.com/clementine). Data Mining Clementine


-, .
Clementine Data Mining: , ,
, . Clementine Data Mining
CRISP-DM.
DBMiner 2.0 Enterprise (http://www.dbminer.com),
; Microsoft SQL 7.0 Plato.
IBM Intelligent Miner for Data (http://www.ibm.com/software/data/iminer/fordata/).
Data Mining-, Data Mining
: . XML
PMML.
KXEN (Knowledge eXtraction ENgines). ,
(Vapnik) SVM. , , SVM.
Oracle Data Mining (ODM) (http://otn.oracle.com/products/bi/9idmining.html).
GUI, PL/SQL-, Java-. :

, ,
, SVM .
Polyanalyst (http://www.megaputer.com/). , Data
Mining. , , ,
, . OLE DB for Data Mining DCOM-.
SAS Enterprise Miner (http://www.sas.com/). ,
GUI. SEMMA.
SPSS (http://www.spss.com/clementine/). ,
Data Mining.
Statistica Data Miner (http://www.StatSoft.com/). ,
,
, , .

, Polyanalyst,
Deductor,
. Deductor .
Weka (http://www.cs.waikato.ac.nz/ml/weka/index.html). Weka
Data Mining-.
Weka Java .
, :

;
;
, ;
;
;
BI (Business Intelligence), Database and OLAP software;
;
,
Data Mining;
Web Mining: , XML mining;
Web;
Audio and Video Mining.

.
Data Mining , Data Mining. Two Crows.
Data Mining

Azmy SuperQuery (http://www.azmy.com/), ;


Clementine, SPSS, ;
IBM Intelligent Miner for Data (http://www.software.ibm.com/data/intelli-mine/);
IREX (http://www.giwebb.com),
, , ;

The LPA Data Mining Toolkit (http://www.lpa.co.uk/dtm.htm)


.
Magnum Opus (http://www.rulequest.com/MagnumOpus-info.html)
,
Windows, Linux Solaris;
Nuggets (http://www.data-mine.com/) - ,
;
Megaputer Polyanalyst Suite (http://www.megaputer.com/),
;
Purple Insight MineSet Data Mining,
;
Wizsoft WizRule:
; WizWhy: Data Mining;
Xpertrule Miner 4.0 (http://www.attar.com/);
XAffinity(TM), .

Apriori, priori;
Apriori, FP-growth, Eclat and DIC implementations (http://www.adrem.ua.ac.be/) by Bart
Goethals;
ARtool (http://www.cs.umb.edu/),
(binary databases);
DM-II system (http://www.comp.nus.edu.sg/), CBA

;
FIMI, Frequent Itemset Mining Implementations (http://fimi.cs.helsinki.fi/) -
, .

ClustanGraphics3, (http://www.clustan.com/) "


", , www.clustan.com;
CViz Cluster Visualization, (http://www.alphaworks.ibm.com/tech/cviz)-
,
;
IBM Intelligent Miner for Data, (http://www-4.ibm.com/software/data/iminer/),
;
Neusciences aXi.Kohonen, (http://www.neusciences.com/), ActiveX Control
, Delphi-;
PolyAnalyst, (http://www.megaputer.com/), ,
(Localization of Anomalies, LA);
StarProbe, (http://www.roselladb.com/starprobe.htm) Web -
, , , ,
..;
Visipoint (http://www.visipoint.fi/).
(Self-Organizing Map clustering) .

:

Autoclass C (http://ic.arc.nasa.gov/projects/bayes-group/autoclass/autoclass-c-program.html,
http://ic.arc.nasa.gov), " " NASA,
- Unix Windows;
CLUTO (http://www.cs.umn.edu/~karypis/cluto).
,
;
Databionic ESOM Tools (http://databionic-esom.sourceforge.net/).
, ,
ESOM - ;
MCLUST/EMCLUST (http://www.stat.washington.edu/fraley/mclust_home.html).
(model-based) , .
- S-PLUS;
PermutMatrix (http://www.lirmm.fr/). ,
,
;
PROXIMUS (http://www.cs.purdue.edu/homes/koyuturk/proximus/).
, ;
ReCkless (http://cde.iiit.net/RNNs/) ,
k- .

;
Snob (http://www.csse.monash.edu.au/), MML
(Minimum Message Length - );
SOM in Excel (http://www.geocities.com/adotsaha/NN/SOMinExcel.html),
Microsoft Excel Angshuman Saha.

,
, ,
.
.
.

.
, 2
. ,
, : , , ,
, .
,
.
Data Mining


Alyuda Forecaster XL (http://www.alyuda.com/forecasting-tool-for-excel.htm).
Excel-
.

- - Excel-
ExcelNeuralPackage (http://www.neurok.ru/demo/enp/demo_enp.htm).
-
. free-
.

, Data Mining
, .
.

. Data Mining-
Business Intelligence, ,
, .



Data Mining.
, , Business Intelligence,
Data Mining, ,
,
.


Data Mining. SAS Enterprise Miner


SAS Enterprise Miner ( SAS Institute Inc., [102]) -
SAS,
, .
SAS,
Enterprise Miner ,
Data Mining (SEMMA)
. SAS Enterprise Miner
SAS Warehouse Administrator,
,
SAS. Data Mining ,
-.
SAS Enterprise Miner. SAS Enterprise Miner
Data Mining ,
[104].


- ,
. SAS Enterprise Miner
, ,
, ,
, ,
.
SAS Enterprise Miner

, - .

SAS Enterprise Miner 5.1 Data Mining


. ,
,
.
,

.
(GUI)

SAS Enterprise Miner ,



Data Mining SEMMA.
SAS Enterprise

Miner ,
,
, .
.
"
". Data
Mining ,
,
, .
SAS Enterprise Miner . 23.1.

Fig. 23.1. SAS Enterprise Miner


SAS Enterprise Miner 5.1


Java- / SAS-,
, , .
-
.
,
,
. ,
, ,
, ,
.


Enterprise Miner .

,
.
,

SAS Enterprise Miner


, , ,
, , ,
, ,
SAS SAS code,
.
, ,
,
,
.
- SAS Enterprise Miner 5.1
Java
.
Java-.

.

SAS Enterprise Miner


, ,
, , ,
(memory-based reasoning),
, , , .
Enterprise Miner
,
, .

, , ,
.
,
.

SAS Enterprise Miner ,



, .

,
. Enterprise

Miner Repository. ,
, ,
.
.
.
XML-,
. SAS Enterprise Miner
,
, ,
, .
(SAS Metadata Server),

. Web-
.


- ,
. ,
, .
SAS Enterprise Miner
,
SAS, C, Java
PMML. (
) SAS, Web
.
.
,
Enterprise Miner, -
,
Web
.

Enterprise Miner
,
SAS.
, SAS Enterprise Miner 5.1,
, SAS XML-.
, Java API,
Enterprise Miner
. ,
,
, , OLAP-
.


,
.
Enterprise Miner SAS, ,
SAS ETL Studio, OLAP,
, SAS Text Miner. SAS
- .
,

SAS Enterprise Miner Web-


,
.
SAS Enterprise Miner Windows,
UNIX. .
SAS Enterprise Miner 5.1

, :

.
Web-.
SAS.
XML.

.

, .
SAS macro.

Java API.
Web-:

.
, ..
, -,
.

- .
( ).
-
.

.
- .


50 .
SAS ETL Studio SAS Metadata Server:

SAS ETL Studio ,


Enterprise Miner.
SAS ETL Studio -
Enterprise Miner.

.
.
.
.
.
N .
.

, .
.
.
.

: , , , ,
, .
: bucketed ( ), ,
.
: ,
, .

,
.
, n .

.
.
.
.

M-.
.

n, , , , ,
, .
, , , ,
.
.
.

.
-
n .
.

logworth-.
:

"" , -
.
,
.
/
.

.
/
, : ,
, , , ,
.
Java- :

.
WHERE.
.
.
, Enterprise Miner,
.


.


.
,
GIF TIF.

- k .
.
.
,
.
,
.
PMML.


- :

, .
, ,
.

.
.
.
.


.

.
PMML.
Web-

-
- .

.

, ,
- R2.

.
.
.
.
.


.
: , ,
.

.


.
, ,
, .

.

.
SAS Code Node

SAS
.
SAS.
.
Enterprise Miner.
, ,
..
.

,
, : , AIC, SBC,
, , ROC, , KS
(-).
, ,
.
.

.

.

, .
: , ,
.
.
.
: , , , ,
.
PMML.

CHAID ( -).
.
C4.5.

.

: -,
F-, , , .

.
.
.
.
:

.
,
.
13 ,
.
.

- ARBORETUM.

:

.
10 .

.
.
.


.
.
PMML.

(DM Neural node):

.
.
.


.
, .
.
.

k- .
.

.
: , , .


.
.
.
.
ROC.
.
().
.
.

.
SAS, C, Java PMML.

, ,
SAS, C Java.
.

.
.
, , ,
.


Data Mining . ,
,
.
SAS -

SAS -
SAS Intelligent Warehousing solutions, . 23.2.

Fig. 23.2. SAS Intelligent Warehousing solutions

ERP/OLTP-,

ERP/OLTP- ( SAS/ACCESS).

(SAS Data Quality-Cleanse).
(SAS/Warehouse
Administrator).
(SAS Scalable Performance
Data Server).
:

OLAP- (SAS OLAP Server),


(SAS/ETS),
(SAS/OR),
(SAS/IML),
(SAS/STAT),
(SAS Enterprise
Miner).
(SAS/Enterprise Guide,
SAS/EIS, SAS/IntrNet, AppDev Studio),
(SAS/
Rapid Result) , ,

( ) Web- (Web-).
Web- SAS SAS Information Delivery
Portal.

SAS Enterprise Miner

Microsoft Windows (32-)


Windows NT 4 Workstation, Windows 2000 Professional, Windows XP Professional, AIX (64) 5.1, HP-UX (64-) 11i (11.11), Solaris 8 9 (64)
Microsoft Windows (32-, 64-)
Windows NT 4 Server 4.0, Windows 2000, Windows Server 2003, AIX (64-)
5.1.
HP-UX (64-), 11i (11.11), Linux Intel (32-)
Red Hat Linux 8.0, Red Hat Advanced Server 2.1, SuSE Linux Enterprise Server 8 Solaris 8
9 (64-), Tru64 UNIX (64-) Version 5.1A 5.1 B.
1 .
: 512 , 512 M .
: 40 M 3 (
Win XP . SAS).

SAS, SAS/STAT, Web Java 1.4.1, (
SAS JRE 1.4.1),
,
.


Data Mining. PolyAnalyst


. PolyAnalyst

. PolyAnalyst
,
, ,
, ,
. PolyAnalyst - Megaputer Intelligence
"" [105].

PolyAnalyst - .
PolyAnalyst Workplace.
- PolyAnalyst Knowledge Server.
:
.
PolyAnalyst C++ Microsoft's COM
(ActiveX).
. PolyAnalyst .
24.1.

Fig. 24.1. PolyAnalyst

(Exploration Engines) PolyAnalyst


.
PolyAnalyst , ,
CRM- ERP- .
PolyAnalyst Workplace -

Workplace - , . Workplace
,
. 24.2.

Fig. 24.2. PolyAnalyst

:
,
, , ,
, drop-down pop-up ,
.
Data Mining PolyAnalyst "".
, , , , ..
.
HTML .
PolyAnalyst

PolyAnalyst 4.6 18 ,
Data Text Mining. Know-How
.

,
,
,

,
.

PolyAnalyst.

Find Laws (FL) -


FL - .
( )
,
. FL

. ,
, ,
"".
PolyNet Predictor (PN) -
,
.
.
,
. ,
, , - ,
.
,
.
Stepwise Linear Regression (LR) -

, ,
. ,
PolyAnalyst , :

. ,
,
.

.
Memory based Reasoning (MR) - " "
PolyAnalyst "
".
.
" " PolyAnalyst


. MR
, (string data
type), .

Find Dependencies (FD) - N-


,

, ()
, , .
FD ,
, , .
.
FL, PN, LR,
, , ,
. FD , ,
PolyAnalyst, .
Find Clusters (FC) - N-
,
(),
. FC ,
.
( ), ,
, .

"" - , . ,
-
. ,
, ,
, :
, N , (2N-1)4.

PolyAnalyst ,
..
.
Classify (CL) -
CL .

. 0
1. ,

"1", , "0" .
.
Discriminate (DS) -
CL. ,
, ,
, , ,
. CL,
, ,
.
Decision Tree (DT) -
PolyAnalyst ,
(information gain).
, ( )
.
.
DT PolyAnalyst.
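The information gain criterion named above can be made concrete with a short calculation. The following is a generic textbook sketch in Python, not PolyAnalyst's actual implementation; the attribute names, the toy records, and the function names are assumptions chosen for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the records on one categorical attribute."""
    total = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(len(part) / total * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder


# Hypothetical toy records: the class depends only on "married".
rows = [
    {"married": "yes", "owns_home": "yes"},
    {"married": "yes", "owns_home": "no"},
    {"married": "no", "owns_home": "yes"},
    {"married": "no", "owns_home": "no"},
]
labels = ["1", "1", "0", "0"]

gains = {name: information_gain(rows, labels, name) for name in rows[0]}
best_attribute = max(gains, key=gains.get)
print(gains, best_attribute)
```

A tree builder of this kind would split on the attribute with the largest gain and then repeat the calculation inside each resulting subset.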
Decision Forest (DF) -
,
, .
PolyAnalyst , (decision
forest). -
. ,
, ,
.

Market Basket Analysis (BA) - " "


,
. .
, ,
,
.. BA ,
- ( , ), 0 1,
().

. , :
"", - "" "".
PolyAnalyst
.
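The two standard measures behind basket analysis, support and confidence, can be computed directly from the kind of 0/1 purchase table mentioned above. This is a minimal Python sketch under stated assumptions: the product names, the matrix, and the helper functions are hypothetical and do not reproduce PolyAnalyst's algorithm.

```python
# Hypothetical 0/1 purchase table: one row per basket, one column per product.
PRODUCTS = ["bread", "milk", "butter"]
MATRIX = [
    [1, 1, 0],
    [1, 1, 1],
    [1, 0, 0],
    [0, 1, 1],
    [1, 1, 0],
]

# Convert each binary row into the set of products bought in that basket.
baskets = [{p for p, flag in zip(PRODUCTS, row) if flag} for row in MATRIX]


def count(baskets, items):
    """Number of baskets that contain every product in `items`."""
    wanted = set(items)
    return sum(wanted <= basket for basket in baskets)


def support(baskets, items):
    """Share of all baskets that contain every product in `items`."""
    return count(baskets, items) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Share of baskets with the antecedent that also contain the consequent."""
    base = count(baskets, antecedent)
    if base == 0:
        return 0.0
    return count(baskets, list(antecedent) + list(consequent)) / base


rule_support = support(baskets, ["bread", "milk"])
rule_confidence = confidence(baskets, ["bread"], ["milk"])
print(rule_support, rule_confidence)
```

In this toy table, bread and milk occur together in 3 of 5 baskets, and 3 of the 4 baskets with bread also contain milk.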
Transactional Basket Analysis (TB) - ""


Transactional Basket Analysis - BA,


, . ,
, (
).
"" - X-SellAnalyst, on-line
-.

PolyAnalyst Data Mining


- Text Mining.
. 24.3.

Fig. 24.3.

Text Analysis () -
Text Analysis
.
, / ,
( "-")
.
Data Mining, PolyAnalyst. ,

.
Text Categorizer (TC) -




.

.
Link Terms (LT) -
,
, .
, .
PolyAnalyst :
1. , .
2. , ,
.

-
.
, .
Text OLAP ( ) Taxonomies () -
. Text OLAP
(), . : "[] []
([] [] [])". PolyAnalyst

.

, .
.
Text OLAP,
, .
.
,
.

. :
, , (,
(Link Analysis), ,
, )
.
Data Mining Text Mining
.

PolyAnalyst
.

: , , -
.
Data Mining

.
. , , -
Lift, Gain charts,
. ,
Data Mining:
.
Link Analysis (LA) -
Link Analysis

,
.
Symbolic Rule Language (SRL) -
SRL - PolyAnalyst,
Data Mining
, . SRL
,
, , ,
. SRL
.


Data Mining.


.

, .

, .
,
,
" " (GT-search). ""
, ,
.
,
. ,
, ,
.

""
PolyAnalyst -
(Symbolic Rule Language), : ,
.
, ,
.
,
.
PolyAnalyst


PolyAnalyst . : , (yes/no),
, , ,
.

PolyAnalyst . :
"" (.csv), Microsoft Excel 97/2000, ODBC , SAS data files, Oracle Express, IBM Visual Warehouse.
OLE DB for Data Mining
4.6 PolyAnalyst Microsoft OLE DB for Data Mining
(Version 1.0).
(LR, FD, CL, FC, DT, DF, FL,PN, BA, TB) "Mining
Models" (MM).
OLE DB ADO
, ADO COM-.
SQL- ( SQL for DM). Mining
Models PMML.
"PolyAnalyst DataMining Provider" Microsoft Analysis Services(
SQL Server 2000).
In-place Data Mining
PolyAnalyst OLE DB
PA.
PolyAnalyst SQL-
.
. . 24.4.


Fig. 24.4. In-place Data Mining

PolyAnalyst Scheduler -
PolyAnalyst .
,
,
.
. Scheduler
.
24.1 PolyAnalyst6:
.
Table 24.1. PolyAnalyst

PolyAnalyst 4.6,

: FL, FD, PN, FC, BA, , MB, CL, DS, DT, DF,
LR, LA, TA, TC, LT, SS. , OLE DB.
- MS Windows NT/2000/XP

PolyAnalyst 3.5 Professional


(.)

: FL, FD, PN, FC, CL, DS, LR, SS.


- MS Windows NT/2000/XP

PolyAnalyst 3.5 Power (.)

: FD, PN, FC, CL, DS, LR, SS. MS Windows 98/NT/2000/XP

PolyAnalyst 3.5 Lite (.)

: FD, FC, CL, DS, LR, SS. - MS


Windows 98/NT/2000/XP


PolyAnalyst Knowledge Server 4.6, : FL, FD, PN, FC, BA, , MB, CL, DS, DT, DF,

LR, LA, TA, TC, LT, SS. , OLE DB, In-place Data Mining. - MS Windows NT/2000/XP
server, - MS Windows 98/NT/2000/XP.
/

PolyAnalyst COM - SDK

Data Mining

COM-, ,

WebAnalyst

PolyAnalyst TextAnalyst,
(Data Mining Text Mining),
- WebAnalyst.
WebAnalyst - ,

web- e-business.
WebAnalyst ,
, ,
. ,
WebAnalyst
.

,
(HTTP), - web-.
.

( WebAnalyst),
.

WebAnalyst Data Text Mining


PolyAnalyst TextAnalyst, .
WebAnalyst [106]:

Web-;
;

;
;
;

.


"" WebAnalyst :
Web-; ;
, ; Web-;
.


Data Mining. Cognos


STATISTICA Data Miner
Cognos ( - Cognos [107]) -
( . Business Intelligence Tools),
BI-. Cognos
. 25.1 [108].

Fig. 25.1. Cognos

Cognos,
, .
1. .
.
:
o Decision Stream - (data marts),
;
o Impromptu - ,
;
o PowerPlay - ;
o Impromptu Web Reports -
Web;
o Cognos Query - ,
.. Web;

Visualizer - .
.
.
, , ,
.
-
( drill through):
o PowerPlay - (OLAP) -;
o Impromptu -
( Windows);
o Impromptu Web Reports -
( Web);
o Visualizer - .
.
.
,
.
,
()
.
:
o Visualizer -

;
o PowerPlay ;
o Impromptu ;
o Cognos Query - Web- .
(data mining).
,
,
, :
o Scenario - ;
o 4Thought - ;
o Visualizer .
.
, Access Manager
Cognos.
Access Manager,
.
;
.
Cognos BI , Cognos
Architect.
-.

Cognos.
o

2.

3.

4.

5.

6.


Cognos 4Thought

Cognos 4Thought (. 25.2)


, ,
Cognos.

Fig. 25.2. Cognos 4Thought

Cognos 4Thought
.
.
Cognos 4Thought . 4Thought
,
, .
. 25.3 Cognos 4Thought
, 4Thought.


Fig. 25.3. Impromptu, PowerPlay, Scenario 4Thought

Impromptu, PowerPlay, Scenario 4Thought


,

-,
.
Cognos PowerPlay -
OLAP-.
,
.
,

, OLAP.
, :
Windows.
PowerPlay
, ,
(Databases), (Data Warehouses), (Data Marts)
(Spreadsheets).
PowerPlay 4Thought.
.mdc.

.
OLAP- Cognos Data Mining
(4Thought Scenario), Cognos
OLAP Data Mining.

Cognos Impromptu - Cognos


,
. -
,
.
Impromptu ,
. Impromptu

, . ,
,
.
Impromptu
Cognos 4Thought.
Cognos Scenario - ()
(Data Mining), (
) "
" .
Scenario ,
,
.
.
, ,
. , Scenario
, , .
Scenario ( )
4Thought .
Cognos 4Thought ,
,
.
, , ,
.
4Thought :
1. . ,
, MS Excel.
Cognos (Impromptu, ReportNet, PowerPlay Scenario) .
4Thought ,
;
2. . 4Thought,
Impromptu,
( ),
(, , -
, ,
-

, ..,
). Impromptu
4Thought.

4Thought
( ,
),
. : , ,
, .
3. .
, . ,
4Thought (
, ).
4. . 4Thought ,
;
, ( ),
..
5. . 4Thought
.
, ,
.
6. .
, .

4Thought
.
-
() :
.
(
),
.
Cognos 4Thought , ,
, : " ,
?"
, , ,
,
.

.
Cognos 4Thought ( )
,
. ,
.
4Thought .
,
. 4Thought

,
, .
Cognos (. 25.3)
-
( ).
PowerPlay Transformation
Server.
( )
Access Manager, PowerPlay
Transformation Server.
PowerPlay Impromptu ,
, ,
, 4Thought Scenario -
- , .
.
Cognos .
/-
Upfront, Cognos PowerPlay Enterprise Server.
STATISTICA Data Miner

. STATISTICA Data Miner ( - StatSoft [109])



- ,
- [110, 111].
STATISTICA :

;
, MS Office;
;
;
;
;
;
COM-,
( Visual Basic
( ), Java, C/C++).

STATISTICA Data Miner Data Mining (. 25.4),


300 ,
Data Mining,
, .


Fig. 25.4. Data Mining

STATISTICA Data Miner (.


25.5):

Fig. 25.5. STATISTICA Data Miner


1. Data Acquisition - .
, .
2. Data Preparation, Cleaning, Transformation - , .
, , ..
3. Data Analysis, Modeling, Classification, Forecasting - , ,
, .
, ,
, ..

4. Reports - . ,
(, , ).
STATISTICA Data Miner

STATISTICA Data Miner :


1. General Slicer/Dicer and Drill-Down Explorer - / .
, , ,
, ..
2. General Classifier - . STATISTICA Data Miner
: , ,
, ..
3. General Modeler/Multivariate Explorer - ,
. , ,
.
4. General Forecaster - . ,
, , ,
, ..
5. General Neural Networks Explorer - .
.

StatSoft.
, STATISTICA Data Miner Data
Mining, Data Mining:

Feature Selection and Variable Filtering (for very large data sets) -
( ).

. ,
.
Association Rules - .
. ,
: "", 95
100 "B" "".
Interactive Drill-Down Explorer - .
.
,
.
Generalized EM & k-Means Cluster Analysis -
. -
.
, ,
.
Generalized Additive Models (GAM) - (GAM).
, Hastie Tibshirani.
General Classification and Regression Trees (GTrees) -
(GTrees). ,
Breiman, Friedman, Olshen Stone (1984). ,
,
..
.

General CHAID (Chi-square Automatic Interaction Detection) Models - CHAID (- ).


,
.
Interactive Classification and Regression Trees -
.
, STATISTICA Data Miner
.
Boosted Trees - .
, "" ,

,
.
() .
Multivariate Adaptive Regression Splines (MARSplines) -
(MARSplines).
Friedman (1991; Multivariate Adaptive Regression Splines, Annals of Statistics,
19, 1-141); STATISTICA Data Miner MARSPLINES ,

.

- ,
,
. , . ,
, .
- .
- - ,

.
,
. , " ",

.
-
.
-
,
(, , ..)
() . ,
(.. ,
) ,

.

Goodness of Fit Computations - .


,
.
Rapid Deployment of Predictive Models - (
).

.
.

, STATISTICA
, ,
. StatSoft
,
.
, ,
() ,
: , ,
..
Data Miner.
1. Data Miner " " "" (.
25.6). " - " " -
", STATISTICA Data Mining.

Fig. 25.6. " "



2. Boston2.sta STATISTICA.
.
- Low, - Medium
- High Price.
- Cat1 12 - Ord1-Ord12.
, 1012 , Boston2.sta.
. 25.7.

Fig. 25.7.

3. "
", . 25.8.


Fig. 25.8.

( )
( ), , .
OK.
4. " " (

Data Miner). , . 25.9,


.

Fig. 25.9. " "

.
260 , .

,
, .
.
,
"".
.
, , Descriptive Statistics Standard Classification Trees with Deployment
(C And RT) . Data Miner .

Fig. 25.10. Data Miner

Data Miner
. / .
5. . ,
, .


Fig. 25.11. Data Miner

( ).
.
STATISTICA.
6. , .
, STATISTICA Data Miner
,
, .

, .


Oracle Data Mining Deductor


Oracle Data Mining

1998 Oracle [112] 7 Data Mining. Oracle8i


Data mining. 1999 Oracle Darwin
(Thinking Machines Corp.). 2000-2001 Darwin, Oracle Data
Mining Suite. 2001 Oracle9i Data Mining.
Oracle Data Mining Oracle Enterprise Edition ( Oracle
Database 10g). Oracle Data Mining (ODM)
, , Data Mining.
Personal Edition, Standard Edition, Standard Edition One .
ODM ,
, , ,
, [113].
,
, .
ODM ,
, , .
,
, SAS.
,
, .

, , .
ODM .
,
, .

, C++, Java,
ODM Software
Development Kit (SDK).
ODM :
-, ,
.
Oracle Data Mining [114]:

Oracle Database (DataMining Server).


DM- .
API .



, .
.
Oracle Data Mining API. Java API Java
JDM ( Data Mining).
Data Mining 10g ,
26.1.
Table 26.1. , Oracle Data Mining

Naive Bayes, Adaptive Bayes Network

Support Vector Machine


Minimum Description Length

Enhanced K-means, O-cluster

Apriori Algorithm

Non-Negative Matrix Factorization

, Oracle Data Mining, ,



. ,
ODM , ,
.
Java API PL/SQL API, ODM
Client, , ,
, .
Oracle Data Mining -

- Oracle Data Mining .


:

;
;
.

;
;
.


Naive Bayes (NB):

, ABN ( ).
< 200.
, ABN.

Adaptive Bayes Network (ABN):

.
( ).
, NB.
.

Support Vector Machine.

.
. Support Vector Machine.

- ,
. .
- Minimum Description Length (MDL).

Enhanced k-means Clustering


.
, .
.
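The basic idea behind k-means clustering can be sketched in a few lines. This is the textbook Lloyd iteration in plain Python, not Oracle's Enhanced k-means, whose internals are not described here; the function name, the evenly spaced seeding, and the sample points are assumptions made for illustration.

```python
from math import dist


def k_means(points, k, max_iterations=100):
    """Plain Lloyd's algorithm; returns (centroids, cluster index for each point)."""
    # Naive deterministic seeding: k evenly spaced points taken from the input order.
    centroids = [tuple(points[i * len(points) // k]) for i in range(k)]
    assignment = None
    for _ in range(max_iterations):
        # Assignment step: attach every point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the centroids are stable
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, assignment


# Hypothetical two-dimensional records forming two visible groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
centroids, assignment = k_means(points, k=2)
print(centroids, assignment)
```

Production implementations usually replace the naive seeding with random or k-means++ initialization and add handling for empty clusters.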
O-Cluster
, , .
, .
, .. 10, , 1000.


Deductor

Deductor ( -
BaseGroup Labs [115]). Deductor :
Deductor Studio Deductor Warehouse [48] .
Deductor . 26.1.

Fig. 26.1. Deductor

Deductor Warehouse - ,
.
,

. Deductor Warehouse ,
.
Deductor Studio - ,
. , ,
. Deductor Studio ,
,
.

Deductor Studio . . 26.2


.


Fig. 26.2. Deductor Studio

.
-
. Deductor Warehouse
. , :

;
Microsoft Excel;
Microsoft Access;
Dbase;
CSV-;
ADO- - ODBC- (Oracle, MS
SQL, Sybase ).

, - ,
.
, ,
.

. , ,
( ), .
. ,
, .

,
.
.
,
. ,
.
- - ,
.
Deductor Studio

Deductor Studio
:

;
;
;
.

. 26.3 Deductor Studio.


.
.

Fig. 26.3. Deductor Studio



, , ,
. , ,
,
, , . :

Deductor Warehouse ;
Microsoft Excel;
Microsoft Word;
HTML;
XML;
Dbase;
Windows;
.

OLAP- (-, -);


;
, ;
;
"-";
;
- ;
.

.
, " ",
. ,
, , .
, ,
.
.
. 26.4.


Fig. 26.4. Deductor Studio


Deductor Warehouse

Deductor Warehouse - ,
.
"",
, "" . .
26.5.

Fig. 26.5. ""

.
"" .


Deductor Warehouse ,
.
Deductor Warehouse ? -
, ,
.
,
. ,
.
, "" "",
,
.
Deductor Warehouse , ..
.
, ,
. Deductor
Studio.

,
.
. ,
, , , .
. , ,
, , .
Data Mining. , ,
, .
.
. 26.6 , ,
.


Fig. 26.6. , Deductor

1.


,
, , , .
, . ,
""
.

.

- .
-
.

, .

( ). ,
, ,
.


.
, ,
.

.
-
. .

,
,
. ,
.
. , ,
.
- -. ,
. ""
: , ""
.
( 7-9) ,
("" ).
-
"" .
: ,
"", - ""
.

:
, .
, ..
:
1. ;
2. .


, .
,
() () .
- ""
"", "" "".
2.

, ,
.
, . ,
( ,
).



, , ,
. ,
, , .
: ( ,
),
, , .
.

. ,
0 10, 10 0
1, 1 2 .. 0 , 1 - , 9
10 - .
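The numbers that survive in this passage (a range from 0 to 10 divided into 10 intervals: 0 to 1, 1 to 2, and so on up to 9 to 10) suggest equal-width quantization. The Python sketch below illustrates that reading only; the function name, the zero-based interval numbering, and the treatment of boundary values are assumptions, not Deductor's documented behavior.

```python
def quantize(value, low, high, bins):
    """Return the zero-based index of the equal-width interval that contains `value`."""
    if not low <= value <= high:
        raise ValueError("value lies outside the [low, high] range")
    width = (high - low) / bins
    index = int((value - low) / width)
    # The upper bound itself is assigned to the last interval.
    return min(index, bins - 1)


# Range 0..10 split into 10 intervals, as in the example above.
samples = [0.5, 1.5, 9.7, 10]
indices = [quantize(v, 0, 10, 10) for v in samples]
print(indices)
```

Replacing each raw value with its interval index turns a continuous field into a small set of ordered categories, at the cost of losing the differences inside an interval.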

, , (,
, ) .
, , ,
.
"" .


, , .
, 0 - "", 1 - "", 2 - "". "" - "", "" "", "" - "", "" - "".
.
,
. ,
, " ",
( ).
" "
, ,
,
.
( -
, - "" ).

,
, ,
, .


(, , , , ). ,
,
, .
, , , - ,
, , , .

- .
.
, .
Deductor Studio , , "". -,
- .

- .
.
- ,
.
, ,
.

"" , ,
- . , ,
,
.
, , ,
. Deductor Studio ,
/
.

,
, .. .
, .
,
( ,
).

, .
0 1.

,
() .

.
. ( )
,
.
3. Data Mining
Data Mining Deductor :

;
;
;
;
;
;
.

,
Deductor .


KXEN

Data Mining. KXEN,
- [116],
1998 . KXEN "Knowledge eXtraction
Engines" - "" .
, KXEN
[117]. KXEN , .
KXEN - ,
Data Mining .

.
, KXEN (
, ) , Data Mining.
- KXEN -
. ,

.
KXEN :

/ ( .. );
/;
;
( ).

, .. . -
"" ,
( ,
).


KXEN , ,
, :
; , ;
; ;
.
.

KXEN ,
( )

. Data Mining
KXEN . 27.1.

Fig. 27.1. Data Mining KXEN

, KXEN
on-line "-".
, ,
,
.
KXEN :

: , KXEN
( DB2, Oracle MS
SQL Server, .. ODBC);
, :
+ score-;
:
C++, XML, PMML, HTML, AWK, SQL, JAVA, VB, SAS,
.

KXEN Analytic Framework


,
. KXEN ,
, .
, ,
-.
KXEN /.
KXEN - , , ,
.

. ,
KXEN, .
KXEN
.
-
, KXEN.
, ,
-.
KXEN Data Mining.
, KXEN
. ,
, , .
,
, , .
KXEN

1990- .
,
. ,
, .

.
? ,
,
. : " , -
, ?" : "".
.
, ,
, .
KXEN. , KXEN
.
. KXEN, , .

.
, KXEN,
.
( ),
. (Structural
Risk Minimization). KXEN ,
,
.

,
-
. , - .

( , ,
..)
, ,
.
KXEN ? KXEN

. KXEN .
, " " (Data Manipulation),

(, ), .
, ,
.

. KXEN
, , - ,

.
.
KXEN ,
, .. ,
, , ,
, .
.
,

.
( ),
- .
, ,
. :
1. API.
2. .
3.
.

. KXEN

. KXEN
, .. " " ( ). ,

, .
4. .


on-line, ,
, Java, SQL, PMML .
KXEN

.
, KXEN
. -
; ,
. KXEN
, .
KXEN Clementine,
Data Mining SPSS,
, KXEN
Data Mining.
, : "
KXEN ,
() Data Mining?"
: ,
. ,

.
KXEN API,

-. ,
,
,
, .
KXEN
,
.
. KXEN
, . ,
,
, , :

;
;

;
,
.

KXEN Analytic Framework Version 3.0

KXEN Analytic Framework ,


, .

"" DBMS- (, Oracle MS SQL Server) ODBC.


KXEN Analytic Framework
.
, KXEN.
, KXEN
. ,
,
-. . 27.2
KXEN Analytic Framework Version 3.0.

Fig. 27.2. KXEN Analytic Framework Version 3.0

KXEN.

(KXEN Event Log - KEL)


, .
KEL
. , ""

(, ,
) (,
).
SQL
. KEL ,
KXEN.

" " ,
.

(KXEN Sequence Coder - KSC)


. , "" ,
Web-,
. .
KEL,
KXEN.

,
.

(KXEN Consistent Coder - K2C)


,
KXEN. K2C
,
.

,
.

(KXEN Robust Regression - K2R)


, ,

, .
, .
, K2R
( 10 000). K2R
,
.

.
.

(KXEN Smart Segmenter - K2S)


() .
, , .
.
,
.

, , .

KXEN (Support Vector Machine - KSVM)


.
,
.
, .

, ,
.

(KXEN Time Series - KTS)


.
,

. KTS ,
, .
:
.

(KXEN Model Export - KMX)


: SQL, C, VB, SAS, PMML
-.

.
:
,
.

. ,
.
IOLAP

, , IOLAP KXEN -
,
.
OLAP-
, , .
. ,
, , 200
?
IOLAP:

,
.

( ).
.

.

IOLAP ,
KXEN, Microsoft Excel. IOLAP
OLAP- .

Data Mining
Data Mining,
. ,
: -
.
Data Mining . ,
,
, ,
,
,
.
, . ,
,
, ,
.
,
, , ,
-.
,
.
, " "
, Data Mining ( ,
1-2 , ,
). -
,
: " "
.
,
Data Mining. (
) .
,
, , , Data Mining.
: Data Mining /
.
Data Mining-

Meta Group, 85% Data


Mining , ..

-. KDnuggets
, Data
Mining.
Data Mining -
Two Crows (www.twocrows.com). Data
Mining, ,
Data Mining . Data Mining , Two Crows.
Data Mining-
, :

IBM Global Business Intelligence Solutions, www.ibm.com/bi;


SAS Institute, www.sas.com/datamining;
SPSS, www.spss.com;
StatSoft, www.StatSoft.com.


. , , Arvato Business Intelligence, www.arvatobi.fr,
Data Mining , ,
.
.
, Blue Hawk LLC, www.bluehawk.biz, Data Mining
Direct Marketing CRM.
Data Mining.
Bayesia, (www.bayesia.com), "
" . Visual Analytics
(www.visualanalytics.com) -
Data Mining.
, Data Mining

.
. Data
Mining ,
.
, . , -,
( ), -,
( ). ,
,
,
.
.
, -
.
.
,
.

-
:

,
; , ,
, "" ""
"", "" "" - ,
. ,
, ,
.
.


, .

, Data Mining .
SnowCactus,
Data Mining.
SnowCactus Data Mining
- [118], :

, ;
;
;
;
;
;
.

-
,
.

SnowCactus
.
,
Data Mining .
. 28.1. , , Data
Mining. , Data Mining
, ,
.


Fig. 28.1. Data Mining SnowCactus

.
1. -
-.
:
,
.
, .
, .

,
Data Mining.
2.
- ,
,
. -
,
,
.
3.



. ,
.
4.
- - .
, ,
, . -

.
5.

. , , -,
, ,
, , ,
.
-
, -. ,
. -,
,
.

, -
. " ?" .
dm-Score -
- .
1. dm-Score (,
- ).
dm-Score (dm - Data Mining)

.
, ..
, ,
(), ,
.

, ..,
- , , ,
,
.

dm-Score ,

. , dm-Score
, , .. -.
, ,
-.
dm-Score :

( );
.
, ;
( )
;
;

;
, ..
, ..;
(
);
;
, .

, dm-Score ,
: ,
, .
, .
, ,
.
dm-Score . 28.2.


Fig. 28.2. dm-Score

dm-Score -
.
( ). dm-Score
,
Data Mining. ,
.
dm-Score ,
.
,
. , dm-Score ,
, -
.

, ,
, .
.
(
, ), dm-Score .
( ).
dm-Score
, ,
.. ,
, ,
.

dm-Score ,

, ,
.. , dm-Score,
, .
Data
Mining , ,
.
- . Data Mining, ,
,

.
. , Data Mining

. ,
- , . .
. , Data Mining
, , .
, Data Mining ,
, . ,
.
. ,
. ,
Data Mining, ,
. ,
.
.
. ,
, - , , -
, ..

.
. :
,
(, )
..
2. : - .
- ,
Data Mining .
IT-,
.

,

. , -
.
Data Mining,
, ..
,
. -

Data Mining.
()
, , ,
- ,
, . ,
, 20-25 .
.
?
, ,
35-45 ,
.
, 20-25 .
: Data Mining , IT-
. ?
,
.
, Data Mining
, , .

Data Mining
,
,
. Data Mining
. ,
, .

