
Constructing the Index System of Association to Stock Investment Decision


Student: Yi-chun Tsai

Advisor: Yuan-chen Cheng

A Thesis
Submitted to the Institute of Applied Information on Graduate Studies
of Leader University in partial fulfillment of the requirements for
the degree of Master of Science

June 2005
Tainan, Taiwan


ABSTRACT

This research constructs an index system of association, based on joint probability and conditional probability, to support stock investment decisions. The degree of association measured by the index is divided into four levels: extremely high, high, medium and weak. By improving the data warehouse structure, the decision support system can be made more flexible and efficient.

Keywords: Index system of association, Joint probability, Conditional probability, Decision support, Data warehouse.


1.1

(Over Fitting)

(Relative Strength Index, RSI) (BIAS) (Over Bought Over Sold, OBOS) (Advance Decline Line, ADL) (Advance Decline Ratio, ADR) (CDP) (Psychological Line, PSY) (On Balance Volume, OBV) (Momentum, MTM) (Williams %R, W%R) (Volume, VOL) (Moving Average, MA) (Stochastic KD Line, KD)

(Data Mining)

1.2

1.3

(Data Warehouses)

1.4

1.5

(Interesting)

1-1

1-1

2.1
2.1.1

(Knowledge
Discovery in Database)

(Decision Tree), (Artificial Neural Network), (Inductive Logic), (Nearest Neighbor, NN), (Bayesian Network), (Attribute-Oriented Induction), (Binary/Quantitative Association Rules)

2-1

1.
2.
3. NBA
4.
5.
6.
7.
(Association Rule)
(Clustering)(Classification)(Sequential Pattern)(Decision
Tree)

2-1

(Knowledge Discovery in Data, KDD) [Jiawei Han and Micheline Kamber, 2001]

1. (Data Cleaning) (Missing Value)
2. (Data Integration)
3. (Data Selection)
4. (Data Transformation)
5. (Data Mining) (Classification) (Clustering) (Summarization)...
6. (Pattern Evaluation)
7. (Knowledge Presentation)

2.1.2

[Agrawal and Imielinski and Swami, 1993] Agrawal

[Mannila and Toivonen and Verkamo, 1994; Park and Chen and Yu, 1995a & 1995b]

Agrawal Srikant 1994 Apriori

[Ramaswamy and Mahajan and Silberschatz, 1998] Apriori

Apriori
Apriori

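An association rule has the form $X \Rightarrow Y$, where $X \subset I$, $Y \subset I$ and $X \cap Y = \emptyset$; $I$ is the set of all items and $D$ is the transaction database. The rule $X \Rightarrow Y$ has support (Support) $s$ if $s\%$ of the transactions in $D$ contain $X \cup Y$, and confidence (Confidence) $c$ if $c\%$ of the transactions in $D$ that contain $X$ also contain $Y$ [2003]. Mining is governed by a minimum support (Minsup) and a minimum confidence (Minconf) and proceeds in two steps:
1. Find the frequent item sets (Frequent Item Sets), those whose support is at least Minsup.
2. Generate from the frequent item sets the rules whose confidence is at least Minconf.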
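
As an illustration of the support and confidence measures used for association rules, the following minimal sketch computes both for a small transaction database. The item names, the database `D` and the function names are hypothetical and chosen only for this example; they do not come from the thesis.

```python
from typing import FrozenSet, Iterable, List


def support(transactions: List[FrozenSet[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(transactions: List[FrozenSet[str]],
               x: Iterable[str], y: Iterable[str]) -> float:
    """Support of X union Y divided by support of X (0.0 if X never occurs)."""
    support_x = support(transactions, x)
    if support_x == 0.0:
        return 0.0
    return support(transactions, frozenset(x) | frozenset(y)) / support_x


# Hypothetical toy database D of four transactions.
D = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

s = support(D, {"bread", "milk"})       # 2 of the 4 transactions
c = confidence(D, {"bread"}, {"milk"})  # 2 of the 3 transactions containing bread
```

A rule would be kept only if `s` and `c` reach the chosen minimum support and minimum confidence.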

Agrawal 1993
[Agrawal and Srikant, 1995] EPISODES [Mannila and Toivonen and Verkamo, 1997]
[Koperski and Han, 1995] [Savasere and Omiecinski and Navathe, 1998] [Lu and Han and Feng, 1998]

2.1.3

70%

2.1.4

20~30 5 ~7 80% A
9

2-1

Table 2-1

Author(s)                              Year
Agrawal                                1993
Agrawal and Srikant                    1995
Koperski and Han                       1995
Mannila and Toivonen and Verkamo       1997   (EPISODES)
Savasere and Omiecinski and Navathe    1998
Lu and Han and Feng                    1998
Cheng and Tsai                         2005

2.2

Lu

10

()

[1999]

2.2.1

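(Joint Probability) (Joint Probability Function)
Let $X$ and $Y$ be discrete random variables, where $X$ takes the values $x_1, x_2, x_3, \ldots, x_n$ and $Y$ takes the values $y_1, y_2, y_3, \ldots, y_m$. Their joint probability function $f(x_i, y_j)$ satisfies

$$0 \le f(x_i, y_j) \le 1, \qquad \sum_{i=1}^{n} \sum_{j=1}^{m} f(x_i, y_j) = 1 \tag{2-1}$$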

2.2.2

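(Conditional Probability) The conditional probability of an event B is its probability given that an event A has occurred. For the joint probability function $f(x, y)$, the conditional probability of $x_i$ given $Y = y_j$ is

$$f(x_i \mid Y = y_j) = \frac{f(x_i, y_j)}{f_Y(y_j)} \tag{2-2}$$

and the conditional probability of $y_j$ given $X = x_i$ is

$$f(y_j \mid X = x_i) = \frac{f(x_i, y_j)}{f_X(x_i)} \tag{2-3}$$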
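
The relation between a joint probability function and the conditional probabilities in Equations (2-2) and (2-3) can be sketched as follows. The joint table `f` and the outcome labels "up" and "down" are hypothetical values chosen for illustration, not data from the thesis.

```python
from typing import Dict, Tuple

Joint = Dict[Tuple[str, str], float]


def marginal_y(joint: Joint, y: str) -> float:
    """Marginal probability f_Y(y): sum of f(x, y) over all x."""
    return sum(p for (_, yj), p in joint.items() if yj == y)


def conditional_x_given_y(joint: Joint, x: str, y: str) -> float:
    """f(x | Y = y) = f(x, y) / f_Y(y), as in Equation (2-2)."""
    fy = marginal_y(joint, y)
    if fy == 0.0:
        raise ValueError("f_Y(y) is zero; the conditional probability is undefined")
    return joint.get((x, y), 0.0) / fy


# Hypothetical joint probability table f(x_i, y_j); the entries sum to 1.
f = {
    ("up", "up"): 0.35,
    ("up", "down"): 0.15,
    ("down", "up"): 0.20,
    ("down", "down"): 0.30,
}

total = sum(f.values())
p = conditional_x_given_y(f, "up", "up")  # 0.35 / (0.35 + 0.20)
```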

3.1

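Let $A$ be an event and $B_i$, $i = 1, 2, \ldots, n$, be $n$ further events. From the $n$ joint probabilities $P(A \cap B_i)$, $i = 1, 2, \ldots, n$, the conditional probabilities $P(B_i \mid A)$, $i = 1, 2, \ldots, n$, are obtained as

$$P(B_i \mid A) = \frac{P(A \cap B_i)}{P(A)}, \qquad P(A) > 0 \tag{3-1}$$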
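
A minimal sketch of how $P(A \cap B_i)$ and $P(B_i \mid A)$ in Equation (3-1) could be estimated as relative frequencies. It assumes that each event is recorded as one true/false observation per trading day (for example, "the stock closed higher than on the previous day"); that event definition, the function name and the sample series are assumptions made for illustration.

```python
from typing import Sequence, Tuple


def joint_and_conditional(a: Sequence[bool], b: Sequence[bool]) -> Tuple[float, float]:
    """Return (P(A and B), P(B | A)) estimated as relative frequencies."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("both series must be non-empty and of equal length")
    n = len(a)
    n_a = sum(1 for ai in a if ai)
    n_ab = sum(1 for ai, bi in zip(a, b) if ai and bi)
    if n_a == 0:
        raise ValueError("P(A) must be greater than 0")
    return n_ab / n, n_ab / n_a


# Hypothetical observations for event A and one event B_i over ten days.
a_days = [True, True, False, True, False, True, True, False, True, True]
b_days = [True, False, False, True, True, True, True, False, False, True]

p_joint, p_cond = joint_and_conditional(a_days, b_days)
```

Here A occurs on 7 of the 10 days and both events occur together on 5, so the joint estimate is 5/10 and the conditional estimate is 5/7.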

3.2

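The index of association $I$ (Index of Association) is built from two variables: $X$, the joint probability $P(A \cap B_i)$, and $Y$, the conditional probability $P(B_i \mid A)$.

$I_k = f(X, Y)$, $k = 1, 2, 3, 4$, takes one of the four levels $I_1$, $I_2$, $I_3$ and $I_4$ (Equation 3-2, Table 3-1):

$$
\begin{aligned}
I_1 &: \; P(X \ge Q_3) \cap P(Y \ge Q_3) \\
I_2 &: \; \bigl(P(Q_2 < X < Q_3) \cap P(Y \ge Q_3)\bigr) \cup \bigl(P(X \ge Q_3) \cap P(Q_2 < Y < Q_3)\bigr) \\
I_3 &: \; \bigl(P(X \le Q_2) \cap P(Y \ge Q_3)\bigr) \cup \bigl(P(Q_2 < X < Q_3) \cap P(Q_2 < Y < Q_3)\bigr) \cup \bigl(P(X \ge Q_3) \cap P(Y \le Q_2)\bigr) \\
I_4 &: \; \bigl(P(X \le Q_2) \cap P(Q_2 < Y < Q_3)\bigr) \cup \bigl(P(Q_2 < X < Q_3) \cap P(Y \le Q_2)\bigr) \cup \bigl(P(X \le Q_2) \cap P(Y \le Q_2)\bigr)
\end{aligned}
\tag{3-2}
$$

$Q_3$ and $Q_2$ are thresholds on the distributions of $X$ and $Y$, which are based on samples of 30 or more observations: $Q_2$ is the mean (mean) (Figure 3-1) and $Q_3$ is the 75% point (Figure 3-2).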

Table 3-1

                 X >= Q3    Q2 < X < Q3    X <= Q2
Y >= Q3          I1         I2             I3
Q2 < Y < Q3      I2         I3             I4
Y <= Q2          I3         I4             I4

Figure 3-1 and Figure 3-2 (distribution curves; marked values: 25%, Q1, Q3)

$P(X \ge Q_3) = 25\%$, $P(Q_2 < X < Q_3) = 25\%$, $P(X \le Q_2) = 50\%$

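To determine $Q_3$, the population mean $\mu$ (Equation 3-3) and the population variance $\sigma^2$ (Equation 3-4) are required:

$$\mu = \frac{1}{N} \sum_{i=1}^{N} x_i \tag{3-3}$$

$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2 \tag{3-4}$$

Because the population parameters are unknown, they are estimated by the sample mean $\bar{X}$ (Equation 3-5) and the sample variance $S^2$ (Equation 3-6), which are unbiased estimators (Unbiased Estimator):

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} x_i \tag{3-5}$$

$$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{X})^2 \tag{3-6}$$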
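
The estimators above, and the way a threshold $Q_3$ can be obtained from them, can be sketched as follows. The multiplier 0.675 (approximately the 75% point of the standard normal distribution) is an assumption: it is not stated in the surviving text, but it reproduces the values reported later in the thesis, for example $\bar{X} = 0.34881$, $S_X = 0.07289$, $Q_3 = 0.39801$. The function names and the `demo` list are hypothetical.

```python
import statistics

Z_75 = 0.675  # assumed multiplier: approximate 75% point of the standard normal


def sample_mean(values):
    """Sample mean, as in Equation (3-5)."""
    return statistics.mean(values)


def sample_variance(values):
    """Sample variance with divisor n - 1, as in Equation (3-6)."""
    return statistics.variance(values)


def upper_threshold(mean, std, z=Z_75):
    """Q3 under a normal assumption: mean + z * standard deviation."""
    return mean + z * std


demo = [1.0, 2.0, 3.0, 4.0]        # hypothetical observations
demo_mean = sample_mean(demo)      # 2.5
demo_var = sample_variance(demo)   # 5/3

# Summary statistics reported in the thesis for X and Y (IC stocks).
q3_x = round(upper_threshold(0.34881, 0.07289), 5)
q3_y = round(upper_threshold(0.6866, 0.10207), 4)
```

With these inputs the sketch gives 0.39801 for X and 0.7555 for Y, matching the reported thresholds.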

4.1

( 4-1)

, X , Y Q3 Q2
P(Ik)

Figure 4-1 (flowchart; boxes: X, Y; Q3, Q2; P(Ik))

4.2

(
)

( 4-1)

Table 4-1 IC (stock codes, one group per row)

2363 2379 5351
2303 2311 2325 2329 2330 2344 5346 5347
2316 2335 2355 2367 2368 2383
2340 2384
2327 2370
2331 2350 2357 2376 2377
2378
2317 2387
2352 2358
2336 2361 2365 2380
2328 2341
2324 2353 2356 2362 2364 2381 2382 2385
2343 2347 2359 2360 2373 2374
2321 2326 2332 2345 2366
2323 2349

Microsoft Windows XP Professional, Microsoft Visual Basic 6, Microsoft Office Excel 2003 (Figure 4-2)

4-2 -

12 :

1999 2004

4-3
20

4-4 -

4-5

21

4-6 -

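For example, a closing price of 83 on 2004-12-29 and a closing price of 81.5 on 2004-12-28 give a price change of 1.5:

$$(\text{closing price on the day}) - (\text{closing price on the previous trading day}) = \text{price change} \tag{4-1}$$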

4.3

22

4-7

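For the event $A$ and the events $B_i$, $i = 1, 2, \ldots, n$, the $n$ conditional probabilities $P(B_i \mid A)$, $i = 1, 2, \ldots, n$, are computed (Figure 4-8).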

4-8

23

=66%
=60%

=70%

66%
1999 2004

1999

( 3-1)

Table 4-2

No.   P(B_i | A)
1     0.8260
2     0.7487
3     0.7100
4     0.7100
5     0.7074
6     0.7010
7     0.6958
8     0.6942
9     0.6941
10    0.6817
11    0.6817
12    0.6804
13    0.6791
14    0.6779
15    0.6778
16    0.6770
17    0.6752
18    0.6739
19    0.6739
20    0.6713
21    0.6709
22    0.6701
23    0.6658
24    0.6636
25    0.6636
26    0.6636
27    0.6610

4.4 X Y

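The joint probabilities $P(A \cap B_{ij})$, $i = 1, 2, \ldots, 27$; $j = 1, 2, \ldots, 24$, are computed, where $i$ indexes the 27 stocks $B_{1j}, B_{2j}, \ldots, B_{27j}$ and $j$ indexes the 24 periods (6 years × 4 quarters), giving 648 values. These values form the variable $X$, with $\bar{X} = 0.34881$, $S_X = 0.07289$ and $Q_3 = 0.39801$, so that $X \sim N(0.34881, 0.07289^2)$.

The data cover 1999-01-01 to 2004-12-31.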

4-9

(_) (Table 4-3(a), Figure 4-10(a)), (_) (Table 4-3(b), Figure 4-10(b)), (_) (Table 4-3(c), Figure 4-10(c))

$P(A \cap B_{1j}) \sim P(A \cap B_{27j})$, $j = 1, 2, \ldots, 24$

Table 4-3(a)

Year   Quarter 1   Quarter 2   Quarter 3   Quarter 4
1999   0.43        0.46        0.46        0.46
2000   0.40        0.33        0.39        0.46
2001   0.44        0.45        0.52        0.48
2002   0.40        0.36        0.38        0.45
2003   0.30        0.48        0.41        0.37
2004   0.38        0.40        0.41        0.42

Table 4-3(b)

Year   Quarter 1   Quarter 2   Quarter 3   Quarter 4
1999   0.39        0.50        0.41        0.46
2000   0.32        0.29        0.29        0.56
2001   0.42        0.34        0.51        0.47
2002   0.29        0.34        0.35        0.38
2003   0.30        0.47        0.32        0.28
2004   0.38        0.37        0.34        0.31

Table 4-3(c)

Year   Quarter 1   Quarter 2   Quarter 3   Quarter 4
1999   0.41        0.43        0.36        0.41
2000   0.26        0.32        0.31        0.54
2001   0.35        0.21        0.39        0.50
2002   0.27        0.33        0.39        0.38
2003   0.27        0.45        0.36        0.35
2004   0.31        0.34        0.33        0.32

(Figures 4-10(a)(b)(c))

Figure 4-10(a) (horizontal axis: 1999 to 2004)

Figure 4-10(b) (horizontal axis: 1999 to 2004)

Figure 4-10(c) (horizontal axis: 1999 to 2004)

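The conditional probabilities $P(B_{ij} \mid A)$, $i = 1, 2, \ldots, 27$; $j = 1, 2, \ldots, 24$, give a further 648 values. These values form the variable $Y$, with $\bar{Y} = 0.6866$, $S_Y = 0.10207$ and $Q_3 = 0.7555$, so that $Y \sim N(0.6866, 0.10207^2)$.

(_) (Table 4-4(a), Figure 4-11(a)), (_) (Table 4-4(b), Figure 4-11(b)), (_) (Table 4-4(c), Figure 4-11(c))

$P(B_{1j} \mid A) \sim P(B_{27j} \mid A)$, $j = 1, 2, \ldots, 24$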

Table 4-4(a)

Year   Quarter 1   Quarter 2   Quarter 3   Quarter 4
1999   0.87        0.76        0.86        0.74
2000   0.93        0.77        0.75        0.76
2001   0.78        0.85        0.82        0.89
2002   0.92        0.74        0.83        0.94
2003   0.74        0.84        0.96        0.89
2004   0.76        0.81        0.79        0.90

Table 4-4(b)

Year   Quarter 1   Quarter 2   Quarter 3   Quarter 4
1999   0.80        0.83        0.76        0.74
2000   0.74        0.67        0.56        0.93
2001   0.75        0.64        0.79        0.86
2002   0.67        0.71        0.77        0.81
2003   0.74        0.81        0.75        0.67
2004   0.76        0.75        0.67        0.67

Table 4-4(c)

Year   Quarter 1   Quarter 2   Quarter 3   Quarter 4
1999   0.83        0.71        0.68        0.65
2000   0.59        0.73        0.61        0.90
2001   0.63        0.39        0.62        0.91
2002   0.63        0.68        0.87        0.81
2003   0.65        0.78        0.86        0.85
2004   0.62        0.69        0.64        0.70

(Figures 4-11(a)(b)(c))

Figure 4-11(a) (horizontal axis: 1999 to 2004)

Figure 4-11(b) (horizontal axis: 1999 to 2004)

Figure 4-11(c) (horizontal axis: 1999 to 2004)

66% 1999 2004


:

Figure 4-12 (histogram; horizontal axis: 0.2 to 1.2; vertical axis: 0 to 220)

4.5

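Applying Table 3-1 to the results of Section 4.4 over the 24 periods gives $I_1$ in 16 periods, $I_2$ in 4, $I_3$ in 3 and $I_4$ in 1, that is, $P(I_1) = 66.7\%$, $P(I_2) = 16.7\%$, $P(I_3) = 12.5\%$ and $P(I_4) = 4.2\%$.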
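
The classification into $I_1$ to $I_4$ can be sketched in a few lines. The sketch below assumes that the printed values of Table 4-3(a) (X) and Table 4-4(a) (Y) belong to the same stock and are read row by row, and it uses the thresholds reported in the text (Q2 as the mean, Q3 as the 75% point). Under these assumptions it reproduces the counts 16, 4, 3 and 1. The zone scoring (2 for at or above Q3, 1 for between Q2 and Q3, 0 for at or below Q2) is a compact restatement of Equation (3-2), not the author's own implementation.

```python
from collections import Counter

# Thresholds reported in the text: Q2 is the mean, Q3 the 75% point.
X_Q2, X_Q3 = 0.34881, 0.39801
Y_Q2, Y_Q3 = 0.6866, 0.7555

# Values as printed in Table 4-3(a) (X) and Table 4-4(a) (Y), 1999 to 2004.
X = [0.43, 0.46, 0.46, 0.46, 0.40, 0.33, 0.39, 0.46, 0.44, 0.45, 0.52, 0.48,
     0.40, 0.36, 0.38, 0.45, 0.30, 0.48, 0.41, 0.37, 0.38, 0.40, 0.41, 0.42]
Y = [0.87, 0.76, 0.86, 0.74, 0.93, 0.77, 0.75, 0.76, 0.78, 0.85, 0.82, 0.89,
     0.92, 0.74, 0.83, 0.94, 0.74, 0.84, 0.96, 0.89, 0.76, 0.81, 0.79, 0.90]


def zone(value, q2, q3):
    """2 if value >= Q3, 1 if Q2 < value < Q3, 0 if value <= Q2."""
    if value >= q3:
        return 2
    if value > q2:
        return 1
    return 0


def association_level(x, y):
    """Map a pair (x, y) to I1..I4; the zone sum 4, 3, 2 gives I1, I2, I3."""
    score = zone(x, X_Q2, X_Q3) + zone(y, Y_Q2, Y_Q3)
    return {4: "I1", 3: "I2", 2: "I3"}.get(score, "I4")


counts = Counter(association_level(x, y) for x, y in zip(X, Y))
shares = {k: round(100 * counts[k] / len(X), 1) for k in ("I1", "I2", "I3", "I4")}
```

The resulting `shares` are the percentages of the 24 periods that fall into each level.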

Table 4-5

                        X >= 0.39801    0.34881 < X < 0.39801    X <= 0.34881
Y >= 0.7555             16
0.6866 < Y < 0.7555
Y <= 0.6866

4-6
P(I1)
4-12

Table 4-6 (%)

No.   P(I1)   P(I2)   P(I3)   P(I4)
1     66.7    16.7    12.5    4.2
2     25.0    20.8    8.3     45.8
3     25.0    12.5    12.5    50.0
4     20.8    8.3     8.3     62.5
5     20.8    8.3     8.3     62.5
6     20.8    8.3     8.3     62.5
7     16.7    20.8    4.2     58.3
8     16.7    16.7    4.2     62.5
9     16.7    12.5    12.5    58.3
10    16.7    8.3     4.2     70.8
11    16.7    4.2     8.3     70.8
12    16.7    4.2     4.2     75.0
13    16.7    0.0     8.3     75.0
14    12.5    20.8    4.2     62.5
15    12.5    16.7    8.3     62.5
16    12.5    12.5    12.5    62.5
17    12.5    12.5    12.5    62.5
18    12.5    8.3     16.7    62.5
19    12.5    8.3     12.5    66.7
20    12.5    8.3     4.2     75.0
21    12.5    4.2     16.7    66.7
22    12.5    4.2     16.7    66.7
23    8.3     25.0    4.2     62.5
24    8.3     16.7    20.8    54.2
25    8.3     12.5    12.5    66.7
26    8.3     12.5    8.3     70.8
27    8.3     4.2     25.0    62.5

Figure 4-13 (vertical axis: 0% to 100%; horizontal axis: 1 to 27)

4.6

( 4-14)

35

4-14

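The stocks with $P(I_1)$ of at least 20.8% are selected and every two-stock combination of them is formed, giving

$$C^{6}_{2} = 15 \tag{4-2}$$

combinations. For these, the variable $X$ has $\bar{X} = 0.28364$, $S_X = 0.07433$ and $Q_3 = 0.33381$, so that $X \sim N(0.28364, 0.07433^2)$. (_) (Table 4-7(a), Figure 4-15(a)), (_) (Table 4-7(b), Figure 4-15(b)), (_) (Table 4-7(c), Figure 4-15(c))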
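
The number of stock combinations can be checked with a short sketch. The labels `S1` to `S6` are hypothetical stand-ins for the six selected stocks; the same count for three-stock combinations of five stocks, used in Section 4.7, is included for comparison.

```python
from itertools import combinations
from math import comb

# Hypothetical labels standing in for the six selected stocks.
stocks = ["S1", "S2", "S3", "S4", "S5", "S6"]

pairs = list(combinations(stocks, 2))  # every unordered two-stock combination
n_pairs = len(pairs)                   # C(6, 2)
n_triples = comb(5, 3)                 # C(5, 3), as in Section 4.7
```

Each element of `pairs` is one two-stock combination for which the joint and conditional probabilities would then be computed.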

Table 4-7(a)

Year   Quarter 1   Quarter 2   Quarter 3   Quarter 4
1999   0.36        0.38        0.38        0.41
2000   0.32        0.29        0.24        0.43
2001   0.35        0.29        0.41        0.44
2002   0.25        0.30        0.30        0.35
2003   0.20        0.41        0.32        0.25
2004   0.29        0.31        0.30        0.31

Table 4-7(b)

Year   Quarter 1   Quarter 2   Quarter 3   Quarter 4
1999   0.31        0.25        0.38        0.39
2000   0.27        0.23        0.29        0.39
2001   0.35        0.32        0.33        0.38
2002   0.31        0.28        0.29        0.32
2003   0.23        0.38        0.27        0.23
2004   0.33        0.34        0.38        0.35

Table 4-7(c)

Year   Quarter 1   Quarter 2   Quarter 3   Quarter 4
1999   0.20        0.32        0.29        0.30
2000   0.32        0.23        0.31        0.37
2001   0.37        0.29        0.39        0.41
2002   0.24        0.20        0.27        0.35
2003   0.25        0.39        0.24        0.23
2004   0.19        0.28        0.31        0.26

(Figures 4-15(a)(b)(c))

Figure 4-15(a) (horizontal axis: 1999 to 2004)

Figure 4-15(b) (horizontal axis: 1999 to 2004)

Figure 4-15(c) (horizontal axis: 1999 to 2004)

For the variable $Y$, $\bar{Y} = 0.55449$, $S_Y = 0.11793$ and $Q_3 = 0.6341$, so that $Y \sim N(0.55449, 0.11793^2)$. (_) (Table 4-8(a), Figure 4-16(a)), (_) (Table 4-8(b), Figure 4-16(b)), (_) (Table 4-8(c), Figure 4-16(c))

Table 4-8(a)

Year   Quarter 1   Quarter 2   Quarter 3   Quarter 4
1999   0.73        0.63        0.70        0.65
2000   0.74        0.67        0.47        0.71
2001   0.63        0.55        0.64        0.80
2002   0.58        0.61        0.67        0.74
2003   0.48        0.70        0.75        0.59
2004   0.59        0.63        0.58        0.67

Table 4-8(b)

Year   Quarter 1   Quarter 2   Quarter 3   Quarter 4
1999   0.63        0.41        0.70        0.63
2000   0.63        0.53        0.56        0.64
2001   0.63        0.61        0.51        0.69
2002   0.71        0.58        0.63        0.68
2003   0.57        0.65        0.64        0.56
2004   0.66        0.69        0.73        0.77

Table 4-8(c)

Year   Quarter 1   Quarter 2   Quarter 3   Quarter 4
1999   0.40        0.54        0.54        0.49
2000   0.74        0.53        0.61        0.62
2001   0.66        0.55        0.62        0.74
2002   0.54        0.42        0.60        0.74
2003   0.61        0.68        0.57        0.56
2004   0.38        0.56        0.61        0.57

(Figures 4-16(a)(b)(c))

Figure 4-16(a) (horizontal axis: 1999 to 2004)

Figure 4-16(b) (horizontal axis: 1999 to 2004)

Figure 4-16(c) (horizontal axis: 1999 to 2004)

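Applying Table 3-1 to the results of Section 4.6 gives $I_1$ in 9 periods, $I_2$ in 6, $I_3$ in 4 and $I_4$ in 5, that is, $P(I_1) = 37.5\%$, $P(I_2) = 25.0\%$, $P(I_3) = 16.7\%$ and $P(I_4) = 20.8\%$. The results for all combinations, ordered by $P(I_1)$, are listed in Table 4-9 and plotted in Figure 4-17.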

Table 4-9 (%)

No.   P(I1)   P(I2)   P(I3)   P(I4)
1     37.5    25.0    16.7    20.8
2     29.2    20.8    20.8    29.2
3     20.8    12.5    12.5    54.2
4     20.8    8.3     8.3     62.5
5     20.8    4.2     8.3     66.7
6     20.8    4.2     0.0     75.0
7     20.8    0.0     0.0     79.2
8     20.8    0.0     0.0     79.2
9     16.7    12.5    8.3     62.5
10    16.7    8.3     8.3     66.7
11    16.7    16.7    12.5    54.2
12    16.7    25.0    4.2     54.2
13    12.5    4.2     8.3     75.0
14    12.5    12.5    8.3     66.7
15    8.3     12.5    8.3     70.8

Figure 4-17 (vertical axis: 0% to 100%; horizontal axis: 1 to 15)

4.7

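Every three-stock combination is then formed, giving

$$C^{5}_{3} = 10 \tag{4-3}$$

combinations. For these, the variable $X$ has $\bar{X} = 0.23449$, $S_X = 0.06787$ and $Q_3 = 0.2803$, so that $X \sim N(0.23449, 0.06787^2)$. (_) (Table 4-10(a), Figure 4-18(a)), (_) (Table 4-10(b), Figure 4-18(b)), (_) (Table 4-10(c), Figure 4-18(c))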

Table 4-10(a)

Year   Quarter 1   Quarter 2   Quarter 3   Quarter 4
1999   0.31        0.24        0.33        0.33
2000   0.24        0.22        0.21        0.37
2001   0.30        0.23        0.28        0.38
2002   0.22        0.22        0.24        0.28
2003   0.20        0.31        0.23        0.12
2004   0.26        0.26        0.28        0.28

Table 4-10(b)

Year   Quarter 1   Quarter 2   Quarter 3   Quarter 4
1999   0.25        0.28        0.28        0.28
2000   0.23        0.20        0.21        0.37
2001   0.32        0.19        0.28        0.38
2002   0.16        0.22        0.15        0.29
2003   0.13        0.34        0.29        0.18
2004   0.24        0.22        0.19        0.23

Table 4-10(c)

Year   Quarter 1   Quarter 2   Quarter 3   Quarter 4
1999   0.36        0.26        0.32        0.29
2000   0.29        0.26        0.20        0.33
2001   0.32        0.19        0.34        0.39
2002   0.25        0.22        0.23        0.31
2003   0.09        0.28        0.26        0.15
2004   0.16        0.18        0.22        0.20

(Figures 4-18(a)(b)(c))

Figure 4-18(a) (horizontal axis: 1999 to 2004)

Figure 4-18(b) (horizontal axis: 1999 to 2004)

Figure 4-18(c) (horizontal axis: 1999 to 2004)

For the variable $Y$, $\bar{Y} = 0.45661$, $S_Y = 0.11547$ and $Q_3 = 0.53456$, so that $Y \sim N(0.45661, 0.11547^2)$. (_) (Table 4-11(a), Figure 4-19(a)), (_) (Table 4-11(b), Figure 4-19(b)), (_) (Table 4-11(c), Figure 4-19(c))

Table 4-11(a)

Year   Quarter 1   Quarter 2   Quarter 3   Quarter 4
1999   0.63        0.39        0.62        0.53
2000   0.56        0.50        0.42        0.62
2001   0.53        0.42        0.44        0.69
2002   0.50        0.45        0.53        0.58
2003   0.48        0.54        0.54        0.30
2004   0.52        0.53        0.55        0.60

Table 4-11(b)

Year   Quarter 1   Quarter 2   Quarter 3   Quarter 4
1999   0.50        0.46        0.51        0.44
2000   0.52        0.47        0.42        0.62
2001   0.56        0.36        0.44        0.69
2002   0.38        0.45        0.33        0.61
2003   0.30        0.59        0.68        0.44
2004   0.48        0.44        0.36        0.50

Table 4-11(c)

Year   Quarter 1   Quarter 2   Quarter 3   Quarter 4
1999   0.73        0.44        0.59        0.47
2000   0.67        0.60        0.39        0.55
2001   0.56        0.36        0.54        0.71
2002   0.58        0.45        0.50        0.65
2003   0.22        0.49        0.61        0.37
2004   0.31        0.38        0.42        0.27

(Figures 4-19(a)(b)(c))

Figure 4-19(a) (horizontal axis: 1999 to 2004)

Figure 4-19(b) (horizontal axis: 1999 to 2004)

Figure 4-19(c) (horizontal axis: 1999 to 2004)

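Applying Table 3-1 to the results of Section 4.7 gives $I_1$ in 8 periods, $I_2$ in 5, $I_3$ in 0 and $I_4$ in 11, that is, $P(I_1) = 33.3\%$, $P(I_2) = 20.8\%$, $P(I_3) = 0\%$ and $P(I_4) = 45.8\%$. The results for all combinations, ordered by $P(I_1)$, are listed in Table 4-12 and plotted in Figure 4-20.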

Table 4-12 (%)

No.   P(I1)   P(I2)   P(I3)   P(I4)
1     33.3    20.8    0.0     45.8
2     29.2    16.7    16.7    37.5
3     25.0    20.8    4.2     50.0
4     25.0    0.0     16.7    58.3
5     20.8    16.7    0.0     62.5
6     20.8    4.2     8.3     66.7
7     16.7    12.5    8.3     62.5
8     16.7    8.3     12.5    62.5
9     16.7    8.3     0.0     75.0
10    12.5    8.3     4.2     75.0

Figure 4-20 (vertical axis: 0% to 100%; horizontal axis: 1 to 10)

4.8

1999 2004
2005

P( Bi | A) , i=1, 2, , 27

Table 4-13 (periods: 1999/1/1~2004/12/30 and 2005/1/1~2005/6/30)

No.   Value
1     0.8571
2     0.8285
3     0.6285
4     0.6285
5     0.6285
6     0.6142
7     0.6142
8     0.6
9     0.6
10    0.6
11    0.6
12    0.6
13    0.5857
14    0.5857
15    0.5857
16    0.5797
17    0.5714
18    0.5714
19    0.5614
20    0.5571
21    0.5571
22    0.5468
23    0.5142
24    0.5
25    0.4714
26    0.4571
27    0.4428

1999 2004
1999 2002
66% 8.3
16.7 2004 2005

IC
IC
2005

54

5.1

( 5-1)

Table 5-1 (stock codes)

2801 2807 2808 2809 2811 2812 2816 2820 2822 2823
2825 2827 2831 2833 2834 2836 2838 2841 2845

=65%
=60%
=63%

=63% ( 5-2)

Table 5-2

No.   P(B_i | A)
1     0.7809
2     0.6417
3     0.6417
4     0.6378
5     0.6378
6     0.6314
7     0.6314

5.2 X Y

The joint probabilities give 216 values. These values form the variable $X$, with $\bar{X} = 0.33417$, $S_X = 0.08401$ and $Q_3 = 0.39087$, so that $X \sim N(0.33417, 0.08401^2)$ (Tables 5-3(a)(b)(c), Figures 5-1(a)(b)(c)).

Table 5-3(a)

Year   Quarter 1   Quarter 2   Quarter 3   Quarter 4
1999   0.33        0.38        0.51        0.61
2000   0.29        0.36        0.36        0.56
2001   0.47        0.53        0.61        0.53
2002   0.38        0.39        0.38        0.35
2003   0.25        0.39        0.23        0.25
2004   0.38        0.31        0.36        0.29

Table 5-3(b)

Year   Quarter 1   Quarter 2   Quarter 3   Quarter 4
1999   0.30        0.26        0.35        0.32
2000   0.16        0.19        0.24        0.49
2001   0.37        0.39        0.43        0.37
2002   0.25        0.39        0.38        0.40
2003   0.27        0.44        0.27        0.23
2004   0.34        0.29        0.30        0.37

Table 5-3(c)

Year   Quarter 1   Quarter 2   Quarter 3   Quarter 4
1999   0.36        0.31        0.36        0.33
2000   0.32        0.20        0.21        0.47
2001   0.32        0.47        0.57        0.48
2002   0.33        0.42        0.33        0.31
2003   0.267       0.28        0.24        0.25
2004   0.362       0.20        0.23        0.20

(Figures 5-1(a)(b)(c))

Figure 5-1(a) (horizontal axis: 1999 to 2004)

Figure 5-1(b) (horizontal axis: 1999 to 2004)

Figure 5-1(c) (horizontal axis: 1999 to 2004)

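The conditional probabilities likewise give 216 values. These values form the variable $Y$, with $\bar{Y} = 0.65811$, $S_Y = 0.13167$ and $Q_3 = 0.74699$, so that $Y \sim N(0.65811, 0.13167^2)$. (_) (Table 5-4(a), Figure 5-2(a)), (_) (Table 5-4(b), Figure 5-2(b)), (_) (Table 5-4(c), Figure 5-2(c))

$P(B_{1j} \mid A) \sim P(B_{7j} \mid A)$, $j = 1, 2, \ldots, 7$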

Table 5-4(a)

Year   Quarter 1   Quarter 2   Quarter 3   Quarter 4
1999   0.67        0.63        0.95        0.98
2000   0.67        0.83        0.69        0.93
2001   0.84        1.00        0.95        0.97
2002   0.87        0.81        0.83        0.74
2003   0.61        0.68        0.54        0.59
2004   0.76        0.63        0.70        0.63

Table 5-4(b)

Year   Quarter 1   Quarter 2   Quarter 3   Quarter 4
1999   0.60        0.44        0.65        0.51
2000   0.37        0.43        0.47        0.81
2001   0.66        0.73        0.67        0.69
2002   0.58        0.81        0.83        0.84
2003   0.65        0.76        0.64        0.56
2004   0.69        0.59        0.58        0.80

Table 5-4(c)

Year   Quarter 1   Quarter 2   Quarter 3   Quarter 4
1999   0.73        0.51        0.68        0.53
2000   0.74        0.47        0.42        0.79
2001   0.56        0.88        0.90        0.89
2002   0.75        0.87        0.73        0.65
2003   0.65        0.49        0.57        0.59
2004   0.72        0.41        0.45        0.43

(Figures 5-2(a)(b)(c))

Figure 5-2(a) (horizontal axis: 1999 to 2004)

Figure 5-2(b) (horizontal axis: 1999 to 2004)

Figure 5-2(c) (horizontal axis: 1999 to 2004)

5.3

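Applying Table 3-1 to the results of Section 5.2 gives $I_1$ in 7 periods, $I_2$ in 5, $I_3$ in 4 and $I_4$ in 8, that is, $P(I_1) = 29.2\%$, $P(I_2) = 20.8\%$, $P(I_3) = 16.7\%$ and $P(I_4) = 33.3\%$. The results, ordered by $P(I_1)$, are listed in Table 5-5 and plotted in Figure 5-3.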

Table 5-5 (%)

No.   P(I1)   P(I2)   P(I3)   P(I4)
1     29.2    20.8    16.7    33.3
2     20.8    0.0     16.7    62.5
3     16.7    12.5    8.3     62.5
4     12.5    16.7    12.5    58.3
5     12.5    8.3     16.7    62.5
6     8.3     12.5    12.5    66.7
7     4.2     16.7    8.3     70.8

Figure 5-3 (vertical axis: 0% to 100%)

6-1

6.1

66

= 0%
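The joint probabilities $P(A \cap B_{ij})$, $i = 1, 2, \ldots, 56$; $j = 1, 2, \ldots, 24$, are computed, where $i$ indexes the 56 stocks $B_{1j}, B_{2j}, \ldots, B_{56j}$ and $j$ indexes the 24 periods (6 years × 4 quarters), giving 1344 values. These values form the variable $X$, with $\bar{X} = 0.16987$, $S_X = 0.05849$ and $Q_3 = 0.20935$, so that $X \sim N(0.16987, 0.05849^2)$.

The conditional probabilities $P(B_{ij} \mid A)$, $i = 1, 2, \ldots, 56$; $j = 1, 2, \ldots, 24$, likewise give 1344 values. These values form the variable $Y$, with $\bar{Y} = 0.33652$, $S_Y = 0.10731$ and $Q_3 = 0.40895$, so that $Y \sim N(0.33652, 0.10731^2)$.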

6.2

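Applying Table 3-1 to the results of Section 6.1, one stock gives $I_1$ in 9 periods, $I_2$ in 3, $I_3$ in 3 and $I_4$ in 9, that is, $P(I_1) = 37.5\%$, $P(I_2) = 12.5\%$, $P(I_3) = 12.5\%$ and $P(I_4) = 37.5\%$.

Another gives $I_1$ in 3 periods, $I_2$ and $I_3$ in 0 and $I_4$ in 21, that is, $P(I_1) = 12.5\%$, $P(I_2) = 0\%$, $P(I_3) = 0\%$ and $P(I_4) = 87.5\%$.

The results for all stocks are listed in Table 6-1 and plotted in Figure 6-2.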

Table 6-1 (%)

No.   P(I1)   P(I2)   P(I3)   P(I4)
1     37.5    12.5    12.5    37.5
2     37.5    12.5    8.3     41.7
3     37.5    4.2     12.5    45.8
4     33.3    8.3     8.3     50.0
5     33.3    4.2     16.7    45.8
6     33.3    4.2     12.5    50.0
7     33.3    4.2     4.2     58.3
8     33.3    0.0     16.7    50.0
9     29.2    12.5    16.7    41.7
10    29.2    8.3     16.7    45.8
11    29.2    0.0     20.8    50.0
12    25.0    20.8    12.5    41.7
13    25.0    20.8    8.3     45.8
14    25.0    16.7    20.8    37.5
15    25.0    16.7    16.7    41.7
16    25.0    12.5    12.5    50.0
17    25.0    12.5    12.5    50.0
18    25.0    12.5    8.3     54.2
19    25.0    4.2     12.5    58.3
20    25.0    4.2     12.5    58.3
21    20.8    25.0    16.7    37.5
22    20.8    8.3     16.7    54.2
23    20.8    4.2     4.2     70.8
24    16.7    20.8    16.7    45.8
25    16.7    20.8    12.5    50.0
26    16.7    20.8    8.3     54.2
27    16.7    16.7    16.7    50.0
28    16.7    12.5    29.2    41.7
29    16.7    12.5    20.8    50.0
30    16.7    12.5    20.8    50.0
31    16.7    12.5    16.7    54.2
32    16.7    12.5    12.5    58.3
33    16.7    12.5    8.3     62.5
34    16.7    12.5    8.3     62.5
35    16.7    8.3     29.2    45.8
36    16.7    8.3     12.5    62.5
37    16.7    8.3     12.5    62.5
38    16.7    8.3     4.2     70.8
39    12.5    16.7    8.3     62.5
40    12.5    12.5    16.7    58.3
41    12.5    8.3     20.8    58.3
42    12.5    8.3     20.8    58.3
43    12.5    8.3     16.7    62.5
44    12.5    4.2     12.5    70.8
45    12.5    0.0     0.0     87.5
46    8.3     33.3    12.5    45.8
47    8.3     33.3    8.3     50.0
48    8.3     20.8    20.8    50.0
49    8.3     16.7    16.7    58.3
50    8.3     16.7    8.3     66.7
51    8.3     12.5    16.7    62.5
52    8.3     8.3     12.5    70.8
53    8.3     0.0     4.2     87.5
54    8.3     0.0     4.2     87.5
55    4.2     25.0    8.3     62.5
56    4.2     8.3     16.7    70.8

Figure 6-2 (vertical axis: 0% to 100%; horizontal axis: 1 to 56)

7.1

()()(
)

(
)()()

71

7.2

(
)

72

11-13 2003
74-15 117-143
1999

Agrawal R., T. Imielinski and A. Swami. Mining association rules between sets of items in large databases. In Proc. 1993 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD'93), pages 207-216, Washington, DC, May 1993.
Agrawal R. and R. Srikant. Fast algorithms for mining association rules. In Proc. 1994 Int. Conf. Very Large Data Bases (VLDB'94), pages 487-499, Santiago, Chile, Sept. 1994.
Agrawal R. and R. Srikant. Mining sequential patterns. In Proc. 1995 Int. Conf. Data Engineering (ICDE'95), pages 3-14, Taipei, Taiwan, Mar. 1995.
Jiawei Han and Micheline Kamber. Data Mining: Concepts and Techniques, pages 1-33, 2001.
Koperski K. and J. Han. Discovery of spatial association rules in geographic information databases. In Proc. 4th Int. Symp. Large Spatial Databases (SSD'95), pages 47-66, Portland, ME, Aug. 1995.
Lu H., J. Han and L. Feng. Stock movement and n-dimensional inter-transaction association rules. In Proc. 1998 SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery (DMKD'98), pages 12:1-12:7, Seattle, WA, June 1998.
Mannila H., H. Toivonen and A. I. Verkamo. Efficient algorithms for discovering association rules. In Proc. AAAI'94 Workshop Knowledge Discovery in Databases (KDD'94), pages 181-192, Seattle, WA, July 1994.
Mannila H., H. Toivonen and A. I. Verkamo. Discovery of frequent episodes in event sequences. Data Mining and Knowledge Discovery, 1:259-289, 1997.
Park J. S., M. S. Chen and P. S. Yu. An effective hash-based algorithm for mining association rules. In Proc. 1995 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD'95), pages 175-186, San Jose, CA, May 1995a.
Park J. S., M. S. Chen and P. S. Yu. Efficient parallel mining for association rules. In Proc. 4th Int. Conf. Information and Knowledge Management, pages 31-36, Baltimore, MD, Nov. 1995b.
Ramaswamy S., S. Mahajan and A. Silberschatz. On the discovery of interesting patterns in association rules. In Proc. 1998 Int. Conf. Very Large Data Bases (VLDB'98), pages 368-379, New York, Aug. 1998.
Savasere A., E. Omiecinski and S. Navathe. Mining for strong negative associations in a large database of customer transactions. In Proc. 1998 Int. Conf. Data Engineering (ICDE'98), pages 494-502, Orlando, FL, Feb. 1998.
