
..-..

(voron@forecsys.ru),
..-.. . .


( + )


( + )

. .

1 / 41




, ,



, ,


( )
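X is the set of objects; Y is the set of answers (labels, responses);
y: X → Y is an unknown dependency.
Given: a training sample of objects x_i = (x_i^1, …, x_i^n) with known answers y_i = y(x_i), i = 1, …, ℓ:
$$
\begin{pmatrix} x_1^1 & \dots & x_1^n \\ \dots & \dots & \dots \\ x_\ell^1 & \dots & x_\ell^n \end{pmatrix}
\xrightarrow{\;y\;}
\begin{pmatrix} y_1 \\ \dots \\ y_\ell \end{pmatrix}
$$
Find: an algorithm a: X → Y that gives answers on the test objects x'_i = (x'^1_i, …, x'^n_i), i = 1, …, k:
$$
\begin{pmatrix} x'^1_1 & \dots & x'^n_1 \\ \dots & \dots & \dots \\ x'^1_k & \dots & x'^n_k \end{pmatrix}
\xrightarrow{\;a\,?\;}
\begin{pmatrix} a(x'_1) \\ \dots \\ a(x'_k) \end{pmatrix}
$$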
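A minimal sketch of this setup in Python, assuming numeric feature vectors: the training sample is a list of objects with their answers, an algorithm a is built from it, and a is then applied to the test objects. The 1-nearest-neighbour rule and the names `make_1nn`, `train_x`, `test_x` are illustrative choices only; the text does not fix any particular algorithm.

```python
import math
from typing import Callable, List, Sequence

def make_1nn(train_x: List[Sequence[float]], train_y: List[int]) -> Callable[[Sequence[float]], int]:
    """Illustrative algorithm a: X -> Y that copies the answer of the closest training object."""
    def a(x: Sequence[float]) -> int:
        # Euclidean distance from x to every training object x_i
        dists = [math.dist(x, xi) for xi in train_x]
        return train_y[dists.index(min(dists))]
    return a

# Hypothetical training sample: ell = 4 objects, n = 2 features, answers y_i
train_x = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
train_y = [0, 0, 1, 1]
a = make_1nn(train_x, train_y)

# Hypothetical test sample: k = 2 objects x'_1, x'_2
test_x = [[0.5, 0.2], [5.2, 4.8]]
print([a(x) for x in test_x])
```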


Classification problems (|Y| < ∞):
x ; y , ;
x ; y ;
x ; y ;
x ; y / ;
x ; y ;
x ; y ;
x ; y : / ;
x ; y ;

Regression problems (Y = ℝ or Y = ℝ^m):
x ; y ;
x h,i; y ;
x . ; y ;
x . ; y ;
x ; y ;



:
();
(, );
( );
;
;
;
;
;
.

:
( );
;
;
;
;

[Figure: L = 98; both axes in %.]


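𝕏 = {x_1, …, x_L} is the full sample of objects;
A = {a_1, …, a_D} is a set of algorithms;
I(a, x) = [algorithm a errs on object x] is the error indicator;
the L×D error matrix:

             a_1  a_2  a_3  a_4  a_5  a_6  …  a_D
  x_1         1    1    0    0    0    1   …   1
  …           0    0    0    0    1    1   …   1     X (observed)
  x_ℓ         0    0    1    0    0    0   …   0
  x_{ℓ+1}     0    0    0    1    1    1   …   0
  …           0    0    0    1    0    0   …   1     X̄ (hidden)
  x_L         0    1    1    1    1    1   …   0

X is the observed training sample of length ℓ, X̄ is the hidden test sample of length k, ℓ + k = L;
$$n(a, X) = \sum_{x \in X} I(a, x)$$
is the number of errors of a ∈ A on a sample X ⊂ 𝕏;
$$\nu(a, X) = \frac{1}{|X|}\, n(a, X)$$
is the error rate of a on X.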
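A small sketch of n(a, X) and ν(a, X) computed directly from an error matrix. The 6×3 matrix and the index split into X and X̄ are made-up data, and the helper names `n_errors` and `error_rate` are hypothetical; samples are represented as lists of row indices.

```python
from typing import List, Sequence

# Hypothetical L x D error matrix: I[x][a] = 1 if algorithm a errs on object x.
ErrorMatrix = List[List[int]]

def n_errors(I: ErrorMatrix, a: int, X: Sequence[int]) -> int:
    """n(a, X): number of objects in X (row indices) on which algorithm a errs."""
    return sum(I[x][a] for x in X)

def error_rate(I: ErrorMatrix, a: int, X: Sequence[int]) -> float:
    """nu(a, X) = n(a, X) / |X|."""
    return n_errors(I, a, X) / len(X)

# Toy full sample of L = 6 objects and D = 3 algorithms (made-up data)
I = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
    [0, 1, 1],
    [0, 0, 0],
]
X = [0, 1, 2]       # observed training sample, ell = 3
X_bar = [3, 4, 5]   # hidden test sample, k = 3

for a in range(len(I[0])):
    print(f"a{a + 1}: n(a, X) = {n_errors(I, a, X)}, nu(a, X_bar) = {error_rate(I, a, X_bar):.3f}")
```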


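Learning method (empirical risk minimization):
$$\mu(X) = \arg\min_{a \in A} \nu(a, X).$$

All partitions 𝕏 = X ⊔ X̄ with |X| = ℓ, |X̄| = k are assumed equally likely:
$$\mathsf{P} \equiv \mathsf{E} \equiv \frac{1}{C_L^\ell} \sum_{X \sqcup \bar X}.$$

Complete cross-validation:
$$CCV(\mu, \mathbb{X}) = \mathsf{E}\, \nu\bigl(\mu(X), \bar X\bigr).$$

Probability of overfitting:
$$Q_\varepsilon(\mu, \mathbb{X}) = \mathsf{P}\bigl[\nu(\mu(X), \bar X) - \nu(\mu(X), X) \ge \varepsilon\bigr].$$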
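Both functionals can be computed exactly for a small full sample by enumerating all C_L^ℓ partitions, which is what the equiprobability assumption means in practice. A sketch, assuming ERM breaks ties towards the smallest algorithm index (the text does not specify tie-breaking); `erm` and `ccv_and_overfit_prob` are hypothetical names.

```python
from itertools import combinations
from math import comb
from typing import List, Tuple

def erm(I: List[List[int]], X: Tuple[int, ...]) -> int:
    """mu(X): an algorithm with the fewest errors on X.
    Assumption: ties go to the smallest index (min keeps the first minimum)."""
    D = len(I[0])
    return min(range(D), key=lambda a: sum(I[x][a] for x in X))

def ccv_and_overfit_prob(I: List[List[int]], ell: int, eps: float) -> Tuple[float, float]:
    """Average over all C(L, ell) partitions of nu(mu(X), X_bar) (CCV) and of
    the indicator [nu(mu(X), X_bar) - nu(mu(X), X) >= eps] (Q_eps)."""
    L = len(I)
    k = L - ell
    ccv_sum, overfit_count, partitions = 0.0, 0, 0
    for X in combinations(range(L), ell):
        in_train = set(X)
        X_bar = [x for x in range(L) if x not in in_train]
        a = erm(I, X)
        nu_train = sum(I[x][a] for x in X) / ell
        nu_test = sum(I[x][a] for x in X_bar) / k
        ccv_sum += nu_test
        overfit_count += nu_test - nu_train >= eps
        partitions += 1
    assert partitions == comb(L, ell)
    return ccv_sum / partitions, overfit_count / partitions

# A single algorithm erring on 2 of L = 4 objects, ell = k = 2
ccv, q = ccv_and_overfit_prob([[1], [1], [0], [0]], ell=2, eps=0.5)
print(ccv, q)
```

With one algorithm, overfitting by 0.5 happens only when both errors fall into X̄, which is 1 of the 6 partitions.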


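Theorem (Vapnik, Chervonenkis, 1971). For any 𝕏, A, μ and ε ∈ [0, 1], with ℓ = k:
$$Q_\varepsilon(\mu, \mathbb{X}) \le |A| \cdot \tfrac{3}{2} \exp(-\varepsilon^2 \ell).$$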
:
|A|.
:
10^8 … 10^11;
ℓ = 10^6 … 10^10.
:
L D;
I (a, x), X, .
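The VC-type bound depends on the number of algorithms only through ln|A|. This sketch assumes the bound has the form Q_ε ≤ |A| · (3/2) · exp(−ε²ℓ) for ℓ = k, and solves it for the smallest training length ℓ that pushes the right-hand side below a target η; the function names and the numbers in the example are hypothetical.

```python
import math

def vc_bound(num_algorithms: int, eps: float, ell: int) -> float:
    """Assumed right-hand side |A| * (3/2) * exp(-eps^2 * ell), case ell = k."""
    return num_algorithms * 1.5 * math.exp(-eps ** 2 * ell)

def min_sample_size(num_algorithms: int, eps: float, eta: float) -> int:
    """Smallest ell with vc_bound(...) <= eta:
    |A| * 1.5 * exp(-eps^2 * ell) <= eta  <=>  ell >= ln(1.5 * |A| / eta) / eps^2."""
    return math.ceil(math.log(1.5 * num_algorithms / eta) / eps ** 2)

# Hypothetical numbers: a million algorithms, eps = 0.05, target eta = 0.01
ell = min_sample_size(10 ** 6, 0.05, 0.01)
print(ell, vc_bound(10 ** 6, 0.05, ell))
```

Multiplying |A| by 1000 adds only ln(1000)/ε² objects, while halving ε multiplies the required length by 4.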


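Partial order on the set of algorithms A:
a ≤ b if I(a, x) ≤ I(b, x) for all x ∈ 𝕏;
a ≺ b if a ≤ b and ‖b − a‖ = 1 (the error vectors of a and b differ on exactly one object).

Splitting-connectivity graph ⟨A, E⟩:
A is the set of vertices;
E = {(a, b) : a ≺ b} is the set of edges.

Properties:
≤ is a partial order on A;
every edge (a, b) is labelled by an object x_ab ∈ 𝕏 such that I(a, x_ab) = 0, I(b, x_ab) = 1;
the graph is split into layers A_m = {a ∈ A : n(a, 𝕏) = m}, m = 0, …, L.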
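A sketch of building the graph from a list of binary error vectors, assuming each algorithm is identified with its error vector (I(a, x_1), …, I(a, x_L)). An edge a → b is kept when a ≤ b and the vectors differ on exactly one object, whose index is stored as x_ab; `sc_graph` and `leq` are hypothetical helper names.

```python
from typing import Dict, List, Tuple

# Assumption: an algorithm is represented by its error vector over the full sample
ErrorVector = Tuple[int, ...]

def leq(a: ErrorVector, b: ErrorVector) -> bool:
    """a <= b: I(a, x) <= I(b, x) for every object x."""
    return all(ai <= bi for ai, bi in zip(a, b))

def sc_graph(A: List[ErrorVector]):
    """Edges (a, b, x_ab) with a <= b differing on exactly one object x_ab,
    and layers A_m = {a : n(a, full sample) = m}."""
    edges = []
    for a in A:
        for b in A:
            diff = [x for x in range(len(a)) if a[x] != b[x]]
            if len(diff) == 1 and leq(a, b):
                edges.append((a, b, diff[0]))  # I(a, x_ab) = 0, I(b, x_ab) = 1
    layers: Dict[int, List[ErrorVector]] = {}
    for a in A:
        layers.setdefault(sum(a), []).append(a)  # layer index m = number of errors
    return edges, layers

# Hypothetical set of four algorithms on L = 3 objects
A = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)]
edges, layers = sc_graph(A)
for a, b, x in edges:
    print(a, "->", b, "via object x", x + 1)
print(layers)
```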

[Figure: splitting-connectivity graph on objects x1, …, x10 with layers 0, 1, 2; step 1, layer 0.]

[Figure: splitting-connectivity graph on objects x1, …, x10; step 2, layer 1 added (error vectors with a single 1).]

[Figure: splitting-connectivity graph on objects x1, …, x10; step 3, layer 2 added (error vectors with two 1s).]

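For each algorithm a ∈ A:

u(a) is the upper connectivity of a, the number of edges leaving a upward:
$$u(a) = |X_a|, \quad X_a = \{x_{ab} \in \mathbb{X} : a \prec b\};$$
X_a is the generating set of a.

q(a) is the inferiority of a, the number of objects on which a errs while some better algorithm b < a (that is, b ≤ a, b ≠ a) does not:
$$q(a) = |X'_a|, \quad X'_a = \{x \in \mathbb{X} : \exists\, b \in A:\ b < a,\ I(b, x) < I(a, x)\};$$
X'_a is the forbidden set of a.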
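Both sets can be read off directly from the error vectors. A sketch, again assuming algorithms are identified with binary error vectors; `generating_set` and `forbidden_set` are hypothetical names that return sets of object indices.

```python
from typing import List, Set, Tuple

ErrorVector = Tuple[int, ...]  # assumption: (I(a, x_1), ..., I(a, x_L))

def generating_set(a: ErrorVector, A: List[ErrorVector]) -> Set[int]:
    """X_a = {x_ab : a ≺ b}: objects where a is correct and some b in A
    differs from a only by one extra error there."""
    X_a = set()
    for b in A:
        diff = [x for x in range(len(a)) if a[x] != b[x]]
        if len(diff) == 1 and a[diff[0]] == 0:
            X_a.add(diff[0])
    return X_a

def forbidden_set(a: ErrorVector, A: List[ErrorVector]) -> Set[int]:
    """X'_a = {x : exists b in A with b < a and I(b, x) < I(a, x)}."""
    X_prime = set()
    for b in A:
        if b != a and all(bi <= ai for ai, bi in zip(a, b)):  # b < a
            X_prime.update(x for x in range(len(a)) if b[x] < a[x])
    return X_prime

# Hypothetical set of four algorithms on L = 3 objects
A = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)]
for a in A:
    print(a, "u =", len(generating_set(a, A)), "q =", len(forbidden_set(a, A)))
```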


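Theorem (…, 2010). For any 𝕏, A and ε ∈ (0, 1):
$$Q_\varepsilon(\mu, \mathbb{X}) \le \sum_{a \in A} \frac{C_{L-u-q}^{\ell-u}}{C_L^\ell}\, H_{L-u-q}^{\ell-u,\; m-q}\Bigl(\frac{\ell}{L}(m - \varepsilon k)\Bigr),$$
where, for each a,
u = |X_a| is the upper connectivity of a,
q = |X'_a| is the inferiority of a,
m = n(a, 𝕏) is the number of errors of a on the full sample;
$$H_L^{\ell, m}(z) = \sum_{s=0}^{\lfloor z \rfloor} \frac{C_m^s\, C_{L-m}^{\ell-s}}{C_L^\ell}, \quad z = 0, \dots, \ell,$$
is the hypergeometric distribution function.
The bound rests on the inequality
$$\mathsf{P}[\mu X = a] \le C_{L-u-q}^{\ell-u} \big/ C_L^\ell.$$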

( )

( )



CCV Q -
A, CCV Q
[.]

[., ., .]

- [.]

[.]

[.]
CCV :
[.],
[., .],
[.]

( )
-
.

.

.

. . .
. 2011.
http://www.machinelearning.ru/wiki/images/d/d9/Voron-2011-tnop.pdf




z(t, ) =

Xi (t)Yi ()

: z(t, ) -;
: Xi (t) i- ,
Yi () i- .

-
$$I(p, k) = \sum_g a_{pg}\, C_{gk}$$

: I (p, k) p- k- ;
: apg p- g - ,
Cgk g - k- .


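$$p(w \mid d) = \sum_t p(w \mid t)\, p(t \mid d)$$
where p(w|d) is the probability of word w in document d;
p(w|t) is the probability of word w in topic t, p(t|d) is the probability of topic t in document d.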
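In matrix form the formula is a single product: with Φ = (p(w|t)) of size W×T and Θ = (p(t|d)) of size T×D, the matrix (p(w|d)) equals ΦΘ. A small NumPy sketch with made-up numbers; because every column of Φ and Θ is a probability distribution, every column of the product is one too.

```python
import numpy as np

# Hypothetical Phi = (p(w|t)): W = 3 words x T = 2 topics, columns sum to 1
Phi = np.array([[0.5, 0.1],
                [0.3, 0.1],
                [0.2, 0.8]])
# Hypothetical Theta = (p(t|d)): T = 2 topics x D = 2 documents, columns sum to 1
Theta = np.array([[0.9, 0.2],
                  [0.1, 0.8]])

# p(w|d) = sum_t p(w|t) p(t|d) for all w and d at once (W x D)
P = Phi @ Theta
print(P)
print(P.sum(axis=0))  # each column is again a distribution over words
```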

Probabilistic topic model of a document d:
$$p(w \mid d) = \sum_{t \in T} p(w \mid t)\, p(t \mid d)$$
[Figure: an example document d with its topic distribution p(t|d) and the word distributions p(w|t) of its topics; the surviving word probabilities range from 0.006 to 0.023.]

" , , "#$ :
-
.

GC- GA- .
,
( , )
. .
,
.
( ).

(topic modeling)


, , ,

:

(expert search), ,


( )
, ,



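PLSA, Probabilistic Latent Semantic Analysis [Hofmann, 1999].
Maximum likelihood principle:
$$\sum_{d \in D} \sum_{w \in d} n_{dw} \ln \sum_{t \in T} \varphi_{wt}\,\theta_{td} \to \max_{\Phi, \Theta},$$
subject to
$$\varphi_{wt} \ge 0, \quad \sum_{w \in W} \varphi_{wt} = 1; \qquad \theta_{td} \ge 0, \quad \sum_{t \in T} \theta_{td} = 1.$$
Interpretation as a stochastic matrix factorization problem:
$$\|F - \Phi\Theta\| \to \min_{\Phi, \Theta},$$
where
F = (p̂(w|d))_{W×D}, p̂(w|d) = n_dw / n_d, is the matrix of observed word frequencies;
Φ = (φ_wt)_{W×T}, φ_wt = p(w|t);
Θ = (θ_td)_{T×D}, θ_td = p(t|d).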
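A sketch of the two quantities in the problem statement: the log-likelihood being maximized and the frequency matrix F that ΦΘ approximates. The count matrix is stored here as D×W (an assumption of this sketch), so F is transposed to W×D to match Φ @ Θ; the function names are hypothetical.

```python
import numpy as np

def plsa_log_likelihood(n_dw: np.ndarray, Phi: np.ndarray, Theta: np.ndarray) -> float:
    """sum_d sum_{w in d} n_dw * ln sum_t phi_wt theta_td.
    n_dw: D x W counts, Phi: W x T, Theta: T x D."""
    P = (Phi @ Theta).T            # D x W matrix of p(w|d)
    mask = n_dw > 0                # only words that occur in d contribute
    return float(np.sum(n_dw[mask] * np.log(P[mask])))

def empirical_F(n_dw: np.ndarray) -> np.ndarray:
    """F = (n_dw / n_d), returned as W x D to match Phi @ Theta."""
    n_d = n_dw.sum(axis=1, keepdims=True)
    return (n_dw / n_d).T

# Toy example: one document, W = 3 words, a single topic
n_dw = np.array([[2.0, 1.0, 1.0]])
Phi = np.array([[0.5], [0.25], [0.25]])
Theta = np.array([[1.0]])
print(plsa_log_likelihood(n_dw, Phi, Theta))
print(np.linalg.norm(empirical_F(n_dw) - Phi @ Theta))  # exact factorization in this toy case
```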

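EM algorithm.
E-step: for all t, d, w compute p(t|d, w) from the current φ_wt, θ_td:
$$p(t \mid d, w) = \frac{p(w, t \mid d)}{p(w \mid d)} = \frac{p(w \mid t)\, p(t \mid d)}{p(w \mid d)} = \frac{\varphi_{wt}\,\theta_{td}}{\sum_s \varphi_{ws}\,\theta_{sd}}.$$
M-step: frequency estimates of the conditional probabilities from the expected counts n_dwt = n_dw p(t|d, w):
$$\varphi_{wt} = \frac{n_{wt}}{n_t}, \quad n_{wt} = \sum_{d \in D} n_{dwt}, \quad n_t = \sum_{w \in W} n_{wt};$$
$$\theta_{td} = \frac{n_{dt}}{n_d}, \quad n_{dt} = \sum_{w \in d} n_{dwt}, \quad n_d = \sum_{t \in T} n_{dt}.$$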
- .
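A dense NumPy sketch of these E- and M-steps, suitable only for tiny collections since it stores the full D×W×T array of p(t|d, w). Assumptions of the sketch: counts are a D×W matrix, Φ starts random and Θ uniform, and any slice that sums to zero is left at zero instead of dividing by zero; `plsa_em` and `log_likelihood` are hypothetical names.

```python
import numpy as np

def _normalize(x: np.ndarray, axis: int) -> np.ndarray:
    """Divide by the sum along `axis`; slices that sum to 0 stay 0."""
    s = x.sum(axis=axis, keepdims=True)
    return np.divide(x, s, out=np.zeros_like(x), where=s > 0)

def plsa_em(n_dw: np.ndarray, num_topics: int, iters: int = 100, seed: int = 0):
    """EM for PLSA. n_dw: D x W counts. Returns Phi (W x T) and Theta (T x D)."""
    rng = np.random.default_rng(seed)
    D, W = n_dw.shape
    Phi = _normalize(rng.random((W, num_topics)), axis=0)   # random p(w|t)
    Theta = np.full((num_topics, D), 1.0 / num_topics)       # uniform p(t|d)
    for _ in range(iters):
        # E-step: p(t|d,w) = phi_wt theta_td / sum_s phi_ws theta_sd, array D x W x T
        joint = Phi[None, :, :] * Theta.T[:, None, :]
        p_t_dw = _normalize(joint, axis=2)
        n_dwt = n_dw[:, :, None] * p_t_dw                    # n_dwt = n_dw p(t|d,w)
        # M-step: phi_wt = n_wt / n_t, theta_td = n_dt / n_d
        Phi = _normalize(n_dwt.sum(axis=0), axis=0)          # n_wt is W x T
        Theta = _normalize(n_dwt.sum(axis=1), axis=1).T      # n_dt is D x T, stored as T x D
    return Phi, Theta

def log_likelihood(n_dw: np.ndarray, Phi: np.ndarray, Theta: np.ndarray) -> float:
    P = (Phi @ Theta).T
    mask = n_dw > 0
    return float(np.sum(n_dw[mask] * np.log(P[mask])))

# Toy corpus (made-up counts): documents 0-1 favour words 0-2, documents 2-3 favour words 3-5
n_dw = np.array([[4, 3, 2, 0, 0, 1],
                 [3, 4, 3, 1, 0, 0],
                 [0, 1, 0, 4, 3, 3],
                 [1, 0, 0, 3, 4, 2]], dtype=float)
Phi, Theta = plsa_em(n_dw, num_topics=2)
print(np.round(Theta, 2))
print(log_likelihood(n_dw, Phi, Theta))
```

Each EM iteration does not decrease the log-likelihood, so running more iterations from the same seed can only keep or improve it.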

Weiwei Cui, Shixia Liu, Li Tan, Conglei Shi, Yangqiu Song, Zekai J. Gao, Xin Tong, Huamin Qu. TextFlow: Towards Better Understanding of Evolving Topics in Text // IEEE Transactions on Visualization and Computer Graphics, Vol. 17, No. 12, December 2011.


. n-

Shoaib Jameel, Wai Lam. An N-Gram Topic Model for Time-Stamped Documents // 35th ECIR 2013, Moscow, March 24–27. Pp. 292–304.


I. Vulic, W. De Smet, J. Tang, M.-F. Moens. Probabilistic topic modeling in multilingual settings: a short overview of its methodology with applications // NIPS, 7–8 December 2012. Pp. 1–11.


(ARTM)
[., 2013]
BigARTM
ARTM ++ [. ., 2014]
,
[.,2012]

[., 2013]

n- [., 2013]
,
PLSA LDA [., 2013]


,
,
0 [., 2013]:
[Figure: two panels, PLSA and LDA; vertical axis D from 0 to 0.8, horizontal axis from 0.2 to 1.8; a parameter fixed at 0.1.]


φ_wt, θ_td
95%,
[., 2013]:

[Figure: left vertical axis from 0 to 1.0, right vertical axis from 800 to 4 200, horizontal axis from 0 to 40; curves labelled 15:1:15%, 1:2:10% and 15:2:15%, th:10%, ph:0.1%.]



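Additive regularization of topic models (ARTM):
$$\sum_{d \in D} \sum_{w \in d} n_{dw} \ln \sum_{t \in T} \varphi_{wt}\,\theta_{td} + R(\Phi, \Theta) \to \max_{\Phi, \Theta}.$$

M-step of the EM algorithm. PLSA uses
φ_wt ∝ n_wt, θ_td ∝ n_dt;
ARTM uses
$$\varphi_{wt} \propto \Bigl(n_{wt} + \varphi_{wt} \frac{\partial R}{\partial \varphi_{wt}}\Bigr)_+, \qquad \theta_{td} \propto \Bigl(n_{dt} + \theta_{td} \frac{\partial R}{\partial \theta_{td}}\Bigr)_+,$$
where (z)_+ = max(z, 0) and ∝ means normalization over w ∈ W (for φ_wt) and over t ∈ T (for θ_td).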
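A sketch of the regularized M-step, assuming the caller supplies the products φ_wt ∂R/∂φ_wt and θ_td ∂R/∂θ_td as functions (this avoids dividing by zero probabilities). The example regularizer R = β Σ_{w,t} ln φ_wt, for which φ_wt ∂R/∂φ_wt = β, is chosen only for illustration: β > 0 smooths Φ and β < 0 sparsifies it. `artm_m_step`, `smoothing` and `no_reg` are hypothetical names.

```python
from typing import Callable
import numpy as np

def _normalize(x: np.ndarray, axis: int) -> np.ndarray:
    """Divide by the sum along `axis`; slices that sum to 0 stay 0."""
    s = x.sum(axis=axis, keepdims=True)
    return np.divide(x, s, out=np.zeros_like(x), where=s > 0)

def artm_m_step(n_wt: np.ndarray, n_dt: np.ndarray, Phi: np.ndarray, Theta: np.ndarray,
                phi_dR: Callable[[np.ndarray], np.ndarray],
                theta_dR: Callable[[np.ndarray], np.ndarray]):
    """phi_wt ∝ (n_wt + phi_wt dR/dphi_wt)_+, theta_td ∝ (n_dt + theta_td dR/dtheta_td)_+.
    n_wt: W x T; n_dt: T x D (same layout as Theta)."""
    Phi_new = _normalize(np.maximum(n_wt + phi_dR(Phi), 0.0), axis=0)        # over w
    Theta_new = _normalize(np.maximum(n_dt + theta_dR(Theta), 0.0), axis=0)  # over t
    return Phi_new, Theta_new

def smoothing(beta: float) -> Callable[[np.ndarray], np.ndarray]:
    """Illustrative R = beta * sum ln phi_wt, so phi_wt * dR/dphi_wt = beta."""
    return lambda Phi: np.full_like(Phi, beta)

def no_reg(M: np.ndarray) -> np.ndarray:
    """R does not depend on this matrix."""
    return np.zeros_like(M)

n_wt = np.array([[5.0, 1.0], [1.0, 5.0], [4.0, 4.0]])
n_dt = np.array([[3.0, 1.0], [1.0, 3.0]])
Phi, Theta = _normalize(n_wt, 0), _normalize(n_dt, 0)
Phi_sparse, Theta_new = artm_m_step(n_wt, n_dt, Phi, Theta, smoothing(-2.0), no_reg)
print(Phi_sparse)  # counts below 2 are cut to zero before normalization
```

With R ≡ 0 the step reduces to the PLSA update φ_wt = n_wt / n_t, θ_td = n_dt / n_d.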


( )
?
?
?
(, ,
, )?


..:

Rn+1 Rn , Tn+1 Tn n+1 n .

..
1

,
600

599-
6-

216




(2- )

10
20 ( + )
50
20

-
: .
: .

[Figure: two panels, both axes from 0 to 1.0; label 1-4.]
: .

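Linear classifier.
(x^1, …, x^n) ∈ ℝ^n is the feature vector of an object x;
the classifier:
$$a(x, w) = \operatorname{sign}\Bigl(\sum_{j=1}^{n} w_j x^j\Bigr) = \operatorname{sign}\langle w, x\rangle.$$
Training minimizes the regularized approximated empirical risk:
$$Q(w) = \sum_{i=1}^{\ell} \mathcal{L}\bigl(\langle w, x_i\rangle y_i\bigr) + \frac{\tau}{2}\|w\|^2 \to \min_w.$$
Loss functions of the margin M = ⟨w, x_i⟩ y_i:
𝓛(M) = log(1 + e^{−M}) for logistic regression;
𝓛(M) = (1 − M)_+ for SVM;
𝓛(M) = e^{−M} for AdaBoost.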
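A sketch of minimizing Q(w) by plain gradient descent with any of the three losses; the gradient is Σ_i 𝓛'(M_i) y_i x_i + τw, and the hinge loss uses a subgradient. The step is scaled by 1/ℓ, the data are made up, and appending a constant feature as a bias is an assumption of this sketch; `LOSSES`, `train_linear` and `predict` are hypothetical names, not a library API.

```python
import numpy as np

# Each entry: (loss L(M), derivative dL/dM) as functions of the margin M
LOSSES = {
    "logistic": (lambda M: np.logaddexp(0.0, -M),             # log(1 + e^{-M})
                 lambda M: -np.exp(-np.logaddexp(0.0, M))),   # -1 / (1 + e^{M})
    "hinge": (lambda M: np.maximum(0.0, 1.0 - M),             # (1 - M)_+
              lambda M: -(M < 1.0).astype(float)),            # a subgradient
    "exponential": (lambda M: np.exp(-M),                     # e^{-M}
                    lambda M: -np.exp(-M)),
}

def train_linear(X: np.ndarray, y: np.ndarray, loss: str = "logistic",
                 tau: float = 0.1, lr: float = 0.1, iters: int = 500) -> np.ndarray:
    """Gradient descent on Q(w) = sum_i L(<w, x_i> y_i) + tau/2 ||w||^2 (step scaled by 1/ell)."""
    _, dloss = LOSSES[loss]
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        M = (X @ w) * y                        # margins M_i = <w, x_i> y_i
        grad = X.T @ (dloss(M) * y) + tau * w  # gradient of Q(w)
        w -= lr * grad / len(y)
    return w

def predict(X: np.ndarray, w: np.ndarray) -> np.ndarray:
    return np.sign(X @ w)                      # a(x, w) = sign <w, x>

# Toy separable data; the last feature is a constant 1 playing the role of a bias
X = np.array([[2.0, 1.0, 1.0], [1.5, -0.5, 1.0], [-2.0, 0.5, 1.0], [-1.0, -1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w = train_linear(X, y)
print(w, predict(X, w))
```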

5
,
( 12 ).

.

http://www.MachineLearning.ru/wiki/index.php?title=User:Vokov


http://www.MachineLearning.ru/wiki/images/e/e3/Voron-2014-task-ekg.pdf
http://www.MachineLearning.ru/wiki/images/3/37/Voron-2014-task-ekg-data.rar


?
:
Python, scikit-learn scikit-learn.org
RapidMiner rapidminer.com
WEKA www.cs.waikato.ac.nz/ml/weka
:
kaggle.com
UCI:
archive.ics.uci.edu/ml
:
Poligon.MachineLearning.ru
- :
www.MachineLearning.ru
User:Vokov

:

voron@forecsys.ru
www.MachineLearning.ru/wiki, User:Vokov

strijov@forecsys.ru
http://www.strijov.com
www.MachineLearning.ru/wiki, User:Strijov
