
Machine learning and portfolio selections. II.

László (Laci) Györfi1

1 Department of Computer Science and Information Theory


Budapest University of Technology and Economics
Budapest, Hungary

August 8, 2007

e-mail: gyorfi@szit.bme.hu
www.szit.bme.hu/˜gyorfi
www.szit.bme.hu/˜oti/portfolio

Györfi Machine learning and portfolio selections. II.


Dynamic portfolio selection: general case

$x_i = (x_i^{(1)}, \dots, x_i^{(d)})$ is the return vector on day $i$
$b = b_1$ is the portfolio vector for the first day
initial capital $S_0$

$$S_1 = S_0 \cdot \langle b_1, x_1 \rangle$$

For the second day, $S_1$ is the new initial capital and the portfolio vector is $b_2 = b(x_1)$:

$$S_2 = S_0 \cdot \langle b_1, x_1 \rangle \cdot \langle b(x_1), x_2 \rangle.$$

On the $n$th day, a portfolio strategy is $b_n = b(x_1, \dots, x_{n-1}) = b(x_1^{n-1})$, so

$$S_n = S_0 \prod_{i=1}^{n} \langle b(x_1^{i-1}), x_i \rangle = S_0 e^{n W_n(B)}$$

with the average growth rate

$$W_n(B) = \frac{1}{n} \sum_{i=1}^{n} \ln \langle b(x_1^{i-1}), x_i \rangle.$$
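The wealth dynamics above can be sketched numerically; a minimal Python example with hypothetical two-asset return factors (the numbers are made up for illustration; 1.01 means the asset gained 1% that day):

```python
import math

# Hypothetical two-asset daily return factors x_i = (x_i^(1), x_i^(2)).
returns = [(1.01, 0.99), (0.98, 1.03), (1.02, 1.00)]

def wealth(S0, portfolios, returns):
    """S_n = S_0 * prod_i <b_i, x_i> for a sequence of portfolio vectors."""
    S = S0
    for b, x in zip(portfolios, returns):
        S *= sum(bj * xj for bj, xj in zip(b, x))  # <b, x>
    return S

# A constant portfolio b = (1/2, 1/2) on every day: a special case of
# b_n = b(x_1^{n-1}) that ignores the past.
b = (0.5, 0.5)
S0 = 1.0
n = len(returns)
Sn = wealth(S0, [b] * n, returns)
Wn = math.log(Sn / S0) / n                       # average growth rate W_n(B)
assert abs(Sn - S0 * math.exp(n * Wn)) < 1e-12   # S_n = S_0 e^{n W_n(B)}
```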



log-optimum portfolio

$X_1, X_2, \dots$ drawn from a vector-valued stationary and ergodic process

log-optimum portfolio $B^* = \{b^*(\cdot)\}$:

$$E\{\ln \langle b^*(X_1^{n-1}), X_n \rangle \mid X_1^{n-1}\} = \max_{b(\cdot)} E\{\ln \langle b(X_1^{n-1}), X_n \rangle \mid X_1^{n-1}\}$$

where $X_1^{n-1} = X_1, \dots, X_{n-1}$.


Optimality

Algoet and Cover (1988): If $S_n^* = S_n(B^*)$ denotes the capital after day $n$ achieved by a log-optimum portfolio strategy $B^*$, then for any portfolio strategy $B$ with capital $S_n = S_n(B)$ and for any process $\{X_n\}_{-\infty}^{\infty}$,

$$\limsup_{n \to \infty} \left( \frac{1}{n} \ln S_n - \frac{1}{n} \ln S_n^* \right) \le 0 \quad \text{almost surely,}$$

and for a stationary and ergodic process $\{X_n\}_{-\infty}^{\infty}$,

$$\lim_{n \to \infty} \frac{1}{n} \ln S_n^* = W^* \quad \text{almost surely,}$$

where

$$W^* = E\left\{ \max_{b(\cdot)} E\{\ln \langle b(X_{-\infty}^{-1}), X_0 \rangle \mid X_{-\infty}^{-1}\} \right\}$$

is the maximal growth rate of any portfolio.


Martingale difference sequences

For the proof of optimality we use the concept of martingale differences.

Definition
Let $\{Z_n\}$ and $\{X_n\}$ be two sequences of random variables such that
$Z_n$ is a function of $X_1, \dots, X_n$, and
$E\{Z_n \mid X_1, \dots, X_{n-1}\} = 0$ almost surely.
Then $\{Z_n\}$ is called a martingale difference sequence with respect to $\{X_n\}$.


A strong law of large numbers

Chow's theorem: If $\{Z_n\}$ is a martingale difference sequence with respect to $\{X_n\}$ and

$$\sum_{n=1}^{\infty} \frac{E\{Z_n^2\}}{n^2} < \infty,$$

then

$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} Z_i = 0 \quad \text{a.s.}$$


A weak law of large numbers

Lemma: If $\{Z_n\}$ is a martingale difference sequence with respect to $\{X_n\}$, then the $Z_n$ are uncorrelated.

Proof. Put $i < j$. Then

$$E\{Z_i Z_j\} = E\{E\{Z_i Z_j \mid X_1, \dots, X_{j-1}\}\} = E\{Z_i E\{Z_j \mid X_1, \dots, X_{j-1}\}\} = E\{Z_i \cdot 0\} = 0.$$

Corollary:

$$E\left\{ \left( \frac{1}{n} \sum_{i=1}^{n} Z_i \right)^2 \right\} = \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} E\{Z_i Z_j\} = \frac{1}{n^2} \sum_{i=1}^{n} E\{Z_i^2\} \to 0$$

if, for example, $E\{Z_i^2\}$ is a bounded sequence.
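A small simulated illustration (the data are synthetic, not from the slides): with $X_i$ i.i.d. fair signs and $Y_n = X_{n-1} X_n$, we get $E\{Y_n \mid X_1, \dots, X_{n-1}\} = X_{n-1} E\{X_n\} = 0$, so $Z_n = Y_n$ is already a martingale difference sequence, and both laws of large numbers above can be checked empirically:

```python
import random

random.seed(0)
N = 20000
X = [random.choice((-1, 1)) for _ in range(N)]   # i.i.d. fair signs
Z = [X[i - 1] * X[i] for i in range(1, N)]       # Z_n = X_{n-1} X_n, an MDS

# (1/n) sum Z_i -> 0, as the strong law for martingale differences predicts.
avg = sum(Z) / len(Z)
assert abs(avg) < 0.05

# Adjacent Z's are (empirically) uncorrelated even though Z_i and Z_{i+1}
# share the common factor X_i.
corr = sum(Z[i] * Z[i + 1] for i in range(len(Z) - 1)) / (len(Z) - 1)
assert abs(corr) < 0.05
```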
Constructing martingale difference sequences

$\{Y_n\}$ is an arbitrary sequence such that $Y_n$ is a function of $X_1, \dots, X_n$.
Put

$$Z_n = Y_n - E\{Y_n \mid X_1, \dots, X_{n-1}\}.$$

Then $\{Z_n\}$ is a martingale difference sequence:
$Z_n$ is a function of $X_1, \dots, X_n$, and

$$E\{Z_n \mid X_1, \dots, X_{n-1}\} = E\{Y_n - E\{Y_n \mid X_1, \dots, X_{n-1}\} \mid X_1, \dots, X_{n-1}\} = 0$$

almost surely.


Optimality

log-optimum portfolio $B^* = \{b^*(\cdot)\}$:

$$E\{\ln \langle b^*(X_1^{n-1}), X_n \rangle \mid X_1^{n-1}\} = \max_{b(\cdot)} E\{\ln \langle b(X_1^{n-1}), X_n \rangle \mid X_1^{n-1}\}$$

If $S_n^* = S_n(B^*)$ denotes the capital after day $n$ achieved by a log-optimum portfolio strategy $B^*$, then for any portfolio strategy $B$ with capital $S_n = S_n(B)$ and for any process $\{X_n\}_{-\infty}^{\infty}$,

$$\limsup_{n \to \infty} \left( \frac{1}{n} \ln S_n - \frac{1}{n} \ln S_n^* \right) \le 0 \quad \text{almost surely.}$$


Proof of optimality

$$\frac{1}{n} \ln S_n = \frac{1}{n} \sum_{i=1}^{n} \ln \langle b(X_1^{i-1}), X_i \rangle$$
$$= \frac{1}{n} \sum_{i=1}^{n} E\{\ln \langle b(X_1^{i-1}), X_i \rangle \mid X_1^{i-1}\} + \frac{1}{n} \sum_{i=1}^{n} \left( \ln \langle b(X_1^{i-1}), X_i \rangle - E\{\ln \langle b(X_1^{i-1}), X_i \rangle \mid X_1^{i-1}\} \right)$$

and

$$\frac{1}{n} \ln S_n^* = \frac{1}{n} \sum_{i=1}^{n} E\{\ln \langle b^*(X_1^{i-1}), X_i \rangle \mid X_1^{i-1}\} + \frac{1}{n} \sum_{i=1}^{n} \left( \ln \langle b^*(X_1^{i-1}), X_i \rangle - E\{\ln \langle b^*(X_1^{i-1}), X_i \rangle \mid X_1^{i-1}\} \right)$$


Universally consistent portfolio

These limit relations give rise to the following definition:

Definition
An empirical (data-driven) portfolio strategy $B$ is called universally consistent with respect to a class $\mathcal{C}$ of stationary and ergodic processes $\{X_n\}_{-\infty}^{\infty}$ if, for each process in the class,

$$\lim_{n \to \infty} \frac{1}{n} \ln S_n(B) = W^* \quad \text{almost surely.}$$


Empirical portfolio selection

$$E\{\ln \langle b^*(X_1^{n-1}), X_n \rangle \mid X_1^{n-1}\} = \max_{b(\cdot)} E\{\ln \langle b(X_1^{n-1}), X_n \rangle \mid X_1^{n-1}\}$$

For a fixed integer $k > 0$,

$$E\{\ln \langle b(X_1^{n-1}), X_n \rangle \mid X_1^{n-1}\} \approx E\{\ln \langle b(X_{n-k}^{n-1}), X_n \rangle \mid X_{n-k}^{n-1}\}$$

and

$$b^*(X_1^{n-1}) \approx b_k(X_{n-k}^{n-1}) = \arg\max_{b(\cdot)} E\{\ln \langle b(X_{n-k}^{n-1}), X_n \rangle \mid X_{n-k}^{n-1}\}.$$

Because of stationarity,

$$b_k(x_1^k) = \arg\max_{b(\cdot)} E\{\ln \langle b(x_1^k), X_{k+1} \rangle \mid X_1^k = x_1^k\} = \arg\max_{b} E\{\ln \langle b, X_{k+1} \rangle \mid X_1^k = x_1^k\},$$

which is the maximization of the regression function

$$m_b(x_1^k) = E\{\ln \langle b, X_{k+1} \rangle \mid X_1^k = x_1^k\}.$$
Regression function

$Y$ real-valued, $X$ observation vector

Regression function:

$$m(x) = E\{Y \mid X = x\}$$

i.i.d. data: $D_n = \{(X_1, Y_1), \dots, (X_n, Y_n)\}$

Regression function estimate:

$$m_n(x) = m_n(x, D_n)$$

Local averaging estimates:

$$m_n(x) = \sum_{i=1}^{n} W_{ni}(x; X_1, \dots, X_n) Y_i$$

L. Györfi, M. Kohler, A. Krzyzak, H. Walk (2002). A Distribution-Free Theory of Nonparametric Regression, Springer-Verlag, New York.
Correspondence

$X \;\leftrightarrow\; X_1^k$
$Y \;\leftrightarrow\; \ln \langle b, X_{k+1} \rangle$
$m(x) = E\{Y \mid X = x\} \;\leftrightarrow\; m_b(x_1^k) = E\{\ln \langle b, X_{k+1} \rangle \mid X_1^k = x_1^k\}$


Partitioning regression estimate

Partition $\mathcal{P}_n = \{A_{n,1}, A_{n,2}, \dots\}$

$A_n(x)$ is the cell of the partition $\mathcal{P}_n$ into which $x$ falls:

$$m_n(x) = \frac{\sum_{i=1}^{n} Y_i I_{[X_i \in A_n(x)]}}{\sum_{i=1}^{n} I_{[X_i \in A_n(x)]}}$$

Let $G_n$ be the quantizer corresponding to the partition $\mathcal{P}_n$:

$$G_n(x) = j \quad \text{if } x \in A_{n,j}.$$

The set of matches:

$$I_n = \{i \le n : G_n(x) = G_n(X_i)\}$$

Then

$$m_n(x) = \frac{\sum_{i \in I_n} Y_i}{|I_n|}.$$
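A minimal sketch of the partitioning estimate on the real line, with cells $A_{n,j} = [jh, (j+1)h)$ of width $h$; the data points below are hypothetical:

```python
def quantize(x, h):
    """G_n(x): the index of the cell [j*h, (j+1)*h) into which x falls."""
    return int(x // h)

def partition_estimate(x, data, h):
    """m_n(x): average of the Y_i whose X_i share x's cell."""
    matches = [y for (xi, y) in data if quantize(xi, h) == quantize(x, h)]
    return sum(matches) / len(matches) if matches else 0.0

data = [(0.1, 1.0), (0.15, 3.0), (0.4, 10.0), (0.45, 12.0)]
# With h = 0.25: 0.1 and 0.15 share cell 0, while 0.4 and 0.45 share cell 1.
m = partition_estimate(0.2, data, h=0.25)
assert m == 2.0  # average of 1.0 and 3.0
```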


Partitioning-based portfolio selection

Fix $k, \ell = 1, 2, \dots$
$\mathcal{P}_\ell = \{A_{\ell,j}, j = 1, 2, \dots, m_\ell\}$ finite partitions of $\mathbb{R}^d$,
$G_\ell$ the corresponding quantizer: $G_\ell(x) = j$ if $x \in A_{\ell,j}$.
With $G_\ell(x_1^n) = G_\ell(x_1), \dots, G_\ell(x_n)$, the set of matches is

$$J_n = \{k < i < n : G_\ell(x_{i-k}^{i-1}) = G_\ell(x_{n-k}^{n-1})\}$$

$$b^{(k,\ell)}(x_1^{n-1}) = \arg\max_{b} \sum_{i \in J_n} \ln \langle b, x_i \rangle$$

if the set $J_n$ is non-void, and $b_0 = (1/d, \dots, 1/d)$ otherwise.
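A toy sketch of the expert $b^{(k,\ell)}$ for $d = 2$ assets and $k = 1$. Everything non-essential is a stand-in: the function `G` is a crude one-bit quantizer (whether the first asset gained) instead of a real partition $\mathcal{P}_\ell$, and the arg max over the simplex is done by a coarse grid search rather than a proper optimizer:

```python
import math

def G(x):
    # Crude stand-in for the quantizer G_l: 1 if the first asset gained.
    return 1 if x[0] >= 1.0 else 0

def expert(past, k=1):
    """past holds x_1, ..., x_{n-1}; returns the portfolio for day n."""
    n = len(past) + 1
    pattern = [G(x) for x in past[-k:]]        # quantized string G(x_{n-k}^{n-1})
    # J_n as 0-based indices j: the window before x_{j+1} matches the pattern.
    J = [j for j in range(k, n - 1) if [G(x) for x in past[j - k:j]] == pattern]
    if not J:
        return (0.5, 0.5)                      # uniform portfolio b_0
    best_b, best_val = (0.5, 0.5), -math.inf
    for t in range(101):                       # coarse grid on the simplex
        b = (t / 100, 1 - t / 100)
        val = sum(math.log(b[0] * past[j][0] + b[1] * past[j][1]) for j in J)
        if val > best_val:
            best_b, best_val = b, val
    return best_b

past = [(1.02, 0.99), (1.03, 0.98), (0.97, 1.01), (1.02, 0.99), (1.01, 0.99)]
b = expert(past)
assert abs(sum(b) - 1.0) < 1e-12 and min(b) >= 0.0  # a valid portfolio vector
```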



Elementary portfolios

For fixed $k, \ell = 1, 2, \dots$, the strategies $B^{(k,\ell)} = \{b^{(k,\ell)}(\cdot)\}$ are called elementary portfolios.

That is, $b_n^{(k,\ell)}$ quantizes the sequence $x_1^{n-1}$ according to the partition $\mathcal{P}_\ell$, and browses through all past appearances of the last seen quantized string $G_\ell(x_{n-k}^{n-1})$ of length $k$.
Then it designs a fixed portfolio vector according to the returns on the days following the occurrence of the string.


Combining elementary portfolios

How to choose $k, \ell$?
small $k$ or small $\ell$: large bias
large $k$ and large $\ell$: few matches, large variance

Machine learning: combination of experts

N. Cesa-Bianchi and G. Lugosi (2006). Prediction, Learning, and Games. Cambridge University Press.


Exponential weighting

Combine the elementary portfolio strategies $B^{(k,\ell)} = \{b_n^{(k,\ell)}\}$.
Let $\{q_{k,\ell}\}$ be a probability distribution on the set of all pairs $(k, \ell)$ such that $q_{k,\ell} > 0$ for all $k, \ell$.
For $\eta > 0$ put

$$w_{n,k,\ell} = q_{k,\ell} \, e^{\eta \ln S_{n-1}(B^{(k,\ell)})}.$$

For $\eta = 1$,

$$w_{n,k,\ell} = q_{k,\ell} \, e^{\ln S_{n-1}(B^{(k,\ell)})} = q_{k,\ell} \, S_{n-1}(B^{(k,\ell)}),$$

and

$$v_{n,k,\ell} = \frac{w_{n,k,\ell}}{\sum_{i,j} w_{n,i,j}}.$$

The combined portfolio $b$:

$$b_n(x_1^{n-1}) = \sum_{k,\ell} v_{n,k,\ell} \, b_n^{(k,\ell)}(x_1^{n-1}).$$
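A sketch of the weighting step with $\eta = 1$, using a hypothetical prior $\{q_{k,\ell}\}$ over three $(k, \ell)$ pairs, made-up accumulated capitals $S_{n-1}(B^{(k,\ell)})$, and made-up expert portfolio vectors:

```python
# Hypothetical prior and accumulated capitals for three (k, l) pairs.
q = {(1, 1): 0.5, (1, 2): 0.25, (2, 1): 0.25}
S_prev = {(1, 1): 1.2, (1, 2): 0.9, (2, 1): 1.5}

# w_{n,k,l} = q_{k,l} e^{ln S_{n-1}} = q_{k,l} S_{n-1}(B^{(k,l)})
w = {kl: q[kl] * S_prev[kl] for kl in q}
total = sum(w.values())
v = {kl: w[kl] / total for kl in w}             # normalized weights v_{n,k,l}
assert abs(sum(v.values()) - 1.0) < 1e-12

# The combined portfolio mixes the experts' (hypothetical) portfolio vectors.
expert_b = {(1, 1): (1.0, 0.0), (1, 2): (0.5, 0.5), (2, 1): (0.0, 1.0)}
b = tuple(sum(v[kl] * expert_b[kl][j] for kl in v) for j in range(2))
assert abs(sum(b) - 1.0) < 1e-12                # still a portfolio vector
```

Experts whose capital has grown receive more weight, but every expert keeps positive weight thanks to $q_{k,\ell} > 0$.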



$$S_n(B) = \prod_{i=1}^{n} \langle b_i(x_1^{i-1}), x_i \rangle$$
$$= \prod_{i=1}^{n} \frac{\sum_{k,\ell} w_{i,k,\ell} \langle b_i^{(k,\ell)}(x_1^{i-1}), x_i \rangle}{\sum_{k,\ell} w_{i,k,\ell}}$$
$$= \prod_{i=1}^{n} \frac{\sum_{k,\ell} q_{k,\ell} S_{i-1}(B^{(k,\ell)}) \langle b_i^{(k,\ell)}(x_1^{i-1}), x_i \rangle}{\sum_{k,\ell} q_{k,\ell} S_{i-1}(B^{(k,\ell)})}$$
$$= \prod_{i=1}^{n} \frac{\sum_{k,\ell} q_{k,\ell} S_i(B^{(k,\ell)})}{\sum_{k,\ell} q_{k,\ell} S_{i-1}(B^{(k,\ell)})}$$
$$= \sum_{k,\ell} q_{k,\ell} S_n(B^{(k,\ell)}).$$


The strategy $B = B_H$ then arises from weighting the elementary portfolio strategies $B^{(k,\ell)} = \{b_n^{(k,\ell)}\}$ such that the investor's capital becomes

$$S_n(B) = \sum_{k,\ell} q_{k,\ell} S_n(B^{(k,\ell)}).$$
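The telescoping identity can be checked numerically; a sketch with made-up returns and two hypothetical experts that hold fixed portfolios:

```python
# Numerical check of S_n(B) = sum_{k,l} q_{k,l} S_n(B^{(k,l)}).
returns = [(1.02, 0.98), (0.97, 1.04), (1.01, 1.01)]   # hypothetical returns
experts = [(1.0, 0.0), (0.0, 1.0)]   # the experts' (constant) portfolios
q = [0.5, 0.5]                       # prior weights q_{k,l}

S = [1.0, 1.0]      # per-expert capitals S_{i-1}(B^{(k,l)})
S_combined = 1.0    # capital S_i(B) of the combined strategy
for x in returns:
    w = [q[e] * S[e] for e in range(2)]        # eta = 1 weights
    v = [we / sum(w) for we in w]
    b = [sum(v[e] * experts[e][j] for e in range(2)) for j in range(2)]
    S_combined *= b[0] * x[0] + b[1] * x[1]    # <b_i, x_i>
    S = [S[e] * (experts[e][0] * x[0] + experts[e][1] * x[1]) for e in range(2)]

assert abs(S_combined - sum(q[e] * S[e] for e in range(2))) < 1e-12
```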



Theorem

Assume that
(a) the sequence of partitions is nested, that is, any cell of $\mathcal{P}_{\ell+1}$ is a subset of a cell of $\mathcal{P}_\ell$, $\ell = 1, 2, \dots$;
(b) if $\operatorname{diam}(A) = \sup_{x,y \in A} \|x - y\|$ denotes the diameter of a set, then for any sphere $S$ centered at the origin

$$\lim_{\ell \to \infty} \max_{j : A_{\ell,j} \cap S \ne \emptyset} \operatorname{diam}(A_{\ell,j}) = 0.$$

Then the portfolio scheme $B_H$ defined above is universally consistent with respect to the class of all ergodic processes such that $E\{|\ln X^{(j)}|\} < \infty$ for $j = 1, 2, \dots, d$.


L. Györfi, D. Schäfer (2003). "Nonparametric prediction", in Advances in Learning Theory: Methods, Models and Applications, J. A. K. Suykens, G. Horváth, S. Basu, C. Micchelli, J. Vandevalle (Eds.), IOS Press, NATO Science Series, pp. 341-356.
www.szit.bme.hu/~gyorfi/histog.ps


Proof
We have to prove that

$$\liminf_{n \to \infty} W_n(B) = \liminf_{n \to \infty} \frac{1}{n} \ln S_n(B) \ge W^* \quad \text{a.s.}$$

W.l.o.g. we may assume $S_0 = 1$, so that

$$W_n(B) = \frac{1}{n} \ln S_n(B)$$
$$= \frac{1}{n} \ln \left( \sum_{k,\ell} q_{k,\ell} S_n(B^{(k,\ell)}) \right)$$
$$\ge \frac{1}{n} \ln \left( \sup_{k,\ell} q_{k,\ell} S_n(B^{(k,\ell)}) \right)$$
$$= \frac{1}{n} \sup_{k,\ell} \left( \ln q_{k,\ell} + \ln S_n(B^{(k,\ell)}) \right)$$
$$= \sup_{k,\ell} \left( W_n(B^{(k,\ell)}) + \frac{\ln q_{k,\ell}}{n} \right).$$
Thus

$$\liminf_{n \to \infty} W_n(B) \ge \liminf_{n \to \infty} \sup_{k,\ell} \left( W_n(B^{(k,\ell)}) + \frac{\ln q_{k,\ell}}{n} \right)$$
$$\ge \sup_{k,\ell} \liminf_{n \to \infty} \left( W_n(B^{(k,\ell)}) + \frac{\ln q_{k,\ell}}{n} \right)$$
$$= \sup_{k,\ell} \liminf_{n \to \infty} W_n(B^{(k,\ell)})$$
$$= \sup_{k,\ell} \epsilon_{k,\ell},$$

where $\epsilon_{k,\ell}$ denotes $\liminf_{n \to \infty} W_n(B^{(k,\ell)})$.

Since the partitions $\mathcal{P}_\ell$ are nested, we have that

$$\sup_{k,\ell} \epsilon_{k,\ell} = \lim_{k \to \infty} \lim_{\ell \to \infty} \epsilon_{k,\ell} = W^*.$$


Kernel regression estimate

Kernel function $K(x) \ge 0$, bandwidth $h > 0$:

$$m_n(x) = \frac{\sum_{i=1}^{n} Y_i K\left(\frac{x - X_i}{h}\right)}{\sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right)}$$

Naive (window) kernel function $K(x) = I_{\{\|x\| \le 1\}}$:

$$m_n(x) = \frac{\sum_{i=1}^{n} Y_i I_{\{\|x - X_i\| \le h\}}}{\sum_{i=1}^{n} I_{\{\|x - X_i\| \le h\}}}$$
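A sketch of the naive (window) kernel estimate for scalar $X$, with hypothetical data points:

```python
def naive_kernel_estimate(x, data, h):
    """Average the Y_i whose X_i lie within distance h of x."""
    matches = [y for (xi, y) in data if abs(x - xi) <= h]
    return sum(matches) / len(matches) if matches else 0.0

data = [(0.0, 1.0), (0.1, 2.0), (0.9, 10.0)]
assert naive_kernel_estimate(0.05, data, h=0.2) == 1.5   # averages 1.0 and 2.0
assert naive_kernel_estimate(0.95, data, h=0.2) == 10.0  # only X = 0.9 is within h
```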



Kernel-based portfolio selection

Choose the radius $r_{k,\ell} > 0$ such that for any fixed $k$,

$$\lim_{\ell \to \infty} r_{k,\ell} = 0.$$

For $n > k + 1$, define the expert $b^{(k,\ell)}$ by

$$b^{(k,\ell)}(x_1^{n-1}) = \arg\max_{b} \sum_{\{k < i < n \,:\, \|x_{i-k}^{i-1} - x_{n-k}^{n-1}\| \le r_{k,\ell}\}} \ln \langle b, x_i \rangle,$$

if the sum is non-void, and $b_0 = (1/d, \dots, 1/d)$ otherwise.


Theorem

The kernel-based portfolio scheme is universally consistent with respect to the class of all ergodic processes such that $E\{|\ln X^{(j)}|\} < \infty$ for $j = 1, 2, \dots, d$.

L. Györfi, G. Lugosi, F. Udina (2006). "Nonparametric kernel-based sequential investment strategies", Mathematical Finance, 16, pp. 337-357.
www.szit.bme.hu/~gyorfi/kernel.pdf


k-nearest neighbor (NN) regression estimate

$$m_n(x) = \sum_{i=1}^{n} W_{ni}(x; X_1, \dots, X_n) Y_i,$$

where $W_{ni}$ is $1/k$ if $X_i$ is one of the $k$ nearest neighbors of $x$ among $X_1, \dots, X_n$, and $W_{ni}$ is $0$ otherwise.
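A sketch of the k-NN estimate for scalar $X$ with hypothetical data; ties in distance are broken by the (stable) sort order:

```python
def knn_estimate(x, data, k):
    """Put weight 1/k on the k nearest X_i and weight 0 on the rest."""
    nearest = sorted(data, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for (_, y) in nearest) / k

data = [(0.0, 1.0), (0.1, 2.0), (0.5, 5.0), (0.9, 10.0)]
assert knn_estimate(0.05, data, k=2) == 1.5   # neighbors at X = 0.0 and 0.1
assert knn_estimate(0.8, data, k=1) == 10.0   # nearest neighbor at X = 0.9
```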



Nearest-neighbor-based portfolio selection

Choose $p_\ell \in (0, 1)$ such that $\lim_{\ell \to \infty} p_\ell = 0$.
For fixed positive integers $k, \ell$ (with $n > k + \hat{\ell} + 1$), introduce the set of the $\hat{\ell} = \lfloor p_\ell n \rfloor$ nearest neighbor matches:

$$\hat{J}_n^{(k,\ell)} = \{i : k + 1 \le i \le n \text{ such that } x_{i-k}^{i-1} \text{ is among the } \hat{\ell} \text{ NNs of } x_{n-k}^{n-1} \text{ in } x_1^k, \dots, x_{n-k}^{n-1}\}.$$

Define the portfolio vector by

$$b^{(k,\ell)}(x_1^{n-1}) = \arg\max_{b} \sum_{i \in \hat{J}_n^{(k,\ell)}} \ln \langle b, x_i \rangle$$

if the sum is non-void, and $b_0 = (1/d, \dots, 1/d)$ otherwise.
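A toy sketch of the nearest-neighbor match set for $k = 1$, with scalar stand-ins for the return vectors and absolute distance in place of the Euclidean norm (everything here is illustrative, not the slides' data):

```python
def nn_matches(past, l_hat, k=1):
    """0-based indices j whose window past[j-k:j] is among the l_hat
    windows nearest to the current window past[-k:]."""
    n = len(past) + 1
    current = past[n - 1 - k:]
    candidates = list(range(k, n - 1))
    candidates.sort(key=lambda j: sum(abs(past[j - k + m] - current[m])
                                      for m in range(k)))
    return sorted(candidates[:l_hat])

past = [1.02, 0.97, 1.01, 0.99, 1.015]
J = nn_matches(past, l_hat=2)
assert J == [1, 3]  # the windows 1.02 and 1.01 are closest to 1.015
```

Because only the ranking of distances matters, the method is insensitive to a common rescaling of the data, which is one way to read the robustness remark below.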



Theorem

If for any vector $s = s_1^k$ the random variable $\|X_1^k - s\|$ has a continuous distribution function, then the nearest-neighbor portfolio scheme is universally consistent with respect to the class of all ergodic processes such that $E\{|\ln X^{(j)}|\} < \infty$ for $j = 1, 2, \dots, d$.

NN is robust: there is no scaling problem.

L. Györfi, F. Udina, H. Walk (2006). "Nonparametric nearest neighbor based empirical portfolio selection strategies", (submitted).
www.szit.bme.hu/~gyorfi/NN.pdf


Semi-log-optimal portfolio

Empirical log-optimal:

$$b^{(k,\ell)}(x_1^{n-1}) = \arg\max_{b} \sum_{i \in J_n} \ln \langle b, x_i \rangle$$

Taylor expansion: $\ln z \approx h(z) = z - 1 - \frac{1}{2}(z - 1)^2$

Empirical semi-log-optimal:

$$\tilde{b}^{(k,\ell)}(x_1^{n-1}) = \arg\max_{b} \sum_{i \in J_n} h(\langle b, x_i \rangle) = \arg\max_{b} \{\langle b, m \rangle - \langle b, Cb \rangle\}$$

smaller computational complexity: quadratic programming
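A sketch comparing the two empirical objectives on hypothetical matched returns (the days $i \in J_n$) for $d = 2$; a coarse simplex grid stands in for both the log-maximization and the quadratic program:

```python
import math

def h(z):
    # Second-order Taylor proxy for ln z around z = 1.
    return z - 1 - 0.5 * (z - 1) ** 2

matched = [(1.03, 0.98), (0.99, 1.02), (1.02, 0.99)]  # hypothetical returns

def best_portfolio(objective):
    """arg max over a grid on the simplex of sum_i objective(<b, x_i>)."""
    best_b, best_val = (0.5, 0.5), -math.inf
    for t in range(101):
        b = (t / 100, 1 - t / 100)
        val = sum(objective(b[0] * x[0] + b[1] * x[1]) for x in matched)
        if val > best_val:
            best_b, best_val = b, val
    return best_b

b_log = best_portfolio(math.log)   # empirical log-optimal
b_semi = best_portfolio(h)         # empirical semi-log-optimal
# For daily returns near 1 the two maximizers essentially coincide.
assert b_log == b_semi
```

Since daily return factors stay close to 1, $h$ tracks $\ln$ to third order there, which is why the cheaper quadratic surrogate loses little.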

L. Györfi, A. Urbán, I. Vajda (2007). "Kernel-based semi-log-optimal portfolio selection strategies", International Journal of Theoretical and Applied Finance, 10, pp. 505-516.
www.szit.bme.hu/~gyorfi/semi.pdf



Conditions of the model:

Assume that
the assets are arbitrarily divisible,
the assets are available in unbounded quantities at the current
price at any given trading period,
there are no transaction costs,
the behavior of the market is not affected by the actions of
the investor using the strategy under investigation.



NYSE data sets

At www.szit.bme.hu/~oti/portfolio there are two benchmark data sets from NYSE:
The first data set consists of daily data of 36 stocks and spans 22 years.
The second data set contains 23 stocks and spans 44 years.
Our experiment is on the second data set.


Experiments on average annual yields (AAY)

Kernel-based semi-log-optimal portfolio selection with $k = 1, \dots, 5$ and $\ell = 1, \dots, 10$,

$$r_{k,\ell}^2 = 0.0001 \cdot d \cdot k \cdot \ell.$$

The AAY of the kernel-based semi-log-optimal portfolio is 222.7%, i.e., it more than doubles the capital each year.
MORRIS had the best AAY among the individual stocks, 30.1%.
The BCRP had an average AAY of 35.2%.


The average annual yields of the individual experts:

ℓ \ k      1        2        3        4        5
  1      29.4%    27.8%    22.9%    23.9%    23.7%
  2     201.0%   124.6%    98.3%    36.1%    90.8%
  3     114.2%    62.9%    38.8%    91.4%    31.6%
  4     172.5%   155.0%   100.6%   162.3%    50.8%
  5     233.5%   170.2%   166.6%   171.1%   107.7%
  6     245.4%   216.2%   176.9%   182.0%   143.0%
  7     261.8%   211.0%   181.2%   165.6%   158.7%
  8     229.4%   189.2%   171.0%   138.8%   131.3%
  9     219.3%   172.8%   162.4%   118.7%   116.0%
 10     210.6%   151.5%   131.8%   103.4%   110.9%
