
Python for Finance

Dr. Yves J. Hilpisch

06 July 2012

Training at EuroPython 2012 Conference

Finance, Derivatives Analytics & Python Programming

Visixion GmbH

Y. Hilpisch (Visixion GmbH)

Python for Finance

EuroPython, Florence, 2012

1 / 96

1 Useful Python Libraries
2 Array operations with NumPy
    Basic Array Operations
    Random Numbers
    2d Plotting
    Exercise: NumPy in Action
3 Selected Financial Topics
    Approximation
    Optimization
    Numerical Integration
4 Case Study: Numerical Option Pricing
    Binomial Option Pricing Model
    Python Implementations
    Monte Carlo Approach
5 Time Series Analysis with pandas
    Series class
    DataFrame class
    Plotting with pandas
    Exercise: pandas in Action
6 Case Study: Analyzing Stock Quotes with pandas
7 Fast Data Mining with PyTables
    Introductory PyTables Example
    Exercise: PyTables with pandas
8 Case Study: Simulation Research Project
    The Financial Model
    Python and PyTables Implementation
9 Speeding-up Code with Cython
    Fundamentals about Cython
    Example Code for Cython Use
10 Conclusion


Guidelines for the Training

1 The training addresses a number of important Python issues and libraries for Finance
2 However, it cannot be exhaustive with regard to Python's capabilities in this field
3 We intend to provide an overview that gives an impression of Python's advantages for Finance and supports further self-study
4 The majority of the content is included in the slides
5 However, we will strive to provide hands-on experience through interactive parts


Throughout the training: Results matter more than Style


Bruce Lee, The Tao of Jeet Kune Do: There is no mystery about my style. My movements are simple, direct and non-classical. The extraordinary part of it lies in its simplicity. Every movement in Jeet Kune Do is being so of itself. There is nothing artificial about it. I always believe that the easy way is the right way.

The Tao of My Python: There is no mystery about my style. My lines of code are simple, direct and non-classical. The extraordinary part of it lies in its simplicity. Every line of code in my Python is being so of itself. There is nothing artificial about it. I always believe that the easy way is the right way.


Some of the most important Python libraries for Finance projects


NumPy: array operations
SciPy: scientific library
matplotlib: 2d and 3d plotting
pandas: time series and panel data
PyTables: database optimized for fast I/O operations
Cython: C extensions for Python
IPython: popular interactive Python shell


Some recommended Python resources


www.python.org: the Python home
www.scipy.org: SciPy and NumPy home
http://matplotlib.sourceforge.net: home of the matplotlib plotting library
http://pandas.pydata.org: home of pandas
www.pytables.org: home of PyTables
www.cython.org: home of Cython
http://ipython.org/: home of the IPython interactive shell
http://scipy-lectures.github.com: tutorial notes with scientific focus
www.scipy.org/Topical_Software: Python software/libs by topic


Major benefits of Python for Finance applications

multi-purpose: in contrast to special purpose packages, Python is general or multi-purpose; you can use it for everything (synergies)

high productivity: prototyping and rapid application development are an original domain of Python; with Python, you can re-use first code versions when migrating to standard development cycle

easy-to-maintain: due to its compactness and readability, team members can easily understand Python code from others

low cost: open source, easy-to-learn, easy-to-code, easy-to-maintain, only a fraction of the code of other languages needed, multi-platform

future-proof: huge development efforts around the world, growing number of successful scientific and corporate applications, growing number of Python experts

good performance: using the right libraries ensures execution speed comparable to compiled languages


Visixion's experience with Python


DEXISION: full-fledged Derivatives Analytics suite implemented in Python and delivered On Demand (since 2006, www.dexision.com)

research: Python used to implement a number of numerical research projects (see www.visixion.com)

trainings: Python trainings with focus on Finance for clients from the financial services industry

client projects: Python used to implement client specific financial applications

teaching: Python used to implement and illustrate financial models in derivatives course at Saarland University (see Course Web Site)

talks: we have given a number of talks at Python conferences about the use of Python for Finance

book: Python used to illustrate financial models in our recent book "Derivatives Analytics with Python: Market-Based Valuation of European and American Stock Index Options"


Arrays (I)

NumPy is a powerful library that allows array manipulations (linear algebra) in a compact form and at high speed. The speed comes from the implementation in C. So you have the convenience of Python combined with the speed of C when doing array operations.

>>> from numpy import *
>>> a=arange(0.0,20.0,1.0)
>>> a
array([  0.,   1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.,
        11.,  12.,  13.,  14.,  15.,  16.,  17.,  18.,  19.])
>>> a.resize(4,5)
>>> a
array([[  0.,   1.,   2.,   3.,   4.],
       [  5.,   6.,   7.,   8.,   9.],
       [ 10.,  11.,  12.,  13.,  14.],
       [ 15.,  16.,  17.,  18.,  19.]])
>>> a[0]
array([ 0.,  1.,  2.,  3.,  4.])
>>> a[3]
array([ 15.,  16.,  17.,  18.,  19.])
>>> a[1,4]
9.0
>>> a[1,2:4]
array([ 7.,  8.])
>>>

Care is to be taken with the conventions regarding array indices: counting starts at 0, and a slice such as 2:4 excludes its upper bound. The best way to learn these is to play with arrays.
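The index conventions can be checked with a few lines of code. The following is a minimal sketch in Python 3 syntax (the sessions on these slides use Python 2); the `np` prefix, the use of `reshape` instead of the in-place `resize`, and the variable names are choices of this sketch, not of the slides.

```python
import numpy as np

# Same 4 x 5 array as on the slide, built with reshape
a = np.arange(0.0, 20.0, 1.0).reshape(4, 5)

first_row = a[0]        # indices start at 0, so this is the first row
last_row = a[-1]        # negative indices count from the end
element = a[1, 4]       # row 1, column 4
part = a[1, 2:4]        # slices exclude the upper bound: columns 2 and 3
column = a[:, 0]        # a colon selects everything along that axis

print(first_row, last_row, element, part, column)
```

Negative indices and the colon notation are not shown on the slide, but they follow the same zero-based, upper-bound-exclusive logic.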

Arrays (II)

With NumPy, array operations are as easy as operations on integers or floats.

>>> a*0.5
array([[ 0. ,  0.5,  1. ,  1.5,  2. ],
       [ 2.5,  3. ,  3.5,  4. ,  4.5],
       [ 5. ,  5.5,  6. ,  6.5,  7. ],
       [ 7.5,  8. ,  8.5,  9. ,  9.5]])
>>> a**2
array([[   0.,    1.,    4.,    9.,   16.],
       [  25.,   36.,   49.,   64.,   81.],
       [ 100.,  121.,  144.,  169.,  196.],
       [ 225.,  256.,  289.,  324.,  361.]])
>>> a+a
array([[  0.,   2.,   4.,   6.,   8.],
       [ 10.,  12.,  14.,  16.,  18.],
       [ 20.,  22.,  24.,  26.,  28.],
       [ 30.,  32.,  34.,  36.,  38.]])
>>>


Looping over arrays

Sometimes you need to loop over arrays to check something. Looping is easily done as follows:

>>> b=arange(0.0,25.1,0.5)
>>> b
array([  0. ,   0.5,   1. ,   1.5,   2. ,   2.5,   3. ,   3.5,   4. ,
         4.5,   5. ,   5.5,   6. ,   6.5,   7. ,   7.5,   8. ,   8.5,
         9. ,   9.5,  10. ,  10.5,  11. ,  11.5,  12. ,  12.5,  13. ,
        13.5,  14. ,  14.5,  15. ,  15.5,  16. ,  16.5,  17. ,  17.5,
        18. ,  18.5,  19. ,  19.5,  20. ,  20.5,  21. ,  21.5,  22. ,
        22.5,  23. ,  23.5,  24. ,  24.5,  25. ])
>>> for i in range(50):
        if b[i]==15.0:
            print "15.0 at index no.", i

15.0 at index no. 30
>>> for i in enumerate(b[0:6]):
        print i,

(0, 0.0) (1, 0.5) (2, 1.0) (3, 1.5) (4, 2.0) (5, 2.5)
>>>

The use of arange and range should be obvious. The first can produce arrays of float type while the latter can only generate integers; and indices of arrays are always integers, which is why we loop over integers and not over floats or something else.
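The same search can also be written without an explicit index loop. The sketch below (Python 3 syntax, unlike the Python 2 session on the slide) compares the loop with a vectorized lookup; `np.flatnonzero` and `np.isclose` are NumPy functions that the slides do not use, so their choice here is an assumption of this sketch.

```python
import numpy as np

b = np.arange(0.0, 25.1, 0.5)

# Explicit loop, as on the slide, but over the full length of the array
loop_hits = [i for i in range(len(b)) if b[i] == 15.0]

# Vectorized alternative: compare the whole array at once;
# np.isclose tolerates small floating-point rounding errors
vector_hits = np.flatnonzero(np.isclose(b, 15.0))

print("15.0 at index no.", vector_hits[0])

# enumerate yields (index, value) pairs and avoids indexing by hand
for i, value in enumerate(b[0:6]):
    print(i, value)
```

Using `len(b)` instead of a hard-coded 50 makes sure that the last element of the array is checked as well.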


Random Numbers

Sciences and Finance cannot live without random numbers, be they pseudo-random or quasi-random. NumPy has built-in convenient functions for the generation of pseudo-random numbers in the NumPy sub-module random.

>>> from numpy.random import *
>>> a=random(20)
>>> a
array([ 0.66064392,  0.4315458 ,  0.70880114,  0.00276342,  0.83383503,
        0.24952601,  0.04636591,  0.10729739,  0.19072693,  0.82089409,
        0.29784537,  0.35496562,  0.546188  ,  0.52711541,  0.07060185,
        0.60602829,  0.91907393,  0.52241082,  0.07597062,  0.27253169])
>>> b=standard_normal((4,5))
>>> b
array([[-0.59317286,  0.27533818, -0.46122351, -0.05138033, -1.8371135 ],
       [-1.15520074,  1.04980946,  0.31082909,  0.32662006, -0.36752163],
       [ 0.66452767, -0.88077193,  1.18253972,  0.16836824, -1.40541028],
       [ 0.01481426, -0.88137549,  0.74594197, -0.97360666, -0.77270426]])
>>> c=random((2,3,4))
>>> shape(c)
(2, 3, 4)
>>> c
array([[[ 0.09864194,  0.76069475,  0.54398641,  0.73081207],
        [ 0.81036431,  0.24343805,  0.38178278,  0.9414989 ],
        [ 0.0533329 ,  0.0346994 ,  0.67048989,  0.99188034]],

       [[ 0.27786962,  0.87359556,  0.14993006,  0.20461863],
        [ 0.59543661,  0.24566182,  0.47176266,  0.3328179 ],
        [ 0.8340118 ,  0.96561975,  0.17854239,  0.81699292]]])
>>>
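Because the numbers are pseudo-random, every run of the session above gives different values. A minimal sketch of how results can be made reproducible with a seeded generator follows (Python 3 syntax); `default_rng` is the generator interface of newer NumPy versions and is not used on the slides, and the seed value is arbitrary.

```python
import numpy as np

# A seeded generator makes the pseudo-random numbers reproducible
rng = np.random.default_rng(seed=12345)
a = rng.random(20)                    # uniform on [0, 1)
b = rng.standard_normal((4, 5))       # standard normal, shape (4, 5)

# Re-creating the generator with the same seed repeats the sequence
rng_again = np.random.default_rng(seed=12345)
a_again = rng_again.random(20)

print(a.shape, b.shape, np.array_equal(a, a_again))
```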


2d plotting (I)

More often than not, one wants to visualize results from calculations or simulations. The module matplotlib is quite powerful when it comes to 2d visualizations of any kind.

The most important types of graphics in general are lines, dots and bars.

>>> b=standard_normal((4,5))
>>> b
array([[-0.57180547, -1.32783183, -0.27474264,  0.6301795 ,  0.71101905],
       [ 0.29724602,  0.289595  ,  0.1056877 ,  0.06424294, -0.35708164],
       [ 0.25890926,  0.79000265, -0.47457278,  0.11719325,  0.39121246],
       [-0.24544426,  1.59194504, -1.6703606 , -0.00169267, -0.63803156]])
>>> from matplotlib.pyplot import *
>>> plot(b)
[<matplotlib.lines.Line2D object at 0x2b9e790>]
>>> grid(True)
>>> axis('tight')
(0.0, 50.0, 0.0, 25.0)
>>> show()


2d plotting (II)

The figure below shows the output. Notice that matplotlib produces five lines with four different values each, which is due to the array size of 4 × 5.

Figure: Example of figure with matplotlib, here: lines



2d plotting (III)

The next example combines a dot sub-plot with a bar sub-plot, the result of which is shown in the next figure. Here, due to resizing of the array, we have only a one-dimensional set of numbers.

>>> d=standard_normal((4,5))
>>> d=resize(d,20)
>>> d
array([ 0.12709036, -1.19800928,  0.22527268,  0.39149983,  0.19080228,
        0.57113933, -1.07355946,  0.8428513 , -2.22197056,  1.58069866,
        0.6992034 , -1.45520777,  0.42116251, -0.26856476,  1.09870092,
        0.83489701, -2.34729449, -0.58642723,  0.34725616, -0.56177434])
>>> subplot(211)
<matplotlib.axes.AxesSubplot object at 0x3585590>
>>> plot(d,'ro')
[<matplotlib.lines.Line2D object at 0x3565490>]
>>> subplot(212)
<matplotlib.axes.AxesSubplot object at 0x3585250>
>>> bar(range(20),d)
[<matplotlib.patches.Rectangle object at ...]
>>> grid(True)
>>> show()


2d plotting (IV)

Figure: Example of figure with matplotlib, here: dots & bars



Let's implement some array manipulations with NumPy


1 We generate two series with 50 standard normal (pseudo-)random numbers (array of size (50, 2), call it rn)
2 Calculate the sum of the two 50 number vectors, once vector-wise and once using the sum function, and store it in another array; call it rn_sum
3 Calculate the cumulative sum (index by index) of rn_sum and store it in an array called rn_cum; once using the cumsum function, once by iterating over all items in the vector
4 Plot the rn_cum vector
5 Plot a histogram for each of the two 50 number vectors of rn into a single figure (as subplots)
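One possible way to approach steps 1 to 3 of the exercise is sketched below (Python 3 syntax). It is only one of several valid solutions; the seeded generator and the helper names `rn_sum_alt`, `rn_cum_loop` and `running` are assumptions of this sketch. The plotting steps 4 and 5 are left out.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # seed chosen arbitrarily

# Step 1: two series of 50 standard normal numbers, shape (50, 2)
rn = rng.standard_normal((50, 2))

# Step 2: sum of the two vectors, once vector-wise, once with sum
rn_sum = rn[:, 0] + rn[:, 1]
rn_sum_alt = np.sum(rn, axis=1)

# Step 3: cumulative sum, once with cumsum, once by iteration
rn_cum = np.cumsum(rn_sum)
rn_cum_loop = np.zeros_like(rn_sum)
running = 0.0
for i, value in enumerate(rn_sum):
    running += value
    rn_cum_loop[i] = running

print(np.allclose(rn_sum, rn_sum_alt), np.allclose(rn_cum, rn_cum_loop))
```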


Regression (I)

It is often the case in Finance that one has to approximate something to draw conclusions.

Two important approximation techniques are regression and interpolation. The type of regression we consider first is called ordinary least squares regression (OLS). In its most simple form, ordinary monomials $x, x^2, x^3, \ldots$ are used to approximate a desired function $y = f(x)$ given a number of observations $(y_1, x_1), (y_2, x_2), \ldots, (y_N, x_N)$.

Say we want to approximate $f(x)$ with a polynomial of order 2

$$g(x) = a_1 + a_2 x + a_3 x^2$$

where the $a_i$ are regression parameters.

The task is then to

$$\min_{a_1, a_2, a_3} \sum_{n=1}^{N} \left( y_n - g(x_n; a_1, a_2, a_3) \right)^2$$
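The minimization problem can be made concrete with a few lines of linear algebra. The following sketch (Python 3 syntax) sets up the monomials 1, x, x^2 as columns of a matrix and solves for the parameters with `numpy.linalg.lstsq`; the cosine observations and all variable names are assumptions chosen for illustration.

```python
import numpy as np

# Observations (x_n, y_n); the cosine data is only an illustration
x = np.linspace(0.0, np.pi / 2, 20)
y = np.cos(x)

# Design matrix with the monomials 1, x, x^2 as columns
X = np.column_stack([np.ones_like(x), x, x ** 2])

# Least-squares solution for (a1, a2, a3)
a, residuals, rank, _ = np.linalg.lstsq(X, y, rcond=None)

# polyfit solves the same problem but returns the highest power first
p = np.polyfit(x, y, 2)

g = X @ a                       # fitted values g(x_n)
sse = np.sum((y - g) ** 2)      # minimized sum of squared errors
print(a, sse)
```

Note the ordering: `a` holds the parameters from the constant term upwards, while `polyfit` reports them from the highest power downwards.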


Regression (II)

As an example, we want to approximate the cosine function over the interval $[0, \pi/2]$ given 20 observations.

The code (see next slide) is straightforward since NumPy has built-in functions polyfit and polyval. From polyfit you get the minimizing regression parameters back, while you can use them with polyval to generate values based on these parameters. The result is shown in the next figure for three different regression functions.


Regression (III)

#
# Ordinary Least Squares Regression
# a_REG.py
#
from numpy import *
from matplotlib.pyplot import *

# Regression
x = linspace(0.0, pi/2, 20)
y = cos(x)
g1 = polyfit(x, y, 0)
g2 = polyfit(x, y, 1)
g3 = polyfit(x, y, 2)
g1y = polyval(g1, x)
g2y = polyval(g2, x)
g3y = polyval(g3, x)

# Graphical Analysis
plot(x, y, 'y')
plot(x, g1y, 'rx')
plot(x, g2y, 'bo')
plot(x, g3y, 'g>')


Regression (IV)

Figure: Approximation of cosine function (line) with constant regression (red crosses), linear regression (blue dots) and quadratic regression (green triangles)

Splines (I)

The concept of interpolation is much more involved but nevertheless almost as straightforward in applications. The most common type of interpolation is with cubic splines, for which you find functions in the sub-module scipy.interpolate.

The example remains the same and the code (see next slide) is as compact as before, while the result (see the respective figure) seems almost perfect. Roughly speaking, cubic splines interpolation is (intelligent) regression between every two observation points with a polynomial of order 3. This is of course much more flexible than a single regression with a polynomial of order 2. There are, however, two drawbacks in algorithmic terms. First, the observations have to be ordered in the x-dimension. Second, cubic splines are of limited or no use for higher dimensional problems, where OLS regression is applicable as easily as in the two-dimensional world.


Splines (II)

#
# Cubic Spline Interpolation
# b_SPLINE.py
#
from numpy import *
from scipy.interpolate import *
from matplotlib.pyplot import *

# Interpolation
x = linspace(0.0, pi/2, 20)
y = cos(x)
gp = splrep(x, y, k=3)
gy = splev(x, gp, der=0)

# Graphical Analysis
plot(x, y, 'b')
plot(x, gy, 'ro')
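The script evaluates the spline only at the observation points, where an interpolating spline matches the data by construction. A short sketch (Python 3 syntax) of how the quality between the observation points could be checked is given below; the finer grid `x_fine` and the error measure are assumptions of this sketch, not part of b_SPLINE.py.

```python
import numpy as np
from scipy.interpolate import splrep, splev

# Observation points as on the slide
x = np.linspace(0.0, np.pi / 2, 20)
y = np.cos(x)

# Cubic spline representation (knots, coefficients, degree)
tck = splrep(x, y, k=3)

# Evaluate between the observation points, where interpolation matters
x_fine = np.linspace(0.0, np.pi / 2, 200)
y_fine = splev(x_fine, tck, der=0)
max_error = np.max(np.abs(y_fine - np.cos(x_fine)))

# der=1 returns the first derivative of the spline, here close to -sin(x)
slope = splev(x_fine, tck, der=1)

print("maximum interpolation error:", max_error)
```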


Splines (III)

Figure: Approximation of cosine function (line) with cubic splines interpolation (red dots)

Optimization Methods (I)

Strictly speaking, regression and interpolation are two special forms of optimization (some kind of minimization). However, optimization techniques are needed much more often in science and finance. An important area is, for example, the calibration of derivatives model parameters to a given set of market-observed option prices or implied volatilities. The two major approaches are global and local optimization. While the first looks for a global minimum or maximum of a function (which does not have to exist at all), the second looks for a local minimum or maximum.


Optimization Methods (II)

As an example, we take the sine function over the interval $[-\pi, 0]$ with a minimum function value of $-1$ at $-\pi/2$.

Again, the module scipy delivers respective functions via the sub-module optimize.

#
# Finding a Minimum
# c_OPT.py
#
from numpy import *
from scipy.optimize import *

# Finding a Minimum
def y(x):
    if x < -pi or x > 0:
        return 0.0
    return sin(x)

gmin = brute(y, ((-pi, 0, 0.01),), finish=None)
lmin = fmin(y, -0.5)

# Output
print "Global Minimum is ", gmin
print "Local Minimum is ", lmin


Optimization Methods (III)

Both functions brute (global brute force algorithm) and fmin (local convex optimization algorithm) also work in multi-dimensional settings. In general, the solution of the local optimization is strongly dependent on the initialization. Here the $-0.5$ did quite well in reaching $-\pi/2$ as the solution.

>>> ================================ RESTART ================================
>>>
Optimization terminated successfully.
         Current function value: -1.000000
         Iterations: 18
         Function evaluations: 36
Global Minimum is  -1.57159265359
Local Minimum is  [-1.57080078]
>>>
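The dependence on the initialization can be illustrated with the same function. The sketch below (Python 3 syntax) runs fmin once from the start value -0.5 used above and once from 2.0, a start value chosen for this sketch because it lies in the flat region outside the interval; the conversion of the argument to a float is also an addition of this sketch.

```python
import math

import numpy as np
from scipy.optimize import fmin


def y(x):
    """Sine on [-pi, 0], zero elsewhere (as in c_OPT.py)."""
    x = float(np.ravel(x)[0])   # fmin passes a one-element array
    if x < -math.pi or x > 0:
        return 0.0
    return math.sin(x)


# Start inside the interval: the local search reaches -pi/2
good = fmin(y, -0.5, disp=False)

# Start in the flat region outside the interval: every trial point
# has the same value 0.0, so the search ends near the start value
bad = fmin(y, 2.0, disp=False)

print("start -0.5:", good[0], " start 2.0:", bad[0])
```

The second run still terminates without an error, which is why the result of a local optimization should always be checked against the problem at hand.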


Numerical Integration

It is not always possible to analytically integrate a given function. Then numerical integration often comes into play. We want to check numerical integration where we can do it analytically as well:

$$\int_0^1 e^x \, dx$$

The value of the integral is $e^1 - e^0 \approx 1.7182818284590451$. For numerical integration, again scipy helps out with the sub-module integrate which contains the function quad, implementing a numerical quadrature scheme:

#
# Numerically Integrate a Function
# d_INT.py
#
from numpy import *
from scipy.integrate import *

# Numerical Integration
def f(x):
    return exp(x)

Int = quad(lambda u: f(u), 0, 1)[0]

# Output
print "Value of the Integral is ", Int
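The script keeps only the first element of the result of quad. The second element is an estimate of the absolute error, which is useful when no analytical value is available. A minimal sketch (Python 3 syntax) comparing both with the analytical value follows; the variable names are assumptions of this sketch.

```python
import numpy as np
from scipy.integrate import quad

# quad returns the integral value and an estimate of the absolute error
value, abs_error = quad(np.exp, 0.0, 1.0)

exact = np.e - 1.0            # analytical value e^1 - e^0
print("numerical:     ", value)
print("analytical:    ", exact)
print("error estimate:", abs_error)
```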


The CRR model (I)

To better understand how to implement the binomial option pricing model of Cox, Ross and Rubinstein (1979, henceforth: CRR), a little more background seems helpful. There are two securities traded in the model: a risky stock index and a risk-less zero-coupon bond. The time horizon $[0, T]$ is divided in equidistant time intervals $\Delta t$ so that one gets $T/\Delta t + 1$ points in time $t \in \{0, \Delta t, 2\Delta t, \ldots, T\}$.

The zero-coupon bond grows p.a. in value with the risk-less short rate $r$,

$$B_t = B_0 e^{rt}$$

where $B_0 > 0$.

Starting from a strictly positive, fixed stock index level of $S_0$ at $t = 0$, the stock index evolves according to the law

$$S_{t+\Delta t} \equiv S_t \cdot m$$

where $m$ is selected randomly from $\{u, d\}$. Here, $0 < d < e^{r \Delta t} < u \equiv e^{\sigma \sqrt{\Delta t}}$ as well as $u \equiv \frac{1}{d}$ as a simplification which leads to a recombining tree.


The CRR model (II)

Assuming risk-neutral valuation holds, the following relationship can be derived

$$S_t = e^{-r \Delta t} \mathbf{E}^Q_t[S_{t+\Delta t}] = e^{-r \Delta t} \left( q u S_t + (1 - q) d S_t \right)$$

Against this background, the risk-neutral (or martingale) probability is

$$q = \frac{e^{r \Delta t} - d}{u - d}$$

The value of a European call option $C_0$ is then obtained by discounting the final payoffs $C_T(S_T, K) \equiv \max[S_T - K, 0]$ at $t = T$ to $t = 0$:

$$C_0 = e^{-rT} \mathbf{E}^Q_0[C_T]$$

The discounting can be done step-by-step and node-by-node backwards starting at $t = T - \Delta t$.
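The formula for the martingale probability can be checked numerically. The sketch below (Python 3 syntax) uses illustrative values for the short rate, the volatility and the number of time intervals (the same values that appear in the a_Parameters.py module of this training) and verifies that the discounted expected one-period growth of the index equals one; the names `steps` and `discounted_growth` are assumptions of this sketch.

```python
from math import exp, sqrt

# Illustrative parameters (those of the a_Parameters.py module)
r, vola, T, steps = 0.05, 0.25, 1.0, 3
delta = T / steps

u = exp(vola * sqrt(delta))              # up-movement
d = 1.0 / u                              # down-movement
q = (exp(r * delta) - d) / (u - d)       # martingale probability

# Under q, the discounted expected gross return of the index is one
discounted_growth = exp(-r * delta) * (q * u + (1 - q) * d)
print(u, d, q, discounted_growth)
```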


The CRR model (III)

From an algorithmic point of view, one has to first generate the index level values, then determine the final payoffs of the call option and finally discount them back. This is what we will do now, starting with a somewhat `naive' implementation.

But before we do it, we generate a Python module which contains all parameters that we will need for different implementations afterwards. All parameters can be imported by using the import command and the respective filename without the suffix `.py' (i.e. the filename is a_Parameters.py and the module name is a_Parameters).


The CRR model (IV)

#
# Model Parameters for European Call Option and Binomial Models
# a_Parameters.py
#
from math import exp, sqrt

# Option Parameters
s0 = 105.00    # Initial Index Level
K = 100.00     # Strike Level
T = 1.         # Call Option Maturity
r = 0.05       # Constant Short Rate
vola = 0.25    # Constant Volatility of Diffusion

# Time Parameters
t = 3                # Time Intervals
delta = T/t          # Length of Time Interval
df = exp(-r*delta)   # Discount Factor

# Binomial Parameters
u = exp(vola*sqrt(delta))      # Up-Movement
d = 1/u                        # Down-Movement
q = (exp(r*delta)-d)/(u-d)     # Martingale Probability

The next slide presents the first version of the binomial model which uses Excel-like cell iterations extensively. We will see that there are ways to a more compact and faster implementation.


#
# Valuation of European Call Option in CRR1979 Model
# Naive Version (= Excel-like Iterations)
# b_CRR1979_Naive.py
#
from numpy import *
from a_Parameters import *

# Array Initialization for Index Levels
s = zeros((t+1,t+1),'float')
s[0,0] = s0
z = 0
for j in range(1,t+1,1):
    z = z+1
    for i in range(z+1):
        s[i,j] = s[0,0]*(u**j)*(d**(i*2))

# Array Initialization for Inner Values
iv = zeros((t+1,t+1),'float')
z = 0
for j in range(0,t+1,1):
    for i in range(z+1):
        iv[i,j] = round(max(s[i,j]-K,0),8)
    z = z+1

# Valuation
pv = zeros((t+1,t+1),'float')   # Present Value Array
pv[:,t] = iv[:,t]               # Last Time Step Initial Values
z = t+1
for j in range(t-1,-1,-1):
    z = z-1
    for i in range(z):
        pv[i,j] = (q*pv[i,j+1]+(1-q)*pv[i+1,j+1])*df

# Output
print "Value of European call option is ", pv[0,0]


Output of first implementation

A run of the module gives the following output and arrays where one can follow the three steps easily (index levels, inner values, discounting):

>>>
Value of European call option is  16.2929324488
>>> s
array([[ 105.        ,  121.30377267,  140.13909775,  161.89905958],
       [   0.        ,   90.88752771,  105.        ,  121.30377267],
       [   0.        ,    0.        ,   78.67183517,   90.88752771],
       [   0.        ,    0.        ,    0.        ,   68.09798666]])
>>> iv
array([[  5.        ,  21.30377267,  40.13909775,  61.89905958],
       [  0.        ,   0.        ,   5.        ,  21.30377267],
       [  0.        ,   0.        ,   0.        ,   0.        ],
       [  0.        ,   0.        ,   0.        ,   0.        ]])
>>> pv
array([[ 16.29293245,  26.59599847,  41.79195237,  61.89905958],
       [  0.        ,   5.61452766,  10.93666406,  21.30377267],
       [  0.        ,   0.        ,   0.        ,   0.        ],
       [  0.        ,   0.        ,   0.        ,   0.        ]])
>>>


Somewhat better implementation (I)

Our alternative version makes more use of the capabilities of NumPy. The consequence is more compact code, even if it is not so easy to read in a first instance.

#
# Valuation of European Call Option in CRR1979 Model
# Advanced Version (= NumPy Iterations)
# c_CRR1979_Advanced.py
#
from numpy import *
from a_Parameters import *

# Array Initialization for Index Levels
mu = arange(t+1)
mu = resize(mu,(t+1,t+1))
md = transpose(mu)
mu = u**(mu-md)
md = d**md
s = s0*mu*md

# Valuation
pv = maximum(s-K,0)
Qu = zeros((t+1,t+1),'float')
Qu[:,:] = q
Qd = 1-Qu
z = 0
for i in range(t-1,-1,-1):
    pv[0:t-z,i] = (Qu[0:t-z,i]*pv[0:t-z,i+1]+Qd[0:t-z,i]*pv[1:t-z+1,i+1])*df
    z = z+1

# Output
print "Value of European call option is ", pv[0,0]
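For experiments with different numbers of time intervals, the same backward induction can be wrapped in a function. The following is a self-contained sketch in Python 3 syntax, not the training's c_CRR1979_Advanced.py: the function name `crr_call_value`, its signature and the use of a single one-dimensional array that shrinks by one node per step are all assumptions of this sketch.

```python
import numpy as np


def crr_call_value(s0, K, T, r, vola, steps):
    """European call value in the CRR model by backward induction."""
    delta = T / steps
    df = np.exp(-r * delta)                    # discount factor per step
    u = np.exp(vola * np.sqrt(delta))          # up-movement
    d = 1.0 / u                                # down-movement
    q = (np.exp(r * delta) - d) / (u - d)      # martingale probability

    # Index levels at maturity: j down-moves and steps - j up-moves
    j = np.arange(steps + 1)
    s_T = s0 * u ** (steps - j) * d ** j
    values = np.maximum(s_T - K, 0.0)          # inner values at maturity

    # One vectorized discounting step per time interval
    for _ in range(steps):
        values = df * (q * values[:-1] + (1 - q) * values[1:])
    return float(values[0])


print(crr_call_value(105.0, 100.0, 1.0, 0.05, 0.25, 3))
print(crr_call_value(105.0, 100.0, 1.0, 0.05, 0.25, 1000))
```

Only the loop over time steps remains in Python; the work per step is delegated to NumPy, as in the advanced version above.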


Somewhat better implementation (II)

The valuation result is, as expected, the same for the parameter definitions from before. However, three time intervals are of course not enough to come close to the Black-Scholes benchmark of 15.6547197268. With 1,000 time intervals, however, the algorithms come quite close to it:

>>> ================================ RESTART ================================
>>>
Value of European call option is  15.6537846075
>>>


Comparison

The major difference between the two algorithms is execution time. The second implementation, which avoids Python iterations as much as possible, is about 30 times faster than the first one. You should make this a principle for your own coding efforts: whenever possible, do not carry out necessary iterations in Python but delegate them to NumPy.

Apart from time savings, you generally also get more compact and readable code. A direct comparison illustrates this point:

# Naive Version --- Iterations in Python
#
# Array Initialization for Inner Values
iv = zeros((t+1,t+1),'float')
z = 0
for j in range(0,t+1,1):
    for i in range(z+1):
        iv[i,j] = max(s[i,j]-K,0)
    z = z+1

# Advanced Version --- Iterations with NumPy/C
#
pv = maximum(s-K,0)
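The speed difference can be measured directly. The sketch below (Python 3 syntax) times both approaches on an array of illustrative index levels; the array size, the random test data and the two function names are assumptions of this sketch, and, unlike the naive version above, the loop runs over the full array rather than only the upper triangle. The measured ratio depends on machine and array size, so no specific factor is claimed here.

```python
import timeit

import numpy as np

rng = np.random.default_rng(seed=1)
s = rng.uniform(50.0, 150.0, size=(200, 200))   # illustrative index levels
K = 100.0


def inner_values_loop(s, K):
    """Element-by-element iteration in Python."""
    iv = np.zeros_like(s)
    for i in range(s.shape[0]):
        for j in range(s.shape[1]):
            iv[i, j] = max(s[i, j] - K, 0.0)
    return iv


def inner_values_numpy(s, K):
    """Same calculation delegated to NumPy."""
    return np.maximum(s - K, 0.0)


t_loop = timeit.timeit(lambda: inner_values_loop(s, K), number=3)
t_numpy = timeit.timeit(lambda: inner_values_numpy(s, K), number=3)
print("loop: %.4f s, NumPy: %.4f s" % (t_loop, t_numpy))
```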


FFT version of the CRR model (I)

To conclude this section, we apply the Fast Fourier Transform (FFT) algorithm to the binomial model.

Nowadays this numerical routine plays a central role in Derivatives Analytics and other areas of science. It is used regularly for plain vanilla option pricing in productive environments in investment banks or hedge funds. In general, however, it is not applied to a binomial model but the application in this case is straightforward and therefore a quick win for us. In this module (see next slide), Python iterations are all avoided; this is possible since for European options only the final payoffs are relevant. The speed advantage of this algorithm is again considerable: it is 30 times faster than our advanced algorithm from before and 900 times faster than the naive version.


FFT version of the CRR model (II)

#
# Valuation of European Call Option in CRR1979 Model
# FFT Version
# d_CRR1979_FFT.py
#
from numpy import *
from numpy.fft import fft, ifft
from a_Parameters import *

# Array Generation for Index Levels
md = arange(t+1)
mu = resize(md[-1],t+1)
mu = u**(mu-md)
md = d**md
s = s0*mu*md

# Valuation by FFT
C_T = maximum(s-K,0)
Q = zeros(t+1,'float')
Q[0] = q
Q[1] = 1-q
l = sqrt(t+1)
v1 = ifft(C_T)*l
v2 = (sqrt(t+1)*fft(Q)/(l*(1+r*delta)))**t
C_0 = fft(v1*v2)/l

# Output
print "Value of European call option is ", real(C_0[0])


Monte Carlo Simulation (I)

Finally, we apply Monte Carlo simulation (MCS) to value the same European call option.

Here it is where pseudo-random numbers come into play. Similarly to the FFT algorithm we only care about the final index level at $T$ and simulate it with pseudo-random numbers. We get the following simple simulation algorithm:

consider the date of maturity $T$ and write

$$S_T = S_0 e^{\left(r - \frac{\sigma^2}{2}\right) T + \sigma \sqrt{T} w_T} \qquad (1)$$

start iterating $i = 1, 2, \ldots, I$
    draw a standard normally distributed pseudo-random number $w_T(i)$
    determine at $T$ the index level $S_T(i)$ by applying the number $w_T(i)$ to equation (1)
    determine the inner value of the call at $T$ as $\max[S_T(i) - K, 0]$
iterate until $i = I$
sum up all inner values at $T$, take the average and discount back to $t = 0$:

$$C_0(K, T) \approx e^{-rT} \frac{1}{I} \sum_{i=1}^{I} \max[S_T(i) - K, 0]$$

this is the MCS estimator for the European call option value

Monte Carlo Simulation (II)

Although the word `iterating' sounds like looping over arrays, we can again avoid array loops completely on the Python level.

The Python/NumPy implementation is really compact: only 5 lines of code for the core algorithm. With another 5 lines we can produce a histogram of the index levels at $T$ as displayed in the respective figure.

#
# Valuation of European Call Option via Monte Carlo Simulation
# g_MCS.py
#
from numpy.random import *
from matplotlib.pyplot import *
from a_Parameters import *
from numpy import *

# Valuation via MCS
paths = 500000
rand = standard_normal(paths)
sT = s0*exp((r-0.5*vola**2)*T+sqrt(T)*vola*rand)
pv = sum(maximum(sT-K,0)*exp(-r*T))/paths
print "Value of European call option is ", pv

# Graphical Analysis
figure()
hist(sT,100)
xlabel('index level at T')
ylabel('frequency')
show()
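A Monte Carlo estimate is itself a random quantity, so it helps to report its standard error alongside the value. The sketch below (Python 3 syntax) repeats the valuation with the parameter values of a_Parameters.py and adds such an error measure; the seeded generator, the seed and the names `discounted` and `std_error` are assumptions of this sketch, and the histogram part is left out.

```python
import numpy as np

# Parameters as in a_Parameters.py
s0, K, T, r, vola = 105.0, 100.0, 1.0, 0.05, 0.25
paths = 500000

rng = np.random.default_rng(seed=2012)    # seed chosen arbitrarily
w = rng.standard_normal(paths)

# Index level at T according to equation (1)
sT = s0 * np.exp((r - 0.5 * vola ** 2) * T + vola * np.sqrt(T) * w)

# Discounted inner values; their mean is the MCS estimator
discounted = np.exp(-r * T) * np.maximum(sT - K, 0.0)
pv = discounted.mean()

# Standard error of the estimator (sample standard deviation / sqrt(I))
std_error = discounted.std(ddof=1) / np.sqrt(paths)

print("Value of European call option is", pv, "+/-", std_error)
```

The standard error shrinks with the square root of the number of paths, so halving it requires four times as many simulated index levels.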

Monte Carlo Simulation (III)

Figure: Histogram of simulated stock index levels at T (x-axis: index level at T, y-axis: frequency)



Monte Carlo Simulation (IV)

The algorithm produces quite an accurate estimate for the European call option value although the implementation is rather simplistic (i.e. there are, for example, no variance reduction techniques involved):

>>> ================================ RESTART ================================
>>>
Value of European call option is  15.6306695905
>>>

Y. Hilpisch (Visixion GmbH)

Python for Finance

EuroPython, Florence, 2012

43 / 96

A fundamental class in pandas is the Series class (I)

Time Series Analysis with pandas

Series class

The Series class is explicitly designed to handle indexed (time) series [1]
If s is a Series object, s.index gives its index
A simple example is s=Series([1,2,3,4,5],index=['a','b','c','d','e'])

In [16]: s=Series([1,2,3,4,5],index=['a','b','c','d','e'])

In [17]: s
Out[17]:
a    1
b    2
c    3
d    4
e    5

In [18]: s.index
Out[18]: Index([a, b, c, d, e], dtype=object)

In [19]: s.mean()
Out[19]: 3.0

There are lots of useful methods in the Series class ...

[1] The major pandas source is http://pandas.pydata.org

A fundamental class in pandas is the Series class (II)

A major strength of pandas is the handling of time series data, i.e. data indexed by dates and times
A simple example using the DateRange function shall illustrate the time series management

In [3]: x=standard_normal(250)

In [4]: index=DateRange('01/01/2012',periods=len(x))

In [5]: s=Series(x,index=index)

In [6]: s
Out[6]:
2012-01-02     1.06959238875
2012-01-03    0.794515407245
2012-01-04    -1.01590534404
2012-01-05   -0.751618588824
...

Another fundamental class in pandas is the DataFrame class

Time Series Analysis with pandas

DataFrame class

This class's intellectual father is the data.frame class from the statistical language/package R
The DataFrame class is explicitly designed to handle multiple, maybe hierarchically indexed (time) series
The following example illustrates some convenient features of the DataFrame class, i.e. data alignment and handling of missing data

In [35]: s=Series(standard_normal(4),index=['1','2','3','5'])

In [36]: t=Series(standard_normal(4),index=['1','2','3','4'])

In [37]: df=DataFrame({'s':s,'t':t})

In [38]: df['SUM']=df['s']+df['t']

In [39]: print df.to_string()
           s          t        SUM
1  -0.125697   0.016357  -0.109340
2   0.135457  -0.907421  -0.771964
3   1.549149  -0.599659   0.949491
4        NaN   0.734753        NaN
5  -1.236310        NaN        NaN

In [40]: df['SUM'].mean()
Out[40]: 0.022728863312009556
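What alignment and missing-data handling mean in the session above can be reproduced with plain dictionaries. The sketch below is an illustration only (it is not how pandas is implemented); it reuses the rounded values printed on the slide, so the mean agrees with Out[40] only up to rounding.

```python
import math

# Rounded values copied from the slide output above
s = {'1': -0.125697, '2': 0.135457, '3': 1.549149, '5': -1.236310}
t = {'1': 0.016357, '2': -0.907421, '3': -0.599659, '4': 0.734753}

nan = float('nan')
index = sorted(set(s) | set(t))  # union of both indexes
# a label missing in one series yields NaN in the sum
total = {k: s.get(k, nan) + t.get(k, nan) for k in index}

# NaN entries are skipped when averaging, which is pandas' default behaviour
valid = [v for v in total.values() if not math.isnan(v)]
mean_sum = sum(valid) / len(valid)
print(index, mean_sum)
```

Labels '4' and '5' each appear in only one of the two series, which is why their SUM is NaN and why the mean is taken over three values only.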

The two main pandas classes have methods for easy plotting

Time Series Analysis with pandas

Plotting with pandas

The Series and DataFrame classes have methods to easily generate plots
The two major methods are plot and hist
Again, an example shall illustrate the usage of the methods

In [54]: index=DateRange(start='1/1/2013',periods=250)

In [55]: x=standard_normal(250)

In [56]: y=standard_normal(250)

In [57]: df=DataFrame({'x':x,'y':y},index=index)

In [58]: df.cumsum().plot()
Out[58]: <matplotlib.axes.AxesSubplot at 0x3082c10>

In [59]: df['x'].hist()
Out[59]: <matplotlib.axes.AxesSubplot at 0x3468190>

The results of which can then be saved for further use

Figure: Example plots with pandas

To begin with, we want to use pandas with dummy data

Time Series Analysis with pandas

Exercise: pandas in Action

1. We generate three series with 100 standard normal (pseudo-)random numbers (array of size (100, 3)).
2. We use the array to fill a DataFrame object, generate an index starting on 01 Jan 2013 with 30-day steps and give the three series the names ['A','B','C'].
3. We then generate a 4th series with name CUMSUM which contains the cumulative sum of the three other series.
4. We plot the 4th series with the built-in method plot.
5. We also generate a histogram for the 3rd series with matplotlib and set bins=20.
6. We save the histogram as PDF file.

Case Study: Analyzing Stock Quotes with pandas

data gathering: read historical quotes of the DAX index (ticker ^GDAXI) beginning with 01 January 2000 from finance.yahoo.com and store it in a pandas DataFrame object
data analysis: calculate the daily log returns (use the shift method of the Series object) and generate a new column with the log returns in the pandas DataFrame object; calculate 252 day rolling means and standard deviations of the log returns as well as their rolling correlations and generate respective columns
plotting: plot the log returns together with the daily DAX quotes into a single figure; plot in another figure the rolling means and the standard deviations of the log returns as well as their correlation
data storage: save the pandas DataFrame to a PyTables/HDF5 database (use the HDFStore function)

1. Data Gathering

#
# Analysis of Historical Stock Data
# with pandas
# RFE_Data.py
#
# (c) Visixion GmbH
# Script for Illustration Purposes Only.
#
from pylab import *

# 1. Data Gathering
from pandas.io.data import *
DAX = DataReader('^GDAXI', 'yahoo', start='01/01/2000')

2. Data Analysis (I)

# 2. Data Analysis
from pandas import *
DAX['Ret'] = log(DAX['Close'] / DAX['Close'].shift(1))
DAX['rMe'] = rolling_mean(DAX['Ret'], 252) * 252
DAX['rSD'] = rolling_std(DAX['Ret'], 252) * sqrt(252)
DAX['Cor'] = rolling_corr(DAX['rMe'], DAX['rSD'], 252)
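To make the calculations behind the Ret and rMe columns concrete, here is a small pure-Python sketch of the shift-based log return and a trailing-window mean. The helper names log_returns and trailing_mean are hypothetical, the window of 2 is chosen only to keep the example short (the script uses 252), and the closing prices are the first five values of the table in this case study.

```python
import math


def log_returns(closes):
    """Daily log returns; the first entry is None because there is no previous close."""
    rets = [None]
    for prev, cur in zip(closes, closes[1:]):
        rets.append(math.log(cur / prev))
    return rets


def trailing_mean(values, window):
    """Mean over the trailing window; None until a full window of numbers is available."""
    out = []
    for i in range(len(values)):
        chunk = values[i - window + 1:i + 1] if i + 1 >= window else []
        if len(chunk) == window and all(v is not None for v in chunk):
            out.append(sum(chunk) / window)
        else:
            out.append(None)
    return out


closes = [6264.38, 6050.29, 5978.23, 5969.40, 6093.99]
rets = log_returns(closes)
rme = trailing_mean(rets, 2)  # the script uses a window of 252 trading days
print(rets[1], rme)
```

In the script, the rolling mean is multiplied by 252 and the rolling standard deviation by sqrt(252) to express both on an annual basis.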


2. Data Analysis (II) [2]

print DAX.ix[-20:].to_string()

Date           Open     High      Low    Close    Volume  Adj Close        Ret        rMe       rSD        Cor
2012-05-31  6297.68  6322.69  6208.09  6264.38  33014600    6264.38  -0.002618  -0.125673  0.302519  -0.698396
2012-06-01  6259.76  6259.76  6008.47  6050.29  42856100    6050.29  -0.034773  -0.154371  0.304405  -0.695765
2012-06-04  5976.46  6030.81  5942.38  5978.23  23699300    5978.23  -0.011982  -0.180338  0.304262  -0.692978
2012-06-05  5999.86  6011.56  5914.43  5969.40  22355900    5969.40  -0.001478  -0.169200  0.304029  -0.690525
2012-06-06  6028.36  6102.42  5996.41  6093.99  32200300    6093.99   0.020657  -0.150697  0.304764  -0.687612
2012-06-07  6117.76  6230.22  6099.08  6144.22  28859800    6144.22   0.008209  -0.159234  0.304395  -0.684112
2012-06-08  6082.63  6144.76  6053.95  6130.82  22742300    6130.82  -0.002183  -0.148888  0.304165  -0.680271
2012-06-11  6255.65  6287.54  6130.28  6141.05  29749700    6141.05   0.001667  -0.146535  0.304173  -0.676090
2012-06-12  6141.92  6211.14  6083.81  6161.24  28227200    6161.24   0.003282  -0.150797  0.304089  -0.671883
2012-06-13  6183.80  6221.36  6093.61  6152.49  28021500    6152.49  -0.001421  -0.150285  0.304087  -0.666613
2012-06-14  6146.92  6167.49  6078.22  6138.61  29461700    6138.61  -0.002259  -0.171289  0.303470  -0.661152
2012-06-15  6164.56  6251.59  6158.78  6229.41  70434200    6229.41   0.014683  -0.155601  0.303859  -0.654909
2012-06-18  6304.77  6316.14  6221.87  6248.20  28946700    6248.20   0.003012  -0.134741  0.303387  -0.648260
2012-06-19  6254.77  6375.27  6233.25  6363.36  25250900    6363.36   0.018263  -0.112545  0.303949  -0.640783
2012-06-20  6364.06  6402.21  6333.97  6392.13  22461300    6392.13   0.004511  -0.106139  0.303985  -0.633014
2012-06-21  6357.25  6427.49  6331.79  6343.13  30737700    6343.13  -0.007695  -0.122593  0.303932  -0.625235
2012-06-22  6273.10  6318.06  6256.34  6263.25  25903100    6263.25  -0.012673  -0.152372  0.303660  -0.617441
2012-06-25  6229.43  6229.43  6118.72  6132.39  25886800    6132.39  -0.021115  -0.184679  0.304118  -0.609470
2012-06-26  6157.84  6165.28  6109.93  6136.69  25550800    6136.69   0.000701  -0.189818  0.304050  -0.600965
2012-06-27  6155.91  6230.51  6131.30  6228.99  25213500    6228.99   0.014929  -0.178054  0.304430  -0.592066

[2] Source: http://finance.yahoo.com, 24 June 2012

3. Plotting (I)

# 3. Plotting
figure()
subplot(211)
DAX['Close'].plot()
ylabel('Index Level')
subplot(212)
DAX['Ret'].plot()
ylabel('Log Returns')
DAX[['rMe', 'rSD', 'Cor']].plot()

3. Plotting (II) [3]

Figure: DAX index level (upper panel, y-axis: Index Level) and daily log returns (lower panel, y-axis: Log Returns) over time

[3] Source: http://finance.yahoo.com, 28 June 2012

3. Plotting (III) [4]

Figure: The columns Cor, rMe and rSD of the DAX DataFrame plotted over time

[4] Source: http://finance.yahoo.com, 28 June 2012

4. Data Storage (in HDF5 format)

# 4. Data Storage
h5file = HDFStore('DAX.h5')
h5file['DAX'] = DAX
h5file.close()

The whole Python script

#
# Analysis of Historical Stock Data with pandas
# RFE_Data.py
#
# (c) Visixion GmbH
# Script for Illustration Purposes Only.
#
from pylab import *

# 1. Data Gathering
from pandas.io.data import *
DAX = DataReader('^GDAXI', 'yahoo', start='01/01/2000')

# 2. Data Analysis
from pandas import *
DAX['Ret'] = log(DAX['Close'] / DAX['Close'].shift(1))
DAX['rMe'] = rolling_mean(DAX['Ret'], 252) * 252
DAX['rSD'] = rolling_std(DAX['Ret'], 252) * sqrt(252)
DAX['Cor'] = rolling_corr(DAX['rMe'], DAX['rSD'], 252)

# 3. Plotting
figure()
subplot(211); DAX['Close'].plot(); ylabel('Index Level')
subplot(212); DAX['Ret'].plot(); ylabel('Log Returns')
DAX[['rMe', 'rSD', 'Cor']].plot()

# 4. Data Storage
h5file = HDFStore('DAX.h5')
h5file['DAX'] = DAX
h5file.close()

PyTables: A Pythonic database [5]

Fast Data Mining with PyTables

"PyTables is a package for managing hierarchical datasets and designed to efficiently and easily cope with extremely large amounts of data."

"PyTables is built on top of the HDF5 library, using the Python language and the NumPy package. It features an object-oriented interface that, combined with C extensions for the performance-critical parts of the code ..., makes it a fast, yet extremely easy to use tool for interactively browse, process and search very large amounts of data."

"One important feature of PyTables is that it optimizes memory and disk resources so that data takes much less space (specially if on-flight compression is used) than other solutions such as relational or object oriented databases."

"One characteristic that sets PyTables apart from similar tools is its capability to perform extremely fast queries on your tables in order to facilitate as much as possible your main goal: get important information *out* of your datasets."

[5] All quotes from www.pytables.org

Some of the most important PyTables functions/methods

openFile: create new file or open existing file, like in h5=openFile('data.h5','w'); 'r'=read only, 'a'=read/write
.close(): close database, like in h5.close()
h5.createGroup: create a new group, as in group=h5.createGroup(root,'Name')
IsDescription: class for column descriptions of tables, used as in:
    class Row(IsDescription):
        name = StringCol(20,pos=1)
        data = FloatCol(pos=2)
h5.createTable: create new table, as in tab=h5.createTable(group,'Name',Row)
tab.iterrows(): iterate over table rows
tab.where('condition'): SQL-like queries with flexible conditions
tab.row: return current/last row of table, used as in r=tab.row
row.append(): append row to table, as in r.append()
tab.flush(): flush table buffer to disk/file
h5.createArray: create an array, as in arr=h5.createArray(group,'Name',zeros((10,5)))

Let's start with a simple example (I)

Fast Data Mining with PyTables

Introductory PyTables Example

In [59]: from tables import *

In [60]: h5=openFile('Test_Data.h5','w')

In [61]: class Row(IsDescription):
   ....:     number = FloatCol(pos=1)
   ....:     sqrt = FloatCol(pos=2)
   ....:

In [62]: tab=h5.createTable(h5.root,'Numbers',Row)

In [63]: tab
Out[63]:
/Numbers (Table(0,)) ''
  description := {
  "number": Float64Col(shape=(), dflt=0.0, pos=0),
  "sqrt": Float64Col(shape=(), dflt=0.0, pos=1)}
  byteorder := 'little'
  chunkshape := (512,)

In [64]: r=tab.row

In [65]: for x in range(1000):
   ....:     r['number']=x
   ....:     r['sqrt']=sqrt(x)
   ....:     r.append()
   ....:

Let's start with a simple example (II)

In [66]: tab
Out[66]:
/Numbers (Table(0,)) ''
  description := {
  "number": Float64Col(shape=(), dflt=0.0, pos=0),
  "sqrt": Float64Col(shape=(), dflt=0.0, pos=1)}
  byteorder := 'little'
  chunkshape := (512,)

In [67]: tab.flush()

In [68]: tab
Out[68]:
/Numbers (Table(1000,)) ''
  description := {
  "number": Float64Col(shape=(), dflt=0.0, pos=0),
  "sqrt": Float64Col(shape=(), dflt=0.0, pos=1)}
  byteorder := 'little'
  chunkshape := (512,)

In [69]: tab[:5]
Out[69]:
array([(0.0, 0.0), (1.0, 1.0), (2.0, 1.4142135623730951),
       (3.0, 1.7320508075688772), (4.0, 2.0)],
      dtype=[('number', '<f8'), ('sqrt', '<f8')])

Let's start with a simple example (III)

In [7]: h5=openFile('Test_Data.h5','a')

In [8]: h5
Out[8]:
File(filename=Test_Data.h5, title='', mode='a', rootUEP='/', filters=Filters(complevel=0, shuffle=False, fletcher32=False))
/ (RootGroup) ''
/Numbers (Table(1000,)) ''
  description := {
  "number": Float64Col(shape=(), dflt=0.0, pos=0),
  "sqrt": Float64Col(shape=(), dflt=0.0, pos=1)}
  byteorder := 'little'
  chunkshape := (512,)

In [9]: tab=h5.root.Numbers

In [10]: tab[:5]['sqrt']
Out[10]: array([ 0.        ,  1.        ,  1.41421356,  1.73205081,  2.        ])

In [11]: from pylab import *

In [12]: plot(tab[:]['sqrt'])
Out[12]: [<matplotlib.lines.Line2D at 0x7fe65cf12d10>]

In [13]: show()

We now want to implement another PyTables database

Fast Data Mining with PyTables

Exercise: PyTables with pandas

1. create a file: first create a PyTables database file
2. row description: by sub-classing from IsDescription define a table row with name Row containing a name, number and the square of that number
3. create group: create a group with name Tables
4. create table: in the group Tables create a table with name Numbers using the sub-class Row
5. populate table: by iteration generate 100,000 rows in the table Numbers; the name in the i-th row should be i-th number, the number is a random number and the square is the square of the random number
6. data analysis: determine the mean, median and standard deviation of both the number column and the square column; regress the square column against the number column using polyfit and polyval
7. visualization: generate a histogram of the square column and plot the cumulative sum of the number column
8. array: create a group named Arrays, create an array of size (1000,1000) in it, populate the array with random numbers, double each random number, save the array and close the file

And want to work with the data stored in that database

1. open the file: first open the previous PyTables database file
2. open the table: open the table with name Numbers
3. write a function: write a function that takes as input the starting row and the ending row and prints each row in between in the form name | number | square
4. transfer data: transfer the data stored in the table into a pandas DataFrame object
5. plot the data: plot the data in the DataFrame object in separate subplots
6. close the file: after completion, close the file

Background of this case study

Case Study: Simulation Research Project

Python lends itself pretty well to implement numerical research projects in fields such as
    physics
    engineering
    finance
    ...
Frequently, the results of such a project are to be documented and published in the form of a Latex document
This case study is about potential approaches to automate numerical research with Python, e.g. by
    storing results automatically to a PyTables database or
    automatically generating Latex table output
The example project is taken from finance: we compare valuation results for European call options from Monte Carlo simulations with their analytical values

Model Economy: Black-Scholes-Merton continuous time

Case Study: Simulation Research Project

The Financial Model

we consider a model economy with final date $T$, $0 < T < \infty$
uncertainty is represented by a filtered probability space $\{\Omega, \mathcal{F}, \mathbb{F}, P\}$
for $0 \le t \le T$ the risk-neutral index dynamics are given by the SDE

$$\frac{dS_t}{S_t} = r\,dt + \sigma\,dZ_t \qquad (2)$$

with $S_t$ index level at date $t$ with $0 < S_0 < \infty$, $r$ constant risk-less short rate, $\sigma$ constant volatility of the index and $Z_t$ standard Brownian motion
the process $S$ generates the filtration $\mathbb{F}$, i.e. $\mathcal{F}_t \equiv \mathcal{F}(S_{0 \le s \le t})$
a risk-less zero-coupon bond satisfies the DE

$$\frac{dB_t}{B_t} = r\,dt \qquad (3)$$

the time $t$ value of a zero-coupon bond paying one unit of currency at $T$ with $0 \le t < T$ is $B_t(T) = e^{-r(T-t)}$

Model Economy: Black-Scholes-Merton discrete time

to simulate the financial model, i.e. to generate numerical values for $S_t$, the SDE (2) has to be discretized
to this end, divide the given time interval $[0, T]$ in equidistant sub-intervals $\Delta t$ such that now $t \in \{0, \Delta t, 2\Delta t, ..., T\}$, i.e. there are $M + 1$ points in time with $M \equiv T/\Delta t$
a discrete version of the continuous time market model (2)-(3) is

$$\frac{S_t}{S_{t-\Delta t}} = e^{\left(r - \frac{\sigma^2}{2}\right)\Delta t + \sigma\sqrt{\Delta t}\, z_t} \qquad (4)$$

$$\frac{B_t}{B_{t-\Delta t}} = e^{r \Delta t} \qquad (5)$$

for $t \in \{\Delta t, ..., T\}$ and standard normally distributed $z_t$
this scheme is an Euler discretization of the log index level, which is known to be exact for the geometric Brownian motion (2)
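A single path under scheme (4) can be sketched in a few lines of standard-library Python. The function name simulate_path and the seed are hypothetical choices for illustration; the model parameters are those of the simulation study in this case study.

```python
import math
import random


def simulate_path(S0, r, sigma, T, M, rng):
    """Simulate one index level path with M time steps using scheme (4)."""
    dt = T / M
    path = [S0]
    for _ in range(M):
        z = rng.gauss(0.0, 1.0)  # standard normally distributed z_t
        growth = math.exp((r - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z)
        path.append(path[-1] * growth)  # S_t = S_{t - dt} * growth factor
    return path


path = simulate_path(S0=100.0, r=0.05, sigma=0.25, T=1.0, M=50, rng=random.Random(1))
print(len(path), path[-1])
```

With sigma set to zero the path grows deterministically at the short rate, which gives a simple sanity check of the drift term.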


European Options: risk-neutral valuation

by the Fundamental Theorem of Asset Pricing, the time $t$ value of an attainable/redundant and $\mathcal{F}_T$-measurable contingent claim $V_T \equiv h_T(S_T) \ge 0$ (satisfying suitable integrability conditions) is given by arbitrage as

$$V_t = E_t^Q\left(B_t(T) V_T\right)$$

with $V_0 = E_0^Q\left(B_0(T) V_T\right)$ as the important special case for valuation purposes
$E$ denotes the expectation operator [6] and $Q$ the unique risk-neutral probability measure equivalent to the real world measure $P$
the Black-Scholes-Merton (BSM) model (2)-(3) is known to be complete from which uniqueness of the risk-neutral measure $Q$ follows
the defining characteristic of $Q$ is that it makes the discounted index level process a martingale
in the BSM model, the present value of a call option is given by

$$V_0 = e^{-rT} E_0^Q\left(\max[S_T - K, 0]\right)$$

[6] $E_t^Q(\cdot)$ is short for the conditional expectation $E^Q(\cdot|\mathcal{F}_t)$

European Options: Monte Carlo valuation

assume risk-neutrality holds
simulate $I$ index level paths with $M + 1$ points in time leading to index level values $S_{t,i}$, $t \in \{0, ..., T\}$, $i \in \{1, ..., I\}$
for $t = T$ the option value is $V_{T,i} = h_T(S_{T,i})$ by arbitrage
for $t = 0$ calculate the MCS estimator

$$\hat{V}_0^{MCS} = e^{-rT} \frac{1}{I} \sum_{i=1}^{I} V_{T,i} \qquad (6)$$

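European Options: Analytical value

The time $t = 0$ analytical value of a European call option in the BSM model is

$$C(S, t = 0) = S_0 \cdot N(d_1) - K e^{-rT} \cdot N(d_2) \qquad (7)$$

with

$$N(d) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{d} e^{-\frac{1}{2}x^2} dx$$

$$d_1 = \frac{\log\frac{S_0}{K} + \left(r + \frac{\sigma^2}{2}\right)T}{\sigma\sqrt{T}}$$

$$d_2 = \frac{\log\frac{S_0}{K} + \left(r - \frac{\sigma^2}{2}\right)T}{\sigma\sqrt{T}}$$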
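Formula (7) translates directly into code. The sketch below uses only the standard library, with the normal distribution function expressed through math.erf; the function names norm_cdf and bsm_call_value are hypothetical and are not the BSM_Call function used in the training scripts, whose implementation is not shown in the slides.

```python
import math


def norm_cdf(d):
    """Cumulative distribution function N(d) of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(d / math.sqrt(2.0)))


def bsm_call_value(S0, K, T, r, sigma):
    """Time-0 value of a European call option according to formula (7)."""
    d1 = (math.log(S0 / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)  # equals (log(S0/K) + (r - sigma^2/2) T) / (sigma sqrt(T))
    return S0 * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)


# at-the-money option with the model parameters of the simulation study
value = bsm_call_value(S0=100.0, K=100.0, T=1.0, r=0.05, sigma=0.25)
print(value)
```

Such a function can serve as the benchmark against which Monte Carlo estimates are compared.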


A Monte Carlo simulation study

We set up a Monte Carlo simulation study for the valuation of European call options on a stock index
We want to evaluate the impact of different simulation configurations on the accuracy of the MCS estimator (6)
As benchmark we have available the analytical option value from formula (7)
As model parameters we choose: $S_0 = 100$, $\sigma = 0.25$, $r = 0.05$, $K \in \{90, 100, 110\}$, $T \in \{1/12, 1/2, 1\}$ for a total of 9 option values
As variance reduction techniques, we introduce moment matching and antithetic variates
We assume for the number of time intervals $M \in \{25, 50\}$ and the number of paths $I \in \{25000, 50000\}$
All in all, we get 4 x 2 x 2 = 16 different configurations for the simulation set-up (four on/off combinations of the two variance reduction techniques, two values for $M$ and two for $I$)
We say that a valuation is accurate if the valuation error is smaller than 1 percent or smaller than 1 cent
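The counts on this slide can be checked by enumerating the set-ups. The sketch below uses itertools.product; the variable names are illustrative and differ from those in the training scripts.

```python
from itertools import product

# (moment matching, antithetic paths) on/off combinations
variance_reduction = list(product([False, True], repeat=2))
time_steps = [25, 50]
paths = [25000, 50000]
strikes = [90, 100, 110]
maturities = [1.0 / 12, 1.0 / 2, 1.0]

configurations = list(product(variance_reduction, time_steps, paths))
options = list(product(maturities, strikes))
print(len(configurations), len(options))
```

With 10 simulation runs per configuration, as in the parameter list of the implementation, each configuration values 9 x 10 = 90 options, which is the number reported in the #Op column of the results table.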


Monte Carlo Simulation: Simulation parameters

Case Study: Simulation Research Project

Python and PyTables Implementation

# General Simulation Parameters
write = True
cL = [(False, False), (False, True), (True, False), (True, True)]
    # 1st = moMatch -- Random Number Correction (std + mean + drift)
    # 2nd = antiPaths -- Antithetic Paths for Variance Reduction
mL = [25, 50]  # Time Steps
iL = [25000, 50000]  # Number of Paths per Valuation
SEED = 100000  # Seed Value
R = 10  # Number of Simulation Runs
PY1 = 0.010  # Performance Yardstick 1: Abs. Error in Currency Units
PY2 = 0.010  # Performance Yardstick 2: Rel. Error in Decimals
tL = [1.0 / 12, 1.0 / 2, 1.0]  # Maturity List
kL = [90, 100, 110]  # Strike List

for c in cL:  # Variance Reduction Techniques
    moMatch, antiPaths = c
    for M in mL:  # Number of Time Steps
        for I in iL:  # Number of Paths
            ...
            # Name of the Simulation Setup
            name = ('Call_' + str(R) + '_' + str(M) + '_' + str(I / 1000) + '_'
                    + str(moMatch)[0] + str(antiPaths)[0] + '_'
                    + str(PY1 * 100) + '_' + str(PY2 * 100))
            seed(SEED)  # RNG seed value
            for i in range(R):  # Simulation Runs
                ...
                for T in tL:  # Times-to-Maturity
                    ...
                    for K in kL:  # Strikes
                        ...

Monte Carlo Simulation: Variance reduction

Matching of 1st and 2nd moment of random numbers:

# Function for Random Numbers
def RNG(M, I):
    if antiPaths == True:
        randh = standard_normal((M + 1, I / 2))
        rand = concatenate((randh, -randh), 1)
    else:
        rand = standard_normal((M + 1, I))
    if moMatch == True:
        rand = rand / std(rand)
        rand = rand - mean(rand)
    return rand

Matching of 1st moment for index level dynamics:

# Function for BSM Index Process
def eulerSLog(S0, vol, r):
    ran = RNG(M, I)
    sdt = sqrt(dt)
    S = zeros((M + 1, I), 'd')
    S[0, :] = log(S0)
    for t in range(1, M + 1, 1):
        S[t, :] += S[t - 1, :]
        S[t, :] += (r - vol ** 2 / 2) * dt
        S[t, :] += vol * ran[t] * sdt
        if moMatch == True:
            S[t, :] -= mean(vol * ran[t] * sdt)
    return exp(S)
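The effect of the two techniques on the random numbers themselves can be seen in a one-dimensional, standard-library sketch. The function name draw_numbers and the seed are hypothetical; the slide's RNG function works on a two-dimensional NumPy array instead. Subtracting the mean first and then dividing by the standard deviation, as done here, yields the same numbers as the slide's order of operations (divide first, then subtract the mean of the scaled numbers).

```python
import random
import statistics


def draw_numbers(n, anti_paths, mo_match, rng):
    """Draw n standard normal numbers with optional antithetic variates and moment matching."""
    if anti_paths:
        half = [rng.gauss(0.0, 1.0) for _ in range(n // 2)]
        numbers = half + [-x for x in half]  # mirror every draw
    else:
        numbers = [rng.gauss(0.0, 1.0) for _ in range(n)]
    if mo_match:
        mean = sum(numbers) / len(numbers)
        std = statistics.pstdev(numbers)
        # afterwards the sample has mean 0 and (population) standard deviation 1
        numbers = [(x - mean) / std for x in numbers]
    return numbers


numbers = draw_numbers(10000, anti_paths=True, mo_match=True, rng=random.Random(7))
print(sum(numbers) / len(numbers), statistics.pstdev(numbers))
```

Antithetic variates alone already force the sample mean to zero; moment matching additionally corrects the second moment.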


Monte Carlo Simulation: Evaluating accuracy

for c in cL:  # Variance Reduction Techniques
    moMatch, antiPaths = c
    for M in mL:  # Number of Time Steps
        for I in iL:  # Number of Paths
            ...
            seed(SEED)  # RNG seed value
            for i in range(R):  # Simulation Runs
                ...
                for T in tL:  # Times-to-Maturity
                    ...
                    for K in kL:  # Strikes
                        h = maximum(S[-1] - K, 0)  # Inner Value Vector
                        ## MCS Estimator
                        V0_MCS = exp(-r * T) * sum(h) / I
                        ## BSM Analytical Value
                        V0 = BSM_Call(S0, K, T, r, vol, 0)
                        ## Errors
                        diff = V0_MCS - V0
                        rdiff = diff / V0
                        absError.append(diff)
                        relError.append(rdiff * 100)
                        ...
                        if abs(diff) < PY1 or abs(diff) / V0 < PY2:
                            print "Accuracy ok!\n" + br
                            CORR = True
                        else:
                            print "Accuracy NOT ok!\n" + br
                            CORR = False; errors = errors + 1
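The accuracy criterion in the innermost loop can be isolated into a small helper, sketched here with the two performance yardsticks as default arguments. The function name is_accurate and the sample values are made up for illustration.

```python
def is_accurate(v_mcs, v_ana, abs_tol=0.01, rel_tol=0.01):
    """True if the absolute error is below abs_tol OR the relative error is below rel_tol."""
    diff = v_mcs - v_ana
    return abs(diff) < abs_tol or abs(diff) / v_ana < rel_tol


print(is_accurate(12.40, 12.336))  # large value: the relative yardstick decides
print(is_accurate(0.045, 0.040))   # small value: the absolute yardstick decides
print(is_accurate(0.052, 0.040))   # fails both yardsticks
```

Because the two conditions are combined with "or", options with small values are judged mainly by the absolute yardstick and options with large values mainly by the relative one.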


PyTables: Table records by sub-classing

#
# Creating a Database for Simulation Results
# (c) Visixion GmbH - Y. Hilpisch
# Script for illustration purposes only
#
from tables import *
from numpy import *

# Record to store set of simulation results
class SimResult(IsDescription):
    id_number = Int32Col(pos=1)
    sim_name = StringCol(32, pos=2)
    seed = Int32Col(pos=3)
    runs = Int32Col(pos=4)
    time_steps = Int32Col(pos=5)
    paths = Int32Col(pos=6)
    ...

# Record to store single simulation result
class ValResult(IsDescription):
    id_number = Int32Col(pos=1)
    sim_name = StringCol(32, pos=2)
    opt_T = Float32Col(pos=3)
    opt_K = Float32Col(pos=4)
    euro_ana = Float32Col(pos=5)
    euro_mcs = Float32Col(pos=6)
    correct = StringCol(8, pos=7)
    val_err_abs = Float32Col(pos=8)
    ...

PyTables: Generating hierarchical database

# Generate new hdf5 file for results storage
filename = "MCS_Results_Comp.h5"

def CreateFile(filename):
    h5file = openFile(filename, mode="w", title="BSM_MCS_Results")
        ## Open / Generate hdf5 file in "write" mode
    group = h5file.createGroup("/", 'results', 'Results')
        ## Create a group called "Results"
    h5file.createTable(group, 'Sim_Results', SimResult, "Simulation Results")
        ## In the group "Results":
        ## Create a table called "Simulation Results" with Record "SimResult"
    h5file.createTable(group, 'Val_Results', ValResult, "Valuation Results")
        ## Create a table called "Valuation Results" with Record "ValResult"
    h5file.close()

PyTables: Write results to table

# Fill the table with simulation results
def ResWrite(name, SEED, R, M, I, moMatch, antiPaths, l, atol, rtol,
             errors, absError, relError, t1, t2, d1, d2):
    h5file = openFile(filename, mode="a")
    table = h5file.root.results.Sim_Results
    simres = table.row
    idn = 1
    if len(table) > 0:
        for x in table.iterrows():
            idn = max(idn, x['id_number']); idn = idn + 1
    simres['id_number'] = idn
    simres['sim_name'] = name
    simres['seed'] = SEED
    simres['runs'] = R
    simres['time_steps'] = M
    simres['paths'] = I
    simres['mo_match'] = moMatch
    simres['anti_paths'] = antiPaths
    simres['opt_prices'] = l
    simres['abs_tol'] = atol
    simres['rel_tol'] = rtol
    simres['errors'] = errors
    simres['error_ratio'] = float(errors) / l
    simres['av_val_err'] = sum(array(absError)) / l
    ...
    simres['time_opt'] = (t2 - t1) / l
    simres['start_date'] = str(d1)
    simres['end_date'] = str(d2)
    simres.append()
    table.flush()
    h5file.close()

PyTables: Writing results to database during script execution

...
from MCS_Results_PyTables import *
...
write = True
...
for c in cL:  # Variance Reduction Techniques
    moMatch, antiPaths = c
    for M in mL:  # Number of Time Steps
        for I in iL:  # Number of Paths
            if write == True:
                h5file = openFile(filename, mode='a')
            ...
                        if write == True:
                            ValWrite(h5file, name, T, K, V0, V0_MCS, str(CORR),
                                     M, I, str(moMatch), str(antiPaths), datetime.now())
            ...
            if write == True:
                h5file.close()
            ...
            if write == True:
                ResWrite(name, SEED, R, M, I, str(moMatch), str(antiPaths), l,
                         PY1, PY2, errors, absError, relError, t1, t2, d1, d2)

PyTables: Print results in Latex table form

# Print simulation results (Latex table output)
def PrintTex(filename=filename, idl=0, idh=50):
    h5file = openFile(filename, mode="r")
    table = h5file.root.results.Sim_Results
    for simres in table.where('''idl <= id_number <= idh'''):
        print (str(simres['runs']) + ' & ' +
               str(simres['time_steps']) + ' & ' +
               str(simres['paths']) + ' & ' +
               str(simres['mo_match']) + ' & ' +
               str(simres['anti_paths']) + ' & ' +
               '%.3f' % simres['abs_tol'] + ' & ' +
               '%.3f' % simres['rel_tol'] + ' & ' +
               str(simres['opt_prices']) + ' & ' +
               str(simres['errors']) + ' & ' +
               '%.3f' % simres['av_val_err'] + ' & ' +
               '%.3f' % simres['time_opt'] + " \\tn")
    h5file.close()
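The row formatting used in PrintTex can be tried out without a database. In the following sketch a plain dictionary stands in for a table row; the function name tex_row is hypothetical and the sample values are those of the first row of the results table in this case study.

```python
def tex_row(res):
    """Format one simulation result (a dict keyed by column name) as a Latex table row."""
    cells = [
        str(res['runs']), str(res['time_steps']), str(res['paths']),
        str(res['mo_match']), str(res['anti_paths']),
        '%.3f' % res['abs_tol'], '%.3f' % res['rel_tol'],
        str(res['opt_prices']), str(res['errors']),
        '%.3f' % res['av_val_err'], '%.3f' % res['time_opt'],
    ]
    return ' & '.join(cells) + ' \\tn'  # \tn abbreviates \tabularnewline


row = tex_row({
    'runs': 10, 'time_steps': 25, 'paths': 25000,
    'mo_match': 'False', 'anti_paths': 'False',
    'abs_tol': 0.01, 'rel_tol': 0.01,
    'opt_prices': 90, 'errors': 22,
    'av_val_err': 0.013, 'time_opt': 0.059,
})
print(row)
```

Joining the cells with ' & ' keeps the column separators in one place, which makes it easier to add or remove columns later.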


Putting it all together: Press F5 and type PrintTex()

>>> PrintTex()
10 & 25 & 25000 & False & False & 0.010 & 0.010 & 90 & 22 & 0.013 & 0.059 \tn
10 & 25 & 50000 & False & False & 0.010 & 0.010 & 90 & 9 & -0.010 & 0.093 \tn
10 & 50 & 25000 & False & False & 0.010 & 0.010 & 90 & 8 & -0.005 & 0.088 \tn
10 & 50 & 50000 & False & False & 0.010 & 0.010 & 90 & 6 & -0.017 & 0.152 \tn
10 & 25 & 25000 & False & True & 0.010 & 0.010 & 90 & 10 & 0.002 & 0.051 \tn
10 & 25 & 50000 & False & True & 0.010 & 0.010 & 90 & 5 & 0.008 & 0.081 \tn
10 & 50 & 25000 & False & True & 0.010 & 0.010 & 90 & 12 & -0.005 & 0.078 \tn
10 & 50 & 50000 & False & True & 0.010 & 0.010 & 90 & 1 & -0.008 & 0.143 \tn
10 & 25 & 25000 & True & False & 0.010 & 0.010 & 90 & 5 & 0.010 & 0.066 \tn
10 & 25 & 50000 & True & False & 0.010 & 0.010 & 90 & 2 & -0.006 & 0.108 \tn
10 & 50 & 25000 & True & False & 0.010 & 0.010 & 90 & 3 & -0.008 & 0.104 \tn
10 & 50 & 50000 & True & False & 0.010 & 0.010 & 90 & 5 & -0.007 & 0.188 \tn
10 & 25 & 25000 & True & True & 0.010 & 0.010 & 90 & 13 & 0.001 & 0.061 \tn
10 & 25 & 50000 & True & True & 0.010 & 0.010 & 90 & 3 & 0.009 & 0.101 \tn
10 & 50 & 25000 & True & True & 0.010 & 0.010 & 90 & 11 & -0.004 & 0.100 \tn
10 & 50 & 50000 & True & True & 0.010 & 0.010 & 90 & 1 & -0.006 & 0.197 \tn

ResultsCopy & paste it to your Latex document

Case Study: Simulation Research Project

Python and PyTables Implementation

\newcommand{\tn}{\tabularnewline} \def\TMP{\begin{center}\begin{tabular}{c c c c c r r r r r r} \hline\hline $R$ & $M$ & $I$ & \textbf{MM} & \textbf{AP}& \textbf{ATol}& \textbf{RTol}& \textbf{\#Op} & \textbf{Err} & \textbf{AvEr} & \textbf{Sec/O.} \tn [0.5ex] % inserts table heading \hline % inserts single horizontal line 10 & 25 & 25000 & False & False & 0.010 & 0.010 & 90 & 22 & 0.013 & 0.059 \tn 10 & 25 & 50000 & False & False & 0.010 & 0.010 & 90 & 9 & -0.010 & 0.093 \tn 10 & 50 & 25000 & False & False & 0.010 & 0.010 & 90 & 8 & -0.005 & 0.088 \tn 10 & 50 & 50000 & False & False & 0.010 & 0.010 & 90 & 6 & -0.017 & 0.152 \tn 10 & 25 & 25000 & False & True & 0.010 & 0.010 & 90 & 10 & 0.002 & 0.051 \tn 10 & 25 & 50000 & False & True & 0.010 & 0.010 & 90 & 5 & 0.008 & 0.081 \tn 10 & 50 & 25000 & False & True & 0.010 & 0.010 & 90 & 12 & -0.005 & 0.078 \tn 10 & 50 & 50000 & False & True & 0.010 & 0.010 & 90 & 1 & -0.008 & 0.143 \tn 10 & 25 & 25000 & True & False & 0.010 & 0.010 & 90 & 5 & 0.010 & 0.066 \tn 10 & 25 & 50000 & True & False & 0.010 & 0.010 & 90 & 2 & -0.006 & 0.108 \tn 10 & 50 & 25000 & True & False & 0.010 & 0.010 & 90 & 3 & -0.008 & 0.104 \tn 10 & 50 & 50000 & True & False & 0.010 & 0.010 & 90 & 5 & -0.007 & 0.188 \tn 10 & 25 & 25000 & True & True & 0.010 & 0.010 & 90 & 13 & 0.001 & 0.061 \tn 10 & 25 & 50000 & True & True & 0.010 & 0.010 & 90 & 3 & 0.009 & 0.101 \tn 10 & 50 & 25000 & True & True & 0.010 & 0.010 & 90 & 11 & -0.004 & 0.100 \tn 10 & 50 & 50000 & True & True & 0.010 & 0.010 & 90 & 1 & -0.006 & 0.197 \tn \hline \hline %inserts double line \end{tabular}\end{center}} \newdimen\TMPsize\settowidth{\TMPsize}{\TMP} \begin{table}[h]\begin{center}\begin{minipage}{\TMPsize} \footnotesize{\caption[Results]{\label{tab:RESULTS_1}Simulation results ...}}} \vspace{0.3ex} % title of Table \TMP \end{minipage}\end{center}\end{table} $R$ = number of runs, $M$ = number of time intervals, ...

Results: the final product, a LaTeX table



Table: Simulation results for different configurations of the MCS algorithm and accuracy levels (performance yardsticks) of ATol = 0.01 and RTol = 0.01.

 R   M      I     MM     AP   ATol   RTol  #Op  Err    AvEr  Sec/O.
10  25  25000  False  False  0.010  0.010   90   22   0.013   0.059
10  25  50000  False  False  0.010  0.010   90    9  -0.010   0.093
10  50  25000  False  False  0.010  0.010   90    8  -0.005   0.088
10  50  50000  False  False  0.010  0.010   90    6  -0.017   0.152
10  25  25000  False   True  0.010  0.010   90   10   0.002   0.051
10  25  50000  False   True  0.010  0.010   90    5   0.008   0.081
10  50  25000  False   True  0.010  0.010   90   12  -0.005   0.078
10  50  50000  False   True  0.010  0.010   90    1  -0.008   0.143
10  25  25000   True  False  0.010  0.010   90    5   0.010   0.066
10  25  50000   True  False  0.010  0.010   90    2  -0.006   0.108
10  50  25000   True  False  0.010  0.010   90    3  -0.008   0.104
10  50  50000   True  False  0.010  0.010   90    5  -0.007   0.188
10  25  25000   True   True  0.010  0.010   90   13   0.001   0.061
10  25  50000   True   True  0.010  0.010   90    3   0.009   0.101
10  50  25000   True   True  0.010  0.010   90   11  -0.004   0.100
10  50  50000   True   True  0.010  0.010   90    1  -0.006   0.197

R = number of runs, M = number of time intervals, I = number of simulation paths, CV = control variates, MM = moment matching, AP = antithetic paths, ATol = absolute performance yardstick, RTol = relative performance yardstick, #Op = number of options, Err = number of errors, AvEr = average error in currency units, Sec/O. = seconds per option valuation.

Summary of the case study's insights



Python is powerful when it comes to setting up numerical research projects.

Frequently, the results of such projects are to be reported in the form of LaTeX documents.

Using an example from mathematical finance, we show how to automate report generation through some simple methods:
  - iterating over several lists
  - writing simulation results to a PyTables database
  - reading results from the database and printing strings in LaTeX format

Once you have set up the numerical study, you only need three steps to generate a LaTeX table with the results:
  - start your script by pressing F5
  - type PrintTex()
  - copy the output to your LaTeX document
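The three methods listed above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the code used for the study: a plain in-memory list stands in for the PyTables database, `dummy_valuation` is a hypothetical placeholder for the Monte Carlo valuation, and `format_tex_rows` is an illustrative name for the role that PrintTex() plays in the slides. The sketch is written for Python 3.

```python
import itertools

# Parameter lists to iterate over (values as they appear in the results table).
M_LIST = [25, 50]          # number of time intervals
I_LIST = [25000, 50000]    # number of simulation paths
MM_LIST = [False, True]    # moment matching off/on
AP_LIST = [False, True]    # antithetic paths off/on


def dummy_valuation(M, I, MM, AP):
    """Hypothetical placeholder; a real study would run the simulation here."""
    return {"errors": 0, "av_error": 0.0, "seconds": 0.0}


def run_study():
    """Iterate over all configurations and collect one result row for each."""
    results = []  # assumption: stands in for a PyTables table
    for MM, AP, M, I in itertools.product(MM_LIST, AP_LIST, M_LIST, I_LIST):
        row = {"M": M, "I": I, "MM": MM, "AP": AP}
        row.update(dummy_valuation(M, I, MM, AP))
        results.append(row)
    return results


def format_tex_rows(results):
    """Read the stored rows back and format them as LaTeX tabular lines."""
    lines = []
    for r in results:
        lines.append(
            "%d & %d & %s & %s & %d & %.3f & %.3f \\tabularnewline"
            % (r["M"], r["I"], r["MM"], r["AP"],
               r["errors"], r["av_error"], r["seconds"])
        )
    return "\n".join(lines)


results = run_study()
tex_rows = format_tex_rows(results)
print(tex_rows)
```

Nesting the parameter lists with `itertools.product` reproduces the row order of the table (MM outermost, then AP, M and I), and keeping the formatting step separate from the simulation step means the LaTeX output can be regenerated without re-running the study.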


Cython: introducing static typing for speed-ups

Speeding-up Code with Cython

Fundamentals about Cython

Like any other dynamically typed programming language, Python suffers from a lack of speed in comparison with statically typed languages such as C or C++.

Fortunately, the primary Python execution environment (CPython) is written in C, so it is possible to access external modules written in C. But it is not trivial to write the necessary C glue code. This problem is solved by Cython.

Cython is a programming language based on Python with extra syntax allowing for optional static type declarations.

The source code gets translated into optimized C/C++ code and compiled as Python extension modules, so Cython combines the best of two worlds: the very fast program execution of C and the high-level, object-oriented and fast programming of Python.


How to work with Cython


Write your Python and/or Cython code and save it with file extension .pyx.

Translate the code by using Distutils (the most flexible way) or pyximport (the easiest way).

Import the module just like a Python module.


Cython Examples (I)

Speeding-up Code with Cython

Example Code for Cython Use

Consider the integral $\int_0^2 f(x)\,dx$ with $f(x) = x^2 - x$. To approximate its value by a left Riemann sum (called the lower sum in the following) we can use the following Python functions, saved in a module called a_Integrate_Norm.py, say.

#
# Integration with Pure Python
# a_Integrate_Norm.py
#
import time


def f(x):
    # integrand f(x) = x^2 - x
    return x ** 2 - x


def integrate(a, b, N):
    # left Riemann sum of f over [a, b] with N equal subintervals
    t0 = time.time()
    s = 0
    dx = (b - a) / N
    for i in range(N):
        s += f(a + i * dx)
    t1 = time.time()
    # print(...) with a single argument runs under Python 2 and Python 3
    print("Approximate value is %s" % (s * dx))
    print("Computation made in : %.2f seconds" % (t1 - t0))


Cython Examples (II)

Now, we apply this code in the shell.

In [76]: from a_Integrate_Norm import *

In [77]: integrate(0,2.0,10000)
Approximate value is 0.66646668
Computation made in : 0.01 seconds

In [78]: integrate(0,2.0,100000)
Approximate value is 0.6666466668
Computation made in : 0.05 seconds

In [79]: integrate(0,2.0,1000000)
Approximate value is 0.666664666668
Computation made in : 0.40 seconds

In [80]: integrate(0,2.0,10000000)
Approximate value is 0.666666466667
Computation made in : 3.83 seconds
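As a cross-check on the figures above: the exact value of the integral is 2/3, and for this particular integrand the left Riemann sum on [0, 2] works out by hand to 2/3 - 2/N + 4/(3N^2), so the error shrinks roughly like 2/N. The following Python 3 sketch makes that comparison; the names `left_riemann_sum` and `closed_form` are illustrative and not part of the original module, and the function returns the sum instead of printing it.

```python
def f(x):
    """Integrand f(x) = x^2 - x."""
    return x ** 2 - x


def left_riemann_sum(func, a, b, N):
    """Approximate the integral of func over [a, b] with N left rectangles."""
    dx = (b - a) / N
    total = 0.0
    for i in range(N):
        total += func(a + i * dx)
    return total * dx


def closed_form(N):
    """Value of the left sum for f on [0, 2], derived by hand."""
    return 2.0 / 3.0 - 2.0 / N + 4.0 / (3.0 * N ** 2)


exact = 2.0 / 3.0
for N in (10000, 100000):
    approx = left_riemann_sum(f, 0.0, 2.0, N)
    # columns: N, numerical sum, deviation from 2/3, hand-derived sum
    print(N, approx, approx - exact, closed_form(N))
```

Each tenfold increase in N therefore buys about one more correct digit, which is why the run time matters for this simple scheme.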


Cython Examples (III)

Rename the module from a_Integrate_Norm.py to b_Integrate_Norm.pyx and switch to the shell. To import and compile Cython modules just type

In [81]: import pyximport

In [82]: pyximport.install()

After that, you can import and execute the functions as usual.

In [83]: from b_Integrate_Norm import *

In [84]: integrate(0,2.0,1000)
Approx. value is 0.664668
Computation made in : 0.00 seconds

In [85]: integrate(0,2.0,10000000)
Approx. value is 0.666666466667
Computation made in : 1.17 seconds

Merely compiling the unchanged pure Python code with Cython already reduces the run time by nearly 70% (1.17 instead of 3.83 seconds for 10,000,000 steps).


Cython Examples (IV)

Next, we apply static typing to the code.

#
# Integration with Static Typing in Cython
# c_Integrate_with_static_typing.pyx
#
import time


def f(double x):
    # the argument is typed, but f is still an ordinary Python function
    return x ** 2 - x


def integrate(double a, double b, int N):
    # statically typed loop counter and accumulators
    cdef int i
    cdef double s, dx
    t0 = time.time()
    s = 0
    dx = (b - a) / N
    for i in range(N):
        s += f(a + i * dx)
    t1 = time.time()
    print("Approximate value is %s" % (s * dx))
    print("Computation made in : %.2f seconds" % (t1 - t0))


Cython Examples (V)

In the shell, import and execute the code as above.

In [93]: from c_Integrate_with_static_typing import *

In [94]: integrate(0,2.0,10000)
Approximate value is 0.66646668
Computation made in : 0.00 seconds

In [95]: integrate(0,2.0,10000000)
Approximate value is 0.666666466667
Computation made in : 0.98 seconds

Static typing of the arguments and local variables reduces the run time further, from 1.17 to 0.98 seconds (roughly 16%). The gain is modest because f is still called as an ordinary Python function in every loop iteration.


Cython Examples (VI)

The return value of a function also has a type, so it can be a good idea to define this type for the frequently called function f.

#
# Integration with Static Typing in Cython
# d_Integrate_with_static_typing_2.pyx
#
import time


cdef double f(double x):
    # C-level function: typed argument and typed return value
    return x ** 2 - x


def integrate(double a, double b, int N):
    # statically typed loop counter and accumulators
    cdef int i
    cdef double s, dx
    t0 = time.time()
    s = 0
    dx = (b - a) / N
    for i in range(N):
        s += f(a + i * dx)
    t1 = time.time()
    print("Approximate value is %s" % (s * dx))
    print("Computation made in : %.2f seconds" % (t1 - t0))


Cython Examples (VII)

This version reduces the calculation time to 0.05 seconds, a speed-up factor of about 75 compared to pure Python (3.83 seconds).

In [98]: import pyximport

In [99]: pyximport.install()

In [100]: from d_Integrate_with_static_typing_2 import *

In [101]: integrate(0,2.0,10000000)
Approximate value is 0.666666466667
Computation made in : 0.05 seconds

Caution: functions declared with cdef are no longer callable from Python code; they can only be called from within the Cython module. (Cython's cpdef keyword creates a function that is callable from both sides.)
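The run times reported in the sessions above can be put side by side with a few lines of arithmetic. The following Python 3 sketch uses only the four timings shown on the slides for N = 10,000,000; the labels and variable names are illustrative.

```python
# Run times in seconds for N = 10,000,000, as reported in the sessions.
timings = [
    ("pure Python", 3.83),
    ("Cython, unchanged code", 1.17),
    ("Cython, typed variables", 0.98),
    ("Cython, cdef function", 0.05),
]

baseline = timings[0][1]
previous = baseline
summary = []
for label, seconds in timings:
    speedup = baseline / seconds          # factor relative to pure Python
    step_cut = 1.0 - seconds / previous   # reduction relative to previous step
    summary.append((label, speedup, step_cut))
    print("%-24s %5.2f s   x%5.1f   step reduction %5.1f%%"
          % (label, seconds, speedup, 100.0 * step_cut))
    previous = seconds
```

Most of the gain comes from the last step: once f is a C-level function, the loop body no longer goes through Python function calls.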


Typical Cython development cycle

Typically, one applies a development process described roughly as follows:
  - write your Python code
  - fix it and optimize it (e.g. with respect to NumPy vectorization)
  - profile your program to identify the most time-consuming parts
  - re-write these parts with Cython extensions to speed them up
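The profiling step of this cycle can be sketched with the standard library's cProfile and pstats modules. The example below is a minimal illustration in Python 3, not part of the original training material: it reuses the integration example, lets `integrate` return its value instead of printing it, and introduces `profile_call` as a hypothetical helper name.

```python
import cProfile
import io
import pstats


def f(x):
    """Integrand f(x) = x^2 - x."""
    return x ** 2 - x


def integrate(a, b, N):
    """Left Riemann sum of f over [a, b] with N equal subintervals."""
    s = 0.0
    dx = (b - a) / N
    for i in range(N):
        s += f(a + i * dx)
    return s * dx


def profile_call(func, *args):
    """Run func(*args) under cProfile; return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # show the five entries with the largest cumulative time
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


value, report = profile_call(integrate, 0.0, 2.0, 100000)
print(report)
```

In the report, the call count and cumulative time of f show that the per-iteration function call dominates the run time, which makes f and the loop around it the natural candidates for a Cython rewrite.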


Conclusions

Conclusion

1. Python is powerful and multi-purpose
2. For Finance, it offers numerous really helpful libraries
3. There are a number of good development tools available
4. It allows high productivity levels, for lone warriors as well as for teams
5. It is easy to maintain: compact, readable code
6. It is compact and nevertheless quite fast (when done right)
7. It is low/no cost and future-proof
8. It is fun to work with, be it in Finance or any other area


Contact
Dr. Yves J. Hilpisch
Visixion GmbH
Rathausstrasse 75-79
66333 Voelklingen
Germany

www.visixion.com
www.dxevo.com
www.dexision.com

E contact@visixion.com
T +49 6898 932350
F +49 6898 932352
