Introduction to Numerical Computing with NumPy
Enthought, Inc.
www.enthought.com
(c) 2001-2017, Enthought, Inc.
Enthought, Inc.
515 Congress Avenue
Suite 2100
Austin, TX 78701
www.enthought.com
Q2-2017
letter
1.0.0
About Enthought
NumPy
An interlude: Matplotlib basics
Introducing NumPy Arrays
Multi-Dimensional Arrays
Slicing/Indexing Arrays
Fancy Indexing
Creating arrays
Array Creation Functions
Structured Arrays
Array Calculation Methods
Array Broadcasting
Universal Function Methods
The array data structure
>>> import matplotlib.pyplot as plt
>>> import numpy as np
>>> t = np.linspace(0,2*np.pi,
... 50)
>>> x = np.sin(t)
>>> y = np.cos(t)
subplot
>>> plt.subplot(2, 1, 1)
>>> plt.plot(x)
>>> a = np.array([0, 1, 2, 3])
>>> a
array([0, 1, 2, 3])
# Shape returns a tuple listing the length of the
# array along each dimension.
>>> a.shape
(4,)
>>> type(a)
numpy.ndarray
>>> a.itemsize
4
>>> a.dtype
dtype('int32')
# Return the number of bytes used by the data portion of
# the array.
>>> a.nbytes
16
>>> a.ndim
1
# in-place operations
>>> x *= c
>>> x
array([ 0.,0.628,…,6.283])
[Figure: a is a 2 x 4 array with rows [0, 1, 2, 3] and [10, 11, 12, 13]; dim 0 runs down the rows and dim 1 across the columns. The figure also shows row 1 as [10, 11, 12, -1] after its last element is set to -1.]
>>> a.size    # 2 x 4 = 8
8
>>> a.ndim
2
>>> a[1]
array([10, 11, 12, -1])
var[lower:upper:step]
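A minimal sketch of the lower:upper:step syntax on a small 1-D array. The array name and values are illustrative assumptions, not taken from the slides.

```python
import numpy as np

a = np.arange(10, 15)      # illustrative data: 10, 11, 12, 13, 14

first_two = a[0:2]         # indices 0 and 1; the upper bound is excluded
every_other = a[::2]       # omitted bounds default to the ends of the array
last_two = a[-2:]          # negative indices count back from the end

print(first_two, every_other, last_two)
```

Omitting the step is the same as a step of 1, so a[0:2] and a[0:2:1] select the same elements.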
>>> a = np.arange(25).reshape(5, 5)
>>> a
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])
>>> a = np.array((0, 1, 2, 3, 4))
# changing b changed a!
>>> a
array([ 0, 1, 10, 3, 4])
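The snippet above relies on the fact that a slice is a view onto the original data. The sketch below spells that out; the particular slice a[2:4] is an assumption chosen to reproduce the result shown, since the slide's own definition of b did not survive.

```python
import numpy as np

a = np.array((0, 1, 2, 3, 4))

b = a[2:4]             # assumed slice: b is a view onto a's data, not a copy
b[0] = 10              # writes through to a[2]
print(a)               # [ 0  1 10  3  4]

c = a[2:4].copy()      # an explicit copy owns its own data
c[0] = -1              # a is unaffected by this assignment
```

Use .copy() whenever the slice must be modified independently of the original array.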
[Figure: a = [0, 10, 20, 30, 40, 50, 60, 70] with indices 0 to 7 (the last three positions are also labelled -3, -2, -1); y = [10, 20, 30].]
[Figure: a is a 6 x 6 array with rows [0 ... 5], [10 ... 15], [20 ... 25], [30 ... 35], [40 ... 45], [50 ... 55]; rows and columns are indexed 0 to 5.]
>>> a[[0, 1, 2, 3, 4],
...   [1, 2, 3, 4, 5]]
array([ 1, 12, 23, 34, 45])
>>> a[3:, [0, 2, 5]]
array([[30, 32, 35],
       [40, 42, 45],
       [50, 52, 55]])
>>> mask = np.array(
...     [1, 0, 1, 0, 0, 1],
...     dtype=bool)
>>> a[mask, 2]
array([ 2, 22, 52])
# Default to double precision
>>> a = np.array([0,1.0,2,3])
>>> a.dtype
dtype('float64')
>>> a.nbytes
32

>>> a = np.array([0,1,2,3],
...              dtype='uint8')
>>> a.dtype
dtype('uint8')
>>> a.nbytes
4
>>> a = np.array([0,1.,2,3],
...              dtype='float32')
>>> a.dtype
dtype('float32')
>>> np.arange(4)
array([0, 1, 2, 3])
>>> np.arange(0, 2*np.pi, np.pi/4)
array([ 0.000, 0.785, 1.571,
        2.356, 3.142, 3.927, 4.712,
        5.497])

# Be careful…
>>> np.arange(1.5, 2.1, 0.3)
array([ 1.5, 1.8, 2.1])

# dtype: float32
>>> np.ones((2, 3),
...         dtype='float32')
array([[ 1., 1., 1.],
       [ 1., 1., 1.]],
      dtype=float32)

# dtype: float64; zeros(3) is equivalent to zeros((3, ))
>>> np.zeros(3)
array([ 0., 0., 0.])
# Generate an n by n identity
# array. The default dtype is
# float64.
>>> a = np.identity(4)
>>> a
array([[ 1., 0., 0., 0.],
       [ 0., 1., 0., 0.],
       [ 0., 0., 1., 0.],
       [ 0., 0., 0., 1.]])
>>> a.dtype
dtype('float64')
>>> np.identity(4, dtype=int)
array([[ 1, 0, 0, 0],
       [ 0, 1, 0, 0],
       [ 0, 0, 1, 0],
       [ 0, 0, 0, 1]])

# empty(shape, dtype=float64,
#       order='C')
>>> a = np.empty(2)
>>> a
array([1.78021120e-306,
       6.95357225e-308])

# fill array with 5.0
>>> a.fill(5.0)
>>> a
array([5., 5.])

# alternative approach
# (slightly slower)
>>> a[:] = 4.0
>>> a
array([4., 4.])
# Generate N evenly spaced
# elements on a log scale
# between base**start and
# base**stop (default base=10).
>>> np.logspace(0, 1, 5)
array([1., 1.77, 3.16, 5.62, 10.])

# loadtxt() automatically
# generates an array from the
# txt file
>>> arr = np.loadtxt('Data.txt',
...     skiprows=1,
...     dtype=int, delimiter=",",
...     usecols = (0,1,2,4),
...     comments = "%")

# Save an array into a txt file
>>> np.savetxt('filename', arr)
“Structured” Arrays
>>> import numpy as np
# "Data structure" (dtype) that describes the fields and
# type of the items in each array element.
>>> particle_dtype = np.dtype([('mass','float32'),
... ('velocity', 'float32')])
# This must be a list of tuples.
>>> particles = np.array([(1,1), (1,2), (2,1), (1,3)],
...                      dtype=particle_dtype)
>>> print(particles)
[(1.0, 1.0) (1.0, 2.0) (2.0, 1.0) (1.0, 3.0)]
# Retrieve the mass for all particles through indexing.
>>> print(particles['mass'])
[ 1. 1. 2. 1.]
# Retrieve particle 0 through indexing.
>>> particles[0]
(1.0, 1.0)
# Sort particles in place, with velocity as the primary field and
# mass as the secondary field.
>>> particles.sort(order=('velocity','mass'))
>>> print(particles)
[(1.0, 1.0) (2.0, 1.0) (1.0, 2.0) (1.0, 3.0)]
# See demo/multitype_array/particle.py.
“Structured” Arrays
Elements of an array can be any fixed-size data structure!

name    char[10]
age     int
weight  double

Brad    33  135.0 | Jane    25  105.0 | John      47  225.0 | Fred  54  140.0
Henry   29  154.0 | George  61  202.0 | Brian     32  137.0 | Amy   27  187.0
Ron     19  188.0 | Susan   33  135.0 | Jennifer  18   88.0 | Jill  54  145.0

EXAMPLE
# structured data format
>>> fmt = np.dtype([('name', 'S10'),
...                 ('age', int),
...                 ('weight', float)
...                 ])
>>> a = np.empty((3,4), dtype=fmt)
>>> a.itemsize
22
>>> a['name'] = [['Brad', … ,'Jill']]
>>> a['age'] = [[33, … , 54]]
>>> a['weight'] = [[135, … , 145]]
>>> print(a)
[[('Brad', 33, 135.0)
  …
  ('Jill', 54, 145.0)]]
Nested Datatype
nested.dat
# Methods act on data stored
# in the array
>>> a = np.array([[1,2,3],
...               [4,5,6]])

# .sum() defaults to adding up
# all the values in an array.
>>> a.sum()
21

# supply the keyword axis to
# sum along the 0th axis
>>> a.sum(axis=0)
array([5, 7, 9])

# supply the keyword axis to
# sum along the last axis
>>> a.sum(axis=-1)
array([ 6, 15])
[Figure: 5 x 6 array, rows indexed 0 to 4 and columns 0 to 5]
       0    1    2    3    4    5
0    225  196  169  144  121  100
1     81   64   49   36   25   16
2      9    4    1    0    1    4
3      9   16   25   36   49   64
4     81  100  121  144  169  196
Array Broadcasting
NumPy arrays of different dimensionality can be combined in the same
expression. Arrays with fewer dimensions are broadcast to match the
larger arrays, without copying data. Broadcasting has two rules.
RULE 1: PREPEND ONES TO SMALLER ARRAYS’ SHAPE
>>> import numpy as np
>>> a = np.ones((3, 5)) # a.shape == (3, 5)
>>> b = np.ones((5,)) # b.shape == (5,)
>>> b.reshape(1, 5) # result is a (1,5)-shaped array.
>>> b[np.newaxis, :] # equivalent, more concise.
RULE 2: DIMENSIONS OF SIZE 1 ARE REPEATED WITHOUT COPYING
>>> c = a + b # c.shape == (3, 5)
# is logically equivalent to...
>>> tmp_b = b.reshape(1, 5)
>>> tmp_b_repeat = tmp_b.repeat(3, axis=0)
>>> c = a + tmp_b_repeat
# But broadcasting makes no copies of b's data!
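One way to see that no copy is made is to inspect the strides of the broadcast view. This is a minimal sketch that uses np.broadcast_to as a stand-in for what happens implicitly inside a + b; the values in b are illustrative.

```python
import numpy as np

a = np.ones((3, 5))
b = np.arange(5.0)                      # shape (5,), float64

# np.broadcast_to returns a read-only view with the broadcast shape.
b_view = np.broadcast_to(b, a.shape)    # shape (3, 5)

# A stride of 0 along axis 0 means every "row" reuses the same 5 values.
print(b_view.shape, b_view.strides)     # (3, 5) (0, 8)

c = a + b                               # same result as the explicit repeat
```

The zero stride is what "repeated without copying" means in practice: stepping along the broadcast axis does not move through memory at all.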
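Array Broadcasting
[Figure: three cases. A 4x3 array plus a 4x3 array needs no stretching; a 4x3 array plus a length-3 array stretches the second operand; a 4x1 array plus a length-3 array stretches both operands.]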
Broadcasting Rules
For broadcasting to occur, the trailing axes of the two arrays must either
have the same size or one of them must have size 1. Otherwise, a
"ValueError: shape mismatch: objects cannot be
broadcast to a single shape" exception is raised.
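[Figure: mismatch! Adding the 4x3 array with rows [0, 0, 0], [10, 10, 10], [20, 20, 20], [30, 30, 30] to the length-4 array [0, 1, 2, 3] fails because the trailing axes (3 and 4) differ.]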
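The mismatch case can be reproduced with a short sketch. The exact wording of the error message depends on the NumPy version, so the example only checks that a ValueError is raised.

```python
import numpy as np

a = np.zeros((4, 3))
b = np.arange(4)                  # shape (4,): trailing axes are 3 vs 4

try:
    a + b
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print("ValueError:", err)

# Adding an explicit length-1 trailing axis makes the shapes compatible.
fixed = a + b[:, np.newaxis]      # (4, 3) + (4, 1) -> (4, 3)
```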
Broadcasting in Action
[Figure: the column [0, 10, 20, 30] plus the row [0, 1, 2] gives the 4x3 array with rows [0, 1, 2], [10, 11, 12], [20, 21, 22], [30, 31, 32].]
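The figure's result can be reproduced as follows. The variable names are assumptions; the values are those shown in the figure.

```python
import numpy as np

a = np.array([0, 10, 20, 30])
b = np.array([0, 1, 2])

# a[:, np.newaxis] has shape (4, 1); b has shape (3,) and is treated as (1, 3).
y = a[:, np.newaxis] + b          # result has shape (4, 3)
print(y)
```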
Broadcasting’s Usefulness
Broadcasting can often be used to replace needless data replication inside a
NumPy array expression.
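As a rough illustration of the replication that broadcasting avoids, the sketch below centers the columns of a small array twice: once with an explicitly tiled array of means and once with broadcasting. The data and names are illustrative assumptions.

```python
import numpy as np

data = np.arange(12.0).reshape(4, 3)     # illustrative 4x3 data
col_means = data.mean(axis=0)            # shape (3,)

# Explicit replication: builds a full 4x3 array of means first.
centered_tiled = data - np.tile(col_means, (4, 1))

# Broadcasting: same result, with no intermediate replicated array.
centered = data - col_means
```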
Broadcasting Indices
Broadcasting can also be used to slice elements from different “depths” in
a 3-D (or any other shape) array. This is a very powerful feature of
indexing.
>>> yi, xi = np.meshgrid(np.arange(3), np.arange(3),
... sparse=True)
>>> zi = np.array([[0, 1, 2],
... [1, 1, 2],
... [2, 2, 2]])
>>> horizon = data_cube[xi, yi, zi]
[Figure: two panels, "Indices" and "Selected Data".]
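The snippet above does not define data_cube. The sketch below runs the same indexing on a hypothetical 3x3x3 stand-in, so that the selected values can be checked by hand.

```python
import numpy as np

# Hypothetical stand-in for data_cube: element [i, j, k] equals 9*i + 3*j + k.
data_cube = np.arange(27).reshape(3, 3, 3)

# With sparse=True, yi has shape (1, 3) and xi has shape (3, 1).
yi, xi = np.meshgrid(np.arange(3), np.arange(3), sparse=True)
zi = np.array([[0, 1, 2],
               [1, 1, 2],
               [2, 2, 2]])

# The three index arrays broadcast to (3, 3), so
# horizon[i, j] == data_cube[i, j, zi[i, j]].
horizon = data_cube[xi, yi, zi]
print(horizon)
```

Each entry of zi picks the "depth" to read at that (row, column) position, which is why the result is a 2-D surface cut out of the 3-D array.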
Universal Function Methods
The mathematical, comparative, logical, and bitwise
operators op that take two arguments (binary operators)
have special methods that operate on arrays:
op.reduce(a,axis=0)
op.accumulate(a,axis=0)
op.outer(a,b)
op.reduceat(a,indices)
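A minimal sketch of the four methods, using np.add and np.multiply in place of the generic op. The input values are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 4])

total = np.add.reduce(a)                   # 1 + 2 + 3 + 4
running = np.add.accumulate(a)             # running sums of a
table = np.multiply.outer(a[:3], a[:2])    # 3x2 table of pairwise products
chunks = np.add.reduceat(a, [0, 2])        # sums of a[0:2] and a[2:]

print(total, running, table, chunks, sep="\n")
```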
op.reduce()
[Figure: for the 4x3 array with rows [0, 1, 2], [10, 11, 12], [20, 21, 22], [30, 31, 32], summing along axis 0 gives [60, 64, 68] and summing along axis 1 gives [3, 33, 63, 93].]
op.accumulate()
op.reduceat(a,indices)
applies op to ranges in the 1-D array a
defined by the values in indices.
The resulting array has the same length
as indices.

For example:
y = add.reduceat(a, indices)

y[i] = \sum_{n = \mathrm{indices}[i]}^{\mathrm{indices}[i+1] - 1} a[n]

The last range runs from indices[-1] to the end of a.

For multidimensional arrays, reduceat() is applied along axis 0 by
default, the same default as reduce() and accumulate(). Pass axis=-1
to apply it along the last axis (sums within each row for 2-D arrays).

EXAMPLE
>>> a = np.array(
...     [0,10,20,30,40,50])
>>> indices = np.array([1,4])
>>> np.add.reduceat(a,indices)
array([60, 90])
op.outer()
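The array data structure
A 3 x 3 float64 array holding the values 0 to 8 is described by:
dtype    float64
ndim     2
shape    (3, 3)
strides  (24, 8)
data     pointer to the memory block 0 1 2 3 4 5 6 7 8
[Figure: the flat memory block 0 1 2 3 4 5 6 7 8 next to the Python view of it as rows [0, 1, 2], [3, 4, 5], [6, 7, 8].]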
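The fields listed for the 3 x 3 float64 array can be read back from a live array. This is a small sketch; the transpose is included as one illustration of how a view reuses the same memory with different strides.

```python
import numpy as np

a = np.arange(9, dtype=np.float64).reshape(3, 3)

# 8 bytes per element: 24 bytes to the next row, 8 bytes to the next column.
print(a.dtype, a.ndim, a.shape, a.strides)   # float64 2 (3, 3) (24, 8)

# The transpose is a view on the same memory: only shape and strides change.
t = a.T
print(t.strides)                             # (8, 24)
```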
>>> a = np.arange(6)
>>> b = a.reshape(2, 3)
[Figure: a and b share one memory block 0 1 2 3 4 5; a views it as a 1-D array and b as rows [0, 1, 2] and [3, 4, 5].]
[Figure: transposing the 2 x 3 array swaps its strides from (24, 8) to (8, 24), giving rows [0, 3], [1, 4], [2, 5] without moving any data.]
>>> a = np.array([[0,1,2],
...               [3,4,5]])

# Return a new array with a
# different shape (a view
# where possible)
>>> a.reshape(3,2)
array([[0, 1],
       [2, 3],
       [4, 5]])

# Reshape cannot change the
# number of elements in an
# array
>>> a.reshape(4,2)
ValueError: total size of new
array must be unchanged

>>> a = np.arange(6)
>>> a
array([0, 1, 2, 3, 4, 5])
>>> a.shape
(6,)

# Reshape array in-place to
# 2x3
>>> a.shape = (2,3)
>>> a
array([[0, 1, 2],
       [3, 4, 5]])
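The phrase "a view where possible" can be checked with np.shares_memory. The sketch below shows one case where reshape returns a view and one where it has to copy; the names and values are illustrative.

```python
import numpy as np

a = np.arange(6)
b = a.reshape(2, 3)        # contiguous data, so this is a view onto a
b[0, 0] = 99               # also changes a[0]

# The transpose is not laid out in row-major order, so flattening it
# cannot be expressed with strides alone and reshape makes a copy.
c = b.T.reshape(6)
```

When it matters whether the result is a view or a copy, test it with np.shares_memory instead of assuming.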
# Create a 2D array
>>> a = np.array([[0,1],
... [2,3]])