This tutorial is unfinished. The original authors were neither NumPy experts nor native English speakers, so it needs reviewing.
Quick Tour
NumPy is a Python library for working with multidimensional arrays. The main data type is an array. An array is a set of elements, all of the same type, indexed by a vector of nonnegative integers.
Arrays can be created in different ways:
>>> from numpy import *
>>> a = array( [ 10, 20, 30, 40 ] ) # create an array out of a list
>>> a
array([10, 20, 30, 40])
>>> b = arange( 4 ) # create an array from 0 to 3
>>> print(b)
[0 1 2 3]
and new arrays can be obtained by operating with existing arrays:
>>> c = a+sin(b) # elementwise operations
>>> c
array([ 10. , 20.84147098, 30.90929743, 40.14112001])
Arrays can have more than one dimension:
>>> x = ones( (3,4) )
>>> x
array([[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.]])
>>> x.shape # a tuple with the dimensions
(3, 4)
and you can change the dimensions of existing arrays:
>>> y = arange(12)
>>> y
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
>>> y.shape = 3,4 # does not modify the total number of elements
>>> y
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
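Assigning to shape, or calling reshape, only succeeds when the new shape keeps the same total number of elements. The minimal sketch below uses import numpy as np instead of the star import. It shows a successful change, lets NumPy infer one axis length from -1, and catches the error raised for an incompatible shape. The exact message text varies between NumPy versions.

```python
import numpy as np

y = np.arange(12)
y.shape = (3, 4)          # 3 * 4 == 12, so the shape changes in place
print(y.shape)            # (3, 4)

z = y.reshape(2, 6)       # reshape returns a new array object; y keeps shape (3, 4)
w = y.reshape(4, -1)      # -1 lets NumPy infer the length: 12 / 4 = 3
print(z.shape, w.shape)   # (2, 6) (4, 3)

try:
    y.shape = (5, 2)      # 5 * 2 == 10, not 12, so NumPy refuses
except ValueError as err:
    print("ValueError:", err)
```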
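It is possible to operate with arrays of different dimensions as long as their shapes are compatible. NumPy then stretches the smaller array across the larger one; this is called broadcasting.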
>>> 3*a # multiply each element of a by 3
array([ 30, 60, 90, 120])
>>> a+y # add a to each row of y
array([[10, 21, 32, 43],
[14, 25, 36, 47],
[18, 29, 40, 51]])
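Under NumPy's broadcasting rules, two shapes are compatible when, compared from the last axis backwards, each pair of lengths is either equal or contains a 1. A missing leading axis counts as length 1. The minimal sketch below rebuilds a and y from this section. The column array col is a hypothetical extra operand.

```python
import numpy as np

a = np.array([10, 20, 30, 40])          # shape (4,)
y = np.arange(12).reshape(3, 4)         # shape (3, 4)
col = np.array([[100], [200], [300]])   # shape (3, 1), illustrative only

print((a + y).shape)    # (3, 4): a is treated as (1, 4) and repeated for each row
print((y + col).shape)  # (3, 4): col is repeated for each of the 4 columns

b = np.array([1, 2, 3])                 # shape (3,): last axis 3 does not match 4
try:
    y + b
except ValueError as err:
    print("incompatible shapes:", err)
```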
Similarly to Python lists, arrays can be indexed, sliced and iterated over.
>>> a[2:4] = -7,-3 # modify last two elements of a
>>> for i in a: # iterate over a
...     print(i)
...
10
20
-7
-3
When indexing more than one dimension, indices are separated by commas.
>>> x[1,2] = 20
>>> x[1,:] # x's second row
array([ 1., 1., 20., 1.])
>>> x[0] = a # change first row of x
>>> x
array([[10., 20., -7., -3.],
[ 1., 1., 20., 1.],
[ 1., 1., 1., 1.]])
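Since x was created by ones, it holds float64 values, so integers assigned into it are converted to floats. The sketch below rebuilds the arrays from this section. It also shows that x[1, 2] and the chained form x[1][2] select the same element. The comma form is the usual one because it indexes in a single step.

```python
import numpy as np

a = np.array([10, 20, -7, -3])   # a after the slice assignment above
x = np.ones((3, 4))              # float64 by default
x[1, 2] = 20
x[0] = a                         # the integer row is converted to float64 on assignment

print(x.dtype)                   # float64
print(x[1, 2] == x[1][2])        # True: both select row 1, column 2
print(x[:, 2])                   # third column: -7, 20 and 1, as floats
```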
Prerequisites
Before reading this tutorial you should know a bit of Python. If this is not the case, or if you want to refresh your memory, take a look at the Python tutorial. In particular, you may wish to read up to section 6 (Modules).
You also need to have some software installed on your computer. At a minimum you need Python and NumPy, but there are some other tools that may be useful to you:
IPython is an enhanced interactive Python shell that is very convenient for exploring NumPy's features.
matplotlib will enable you to plot graphs.
SciPy provides many scientific routines that work on top of NumPy.
The Basics
NumPy's main object is the homogeneous multidimensional array. This is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. Typical examples of multidimensional arrays include vectors, matrices, images and spreadsheets.
By 'multidimensional', we mean that arrays can have several dimensions or axes. Because the word dimension is ambiguous, we use axis instead. The number of axes will often be called rank.
For example, the coordinates of a point in 3D space, [1, 2, 1], form an array of rank 1: it has one axis, and that axis has a length of 3. As another example, the array
[[ 1., 0., 0.],
[ 0., 1., 2.]]
is an array of rank 2 (it is two-dimensional). The first dimension (axis) has a length of 2, the second dimension has a length of 3. For more details see the NumPy Glossary.
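Both examples can be checked with the ndim and shape attributes described below. This is a minimal sketch:

```python
import numpy as np

point = np.array([1, 2, 1])        # coordinates of a point in 3D space
m = np.array([[1., 0., 0.],
              [0., 1., 2.]])

print(point.ndim, point.shape)     # 1 (3,): one axis of length 3
print(m.ndim, m.shape)             # 2 (2, 3): axis 0 has length 2, axis 1 has length 3
```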
The multidimensional array class is called ndarray. Note that this is not the same as the Standard Python Library class array.array, which only handles one-dimensional arrays. The most important attributes of an ndarray object are:
- ndarray.ndim
the number of axes (dimensions) of the array. In the Python world, the number of dimensions is often referred to as rank.
- ndarray.shape
the dimensions of the array. This is a tuple of integers indicating the size of the array in each dimension. For a matrix with n rows and m columns, shape will be (n,m). The length of the shape tuple is therefore the rank, or number of dimensions, ndim.
- ndarray.size
the total number of elements of the array. This is equal to the product of the elements of shape.
- ndarray.dtype
an object describing the type of the elements in the array. One can create or specify dtypes using standard Python types. In addition, NumPy provides types of its own, for example: bool_, character, int_, int8, int16, int32, int64, float_, float16, float32, float64, complex_, complex64, object_.
- ndarray.itemsize
the size in bytes of each element of the array. For example, an array of elements of type float64 has itemsize 8 (=64/8), while one of type int32 has itemsize 4 (=32/8). It is equivalent to ndarray.dtype.itemsize.
- ndarray.data
the buffer containing the actual elements of the array. Normally, we won't need to use this attribute, because we will access the elements in an array using indexing facilities.
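The relations between these attributes can be checked directly. The minimal sketch below uses an explicit dtype so that itemsize does not depend on the platform's default integer size. nbytes, which is size times itemsize, is an extra attribute not covered above.

```python
import numpy as np

a = np.zeros((2, 3, 4), dtype=np.int16)

assert a.ndim == len(a.shape) == 3          # ndim is the length of shape
assert a.size == 2 * 3 * 4                  # size is the product of shape
assert a.itemsize == a.dtype.itemsize == 2  # int16: 16 bits / 8 = 2 bytes
print(a.nbytes)                             # size * itemsize = 24 * 2 = 48 bytes
print(a.data.nbytes)                        # the raw buffer behind data has the same 48 bytes

for t in (np.int8, np.int32, np.float64, np.complex64):
    print(np.dtype(t).name, np.dtype(t).itemsize)   # itemsize = bits / 8
```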
An example
We define the following array:
>>> a = arange(10).reshape(2,5)
>>> a
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
We have just created an array object with the name a attached to it. The array a has several attributes, or properties. In Python, attributes of a specific object are denoted object_name.attribute. In our case:
a.shape is (2,5)
a.ndim is 2 (which is the length of a.shape)
a.size is 10
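a.dtype.name is int32 (this is platform-dependent: on many 64-bit systems the default integer type is int64)
a.itemsize is 4, which means that an int32 takes 4 bytes in memory (an int64 takes 8).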
You can check all these attributes just by typing them in the interactive session:
>>> a.shape
(2, 5)
>>> a.dtype.name
'int32'
And so on.
Array Creation
There are many ways to create arrays. For example, you can create an array from a regular Python list or tuple using the array function.
>>> a = array( [2,3,4] )
>>> a
array([2, 3, 4])
>>> type(a) # a is an object of the ndarray class
<type 'numpy.ndarray'>
array transforms sequences of sequences into two-dimensional arrays, sequences of sequences of sequences into three-dimensional arrays, and so on. The type of the resulting array is deduced from the type of the elements in the sequences.
>>> b = array( [ (1.5,2,3), (4,5,6) ] ) # this will be an array of floats
>>> b
array([[ 1.5, 2. , 3. ],
[ 4. , 5. , 6. ]])
Once we have an array we can take a look at its attributes:
>>> b.ndim # number of dimensions
2
>>> b.shape # the dimensions
(2, 3)
>>> b.dtype # the type (8 byte floats)
dtype('float64')
>>> b.itemsize # the size of each element, in bytes
8
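The nesting depth of the input determines the number of axes, and the element types determine the dtype. The minimal sketch below checks only the kind of the integer array, because the default integer size is platform-dependent.

```python
import numpy as np

ints = np.array([1, 2, 3])                                 # only ints: an integer dtype
mixed = np.array([(1.5, 2, 3), (4, 5, 6)])                 # one float makes the whole array float64
cube = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])      # three levels of nesting: 3 axes

print(ints.dtype.kind)          # 'i' (signed integer)
print(mixed.dtype)              # float64
print(cube.ndim, cube.shape)    # 3 (2, 2, 2)
```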
The type of the array can also be explicitly specified at creation time:
>>> c = array( [ [1,2], [3,4] ], dtype=complex )
>>> c
array([[ 1.+0.j, 2.+0.j],
[ 3.+0.j, 4.+0.j]])
Warning
A frequent error consists of calling array with multiple numeric arguments, rather than a single list of numbers as its argument:
>>> a = array(1,2,3,4) # WRONG
This does not do what we expect. The first argument of array must be the entire set of data, so it must be called like this:
>>> a = array([1,2,3,4]) # RIGHT
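In current NumPy versions the wrong call raises an exception instead of building an array. The extra positional arguments are taken as other parameters of array, such as dtype. The exact exception and message depend on the version, so this minimal sketch catches both TypeError and ValueError:

```python
import numpy as np

try:
    a = np.array(1, 2, 3, 4)    # WRONG: four separate arguments
except (TypeError, ValueError) as err:
    print("rejected:", type(err).__name__)

a = np.array([1, 2, 3, 4])      # RIGHT: one list holding all the data
print(a.shape)                  # (4,)
```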
The function array is not the only one that creates arrays. Often the elements of an array are not known in advance, but its size is, so a placeholder array is needed. NumPy offers several functions to create arrays with initial placeholder content. By default, the type of the created array is float64.
The function zeros creates an array full of zeros, and the function ones creates an array full of ones.
>>> zeros( (3,4) ) # the parameter specifies the shape
array([[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]])
>>> ones( (2,3,4), dtype=int16 ) # dtype can also be specified
array([[[ 1, 1, 1, 1],
[ 1, 1, 1, 1],
[ 1, 1, 1, 1]],
[[ 1, 1, 1, 1],
[ 1, 1, 1, 1],
[ 1, 1, 1, 1]]], dtype=int16)
The function empty creates an array without initializing it. Its initial content is unpredictable and depends on the state of the memory.
>>> empty( (2,3) )
array([[ 3.73603959e-262, 6.02658058e-154, 6.55490914e-260],
[ 5.30498948e-313, 3.14673309e-307, 1.00000000e+000]])
>>> empty( (2,3) ) # the content may change in different invocations
array([[ 3.14678735e-307, 6.02658058e-154, 6.55490914e-260],
[ 5.30498948e-313, 3.73603967e-262, 8.70018275e-313]])
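The default float64 type and the effect of the dtype argument can be checked directly. The sketch also uses zeros_like, which appears in the See also list of this section and creates an array with the same shape and type as an existing one. Because empty leaves memory uninitialized, only its shape and type are checked, never its values.

```python
import numpy as np

z = np.zeros((3, 4))
o = np.ones((2, 3, 4), dtype=np.int16)
e = np.empty((2, 3))                 # values are whatever was in memory: do not rely on them

print(z.dtype, o.dtype, e.dtype)     # float64 int16 float64
print(e.shape)                       # (2, 3)

zl = np.zeros_like(o)                # same shape and dtype as o, filled with zeros
print(zl.shape, zl.dtype)            # (2, 3, 4) int16
```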
To create sequences of numbers, NumPy provides a function, arange, that is analogous to range but returns arrays instead of lists:
>>> arange( 10, 30, 5 )
array([10, 15, 20, 25])
>>> arange( 0, 2, 0.3 ) # it accepts float arguments
array([ 0. , 0.3, 0.6, 0.9, 1.2, 1.5, 1.8])
When arange is used with floating point arguments, it is generally not possible to predict the number of elements obtained, because of finite floating point precision. For this reason, it is usually better to use the function linspace, which takes the number of elements that we want as an argument instead of the step:
>>> linspace( 0, 2, 9 ) # 9 numbers from 0 to 2
array([ 0. , 0.25, 0.5 , 0.75, 1. , 1.25, 1.5 , 1.75, 2. ])
>>> x = linspace( 0, 2*pi, 100 ) # useful to evaluate a function at many points
>>> f = sin(x)
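A classic illustration of the arange problem, assuming IEEE 754 double precision, uses a step of 0.1. The element count is rounded up from (1.3 - 1)/0.1, which evaluates to slightly more than 3, so the supposedly excluded endpoint appears. linspace avoids this because the count is given explicitly. Its retstep=True option also returns the step it used.

```python
import numpy as np

r = np.arange(1, 1.3, 0.1)       # the stop value 1.3 is supposed to be excluded...
print(len(r))                    # 4: the last element is about 1.3, not below it

s, step = np.linspace(1, 1.3, 4, retstep=True)   # exactly 4 points, both ends included
print(len(s), step)              # 4 and a step of about 0.1

x = np.linspace(0, 2 * np.pi, 100)
f = np.sin(x)                    # sin evaluated at 100 evenly spaced points
```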
- See also
array, zeros, zeros_like, ones, ones_like, empty, empty_like, arange, linspace, rand, randn, fromfunction, fromfile
Printing Arrays
When you print an array, NumPy displays it in a similar way to nested lists, but with the following layout:
- the last axis is printed from left to right,
- the second-to-last, from top to bottom,
- and the rest, also from top to bottom, with each slice separated from the next by an empty line.
One-dimensional arrays are then printed as rows, two-dimensional arrays as matrices and three-dimensional arrays as lists of matrices.
>>> a = arange(6) # 1d array
>>> print(a)
[0 1 2 3 4 5]
>>>
>>> b = arange(12).reshape(4,3) # 2d array
>>> print(b)
[[ 0 1 2]
[ 3 4 5]
[ 6 7 8]
[ 9 10 11]]
>>>
>>> c = arange(24).reshape(2,3,4) # 3d array
>>> print(c)
[[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]

[[12 13 14 15]
[16 17 18 19]
[20 21 22 23]]]
See below to get more details on reshape.
If an array is too large to be printed, NumPy automatically skips the central part of the array and only prints the corners:
>>> print(arange(10000))
[ 0 1 2 ..., 9997 9998 9999]
>>>
>>> print(arange(10000).reshape(100,100))
[[ 0 1 2 ..., 97 98 99]
[ 100 101 102 ..., 197 198 199]
[ 200 201 202 ..., 297 298 299]
...,
[9700 9701 9702 ..., 9797 9798 9799]
[9800 9801 9802 ..., 9897 9898 9899]
[9900 9901 9902 ..., 9997 9998 9999]]
To disable this behaviour and force NumPy to print the entire array, you can change the printing options using set_printoptions:
>>> import sys
>>> set_printoptions(threshold=sys.maxsize)
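The threshold is the total number of elements above which NumPy summarizes an array. Any value at least as large as the array size disables summarization, and sys.maxsize is a convenient choice (recent versions reject nan). To change the setting only temporarily, the np.printoptions context manager, available since NumPy 1.15, restores the previous options on exit. This is a minimal sketch:

```python
import sys
import numpy as np

big = np.arange(10000)
print("..." in str(big))            # True: the default threshold (1000 elements) is exceeded

with np.printoptions(threshold=sys.maxsize):
    full = str(big)                 # every element is printed inside this block
print("..." in full)                # False

print(np.get_printoptions()["threshold"])   # 1000 again: the default is restored
```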
Basic Operations
Arithmetic operators on arrays apply elementwise. A new array is created and filled with the result.
>>> a = array( [20,30,40,50] )
>>> b = arange( 4 )
>>> c = a-b
>>> c
array([20, 29, 38, 47])
>>> b**2
array([0, 1, 4, 9])
>>> 10*sin(a)
array([ 9.12945251, -9.88031624, 7.4511316 , -2.62374854])
>>> a<35
array([True, True, False, False], dtype=bool)
Note
Unlike in many matrix languages, the product operator * operates elementwise in NumPy arrays. The matrix product can be performed using the dot function or by creating matrix objects.
>>> A = array( [[1,1],
...             [0,1]] )
>>> B = array( [[2,0],
...             [3,4]] )
>>> A*B # elementwise product
array([[2, 0],
[0, 4]])
>>> dot(A,B) # matrix product
array([[5, 4],
[3, 4]])
>>> mat(A)*mat(B) # matrix product between matrix objects
matrix([[5, 4],
[3, 4]])
NumPy arrays are not specifically designed to represent matrices, but any kind of multidimensional data. This motivates the use of * for the elementwise product. To work with matrices easily, however, NumPy also offers a subclass of ndarray called matrix that behaves like a standard 2D matrix; see the matrix section of this tutorial.
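With plain arrays, Python 3.5 and later also provide the @ operator. For 2-D arrays it computes the same matrix product as dot and is equivalent to np.matmul. The minimal sketch below reproduces the A and B example of this section. Note that recent NumPy documentation discourages the matrix class in favour of regular arrays with @.

```python
import numpy as np

A = np.array([[1, 1],
              [0, 1]])
B = np.array([[2, 0],
              [3, 4]])

print(A * B)          # elementwise product: [[2, 0], [0, 4]]
print(A @ B)          # matrix product: [[5, 4], [3, 4]]
print(np.dot(A, B))   # the same matrix product via dot
```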
It is possible to perform some operations in place, so that no new array is created.
>>> a = ones((2,3), dtype=int)
>>> b = random.random((2,3))
>>> a *= 3
>>> a
array([[3, 3, 3],
[3, 3, 3]])
>>> b += a
>>> b
array([[ 3.69092703, 3.8324276 , 3.0114541 ],
[ 3.18679111, 3.3039349 , 3.37600289]])
>>> a += b.astype(int) # b must be converted to integer type explicitly; recent NumPy versions raise a TypeError for a += b
>>> a
array([[6, 6, 6],
[6, 6, 6]])
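In-place operators write into the existing array, so the array object and its dtype never change. The minimal sketch below shows what happens when a float result must be stored in an integer array. Current NumPy refuses the implicit conversion under its default same_kind casting rule. np.add with out=a and casting="unsafe" makes the truncation explicit, as an alternative to converting b first.

```python
import numpy as np

a = np.ones((2, 3), dtype=int)
b = np.random.random((2, 3)) + 3     # floats in [3, 4)
before = id(a)

a *= 3                               # done in place: same object, same integer dtype
assert id(a) == before

try:
    a += b                           # a float64 result cannot go into an int array implicitly
except TypeError as err:
    print("refused:", type(err).__name__)

np.add(a, b, out=a, casting="unsafe")   # explicit opt-in: the sums are truncated to integers
print(a)                                # every element is 6
```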
When operating with arrays of different types, the type of the resulting array corresponds to the more general or precise one.
>>> a = ones(3, dtype=int32)
>>> b = linspace(0,pi,3)
>>> b.dtype.name
'float64'
>>> c = a+b
>>> c
array([ 1. , 2.57079633, 4.14159265])
>>> c.dtype.name
'float64'
>>> d = exp(c*1j)
>>> d
array([ 0.54030231+0.84147098j, -0.84147098+0.54030231j,
-0.54030231-0.84147098j])
>>> d.dtype.name
'complex128'
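This is called upcasting, and it is governed by the following table. Each entry gives the result type of an operation between arrays of the row type and the column type. Since the table is symmetric, only the upper half is filled in. (float96 and complex192 are platform-dependent extended-precision types; on many 64-bit systems they are named float128 and complex256.)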
           | bool_      | uint8      | uint16     | uint32     | uint64     | int8       | int16      | int32      | int64      | float32    | float64    | float96    | complex64  | complex128 | complex192
bool_      | bool       | uint8      | uint16     | uint32     | uint64     | int8       | int16      | int32      | int64      | float32    | float64    | float96    | complex64  | complex128 | complex192
uint8      |            | uint8      | uint16     | uint32     | uint64     | int16      | int16      | int32      | int64      | float32    | float64    | float96    | complex64  | complex128 | complex192
uint16     |            |            | uint16     | uint32     | uint64     | int32      | int32      | int32      | int64      | float32    | float64    | float96    | complex64  | complex128 | complex192
uint32     |            |            |            | uint32     | uint64     | int64      | int64      | int64      | int64      | float64    | float64    | float96    | complex128 | complex128 | complex192
uint64     |            |            |            |            | uint64     | float64    | float64    | float64    | float64    | float64    | float64    | float96    | complex128 | complex128 | complex192
int8       |            |            |            |            |            | int8       | int16      | int32      | int64      | float32    | float64    | float96    | complex64  | complex128 | complex192
int16      |            |            |            |            |            |            | int16      | int32      | int64      | float32    | float64    | float96    | complex64  | complex128 | complex192
int32      |            |            |            |            |            |            |            | int32      | int64      | float64    | float64    | float96    | complex128 | complex128 | complex192
int64      |            |            |            |            |            |            |            |            | int64      | float64    | float64    | float96    | complex128 | complex128 | complex192
float32    |            |            |            |            |            |            |            |            |            | float32    | float64    | float96    | complex64  | complex128 | complex192
float64    |            |            |            |            |            |            |            |            |            |            | float64    | float96    | complex128 | complex128 | complex192
float96    |            |            |            |            |            |            |            |            |            |            |            | float96    | complex192 | complex192 | complex192
complex64  |            |            |            |            |            |            |            |            |            |            |            |            | complex64  | complex128 | complex192
complex128 |            |            |            |            |            |            |            |            |            |            |            |            |            | complex128 | complex192
complex192 |            |            |            |            |            |            |            |            |            |            |            |            |            |            | complex192
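Instead of reading the upcasting table, you can query the result type for two types with np.promote_types, or with np.result_type for actual arrays. The minimal sketch below checks a few table entries. It leaves out the extended-precision entries (float96, complex192) because their names depend on the platform.

```python
import numpy as np

# (row type, column type) -> expected entry from the upcasting table
pairs = {
    ("uint8", "int8"): "int16",
    ("uint16", "float32"): "float32",
    ("int32", "float32"): "float64",
    ("uint64", "int64"): "float64",
    ("int64", "complex64"): "complex128",
}

for (t1, t2), expected in pairs.items():
    result = np.promote_types(t1, t2)
    print(t1, "+", t2, "->", result)
    assert result == np.dtype(expected)

# the int32 + float64 example from earlier in this section
print(np.result_type(np.ones(3, dtype=np.int32), np.linspace(0, np.pi, 3)))  # float64
```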
Many unary operations, like computing the sum of all the elements in the array, are implemented as methods of the ndarray class.
>>> a = random.random((2,3))
>>> a
array([[ 0.6903007 , 0.39168346, 0.16524769],
[ 0.48819875, 0.77188505, 0.94792155]])
>>> a.sum()
3.4552372100521485
>>> a.min()
0.16524768654743593
>>> a.max()
0.9479215542670073
By default, these operations apply to the array as if it were a list of numbers, regardless of its shape. However, by specifying the axis parameter you can apply an operation along the specified axis of an array:
>>> b = arange(12).reshape(3,4)
>>> b
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>>
>>> b.sum(axis=0) # sum of each column
array([12, 15, 18, 21])
>>>
>>> b.min(axis=1) # min of each row
array([0, 4, 8])
>>>
>>> b.cumsum(axis=1) # cumulative sum along each row
array([[ 0, 1, 3, 6],
[ 4, 9, 15, 22],
[ 8, 17, 27, 38]])
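A useful way to read axis is that a reduction removes that axis from the shape, while cumulative operations such as cumsum keep the shape. The minimal sketch below uses a 3-D array. It also uses keepdims=True, a standard parameter of these reductions that keeps the reduced axis with length 1 so the result still broadcasts against the original. Finally, it shows that the method and function forms give the same result.

```python
import numpy as np

c = np.arange(24).reshape(2, 3, 4)

print(c.sum(axis=0).shape)                 # (3, 4): axis 0 removed
print(c.sum(axis=1).shape)                 # (2, 4): axis 1 removed
print(c.cumsum(axis=2).shape)              # (2, 3, 4): cumulative ops keep the shape

row_max = c.max(axis=2, keepdims=True)     # shape (2, 3, 1)
print((c - row_max).max())                 # 0: each row's maximum is subtracted from that row

print(c.sum() == np.sum(c))                # True: method and function forms agree
```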
- See also
all, alltrue, any, apply_along_axis, argmax, argmin, argsort, average, bincount, ceil, clip, conj, conjugate, corrcoef, cov, cross, cumprod, cumsum, diff, dot, floor, inner, inv, lexsort, max, maximum, mean, median, min, minimum, nonzero, outer, prod, real, round, sometrue, sort, std, sum, trace, transpose, var, vdot, vectorize, where
Indexing, Slicing and Iterating
One dimensional arrays can be indexed, sliced and iterated over pretty much like lists and other Python sequences.
>>> a = arange(10)**3
>>> a
array([ 0, 1, 8, 27, 64, 125, 216, 343, 512, 729])
>>> a[2]
8
>>> a