This tutorial is unfinished. The original authors were neither NumPy experts nor native English speakers, so it needs reviewing.

Quick Tour

NumPy is a Python library for working with multidimensional arrays. The main data type is an array. An array is a set of elements, all of the same type, indexed by a vector of nonnegative integers.

Arrays can be created in different ways:

>>> from numpy import *
>>> a = array( [ 10, 20, 30, 40 ] )   # create an array out of a list
>>> a
array([10, 20, 30, 40])
>>> b = arange( 4 )                   # create an array from 0 to 3
>>> print(b)
[0 1 2 3]

and new arrays can be obtained by operating with existing arrays:

>>> c = a+sin(b)                      # elementwise operations
>>> c
array([ 10.        ,  20.84147098,  30.90929743,  40.14112001])

Arrays can have more than one dimension:

>>> x = ones( (3,4) )
>>> x
array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])
>>> x.shape                            # a tuple with the dimensions
(3, 4)

and you can change the dimensions of existing arrays:

>>> y = arange(12)
>>> y
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
>>> y.shape = 3,4              # does not modify the total number of elements
>>> y
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

It is possible to operate on arrays of different dimensions as long as their shapes are compatible (NumPy calls this broadcasting).

>>> 3*a                                # multiply each element of a by 3
array([ 30,  60,  90, 120])
>>> a+y                                # sum a to each row of y
array([[10, 21, 32, 43],
       [14, 25, 36, 47],
       [18, 29, 40, 51]])
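
As a rough sketch of what compatible shapes means here: broadcasting compares shapes starting from the last axis, and two lengths match when they are equal or when one of them is 1. The example uses the conventional `import numpy as np` alias instead of the star import above, and the variable names are purely illustrative.

```python
import numpy as np

a = np.array([10, 20, 30, 40])      # shape (4,)
y = np.arange(12).reshape(3, 4)     # shape (3, 4)

# (4,) matches the last axis of (3, 4), so a is added to every row
row_sum = a + y

# a column of shape (3, 1) is stretched along the last axis instead
col = np.array([[100], [200], [300]])
col_sum = col + y

# (3,) does not match the last axis of (3, 4): 3 != 4
try:
    np.arange(3) + y
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(row_sum.shape, col_sum.shape, mismatch_raised)
```

Incompatible shapes raise a ValueError rather than being silently truncated or padded.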

Similarly to Python lists, arrays can be indexed, sliced and iterated over.

>>> a[2:4] = -7,-3                     # modify last two elements of a
>>> for i in a:                        # iterate over a
...     print(i)
...
10
20
-7
-3

When indexing more than one dimension, indices are separated by commas.

>>> x[1,2] = 20
>>> x[1,:]                             # x's second row
array([ 1.,  1., 20.,  1.])
>>> x[0] = a                           # change first row of x
>>> x
array([[10., 20., -7., -3.],
       [ 1.,  1., 20.,  1.],
       [ 1.,  1.,  1.,  1.]])
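
The following sketch repeats the comma-separated indexing in a self-contained form. It assumes the conventional `np` alias, and the names `second_row` and `third_col` are only illustrative. Note that values assigned into an array are converted to the array's own element type.

```python
import numpy as np

x = np.ones((3, 4))            # float64 by default
x[1, 2] = 20                   # row 1, column 2
x[0] = [10, 20, -7, -3]        # the integers are stored as floats

second_row = x[1, :]           # all columns of row 1
third_col = x[:, 2]            # all rows of column 2

print(second_row)
print(third_col)
print(x.dtype)
```

For a single element, `x[1, 2]` and `x[1][2]` give the same value; the comma form avoids building an intermediate row first.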

Prerequisites

Before reading this tutorial you should know a bit of Python. If this is not the case, or if you want to refresh your memory, take a look at the Python tutorial. In particular, you may wish to read up to section 6 (Modules).

You also need to have some software installed on your computer. You need at least Python and NumPy; some other tools may also be useful to you, but they are not required.

The Basics

NumPy's main object is the homogeneous multidimensional array. This is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. Typical examples of multidimensional arrays include vectors, matrices, images and spreadsheets.

By 'multidimensional', we mean that arrays can have several dimensions or axes. Because the word dimension is ambiguous, we use axis instead. The number of axes will often be called rank.

For example, the coordinates of a point in 3D space, [1, 2, 1], form an array of rank 1: it has one axis. That axis has a length of 3. As another example, the array

[[ 1., 0., 0.],
 [ 0., 1., 2.]]

is an array of rank 2 (it is 2-dimensional). The first dimension (axis) has a length of 2, the second dimension has a length of 3. For more details see the NumPy Glossary.

The multidimensional array class is called ndarray. Note that this is not the same as the Python standard library class array.array, which only handles one-dimensional arrays. The most important attributes of an ndarray object are:

ndarray.ndim

the number of axes (dimensions) of the array. In the Python world, the number of dimensions is often referred to as rank.

ndarray.shape

the dimensions of the array. This is a tuple of integers indicating the size of the array in each dimension. For a matrix with n rows and m columns, shape will be (n,m). The length of the shape tuple is therefore the rank, or number of dimensions, ndim.

ndarray.size

the total number of elements of the array. This is equal to the product of the elements of shape.

ndarray.dtype

an object describing the type of the elements in the array. One can create or specify dtypes using standard Python types. NumPy also provides types of its own, for example: bool_, int_, int8, int16, int32, int64, float16, float32, float64, complex64, complex128, object_.

ndarray.itemsize

the size in bytes of each element of the array. For example, an array of elements of type float64 has itemsize 8 (=64/8), while one of type float32 has itemsize 4 (=32/8). It is equivalent to ndarray.dtype.itemsize.

ndarray.data

the buffer containing the actual elements of the array. Normally, we won't need to use this attribute because we will access the elements in an array using indexing facilities.

An example

We define the following array:

>>> a = arange(10).reshape(2,5)
>>> a
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

We have just created an array object with the label a attached to it. The array a has several attributes (or properties). In Python, attributes of a specific object are written object_name.attribute.

You can check all these attributes just by typing them in the interactive session:

>>> a.shape
(2, 5)
>>> a.dtype.name                       # platform dependent: 'int64' on many 64-bit systems
'int32'

And so on.
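
A minimal sketch that prints the attributes described above for this same array, assuming the conventional `np` alias. The integer dtype and item size are not shown as comments because they depend on the platform.

```python
import numpy as np

a = np.arange(10).reshape(2, 5)

print(a.ndim)       # number of axes: 2
print(a.shape)      # length of each axis: (2, 5)
print(a.size)       # total number of elements: 10
print(a.dtype)      # element type (platform-dependent default integer)
print(a.itemsize)   # bytes per element, same as a.dtype.itemsize
```

The relations between the attributes always hold: `ndim` is the length of `shape`, and `size` is the product of the entries of `shape`.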

Array Creation

There are many ways to create arrays. For example, you can create an array from a regular Python list or tuple using the array function.

>>> a = array( [2,3,4] )
>>> a
array([2, 3, 4])
>>> type(a)                                     # a is an object of the ndarray class
<class 'numpy.ndarray'>

array transforms sequences of sequences into two-dimensional arrays, sequences of sequences of sequences into three-dimensional arrays, and so on. The type of the resulting array is deduced from the type of the elements in the sequences.

>>> b = array( [ (1.5,2,3), (4,5,6) ] )         # this will be an array of floats
>>> b
array([[ 1.5,  2. ,  3. ],
       [ 4. ,  5. ,  6. ]])

Once we have an array we can take a look at its attributes:

>>> b.ndim                                      # number of dimensions
2
>>> b.shape                                     # the dimensions
(2, 3)
>>> b.dtype                                     # the type (8 byte floats)
dtype('float64')
>>> b.itemsize                                  # the size in bytes of each element
8

The type of the array can also be explicitly specified at creation time:

>>> c = array( [ [1,2], [3,4] ], dtype=complex )
>>> c
array([[ 1.+0.j,  2.+0.j],
       [ 3.+0.j,  4.+0.j]])
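
The deduction of dimensions and element type can be checked with a short sketch like the one below. It assumes the conventional `np` alias, and the variable names are illustrative only.

```python
import numpy as np

ints = np.array([1, 2, 3])                   # only integers: an integer dtype
mixed = np.array([1, 2.5, 3])                # one float makes every element a float
nested = np.array([[[1, 2], [3, 4]],
                   [[5, 6], [7, 8]]])        # three levels of nesting: three axes
forced = np.array([1, 2, 3], dtype=complex)  # explicit dtype overrides the deduction

print(ints.dtype.kind, mixed.dtype, nested.shape, forced.dtype)
```

The depth of nesting determines the number of axes, while the most general element type found in the sequences determines the dtype unless one is given explicitly.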

The function array is not the only one that creates arrays. Often the elements of an array are not known in advance, but its size is, and a placeholder array is needed. NumPy offers several functions to create arrays with initial placeholder content. By default, the type of the created array is float64.

The function zeros creates an array full of zeros, and the function ones creates an array full of ones.

>>> zeros( (3,4) )                              # the parameter specifies the shape
array([[0.,  0.,  0.,  0.],
       [0.,  0.,  0.,  0.],
       [0.,  0.,  0.,  0.]])
>>> ones( (2,3,4), dtype=int16 )                # dtype can also be specified
array([[[ 1, 1, 1, 1],
        [ 1, 1, 1, 1],
        [ 1, 1, 1, 1]],

       [[ 1, 1, 1, 1],
        [ 1, 1, 1, 1],
        [ 1, 1, 1, 1]]], dtype=int16)

The function empty creates an array without initializing its contents. The initial values are therefore arbitrary and depend on the state of the memory.

>>> empty( (2,3) )
array([[  3.73603959e-262,   6.02658058e-154,   6.55490914e-260],
       [  5.30498948e-313,   3.14673309e-307,   1.00000000e+000]])
>>> empty( (2,3) )                              # the content may change in different invocations
array([[  3.14678735e-307,   6.02658058e-154,   6.55490914e-260],
       [  5.30498948e-313,   3.73603967e-262,   8.70018275e-313]])
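
Because the contents of an array returned by empty are arbitrary, a typical pattern is to allocate the array and then overwrite every element before reading any of them. The sketch below illustrates this under that assumption; the names `squares` and `like` are hypothetical.

```python
import numpy as np

n = 5
squares = np.empty(n)            # contents are arbitrary at this point
for i in range(n):
    squares[i] = i ** 2          # every element is overwritten before use

like = np.zeros_like(squares)    # same shape and dtype as squares, filled with zeros

print(squares)
print(like)
```

If some elements might be read before they are written, zeros (or zeros_like) is the safer choice.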

To create sequences of numbers, NumPy provides a function analogous to range that returns arrays instead of lists:

>>> arange( 10, 30, 5 )
array([10, 15, 20, 25])
>>> arange( 0, 2, 0.3 )                 # it accepts float arguments
array([ 0. ,  0.3,  0.6,  0.9,  1.2,  1.5,  1.8])

When arange is used with floating point arguments, it is generally not possible to predict the number of elements obtained, because of the finite floating point precision. For this reason, it is usually better to use the function linspace, which receives as an argument the number of elements that we want instead of the step:

>>> linspace( 0, 2, 9 )                 # 9 numbers from 0 to 2
array([ 0.  ,  0.25,  0.5 ,  0.75,  1.  ,  1.25,  1.5 ,  1.75,  2.  ])
>>> x = linspace( 0, 2*pi, 100 )        # useful to evaluate function at lots of points
>>> f = sin(x)
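
The difference can be sketched as follows. The start, stop and step values are chosen for illustration only, and the exact length returned by arange for them may depend on floating point rounding, which is precisely the point.

```python
import math
import numpy as np

start, stop, step = 1.0, 1.3, 0.1

by_step = np.arange(start, stop, step)
# The length follows from (stop - start) / step rounded up, and that
# quotient may not be exactly 3 in floating point arithmetic.
quotient = (stop - start) / step
rounded_up = math.ceil(quotient)

by_count = np.linspace(start, stop, 4)   # exactly 4 points, both endpoints included

print(len(by_step), quotient, rounded_up)
print(len(by_count), by_count[0], by_count[-1])
```

With linspace the number of points is an explicit argument, so it never depends on rounding.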

See also

array, zeros, zeros_like, ones, ones_like, empty, empty_like, arange, linspace, rand, randn, fromfunction, fromfile

Printing Arrays

When you print an array, NumPy displays it in a similar way to nested lists, with the last axis printed from left to right and the remaining axes printed from top to bottom.

One-dimensional arrays are then printed as rows, two-dimensional arrays as matrices and three-dimensional arrays as lists of matrices.

>>> a = arange(6)                         # 1d array
>>> print(a)
[0 1 2 3 4 5]
>>>
>>> b = arange(12).reshape(4,3)           # 2d array
>>> print(b)
[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]
>>>
>>> c = arange(24).reshape(2,3,4)         # 3d array
>>> print(c)
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]

See below to get more details on reshape.

If an array is too large to be printed, NumPy automatically skips the central part of the array and only prints the corners:

>>> print(arange(10000))
[   0    1    2 ... 9997 9998 9999]
>>>
>>> print(arange(10000).reshape(100,100))
[[   0    1    2 ...   97   98   99]
 [ 100  101  102 ...  197  198  199]
 [ 200  201  202 ...  297  298  299]
 ...
 [9700 9701 9702 ... 9797 9798 9799]
 [9800 9801 9802 ... 9897 9898 9899]
 [9900 9901 9902 ... 9997 9998 9999]]

To disable this behaviour and force NumPy to print the entire array, you can change the printing options using set_printoptions.

>>> import sys
>>> set_printoptions(threshold=sys.maxsize)   # recent NumPy versions reject threshold=nan
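
If the full printout is only needed temporarily, one option is the printoptions context manager, which is available in reasonably recent NumPy versions. The sketch below assumes such a version and the conventional `np` alias; the variable names are illustrative.

```python
import sys
import numpy as np

big = np.arange(2000)

summarized = str(big)                        # larger than the default threshold: elided
with np.printoptions(threshold=sys.maxsize):
    full = str(big)                          # every element is printed inside the block
restored = str(big)                          # the previous options apply again

print(len(summarized), len(full))
```

Outside the with block the previous settings are restored automatically, so other output is not affected.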

Basic Operations

Arithmetic operators on arrays apply elementwise. A new array is created and filled with the result.

>>> a = array( [20,30,40,50] )
>>> b = arange( 4 )
>>> c = a-b
>>> c
array([20, 29, 38, 47])
>>> b**2
array([0, 1, 4, 9])
>>> 10*sin(a)
array([ 9.12945251, -9.88031624,  7.4511316 , -2.62374854])
>>> a<35
array([ True,  True, False, False])

It is possible to perform some operations in place so that no new array is created.

>>> a = ones((2,3), dtype=int)
>>> b = random.random((2,3))
>>> a *= 3
>>> a
array([[3, 3, 3],
       [3, 3, 3]])
>>> b += a
>>> b
array([[ 3.69092703,  3.8324276 ,  3.0114541 ],
       [ 3.18679111,  3.3039349 ,  3.37600289]])
>>> a += b.astype(int)                      # b must be converted explicitly; a plain a += b raises a casting error in recent NumPy
>>> a
array([[6, 6, 6],
       [6, 6, 6]])
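
The sketch below illustrates two consequences of operating in place, assuming a recent NumPy version and the conventional `np` alias: every name bound to the array sees the change, and a float result cannot be stored in an integer array without an explicit conversion. The names `alias` and `cast_error` are illustrative.

```python
import numpy as np

a = np.ones((2, 3), dtype=int)
alias = a                    # a second name for the same array object
a *= 3                       # in place: no new array is created
same_object = alias is a

b = np.full((2, 3), 0.5)
try:
    a += b                   # a float result does not fit an integer array
    cast_error = False
except TypeError:
    cast_error = True

b += a                       # fine: integers can be stored in a float array

print(same_object, cast_error)
print(b)
```

Writing `a = a * 3` instead would create a new array and rebind the name a, leaving alias pointing at the old values.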

When operating with arrays of different types, the type of the resulting array corresponds to the more general or precise one.

>>> a = ones(3, dtype=int32)
>>> b = linspace(0,pi,3)
>>> b.dtype.name
'float64'
>>> c = a+b
>>> c
array([ 1.        ,  2.57079633,  4.14159265])
>>> c.dtype.name
'float64'
>>> d = exp(c*1j)
>>> d
array([ 0.54030231+0.84147098j, -0.84147098+0.54030231j,
       -0.54030231-0.84147098j])
>>> d.dtype.name
'complex128'

This is called upcasting, and it is governed by the following table (the extended-precision types float96 and complex192 exist only on some platforms).

      b    u8   u16  u32  u64  i8   i16  i32  i64  f32  f64  f96  c64  c128 c192
b     b    u8   u16  u32  u64  i8   i16  i32  i64  f32  f64  f96  c64  c128 c192
u8         u8   u16  u32  u64  i16  i16  i32  i64  f32  f64  f96  c64  c128 c192
u16             u16  u32  u64  i32  i32  i32  i64  f32  f64  f96  c64  c128 c192
u32                  u32  u64  i64  i64  i64  i64  f64  f64  f96  c128 c128 c192
u64                       u64  f64  f64  f64  f64  f64  f64  f96  c128 c128 c192
i8                             i8   i16  i32  i64  f32  f64  f96  c64  c128 c192
i16                                 i16  i32  i64  f32  f64  f96  c64  c128 c192
i32                                      i32  i64  f64  f64  f96  c128 c128 c192
i64                                           i64  f64  f64  f96  c128 c128 c192
f32                                                f32  f64  f96  c64  c128 c192
f64                                                     f64  f96  c128 c128 c192
f96                                                          f96  c192 c192 c192
c64                                                               c64  c128 c192
c128                                                                   c128 c192
c192                                                                        c192

Here b stands for bool_, u8 to u64 for uint8 to uint64, i8 to i64 for int8 to int64, f32 to f96 for float32 to float96, and c64 to c192 for complex64 to complex192. Each entry is the type that results from combining the row type with the column type. The table is symmetric, so only the upper triangle is shown.
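
Individual combinations can also be queried directly instead of being looked up. The sketch below uses the function promote_types for a few pairs chosen for illustration, assuming the conventional `np` alias; it deliberately leaves out the platform-dependent extended-precision types.

```python
import numpy as np

pairs = [
    (np.uint8, np.int8),
    (np.int32, np.float32),
    (np.uint64, np.int64),
    (np.float32, np.complex64),
]
results = {}
for left, right in pairs:
    promoted = np.promote_types(left, right)
    results[(np.dtype(left).name, np.dtype(right).name)] = promoted.name
    print(np.dtype(left).name, "with", np.dtype(right).name, "gives", promoted.name)

# the same rule applies when two arrays are combined
mixed = np.ones(3, dtype=np.int32) + np.ones(3, dtype=np.float32)
print(mixed.dtype)
```

The result is the smallest type that can safely hold values of both inputs, which is why int32 combined with float32 gives float64 rather than float32.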

Many unary operations, like computing the sum of all the elements in the array, are implemented as methods of the ndarray class.

>>> a = random.random((2,3))
>>> a
array([[ 0.6903007 ,  0.39168346,  0.16524769],
       [ 0.48819875,  0.77188505,  0.94792155]])
>>> a.sum()
3.4552372100521485
>>> a.min()
0.16524768654743593
>>> a.max()
0.9479215542670073

By default, these operations apply to the array as if it were a list of numbers, regardless of its shape. However, by specifying the axis parameter you can apply an operation along the specified axis of an array:

>>> b = arange(12).reshape(3,4)
>>> b
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>>
>>> b.sum(axis=0)                            # sum of each column
array([12, 15, 18, 21])
>>>
>>> b.min(axis=1)                            # min of each row
array([0, 4, 8])
>>>
>>> b.cumsum(axis=1)                         # cumulative sum along the rows
array([[ 0,  1,  3,  6],
       [ 4,  9, 15, 22],
       [ 8, 17, 27, 38]])
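
One way to read the axis parameter is that the named axis is the one that disappears from the shape of the result. A short sketch, assuming the conventional `np` alias and illustrative variable names:

```python
import numpy as np

b = np.arange(12).reshape(3, 4)

col_sums = b.sum(axis=0)                   # axis 0 is collapsed: shape (4,)
row_sums = b.sum(axis=1)                   # axis 1 is collapsed: shape (3,)
row_sums_2d = b.sum(axis=1, keepdims=True) # the collapsed axis is kept with length 1
total = b.sum()                            # no axis given: all elements are summed

print(col_sums.shape, row_sums.shape, row_sums_2d.shape, total)
```

Keeping the reduced axis with keepdims=True is convenient when the result is combined with the original array afterwards, for example `b - b.mean(axis=1, keepdims=True)`.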

See also

all, alltrue, any, apply_along_axis, argmax, argmin, argsort, average, bincount, ceil, clip, conj, conjugate, corrcoef, cov, cross, cumprod, cumsum, diff, dot, floor, inner, inv, lexsort, max, maximum, mean, median, min, minimum, nonzero, outer, prod, re, round, sometrue, sort, std, sum, trace, transpose, var, vdot, vectorize, where

Indexing, Slicing and Iterating

One dimensional arrays can be indexed, sliced and iterated over pretty much like lists and other Python sequences.

>>> a = arange(10)**3
>>> a
array([  0,   1,   8,  27,  64, 125, 216, 343, 512, 729])
>>> a[2]
8
>>> a