BaseArray was a proposed base multidimensional array type, intended to be ready for inclusion in the Python core at some point in the future. This page summarizes the various efforts made in this direction. Their long-term outcome is PEP 3118, Revising the buffer protocol. Note that a lot of the text on this page is taken directly from the text description of the PEP.

Introduction

Multidimensional arrays are often used in scientific and engineering programming, but they have uses in other areas as well, as evidenced by the popularity of spreadsheets and image-processing applications. Python, however, has no default multidimensional array object.

Since 1995, Numeric has been filling the need for multidimensional arrays for many users as a standard optional Python library. The only thing holding it back from the standard library has been stability, and, in particular, a desire on the part of the user community to have a faster release cycle for the array. While some changes have occurred in the functionality of the array object as it has progressed from Numeric to numarray to scipy core (numpy), what has not changed significantly is the interface, which allows Python users to interact with an array of bytes described by a certain shape, memory layout, and type description.
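The idea of "an array of bytes described by a certain shape, memory layout, and type description" can be illustrated with the standard-library memoryview, which implements the buffer protocol of PEP 3118 (the eventual outcome of this effort). This is only a sketch of the concept and not the proposed basearray type itself; the variable names are made up for the illustration.

```python
import struct

# Six C doubles packed into one flat, untyped byte buffer.
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
raw = struct.pack("6d", *values)

# Describe the same bytes as a 2x3 array of doubles.
# No data is copied; only the description changes.
flat = memoryview(raw)
grid = flat.cast("d", shape=[2, 3])

print(grid.format)    # type description of one element
print(grid.shape)     # number of elements along each dimension
print(grid.strides)   # memory layout: bytes to step along each dimension
print(grid.tolist())  # the data seen through that description
```

The bytes never change here; the shape, strides, and format alone determine how they are read as a two-dimensional array.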

In fact, this interface has recently been formalized by the creation of a description called the "array_interface", which any Python object can consume and/or export. To improve the usefulness of array_interface, however, it would be useful to have a mechanism by which objects could use the interface quickly at the C level.
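As a rough sketch of exporting and consuming the interface from pure Python, the example below follows the dictionary form of the array interface as documented by NumPy (version 3), which may differ in detail from the 2005 draft this page refers to. The class DoubleVector and the function read_doubles are hypothetical names invented for the illustration, and the consumer handles only one-dimensional, contiguous, native-order doubles.

```python
import array
import ctypes
import sys

# Type string for native-order 8-byte floats, e.g. "<f8" on little-endian.
NATIVE_F8 = ("<" if sys.byteorder == "little" else ">") + "f8"


class DoubleVector:
    """Hypothetical exporter: wraps array.array and describes its memory."""

    def __init__(self, values):
        self._data = array.array("d", values)

    @property
    def __array_interface__(self):
        address, length = self._data.buffer_info()
        return {
            "version": 3,
            "shape": (length,),
            "typestr": NATIVE_F8,
            "data": (address, False),  # (pointer as integer, read-only flag)
            "strides": None,           # None means C-contiguous
        }


def read_doubles(obj):
    """Hypothetical consumer: read 1-D contiguous doubles through the interface."""
    iface = obj.__array_interface__
    if iface["typestr"] != NATIVE_F8 or len(iface["shape"]) != 1:
        raise TypeError("only 1-D native-order f8 data is handled in this sketch")
    if iface.get("strides") is not None:
        raise TypeError("only contiguous data is handled in this sketch")
    (count,) = iface["shape"]
    address, _readonly = iface["data"]
    # Interpret the exporter's memory in place; obj must stay alive meanwhile.
    view = (ctypes.c_double * count).from_address(address)
    return list(view)


vec = DoubleVector([1.5, 2.5, 3.5])
print(read_doubles(vec))
```

The consumer never needs to know the exporter's class; it relies only on the shape, type string, and data pointer. Doing this lookup through a Python dictionary is comparatively slow, which is the motivation for a faster C-level mechanism.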

It would be beneficial to the community as a whole to place this common interface into Python itself. This would allow a wider Python community to quickly interact with and use the data in multidimensional arrays without requiring or depending on a third-party package. It would also allow Python to work seamlessly with more capable multidimensional array objects that scientific users could install.

The PEP proposes adding a new built-in type to the Python language, a generic multidimensional array type (basearray), and an associated type describing the type of data it carries (datatype). The basearray type would have a C structure similar to the one that has remained constant in Numeric, and few other features. Together, these objects would implement the array interface specification introduced to Numeric and numarray in April 2005, and encourage the use of this interface both in extensions and in Python code in general.

Purpose of basearray

The obvious purpose of basearray is to provide a base multidimensional array type for the Python distribution. This, however, is also the means by which other goals, more important in the long run, can be achieved.

Code description

The proposed basearray type does not have a fully filled type-object structure. In other words, basearray is above all a placeholder and base type, of which other, more capable array objects can be subtypes. Besides serving as a base type, the object exports and consumes the array interface.

Alongside basearray and datatype (a descriptor of the type an array carries), an array iterator type is defined to facilitate some procedures. There are also two auxiliary structures and a number of C-API functions.

Ultimately, there are two files to be added: basearray.c and basearray.h.

C structures defined

Related projects

Summer of Code Project

Preparing the interface is formally part of a Google Summer of Code project (list of PSF projects), currently being worked on by KarolLangner: Base multidimensional array type for Python core.

Original application

Proposal title: Base multidimensional array type for Python core

Author: Karol Marek Langner

Mentor: Travis E. Oliphant

Goals

The goal is to prepare a simple, generic multidimensional array interface that can be readily placed in the Python core as a new built-in base type (called, for instance, "dimarray"), and possibly included in a future Python distribution (maybe 2.6?). This new base type will have the same C structure as the current array implementation in numpy and will be based on an interface recently formulated by Travis Oliphant within a Python Enhancement Proposal (http://svn.scipy.org/svn/PEP/). After preparing a "ready to insert" version of the array interface, it will be applied to numpy and several other packages that work with multidimensional data, and possibly modified in order to work out an optimal scope.

Entire application: BaseArray/Application

Changes in schedule

Due to a late start, the planned completion dates for the project need to be changed. The objective now is to have a complete, minimal base type by roughly July 10th. After that, work will focus on using it in packages that utilize multidimensional data (as described in the application), with roughly two weeks for each package.

BaseArray (last edited 2007-10-27 15:50:24 by KarolLangner)