Mlabwrap proxy objects

The problem with proxied objects

Mlabwrap represents matlab structs as python MlabObjectProxy objects. In the following example, the python variable ps is an MlabObjectProxy that proxies the matlab struct struct('a', [11, 12]).

>>> ps = mlab.struct('a', [11, 12])
>>> ps
<MlabObjectProxy of matlab-class: 'struct'; internal name: 'PROXY_VAL6__'; has parent: no>
    a: [2x1 double]


>>> ps.a
array([[ 11.],
       [ 12.]])
>>> ps.a[1]
array([ 12.])

Trouble arises when we attempt indexed assignment to the array field a of the struct proxy object ps. For example:

>>> ps.a[1] = 13
>>> ps.a
array([[ 11.],
       [ 12.]])

This is not the desired effect. We would like ps.a[1] to equal 13 both on python's side and matlab's side.

The problem is a result of how python interprets attribute references with the dot notation.

For a matlab proxy object, the actual data is stored in the matlab workspace, and the proxy object transparently calls into matlab to fetch the required values. Python interprets the expression ps.a = 3 as ps.__setattr__('a', 3), and x = ps.a as x = ps.__getattr__('a'). Because ps is a proxy object, mlabwrap can override __setattr__ and __getattr__ on MlabObjectProxy. Thus x = ps.a calls the proxy's __getattr__ method, which fetches the data from the matlab workspace variable and copies it into a python array. Similarly, ps.a = 3 takes the data (3) from python and copies it into the variable in the matlab workspace.

In the example above, python interprets ps.a[1] = 13 as ps.__getattr__('a').__setitem__(1, 13). Here ps.__getattr__('a') asks matlab for the value of ps.a and returns a python array that is a copy of the matlab array. The new value is therefore set in the temporary python copy, but not in the matlab workspace array that the data came from.
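The mechanism can be reproduced without matlab. The following is a minimal sketch, not mlabwrap's actual implementation: WORKSPACE and FakeProxy are hypothetical stand-ins for the matlab workspace and for MlabObjectProxy, and a plain list stands in for the array.

```python
import copy

# Hypothetical stand-in for the matlab workspace: variable name -> struct fields.
WORKSPACE = {"PROXY_VAL0__": {"a": [11.0, 12.0]}}


class FakeProxy(object):
    """Toy proxy: every attribute access copies data across the boundary."""

    def __init__(self, name):
        # Bypass our own __setattr__ while storing the internal name.
        object.__setattr__(self, "_name", name)

    def __getattr__(self, field):
        # Called only when normal lookup fails, i.e. for struct fields.
        try:
            value = WORKSPACE[self._name][field]
        except KeyError:
            raise AttributeError(field)
        return copy.deepcopy(value)  # the caller receives a copy

    def __setattr__(self, field, value):
        # Copy the python data into the workspace variable.
        WORKSPACE[self._name][field] = copy.deepcopy(value)


ps = FakeProxy("PROXY_VAL0__")

# ps.__getattr__('a').__setitem__(1, 13.0): only a temporary copy is changed.
ps.a[1] = 13.0
after_indexed = ps.a

# Read, modify, write back: the final step goes through __setattr__.
a = ps.a
a[1] = 13.0
ps.a = a
after_writeback = ps.a
```

After the indexed assignment the workspace still holds 12 in the second position. Only the explicit write-back through ps.a = a changes it, which is the workaround available with a copying proxy.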

A solution

The problem above arises because of the difference in storage between proxy objects and matlab objects that can be fully converted, such as arrays. A solution therefore would be to convert all matlab variables fully to python objects. In the case of matlab structs and objects, we can convert to numpy recarrays and do away with MlabObjectProxy objects altogether.

When we pull something across from matlab into python, we simply return a recarray containing the same data. When we call a matlab function on a python variable, we just copy the variable's data into matlab. We don't proxy anything.

  • Matlab matrices are already converted into numpy arrays.

Currently both matlab structs and objects are proxied in mlabwrap. Instead:

  • Matlab structs can be converted into numpy recarrays. Recarrays are isomorphic to matlab structs. Their attributes are ordered, allowing for arrays of records that may be indexed by position and attribute. Their attributes may themselves be recarrays, allowing for arbitrary recursive structures.
  • Matlab objects in memory or on disk are in fact structs of their fields, along with a special attribute that defines their class name. Hence matlab objects can also be converted to recarrays. Method calls should not require special provisions, since in matlab method calls use the same syntax as function calls. The function or method that is called depends on the object passed: calling method1(o) causes matlab to identify o as an instance of class someclass and then look in the someclass class directory for a function method1; if none exists, matlab looks for method1 on the global path. As another example, suppose object o has a method some_method(self, arg1). In matlab we call this method on o with some_method(o, 3);.
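As a rough illustration of the conversion described above, the sketch below copies a flat struct, given as an ordered list of (name, value) pairs, into a one-element recarray. The function name struct_to_recarray and the extra field matlab_class are assumptions made for this example. They are not part of mlabwrap, and nested structs are not handled.

```python
import numpy


def struct_to_recarray(fields, class_name=None):
    """Copy an ordered list of (name, value) pairs into a 1-element recarray.

    `fields` stands in for a flat matlab struct. If `class_name` is given it
    is stored in a hypothetical extra field called 'matlab_class'.
    """
    pairs = [(name, numpy.asarray(value)) for name, value in fields]
    if class_name is not None:
        pairs.append(("matlab_class", numpy.asarray(class_name)))

    # One dtype entry per field; array-valued fields become sub-array fields.
    descr = []
    for name, arr in pairs:
        if arr.shape:
            descr.append((name, arr.dtype, arr.shape))
        else:
            descr.append((name, arr.dtype))

    rec = numpy.zeros(1, dtype=descr).view(numpy.recarray)
    for name, arr in pairs:
        rec[name] = arr  # copies the data into the record's own storage
    return rec


# A struct with an array field and a scalar field.
ps = struct_to_recarray([("a", [11.0, 12.0]), ("n", 3.2)])

# Field access returns a view onto the record's data, so indexed
# assignment now modifies the record itself.
ps.a[0, 1] = 13.0

# An object: the same fields plus the class name.
o = struct_to_recarray([("field1", 1.0)], class_name="someclass")
```

Because the data now live entirely on the python side, the indexed assignment that failed with the proxy takes effect. The change would reach matlab only when the record is next copied across as a function argument.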

A quick review of numpy recarrays:

>>> import numpy
>>> ra_child1 = numpy.rec.array([1, 2, 3], names=['a', 'b', 'c'])
>>> ra_child1
recarray((1, 2, 3),
      dtype=[('a', '<i4'), ('b', '<i4'), ('c', '<i4')])
>>> ra_child1['a']
1
>>> ra_child1['c']
3
>>> ra_child2 = numpy.rec.array([5, 6], names=['x', 'y'])
>>> desc = numpy.dtype({'names' : ['child', 'n'], 'formats': ['object', 'float']})
>>> ra_parent = numpy.array([(ra_child1, 3.2), (ra_child2, 4.5)], dtype=desc)
>>> ra_parent
array([((1, 2, 3), 3.2000000000000002), ((5, 6), 4.5)],
      dtype=[('child', '|O4'), ('n', '<f8')])
>>> ra_parent[0]
((1, 2, 3), 3.2000000000000002)
>>> ra_parent[1]
((5, 6), 4.5)
>>> ra_parent['child']
array([(1, 2, 3), (5, 6)], dtype=object)
>>> ra_parent['n']
array([ 3.2,  4.5])
>>> ra_parent[0]['child']['a']
1
>>> ra_parent[1]['child']['x']
5

Problems with the solution

This last type of conversion, matlab objects to python recarrays, still has problems. In particular, representing matlab objects by their underlying structs produces behavior on the python side that differs from matlab's. In matlab we cannot directly access an object's fields from outside the class. Instead we must either call an accessor method that returns the field, or overload the object's subsref method, which matlab calls whenever dot notation is used on the object. Overloading subsref lets us use dot notation to access an object's fields, but because the dot notation is then really a method call, it can return anything it likes. We can't count on the dot notation to return the field it names. For example, in matlab we cannot count on

x = o.field1;

actually setting x to the value of o's field1 since o.field1 might actually call a method that returns o's field2, or 8.

On the python side, if po is o's python representation, we can definitely count on

x = po.field1

setting x to the value of o's field1 since po is just a recarray of o's fields.
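The contrast can be sketched in python alone. In the hypothetical example below, SurprisingObject plays the role of a matlab class whose overloaded subsref returns field2 when field1 is requested, and po is the recarray of the same fields. Neither name comes from mlabwrap.

```python
import numpy


class SurprisingObject(object):
    """Hypothetical python analogue of a matlab class that overloads subsref."""

    def __init__(self):
        self._fields = {"field1": 1.0, "field2": 2.0}

    def __getattr__(self, name):
        # Like an overloaded subsref, this hook decides what dot access returns.
        if name == "field1":
            return self._fields["field2"]
        raise AttributeError(name)


o = SurprisingObject()
x_matlab_like = o.field1  # the accessor hands back field2's value

# The recarray representation exposes the stored fields directly.
po = numpy.array(
    [(1.0, 2.0)], dtype=[("field1", float), ("field2", float)]
).view(numpy.recarray)
x_python = po.field1[0]  # the stored value of field1
```

Here o.field1 yields 2.0 while po.field1[0] yields 1.0, which is exactly the difference in behavior described above.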

(More: add real funny subsref example.)

A proposed solution to this problem (with the proposed solution) is to embrace this difference in behavior and make it clearly defined. On the python side, whenever an expression is evaluated through the mlab. prefix, the user can expect matlab-like behavior. A few examples:

>>> I = mlab.eye(4)
>>> B = mlab.inv(I)
>>> x = mlab.some_accessor_method(po)

Otherwise, python behavior should be expected.