Arrays proposal: array broadcasting, and adding new axes

I didn't know whether to start a new thread for each topic, or to stick with one thread. I started a new thread, but can do the other if more appropriate.

When composing this message, I didn't know how much background to give, so I probably erred on the long side.

First the background, then the proposal.

Brad

Background:

SRFI 231 does not have array broadcasting, either explicit or implicit, or a way to add new axes to an existing array.

Array broadcasting repeats an existing array or scalar as array "slices" along one or more new axes (or existing axes of length one). Broadcasting is very common in array libraries---it's in Racket's math/array, Python's NumPy, Stan, Julia, the JavaScript stdlib, Matlab, etc.; links can be found here:

But I didn't see a need for broadcasting in SRFI 231, given the SRFI's other features.

For example, in Python broadcasting is used to implement both APL-style inner and outer products. (References are in the message linked above, and there are a few other comments in that thread.) SRFI 231 has APL-style array-inner-product and array-outer-product directly.

Or broadcasting is used to simplify notation: one writes A x s for an array A and a scalar s, and the value of s is "broadcast" to an implicit array of the same size and shape as A for component-wise multiplication.

Or, to use another common example, one could have a two-dimensional array A and a one-dimensional vector v, interpreted as a row vector, and write A - v to subtract v from each row of A.

But in SRFI 231 these examples are not so hard to do, because the SRFI has array-map (which deals directly with the first example) and array-curry, which decomposes an array by dimension. In the second example one would interpret the two-dimensional array A as a one-dimensional array of one-dimensional arrays, and then use array-map to subtract v from each of those one-dimensional rows (as arrays themselves), as sketched below. The array-broadcasting approach would instead extend the vector v to a two-dimensional array of the right size and shape, and then subtract that new array from A component-wise.
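Here is a minimal sketch of the array-curry approach, assuming SRFI 231 is loaded; subtract-rows is a hypothetical helper, A a two-dimensional array, and v a one-dimensional array whose domain matches each row of A:

(define (subtract-rows A v)
  ;; view A as a one-dimensional array of one-dimensional rows,
  ;; then subtract v from each row
  (array-map (lambda (row)
               (array-map - row v))
             (array-curry A 1)))

The result is itself a one-dimensional array of row arrays.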

A further complication is that SRFI 231 arrays can have nonzero lower bounds in indices, while the other systems have (implicit) zero lower bounds in each dimension.
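For example, SRFI 231's two-argument make-interval builds a domain with explicit lower bounds:

(make-interval '#(1 1) '#(4 4))  ;; multi-indices (i j) with 1 <= i, j < 4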

SRFI 231 does have a very general

(array-share old-array new-domain mapping-from-new-domain-to-old-domain)

but the SRFI requires that the affine mapping-from-new-domain-to-old-domain from new multi-indices to existing multi-indices be one-to-one, so that mutating one array entry does not change a different array entry. To write broadcasting in this way necessarily requires that the mapping be many-to-one. (By the way, it appears that checking whether the mapping is 1-1 is equivalent to the small lattice vector problem, which I hear is NP-hard.)
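To make the obstacle concrete, here is a hypothetical attempt to broadcast a one-dimensional array v of length 3 to a 10 x 3 domain via array-share; the mapping from (i, j) to j is many-to-one, so the one-to-one requirement rules it out:

(array-share v
             (make-interval '#(10 3))
             (lambda (i j) (values j)))  ;; every i maps into the same v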

Proposal:

In the mail thread starting here:

I laid out some of the issues I found problematic when considering integrating array broadcasting into SRFI 231 and then made the following initial proposal for new functionality.

I invite comments about this, or any other, aspect of the array SRFI.

==============================

Procedure: (array<-scalar obj)

Returns (make-array (make-interval '#()) (lambda () obj)). (People seem
to like treating scalars like arrays, so let's make it easier to do so.)

Procedure: (array-add-axes array vector)

Sample calls:

(array-add-axes A '#(* 0 1 * 2 * 3))
(array-add-axes A (vector '* 0 1 '* 2 '* 3))
(array-add-axes A '#(* * *))      ;; if A is zero-dimensional

Assumes that the first argument A is an array with dimension d.

Assumes the second argument is a vector v whose elements are the exact
integers 0, 1, ..., d-1, in order, interspersed with zero or more
instances of the symbol *. [In a later message I note that there is no real reason to require that the integers be in order, in which case this routine would generalize array-permute.]

Returns a new array A* that shares its elements with A in the same
lexicographical order.

Any entry in v that is a nonnegative integer j denotes an axis in the
domain of A* with the same lower and upper bounds as the jth axis of
the domain of A.

Any entry in v that is the symbol * denotes a new axis in the domain of
A* with lower bound 0 and upper bound 1.

[In a later message, I ask whether array-squeeze, which is a partial inverse to array-add-axes and is given in the SRFI document as an example, should be added to the SRFI if array-add-axes is added to the SRFI.]
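As a rough illustration of the intended semantics, here is a minimal sketch of array-add-axes in terms of make-array, assuming SRFI 231's interval accessors; it builds the new domain and forwards each new multi-index to A's getter by dropping the *-axes (a real implementation would share storage instead):

(define (array-add-axes A v)
  (let* ((domain (array-domain A))
         (n      (vector-length v))
         (lowers (vector-map (lambda (e)
                               (if (eq? e '*) 0 (interval-lower-bound domain e)))
                             v))
         (uppers (vector-map (lambda (e)
                               (if (eq? e '*) 1 (interval-upper-bound domain e)))
                             v))
         (getter (array-getter A)))
    (make-array
     (make-interval lowers uppers)
     (lambda multi-index
       ;; keep only the indices that correspond to axes of A, in order
       (let loop ((k 0) (idx multi-index) (kept '()))
         (cond ((= k n)
                (apply getter (reverse kept)))
               ((eq? (vector-ref v k) '*)
                (loop (+ k 1) (cdr idx) kept))
               (else
                (loop (+ k 1) (cdr idx) (cons (car idx) kept)))))))))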

Procedure: (array-broadcast list-of-As)

Assumes that list-of-As is a nonempty list of immutable arrays.

Let D be the maximum of the dimensions of the arrays in list-of-As.

A new list of arrays, list-of-A*s, is constructed by adding axes to the
left of existing axes of the arrays in list-of-As with array-add-axes so
that all arrays in list-of-A*s have dimension D.

The arrays in list-of-A*s are now required to have the following
properties, checked in order:

  1. If the domain of any A* in list-of-A*s has an axis j with nonzero
    lower bound, then the domains of all arrays A* in list-of-A*s must have
    the same lower and upper bounds for axis j.

  2. If the domain of any A* in list-of-A*s has an axis j with an
    upper-bound u_j different from 1, then axis j in the domain of all
    arrays A* in list-of-A*s must have either upper bound u_j or upper bound 1.

We then use the usual broadcast rules (which we have not yet specified
here) to form a new list-of-A**s containing arrays A** all with the same
domain.
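For example (hypothetical, assuming NumPy-style rules for the final stretching step), broadcasting a 10 x 3 array A together with a one-dimensional array v of length 3 would proceed as follows:

(array-broadcast (list A v))
;; v is first padded on the left to v* with domain [0,1) x [0,3),
;; then v* is stretched along the new axis to v** with domain
;; [0,10) x [0,3), so both results share the domain of A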


This is a detail, but I am curious about the choice of array<-scalar over scalar->array.
Am I missing some convention?

I can't recreate my thinking; it should really be scalar->array.


Maybe scalar->array isn't needed; it's a one-line definition:

> (define (scalar->array obj) (list*->array 0 obj))
> (array->list* (scalar->array 42))
42

I guess the question is how people would want to use it.

I would include scalar->array, since the operation matches a conceptual operation.
The other expression (list*->array 0 obj) requires switching to a more low-level way of thinking.


Yes, good.

But the operation wraps any value, not just a scalar (possibly even another array), in an array of dimension zero, as in

heine:~/text/courses> /usr/local/racket/bin/racket
Welcome to Racket v8.16.0.2 [cs].
> (require math/array)
>  (array-shape (array (array #[0 1 2])))
'#()

So maybe object->array? Just ->array? (But the latter might be confusing, because people might think that (->array '#(0 1 2)) would process the vector into an array with three elements.)

But the operation wraps any value, not just a scalar ...
You are right, I didn't think further than scalars.

I think object->array is a fine choice.

Should object->array have the calling sequence

(object->array object [ storage-class [ mutable? ] ] )

like the other conversion routines, or just the simpler

(object->array object)

which returns an immutable array with generic-storage-class?

The advantage of the second is that there is no error checking needed, no possible way to raise an exception, and I can't see the benefit of the first except for a certain uniformity of calling sequence.
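For reference, the simpler calling sequence is just the earlier one-liner under its new name (a sketch, not a committed definition):

(define (object->array object)
  (make-array (make-interval '#()) (lambda () object)))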

Maybe I'm just tired, but I've been going back and forth on this while implementing it.

The most recent GitHub commit ("Add interval-insert-axis, compute-broadcast-array, object->array, arr…", gambiteer/srfi-231@90c70e0) adds interval-insert-axis, array-insert-axis, compute-broadcast-array, and object->array, all routines that are needed to implement general array broadcasting.

In the next commit, I will add a stripped-down version of the array library, with all the functionality but no error checking and no "optimizations". (I put "optimizations" in quotes because it seems that sometimes the stripped-down library is faster than the general library.) The usual generic-arrays.scm has much code specialization for special cases and tries to catch and raise errors as soon as possible.

The idea behind mini-arrays.scm is to offer a second implementation, so that independent implementers can choose where on the axis between mini-arrays.scm and generic-arrays.scm they would like their implementations to sit.

There will also be a file test-mini-arrays.scm that is instrumented to include or elide tests to see that appropriate exceptions are raised, so it can be used to test both generic-arrays.scm and mini-arrays.scm.

TL;DR: Should array-broadcast and array-insert-axis return only immutable array results?

NumPy has deprecated allowing a broadcast array to be mutable ("writable"), probably because, in general, changing one element of a broadcast array changes other elements; see numpy.broadcast_arrays in the NumPy v2.3 manual. Strictly speaking, this restriction is not needed when the broadcast result has no more elements than the argument.
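To see why, here is a hypothetical sketch, assuming a mutable broadcast result and the math/array-style (array-broadcast arr domain) calling sequence used later in this thread:

(define v (array-copy (make-array (make-interval '#(3)) (lambda (j) j))))
(define V (array-broadcast v (make-interval '#(10 3))))
;; V has domain [0,10) x [0,3) but only 3 distinct elements; if V
;; were mutable, ((array-setter V) 99 0 0) would also change
;; ((array-getter V) 5 0), since both index the same element of v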

math/array has a similar routine, array-broadcast (see §6.10, "Pointwise Operations," in its documentation), on which I modeled my own version.

I think NumPy's restriction is a good idea, so I thought I would have my array-broadcast return only immutable arrays.

NumPy also has the function numpy.expand_dims, which adds an axis of width 1 to its array argument. The result of numpy.expand_dims has the same elements as the argument, in the same order, so there is no need to restrict the result to be immutable, and indeed NumPy does not restrict the result to be immutable.

math/array has the similar (array-axis-insert arr k [dk]), which adds a new axis, with an optional parameter dk that allows an axis width greater than 1.

I modeled my new array-insert-axis on math/array's procedure.

The question is whether arrays returned by my array-insert-axis should be immutable, or should be immutable only when dk > 1, to use math/array's notation.
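Concretely, assuming v is a one-dimensional array of length 3 and that array-insert-axis follows math/array's (array-axis-insert arr k [dk]) calling sequence (hypothetical):

(array-insert-axis v 0)    ;; 1 x 3: the same 3 elements in the same
                           ;; order, so mutability could be preserved
(array-insert-axis v 0 4)  ;; 4 x 3: each element of v appears 4 times,
                           ;; so the result should be immutable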

One could similarly ask whether the result of array-broadcast should be immutable only when there are more elements in the result than in the argument. Perhaps a similar question led to the following design decision in math/array (§6.10, "Pointwise Operations"):

When array-strictness is #t, array-broadcast returns a strict array when arr is nonstrict and the result has more elements than arr.

Here is an example of broadcasting that does not involve mutation and illustrates a situation that has been troubling me.

TL;DR: I think broadcasting, either explicit or implicit, should be applied only to specialized (strict) arrays, and the resulting arrays should be immutable.

You start with an N x N array of 3-vectors, a vector-field.

The task is to subtract the average of the 3-vectors from the N x N array of 3-vectors.

With broadcasting, one might write:

(define (recenter A)
  
  ;; A is an array of dimension N x N  x 3 
  ;; representing a two-dimensional N x N field of three-vectors.
  
  ;; We want to subtract the mean of the three-vectors from all the
  ;; entries of A, which we could write as follows.

  ;; The "problem" is that the array-map marked by
  ;; <<<<<<<<<<<<<<<
  ;; returns a generalized array (and I don't want to change that)
  ;; so every time an element of the average of the three-vectors is
  ;; accessed, the internal array-reduce is recomputed.

  (array-copy
   (array-map fl- A (array-broadcast
                     (array-map  ;; <<<<<<<<<<<<<<<
                      ;; calculate the average of each coordinate
                      ;; of the three-vectors.
                      (lambda (slice)
                        (/ (array-reduce fl+ slice)
                           (interval-volume (array-domain slice))))
                      (array-curry
                       ;; write A as a three-dimensional array of NxN arrays
                       (array-permute
                        ;; put the coordinates of the three-vectors first
                        A '#(2 0 1))
                       2))
                     (array-domain A)))))

Here I've explicitly called array-broadcast, but with implicit broadcasting that explicit call would be elided.

I now have an efficient implementation of array broadcasting for specialized (strict) arrays, but not for generalized (nonstrict) arrays (which is not so easy, in general). So here are two routines that broadcast generalized and specialized arrays, respectively, as efficiently as I can imagine:

(define (recenter-generalized A)
  (array-copy
   (array-map - A (let* ((average
                          ;; the generalized array result is not copied to a specialized array
                          (array-map
                           (lambda (slice)
                             (/ (array-reduce fl+ slice)
                                (interval-volume (array-domain slice))))
                           (array-curry
                            (array-permute
                             A '#(2 0 1))
                            2)))
                         (average-getter ;; use internal unsafe getter
                          (%%array-unsafe-getter average)))
                    (make-array (array-domain A)
                                (lambda (i j k)
                                  (average-getter k)))))))

(define (recenter-specialized A)
  (array-copy
   (array-map - A (array-broadcast
                   ;; copy the array to a specialized array first
                   (array-copy
                    (array-map
                     (lambda (slice)
                       (/ (array-reduce fl+ slice)
                          (interval-volume (array-domain slice))))
                     (array-curry
                      (array-permute
                       A '#(2 0 1))
                      2)))
                   (array-domain A)))))

And we can test it on very small and not quite so small arrays:

(define A (list->array (make-interval '#(10 10 3)) (map inexact (iota 300))))
(define B (time (recenter-specialized A)))
(define C (time (recenter-generalized A)))

(define A* (list->array (make-interval '#(100 100 3)) (map inexact (iota 30000))))
(define B* (time (recenter-specialized A*)))
(define C* (time (recenter-generalized A*)))

With a 10 x 10 array of 3-vectors, the timings are not sooooo different:

(time (recenter-specialized A))
    0.000084 secs real time
    0.000085 secs cpu time (0.000047 user, 0.000038 system)
    no collections
    37376 bytes allocated
    12 minor faults
    no major faults
    292467 cpu cycles
(time (recenter-generalized A))
    0.000894 secs real time
    0.000894 secs cpu time (0.000666 user, 0.000228 system)
    no collections
    574640 bytes allocated
    71 minor faults
    no major faults
    3202257 cpu cycles

But with a 100 x 100 array of 3-vectors, the timings differ dramatically:

(time (recenter-specialized A*))
    0.001860 secs real time
    0.001860 secs cpu time (0.001860 user, 0.000000 system)
    no collections
    783088 bytes allocated
    84 minor faults
    no major faults
    6671997 cpu cycles
(time (recenter-generalized A*))
    6.449153 secs real time
    6.433722 secs cpu time (6.428914 user, 0.004808 system)
    no collections
    55809952 bytes allocated
    6907 minor faults
    no major faults
    23162995368 cpu cycles

A library should not allow the programmer to fall inadvertently into such a hole, so I conclude that generalized (nonstrict) arrays should not be broadcast.

But if you have a call

(array-map f A B)

and B is supposed to be broadcast to the domain (shape) of A, then if B is generalized one has two choices: raise an error and say that B is not specialized; or silently copy B to a specialized B*, and then broadcast B*.

The thing is, in all other circumstances the array elements of the arguments to array-map are not evaluated by array-map---array-map is supposed to set up work to be done later by routines like array-fold or array-every. Copying B to B* would require array-map to evaluate B's getter on its domain, which would break this convention of the library. I conclude that an error should be raised, and that B should not be implicitly copied to B* by array-map.
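A small illustration of that convention (a sketch; the display call just makes the evaluations visible):

(define B (array-map (lambda (x) (display "!") (* x x)) A))
;; nothing is printed yet: array-map only sets up the work
(array-every positive? B)
;; now one "!" is printed per element inspected, as B's getter runs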

Brad
