Bug in flomat / block-diagonal?

Hi everyone,

When I use flomat's block-diagonal function, I'm getting random instances of NaN in the result:

#lang racket

(require flomat)

(define m
(block-diagonal (eye 100) (eye 100)))

(for* ([r (in-range (nrows m))]
[c (in-range (ncols m))])
(when (nan? (ref m r c))
(printf "Row ~a column ~a is nan\n"
r
c)))

I would expect there to be no instances of NaN in the matrix, however a typical result is:

Row 1 column 196 is nan
Row 8 column 196 is nan
Row 10 column 148 is nan
Row 15 column 129 is nan
Row 18 column 129 is nan
Row 24 column 156 is nan
Row 41 column 155 is nan
Row 41 column 196 is nan
Row 46 column 196 is nan
Row 47 column 161 is nan
Row 50 column 161 is nan
Row 55 column 150 is nan
Row 56 column 179 is nan
Row 59 column 123 is nan
Row 70 column 126 is nan
Row 76 column 128 is nan
Row 86 column 158 is nan
Row 90 column 126 is nan
Row 98 column 150 is nan
Row 112 column 89 is nan
Row 134 column 95 is nan
Row 143 column 85 is nan
Row 145 column 72 is nan
Row 149 column 88 is nan
Row 168 column 87 is nan

Almost every time I call this, I get a different report.

Is this just me?

I'm also having suspicious temporary hangs, which are making me wonder if there is a memory problem somewhere.

Tim

1 Like

Hi Tim,

I can confirm the bug. And I believe I have found the problem

The function block-diagonal is defined like this:

(define (flomat-block-diagonal C . Cs)
  (define rows (map flomat-m (cons C Cs)))
  (define cols (map flomat-n (cons C Cs)))
  ; 2. Find size for result matrix and allocate
  (define m (apply + rows))
  (define n (apply + cols))
  (define a (alloc-flomat m n))
  (define lda m)
  (unsafe-vector-clear (* m n) a)
  ; 3. Fill in blocks
  (define i 0)
  (define j 0) 
  (for ([B (in-list (cons C Cs))])
    (define-param (mb nb b ldb) B)
    (define aij (ptr-elm a lda i j))
    (unsafe-matrix-copy! mb nb b ldb aij lda)
    (set! i (+ i mb))
    (set! j (+ j nb)))
  (flomat m n a lda))

The function call (unsafe-vector-clear (* m n) a) was meant to fill the matrix with zeros.

(define (unsafe-vector-clear n a [lda 1])
  (cblas_daxpy n -1.0 a lda a lda))

The clbas_daxpy computes -1 times the matrix a plus the matrix a.
As long as an entry x in the matrix are not NaN, then -1*x+x is indeed 0.0.
But ... if z is NaN then it becomes NaN (sigh).

So the definition of unsafe-vector-clear needs to change.

The problem doesn't affect other functions than block-diagonal since there are no other calls to unsafe-vector-clear in flomat.

Tim, will you make an issue on Github?

@timjervis

I have committed a repair.

The line

(unsafe-vector-clear (* m n) a)

was replaced with

  (memset a 0 0 (* m n) _double)

Or if z is an infinity.
You said you committed a repair; to what repository?

1 Like

To GitHub - soegaard/sci: Racket libraries for scientific computing