Stream-filter not in constant space?

The following runs in constant space:

(letrec
  ((make-nats (λ (n) (stream-cons n (make-nats (add1 n)))))
   (m (λ (x) (zero? (remainder x 10000))))
   (stream-filter
     (λ (f strm)
       (let ((kar (stream-first strm)))
         (if (f kar)
           (stream-cons kar (stream-filter f (stream-rest strm)))
           (stream-filter f (stream-rest strm)))))))
  (let loop ((nats (make-nats 0)))
    (loop (stream-rest (stream-filter m nats)))))

The following does not run in constant space:

(letrec
  ((make-nats (λ (n) (stream-cons n (make-nats (add1 n)))))
   (m (λ (x) (zero? (remainder x 10000)))))
  (let loop ((nats (make-nats 0)))
    (loop (stream-rest (stream-filter m nats)))))

I have looked in module stream.rkt, but cannot easily pinpoint a problem.
I don''t see immediately what make-do-stream does.
Should I post an issue on Issues · racket/racket (github.com)?
May be after some more study and with some help I can repair this if recognized as a problem.
How should I attack such a task?

1 Like

Sorry I forgot to tell: windows 10, racket 8.7 cs.

I'm not sure what is happening, but in case it's useful, I rewrote your code. (I hope I didn't make a mistake.)

#lang racket

(define (make-nats n)
  (stream-cons n (make-nats (add1 n))))

(define (m x)
  (and (zero? (remainder x 100))
       (displayln x)
       #t))

(define (my-stream-filter f strm)
  (let ([kar (stream-first strm)])
    (if (f kar)
        (stream-cons kar (my-stream-filter f (stream-rest strm)))
        (my-stream-filter f (stream-rest strm)))))

(let loop ([nats (make-nats 0)])
  (loop (stream-rest (my-stream-filter m nats))))

#;(let loop ([nats (make-nats 0)])
  (loop (stream-rest (stream-filter m nats))))

I added (displayln x) in m and I noticed the filter is evaluated multiple times with your code. (Perhaps it's a race condition???)

My guess is that the temporary variables that use the official implementation to avoid double evaluation are not collected.

1 Like

I haven't looked at your code, but in case it's helpful: the Scheme standard for streams specifies memoization as an essential feature, so we should expect the memory requirements to grow with the number of values forced. If for some reason the second version is forcing more values (or running faster, thus forcing more values in the same amount of time) than the first, that could account for your observation.

No, streams are designed to run the filter example in bound space. See SRFI 41.

I think racket/stream‘s stream-filter forces things a bit too much.

Consider:

(define st
  (stream-cons 1
               (begin
                 (displayln 'should-not-be-evaled)
                 empty-stream)))

(stream-first st)

This results in 1, without “should-not-be-evaled” being printed.

However, if we do:

(define st* (stream-filter (lambda (x) #t) st))
(stream-first st*)

then “should-not-be-evaled” is printed.

@joskoot’s variant of stream-filter doesn’t print “should-not-be-evaled”, but it doesn’t handle streams of multiple values. Also, it doesn’t delay the initial evaluation. E.g. (stream-filter (λ (_) #f) (in-naturals)) should not loop forever.

Here’s my variant that I think addresses these issues.

(require racket/private/for)
(define (stream-filter f s)
  (stream-lazy
   (let loop ([s s])
     (cond
       [(stream-empty? s) empty-stream]
       [else
        (cond
          [(call-with-values (λ () (stream-first s)) f)
           (make-do-stream (lambda () #f)
                           (lambda () (stream-first s))
                           (lambda () (loop (stream-rest s))))]
          [else (loop (stream-rest s))])]))))

When I run it, it does appear to take constant space in @joskoot’s original program.

Welp, please ignore my above program.

Consider:

(define st
  (for/stream ([i 10000001])
    (modulo i 10000000)))

(define st* (time (stream-filter zero? st))) ; should be fast
(time (stream-rest st*)) ; should be fast
(time (stream-first (stream-rest st*))) ; should take time
(time (stream-first (stream-rest st*))) ; should be fast

st is a stream that starts and ends with 0, and it has a bunch of non-zeros in the middle. After we filter for 0s, we should get a stream of exactly two elements.

My above program fails to delay stream-rest, and also fails to memoize the fact that filtering skips ~10000000 elements.

Here’s a more correct version.

(define (stream-filter f s)
  (stream-lazy
   (let loop ([s s])
     (cond
       [(stream-empty? s) empty-stream]
       [else
        (cond
          [(call-with-values (λ () (stream-first s)) f)
           (define stream-tail #f)
           (make-do-stream (lambda () #f)
                           (lambda () (stream-first s))
                           (lambda ()
                             (or stream-tail
                                 (stream-lazy
                                  (begin
                                    (set! stream-tail (loop (stream-rest s)))
                                    stream-tail)))))]
          [else (loop (stream-rest s))])]))))

To sorawee
Correct, stream-filter forces too much, I think That may be the problem.

To sorawee
When an infinite stream is bound to a variable that cannot be garbage collected after forcing it,
then the traversal cannot be space bound. This is expected behaviour. The same holds for promises.

May be your correct version works. I did not check this.

Maybe the problem is in make-do-stream.
However, I did not (yet) grasp all details of make-do-stream.
I have a problem with the generalization of type stream.
For example:

(for ((x (in-stream '(a b c)))) (displayln x)) ; --> prints a b c.

Maybe the problem is in this generalization. I don't know.

I think this is too important to be forgotten.

@joskoot: Do you want to send a bug report to GitHub - racket/racket: The Racket repository so it's easier to track it. Otherwise some of us can send it.

@sorawee: Perhaps you can send it instead. Is your version good enough to be merged? I still don't understand all the details, so I can't be sure if it fixes all the problems.

Is there a nice way to test for constant memory consumption? What I've been doing so far is observing from DrRacket that the memory doesn't grow unbounded in @joskoot's original program, but I would like to write a test programmatically.

You could use current-memory-use to count bytes but a weak box is a more direct instrument if you can get a reference to the specific objects that should go away.

Robby

Here’s a fundamental issue.

Stream in Racket is NOT the same as stream in SRFI-41.

In particular, the underlying stream mechanism in Racket is more general. It allows each element to produce multiple values. It doesn’t mandate memoization. It doesn’t enforce laziness. Etc.

However, some stream constructors in Racket, like stream and stream-cons, do follow the SRFI-41 specification. They are memoizing, enforcing single-valuedness, etc.

Now, the issue is, what should we do with operations on streams? Should they work with arbitrary Racket streams (which may result in streams that don’t conform SRFI-41)?

We can try to make the stream operations work with arbitrary Racket streams while conforming to SRFI-41. But fully doing that is probably not a good idea. Memoizing multiple values, for example, requires wrapping values in another container, which is prohibitively expensive for many streams that are expected to work fast.

A couple of data points:

  • Streams were added to Racket in this commit by @mflatt.

  • However, the code for SRFI-41 conforming streams, which was also added in the above commit, appears to have been ported to Racket by Jacob J. A. Koot, which if I understand correctly, is @joskoot.

  • Separately, there’s also a SRFI-41 library.

  • It appears that the added stream operations in the commit were expected to work with multiple valued streams, at least according to the documentation (notice the signature of the filtering function that accommodates multiple values).

  • But these operations didn’t actually work with multiple valued streams.

  • I fixed the issue with multiple valued streams in https://github.com/racket/racket/pull/3333

  • It’s not just stream-filter. stream-map and probably more are also faulty.

#lang racket

(require srfi/41)

(define t (stream-cons 0 t))

(define s (stream-filter negative? t))
(stream-map add1 s)

terminates, but without (require srfi/41), it doesn’t terminate. This is because the operation doesn’t produce a lazy stream.

What’s the conclusion? I don’t know. Perhaps if you want an SRFI-41 conforming stream, you should (require srfi/41). Perhaps I was wrong to generalize stream operations to support multiple values, which make them deviate even more from SRFI-41. I honestly don’t know.

3 Likes

Indeed @joskoot is Jacob J. A. Koot or less formally Jos Koot or simply Jos.
Indeed I adapted SRFI 41 for Racket years ago and Matthew Flatt ported it to Racket.
Much has been modified since then.
Using SRFI-41 library you loose in--stream.
Syntax lazy is restricted to single values, which makes sense, because when forcing a lazy promise procedure force must check whether or not the result is a promise. The same applies to stream-lazy
which is in the SRFI 41 implementation but not exported. In my opinion it should be exported too.
Before committing an issue in github racket/racket, I await further comments.
However, I think that bound space traversal is important (as long as the content of stream structs
become garbage collectable when the forced value is lifted into the enclosing stream. This is why traversing a stream that stays being bound cannot be done in bound space. Exactly the same applies to recursive promises. See SRFI 45)
Best wishes, Jos

In windows I use Task manager to see whether or not a non ending computation is done in bound space.

Here’s another difference:

#lang racket

(require srfi/41)

(define (f x)
  (println x)
  (add1 x))

(define t (stream-cons 0 t))

(define s (stream-map f t))
(stream-first s)
(stream-first s)

produces:

> 0
1
1

But without (require srfi/41), it produces:

> 0
1
> 0
1

On the other hand:

#lang racket

(require srfi/41)

(define (f x)
  (values 1 2))

(define t (stream-cons 0 t))

(define s (stream-map f t))
(stream-first s)

fails with:

result arity mismatch;
 expected number of values not received
  expected: 1
  received: 2

Without (require srfi/41), it produces:

1
2

It boils down to, do we want to memoize multiple values?

1 Like

A promise doesn’t necessarily need to hold a single value. Racket’s promise can hold multiple values, at the cost of extra allocation for wrapper.

> (define p (delay (values 1 2)))
> (force p)
1
2

The question is, is it worth that extra allocation?

Though one possibility is to special-case single value. This reduces allocation for a single value, which should be the common case, but penalizes multiple values for yet another extra allocation.

Yes that’s for delay, but lazy does not accept multiple values:

#lang racket

(define p (lazy (values 1 2)))

(force p)

result arity mismatch;

expected number of values not received

expected: 1

received: 2

values...:

1

2

It is not difficult to adapt lazy such as to accept and memorise multiple values,

but this complicates the test on the forced value being a promise or not.

I submitted https://github.com/racket/racket/pull/4554/ to fix this problem, though it still needs doc update + tests.

1 Like

I found a bug in SRFI-41, which makes stream operations take linear space instead of constant space. If anyone is interested, the discussion thread is here.

3 Likes