Basic Scheme stream question

AlexG · December 23, 2024, 8:22pm

I wonder if any Scheme-whizz can help out with this basically noob question.
Implementing the Catalan numbers using streams, why is the 2nd solution so much faster (factor of ~4) than the 1st and the 3rd?
Technically that's the one that calculates the whole range on every invocation, instead of incrementally, like the other two, so I'd expect the 2nd one to be the slow one, not the other way around!

(define (catalans n #:optional (meth 'a)) ; catalan nums starting from nth

  (define (catalan_1 n n! acc) ; not fast
    (if (zero? n) (stream-cons 1 (catalan_1 1 1 1))
        (let* ((2n (* 2 n))
               (new-acc (* (/ acc (add1 n)) 2n (sub1 2n))))
          (stream-cons (/ new-acc n!)
                       (catalan_1 (add1 n) (* n! (add1 n)) new-acc)))))

  (define (catalan_2 n n!) ; fast
    (if (zero? n) (stream-cons 1 (catalan_2 1 1))
        (stream-cons (/ (foldl * 1 (range (+ n 2) (add1 (* 2 n)))) n!)
                     (catalan_2 (add1 n) (* n! (add1 n))))))
  
  (define (catalan_3 n top nfact) ; not fast
    (stream-cons (/ top nfact)
                 (catalan_3 (add1 n)
                            (* (/ top (+ 2 n)) (sub1 (* 2 (add1 n))) (* 2 (add1 n)))
                            (* nfact (add1 n)))))

  (cond [(eq? meth 'a)(catalan_1 n (factorial n) 1)]
        [(eq? meth 'b)(catalan_2 n (factorial n))]
        [(eq? meth 'c)
         (let ((top (foldl * 1 (range (+ n 2)(add1 (* 2 n))))))
           (catalan_3 n top (factorial n)))]
        [else (error 'huh?)]))

Some results

> (time (let ((out (stream-ref (catalans 0 #:optional 'b) 20000))) 'out))
cpu time: 704 real time: 703 gc time: 87
'out
> (time (let ((out (stream-ref (catalans 0 #:optional 'c) 20000))) 'out))
cpu time: 2816 real time: 2823 gc time: 133
'out
> (time (let ((out (stream-ref (catalans 0 #:optional 'a) 20000))) 'out))
cpu time: 2813 real time: 2812 gc time: 136
'out

sorawee · December 23, 2024, 8:56pm

The second solution is the slow one if you access many elements, since it needs to do the fold many times.

(define (test-a)
  (define seq (catalans 0 #:optional 'a))
  (time (for ([i 500])
          (stream-ref seq i))))

(define (test-b)
  (define seq (catalans 0 #:optional 'b))
  (time (for ([i 500])
          (stream-ref seq i))))

(define (test-c)
  (define seq (catalans 0 #:optional 'c))
  (time (for ([i 500])
          (stream-ref seq i))))

(test-a) ;=> cpu time: 13 real time: 14 gc time: 0
(test-b) ;=> cpu time: 32 real time: 33 gc time: 2
(test-c) ;=> cpu time: 13 real time: 14 gc time: 0

However, when you do a single stream-ref, only a single fold is run. Other folds are delayed, because stream delays computation on unused elements. The second solution arrives at the solution straightforwardly. Compute the numerator and denominator, and divide one by the other, so it’s the fastest. Other solutions are more convoluted for the purpose of a single access. It builds up the numerator by multiplying two more numbers, and dividing by one more number, in every recursion call. So it makes sense that it’s slower.

AlexG · December 23, 2024, 9:31pm

Ah, OK, thank you.

I realize now that the mistake I'd been making is thinking that stream-ref needed to compute all elements previous to n. Silly me.

In fact now I see that if I use a function show, based on stream-take then the result is much more like what I expected.

(define (series:realize n str) (stream->list (stream-take str n)))
(define show series:realize)

> (time (let ((out (show 500 (catalans 0 #:optional 'a)))) 'out))
cpu time: 4 real time: 4 gc time: 0
'out
> (time (let ((out (show 500 (catalans 0 #:optional 'b)))) 'out))
cpu time: 54 real time: 54 gc time: 1
'out
> (time (let ((out (show 500 (catalans 0 #:optional 'c)))) 'out))
cpu time: 8 real time: 8 gc time: 0
'out

Actually, just "realizing" the stream, produces a difference of more than tenfold, rather than just the 2x that your example using for did! I find that most interesting in itself. Streams seem to work really well in Racket!

Thanks again!

AlexG · December 23, 2024, 9:47pm

BTW, why can't I mark this question as solved?
Is it because I posted it in "General" instead of "Q&A"?
Just for future reference.

sorawee · December 23, 2024, 10:09pm

I changed the category type to "Q&A".

AlexG · December 23, 2024, 11:02pm

Although something is still niggling a bit:

stream-ref may have only-computed the last value, but it relied on every previous one, so what exactly was it that was delayed? I mean, I can see from the test results the validity of the statement that the fold was only executed once, but wasn't it needed for every previous result, too?

Oh well, I'm sure I'll get it eventually. Some day

sorawee · December 23, 2024, 11:33pm

You are right that “[in] the last value, it relied on every previous one”, but that computation is for n!, which is relatively fast to compute.

The thing that solution 2 saves compared to solution 1 is not having to do (* (/ acc (add1 n)) 2n (sub1 2n))) in every recursion.

Consider (stream-ref st 20000). Here’s the computation in each solution

First solution

Element 0: base case, not interesting, let’s skip.

Element 1:

(* (/ acc (add1 n)) 2n (sub1 2n)))
Head: nothing, because (/ new-acc n!) is delayed
Tail: forced because we need to access later elements, (* n! (add1 n) computed

Element 2:

(* (/ acc (add1 n)) 2n (sub1 2n)))
Head: nothing, because (/ new-acc n!) is delayed
Tail: forced because we need to access later elements, (* n! (add1 n) computed

…

Element 20000:

(* (/ acc (add1 n)) 2n (sub1 2n)))
Head: forced, returning (/ new-acc n!)
Tail: delayed, since there’s no access to later elements.

Second solution