Why does Racket have sequences and streams?
The first few paragraphs in https://docs.racket-lang.org/reference/sequences.html are pretty helpful for this question.
The stream interface allows you to create an ordered collection where you can (efficiently) ask:
- What is the first element of the collection?
- What is the collection without the first element?
Obviously, a stream supports iteration over its elements. This iteration doesnāt rely on any hidden internal state.
The sequence interface also allows you to create an ordered collection that supports iteration over elements, just like stream. However, unlike stream, a sequence doesnāt need to follow the above interface, and it particularly can have internal states. As a result, a sequence is more general than a stream. This is codified in Racket: every stream is automatically a sequence, but a sequence is not necessarily a stream.
A vector, for example, does not fit well to the stream abstraction (b/c a vector without the first element is expensive to compute ā requiring an entire vector copy). However, you can have an internal state, which is a vector index, that allows you to iterate over a vector efficiently. So a vector fits well to the sequence abstraction.
However, this doesnāt mean you should always use the sequence abstraction, b/c while it is more flexible, itās also more difficult to write compared to the straightforward stream protocol.
Also, itās possible to convert between the two with (in-stream a-stream)
and (sequence->stream a-sequence)
, so it looks like there are no applications where only one or the other works.
(in-stream a-stream)
is not really needed. a-stream
is already a sequence. in-stream
is mostly useful inside a for
form, as it allows for-loop iteration optimization.
It is true that you can convert a sequence to a stream, but this conversion is not free.
Are there any recommendations about when to use sequences vs. streams?
What do you mean by āuseā?
If you meant to ask about creating a new kind of iterable collection, you should see if it needs internal states for iteration. If negative, then use streams. If positive, then use sequences.
But otherwise, you are using an existing collection, so thereās no point to āchooseā to use sequences vs streams. It is what it is. in-naturals
, for example, is a stream, and thus also a sequence. A vector is only a sequence.
Is there anything else one should know about sequences vs. streams?
Just like āstackā which could refer to an ADT and concrete data structure, sequence/stream can also refer to an interface and concrete data.
make-do-sequence
directly creates a kind of sequence, which is very flexible, but also very unwieldy.
stream
/ stream-cons
/ for/stream
creates a kind of stream that is lazy and memoizing.
My impression is that if I write a library that provides infinite values, I should focus on providing them as something conforming to the sequence API rather than the streams API.
Sequence interface is what iteration needs. But anything that is a stream is also a sequence, so if implementing it as a stream is easier, you should do it instead.
What I am confused about is whether I should wrap a in-stream outside the in-* function that returns stream? Which of the following forms is the best?
This question is about for-loop optimization. To find an answer, you can use the macro stepper to see the expanded code. For example:
(define xs '(1 2 3))
(for/list ([x xs]) x)
expands to:
(define-values (xs) '(1 2 3))
(#%app
call-with-values
(lambda ()
(#%app
alt-reverse
(let-values ([(pos->vals pos-pre-inc pos-next init pos-cont? val-cont? all-cont?)
(#%app make-sequence '(x) xs)])
(#%app void)
(#%app
(letrec-values (((for-loop)
(lambda (fold-var pos)
(if (if pos-cont? (#%app pos-cont? pos) (quote #t))
(let-values ([(x all-cont?/pos)
(let-values ([(x) (#%app pos->vals pos)])
(#%app
values
x
(if all-cont?
(lambda (pos) (#%app all-cont? pos x))
(quote #f))))]
[(pos)
(if pos-pre-inc
(#%app pos-pre-inc pos)
pos)])
(if (if val-cont? (#%app val-cont? x) (quote #t))
(let-values ([(fold-var)
(let-values (((fold-var) fold-var))
(let-values ([(fold-var)
(let-values ()
(#%app
cons
(let-values () x)
fold-var))])
(#%app values fold-var)))])
(if (if (if all-cont?/pos
(#%app all-cont?/pos pos)
(quote #t))
(#%app not (quote #f))
(quote #f))
(#%app for-loop fold-var (#%app pos-next pos))
fold-var))
fold-var))
fold-var))))
for-loop)
null
init))))
print-values)
But
(define xs '(1 2 3))
(for/list ([x (in-list xs)]) x)
is expanded to:
(define-values (xs) '(1 2 3))
(#%app
call-with-values
(lambda ()
(#%app
alt-reverse
(let-values ([(lst) xs])
(if (#%app variable-reference-from-unsafe? (#%variable-reference))
(#%app void)
(let-values () (#%app check-list lst)))
(#%app
(letrec-values (((for-loop)
(lambda (fold-var lst)
(if (#%app pair? lst)
(let-values ([(x) (#%app unsafe-car lst)]
[(rest) (#%app unsafe-cdr lst)])
(if (quote #t)
(let-values ([(fold-var)
(let-values (((fold-var) fold-var))
(let-values ([(fold-var)
(let-values ()
(#%app
cons
(let-values () x)
fold-var))])
(#%app values fold-var)))])
(if (if (quote #t) (#%app not (quote #f)) (quote #f))
(#%app for-loop fold-var rest)
fold-var))
fold-var))
fold-var))))
for-loop)
null
lst))))
print-values)
You can see that the latter can directly use unsafe-car
for accessing an element, while the former requires a lot of indirection to access an element, so the former is slower.
But note that (for/list ([x '(1 2 3)]) x)
is as efficient as (for/list ([x (in-list '(1 2 3))]) x)
. This is because '(1 2 3)
is a literal, so for
knows that it is a list and automatically adds in-list
for you. (It is an optimization that I added a while back).
(for/list ([x (in-stream (in-list xs))]) x)
is probably slower than (for/list ([x (in-list xs)]) x)
, but faster than (for/list ([x xs]) x)
.
Iād use the first form (without in-list or in-stream) for short lists that are written as lists in the code (i.e. ā(ā¦) or (list ā¦)).
(for/list ([x (list 1 2 3)]) x)
unfortunately isnāt as optimized as (for/list ([x (in-list (list 1 2 3))]) x)
. So if the performance is a concern, you should do the latter. And unlike, (for/list ([x '(1 2 3)]) x)
, it is significantly more difficult to make the same kind of optimization.
The third form (with in-list and in-stream) seems redundant to meā¦.
I agree with this.