Sequences vs. streams

I read the Racket Reference sections about sequences and streams and am a bit confused. I have a few questions:

  • Why does Racket have sequences and streams?

    Conceptually, they seem the same. Also, it's possible to convert between the two with (in-stream a-stream) and (sequence->stream a-sequence), so it looks like there are no applications where only one or the other works.

  • Are there any recommendations about when to use sequences vs. streams?

    My impression is that sequences are used much more often (see all the in-* procedures and their prominent use in the for forms), so they seem to be the default that should be used if one wants to generate values lazily.

  • Is there anything else one should know about sequences vs. streams?

4 Likes

I'm new to Racket, so take what I say with a grain of salt, but in my mind the main difference is that a steam is a potentially infinite sequence. For example, you might have a stream that returns the prime numbers in order -- this is definitely an infinite sequence, and not something that you can hold in finite memory. The lazy evaluation is what makes this possible -- values are computed on the fly as needed -- ie, as you recur over the stream, the next prime value is calculated and returned, and nothing more is calculated until you need another

See [1] for a discussion in Scheme

[1] Structure and Interpretation of Computer Programs
[2] Structure and Interpretation of Computer Programs, 2e: 3.5 (same content as above but mobile friendly)

Thanks for the feedback.

I think we're talking about different definitions of "sequence". I was referring to the "Sequences" in the Racket Reference, which can describe infinite sequences. For example, there's in-naturals, which gives you an infinite sequence of "all" natural numbers.

But it's good that you reminded me that there's another "sequence" concept (lists, vectors, etc.) which differs from the "sequence" concept I referred to above. Thanks! I'll also read more of the SICP section you quoted. For the record, I also found SRFI-41 and SRFI-127, but these added more confusion than understanding, as it's unclear to me how they relate to the "sequence" and "stream" concepts in the Racket Reference.

By the way, if you're new to Racket, what do you think about my Racket Glossary project? (*) :slight_smile: Actually, I started this topic to clear up my understanding for a new glossary entry "Sequences and streams" (or a similar title).

(*) If you want to reply, you can either use the Racket Glossary thread or send me a private message here on the Racket Discourse.

1 Like

This not only applies to consuming sequences/streams but also to providing them. My impression is that if I write a library that provides infinite values, I should focus on providing them as something conforming to the sequence API rather than the streams API.

Oops, yes, I see -- I misread the documentation on sequences you linked. I interpreted it as saying that streams were a kind of sequence, the way that generators are a kind of iterable in Python -- although I see that the in-naturals link you shared does say

Returns an infinite sequence (that is also a stream)

Is this that another overloading of the term "sequence"?

No, in this case, "also" means that this stream is both a sequence and a stream:

> (sequence? (in-naturals))
#t
> (stream? (in-naturals))
#t

This reminds me of a surprise on my own when I experimented with sequences and streams. In the section on file ports there's the sentence

A port created by open-input-file, open-output-file, subprocess, and related functions is a file-stream port.

Since an opened file can be read lazily, I assumed the term would "of course" imply that a "file-stream port" is a stream, but it isn't (according to stream?):

> (define a-file (open-input-file "glossary-stats.rkt"))
> (stream? a-file)
#f

But it's a sequence:

> (sequence? a-file)
#t

I just noticed that some in-* functions return streams, and some return sequences that are not streams. The documentation says that in-* can provide better performance for for clauses.

An in-list application can provide better performance for list iteration when it appears directly in a for clause.

An in-stream application can provide better performance for streams iteration when it appears directly in a for clause.

What I am confused about is whether I should wrap a in-stream outside the in-* function that returns stream? Which of the following forms is the best?

> (for/list ([x '(3 1 4)])
     `(,x ,(* x x)))
'((3 9) (1 1) (4 16))
> (for/list ([x (in-list '(3 1 4))])
     `(,x ,(* x x)))
'((3 9) (1 1) (4 16))
> (for/list ([x (in-stream (in-list '(3 1 4)))])
     `(,x ,(* x x)))
'((3 9) (1 1) (4 16))

I'm not a Racket expert, but here's my take nonetheless. :wink:

  • I'd use the first form (without in-list or in-stream) for short lists that are written as lists in the code (i.e. '(...) or (list ...)).
  • I'd use the second form (with in-list, but not in-stream) if the value you're iterating over isn't easily recognizable as a list in the code. In this case, in-list will make it explicit to the reader that you're iterating over a list. (Caveat: Maybe you want your loop to work on different types of containers, although I'd expect that most code will use/expect a specific container type.) I'd also use this second form if performance is important for this concrete loop.
  • The third form (with in-list and in-stream) seems redundant to me. I'd expect that the for forms will generate the fastest code if they "see" just the in-list. Combining in-list with in-stream might even make it more difficult for the for forms to optimize the iteration, but that's just an assumption on my part. :slight_smile:
1 Like

Why does Racket have sequences and streams?

The first few paragraphs in https://docs.racket-lang.org/reference/sequences.html are pretty helpful for this question.

The stream interface allows you to create an ordered collection where you can (efficiently) ask:

  • What is the first element of the collection?
  • What is the collection without the first element?

Obviously, a stream supports iteration over its elements. This iteration doesnā€™t rely on any hidden internal state.

The sequence interface also allows you to create an ordered collection that supports iteration over elements, just like stream. However, unlike stream, a sequence doesnā€™t need to follow the above interface, and it particularly can have internal states. As a result, a sequence is more general than a stream. This is codified in Racket: every stream is automatically a sequence, but a sequence is not necessarily a stream.

A vector, for example, does not fit well to the stream abstraction (b/c a vector without the first element is expensive to compute ā€” requiring an entire vector copy). However, you can have an internal state, which is a vector index, that allows you to iterate over a vector efficiently. So a vector fits well to the sequence abstraction.

However, this doesnā€™t mean you should always use the sequence abstraction, b/c while it is more flexible, itā€™s also more difficult to write compared to the straightforward stream protocol.

Also, itā€™s possible to convert between the two with (in-stream a-stream) and (sequence->stream a-sequence), so it looks like there are no applications where only one or the other works.

(in-stream a-stream) is not really needed. a-stream is already a sequence. in-stream is mostly useful inside a for form, as it allows for-loop iteration optimization.

It is true that you can convert a sequence to a stream, but this conversion is not free.

Are there any recommendations about when to use sequences vs. streams?

What do you mean by ā€œuseā€?

If you meant to ask about creating a new kind of iterable collection, you should see if it needs internal states for iteration. If negative, then use streams. If positive, then use sequences.

But otherwise, you are using an existing collection, so thereā€™s no point to ā€œchooseā€ to use sequences vs streams. It is what it is. in-naturals, for example, is a stream, and thus also a sequence. A vector is only a sequence.

Is there anything else one should know about sequences vs. streams?

Just like ā€œstackā€ which could refer to an ADT and concrete data structure, sequence/stream can also refer to an interface and concrete data.

  • make-do-sequence directly creates a kind of sequence, which is very flexible, but also very unwieldy.
  • stream / stream-cons / for/stream creates a kind of stream that is lazy and memoizing.

My impression is that if I write a library that provides infinite values, I should focus on providing them as something conforming to the sequence API rather than the streams API.

Sequence interface is what iteration needs. But anything that is a stream is also a sequence, so if implementing it as a stream is easier, you should do it instead.

What I am confused about is whether I should wrap a in-stream outside the in-* function that returns stream? Which of the following forms is the best?

This question is about for-loop optimization. To find an answer, you can use the macro stepper to see the expanded code. For example:

(define xs '(1 2 3))
(for/list ([x xs]) x)

expands to:

   (define-values (xs) '(1 2 3))
   (#%app
    call-with-values
    (lambda ()
      (#%app
       alt-reverse
       (let-values ([(pos->vals pos-pre-inc pos-next init pos-cont? val-cont? all-cont?)
                     (#%app make-sequence '(x) xs)])
         (#%app void)
         (#%app
          (letrec-values (((for-loop)
                           (lambda (fold-var pos)
                             (if (if pos-cont? (#%app pos-cont? pos) (quote #t))
                               (let-values ([(x all-cont?/pos)
                                             (let-values ([(x) (#%app pos->vals pos)])
                                               (#%app
                                                values
                                                x
                                                (if all-cont?
                                                  (lambda (pos) (#%app all-cont? pos x))
                                                  (quote #f))))]
                                            [(pos)
                                             (if pos-pre-inc
                                               (#%app pos-pre-inc pos)
                                               pos)])
                                 (if (if val-cont? (#%app val-cont? x) (quote #t))
                                   (let-values ([(fold-var)
                                                 (let-values (((fold-var) fold-var))
                                                   (let-values ([(fold-var)
                                                                 (let-values ()
                                                                   (#%app
                                                                    cons
                                                                    (let-values () x)
                                                                    fold-var))])
                                                     (#%app values fold-var)))])
                                     (if (if (if all-cont?/pos
                                               (#%app all-cont?/pos pos)
                                               (quote #t))
                                           (#%app not (quote #f))
                                           (quote #f))
                                       (#%app for-loop fold-var (#%app pos-next pos))
                                       fold-var))
                                   fold-var))
                               fold-var))))
            for-loop)
          null
          init))))
    print-values)

But

(define xs '(1 2 3))
(for/list ([x (in-list xs)]) x)

is expanded to:

   (define-values (xs) '(1 2 3))
   (#%app
    call-with-values
    (lambda ()
      (#%app
       alt-reverse
       (let-values ([(lst) xs])
         (if (#%app variable-reference-from-unsafe? (#%variable-reference))
           (#%app void)
           (let-values () (#%app check-list lst)))
         (#%app
          (letrec-values (((for-loop)
                           (lambda (fold-var lst)
                             (if (#%app pair? lst)
                               (let-values ([(x) (#%app unsafe-car lst)]
                                            [(rest) (#%app unsafe-cdr lst)])
                                 (if (quote #t)
                                   (let-values ([(fold-var)
                                                 (let-values (((fold-var) fold-var))
                                                   (let-values ([(fold-var)
                                                                 (let-values ()
                                                                   (#%app
                                                                    cons
                                                                    (let-values () x)
                                                                    fold-var))])
                                                     (#%app values fold-var)))])
                                     (if (if (quote #t) (#%app not (quote #f)) (quote #f))
                                       (#%app for-loop fold-var rest)
                                       fold-var))
                                   fold-var))
                               fold-var))))
            for-loop)
          null
          lst))))
    print-values)

You can see that the latter can directly use unsafe-car for accessing an element, while the former requires a lot of indirection to access an element, so the former is slower.

But note that (for/list ([x '(1 2 3)]) x) is as efficient as (for/list ([x (in-list '(1 2 3))]) x). This is because '(1 2 3) is a literal, so for knows that it is a list and automatically adds in-list for you. (It is an optimization that I added a while back).

(for/list ([x (in-stream (in-list xs))]) x) is probably slower than (for/list ([x (in-list xs)]) x), but faster than (for/list ([x xs]) x).

Iā€™d use the first form (without in-list or in-stream) for short lists that are written as lists in the code (i.e. ā€˜(ā€¦) or (list ā€¦)).

(for/list ([x (list 1 2 3)]) x) unfortunately isnā€™t as optimized as (for/list ([x (in-list (list 1 2 3))]) x). So if the performance is a concern, you should do the latter. And unlike, (for/list ([x '(1 2 3)]) x), it is significantly more difficult to make the same kind of optimization.

The third form (with in-list and in-stream) seems redundant to meā€¦.

I agree with this.

6 Likes

@sorawee That's a great answer, wow! Thank you! :slight_smile:

Would you consider turning your answer into a blog post (if you haven't already)? But maybe your answer here would be even more likely to show up in web search results anyway.

In my opinion, that's an interesting property of some of the Racket Reference (and sometimes other Racket documentation). It somehow contains the needed information, but it's hard to discern from the text if you don't already know what the answer to your question is. I think some parts of the Racket documentation, especially in the Reference and the Scribble documentation, are good examples of the curse of knowledge at work.

I'm aware that can be difficult to escape because the people writing the documentation are experts on the subject and sometimes what they write may be based on long-time experience with the history of a feature or on what they were thinking about when they implemented a feature. In that, the big picture of some concepts can get lost in the sense that it doesn't get written down.

I wish more of the Racket documentation would contain "big picture" information like in your answer here! :slight_smile:

Yes, that's what I meant. What you wrote is great information to include in the glossary because practical advice like this is often missing in the Racket documentation and that's what I want to include in the glossary.

I must admit that I'm not sure about this myself. :slight_smile: The question is about "things I don't know I don't know." Maybe a reader of the question thinking, "oh, I know some misunderstanding many people have about sequences vs. streams" or "I noticed these related problems in code I reviewed" or "yes, that's something non-obvious that occurred to me only later."

Edit: I actually confused this with the point following in my original post. You guessed the intention behind my second question correctly, and what I wrote in this post in the previous paragraph was referring to my third point in my original post. Sorry for the confusion.

I experimented a bit with sequence interfaces for a custom collection, see Paste # 58740: Sequence experiments . I'm well aware that the Racket vectors wrapped in the example are already sequences :slight_smile: , but I'm wrapping them here for the sake of showing patterns for implementing sequence interfaces.

It looks to me that the used approaches should work for most cases where you want a custom sequence or sequence API (including situations where you need state, like the vector index here). If you agree, I would just use these examples in the Racket Glossary, without discussing other approaches for implementing stream or sequence APIs there. But if I'm missing a whole group of important situations where the example approaches wouldn't work, I'd be grateful for examples.

(Note that for the glossary, I'm not interested in going into much detail, but "only" give the reader a starting point for each glossary entry. Kind of the 80 % of the Pareto principle, if you will. For example, in the Hash entry I focus only on the most common combinations of equality comparisons, mutability and strength because that's likely sufficient for many/most applications. Also see the glossary introduction.)

The new glossary entry is here.

In the end I used a much simpler example than in the pasterack code.