I solved one of the katas on codewars: it's called "are they the same?". The key thing is deciding if two lists have equal elements.
My solution used for/fold; one of the other solutions used first and rest and explicitly recursed. I'm wondering about the basic conceptual difference between these two functions:
(Those aren't exactly what is used in the problem/kata, but they illustrate what I'm wondering; you can imagine replacing equal? in (equal? x y) and (equal? (first xs) (first ys)) with some other predicate.)
Is there a meaningful difference between those two? I know there are a lot of extra details involved in the for/fold, but I want to verify my mental model here: I think that the fold, with its #:break, will get compiled into something equivalent to the explicit first/rest recursion: for each thing over which we are folding (an x and y pair), it runs the predicate, uses the result to update the accumulator (or break out of the fold), and then proceeds to do the same thing on the rest of the thing over which we are folding.
That seems to match conceptually with the recursion: take the first x and y pair in the lists, run the predicate, and then either return (= break out of the fold) or recurse (= "do the same thing on the rest of the thing...").
On a basic conceptual level, am I right that these are pretty much equivalent?
If so, what differences are there between the two?
FWIW, some naive timing shows that the explicit recursion is about 40% faster than the fold version. So there's a practical difference here, which isn't so surprising. What else is relevant here?
Both versions seem broken, and, even if this counterexample breaks what was supposed to be an unchecked invariant, the two implement different behavior:
As for performance: When you know you'll be operating on a specific type of sequence -- like a list -- you want to give for that "hint" -- like by using in-list. See Iteration Performance.
I think the only time I'd omit that is when the code is supposed to be "generic" for any type of sequence (which in my experience is rare), or, when I'm casually exploring or experimenting with short sequences in the REPL and performance doesn't matter.
With in-list I'd expect the for/fold code to expand into something much like the handwritten recursion (but corrected for details like @LiberalArtist pointed out).
FWIW I'd probably use for/and unless that would be cheating for the kata.
(To really really cheat, I'd give as the solution just (define comp equal?) IIUC the problem correctly. But seriously, it's interesting to think about how you would define equal? for real, over a variety of values.)
Ah, that's an artifact of me trying to extract just the relevant piece of the full solution -- the assumption is that the lists are of the same length. In the actual recursive solution, that is handled.
My own solution -- which was accepted as correct on the codewars site -- has that bug, because for/fold terminates on the shorter input. I commented on the solution and suggested they add tests to exercise that corner case. Thanks for that!
So, for the moment, let's just assume that the lists are the same length.
The differing behavior, I think, is a good example of a meaningful difference between the fold and the recursion: the fold just automatically terminates on the shorter input; the recursion blows up. In some situations, the former behavior may be exactly what you want. But not here...
BTW, is there a nice idiomatic way to get for/fold and friends to somehow signal or handle the situation in which one of the iteration sequences is exhausted, but the other isn't? It looks like for's essential behavior is to just terminate on the shorter sequence, and I don't see a reasonable way to detect or signal that situation.
I asked a couple chatbots for some code, and both landed on something sequence->generator -- but when I really looked at the code, it seemed to be (for this problem, at any rate) just a really fancy complex version of that simplistic first/rest recursion.
No, I don't think so. The iteration/comprehension forms like for/fold follow the example of fold and map in this regard.
If I needed special behavior, I would probably just write out a recursive function by hand.
For example, as a first attempt:
(define (strict-map f as bs)
(cond
[(and (pair? as) (pair? bs))
(cons (f (car as) (car bs))
(strict-map f (cdr as) (cdr bs)))]
[(and (null? as) (null? bs))
null]
[else
(error 'strict-map "lists are different lengths")]))
Or, using racket/match something like:
(define (strict-map f as bs)
(match* [as bs]
[[(cons a as) (cons b bs)] (cons (f a b) (strict-map f as bs))]
[[(list) (list) ] null]
[[_ _ ] (error 'strict-map "lists are different lengths")]))
When writing recursive functions by hand I tend to prefer pattern matching. Because it reduces the car and cdr noise:signal ratio. And because it looks like writing out the various cases in a "by example" style. And, as with for/fold, it tends to expand to equivalently efficient code.
p.s. That strict-map could be rewritten to be fully variadic just like map.
Here is a way to get you started on a "cheaply constructed but expensive" for/fold variant that protects you from sequences of unequal length:
#lang racket
(require (for-syntax syntax/parse))
(define-syntax (for/fold/protected stx)
(syntax-parse stx
[(_ accu ([x xx] ... clauses ...) body ...)
#:with (determine-length ...) (protect #'(xx ...))
#'(cond
[(not (= (determine-length xx) ...))
(error 'for/fold/protect "expected sequences of equal length, given ~a" `[xx ...])]
[else
(for/fold accu ([x xx] ... clauses ...) body ...)])]))
(define-for-syntax (protect syn-lists)
(define lists (syntax->list syn-lists))
(for/list ([l lists])
(syntax-parse l
[((~literal in-list) l) #'stream-length]
[((~literal in-vector) v) #'sequence-length]
;; the next line is wrong
[_ #'length])))
(define ll '(1 2 3))
(define kk '#(a b c d))
(define mm '(1 2 3))
(for/fold/protected
([r '()] #:result (reverse r))
([l (in-list ll)] [m mm] [k (in-vector kk)] #:do [ (printf "~a ~a\n" l k) ])
(cons (list m l k) r))
You'd have to write a generic length-for to get the "wrong" case done properly.
The "cheap" refers to the macro part.
The "expensive" refers to the upfront checking part, which duplicates the traversal.
When I program, I use fold when I don't know the length and want an "inexpensive" way of failing (terminating the program execution) and for/fold when I am okay with abandoning left-over elements of a sequence (or when I know they are guaranteed to be of the same length).
"Learn about Racket's pattern matching" has been on my radar, thank you for prompting me to learn a bit about it.
This approach is lovely! It's so short and so readable. The pattern matching style really appeals to me; as you say, it's nice to describe cases via "shapes" or examples or declaratively, rather than interrogatively with predicates.
Here, it's a perfect description of what I want: the first pattern says "the shape is that of two lists that have at least one element"; the second says "two empty lists"; and the third says "in any other situation, we have an empty and a nonempty list".
And not only that, but when I use that for my solution to the kata, it's 35% faster than the explicit car/cdr solution! I have no idea why. Here's my solution:
(define (comp%% xs ys)
(let recurse ([xs (sort (map sqr xs) <)]
[ys (sort ys <)])
(match* [xs ys]
[[(cons a as) (cons b bs)] (if (equal? a b)
(recurse as bs)
#f)]
[[(list) (list) ] #t]
[[_ _ ] #f])))
Now my question is, why? I tested, and it's not the explicit length checks. My named let defines a function that seems equivalent to the "compAux" function in the other solution. But mine is significantly faster. Any ideas why?
(Another question: I'm not clear on the difference between match and match*. I read the reference, but it's not so clear to my little pea brain.)
Sorry, using match* here was probably an unnecessary distraction. (Unlike many people here, I am definitely not a professional educator.)
match* is just a convenience for match. For now just think of it as writing e.g. (match (list as bs) _) and wrapping each pattern clause in a list.
By the way, I'm pretty sure I first saw this style -- "classic" handwritten recursion but using match -- in some code by @soegaard, to give credit where it's due.
Not immediately sure why. Might be related to details like this let and comment from the source for Racket's map:
(let loop ([l l])
(cond
[(null? l) null]
[else
(let ([r (cdr l)]) ; so `l` is not necessarily retained during `f`
(cons (f (car l)) (loop r)))]))
The match approach also does that -- rebinds as to (cdr as), ditto for bs -- and avoids that retention.
This guess is more plausible if Racket's time says the extra time for your explicit car/cdr variation is mostly "GC".
Having said all that, this guess feels insufficient?
p.s. Tip: When using Racket's time in code, run that using racket from the command-line.
(If instead the code is run in e.g. DrRacket or Emacs Racket Mode, the timing is likely to be contaminated by other stuff running, as well if the code is annoated by errortrace for better stack traces, etc.)
BTW, is there a nice idiomatic way to get for/fold and friends to somehow signal or handle the situation in which one of the iteration sequences is exhausted, but the other isn't?
Not sure if this is what you want, but you could use a macro that checks the lengths before starting the fold: