Difference Lists and Racket

Something I've been playing with and wanted to share.

If you're building up a final list by appending intermediate ones together, it's easy to get quadratic behavior if you're not careful - appending to the end of a constantly growing list in a loop, for example:

;;; Trivial example for demonstration purposes
;;; Tail-recursive but O(N^2)
(define (repeat-list lst n)
  (let loop ([results '()]
             [n n])
    (if (= n 0)
        results
        (loop (append results lst) (sub1 n)))))

(repeat-list '(a b c) 3)

A difference list is a purely functional data structure intended to help avert this - it offers O(1) append and O(N) conversion to a normal list. They're popular in Haskell, and apparently Prolog. This blog post goes into more details about them. You can implement them like a tree, with internal node and leaf structs, but the original paper describing them used partial function application and function composition to build them up - a data structure made of closures, basically. It reminds me of the some of the things in SICP where they show how to make cons, car, and cdr out of nothing but function applications. And it all easily translates to Racket:

(define ((list->dlist list-a) list-b) (append list-a list-b))
(define (dlist-append f g) (compose1 f g))
(define (dlist->list dlst [tail '()]) (dlst tail))

;;; Tail-recursive O(N)
(define (repeat-list lst n)
  (let loop ([results (list->dlist '())]
             [n n])
    (if (= n 0)
        (dlist->list results)
        (loop (dlist-append results (list->dlist lst)) (sub1 n)))))

(If you're not familiar with the syntax used in that definition of list->dlist, see the relevant Racket Guide section.)

However...

While it's an interesting and sometimes handy technique, it's not as useful in Racket as it can be in languages like Haskell where appending lists is a binary operation that works on 2 at a time - with Lisp-family languages, append can combine any number of lists in time proportional to the sum of their lengths; the bad cases like the first example in the above-mentioned blog post just don't happen unless you're writing really poor code. And with Racket, since stack overflows from too deep of recursion aren't a thing, you can just write a non-tail-recursive function with repeated appends that's still linear (Since the final list is built from end to start as the call stack unwinds, just like with difference lists):

(define (repeat-list lst n)
  (if (= n 0)
      '()
      (append lst (repeat-list lst (sub1 n)))))

So in conclusion, nifty but not really of significance.

2 Likes

I think you can also use the new 4.16 Treelists , treelist-append is not O(1), but at least you avoid the O(N^2) time.

(treelist-append tl ...)
Appends the elements of the given tls into a single treelist. If M treelists are given and the resulting treelist’s length is N, then appending takes O(M log N) time.

Yeah, there's other alternatives that can be used. I just thought that difference lists were interesting because of how they're built out of function calls.

Today at RacketCon during his talk, @samth brought up the new treelists and gave a toy benchmark comparing a naive append-happy O(N^2) list reversal function to the equivalent using treelists (Unsurprisingly, the latter won handily). I took notes, and added a version using difference lists.

Results on my laptop:

$ racket dlist-benchmark.rkt
Standard reverse
cpu time: 0 real time: 0 gc time: 0
cpu time: 0 real time: 0 gc time: 0
cpu time: 0 real time: 0 gc time: 0
cpu time: 0 real time: 0 gc time: 0
cpu time: 0 real time: 3 gc time: 0
Naive reverse
cpu time: 0 real time: 0 gc time: 0
cpu time: 0 real time: 0 gc time: 0
cpu time: 0 real time: 57 gc time: 0
cpu time: 765 real time: 8045 gc time: 62
Treelist reverse
cpu time: 0 real time: 0 gc time: 0
cpu time: 0 real time: 0 gc time: 0
cpu time: 0 real time: 1 gc time: 0
cpu time: 0 real time: 18 gc time: 0
cpu time: 15 real time: 230 gc time: 15
Difference list reverse
cpu time: 0 real time: 0 gc time: 0
cpu time: 0 real time: 0 gc time: 0
cpu time: 0 real time: 0 gc time: 0
cpu time: 0 real time: 2 gc time: 0
cpu time: 0 real time: 52 gc time: 0

Takeaways: reverse is screaming fast, even on a list of a half million elements. Difference lists (AKA function composition) are very respectable. Treelists are pretty decent too, despite their more complicated internals.


Code:

#lang racket/base

(require racket/treelist)

;;; Naive reverse
(define (lreverse l)
  (if (null? l)
      '()
      (append (lreverse (cdr l)) (list (car l)))))

(define (tlreverse l)
  (if (null? l)
      (treelist)
      (treelist-append (tlreverse (cdr l)) (treelist (car l)))))

(define (dlist . elems) (list->dlist elems))
(define ((list->dlist list-a) list-b) (append list-a list-b))
(define (dlist-append f g) (compose1 f g))
(define (dlist->list dlst [tail '()]) (dlst tail))
(define (dlreverse l)
  (if (null? l)
      (dlist)
      (dlist-append (dlreverse (cdr l)) (dlist (car l)))))

(define-values (list50 list500 list5k list50k list500k)
  (apply values (map (lambda (n) (build-list n values)) '(50 500 5000 50000 500000))))

(define (run-benchmarks title to-list rev test-cases)
  (collect-garbage)
  (displayln title)
  (for ([test (in-list test-cases)])
    (time (void (to-list (rev test))))))

(run-benchmarks "Standard reverse" values reverse (list list50 list500 list5k list50k list500k))
(run-benchmarks "Naive reverse" values lreverse (list list50 list500 list5k list50k))
(run-benchmarks "Treelist reverse" treelist->list tlreverse (list list50 list500 list5k list50k list500k))
(run-benchmarks "Difference list reverse" dlist->list dlreverse (list list50 list500 list5k list50k list500k))