Basic recursion: How to eliminate the `reverse` without relying on `append`?

I am processing some data and need to do a simple transformation, which can be simplified into a basic recursion problem.

I have come up with a solution but I want to eliminate the 2 reverse, which are used to keep the number's order of occurrence unchanged.

Just build up the list without extra traversing.

( I checked my old CS101 code and found that they mostly rely on many appends which lead to more traversing. Right?)

Problem:

(list '(a 1) '(a 2) '(b 1) '(b 2) '(b 3) '(b 4) '(c 1) '(c 2) '(c 3)

into 

((a 1 2) (b 1 2 3 4) (c 1 2 3))

My current solution:

(define (classify l)
  (letrec ([aux
            (lambda (prev-label acc l)
              (cond [(empty? l)
                     (list (reverse acc))]
                    [(eq? (caar l) prev-label)
                     (aux prev-label (cons (cadar l) acc) (cdr l))]
                    [else
                     (cons (reverse acc)
                           (aux (caar l) (list (cadar l) (caar l)) (rest l)))]))])
    (aux (caar l) (list (caar l)) l)))

(classify (list '(a 1) '(a 2) '(b 1) '(b 2) '(b 3) '(b 4) '(c 1) '(c 2) '(c 3)))
1 Like

A bunch of cons's to build a list followed by a single reverse to get it in the correct order is typically going to be more efficient than a bunch of appends, yes; less list traversal and fewer allocations of new cons cells.

Here's a way that avoids reversals by using a right fold to traverse the list right to left instead of the typical left to right:

(define (classify lst)
  (foldr (lambda (elem accum)
           (cond
             ((null? accum) (list elem)) ; First call with the last element of the list
             ((eq? (car elem) (caar accum)) ; Current symbol is the same as the one at the head of the list
              (cons (append elem (cdar accum)) (cdr accum))) ; Append it to the existing numbers
             (else (cons elem accum)))) ; New symbol, make it the new head of the list
         '() lst))

(classify '((a 1) (a 2) (b 1) (b 2) (b 3) (b 4) (c 1) (c 2) (c 3))) ; => '((a 1 2) (b 1 2 3 4) (c 1 2 3))

It does use append, but only of 2-element lists as the head lists being appended onto, which is acceptably efficient in my book. It's when the list you're appending onto is long (And/or when it keeps growing in length at each step of the recursion) that it gets bad.


Another approach, using Racket's group-by to get a list of lists ('((a 1) (a 2)) ((b 1) ...))), and then turning those into the desired format:

(define (classify lst)
  (map (lambda (group) (cons (caar group) (map cadr group))) (group-by car lst))
3 Likes

Using cons to accumulate the result list will reverse the order.
It's common to have a pass that accumulates the result followed
by a pass that restores the order. This is more effecient than
using append (which often leads to quadratic time use).

Here are two alternatives:

#lang racket

; s:   symbol
; n:   number
; ns:  numbers       = (list number ...)
; g:   group         = (list symbol number ...)
; gs:  groups        = (list group ...)
; a:   association   = (list symbol number)
; as:  associations  = (list association ...)
; ass: associationss = (list associations ...)

(define (association-symbol a) (first  a))
(define (association-number a) (second a))

; classify: associations -> groups
(define (classify as) 
  ; hash table from symbols to numbers in reverse order
  (define ht (make-hash))
  ; add n to the group tagged named s
  (define (add s n) (hash-update! ht s (λ (ns) (cons n ns)) '()))
  ; enter all elements in the hash table
  (for ([a as])
    (add (association-symbol a)
         (association-number a)))
  ; extract groups
  (for/list ([(s ns) (in-hash ht)])
    (cons s (reverse ns))))

(define (classify2 as)
  (define ass (group-by first as))
  (for/list ([as ass])
    (define s  (association-symbol (first as)))
    (define ns (map association-number as))
    (cons s ns)))


(define as (list '(a 1) '(a 2) '(b 1) '(b 2) '(b 3) '(b 4) '(c 1) '(c 2) '(c 3)))

(classify  as)
(classify2 as)

1 Like

The hash table solution does not necessarily preserve the order of groups. One possibility to preserve the order is to record the minimal index of items in each group, and then print groups sorted by this minimal index.

1 Like

Thanks very much, these alternatives are much better than my naive solution!


Inspired by @shawnw , I also made a version with foldr :)

(let ([lst (list '(a 1) '(a 2) '(b 1) '(b 2) '(b 3) '(b 4) '(c 1) '(c 2) '(c 3))])
  (foldr (lambda (v l)
           (if (empty? l)
               (list v)
               (let ([label (car v)]
                     [number (cadr v)]
                     [prev-group (car l)]
                     [prev-group-label (caar l)])
                 (cond
                   [(eq? label prev-group-label)
                    (cons (cons label (cons number (cdr prev-group))) (cdr l))]
                   [else
                    (cons (list label number) l)]))))
         '()
         lst))
1 Like

Here's my solution, in which I tried to do "two" passes: one going down the list to order the items, and, one coming back up to accumulate the items. It also does not use append or reverse (explicitly).

The weird use of and was just to accommodate the empty list on input.

Probably not something I'd ever really use, because it's too verbose and it would not accommodate an input that's not in order of its heads; but it is fun(ctional).

#lang racket

;; lst :- (list (list head item) ..1)
(define (classify lst)
  (define (descend descent-list [descent-items '()] [descent-heads'()])
    (cond [(empty? descent-list)
           (values '(()) descent-items descent-heads (car descent-heads))]
          [else
           (define-values (ascent-list ascent-items ascent-heads label)
             (descend (cdr  descent-list)
                      (cons (cadar descent-list) descent-items)
                      (cons (caar  descent-list) descent-heads)))
           (define ascend
             (if (equal? label (car ascent-heads))
                 (cons  (cons (car ascent-items)              (car ascent-list)) (cdr ascent-list))
                 (list* (list (car ascent-items)) (cons label (car ascent-list)) (cdr ascent-list))))
           (values ascend (cdr ascent-items) (cdr ascent-heads) (car ascent-heads))]))
  (match/values (descend lst)
    [(accu items heads label)
     (cons (cons label (car accu)) (cdr accu))]))

(classify (list '(a 1) '(a 2) '(b 1) '(b 2) '(b 3) '(b 4) '(c 1) '(c 2) '(c 3)))
1 Like

Here's another alternative that doesn't use reverse or append:

(define (classify input)
  (define (classify-one input current-label)
    (if (null? input)
        (values '() '())
        (match-let ([ (list label val) (car input) ])
          (if (eq? current-label label)
              (let-values ([ (result remaining) (classify-one (cdr input) current-label) ])
                (values (cons val result) remaining))
              (values '() input)))))
  
  (if (null? input)
      '()
      (let ([ label (caar input) ])
        (let-values ([ (one remaining) (classify-one input label) ])
          (cons (cons label one) (classify remaining))))))

(classify (list '(a 1) '(a 2) '(b 1) '(b 2) '(b 3) '(b 4) '(c 1) '(c 2) '(c 3)))

; => '((a 1 2) (b 1 2 3 4) (c 1 2 3))
2 Likes

Another nice and short one that uses indexes to form the groups:

(define (classify lst)
  (match-define (list heads items) (apply map list lst))
  (map (lambda (head)
         (cons head (map (lambda (idx) (list-ref items idx)) (indexes-of heads head))))
       (remove-duplicates heads)))
;; ---------------------------------------------------------------------------------------------------
;; Racket solution using for/fold (a foldl that allows the use of several accumulators, not just one)

(define (classify-4 l0)
  (define first-key (first (first l0)))

  (for/fold ([global '()]               #; (reverse global) ; is the Classifier between l0 & l
             [local  [list first-key]]  #; (reverse local)  ; classifies 1 key, right before l in l0
             [last-key-seen first-key]  ;; the key preceeding `l` in `l0` 
             #:result #;"of this loop is:" (reverse (plus local global)))
            ([key-value-pair (in-list l0)])
    (define current-key (first key-value-pair))
    (define current-val (second key-value-pair))
    
    (if (eq? last-key-seen current-key)
        (values global              (cons current-val local)       current-key)
        (values (plus local global) (list current-val current-key) current-key))))

(define (plus l-accu g-accu)
  (cons (reverse l-accu) g-accu))

I ran stress tests of all solutions presented here.

Observation 1 The solution presented first is the fastest one, across a spectrum of sizes of lists and uses of the classify function.

Observation 2 The above solution (I am using Racket-y style here) is the only one that beats the first solution by a factor of 2 or 3 across the spectrum.

[Full file available on request.]

I'm curious how you tested this. I was surprised at your results because I've found the style I showed, while possibly not aesthetically pleasing, to be pretty speedy typically.

I created a list that has 20,000 labels, with each label having a random number of values (varying from 1 to 1,000 values). I then ran my version 10 times, and your version 10 times. When I average the 5 best runs for each version, my average was 784 ms and yours was 1,388 ms.

It's quite possible I'm simply misunderstanding your post.

Here's the gist of the code:

It's important to run the file with only one classify function being tested at a time; hence, one is commented out.

[

six variants of classify in plain Racket
gist.github.com

](six variants of classify in plain Racket · GitHub)

The variants are in the above-mentioned gist. (A typed version of the file is also available.)

This file includes unit tests. To stress-test the variants from the command line, I ran the script
below with

$ ./xstress-plain size-of-list: 1000 number-of-runs: 200

with a whole bunch of pairs of numbers (shifting 0s from left to right and right to left, but also
varying numbers. The typed variants run somewhat faster than the untyped ones.

I ran them on a (mostly quiescent) M2 MacBook Air, 16GB, macOS 13.5.2, 8 cores (4+4),


#! /bin/sh
#| -*- racket -*-
exec racket -tm "$0" -- ${1+"$@"}
>#
#lang racket

;; stress test the plain `classify` version

(provide main)

;; -----------------------------------------------------------------------------
(require "classify-plain.rkt")
(require rackunit)

;; -----------------------------------------------------------------------------
(define (main _size-of-list size-of-list _number-of-uses number-of-uses)
(define n (string->number size-of-list))

(define (stress-me f msg)
(define-values [l0 expected0] [example-2 n])
(printf "---------------------------------\n")
(printf "~a\n" msg)
(collect-garbage) (collect-garbage) (collect-garbage)
(define result '())
(time (for ([k (string->number number-of-uses)]) (set! result (f l0))))
(check-equal? result expected0))

(stress-me classify-1 version-1)
(stress-me classify-2 version-2)
(stress-me classify-3 version-3)
(stress-me classify-4 version-4)
(stress-me classify-5 version-5)
(stress-me classify-6 version-6))

(define [example-2 n]
(define A (list '(a 1) '(a 2)))
(define B (list '(b 1) '(b 2) '(b 3) '(b 4)))
(define C (list '(c 1) '(c 2) '(c 3)))
(define input
(append
(apply append (make-list n A))
(apply append (make-list n B))
(apply append (make-list n C))))
(define expected
(list (cons 'a (apply append (make-list n (map second A))))
(cons 'b (apply append (make-list n (map second B))))
(cons 'c (apply append (make-list n (map second C))))))
(values input expected))

(module+ test
(main 'size-of-list: "2" 'number-of-uses "1"))