A bunch of cons's to build a list followed by a single reverse to get it in the correct order is typically going to be more efficient than a bunch of appends, yes; less list traversal and fewer allocations of new cons cells.
Here's a way that avoids reversals by using a right fold to traverse the list right to left instead of the typical left to right:
(define (classify lst)
(foldr (lambda (elem accum)
(cond
((null? accum) (list elem)) ; First call with the last element of the list
((eq? (car elem) (caar accum)) ; Current symbol is the same as the one at the head of the list
(cons (append elem (cdar accum)) (cdr accum))) ; Append it to the existing numbers
(else (cons elem accum)))) ; New symbol, make it the new head of the list
'() lst))
(classify '((a 1) (a 2) (b 1) (b 2) (b 3) (b 4) (c 1) (c 2) (c 3))) ; => '((a 1 2) (b 1 2 3 4) (c 1 2 3))
It does use append, but only of 2-element lists as the head lists being appended onto, which is acceptably efficient in my book. It's when the list you're appending onto is long (And/or when it keeps growing in length at each step of the recursion) that it gets bad.
Another approach, using Racket's group-by to get a list of lists ('((a 1) (a 2)) ((b 1) ...))), and then turning those into the desired format:
Using cons to accumulate the result list will reverse the order.
It's common to have a pass that accumulates the result followed
by a pass that restores the order. This is more effecient than
using append (which often leads to quadratic time use).
Here are two alternatives:
#lang racket
; s: symbol
; n: number
; ns: numbers = (list number ...)
; g: group = (list symbol number ...)
; gs: groups = (list group ...)
; a: association = (list symbol number)
; as: associations = (list association ...)
; ass: associationss = (list associations ...)
(define (association-symbol a) (first a))
(define (association-number a) (second a))
; classify: associations -> groups
(define (classify as)
; hash table from symbols to numbers in reverse order
(define ht (make-hash))
; add n to the group tagged named s
(define (add s n) (hash-update! ht s (λ (ns) (cons n ns)) '()))
; enter all elements in the hash table
(for ([a as])
(add (association-symbol a)
(association-number a)))
; extract groups
(for/list ([(s ns) (in-hash ht)])
(cons s (reverse ns))))
(define (classify2 as)
(define ass (group-by first as))
(for/list ([as ass])
(define s (association-symbol (first as)))
(define ns (map association-number as))
(cons s ns)))
(define as (list '(a 1) '(a 2) '(b 1) '(b 2) '(b 3) '(b 4) '(c 1) '(c 2) '(c 3)))
(classify as)
(classify2 as)
The hash table solution does not necessarily preserve the order of groups. One possibility to preserve the order is to record the minimal index of items in each group, and then print groups sorted by this minimal index.
Here's my solution, in which I tried to do "two" passes: one going down the list to order the items, and, one coming back up to accumulate the items. It also does not use append or reverse (explicitly).
The weird use of and was just to accommodate the empty list on input.
Probably not something I'd ever really use, because it's too verbose and it would not accommodate an input that's not in order of its heads; but it is fun(ctional).
;; ---------------------------------------------------------------------------------------------------
;; Racket solution using for/fold (a foldl that allows the use of several accumulators, not just one)
(define (classify-4 l0)
(define first-key (first (first l0)))
(for/fold ([global '()] #; (reverse global) ; is the Classifier between l0 & l
[local [list first-key]] #; (reverse local) ; classifies 1 key, right before l in l0
[last-key-seen first-key] ;; the key preceeding `l` in `l0`
#:result #;"of this loop is:" (reverse (plus local global)))
([key-value-pair (in-list l0)])
(define current-key (first key-value-pair))
(define current-val (second key-value-pair))
(if (eq? last-key-seen current-key)
(values global (cons current-val local) current-key)
(values (plus local global) (list current-val current-key) current-key))))
(define (plus l-accu g-accu)
(cons (reverse l-accu) g-accu))
I ran stress tests of all solutions presented here.
Observation 1 The solution presented first is the fastest one, across a spectrum of sizes of lists and uses of the classify function.
Observation 2 The above solution (I am using Racket-y style here) is the only one that beats the first solution by a factor of 2 or 3 across the spectrum.
I'm curious how you tested this. I was surprised at your results because I've found the style I showed, while possibly not aesthetically pleasing, to be pretty speedy typically.
I created a list that has 20,000 labels, with each label having a random number of values (varying from 1 to 1,000 values). I then ran my version 10 times, and your version 10 times. When I average the 5 best runs for each version, my average was 784 ms and yours was 1,388 ms.
It's quite possible I'm simply misunderstanding your post.
Here's the gist of the code:
It's important to run the file with only one classify function being tested at a time; hence, one is commented out.
with a whole bunch of pairs of numbers (shifting 0s from left to right and right to left, but also
varying numbers. The typed variants run somewhat faster than the untyped ones.
I ran them on a (mostly quiescent) M2 MacBook Air, 16GB, macOS 13.5.2, 8 cores (4+4),
#! /bin/sh
#| -*- racket -*-
exec racket -tm "$0" -- ${1+"$@"}
>#
#lang racket
;; stress test the plain `classify` version
(provide main)
;; -----------------------------------------------------------------------------
(require "classify-plain.rkt")
(require rackunit)
;; -----------------------------------------------------------------------------
(define (main _size-of-list size-of-list _number-of-uses number-of-uses)
(define n (string->number size-of-list))
(define (stress-me f msg)
(define-values [l0 expected0] [example-2 n])
(printf "---------------------------------\n")
(printf "~a\n" msg)
(collect-garbage) (collect-garbage) (collect-garbage)
(define result '())
(time (for ([k (string->number number-of-uses)]) (set! result (f l0))))
(check-equal? result expected0))
(stress-me classify-1 version-1)
(stress-me classify-2 version-2)
(stress-me classify-3 version-3)
(stress-me classify-4 version-4)
(stress-me classify-5 version-5)
(stress-me classify-6 version-6))
(define [example-2 n]
(define A (list '(a 1) '(a 2)))
(define B (list '(b 1) '(b 2) '(b 3) '(b 4)))
(define C (list '(c 1) '(c 2) '(c 3)))
(define input
(append
(apply append (make-list n A))
(apply append (make-list n B))
(apply append (make-list n C))))
(define expected
(list (cons 'a (apply append (make-list n (map second A))))
(cons 'b (apply append (make-list n (map second B))))
(cons 'c (apply append (make-list n (map second C))))))
(values input expected))
(module+ test
(main 'size-of-list: "2" 'number-of-uses "1"))