Purely Functional Data Structures anyone?

dominik.pantucek · August 21, 2022, 10:59am

While working on my own non-TR implementation of PFDS RB Tree following[1] and heavily consulting the pfds implementation in the Racket package catalog[2][3][4], when trying to implement key deletion, I found an issue with the implementation by Hari Prashanth K R (krhari). Apparently he is no longer involved with Racket and apparently Asumu Takikawa only migrated the package to the new package system (huge thanks for that, btw).
A minimal erring example is:

#lang typed/racket

(require pfds/red-black-tree)

(random-seed 1)
(define numbers : (Listof Integer)
  (set->list
   (list->set
    (ann 
     (for/list ((i : Integer (in-range 0 100)))
       (ann (random 0 10000) Integer))
     (Listof Integer)))))
(displayln numbers)

(define t : (RedBlackTree Integer) (redblacktree <))
t
(define l (redblacktree->list t))
l

(define t1
  (for/fold ((t : (RedBlackTree Integer) t))
            ((n : Integer (in-list numbers)))
    (insert n t)
    #;((cast insert (Integer)
     (ann t (RedBlackTree Integer))
     (ann n Integer)))))

t1

(define t2
  (for/fold ((t : (RedBlackTree Integer) t1))
            ((n (in-list numbers)))
    (delete n t)))

t2

And obligatory output:

(2463 8863 6799 4239 2255 575 8991 527 2718 5022 7166 1518 4174 7710 5325 3773 6541 829 557 8973 4028 652 9324 5212 6732 1596 8476 5067 7787 5707 1051 2298 7322 4490 6329 8889 5593 8825 1736 9416 6504 8808 6424 4120 4824 4728 3031 2503 183 679 7319 6535 343 1591 39 2327 2230 5558 3142 8774 1174 8838 1398 1046 2054 725 4309 7957 2837 6293 9845 7013 3893 7124 9636 2164 5124 4003 3235 7699 5139 3795 179 659 3155 6451 6066 3698 4450 9921 3345 1504 224 5616 960 8880 4512 1680 5696)
#<RBTree>
'()
#<RBTree>
Invariaance violation 'sub1
  context...:
   /home/joe/.local/share/racket/8.5/pkgs/pfds/pfds/red-black-tree.rkt:206:0: bal-right
   /home/joe/.local/share/racket/8.5/pkgs/pfds/pfds/red-black-tree.rkt:148:0: del-help
   [repeats 2 more times]
   /home/joe/.local/share/racket/8.5/pkgs/pfds/pfds/red-black-tree.rkt:136:0: delete
   body of "/home/joe/Projects/Programming/algorithms/pfds-rb-test.rkt"

Of course I was looking for an error in my own implementation before I realized, that probably the original implementation of re-balancing after deletion may be flawed. The original paper by Chris Okasaki just states:

Finally, it is natural to wonder whether similar improvements are possible for
deleting an element from a red-black tree. Unfortunately, it appears that the answer
is no. The two most straightforward algorithms known to us are still the well-known
bottom-up and top-down algorithms, amply described elsewhere (Cormen et al.,
1990; Weiss, 1998).

And guess what, the referenced publication (Weiss, 1998) contains only some Java implementation with hints on how to make it "more functional" to put it at best.
Does anyone have more experience with PFDS in Racket and particularly the methods for re-balancing RB trees in functional setting?
What I really like about the current implementation in pfds/red-black-tree is the reuse of the pattern-matching re-balancing at least for parts of the code. Frankly I do not know at the moment whether a proper reuse is even possible in this case.

I've got some ideas how to explore this problem further - should be easy to pin-point the problematic case(s) with my 4 (yes, four) implementations - but I'd like to see any hints from someone independent as I have been looking into this probably for too long.

There are more issues with the implementation in pfds package - but it's more of a "nice to have" improvements in reality. The issue described here is the only severe one I found. And of course, the insertion goes straight from the original paper and is implemented correctly. Or at least it matches my implementation from scratch which seems to be working OK so far

[1] https://www.cambridge.org/core/services/aop-cambridge-core/content/view/62BC5EA75A2C95E3F6EE95AE3DADF0E5/S0956796899003494a.pdf/red-black-trees-in-a-functional-setting.pdf
[2] pfds
[3] 7 Red-Black Trees
[4] https://github.com/takikawa/tr-pfds/blob/master/pfds/red-black-tree.rkt

plragde · August 21, 2022, 4:22pm

The cleanest implementation I know of uses a nice framework based on split and join operations, which works not only for RB trees but for several other balanced binary search tree implementations. It's work by Blelloch, Ferizovic, and Sun (2016), but the idea is much older. I use it in my flânerie Functional Data Structures, which might interest you.

https://cs.uwaterloo.ca/~plragde/flaneries/FDS/Sets_and_Maps.html

simonls · August 21, 2022, 7:16pm

I am a bit unsure whether this topic is about persistent data structures in general, or red black trees in particular.

Are you using them as binary search trees?
So in a persistent use case that would mean you want a data structure that is ordered, has at least O(log n) lookup and you can keep around old versions of the tree with structural sharing.

I am not sure how red-black trees behave when you try to insert multiple identical/equivalent elements.
I guess this is an implementation detail / there are different choices:

you error, or do nothing if it is already present (the latter for sets)
you increment a count (multiset with identical elements)
you keep a list (multiset with equivalent elements)
this seems similar to some (chaining) hash table implementations

If the red-black tree allows multiple same elements and lookup isn't of primary concern, but instead random access, or fast iteration, then Persistent Vectors could be useful.

What are the operations that you really care about?

The wikipedia article on red black trees mentions this:

In 2008, Sedgewick introduced a simpler version of the red–black tree called the left-leaning red–black tree [23] by eliminating a previously unspecified degree of freedom in the implementation. The LLRB maintains an additional invariant that all red links must lean left except during inserts and deletes.

[Edit: one of the references there: Purely Functional Left-Leaning Red-Black Trees]
That seemed interesting, but I am not sure whether that makes a persistent version easier to implement.

The article also mentions this link http://coding-geek.com/how-does-a-hashmap-work-in-java/#JAVA_8_improvements which explains that the chaining uses small lists or for more elements converts those to red-black-trees. So for the multiset case with equivalent elements (needing to keep the individual ones) the chain could be another red-black-tree that uses another comparison function (comparing address identity?).

Another question is why trees, instead of persistent hash tables?
If it is because of order then this might be interesting: Simple, Deterministic Dictionaries

benknoble · August 21, 2022, 7:31pm

Matt Might has a nice article about the topic in Racket https://matt.might.net/articles/red-black-delete/ including a Functional Pearl paper https://matt.might.net/papers/germane2014deletion.pdf

plragde · August 22, 2022, 12:15am

Racket's immutable hash tables are, under the hood, purely-functional red-black trees.

Edit: that was true historically, but now they are hash array mapped tries.

simonls · August 22, 2022, 1:08am

I like this paper: Runtime and Compiler Support for HAMTs https://www.cs.utah.edu/plt/publications/dls21-tzf.pdf

I guess a better phrasing would have been why this particular tree.
Since reading that paper I like stencil vectors. The thing I dislike about red-black trees is the small branching factor leading to tall trees and wasting lots of memory on pointers to other nodes, but if there is nothing to be able to choose between more than 2 branches at a time (as seems to be the case with general binary search trees), then maybe there is no alternative.

gus-massa · August 22, 2022, 4:15pm

Somewhat related: One of the proposed features for Rhombus is Persistent Data Structures

simonls · August 24, 2022, 1:53am

I wrote some code to draw red black trees and different permutations of the insert order.
Note however that I "hacked" my local copy of pfds to give me access to internal functions.
If you want to use the code you either have to apply the "hack" or create your own version of make-rbtree, rbtree->lazytree and rbtree-insert the rest should work without modifications.

#lang racket

;; packages: raco pkg install --auto lazytree pfds threading pict-abbrevs
(require threading
         data/lazytree
         pict
         pict/tree-layout
         pict-abbrevs

         (except-in pfds/red-black-tree map)
         ;; I added this submod to get at the functions I need
         ;; the orig code doesn't give you access to the internal structure
         ;(module+ internals
         ;  (provide RBTree-tree get-left get-right RBNode-color RBNode-elem))
         (submod pfds/red-black-tree internals))

;; create and convert rbtree to lazy-rbtree (using lazytree as generic interface)
(define (make-rbtree lst)      (apply redblacktree < lst))
(define (rbtree-insert rb val) (insert val rb))
(define (rbtree->lazytree rb)
  (define-syntax-rule (def name l r)
    (define (name n) (if (null? n) n (list (l n) (r n)))))
  (def rbtree-children get-left    get-right)
  (def rbtree-data     RBNode-elem RBNode-color)
  (make-tree rbtree-children (RBTree-tree rb) #:with-data rbtree-data))

;; create picts for rbtree visualisation
(define (circled content color)
  (cc-superimpose
   (disk 30 #:color (symbol->string color)
         #:draw-border? #t
         #:border-color "white")
   (colorize (text (~a content)) "white")))
(define nil
  (tree-layout
   #f #f
   #:pict
   (cc-superimpose
    (filled-rectangle 30 20)
    (colorize (text "nil") "white"))))
(define (leaf content color)     (tree-layout nil nil #:pict (circled content color)))
(define (node content l r color) (tree-layout   l   r #:pict (circled content color)))

(define (visualize-lazy-rbtree lazy-rbtree)
  (define (node->pict . n)
    (match n
      [(list (list content color) l r) (node content l r color)]
      [(list (list content color))     (leaf content     color)]
      [(list (list))                   nil]))
  (binary-tidier (export-tree node->pict lazy-rbtree)))

;; generate numbers -> rbtree -> lazy-rbtree -> visualize
(define rbtree->pict (λ~> rbtree->lazytree visualize-lazy-rbtree))
(define (show x)     (display "sequence: ") (displayln x) x)

(define pos/any/c (-> (>=/c 1) any/c))
(define/contract numbers   pos/any/c (λ~>> (in-inclusive-range 1) stream->list))
(define/contract snumbers  pos/any/c (λ~> numbers shuffle))
(define/contract visualize pos/any/c (λ~> snumbers show make-rbtree rbtree->pict))

(define/contract (visualize/steps lst)
  (-> list? any/c)
  (define steps
    (for/fold ([tree (make-rbtree null)]
               [seq   null]
               [steps null]
               #:result (reverse steps))
              ([step (in-list lst)])
      (define new-tree (rbtree-insert tree step))
      (define new-seq  (cons step seq))

      (define str (string-join (map ~a (reverse new-seq)) " "))
      (define txt (inset (text (~a "insert sequence: " str)) 3))
      (define background (filled-rectangle (pict-width txt) (pict-height txt) #:color "white"))
      (define new-steps
        (cons (list (lc-superimpose background txt)
                    (rbtree->pict new-tree))
              steps))

      (values new-tree new-seq new-steps)))

  (define width (apply max (map pict-width (flatten steps))))
  (define aligned
    (for/list ([s (in-list steps)])
      (match-define (list txt tree) s)
      (define hi (inexact->exact (floor (max 0 (/ (- width (pict-width tree)) 2)))))
      (vl-append txt
                 (blank 0 10)
                 (inset tree hi 0))))

  (apply vc-append aligned))

(define/contract (visualize/permutations lst)
  (-> list? any/c)
  ;; avoiding in-permutations because it has inversed order
  (apply ht-append
         (add-between
          (for/list ([p (in-list (permutations lst))])
            (visualize/steps p))
          (blank 10 0))))

(define (saveit path p) (save-pict path p) p)
(module+ main
  (saveit "rbtree.png" (visualize 20))
  (saveit "rbtree-permutations.png" (visualize/permutations (numbers 3))))

And here is how it looks:

sequence: (10 19 1 16 17 7 12 11 9 20 8 18 14 15 2 4 5 3 6 13)

Maybe this could be useful for creating pretty documentation or exploring / explaining corner cases.

dominik.pantucek · August 26, 2022, 3:43pm

Thanks everyone for the replies - they are truly appreciated. I really like the split/join approach suggested by @plragde and all the digging @simonls has done gave me some insights I didn't think about before.
However - no matter what my intentions were when I started looking into this, the original question remains, although it should be rephrased:
Did anyone fiddle with the pfds/red-black-tree implementation and tried to fix it?
There are apparent errors where in append functions it explicitly creates some red nodes with red children - but fixing just those is not enough to fix the whole implementation.
I also tried similar approach - which somewhat works with respect to the no red children of red node rule, however it breaks the black path length rule after few deletes.
Maybe the question should be: Is it possible to use the original pattern-matching code from Chris Okasaki's paper to implement deletion?
I'll try to go through it once again in the following weeks and see whether I can find the root of the problem (probably including a clean implementation to double-check) and we'll see how deep is this rabbit hole.

benknoble · August 26, 2022, 5:42pm

Perhaps you missed it, but I believe MM's post/article on deletion are specifically addressing shortcomings w/ Okasaki's approach, which omitted deletion?

EmEf · August 26, 2022, 8:30pm

@camoy has a short article on improving red-black-trees (wip). He originally wrote it w. Racket but now it’s in Haskell.

Topic		Replies	Views
Writing a simple interpreter which can add numbers in tree form General	4	226	January 29, 2023
Racket v8.13 is now available Announcements release	3	720	May 25, 2024
Racket v8.16 draft release notes Internals release-management , release	18	394	March 2, 2025
Racket v8.16 is now available Announcements release	3	273	March 4, 2025
Difference Lists and Racket Show & Tell data-structures	3	138	October 6, 2024

Purely Functional Data Structures anyone?

Related topics