Custom sort with a comparator

Hi,

I want nothing fancy: a sort function (from a library), taking for arguments a list or a vector (of strings, in my case) and a comparator which I want to write myself according to my needs.

The comparators are described here. They take two arguments, and return comparison, which consists of three values: lesser, greater, and equivalent. It looks like what I need, but I can't find a sort function to feed my custom comparator to.

The ā€œcomparatorā€ that you mentioned is not a part of Racket. Itā€™s part of a user package called Rebellion, created by @notjack, and you need to additionally install the package to use it.

If you want to use Rebellion, the sort functionality that you are looking for is the sorting transducer.

If you want to use pure Racket, the standard sort function would be sort. It accepts a comparator as a second argument, although this comparator has a different signature than the one in Rebellion: it should return a boolean, indicating whether the first element is less than the second element.

Any idea why do these two approaches exist? The boolean comparator isn't just as universal, or is it?

That's a great question! Why don't you think about it a bit more? If someone gives you a boolean comparator, can you construct a three-way comparator? What about the other direction?

Totally agree. As a matter of fact, it mystifies me for years. Let's see what we have.

  1. Why boolean comparators? Because we can use standard boolean predicates for a comparator, which is enough in common cases.

  2. Can we construct a three-way comparator out of a boolean one? This is trivial, unless I misunderstand the question.

  3. Can we feed a three-way comparator to a function which requires a boolean one? Yes, we (can?), but here the black magic starts (for me).

A working example, Python. _path_compare() is an "old style", as they call it, three-way comparator. functools.cmp_to_key() is the black magic.

        lst = os.listdir(src)
        dirs = sorted(
            [Path(x) for x in lst if (src / x).is_dir()],
            key=functools.cmp_to_key(
                (lambda xp, yp: _path_compare(yp, xp))
                if _ARGS.reverse
                else _path_compare
            ),
        )

Why and how they banned the three-way comparators, beats me. The Python docs. So far I hoped to get away with my ignorance :blush: ...

More accurately, Boolean predicates are embedded in three-way comparators. The reason is in the typesā€”a three-way comparator has as its range a sum type with three nullary constructors (letā€™s use Haskell syntax for this)

data Ordering = LT | EQ | GT

while a Boolean predicate has as its range, well, a Boolean type, which is a sum type with two nullary constructors

data Bool = True | False

We can think of how to map back and forth between them. Suppose I have a three-way comparator, I can construct three Boolean predicates out of this (letā€™s identify them with their negated counterparts) with the maps

  1. [LT ā†¦ True, EQ ā†¦ False, GT ā†¦ False], this is ā€œless thanā€;
  2. [LT ā†¦ False, EQ ā†¦ True, GT ā†¦ False], this is ā€œequal toā€;
  3. [LT ā†¦ False, EQ ā†¦ False, GT ā†¦ True], this is ā€œgreater thanā€.

As can be seen, three-way comparators are more informative than Boolean predicates, which is why we have an embedding at hand, not an isomorphism. Correspondingly, we can also construct three-way comparators from Boolean predicates by inversion of the above maps, but this is more unfortunate because False will end up corresponding to two cases, and we donā€™t know which! What can that possibly mean? That means weā€™ve lost information (or proofs, as proof theorists will point out) by going back and forth.

Okay, type-theoretic nonsense aside, the reason sort doesnā€™t need a three-way comparator is indeed that it only needs to know about ā€œless thanā€ (or, unsurprisingly, ā€œgreater thanā€). It all depends on which interpretation you want.

2 Likes

The rationale for srfi 67 "Compare Procedures" has a section on 2-way vs 3-way comparisons.

https://srfi.schemers.org/srfi-67/srfi-67.html#node_sec_6

1 Like

I think it might help to see a worked-out example in Racket:

#lang typed/racket

(: 2to3 (āˆ€ [Ī±] (-> (-> Ī± Ī± Boolean)
                   (-> Ī± Ī± (U '< '= '>)))))
(define ((2to3 <) a b)
  (cond
    [(< a b)
     '<]
    [(< b a)
     '>]
    [else
     '=]))

(: 3to2 (āˆ€ [Ī±] (-> (-> Ī± Ī± (U '< '= '>))
                   (-> Ī± Ī± Boolean))))
(define ((3to2 cmp) a b)
  (eq? '< (cmp a b)))

Note that 3to2 in my example isn't the same as Python's functools.cmp_to_key(), from what I understand from its docs.

In Racket, we have various comparison functions, each of which defines a domain of values which it can compare and a notion of ordering: for example, string<?, string-ci<?, string>?, string-locale<?, and string-locale-ci<? (among others) all define different orders for sorting strings.

In contrast, Python values have a built-in notion of ordering according to the <, <=, ==, !=, >=, and > operators, which can be customized using the special __lt__(), __le__(), __eq__(), __ne__(), __gt__(), and __ge__() methods. For Python APIs that use a "key function", sorting is customized by returning a wrapper/replacement object where the built-in comparison operators implement the desired order, rather than, as in Racket, supplying a different function to use for sorting. I haven't looked at the implementation, but I would imagine functools.cmp_to_key() works by returning wrapper objects that use the provided three-way comparison function to implement the comparison methods. So 3to2 in my example would be like the implementation of __lt__() on the wrapper object, not like functools.cmp_to_key() as a whole.