Using Hash-Tables as Multisets

Hi, Racket Discourse.

I have been thinking a lot about multisets recently, because I want to implement a particular "computer algebra" system which codifies arithmetic/algebra in terms of "closed-form numbers", to a degree, based on the work of William Bricken, et al. [0]


skip this part for the TL;DR

The basic idea is that you have three elementary "forms", denoted by:
(...) analog to (exp ...);
[...] analog to (log ...);
<...> analog to (neg ...).

The exponential and logarithmic forms are typically interpreted in base e, Euler's number, but they are in essence base-agnostic.

These forms act as containers which then have the usual arithmetical and algebraic rules:

(x) := #^x
[x] := log_#(x)
<x> := -x
where # is an arbitrary base.

Writing expressions next to each other is implied addition. Whole numbers can be written, e.g.,

0 := _ ;; (void)
1 := ()
2 := ()()
3 := ()()()
etc.

We may write the usual operations, like addition,

A+B := A B

and subtraction,

A-B := A<B>
where <<x>> := x

on to multiplication,

A*B := ([A][B])

and division,

A/B := ([A]<[B]>)

and exponentiation,

A^B := (([[A]][B]))

and so on.

There are a few axiomatic equations (3, if I recall correctly) that are used to establish the general properties of the system, and from which the rest of it can then be deduced.

It also introduces an interesting construction, which he simply calls J := [<()>] (that he refers to as the additive imaginary, although πi might be a more familiar characterization), analog to the imaginary number i (multiplicative imaginary).

This can be used to "negate" an expression A by way of (J [A]), lessening somewhat the primacy of the <...> form.

I am not a mathematician, so the full semantics of the above escapes me, but it has been on my mind quite often since I came upon it.


It is notable that the ordering of the content does not matter--only the multiplicity and containment relationship are significant--which makes for an interesting place to apply pattern matching.

Racket does not have "explicit" support for multisets, although there are libraries such as rebellion which do offer them as reified data structures.

The point of this whole digression, is to ask whether there is a particular reason not to use immutable hash-tables as multisets, where the keys themselves are immutable hash-tables, and their multiplicity is stored in the values of the keys?

One would have to differentiate between the different "types" of containers, but I reckon this can be done easily enough by using a special key in the hash-table which maps to itself, which indicates its "type", i.e. #hash(('EXP . 'EXP) ...) [1] would be an exponential form, or something like that, obviously barring multiple such "special keys" in any given form.

I have seen that there have been recent discussions around the semantics/syntax of matching on hash-tables, as in this thread, which is why I ask, because I don't know much more about them than how to use them, for the most part.


[0] I have his texts on hand, but I am not sure about what the etiquette is regarding sharing that kind of thing, and the general idea is expounded upon in the texts linked on his site.
Edit:
[1] Or perhaps a gensym key and type value, but you get the idea.

Some thoughts about this:

If I wish to destructure (deconstruct?) these forms using match, I have to consider how permissive I want the matching to be when defining the expanders. It is perhaps not obvious at first, but "reifying" the structure of the numbers in this manner makes it possible to reify zero, 0, as ([])=[()]=_, or any similarly nested form, which essentially means that you can always match a pattern, even if none is technically present, depending on how this is approached. [0]

If I look at the multiplicity aspect of the matching, this can perhaps be resolved by assigning to each form a special key, NIL := (gensym), which has a multiplicity value of +inf.0, which means that there are always an "infinite" number of these forms in any given form.

As mentioned, the more deeply nested NIL-equivalent forms are also candidates for these "ambiguous" matches. I haven't ever used app in a match-form before, but looking at it, I assume that it could be useful, if you have a predetermined algorithm for attempting to reduce forms to decide whether they are NIL-equivalent.

Another note on equivalent forms, is that numerals in general are also candidates for matching, because integers and rationals have definite representations as forms; either "flatly", i.e., n := () ..n, or more complexly, but again, perhaps something like app would be useful here. I haven't even considered how one would approach approximated numbers.

However, the above might be better suited to being expressed using macros/syntax, because I get the feeling I would need to "duplicate" patterns if I have to supply the particulars to the procedure used with app. I can't yet say.

Multiplicity in general should be "easy" to interface with, because it is easy to test whether a particular pattern describes a subset of the form being matched, although it might be necessary to take into account the "specificity" of forms when matching them. For example, I would consider it preferable to match a more deeply nested pattern first to see if it exists, before matching a shallower pattern, because this avoids the more deeply nested pattern being left out due to premature association.

I think it would be better to stick to the multiplicities being only whole numbers 0 through, for argument's sake, positive infinity. One could do negative numbers and all kinds of things, but I think this will unnecessarily duplicate concepts which are already expressed within the forms themselves. The key part is that the structure of the forms is finite and can be represented using counts of positive integers.


[0] I guess this could be said of any system where one could "match" zero, but nonetheless, it is an interesting consideration.