hash-table pattern for
match is problematic. It is buggy and inefficient. We would like to deprecate the
hash-table pattern and create a new pattern that does a better job.
A preliminary discussion is in match: fix hash-table pattern and its doc, add more optimizations by sorawee · Pull Request #4532 · racket/racket · GitHub. I will summarize viewpoints from the discussion in each section to show possible design spaces.
Everyone seems to agree on the name
hash. One downside of
hash is that the
hash constructor constructs only immutable hash tables, but the
hash pattern should be able to destruct mutable hash tables. However, there doesn't seem to be a good alternative name.
Should key and value be grouped together?
One reason why we should not group key and value together is to make the pattern resemble the constructor. The
hash function is used like this:
(hash key-1 val-1 key-2 val-2)
So perhaps the
hash pattern should do the same.
On the other hand, one reason why we should group key and value together is to make it clearer to readers. I personally really don't like reading.
(hash 'abc 'def' 'ghi 'jkl 'mno 'pqr 'stu 'vwu)
'mno a key or a value? To answer that I need to start parsing from
'abc. On the other hand, with grouping, I can immediately tell that
'mno is a key.
(hash ['abc 'def'] ['ghi 'jkl] ['mno 'pqr] ['stu 'vwu])
A counterargument is that one can use newline as a way to group already, which IIUC is the "Clojure style".
(hash 'abc 'def' 'ghi 'jkl 'mno 'pqr 'stu 'vwu)
A counter-counterargument is that it is unnecessarily verbose, and the "Racket style" tends to group things explicitly.
One more datapoint is that the current
hash-table pattern does group key and value together.
Currently, @samth advocates for no grouping, while I (@sorawee) advocates for grouping.
Full and partial matching
A full matching requires that there is no keys besides those listed in the pattern. A partial matching allows extra keys.
(match (hash 1 2 3 4) [(hash [1 a]) a] [_ 'none])
'none under the full matching semantics, and
2 under the partial matching semantics.
The question that I want to ask here is about what could be supported. There are three choices.
- Only support full matching.
- Only support partial matching.
- Support both.
IIUC, currently @samth advocates for (1) for the initial implementation, though he is open to further extension in the future. @notjack advocates for (2). @AlexKnauth and I advocate for (3).
Default matching behavior
If you choose "support both" in the previous section, congrats, you have one more thing to think about: what should be the default matching behavior?
- Partial matching by default. That is to say the above program would return
2. However, the pattern could support something like
(hash [1 a] #:full)to fully match.
- Full matching by default.
- Explicit mode is required. Or perhaps there should be two separate patterns
@AlexKnauth advocates for partial matching by default, but users can use the rest feature (see below) to do full matching.
(match (hash 1 2 3 4) [(hash [1 a] #:rest (empty-hash))]) ; full matching
I in fact considered this option before, but the fact that we need to support another pattern to recognize an empty hash doesn't seem ideal to me (technically
(? empty-hash?), but an explicit
empty-hashpattern could make matching much more efficient)
I prefer full matching by default, but users can use the rest feature to do partial matching
(match (hash 1 2 3 4) [(hash [1 a] #:rest _)]) ; partial matching
Note that this very same program would be a partial matching in @AlexKnauth's preference, too.
My another preference is partial matching by default, but if
#:fullis given, it would do the full matching.
The rest feature allows users to extract the residue from matching. For example:
(match (hash 1 2 3 4 5 6) [(hash [1 a] #:rest h) h])
(hash 3 4 5 6).
There are many possibilities for this feature:
Syntax: keyword vs ellipsis
Should the rest feature be expressed by
#:rest, as shown above, or ellipsis, as shown below?
(match (hash 1 2 3 4 5 6) [(hash [1 a] h ...) h])
One reason to not prefer ellipsis is that ellipsis-bound variable has always been a list in
match. But we might not necessarily want a list as a result (see next section).
Hash table vs list
What should the residue be? A hash table or a list?
hash-table pattern extracts the residue as a list (via ellipsis).
(match (hash 1 2 3 4 5 6) [(hash-table [1 _] h ...) h])
'((3 4) (5 6)).
A hash table extraction is much more efficient when the matching value is an immutable hash table, taking time proportional to the number of key-value pairs if keys are not arbitrary patterns (see next section). In contrast, list extraction takes linear time to the size of the hash table.
I also believe that a hash table result is the preferred format that can be used in future computation. In contrast, for list extraction to be useful, the list probably needs to be converted back to a hash table.
Currently, @samth seems to be against the rest feature in general for the initial implementation, but seems to prefer ellipsis/list extraction if the rest feature is to be implemented. @notjack seems to prefer ellipsis/hash extraction, though not against keyword/hash extraction. I prefer the keyword/hash extraction. @AlexKnauth seems to prefer a hash extraction.
What should the key position be? A couple of possibilities:
- The key position can be an arbitrary pattern. The
hash-tablepattern currently does this. The disadvantage is that it is extremely inefficient in general and doesn't seem to be useful. We can of course specialize cases where we know how to make matching efficient. But is it worth supporting the feature in the first place?
- The key position must be a literal. This is the specialized clause in the current
hash-tablepattern that can be matched efficiently.
- The key position can be an arbitrary expression. This is like literals, but it would also allow variable reference, etc, without making matching inefficient.
- Or maybe we could allow
id, literals, and
(== expr)but nothing else. The intention is that it could potentially be useful to extract one arbitrary key without any constraint out (via
hash-iterate-first). However, for the
idcase, the corresponding value pattern must also be an
_, so that there is no constraint from the value pattern.
I'm not aware of anyone's position on this except mine, which is to support arbitrary expressions, but not arbitrary patterns. Ideally, I want to support
id too, but that seems too complicated.
Should duplicate keys be allowed to successfully match? That is to say, should:
(hash [1 a] [1 b])
be equivalent to:
(hash [1 (and a b)])
Or should duplicate keys always fail matching? Or should it error?
If literals are given, should
match warn/error at compile-time?
One data point is that the
hash-table pattern always fail matching on duplicate keys, but doesn't cause an error (though there's a bug that could make matching successful).
I'm not aware of anyone's position on this, and I personally don't have a preference either. Supporting duplicate keys is probably the cleanest semantics, I think?