RFC: hash table pattern matching

sorawee · February 8, 2023, 4:53am

Again, thanks to everyone.

I was surprised that the hash-table can match partial cases

It’s unclear what you mean by this. Are you talking about how one can use _ ... to do partial matching? Or are you talking about how if keys are all literals, then it will do partial matching? If the latter, that’s a bug.

I like (hash a b c d h …), where h is a hash like the initial hash. I don’t expect too long list of key-values, and in that case I expect to use the vertical alignment.

In my prototype, hash would not have grouping, but hash* would. This is a compromise that we seem to reach.

Does the macro detect (hash a b c d _ …). and avoid the creation of the rest part? It looks very difficult to remove the creation of the unused rest part later in the optimization.

In my prototype, #:rest _ would avoid the creation of the rest part.

Do you feel strongly about the syntax choice of _ ... over #:rest _? Right now I’m pretty convinced by @shhyou’s and @soegaard’s argument that (1) ... should be reserved for lists and (2) it’s awkward to write a match expander that generates .... But if you have a case for _ ..., I’d be interested to hear.

What about removing the _

My personal opinion is a strong no. But again, if you have a case for it, I’d be interested to hear.

What happens with (hash x 10 y 10)? Can it bind (hash ‘a 10)?

That depends on whether or not you allow duplicates, and whether the key positions are expressions or patterns.

In my prototype, key positions are expressions and duplicates are allowed, so x and y wouldn’t bind anything, and it needs to be already bound. If they are both bound to 'a, then the match would succeed, and fail otherwise.

What happens with weak hashes

Can you come up with a concrete example? But I’d say that without concurrency, matching a weak hash should work fine in my prototype.

@pschlie:

why not simply introduce some syntactic sugar

Yes, that’s exactly what I have right now in my prototype. hash* is the more general pattern (supporting the #:default clause), and hash is a syntactic sugar for hash*. hash* has grouping, but hash doesn’t.

@shhyou:

I think we are mostly in agreement. Though I want to note the following:

full match is just some additional size check

In my prototype, it turns out that full matching needs to be more complicated than that. Mainly because we allow duplicates (#:default clause also adds complication, but that’s not too bad).

And we allow duplicates because it improves efficiency significantly for partial matching. If we disallow duplicates, we need to detect them. Since keys can be expressions in general (even with Ryan’s proposal to use == explicitly), we need to check for duplicates at run-time. This turns out to be substantially slower than no check. See https://racket.discourse.group/t/rfc-hash-table-pattern-matching/1686/16 for more information.

what would be a practical use case of …

For this feature, I’m mostly inspired by object destructuring in JavaScript, which is used extensively to e.g. manipulate React states.

let nextState = (state) => {
  let {timestamp, ...xs} = state;  
  ......
};

I wonder whether there is a need to differentiate between mutable/immutable hashes, hasheq, hasheqv, hash (equal) and hashalw.

Yes, the prototype preserves key-comparison method and weak/strong/mutable/immutable/ephemeron-ness.

@gus-massa:

So it may be not so strange that hash produce slow code.

I can argue in the opposite direction that, if I know that a pattern can produce inefficient code, without being clear under what condition it produces inefficient code, I might avoid using it altogether out of fear of being inefficient.

match already has this problem. For example, I don’t know how efficient (list x xs ... y) is. Optimally it can be done in linear time, but does match do that? I don’t know, so I have been avoiding using it.

Not allowing key positions to be general patterns guarantees efficiency, allowing users to be able to use it without any concern.

@ryanc:

My current prototype should address your first reply.

@soegaard:

In analogy with list-rest I suggest introducing a hash-rest

It would be easy to do this.

However, I wrote below that hash* to hash is like struct* to struct and unlike list* to list, so perhaps adding list-rest might mislead people to think that there’s an analogy to list{,*,-rest}.

f an ellipsis-like notation is needed for hash, how about ht… instead of …?

I think that would be acceptable, though I personally would prefer #:rest, since everyone already knows what a keyword is. ... also usually can be used in a more liberal fashion (e.g. multiple ... in the same pattern), so that might raise a false hope that ht... can get used multiple times, etc, etc.

@benknoble:

what about dict that could match any gen:dict?

That would be a good thing to support, though I think it’s orthogonal to the hash table matching. The dict operations on a hash table in particular can be much slower than hash operations on a hash table. So unless that gets improved, hash table patterns would still be useful in practice.

@jbclements:

Right, we have hash* with grouping for that. There’s also hash for people who prefer no grouping, as a compromise.

@greghendershott:

it feels verbose needing to write #:default:

Yeah, #:default is kinda long. My inspiration is from ~optional syntax-parse pattern, which has #:defaults.

If there is another sensible keyword, I would be happy to change it. But I don’t think #:else or #:or match the meaning well. I want to avoid not having any keyword at all, since this seems like an “extension” to me, and I want to accommodate for future extensions too. Keywords seem like the appropriate way to do that.

@ryanc:

The mixture of expressions and patterns is confusing/suprising

A couple of points:

We already have a mixture of expressions and patterns in other patterns, like (app <expr> <pat>) and (? <expr> <pat>). So I personally don’t find expressions in key positions to be surprising.
It would be good to make distinction between literals and expressions if that accomplishes something. But I don’t think it does much (besides enabling a very minor optimization that can push some computation on literals to compile-time — which doesn’t change asymptotic complexity). All we want for efficient matching is that we can hash-ref on a concrete key value, and expressions suffice for that.
On the other hand, key pattern with these restrictions makes is not ergonomic to use, and even makes the documentation more complicated and difficult to understand. Compare:

<pat> ::= (hash [<key-pat> <pat>] ...)
<key-pat> ::= <literal>
            | '<literal>
            | (== <expr>)

with:

<pat> ::= (hash [<expr> <pat>] ...)

Let alone the fact that <key-pat> would be documented pages down below, so users need to scroll up and down to understand the pattern.

Minor thing, but it’s not possible right now to document == properly, since == is a match expander which should be recognized by binding, unlike other match patterns. So https://github.com/racket/racket/pull/2145 probably is needed to be resolved first.

@notjack:

I strongly believe the default hash pattern should use partial matching

Per the discussion above, I made a compromise by implementing both hash and hash*. hash to hash* is like struct to struct*, in that hash/struct requires full matching, and hash*/struct* allows partial matching. So if you want partial matching, you can use hash*.

In general, most Racket patterns require full matching, so I think it’s good to have at least one pattern for hash table matching that has full matching by default. hash seems like a good candidate.

This will break whenever

The hash pattern is a newly added pattern, so it couldn’t possibly break any existing code. Perhaps you meant to say people might misunderstand the meaning of the pattern? But that’s why we have documentation. We also are not trying to mislead people. The discussion so far seems to prefer hash and hash* precisely because they are consistent with other patterns. We are trying to reduce confusion!

FWIW, I am in agreement with you that partial matching seems more useful than full matching, so I definitely would use hash* more than hash. But I think it’s good to have both, with different matching mode, regardless.

gus-massa · February 8, 2023, 2:02pm

I was surprised that the current implementation that is live in Racket 8.7 can match partial hashes.

Let's look at list-no-order instead:

#lang racket #;8.7
(match (list -1 4)
  [(list-no-order (? even?) (? positive?)) 'yes]
  [_ 'no])
; ==> 'no


(match (list 7 -2)
  [(list-no-order (? even?) (? positive?)) 'yes]
  [_ 'no])
; ==> 'yes

(match (list 2 2 2)
  [(list-no-order (? even?) (? positive?)) 'yes]
  [_ 'no])
; ==> 'no

I expect the same results for hash. Something like

#lang racket #;8.9
(match (hash -1 #t 4 #t)
  [(hash (? even?) #t (? positive?) #t) 'yes]
  [_ 'no])
; ==> I expect 'no

I think list-no-order and hash are equivalent, and I expect a total disjoint match for both. I think it's very difficult to implement that efficiently (probably np-hard). I'm not sure if it's the best election for the default implementation, but it's what my fingers feels is correct.

There are four options (or more). IIUC in your proposal

no grouping / total :-> hash
no grouping / partial :-> ???
grouping / total :-> ???
grouping / partial :-> hash*

and there is also hash-rest that I think is not defined in your proposal. Are these correct?

About partial matching for processing xlm, I understand it's an important case. I have been using Racket to edit a few xml files, and the problem is not only that it may change in the future but also that I already don't care about a half of the items and I'd prefer to write only the relevant fields in match.

sorawee · February 8, 2023, 6:57pm

I think list-no-order and hash are equivalent, and I expect a total disjoint match for both. I think it’s very difficult to implement that efficiently (probably np-hard). I’m not sure if it’s the best election for the default implementation, but it’s what my fingers feels is correct.

Indeed, list-no-order and hash-table (the current pattern in 8.7/8.8) are essentially equivalent (if you don’t count the bug that makes hash-table allow partial matching).

But now, we have an opportunity to design a totally new pattern with new semantics. My question is, should list-no-order and the new pattern still be equivalent?

And this bring us back again to the point that I wrote earlier:

I can argue in the opposite direction that, if I know that a pattern can produce inefficient code, without being clear under what condition it produces inefficient code, I might avoid using it altogether out of fear of being inefficient.

match already has this problem. For example, I don’t know how efficient (list x xs … y) is. Optimally it can be done in linear time, but does match do that? I don’t know, so I have been avoiding using it.

Not allowing key positions to be general patterns guarantees efficiency, allowing users to be able to use it without any concern.

And just to repeat what @shhyou said, if you do want the list-no-order semantics, you can still have it by writing (app hash->list (list-no-order XYZ ...)). That “the name is more scary” is a good thing! It warns users that its usage is inefficient, so it must be used with caution. On the other hand, they can use the hash pattern without any worry. It is guaranteed to be efficient.

There are four options (or more). IIUC in your proposal

no grouping / partial = (hash ...... #:partial) or (hash ...... #:rest _)
grouping / total = (hash* ....... #:full)

Unless you really want to have four patterns, which I think would be cluttering.

For hash-rest, as I wrote above, I think it’s better to not have it because hash{,*,-rest} is not like list{,*,-rest}, and having them could make it appear that they are similar than they should. Though again, we can easily add it if that’s the consensus.

Note that you can already do (hash{,*} ...... #:rest .....) in my proposal.

notjack · February 9, 2023, 2:01am

Let's not name the second form hash*. The * suffix is a confusing and unreadable naming pattern. I'd rather have (hash ...) and (partial-hash ...) patterns. (Well, what I'd actually like are (hash ...) and (complete-hash ...) patterns, so that partial matching is the default, but that's separate from the naming.)

AlexKnauth · February 9, 2023, 3:59am

I don't think we need a complete-hash pattern at all.

I'd rather have both hash: no grouping / partial / no rest, and hash*: grouping / partial / optional rest

With the hash* able to specify total through an (empty-hash) rest pattern or a keyword option or something.

notjack · February 9, 2023, 6:31am

Name the latter something other than hash*.

AlexKnauth · February 9, 2023, 1:19pm

Why can't it be called hash*?

I don't want to use a non-grouping hash pattern like (hash k v ... ...). I like grouping, with parens or brackets, like (hash* [k v] ...). I want the name for the grouping version to be within a couple characters of the convenience of hash because it's the one I expect to use everywhere.

If (hash k v ... ...) has to be that way because of symmetry with the expression version, I want hash* to have the same default semantics, but with the grouping and optional rest-pattern support. They can both have "partial" or "allow extra" semantics by default.

ryanc · February 9, 2023, 1:24pm

I disagree. I always resented the unnecessary characters when I used the hash-table pattern form. And the fundamental difference between the proposed hash and hash* is key-value grouping, not completeness. The * naming pattern is part of the Racket idiom; since Racket users have already paid the one-time confusion fee, the * name gives us a shorter name and a kind of consistency: hash looks like the constructor syntax, hash* is "similar but a little bit different (grouped)".

sorawee · February 9, 2023, 4:13pm

I mostly agree with @ryanc.

The * naming pattern is part of the Racket idiom; since Racket users have already paid the one-time confusion fee, the * name gives us a shorter name and a kind of consistency

I will argue further that the consistency covers full/partial-ness as well. The struct pattern requires every field listed, with the constructor syntax, while the struct* pattern allows some fields to be omitted, with the grouping syntax. So “completeness” does play a part in the consistency: hash to hash* is like struct to struct*.

But even without considering struct/struct* patterns, if we take "the hash pattern should mirror the hash constructor" (which is what @ryanc proposed a while back) as a design principle, it makes sense that the hash pattern will require full matching. When you use the constructor, you must list all contents. In the same vein, the hash pattern would require all keys to be listed.

That leaves us with hash*, which does things differently from hash. As @shhyou said, it’s much easier to generate code with grouping. As @notjack said, it’s much easier to work with JSON with partial matching. So that’s what hash* has by default.

This goes without saying: just like @jbclements, I respect y’all’s opinions enormously. But your opinions are not all compatible. What I’m trying to do here is to come up with the best compromise.

gus-massa · February 9, 2023, 6:36pm

So, no arbitrary predicates for hash and hash*. What about ==? Is something like this popular?

#lang racket
(define name 'my-long-name)

(match (list 10 'hello 'my-long-name)
  [(list-no-order (== name) 10 x) x])

I expect some difficult to to spell words and other stuff to be defined once. Also, I had some code somewhat like

#lang racket

(define (find x field)
  (match x
    [(list (== field) 10 x)
     (do-something x)]))

(find (read) 'color)
(find (read) 'size)

(My version used car and eq? instead of match.)

sorawee · February 9, 2023, 8:05pm

You would just write field. That’s what I mean by “key positions as expressions”.

#lang racket

(define (find x field)
  (match x
    [(hash* [field val])
     (do-something val)]))

(find (read) 'color)
(find (read) 'size)

gus-massa · February 10, 2023, 2:15pm

I think it's confusing, and I prefer

(define (find x field)
  (match x
    [(hash* [(quote field) val])
     (do-something val)]))

or

(define (find x field)
  (match x
    [(hash* [(== field) val])
     (do-something val)]))

The problem is that the docs say that quote use equal?, but I expect to use automatically the correct comparison, like eq? or eqv?. (What about future extension to other dictionaries, for example to bound-identifier-mapping?)

The other option is == that uses equal? by default but it can use other comparisons. But it is a macro, so I'm not sure it is possible.

sorawee · February 10, 2023, 5:48pm

Ironically, I think what you just mentioned supports my argument for expressions in key positions. If we have patterns in key positions, the key-comparison mode is probably going to be all wrong.

Consider:

(define x (list 42))
(define y (list 42))
(match (hasheq x 1)
  [(hash* [(== y) v]) v]
  [_ 'no-match])

What should this program evaluate to? If I read the doc of ==, I might think that the answer is 1. But a matching that actually returns 1 would be inefficient because now it needs to search the entire hash table for a key that is equal? to y.

In my proposal, guaranteeing efficient lookup is a design principle. I am matching a hasheq, so I want the key-comparison mode to be eq?, so that I can quickly hash-ref on y. In my proposal:

(define x (list 42))
(define y (list 42))
(match (hasheq x 1)
  [(hash* [y v]) v]
  [_ 'no-match])

would result in 'no-match

Indeed, the current hash-table pattern is similarly “wrong”:

(define x (list 42))
(define y (list 42))
(match (hasheq x 1)
  [(hash-table [(== y) v] [_ _]) v]
  [_ 'no-match])

evaluates to 1, instead of 'no-match.

To reiterate, I think using == and literal patterns is not a good idea. They already have their semantics, which are to use equal? for comparison (by default). But what we want for efficient matching on a hash table is to hash-ref on keys quickly, mandating that we use whatever key-comparison mode that the matching hash table has. Expressions in key positions allow us to do this.

gus-massa · February 10, 2023, 6:50pm

Completely 100% agree that

#lang racket 
(define x (list 42))
(define y (list 42))
(match (hasheq x 1)
  [(hash* [(whatever y) v]) v]
  [_ 'no-match])

should use eq?. I'm just wondering what is better:

reuse (quote y) and change the docs
reuse (== y) and change the docs
just use y
use a new special magical identifier

sorawee · February 11, 2023, 4:09am

To be clear, I totally understand why (hash* [<expr> <pattern>]) could have caused confusion. But when I weigh the pros/cons of alternative approaches, I feel other approaches are significantly more confusing/convoluted.

For the four possibilities above, I think:

(1) and (2) would be convoluted and confusing. Elsewhere, == and literals match using equal?. But under hash*, == and literals could have a different behavior… It seems difficult to explain. (hash* [<expr> <pattern>]) is at least easy to explain. Also note that (1) alone by itself is very limiting. It wouldn't be able to match a dynamically computed value.

(4) seems not ergonomic. If I understand correctly, you mean something like:

(match ..........
  [(hash* [(key k1) v1] [(key k2) v2] [(key k3) v3]) .........])

But if the key position must only be (key <expr>) anyway, then why not just write <expr>? Unless it’s possible to write something else besides (key <expr>), but I don’t know what that means.

So to me, (3) still seems like the best choice. I think it would require users to get used to the key position being expressions, but after that, it would be natural: you write key expressions to match the corresponding values.

Another thing to consider is how subtle the failure is. For (3), you probably will get an unbound id if you expect the key position to be a pattern that could bind. That is, it fails fast. For (1) and (2), you would get the wrong key-comparison mode, which seems much more difficult to debug. It could give the appearance that things work fine, but they actually don't.

LiberalArtist · February 11, 2023, 7:07am

My strongest opinion is that I want both "full" and "partial" matching to be available and similarly ergonomic (in contrast to (hash-table [_ _] ...)). As @notjack has argued, JSON APIs are an important use case for partial matching. On the other hand, for data I control, I very often want exhaustive matching.

sorawee:

@ryanc:

The mixture of expressions and patterns is confusing/suprising

A couple of points:

We already have a mixture of expressions and patterns in other patterns, like (app <expr> <pat>) and (? <expr> <pat>). So I personally don’t find expressions in key positions to be surprising.

It would be good to make distinction between literals and expressions if that accomplishes something. But I don’t think it does much (besides enabling a very minor optimization that can push some computation on literals to compile-time — which doesn’t change asymptotic complexity). All we want for efficient matching is that we can hash-ref on a concrete key value, and expressions suffice for that.

On the other hand, key pattern with these restrictions makes is not ergonomic to use, and even makes the documentation more complicated and difficult to understand. Compare:
<pat> ::= (hash [<key-pat> <pat>] ...)
<key-pat> ::= <literal>
            | '<literal>
            | (== <expr>)  
with:
<pat> ::= (hash [<expr> <pat>] ...)
Let alone the fact that <key-pat> would be documented pages down below, so users need to scroll up and down to understand the pattern.

Minor thing, but it’s not possible right now to document == properly, since == is a match expander which should be recognized by binding, unlike other match patterns. So https://github.com/racket/racket/pull/2145 probably is needed to be resolved first.

My second-strongest opinion is that keys should be some form of patterns, not expressions. The only "primitive" patterns that have expressions as sub-forms are app and ? (== is documented as being a library extension), which exist specifically for the purpose of escaping from the language of patterns to the language of expressions. (The paper introduces ? before app, but it seems like (? <expr>) could conceptually desuggar to (app <expr> (not #f)).)

Points 3 and 4 to me sound like arguments for better linking in the documentation, if anything.

On point 2, do we necessarily want to commit to a bunch of calls to hash-ref? I could imagine wanting to make use of hash-keys-subset?, for example, with a quoted hashtable for comparison generated at compile time.

sorawee · February 11, 2023, 8:51am

On hash-keys-subset?: we do want to call hash-ref a bunch of times.

hash-keys-subset? consumes two hash tables, so if we are to use it, we would need to construct a testing hash table, which is not cheap for both time and space. In contrast, the hash-ref approach doesn’t allocate anything, and has an opportunity to short-circuit so that there’s no need to hash-ref on later keys
It would be difficult to implement the #:default support with hash-keys-subset. For each clause with #:default, we have to determine if the default expression will be used. But it’s also not necessary that the key needs to exist.

Regarding the key position, so far there seems to be three possibilities.

Allow general patterns
Only allow limited patterns, namely == and literals
Only allow expressions

For both (1) and (2), there’s a problem about the key-comparison mode. See e.g. RFC: hash table pattern matching - #33 by sorawee for the discussion.

And specifically for (1), it’s also incredibly inefficient. See e.g. RFC: hash table pattern matching - #23 by sorawee for the discussion.

rocketnia · February 12, 2023, 1:49pm

Some thoughts:

Keys as expressions

Key positions in patterns make a lot more sense as expressions, primarily because of efficiency, but also as a matter of scoping. Keys are usually something you already have that you're using to look something up. When using them like that, they come from the surrounding scope.

That makes them similar to the pattern operators themselves. If all places where the binding flow switched back and forth were marked with ==, we wouldn't have (list a b c) patterns, we'd have ((== list) a b c) patterns, or maybe (== (list (== a) (== b) (== c))) patterns to account for the meanings of #%app and the parentheses.

What I just did there... Since == in a pattern position switches the binding flow to expression-style, == as an expression must switch the flow back to pattern-style, right?

So even if keys are expressions by default, there can still be room to have keys that are patterns for the sake of anyone willing to accept the performance implications. They just need to be written (== key-pattern).

I suppose just being able to write down a key pattern isn't the whole story. Prioritizing key expressions, and prioritizing efficient implementation in terms of hash-ref, leads us into supporting duplicate key lookups, which calls a few things into question:

Now every time someone uses a rest argument or asserts exhaustiveness (#:full), there has to be a little more bookkeeping than normal in terms of keeping track of the entries we have left to see. That's the kind of bookkeeping that could have caught duplicates.
Now the patterns can't be used as a makeshift set rewriting language, because any two entries in the pattern might actually refer to the same entry in the subject. There's no use expecting [string? _] [string? _] [string? _] to check that there are at least three string keys.

But I kinda think people can put up with those things, or find workarounds to regain that lost ground if needed (like continuing to use exisitng options, or adding a keyword to something).

Pattern-expression harmony and backwards compatibility

Since adding table keys is pretty much the only backwards-compatible move JSON-like data producers can make a lot of the time, and since that only succeeds at not breaking people's programs when people do partial matching, I think partial matching by default will be for the best, not just for JSON APIs but for general data munging.

I believe the principle behind how patterns coincide with expressions is that when they coincide, the pattern should match only values that are backwards-compatible in some way with results of the expression, as understood in their role as inhabitants of the expression's type, and using (match-equality-test) (usually equal?) as the notion of backwards compatibility for types where it's unclear.

For associative data structures, thanks to their role as one of the more embarrassingly effective abstractions and their ubiquity in data formats, I think this notion of backwards compatibility is clear enough.

People haven't mentioned dicts much yet. I briefly wondered just now if dicts would be less obligated to stay consistent with notions of equal? for their backwards compatibility, since dicts aren't generally equal? to other dicts even if they have the same behavior in regard to the dict methods.

However, I do think it's important to have a hash pattern, specifically (or maybe a map pattern once that Rhombus interface exists). That's because dicts overlap with lists, so a lot of JSON-like data munging is going to accidentally reinterpret lists as associative collections if it uses dict patterns.

Still, I think every hash pattern (hash-whatever ...) should be equivalent to whatever the corresponding (and (? hash?) (dict-whatever ...)) pattern would do. That's because hash utilities don't tend to be very picky about what kind of hash they receive, so there's still an overall sense of cross-compatibility between different kinds of hashes even if there are exceptions here and there.

A design for identifier punning and optional grouping

Hmm, technically the ~@ syntax is part of an ellipsis DSL and serves only the purpose of grouping s-expressions in extra parentheses. It could be handy as an optional grouping construct here, so that you only need to group things if you want to and don't have to switch to another form to do it.

I don't think I actually like (~@ key value) as a grouping syntax for hash entries, though. It doesn't look right, and I'd love to find something that supports identifier-punning patterns like that {timestamp, ...xs} JavaScript example, as well as optional defaults.

After some iteration on this, I think [key => value] seems to pan out, and it looks a lot better.

Once we add a default to the mix, we could allow both [key ~? default => value] and [key => value ~? default]. The [key ~? default => value] form would come in handy when the value pattern was very long and the default was very short, like when the default was (hash) and the value was another whole hash pattern with its own defaults. The [key => value ~? default] form would be good for the also common situation in which each layer of data had multiple cases to branch on and thus only one layer of data was ever being matched at any given time.

For the JavaScript-style identifier punning, an entry could be written with the => value omitted, making it [key] or [key ~? default]. In this case, the value pattern would be inferred to be a variable name derived from key. For there to be a clear name to infer, the key expression must be one of the following, all of which would infer the value pattern foo:

A string "foo", either on its own or quoted with ' ` #' or #`.
A symbol foo, quoted with ' ` #' or #`.
A keyword #:foo, quoted with ' ` #' or #`.

Rest arguments can look like [rest ...], thus permitting [(? hash-empty?) ...] for exhaustiveness checking. Or we could have the #:full keyword for exhaustiveness checking too. Actually, I think I prefer #:final as the option. I think it signals the right "are you sure this hash won't be extended with more keys someday?" risk assessment. Only one rest entry or exhaustiveness check would be allowed in a pattern.

As per one of my goals here, all these entry syntaxes could be written like key => value without the grouping braces, like key => value. This enables some very concise inline hash patterns like (=>hash 'a => apples 'b => berries) and (=>hash 'a 'b 'c). The parse is still unambiguous since neither [=>] nor [~?] is in the quoted form that's required for a [key] entry.

The pattern I've described could easily work as an expression too. Just as in JavaScript, an expression like (=>hashalw 'a 'b 'c) would use the same kind of identifier punning to bundle up a set of local variables for easy debug printing or to facilitate a context-passing style.

A couple of things would be different in the expression form: The ~? default option wouldn't make sense and would not be supported, and multiple rest args would make sense here as a way to splice multiple hashes together. To control the splicing, it could have support for a #:combine/key argument like hash-union does.

I think that's pretty much all I would have wanted from this design. A major goal here was to make some of these division points less necessary. Some of that's fixable by tacking on keyword arguments until all the features are there, but I don't think that works a solution for grouping or identifier punning, since some notations like those are too concise and seamless to interrupt with a bunch of configuration options.

AlexKnauth · February 12, 2023, 3:30pm

I don’t want => arrows in the middle of key-value pairs. It’s unnecessary, it takes up space, and more space the more entries there are.

I also don’t want … ellipses involved in the syntax of the rest patterns. As an example with lists: (list 0 (list 1 2) …) and (list-rest 0 (list 1 2)) being different demonstrates that even on lists, … ellipses are not for rest patterns.

sorawee · February 12, 2023, 3:42pm

I forgot to mention that I experimented with both #:rest (? hash-empty?) and the current #:full‘s strategy. The former can be a tiny bit faster on a smaller matching hash table, but is significantly slower on a larger matching hash table. That’s why I stick with the current #:full strategy for now. We could have both and dispatch on an appropriate strategy based on the hash table size, but that’s an optimization that can be done later.

On keys as expressions: I don't think (== pat) to make "room to have keys that are patterns" would work nicely. If I have:

(define == add1)
(match (hash 1 2 2 3)
  [(hash* [(== 1) v]) v])

Is that matching against a pattern 1 or matching against a key value 2?

What could work, I think, is something like:

(match (hash 1 2 2 3)
  [(hash* [#:key-pattern (? odd? search-for-me) v])

which also demonstrates that keywords in a group are more versatile for future extension.

I’m not sure if I understand your proposal about the “JavaScript-style identifier-pruning”.

One fundamental difference between JavaScript and Racket is that a key must always be a string in JavaScript objects, but a key doesn’t need to be a string/symbol in Racket hash tables. The restriction on keys enables JavaScript to provide a nifty object literal syntax that puns identifiers as strings (or symbols in Racket). For example, {timestamp: 123} in JavaScript would be equivalent to (hash 'timestamp 123) in Racket, where ' is needed for the Racket counterpart. The same applies to "patterns". In JavaScript, {timestamp, ...xs} is a syntactic sugar for {timestamp: timestamp, ...xs}. The intention is that this makes it very convenient to find a value corresponding to the key “timestamp” and then bind it to a variable timestamp.

I don’t know what the behavior of (hash* [timestamp]) in your proposal is. It would be weird to be a syntactic sugar for (hash* ['timestamp timestamp]) (I’m sticking with my proposed syntax for now), since Racket doesn’t have this identifier/symbol pun anywhere else. It would make sense to be a syntactic sugar of (hash* [timestamp timestamp]), but I’m not sure how that is going to be useful, since it requires that timestamp already be bound outside match. Moreover, inside the branch, timestamp would get shadowed by the timestamp id pattern, which seems confusing to me.

I experimented with the idea of "JavaScript-style identifier-pruning" in my entry for the 2021 Syntax Parse Bee. But for that one, it's a separate form, so it doesn't need to stay consistent with other parts in the match ecosystem.

Topic		Replies	Views
Hash-table name? General	2	283	February 13, 2023
`match` for parsing into hash Questions & Answers	2	46	January 30, 2025
How to turn a list into a hash table? Questions & Answers question	2	583	October 31, 2022
Trying to learn macros, have question on code snippet Questions & Answers	4	296	January 4, 2022
Behavioural differences in `match` for splicing/single pattern variables Questions & Answers	10	133	February 27, 2025

RFC: hash table pattern matching

Keys as expressions

Pattern-expression harmony and backwards compatibility

A design for identifier punning and optional grouping

Related topics