"Filter" Match-Expander: Better Way to Achieve This?

Hi, Racket Discourse.

I've been looking for a way to "filter" on certain patterns in lists in match-forms, somewhat like list-no-order, but more general in the sense that one can specify "indeterminate" arbitrarily distributed patterns as opposed to ones which you know the multiplicity of beforehand.

What I mean, is that I can't say something like (list-no-order (cons x 1) ... rest ...). From what I understand, list-no-order is already "slow", for what that term's worth, so it makes sense.

What I've come up with, is the following:

I call the expander select but filter might be a better term.

(require (for-syntax racket/match
                     syntax/parse))

;; this is the expander I am referring to
(define-match-expander select
  (lambda (stx)
      (syntax-parse stx
        [(select PAT)
         (with-syntax ([ooo #'(... ...)])
         #'(app (lambda (lst) (for/list ([elem lst] #:when (match elem [PAT #t] [_ #f])) elem))
                (list PAT ooo)))])))

;; this is just a test of this for a particular use-case I am looking at
(define-match-expander hash-chain
  (lambda (stx)
    (syntax-parse stx
      #:datum-literals (×)
      [(hash-chain KEY × VAL)
       #'(hash-table (KEY (select VAL)))]
      
      [(hash-chain KEY × REST ...)
       #'(hash-table (KEY (select (hash-chain REST ...))))]

      [(hash-chain KEY VAL)
       #'(hash-table (KEY VAL))]
      
      [(hash-chain KEY REST ...)
       #'(hash-table (KEY (hash-chain REST ...)))])))

(define-match-expander hash-chain-and
  (lambda (stx)
    (syntax-parse stx
      [(hash-chain-and [CHAIN ...] ...)
       #'(and (hash-chain CHAIN ...) ...)])))

(match #hash((tenants . (#hash((name . john) (title . dr)
                               (accounts . (#hash((name . work)
                                                  (owed . 20)))))
                   
                         #hash((name . pete) (title . sir)
                               (accounts . (#hash((name . home)
                                                  (owed . 80))
                                            #hash((name . work)
                                                  (owed . 21)))))

                         #hash((name . jeff) (title . mr)
                               (accounts . (#hash((name . home)
                                                  (owed . 100))
                                            #hash((name . work)
                                                  (owed . 42)))))
                   
                         #hash((name . john) (title . mr)
                               (accounts . (#hash((name . play)
                                                  (owed . 53))))))))
  [(hash-chain
    'tenants × (hash-chain-and
                ['title title]
                ['name (and name (not 'john))]
                ['accounts × (hash-chain-and
                              ['owed (? (lambda (x) (< x 100)) owed)]
                              ['name account-name])]))
   (map list title name account-name owed)])
;; => '((sir pete (home work) (80 21)) (mr jeff (work) (42)))

I want to know if there is a "better" way to write select. It is perfectly adequate (from what I can tell), but I am curious.

Edit: prettified the example code a bit.

As an aside, the definition of select doesn't exclude empty matches as it is, which is probably not what you want in general. So, a modification that helps with this, is:

(define-match-expander select
  (lambda (stx)
      (syntax-parse stx
        [(select PAT)
         (with-syntax ([ooo #'(... ...)])
         #'(app (lambda (lst)
                  (define selected (for/list ([elem lst] #:when (match elem [PAT #t] [_ #f])) elem))
                  (and (not (empty? selected)) selected))
                (list PAT ooo)))])))

Consider a case like the above, but where you might have a different condition, such as

;; using previous definition
(match ...
  [(hash-chain
    'tenants × (hash-chain-and
                ['title      title]
                ['name       (and name (not 'john))]
                ['accounts × (hash-chain-and
                              ['owed (? (lambda (x) (>= x 100)) owed)]
                              ['name                            account-name])]))
   (map list title name account-name owed)])
;; => '((sir pete () ()) (mr jeff (home) (100)))

;; using "restricted" definition
(match ...
  [(hash-chain
    'tenants × (hash-chain-and
                ['title      title]
                ['name       (and name (not 'john))]
                ['accounts × (hash-chain-and
                              ['owed (? (lambda (x) (>= x 100)) owed)]
                              ['name                            account-name])]))
   (map list title name account-name owed)])
;; => '((mr jeff (home) (100)))

Edit: even better would be to just change ([ooo #'(... ...)]) to ([ooo #'(... ..1)]), but in any case, you get the idea.

There is the little problem that the procedure in (app (lambda (lst) …) …) assumes lst is a sequence; perhaps a guard like (and sequence? (app …)) would be appropriate. If you really mean to assume it's a list [you only talked about lists, not arbitrary sequences] then (and list? (app …)) and use in-list in the for form.

1 Like

Hi, @benknoble. Yes, you are correct. I am assuming that lst is a list. The use-case stems from querying JSON objects, so the Racket-ization of them results only in list in and hasheq structures. But, I can't see why I might not use something like this more generally. Thank you for the insightful comment.

Weirdly enough, I've--ever since I started programming in Racket--tended to avoid the constructs like in-???. No idea what that's about, but maybe because I've only recently started to appreciate these transformations as something to promote as opposed to something to be "afraid" of.

Makes sense. I suggested the (and list? guard especially to avoid errors when trying to match, say, #f.

I would say the in- specializations are a case of "what's the interface"? If I only ever expect a concrete datatype, using the specialization is good for performance. If the interface is more broad (say, sequence?), or if I'm too lazy to type it out… well, it's omitted.