Opinions on `quasiquote`..`unquote` matching up when not lexically together?

I've been building a library, Punctaffy, that has various notations similar to quasiquote, quasisyntax, etc. However, as I consider using these notations in macros' generated code, I notice something a lot like variable capture.

Here's a simple example without using the library at all, just using quasiquote and unquote:

```racket
#lang racket/base

(define-syntax-rule (test subform)
  `(subform ,(+ 3 4)))

(writeln (test (+ 1 2)))   ; prints "((+ 1 2) 7)"
(writeln (test ,(+ 1 2)))  ; prints "(3 7)"
```

Essentially, an unquote appearing in the argument of test matches up with the quasiquote in test's expansion result.

There are two ways I like to think about this kind of syntax for Punctaffy's purposes:

  • Occurrences of quasiquote and unquote match up like brackets. The ,(+ 1 2) on the last line has an unbalanced unquote bracket, which would ideally be detected as an error.
  • The quasiquote form is like a variable binding form, and unquote is like a variable reference. The ,(+ 1 2) on the last line is therefore like a reference to a variable that has no local binding, and it would ideally be detected as an error. The fact that it's currently matching up with the binding introduced by the quasiquote in test's expansion is the kind of variable capture issue that Racket would usually avoid by giving the quasiquote form's binding location an extra macro-introduction scope that the unquote form's reference location doesn't have.
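The bracket view can be made concrete with a depth-counting check over plain data. The following is a minimal sketch, not Punctaffy's implementation. The helper name `unbalanced-unquote` is hypothetical and isn't part of Punctaffy or Racket. It treats `quasiquote` as an opening bracket and `unquote` or `unquote-splicing` as a closing one, and it returns the first escape that has no enclosing `quasiquote`. Because it only sees symbol names, it flags `,(+ 1 2)` when it looks at the argument alone, but after that argument has been substituted into `test`'s template, it finds nothing wrong. Scope information, as in the second view, is what would let a checker tell those two situations apart.

```racket
#lang racket/base

;; Hypothetical helper, for illustration only: walk a plain datum,
;; treating quasiquote as an opening bracket and unquote or
;; unquote-splicing as a closing one. Returns the first escape that has
;; no enclosing quasiquote, or #f if every escape is matched.
(define (unbalanced-unquote datum [depth 0])
  (cond
    ;; A two-element form headed by one of the bracket symbols.
    [(and (pair? datum)
          (memq (car datum) '(quasiquote unquote unquote-splicing))
          (pair? (cdr datum))
          (null? (cddr datum)))
     (let ([new-depth (if (eq? (car datum) 'quasiquote)
                          (add1 depth)
                          (sub1 depth))])
       (if (negative? new-depth)
           datum ; a closing bracket with no opening bracket
           (unbalanced-unquote (cadr datum) new-depth)))]
    ;; Any other pair: check the head, then the rest of the list.
    [(pair? datum)
     (or (unbalanced-unquote (car datum) depth)
         (unbalanced-unquote (cdr datum) depth))]
    [else #f]))

;; The argument from the last line of the example, on its own:
;; the unquote is unmatched, so this is the list (unquote (+ 1 2)).
(define argument-alone (unbalanced-unquote ',(+ 1 2)))

;; The same argument after substitution into test's template:
;; the template's quasiquote now encloses it, so this is #f.
(define after-substitution (unbalanced-unquote '`(,(+ 1 2) ,(+ 3 4))))
```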

I think I ran across this earlier on in Punctaffy's development but decided to imitate the quasiquote..unquote scenario at first so as to ease the analogy between Punctaffy's new notations and Racket's existing ones. As long as all usage sites avoid these "error" situations in the first place, the fact that they actually silently proceed with some behavior isn't so bad.

However, as I try to prepare Punctaffy to be a useful library with reasonable compile times and a variety of new quasiquote-shaped utilities, I find myself wanting to report this as an error more proactively rather than ignoring it. (In fact, I find myself wanting to track scopes on quasiquote and unquote for other reasons too, like maintaining additional variable bindings that are local to the region in between them.)

Is anyone attached strongly enough to the behavior of the example above that they'd consider it a mistake for me to diverge from it in Punctaffy? Any reason why?

I can imagine two reasons already:

  • Backwards compatibility. At least for notations in the base Racket distribution like quasiquote and quasisyntax, people might depend on this behavior already. If someone's defined a macro that expands into one of these, they've been able to count on this kind of variable-capture-like behavior without doing any explicit manipulation of scopes to achieve it. For this reason, I don't want to say Racket's notations should change, but it seems like Punctaffy has a chance to do something different.
  • Stylistic consistency. Notations like quasiquote and quasisyntax are far from the only Racket macros that use free-identifier=? to detect occurrences of literals. They're just some of the more insidious since the places capture can occur are arbitrarily deep in a term. If I go down a path where my macros stop detecting literals the usual free-identifier=? way and start treating them as local bindings that participate in the sets-of-scopes system, are Racket programmers going to find them surprising and inconsistent with the rest of the language?
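The difference between the literal-matching style and a scope-aware comparison can be observed from inside a macro. Here is a minimal sketch; `compare-to-unquote` is a hypothetical macro written only for this illustration, and it isn't how `quasiquote` or Punctaffy is implemented. It compares an identifier supplied by the caller with an `unquote` that the macro introduces itself. `free-identifier=?` reports that both refer to the same binding, while `bound-identifier=?` reports that they differ, because only the caller's identifier carries the macro-introduction scope during the transformer call.

```racket
#lang racket/base
(require (for-syntax racket/base))

;; Hypothetical macro, for illustration only. It compares the identifier
;; its caller passes in with an `unquote` that the macro itself introduces.
(define-syntax (compare-to-unquote stx)
  (syntax-case stx ()
    [(_ user-id)
     (let ([macro-id #'unquote])
       ;; free-identifier=?: do the two identifiers refer to the same binding?
       ;; bound-identifier=?: would a binding of one bind a reference by the
       ;; other, i.e. do they have the same scopes?
       #`(list #,(free-identifier=? #'user-id macro-id)
               #,(bound-identifier=? #'user-id macro-id)))]))

(writeln (compare-to-unquote unquote))  ; prints (#t #f)
```

A notation that matched its escapes the second way would treat the caller's `unquote` as unrelated to the macro's `quasiquote`.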

Left to my own direction, I'm probably going to have Punctaffy treat this as an error even if that breaks from convention, but I'd like to check in and see what others think.

I’m a little confused about what you’re trying to say. I don’t see the behavior of (test ,(+ 1 2)) ;=> (3 7) as an error at all.

You’re literally doing a syntactic replacement, so it’s exactly what I would expect the result of the expression ``(,(+ 1 2) ,(+ 3 4))` to be. The quasiquote and unquotes aren’t unbalanced. The quasiquote distributes over the subexpressions just like multiplication over addition. If you had two unquotes in the same “branch” then it would be an error.

However, there is precedent for other uses of the notation, e.g. in match. I don’t think you necessarily have to keep the original Racket semantics of the notation in your own macros. If the behavior you want is very similar to the original Racket semantics, though, it would be nice to avoid unexpected behavior.
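The syntactic-replacement reading can be checked directly. In this sketch, `by-macro` and `by-hand` are illustrative names; the point is that the macro call from the original post produces the same list as the hand-written quasiquote, because the unquote passed in as an argument ends up inside the quasiquote that `test` introduces.

```racket
#lang racket/base

;; The test macro from the original post.
(define-syntax-rule (test subform)
  `(subform ,(+ 3 4)))

;; The unquote in the argument lands inside test's quasiquote, so the
;; call behaves like the hand-written form below.
(define by-macro (test ,(+ 1 2)))
(define by-hand `(,(+ 1 2) ,(+ 3 4)))

(writeln by-macro)                   ; prints (3 7)
(writeln (equal? by-macro by-hand))  ; prints #t
```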


I have no idea how to get that quasiquote inside of an in-line formatted block to properly typeset. I hope you know what I meant.

In CommonMark, you can use any number of backticks to surround inline code, and if the code both starts and ends with a space, one space is trimmed from each end. That lets you put a literal backtick at the edge of the code. You can write your example as `` `(,(+ 1 2) ,(+ 3 4)) ``.

I wrote that demonstration as ``` `` `(,(+ 1 2) ,(+ 3 4)) `` ```, and I wrote this one with four backticks.

> I don’t see the behavior of (test ,(+ 1 2)) ;=> (3 7) as an error at all.

Thanks, this is exactly the kind of perspective that I want to understand before taking Punctaffy in a direction I might regret. :slight_smile:

> You’re literally doing a syntactic replacement, so it’s exactly what I would expect the result of the expression `(,(+ 1 2) ,(+ 3 4)) to be. The quasiquote and unquotes aren’t unbalanced.

Well, take this as an alternative example:

```racket
#lang racket/base

(define-syntax-rule (test2 subform)
  (let ([x "secret"])
    (list subform (string-append x " message"))))

(writeln (test2 (+ 1 2)))  ; prints "(3 "secret message")"
;(writeln (test2 x))       ; error
```

This time, the second of those lines is a compile-time error (correctly, IMO). But the "syntactic replacement" in this case results in:

```racket
(let ([x "secret"])
  (list x (string-append x " message")))
```

Taken at face value, that would be well-formed code.

However, define-syntax-rule doesn't do such a simple syntactic replacement. It goes to some trouble to keep the caller's x from being captured by the macro's binding of x. In Racket, it does so by attaching an extra macro-introduction scope to the lexical information of the macro-introduced occurrences of x.

Why shouldn't the same thing happen with quasiquote and unquote?
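For contrast, here is a sketch of what capture looks like when a macro deliberately opts out of hygiene, mimicking the kind of capture the unquote example shows. `test2/unhygienic` is a hypothetical macro for illustration only. It uses `datum->syntax` to give the binding of `x` the lexical context of the caller's `subform`, so the macro's binding and a caller-written `x` end up with the same scopes, and the caller's `x` refers to it.

```racket
#lang racket/base
(require (for-syntax racket/base))

;; Hypothetical macro, for illustration only: a deliberately unhygienic
;; variant of test2. datum->syntax gives the binding of x the caller's
;; lexical context (copied from subform), so a caller-written x refers to it.
(define-syntax (test2/unhygienic stx)
  (syntax-case stx ()
    [(_ subform)
     (with-syntax ([x (datum->syntax #'subform 'x)])
       #'(let ([x "secret"])
           (list subform (string-append x " message"))))]))

(writeln (test2/unhygienic (+ 1 2)))  ; prints (3 "secret message")
(writeln (test2/unhygienic x))        ; prints ("secret" "secret message")
```

With the hygienic test2, the second call is a compile-time error instead; the question is whether quasiquote-shaped notations should rule out the analogous capture of unquote in the same way.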