One Kind of Binding?

Hello,
This is a topic that's come up from time to time in Qi meetings. I'm moving it out into a separate thread so that others can follow and chime in. It'd be nice to understand:

Could we have the same macro system we use in Racket today but without the need for two distinct kinds of bindings (value bindings and transformer bindings)? And could this provide some advantages?

I want to clarify that this is not about eliminating phases, which are great. It's about whether we can eliminate the dichotomy of bindings so that we just have simple variable binding and reference (at a given phase), and nothing more complicated than that.

For instance, in writing macros, we sometimes use syntax-local-value to look up the transformer binding of a particular identifier.

How many macrologists use syntax-local-value? It feels like an advanced concept.

But I would also ask: how many Racketeers evaluate variables? Why, everybody does!

Here's what I'm wondering: in every case where we use syntax-local-value, are we really just trying to write a variable name? Could it be that this is a basic programming facility that is made complicated by the presence of the value binding / transformer binding dichotomy?

Consider this example:

(begin-for-syntax
  (define abc #'5))

(define-syntax-parser mac
  [(_ q) q])

(mac abc)

This is a made-up example and it doesn't work.

But maybe it should!

Here, the identifier abc is bound in phase 1 to the value #'5.

In the use of the macro mac, we supply this same identifier, knowing that it has a phase 1 binding. As expansion of this macro is happening in phase 1, the expander should have access to this binding, we reason, and should be able to resolve it.

But this code issues the familiar complaint, "q: pattern variable cannot be used outside of a template."

Now, consider this very slight variation:

(begin-for-syntax
  (define abc #'5))

(define-syntax-parser mac
  [(_ q) abc])

(mac abc) ;=> 5

The only change is q to abc in the macro template expression, and it works.

Compare that to this:

(define abc 5)

(match abc
  [q q]) ;=> 5

... which also seems to support that q ought to be bound to abc in the first version.

Even though the second version works, the problem is this: the reason we might be trying to do this is that we hope to indicate a phase 1 binding by name at the use site of the macro. But in this second version, during expansion, we're ignoring the input syntax entirely and just referring to the phase 1 binding independently. There is a disconnect here, and it doesn't do the same thing as the first version, though it serves to illustrate the issue and one expectation of what should happen here.

The actual way to do this today is the following:

(define-syntax abc #'5)

(define-syntax-parser mac
  [(_ q)
   #:with val (syntax-local-value #'q)
   #'val])

(mac abc) ;=> 5

Note that, instead of an ordinary binding here, we define abc as a special binding -- a transformer binding, via define-syntax. Additionally, instead of simply referring to the variable by name, we introduce a new concept in the form of syntax-local-value to indicate that we are trying to get the transformer binding and not the ordinary binding of the variable.

Would it be possible instead to allow macros to refer to ordinary phase 1 bindings (and not raise a "pattern variable outside template" error)?

I know there are probably decades of Scheme tradition behind these choices and I'm sure there are good reasons, but sometimes, also, the reason is simply historical, so it would be good to know if that's the case here (and whether there's anything we can do about it).

Of course, the main actual purpose for define-syntax isn't to hold arbitrary values but specifically to refer to macros (i.e. syntax -> syntax functions)! The "Qi meeting notes" thread discusses one possible design to achieve that using the one-binding approach, via a (require (for-lang ...)) module-level directive. I don't know how feasible it is. There may be other ways.

I’m not sure I understand what the exact idea is, but there are two things that I want to note.

  1. To refer to a phase 1 value binding by identifier, as opposed to a phase 0 meta binding, you can always use syntax-local-eval. The use of syntax-local-eval is “less evil” than normal eval, because it’s the only way to communicate phase 1 expressions to be evaluated in macros (because we can only deal with syntax objects), as far as I understand.
  2. The name define-syntax is historical, but its purpose is not historical. Meta bindings bridge between phase 0 and phase 1, in that they can be referred to (and, apparently, imported) at phase 0 while their values can be retrieved at phase 1. It’s very general, and Racket macros only happen to be one of its uses. This is why I like the Rhombus name meta.bridge more.
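
To make point 1 concrete, here is a minimal sketch (assuming syntax-local-eval from racket/syntax; the phase 1 binding magic and the macro get are made-up names for illustration):

```racket
#lang racket
(require (for-syntax racket/base racket/syntax syntax/parse))

;; an ordinary phase 1 value binding
(begin-for-syntax
  (define magic 42))

;; `get` evaluates its subform as a phase 1 expression,
;; so the identifier resolves to the phase 1 binding by name
(define-syntax (get stx)
  (syntax-parse stx
    [(_ e)
     #`(quote #,(syntax-local-eval #'e))]))

(get magic) ;; expands to (quote 42)
```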

I’m not seeing how the example relates to bindings, in fact. Whether or not q is a special pattern variable, it’s not a variable reference to abc. If q should be a variable reference to abc, then should #'abc also be? Or is it saying that q should be bound in such a way that it’s turned into a syntax-local-eval? (Code injection moment, some may say :P)

1 Like

Following up on what @usao said, I think it's important to distinguish two things. One is the distinction between phase 1 (expansion time) and phase 0 (run time). That is, as you say, very important to Racket. But if we have that distinction, we need something else, which is the ability to connect these two phases, otherwise phase 1 can't do anything. define-syntax is that connection -- the whole point is that it binds things at phase 0.

There are other ways to create this connection, such as Template Haskell style splices, but that's fundamentally just making the connection narrower.

A different thing you could want would be to automatically infer whether you are referring to a define-syntax or define-for-syntax binding and insert syntax-local-value automatically. That could maybe work but I think has the obvious drawbacks of implicitness.

3 Likes

  1. To refer to a phase 1 value binding by identifier, as opposed to a phase 0 meta binding, you can always use syntax-local-eval. The use of syntax-local-eval is “less evil” than normal eval, because it’s the only way to communicate phase 1 expressions to be evaluated in macros (because we can only deal with syntax objects), as far as I understand.

Adding this one to my toolbelt, thanks!

I’m not seeing how the example relates to bindings, in fact. Whether or not q is a special pattern variable, it’s not a variable reference to abc. If q should be a variable reference to abc, then should #'abc also be? Or is it saying that q should be bound in such a way that it’s turned into a syntax-local-eval? (Code injection moment, some may say :P)

In a way, yes, I'm saying we could treat q as (syntax-local-eval q) on the right-hand-side here. But I feel this would be an implementation detail in relation to the current implementation.

I would expect in your example that, as the template body (inside the syntax quote) is evaluated at phase level one higher than the pattern matching, the references there would be to variables at phase 0. So in this case, #'abc would refer to a variable abc in phase 0 (as it does today), and q (unquoted), as it is pattern-bound in phase 1, is a reference to abc in phase 1. And in this case, we could allow it rather than raise the "pattern variable cannot be used outside template" error.

A different thing you could want would be to automatically infer whether you are referring to a define-syntax or define-for-syntax binding and insert syntax-local-value automatically. That could maybe work but I think has the obvious drawbacks of implicitness.

Yes, agreed. We don't want anything magical like this!

Since this is a spin-off thread, I'll summarize and synthesize what has been said before:

Today, we overload the idea of bindings to achieve the phase1/phase0 connection. Of course, overloading bindings for this purpose might just be the most convenient way to do this in practice. But I don't know that personally, and I'm seeing enough inconveniences, like these:

  • the need for syntax-local-value
  • the inability to use syntax-local-value in tests running at phase 0
  • more generally, the difficulty of testing macros without special phase-shifting infrastructure like the syntax/macro-testing library

Basically, why can't we just write macros as ordinary syntax → syntax functions that we test like ordinary functions? And why not convey compile time information through ordinary phase 1 bindings? That would address all of these points above and avoid the need for infrastructure we've built around the awkwardness of transformer bindings.

So it would be helpful to understand, given the cost of overloading bindings to achieve the connection, what are the benefits of doing so?

And it also warrants asking: can we establish the phase0/phase1 connection through some other means that would avoid these costs, preserving those benefits?

One option could be the for-lang module-level mechanism mentioned in the other thread -- which it seems, assuming it's feasible, does achieve these above benefits. The idea is that (require (for-lang ...)) tells the expander that all imported definitions are to be treated as macros in expanding the source module.

But the tradeoff, at least on the face of it, is that we can no longer define macros in the same module as the code they transform. @hendrikboom3 suggested that one simple way to recover that ability is to write define-syntax as a macro expanding to a submodule + (require (for-lang ...)). There may be other ways, like having define-syntax expand to:

(begin-for-syntax
  (define g <transformer-value>)
  (compiler-hook-insert f g))

(pseudocode stolen from @benknoble )

This is no doubt similar to how define-syntax already works under the hood, modulo that g is a regular binding here and not a special kind, and the expander has been informed that g should be treated as a macro in expanding the module.

To summarize this summary: can we have the directives to the expander regarding definitions in a module (i.e. the phase1/phase0 link) be decoupled from the definitions themselves? And would that get us the benefits we hope to gain (without incurring other untenable costs)?

I think part of the story here is that, while in one sense a macro transformer is an ordinary function, in another sense it is a callback for the expander. Some of the difficulties with testing are similar to trying to test any callback function outside of the context in which it expects to be called. The expander takes certain actions around each transformer invocation, like adjusting scopes to implement hygiene, and transformers can access functionality from the ambient expander with functions like syntax-local-lift-expression.

I think it is different than what you have in mind, but you might be interested in the Ghuloum & Dybvig proposal for implicit/inferred phase levels in R6RS, which the majority of R6RS implementations have adopted. Those implementations have tended to muddy instantiation and phase separation in a way I find unappealing (see the response to formal comment 92). It's an open question whether it would be possible to implement inferred phase levels with The Separate Compilation Guarantee, and it seems like it would be a lot of work for questionable benefit, but I suspect it might in fact be possible. If this is of interest to you, I wrote some more here (with thanks to @samth for finding old discussions), though I need to reply again when I get some time …

2 Likes

In fact, yes you can! Macros ARE ordinary syntax? -> syntax? functions, and define-syntax does have a SINGLE job, which is informing the expander to invoke a function when it sees a particular identifier during expansion. That we can write (define-syntax (m stx) ...) is just shorthand for (define-syntax m (lambda (stx) ...)), where the right-hand side of define-syntax is an ordinary Racket expression that is evaluated at phase 1 when expanding phase 0 code.

Here is an illustration:

#lang racket

(begin-for-syntax
  (require syntax/parse/pre)
  (define (M stx)
    (syntax-parse stx
      [(_ expr)
       (printf "M ~s\n" #'expr)
       #'expr])))

(define-syntax m M)

(m 123)

You can test macros just as you would test ordinary functions, as with M above. The problem is that there's no real easy machinery to validate the correctness of a syntax object.
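
For instance, requiring rackunit at phase 1 lets you unit-test M right alongside its definition, comparing datum forms (a sketch; the check runs at expansion time):

```racket
#lang racket

(begin-for-syntax
  (require syntax/parse/pre rackunit)
  (define (M stx)
    (syntax-parse stx
      [(_ expr) #'expr]))
  ;; unit-test the transformer as an ordinary function by
  ;; comparing the datum of the output syntax object
  (check-equal? (syntax->datum (M #'(m (+ 1 2))))
                '(+ 1 2)))
```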

Better macro testing could be nice, but I suspect there are less nuclear ways to achieve that than getting rid of the syntax/value binding distinction.

I'm of the opinion that tests for a piece of code should exercise that code in the same manner that actual users of that code would. So for macros, testing the effects of using the macro seems more useful to me than testing it as if it's an ordinary syntax-to-syntax function. That's not how client code interacts with it, so I don't think that should be how tests interact with it either.

1 Like

Meta: this thread spun off from a discussion in Qi's meeting notes.


I have not completely followed the abc/syntax-local-value problems, but I think others have covered my own opinions pretty well. That is, there might be a way to do this that looks like define-syntax as insert-compiler-hook and for-lang as a (possibly implicit) way to insert it.

I especially agree with @shhyou and @notjack's characterizations: I suggested a similar implementation strategy, and I suspect one motivation for Sid's desire for a simpler (?) model is that it would make testing Qi's deforestation pipeline easier. In particular, there we care about the results of expansion and the clients' exercise of the macros (in that we should get the same results through a completely different interpretation of the surface code).

Two other open (not-so-open?) questions, in my mind:

  1. If functions manipulating syntax may eventually become macros, they will still need a rich API of syntax operators. You might be able to make syntax-local-value and syntax-local-eval implicit (viz. evaluation of bindings and expressions at the runtime phase of the ordinary function that happens to manipulate syntax), but I don't think you could remove, e.g., syntax-local-lift-require and the rest of the rich API, nor should you want to.

    Perhaps that's not a real obstacle or part of the original proposal, so feel free to ignore me—but syntax-local-lift-require does get back to the testing thing for me. How to test that it works? Especially after my recent trouble integrating it with Frosthaven Manager (see also needing a new testfile to check the results of test programs). I've landed on something like @notjack proposes, which is exercising client code. In my case, I needed to not only check that #lang code compiled as before but now also check the values it computes. This is probably obvious to advanced macrologists, but it's taken me some experimentation and hard-won bugfixes to get here. (Another chapter in my to-be-written book about building large GUI programs?)

  2. What happens for things like (require (for-lang (only-in racket/base add1))) (add1 4)? Presumably that's an expand-time contract-style error that add1 only accepts numbers and was given syntax? (I think that would be on par with today.)

I don't see how your proposal actually changes anything in a way that gets the benefits you say.

In particular, you can write for-lang today:

(define-syntax-rule (require/for-lang id mod)
   (begin (require (for-syntax (rename-in mod [id tmp])))
          (define-syntax id tmp)))

But this doesn't make anything easier to test or obviate the need for syntax-local-value.

The problem is that the two kinds of bindings you identify, bindings to ordinary runtime variables and bindings to macros, are both fundamental. The expander obviously needs to know whether something is a macro or not, and if it is, what code to run. That's what define-syntax does, and it's what for-lang would do, and it's what your compiler-hook-insert would do. So there have to be these two kinds of binding. And once you have that, then you start wanting to access it programmatically with syntax-local-value, etc etc.

If the only way you can bind macros is in require, then many uses of syntax-local-value go away, but that's because you can't write the kinds of things that we use it for today. For example, struct communicates to match and provide and struct-copy by expanding to define-syntax. If you got rid of define-syntax, then that wouldn't work. My guess is you're imagining that the information about the struct would be in a define-for-syntax binding. Then you'd have to answer the following question: how do you go from the name of the struct to the information. There are a few options:

  1. There's one big define-for-syntax hash table, keyed by the symbol name of the struct. This breaks shadowing, hygiene, renaming, etc.
  2. There are individual bindings for each struct, with names constructed unhygienically. Also bad for the same reasons.
  3. One big hash table, but this time keyed by the identifier name of the struct, which could be bound at runtime to the runtime struct info. This works better and is sometimes the right idea, but it's a lot of work to maintain the hash table because you're effectively re-implementing the environment that the expander has (this is what my Scheme Workshop 2009 paper is about).
  4. A define-for-syntax binding, plus a way to go from the phase 0 binding to that phase 1 value and/or binding. Now you get the benefits of (3), basically, but the expander maintains the hash table for you. Chez Scheme has an operation like this (called define-property). This is nice, but it's a fairly complex feature to add directly.
  5. Just allow programmatic access to the environment that's already there, rather than adding more complex API on top of it. Then you can build something like define-property yourself. That's syntax-local-value.
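
Option 5 in miniature, for anyone following along -- a sketch with made-up names (color-info, colors-of):

```racket
#lang racket
(require (for-syntax racket/base syntax/parse))

;; a transformer binding carrying arbitrary compile-time data
(define-syntax color-info '(red green blue))

;; another macro reads it back via the expander's environment
(define-syntax (colors-of stx)
  (syntax-parse stx
    [(_ name:id)
     #`(quote #,(syntax-local-value #'name))]))

(colors-of color-info) ;; => '(red green blue)
```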

One of the other things you talk about is the challenge of testing. I think the above discussion points to why this is hard -- there's a separate data structure that has to be set up in order for your tests to work. This is a pretty common issue in unit testing, and broadly it has two solutions. One is "run some initialization code to set things up before your tests". That's the syntax/macro-testing approach. The other is "make the things you want to test not depend on the environment". This can work for some kinds of program generators, e.g. a regular expression engine with a fixed set of constructs. Unfortunately, it can't work in general -- the way that match or <> are influenced by bindings is fundamental to the specification. Otherwise defining separate functions with define-for-syntax and unit-testing them would solve the problem already.
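
The first of those approaches, for reference, looks like this with syntax/macro-testing (a sketch; swap-pair is a made-up phase 1 helper):

```racket
#lang racket
(require rackunit syntax/macro-testing)

;; a phase 1 helper we want to test from a phase 0 test suite
(begin-for-syntax
  (define (swap-pair p)
    (cons (cdr p) (car p))))

;; phase1-eval evaluates its argument at phase 1 and quotes
;; the result back down to phase 0
(check-equal? (phase1-eval (swap-pair '(1 . 2)))
              '(2 . 1))
```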

3 Likes

The challenges of testing macros are real (even apart from the issues of phasing discussed elsewhere). Basically, they are the same problems as testing any optimizing compiler, which are:

  1. You want to ensure that the generated code contains the optimizations you were trying to do (which means just checking behavior is insufficient) and
  2. You want to avoid depending on all the details of the generated code (which means just comparing the final results to a reference version is too fragile).

There are lots of ways that different compilers handle this dilemma. A few are:

  1. Racket (and Chez) has a lot of tests that check that two different things compile to the same result. This works well if you can express the optimization at the source level, but not otherwise.
  2. If you have an IR at an appropriate level, you can check that you get the right kind of result after that pass, which is in between fragility and precision. You can also combine this with (1) by comparing the results after that pass.
  3. You can write a predicate over the output, like "does it contain some particular function or instruction".
  4. You can test a behavioral property that your optimization is supposed to produce (like making the program run faster than a similar one where the optimization does not apply).

Racket mostly uses 1, 1 combined with 2 (especially for cp0 and for the Racket BC compiler), and 4. I know LLVM does all of these. I'm sure there are more strategies as well.

5 Likes

I've been using Racket for years, and a couple of months ago I used syntax-local-value in a macro where I wanted to get struct information. Copying verbatim from my file:

(define-syntax (define-struct stx)
  (syntax-parse stx
    [(_ type-name struct-name converter-name struct-parent struct-parent-converter fields ...)
     (define sp (syntax-e #'struct-parent))
     (define si (extract-struct-info (syntax-local-value (car sp))))
     ;; make struct-parent-accessors that looks like this:
     ;; '(-chart-AudioFile -chart-BackgroundFile ...)
     ;; need to reverse the order of this before building it
     ;; to be honest I spent a day trying to make this block work
     ;; and when it worked I literally did not believe it working in front of my eyes
     (define struct-parent-accessors-syntax (reverse (cadddr si)))
     (define struct-parent-accessors
       (for/list ([as struct-parent-accessors-syntax])
         (datum->syntax as `(,(syntax-e as) parent))))

While I could probably tell you what the purpose of this is, I could not tell you how I managed to figure out how to write this. I'm not sure what syntax-local-value does here. I think it gets a variable value? But it's special??

Macros are so confusing...

1 Like

Ah, right! Thank you for pointing that out. That's useful to keep in mind.

As Ben mentioned, one of the original reasons this came up was the need for complicated phase shifting when testing macros in Qi's deforestation, which must check not just the result of the computation but also how it was done, i.e. that the appropriate transformation to optimized / simplified code was performed. Although this seems like a specialized case, it reveals that when we test macros in general by just using them, we are really writing more of an integration test than a unit test. Here, though, we are looking specifically to write those rarer unit tests for the syntax transformations in isolation.

Here's one example of that. As you can see, we need to use phase1-eval and it was a bit fiddly to get it working, as I recall.

We could potentially achieve that without phase shifting magic using your observation, something like this:

mac-impl.rkt - containing a syntax → syntax function:

(provide where)

(require syntax/parse
         (for-template racket/base))

(define (where stx)
  (syntax-parse stx
    [(_ expr bindings) #'(let bindings expr)]))

mac.rkt - reproviding it as a macro:

(provide where)

(require (prefix-in m: (for-syntax "mac-impl.rkt")))

(define-syntax where m:where)

main.rkt - using the macro:

(require "mac.rkt")

(where (+ a b) ([a 1] [b 2])) ;=> 3

test.rkt - testing it as an ordinary function:

(require syntax/parse/define
         rackunit
         "mac-impl.rkt")

(define-syntax-parse-rule (test-datum-equal? desc a b)
  (test-equal? desc
               (syntax->datum a)
               (syntax->datum b)))

(test-datum-equal? "swaps order of expression and bindings"
                   (where #'(where (+ a b) ([a 1] [b 2])))
                   #'(let ([a 1] [b 2]) (+ a b)))

Good to know!

There's a lot in your response that I'll need to think more about @samth I appreciate your thorough analysis! To respond to a specific point:

My guess is you're imagining that the information about the struct would be in a define-for-syntax binding. Then you'd have to answer the following question: how do you go from the name of the struct to the information.

Yes, that's right.

Could something like this work:

macro module:

(provide decrypt)

(define (my-decrypt key text)
  ...
  plaintext)

(define-syntax-parser decrypt
  ;; note `k` is not syntax-quoted here
  [(_ k code) #:with key k
   #'(my-decrypt key code)])

use-site module:

(require macro-module)

(begin-for-syntax
  (define key #'"sosecret"))

(call-with-input-file "/tmp/tmp.txt"
  (λ (port)
    (displayln
      (decrypt key (read-line port))))) ;=> "attack at dawn"

In the use module, there is no phase 0 binding for the identifier key, but there is a phase 1 binding. In the macro module, couldn't the k carry the source module scope so that its binding in phase 1 there is discovered?

I have only an armchair understanding of set-of-scopes, but from a skim of Matthew's 2014 paper, I found this:

Having a distinct “root” scope for each phase makes most local
bindings phase-specific. That is, in

(define-for-syntax x 10)
(let ([x 1])
  (let-syntax ([y x])
   ....))

the x on the right-hand side of let-syntax sees the top-level
phase-1 x binding, not the phase-0 local binding.

… which appears to suggest that indeed, key in the use-site would see the phase 1 key binding. Does that mean that the unquoted k binding in the macro would in fact have that scope on it? But perhaps, a totally separate mechanism detects it as a pattern variable and triggers the "pattern variable cannot be used outside a template" error? If so, could abstaining from raising this error (at least in the case where a binding is resolvable) avoid the need for syntax-local-value in this case?

In any event, I've put your 2009 paper on my reading list, along with the 2002 one by Matthew on phases, which has been on my list for a long time.

I think this case here makes the argument for a nicer testing utility for macros:

(define-syntax (check-expands-to stx)
  (syntax-parse stx
    [(_ input expected-expansion)
     #`(begin
         (define-namespace-anchor anchor-for-check-expands-to)
         (define actual-expansion
           (parameterize
               ([current-namespace (namespace-anchor->namespace anchor-for-check-expands-to)])
             (expand-syntax-once #'input)))
         #,(syntax/loc stx
             (check-equal? (syntax->datum actual-expansion)
                           (syntax->datum #'expected-expansion))))]))

(module+ test
  (define-syntax (m _) #'(void))
  ;; this check passes
  (check-expands-to (m) (void))
  ;; this check fails and prints out the actual expansion
  (check-expands-to (m) (void 42)))

You can already write macros like the ones you describe:

#lang racket

(begin-for-syntax
  (define x (+ 1 2)))

(define-syntax (m stx)
  (syntax-case stx ()
    [(_ id)
     #`(let-syntax ([n (λ (stx) #`(quote #,id))])
         (n))]))

(m x) ; expands to '3

So you could imagine a version of struct that worked like this:

(define-struct point (x y))

And then you'd write:

(define (manhattan-distance v)
  (match v
    [(point a b) (+ a b)]))
(manhattan-distance (make-point 1 2))

But you could not make point the constructor (because it's not a binding at phase 0 at all).

I nerdsniped myself into making a simpler version of this:

(define-syntax (check-expands-to stx)
  (syntax-parse stx
    [(_ input (~and expected-expansion (form:id . _)))
     (define actual (local-expand #'input (syntax-local-context) (list #'form)))
     (quasisyntax/loc stx
       (check-equal? (syntax->datum #'#,actual)
                     (syntax->datum #'expected-expansion)))]))

(module+ test
  (define-syntax (m _) #'(void))
  ;; this check passes
  (check-expands-to (m) (void))
  ;; this check fails and prints out the actual expansion
  (check-expands-to (m) (void 42)))

1 Like

Use quote-syntax instead of syntax (#') to turn a syntax object into a constant expression. The syntax form is a request to interpret the term as a syntax template.

(check-expands-to (quote (x ...)) (quote (x ...)))
;; => syntax: no pattern variables before ellipsis in template
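
Concretely, the difference (a minimal sketch):

```racket
#lang racket
;; #'(x ...) is a template, so the ellipsis needs a pattern variable;
;; quote-syntax produces a constant syntax object with no template
;; interpretation
(syntax->datum (quote-syntax (x ...))) ;; => '(x ...)
```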

3 Likes

Actually, it turns out I can just use plain ol' quote:

(define-syntax (check-expands-to stx)
  (syntax-parse stx
    [(_ input (~and expected-expansion (form:id . _)))
     #:with actual (local-expand #'input (syntax-local-context) (list #'form))
     (syntax/loc stx
       (check-equal? 'actual 'expected-expansion))]))

I'm approaching this from the perspective of Scheme's guiding principle:

Programming languages should be designed not by piling feature on top of feature, but by removing the weaknesses and restrictions that make additional features appear necessary.

So personally, I'm more interested in discovering ways to use (and identifying obstructions to using) general concepts to work with macros than in developing more specialized infrastructure for the purpose.

1 Like