High-level explanation of how the expander's event stream is structured

@notjack

@mflatt @ryanc I'm attempting to give Resyntax more and better information from the macro expander, and to do so I'm taking a closer look at a lot of the log-expand events emitted by the expander. I'm aware of the "event lexer and parser" that the macro stepper uses[1], but I still don't understand a lot of the individual event types. And I definitely don't understand what the overall grammar is for event streams. Could I get a high-level explanation of how the expander's event stream is structured?
[1] https://github.com/racket/macro-debugger/blob/master/macro-debugger-text-lib/macro-debugger/model/deriv-tokens.rkt

in particular, what subsequence of events corresponds to a "single expansion step", in the sense of what expand-once does?

@ryanc

The events are designed to be enough information to reconstruct the macro expansion process as represented by the structures in "deriv-c.rkt". It's an internal protocol, likely to change when the expander changes or when the macro stepper changes. The derivation structures are also likely to change, but probably in more tractable ways.

@notjack

oh yes, I'm definitely aware I'm hacking on internals that are subject to change
so far I've been getting away with just plucking syntax objects out of 'visit events

@ryanc

You can experiment with the event traces and derivation structures with this:

(require macro-debugger/model/debug)
(trace-verbose? #t)
(trace #'(or 1 2) expand)

First line requires the necessary support. Second line prints out the type of event (but not the payload) as it is received. Third line calls the expand function on the given syntax, collects the events, and parses a derivation structure. It looks like expand-once doesn't work as a second argument; typically expand or expand/compile-time-evals is used.

@notjack

I put together my own tool that's similar to that already, actually

(require racket/pretty
         rebellion/collection/vector/builder)

(struct expansion-step (preceding-events visited-form) #:transparent)
(struct expansion-event (signature value) #:transparent)

(define (expansion-history stx)
  (define current-expand-observe (dynamic-require ''#%expobs 'current-expand-observe))
  (define steps (make-vector-builder))
  (define events-since-last-visit (make-vector-builder))

  (define (observe-event! sig val)
    (cond
      [(equal? sig 'visit)
       (vector-builder-add steps (expansion-step (build-vector events-since-last-visit) val))
       (set! events-since-last-visit (make-vector-builder))]
      [else
       (vector-builder-add events-since-last-visit (expansion-event sig val))]))

  (parameterize ([current-expand-observe observe-event!])
    (expand stx))

  (build-vector steps))

(pretty-print
 (expansion-history
 #'(module foo racket/base
     (+ 1 2 3))))

but it's definitely wrong because of things like local expansion. I can see that the event stream has a tree structure to it of some kind, it's not just a flat sequence.

@notjack

@ryanc

You can experiment with the event traces and derivation structures with this:

Okay the derivation structs here are helpful for trying to get the tree structure. The acronyms in the names make things... less helpful.

@ryanc

Right, the expander is like a big-step interpreter, and the derivation structures represent the derivation tree for a particular expansion.

@notjack

The macro stepper GUI has to step through that tree in a linear fashion, right? How do you get from the tree structure to something resembling a linear sequence of steps? And does the stepper skip over local expansion?

I'm only vaguely familiar with the big-step/little-step terminology in PL semantics. I kinda remember that from experimenting with redex like a decade ago, but I don't have much experience with it.

@ryanc

For example: (trace #'(#%plain-lambda (x) (#%plain-app add1 x)) expand) . There's some getting-started junk at the beginning, but line 6 is where the main internal expand function gets started. That visit event means "the macro expander is given this syntax as an expression/definition/body form to expand". Next is resolve (resolving the #%plain-lambda identifier to a binding). Next is enter-prim; we're about to expand a primitive syntactic form. Next is prim-lambda, which says the primitive in question is lambda. The structure that follows is specific to the lambda primitive. First, the formals list is checked and a renaming scope is created and applied (the lambda-renames event). Then the rest of the term is treated as a block and expanded according to the block rules (from enter-block to block->list). That corresponds to pass 1 of blocks' two-pass expansion; then pass 2 is from the enter-list event to exit-list. There are recursive calls to the main expand function within here: events 14-16 are an expand call that stops early because the expander recognizes an expression form in pass 1, and events 20-34 are the expansion of the application expression in the body (in pass 2). After expanding the body, the lambda primitive reassembles the lambda term, maybe applies some syntax properties, and returns (exit-prim/return).

Yes, the macro stepper parses the events into a derivation tree, then interprets the derivation tree as a sequence of updates to a term. When I took the Semantics class from Mitch Wand when I started grad school, we used a simple lambda-calculus variant (maybe PCF, maybe ISWIM, maybe some ad hoc unnamed language) and we gave a big-step (interpreter-like) semantics and a small-step (rewriting) semantics. Then you want to prove them equivalent. The macro stepper's derivation-to-reduction-sequence component follows the spirit of that proof. (It's more complicated when macro hiding gets involved.)

@notjack

Ah I see, so the calls back to the main expander happen during the expansion of the primitive lambda form. Do macro transformations also work like that? I notice that the macro stepper in drracket skips over the entire (#%plain-lambda (x) (#%plain-app add1 x)) form in a file with just #lang racket/base (#%plain-lambda (x) (#%plain-app add1 x)) in it, even when I disable all macro hiding.

@ryanc

You can think of a simple macro as a function that just computes a new term, without invoking the macro expander. The macro expander receives the result and immediately expands it. If a macro calls local-expand , then that call re-enters the macro expander.

@notjack

ahhh, so they're different from primitives in that way

@ryanc

Yes. Primitives are like macros that call local-expand in predictable ways, and whose results don't get re-expanded.

@notjack :red_exclamation_mark:

@notjack

Okay excellent that's very useful to know

I might try and make my own parser for the logged event streams, since Resyntax has very different needs from the macro debugger. I'll need to figure out which of the event types are delimiters that signal the start or end of some group of events. Do you have a list of just those event types? The lexer definition in the macro debugger doesn't make that clear, so I'm just guessing based on the names and on what the logged event streams look like. (edited)

@ryanc

I don't think there are tokens that act reliably as delimiters. I would not recommend trying to parse event streams. I would recommend starting with the derivation tree and pruning away the information you don't need.

@notjack

hmm. alright, I'll try that. I'm going to have quite a few questions for you though since a lot of the names of the derivation tree structs and their fields are very unclear to me.

@ryanc

no problem

@notjack

@ryanc Okay some questions so far:

  • Am I right to assume that in the root node struct, the z1 and z2 fields, when not false, are syntax objects and z1 expands into z2 ?,
  • What's a tagrule?,
  • In the ecte and lift-deriv structs, I see that there's first and second fields in each containing subderivations. Why are there two? What's the difference between them?,
  • Why does ecte have two lists of local actions, one in locals and one in locals2 ?,
  • What's the de2 field in the base derivation for?,

@ryanc

  • Re z1 and z2 : usually, but they can also be lists of syntax objects for block (bderiv ) and "list of expression" (lderiv ) nodes.|,
  • A tagrule represents making an implicit syntactic form explicit, such as (+ 1 2) becoming (#%app + 1 2) ; expansion continues with the second form. Likewise for #%datum and #%top .,
  • The expand/compile-time-evals procedure does two-pass expansion to try to mimic module-level and block-level expansion. (The top level is still hopeless, but this makes it slightly less hopeless.) The first field holds the first expansion pass (head-expansion), and the second field holds the second pass. The first locals appears (?) unused now; the second catches the effects of running the compile-time code (in the case of, eg, a define-syntaxes form).,
  • The lift-deriv node represents an expansion that lifted definitions using syntax-local-lift-expression . The lift-deriv node is where the lifted definitions were caught; the actual lifting happens somewhere within the first derivation. The second field is the re-expansion of the term after lifts are converted to definitions (or nested let-values for lift/let-deriv ).,
  • de1 is the "disarmed" version of the initial expression. I believe it should always be eq? to e1 , but it's possible that there are some early-stage syntax property adjustments that I'm forgetting about that de1 was also catching.,

@benknoble

Any volunteers to archive this info somewhere?
https://discord.com/channels/571040468092321801/893314076346826852/1387829848462065696

Copied and edited from original on Racket Discord by @spdegabrielle with permission from authors @notjack @ryanc.

4 Likes