How ellipsis works in macro?

The ellipsis seems to work in the most obvious way, but it is still difficult to understand for me. Below I will explain my current understanding and provide a naive translation to map, but there are quite a few examples do not follow this translation. So I hope someone could clarify the precise behavior of ellipsis.

Intuitively, ellipsis works like list and map, for example,

(define-syntax-rule (foo arg ...)
  (list '(+ 42 arg ...)))

I look the arg ... in the pattern as a list (give it a notation arg*) and look arg ... in the template as map. So the foo macro can be informally translated to:

`(list (+ 42 ,@(map (λ (arg) arg) arg*)))

When the macro be called,

(syntax->datum (expand #'(foo 1 2 3)))
;; (#%app list '(+ 42 1 2 3))

It is equivalent to

`(list (+ 42 ,@(map (λ (arg) arg) '(1 2 3))))
;; '(list (+ 42 1 2 3))

It works as expected.

The 2nd example,

(define-syntax-rule (bar arg ...)
  (list '(+ 42 arg) ...))

(syntax->datum (expand #'(bar 1 2)))
;; '(#%app list '(+ 42 1) '(+ 42 2))

It can be translated to:

`(list ,@(map (λ (arg) `(+ 42 ,arg)) arg*))

`(list ,@(map (λ (arg) `(+ 42 ,arg)) '(1 2)))
;; '(list (+ 42 1) (+ 42 2))

The 3rd example,

(define-syntax-rule (baz (arg1 ...) (arg2 ...) ) 
  (list '(+ 42 arg1 arg2) ...))
  
(syntax->datum (expand #'(baz (1 2) (11 22))))
;; '(#%app list '(+ 42 1 11) '(+ 42 2 22))

There is two ... in the pattern, so use map with varargs.

It can be translated to:

`(list ,@(map (λ (arg1 arg2) `(+ 42 ,arg1 ,arg2)) arg1* arg2* ))

`(list ,@(map (λ (arg1 arg2) `(+ 42 ,arg1 ,arg2)) '(1 2) '(11 22)))
;; '(list (+ 42 1 11) (+ 42 2 22))

N.B. The following macro is illegal, that is nice.

(define-syntax-rule (baz arg1 ... arg2 ...) 
  (list '(+ 42 arg1 arg2) ...))

The 4th example,

(define-syntax-rule (quz (arg1 ...) (arg2 ...) ) 
  (list '(+ 42 arg1 ...)))
  
(syntax->datum (expand #'(quz (1 2 3) (11 22 33))))
;; '(#%app list '(+ 42 1 2 3))

It can be translated to:

`(list (+ 42 ,@(map (λ (arg1 arg2) arg1) arg1* arg2*)) )

`(list (+ 42 ,@(map (λ (arg1 arg2) arg1) '(1 2 3) '(11 22 33))) )
;; '(list (+ 42 1 2 3))

The 5th example,

(define-syntax-rule (quux (arg1 ...) (arg2 ...) ) 
  (list '(+ 42 arg1 ...) (+ 43 arg2 ...)))
  
(syntax->datum (expand #'(quux (1 2 3) (11 22 33))))
;; '(#%app list '(+ 42 1 2 3) (#%app + '43 '11 '22 '33))

It can be translated to:

`(list (+ 42 ,@(map (λ (arg1 arg2) arg1) arg1* arg2*)) 
       (+ 43 ,@(map (λ (arg1 arg2) arg2) arg1* arg2*)))
       
`(list (+ 42 ,@(map (λ (arg1 arg2) arg1) '(1 2 3) '(11 22 33))) 
       (+ 43 ,@(map (λ (arg1 arg2) arg2) '(1 2 3) '(11 22 33))))
;; '(list (+ 42 1 2 3) (+ 43 11 22 33))

The 6th example,

(define-syntax-rule (quuz (arg1 ...) (arg2 ...) ) 
  (list '(+ 42 arg1 arg2 ...) ...))
  
(syntax->datum (expand #'(quuz (1 2) (11 22))))
;; '(#%app list '(+ 42 1 11 22) '(+ 42 2 11 22))

It can be translated to:

`(list ,@(map (λ (arg1 arg2) `(+ 42 ,arg1 ,@(map (λ (arg1 arg2) arg2) arg1* arg2*))) arg1* arg2* ))

`(list ,@(map (λ (arg1 arg2) `(+ 42 ,arg1 ,@(map (λ (arg1 arg2) arg2) '(1 2) '(11 22)))) '(1 2) '(11 22) ))
;; '(list (+ 42 1 11 22) (+ 42 2 11 22))

The 7th example,

(define-syntax-rule (corge (x y) ... ) 
  (list (list x ... y ...)
        (list x y) ...))

(syntax->datum (expand #'(corge (1 2) (11 22))))
;; '(#%app list (#%app list '1 '11 '2 '22) 
;;              (#%app list '1 '2) (#%app list '11 '22))

It can be translated to:

`(list (list ,@(map (λ (e) (car e)) (x y)*)
             ,@(map (λ (e) (cadr e)) (x y)*)
       ,@(map (λ (e) (list (car e) (cadr e))) (x y)*)))

`(list (list ,@(map (λ (e) (car e)) '((1 2) (11 22)))
             ,@(map (λ (e) (cadr e)) '((1 2) (11 22))))
       ,@(map (λ (e) (list (car e) (cadr e))) '((1 2) (11 22)) ))
;; '(list (list 1 11 2 22) (1 2) (11 22))

Following this naive translation, I guess the behavior of the ellipsis is:

When a pattern matches a syntax object, Racket will collect all the matched argN* in the same level into a list (I said "the same level", because I don't know how nested ellipsis work in pattern, e.g. ((a b ...) ...)) and how they interact with ... in template.

When instantiating a template, when encountering a ..., Racket will apply map to the list which previous collected and a lambda which constructs syntax object before ... in the template by selecting corresponding pattern variables..


The first question: Is this explanation correct?

I have some counterexamples that violate this explanation.

Counterexamples 1:

(syntax->datum (expand #'(quuz (1 2) (11 22 33))))
;; '(#%app list '(+ 42 1 11 22 33) '(+ 42 2 11 22 33))

But the naive translation version report error:

`(list ,@(map (λ (arg1 arg2) `(+ 42 ,arg1 ,@(map (λ (arg1 arg2) arg2) '(1 2) '(11 22 33)))) '(1 2) '(11 22 33)))
; ERROR: map: all lists must have same size

So the underlying algorithm of ellipsis is not using the standard map.

Counterexamples 2:

(define-syntax-rule (ce1 (arg1 ...) (arg2 ...) ) 
  (list '(+ 42 arg2 ...) ...)) 
;; ERROR: too many ellipses in template

However, following the naive translation, it should not complain.

`(list ,@(map (λ (arg1 arg2) `(+ 42 ,@(map (λ (arg1 arg2) arg2) arg1* arg2*))) arg1* arg2* ))

`(list ,@(map (λ (arg1 arg2) `(+ 42 ,@(map (λ (arg1 arg2) arg2) '(1 2) '(11 22) ))) '(1 2) '(11 22) ))
;; '(list (+ 42 11 22) (+ 42 11 22))

Comparing with the 6th example, the Counterexamples 2 just lacks arg1 in the body.

Because of these two counterexamples, I think my conjecture was not correct.

Can someone explain the precise behavior of ellipsis? Or correcting my mistake.

Thanks.

1 Like

quuz should be translated to

`(list ,@(map (λ (arg1) `(+ 42 ,arg1 ,@(map (λ (arg2) arg2) arg2*))) arg1*))

because arg2 ... is in inner parentheses.

(list '(+ 42 arg2 ...) ...)

is wrong, because arg2 in arguments has one level of ellipsis, but here you put it twice. In your naive translation you use arg1, but there are no arg1 inside (+ 42 arg2 ...), so it cannot be used in expansion.

I don't know how nested ellipsis work in pattern, e.g. ((a b ...) ...)) and how they interact with ... in template.

It is the same.

(define-syntax-rule (corge (x ...) ... ) 
  (list '(list 'a x ...) ...))

> (corge (a b) (c d))
'((list 'a a b) (list 'a c d))

x has two ellipsises in arguments and two ellipsises around its expansion.

1 Like

The corner cases of ellipses are strange. Here's my favorite example:

> (with-syntax ([(x ...) '(1 2 3)])
    #'((x x ...) ...))
#<syntax ((1 1 2 3) (2 1 2 3) (3 1 2 3))>

The short explanation is that for ellipses you only map over the variables needed by the subtemplate. But different occurrences of a variable might be needed at different ellipses. So the translation of my example is something like this:

(define xs '(1 2 3))
(map (lambda (x1)
       (cons x1 (map (lambda (x2) x2) xs)))
     xs)

Here's how I would think about it. You can build the translation for a template bottom-up. The translation needs to know the ellipsis depth of each pattern variable. The translation includes the code but also a list of unsatisfied requirements. Let's use a slightly more complicated example:

(with-syntax ([(x ...) '(a b)]
              [((y ...) ...) '((1 2) (3 4 5))])
  #'((x (x ...) y ...) ...))

The ellipsis depth of x is 1 and the ellipsis depth of y is 2.

The subtemplate x is translated to the code, say, x1 and it has the unsatisfied requirement that says "x1 comes from x with depth 1"; that is, the unsatisfied requirements are {(x1, x, 1)}.

To translate the ellipsis subtemplate (x ...), we look at those requirements. (If there are no unsatisfied requirements, like (1 ...), that's an error.) Each requirement introduces a variable in the map function. If the remaining depth is 1, then the list argument comes from the variable that holds that pattern variable's value; let's call it xs here. So this template is translated as (map (lambda (x1) x1) xs). We calculate the new unsatisfied requirements by decrementing each depth and dropping any that have reached 0. So this template has no unsatisfied requirements.

The subtemplate y is translated to, say, y2 and it has the unsatisfied requirements {(y2, y, 2)}.

The translation the subtemplate (y ...) is similar to (x ...), except that the depth of the requirement isn't 1, so we have to create a new variable for the map list argument (let's use y1) and instead of dropping the unsatisfied requirement, we adjust it to (y1, y, 1)---that is, we need to bind y1 from the values of y, and there's 1 level of nesting that we haven't discharged yet. The code is (map (lambda (y2) y2) y1).

The subtemplate (x (x ...) y ...) combines the subtemplates we've talked about so far. We combine the expressions using cons and list. We just union the unsatisfied requirements; there are two, for x1 and for y1. Finally, the whole template ((x (x ...) y ...) ...) is a map with the expression for the previous subtemplate in the function body, x1 and y1 as the function arguments, and xs and the variable holding the y matches (say, ys) as the list arguments. There are no remaining unsatisfied requirements, so the template is okay.

3 Likes

I used arg1* and arg2* instead of just arg1* or arg2* is because

  1. Yes, the arg2 is in inner parentheses, but its (I mean the pattern variable arg2) depth is still 1, which has the same depth as arg1. So I think using both arg1* arg2* as map's argument doesn't affect the final result.

    N.B. As @ ryanc pointed that "ellipsis depth" is about pattern variable, not variable in template. I don't think the arg2 ... in inner parentheses does matter.

  2. It allow us to blindly apply the map to list arguments. That would be more convenient and mechanical. I know that, in the quuz example, the inner ... only acts on arg2 (a subtemplate before ...) , but the subtemplate could potentially include arg1. For example,

    (define-syntax-rule (quuz2 (arg1 ...) (arg2 ...) ) 
      (list '(+ 42 arg1 (arg1 arg2) ...) ...))
    

    Using both arg1* arg2* as map arguments is more flexible.

     `(list ,@(map (λ (arg1 arg2) `(+ 42 ,arg1 ,@(map (λ (arg1 arg2) `(,arg1 ,arg2)) arg1* arg2*))) ag1* arg2* ))
    

The expression (list '(+ 42 arg2 ...) ...) you mentioned occurs in the "Counterexamples 2" in the topic. I suppose we are discussing the same thing.

(define-syntax-rule (ce1 (arg1 ...) (arg2 ...) ) 
  (list '(+ 42 arg2 ...) ...)) 
;; ERROR: too many ellipses in template

This seems to reflect that the underlying algorithm of ellipse is actually based on bottomup (as @ryanc said) rather than topdown. If it was topdown, it would firstly collect all the matched argN* in depth 1, i.e. arg1* arg2*, then when encountering an outermost ..., it apply map to arg1* arg2*. So the translation would be:

`(list ,@(map (λ (arg1 arg2) `(+ 42 ,@(map (λ (arg1 arg2) arg2) arg1* arg2*))) arg1* arg2* ))

;; An example:
`(list ,@(map (λ (arg1 arg2) `(+ 42 ,@(map (λ (arg1 arg2) arg2) '(1 2) '(11 22) ))) '(1 2) '(11 22) ))
;; '(list (+ 42 11 22) (+ 42 11 22))

The strange thing is that the following macro (see the 6th example) is OK:

(define-syntax-rule (quuz (arg1 ...) (arg2 ...) ) 
  (list '(+ 42 arg1 arg2 ...) ...))

Note that I just add arg1 before arg2. If the quuz is OK, I can't find any reason for the ce1 to fail.

Yes, the arg2 is in inner parentheses, but its (I mean the pattern variable arg2 ) depth is still 1, which has the same depth as arg1

No. In (list '(+ 42 arg1 arg2 ...) ...) we have outer ... for arg1 (and its environment) and inner ... for arg2.

For example, in (quuz (1 2) (11 22 33)) arg2 ... = 11 22 33. Then we have (list '(+ 42 arg1 11 22 33) ...).

If the quzz is OK, I can't find any reason for the ce1 to fail

(a b c) ... in expansion is correct, when at least one of a b c is an ellipsis argument. So (+ 42 arg1 5 6) ... is OK, but (+ 42 5 6) ... is an error. arg2 ... expands inside inner parentheses into its value, so (+ 42 arg2 ...) is already expanded list, unless arg2 has no second level ellipsis.

1 Like

You are right, thanks.

The ellipsis mechanism is more complicated than I imagined....

For example, our old friend:

(define-syntax-rule (quuz (arg1 ...) (arg2 ...) ) 
  (list '(+ 42 arg1 arg2 ...) ...))

My old translation is:

`(list ,@(map (λ (arg1 arg2) `(+ 42 ,arg1 ,@(map (λ (arg1 arg2) arg2) arg1* arg2*))) arg1* arg2* ))

In most case, it is "correct" because I assumed arg1* and arg2* has the same length. If their length are different, the result will be different.

For example,

(syntax->datum (expand #'(quuz (1 2 3) (11 22))))
;; '(#%app list '(+ 42 1 11 22) '(+ 42 2 11 22) '(+ 42 3 11 22))

However,

`(list ,@(map (λ (arg1 arg2) `(+ 42 ,arg1 ,@(map (λ (arg1 arg2) arg2) '(1 2 3) '(11 22)))) '(1 2 3) '(11 22) ))
; ERROR: map: all lists must have same size

I once thought that we could write a truncate version of map to fix this error, but not!.

(define (map-truncate proc . lol)
  (let* ((min-len (apply min (map (λ (l) (length l)) lol)))
         (truncated-lol (map (λ (l) (take l min-len)) lol)))
    (apply map proc truncated-lol)))

`(list ,@(map-truncate (λ (arg1 arg2) `(+ 42 ,arg1 ,@(map-truncate (λ (arg1 arg2) arg2) '(1 2 3) '(11 22)))) '(1 2 3) '(11 22) ))
;; '(list (+ 42 1 11 22) (+ 42 2 11 22))

Moreover, you may write

(define-syntax-rule (quuz (arg1 ...) (arg2 ...) ) 
  (list '(+ 42 arg1 arg2 ...) ... '(arg2 ...))

and it will expand to

> (quuz (1 2 3) (11 22))
'((+ 42 1 11 22) (+ 42 2 11 22) (+ 42 3 11 22) (11 22))

I found an old paper that explained the precise behavior of ellipsis., see Kohlbecker's 1986 paper, Syntactic Extensions in the Programming Language Lisp. It use extend-syntax though, but I guess it is the predecessor of syntax-rules in Racket.

TR;DL, directly translating ellipsis to map might not be the best way to reason about the ellipsis system. The better way is through some new concepts and terminology (e.g. prototype, environment, ellipsis-depth, etc).

The most important concept is that there are two kinds of "depth":

  1. Each variable from a pattern has a depth, called dp.

  2. Each variable from a template also associated with a ellipsis-depth, called de. (N.B. different occurrences of the same variable may have different de.)

When a macro call matches a pattern, the macro expander will firstly build an environment based on the pattern and the macro call. That environment like @ryanc 's requirements.

For example,

(with-syntax ([(x ...) '(a b)]
              [((y ...) ...) '((1 2) (3 4 5))])
  #'((x (x ...) y ...) ...))

The macro expander firstly builds the environment {(x . (1 (a b))) (y . (2 ((1 2) (3 4 5)))}. The 1 and 2 are dp for the pattern variables x and y.

In order to transcribe a template, the exander just topdown transcribes its sub-templates, but with special treatment for prototypes (the term that immediately precedes an ellipsis is a prototype). For example, to transcribe ((x (x ...) y ...) ...), it will compute de for x and y. Here, the first the de of the 1st x is 1, 2nd x is 2, y is 2. The transcription can proceed, as long as at least one pattern variable dp = de. Obviously, the condition is currently satisfied, so we can proceed. Since the algorithm is topdown, so the result must be like ((a _ _) (b _ _)). This process was called "environment split" or decompose. For each splitted environment, its dp decreases 1. Then the expander recursively transcribes sub-templates with those splitted environments.

There are major 2 situations that make the algorithm indicate an error (assuming the ellipses are legal terminated ellipsis-lists, p.101).

  1. If dp < de for all variables the algorithm indicates an error (p.121)

    The counterexample 2 is under this situation.

  2. If dp > de for any variable the algorithm indicates an error (p.121)

In addition, if a transcription specification (template) prototype contains pattern variables extracted from more than one pattern prototype, the lists described by the various pattern components must be the same length in all calls to the macro being declared (p.101).

For example,

(define-syntax-rule (baz (arg1 ...) (arg2 ...) ) 
  (list '(+ 42 arg1 arg2) ...))
  
(syntax->datum (expand #'(baz (1 2) (11 22))))
;; '(#%app list '(+ 42 1 11) '(+ 42 2 22))

(syntax->datum (expand #'(baz (1 2) (11 22 333))))
;; syntax: incompatible ellipsis match counts for template

However, this condition seems too strong, because the following is also OK

(define-syntax-rule (quuz (arg1 ...) (arg2 ...) ) 
  (list '(+ 42 arg1 arg2 ...) ...))
  
(syntax->datum (expand #'(quuz (1 2 3) (11 22))))
;; '(#%app list '(+ 42 1 11 22) '(+ 42 2 11 22) '(+ 42 3 11 22))

IMO, this argument could be refined to "If a transcription specification (tempalte) prototype contains pattern variables with the same depth extracted from more than one pattern prototype, the lists described by the various pattern components must be the same length in all calls to the macro being declared."

Also noticed that (p.120),

When it encounters an ellipsis in the transcription specification, it determines the pattern variables occurring within that ellipsis’ prototype and restricts the environment to a smaller environment containing only the prototype variables (p.120).

So irrelevant pattern variables do not affect.

1 Like