Transparent Syntax Extensions

Hello everyone!

My question is: is there a way in Racket to implement transparent syntax extensions from inside the language?

What I mean is that I’d like to be able to (somehow) define a new syntax object of arbitrary form (with some kind of macro, or something) and transparently use it later in code, without messing with the compiler.

I've read that Racket allows completely different languages to be defined and used almost transparently within Racket code (Rhombus, for example, lets you call Racket functions defined in neighbouring files, and vice versa), but the core difference from what I want to achieve is that I want to mix my new definitions with plain Racket code within one block of code, not in separate files.

Any help would be appreciated!

Do you mean something like this:

#lang racket

(define-syntax-rule (Selfofly first-body-form body-forms ...)
  (begin
    (printf "ignoring the first body form ~a\n" 'first-body-form)
    (printf "here is a new form of syntax\n")
    (printf "it can be used within the same file and module right away\n")
    body-forms ...))

(Selfofly (/ 1 0))

(Selfofly (+ 2 2) 'is 4)

(define (some-function x)
  (* 10 x))

(Selfofly 'i 'can 'call (some-function 2) 'too)

No, that's not exactly what I mean. This does produce new syntax, but it is not of arbitrary form: here it must be an S-expression. What I want is to be able to inject pieces of code of ANY form, as long as I'm able to parse them somehow.

If you manipulate the #lang, you might be able to get a modicum of mixing going. But that’s above my pay grade 🙂

Can you give an example of a non–S-expression syntax you want to produce?

I’m not entirely clear what you have in mind, and I think we might be using terms in different ways.

When you say “produce,” I first think of the result of a macro transformer function. That result is fundamentally just a data structure. S-expression concrete syntax is one way of writing down a data structure, but you could write the result of @EmEf’s example using some other concrete syntax, like Sweet expressions or @-expressions:

#lang at-exp racket
(define-syntax-rule (Selfofly first-body-form body-forms ...)
  @begin[
    @printf["ignoring the first body form ~a\n" 'first-body-form]
    @printf{here is a new form of syntax~n}
    @printf{it can be used within the same file and module right away~n}
    body-forms ...])
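To make the "just a data structure" point concrete, here is a minimal sketch (not from the thread; swap-args is a made-up name). The transformer is an ordinary compile-time function: it takes the use site apart with syntax->list, builds its result as a plain list, and wraps that list back up with datum->syntax.

```racket
#lang racket
;; swap-args is a made-up macro: its transformer is an ordinary
;; compile-time function from syntax to syntax.
(define-syntax (swap-args stx)
  ;; take the use site (swap-args f a b) apart into a plain list
  (define parts (syntax->list stx))
  (unless (and parts (= (length parts) 4))
    (raise-syntax-error #f "expected (swap-args f a b)" stx))
  ;; build the result as an ordinary list, (f b a), then wrap it as syntax
  (datum->syntax stx (list (cadr parts) (cadddr parts) (caddr parts))))

(swap-args - 1 10)     ; expands to (- 10 1)
(swap-args list 'a 'b) ; expands to (list 'b 'a)
```

Whatever concrete syntax was used to write the macro, the expander only ever sees the resulting tree.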

Likewise, when you write, “define a new syntax object of arbitrary form”, I think of define-ing a syntax object like this:

$ racket
Welcome to Racket v8.17 [cs].
> (define stx
    #'(+ 1 2))
> stx
#<syntax:string:2:4 (+ 1 2)>

But I wonder if, instead, you mean a macro like include-algol or literal-algol (or maybe something more like #lang at-exp, even):

#lang at-exp racket
(require algol60/algol60)
@literal-algol{
  begin
    printsln (`hello world')
  end
}

Then again, maybe that isn’t what you mean, because, while literal-algol uses non–S-expression syntax, it expands to:

(#%module-begin
 (require algol60/algol60)
 (begin
   (define other99520
     (lambda () (printsln (lambda (val) (void)) (lambda () "hello world"))))
   (other99520)))

Could you be needing the tools described in 13.7 Reader Extension?

-- hendrik
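As a rough illustration of what a reader extension looks like (an editorial sketch, not taken from the Reference; the « » delimiters and the names read-raw-text and read-with-guillemets are arbitrary choices), this readtable entry takes over the character stream right after « and returns everything up to » as a raw string, which some other parser could then interpret:

```racket
#lang racket
;; Reader-extension procedure: called after the reader has consumed «.
;; Optional arguments let it accept both the 2-argument (read) and the
;; 6-argument (read-syntax) calling conventions.
(define (read-raw-text ch in [src #f] [line #f] [col #f] [pos #f])
  (let loop ([acc '()])
    (define c (read-char in))
    (cond
      [(eof-object? c)
       (raise-read-eof-error "expected a closing »" src line col pos 1)]
      [(char=? c #\») (list->string (reverse acc))] ; no escapes at all
      [else (loop (cons c acc))])))

;; Extend the current readtable; « also terminates a symbol written before it.
(define guillemet-readtable
  (make-readtable (current-readtable) #\« 'terminating-macro read-raw-text))

(define (read-with-guillemets str)
  (parameterize ([current-readtable guillemet-readtable])
    (read (open-input-string str))))

(read-with-guillemets "(algol «begin printsln (`hi') end»)")
```

The backquote and quote inside « » are never seen by the normal Racket reader; the extension decides what every character means.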

Well, I may not get my wording right, as I'm only vaguely familiar with Racket, or, to be honest, with Lisps in general, but I'll try to explain my request in more detail.

So, I'm currently researching the possibilities of metaprogramming in different languages. I'm looking for a language in which it is possible, from inside the language, to manipulate the language's grammar via its macro system. I want to be able to define new types of AST nodes and write parsers for them, all within the boundaries of the language itself, making it self-extensible.

This would ultimately allow ANY syntax extension, meaning I could mix the language in question with C++ if I were determined enough to implement a C++ parser with this tooling.

So the actual question is: is it possible to do something like that in Racket, or to achieve any similar behaviour?

Here is an example of what I'd like to be able to do:

class Selfofly {
    method doCoolStuff(int x, int y) {
        (+ x y)
    }
}

(Selfofly.doCoolStuff 2 2)

First, I want to make sure I understand what you're saying. I think you're saying you'd like to be able to add (for instance) a curly-brace-java-like syntax, and intermingle this with the parenthesized syntax, is that right? So, to be very specific, it appears to me that you would definitely not be happy with the parenthesized version of this:


(class Selfofly
  (method doCoolStuff ([int x] [int y])
          (+ x y)))

(Selfofly doCoolStuff 2 2)
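For comparison, the parenthesized version is something ordinary Racket macros can already handle. This is a toy sketch under stated assumptions: class and method here are hypothetical forms unrelated to racket/class, the int annotations are simply ignored, and each class name becomes a macro that dispatches on the method name at compile time.

```racket
#lang racket/base
;; Toy, hypothetical class/method forms (NOT racket/class).
(require (for-syntax racket/base))

(define-syntax (class stx)
  (syntax-case stx (method)
    [(_ cname (method mname ([type arg] ...) body ...) ...)
     ;; one fresh module-level function name per method; types are ignored
     (with-syntax ([(impl ...) (generate-temporaries #'(mname ...))])
       #'(begin
           (define (impl arg ...) body ...) ...
           ;; the class name itself becomes a macro: (cname mname arg ...)
           (define-syntax (cname s)
             (syntax-case s (mname ...)
               [(_ mname . args) #'(impl . args)] ...))))]))

(class Selfofly
  (method doCoolStuff ([int x] [int y])
          (+ x y)))

(Selfofly doCoolStuff 2 2) ; calls the generated function for doCoolStuff
```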

If I'm incorrect, then please disregard the following. If I'm right, though, then let me ask: would you like (in the system you're proposing) to also be able to build a system where the close brace on the class is omitted? That is, would you like to be able to make this a legal program?

class Selfofly {
    method doCoolStuff(int x, int y) {
        (+ x y)
    }

(Selfofly.doCoolStuff 2 2)

?

If I'm understanding you correctly, the challenge in building a system such as the one you describe comes when trying to mix together the various syntactic extensions; it's very easy to design an ambiguous system, where there are many different legal parsings of the same text. Consider, for instance, a system where indentation is significant, intermixed with one where it's not. Things get interesting quickly.

I think you might be interested in reading more about Rhombus and about Shrubbery notation. In my opinion, one of the key syntactic insights of the last fifty years, and one that's not yet widely understood, is the separation between "reader" and "parser": intermingling macros defined in separate places is made plausible by first agreeing on a common set of "shape" rules. In Lisp-like languages, for instance, this common shape is defined by parenthesizing everything. In languages such as Python and Rhombus, by contrast, you see things like indented blocks preceded by a line ending with a colon, parenthesized lists with elements separated by commas, et cetera.
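The reader half of that split can be seen directly: Racket's read turns text into nested lists purely by shape, with no idea what method or int mean (a minimal sketch; read-one is a made-up helper):

```racket
#lang racket
;; read-one is a made-up helper: read a single datum from a string.
(define (read-one str)
  (read (open-input-string str)))

;; The reader groups by shape only; [ ] are just another kind of parenthesis.
(read-one "(method doCoolStuff ([int x] [int y]) (+ x y))")
;; => '(method doCoolStuff ((int x) (int y)) (+ x y))
```

Giving method and int any meaning is left entirely to the macros that later consume this tree.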

One compelling piece of negative evidence comes from early versions of Rust, which I got to work on back in 2011 or so; the macro system there was initially designed to allow macros to specify their own internal parsers, that would consume tokens until they decided that they were done. In my opinion, this made for a clunky and difficult-to-use system, where any macro-designer had to be sensitive to the structure of the internal set of AST nodes, and made macro hygiene nearly impossible.

This is not to say that you can't build such a system, though! Racket is totally set up to allow you to define your own #lang, and give it its own reader and parser; you can define the way in which code is parsed, right down to the character level; Rhombus is a perfect example of this.

John
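As a minimal sketch of the reader side of a #lang (the language, its line-based grammar, and the name mini-read-syntax are all hypothetical), a reader can consume raw characters however it likes and hand back an ordinary module form for the expander:

```racket
#lang racket
;; Reader for a hypothetical "mini" language in which each line is either
;;   let NAME = NUMBER     or     print NAME
(provide (rename-out [mini-read-syntax read-syntax]))

(define (parse-line line)
  (match (string-split line)
    ['() #f] ; blank line
    [(list "let" name "=" num)
     `(define ,(string->symbol name)
        ,(or (string->number num) (error 'mini "not a number: ~a" num)))]
    [(list "print" name) `(displayln ,(string->symbol name))]
    [_ (error 'mini "cannot parse line: ~s" line)]))

;; The reader sees raw text and returns a (module ...) form.
(define (mini-read-syntax src in)
  (define forms (filter values (map parse-line (port->lines in))))
  (datum->syntax #f `(module mini-program racket/base ,@forms)))
```

To actually write #lang mini at the top of a file, a module like this would have to be installed where Racket looks for readers, for example as mini/lang/reader.rkt in a collection, and a complete reader would normally export read as well.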

This looks promising, so I'll be looking into it in detail, thanks!

It is actually exactly what I'm looking for! 🙂 I'm interested in exploring these ambiguities in my thesis, so I need a system with as much freedom as possible in terms of creating them.

That's actually interesting, thanks! I'll surely be looking into it!

Now, speaking of forms: I've read about Rhombus and Shrubbery notation, and I get the idea. Let's say that we stick not just to well-shaped forms of any kind, but to valid Rhombus syntax, just as an example. Does Racket allow extending its reader so that just these two languages can be mixed seamlessly?

The tools @hendrikboom3 mentioned are one approach, modifying the reader to teach it about a new kind of concrete syntax. That's how #lang at-exp or #lang scribble/manual add @-expressions.

The literal-algol macro I mentioned works differently, entirely at the level of what @jbclements calls the "parser". I used @-expressions because they are a nicer way to write this, but it could also have been written as:

#lang racket
(require algol60/algol60)
(literal-algol "begin\n"
               "  printsln (`hello world')\n"
               "end")

In this case, the literal-algol macro transformer receives string literals at compile time and is responsible for parsing them in the traditional character-by-character sense. This violates the separation @jbclements wrote about, and it makes it very hard to escape from Algol 60 back to Racket. (In fact, literal-algol doesn't even try to make that possible.) But it has some appeal, because it lets the normal Racket reader deal with delimiting and such, avoiding the things @jbclements explained make reader extension difficult.
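The same approach can be sketched in a few lines. literal-rpn below is a hypothetical stand-in for literal-algol: the Racket reader only delivers a string literal, and the transformer splits and parses that string itself at compile time, here as reverse-Polish arithmetic:

```racket
#lang racket
;; Hypothetical literal-rpn macro: parses its string argument at compile time.
(require (for-syntax racket/base racket/string))

(define-syntax (literal-rpn stx)
  (syntax-case stx ()
    [(_ src)
     (string? (syntax-e #'src))
     (let* ([ops (hash "+" #'+ "-" #'- "*" #'* "/" #'/)]
            [result
             (for/fold ([stack '()])
                       ([tok (in-list (string-split (syntax-e #'src)))])
               (cond
                 [(hash-ref ops tok #f)
                  => (lambda (op)
                       (unless (>= (length stack) 2)
                         (raise-syntax-error #f (format "too few operands for ~a" tok) stx #'src))
                       ;; pop the two top operands, push the combined expression
                       (cons #`(#,op #,(cadr stack) #,(car stack)) (cddr stack)))]
                 [(string->number tok)
                  => (lambda (n) (cons (datum->syntax #'src n #'src) stack))]
                 [else
                  (raise-syntax-error #f (format "unknown token ~a" tok) stx #'src)]))])
       (unless (= (length result) 1)
         (raise-syntax-error #f "expected exactly one result" stx #'src))
       (car result))]))

(literal-rpn "1 2 + 3 *") ; expands to (* (+ 1 2) 3)
```

Escaping back to Racket from inside the string would require this private parser to call back into Racket's reader and expander, which is exactly the difficulty described above.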

The Scribble reader is an interesting intermediate point. While we mostly use it to write text in a nice way, it is a generalized system for mixing concrete syntax in a principled way. The make-at-readtable API lets you control the parsing of the parts, and multiple extensions can be used at once as long as they have different command characters (like @ in the normal mode). The Scribble reader framework manages the tricky delimiting issues, while you get control over a well-defined area. Hypothetically, you could switch between Algol and Racket like this:

#lang algol-exp racket
(require algol60/interop)
💻{
  begin
    printsln (💻[(string-append "Hello" "World")])
  end
}
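The real make-at-readtable from scribble/reader can be tried on its own. This sketch assumes Racket's main distribution (which includes Scribble) and picks $ as the command character purely for illustration; read-dollar-exp is a made-up helper:

```racket
#lang racket
;; Assumes scribble/reader is installed (it ships with the main distribution).
(require scribble/reader)

;; The usual @-expression rules, but with $ as the command character.
(define dollar-readtable (make-at-readtable #:command-char #\$))

;; read-dollar-exp is a made-up helper for trying the readtable out.
(define (read-dollar-exp str)
  (parameterize ([current-readtable dollar-readtable])
    (read (open-input-string str))))

;; The text in braces comes back as a string for some other parser to handle.
(read-dollar-exp "$algol{begin printsln (`hi') end}")
;; Square-bracket arguments are read as ordinary Racket data.
(read-dollar-exp "$f[1 2]{x}")
```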

If you're looking at this sort of thing from a research perspective, you might be interested in:

I was looking for more of @elibarzilay’s writing about Scribble, as the things I was remembering weren’t discussed in detail in “The Scribble Reader: An Alternative to S-expressions for Textual Content”. The most relevant thing I remembered is an old mailing list thread:

Separately, for a more thorough introduction to these parts of Racket, you might consider:

One issue here is what defines a block of code and how fine grained that can be. Take this example that mixes TypeScript and Racket.

function add(x : number, y : number) : number
{
    return (+ x y) // Racket block here... 
}

Well, this isn't correct, as a return statement in TS requires a semicolon. Fine, an easy fix.

Now consider this...

function make_pair(x : number, y : number) : number
{
    return [x y] // Racket block here?
}

This is either a Racket block or a TS array literal with a bug in it (a missing comma), and a parser can't determine the intent in general. So it has to present the ambiguity to the programmer and let them decide.

This requires a parser that, for every possible AST node, parses the text as every known language and then backtracks as needed. That is a big explosion in complexity.
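The "try every language" idea can be sketched with two toy recognizers, both hypothetical: one uses Racket's own reader, and the other is a regular expression standing in for a real TypeScript array-literal parser. Returning every language that accepts a snippet makes the ambiguity explicit instead of resolving it silently:

```racket
#lang racket
;; Toy recognizer 1: does the text read as exactly one Racket datum?
(define (racket-datum? s)
  (with-handlers ([exn:fail:read? (lambda (e) #f)])
    (let* ([in (open-input-string s)]
           [v (read in)])
      (and (not (eof-object? v)) (eof-object? (read in))))))

;; Toy recognizer 2 (hypothetical stand-in): a comma-separated TS array literal.
(define (ts-array? s)
  (regexp-match? #px"^\\[\\s*(?:\\w+(?:\\s*,\\s*\\w+)*)?\\s*\\]$" s))

(define known-parsers
  (list (cons 'racket racket-datum?)
        (cons 'ts ts-array?)))

;; Every language that accepts the snippet; more than one means ambiguity.
(define (accepting-languages s)
  (for/list ([entry (in-list known-parsers)]
             #:when ((cdr entry) s))
    (car entry)))

(accepting-languages "[x y]")  ; only the Racket reader accepts it
(accepting-languages "[x, y]") ; both accept it
```

Even the "fixed" TS version is ambiguous here, because the Racket reader accepts [x, y] as (x (unquote y)).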

I then thought, let's add an #endlang directive:

#lang racket
...
#endlang

#lang ts
...
#endlang
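Splitting a file on such directives is the easy part. A sketch of it (split-lang-blocks is a hypothetical name, and text outside any block is simply ignored) might look like this, after which each (language . text) pair could be handed to that language's own reader:

```racket
#lang racket
;; Cut text into (language . source-text) pairs using #lang ... #endlang lines.
(define (split-lang-blocks text)
  (define-values (blocks open-lang open-lines)
    (for/fold ([blocks '()] [lang #f] [lines '()])
              ([line (in-list (string-split text "\n" #:trim? #f))])
      (cond
        [(string-prefix? line "#lang ")
         (when lang (error 'split-lang-blocks "#lang inside an open ~a block" lang))
         (values blocks (string-trim (substring line 6)) '())]
        [(string=? (string-trim line) "#endlang")
         (unless lang (error 'split-lang-blocks "#endlang without #lang"))
         (values (cons (cons lang (string-join (reverse lines) "\n")) blocks) #f '())]
        [lang (values blocks lang (cons line lines))] ; inside a block: keep line
        [else (values blocks lang lines)])))          ; outside: ignore line
  (when open-lang
    (error 'split-lang-blocks "missing #endlang for ~a" open-lang))
  (reverse blocks))

(split-lang-blocks "#lang racket\n(+ 1 2)\n#endlang\n#lang ts\nlet x = 1;\n#endlang")
```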

Then the question becomes whether there is enough utility in having multiple languages in a single file versus just having separate files. Personally, I would prefer separate files, because the context switch of opening a file is a good thing: I am using a new language, after all.

Also, as background, you have been able to mix C# and F# libraries in a single executable on .NET for quite some time. In practice, it is never done, as it ends up compromising both the C# and F# code. A Racket and TypeScript combination would end up as a compromise for the same reasons.

So the value proposition of mixing languages is very complicated, and one can argue that the technical complications don't merit the benefits that could actually be gained.

Nathan Dykman

Thanks everyone for your answers, quite a lot to think and read about!

An example of using the Scribble reader this way is in @jeapostrophe’s #lang remix RacketCon talk at 5:47.

He even gets syntax highlighting and binding arrows working for this!

#lang remix
(require remix/stx0
         remix/datalog0)
(define graph (make-theory))
@datalog[graph]{
 edge(a, b). edge(b, c). edge(c, d). edge(d, a).
 path(X, Y) :- edge(X, Y).
 path(X, Y) :- edge(X, Z), path(Z, Y).
 path(X, Y)?
}

This test file from the repository has a lot of examples: remix/tests/racket/remixd.rkt in jeapostrophe/remix on GitHub


From a different direction, Leif’s thesis (defense talk, shorter talk) on interactive and visual syntax extensions might be interesting.

I am reminded of:

Honu: Syntactic Extension for Algebraic Notation through Enforestation

A few ideas and some information about this large topic:

As mentioned, a good tutorial about readers and parsing (the one I personally started with) is this one:

About swapping to another parser in the same file: this should require some directive, as in SRFI 105 "curly infix", where you use #!curly-infix to say that you are entering curly-infix mode with { }.
But a directive is not even needed if your parser handles both the normal Scheme syntax with parentheses ( ) and the syntax with { }, as is the case in SRFI 105 (SRFI 105: Curly-infix-expressions).
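For a feel of how the basic SRFI 105 rules map onto Racket's reader-extension mechanism, here is a sketch that rebinds { in a readtable. It covers only a subset of the SRFI (no mixed operators, no neoteric expressions), and read-curly and read-curly-infix are made-up names:

```racket
#lang racket
;; Subset of SRFI 105 curly-infix:
;;   {} => ()   {e} => e   {e1 e2} => (e1 e2)   {a op b op c} => (op a b c)
(define (skip-whitespace! in)
  (let loop ()
    (define c (peek-char in))
    (when (and (char? c) (char-whitespace? c))
      (read-char in)
      (loop))))

(define (curly->prefix items)
  (define n (length items))
  (define ops (for/list ([x (in-list items)] [i (in-naturals)] #:when (odd? i)) x))
  (define args (for/list ([x (in-list items)] [i (in-naturals)] #:when (even? i)) x))
  (cond
    [(= n 0) '()]
    [(= n 1) (car items)]
    [(= n 2) items]
    [(and (odd? n) (andmap (lambda (op) (equal? op (car ops))) ops))
     (cons (car ops) args)]
    [else (error 'curly-infix "not a simple curly-infix list: ~s" items)]))

;; Called after the reader has consumed {; reads data up to the matching }.
(define (read-curly ch in [src #f] [line #f] [col #f] [pos #f])
  (let loop ([acc '()])
    (skip-whitespace! in)
    (define c (peek-char in))
    (cond
      [(eof-object? c) (raise-read-eof-error "expected a closing }" src line col pos 1)]
      [(char=? c #\}) (read-char in) (curly->prefix (reverse acc))]
      [else (loop (cons (read/recursive in) acc))]))) ; nested { } recurse here

(define curly-readtable
  (make-readtable (current-readtable) #\{ 'terminating-macro read-curly))

(define (read-curly-infix str)
  (parameterize ([current-readtable curly-readtable])
    (read (open-input-string str))))

(read-curly-infix "(f {x * {y - z}})")
```

Because ordinary ( ) reading is untouched, prefix and infix notation coexist in the same text without any directive.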

If you want to use indentation, then a directive should be mandatory. For example, SRFI 110 allows sweet-expressions, which replace parentheses with indentation (SRFI 110: Sweet-expressions (t-expressions)), because it would be hard for a parser to switch from fully parenthesized syntax to indented syntax within the same source file without a directive announcing that you are entering one mode or the other: the #!sweet directive, in this case.

You can imagine a #!rhombus directive, or some kind of algorithm (I have written some, starting from finite state machines) that detects the syntax and language of the next line or expression of your source code. That is perhaps not so easy with indentation; I don't know Shrubbery notation well enough to have a firm opinion.

Yes, a language can become ambiguous, for example if you want to get rid of { } and use ( ) for both prefix and infix. This is what I allow in Scheme+ (Scheme+ for Racket).
Parsing then requires an algorithm (based on finite state machines and other things) to decide whether an expression is infix or prefix.

To do that, and for language construction in general, in Scheme you can use:

  • macros
  • "external parsing" (a reader, in Racket terminology)
  • syntax transformers in modern Scheme: Syntactic Extension
  • Racket's extensions to standard Scheme, which can help a lot

Regards,

Have you seen the Infix Expressions for Racket package?