Extending the reader to support Kawa-like treatment of colons

I posted this message on Google Groups before learning that it's (likely) no longer used, so I'm cross-posting it here:

I've had an idea to implement a #lang that would provide some syntactic extensions akin to those provided by Kawa Scheme, particularly with regard to the use of the colon.

In particular, in Kawa:

  • the sequence x: will be read as a keyword equivalent to #:x

  • the sequence a:b will be read as ($lookup$ a (unquote b))

  • likewise, the sequence a:b:c will be read as ($lookup$ ($lookup$ a (unquote b)) (unquote c)) and so on

  • if two colons appear in a row, as in, say a::b, they will be treated as a separate token, i.e.

(call-with-input-string "a::b" read)

will return the symbol a

and

(call-with-input-string "(a::b)" read)

will return the list of three tokens, (a :: b)

Kawa treats each pair of colons as a separate token, so

(call-with-input-string "(:::::::)" read)

returns the list (:: :: :: :)

but I would actually prefer to treat a sequence of two or more colons as a single token.

In either case, I wonder if someone could guide me on how to do this, preferably using Racket's internal machinery that's already out there.

I found that atoms in Racket are read by the read-symbol-or-number procedure from expander/read/symbol-or-number.rkt, but I don't think it is sufficiently tweakable, in particular because if I read a colon, I might have to unread it if I can peek another colon.

Is there any simple way out of this situation, or do I have to actually implement a Lisp reader from scratch? (I'd be OK with that, because I already wrote a reader in Kawa that I think I could adapt, but then I'd probably have some questions about dealing with syntax objects. The alternative would be to modify Racket's sources to make the reader API more flexible, e.g. by allowing the expander's read to be parameterized with the read-symbol-or-number function, which I think would be more in the spirit of Racket, but it would probably also take more time to reach downstream.)

Thanks in advance,

Panicz

Hi Panicz,

Since the standard reader reads x: and a:b etc. as symbols, I'll suggest the following:

  • use the existing reader to get a syntax object
  • do a pass over the syntax object and replace symbols containing colons

This approach has the advantage that you don't need to write your own lexer/parser.

In super I used that technique to add support for id[expr].
The standard reader would produce (id (expr)).
But the syntax object has information on the original type of bracket used
and on whether there was a space between id and [expr].

You can see how I did it here:

In your case, you can make a recursive descent and for each identifier,
rewrite it if it contains colons.
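A minimal sketch of such a pass, covering only the x: and a:b:c cases from the first post. The names rewrite-symbol, rewrite-tail and rewrite-colons are made up for this illustration, and symbols with empty segments (such as a::b) are deliberately left alone here:

```racket
#lang racket/base

;; Hypothetical helper: map one symbol to its Kawa-style reading.
;;   x:    -> #:x
;;   a:b:c -> ($lookup$ ($lookup$ a (unquote b)) (unquote c))
;; Anything else (no colon, or an empty segment as in a::b) is returned as is.
(define (rewrite-symbol sym)
  (define parts (regexp-split #rx":" (symbol->string sym)))
  (define (empty-part? p) (string=? p ""))
  (cond
    [(null? (cdr parts)) sym]                       ; no colon at all
    [(and (= (length parts) 2)
          (not (empty-part? (car parts)))
          (empty-part? (cadr parts)))               ; single trailing colon
     (string->keyword (car parts))]
    [(ormap empty-part? parts) sym]                 ; e.g. a::b or :x
    [else
     (for/fold ([acc (string->symbol (car parts))])
               ([p (in-list (cdr parts))])
       (list '$lookup$ acc (list 'unquote (string->symbol p))))]))

;; Walk the (possibly improper) list stored inside a syntax object.
(define (rewrite-tail e)
  (cond
    [(pair? e) (cons (rewrite-colons (car e)) (rewrite-tail (cdr e)))]
    [(syntax? e) (rewrite-colons e)]
    [else e]))

;; Recursive descent: rewrite identifiers, recur into pairs, and keep
;; everything else (strings, numbers, vectors, ...) unchanged.
;; Lexical context, source location and properties are copied from
;; the original syntax object.
(define (rewrite-colons stx)
  (define e (syntax-e stx))
  (cond
    [(symbol? e) (datum->syntax stx (rewrite-symbol e) stx stx)]
    [(pair? e) (datum->syntax stx (rewrite-tail e) stx stx)]
    [else stx]))
```

For instance, applying rewrite-colons to the syntax object for (f x: a:b "s:t") should give (f #:x ($lookup$ a (unquote b)) "s:t"); the string is untouched because only identifiers are inspected.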

An alternative could be adding a terminating macro to the readtable, but I think you'd still need some wrapping for the keyword case; see 13.7.1 Readtables in the Racket Reference.

Thanks. I've been thinking about that.
The main problem is that when I read first::last, they need to be read as 3 separate tokens, so calling read should first return first, then :: on its second call, whereas if I read a whole symbol and then chunk it, I will get a list of 3 symbols which is not quite what I want.

Other than this, there is some potential in this solution: I could just require there to be a space separator before ::, and then install a readtable extension for a colon. While it may not be perfect, maybe it will suffice.

The standard reader would read:

(foo first::last)

as a list of the symbol foo followed by the symbol first::last.

The "adjustment" pass then detects colons in symbols and rewrites

(foo first::last)

to

(foo first :: last).

That is, since you have the entire AST, the pass can splice the pieces directly into the enclosing list; you don't need to rewrite first::last into (first :: last).
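Here is a rough sketch of that splicing step on plain datums, assuming a run of two or more colons counts as one token (the preference stated in the first post). The names split-symbol, add-piece and splice-colons are invented for the example; a version for syntax objects would do the same walk via syntax-e and datum->syntax:

```racket
#lang racket/base

;; Hypothetical helper: add str[from, to) to acc as a symbol, skipping
;; empty pieces.
(define (add-piece str from to acc)
  (if (= from to)
      acc
      (cons (string->symbol (substring str from to)) acc)))

;; Split a symbol at every run of two or more colons, keeping the runs:
;;   first::last -> (first :: last)
;;   foo         -> (foo)
(define (split-symbol sym)
  (define str (symbol->string sym))
  (let loop ([start 0]
             [runs (regexp-match-positions* #rx"::+" str)]
             [acc '()])
    (if (null? runs)
        (reverse (add-piece str start (string-length str) acc))
        (let ([run (car runs)])
          (loop (cdr run)
                (cdr runs)
                (add-piece str (car run) (cdr run)
                           (add-piece str start (car run) acc)))))))

;; Splice the pieces of each symbol into the list that contains it.
;; A symbol that is not inside a list is returned unchanged, because
;; there is no enclosing list to splice into.
(define (splice-colons datum)
  (cond
    [(pair? datum)
     (let ([head (car datum)]
           [tail (splice-colons (cdr datum))])
       (if (symbol? head)
           (append (split-symbol head) tail)
           (cons (splice-colons head) tail)))]
    [else datum]))
```

With these definitions, (splice-colons '(foo first::last)) should produce the four-element list (foo first :: last).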

With a terminating macro, : would end a symbol (unless escaped), so no whitespace would be needed. It's similar to abc"def"2.
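A small sketch of the readtable variant, in case it helps. The names read-colons, colon-readtable and read-all-from-string are made up here; the sketch maps : to a terminating macro that swallows the whole run of colons and returns it as one symbol. It does not attempt the x: keyword case, which would still need a later pass:

```racket
#lang racket/base

;; Reader procedure for ':'. The triggering colon has already been
;; consumed; keep consuming adjacent colons and return the whole run
;; as a single symbol (:, ::, ::: and so on).
;; The last four arguments are only meaningful in read-syntax mode.
(define (read-colons ch in [src #f] [line #f] [col #f] [pos #f])
  (let loop ([n 1])
    (cond
      [(eqv? (peek-char in) #\:)
       (read-char in)
       (loop (add1 n))]
      [else (string->symbol (make-string n #\:))])))

;; Extend whatever readtable is current with the new mapping for ':'.
(define colon-readtable
  (make-readtable (current-readtable) #\: 'terminating-macro read-colons))

;; Read every datum from a string using the extended readtable.
(define (read-all-from-string str)
  (define in (open-input-string str))
  (parameterize ([current-readtable colon-readtable])
    (let loop ()
      (define v (read in))
      (if (eof-object? v)
          '()
          (cons v (loop))))))
```

Reading "first::last" this way should yield three separate data (first, then ::, then last) on successive calls to read, and "(a::b)" should read as the list (a :: b), with no whitespace needed around the colons.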