Extending the reader to support Kawa-like treatment of colons

panicz · April 5, 2025, 10:00am

I posted this message on Google Groups before learning that the group is (likely) no longer used, so I'm cross-posting it here:

I've had an idea to implement a #lang that would provide syntactic extensions akin to those of Kawa Scheme, particularly with regard to the use of the colon.

In particular, in Kawa:

- the sequence x: will be read as a keyword, equivalent to #:x
- the sequence a:b will be read as ($lookup$ a (unquote b))
- likewise, the sequence a:b:c will be read as ($lookup$ ($lookup$ a (unquote b)) (unquote c)), and so on
- if two colons appear in a row, as in, say, a::b, they will be treated as a separate token, i.e.

(call-with-input-string "a::b" read)

will return the symbol a

and

(call-with-input-string "(a::b)" read)

will return the list of three tokens, (a :: b)

Kawa treats each pair of colons as a separate token, so

(call-with-input-string "(:::::::)" read)

returns the list (:: :: :: :)

but I would actually prefer to treat a sequence of two or more colons as a single token.

In either case, I wonder if someone could guide me on how to do this, preferably using Racket's existing internal machinery.

I found that atoms in Racket are read by the read-symbol-or-number procedure from expander/read/symbol-or-number.rkt, but I don't think it is sufficiently tweakable, in particular because if I read a colon, I might have to unread it if I can peek another colon.

Is there any simple way out of this situation, or do I actually have to implement a Lisp reader from scratch? I'd be OK with that, because I already wrote a reader in Kawa that I think I could adapt, but then I'd probably have some questions about dealing with syntax objects. The alternative would be to modify Racket's sources to make the reader API more flexible, e.g. by allowing the expander's read to be parameterized with the read-symbol-or-number function. I think that would be more in the spirit of Racket, but it would probably also take more time to propagate downstream.

Thanks in advance,

Panicz

soegaard · April 5, 2025, 12:07pm

Hi Panicz,

Since the standard reader reads x: and a:b etc. as symbols, I'll suggest the following:

- use the existing reader to get a syntax object
- do a pass over the syntax object and replace the symbols that contain colons

This approach has the advantage that you don't need to write your own lexer/parser.
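To check the premise that the default reader leaves colons alone, here is a minimal sketch that reads every datum from a string. `read-all-from-string` is a hypothetical helper defined here, not a library function:

```racket
#lang racket/base
;; Read every datum from a string with Racket's default reader.
;; `read-all-from-string` is a hypothetical helper, not a library function.
(define (read-all-from-string str)
  (define in (open-input-string str))
  (let loop ([acc '()])
    (define datum (read in))
    (if (eof-object? datum)
        (reverse acc)
        (loop (cons datum acc)))))

;; Colons are ordinary symbol constituents, so each of these is one symbol.
(read-all-from-string "x: a:b a:b:c first::last")
;; => the list (x: a:b a:b:c first::last)
```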

In super I used that technique to add support for id[expr].
The standard reader would produce (id (expr)),
but the syntax object has information on the original type of bracket used
and on whether there was whitespace between id and [expr].

You can see how I did it here:

github.com/soegaard/super (file lang/reader.rkt, branch main)

#lang racket/base
;; The function `make-meta-reader` is used to implement meta languages
;; that adjust an existing language.
;; We want
;;    #lang super <lang>
;; to behave mostly as <lang>, but we want to:
;;   - use #%app and #%top from super/main
;;   - adjust forms of the type:
;;        id[expr]
;;    (an expression consisting of an identifier followed directly
;;     by an expression in square brackets)
;;    The expression
;;        id[expr]
;;    is rewritten to
;;      (#%ref id expr ...).

(require (only-in syntax/module-reader make-meta-reader))

;; The procedure `make-meta-reader` produces adjusted versions
;; of `read`, `read-syntax` and `read-get-info`.

(This file has been truncated.)
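The bracket and whitespace information mentioned above can be queried roughly as follows. This is a sketch of the idea under my own assumptions, not the code used in super, and the helper names are hypothetical. It relies on the 'paren-shape property that read-syntax attaches to bracketed lists, and compares source positions to check that nothing separates the identifier from the opening bracket:

```racket
#lang racket/base
;; Sketch: detect `id[expr]`, i.e. an identifier followed immediately
;; (no whitespace) by a square-bracketed form. Hypothetical helpers;
;; not the implementation used in `super`.

(define (read-syntax-from-string str)
  (define in (open-input-string str))
  (port-count-lines! in)              ; also track line/column information
  (read-syntax 'example in))

;; #t when `next` uses square brackets and starts exactly where `id` ends.
(define (id-followed-by-bracket? id next)
  (and (identifier? id)
       (eqv? (syntax-property next 'paren-shape) #\[)
       (let ([id-pos (syntax-position id)]
             [id-span (syntax-span id)]
             [next-pos (syntax-position next)])
         (and id-pos id-span next-pos
              (= (+ id-pos id-span) next-pos)))))

(define elems (syntax->list (read-syntax-from-string "(v[0] w [1])")))
(id-followed-by-bracket? (list-ref elems 0) (list-ref elems 1)) ; => #t
(id-followed-by-bracket? (list-ref elems 2) (list-ref elems 3)) ; => #f
```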

In your case, you can do a recursive descent over the syntax object
and rewrite every identifier that contains colons.
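A minimal sketch of such a pass, assuming the Kawa conventions listed in the original post (x: becomes #:x, and a:b becomes ($lookup$ a (unquote b))). The names `adjust` and `rewrite-identifier` are hypothetical, and names containing :: are deliberately left untouched:

```racket
#lang racket/base
;; Hypothetical adjustment pass over the output of `read-syntax`, following
;; the Kawa conventions from the original post:
;;   x:     =>  #:x
;;   a:b    =>  ($lookup$ a (unquote b))
;;   a:b:c  =>  ($lookup$ ($lookup$ a (unquote b)) (unquote c))
;; Names containing :: (and a lone : or ::) are left untouched here.

(define (rewrite-identifier id)
  (define name (symbol->string (syntax-e id)))
  ;; New identifier with id's lexical context and source location.
  (define (sym->id s) (datum->syntax id (string->symbol s) id id))
  (cond
    ;; one trailing colon: "x:" becomes the keyword #:x
    [(regexp-match #rx"^([^:]+):$" name)
     => (lambda (m) (datum->syntax id (string->keyword (cadr m)) id id))]
    ;; single colons between non-empty parts: nested $lookup$ forms
    [(regexp-match? #rx"^[^:]+(:[^:]+)+$" name)
     (let ([parts (regexp-split #rx":" name)])
       (for/fold ([acc (sym->id (car parts))])
                 ([p (in-list (cdr parts))])
         (datum->syntax id
                        (list (sym->id "$lookup$")
                              acc
                              (list (sym->id "unquote") (sym->id p)))
                        id)))]
    [else id]))

;; Recursive descent: rewrite identifiers, rebuild proper lists, and return
;; everything else (numbers, strings, vectors, dotted pairs) unchanged.
(define (adjust stx)
  (cond
    [(identifier? stx) (rewrite-identifier stx)]
    [(syntax->list stx)
     => (lambda (elems)
          ;; the final `stx` argument copies syntax properties
          ;; such as 'paren-shape onto the rebuilt list
          (datum->syntax stx (map adjust elems) stx stx))]
    [else stx]))

(syntax->datum
 (adjust (read-syntax 'example (open-input-string "(f x: 1 a:b a:b:c)"))))
;; => the datum (f #:x 1 ($lookup$ a (unquote b))
;;                      ($lookup$ ($lookup$ a (unquote b)) (unquote c)))
```

Because the rebuilt lists reuse the original syntax objects as context, source locations and properties survive the rewrite, so error messages still point at the original text.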

LiberalArtist · April 5, 2025, 1:24pm

An alternative could be adding a terminating macro to the readtable, but I think you'd still need some wrapping for the keyword case (see 13.7.1 Readtables in the Racket Reference).
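A sketch of that readtable idea, assuming the colon should end symbols and that a run of colons reads as a single symbol. `read-colon-run` and `colon-readtable` are hypothetical names; the reader procedure uses case-lambda so that it works both for read (two arguments) and read-syntax (six arguments). With this readtable, x: and a:b come back as separate data, so the keyword and lookup cases would still need a pass afterwards:

```racket
#lang racket/base
;; Sketch: make #\: a terminating macro, so it ends any symbol it appears in,
;; and read a run of colons (":", "::", ":::", ...) as a single symbol.
;; Hypothetical names; keywords (x:) and lookups (a:b) still need a
;; separate pass, because they now read as x : and a : b.

;; Called after the first #\: has already been consumed.
(define (read-colon-run in)
  (let loop ([n 1])
    (cond
      [(eqv? (peek-char in) #\:) (read-char in) (loop (add1 n))]
      [else (string->symbol (make-string n #\:))])))

(define colon-readtable
  (make-readtable
   (current-readtable)
   #\: 'terminating-macro
   (case-lambda
     ;; `read` mode: the character and the port
     [(ch in) (read-colon-run in)]
     ;; `read-syntax` mode: also source-location information
     [(ch in src line col pos)
      (define sym (read-colon-run in))
      (datum->syntax #f sym
                     (list src line col pos
                           (string-length (symbol->string sym))))])))

(parameterize ([current-readtable colon-readtable])
  (read (open-input-string "(first::last x: a:b :::::::)")))
;; => the list (first :: last x : a : b :::::::)
```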

panicz · April 5, 2025, 1:44pm

Thanks. I've been thinking about that.
The main problem is that first::last needs to be read as 3 separate tokens: the first call to read should return first, the second should return ::, and so on. If I read a whole symbol and then split it, I get a list of 3 symbols, which is not quite what I want.

Other than that, there is some potential in this solution: I could just require a space separator before ::, and then install a readtable extension for the colon. It may not be perfect, but maybe it will suffice.

soegaard · April 5, 2025, 2:28pm

The standard reader would read:

(foo first::last)

as a list of the symbol foo followed by the symbol first::last.

The adjustment pass then detects colons in symbols and rewrites

(foo first::last)

to

(foo first :: last).

That is, since you have the entire AST, you don't need to rewrite first::last into a nested list (first :: last); the three symbols can be spliced directly into the enclosing list.
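The splicing can be sketched like this, as one possible implementation (the helper names are hypothetical). Each list is rebuilt, and an identifier such as first::last contributes three elements to it instead of one; an identifier that is not inside any list has nowhere to be spliced and is left unchanged:

```racket
#lang racket/base
;; Sketch of the splicing step: inside every list, an identifier whose name
;; contains a run of two or more colons is replaced by several identifiers,
;; e.g. (foo first::last) becomes (foo first :: last).

;; "first::last" -> '("first" "::" "last"); single colons stay in place
;; and a run of 2+ colons stays together as one piece.
(define (split-double-colons name)
  (define parts (regexp-split #rx"::+" name))  ; text between colon runs
  (define seps (regexp-match* #rx"::+" name))  ; the colon runs themselves
  (filter (lambda (s) (not (string=? s "")))
          (let loop ([ps parts] [ss seps])
            (if (null? ss)
                ps
                (list* (car ps) (car ss) (loop (cdr ps) (cdr ss)))))))

(define (splice-double-colons stx)
  (define elems (syntax->list stx))
  (if elems
      (datum->syntax
       stx
       (apply append
              (for/list ([e (in-list elems)])
                (if (and (identifier? e)
                         (regexp-match? #rx"::" (symbol->string (syntax-e e))))
                    ;; one identifier in, several identifiers out
                    (for/list ([piece (in-list (split-double-colons
                                                (symbol->string (syntax-e e))))])
                      (datum->syntax e (string->symbol piece) e e))
                    (list (splice-double-colons e)))))
       stx stx)
      stx))

(syntax->datum
 (splice-double-colons
  (read-syntax 'example (open-input-string "(foo first::last (a:b::c))"))))
;; => the list (foo first :: last (a:b :: c))
```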

LiberalArtist · April 5, 2025, 3:02pm

With a terminating macro, : would end a symbol (unless escaped), so no whitespace would be needed. It's similar to the way abc"def"2 reads as three separate data: abc, "def" and 2.
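The analogy can be checked directly with the default reader; this small sketch reads the three data one at a time from the same port:

```racket
#lang racket/base
;; A double quote is a delimiter in Racket's default readtable, so it ends
;; the symbol abc without any whitespace; a terminating macro for #\:
;; would end symbols in the same way.
(define in (open-input-string "abc\"def\"2"))
(list (read in) (read in) (read in))
;; => the list (abc "def" 2)
```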
