What would it take to write an independent Racket interpreter?

Hello all,

I was curious what it would take to "reimplement Racket". As a Scheme derivative, it has a small core language[1], but this includes only the language and no standard library. I found a nice folder[2] of tables with about 1700 primitives in the Chez Scheme implementation that might suffice as the minimal standard library.

It looks like the main language-level differences compared with e.g. SICP Scheme are:

  1. Syntax objects: (quote-syntax)
  2. Modules. Including multiple phases, since (begin-for-syntax) is in the core language
  3. Delimited continuations: (with-continuation-mark)

Implementing on interpreter for that core grammar would get you "Racket the language but with empty standard library", and then implementing the 1700 primitives might get you the full language.

My question is: Would this be enough to reimplement Racket in its entirety, and run all the unit tests for the entire PLaneT package repository?

I'm not planning to do this; I'm just curious what it would take.

[1] 1.2 Syntax Model

[2] racket/kernel.ss at master · racket/racket · GitHub

3 Likes

One difficulty in answering this question is pinning down what Racket "in its entirety" means. Much of Racket is implemented in Racket.

If you think about the main Git repository, it is organized as:

https://github.com/racket/racket
├── pkgs/
│   └── ...
└── racket/
    ├── collects/
    │   └── ...
    └── src/
        └── ...

A useful boundary is between the contents of the racket/src/ directory, which contains the "Racket VM" implementations, and everything else, which contains code written in Racket that should be the same for all Racket implementations. The Racket VM provides the environment described in § 18.1.1 Initialization, where only "primitive modules with names that start with #% are defined". These provide the primitives accessed by ffi/unsafe/vm.

I think that's what you mean by "standard library", but usually I would use "standard library" to refer to the contents of the racket/collects/ directory.

Even inside the Racket VM, some components are implemented in Racket. These can be run in "user space" on an existing Racket (primarily for development), but they are primarily compiled in a special mode (e.g. to produce C or R6RS libraries) and embedded into the VM. The expander package implements the macro expander, reader, and module system for both Racket CS and Racket BC. Racket CS also uses thread, io, regexp, and schemify layers implemented in Racket, while Racket BC implements equivalent functionality in C.

Both Racket BC and Racket CS use a C library, rktio, to provide thread-safe non-blocking IO and similar OS-level functionality.

Replacing Racket "in its entirety" could involve replacing some or all of the above components, but, more likely, you'd want to re-use many of them.

From another perspective, at the language level, a Racket implementation must be able to compile and instantiate linklet forms as documented in § 14.14 Linklets and the Core Compiler. Then, it must provide the very large number of primitive functions and values you mentioned. As Matthew Flatt recently put it:

It's all the primitives that would matter if you're trying to imitate Racket, and the complexity there is why Racket is complicated to build.

For more detail, you might find the paper Rebuilding Racket on Chez Scheme (Experience Report) useful.

11 Likes

I agree. Oversimplifying there are 4 layers:

4: Libraries written in Racket
3: from #%Kernel to Racket
2: from Chez Scheme to #%Kernel
1: Chez Scheme

Most of the transition from from BC to CS was about (2). There were some changes in (3) like rewriting the expander, and regular expressions. Also some changes to (1) like adding ephemerons and continuations marks. This took approximately 4 years, and there are still a few corner cases to fix and room for improvement.

So, rewriting the question of @LiberalArtist: Which of these layers do you "want" to rewrite?

Also, are you asking about a new implementation that is as fast as the current one or 10x slower? (I think 100x slower can cut the time to 1 year, but I never tried.)

About the 1700 primitives, many of them can be written in Racket and could be moved to libraries. For example add1 that is probably a primitive only for historical reasons, or list-pair? that is an internal function used only in the optimizer. This would be helpful to projects that try to make a new backend, but it's also a lot of work.

5 Likes

The answers given so far cover things well, but I'll offer a few extra comments.

On the macro expander:

As @LiberalArtist and @gus-massa noted, the macro expander is implemented in Racket, and that implementation covers things like quote-syntax, phases, and the module system. The expander can't be a Racket library in the #lang sense, because it implements that library mechnanism, but it can be a library in terms of an underlying library system that is potentially much simpler. In Racket CS, for example, the expander is wrapped as a Chez Scheme R6RS library.

Implementing a similar macro expander is maybe not that difficult, but building a separate and fully compatible macro expander is probably not practical. That would be similar to building a RnRS Scheme implementation that is not merely a drop-in replacement for another according to the standard, but would work for every use of the other Scheme implementation. So, for the foreseeable future, any compatible and practical implementation of Racket is just going to use the main implementation of the macro system.

As @LiberalArtist noted, the "linklet" interface represents the macro expander's communication to an underlying compiler. Depending on how the underlying compiler works and how much performance is needed, the mapping through linklets can be fairly straightforward, but a lot of work went into this layer for Racket CS.

On the other primitives:

As @LiberalArtist and @gus-massa also noted, the 1700 "primitives" include things at different levels of primitiveness:

  • Some primitive are things that could be straightforwardly implemented as libraries, although there may be a performance advantage to building them into the kernel and directly supporting them in the compiler. This category is probably more than half of the primitives, and a big chunk is just operations on numbers (exact, inexact, complex, etc.).

  • Some primitives provide OS facilities like filesystem and network access. They could be implemented in Racket as libraries using a foreign-function interface, but since OS APIs are published in C, it's even easier to use a combination of Racket and C. The "rktio" library is the C half.

  • Finally, some primitives implement and reflect runtime facilities like the thread scheduler, delimited continuations, garbage collection, foreign-function interface, and access to compilation/evaluation. These parts make Racket a kind of OS itself (nested in a process within the host operating system). It's a bigger subset of the primitives than you might guess, and it's tricky to make it a layer that's isolated from the other kinds of primitives. For example, there's a connection between make-vector, which might seem purely a library entity, and memory limits as imposed on a custodian that manages the thread allocating a vector, which requires an extra check when the vector to allocate is large enough. Access to underlying OS facilities also interact frequently with Racket-as-OS primitives. Racket CS layers this implementation somewhat (e.g., across the "thread" and "io" layers), but it's also the main source of special hooks that let lower layers call upper layers.

All of this could be implemented on top of a very small compiler and core runtime system. But making it perform well ends up blurring the lines, such as having the compiler recognize arithmetic operations to inline them with specialized number representations, or building support for continuation marks into the compiler.

8 Likes

Thanks all. These comments and the link to the paper about the Racket-on-Chez migration were all helpful.

I was mainly asking about the non-Racket parts of the stack: Since Racket can be compiled to bytecode, I was thinking an "alternate interpreter" could bootstrap itself by including precompiled versions of the expander or similar and having a minimal bytecode interpreter. You could also take a standardized Scheme implementation as a given, since there are many of those already.

Performance wasn't a concern with this question, so a 100x slowdown would be okay.

Racket supports compiling source files to .zo files. I assume "linklet" is synonymous with "bytecode" as stored in these files, but a more accurate term since the bytecode is S-expressions and not a simple list of operations.

It seems Racket CS decreased the amount of non-Racket code in the compiler, so such a project would be much easier now than 5 years ago. From my reading of the report and these comments, you would mainly need to

  1. Reimplement rktio in any language that can export C ABI symbols(C/Rust/Lisp/etc.)
  2. Change Racket to support any standard Scheme, and not just the modified Chez Scheme
  3. Implement a .zo interpreter. This should be similar to the fully-expanded grammar I linked in my first post, but the linklet language looks even simpler.
  4. Precompile the expander or other compiler-critical parts to a linklet.
  5. Implement the 1700 primitives(or move some of them to libraries) in terms of standard Scheme or rktio. These primitives are provided to the .zo interpreter as it is executing the precompiled code.
  6. Use the bootstrapped Racket compiler to compile itself: Re-export the bytecode from step 4, and add in the full "#lang racket" standard library.
  7. Test on the full package repository.

After all of these, you would have a working Racket compiler that has none of the original non-Racket code. Does that sound right?

Bytecodes are used in Racket BC.
The BC compiler produces platform independent bytecode.
At runtime the bytecode is JIT compiled machine code and then executed.

Flatt made a presentation on how it works internally:

The paper The Racket Virtual Machine and Randomized Testing

has more information on bytecodes.

If you want to explore what is needed to make a bytecode interpreter,
you can check my old project meta.
It is an bytecode interpreter written in Racket with lots of comments.
[Since bytecode structures change occasionally, I think you need Racket 6.5 to run it]

However bytecodes belong to Racket BC. The new approach in Racket CS is to rewrite
the Racket source into something that the Chez Scheme understands.
The expander first fully expands the Racket source and package the result as a linklet [1].
Then the compiler compiles the linket into something that Chez Scheme can handle.

If you want to add a new backend, then it would be natural to reuse the expander and
focus on implementing a linklet to backend-platform.

[1] 14.14 Linklets and the Core Compiler

2 Likes

I have a number of thoughts on this topic.

  1. There are approximately 4 separate implementations of Racket: Racket CS, Racket BC, Pycket, and RacketScript. Pycket is pretty complete for these purposes, RacketScript is still in progress.
  2. It really depends on your goals. Here are a few possible goals, and what it might take to achieve them.
    • Port Racket to run on a different Scheme (eg Gambit or Guile). This is very doable if you want a mostly-working implementation. You would need to adapt two things, roughly. One is the Scheme code in racket/src/cs/rumble, which implements the core Racket primitives (things like impersonators and the GC interface and Racket's equality semantics) on top of Chez Scheme. Then you'd need to adapt the schemify code generator to produce code in the other Scheme. Of those, rumble will be substantially more work.
    • Port Racket to run on a totally different VM, similar to Pycket or RacketScript (ie, you want to target Graal or the .NET VM). For this, you need to roughly write a translator from the linklet layer to your VM, and then implement all the primitives. Here, depending on the VM, you might or might not want to use the IO library, rktio, and the thread library. For example, if you want Racket threads to be lightweight threads on the JVM, it might be hard to do that while reusing the existing thread code, which is built on top of Chez engines.
    • Eliminate C code from Racket. Here you "just" need to write a VM for Chez or a similar Scheme in some other language, and implement the Racket IO layer on top of that, handling all the stuff that rktio does, and then apply one of the approaches above.
    • Write more of Racket in Racket. Here you'd want to pick some layer currently implemented in rumble (for example, impersonators/chaperones) and rewrite that in Racket on top of the lower level primitives, and then pre-compile that to a linklet as currently happens with the expander/io/thread/schemify code.
  3. It's not possible to port anything like the current Racket CS implementation to "portable" Scheme (for any of the many possible meanings for that). Access to things like delimited control, FFI, GC hooks, threading, etc are not portable in the necessary way, even though many full-featured Scheme systems offer many of those features.
  4. The current Racket CS implementation already uses the strategy of pre-compiling libraries to linklets, and including them in the binary. That's how the io/thread/expander/schemify libraries all get compiled in.
  5. The relationship between the expander and the rest of the runtime system is one of callbacks, not something that you can just run ahead of time. It's possible to pre-compile individual Racket programs, and then ditch the expander to run them (that's how the expander itself is written in Racket) but this is not automated in general (that would be a great thing to add eventually) and doesn't work in general (and so would not come close to working on the package repository or even the standard library).
5 Likes

I don't follow this, but I'm curious, when an executable is created with raco exe does it use or even need the expander? The fully expanded code is compiled to create the executable, correct? Are you just saying that removing the expander from the runtime system at this point is possible? Mostly just curious what the potential benefits are that we are missing out on.

Apologies if I'm completely missing the point! I'm coming at this as a Racket user and not as an implementer.

Some executable need the expander, and some don't. For example, a Racket program can call eval and that needs the expander. More basically, even the results of raco exe use the module system, which is also implemented in the expander.

For some executables, they don't need the expander, though. It would be possible to use something like the way the expander itself works to create custom Chez executables that didn't need the Racket libraries at all. This would work only for some programs, but it would be a useful feature to have.

2 Likes

Ah, I see. Thanks for the response!