Racket to Rust transpiler (educational purpose)

Hi everyone,

I am new to Racket and want to learn how to create programming languages.
So I worked through the Beautiful Racket tutorials and want to test my knowledge by creating a programming language with Pyret-like syntax.
I also decided that the programming language should be able to transpile to other languages by generating source code (in this case Rust).
So the workflow is Pyret syntax -> Racket parse tree -> Rust source code.
Currently, I managed to get the parse tree up and some Rust source code generation, but some of the parse tree does not generate code, it passes the symbols through instead of evaluating the parse tree.
I speculate that I am struggling with macros and recursion, but need some help.
Are there perhaps existing projects/resources that you know of that I can use to fill my knowledge gaps?

2 Likes

I am new to Racket and want to learn how to create programming languages.
So I worked through the Beautiful Racket tutorials and want to test my knowledge by creating a programming language with Pyret-like syntax.

I also decided that the programming language should be able to transpile to other languages by generating source code (in this case Rust).

The approach used in Beautiful Racket (BR) is well suited for implementing languages on top of the Racket VM. If you want to make a "Pyret" to Rust computer, you can still write the lexer and parser using methods from BR. This will get you a representation of the "Pyret" source as a parse tree.
The last step from parse tree to Rust is best solved with "traditonal" methods.

Instead of recommending a project, I'll point you towards a book:

 "Essentials of Programming Languages"
 https://eopl3.com/preface.html

image

Finally, a small tip: The last pass needs to output a Rust program. Instead of generating and appending strings, generate a free form tree - and then use an emitter to print the tree on the output port.

That is, instead of (string-append (generate-function-header) (generate-function-body)) write (list (generate-function-header) (generate-function-body)). Since string-append takes time proportional to the length of the strings, the code generator can easily use quadratic time if string-append is used instead of list.

(define (mangle id)
  ; rewrite identifier in Pyret to Rust identifier
  id)
  
; emit : tree -> void
;   display the elements in the tree in order
(define (emit x)
  (cond
    [(or (number? x)
         (string? x)) (display x)]
    [(symbol? x)      (display x) (display " ")]
    [(identifier? x)  (display (mangle x))]
    [(list? x)        (for-each emit x)]
    [else
     (displayln x)
     (error 'emit "got ~a" x)]))

(emit '(fn main () "{" 
           println! "(" "\"Hello world!\"" ")" ";" 
        "}"))

This will output:

fn main {println! ("Hello world!");}

Here is a small small set of helpers (where ~ is short for "format") that are useful
in the code generator:

(define (~parens . t)      (list "(" t ")"))
(define (~braces . t)      (list "{" t "}"))
(define (~string . t)      (list "\"" t "\""))
(define (~commas ts)       (add-between ts ","))
(define (~statement . t)   (list t ";"))
(define (~function id args . t) (list 'fn (mangle id) (~parens (~commas args)) t))

(emit (~function 'main '()
         (~braces 
          (~statement 'println! (~parens (~string "Hello world!"))))))

This produces the same result as before.

2 Likes

Thank you for your thorough reply.
Great! I really like Beautiful Racket's approach and Racket's #lang feature, so the parse tree step is sorted then.

The last step from parse tree to Rust is best solved with "traditonal" methods.

Do you perhaps have any example projects where someone uses Racket for these "traditional" methods, or is the main idea still to run on top of the Racket VM?

Thank you for the book reference, I will definitely take a look!

Finally, a small tip: The last pass needs to output a Rust program. Instead of generating and appending strings, generate a free form tree - and then use an emitter to print the tree on the output port.

Cool, I will try that. Here is an snippet of the code I wrote (Glitch is the name of the DSL):

#lang br/quicklang
(require racket/system)

; Creates a Rust file
; Runs Rust formatter on the expander's output
(define-macro (glitch-module-begin PARSE-TREE ...)
  #'(#%module-begin
     (with-output-to-file "output.rs" #:exists 'replace
       (lambda () (printf "~a" "/* This code is generated by Glitch */")))
     PARSE-TREE ...
     (void (system "rustfmt \"output.rs\""))))
(provide (rename-out [glitch-module-begin #%module-begin]))

; During runtime this writes to file. 
; Call the glitch->rust function for each node in parse tree.
; Each production rule must have a macro or function according to #brag.
; EXPR is the output from the production rule macro.
(define (glitch->rust EXPR)
  (with-output-to-file "output.rs" #:exists 'append
    (lambda () (printf "~a" EXPR))))

I know that the write-to-file approach is inefficient because I am calling it on each node, but I just wanted to get some output that I can use to run the Rust formatter on.
Do I understand correctly that the expander will loop through the parse tree (in this case inside the glitch-module-begin macro), call the emit function on each node and print to the output port, and I have to make sure that the nodes are symbols instead of strings? And I make sure it is symbols by returning syntax objects from the production rules that are symbols only?

; A production rule for import statements
(define-macro (import-stmt MODULE FUNCTION )
  #'(list (format "use ~a::{~a};" 'MODULE 'FUNCTION))) ; This needs to change to symbols only?
(provide import-stmt)

Thank you so much again for your reply. Racket is an incredible language but has so many new concepts.

Thank you for your thorough reply.
Great! I really like Beautiful Racket's approach and Racket's #lang feature, so the parse tree step is sorted then.

The last step from parse tree to Rust is best solved with "traditonal" methods.

Do you perhaps have any example projects where someone uses Racket for these "traditional" methods, or is the main idea still to run on top of the Racket VM?

Thank you for the book reference, I will definitely take a look!

Finally, a small tip: The last pass needs to output a Rust program. Instead of generating and appending strings, generate a free form tree - and then use an emitter to print the tree on the output port.

Cool, I will try that. Here is an snippet of the code I wrote (Glitch is the name of the DSL):

#lang br/quicklang
(require racket/system)

; Creates a Rust file
; Runs Rust formatter on the expander's output
(define-macro (glitch-module-begin PARSE-TREE ...)
  #'(#%module-begin
     (with-output-to-file "output.rs" #:exists 'replace
       (lambda () (printf "~a" "/* This code is generated by Glitch */")))
     PARSE-TREE ...
     (void (system "rustfmt \"output.rs\""))))
(provide (rename-out [glitch-module-begin #%module-begin]))

; During runtime this writes to file. 
; Call the glitch->rust function for each node in parse tree.
; Each production rule must have a macro or function according to #brag.
; EXPR is the output from the production rule macro.
(define (glitch->rust EXPR)
  (with-output-to-file "output.rs" #:exists 'append
    (lambda () (printf "~a" EXPR))))

Here's my take:

The intermediate representation the emitter gets should have been transformed ahead of time so it is structured somewhat like rust-with-too-many-parentheses, in whatever nesting is convenient. Avoid string handling like the plague when you produce this intermediate representation.

The intermediate representatino will probaby be a tree structure, just like Racket code is a tree structure.

And your emitter function should itself be recursive and call itself recursively down your intermediate representation, emitting pieces of text to the output file as it goes. You want to avoid doing string manipulation at all -- just write stuff to the output file in the correct order.

I know that the write-to-file approach is inefficient because I am calling it on each node, but I just wanted to get some output that I can use to run the Rust formatter on.

Writing directly to the file is not as inefficient as repeatedly concatenating ever longer and longer strings.

Do I understand correctly that the expander will loop through the parse tree (in this case inside the glitch-module-begin macro), call the emit function on each node and print to the output port, and I have to make sure that the nodes are symbols instead of strings? And I make sure it is symbols by returning syntax objects from the production rules that are symbols only?

Your expander will have to produce whatever your emitter likes to see, using whatever notation is convenient.

-- hendrik

3 Likes

Not quite.

The phases during executation of a racket program are:

read -> expand -> compile -> eval

With a #lang glitch approach you as the compiler writer first replaces the standard reader.
Using brag, ragg or parser-tools you can write a reader that represents the input program as a syntax object.

Now the standard Racket expander receives the program as a syntax object.
From the expander's view point this is just a normal program.
The job of the expander is to rewrite a program with macros to a program without macros.
A macro call in the program will cause the expander to call the associated syntax transformation.
This will eventually produce a program without macro calls.

The compiler can now produce machine code that for the program (represented as a syntax object).

Finally the user can run the program.

In your Glitch example, you have this macro definition for glitch-module-begin

(define-macro (glitch-module-begin PARSE-TREE ...)
  #'(#%module-begin
     (with-output-to-file "output.rs" #:exists 'replace
       (lambda () (printf "~a" "/* This code is generated by Glitch */")))
     PARSE-TREE ...
     (void (system "rustfmt \"output.rs\""))))

The expander will thus produce a program that looks like:

  #'(#%module-begin
     (with-output-to-file "output.rs" #:exists 'replace
       (lambda () (printf "~a" "/* This code is generated by Glitch */")))
     PARSE-TREE ...
     (void (system "rustfmt \"output.rs\""))))

This will be compiled and when the compiled program is run,
it will generate the Rust program.

That is: Your #lang glitch program produces a Racket program that when run, prints a Rust program.

It works, but I think it is simpler to skip the part that produces the Racket program.
The lexer/parser produces a parse tree (a syntax object) so you can use standard
recursive descent to rewrite the tree. [Note that syntax-parse and friends work outside macros].

If you wanted a Glitch to Racket compiler, then it would make sense to write macros for Glitch-construct.
A nice example can be found in Danny Yoo's "Brainf*ck" tutorial.

https://www.hashcollision.org/brainfudge/

The tutorial predates BR so it uses ragg instead of brag.

2 Likes