How to make small executables?

The smallest executable I can get is about 49mb, down from nearly 60mb. It's clear that there is no pruning of unused packages, libraries, etc.

I have used the two "old" tricks: use #lang racket/base, and use the demod trick, which combined result in about an 18% reduction. That's non-trivial, but 49mb is still crazy big. For nim, the same functionality is 75k (for a statically compiled language that competes with c and c++).
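As a sanity check on those numbers, the reduction from roughly 60mb to 49mb can be worked out with integer shell arithmetic. This is only a minimal sketch: the helper names file_size and pct_reduction are made up for illustration, and the inputs are the rounded sizes quoted above.

```shell
#!/bin/sh
# Size of a file in bytes (wc -c is POSIX; tr strips padding some wc versions add).
file_size() {
    wc -c < "$1" | tr -d ' '
}

# Integer percent reduction going from size $1 to size $2.
pct_reduction() {
    before=$1
    after=$2
    echo $(( (before - after) * 100 / before ))
}

# Rounded sizes from the post, in MB.
pct_reduction 60 49   # prints 18
```

With two real executables on disk, the same comparison could be made by feeding the output of file_size for each file into pct_reduction.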

I certainly don't expect to match nim or c++, but something around 2mb would be nice. Gambit gets down to 6mb without doing anything special. Chicken gets down to 2.5mb without doing anything special. Clearly, compiling through c makes stuff way smaller. That's not how Racket does it, but there must be something.

For example, the entire repl should be factored out and most of the runtime should be factored out if there are no macros.

Any way to get part way with what's available today? Is there any hope of improving the situation?

On the good side, Racket 8.10 compiled code runs in half the time of Gambit when being careful about correct typing throughout. My example trivially adds up a 100,000 element homogeneous vector of float64 elements using a loop. Have to be sure to use f64vector specific functions and start with (sum 0.0t0). Fully typed up and down is the way to go and what nim and Julia do because they are very typed languages (Julia lets you be dynamic, but static provides best performance).

Julia is 4x faster than Gambit so that makes Julia 2x faster than Racket. That is amazing performance. Nim, which is within 5-10% of c++ on simple stuff, using totally unsafe compiler options to match c++, is 12x faster than Racket on this task, which is like totally unfair.

I am very happy with indications of Racket performance when one is careful about types (which is understood--that's a lot of how static languages get fast). I am not so thrilled with the size.

I have to say that while the multi-language thing is more of a novelty for me, you can really see the advantage of having lots of people working on Racket. Improvement is really significant.

I should test "script" through the interpreter. It should be pretty close given the Chez "jit" approach.

Thanks for taking the time to do the comparisons and write them up! I read it with interest.

What application are you building to make the comparisons?

My initial thought is when you use the Racket tools to build an app you get a lot…but if the app does little or nothing that is all wasted.

This does make me consider where Racket is a good choice, and what factors should I consider with these sort of engineering decisions?

  • app size
  • app speed
  • developer time
  • developer expertise
  • what else?

Best regards

Stephen :beetle:

It's just a micro-benchmark as I described: sum a vector of 100,000 elements of float64. Do it 1000 times to get a decently consistent measurement. So, all the batteries aren't being used. The compiler is really a packager as the code is already compiled line by line (as was Chez). Unfortunately, until someone on the development team responds we won't know the answer, but it seems like you can't factor the bundled package. That's hard, but most advanced compilers/linkers only link what you use so it can be done. For "real" apps, they should make it a higher priority. For demonstrator and learning apps, it's a low priority.

My apologies.

A problem with benchmarks - even extensive suites of benchmarks, let alone singletons - is they don’t even approach the complexity of a real app like a game, a web forum like Discourse, a spreadsheet, a browser or even a (non-trivial) compiler.

If you want to make small fast executables maybe you want to write an assembler or compiler in Racket, rather than use the Racket compiler?

Best regards
Stephen

PS: see the book Essentials of Compilation: An Incremental Approach in Racket

If you want to build a significantly more minimal application, then you could take the following approach:

  1. Compile your Racket program to a single, stand-alone linklet using the tools here: https://github.com/racket/racket/blob/master/racket/src/expander/README.txt#L73-L83
  2. Compile that linklet to Scheme using these tools: https://github.com/racket/racket/blob/master/racket/src/cs/README.txt#L295-L297
  3. Compile that Scheme code, along with the necessary supporting portions of the runtime, using the Chez Scheme compiler.
  4. Link the result into a boot file as described here: Using Chez Scheme

Whether this is notably smaller would depend on the size of your application and what portions of the Racket standard library and runtime system you use, as well as whether you want to preserve full Racket semantics for various operations (such as error messages).

This process won't work currently for every program (the extraction tools are designed for building the core of Racket) and is, as you can see, not automated at all. Probably you would run into some other issues that would need to be fixed as well. I would be happy to answer questions and provide advice if you wanted to go this route, and once someone does it, that would make it easier for other people to try.

That's almost three orders of magnitude!

This reminds me of Android packages. They are often enormous. Many megabytes. And for a task that I am hard put to imagine could not have been run on a PDP-11 in the old days.

I wonder if the causes are similar.

-- hendrik

Apart from the library situation (see below), I think the startup times are very long. For smallish command line tools I've written, the startup times on my computer are 250 ms when compiled with raco make or raco exe. (By the way, I also remember binary sizes of about 50 MB.)

In my experience, just a

#lang racket/base

(displayln "Hello world")

takes 150 ms. Use #lang racket or (require racket/match) and you're at 250 ms. A "Hello world" with #lang typed/racket takes 500 ms on my computer.

For comparison, the runtime of a compiled Chicken Scheme "Hello world" program is 10 ms. Statically compiled programs take about 5 ms.

Again, I'm only talking about startup times here. I use "Hello world" as a benchmark, assuming that the actual runtime in this case is negligible.

Long startup times are a problem if you call a Racket process many times in a loop, e.g. from a shell script for each file in a directory. Another problem can be invoking Racket programs from other Racket programs as subprocesses. (Of course, there are workarounds for both of these situations - as long as you have control over the code of the Racket programs.)
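One such workaround can be sketched in shell, under the assumption that the tool accepts several file arguments in a single invocation: collect the files first and start the process once, so the startup cost is paid once rather than once per file. The function names run_per_file and run_batched are hypothetical, and the tool argument can be any command.

```shell
#!/bin/sh
# One process per file: the startup cost is paid for every file in $2.
run_per_file() {
    tool=$1
    dir=$2
    for f in "$dir"/*; do
        if [ -f "$f" ]; then
            "$tool" "$f"
        fi
    done
}

# One process for all files: the startup cost is paid once.
# Assumes the tool accepts any number of file arguments.
run_batched() {
    tool=$1
    dir=$2
    set --                       # reuse the positional parameters as a list
    for f in "$dir"/*; do
        if [ -f "$f" ]; then
            set -- "$@" "$f"
        fi
    done
    if [ "$#" -gt 0 ]; then
        "$tool" "$@"
    fi
}

# Example with a hypothetical tool and directory:
#   run_batched ./mytool some/dir
```

With a 150-250 ms startup per process, the batched form saves roughly that much for every file after the first.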

Regarding developer time, it depends a lot on the problem you're solving. If you have to implement a library yourself that's available as a third-party package in other programming languages, this can lengthen your development time quite a bit in comparison. Regarding the language itself, I think you can be quite productive once you've gained some experience with the language and the basics of the standard library (e.g. handling of lists, vectors, strings, regexes and file I/O).

In this case, it's probably much easier to use a different programming language, unless maybe you've already written much of the code in Racket. :slight_smile: And even in the latter case, it's probably easier to write some code in a statically compiled language and call this code with Racket's foreign function interface.

Interesting. I've given both Gambit and Chicken a try. Gambit makes compilation quite easy. Performance is satisfactory. Gambit is a bit of its own version of r7rs. I've tried Chicken, too. Also, easy to compile. I think both are good with some unique characteristics that are challenging. REPLs are very inconsistent across all Schemes. A reasonable portion of Gambit is undocumented despite conscientious efforts by its talented founder. Nearly all the work is done by a single person. Chicken is also missing lots of documentation.

I don't expect Racket to come especially close to the size/performance of a true compiled language. Racket seems the most complete of any Scheme and it was interesting that some reasonable performance is attainable. I'll try some of those Chez steps, but that seems like a lot of work for some little personal "portable" utilities.

The situation is similar in LISP-land. SBCL's save-lisp-and-die isn't a particularly reasonable way to create an executable. The smallest attainable is around 10mb. Clozure CL (no relation to Clojure) has an actual compiler, but doesn't seem to be in active maintenance. ECL seems to target dynlibs that can be called from c.

Julia in its short life has become more professional and more mature than any Lisp or Scheme. There are a couple of ways to statically compile packages, but they run with the Julia runtime. There is an experiment in static compilation with a variety of caveats (haven't tried it yet). Because Julia is JIT-compiled and there is high interest in performance, it might reasonably some day get to creating statically linked free-standing executables. Though Julia is certainly a fully-featured general purpose language, this work will likely focus on its strong point of numerical computation.

Thanks for all the input. This is just an experiment and I'll try the Chez suggestions to see if I can even get through it. Gambit really seems the best bet for statically compiled Scheme.

Gambit vs. Chicken

Benchmarks suggest that the runtime performance of Gambit is about the same as Racket's while Chicken is about a factor 2 slower. On the other hand, Chicken supports more SRFIs and has better documentation.

Racket cross-compilation

One thing I like about Racket is that you can cross-compile for other platforms, including for MacOS from non-MacOS systems. The downside is that the first compilation for a platform can take a very long time due to downloading and compiling packages for the target platform. See raco-cross and my raco-exe-multitarget. The latter wraps raco-cross to provide a simpler interface, especially if you want to compile for more than one target platform.

Julia

Julia looks quite interesting to me. However, when I checked recently, creating binaries seemed cumbersome and/or brittle. In the absence of precompiling code, startup times are even much worse than Racket's. Therefore, I'm looking forward to better compilation support in Julia. :slight_smile:

Just for the record, the compiler and executable builder only include "libraries" in the sense of modules that are actually used by your program. On the other hand, they don't attempt to automatically eliminate modules that your program requires but doesn't really need, nor do they omit parts of modules that your program never uses.

The compiler doesn't deal with "packages" at all, so it certainly doesn't keep any code around just because of what package it comes from.

The floor on executable size and startup time is Racket's (and Chez Scheme's) implementation of primitives and the runtime system, which concretely means the Chez Scheme bootfiles petite.boot, scheme.boot, and racket.boot. Racket's tooling is not currently set up to omit any of that (meaning e.g. that you always have the full compiler available at runtime, even if your application doesn't use it), but @samth's post explains how to manually create an executable that does not depend on the runtime system.

If you are building Racket from source, you can also experiment with the configure flags --enable-compressboot and --enable-compressmore, though you may find you are trading (startup) time for space. You could also try Racket BC for different tradeoffs, though Racket CS is definitely the future.

I don't know what distinguishes Racket from a "true compiled language" for you (even ignoring the fact that compilation is a property of implementations, not languages). By default, Racket is compiled ahead-of-time to machine code.

I am not trying to be contentious here. By "true compiled language" I mean one able to create a statically linked, free-standing executable binary. Racket and Chez and sbcl try to differentiate themselves from "JIT" for reasons I don't understand. But I do understand that they all compile line by line for single-line entries in the REPL and whole files for loaded files, and that the compiled code sits in memory as machine code to be executed when called within the REPL. So performance is good, and a stand-alone statically linked binary executable would only be slightly faster (maybe). The difference is in the size of the executable and in startup latency. c++ and nim only compile/link whatever your source calls: even if you import/include a library, if you never call anything from that library the linker won't bother linking it in (though the compiler might issue a warning). Hard to compete with that--different trade-offs:

  • Scheme/Lisp with compile AOT within the REPL: don't even worry about compiling; compiled code isn't really stand-alone because the REPL gives access to the runtime and loaded libraries. Very nice execution time compared to languages that compile to some sort of byte-code for a VM, in which parsing and lexical analysis is done, but interpretation still happens.

  • c/c++/nim/go: wait to compile/link/build and then run it. For nim this is all crazy fast. Execution almost certainly faster than REPL based AOT languages, at least from the little I've played with. And the stand-alone distributable (somewhat!) will be way smaller than most dynamic languages can create.

I understand the benefit of incremental compile in "REPL-driven" development, but the advantage is somewhat overstated on today's fast machines as long as the thing you're building isn't defined by some arbitrarily messy and long make file. nim pretty much incorporates link and build so no make is needed for nim-only projects. One can also rid oneself of the script hell of make/cmake with declarative build systems like xmake or meson. xmake is extremely fast and it has a simple build and run command. So, the style is still "stop, build, run" but it's all so fast it hardly matters for small projects.

I don't perceive any of this to be a problem for Racket, Gambit, Chez--they are just a different use case. I am just experimenting with how small/fast one can get, primarily to provide an executable for stand-alone programs that doesn't depend on a bunch of dependencies having to be installed. After I build the chez for racket version of chez (wish that were 10x easier, but that's a chez issue not a Racket issue) I'll see what that's like. Gambit is pretty nice for a stand-alone compile model. It has decent-to-good performance; compilation is easy; executable size is 1/6 of Racket. Gambit has some nice extensions, but a lot of its library is undocumented, and in documentation hell I often can't tell which of the "included" functions from R6RS or R7RS are actually included (as it's basically a one-person--a really good person--show).

Stop worrying about defending Racket's virtues; it's got a lot of virtues. Stop worrying about my choice of compiled language: I'll use c++ or nim for anything serious. LISP/Scheme do really well for compiling dynamic languages. Julia is great for JIT compiling and now has static (but not stand-alone) compilation for packages (so the "package latency" problem is now solved!).

Separately, I don't see how the marginal 1-person Schemes or LISPs are going to survive: indeed, many are effectively dead already. Joining forces like Racket/Chez or Gerbil/Gambit is the way to go. Schemers need to know that the world doesn't need 50 implementations and the esoteric "benefits" claimed won't enable their survival. We already have good choices that are surviving nicely.

Full precompile for packages has already occurred and the major packages now pre-compile. So first load on your machine’s installation is pretty bad. But, then it’s really, really fast until you upgrade that package.

Dynamic languages have problems with compilation because it’s too easy to get “sloppy” about carefully factoring the runtime. S-expression languages have an easier time because each s-expression is a little bit more independent.

I agree with your Chicken/Gambit/Racket assessment. I don’t really think Chicken’s doc is as good as reputed: it is really only doc for the tooling, which is needed of course. But, except for Racket and Chez, doc of the language is generally poor. The authors say that they don’t need to document the standard stuff in the language. But, except for a reasonably sized subset, there is no standard stuff. Support for homogeneous arrays/vectors and I/O is notoriously inconsistent. Robustness of REPLs is also very inconsistent, with Racket seeming to be best especially when running from Dr. Racket. Haven’t tried Chez yet because I need to be a sysop to build the Chez with Racket commits. (Will someone just make a binary available?....)

IMO, if a bunch of small executables are required (e.g., individual command-line tools), then Racket is not the right choice.

That being said, one workaround that could work in some projects is lumping a bunch of tools into one executable.

I know it somehow feels less elegant and even embarrassing to send someone a tool that is 50mb and does one very small task, but with most bigger projects nobody will pay attention to that size.

Both comments are good. Kind of send a package of little utilities, but then an arg has to be the one the user wants to run... As in:

./utils foo 3 4

It's not bad, but the first tip is the more appropriate: not a great fit for Racket; and, for that matter, not a great fit for any Lisp or Scheme. A better fit for bash or zsh--because that is what shell scripts are for. Possibly better for Python where a suitable Python is slightly more likely to be installed, but only slightly. Best fit for nim, c++, c, or go.

Using a single binary is a known pattern and an extra argument isn't necessarily needed. At least on POSIX and Windows platforms, a program can query the name it was started with and perform different actions based on that.

Busybox is an example of this approach. Busybox provides the same binary under lots of different names, and the names are hardlinks to the same binary, so you need the storage space only once.
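The name-based dispatch can be illustrated with a small shell sketch. A shell script stands in for the real executable here, and every name in it (make_multicall, utils, foo, bar) is hypothetical; a Racket program would do the equivalent check on the name it was started with.

```shell
#!/bin/sh
# Build a tiny multi-call "binary" (a shell script standing in for a real
# executable) plus two hard links to it, all inside directory $1.
make_multicall() {
    dir=$1
    cat > "$dir/utils" <<'EOF'
#!/bin/sh
# Dispatch on the name this file was invoked as, the way Busybox does.
case "$(basename "$0")" in
    foo) echo "foo got: $*" ;;
    bar) echo "bar got: $*" ;;
    *)   echo "usage: invoke as foo or bar" >&2; exit 1 ;;
esac
EOF
    chmod +x "$dir/utils"
    ln "$dir/utils" "$dir/foo"   # hard link: another name for the same file
    ln "$dir/utils" "$dir/bar"
}

# Example:
#   d=$(mktemp -d) && make_multicall "$d" && "$d/foo" 3 4
```

The user then runs foo 3 4 directly instead of ./utils foo 3 4, while the file contents are stored only once.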

By the way, today I learned that the Windows NTFS filesystem also supports hardlinks. So far, I only knew of NTFS junctions, which only work for directories.

Racket's toolchain supports creating a distribution with multiple utilities without duplicating shared code. I put together an example:

Observe that the SharedCollects version (two utilities sharing common code) is about the same size as the SingleUtility version, and smaller than the EmbeddedCollects version (duplicating common code). You get the second utility effectively for free.

I'm definitely not trying to be contentious either! But I'm concerned that there may be misunderstandings.

If you look at the SingleUtility version in my example, it's definitely a "free-standing executable binary", and you can see that it only dynamically links to libc:

$ patchelf --print-needed build/SingleUtility/bin/foo 
libdl.so.2
libm.so.6
librt.so.1
libpthread.so.0
libc.so.6

AFAIK you could compile Racket to statically link to libc, too, if you really wanted to: if you did so, your foo executable would also be statically linked.

Of course, static linking will increase the executable size! In the other direction, you could configure Racket to dynamically link to zlib, lz4, and ncurses-related things and get a smaller executable, but a less portable one.

In my examples, the .zo files contain x86_64 machine code: there is no further compilation to be done "just-in-time".

You shouldn't need any escalated privileges to build Racket's Chez. Starting from a clean checkout of https://github.com/racket/racket, it's just:

make pb-fetch
cd racket/src/ChezScheme
./configure
make
make run # starts a Chez repl

Alternatively, Guix has it packaged as chez-scheme-for-racket: https://packages.guix.gnu.org/packages/chez-scheme-for-racket

On a relatively recent Debian Bullseye, Ubuntu Jammy Jellyfish, or newer, you can do:

sudo apt install guix
guix pull
guix install chez-scheme-for-racket

Thanks. Another thing to try.

I didn’t mean “sysop” as a matter of privileges: more as a matter of extra work prone to squirrelly failures.

Is there a homebrew distribution of chez for racket?

No such distribution on homebrew.

What a mess as I knew it would be.

Your instructions worked though I had to run make twice.

I had tried to use the ChezScheme specific instructions but the build files would never be created or copied from a parent folder in the repo.

Where on earth are the executables hidden? They have to exist somewhere. I want to put a symlink in /usr/local/bin where it belongs. I could see no way to do that.

Looking at the make script doesn't help much:

run: $(ZUO)
	$(ZUO) $(workarea) run

So, where on earth is the workarea? That is not a normal environment variable so I can't expand its definition in any way that I know.
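One generic way to hunt for the result, without knowing what $(workarea) expands to, is to search the checkout for executable files by name. This is a minimal sketch: find_executables is a made-up helper, and the guess that the binary is called scheme is an assumption that should be checked against the actual build output.

```shell
#!/bin/sh
# List regular files under directory $1 that are named $2 and have the
# owner-executable bit set (-perm -100).
find_executables() {
    find "$1" -type f -name "$2" -perm -100
}

# Example, run from the top of the checkout ("scheme" is an assumed name):
#   find_executables racket/src/ChezScheme scheme
# A symlink can then point at whichever path is printed, e.g.
#   ln -s /absolute/path/printed/above /usr/local/bin/scheme
```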