Code compression options

A while ago, someone asked me this question about Chez code compression options when building Racket:

Would it be an option to instead turn off compression and keep doing
things as usual?

In theory, this should be possible. I see two significant downsides:

  1. Compiled code would be much larger—maybe twice as big—and, if I
    recall correctly, load times would be worse, too. With the move to
    Racket CS, existing Racket code moved from a world of small and
    cheap bytecode to a world of machine code: the default compression
    settings have been tuned to avoid an unacceptable worsening of
    binary size and load time.

Interesting. (I’m curious how load time can be improved by (1) reading files into memory instead of merely mmap’ing them, and (2) decompressing.)

I realized I don't really understand the details, either. Beyond some general recollection that performance tuning was done, I'm not even confident that my characterization of the tradeoffs was exactly right.

I remember some commits from around a year ago, e.g. "configure and makefiles: make code compression more configurable" (racket/racket@548aca0) and "cs: compress boot files by default on Windows" (racket/racket@4cf538f), but I didn't manage to track down the related discussion I vaguely recall. Also, I had thought there was some compression going on by default on Unix, but the message for 4cf538f suggests that there isn't.


Well, configure --help-cs says:

  --enable-compress       compress compiled code (enabled by default)
  --enable-compressmore   compress compiled code even more
  --enable-compressboot   compress boot files

There are two kinds of code that can be compressed: code in .zo files and code in .boot files. The content of .boot files tends to be embedded in the Racket executable.

Compression of code in .zo files is enabled by default everywhere (except for a few days last week when I had that wrong in the new build system). The --enable-compress flag is about .zo files. Edit: The --enable-compressmore flag is also about .zo files and covers the metadata that surrounds machine code.

Boot files that implement Racket CS ("petite.boot" + "scheme.boot" + "racket.boot") are compressed on Windows and not on other platforms. That is, the --enable-compressboot default varies.
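
For example, on a Unix build you could opt into the Windows-style boot-file default, or turn off .zo compression, along these lines (configure should accept the usual --disable-... spellings as the off switches for these --enable-... flags):

    # also compress boot files, as on Windows:
    ./configure --enable-compressboot

    # or build with .zo compression turned off:
    ./configure --disable-compress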


Thanks!

For .zo files, is it also a time-for-space tradeoff, or does loading .zo files involve, say, copying, such that mmap wouldn't help much anyway?

Can Racket CS use .zo files compiled by a Racket that was configured with a different option for --enable-compress and/or --enable-compressmore?

Compression is a clear win for machine code in .zo files due to the way that it's instantiated on demand. Some machine code that has not yet been demanded can stay compressed in memory.
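
As a conceptual sketch of that (just the shape of the idea, with gzip and a promise standing in for the real linklet machinery):

    #lang racket/base
    ;; Sketch only: keep bytes compressed in memory and inflate on
    ;; first demand. Racket CS doesn't literally use gzip promises.
    (require racket/promise file/gzip file/gunzip)

    (define (compress bs)              ; stand-in for compressed machine code
      (let ([out (open-output-bytes)])
        (gzip-through-ports (open-input-bytes bs) out #f 0)
        (get-output-bytes out)))

    (define (lazy-inflate compressed)  ; decompressed only when forced
      (delay
        (let ([out (open-output-bytes)])
          (gunzip-through-ports (open-input-bytes compressed) out)
          (get-output-bytes out))))

    (define code (lazy-inflate (compress #"...machine code...")))
    ;; Only (force code) pays the decompression cost; until then, just
    ;; the compressed bytes are held.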

Since metadata is needed immediately when loading, compression doesn't help reduce its memory footprint. That part is a tradeoff in file size and bytes to load versus time to decompress. Filesystem caches tend to make the loading part fast enough that decompression time dominates.
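
To see both sides of that on a given installation, plain shell tools are enough (the install path is a placeholder):

    # space side: total .zo footprint under an install tree
    find /path/to/racket -name '*.zo' -print0 | du -ch --files0-from=- | tail -1

    # time side: run twice, so the second run measures a warm filesystem cache
    time racket -l racket/base -e '(void)'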

The --enable-compress and --enable-compressmore flags affect only how .zo files are created by default, and any build can use both compressed and uncompressed files. Also, there are environment variables like PLT_LINKLET_COMPRESS and PLT_LINKLET_COMPRESS_DATA that override the build-time default, although I see that the environment variables are not currently documented.
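
For example (with the caveat that, since they're undocumented, I'm assuming that just setting the variable is what overrides the default; main.rkt is a placeholder):

    # force compression of machine code when compiling:
    PLT_LINKLET_COMPRESS=yes raco make main.rkt

    # and likewise for the metadata:
    PLT_LINKLET_COMPRESS_DATA=yes raco make main.rkt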


There's one aspect of this I'm still not sure I understand: What makes compression pay off for machine code in .zo files that isn't true of ordinary .so/.dylib/.dll files?

My vague impression has been that system dynamic linkers/loaders work differently than Racket CS's, and some difference—this is where things get especially handwavy, but something to do with garbage collection or the distinction between loading and instantiation or something—means that Racket CS has to do more work than mmap'ing the file.

Aside from not being sure if that's right, I also wonder if part of the answer is that compression could in fact be useful for ordinary ELF etc., too. I've used filesystem-level compression with ZFS and BTRFS, and, on a recent Debian installation, for example, /usr/bin and /usr/lib compress down to about 40% of their uncompressed size.
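
(One crude way to approximate that ratio without a compressing filesystem is to pipe the tree through a compressor and compare byte counts; zstd here just because it's a common filesystem-compression choice:)

    # raw size:
    du -sb /usr/lib
    # compressed size:
    tar -cf - /usr/lib 2>/dev/null | zstd -q -c | wc -c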