Code compression options

A while ago, someone asked me this question about Chez code compression options when building Racket:

Would it be an option to instead turn off compression and keep doing
things as usual?

In theory, this should be possible. I see two significant downsides:

  1. Compiled code would be much larger—maybe twice as big—and, if I
    recall correctly, load times would be worse, too. With the move to
    Racket CS, existing Racket code moved from a world of small and
    cheap bytecode to a world of machine code: the default compression
    settings have been tuned to avoid an unacceptable worsening of
    binary size and load time.

Interesting. (I’m curious how load time can be improved by (1) reading files into memory instead of merely mmap’ing them, and (2) decompressing.)

I realized I don't really understand the details, either. Beyond some general recollection that performance tuning was done, I'm not even confident that my characterization of the tradeoffs was exactly right.

I remember some commits from around a year ago, e.g. "configure and makefiles: make code compression more configurable" (racket/racket@548aca0) and "cs: compress boot files by default on Windows" (racket/racket@4cf538f), but I didn't manage to track down the related discussion I vaguely recall. Also, I had thought there was some compression going on by default on Unix, but the message for 4cf538f suggests that there isn't.


Well, configure --help-cs says:

  --enable-compress       compress compiled code (enabled by default)
  --enable-compressmore   compress compiled code even more
  --enable-compressboot   compress boot files

There are two kinds of code that can be compressed: code in .zo files and code in .boot files. The content of .boot files tends to be embedded in the Racket executable.

Compression of code in .zo files is enabled by default everywhere (except for a few days last week when I had that wrong in the new build system). The --enable-compress flag is about .zo files. Edit: The --enable-compressmore flag is also about .zo files and covers the metadata that surrounds machine code.

Boot files that implement Racket CS ("petite.boot" + "scheme.boot" + "racket.boot") are compressed on Windows and not on other platforms. That is, the --enable-compressboot default varies.
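
For example, on a Unix build you could opt into the Windows-style boot-file default, or turn off .zo compression, along these lines (configure should accept the usual --disable-... spellings as the off switches for these --enable-... flags):

    # also compress boot files, as on Windows:
    ./configure --enable-compressboot

    # or build with .zo compression turned off:
    ./configure --disable-compress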


Thanks!

For .zo files, is it also a time-for-space tradeoff, or does loading .zo files involve, say, copying, such that mmap wouldn't help much anyway?

Can Racket CS use .zo files compiled by a Racket that was configured with a different option for --enable-compress and/or --enable-compressmore?

Compression is a clear win for machine code in .zo files due to the way that it's instantiated on demand. Some machine code that has not yet been demanded can stay compressed in memory.
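
As a conceptual sketch of that (just the shape of the idea, with gzip and a promise standing in for the real linklet machinery):

    #lang racket/base
    ;; Sketch only: keep bytes compressed in memory and inflate on
    ;; first demand. Racket CS doesn't literally use gzip promises.
    (require racket/promise file/gzip file/gunzip)

    (define (compress bs)              ; stand-in for compressed machine code
      (let ([out (open-output-bytes)])
        (gzip-through-ports (open-input-bytes bs) out #f 0)
        (get-output-bytes out)))

    (define (lazy-inflate compressed)  ; decompressed only when forced
      (delay
        (let ([out (open-output-bytes)])
          (gunzip-through-ports (open-input-bytes compressed) out)
          (get-output-bytes out))))

    (define code (lazy-inflate (compress #"...machine code...")))
    ;; Only (force code) pays the decompression cost; until then, just
    ;; the compressed bytes are held.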

Since metadata is needed immediately when loading, compression doesn't help reduce its memory footprint. That part is a tradeoff in file size and bytes to load versus time to decompress. Filesystem caches tend to make the loading part fast enough that decompression time dominates.
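
To see both sides of that on a given installation, plain shell tools are enough (the install path is a placeholder):

    # space side: total .zo footprint under an install tree
    find /path/to/racket -name '*.zo' -print0 | du -ch --files0-from=- | tail -1

    # time side: run twice, so the second run measures a warm filesystem cache
    time racket -l racket/base -e '(void)'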

The --enable-compress and --enable-compressmore flags affect only how .zo files are created by default, and any build can use both compressed and uncompressed files. Also, there are environment variables like PLT_LINKLET_COMPRESS and PLT_LINKLET_COMPRESS_DATA that override the build-time default, although I see that the environment variables are not currently documented.
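
For example (with the caveat that, since they're undocumented, I'm assuming that just setting the variable is what overrides the default; main.rkt is a placeholder):

    # force compression of machine code when compiling:
    PLT_LINKLET_COMPRESS=yes raco make main.rkt

    # and likewise for the metadata:
    PLT_LINKLET_COMPRESS_DATA=yes raco make main.rkt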


There's one aspect of this I'm still not sure I understand: What makes compression pay off for machine code in .zo files that isn't true of ordinary .so/.dylib/.dll files?

My vague impression has been that system dynamic linkers/loaders work differently than Racket CS's, and some difference—this is where things get especially handwavy, but something to do with garbage collection or the distinction between loading and instantiation or something—means that Racket CS has to do more work than mmap'ing the file.

Aside from not being sure if that's right, I also wonder if part of the answer is that compression could in fact be useful for ordinary ELF etc., too. I've used filesystem-level compression with ZFS and BTRFS, and, on a recent Debian installation, for example, /usr/bin and /usr/lib compress down to about 40% of their uncompressed size.
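
(One crude way to approximate that ratio without a compressing filesystem is to pipe the tree through a compressor and compare byte counts; zstd here just because it's a common filesystem-compression choice:)

    # raw size:
    du -sb /usr/lib
    # compressed size:
    tar -cf - /usr/lib 2>/dev/null | zstd -q -c | wc -c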