Chez for architectures without native backends

In light of this enhancement in Racket 8.5:

I'm trying to understand how to adapt Guix's package of Racket's Chez Scheme variant to support e.g. powerpc64le-unknown-linux-gnu, which is one of Guix's supported systems. (I think the same would apply for mips64el-unknown-linux-gnu, riscv64-unknown-linux-gnu, and i586-unknown-gnu, i.e. the Hurd.)

Currently, Guix bootstraps Racket's Chez Scheme by building Racket BC [3M], running racket rktboot/main.rkt to generate bootfiles for the inferred current system, and then doing essentially ./configure && make && make install. We pass flags like --prefix= and --threads, but currently we do not explicitly specify the machine type. The Guix package definition also does not currently deal with cross-compilation, though we'd like it to. Elsewhere, we have functions to convert from our ${arch}-${os} representation of system types (inherited from Nix, e.g. x86_64-linux) to Chez machine types and to report on the state of upstream Chez support for various systems (e.g. threading, whether bootfiles are checked in), but they handle some edge cases poorly and need more work.
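
To make the conversion concrete, here is a minimal shell sketch of that kind of function. The name chez_machine_for_system is hypothetical (Guix's real helpers are written in Guile Scheme and are not shown here), threading by default is an assumption, and the table covers only a handful of machine types that upstream Chez Scheme defines. Note that the "le" suffix in these names stands for Linux, not little-endian.

```shell
#!/bin/sh
# Hypothetical helper: map a Nix-style ${arch}-${os} system string to a
# Chez Scheme machine type. Prints nothing and returns 1 when the table
# has no native backend for the system.
chez_machine_for_system() {
    system=$1
    threaded=${2:-yes}   # assumption: threaded unless told otherwise
    case $system in
        x86_64-linux)  m=a6le ;;
        i686-linux)    m=i3le ;;
        aarch64-linux) m=arm64le ;;
        armhf-linux)   m=arm32le ;;
        powerpc-linux) m=ppc32le ;;
        *) return 1 ;;   # no native backend known to this sketch
    esac
    if [ "$threaded" = yes ]; then
        m=t$m            # threaded variants carry a "t" prefix
    fi
    printf '%s\n' "$m"
}
```

With this table, chez_machine_for_system x86_64-linux prints ta6le, while chez_machine_for_system powerpc64le-linux prints nothing and fails, which is exactly the gap the question below is about.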

Is that approach supposed to work for platforms without native code generation?

From my reading of the Chez configure script, I thought supplying --pb (or maybe we should use --pbarch?) still required a Chez machine type to be either inferred or supplied via -m=, and I don't know how to translate these architectures to machine types.

Yes, the more I look at it, it's clear that the claim to support other platforms was premature. Racket CS can now work in principle, but some pieces need to be filled in.

For a start, the Chez Scheme configure script shouldn't require a machine to go with --pb. You can get past that obstacle by providing any machine type with -m=, even using a made-up machine name. A good generic choice might be a pbarch name like tpb64le. Still, there's no longer a reason to require a machine type.

For the corners where pb mode needs machine-specific configuration, the intent is that "version.h" detects the machine as needed. For example, "version.h" recognizes ppc64 to enable big-endian mode in the runtime system... but that's not right for ppc64le. No doubt other things need to be fixed or added in "version.h".
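
One way to avoid keying endianness off the architecture name alone is to ask the compiler. The following is not what "version.h" does; it is only a shell sketch, assuming a GCC-compatible compiler that predefines __BYTE_ORDER__, and byte_order_from_macros is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read a `cc -dM -E` macro dump on stdin and print
# "little", "big", or "unknown" based on the __BYTE_ORDER__ definition.
byte_order_from_macros() {
    order=unknown
    while read -r directive name value; do
        if [ "$directive" = "#define" ] && [ "$name" = "__BYTE_ORDER__" ]; then
            case $value in
                __ORDER_LITTLE_ENDIAN__) order=little ;;
                __ORDER_BIG_ENDIAN__)    order=big ;;
            esac
        fi
    done
    printf '%s\n' "$order"
}

# Usage, assuming a cross-compiler such as powerpc64le-linux-gnu-gcc is on PATH:
#   powerpc64le-linux-gnu-gcc -dM -E - </dev/null | byte_order_from_macros
```

The point of the sketch is that ppc64 and ppc64le share an architecture family but differ in byte order, so a check on the architecture name alone cannot tell them apart.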

Meanwhile, Racket's configure script looks for specific architectures to pick a pbarch machine type, and it also still requires a non-pb machine type. We can fill in more cases, but there should also be a way to specify a pbarch variant directly.

Probably the way forward is to fix configure scripts to not require a non-pb machine type, and then see what happens when building for different platforms.

I gave this a try, cross-compiling from x86_64-linux-gnu to powerpc64le-linux-gnu, but configure did too much error checking:

starting phase `configure'
source directory: "/tmp/guix-build-pb-chez-" (relative from build: ".")
build directory: "/tmp/guix-build-pb-chez-"
configure flags: ("-m=tpb64le" "--pb" "--disable-x11" "--threads" "--installprefix=/gnu/store/24iibwyyjy6l0an61rm063cszpglaxwz-pb-chez-" "--threads" "ZLIB=-lz" "LZ4=-llz4" "--libkernel" "--nogzip-man-pages")
Don't select pb using -m or --machine, because pb needs the
 machine as the kernel host machine. Instead, use --pb or --pbarch
 to select a pb (portable bytecode) build.
error: in phase 'configure': uncaught exception:
%exception #<&invoke-error program: "./configure" arguments: ("-m=tpb64le" "--pb" "--disable-x11" "--threads" "--installprefix=/gnu/store/24iibwyyjy6l0an61rm063cszpglaxwz-pb-chez-" "--threads" "ZLIB=-lz" "LZ4=-llz4" "--libkernel" "--nogzip-man-pages") exit-status: 1 term-signal: #f stop-signal: #f> 
phase `configure' failed after 0.0 seconds
command "./configure" "-m=tpb64le" "--pb" "--disable-x11" "--threads" "--installprefix=/gnu/store/24iibwyyjy6l0an61rm063cszpglaxwz-pb-chez-" "--threads" "ZLIB=-lz" "LZ4=-llz4" "--libkernel" "--nogzip-man-pages" failed with status 1

Using -m=ignored caused configure to complain that it wasn't a recognized machine type. I also tried -m=ta6osx as a valid but irrelevant machine type, which got a bit further, but ultimately failed with this error:

powerpc64le-linux-gnu-gcc  -m64 -O2 -Wpointer-arith -Wall -Wextra -Wno-implicit-fallthrough -c  -DPORTABLE_BYTECODE -I../boot/pb    pb.c
powerpc64le-linux-gnu-gcc  -m64 -O2 -Wpointer-arith -Wall -Wextra -Wno-implicit-fallthrough -c  -DPORTABLE_BYTECODE -I../boot/pb    main.c
cp -p main.o ../boot/pb/main.o
powerpc64le-linux-gnu-ar rc ../boot/pb/libkernel.a statics.o segment.o alloc.o symbol.o intern.o gcwrapper.o gc-011.o gc-par.o gc-ocd.o gc-oce.o number.o schsig.o io.o new-io.o print.o fasl.o vfasl.o stats.o foreign.o prim.o prim5.o flushcache.o schlib.o thread.o expeditor.o scheme.o compress-io.o random.o ffi.o pb.o 
powerpc64le-linux-gnu-gcc  -m64 -O2 -Wpointer-arith -Wall -Wextra -Wno-implicit-fallthrough  -o ../bin/pb/scheme ../boot/pb/main.o ../boot/pb/libkernel.a -lz -llz4  -liconv -lm -lncurses
powerpc64le-linux-gnu-ld: cannot find -liconv
collect2: error: ld returned 1 exit status
make[2]: *** [Makefile:19: ../bin/pb/scheme] Error 1
make[1]: *** [Makefile:21: build] Error 2
make: *** [Makefile:20: build] Error 2
error: in phase 'build': uncaught exception:
%exception #<&invoke-error program: "make" arguments: ("-j" "16") exit-status: 2 term-signal: #f stop-signal: #f> 
phase `build' failed after 4.2 seconds
command "make" "-j" "16" failed with status 2
note: keeping build directory `/tmp/guix-build-pb-chez-'
builder for `/gnu/store/3phmm6jsmw6bchxlb4pcsw1kycw42sz7-pb-chez-' failed with exit code 1
build of /gnu/store/3phmm6jsmw6bchxlb4pcsw1kycw42sz7-pb-chez- failed
View build log at '/var/log/guix/drvs/3p/hmm6jsmw6bchxlb4pcsw1kycw42sz7-pb-chez-'.
guix build: error: build of `/gnu/store/3phmm6jsmw6bchxlb4pcsw1kycw42sz7-pb-chez-' failed

which could well be a symptom of some broader problem with cross-compilation (which I have not dealt with before even for fully-supported architectures), but it also made me question whether the -m=ta6osx is influencing things: do I recall correctly that -liconv is needed on Mac but not with glibc?

I should have reported back yesterday that I tried some of these things and ran into similar trouble. The configure script in Git HEAD now omits the check that rejected -m=tpb64le (but I miswrote, and it should be tpb64l without the e). You could try applying a patch to configure to see if it lets you get further.
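
The corrected name follows a regular pattern, which a small shell sketch can make explicit. This assumes pbarch names are composed as an optional "t" (threaded), then "pb", the word size, and "l" or "b" for endianness; pbarch_name is a hypothetical helper, not part of the configure script.

```shell
#!/bin/sh
# Hypothetical helper: compose a pbarch machine name from word size,
# endianness, and threading. Returns 1 on unrecognized input.
pbarch_name() {
    bits=$1      # 32 or 64
    endian=$2    # little or big
    threaded=$3  # yes or no
    case $bits in
        32|64) ;;
        *) return 1 ;;
    esac
    case $endian in
        little) e=l ;;
        big)    e=b ;;
        *) return 1 ;;
    esac
    prefix=
    if [ "$threaded" = yes ]; then
        prefix=t
    fi
    printf '%spb%s%s\n' "$prefix" "$bits" "$e"
}
```

For a threaded 64-bit little-endian target such as powerpc64le, pbarch_name 64 little yes prints tpb64l, matching the corrected spelling.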

Thanks! I've rebased my Guix branch for Zuo on top of the updates for 8.5, so I think I will try it there—and probably also try to get cross-compilation working more generally, at least at the VM level—and then see if it makes sense for Guix to add a patch to 8.5 or just wait for 8.6.

Meanwhile, building Chez out of source did not turn out to be enough to fix the test failures. I'll try disabling parallel tests next.

@mflatt if you need access to an unsupported machine, the GCC Compile Farm has a few big machines, like a big Sparc64 with Debian. You can request an account there. Despite the name, they're open to any FLOSS project.


Thanks for that suggestion! I was granted an account, and so far I've used it to repair the build for non-threaded ppc64le (just needed some configure refinements) and sparc64 (deep alignment problems there). I haven't yet pushed the changes for those, but soon.