@samth: (1) thanks, done. (2), done. (3) Here's the commit message I'm condensing:
Matthew Flatt mflatt@racket-lang.org 2023-12-10T02:28:32Z
529f14dd3467ac4262878a6b5792d30b39950b43
set FD_CLOEXEC on all file descriptors and change `subprocess` handling

Intended to address #4836, this is a relatively small change to the
implementation, and I think it will work right for nearly all uses of
Racket. Still, changing the rules here might create a file-descriptor
leak in a program that uses foreign libraries or embeds Racket and
that also uses `subprocess`. Setting
`current-subprocess-keep-file-descriptors` to `null` (which is an
existing part of the Racket API) should restore the old behavior in
that case.

The root of the problem is that a `fork`+`exec` on Unix keeps all file
descriptors from the original process open in the new process. Racket
has long used a traditional correction of closing all file descriptors
in a forked process, specifically by trying `close` on every file
descriptor up to the maximum value. That worked fine when the
file-descriptor limit was something like 256 or 1024. These days, it
can be 1M, and 1M erroring `close` system calls can take a while. Many
OSes offer an enumeration of open file descriptors through a
"/proc/self/fd" or "/dev/fd" device that acts like a directory, but
accessing that information is not completely reliable (due to the
possibility of `chroot`, for example), and it's not so easy in the
phase between `fork` and `exec` that permits only async-signal-safe
functions.
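
To give a sense of the two approaches described above, here is a rough Python sketch (an illustration only, not code from the commit or from rktio). The function names are made up for this example. It compares how many `close` attempts the brute-force loop would make with the number of descriptors a directory listing reports, where such a directory exists.

```python
import os
import resource


def brute_force_close_attempts(first_fd=3):
    """How many close() calls a 'close everything' loop would attempt.

    Returns None when the soft limit is unlimited, since the loop then has
    no finite bound. Descriptors 0-2 are skipped by default.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return None
    return max(soft - first_fd, 0)


def open_fds_via_directory():
    """List open descriptors through /proc/self/fd or /dev/fd, if present.

    Returns None when neither directory can be read (for example, inside a
    chroot without /proc). The listing may include the descriptor that was
    temporarily opened to read the directory itself.
    """
    for directory in ("/proc/self/fd", "/dev/fd"):
        try:
            names = os.listdir(directory)
        except OSError:
            continue
        return sorted(int(name) for name in names if name.isdigit())
    return None


if __name__ == "__main__":
    print("close() attempts in the brute-force loop:", brute_force_close_attempts())
    print("descriptors reported by the directory:", open_fds_via_directory())
```

With a limit in the millions, the first number dwarfs the second, which is the cost the commit message is pointing at.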

A modern alternative is to set the FD_CLOEXEC flag on descriptors so
that they are automatically closed on the `exec` step, and leave
FD_CLOEXEC off on the file descriptors that you intend to be
communicated to the subprocess. This strategy works as long as
everyone plays along, because FD_CLOEXEC is not the default. Also, if
you have `fork` calls that might happen concurrently with
file-descriptor creation, the two-step process of creating a file
descriptor and then setting its FD_CLOEXEC flag is another problem.
Linux helps to avoid the race by having a variant of each
descriptor-creating system call with an extra flag to set FD_CLOEXEC,
but that's not portable.
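
The two-step versus atomic distinction can be sketched in Python as follows. This is an illustration under stated assumptions, not the rktio code: the helper names are hypothetical, and because Python already marks new descriptors close-on-exec (PEP 446), the two-step version first clears the flag to reproduce the plain POSIX default.

```python
import fcntl
import os


def has_cloexec(fd):
    """Report whether FD_CLOEXEC is currently set on fd."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)


def set_cloexec(fd):
    """Portable second step: add FD_CLOEXEC to the descriptor's flags."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)


def pipe_two_step():
    """Create a pipe, then set FD_CLOEXEC afterwards (racy with fork)."""
    r, w = os.pipe()
    # Clear Python's automatic close-on-exec to mimic the POSIX default.
    os.set_inheritable(r, True)
    os.set_inheritable(w, True)
    # A fork+exec in another thread at this point would inherit both ends.
    set_cloexec(r)
    set_cloexec(w)
    return r, w


def pipe_atomic():
    """Create a pipe with FD_CLOEXEC set in the same system call.

    Relies on pipe2, which is not available on every platform; returns
    None where it is missing.
    """
    if not hasattr(os, "pipe2"):
        return None
    return os.pipe2(os.O_CLOEXEC)
```

The comment inside `pipe_two_step` marks the window that the atomic variant closes.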

The change here adapts Racket at the rktio level to set FD_CLOEXEC on
all created file descriptors, but mostly through portable APIs (i.e.,
not Linux-specific) plus an extra lock to prevent a `subprocess`-based
`fork` concurrent to a file descriptor's creation. The extra lock
works because rktio controls the call to `fork`. Foreign libraries are
expected to play along as well as they can, and libraries like glib,
sqlite3, and Gtk+ do seem to work that way (i.e., they set FD_CLOEXEC,
and they tend to use the atomic interface where available).
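
The lock idea from the paragraph above might look roughly like this in Python. It is a sketch of the general pattern, not rktio's implementation: `fd_lock`, `open_cloexec`, and `spawn` are hypothetical names, and the explicit `fcntl` call is shown only for illustration because Python's `os.open` already sets the flag.

```python
import fcntl
import os
import subprocess
import sys
import threading

# One lock shared by descriptor creation and process creation (hypothetical).
fd_lock = threading.Lock()


def open_cloexec(path, flags):
    """Open a file and set FD_CLOEXEC while holding the lock.

    Holding the lock across both steps means spawn() cannot fork in the
    window between creating the descriptor and flagging it.
    """
    with fd_lock:
        fd = os.open(path, flags)
        current = fcntl.fcntl(fd, fcntl.F_GETFD)
        fcntl.fcntl(fd, fcntl.F_SETFD, current | fcntl.FD_CLOEXEC)
        return fd


def spawn(argv):
    """Start a child process while holding the same lock.

    close_fds=False leaves descriptor cleanup to FD_CLOEXEC rather than to
    Python's own closing of descriptors in the child.
    """
    with fd_lock:
        return subprocess.Popen(argv, close_fds=False)


if __name__ == "__main__":
    fd = open_cloexec(os.devnull, os.O_RDONLY)
    child = spawn([sys.executable, "-c", "pass"])
    print("child exit status:", child.wait())
    os.close(fd)
```

This only works when one component owns every call that forks, which matches the commit's remark that rktio controls the call to `fork`.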

Meanwhile, the old behavior for `subprocess` is still available by
setting the `current-subprocess-keep-file-descriptors` parameter to
`null` --- an API that was already in place, because Racket's behavior
on Windows was already closer to the FD_CLOEXEC world. Also, if
FD_CLOEXEC is not available from the operating system (as determined
at compile time), everything works as before.

There's no Racket-level or even rktio-level way here to create file
descriptors without FD_CLOEXEC when FD_CLOEXEC is supported, except
for pipes at the rktio level. Adding a way to do that would make
sense, if it one day seems useful. Even better, `subprocess` could
take a list of file descriptors to share, which seems more
declarative; the `null` value of
`current-subprocess-keep-file-descriptors` was meant to support an
extension in that direction.

The Windows implementation of `subprocess` was closer to right before,
mainly because non-sharing of file handles is the default there. The
way `subprocess` created inherited stdin, stdout, and stderr pipes
created a race among calls to `subprocess` in parallel places,
however, and that's fixed here.

My reading of this (and a quick read of the commit suggests that I could be right) is that it replaces a potentially very expensive attempt to close every possible file descriptor with an automatic close-on-exec flag wherever that flag is available. So the change should make subprocess creation in the new version of Racket cheaper than in the old one. I've rephrased this bullet to hopefully make that clearer. Here's my new proposed text:

- The "Die Macht der Abstraktion" language levels are no longer present,
  replaced by the "Schreibe dein Programm" language levels, which have
  been available for several years.
- The `for/fold` form has a more consistent treatment of the rare situation
  in which an iteration clause shadows an accumulator. This could break code
  that depended on the old behavior.
- Racket automatically sets the close-on-exec flag when opening a file, on
  systems where this is available. This change lowers the cost of avoiding
  problems that can occur when file descriptors become accidentally shared
  between processes.
- Match includes `hash` and `hash*` patterns.
- The `vector-set/copy` function allows creation of a new vector that differs
  at only one index. This change also adds `vector-append` and `vector-copy`
  primitives.
- The `pregexp-quote` function brings the functionality of `regexp-quote`
  to pregexps.
- The C FFI convention-based converter supports PascalCase and CamelCase in
  addition to an underscore-based convention.
- The `racket/case` library allows `case`-like forms that use different
  equality comparisons, such as `eq?` and `equal-always?`.
- Scribble rendering to HTML includes hidden buttons on heading titles.
- The `interval-map` data structure supports iterator functions in the style
  of `gen:ordered-dict`.