Are there plans for improving the performance of the code generated by the Racket compiler?

Hello,

Hope all is well.

I am just curious whether the Racket core team has plans to make the code generated by the Racket compiler faster. In a recent benchmark, Racket vs Lisp SBCL - Which programs are fastest?, Racket is faster than Lisp (SBCL) in three of the benchmarks.

I am aware that benchmarks do not mean everything, but I'm just curious whether Racket will be faster in the future.

Thanks.

1 Like

I just took a quick look at the SBCL benchmarks; it looks like SBCL supports compiler directives that disable safety checks. While I think it would in principle be possible to add unsafe uses of the ffi library to simulate something like this, I'm not sure this kind of race to the bottom is necessarily a good use of anyone's time. Just my two cents, of course.

1 Like

Racket also supports compiler directives to disable safety checks, although they may disable fewer checks than SBCL.
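To make this concrete, here is a minimal sketch (my own example, not from the thread) of what opting out of safety checks looks like in Racket. `racket/unsafe/ops` provides unchecked variants of common primitives, and `(#%declare #:unsafe)` — mentioned later in this thread — can switch a whole module into unsafe mode:

```racket
#lang racket/base
;; Sketch: unchecked fixnum arithmetic via racket/unsafe/ops.
;; The unsafe variants skip argument type checks; passing a
;; non-fixnum (or overflowing) is undefined behavior, so the
;; caller takes on the proof obligation the checks normally cover.
(require racket/fixnum
         racket/unsafe/ops)

;; Checked version: fx+ verifies its arguments on every call.
(define (sum-checked n)
  (for/fold ([acc 0]) ([i (in-range n)])
    (fx+ acc i)))

;; Unchecked version: same loop, but no per-operation checks.
(define (sum-unsafe n)
  (for/fold ([acc 0]) ([i (in-range n)])
    (unsafe-fx+ acc i)))

(sum-unsafe 1000) ; → 499500
```

Whether this closes the gap on a given benchmark depends on how much of its time is actually spent in safety checks.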

2 Likes

Since the link in the original post no longer works, here are the two implementations (Racket and SBCL):

SBCL is 3 to 10 times faster than Racket, with a few exceptions. Memory usage in Racket is also much higher. However, it is unclear what could be optimized by proper compiler settings.

That may not play a big role for recreational or academic purposes, but - IMHO - it severely hinders adoption of Racket in professional situations.

In forums I often read "but it's faster than Python!", which is hopefully the case. But taking the slowest mainstream language on the planet as the point of comparison should not be considered meaningful.

Racket is the best of any Lisp I know with regard to tooling, documentation and learning material, stdlib, and ecosystem. I think performance improvements should be a critical goal if wider adoption is an intended target.

4 Likes

@louis771

Benchmarks are difficult. They only make sense when apples are compared to apples.
Some of the language benchmarks are not exactly equivalent.

So instead of a general "be faster" comment, pick a single benchmark that measures a specific feature. Maybe the benchmark code can be improved - or maybe we find something that can be improved in Racket itself.

PS: I do not agree with the premise that speed alone "hinders adoption of Racket in professional situations". Look at Python: it solves the problem by writing speed-critical code in C. A lot of "Python" libraries are just C libraries in disguise.
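Racket can play the same trick through its C FFI. Here is a hedged sketch (my own example; the binding name `c-cos` is mine) of calling the C math library's `cos` directly. `(ffi-lib #f)` looks up symbols already loaded into the running process, which typically includes libm on Linux and macOS:

```racket
#lang racket/base
;; Sketch: binding a C function from Racket, analogous to how
;; Python libraries offload speed-critical work to C.
(require ffi/unsafe
         ffi/unsafe/define)

;; Search the symbols of the current process (libm is usually
;; already linked in on Unix-like systems).
(define-ffi-definer define-libm (ffi-lib #f))

;; Bind cos(3) under the Racket name c-cos to avoid shadowing
;; Racket's own cos.
(define-libm c-cos (_fun _double -> _double) #:c-id cos)

(c-cos 0.0) ; → 1.0
```

Of course, as with Python, the real cost is not the binding itself but writing and maintaining the C side.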

4 Likes

I agree. I haven't checked these benchmarks apart from some quick looks, but in the past I noticed that benchmarks in the Benchmarks Game sometimes use very unidiomatic code. I believe the benchmark results often depend more on how much time someone invested in optimization than on which language implementations are faster for idiomatic code, or can be tuned relatively easily. I suspect the Benchmarks Game benchmarks are mentioned so often only because other benchmarks are harder to find (availability bias).

On the other hand, the Python community is so much larger that these bindings and other extensions actually get written. So in practice the "out of the box" speed will matter for Racket much more than for Python.

Apart from that, sometimes it's very difficult to find a subset of code to be written in low-level languages that will make the code overall significantly faster. If that's the case, "native" language implementation speed will matter more than speed including low-level code. That applies to both Racket and Python, but Python is more widespread to begin with, so Racket needs to provide more to be similarly attractive.

1 Like

I'm ashamed, but it took me a few years to realize that OpenCV was not written in Python.

(Does Racket have an OpenCV wrapper package? Is there a nice blog post that uses it for something nice?)

1 Like

There are two older projects on GitHub:

Repository search results · GitHub

Alternatively, you could try using the Python bindings for OpenCV through Pyffi (if you are on macOS or Linux).

pyffi - Use Python from Racket

1 Like

In general, the better performance that you see in SBCL on those benchmarks is mostly a result of the following:

  1. Better support for lightweight parallelism for CPU-intensive tasks (as compared to Racket futures).
  2. Better use of vector instructions (e.g., AVX), which Racket mostly does not generate.
  3. More ability to express extremely low-level code, as you see here.

Obviously, improving all of those (especially the first one) would be nice, but they mostly aren't in the way of the kinds of applications that people usually build with Racket. Racket certainly has performance issues that would be good to fix, but I don't think the Benchmarks Game is a good guide to them.

1 Like

My knowledge about parallelism specifically, and low-level programming in general, could be painlessly engraved on my eyeball, so this is probably a dumb and/or obvious question, but I was wondering about it. Feel free to say "it's complicated, don't worry about it" or "this is how it works everywhere in all Lisps" and I'll drop it.

As I understand it, Racket's threads (A) are managed by Racket itself instead of going to the operating system's thread scheduler and (B) all run on the same CPU core. Do I have that right? If so, why?

1 Like

Racket has three related constructs for managing independent tasks: threads, futures, and places. Threads work as you say. Futures are also managed by Racket but can be run on different OS threads. Places are effectively separate copies of the VM, which always run in separate OS threads.
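For readers new to these constructs, the first two can be sketched in a few lines (my own illustrative example; places are omitted because a place body must live in its own module, typically spawned with `dynamic-place`):

```racket
#lang racket/base
;; Sketch: Racket's lighter two task constructs.
(require racket/future)

;; Threads: green threads scheduled by the Racket runtime itself.
;; Concurrent, but not run in parallel on multiple cores.
(define t (thread (lambda () (displayln "from a thread"))))
(thread-wait t)

;; Futures: also managed by Racket, but the body may run on a
;; separate OS thread in parallel, as long as it sticks to
;; future-safe operations. touch blocks until the result is ready.
(define f (future (lambda () (+ 1 2))))
(touch f) ; → 3
```

Places, the heaviest option, each get their own copy of the runtime and communicate over place channels rather than shared state.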

2 Likes

I believe a short summary would be “threads are units of concurrency; places are units of parallelism” (and futures are murky?), but that shorthand works best with an idea of concurrency vs. parallelism. (Practically, I’m also using Sam’s description in my head.)

1 Like

FWIW, Jim Bender curated a bibliography of Scheme-related papers on "Distributed, Parallel, and Concurrent Programming".

Implicit in Sam's listing is that threads are cheaper than futures, and futures are cheaper than places.

1 Like

I thought that places ran in a separate process, not a separate thread. Is that wrong?

Separately: based on what I've read about futures, it feels like they basically aren't worth using, because they overly restrict the operations you can use and it's hard to predict whether you will invalidate a future. That's the impression the docs give, anyway.

There is a mode where places will run in separate processes, but in most environments they are in the same OS process, I believe.

1 Like

Futures are certainly somewhat limited but I wouldn't say they're not worth using; it just requires somewhat more care to use them well.

@benknoble The way I would put it is that threads guarantee concurrency and provide no parallelism; futures provide opportunistic concurrency and parallelism, but you cannot rely on either; and places guarantee both.
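To make the "opportunistic" part concrete, here is a sketch (my own example, under the common assumption that pure flonum arithmetic is future-safe) of splitting a numeric loop across two futures. If the bodies stay within future-safe operations, the two halves can genuinely run on separate cores; if not, the result is still correct, just computed without parallelism:

```racket
#lang racket/base
;; Sketch: opportunistic parallelism with futures over flonum work.
(require racket/future
         racket/flonum)

;; Sum the flonum values of the integers in [lo, hi).
(define (flsum lo hi)
  (for/fold ([acc 0.0]) ([i (in-range lo hi)])
    (fl+ acc (exact->inexact i))))

;; Split the range in half; each future may run on its own OS thread.
(define f1 (future (lambda () (flsum 0 500000))))
(define f2 (future (lambda () (flsum 500000 1000000))))

;; touch waits for each future and combines the partial sums.
(define total (+ (touch f1) (touch f2)))
```

Running with `PLT_COMPILED_FILE_CHECK` aside, the `future-visualizer` library is the usual way to check whether the futures actually ran in parallel or were blocked.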

I found Matthew's talk Incremental Parallelization of Dynamic Languages | Air Mozilla | Mozilla, in Video useful context for futures. Unfortunately, it looks like Air Mozilla moved to a new CMS and didn't import old videos, and the Internet Archive page I linked to doesn't seem to have archived the actual video: maybe someone can find another source?

With respect to discussions about the limitations of futures one might read in various places, a big change came with the move to Racket CS, and I think we are (certainly I am) still gaining experience with what is now possible.

Racket BC was originally single-threaded, and most operations ended up blocking futures from running in parallel. (This was explained well in the Mozilla talk. I particularly remember a picture of a bike covered with an extreme number of locks.) What worked in Racket BC was primarily carefully written numeric code.

In Racket CS, most primitives are now future-safe (by virtue of Chez Scheme's support for OS-level threads), basically the opposite of the Racket BC situation! I found the diff from guide: update discussion on futures for Racket CS · racket/racket@4fcecee · GitHub an interesting view into what changed.

That said, there is certainly also room for improvement, e.g. (as @samth wrote here) finding a way to make IO operations "synchronized" rather than "blocking".

Are you thinking of the --processes mode for e.g. raco setup? IIUC, places always use OS threads in the same OS process when they are able to run in parallel; that mode exists to enable process-level parallelism when parallel places aren't supported (as was the case on non-x86{,_64} platforms with Racket BC), but it has to be implemented explicitly in raco setup. I don't think ordinary places ever run in separate OS processes.

(I say "ordinary places" because loci by Paulo Matos and racket/place/distributed exist, and prop:place-location provides some extensibility.)

(Tangentially, I wonder if the bottleneck from the OS page table described in docs: describe some limits of place scaling · racket/racket@b223ce4 · GitHub and this thread also applies with Racket CS. I don't have any machines with more than 16 cores to check.)

I agree with @samth on this, but if someone wants to spend time making the benchmarks look faster, adding (#%declare #:unsafe) would probably give at least some benefit without requiring deep thought. (Unlike using (#%declare #:unsafe) in real life!)

5 Likes