Why does my code run significantly faster on BC than CS?

default.kramer · February 14, 2024, 11:40pm

I'm working on a NES emulator in Typed Racket and I've just gotten it complete enough to do some realistic performance testing. Racket 8.12 BC can run the emulator at 105 FPS, but the same code using 8.12 CS peaks at 45 FPS. (This is "headless" emulation speed, unaffected by racket/gui.)

The profiler showed me that it is the CPU emulation which is taking the majority of the time. This code really only does these few things:

Read and write RAM, using unsafe-bytes-set! and unsafe-bytes-ref
Fixnum arithmetic (e.g. unsafe-fx+ and unsafe-fxior)
Conditionals based on fixnum comparisons
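
For readers unfamiliar with this style of code, here is a minimal sketch of the kind of hot path described above. It is not the poster's actual code: `ram`, `add-to-memory!`, and `carry-flag` are hypothetical names, and the 64 KiB byte string is an assumed layout.

```racket
#lang racket/base
(require racket/unsafe/ops)

;; Hypothetical 64 KiB address space (an assumption, not the emulator's real layout).
(define ram (make-bytes #x10000 0))

;; Add the byte stored at `addr` to the accumulator value `a`, keep only the
;; low 8 bits, write that back to `addr`, and return it.
(define (add-to-memory! a addr)
  (define sum (unsafe-fx+ a (unsafe-bytes-ref ram addr)))
  (define result (unsafe-fxand sum #xFF))
  (unsafe-bytes-set! ram addr result)
  result)

;; Return 1 if adding two 8-bit values overflows 8 bits, else 0.
(define (carry-flag a operand)
  (if (unsafe-fx> (unsafe-fx+ a operand) #xFF) 1 0))
```

The unsafe operations skip type and bounds checks, so the caller has to guarantee that every argument is a fixnum and every address is in range.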

Is it possible that CS is actually more than 2x slower than BC for this kind of workload? Are there any CS-specific performance pitfalls to be aware of? If BC is compiled and CS is interpreted it would seem to explain things, but I thought CS was compiled also.

I'll be happy to share the code once I do a little cleanup if anyone is interested. Thanks in advance!

jjsimpso · February 15, 2024, 4:11am

I'm no expert, but what platform did you run the performance tests on? CS compiles to native machine code (generally speaking, at least) while BC compiles to bytecode (again, generally speaking).

From what I understand, CS's native code is not as compact as BC bytecode. Perhaps the BC JIT compiler is more efficient in this case. Someone who knows more than me will have to comment on that possibility though.

default.kramer · February 15, 2024, 5:59am

I ran it on Windows 10, x64 (Intel Core i7). Thanks for confirming that CS does compile to native machine code.

jbclements · February 15, 2024, 6:00am

I'm wondering if there's an unfortunate interaction between Typed Racket, CS, and unsafe operations. How hard would it be to strip the types out, just to see what difference it makes?

jbclements · February 15, 2024, 6:01am

Another random check: are you running this code using DrRacket, or by running it at the command-line?

default.kramer · February 15, 2024, 6:05am

I tested it from the command line like racket my-file.rkt. When you say "strip the types out", do you mean change the #lang to racket? I don't think that will be too difficult, I'll try it out soon.

jbclements · February 15, 2024, 6:08am

Yes, that's what I mean.

LiberalArtist · February 15, 2024, 4:26pm

Are you using the FFI at all? There are some ways of writing FFI code that would end up making copies of byte strings on CS, but not copying on BC.

default.kramer · February 15, 2024, 10:38pm

Updated times:

Typed, CS: 45 FPS (within 1 FPS every run)
Untyped, CS: 50 FPS (within 1 FPS every run)
Typed, BC: 124-130 FPS
Untyped, BC: 122-128 FPS

So it would seem that Typed Racket + CS is causing a bit of slowdown, but nothing major.

default.kramer · February 15, 2024, 10:39pm

No, I'm not using FFI yet. But that's good to know as I expect I will need FFI if/when I try to implement the audio output.

EmEf · February 16, 2024, 1:36am

John meant something different: whether Typed Racket (TR) and untyped Racket (R) interact in your program. If a program mixes R and TR, there are bad cases where the type-protection scheme imposes serious penalties (an order of magnitude). This is not the case with your program.

;; - - -

Is it possible that your installation of Racket/CS did not compile the libraries?

default.kramer · February 16, 2024, 7:57pm

How could I check whether the libraries were compiled or not?

default.kramer · February 16, 2024, 8:06pm

I found the Inspecting Compiler Passes documentation and was able to view the linklet and the machine code that CS generates. Nothing jumps out at me, but that's mostly because my emulate-one-instruction procedure is very large and hard to read. Maybe BC is better than CS at optimizing large procedures? In any case, with this tool in hand I think I should be able to refactor the code starting with smaller, simpler functions and verifying the machine code at each step.

jjsimpso · February 16, 2024, 8:25pm

Please share your results if you are able to improve the CS performance (or even if not).

benknoble · February 16, 2024, 9:17pm

For example: do the packages you have installed have compiled directories? You could try running raco setup with your CS installation to make sure everything is compiled.

soegaard · February 17, 2024, 1:03am

Whether the files are compiled or not - that will only affect the startup time.
Here the issue is that the number of fps dropped.

EmEf · February 17, 2024, 1:40am

It depends on how the per-second figure is measured, for example whether it includes start-up time and the runs are short.
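
One way to rule out start-up effects is to time only the emulation loop. A rough sketch, where `measure-fps` and the frame thunk are hypothetical stand-ins for whatever the emulator actually exposes:

```racket
#lang racket/base

;; Call `frame-thunk` `n` times and return frames per second.
;; Only the loop is timed, so module loading and compilation are excluded.
(define (measure-fps frame-thunk n)
  (define start (current-inexact-milliseconds))
  (for ([i (in-range n)])
    (frame-thunk))
  (define elapsed-ms (- (current-inexact-milliseconds) start))
  ;; Frames divided by elapsed seconds; the 1000.0 keeps this in floating point.
  (/ (* n 1000.0) elapsed-ms))
```

Running a few thousand frames per measurement should make any remaining one-time costs negligible.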

mfandl · February 17, 2024, 10:34am

I do not know the details, but seeing "Maybe BC is better than CS at optimizing large procedures?" made this jump out of the depths of my memory:

If you write the same program in Racket and in Chez, it will run at almost exactly the same speed, unless it has very very large functions that are nonetheless important to compile efficiently, in which case there is interpretation overhead

discord post by samth

But I guess from what I read above that the long function was indeed compiled? Maybe you could try setting PLT_CS_COMPILE_LIMIT to something larger and see if it makes any difference.

See 18.7 Controlling and Inspecting Compilation
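
A sketch of one way to try different limits without depending on shell-specific syntax: launch the program in a child `racket` process with the variable set. The helper names `env-with-compile-limit` and `run-with-compile-limit` are hypothetical, and `my-file.rkt` is the placeholder file name used earlier in the thread. As far as I can tell from the documentation, the default limit is 10000.

```racket
#lang racket/base
(require racket/system compiler/find-exe)

;; Copy the current environment and set PLT_CS_COMPILE_LIMIT to `limit` in the copy.
(define (env-with-compile-limit limit)
  (define env (environment-variables-copy (current-environment-variables)))
  (environment-variables-set! env
                              #"PLT_CS_COMPILE_LIMIT"
                              (string->bytes/utf-8 (number->string limit)))
  env)

;; Run `file` in a fresh racket process that sees the modified environment.
;; Returns #t if the child exits successfully, #f otherwise.
(define (run-with-compile-limit limit file)
  (parameterize ([current-environment-variables (env-with-compile-limit limit)])
    (system* (find-exe) file)))
```

For example, `(run-with-compile-limit 20000 "my-file.rkt")` should be equivalent to setting the variable in the shell before running `racket my-file.rkt`. My understanding is that the limit is applied when the code is compiled, so a module that already has a compiled .zo file would probably need to be recompiled with the variable set before it makes a difference.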

default.kramer · February 17, 2024, 5:54pm

Setting PLT_CS_COMPILE_LIMIT did it! I bumped it up to 20000 and now it runs very fast. Thanks everyone! I will make the code public pretty soon.
