I'm working on a NES emulator in Typed Racket and I've just gotten it complete enough to do some realistic performance testing. Racket 8.12 BC can run the emulator at 105 FPS, but the same code using 8.12 CS peaks at 45 FPS. (This is "headless" emulation speed, unaffected by racket/gui.)
The profiler showed me that it is the CPU emulation which is taking the majority of the time. This code really only does these few things:
Read and write RAM, using unsafe-bytes-set! and unsafe-bytes-ref
Fixnum arithmetic (e.g. unsafe-fx+ and unsafe-fxior)
Conditionals based on fixnum comparisons
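For anyone who wants to compare the two VMs without the full emulator, here is a minimal sketch of this kind of workload. It is not the emulator's code: the `run-loop` name, the 2 KiB buffer, and the particular mix of operations are assumptions made up for illustration, and it is written in untyped `racket/base` rather than Typed Racket.

```racket
#lang racket/base
;; Hypothetical micro-benchmark, not the emulator's actual code.
(require racket/unsafe/ops)

(define ram (make-bytes 2048 0))   ; 2 KiB scratch buffer standing in for RAM

;; Each iteration reads a byte, does fixnum arithmetic, branches on a
;; fixnum comparison, and writes a byte back.
(define (run-loop iterations)
  (let loop ([i 0] [acc 0])
    (if (unsafe-fx< i iterations)
        (let* ([addr (unsafe-fxand i 2047)]
               [v    (unsafe-bytes-ref ram addr)]
               [sum  (unsafe-fxand (unsafe-fx+ v (unsafe-fx+ acc i)) 255)]
               [out  (if (unsafe-fx> sum 127)
                         (unsafe-fxior sum 1)
                         sum)])
          (unsafe-bytes-set! ram addr out)   ; out is always in 0..255
          (loop (unsafe-fx+ i 1) out))
        acc)))

(module+ main
  ;; 'racket means BC, 'chez-scheme means CS
  (printf "VM: ~a\n" (system-type 'vm))
  (time (void (run-loop 100000000))))
```

Running the same file with the BC and the CS `racket` binaries and comparing the `time` output should show whether a gap of this size reproduces in isolation or only appears in the full emulator.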
Is it possible that CS is actually more than 2x slower than BC for this kind of workload? Are there any CS-specific performance pitfalls to be aware of? If BC is compiled and CS is interpreted it would seem to explain things, but I thought CS was compiled also.
I'll be happy to share the code once I do a little cleanup if anyone is interested. Thanks in advance!
I'm no expert, but what platform did you run the performance tests on? CS compiles to native machine code (generally speaking, at least) while BC compiles to bytecode (again, generally speaking).
From what I understand, CS's native code is not as compact as BC bytecode. Perhaps the BC JIT compiler is more efficient in this case. Someone who knows more than me will have to comment on that possibility though.
I'm wondering if there's an unfortunate interaction between Typed Racket, CS, and unsafe operations. How hard would it be to strip the types out, just to see what difference it makes?
I tested it from the command line like racket my-file.rkt. When you say "strip the types out", do you mean change the #lang to racket? I don't think that will be too difficult, I'll try it out soon.
John meant something different: whether Typed Racket (TR) and untyped Racket (R) interact in your program. If a program mixes R and TR, there are bad cases where the type-protection scheme imposes serious penalties (an order of magnitude). This is not the case with your program.
;; - - -
Is it possible that your installation of Racket/CS did not compile the libraries?
I found the Inspecting Compiler Passes documentation and was able to view the linklet and the machine code that CS generates. Nothing jumps out at me, but that's mostly because my emulate-one-instruction procedure is very large and hard to read. Maybe BC is better than CS at optimizing large procedures? In any case, with this tool in hand I think I should be able to refactor the code starting with smaller, simpler functions and verifying the machine code at each step.
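As a rough picture of what "smaller, simpler functions" could look like, here is a sketch that splits instruction execution into one small procedure per opcode plus a dispatch table. Everything in it is hypothetical: the `state` layout, the handler names, and the fallback to NOP for unassigned opcodes are assumptions for illustration, status flags are omitted, and it makes no claim about being faster than one large procedure. It only shows a shape whose generated code is easier to inspect piece by piece.

```racket
#lang racket/base
;; Hypothetical sketch: the state layout and handler names are illustrative,
;; not taken from the emulator discussed in this thread.
(require racket/fixnum racket/unsafe/ops)

(define ram (make-bytes 65536 0))   ; flat 64 KiB address space (simplified)
(define state (fxvector 0 0))       ; slot 0 = accumulator, slot 1 = program counter
(define A 0)
(define PC 1)

;; Fetch the byte at PC and advance PC, wrapping at 16 bits.
(define (fetch!)
  (define pc (unsafe-fxvector-ref state PC))
  (unsafe-fxvector-set! state PC (unsafe-fxand (unsafe-fx+ pc 1) #xFFFF))
  (unsafe-bytes-ref ram pc))

;; One small procedure per opcode (status flags omitted for brevity).
(define (op-lda-imm!)               ; $A9: LDA #imm
  (unsafe-fxvector-set! state A (fetch!)))
(define (op-ora-imm!)               ; $09: ORA #imm
  (unsafe-fxvector-set! state A
                        (unsafe-fxior (unsafe-fxvector-ref state A) (fetch!))))
(define (op-nop!) (void))           ; $EA: NOP

;; Dispatch table; unassigned opcodes fall back to NOP in this sketch only.
(define handlers (make-vector 256 op-nop!))
(vector-set! handlers #xA9 op-lda-imm!)
(vector-set! handlers #x09 op-ora-imm!)

(define (emulate-one-instruction!)
  ((unsafe-vector-ref handlers (fetch!))))

(module+ main
  ;; Program: LDA #$0F ; ORA #$F0
  (bytes-copy! ram 0 (bytes #xA9 #x0F #x09 #xF0))
  (emulate-one-instruction!)
  (emulate-one-instruction!)
  (printf "A = ~a\n" (unsafe-fxvector-ref state A)))   ; prints "A = 255"
```

With handlers this small, each one can be checked in the compiler-pass output on its own before moving on to the next.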
For example: do the packages you have installed have compiled directories? You could try running raco setup with your CS installation to make sure everything is compiled.
I do not know the details, but seeing "Maybe BC is better than CS at optimizing large procedures?" made this jump out of the depths of my memory:
"If you write the same program in Racket and in Chez, it will run at almost exactly the same speed, unless it has very very large functions that are nonetheless important to compile efficiently, in which case there is interpretation overhead."
But I guess, from what I read above, the long function was indeed compiled? Maybe you could try setting PLT_CS_COMPILE_LIMIT to something larger and see if it makes any difference.