Sham reminds me a little bit of Python's Numba. With Numba you just add small annotations to the original code. If you just want something easy to use to speed up a numeric computation function, I think that is an advantage over Sham.
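For comparison, this is roughly all Numba asks of you (a minimal sketch; the function and the dot-product computation are just my own example, not anything from the paper):

```python
# Minimal Numba sketch: the @njit decorator asks Numba to JIT-compile the
# function to machine code (via LLVM) on its first call; the body stays
# ordinary Python/NumPy code.
import numpy as np
from numba import njit

@njit
def dot(xs, ys):
    total = 0.0
    for i in range(xs.shape[0]):
        total += xs[i] * ys[i]
    return total

print(dot(np.arange(3.0), np.arange(3.0)))  # 5.0, compiled on first call
```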
Sham, on the other hand, seems to give you an actual low-level language in which you write your function, giving you more control. That may be a bit more work, but I think the additional choice and control over what you get as a result is Sham's upside.
Sham is a more general tool that can be used to implement many different DSLs, so it makes sense that it has to be more explicit. Sham could probably also be used to implement a Racket language that uses Sham where it can automatically deduce that compiling would be beneficial (or easy to do) and falls back to plain Racket otherwise; with such a language it could probably become as easy to use as Numba.
The `define-ast` form is very interesting and goes way beyond what Numba offers.
The paper builds an automata language, which reminded me of what I read on ripgrep's page:
Summarizing, ripgrep is fast because:
- It is built on top of Rust's regex engine. Rust's regex engine uses finite automata, SIMD and aggressive literal optimizations to make searching very fast. (PCRE2 support can be opted into with the `-P/--pcre2` flag.)
- Rust's regex library maintains performance with full Unicode support by building UTF-8 decoding directly into its deterministic finite automaton engine.
- It supports searching with either memory maps or by searching incrementally with an intermediate buffer. The former is better for single files and the latter is better for large directories. ripgrep chooses the best searching strategy for you automatically.
[...]
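Just to illustrate what the "finite automata" point buys you: a DFA matcher does one table lookup per input character with no backtracking, so matching is linear in the input length. Here is a toy hand-built sketch in Python for the pattern `ab*c` (my own illustration, not how ripgrep or the paper's automata language actually work):

```python
# Hand-built DFA for the pattern "ab*c": each input character triggers
# exactly one transition-table lookup, so whole-string matching runs in
# time linear in the input, with no backtracking.
DEAD = -1
TRANSITIONS = {
    0: {"a": 1},          # start: expect 'a'
    1: {"b": 1, "c": 2},  # after 'a': loop on 'b', finish on 'c'
    2: {},                # accept state: "ab*c" fully matched
}
ACCEPTING = {2}

def dfa_match(text: str) -> bool:
    """Return True if the whole string matches ab*c."""
    state = 0
    for ch in text:
        state = TRANSITIONS.get(state, {}).get(ch, DEAD)
        if state == DEAD:
            return False
    return state in ACCEPTING

print(dfa_match("abbbc"))  # True
print(dfa_match("ac"))     # True
print(dfa_match("abx"))    # False
```

Real engines build these tables from the regex automatically (and handle searching, Unicode, etc.), but the core matching loop has this shape.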
So using Sham to implement Racket's regexp could be an interesting research project; I wonder what the benchmarks would show. But I also don't know much about the current implementation.
Overall I only had a quick look at the paper, so I don't really know how everything works out in practice.
I think with Sham I would be more likely to use LLVM (indirectly).