Micro-Benchmark of Summing a Floating Point Vector
Take this with a grain of salt. All micro-benchmarks are fraught; it's easier to do wrong than to do correctly. Please correct anything I have done incorrectly or suggest a better approach, as long as it's not particularly intricate code: it does matter that good performance can be achieved with straight forward code. For this trivial example, it is in all the languages.
This benchmark compares summing a floating point array in the following languages:
Julia: Design around matrix arithmetic; relatively easy to implement strong typing (though not required)
Nim: a nice c/c++ substitute; requires typing every variable; static compilation
SBCL: a performant lisp which makes typing possible; REPL statements compiled before execution
Gambit: a scheme that compiles via c to object code; typing possible but optional
Racket: a scheme with extensive libraries; typing possible but optional; REPL statements compiled before execution; for reasons unclear to me the compiled executable was actually faster than REPL execution though the manual claims that shouldn't be expected
The code for each language appears after the benchmarks. Benchmarks were run on my 2021 MacBook Pro M1. Benchmark timing varies quite a bit depending on other process running on the machine, even if the benchmarks are run in quick succession--yet another reason for healthy dose of salt.
Language | Timing in seconds for 1000 executions |
---|---|
Julia | .07 (measured with internal BenchmarkTools) |
Nim | .04 (second run) |
Gambit compiled | .63 (second run) |
SBCL in REPL | 1.2 |
Racket in REPL | 1.9 |
Racket executable | .71 (second run) |
Comments
Racket performs very respectably compared to sbcl even in the REPL. The executable, despiting being > 60mb, performs very well compared to compiled Gambit--essentially a tie. Of course, it's not even fair to compare to Julia and Nim, both of which are strongly typed languges. Julia can also be dynamically typed, but it is usually easy to get "type stability" with Julia's excellent type inference.
I tried typed/racket but it did not like the do loop and I could not figure out how to introduce the requested type hints. I am fair perplexed by loop. I don't think I could put a do loop in a let block as the do loop introduces its own scope--so not clear it would accomplish anything. Eager to see how to use typed/racket correctly.
Look at the code below. Which language would you choose for numerical work?
Code:
Julia
function bench_vec_sum(len, fval)
v = fill(fval, len)
return sum(v)
end
Nim
# compile with nim c -d:danger --app:console --opt:speed --cc:clang --mm:orc benchmark_vector.nim
import std/algorithm # for fill
import os # for cmdline parameters
import std/strutils # to parse cmdline parameters
proc sum_vec_f64(vec: seq[float64]): float64 = # returns float64
var sum: float64 = 0.0
for x in vec: # iterator is faster than calculating index and referencing each slot
sum += x
return sum
proc vec_bench(vlen: int, fval: float64) =
var vec: seq[float64]
newseq(vec, vlen) # create and allocate memory for it
vec.fill(fval)
discard sum_vec_f64(vec) # ignore the return value
proc dobench(n: int, vlen: int, fval: float64) =
for i in 1 .. n:
vec_bench(vlen, fval)
# run it and ignore the return value
var fval: float64 # value for element of vector
var n: int # number of runs
var vlen: int # length of vector
if paramCount() == 3:
n = parseInt(paramStr(1))
vlen = parseInt(paramStr(2))
fval = parseFloat(paramStr(3))
dobench(n, vlen, fval)
else:
quit("Requires exactly 3 integer parameters. Exiting.")
Gambit
;; #!/usr/bin/env gsi-script -:d0
(declare (block)) ; mysterious incantation to enable inlining of procedures
; sum the vector
(define (f64vector-sum vec)
(let ((len (f64vector-length vec))
(sum 0.0))
(do ((i 0 (+ i 1)))
((= i len) sum)
(set! sum (+ sum (f64vector-ref vec i))))))
(define (f64vec-bench len fval)
(let ((vec (make-f64vector len fval)))
(f64vector-sum vec)))
(define (repeat n len fval)
(do ((i 0 (+ i 1)))
((= i n))
(f64vec-bench len fval)))
(repeat 1000 100000 0.5) ; do a sum 1000 times
sbcl
sbcl
; the following make no difference on execution time
; (declaim (optimize (speed 3) (debug 0) (safety 0)))
; (declaim (inline vec-bench)) ; 30 to 40% performance improvement
; (declaim (inline vector-sum))
; sum the vector
(defun f64vector-sum (vec)
(let ((len (length vec))
(sum 0.0d0))
(do ((i 0 (+ i 1)))
((= i len) sum)
(setf sum (+ sum (aref vec i))))))
(defun f64vec-bench (len fval)
(let ((vec (make-array len :element-type 'double-float :initial-element fval)))
(f64vector-sum vec)))
(defun repeat (n len fval)
(do ((i 0 (+ i 1)))
((= i n))
(f64vec-bench len fval)))
(repeat 1000 100000 0.5d0) ; create vector and sum 1000 times
racket
#lang racket
; (require ffi/vector) not being used
(require racket/flonum)
; sum the vector
(define (vector-sum vec)
(let ((len (flvector-length vec))
(sum 0.0))
(do ((i 0 (add1 i)))
((= i len) sum)
(set! sum (fl+ sum (flvector-ref vec i))))))
;(define (vector-sum-new vec) ; much slower
; (for/sum ((sum vec)) sum))
(define (vector-bench len fval)
(let ((vec (make-flvector len fval)))
(vector-sum vec)))
(define (repeat n len fval)
(do ((i 0 (add1 i)))
((= i n))
(vector-bench len fval)))
; use as follows:
(time (repeat 1000 100000 0.5)) ; create vector 1000 times and sum