It’s an optimization gone wrong, I think.
vector-set!
in untyped Racket is “slow” because it needs to check that (1) the first argument is really a vector (2) the first argument is mutable and (3) the index is not out of bounds. To make the operation fast, we can switch to use either unsafe-vector-set!
or unsafe-vector*-set!
. The latter is the fastest, but doesn’t work on impersonated vectors. The former is slower than unsafe-vector*-set!
but faster than vector-set!
.
Typed Racket, in an attempt to make the code fast, attempts to use unsafe-vector*-set!
or unsafe-vector-set!
. But using these operations liberally could crash your programs if things go wrong. So Typed Racket needs to add some checks back manually, like checking that the index is not out of bounds.
It turns out that there are two issues:
-
Typed Racket doesn’t exploit mutability information. Even if it knows that the vector is definitely mutable through types, it will still add a check that the vector is mutable. This is the largest cost, but could also be eliminated easily.
-
But even if Typed Racket exploits mutability information, it turns out that vector-set!
is still faster than impersonator check + unsafe-vector*-set!
+ unsafe-vector*-length
.
The unsafe path really wins only when there are no impersonator checks and mutability checks at all. But since impersonator-ness is not tracked with types, I think it would be difficult (at least with the current Typed Racket) to make the unsafe path faster than the safe path.
Here are my benchmark programs:
The first program is an equivalent to a Typed Racket program that uses vector-set!
in a loop with Typed Racket optimization.
#lang racket/base
(require racket/unsafe/ops)
(define (benchmark-vector)
(define x (make-vector 1))
(define start 0)
(define end 100000000)
(define inc 1)
(define (for-loop pos)
(when (unsafe-fx< pos end)
(define i pos)
(define new-i 0)
(define new-v x)
(when (immutable? new-v)
(vector-set! new-v new-i i))
(cond
[(impersonator? new-v)
(if (unsafe-fx< new-i (unsafe-vector-length new-v))
(unsafe-vector-set! new-v new-i i)
(vector-set! new-v new-i i))]
[else (if (unsafe-fx< new-i (unsafe-vector*-length new-v))
(unsafe-vector*-set! new-v new-i i)
(vector-set! new-v new-i i))])
(for-loop (unsafe-fx+ pos inc))))
(for-loop start))
(time (benchmark-vector)) ; cpu time: 314 real time: 324 gc time: 0
The second program is like the first program, but without mutability check. This is what Typed Racket can and ideally should do under the unsafe path.
#lang racket/base
(require racket/unsafe/ops)
(define (benchmark-vector)
(define x (make-vector 1))
(define start 0)
(define end 100000000)
(define inc 1)
(define (for-loop pos)
(when (unsafe-fx< pos end)
(define i pos)
(define new-i 0)
(define new-v x)
(cond
[(impersonator? new-v)
(if (unsafe-fx< new-i (unsafe-vector-length new-v))
(unsafe-vector-set! new-v new-i i)
(vector-set! new-v new-i i))]
[else (if (unsafe-fx< new-i (unsafe-vector*-length new-v))
(unsafe-vector*-set! new-v new-i i)
(vector-set! new-v new-i i))])
(for-loop (unsafe-fx+ pos inc))))
(for-loop start))
(time (benchmark-vector)) ; cpu time: 89 real time: 100 gc time: 0
The third program is like the second program, but without impersonator check. This is the fastest. Typed Racket might be able to target this code if the type keeps track of impersonator information.
#lang racket/base
(require racket/unsafe/ops)
(define (benchmark-vector)
(define x (make-vector 1))
(define start 0)
(define end 100000000)
(define inc 1)
(define (for-loop pos)
(when (unsafe-fx< pos end)
(define i pos)
(define new-i 0)
(define new-v x)
(if (unsafe-fx< new-i (unsafe-vector*-length new-v))
(unsafe-vector*-set! new-v new-i i)
(vector-set! new-v new-i i))
(for-loop (unsafe-fx+ pos inc))))
(for-loop start))
(time (benchmark-vector)) ; cpu time: 64 real time: 74 gc time: 0
The fourth program is an equivalent to a Typed Racket program that uses vector-set!
in a loop without Typed Racket optimization.
#lang racket/base
(require racket/unsafe/ops)
(define (benchmark-vector)
(define x (make-vector 1))
(define start 0)
(define end 100000000)
(define inc 1)
(define (for-loop pos)
(when (unsafe-fx< pos end)
(define i pos)
(define new-i 0)
(define new-v x)
(vector-set! new-v new-i i)
(for-loop (unsafe-fx+ pos inc))))
(for-loop start))
(time (benchmark-vector)) ; cpu time: 81 real time: 92 gc time: 0
Again, considering that the second program is slower than the fourth program, I think it might make sense to not bother optimizing vector operations.