Why does Typed Racket run faster when `#:no-optimize`?

neteroster · November 3, 2023, 1:55am

Hello everyone, I am a beginner who has just started learning Racket. I am conducting some experiments with Typed Racket, and I have written the following code:

#lang typed/racket

(: benchmark-vector (-> Integer Integer Void))

(define (benchmark-vector len n)
  (define x (make-vector len 0))
  (for ([i (in-range n)])
    (vector-set! x 1 i)
    (vector-ref x 1)))

(time (benchmark-vector 1000 1000000000))

I compiled this code using raco make and ran it with racket xx.zo. The runtime of this code is mostly cpu time: 3765 real time: 3758 gc time: 0 (I tested it multiple times, and the difference is not significant).

I wanted to see how much slower the code would be if I turned off the optimization of Typed Racket, so I changed the first line to #lang typed/racket #:no-optimize. To my surprise, the code actually ran faster, and the output times were mostly cpu time: 1406 real time: 1411 gc time: 15.

Could you please explain why this is happening? Did I do something wrong? Thank you, everyone! I love many concepts in Racket and hope to learn more about it!

Additional Infomations:

In DrRacket with debugging, with #:no-optimize, the program did run slower.
With #lang typed/racket/optional, the program runs as fast as #lang typed/racket #:no-optimize
With #lang typed/racket/no-check, the program also runs as fast as #lang typed/racket #:no-optimize

samth · November 3, 2023, 1:54pm

The primary optimization being applied here is that Typed Racket knows that x is a vector and so it inlines some of the vector-ref and vector-set! code. Unfortunately, because that code has to do checks for immutable vectors and impersonated vectors, it makes the overall code size larger, which can inhibit inlining elsewhere.

It might be that those optimizations are not paying for themselves right now, or that they could be tweaked in some way.

sorawee · November 3, 2023, 2:09pm

It’s an optimization gone wrong, I think.

vector-set! in untyped Racket is “slow” because it needs to check that (1) the first argument is really a vector (2) the first argument is mutable and (3) the index is not out of bounds. To make the operation fast, we can switch to use either unsafe-vector-set! or unsafe-vector*-set!. The latter is the fastest, but doesn’t work on impersonated vectors. The former is slower than unsafe-vector*-set! but faster than vector-set!.

Typed Racket, in an attempt to make the code fast, attempts to use unsafe-vector*-set! or unsafe-vector-set!. But using these operations liberally could crash your programs if things go wrong. So Typed Racket needs to add some checks back manually, like checking that the index is not out of bounds.

It turns out that there are two issues:

Typed Racket doesn’t exploit mutability information. Even if it knows that the vector is definitely mutable through types, it will still add a check that the vector is mutable. This is the largest cost, but could also be eliminated easily.
But even if Typed Racket exploits mutability information, it turns out that vector-set! is still faster than impersonator check + unsafe-vector*-set! + unsafe-vector*-length.

The unsafe path really wins only when there are no impersonator checks and mutability checks at all. But since impersonator-ness is not tracked with types, I think it would be difficult (at least with the current Typed Racket) to make the unsafe path faster than the safe path.

Here are my benchmark programs:

The first program is an equivalent to a Typed Racket program that uses vector-set! in a loop with Typed Racket optimization.

#lang racket/base

(require racket/unsafe/ops)

(define (benchmark-vector)
  (define x (make-vector 1))
  (define start 0)
  (define end 100000000)
  (define inc 1)
  (define (for-loop pos)
    (when (unsafe-fx< pos end)
      (define i pos)
      (define new-i 0)
      (define new-v x)
      (when (immutable? new-v)
        (vector-set! new-v new-i i))
      (cond
        [(impersonator? new-v)
         (if (unsafe-fx< new-i (unsafe-vector-length new-v))
             (unsafe-vector-set! new-v new-i i)
             (vector-set! new-v new-i i))]
        [else (if (unsafe-fx< new-i (unsafe-vector*-length new-v))
                  (unsafe-vector*-set! new-v new-i i)
                  (vector-set! new-v new-i i))])
      (for-loop (unsafe-fx+ pos inc))))
  (for-loop start))

(time (benchmark-vector)) ; cpu time: 314 real time: 324 gc time: 0

The second program is like the first program, but without mutability check. This is what Typed Racket can and ideally should do under the unsafe path.

#lang racket/base

(require racket/unsafe/ops)

(define (benchmark-vector)
  (define x (make-vector 1))
  (define start 0)
  (define end 100000000)
  (define inc 1)
  (define (for-loop pos)
    (when (unsafe-fx< pos end)
      (define i pos)
      (define new-i 0)
      (define new-v x)
      (cond
        [(impersonator? new-v)
         (if (unsafe-fx< new-i (unsafe-vector-length new-v))
             (unsafe-vector-set! new-v new-i i)
             (vector-set! new-v new-i i))]
        [else (if (unsafe-fx< new-i (unsafe-vector*-length new-v))
                  (unsafe-vector*-set! new-v new-i i)
                  (vector-set! new-v new-i i))])
      (for-loop (unsafe-fx+ pos inc))))
  (for-loop start))

(time (benchmark-vector)) ; cpu time: 89 real time: 100 gc time: 0

The third program is like the second program, but without impersonator check. This is the fastest. Typed Racket might be able to target this code if the type keeps track of impersonator information.

#lang racket/base

(require racket/unsafe/ops)

(define (benchmark-vector)
  (define x (make-vector 1))
  (define start 0)
  (define end 100000000)
  (define inc 1)
  (define (for-loop pos)
    (when (unsafe-fx< pos end)
      (define i pos)
      (define new-i 0)
      (define new-v x)
      (if (unsafe-fx< new-i (unsafe-vector*-length new-v))
          (unsafe-vector*-set! new-v new-i i)
          (vector-set! new-v new-i i))
      (for-loop (unsafe-fx+ pos inc))))
  (for-loop start))

(time (benchmark-vector)) ; cpu time: 64 real time: 74 gc time: 0

The fourth program is an equivalent to a Typed Racket program that uses vector-set! in a loop without Typed Racket optimization.

#lang racket/base

(require racket/unsafe/ops)

(define (benchmark-vector)
  (define x (make-vector 1))
  (define start 0)
  (define end 100000000)
  (define inc 1)
  (define (for-loop pos)
    (when (unsafe-fx< pos end)
      (define i pos)
      (define new-i 0)
      (define new-v x)
      (vector-set! new-v new-i i)
      (for-loop (unsafe-fx+ pos inc))))
  (for-loop start))

(time (benchmark-vector)) ; cpu time: 81 real time: 92 gc time: 0

Again, considering that the second program is slower than the fourth program, I think it might make sense to not bother optimizing vector operations.

neteroster · November 3, 2023, 2:52pm

@sorawee @samth

Thank you for your detailed explanation! Although I don't currently need such high performance for vector, your explanation has helped me better understand the optimizations in Typed Racket. I look forward to Racket resolving this minor issue in the future. Thanks again!

Topic		Replies	Views
Performance optimizations (by Typed Racket, or otherwise) Questions & Answers question , performance	2	504	December 6, 2021
Suggestions on refactoring codes using Typed Racket Questions & Answers typed-racket	11	209	June 8, 2024
Pre-release: Shallow and Optional Types Announcements	6	473	September 16, 2022
How to speed up array indexing from untyped Racket General	11	463	July 16, 2022
Typed Racket guidelines Questions & Answers typed-racket	9	232	March 20, 2024

Why does Typed Racket run faster when `#:no-optimize`?

Related topics