Why is plot slower with unicode symbols in ticks?

Yes, the binary search loop appears to be executed N times where N is the string length.

Here’s another experiment, which tests a shorter string length (10) rather than a very long string length (1000) in my previous experiments.

#lang racket

(require racket/draw)

(define LEN 10)
(define DIM 50)
(define N 5000)

(define (make c font-face)
  (define font
       (send the-font-list find-or-create-font
       (send the-font-list find-or-create-font
  (define target (make-bitmap DIM DIM))
  (define dc (new bitmap-dc% [bitmap target]))
  (send dc set-font font)
  (for ([i N])
    (send dc draw-text (make-string LEN c) 0 0 #t)))

(time (make #\※ "Arial"))

This outputs:

cpu time: 6034 real time: 6091 gc time: 51

I sum measured time durations across loops.

The computation for next-s takes
cpu: 5143.318999998254 real: 5157.742999998184 gc: 43.06900000000002

So 5143ms out of 6034ms is spent here.

Here’s an even further breakdown:

pango_layout_get_unknown_glyphs_count takes
cpu: 1188.8119999987075 real: 1209.8289999986505 gc: 1.9859999999999998

The binary search loop takes
cpu: 2029.091999999183 real: 2048.196999998843 gc: 0.597

Finding a fallback font takes
cpu: 1477.6769999980947 real: 1521.060999997697 gc: 37.26500000000002

For a longer string (the above program with LEN = 2000 and N = 1), the binary search loop and pango_layout_get_unknown_glyphs_count clearly become dominating, while finding a fallback font takes little time.

The whole program:
cpu time: 3622 real time: 3668 gc time: 12

next-s computation:
cpu: 3456.7010000000123 real: 3474.726999999999 gc: 3.0200000000000005

cpu: 1529.943999999996 real: 1532.640999999997 gc: 0

Binary search loop:
cpu: 1815.3040000000024 real: 1817.046999999995 gc: 1.305

Finding a fallback font:
cpu: 89.2680000000002 real: 106.98999999999342 gc: 1.607

That won't hold in the general case, though, where my labels are multiple characters long with mixes of the problematic characters.

I suspect that would help my case, since there's a small finite set of characters for labels; this won't address whatever pango_layout_get_unknown_glyphs_count is, though.

Could we precompute the values of ok-count for the whole string (possibly skipping that if we know it won't be needed)? That might get from O(N^2 \log N) to O(N), no?

[The first term is N \sum_{1 \le i \le N} \log i, which I'm confident is N \log\left(\prod_i i\right) = N \log N! \sim N^2 \log N where the last approximation is due to Stirling's formula. See also complex analysis - Value of Summation of $\log(n)$ - Mathematics Stack Exchange]

Here's the message link: https://discord.com/channels/571040468092321801/893314076346826852/1061920892881489920

Thanks! Much of what was said there has been said here too, but one comment that @mflatt made that seems worth bringing over is:

It's also possible that upgrading Pango (and maybe adjusting things like whether fallback is handled manually on the Racket side) would address issues. I'm not sure how Pango has evolved for Mac OS, though, and building a new version of the library is probably not easy.