Suggestions on refactoring codes using Typed Racket

Antigen-11 · June 1, 2024, 4:03am

Hello everyone!

Recently, there has been widespread discussion in the community about the performance of the Racket language. It applies to a benchmarks game and I find it interesting to solve those problems about DNA because I'm a medical student.

Here I try refactoring this benchmark code using Typed Racket in order to make it more readable and accelerate its execution. As a result, however, the new program seems inefficient compared with the original one.

Does anyone have any suggestions for improvement?

soegaard · June 1, 2024, 9:02am

How do you time it?

In other words, do you compile first and then run?

Antigen-11 · June 1, 2024, 10:12am

Yes, I do. I use raco make to compile source codes and racket to execute those bytecode files directly.

$ raco make *.rkt
$ cat input25000000.txt | time racket compiled/typed-kn_rkt.zo
A 30.295
T 30.151
C 19.800
G 19.754

AA 9.177
TA 9.132
AT 9.131
TT 9.091
CA 6.002
AC 6.001
AG 5.987
GA 5.984
CT 5.971
TC 5.971
GT 5.957
TG 5.956
CC 3.917
GC 3.911
CG 3.909
GG 3.902

1471758	GGT
446535	GGTA
47336	GGTATT
893	GGTATTTTAATT
893	GGTATTTTAATTTATAGT
101.76user 6.02system 1:47.81elapsed 99%CPU (0avgtext+0avgdata 645292maxresident)k
0inputs+0outputs (0major+214582minor)pagefaults 0swaps
$ cat input25000000.txt | time racket compiled/kn_rkt.zo 
A 30.295
T 30.151
C 19.800
G 19.754

AA 9.177
TA 9.132
AT 9.131
TT 9.091
CA 6.002
AC 6.001
AG 5.987
GA 5.984
CT 5.971
TC 5.971
GT 5.957
TG 5.956
CC 3.917
GC 3.911
CG 3.909
GG 3.902

1471758	GGT
446535	GGTA
47336	GGTATT
893	GGTATTTTAATT
893	GGTATTTTAATTTATAGT
63.97user 2.36system 1:06.34elapsed 99%CPU (0avgtext+0avgdata 925476maxresident)k
0inputs+0outputs (0major+380529minor)pagefaults 0swaps

EmEf · June 1, 2024, 1:43pm

My conjecture is that your re-implementation of sequence-* and generator may slow down things.
I have therefore ported Matthew’s code as faithfully as possible using require/typed and a bit of a
macro trick. The types are a bit more approximate than yours at first glance but I doubt this will make
a difference.

I don’t know where the input file is and I am out of time.

Feedback much appreciated — Matthias

p.s. I moved the “tests” into a (module+ test .. you may wish to undo this move again.

#lang typed/racket/base

(require/typed racket/sequence
  [sequence-fold (-> (-> Natural Natural Any) Natural [Sequenceof Natural] Any)])

(require/typed
 racket/generator
 [yield (-> Integer Any)])

;; ---------------------------------------------------------------------------------------------------
(module generator racket/base
  (provide create-generator)

  (require racket/generator)

  (define (create-generator name f loop)
    (generator () (f))))

(require/typed
 'generator
 [create-generator (-> Symbol (-> Void) Boolean (-> Integer))])
;; ---------------------------------------------------------------------------------------------------

;;;
;;; The Computer Language Benchmarks Game
;;; https://salsa.debian.org/benchmarksgame-team/benchmarksgame/

;;; contributed by Matthew Flatt, modified by
;;; modified by James Bergstra

;;; Notes on the implementation: the strategy is to map the DNA letters to the
;;; bytes 0, 1, 2, 3, and then create a hash function that is simply the
;;; concatenation of these two-byte codes. This is handy because the slow part
;;; of this test is building the hash table, and this hash function means that
;;; we can take advantage of overlapping DNA sub-sequences to get a
;;; constant-time hash function (that does not depend on the sequence length).
;;;
;;; The bottleneck in this code seems to be Racket's hash table. The time to
;;; create the last hash table (for the len-18 string) seems to be about half
;;; the runtime of the whole program.

;; Map A->0, C->1, G->2 T->3 (and lowercase too)
(: dna->num (Byte -> Natural))
(define dna->num
  (let ([tbl (make-bytes 256 255)])
    (for ([ch : Byte (in-list (bytes->list #"ACGTacgt"))]
          [ii (in-list '(0 1 2 3 0 1 2 3))])
      (bytes-set! tbl ch ii))
    (lambda ({ch : Byte}) (bytes-ref tbl ch))))

;;; map a hash key back to a string (needed for printing)
(: unhash (-> Natural Natural String))
(define (unhash key len)
  (let ([rval (make-string len)])
    (sequence-fold
      (lambda ({key : Integer} {pos : Natural})
        (string-set! rval pos (string-ref "ACGT" (bitwise-and key 3)))
        (arithmetic-shift key -2))
      key
      (in-range len))
    rval))

;;; Ideally this would serve all-counts, but my attempt to do that
;;; was horribly slow.
(: hashes (-> Natural Bytes Any (-> Integer)))
(define (hashes keylen dna as-codes)
  (create-generator
   'generator
   (λ ()
     (let ([key 0] [ishift (* 2 keylen)] [thresh (sub1 keylen)])
       (for
           ([bb : Byte (in-bytes dna)]
            [ii : Natural (in-range (bytes-length dna))])
         (set! key (arithmetic-shift
                    (+ key (arithmetic-shift (if as-codes bb (dna->num bb)) ishift)) -2))
         (if (>= ii thresh) (yield key) #f))
       ))
   #false))

(: all-counts (-> Natural Bytes [HashTable Natural Natural]))
(define (all-counts keylen dna)
  (let ([table : [HashTable Natural Natural] (make-hasheq)]
        [key : Natural 0]
        [ishift : Natural (* 2 keylen)]
        )
    (for
      ([bb : Byte (in-bytes dna)]
       [ii : Natural (in-range (bytes-length dna))])
      (set! key (arithmetic-shift (+ key (arithmetic-shift bb ishift)) -2))
      (define r : (U False Natural) (hash-ref table key))
      (if (>= ii (- keylen 1)) (hash-set! table key (add1 (if r r 0))) #f)
      )
    table))

(: readbuf (-> Input-Port (-> Byte Integer) Bytes))
(define (readbuf in foo)
  (let ([s (open-output-bytes)])
    ;; Skip to ">THREE ..."
    (regexp-match #rx#"(?m:^>THREE.*$)" in)
    ;; Copy everything but newlines
    (for ([l (in-bytes-lines in)])
      (write-bytes l s))
    ;; Replace letters with numbers 0, 1, 2, 3
    (let ([actg : Bytes (get-output-bytes s)])
      (for ([ii (in-range (bytes-length actg))])
           (bytes-set! actg ii (foo (bytes-ref actg ii))))
      actg)))

(define-type PosCart [Pairof Natural Natural])

(: write-freqs (-> [HashTable Natural Natural] Natural Void))
(define (write-freqs table len)
  (let* ([content : [Listof PosCart] (hash->list table)]
         [total : Real (exact->inexact (apply + ((inst map Real PosCart) cdr content)))])
    (for ([a : PosCart
             (sort content (λ ({x : PosCart} {y : PosCart}) (> (cdr x) (cdr y))))])
      (printf "~a ~a\n" 
              (unhash (car a) len)
              (real->decimal-string (* 100 (/ (cdr a) total)) 3)))))

(: write-one-freq (-> {HashTable Natural Natural} Bytes Void))
(define (write-one-freq table key)
  (let* ([pre : Any (hash-ref table ((hashes (bytes-length key) key #f)) #false)]
         [cnt : Integer (if (exact-integer? pre) pre 0)])
    (printf "~a\t~a\n" cnt key)))


(module+ test
  (define dna (readbuf (current-input-port) dna->num))

  (write-freqs (all-counts 1 dna) 1)
  (newline)

  (write-freqs (all-counts 2 dna) 2)
  (newline)

  ;; Specific sequences:
  (for ([seq '(#"GGT" #"GGTA" #"GGTATT" #"GGTATTTTAATT" #"GGTATTTTAATTTATAGT")]) 
    (write-one-freq (all-counts (bytes-length seq) dna) seq)))

Antigen-11 · June 1, 2024, 2:48pm

Thanks for your reply.

I've tested this program and indeed it is more efficient than my reimplementation. Though it is still slower than the original one, I think your modification points the way.

Here is the result.

$ cat input25000000.txt | time racket emef-typed-kn.rkt 
A 30.295
T 30.151
C 19.800
G 19.754

AA 9.177
TA 9.132
AT 9.131
TT 9.091
CA 6.002
AC 6.001
AG 5.987
GA 5.984
CT 5.971
TC 5.971
GT 5.957
TG 5.956
CC 3.917
GC 3.911
CG 3.909
GG 3.902

1471758	GGT
446535	GGTA
47336	GGTATT
893	GGTATTTTAATT
893	GGTATTTTAATTTATAGT
86.84user 2.57system 1:29.44elapsed 99%CPU (0avgtext+0avgdata 1085928maxresident)k
64inputs+0outputs (0major+393115minor)pagefaults 0swaps

EmEf · June 1, 2024, 3:44pm

Could you post ’numbers’? Thanks.

soegaard · June 1, 2024, 3:59pm

I am guessing that "input25000000.txt" was generated by this program:

https://benchmarksgame-team.pages.debian.net/benchmarksgame/program/fasta-racket-1.html

Antigen-11 · June 1, 2024, 4:09pm

@soegaard is right.

diff program output for this 10KB input file (generated with the fasta program N = 1000) with this output file to check your program output has the correct format, before you contribute your program.
Generate a larger input file (using one of the fasta programs with command line arguments: 25000000 > input25000000.txt) to check program performance.

EmEf · June 1, 2024, 7:40pm

OK, I created two main submodules (appended below) to time the three variants (untyped, emef-typed, and hz-typed):


[matthias@Texas DNA-discourse]$ for x (1 2 3 4 5); do ./untyped.rkt < input.txt | diff - output.txt ; done
28d27
< cpu time: 1055 real time: 1067 gc time: 17
28d27
< cpu time: 1074 real time: 1086 gc time: 18
28d27
< cpu time: 1074 real time: 1087 gc time: 18
28d27
< cpu time: 1074 real time: 1087 gc time: 18
28d27
< cpu time: 1076 real time: 1089 gc time: 18
[matthias@Texas DNA-discourse]$ for x (1 2 3 4 5); do ./emef-typed.rkt < input.txt | diff - output.txt ; done
28d27
< cpu time: 1107 real time: 1120 gc time: 21
28d27
< cpu time: 1108 real time: 1120 gc time: 21
28d27
< cpu time: 1107 real time: 1121 gc time: 21
28d27
< cpu time: 1108 real time: 1120 gc time: 21
28d27
< cpu time: 1107 real time: 1120 gc time: 22
[matthias@Texas DNA-discourse]$ for x (1 2 3 4 5); do ./hz-typed.rkt < input.txt | diff - output.txt ; done
28d27
< cpu time: 1733 real time: 1793 gc time: 17
28d27
< cpu time: 1739 real time: 1798 gc time: 18
28d27
< cpu time: 1735 real time: 1794 gc time: 17
28d27
< cpu time: 1739 real time: 1799 gc time: 18
28d27
< cpu time: 1737 real time: 1797 gc time: 18

The emef-variant is roughly 3.5% slower than the untyped one. This is in all likelihood due to the implied contracts on yield and the contract on the fold-function passed to sequence-fold.

The hz-variant is roughly 65% slower than the untyped one, and as I wrote before, probably due to re-implementing some Racket functionality. That’s good exercise but not good for performance.

Here are the main submodules:

(module+ main ;; for the untyped and emef-variant

  (define n 1000)

  (define (main)
    (define dna (readbuf (current-input-port) dna->num))

    (write-freqs (all-counts 1 dna) 1)
    (newline)

    (write-freqs (all-counts 2 dna) 2)
    (newline)

    (for ([seq '(#"GGT" #"GGTA" #"GGTATT" #"GGTATTTTAATT" #"GGTATTTTAATTTATAGT")]) 
      (write-one-freq (all-counts (bytes-length seq) dna) seq))

    (for ([n (in-range n)])
      (all-counts 1 dna))

    (for ([n (in-range n)])
      (all-counts 2 dna))

    ;; Specific sequences:
    (for ([n (in-range n)])
      (for ([seq '(#"GGT" #"GGTA" #"GGTATT" #"GGTATTTTAATT" #"GGTATTTTAATTTATAGT")]) 
        (all-counts (bytes-length seq) dna))))

  (collect-garbage) (collect-garbage) (collect-garbage)
  (time (main)))

and

(module+ main ;; for the hz-variant 

  (define n 1000)

  (: go (-> Positive-Integer DNA Table))
  (define (go k bts)
    (define table : Table (make-hasheq))
    (all-counts k bts #t table))

  (define (main)

    (define IN (current-input-port))
    (define OUT (current-output-port))

    (define bts (readbuf IN))
    (bytes-map! nucl->num bts)
   
    (write-freqs (go 1 bts) 1 OUT)
    (newline OUT)

    (write-freqs (go 2 bts) 2 OUT)
    (newline OUT)

    ;; Specific sequences:
    (for ([seq (in-list '(#"GGT" #"GGTA" #"GGTATT" #"GGTATTTTAATT" #"GGTATTTTAATTTATAGT"))])
      (write-one-freq (go (key-bytes-length seq) bts) seq #f OUT))
    
    (for ([n (in-range n)])
      (go 1 bts))
    
    (for ([n (in-range n)])
      (go 2 bts))

    ;; Specific sequences:
    (for ([n (in-range n)])
      (for ([seq '(#"GGT" #"GGTA" #"GGTATT" #"GGTATTTTAATT" #"GGTATTTTAATTTATAGT")])
        (go (cast (bytes-length seq) Positive-Integer) bts))))

  (collect-garbage) (collect-garbage) (collect-garbage)
  (time (main)))

Antigen-11 · June 2, 2024, 5:54am

I just realized that this make-hash-sequence function encodes every frame multiple times and got it fixed. It's currently still slower than your version but not that inefficient. Thanks for the effort you put into this problem!

cadence · June 6, 2024, 2:32am

Have you tried using profile-flame-graph to find which operations in the code are taking the most time?

Antigen-11 · June 8, 2024, 4:30pm

I've never used that library yet. I usually profile my code with the profile library. But I'll give it a try. Thanks for your suggestion!

Topic		Replies	Views
Why does Typed Racket run faster when `#:no-optimize`? Questions & Answers question , performance	3	358	November 3, 2023
Is typed racket compilation slow when using prefab structs? Questions & Answers typed-racket	9	291	May 2, 2023
Where to start with optimizing Racket programs? Questions & Answers	25	515	January 9, 2024
Typed Racket guidelines Questions & Answers typed-racket	9	221	March 20, 2024
Is there plans for improving the performance of the generated code from the racket compiler? Internals question	16	923	May 28, 2024

Suggestions on refactoring codes using Typed Racket

Related topics