Fast I/O and byte strings

badkins · December 5, 2023, 5:52pm

I have some experimental code that provides fast byte string I/O. An example cat program is roughly 4.7x faster than one using in-bytes-lines and ordinary byte string I/O. Additionally, using this approach instead of strings allowed me to cut the runtime of a Racket program from 1,117 seconds to 50 seconds for a single file (22x faster), or from 149 minutes to 6.7 minutes for the full data set. That's not quite as fast as C++ (using string_view and a similar block I/O method), which took 1.2 minutes, but not bad.

There are two ideas:

1. For the I/O, I use the traditional manual buffering approach, i.e., read big chunks of data into a buffer and manually "parse" lines by finding the indices of newline bytes.
2. Instead of using byte strings, I've created byte string views, analogous to C++ string_view. These are simply structs holding a bytes buffer, the index of the beginning of the view, and the exclusive end index of the view. This significantly reduces allocation and copying.
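To make the two ideas concrete, here is a minimal sketch. The struct `bview` and the helpers `bview->bytes`, `find-newline` and `buffer-lines` are hypothetical names for illustration, not the gist's bytes-view.rkt. It slices the filled part of a buffer into one view per complete line without copying any bytes, and reports where an unfinished trailing line starts so that a block reader could carry it over into the next read.

```racket
#lang racket/base
;; A byte string view: a shared buffer plus a start index and an exclusive
;; end index (hypothetical names, not the gist's bytes-view.rkt).
(struct bview (buf start end))

;; Copy the viewed bytes out; only needed when a real byte string is required.
(define (bview->bytes v)
  (subbytes (bview-buf v) (bview-start v) (bview-end v)))

;; Index of the first newline byte (10) in buf within [i, end), or #f.
(define (find-newline buf i end)
  (cond [(>= i end) #f]
        [(= (bytes-ref buf i) 10) i]
        [else (find-newline buf (add1 i) end)]))

;; Split the filled region [0, end) of buf into one view per complete line
;; (newline excluded). Also returns the index where an unfinished trailing
;; line begins, so a block reader can carry it over to the next read.
(define (buffer-lines buf end)
  (let loop ([i 0] [acc '()])
    (define nl (find-newline buf i end))
    (if nl
        (loop (add1 nl) (cons (bview buf i nl) acc))
        (values (reverse acc) i))))

;; Example: two complete lines followed by a partial one.
(define sample #"alpha\nbeta\ngam")
(define-values (lines rest-start) (buffer-lines sample (bytes-length sample)))
(map bview->bytes lines) ; → '(#"alpha" #"beta")
rest-start               ; → 11
```

Each view only records two indices into `sample`. Bytes are copied only when `bview->bytes` is called, which the example does just to display the result.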

As part of this work, I had to recreate some functions (split, trim, etc.) to operate on byte string views instead of byte strings. One of these is a Soundex implementation, which is also included.
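As an illustration of what view-based helpers can look like, here is a sketch of trim and split that return new views into the same buffer instead of copying. The names `bview`, `bview-trim` and `bview-split`, and the choice of space, tab, CR and LF as whitespace, are assumptions made for this example and are not the gist's API.

```racket
#lang racket/base
;; Hypothetical view struct: buffer, start index, exclusive end index.
(struct bview (buf start end))

(define (bview->bytes v)
  (subbytes (bview-buf v) (bview-start v) (bview-end v)))

;; Whitespace for this sketch: space, tab, CR, LF.
(define (ws? b) (or (= b 32) (= b 9) (= b 13) (= b 10)))

;; Trim leading and trailing whitespace by moving the indices only.
(define (bview-trim v)
  (define buf (bview-buf v))
  (let* ([s (let loop ([i (bview-start v)])
              (if (and (< i (bview-end v)) (ws? (bytes-ref buf i))) (loop (add1 i)) i))]
         [e (let loop ([j (bview-end v)])
              (if (and (> j s) (ws? (bytes-ref buf (sub1 j)))) (loop (sub1 j)) j))])
    (bview buf s e)))

;; Split on a single separator byte; adjacent separators yield empty views.
(define (bview-split v sep)
  (define buf (bview-buf v))
  (define end (bview-end v))
  (let loop ([i (bview-start v)] [field-start (bview-start v)] [acc '()])
    (cond [(= i end) (reverse (cons (bview buf field-start end) acc))]
          [(= (bytes-ref buf i) sep)
           (loop (add1 i) (add1 i) (cons (bview buf field-start i) acc))]
          [else (loop (add1 i) field-start acc)])))

;; Example: trim a CSV-like line, then split it on commas.
(define line #"  smith,john,,42 \r\n")
(define fields
  (bview-split (bview-trim (bview line 0 (bytes-length line))) (char->integer #\,)))
(map bview->bytes fields) ; → '(#"smith" #"john" #"" #"42")
```

Neither helper allocates byte strings. Each returned view is just a new pair of indices into `line`.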

The code was written in a somewhat "stream of consciousness" style to prove the concept, but I hope to refine it and create some packages later.

Here are the files in a gist:

https://gist.github.com/lojic/63dff68a3b84d6ab8e28d5d7cea807fc

block-input.rkt

#lang racket

(require "./bytes-view.rkt")

(provide create-block-in
         fill-buffer!
         next-line!)

(define buffer-size (* 256 1024))

(Excerpt truncated; the full file is in the gist.)
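The excerpt above stops right after `buffer-size`. As a rough idea of how a block reader of this kind can work, here is a sketch with hypothetical names (`make-block-in`, `refill!`, `next-line`). It is not the gist's `create-block-in`, `fill-buffer!` or `next-line!`. It moves the unconsumed tail of the buffer to the front, reads the next chunk with `read-bytes!`, doubles the buffer if a single line fills it, and returns each line as a view. The trade-off is that a returned view is only valid until the next call, because refilling overwrites the buffer.

```racket
#lang racket/base
;; Sketch of block-buffered line input. All names are hypothetical; this is
;; not the gist's create-block-in / fill-buffer! / next-line! code.
(struct bview (buf start end))
(define (bview->bytes v) (subbytes (bview-buf v) (bview-start v) (bview-end v)))

;; buf: current buffer; pos: first unconsumed byte; end: end of valid data.
(struct block-in (port [buf #:mutable] [pos #:mutable] [end #:mutable] [eof? #:mutable]))

(define (make-block-in port [size (* 256 1024)])
  (block-in port (make-bytes (max 1 size)) 0 0 #f))

;; Move the unconsumed tail to the front of the buffer (doubling the buffer if
;; that tail already fills it), then read as many bytes as will fit.
(define (refill! b)
  (define buf (block-in-buf b))
  (define len (- (block-in-end b) (block-in-pos b)))
  (define new-buf (if (= len (bytes-length buf)) (make-bytes (* 2 len)) buf))
  (bytes-copy! new-buf 0 buf (block-in-pos b) (block-in-end b))
  (set-block-in-buf! b new-buf)
  (set-block-in-pos! b 0)
  (set-block-in-end! b len)
  ;; read-bytes! blocks until the range is full or the port hits EOF, and
  ;; returns eof only when no bytes at all were available.
  (define n (read-bytes! new-buf (block-in-port b) len))
  (if (eof-object? n)
      (set-block-in-eof?! b #t)
      (set-block-in-end! b (+ len n))))

;; Return the next line (newline excluded) as a view into the buffer, or eof.
;; The view stays valid only until the next call, which may overwrite the buffer.
(define (next-line b)
  (let loop ()
    (define buf (block-in-buf b))
    (define pos (block-in-pos b))
    (define end (block-in-end b))
    (define nl (let scan ([i pos])
                 (cond [(= i end) #f]
                       [(= (bytes-ref buf i) 10) i]
                       [else (scan (add1 i))])))
    (cond [nl (set-block-in-pos! b (add1 nl))
              (bview buf pos nl)]
          [(block-in-eof? b)            ; input exhausted: emit any unterminated last line
           (cond [(< pos end) (set-block-in-pos! b end)
                              (bview buf pos end)]
                 [else eof])]
          [else (refill! b) (loop)])))

;; Example: echo lines from an in-memory port using a tiny 4-byte buffer.
(define reader (make-block-in (open-input-bytes #"one\ntwo\nthree") 4))
(let loop ()
  (define v (next-line reader))
  (unless (eof-object? v)
    (write-bytes (bview-buf v) (current-output-port) (bview-start v) (bview-end v))
    (newline)
    (loop)))
```

The example writes each line straight from the buffer with `write-bytes` and a start/end range, so no per-line byte string is allocated. A real cat loop would pair this with buffered output as well.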

block-output.rkt

#lang racket

(require "./bytes-view.rkt")

(provide create-block-out
         block-write-bytes!
         flush-buffer!)

(define buffer-size (* 256 1024))

(Excerpt truncated; the full file is in the gist.)
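Output is the mirror image: copy small writes into one large buffer and hand it to the port in a single `write-bytes` call when it fills. Below is a minimal sketch with hypothetical names (`make-block-out`, `block-write!`, `flush!`), not the gist's `create-block-out`, `block-write-bytes!` or `flush-buffer!`. It accepts a byte range, so the bytes covered by a view can be written without first copying them into a fresh byte string.

```racket
#lang racket/base
;; Sketch of block-buffered output. Names are hypothetical, not the gist's
;; create-block-out / block-write-bytes! / flush-buffer!.
(struct block-out (port buf [pos #:mutable]))

(define (make-block-out port [size (* 256 1024)])
  (block-out port (make-bytes size) 0))

;; Hand everything buffered so far to the port in one call, then reset.
(define (flush! o)
  (write-bytes (block-out-buf o) (block-out-port o) 0 (block-out-pos o))
  (set-block-out-pos! o 0))

;; Append bytes [start, end) of src, e.g. the range covered by a view.
;; Chunks larger than the whole buffer are written straight to the port.
(define (block-write! o src [start 0] [end (bytes-length src)])
  (define buf (block-out-buf o))
  (define len (- end start))
  (when (> (+ (block-out-pos o) len) (bytes-length buf))
    (flush! o))
  (cond [(> len (bytes-length buf))
         (write-bytes src (block-out-port o) start end)
         (void)]
        [else
         (bytes-copy! buf (block-out-pos o) src start end)
         (set-block-out-pos! o (+ (block-out-pos o) len))]))

;; Example: a tiny 8-byte buffer writing into an in-memory port.
(define sink (open-output-bytes))
(define w (make-block-out sink 8))
(block-write! w #"hello ")
(block-write! w #"world\n")
(block-write! w #"a chunk longer than eight bytes\n")
(flush! w)                 ; without this, buffered bytes never reach the port
(get-output-bytes sink)    ; → #"hello world\na chunk longer than eight bytes\n"
```

As with any manual buffering, the final flush is easy to forget. Without it, whatever is still in the buffer when the program exits is silently lost.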

bytes-view-soundex.rkt

#lang racket

(require "./bytes-view.rkt")
(provide bytes-view-soundex)

(define empty-view    (create-bytes-view #""))

;; "-123-12_-22455-12623-1_2_2" ; original from wikipedia

;;              ABCDEFGHIJKLMNOPQRSTUVWXYZ

(Excerpt truncated; the full file is in the gist.)
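The excerpt above shows little more than the letter-code table from bytes-view-soundex.rkt. The following is a sketch of American Soundex (keep the first letter, then up to three digits, zero-padded) that works on a start/end range of a byte string, so it can be applied to a view without copying. It reuses the table from the comment, assuming that `-` marks vowels, which separate repeated codes, and that `_` marks H, W and Y, which are skipped without separating them. The function name `soundex` is hypothetical, and this is not the gist's `bytes-view-soundex`.

```racket
#lang racket/base
;; Sketch of American Soundex over a byte range (hypothetical, not the gist's
;; bytes-view-soundex). Table from the gist's comment, indexed A..Z; assumed
;; meaning: "-" = vowel (separates codes), "_" = skipped without separating.
(define codes #"-123-12_-22455-12623-1_2_2")
(define dash (char->integer #\-))
(define under (char->integer #\_))

;; Map an ASCII byte to its table entry, or #f if it is not a letter.
(define (code-of b)
  (cond [(<= 65 b 90) (bytes-ref codes (- b 65))]    ; A-Z
        [(<= 97 b 122) (bytes-ref codes (- b 97))]   ; a-z
        [else #f]))

(define (soundex buf [start 0] [end (bytes-length buf)])
  ;; Index of the first letter in the range, or #f if there is none.
  (define i0
    (let loop ([i start])
      (cond [(= i end) #f]
            [(code-of (bytes-ref buf i)) i]
            [else (loop (add1 i))])))
  (if (not i0)
      #""
      (let ([out (make-bytes 4 48)]   ; 48 is #\0, so short codes end up zero-padded
            [b0 (bytes-ref buf i0)])
        ;; Keep the first letter (uppercased); its code still suppresses a repeat.
        (bytes-set! out 0 (if (<= 97 b0 122) (- b0 32) b0))
        (let loop ([i (add1 i0)] [prev (code-of b0)] [k 1])
          (when (and (< i end) (< k 4))
            (let ([c (code-of (bytes-ref buf i))])
              (cond [(not c) (loop (add1 i) prev k)]       ; not a letter: ignore
                    [(= c under) (loop (add1 i) prev k)]   ; H, W, Y: transparent
                    [(= c dash) (loop (add1 i) dash k)]    ; vowel: separates codes
                    [(= c prev) (loop (add1 i) prev k)]    ; repeat of previous code
                    [else (bytes-set! out k c)
                          (loop (add1 i) c (add1 k))]))))
        out)))

(soundex #"Robert")              ; → #"R163"
(soundex #"Tymczak")             ; → #"T522"
(soundex #"x,Ashcraft,y" 2 10)   ; → #"A261", computed from a slice without copying
```

Working on an explicit range is what lets a helper like this take a view's buffer and indices directly, rather than requiring a freshly allocated byte string for each name.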

The gist contains more than these three files.
