X-expression string length

joeld · January 11, 2022, 3:57pm

I thought I would try and calculate the length of an X-expression's string representation without actually converting the whole thing to a string. I theorized that if I could do it with math and immutable strings, it would be faster. But it turned out to be three times slower! [edit: I removed all uses of match-like forms and got it down to a roughly 2x difference]

https://gist.github.com/otherjoel/277cb0a76474a579e9aba95d0d6284fb

xexpr-len.rkt

#lang racket/base

(require racket/contract/base
         racket/match
         racket/symbol
         txexpr
         xml)

;; Function for calculating the length in characters of an x-expression if it
;; were converted to a string --- without actually converting it into a string!

[Preview truncated; the full file is at the gist URL above.]

I’m curious if anyone has any insight as to why this is. Am I holding it wrong, or is (string-length (xexpr->string x)) just the fastest method?

The use case: I have large x-expressions that I need to break up cleanly into chunks of no more than 5000 characters each (once converted to strings) in order to fit under an API limit.
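For illustration, here is a minimal sketch of the chunking step itself, using the straightforward (string-length (xexpr->string x)) measurement. The helper name chunk-xexprs is hypothetical, and the sketch assumes that splitting only between top-level x-expressions is acceptable; it is not taken from the gist.

```racket
#lang racket/base
(require xml)

;; Hypothetical helper (not from the gist): greedily group a list of
;; x-expressions into chunks whose combined serialized length is at most
;; `limit`. A single x-expression longer than `limit` still gets a chunk
;; of its own; splitting inside an element is not attempted.
(define (chunk-xexprs xs [limit 5000])
  (define-values (chunks current _len)
    (for/fold ([chunks '()] [current '()] [len 0])
              ([x (in-list xs)])
      (define n (string-length (xexpr->string x)))
      (cond
        ;; Adding x would overflow the current chunk: close it, start anew.
        [(and (pair? current) (> (+ len n) limit))
         (values (cons (reverse current) chunks) (list x) n)]
        [else
         (values chunks (cons x current) (+ len n))])))
  ;; Emit the final partial chunk, if any, and restore input order.
  (reverse (if (null? current)
               chunks
               (cons (reverse current) chunks))))
```

Any faster length function could be swapped in for the string-length call without changing the grouping logic.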

sorawee · January 11, 2022, 4:48pm

txexpr->values's implementation uses txexpr? to decide how it should parse the datum, and txexpr? needs to handle a lot of cases.

But if we assume that the input to txexpr->values is already a txexpr?, then there are many cases that we don't need to check, which lets us specialize and simplify the code significantly.

If you use the following version instead:

(define (txexpr->values x)
  (match x
    [(list tag (? list? attrs) children ...)
     (values tag attrs children)]
    [(list tag children ...)
     (values tag '() children)]))

it should be faster than the xexpr->string version.
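One thing to watch with a pattern like (? list? attrs): it also matches a first child that is itself an element, such as (em "hi") in (p (em "hi")). The following is a sketch of a slightly stricter variant, under the assumption that an attribute list is always a list of (symbol string) pairs. The name txexpr->values/strict is made up for this example and is not part of the txexpr library.

```racket
#lang racket/base
(require racket/match)

;; Sketch: destructure a tagged x-expression that is assumed to be well
;; formed already, so no txexpr? validation is performed.
(define (txexpr->values/strict x)
  (match x
    ;; Second item counts as attributes only if every entry looks like
    ;; (symbol string); the empty list also qualifies.
    [(list tag
           (and attrs (list (list (? symbol?) (? string?)) ...))
           children ...)
     (values tag attrs children)]
    ;; Otherwise everything after the tag is a child.
    [(list tag children ...)
     (values tag '() children)]))
```

For example, (txexpr->values/strict '(p (em "hi"))) yields the tag p, an empty attribute list, and the single child (em "hi"). The extra checks cost a little time per element, so whether this is still faster than the xexpr->string route would need measuring.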

joeld · January 11, 2022, 5:00pm

That did it, this method is now about 5 times faster. Thanks!

Yes, it makes sense that any use of txexpr? slows things down a lot, because it has to walk down the entire x-expression every time to verify it, and because of the map here it ends up being called several times for the same values.

ryanc · January 11, 2022, 6:11pm

Beware, the string? case doesn't account for entity-escaping characters like <.

joeld · January 11, 2022, 11:22pm

Good point. I think the only ones I’d need to look for are <, >, and &.
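A rough sketch of how the string? case could account for those three characters without building the escaped string. It assumes the serializer writes them as &lt;, &gt;, and &amp; in text content and escapes nothing else there. The function names are hypothetical.

```racket
#lang racket/base

;; Extra characters added when a character is escaped:
;;   <  becomes  &lt;   (4 chars, so +3)
;;   >  becomes  &gt;   (4 chars, so +3)
;;   &  becomes  &amp;  (5 chars, so +4)
(define (escape-overhead c)
  (case c
    [(#\< #\>) 3]
    [(#\&) 4]
    [else 0]))

;; Length of string s after escaping, computed in one pass over s.
(define (escaped-text-length s)
  (for/fold ([n (string-length s)])
            ([c (in-string s)])
    (+ n (escape-overhead c))))
```

Attribute values may need a further case for double quotes; that is not covered here.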
