X-expression string length

I thought I would try and calculate the length of an X-expression's string representation without actually converting the whole thing to a string. I theorized that if I could do it with math and immutable strings, it would be faster. But it turned out to be three times slower! [edit: I removed all uses of match-like forms and got it down to a roughly 2x difference]

I’m curious if anyone has any insight as to why this is. Am I holding it wrong, or is (string-length (xexpr->string x)) just the fastest method?

The use case: I have large x-expressions that I need to cleanly break up into chunks no larger than 5000 characters when converted to strings, in order to fit under an API limit.
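To make the use case concrete, here is a minimal sketch of one way the chunking could go, assuming the input is already a flat list of top-level x-expressions and that splitting only between them is acceptable. `chunk-xexprs` is a hypothetical name, not part of any library, and it measures each item with the baseline `(string-length (xexpr->string x))` approach.

```racket
#lang racket/base
(require xml)

;; Hypothetical helper: greedily group consecutive x-expressions so that the
;; summed serialized length of each group stays within `limit`.
;; An item that is longer than `limit` on its own still gets a chunk to
;; itself; splitting inside an element is out of scope for this sketch.
(define (chunk-xexprs xs [limit 5000])
  (define-values (chunks current _len)
    (for/fold ([chunks '()] [current '()] [len 0])
              ([x (in-list xs)])
      (define n (string-length (xexpr->string x)))
      (if (and (pair? current) (> (+ len n) limit))
          ;; Adding x would overflow: close the current chunk, start a new one.
          (values (cons (reverse current) chunks) (list x) n)
          (values chunks (cons x current) (+ len n)))))
  (reverse (if (null? current)
               chunks
               (cons (reverse current) chunks))))
```

Any faster length function could be swapped in for the `xexpr->string` call without changing the grouping logic.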

txexpr->values’s implementation uses txexpr? to decide how it should parse the datum, and txexpr? needs to handle a lot of cases.

But if we assume that txexpr->values’s input is already a txexpr?, then there are many cases that we don’t need to check, which lets us specialize and simplify the code significantly.

If you use the following version instead:

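(define (txexpr->values x)
  (match x
    ;; Attributes are a (possibly empty) list of pairs. A bare list? test
    ;; would also match a first child element such as (p "hi").
    [(list tag (list (? pair? attrs) ...) children ...)
     (values tag attrs children)]
    [(list tag children ...)
     (values tag '() children)]))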

it should be faster than the xexpr->string version.
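
For context, here is a rough sketch of how a length calculation might sit on top of that kind of shallow destructuring. The original poster's code is not shown in the thread, so `xexpr-length` and its helpers are hypothetical names. The sketch assumes every element is written with both an opening and a closing tag (no empty-tag shorthand), that children are only strings or nested elements, and that no characters need escaping. The attribute check is a shallow "list of pairs" test.

```racket
#lang racket/base
(require racket/match)

;; Shallow destructuring: trusts that x is a well-formed tagged x-expression.
(define (txexpr->values x)
  (match x
    [(list tag (list (? pair? attrs) ...) children ...)
     (values tag attrs children)]
    [(list tag children ...)
     (values tag '() children)]))

(define (symbol-length s)
  (string-length (symbol->string s)))

;; One attribute is written as: space, name, =, ", value, "
;; which is 4 characters plus the name and the value.
(define (attr-length attr)
  (+ 4 (symbol-length (car attr)) (string-length (cadr attr))))

;; <tag ...> and </tag> contribute 5 punctuation characters plus the tag twice.
(define (xexpr-length x)
  (cond
    [(string? x) (string-length x)] ; assumes nothing in x needs escaping
    [else
     (define-values (tag attrs children) (txexpr->values x))
     (+ 5
        (* 2 (symbol-length tag))
        (for/sum ([a (in-list attrs)]) (attr-length a))
        (for/sum ([c (in-list children)]) (xexpr-length c)))]))
```

For `'(a ((href "x")) "hi")` this gives 18, the length of `<a href="x">hi</a>`. It is worth comparing against `(string-length (xexpr->string x))` on real data before relying on it.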

That did it, this method is now about 5 times faster. Thanks!

Yes, it makes sense that any use of txexpr? slows things down a lot, because it has to walk down the entire x-expression every time to verify it, and because of the map here it ends up being called several times for the same values.

Beware, the string? case doesn't account for entity-escaping characters like <.

Good point. I think the only ones I’d need to look for are <, >, and &.
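
A small sketch of how the string case could account for those three characters without building the escaped string. `escaped-length` is a hypothetical helper name. It covers only text content; as far as I know, attribute values additionally escape `"` as `&quot;`, so they would need a separate case, and the result is worth checking against `xexpr->string`.

```racket
#lang racket/base

;; Extra characters added when c is replaced by its entity:
;;   <  becomes &lt;   (4 chars, so +3)
;;   >  becomes &gt;   (4 chars, so +3)
;;   &  becomes &amp;  (5 chars, so +4)
(define (escape-overhead c)
  (case c
    [(#\< #\>) 3]
    [(#\&) 4]
    [else 0]))

;; Length of s after entity-escaping, computed without allocating a new string.
(define (escaped-length s)
  (for/fold ([n (string-length s)])
            ([c (in-string s)])
    (+ n (escape-overhead c))))
```

For example, `(escaped-length "a < b & c")` is 16, the length of `a &lt; b &amp; c`.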