I thought I would try to calculate the length of an X-expression's string representation without actually converting the whole thing to a string. I theorized that if I could do it with math and immutable strings, it would be faster. But it turned out to be three times slower! [edit: I removed all uses of match-like forms and got it down to a roughly 2x difference]
I’m curious if anyone has any insight as to why this is. Am I holding it wrong, or is (string-length (xexpr->string x)) just the fastest method?
The use case: I have large x-expressions that I need to break cleanly into chunks of no more than 5000 characters each (measured after conversion to strings) in order to fit under an API limit.
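For anyone following along, here is a minimal sketch of the "do it with math" idea. It is not the code from the original post: the names escaped-length, attr-length, attrs? and xexpr-length are made up for illustration, and it assumes the tree contains only strings and tagged elements, that every element is written with separate open and close tags (no empty-tag shorthand), and that &, < and > are escaped in text, plus " inside attribute values. Symbols, numbers, CDATA and comments are not handled, so the result can differ from (string-length (xexpr->string x)) on such input.

```racket
#lang racket/base

;; Length of a string once escaped: & becomes &amp; (5 chars),
;; < and > become &lt; / &gt; (4 chars), and inside attribute
;; values " becomes &quot; (6 chars). Assumed escaping rules.
(define (escaped-length str #:attribute? [attribute? #f])
  (for/sum ([c (in-string str)])
    (case c
      [(#\&) 5]
      [(#\< #\>) 4]
      [(#\") (if attribute? 6 1)]
      [else 1])))

;; Length of one attribute written as: space name = " value "
(define (attr-length attr)
  (+ 1
     (string-length (symbol->string (car attr)))
     2
     (escaped-length (cadr attr) #:attribute? #t)
     1))

;; An attribute list is a list whose members are all (symbol value) pairs.
;; A child element never qualifies, because its first member is a bare symbol.
(define (attrs? v)
  (and (list? v)
       (andmap (lambda (a) (and (pair? a) (symbol? (car a)))) v)))

;; Serialized length of x, computed without building the string.
(define (xexpr-length x)
  (cond
    [(string? x) (escaped-length x)]
    [(pair? x)
     (define tag-len (string-length (symbol->string (car x))))
     (define items (cdr x))
     (define has-attrs? (and (pair? items) (attrs? (car items))))
     (define attrs (if has-attrs? (car items) '()))
     (define children (if has-attrs? (cdr items) items))
     ;; "<tag" + attributes + ">" + children + "</tag>"
     (+ 1 tag-len
        (for/sum ([a (in-list attrs)]) (attr-length a))
        1
        (for/sum ([c (in-list children)]) (xexpr-length c))
        2 tag-len 1)]
    [else (error 'xexpr-length "unsupported node: ~e" x)]))
```

Note that this version never validates the whole tree; it only inspects the node it is currently visiting, which turns out to matter for speed.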
txexpr->values's implementation uses txexpr? to decide how it should parse the datum, and txexpr? needs to handle a lot of cases.
But if we assume that the input to txexpr->values is already a txexpr?, then there are many cases we don't need to check, which lets us specialize and simplify the code significantly.
If you use the following version instead:
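;; Splits a tagged X-expression into tag, attributes and children
;; without validating the whole tree first.
;; Note: list? accepts any list, so this assumes the attribute list is
;; present whenever the first child is itself an element.
(define (txexpr->values x)
  (match x
    [(list tag (? list? attrs) children ...)
     (values tag attrs children)]
    [(list tag children ...)
     (values tag '() children)]))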
it should be faster than the xexpr->string version.
That did it: this method is now about 5 times faster. Thanks!
Yes, it makes sense that any use of txexpr? slows things down a lot, because it has to walk down the entire x-expression every time to verify it, and because of the map here it ends up being called several times for the same values.
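To make the cost of that repeated validation concrete, here is a rough sketch. It uses a made-up stand-in called valid-tree? rather than the real txexpr?, ignores attributes entirely, and counts how many nodes the validator touches. All names here (visits, valid-tree?, count-elements/checked, count-elements/unchecked, nested) are hypothetical and only illustrate the pattern.

```racket
#lang racket/base

;; Counts how many nodes the validator has looked at.
(define visits 0)

;; Hypothetical stand-in for a full-tree check such as txexpr?:
;; it walks every node below x. Attributes are left out for brevity.
(define (valid-tree? x)
  (set! visits (add1 visits))
  (cond
    [(string? x) #t]
    [(and (pair? x) (symbol? (car x))) (andmap valid-tree? (cdr x))]
    [else #f]))

;; Counts elements, re-validating the subtree at every level.
(define (count-elements/checked x)
  (cond
    [(string? x) 0]
    [(valid-tree? x)
     (add1 (for/sum ([c (in-list (cdr x))]) (count-elements/checked c)))]
    [else (error 'count-elements/checked "not a valid tree: ~e" x)]))

;; Same traversal, trusting that the caller validated the input once.
(define (count-elements/unchecked x)
  (if (string? x)
      0
      (add1 (for/sum ([c (in-list (cdr x))]) (count-elements/unchecked c)))))

;; Builds n nested elements around one string: (div (div ... "x"))
(define (nested n)
  (if (zero? n) "x" (list 'div (nested (sub1 n)))))
```

On a chain of 100 nested elements, the checked version makes the validator visit each node once per enclosing level, so the visit count grows roughly with the square of the depth, whereas validating once up front and then recursing unchecked visits each node a single time.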