I've just started using Ryan Culpepper's scramble/regexp library for constructing regular expressions in a structured way (a la "SRE"s), and I'm liking it a lot. One thing I haven't figured out yet is a nice way to write the regular expression ".", which matches any character. It looks like the right way is with ... well, no, I'm not sure. I think (inject ".") works, but there must be a nicer way?
I should say, having abstraction means I can at least (define-RE dot (inject ".")), or even (define-RE d (inject ".")), so this is not a problem without nice workarounds, but I feel like I must be missing something obvious.
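For reference, here is a minimal sketch of what "." means in Racket's built-in regexps. I'm assuming that is what (inject ".") ends up splicing into the generated pattern, but I haven't checked that against the scramble/regexp source. It uses only regexp-match and regexp-match? from racket/base, and the names first-char, across-newline? and multi-mode? are just made up for the illustration.

```racket
#lang racket/base
;; Built-in regexps only; nothing here uses scramble/regexp.

;; "." matches any single character, so the match is the first one.
(define first-char (regexp-match #rx"." "abc"))

;; Outside multi mode (the default), "." also matches a newline.
(define across-newline? (regexp-match? #rx"a.c" "a\nc"))

;; In multi mode, written (?m: ...), "." does not match a newline.
(define multi-mode? (regexp-match? #rx"(?m:a.c)" "a\nc"))
```

So whatever name ends up being used for "." in the structured syntax, it probably has to say whether it means "any character" or "any character except newline".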
p.s.: why not use Alex Shinn's irregex? Sadly, the performance of the irregex library seems quite a bit worse than the built-in regexp library, in my experience.
Well, I totally believe that... I'm the maintainer! But I stopped maintaining it when I discovered how much slower it was. Specifically, I think that implementing fast regexps is hard, and Chez and Racket have done a lot of work on making theirs fairly fast. My only issue is with the surface syntax, so I think the approach that Ryan takes, of compiling SREs into native regexps, is almost certainly the right one.
Ultimately, this is kind of the "cross-platform tools are rarely faster" concept: if irregex came up with a clever way to make regexp matching faster, I'm relatively confident that lower-level implementations, such as the ones attached to Chez/Racket, would use those techniques too, and they probably have access to lower-level knobs and dials that allow them/us to tune things better.
I don't have a benchmark. My personal experience comes from parsing GEDCOM files, an utter abomination of a file format that helps you understand just how terrible things were in the old days, and how hard it is to actually define and stick to a sensible set of conventions. Parsing multi-megabyte GEDCOM files with irregex was very slow, and parsing them with Racket regexps was much faster. It's possible that I was "doing it wrong" somehow in irregex, but I couldn't tell you how.
Also, after spending 15 minutes looking for regexp benchmarks, I'm coming to realize what should have been obvious to me from the start: regexp engines may well be tuned for different pattern structures, pattern lengths, et cetera. So certain libraries, because of the choices they make, might be much better for some matching tasks and much worse for others.
I don't remember why I didn't include a notation for ".", but it was probably some combination of overlooking it, not needing it, and not knowing what to call it.
This example is very interesting to me personally. My daughter did a lot of work at Ancestry.com during Covid and I, too, now have a multi-megabyte GEDCOM file. (I remarked to my daughter that the data Ancestry.com holds would be a lot more useful if they hired a few graph theorists, but it might not be a good business plan to give people tools to "finish" the job in a short time.)
I'll look at your code. I don't have much experience in either regexps or graph algorithms, but if others want to look at this problem together I'd be happy to contribute where I can.