Lexer matching case insensitive

The original SRE specification contains the definition of uncase:

UNCASE is a regexp operator producing a regexp that matches any
case permutation of any string that matches (: <sre> ...).
For example, the regexp

(uncase "foo")

matches the strings foo, foO, fOo, fOO, Foo, ...

I cannot find it in parser-tools-lib.

How can I match case-insensitively instead?

Something like:

(define-lex-abbrevs
  [letter     
   (union (char-range #\a #\z) (char-range #\A #\Z))]
  [digit      
   (char-range #\0 #\9)]
  [identifier 
   (concatenation letter 
                  (repetition 0 +inf.0 (union letter digit)))]
  ; etc.
  )

If you need case-insensitive keywords (reserved identifiers),
you can do something like:

; Reserved keywords in Pascal ignore case.
; Thus "PrOgRaM" and "program" are
; the same keyword. Since the lexer has
; no built-in support for matching mixed-case
; strings, we define our own lexer
; transformation, mixed, that turns
;   (mixed "foo") into
;   (concatenation
;     (union #\f #\F) (union #\o #\O) (union #\o #\O))
; Remember to use string-downcase on the 
; resulting lexeme.

(require parser-tools/lex
         (for-syntax syntax/parse))
(define-lex-trans mixed
  (λ (stx)
    (syntax-parse stx
      [(_ datum)
       (define str (string-downcase (syntax->datum #'datum)))
       (define STR (string-upcase str))
       #`(concatenation
          #,@(for/list ([c (in-string str)]
                        [C (in-string STR)])
               #`(union #,c #,C)))])))

; The following lexer transformation turns
;   (union-mixed "foo" "bar") into
;   (union (mixed "foo") (mixed "bar"))

(define-lex-trans union-mixed
  (λ (stx)
    (syntax-parse stx
      [(_ str ...)
       #`(union (mixed str) ...)])))

And now we can define the reserved words as an abbreviation (an entry inside define-lex-abbrevs) and use reserved as a rule in the lexer:

  [reserved
     (union-mixed
      "div" "or" "and" "not" "if" "for" "to" "downto"
      "then" "else" "of" "while" "do" "begin" "end" 
      "read" "readln" "write" "writeln"
      "var" "const" "array" "type" "bindable"
      "procedure" "function" "program")]
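For completeness, here is a minimal sketch of how the pieces above might fit together in one file. The names lex-token and tokenize, the token shapes (lists tagged 'keyword / 'identifier), and the shortened keyword list are only illustrative assumptions, not part of parser-tools. It assumes #lang racket, which also makes racket/base available at the syntax level.

```racket
#lang racket
(require parser-tools/lex
         (for-syntax syntax/parse))

;; (mixed "foo") expands to
;; (concatenation (union #\f #\F) (union #\o #\O) (union #\o #\O))
(define-lex-trans mixed
  (λ (stx)
    (syntax-parse stx
      [(_ datum)
       (define str (string-downcase (syntax->datum #'datum)))
       (define STR (string-upcase str))
       #`(concatenation
          #,@(for/list ([c (in-string str)]
                        [C (in-string STR)])
               #`(union #,c #,C)))])))

;; (union-mixed "foo" "bar") expands to (union (mixed "foo") (mixed "bar"))
(define-lex-trans union-mixed
  (λ (stx)
    (syntax-parse stx
      [(_ str ...)
       #`(union (mixed str) ...)])))

(define-lex-abbrevs
  [letter     (union (char-range #\a #\z) (char-range #\A #\Z))]
  [digit      (char-range #\0 #\9)]
  [identifier (concatenation letter
                             (repetition 0 +inf.0 (union letter digit)))]
  ;; Shortened keyword list, for illustration only.
  [reserved   (union-mixed "program" "begin" "end")])

;; Hypothetical lexer: the reserved rule comes first, so that it wins
;; over identifier when both match a lexeme of the same length.
(define lex-token
  (lexer
   [reserved   (list 'keyword (string-downcase lexeme))] ; normalize case
   [identifier (list 'identifier lexeme)]
   [whitespace (lex-token input-port)]                   ; skip whitespace
   [(eof)      'eof]))

;; Hypothetical helper: collect all tokens of a string into a list.
(define (tokenize str)
  (define in (open-input-string str))
  (let loop ()
    (define tok (lex-token in))
    (if (eq? tok 'eof)
        '()
        (cons tok (loop)))))

(tokenize "PrOgRaM Foo BEGIN end")
```

Because the lexer picks the longest match, a word such as "beginning" should still come out as an identifier; the reserved rule only takes priority when the whole lexeme is a keyword.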

Thanks! The mixed case is what I need. Works fine!