Regular expression

Hello,

I have been trying to write a regex since yesterday afternoon, but I am not sure it can be done with regular expressions, since they are built on automata theory and some things cannot be expressed in that theory.

It's not that I'm lazy :smile:, but on this sunny Saturday I would rather be outdoors :slight_smile: than get a headache over regular expressions.

What I want is to skip this line in a Scheme file:

#! /usr/bin/env racket

I thought of something like this for testing:

> (regexp-match #px"^#![[:blank:]]*/[[:alpha:]|/]*racket[[:blank:]]*\n" "#! /usr/local racket\n zut alors un autre racket   \n") 


(regexp-match
 (pregexp "^#![[:blank:]]*/[[:alpha:]|/]*racket[[:blank:]]*\n")
 "#! /usr/local racket\n zut alors un autre racket   \n")
#f
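A minimal sketch of why this first attempt returns #f (the variable names are only illustrative): inside a bracket expression the '|' is a literal character, not alternation, and [:alpha:] contains no space, so the class cannot match the blank between 'local' and 'racket'. Adding [:blank:] to the class is one possible fix; since no character in that class is a newline, the match also cannot run past the first line.

```racket
#lang racket
;; Test string from the post above.
(define input "#! /usr/local racket\n zut alors un autre racket   \n")

;; Original pattern: inside [...] the "|" is a literal bar, and
;; [:alpha:] has no space, so " racket" after "local" cannot be reached.
(define original #px"^#![[:blank:]]*/[[:alpha:]|/]*racket[[:blank:]]*\n")

;; Variant: letters, blanks (space/tab) and "/". None of these is a
;; newline, so the match is confined to the first line.
(define widened #px"^#![[:blank:]]*/[[:alpha:][:blank:]/]*racket[[:blank:]]*\n")

(regexp-match original input) ; => #f
(regexp-match widened input)  ; => '("#! /usr/local racket\n")
```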

The problem is that I do not want to skip everything up to any 'racket': the shebang line is at the start of the program, and racket also appears further down, in the module declaration, see:

#! /usr/bin/env racket

#lang reader SRFI-105

;; interpolate field caller

;; Damien MATTEI

;; export PATH=/Applications/Racket/bin:$PATH

;; TODO: use Makefile

(module interpolate-field-caller racket
	
	(require Scheme+)
		 

	(require setup/dirs)
	(require racket/date)
	(require srfi/13) ; for at least string-contains

	(require xml
		 (except-in 2htdp/batch-io xexpr?)) ; for: read-lines
	
	(display "Scheme+ : interpole_fields") (newline)

If I do it like that, my parser gets:

|(#! /usr/bin/env racket

#lang reader SRFI-105

;; interpolate field caller

;; Damien MATTEI

;; export PATH=/Applications/Racket/bin:$PATH

;; TODO: use Makefile

(module interpolate-field-caller racket
)|

which cuts the program in the middle of the code...

My previous parser looks like this:

  (skip-comments-and-empty-lines in)

  (let loop ()
    (define try-read (regexp-try-match #px"^#![[:blank:]]*/[[:ascii:]]*racket[[:blank:]]*\n" in))
    (when try-read
      ;;(display "executable") (newline)
      (display "|" stderr) (display try-read stderr) (display "|" stderr) (newline stderr)
      (display (car try-read) (current-output-port))
      (loop)))

  (skip-comments-and-empty-lines in)

  (let loop ()
    (when (regexp-try-match #px"^#!curly-infix[[:blank:]]*\n" in)
      (loop)))

  (skip-comments-and-empty-lines in)

  (let loop ()
    (when (regexp-try-match #px"^#lang reader SRFI-105[[:blank:]]*\n" in)
      ;;(display "srfi 105") (newline)
      (loop)))

  (skip-comments-and-empty-lines in)

  (when (regexp-try-match #px"^#!r6rs[[:blank:]]*\n" in)
    (set! flag-r6rs #t)
    (display "Detected R6RS code: #!r6rs" stderr) (newline stderr) (newline stderr))

If I could find a way in POSIX regex to define a class with alphabetic characters and / but not newline, it could work, but in the docs I only find 'ascii', which is too broad...
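One way to say "anything except newline" is a negated class, [^\n]. A sketch, assuming the input is read from a port with regexp-try-match as in the parser above; the sample text is a shortened copy of the file shown earlier, and the names are illustrative:

```racket
#lang racket
;; "Anything but newline" as a negated class. Inside the #px string
;; literal, \n becomes a real newline character, which is what [^...] needs.
(define shebang-rx #px"^#![[:blank:]]*/[^\n]*racket[[:blank:]]*\n")

;; Shortened copy of the file from the question (illustrative input).
(define in
  (open-input-string
   "#! /usr/bin/env racket\n\n#lang reader SRFI-105\n(module interpolate-field-caller racket\n"))

;; regexp-try-match consumes input only when the pattern matches;
;; on a port the matched strings come back as byte strings.
(define shebang (regexp-try-match shebang-rx in))
shebang ; => '(#"#! /usr/bin/env racket\n")

;; Everything after the first line, including the module form, stays in the port.
(define remaining (port->string in))
remaining
```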

Hi, @damien_mattei.

I am not sure I understand the question properly, but is this sort of what you have in mind? I am probably missing something here.

#lang racket

(define text
  #<<HERE
#! /usr/bin/env racket

#lang reader SRFI-105

;; interpolate field caller

;; Damien MATTEI

;; export PATH=/Applications/Racket/bin:$PATH

;; TODO: use Makefile

(module interpolate-field-caller racket
	
(require Scheme+)
		 

(require setup/dirs)
(require racket/date)
(require srfi/13) ; for at least string-contains

(require xml
         (except-in 2htdp/batch-io xexpr?)) ; for: read-lines
	
(display "Scheme+ : interpole_fields") (newline))
HERE
  )

(match (regexp-match
        (pregexp "^#![[:alnum:][:blank:]/]+racket\n(.*?)$")
        text)
  [(list _ not-skipped)
   (displayln not-skipped)]
  [_ #false])

;=>
#|

#lang reader SRFI-105

;; interpolate field caller

;; Damien MATTEI

;; export PATH=/Applications/Racket/bin:$PATH

;; TODO: use Makefile

(module interpolate-field-caller racket
	
(require Scheme+)
		 

(require setup/dirs)
(require racket/date)
(require srfi/13) ; for at least string-contains

(require xml
         (except-in 2htdp/batch-io xexpr?)) ; for: read-lines
	
(display "Scheme+ : interpole_fields") (newline))
|#

Hello Christiaan,
I do not understand all of your example, but if I test just your regexp:

(regexp-match (pregexp "^#![[:alnum:][:blank:]/]+racket\n(.*?)$") "#! /usr/local/env racket\n zut alors un autre racket   \n") 
'("#! /usr/local/env racket\n zut alors un autre racket   \n" " zut alors un autre racket   \n")

What I want is just the string truncated at the end of the first 'racket\n'.

The problem can be summarized with this in Racket (leaving aside the parser, SRFI 105, Scheme+, etc.):

Welcome to DrRacket, version 8.14 [cs].
Language: racket, with debugging; memory limit: 8192 MB.
> (regexp-match #px"^#![[:blank:]]*/[[:ascii:]]*racket[[:blank:]]*\n" "#! /usr/local/env racket\n zut alors un autre racket   \n") 
'("#! /usr/local/env racket\n zut alors un autre racket   \n")

I want a regex that gives this result:
'("#! /usr/local/env racket\n")

i.e. a regexp that stops at the first 'racket'.

The reason is that I had a parser for curly infix (SRFI 105) that works well with Racket and R6RS syntax: it skips #lang lines, pragma directives such as #!r6rs, etc. But now I have Scheme files that can be self-executed with #! /usr/local/env racket, and I need to support this new feature.

The parser and the file to parse can be found in this gist, but solving the regular expression above is enough.
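For comparison, a sketch of the same pattern with the lazy quantifier *? in place of the greedy *. It stops at the first 'racket' followed by optional blanks and a newline instead of the last one. This is only a partial fix: [[:ascii:]] still contains newline, so when the first line does not end in 'racket' the lazy pattern can still reach a later line (the guile line below is a made-up example).

```racket
#lang racket
;; Test string from the post above.
(define s "#! /usr/local/env racket\n zut alors un autre racket   \n")

;; Greedy: [[:ascii:]]* grabs as much as possible, then backtracks to the LAST "racket".
(define greedy #px"^#![[:blank:]]*/[[:ascii:]]*racket[[:blank:]]*\n")
;; Lazy: [[:ascii:]]*? grows one character at a time, so it stops at the FIRST one.
(define lazy   #px"^#![[:blank:]]*/[[:ascii:]]*?racket[[:blank:]]*\n")

(regexp-match greedy s) ; => '("#! /usr/local/env racket\n zut alors un autre racket   \n")
(regexp-match lazy s)   ; => '("#! /usr/local/env racket\n")

;; Caveat: [[:ascii:]] includes newline, so the lazy pattern can still
;; cross into the next line when the first line has no "racket".
(regexp-match lazy "#! /usr/bin/env guile\n(module m racket\n")
```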

Oh, so basically the opposite of what I did, like so? Apologies, I am probably still not getting it right.

(match (regexp-match
        (pregexp "^(#![[:alnum:][:blank:]/]+racket\n).*?$")
        text)
  [(list _ stop-here) stop-here]
  [_ #false])

;=> "#! /usr/bin/env racket\n"

Your solution should work if I take the 'car' in some cases:

> (regexp-match (pregexp "^#![[:alnum:][:blank:]/]+racket\n(.*?)$") "#! /usr/local racket\n")
'("#! /usr/local racket\n" "")
> (regexp-match (pregexp "^#![[:alnum:][:blank:]/]+racket\n(.*?)$") "#! /usr/local racket")
#f
> (regexp-match (pregexp "^#![[:alnum:][:blank:]/]+racket\n(.*?)$") "#! /usr/local racket\n zut alors un autre racket   \n")
'("#! /usr/local racket\n zut alors un autre racket   \n" " zut alors un autre racket   \n")
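A sketch of why the car contains the whole input in the last test: in Racket regexps, '.' also matches newline and '$' means end of string unless multi mode (?m:...) is used, so the lazy group (.*?)$ still has to run to the end. Dropping the tail leaves only the shebang line in the match (names are illustrative):

```racket
#lang racket
;; Sample input from the tests above.
(define s "#! /usr/local racket\n zut alors un autre racket   \n")

;; By default "." also matches newline and "$" is end of string,
;; so (.*?)$ cannot stop before the end of the input.
(define with-tail    #px"^#![[:alnum:][:blank:]/]+racket\n(.*?)$")
;; Without the tail, the match ends at the first "racket\n" reached through
;; letters, digits, blanks and slashes (none of which is a newline).
(define without-tail #px"^#![[:alnum:][:blank:]/]+racket\n")

(car (regexp-match with-tail s))    ; the whole input string
(car (regexp-match without-tail s)) ; => "#! /usr/local racket\n"
```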

Rereading some regular expression docs, I found this solution, which seems to work for my use:

#px"^#![[:print:]]*racket"

using [:print:], which does not match newlines.
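A sketch of this [:print:] pattern used with regexp-try-match on a port, as in the parser loop above; the sample text is a shortened copy of the file from the question and the names are illustrative. Because [:print:] covers spaces and printable ASCII characters but not newline, the greedy [[:print:]]* stays on the first line. Note that the pattern does not consume the trailing newline, which remains in the port for whatever reads next:

```racket
#lang racket
;; Shortened copy of the file from the question (illustrative input).
(define in
  (open-input-string
   "#! /usr/bin/env racket\n\n#lang reader SRFI-105\n(module interpolate-field-caller racket\n"))

;; [:print:] does not include newline, so [[:print:]]* cannot leave
;; line one, even though it is greedy.
(define m (regexp-try-match #px"^#![[:print:]]*racket" in))
m ; => '(#"#! /usr/bin/env racket")

;; The newline that ended the shebang line is still in the port,
;; so the next read-line returns an empty string.
(define next-line (read-line in))
next-line ; => ""
```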


Oh, that's awesome, TIL :heart: