Rewind a port (file), or is there another method?

Hello,

I just want to skip comments and blank lines at the beginning of a file. I found a solution that rewinds the port/file using file-position, but is there another method?


(require (only-in racket/base [do do-scheme])) ; backup original do

(include "while-do-when-unless.scm")

(include "SRFI-105.scm")

(define (test-blank-lines-or-comments li)
  (or (not (non-empty-string? li)) ; empty line
      (regexp-match #px"^[[:blank:]]*;+" li))) ; space, comments

(define (skip-comments-and-empty-lines in)
  
  (define li '())
  (define fpos '())
  (define cpt -1)

  (do
      (set! cpt (+ 1 cpt))
      (set! fpos (file-position in))
      (set! li (read-line in))
    while (test-blank-lines-or-comments li))

  (file-position in fpos) ;; rewind to the code to parse after comments or spaces

  (display "SRFI-105.rkt : skip-comments-and-empty-lines : number of skipped lines (comments, spaces) at beginning : ")
  (display cpt)
  (newline)
  )


(define (literal-read-syntax src in)

  (skip-comments-and-empty-lines in)
  
  (define lst-code (process-input-code-tail-rec in))
  lst-code)
   

Note that 'do' is defined this way:

;; > (define i 0) 
;; > (do (display-nl "toto") (set! i (+ i 1)) while (< i 4)) 
;; toto
;; toto
;; toto
;; toto
;; Warning: this 'do' breaks the standard Scheme 'do'!
(define-syntax do
  (syntax-rules (while) ; 'while' is the literal keyword to match
    ((do b1 ...
       while pred)
     (let loop () b1 ... (when pred (loop))))))

It works:

Welcome to DrRacket, version 8.12 [cs].
Language: reader "../Scheme-PLUS-for-Racket/main/Scheme-PLUS-for-Racket/src/SRFI-105.rkt", with debugging; memory limit: 8192 MB.
SRFI-105.rkt : skip-comments-and-empty-lines : number of skipped lines (comments, spaces) at beginning : 12

regards

damien

The regexp matching functions like regexp-match take a pattern and an input.

The pattern is (or/c regexp? byte-regexp? string? bytes?).

The input is (or/c string? bytes? path? input-port?). The interesting point here is you can match a regexp directly on an input port. You don't need to read a string from the port first.

Also, the regexp-try-match variant doesn't consume non-matches from the port.

So I think you could simply do something like:

(define in (open-input-string "\n;comment\n  ;λ\n12 ;comment"))
(let loop ()
  (when (regexp-try-match #px"^(;.+)?\n" in)
    (loop)))
(require (only-in racket/port port->string))
(print (port->string in)) ; => "12 ;comment"

I didn't think much about that particular regexp, it doesn't try to handle all your cases, but you can take it from there. :smile:


regexp-try-match is the solution because, as you say, it doesn't consume non-matches from the port. Previously I looked for an equivalent of peek-char for read-line, but there is no peek-line in Racket or Scheme.

I suppose regexp-try-match does a lot of peeking internally, and that is a solution.
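
On the missing peek-line: there is no built-in with that name, but one can be sketched with peeking-input-port from racket/port, which returns a port whose reads only peek into the underlying port. The name peek-line below is hypothetical, and this is a minimal sketch rather than a tuned implementation.

```racket
#lang racket/base
(require racket/port) ; provides peeking-input-port

;; peek-line is a hypothetical helper, not a Racket built-in.
;; It returns the next line of `in` without consuming it.
(define (peek-line in)
  (read-line (peeking-input-port in)))

(define in (open-input-string "; header\ncode"))
(define peeked (peek-line in))   ; the port position is unchanged
(define consumed (read-line in)) ; same line, now actually consumed
```

Here peeked and consumed are the same line, so a loop could peek, test the line, and only call read-line when the line should be skipped.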

About the regexp, there is a problem in both versions. Mine had an error: it did not correctly detect lines containing only spaces. I did not find a simple single regexp that covers both comments and lines made only of spaces or tabs, so I use two regexps.
We also still iterate in a loop: each call to regexp-try-match consumes one match, so the loop repeats until neither pattern matches. (A quantified group could in principle consume several lines in one match, but I kept the explicit loop.)
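
A side note on the single-regexp question: Racket regexps do allow a quantifier on a group, so one pattern with an alternation inside (?:...)* can consume a whole header in one match. The sketch below assumes the same notion of header as in this thread (whitespace and ";" comment lines ending in a newline); the names header-rx and rest-of-input are only for illustration.

```racket
#lang racket/base

;; Zero or more repetitions of either a run of whitespace
;; or a whole ";" comment line including its newline.
(define header-rx #px"^(?:[[:space:]]+|;[^\n]*\n)*")

(define in (open-input-string "\n;comment\n  ;x\n12 ;comment"))

;; Consumes only the matched header; the code after it stays in the port.
(void (regexp-try-match header-rx in))

(define rest-of-input (read-line in))
```

The pattern can match the empty string, so the call succeeds even when there is no header, in which case nothing is consumed.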

My corrected code should have been like this:

(define (test-blank-lines-or-comments li)
  (or (not (non-empty-string? li)) ; empty line
      (regexp-match #px"^[[:blank:]]*$" li) ; only spaces, tabs
      (regexp-match #px"^[[:blank:]]*;+" li))) ; spaces, tabs, comments
  

But I will use this new version of the code, which is now simpler thanks to regular expressions and regexp-try-match:

(define (skip-comments-and-empty-lines in)
  (do
      while (or (regexp-try-match #px"^[[:space:]]" in)    ; skip space, tab, newline, ...
                (regexp-try-match #px"^;[^\n]*\n" in))))   ; and also comments

And the main code still looks like this:

(display "Possibly skipping some header's lines containing space,tabs,new line,etc  or comments.") (newline) (newline)
(skip-comments-and-empty-lines in)

(when (regexp-try-match #px"^#!r6rs[[:blank:]]*\n" in)
	(display "Detected R6RS code. (#!r6rs)") (newline) (newline))

Tested on the beginning of a file that looks like this:

;; Damien Mattei

;; 2024

; modify it to be recompiled by Racket ?

#!r6rs
(module start-logic-racket racket
(provide (all-defined-out))
(require "../Scheme-PLUS-for-Racket/main/Scheme-PLUS-for-Racket/Scheme+.rkt")
(require "racket/operation+.rkt")

...

The result is:

Possibly skipping some header's lines containing space,tabs,new line,etc  or comments.

Detected R6RS code. (#!r6rs)

Note: with this solution there is no way to know how many lines were skipped.

Use port-count-lines! to enable automatic line counting.
Then use port-next-location to get the line number before and after.
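
A minimal sketch of that suggestion in plain Racket, assuming a string port and the two skip patterns used earlier in this thread. The variable names are illustrative. Line numbers reported by port-next-location start at 1, so the number of skipped lines is the difference between the two readings.

```racket
#lang racket/base

(define in (open-input-string ";; a\n\n; b\ncode"))
(port-count-lines! in) ; enable counting before anything is read

;; Location of the next character before skipping: line 1.
(define-values (line-before col-before pos-before) (port-next-location in))

;; Skip whitespace and ";" comment lines.
(let loop ()
  (when (or (regexp-try-match #px"^[[:space:]]" in)
            (regexp-try-match #px"^;[^\n]*\n" in))
    (loop)))

;; Location of the next character after skipping.
(define-values (line-after col-after pos-after) (port-next-location in))
(define skipped (- line-after line-before))
```

With this input three lines are skipped, and the next read-line returns the first line of code.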


Yes, I added the info (even if it is not necessary):

  (port-count-lines! in) ; turn on counting on port
  
  (display "Possibly skipping some header's lines containing space,tabs,new line,etc  or comments.") (newline) (newline)
  (skip-comments-and-empty-lines in)

  (when (regexp-try-match #px"^#!r6rs[[:blank:]]*\n" in)
	(display "Detected R6RS code. (#!r6rs)") (newline) (newline))

  (declare lc cc pc)
  (set!-values (lc cc pc) (port-next-location in))
  (display "SRFI-105.rkt : number of skipped lines (comments, spaces, directives,...) at header's beginning : ")
  (display lc)
  (newline)
  (newline)

And the output:

Possibly skipping some header's lines containing space,tabs,new line,etc  or comments.

SRFI-105.rkt : number of skipped lines (comments, spaces, directives,...) at header's beginning : 15

The funny thing is that it counts one more line than expected from the start of the file. It seems to include a line that my procedure never sees, because the Racket reader consumes it:
#lang reader "../Scheme-PLUS-for-Racket/main/Scheme-PLUS-for-Racket/src/SRFI-105.rkt"
That line is in the count even though I cannot access it from my parser, which runs after the Racket reader.

That might be because port-count-lines! numbers lines starting from 1, not 0?

So if you've skipped 1 line, port-next-location is going to tell you "line number 2 is current".


I believe port-count-lines! was added to support giving Racket syntax objects the kind of line and column numbers that would be useful in error messages -- where, because reasons, the convention is that line numbers start from 1 but column numbers start from 0. :person_shrugging:

It's fine to use port line counting for other purposes, you just need to keep that in mind.


Yes, it should start counting from 1, like in the Racket GUI.

So 15 is the next line to be read from the buffer.

And the #lang reader directive passes the file to the SRFI-105.rkt procedure with the buffer already positioned at line 2.