Extracting lines from big file

Hi. I want to extract and/or process certain lines from a rather big file (330 MB, 6.5 million lines). A line qualifies if a certain part of it is found in a hash set of 9300 words. I may be old-fashioned, but I think I should read the big file line by line, so as not to use too much memory.
I have not used Racket much yet, but I found the recursive-style "read file line by line" example here: https://rosettacode.org/wiki/Read_a_file_line_by_line#Racket
What troubles me about that solution is how to fit the needed condition and processing in there. I need to pass references to my existing hash set and out-file.
I feel a more "old school" loop would be easier to write, but will it be as efficient as a recursive solution?

One way: use parameterize to supply the function that the iterator calls on each line.

Another way: define a function that takes your "external references" as arguments and returns a function suitable for call-with-input-file.

Here's one way to do it:

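(define (process-file infile outfile print-line-number?)
  ;; Reads from current-input-port until EOF. The recursive call is in
  ;; tail position, so it runs in constant stack space, like a loop.
  (define (process-lines [line-number 1])
    (define line (read-line (current-input-port) 'any))
    (unless (eof-object? line)
      (if print-line-number?
          (printf "~a: ~a~n" line-number line)
          (printf "~a~n" line))
      (process-lines (add1 line-number))))

  ;; Redirect current output and input to the files for the duration of the
  ;; call. By default with-output-to-file raises an error if outfile exists.
  (with-output-to-file outfile
    (lambda () (with-input-from-file infile process-lines))))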

Since the helper function process-lines is defined within the scope of process-file, it has access to print-line-number?. You can use the same technique for your hash set. And you don't need to pass the out-file either, since with-output-to-file makes it the current output port, which is where printf writes.

I also used an optional argument to process-lines to track the line numbers.
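
To make the "same technique" concrete for the original question, here is a minimal sketch. It rests on assumptions that are not in the thread: the "certain part of the line" is taken to be the first whitespace-separated field, and the names line-key, extract-lines and extract-file are made up for illustration. The set of wanted words is simply an argument that the loop closes over.

```racket
#lang racket/base
(require racket/set racket/string)

;; Hypothetical helper: assumes the key is the first whitespace-separated
;; field of the line. Adjust this to the real file format.
(define (line-key line)
  (define fields (string-split line))
  (if (null? fields) "" (car fields)))

;; Copy every line of `in` whose key is in `wanted` to `out`.
;; Returns the number of lines kept. `loop` is only called in tail
;; position, so the stack does not grow with the number of lines.
(define (extract-lines in out wanted)
  (let loop ([kept 0])
    (define line (read-line in 'any))
    (cond
      [(eof-object? line) kept]
      [(set-member? wanted (line-key line))
       (write-string line out)
       (newline out)
       (loop (add1 kept))]
      [else (loop kept)])))

;; Wire the loop to files. 'truncate overwrites an existing out-file.
(define (extract-file infile outfile wanted)
  (call-with-input-file infile
    (lambda (in)
      (call-with-output-file outfile
        (lambda (out) (extract-lines in out wanted))
        #:exists 'truncate))))
```

Only one line is held in memory at a time, so the size of the input file should not matter much. A call would look like (extract-file "big.txt" "out.txt" (set "apple" "cherry")), with both file names being placeholders.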

(define my-function (make-parameter println))

(define (read-next-line-iter file)
  (let ((line (read-line file 'any)))
    (unless (eof-object? line)
      ((my-function) line)
      (read-next-line-iter file))))


(call-with-input-file "d:/foobar.txt" read-next-line-iter)
(newline)

(parameterize ((my-function (lambda (line)
                              (println (if (odd? (string-length line)) "odd" "")))))
  (call-with-input-file "d:/foobar.txt" read-next-line-iter))


(newline)

(define (make-my-function dict)
  (lambda (file)
    (let loop ((line (read-line file 'any)))
      (unless (eof-object? line)
        (println (hash-ref dict line 'nope))
        (loop (read-line file 'any))))))

(call-with-input-file "d:/foobar.txt"
  (make-my-function (hash "22" 'yep)))

gives the following output for a suitable foobar.txt:

"1"
"22"
"333"
"4444"
"55555"
"4444"
"333"
"22"
"1"

"odd"
""
"odd"
""
"odd"
""
"odd"
""
"odd"

'nope
'yep
'nope
'nope
'nope
'nope
'nope
'yep
'nope
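
On the "old school loop" part of the question: Racket guarantees proper tail calls, so the recursive versions above do not grow the stack and already behave like loops. If a loop reads more naturally, the last example can probably be written with for and in-lines instead. This is only a sketch; make-line-lookup is a made-up name, and a string port stands in for the file so the snippet runs on its own.

```racket
#lang racket/base

;; Hypothetical variant of make-my-function: the same closure over `dict`,
;; but iterating over the port with `for` and the `in-lines` sequence.
(define (make-line-lookup dict)
  (lambda (file)
    (for ([line (in-lines file 'any)])
      (println (hash-ref dict line 'nope)))))

;; A string port stands in for the file here; with a real file, pass the
;; result of make-line-lookup to call-with-input-file as before.
((make-line-lookup (hash "22" 'yep))
 (open-input-string "1\n22\n333\n"))
```

Either style reads one line at a time, so memory use should be about the same. Which one is faster on a 330 MB file is best settled by timing both.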

Thanks a lot! I found this solution easy to understand and adapt, so I'll stick with that one.
