Hi. I want to extract and/or process certain lines from a rather big file (330 MB, 6.5 million lines). Whether a line gets handled depends on whether a certain part of it is found in a hash set of 9300 words. I may be old-fashioned, but I think I should read the big file line by line, to avoid using too much memory.
I have not used Racket too much yet, but I found the recursive style "read file line by line" example here: https://rosettacode.org/wiki/Read_a_file_line_by_line#Racket
What troubles me about that solution is how to fit the needed condition and processing in there. I need to pass references to my existing hash set and out-file.
I feel a more "old school" loop would be easier to do, but will it be as efficient as a recursive solution?
One way: Use parameterize to provide a binding to a function used inside the iterator.
Another way: Define a function taking your "external references" as arguments, which produces a function suitable for call-with-input-file.
Here's one way to do it:
(define (process-file infile outfile print-line-number?)
  (define (process-lines [line-number 1])
    (define line (read-line (current-input-port) 'any))
    (unless (eof-object? line)
      (if print-line-number?
          (printf "~a: ~a~n" line-number line)
          (printf "~a~n" line))
      (process-lines (add1 line-number))))
  (with-output-to-file outfile
    (lambda () (with-input-from-file infile process-lines))))
Since the helper function process-lines is defined within the scope of process-file, it has access to print-line-number?. You can use the same technique for your hash table. And you don't need to pass the out-file either, since with-output-to-file takes care of that.
I also used an optional argument to process-lines to track the line numbers.
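To make "the same technique for your hash table" concrete, here is a minimal sketch that writes only the matching lines. The names filter-file, wanted, line->key and first-field are hypothetical, and taking the first whitespace-separated field as the "certain part" of the line is an assumption; substitute whatever extraction your data needs.

```racket
#lang racket

;; Hypothetical helper: picks the part of the line to look up.
;; Assumption: the key is the first whitespace-separated field.
(define (first-field line)
  (define fields (string-split line))
  (if (null? fields) "" (first fields)))

;; Hypothetical name: filter-file. `wanted` is a set of words;
;; `line->key` extracts the part of a line to test against it.
(define (filter-file infile outfile wanted line->key)
  ;; The helper closes over `wanted` and `line->key`,
  ;; so nothing extra has to be passed to with-input-from-file.
  (define (process-lines)
    (define line (read-line (current-input-port) 'any))
    (unless (eof-object? line)
      (when (set-member? wanted (line->key line))
        (displayln line))          ; goes to outfile via current-output-port
      (process-lines)))            ; tail call, so the stack does not grow
  (with-output-to-file outfile #:exists 'truncate
    (lambda () (with-input-from-file infile process-lines))))

;; Example call (file names are placeholders):
;; (filter-file "big-input.txt" "matches.txt" (set "apple" "pear") first-field)
```

Only one line is held at a time, so memory use should depend on the longest line rather than on the size of the file.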
(define my-function (make-parameter println))

(define (read-next-line-iter file)
  (let ((line (read-line file 'any)))
    (unless (eof-object? line)
      ((my-function) line)
      (read-next-line-iter file))))

(call-with-input-file "d:/foobar.txt" read-next-line-iter)
(newline)

(parameterize ((my-function (lambda (line)
                              (println (if (odd? (string-length line)) "odd" "")))))
  (call-with-input-file "d:/foobar.txt" read-next-line-iter))
(newline)

(define (make-my-function dict)
  (lambda (file)
    (let loop ((line (read-line file 'any)))
      (unless (eof-object? line)
        (println (hash-ref dict line 'nope))
        (loop (read-line file 'any))))))

(call-with-input-file "d:/foobar.txt"
  (make-my-function (hash "22" 'yep)))
gives the following output for a suitable foobar.txt:
"1"
"22"
"333"
"4444"
"55555"
"4444"
"333"
"22"
"1"
"odd"
""
"odd"
""
"odd"
""
"odd"
""
"odd"
'nope
'yep
'nope
'nope
'nope
'nope
'nope
'yep
'nope
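On the "old school" loop question: the recursive calls above are in tail position, and Racket guarantees proper tail calls, so they already behave like a loop. If a for form reads more naturally, here is a rough sketch using in-lines with explicit ports. The names copy-matching-lines and keep? are hypothetical; keep? stands in for whatever test you run against your hash set.

```racket
#lang racket

;; Hypothetical name: copy-matching-lines.
;; `keep?` is any predicate on a line, e.g. a lookup in a hash set.
(define (copy-matching-lines infile outfile keep?)
  (call-with-output-file outfile #:exists 'truncate
    (lambda (out)
      (call-with-input-file infile
        (lambda (in)
          ;; in-lines yields one line at a time from the port,
          ;; so the whole file is never held in memory.
          (for ([line (in-lines in 'any)]
                #:when (keep? line))
            (displayln line out)))))))

;; Example call (file names are placeholders):
;; (copy-matching-lines "d:/foobar.txt" "d:/out.txt"
;;                      (lambda (line) (odd? (string-length line))))
```

Here the ports are passed explicitly instead of rebinding current-input-port and current-output-port, which is mostly a matter of taste.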
Thanks a lot! I found this solution easy to understand and adapt, so I'll stick with that one.