Tyrn
March 19, 2024, 5:38pm
1
Hi,
This is a small and clean example: a function to make a string of initials out of a string of [co]authors.
Coauthors consist of a sequence of authors separated by commas.
Authors consist of a sequence of barrels (double- and multi-barreled names permitted) separated by dashes.
Barrels consist of words separated by spaces and/or periods.
A fragment of the unit test data set:
((".. , .. ") "")
((" ,, .,") "")
(("l") "L.")
((", a. g, ") "A.G.")
(("- , -I.V.-A,E.C.N-, .") "I.V-A.,E.C.N.")
((" e.B.Sledge ") "E.B.S.")
(("Elisabeth Kubler-- - Ross") "E.K-R.")
((" Fitz-Simmons Ashton-Burke Leigh") "F-S.A-B.L.")
(("Arleigh \"31-knot\"Burke ") "A.B.")
There are two functions doing this job using the same tools and logic.
One (initials):
(define (initial-create str)
  (string-upcase (substring str 0 1)))
(define (initials coauthors)
  (define (nicknames-drop coauthors)
    (string-replace (regexp-replace* #rx"\"(?:\\.|[^\"\\])*\"" coauthors " ") "\"" " "))
  (define (into-authors-split coauthors)
    (filter (λ (author) (non-empty-string? (regexp-replace* #px"[\\s.\\-]+" author "")))
            (string-split coauthors ",")))
  (define (into-barrels-split-n-join author)
    (define (into-names-split-n-join barrel)
      (define (into-names-split barrel)
        (filter non-empty-string? (regexp-split #px"[\\s.]+" barrel)))
      (string-join (map (λ (name) (initial-create name)) (into-names-split barrel)) "."))
    (define (into-barrels-split author)
      (filter (λ (barrel) (non-empty-string? (regexp-replace* #px"[\\s.]+" barrel "")))
              (string-split author "-")))
    (string-append
     (string-join (map (λ (barrel) (into-names-split-n-join barrel)) (into-barrels-split author)) "-")
     "."))
  (string-join (map (λ (author) (into-barrels-split-n-join author))
                    (into-authors-split (nicknames-drop coauthors)))
               ","))
Two (inits):
(define (inits coauthors)
  (string-join
   (map (λ (author)
          (string-append
           (string-join
            (map (λ (barrel)
                   (string-join (map (λ (name) (initial-create name))
                                     (filter non-empty-string? (regexp-split #px"[\\s.]+" barrel)))
                                "."))
                 (filter (λ (barrel) (non-empty-string? (regexp-replace* #px"[\\s.]+" barrel "")))
                         (string-split author "-")))
            "-")
           "."))
        (filter (λ (author) (non-empty-string? (regexp-replace* #px"[\\s.\\-]+" author "")))
                (string-split
                 (string-replace (regexp-replace* #rx"\"(?:\\.|[^\"\\])*\"" coauthors " ") "\"" " ")
                 ",")))
   ","))
The functions do their job pretty well. Within this approach there isn't much else one can do. My problem is that I can't decide which one is less ugly and more readable. I've been looking at them for so long that I can't trust my own eyes any more.
What would you do in my place? Is there a third way?
Tyrn
March 19, 2024, 6:52pm
2
Another take. I just tried to push all the removable cognitive load away from the ugly and vital part.
(define (inits coauthors)
  (define (nicknames-drop coauthors)
    (string-replace (regexp-replace* #rx"\"(?:\\.|[^\"\\])*\"" coauthors " ") "\"" " "))
  (define (valid-author? author)
    (non-empty-string? (regexp-replace* #px"[\\s.\\-]+" author "")))
  (define (valid-barrel? barrel)
    (non-empty-string? (regexp-replace* #px"[\\s.]+" barrel "")))
  (define (into-valid-names-split barrel)
    (filter non-empty-string? (regexp-split #px"[\\s.]+" barrel)))
  (string-join
   (map (λ (author)
          (string-append (string-join (map (λ (barrel)
                                             (string-join (map (λ (name) (initial-create name))
                                                               (into-valid-names-split barrel))
                                                          "."))
                                           (filter (λ (barrel) (valid-barrel? barrel))
                                                   (string-split author "-")))
                                      "-")
                         "."))
        (filter (λ (author) (valid-author? author)) (string-split (nicknames-drop coauthors) ",")))
   ","))
EmEf
March 19, 2024, 8:16pm
3
Here is how I would present it to future readers:
The binding arrows explain the sequencing.
(Why is nickname dropping possible before we know that these are valid names and barrels?)
The let* shows me the pipelining inside of a function.
By using s as the generic name I can easily fold/unfold expressions until I am happy.
(Your 'happy' may differ from 'mine'. I used near-atomic expressions for 'happy' here.)
Then I can name each stage appropriately.
Code:
(define nick-name-pattern #rx"\"(?:\\.|[^\"\\])*\"")
#; {String -> String}
(define (inits-3 coauthors)
  (let* ([s (nicknames-drop coauthors)]
         [s (string-split s ",")]
         [s (filter valid-author? s)]
         [s (map author->something s)]
         [s (string-join s ",")])
    s))
;; -----------------------------------------------------------------------------
#; {String -> String}
(define (nicknames-drop coauthors)
  (string-replace (regexp-replace* nick-name-pattern coauthors " ") "\"" " "))
;; -----------------------------------------------------------------------------
#; {String -> Boolean}
(define (valid-author? author)
  (non-empty-string? (regexp-replace* #px"[\\s.\\-]+" author "")))
;; -----------------------------------------------------------------------------
;; convert author to initials, if it consists of valid barrels
#; {String -> String}
(define (author->something author)
  (let* ([s (string-split author "-")]
         [s (filter valid-barrel? s)]
         [s (map barrel->something s)]
         [s (string-join s "-")]
         [s (string-append s ".")])
    s))
#; {String -> Boolean}
(define (valid-barrel? barrel)
  (non-empty-string? (regexp-replace* #px"[\\s.]+" barrel "")))
#; {String -> String}
(define (barrel->something barrel)
  (let* ([s (into-valid-names-split barrel)]
         [s (map initial-create s)]
         [s (string-join s ".")])
    s))
#; {String -> [Listof String]}
(define (into-valid-names-split barrel)
  (filter non-empty-string? (regexp-split #px"[\\s.]+" barrel)))
;;-----------------------------------------------------------------------------
(define uses
  '[
    ((".. , .. ") "")
    ((" ,, .,") "")
    (("l") "L.")
    ((", a. g, ") "A.G.")
    (("- , -I.V.-A,E.C.N-, .") "I.V-A.,E.C.N.")
    ((" e.B.Sledge ") "E.B.S.")
    (("Elisabeth Kubler-- - Ross") "E.K-R.")
    ((" Fitz-Simmons Ashton-Burke Leigh") "F-S.A-B.L.")
    (("Arleigh \"31-knot\"Burke ") "A.B.")
    ])
(require rackunit)
(define (test f)
  (for-each (λ (x) (check-equal? (apply f (first x)) (second x))) uses))
(test inits-3)
Tyrn
March 19, 2024, 8:51pm
4
Oh, my... This way one can tackle problems harder than this. Thank you very much!
What are #; tokens?
Tyrn
March 19, 2024, 11:53pm
5
Why is nickname dropping possible before we know that these are valid names and barrels?
If, by any chance, the nickname contains a delimiter (space, dot, dash), we're in for a lot of trouble. I'd better add a test case for it.
Oh, but there already is one:
(("William J. \"Wild Bill\" Donovan, Marta \"Cinta Gonzalez") "W.J.D.,M.C.G.")
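For what it's worth, here is a minimal sketch of why that case survives. It runs only the nickname-dropping step, copied from the code earlier in the thread, on that input; the name `sample` is just an illustrative binding, not something from the original code.

```racket
#lang racket

;; Pattern and helper copied from the thread.
(define nick-name-pattern #rx"\"(?:\\.|[^\"\\])*\"")

(define (nicknames-drop coauthors)
  (string-replace (regexp-replace* nick-name-pattern coauthors " ") "\"" " "))

;; `sample` is a hypothetical name for the test input.
(define sample "William J. \"Wild Bill\" Donovan, Marta \"Cinta Gonzalez")

;; The balanced nickname "Wild Bill" is replaced as a whole, inner space
;; included, so its words never reach the splitting stages. The unbalanced
;; quote before Cinta matches nothing in the regexp and is only turned into
;; a space by string-replace.
(displayln (nicknames-drop sample))
```

So a delimiter inside a balanced nickname is gone before any splitting on commas, dashes, or spaces happens. A delimiter after an unbalanced quote is still seen by the later stages, which is what makes "Cinta" count as a name here.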
shawnw
March 20, 2024, 6:10am
6
Tyrn: What are #; tokens?
They comment out the next s-expression/datum that's read. See 1.3 The Reader in the Racket Reference.
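A small sketch of how that plays out; the names `add` and `total` are made up for illustration. Since `#;` skips exactly one datum, a signature written in braces, like the `#; {String -> String}` lines above, is skipped as a whole.

```racket
#lang racket

;; The braces form a single datum, so the whole signature is skipped
;; by the reader and never evaluated.
#; {Number Number -> Number}
(define (add a b)
  (+ a b))

;; It also works inside an expression: the 100 is never read,
;; so this is the same as (+ 1 2).
(define total (+ 1 #;100 2))

(displayln (add 1 2))
(displayln total)
```

Unlike a `;` line comment, `#;` works on structure rather than on lines, so it can disable a multi-line expression without touching each line.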