Nested split-map-join, preferred style

Hi,

This is a small and clean example: a function to make a string of initials out of a string of [co]authors.

  • Coauthors consist of a sequence of authors separated by commas.
  • Authors consist of a sequence of barrels (double- and multi-barreled names permitted) separated by dashes.
  • Barrels consist of words separated by spaces and/or periods.
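
To make the three levels concrete, here is a minimal sketch that only splits and does no filtering. The helper name `levels` is made up for illustration and is not part of the functions discussed in this thread. The stray empty string in the result hints at why every level needs a filter step.

```racket
#lang racket

;; Hypothetical helper: split at all three levels, keep every piece.
;; commas -> authors, dashes -> barrels, spaces/periods -> words
(define (levels coauthors)
  (for/list ([author (string-split coauthors ",")])
    (for/list ([barrel (string-split author "-")])
      (regexp-split #px"[\\s.]+" barrel))))

(levels "I.V.-A,E.C.N")
;; => '((("I" "V" "") ("A")) (("E" "C" "N")))
```

The `""` after `"V"` comes from the period right before the dash, so a real implementation has to drop empty words before taking initials.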

A fragment of the unit test data set:

              ((".. , .. ") "")
              ((" ,, .,") "")
              (("l") "L.")
              ((", a. g, ") "A.G.")
              (("- , -I.V.-A,E.C.N-, .") "I.V-A.,E.C.N.")
              (("  e.B.Sledge ") "E.B.S.")
              (("Elisabeth Kubler-- - Ross") "E.K-R.")
              (("  Fitz-Simmons Ashton-Burke Leigh") "F-S.A-B.L.")
              (("Arleigh \"31-knot\"Burke ") "A.B.")

There are two functions doing this job using the same tools and logic.

One (initials):

(define (initial-create str)
  (string-upcase (substring str 0 1)))

(define (initials coauthors)
  (define (nicknames-drop coauthors)
    (string-replace (regexp-replace* #rx"\"(?:\\.|[^\"\\])*\"" coauthors " ") "\"" " "))

  (define (into-authors-split coauthors)
    (filter (λ (author) (non-empty-string? (regexp-replace* #px"[\\s.\\-]+" author "")))
            (string-split coauthors ",")))

  (define (into-barrels-split-n-join author)
    (define (into-names-split-n-join barrel)
      (define (into-names-split barrel)
        (filter non-empty-string? (regexp-split #px"[\\s.]+" barrel)))

      (string-join (map (λ (name) (initial-create name)) (into-names-split barrel)) "."))

    (define (into-barrels-split author)
      (filter (λ (barrel) (non-empty-string? (regexp-replace* #px"[\\s.]+" barrel "")))
              (string-split author "-")))

    (string-append
     (string-join (map (λ (barrel) (into-names-split-n-join barrel)) (into-barrels-split author)) "-")
     "."))

  (string-join (map (λ (author) (into-barrels-split-n-join author))
                    (into-authors-split (nicknames-drop coauthors)))
               ","))

Two (inits):

(define (inits coauthors)
  (string-join
   (map (λ (author)
          (string-append
           (string-join
            (map (λ (barrel)
                   (string-join (map (λ (name) (initial-create name))
                                     (filter non-empty-string? (regexp-split #px"[\\s.]+" barrel)))
                                "."))
                 (filter (λ (barrel) (non-empty-string? (regexp-replace* #px"[\\s.]+" barrel "")))
                         (string-split author "-")))
            "-")
           "."))
        (filter (λ (author) (non-empty-string? (regexp-replace* #px"[\\s.\\-]+" author "")))
                (string-split
                 (string-replace (regexp-replace* #rx"\"(?:\\.|[^\"\\])*\"" coauthors " ") "\"" " ")
                 ",")))
   ","))

Both functions do their job pretty well, and within this approach there is not much else one can do. My problem is that I can't decide which one is less ugly and more readable. I have been looking at them for so long that I can't trust my own eyes any more.

What would you do in my place? Is there a third way?

Another take :sweat_smile:. I just tried to push all the removable cognitive load away from the ugly but vital part.

(define (inits coauthors)
  (define (nicknames-drop coauthors)
    (string-replace (regexp-replace* #rx"\"(?:\\.|[^\"\\])*\"" coauthors " ") "\"" " "))

  (define (valid-author? author)
    (non-empty-string? (regexp-replace* #px"[\\s.\\-]+" author "")))

  (define (valid-barrel? barrel)
    (non-empty-string? (regexp-replace* #px"[\\s.]+" barrel "")))

  (define (into-valid-names-split barrel)
    (filter non-empty-string? (regexp-split #px"[\\s.]+" barrel)))

  (string-join
   (map (λ (author)
          (string-append (string-join (map (λ (barrel)
                                             (string-join (map (λ (name) (initial-create name))
                                                               (into-valid-names-split barrel))
                                                          "."))
                                           (filter (λ (barrel) (valid-barrel? barrel))
                                                   (string-split author "-")))
                                      "-")
                         "."))
        (filter (λ (author) (valid-author? author)) (string-split (nicknames-drop coauthors) ",")))
   ","))

Here is how I would present it to future readers:

The binding arrows explain the sequencing.
(Why is nickname dropping possible before we know that these are valid names and barrels?)
The let* shows me the pipelining inside of a function.
By using s as the generic name I can easily fold/unfold expressions until I am happy.
(Your 'happy' may differ from 'mine'. I used near-atomic expressions for 'happy' here.)
Then I can name each stage appropriately.
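
As a toy illustration of the fold/unfold idea (the function names here are invented for this sketch and are unrelated to the initials code), the same pipeline can be written with one near-atomic step per binding, or with two steps folded into one:

```racket
#lang racket

;; Unfolded: one small step per binding; each `s` shadows the previous one.
(define (shout-words-unfolded str)
  (let* ([s (string-split str)]
         [s (map string-upcase s)]
         [s (string-join s " ")])
    s))

;; Folded: the first two steps merged into a single binding.
(define (shout-words-folded str)
  (let* ([s (map string-upcase (string-split str))]
         [s (string-join s " ")])
    s))

(shout-words-unfolded "  a   b ") ; => "A B"
(shout-words-folded "  a   b ")   ; => "A B"
```

Because every stage is bound to the same name, moving between the two forms is a local edit that does not disturb the surrounding lines.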

Code:

(define nick-name-pattern #rx"\"(?:\\.|[^\"\\])*\"")

#; {String -> String}
(define (inits-3 coauthors)
  (let* ([s (nicknames-drop coauthors)]
         [s (string-split s ",")]
         [s (filter valid-author? s)]
         [s (map author->something s)]
         [s (string-join s ",")])
    s))

;; -----------------------------------------------------------------------------
#; {String -> String}
(define (nicknames-drop coauthors)
  (string-replace (regexp-replace* nick-name-pattern coauthors " ") "\"" " "))

;; -----------------------------------------------------------------------------
#; {String -> Boolean}
(define (valid-author? author)
  (non-empty-string? (regexp-replace* #px"[\\s.\\-]+" author "")))

;; -----------------------------------------------------------------------------
;; convert author to initials, if it consists of valid barrels

#; {String -> String}
(define (author->something author)
  (let* ([s (string-split author "-")]
         [s (filter valid-barrel? s)]
         [s (map barrel->something s)]
         [s (string-join s "-")]
         [s (string-append s ".")])
    s))

#; {String -> Boolean}
(define (valid-barrel? barrel)
  (non-empty-string? (regexp-replace* #px"[\\s.]+" barrel "")))

#; {String -> String}
(define (barrel->something barrel)
  (let* ([s (into-valid-names-split barrel)]
         [s (map initial-create s)]
         [s (string-join s ".")])
    s))

#; {String -> [Listof String]}
(define (into-valid-names-split barrel)
  (filter non-empty-string? (regexp-split #px"[\\s.]+" barrel)))

;;-----------------------------------------------------------------------------
(define uses
  '[
    ((".. , .. ") "")
    ((" ,, .,") "")
    (("l") "L.")
    ((", a. g, ") "A.G.")
    (("- , -I.V.-A,E.C.N-, .") "I.V-A.,E.C.N.")
    (("  e.B.Sledge ") "E.B.S.")
    (("Elisabeth Kubler-- - Ross") "E.K-R.")
    (("  Fitz-Simmons Ashton-Burke Leigh") "F-S.A-B.L.")
    (("Arleigh \"31-knot\"Burke ") "A.B.")
    ])

(require rackunit)

(define (test f)
  (for-each (λ (x) (check-equal? (apply f (first x)) (second x))) uses))

(test inits-3)

Oh, my... This way one can tackle problems harder than this. Thank you very much!

What are #; tokens?

Why is nickname dropping possible before we know that these are valid names and barrels?

If, by any chance, the nickname contains a delimiter (space, dot, dash), we're in for a lot of trouble. I'd better add a test case for it.

Oh, but there is :joy:

              (("William J. \"Wild Bill\" Donovan, Marta \"Cinta Gonzalez") "W.J.D.,M.C.G.")
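
To see why dropping nicknames has to come before any splitting, here is a small sketch that reuses `nick-name-pattern` and `nicknames-drop` from the code above. The names `split-too-early`, `dropped` and `unbalanced` are only there for illustration.

```racket
#lang racket

(define nick-name-pattern #rx"\"(?:\\.|[^\"\\])*\"")

(define (nicknames-drop coauthors)
  (string-replace (regexp-replace* nick-name-pattern coauthors " ") "\"" " "))

;; Splitting on the dash first would cut the nickname in two:
(define split-too-early (string-split "Arleigh \"31-knot\"Burke " "-"))
;; => '("Arleigh \"31" "knot\"Burke ")

;; Dropping the nickname first leaves no delimiter for later splits to trip over:
(define dropped (nicknames-drop "Arleigh \"31-knot\"Burke "))
;; => "Arleigh  Burke "

;; An unbalanced quote is not matched by the pattern; the trailing
;; string-replace turns the leftover quote character into a space.
(define unbalanced (nicknames-drop "Marta \"Cinta Gonzalez"))
;; => "Marta  Cinta Gonzalez"
```

So in the unbalanced case the words after the stray quote are kept and count as ordinary names, which matches the expected "M.C.G." in the test case.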

#; comments out the next S-expression (datum) that is read. See section 1.3, The Reader, in the Racket Reference.
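
A quick sketch of the `#;` datum comment in action (the names `shout`, `one-skipped` and `form-skipped` are made up for this example):

```racket
#lang racket

;; The braces form a single datum, so the whole "signature" is skipped by the reader.
#; {String -> String}
(define (shout s) (string-upcase s))

;; It works anywhere a datum can appear, not just at the top level.
(define one-skipped (list 1 #;2 3))                      ; => '(1 3)
(define form-skipped (list 1 #;(this form is ignored) 3)) ; => '(1 3)
```

Unlike a `;` line comment, `#;` skips exactly one datum however many lines it spans, which makes it handy for temporarily disabling a whole expression.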