I'm making a function that calculates whether or not a number is valid. Specifically whether or not an Australian Tax File Number (TFN) is valid (see Clearwater Software for an explanation of what's happening).
Now the TFNs I am receiving are mostly in text format. So I'm getting a string to begin with. In order to perform the maths, I have to cast the string to its individual digits and then map that with a list. I then perform the calculations upon the list of numbers and return a result.
That I as follows:
(define (validate-tfn candidate)
(define tfn-regex (pregexp "(\\d{3})\\s?(\\d{3})\\s?(\\d{3})"))
(define magic-weights '(1 4 3 7 5 8 6 9 10))
(if (regexp-match tfn-regex candidate)
(zero?
(remainder
(foldr + 0 (map * magic-weights
(map string->number
(map string
(string->list
(foldr string-append ""
(cdr (regexp-match tfn-regex candidate)))))))) 11))
#f))
(module+ test
(test-case "returns #t for a valid AU TFN"
(check-true (validate-tfn "123456782")))
(test-case "returns #f for an invalid AU TFN"
(check-false (validate-tfn "123456789"))))
Which feels a bit unwieldy. So I'm looking for suggestions to make the code more elegant.
For instance:
Can I get the value of string->list to be little strings and not chars?
Use something other than the cdr of the regexp-match to get the string with no spaces?
I'm sure there are other things I haven't thought of so I'm keen to explore.
(define tfn-regex (pregexp "(\\d{3})\\s?(\\d{3})\\s?(\\d{3})"))
(define magic-weights '(1 4 3 7 5 8 6 9 10))
(define 0-code (char->integer #\0))
(define (validate-tfn candidate)
(match candidate
[(pregexp tfn-regex (cons _ (app string-append* s)))
(zero? (remainder
(for/sum ([c (in-string s)]
[w (in-list magic-weights)])
(* (- (char->integer c) 0-code) w))
11))]
[_ #f]))
(module+ test
(test-case "returns #t for a valid AU TFN"
(check-true (validate-tfn "123456782")))
(test-case "returns #f for an invalid AU TFN"
(check-false (validate-tfn "123456789"))))
Comments:
Avoid using string-append in a loop (foldr in this case). Each string-append requires a new allocation of the entire string. It’s better to gather all the strings that you want to append, and then string-append* once, which will allocate only once.
You should store the result of (regexp-match tfn-regex candidate) so that you don’t need to compute it several times.
Quick thought: You could simplify things a bit by using match instead of let+cond. Also, there was a missing require -- you were using racket/base but you didn't include racket/list, so the last function wasn't available to valid-nhs-number?.
Also, what is the and in valid-nhs-number? there for? It's checking that the last element in the digits list is not #f, and then it's checking that digit against the result of the calculation, but the calculation is based on only the first 9 digits of the list so the comparison might not make sense.
@robertpostill I focused so much on the checksum processing, that it occurred only later to me that there seem to be two problems with the regular expression:
It's not anchored with ^ and $, so it matches "abc123456782xyz". More subtly, it'll match "1234567890" (ten digits, matching the first nine) or "1234567 890" (ten digits, matching the last nine).
It matches any whitespace, so it matches "123\t456\n782"
In case you deliberately used \s to also match non-breaking space, thin space and so on, it get's more tricky. It'll depend on where your data is coming from and what you want to allow.
Wow, I just wanted to thank everyone for their efforts here. This is exactly the feedback I wanted. Like seeing the match version vs the cond version for example is really helpful. The for/sum form was also very interesting and I'll be looking at that.