Which locale does string-upcase use?

(parameterize ([current-locale #f])
  (string-locale-upcase "Straße"))
; => "STRAßE"
(string-upcase "Straße")
; => "STRASSE"

The documentation states that string-upcase "uses Unicode’s locale-independent conversion rules".

It also says of current-locale:

"when locale sensitivity is disabled by setting the parameter to #f, strings are compared, etc., in a fully portable manner, which is the same as the standard procedures."

But that isn't what I see (Racket 8.9-cs, Windows x64). What value should I set current-locale to in order to get "STRASSE" from string-locale-upcase?

FWIW

That's in a macOS terminal where locale reports:

% locale
LANG=""
LC_COLLATE="C"
LC_CTYPE="UTF-8"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL=

I tried some of these locales:

% locale -a | grep de
de_CH
de_DE.UTF-8
de_AT.ISO8859-1
de_AT.UTF-8
de_AT.ISO8859-15
de_DE.ISO8859-15
de_CH.UTF-8
de_CH.ISO8859-15
de_DE.ISO8859-1
de_CH.ISO8859-1
de_AT
de_DE

but I couldn't see an effect.

Isn’t this a special case of ‘Rechtschreibung’ (orthography) in German? Technically "Strasse" is wrong (as far as I remotely recall), and German considers the letter in STRAßE an uppercase s-z. (Don’t blame me. I have only one passport and it’s a US passport :slight_smile:) I suspect @Mike Sperber knows better; he was a professor in Germany.

— Matthias

In the future, could I convince you to avoid screenshots of text? :slight_smile: The utility pbcopy on macOS is a convenient way to copy outputs.


Since 29 June 2017, the capital ß (ẞ) has officially been part of the official German orthography.[10][11] Since then, for example, the spelling STRAẞE has been permitted on an equal footing with the spelling STRASSE.

Turns out the German rules changed in 2017.

So is it a bug in string-upcase? Or in the documentation for current-locale?

It looks like a bug to me (but I am only 90% sure).

Regardless of whether the expected result is "STRAßE" or "STRASSE", I expect string-upcase and string-locale-upcase (with current-locale set to #f) to return equal results.
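That expectation can be written down as a small check. This is only a sketch: `locale-free-upcase-agrees?` is a hypothetical helper name, not something Racket provides, and it only tests whether the two functions agree when `current-locale` is `#f`. It says nothing about which spelling is the right one.

```racket
#lang racket/base

;; Hypothetical helper (not part of Racket): with locale sensitivity
;; disabled, does string-locale-upcase return the same string as
;; string-upcase?
(define (locale-free-upcase-agrees? s)
  (parameterize ([current-locale #f])
    (string=? (string-upcase s)
              (string-locale-upcase s))))

;; Report agreement for a plain ASCII word and the word from this thread.
(for ([s (in-list '("hello" "Straße"))])
  (printf "~s: ~a\n"
          s
          (if (locale-free-upcase-agrees? s) "agree" "differ")))
```

On the setup described above (Racket 8.9-cs), I would expect "Straße" to be reported as "differ", since the two calls returned different strings there.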

Also, I don't understand why I didn't see an effect of setting current-locale.

Are there other locale-dependent words, unrelated to the eszett, to test with?

(displayln (string-downcase "ΧΑΟΣΣ"))
(displayln (string-locale-downcase "ΧΑΟΣΣ"))
(parameterize ([current-locale #f])
  (displayln (string-downcase "ΧΑΟΣΣ"))
  (displayln (string-locale-downcase "ΧΑΟΣΣ")))

displays

χαοσς
χαοσσ
χαοσς
. . bytes->string/utf-8: byte string is not a well-formed UTF-8 encoding
  byte string: #"\356\247\356\221\356\237\356\274\356\274"
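Because the last call raises an exception, it can help to catch the failure per call so that every result is printed side by side. A minimal sketch follows; `try-case` is a hypothetical helper name, and it assumes the failure is an `exn:fail` exception, as the `bytes->string/utf-8` message above suggests.

```racket
#lang racket/base

;; Hypothetical helper: apply f to s and return (list 'ok result),
;; or (list 'error message) if f raises an exn:fail exception.
(define (try-case f s)
  (with-handlers ([exn:fail? (lambda (e) (list 'error (exn-message e)))])
    (list 'ok (f s))))

(define sample "ΧΑΟΣΣ")

;; Compare both downcase functions with locale sensitivity disabled.
(parameterize ([current-locale #f])
  (for ([f (in-list (list string-downcase string-locale-downcase))])
    (printf "~a: ~s\n" (object-name f) (try-case f sample))))
```

This way a broken string-locale-downcase shows up as an `error` entry instead of aborting the run.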

Yes, the string-locale-upcase and string-locale-downcase functions were broken for the #f locale. I've pushed a repair.
