1. ‘:' < ‘;’
2. ‘:’ > ‘;'
Correct answer: it depends! In a fresh install of MySQL, ‘:' < ‘;’. In Postgres, ‘:’ > ‘;'.
Follow my descent into madness (and Unicode collation) to find out why...
1/n
'foo:a'
'bar:b'
'bar:c'
'baz:d'
I wanted to select everything with a prefix of 'bar:' using only '>' and '<' (the reason is another story), so I looked up the ASCII table and found that ':' + 1 = ';'
2/n
I decide to port things to MySQL and suddenly that query returns NOTHING.
I begin to question everything I know and hold dear.
3/n
1. "SELECT ':' < ';'" returns true
2. SELECTing ':' and ';' ORDER BY key ASC from my _table_ returns ';' before ':'
Wat.
Then I remembered another weird text issue...
4/n
With a Turkish locale, lower('I') is not 'i' but a different character 'ı'.
This is a huge pain if you require that lower-cased filename must be unique.
5/n
6/n
"Isn't that just the alphabet?" you ask.
Sure, but what happens when you have letters like 'À', 'Ẵ', and 'Ā'? Someone has to define a sort order and for UTF8 it's the Unicode consortium.
7/n
But it all starts with a huge table of everything in Unicode and "weights" for sorting: unicode.org/Public/UCA/4.0….
8/n
'COLON' -> 023D.0020.0002.003A
'SEMICOLON' -> 023A.0020.0002.003B
These are tuples of four "levels" and the leftmost level is most significant (read: what you should sort by).
9/n
Why?
I have no idea, but it's probably an artifact of needing to squeeze in other characters like "ARABIC COLON" (weight 023B) in between them and scrambling things up in the process.
10/n
> HEX(WEIGHT_STRING(':'))
-> 3A (The ASCII byte for ':')
> HEX(WEIGHT_STRING(
CAST(':' AS CHAR CHARACTER SET 'utf8mb4')
COLLATE 'utf8mb4_unicode_ci'))
-> 023D (The weight from above)
11/n
Postgres infers things from your locale, so for me it's 'UTF8' and 'en_US.UTF-8.'
12/n