Understanding the sort order: LC_COLLATE

These commands print the printable ASCII characters (codes 32 to 126), sorted under three different collation locales. That range most likely covers every character you have ever used in a file name:

for i in $(seq 32 126); do printf "\\$(printf '%o' "$i")\n"; done | LC_COLLATE="C" sort           | tr -d "\n"
for i in $(seq 32 126); do printf "\\$(printf '%o' "$i")\n"; done | LC_COLLATE="en_US.UTF-8" sort | tr -d "\n"
for i in $(seq 32 126); do printf "\\$(printf '%o' "$i")\n"; done | LC_COLLATE="de_DE.UTF-8" sort | tr -d "\n"
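The same idea can be wrapped in a small function so that the locale becomes a parameter. This is only a sketch: the name `ascii_sorted` is made up for illustration, and it sets `LC_ALL` instead of `LC_COLLATE` because an exported `LC_ALL` would otherwise override `LC_COLLATE`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the printable ASCII characters (32..126)
# on one line, sorted according to the locale given as first argument.
ascii_sorted() {
    local loc="${1:-C}" i
    for i in $(seq 32 126); do
        # Turn the code into an octal escape such as \101 and let
        # printf expand it to one character per line.
        printf "\\$(printf '%o' "$i")\n"
    done | LC_ALL="$loc" sort | tr -d '\n'
    printf '\n'
}

ascii_sorted C
```

With the `C` locale the result is plain byte order: space and punctuation first, then digits, uppercase letters, the underscore, and lowercase letters. Other locales must be installed on the system (see `locale -a`) before they can be passed in.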

This matters because knowing the order lets you organize your file system further: you can name files and folders so that they are listed first. We assume:

# /etc/locale.conf
LC_COLLATE=C

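We can then use an underscore as the first character of a folder name to make it appear before folders whose names start with a lowercase letter. Under LC_COLLATE=C the underscore (ASCII 95) sorts after digits and uppercase letters but before all lowercase letters.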
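A quick way to see the effect without creating any directories is to feed a few names to `sort`. The folder names and the helper `sort_names` below are invented for this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sort the given names in plain byte order.
# LC_ALL is used so that an exported LC_ALL cannot override the setting.
sort_names() {
    printf '%s\n' "$@" | LC_ALL=C sort
}

sort_names docs _inbox Archive 2024 src
# prints, one name per line: 2024 Archive _inbox docs src
```

The underscore folder lands ahead of `docs` and `src`, but behind `2024` and `Archive`, so the trick works best when the remaining names are all lowercase.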

Listing all interesting UTF-8 symbols

The Unicode standard reserves a block for user-defined characters, the "Private Use Area" (U+E000 to U+F8FF). Nerd Fonts, for example, uses it to store its icon glyphs.

for i in `seq 57344 63743`; do echo -ne "\u$(printf '%x\n' $i) "; done

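Here 57344 is 0xE000 and 63743 is 0xF8FF, the first and last code points of the Private Use Area. The glyphs only show up if the terminal font defines them.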
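The decimal bounds do not need to be worked out by hand, since `printf` converts in both directions. A minimal sketch, with `to_dec` and `to_hex` as made-up helper names:

```shell
#!/usr/bin/env bash
# Hypothetical helpers for converting code points between hex and decimal.
to_dec() { printf '%d\n' "0x$1"; }   # hex digits -> decimal
to_hex() { printf '%X\n' "$1"; }     # decimal -> uppercase hex

to_dec E000    # prints 57344
to_dec F8FF    # prints 63743
to_hex 57344   # prints E000
```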
