Understanding the sort order: LC_COLLATE

The environment variable LC_COLLATE is part of the POSIX standard. It controls the locale-specific sort order used by tools like sort. It is documented in man 5 locale.
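A minimal sketch of the effect, assuming a POSIX shell and the standard sort utility. The helper name sort_c is made up for this illustration. LC_ALL is cleared inside the helper because, when it is set, it takes precedence over LC_COLLATE.

```shell
# sort_c: hypothetical helper that sorts stdin by byte value (C collation).
# LC_ALL is emptied first because a non-empty LC_ALL overrides LC_COLLATE.
sort_c() { LC_ALL= LC_COLLATE=C sort; }

# In the C locale all uppercase letters sort before all lowercase letters.
printf '%s\n' b A a B | sort_c
# prints A, B, a, b (one per line)
```

A language locale such as de_DE.UTF-8 would typically interleave upper and lower case instead, provided that locale is installed on the system.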

The following bash script generates the table below. It covers the printable ASCII characters, codes 32 to 126. That most likely includes every character you have ever used in a file name, so the table lets you predict where files and folders will appear in a listing if you prefix their names with a character such as ~:

# Print one table cell per printable ASCII character (codes 32 to 126).
cells() { for i in $(seq 32 126); do printf "| \\$(printf %o "$i")\n"; done; }

# Sort the same cells under three collations and join the columns into a table.
paste \
    <(echo -e "|num\n|-"         ; printf "|%s\n" $(seq 32 126)) \
    <(echo -e "|C\n|-"           ; cells | LC_COLLATE="C"           sort) \
    <(echo -e "|de_DE\n|-"       ; cells | LC_COLLATE="de_DE"       sort) \
    <(echo -e "|de_DE.UTF-8\n|-" ; cells | LC_COLLATE="de_DE.UTF-8" sort)

num C de_DE de_DE.UTF-8
32 (space) (space) (space)
33 ! ! !
34 " " "
35 # # #
36 $ $ %
37 % % &
38 & & '
39 ' ' (
40 ( ( )
41 ) ) *
42 * * +
43 + + ,
44 , , -
45 - - .
46 . . /
47 / / :
48 0 0 ;
49 1 1 <
50 2 2 =
51 3 3 >
52 4 4 ?
53 5 5 @
54 6 6 [
55 7 7 \
56 8 8 ]
57 9 9 ^
58 : : _
59 ; ; `
60 < < {
61 = = |
62 > > }
63 ? ? ~
64 @ @ $
65 A A 0
66 B B 1
67 C C 2
68 D D 3
69 E E 4
70 F F 5
71 G G 6
72 H H 7
73 I I 8
74 J J 9
75 K K a
76 L L A
77 M M b
78 N N B
79 O O c
80 P P C
81 Q Q d
82 R R D
83 S S e
84 T T E
85 U U f
86 V V F
87 W W g
88 X X G
89 Y Y h
90 Z Z H
91 [ [ i
92 \ \ I
93 ] ] j
94 ^ ^ J
95 _ _ k
96 ` ` K
97 a a l
98 b b L
99 c c m
100 d d M
101 e e n
102 f f N
103 g g o
104 h h O
105 i i p
106 j j P
107 k k q
108 l l Q
109 m m r
110 n n R
111 o o s
112 p p S
113 q q t
114 r r T
115 s s u
116 t t U
117 u u v
118 v v V
119 w w w
120 x x W
121 y y x
122 z z X
123 { { y
124 | | Y
125 } } z
126 ~ ~ Z

System-wide configuration

With LC_COLLATE=C, a leading underscore makes a folder sort before every name that starts with a lowercase letter. As the table shows, names that start with a digit or an uppercase letter still come first:

# /etc/locale.conf
LC_COLLATE=C
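As a rough check of that ordering, the sketch below creates a few folders in a temporary directory and lists them with byte-order collation. The folder names are invented for illustration, and LC_ALL is cleared for the ls call so that LC_COLLATE actually takes effect.

```shell
# Create a scratch directory with sample folder names (all hypothetical).
demo=$(mktemp -d)
mkdir "$demo/_inbox" "$demo/Archive" "$demo/2024" "$demo/notes"

# List with byte-order collation; an empty LC_ALL lets LC_COLLATE apply.
listing=$(LC_ALL= LC_COLLATE=C ls "$demo")
echo "$listing"
# prints 2024, Archive, _inbox, notes (one per line)

# Clean up the scratch directory.
rm -r "$demo"
```

The underscore (code 95) lands between Z (90) and a (97), which is why _inbox follows Archive but precedes notes.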

Listing all interesting UTF-8 symbols

The Unicode standard reserves a block for user-defined characters, the "Private Use Area" (U+E000 to U+F8FF). Nerd Fonts, for example, uses it to store icon glyphs.

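# Print every code point of the Private Use Area, separated by spaces (the \u escape needs bash 4.2 or newer).
for i in $(seq 57344 63743); do echo -ne "\u$(printf '%x' "$i") "; done; echo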

Here 57344 is hexadecimal E000 and 63743 is hexadecimal F8FF, the first and last code points of the Private Use Area.
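The conversion between the decimal loop bounds and the hexadecimal code points can be checked in the shell itself. This is only a small sketch; the variable names are chosen for this example.

```shell
# Convert the decimal loop bounds to hexadecimal code points.
first_hex=$(printf '%X' 57344)      # E000
last_hex=$(printf '%X' 63743)       # F8FF

# Shell arithmetic accepts hexadecimal constants, so the reverse works too.
first_dec=$(( 0xE000 ))             # 57344
count=$(( 0xF8FF - 0xE000 + 1 ))    # 6400 code points in the block

echo "U+$first_hex to U+$last_hex: $count code points"
# prints: U+E000 to U+F8FF: 6400 code points
```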
