pause for a second and think about what maps to what in `ix2order[order2ix[o]] = o;`.

```cpp
vector<int> xs(n);
vector<int> order2ix(n);
for(int i = 0; i < n; ++i) { order2ix[i] = i; }
sort(order2ix.begin(), order2ix.end(),
  [&xs](int i, int j) {
    return make_pair(xs[i], i) < make_pair(xs[j], j);
  });
vector<int> ix2order(n);
for(int o = 0; o < n; ++o) { ix2order[order2ix[o]] = o; }
```
For me, the major source of disconnect is that this "order" feels somewhat
disconnected from the original array `xs`. So I feel like I'm trying to reason
about these three:

- The indexes `0, 1, .., n-1`
- The orders `0th, 1st, 2nd, .., (n-1)th`
- The array values `xs[0], xs[1], xs[2], .., xs[n-1]`

and it's unclear to me how the two arrays `order2ix` and `ix2order`
relate to each other.
§ `compressed` / `decompressed`

Now for the new convention that I hope works better: `compressed` / `decompressed`.
The idea is that `compressed` maps the original numbers to their compressed variants.
So it's going to have a signature `compressed: ix -> smalluniv`, where it
compresses the element `xs[i]` into `[0..n-1]`. `decompressed` is the inverse
function, which takes a number in the smaller universe and returns its
index in the original array. So we have `decompressed: smalluniv -> ix`.
§ Why I like this better

I feel this convention is superior, because it's intuitive to me at a glance
as to what `compressed` / `decompressed` do and why they should be inverses. I feel
it also matches the deep reason for why the kth order statistic exists: it
lets us perform universe reduction, to go from a large space with a total order
to a small space `[0..(n-1)]`. Furthermore, the very name implies that
`compressed` is the compressed version of something (the original array `xs`)
and that `decompressed` is the decompressed version of something
(the compressed universe `[0..(n-1)]`). This makes it clear how they're related
to the original array linguistically, which I quite like.