§ rank/select as compress/decompress
I haven't found a good naming convention so far for describing order
statistics. I'm taking about the common implementation:
vector<int> xs(n);
vector<int> order2ix(n);
for(int i = 0; i < n; ++i) { order2ix[i] = i; }
sort(order2ix.begin(), order2ix.end(),
[](int i, int j) {
return make_pair(xs[i], i) < make_pair(xs[j], j);
};
where order2ix[o] gives us the element with that order statistic. So,
order2ix[0] contains the smallest element, order2ix[n-1] contains the
largest element, etc. I've previously tried the naming conventions:
§ rank/select
rank: ix -> order, select: order -> ix. The pro of this is that it
uses the rank/select naming convention.
This leads into the Galois connection aspect of it, but is otherwise not so
useful.
§ order2ix/ix2order
The signatures are order2ix: order -> ix, ix2order: ix -> order. This uses
the order statistic naming
convention, and thereby makes it clear what the query is: you tell me the
kth order, I give you the index in the array in order2ix(k). Alternatively,
you tell me the index i, and I'll tell you its order statistic in ix2order(k).
However, I found implementing this kind of odd. In particular, I need to
pause for a second and think about what maps to what in the ix2order[order2ix[o]] = o;.
vector<int> xs(n);
vector<int> order2ix(n);
for(int i = 0; i < n; ++i) { order2ix[i] = i; }
sort(order2ix.begin(), order2ix.end(),
[](int i, int j) {
return make_pair(xs[i], i) < make_pair(xs[j], j);
};
vector<int> ix2order(n);
for(int o = 0; o < n; ++o) { ix2order[order2ix[o]] = i; }
For me, the major source of disconnect is that this "order" feels somewhat
disconnected from the original array xs. So I feel like I'm trying to reason
about these three
- The indexes
0, 1,..,n - The orders
0th,1st,2nd,..(n-1)th - The array values
xs[0],xs[1],xs[2],..xs[n]
and it's unlclear to me how the two arrays order2ix and ix2order
relate to each other.
§ compressed/ decompressed:
Now for the new convention that I hope works better: compressed/ decompressed:
The idea is that compressed maps the original numbers to their compressed variants.
So it's going to have a signature compressed: ix -> smalluniv, where it
compresses the element xs[i] into [0..n-1]. decompressed is the inverse
function, which takes a number in the smaller universe, and returns its
index in the original array. So we have decompressed: smalluniv -> ix.
§ Why I like this better
I feel this convention is superior, because it's intuitive to me at a glance
as to what compressed/ decompressed do and why they should be inverses. I feel
it also matches the deep reason for why kth order statistic exists: it
lets us perform universe reduction, to go from a large space of a total order
to a small space [0..(n-1)].
Furthermore, the very name implies that
compressed is the compressed version of something (the original array xs)
and that decompressed is the decompressed version of something
(the compressed universe [0..(n-1)]). This makes it clear how they're reated
to the original array linguistically, which I quite like.