§ rank/select as compress/decompress
I haven't found a good naming convention so far for describing order
statistics. I'm taking about the common implementation:
vector<int> xs(n);
vector<int> order2ix(n);
for(int i = 0; i < n; ++i) { order2ix[i] = i; }
sort(order2ix.begin(), order2ix.end(),
[](int i, int j) {
return make_pair(xs[i], i) < make_pair(xs[j], j);
};
where order2ix[o]
gives us the element with that order statistic. So,
order2ix[0]
contains the smallest element, order2ix[n-1]
contains the
largest element, etc. I've previously tried the naming conventions:
§ rank/select
rank: ix -> order
, select: order -> ix
. The pro of this is that it
uses the rank/select naming convention.
This leads into the Galois connection aspect of it, but is otherwise not so
useful.
§ order2ix/ix2order
The signatures are order2ix: order -> ix
, ix2order: ix -> order
. This uses
the order statistic naming
convention, and thereby makes it clear what the query is: you tell me the
kth order, I give you the index in the array in order2ix(k)
. Alternatively,
you tell me the index i
, and I'll tell you its order statistic in ix2order(k).
However, I found implementing this kind of odd. In particular, I need to
pause for a second and think about what maps to what in the ix2order[order2ix[o]] = o;
.
vector<int> xs(n);
vector<int> order2ix(n);
for(int i = 0; i < n; ++i) { order2ix[i] = i; }
sort(order2ix.begin(), order2ix.end(),
[](int i, int j) {
return make_pair(xs[i], i) < make_pair(xs[j], j);
};
vector<int> ix2order(n);
for(int o = 0; o < n; ++o) { ix2order[order2ix[o]] = i; }
For me, the major source of disconnect is that this "order" feels somewhat
disconnected from the original array xs
. So I feel like I'm trying to reason
about these three
- The indexes
0, 1,..,n
- The orders
0th,1st,2nd,..(n-1)th
- The array values
xs[0],xs[1],xs[2],..xs[n]
and it's unlclear to me how the two arrays order2ix
and ix2order
relate to each other.
§ compressed
/ decompressed
:
Now for the new convention that I hope works better: compressed
/ decompressed
:
The idea is that compressed
maps the original numbers to their compressed variants.
So it's going to have a signature compressed: ix -> smalluniv
, where it
compresses the element xs[i]
into [0..n-1]
. decompressed
is the inverse
function, which takes a number in the smaller universe, and returns its
index in the original array. So we have decompressed: smalluniv -> ix
.
§ Why I like this better
I feel this convention is superior, because it's intuitive to me at a glance
as to what compressed
/ decompressed
do and why they should be inverses. I feel
it also matches the deep reason for why kth order statistic exists: it
lets us perform universe reduction, to go from a large space of a total order
to a small space [0..(n-1)]
.
Furthermore, the very name implies that
compressed
is the compressed version of something (the original array xs
)
and that decompressed
is the decompressed version of something
(the compressed universe [0..(n-1)]
). This makes it clear how they're reated
to the original array linguistically, which I quite like.