§ Best practices for array indexing
These are rules I'm going to follow when I solve problems on
codeforces . I've arrived at these rules by repeatedly
noticing the kinds of mistakes I make and attempting to adopt conventions
to eliminate these.
§ Rule 1: Half-open intervals
Intervals are only represented as [begin, past-the-end)
That is, we start at
begin
, include numbers begin+1, begin+2, ...
, upto, but excluding
past-the-end
.
So, for example, [0, 3) = [0, 1, 2]
, while [0, 1) = [0]
, and [0, 0) = []
.
I am not allowed to use [begin, end]
, or (before-begin, past-the-end)
or
any such variation I represent and think of intervals.
§ Rule 2: No relational operators when using for/while/do while
loops.
When I write my loop conditions, I am not allowed to use relational operators.
I must only write i != n
or i == m
.
I am not allowed to write i < n
, i <= n
, i > n
, i >= n
for my for
loops.
§ Rule 3: The loop iterator lives in "getting to" space:
The loop iterator i
in for(int i = a; i != b; ++i)
is to be thought of as
getting to/living right before" the values [a, a+1, a+2, ... b-1]
. In
ASCII-art:
|| a-1 || a || a+1 || ... || b-1 || b || b+1 ||
^^ ^^ ^^ ^^
(i=a) (i=a+1) (i=a+2) ... (i=b)
§ Ruel 4: One always reads loops according to the above rules
for(int i = begin; i != past-the-end; ++i) {
}
§ The rationale for banning relational operators
There is a strong difference in qualia between i < n
and i != n
. The former
makes on aware of when the loop runs; the latter of when the loop quits.
I wish to be cognizant of the precisely when a loop quits.
On writing i != past-the-end
, I know that we quit as soon as we
get past the end . This feels much clearer than being aware that the loops
runs as long as i < n
.
§ half-open indexing: length <-> index
The first advantage of these conventions is that
[begin, past-the-end)
is the same as [begin, begin+length)
. I've found this
to be of great help to flit between length-based-thoughts and
indexing-based-thoughts.
§ half-open indexing: 0
length
The second almost trivial point is that [begin, begin+length)
holds when
length
is zero . That is,
for(int i = begin; i != begin + 0; ++ i) { ... }
does the right thing and iterates over no elements.
§ half-open indexing: -ve
length
The third neat point is that [begin, begin+length)
holds even when length
is negative . Hence, if we want to index the last two elements of an
array, we recall that we start from index (n-1)
, and we want a segment
of length -2
from there, so our loop is:
for(int i = n-1; i != (n-1) + -2; i--) {
}
§ half-open indexing: splitting an array
Finally, this convention helps when attempting to take the beginning part
of a list and the rest of the list. If we want to split an array into
two segments at some index len
,
- The first sub-array is
[0, len)
since it starts at location 0
and has length len
. - Since we have taken
len
elements, the second sub-array must have length n-len
. It must start at index len
since len
is past-the-end for the first sub-array. So, the second sub-array has indeces [len, n-len)
.
Notice how we used both the "length view" and the "past the end" view to
quickly derive the bounds of the second array from the first array.
§ half-open indexing: uniformly generate power-of-2 intervals.
If we want intervals of length 2n: 1, 2, 4, 8, ...
which is common if one
is building data structures such as segment trees and fenwick trees,
in a half-open representation, this literally becomes [a, a+1)
,
[a, a+2)
, [a, a+4)
and so on. On the other hand, if one wants
to use closed interval based indexing, one needs to generate the
series 2n−1, which is [a, a+0]
, [a, a+3]
, [a, a+7]
which is
slightly more finicky.
§ How to deal with strides
If there's a loop
for(int i = 0; i < 5; i += 2) { ... }
the naive i != 5
translation will not work since i
only takes
on even numbers [0, 2, 4, 6, ...]
. For this, we can perform a
simple transformation and always make sure our loop variables increment
by 1
. In a compiler, this is often called as "canonicalizing the loop
induction variable". In LLVM the canonicalization is
performed by the -indvars
pass .
The above example canonicalized becomes:
for(int btc = 0; btc != 5 / 2; btc += 1) { const int i = btc * 2; }
§ In conclusion
These are rules that I dreamed up after noticing my idiosyncracies in
loop-writing.