A Universe of Sorts

§ Best practices for array indexing

These are rules I'm going to follow when I solve problems on codeforces . I've arrived at these rules by repeatedly noticing the kinds of mistakes I make and attempting to adopt conventions to eliminate these.

§ Rule 1: Half-open intervals

Intervals are only represented as [begin, past-the-end) That is, we start at begin, include numbers begin+1, begin+2, ..., upto, but excluding past-the-end. So, for example, [0, 3) = [0, 1, 2], while [0, 1) = [0], and [0, 0) = []. I am not allowed to use [begin, end], or (before-begin, past-the-end) or any such variation I represent and think of intervals.

§ Rule 2: No relational operators when using `for/while/do while` loops.

When I write my loop conditions, I am not allowed to use relational operators. I must only write i != n or i == m. I am not allowed to write i < n, i <= n, i > n, i >= n for my for loops.

§ Rule 3: The loop iterator lives in "getting to" space:

The loop iterator i in for(int i = a; i != b; ++i) is to be thought of as getting to/living right before" the values [a, a+1, a+2, ... b-1]. In ASCII-art:

|| a-1 || a   || a+1 || ... || b-1 ||  b  || b+1 ||
       ^^     ^^     ^^            ^^
     (i=a) (i=a+1)  (i=a+2) ...   (i=b)

§ Ruel 4: One always reads loops according to the above rules

for(int i = begin; i != past-the-end; ++i) {
  // NO: i from begin to past-the-end.
  // YES: [i _getting to_ the beginning] till [i _getting_ past-the-end]
}

§ The rationale for banning relational operators

There is a strong difference in qualia between i < n and i != n. The former makes on aware of when the loop runs; the latter of when the loop quits. I wish to be cognizant of the precisely when a loop quits. On writing i != past-the-end, I know that we quit as soon as we get past the end . This feels much clearer than being aware that the loops runs as long as i < n.

§ half-open indexing: length <-> index

The first advantage of these conventions is that [begin, past-the-end) is the same as [begin, begin+length). I've found this to be of great help to flit between length-based-thoughts and indexing-based-thoughts.

§ half-open indexing: `0` length

The second almost trivial point is that [begin, begin+length) holds when length is zero . That is,

for(int i = begin; i != begin + 0; ++ i) { ... }

does the right thing and iterates over no elements.

§ half-open indexing: `-ve` length

The third neat point is that [begin, begin+length) holds even when length is negative . Hence, if we want to index the last two elements of an array, we recall that we start from index (n-1), and we want a segment of length -2 from there, so our loop is:

for(int i = n-1; i != (n-1) + -2; i--) {
 // loops over the correct segment.
}

§ half-open indexing: splitting an array

Finally, this convention helps when attempting to take the beginning part of a list and the rest of the list. If we want to split an array into two segments at some index len,

The first sub-array is [0, len) since it starts at location 0 and has length len.
Since we have taken len elements, the second sub-array must have length n-len. It must start at index len since len is past-the-end for the first sub-array. So, the second sub-array has indeces [len, n-len).

Notice how we used both the "length view" and the "past the end" view to quickly derive the bounds of the second array from the first array.

§ half-open indexing: uniformly generate power-of-2 intervals.

If we want intervals of length

2^n

: 1, 2, 4, 8, ... which is common if one is building data structures such as segment trees and fenwick trees, in a half-open representation, this literally becomes [a, a+1), [a, a+2), [a, a+4) and so on. On the other hand, if one wants to use closed interval based indexing, one needs to generate the series

2^n - 1

, which is [a, a+0], [a, a+3], [a, a+7] which is slightly more finicky.

§ How to deal with strides

If there's a loop

for(int i = 0; i < 5; i += 2) { ... }

the naive i != 5 translation will not work since i only takes on even numbers [0, 2, 4, 6, ...]. For this, we can perform a simple transformation and always make sure our loop variables increment by 1. In a compiler, this is often called as "canonicalizing the loop induction variable". In LLVM the canonicalization is performed by the -indvars pass . The above example canonicalized becomes:

// btc = backedge taken count. How many times we have
// gone past the "back edge" of the loop to go back to the
// beginning of the loop.
for(int btc = 0; btc != 5 / 2; btc += 1) { const int i = btc * 2; }

§ In conclusion

These are rules that I dreamed up after noticing my idiosyncracies in loop-writing.